


**MongoDB Assignment**

---


# **Theoretical Questions**

---







1. What are the key differences between SQL and NoSQL databases?
    
    ---> SQL databases are relational, use structured tables with fixed schemas, and rely on SQL for querying. They’re great for structured data and complex joins. NoSQL databases, like MongoDB, are non-relational, use flexible schemas (e.g., JSON-like documents), and handle unstructured or semi-structured data. NoSQL supports horizontal scaling and is better for big data and real-time applications, while SQL databases are more suited for transactional systems with rigid schemas.



2. What makes MongoDB a good choice for modern applications?
    
    ---> MongoDB is great for modern apps because it uses a flexible document model, allowing easy schema changes. It supports horizontal scaling through sharding, handles large datasets, and provides high availability via replication. Its JSON-like structure makes it developer-friendly, especially for web and mobile apps, and it integrates well with modern frameworks. Plus, MongoDB Atlas simplifies cloud deployment.



3. Explain the concept of collections in MongoDB.

    ---> A collection in MongoDB is like a table in SQL but without a fixed schema. It’s a group of JSON-like documents that store related data. Each document in a collection can have different fields, making it flexible for dynamic data. Collections live inside a database and, unlike SQL tables, don’t enforce a predefined structure by default (validation rules can be added when needed).



4. How does MongoDB ensure high availability using replication?

    ---> MongoDB uses replica sets to ensure high availability. A replica set is a group of servers (nodes) with one primary node handling writes and multiple secondary nodes replicating the primary’s data. If the primary fails, a secondary is automatically elected as the new primary, minimizing downtime. Replication ensures data redundancy and fault tolerance.



5. What are the main benefits of MongoDB Atlas?
    
    ---> MongoDB Atlas is a cloud-hosted database service that simplifies setup and management. Its benefits include automated backups, easy scaling, built-in security (like encryption and authentication), and global distribution for low-latency access. It also offers monitoring tools and supports multiple cloud providers like AWS, Azure, and Google Cloud.



6. What is the role of indexes in MongoDB, and how do they improve performance?

    ---> Indexes in MongoDB are data structures that store a small portion of the dataset in an easy-to-search format. They improve query performance by allowing MongoDB to find data faster, reducing the need to scan entire collections. For example, an index on a frequently queried field like “user_id” speeds up searches and sorts.
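
As a minimal sketch of the “user_id” example above: the helper below assumes `collection` is a PyMongo `Collection` (or any object with a compatible `create_index` method). The function name and the index name `user_id_asc` are made up for illustration.

```python
def ensure_user_id_index(collection):
    """Create an ascending index on user_id and return the index name.

    `collection` is assumed to be a pymongo Collection; nothing here
    connects to a server by itself.
    """
    # 1 means ascending order (pymongo.ASCENDING has the same value).
    # The index name is a hypothetical choice for this example.
    return collection.create_index([("user_id", 1)], name="user_id_asc")
```

With a real connection this would be called as `ensure_user_id_index(db["users"])`, where `db["users"]` is an assumed collection name.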



7. Describe the stages of the MongoDB aggregation pipeline.

    ---> The aggregation pipeline is a framework for processing data in MongoDB. It consists of stages like $match (filters documents), $group (groups data and performs calculations like sum or average), $sort (orders results), $project (reshapes documents), and $lookup (joins data from another collection). Each stage transforms the data and passes it to the next.
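
To make the stage-by-stage flow concrete, here is a small sketch of a pipeline written as a plain Python list. The field names `Sales` and `Region` are assumptions (they are not defined anywhere above), so swap in whatever fields your documents actually have.

```python
def sales_by_region_pipeline(min_sale=0):
    """Return an aggregation pipeline as a plain list of stage documents.

    The field names "Sales" and "Region" are assumed for illustration.
    """
    return [
        # Stage 1: keep only documents whose Sales value exceeds the threshold.
        {"$match": {"Sales": {"$gt": min_sale}}},
        # Stage 2: one output document per region, with the summed sales.
        {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}},
        # Stage 3: largest totals first.
        {"$sort": {"total_sales": -1}},
        # Stage 4: expose the group key as "region" and hide _id.
        {"$project": {"_id": 0, "region": "$_id", "total_sales": 1}},
    ]


pipeline = sales_by_region_pipeline(min_sale=100)
# Print the stage names in the order they will run.
print([next(iter(stage)) for stage in pipeline])
```

With PyMongo the list would be passed as `collection.aggregate(pipeline)`; each stage only sees the output of the stage before it.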



8. What is sharding in MongoDB? How does it differ from replication?

    ---> Sharding is splitting a large dataset across multiple servers (shards) to handle scalability. Each shard holds a subset of the data, improving performance for large datasets. Replication, on the other hand, copies the same data across multiple nodes for redundancy and high availability. Sharding focuses on scaling, while replication focuses on reliability.
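
The contrast can be shown with a toy in-memory model. This is only an illustration of the idea, not MongoDB’s actual chunk or hashed-sharding logic, and every function name here is made up.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def shard_documents(docs, key_field, num_shards):
    """Sharding: each document is stored on exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[key_field], num_shards)].append(doc)
    return shards


def replicate_documents(docs, num_nodes):
    """Replication: every node stores a full copy of the data."""
    return [list(docs) for _ in range(num_nodes)]


docs = [{"user_id": i} for i in range(10)]
print([len(s) for s in shard_documents(docs, "user_id", 3)])  # sizes sum to 10
print([len(r) for r in replicate_documents(docs, 3)])         # every node has 10
```

Sharding divides the data (and the load), while replication multiplies it; real deployments usually combine both by making each shard a replica set.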



9. What is PyMongo, and why is it used?
   
    ---> PyMongo is a Python library for interacting with MongoDB. It allows developers to connect to a MongoDB database, perform CRUD operations (create, read, update, delete), and run queries or aggregations from Python code. It’s used because it simplifies MongoDB integration in Python applications, like web apps or data analysis tools.
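
A rough sketch of the four CRUD calls follows. It assumes `collection` is a PyMongo `Collection`; the document contents and the function name are invented for the example.

```python
def crud_demo(collection):
    """Run one create, update, read and delete against `collection`.

    `collection` is assumed to be a pymongo Collection. The sample
    document is made up for illustration.
    """
    # Create: insert_one returns a result carrying the new _id.
    inserted_id = collection.insert_one({"name": "Asha", "age": 30}).inserted_id
    # Update: $set changes only the listed fields.
    collection.update_one({"_id": inserted_id}, {"$set": {"age": 31}})
    # Read: find_one returns a single matching document (or None).
    doc = collection.find_one({"_id": inserted_id})
    # Delete: deleted_count reports how many documents were removed.
    deleted = collection.delete_one({"_id": inserted_id}).deleted_count
    return doc, deleted
```

With a running server, something like `crud_demo(MongoClient("mongodb://localhost:27017/")["test"]["people"])` would exercise it; the database and collection names there are placeholders.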



10. What are the ACID properties in the context of MongoDB transactions?

    ---> ACID stands for Atomicity (all operations in a transaction succeed or fail together), Consistency (transactions maintain data integrity), Isolation (transactions are independent and don’t interfere), and Durability (committed transactions are permanently saved). MongoDB supports ACID transactions in replica sets and sharded clusters, ensuring reliable multi-document operations.
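
As a hedged sketch of a multi-document transaction in PyMongo: the helper below assumes `client` is a `MongoClient` connected to a replica set or sharded cluster, and that a `bank.accounts` collection with a `balance` field exists. Those names are invented for the example.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents in one transaction.

    `client` is assumed to be a pymongo MongoClient; the "bank" database,
    "accounts" collection and "balance" field are hypothetical.
    """
    accounts = client["bank"]["accounts"]
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception propagates out of it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Passing `session=session` to each operation is what ties it to the transaction; either both updates become visible or neither does.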



11. What is the purpose of MongoDB’s explain() function?

    ---> The explain() function in MongoDB provides details about how a query is executed, like which indexes are used, how many documents are scanned, and the execution time. It helps developers optimize queries by identifying performance bottlenecks, such as missing indexes or inefficient operations.
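
Here is a small sketch of pulling the interesting numbers out of an explain document. It assumes the classic output layout with `queryPlanner` and `executionStats` sections; the exact shape varies by server version, and the sample values below are invented.

```python
def explain_query(collection, query):
    """Return the explain document for a find(); `collection` is assumed
    to be a pymongo Collection."""
    return collection.find(query).explain()


def summarize_plan(explain_output):
    """Extract the fields most useful for spotting a missing index."""
    plan = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    stats = explain_output.get("executionStats", {})
    return {
        "stage": plan.get("stage"),  # e.g. COLLSCAN means a full collection scan
        "docs_examined": stats.get("totalDocsExamined"),
        "keys_examined": stats.get("totalKeysExamined"),
        "returned": stats.get("nReturned"),
        "millis": stats.get("executionTimeMillis"),
    }


# Hypothetical explain output for a query that has no usable index.
sample = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
    "executionStats": {
        "nReturned": 3,
        "totalKeysExamined": 0,
        "totalDocsExamined": 1000,
        "executionTimeMillis": 4,
    },
}
print(summarize_plan(sample))
```

A large gap between documents examined and documents returned, as in the sample, is the usual hint that an index is missing.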



12. How does MongoDB handle schema validation?

    ---> MongoDB allows schema validation using JSON Schema rules defined when creating or updating a collection. You can specify required fields, data types, or constraints (e.g., minimum/maximum values). If a document doesn’t meet the rules during insert or update, MongoDB rejects it, ensuring data consistency while keeping flexibility.
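
A minimal sketch of such a rule set follows. The `users` collection and its `name` and `age` fields are assumptions made for the example, and `db` is assumed to be a PyMongo `Database`.

```python
# Hypothetical validator: every document needs a string name and an
# integer age between 0 and 150.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "age"],
        "properties": {
            "name": {"bsonType": "string", "description": "must be a string"},
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
        },
    }
}


def create_users_collection(db):
    """Create a "users" collection that enforces user_validator.

    `db` is assumed to be a pymongo Database.
    """
    return db.create_collection("users", validator=user_validator)
```

Once the collection exists with this validator, an insert such as `{"name": "Asha"}` (no `age`) would be rejected by the server.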



13. What is the difference between a primary and a secondary node in a replica set?

    ---> In a MongoDB replica set, the primary node handles all write operations and, by default, all reads as well. Secondary nodes replicate the primary’s data and don’t accept writes; they serve reads only when the client sets a read preference that allows it, and such reads may return slightly stale data. If the primary fails, a secondary can be elected as the new primary to maintain availability.



14. What security mechanisms does MongoDB provide for data protection?

    ---> MongoDB offers authentication (user credentials), role-based access control (RBAC) to limit permissions, encryption (data at rest and in transit), and auditing to track database activity. It also supports TLS/SSL for secure connections and integration with external systems like LDAP for enterprise security.



15. Explain the concept of embedded documents and when they should be used.

    ---> Embedded documents are sub-documents nested within a MongoDB document, like a JSON object inside another. They’re used when data is closely related and frequently accessed together, like a user’s address within a user document. This reduces the need for joins and improves performance for read-heavy applications.
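
The user-with-address case might look like the sketch below. All values are invented, and `get_path` is just a local helper that mimics how a dotted field name reaches into the nested document.

```python
# A user document with the address embedded as a sub-document
# (illustrative values only).
user = {
    "_id": 1,
    "name": "Asha",
    "address": {"street": "12 MG Road", "city": "Pune", "zip": "411001"},
}

# MongoDB filters reach into embedded documents with dot notation.
city_filter = {"address.city": "Pune"}


def get_path(doc, dotted_path):
    """Resolve a dotted path against a nested dict (local helper, not a
    MongoDB API)."""
    value = doc
    for part in dotted_path.split("."):
        value = value[part]
    return value


print(get_path(user, "address.city"))  # Pune
```

With PyMongo, `collection.find(city_filter)` would return the user together with the address in a single read, with no join needed.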



16. What is the purpose of MongoDB’s $lookup stage in aggregation?

    ---> The $lookup stage in MongoDB’s aggregation pipeline performs a left outer join, combining documents from one collection with matching documents from another based on a specified field. It’s useful for linking related data, like fetching user details for an order, without embedding everything in one document.
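
For the orders-and-users case, a sketch of the stage plus a tiny pure-Python emulation of the left outer join is given below. The collection and field names are assumptions, and `emulate_lookup` only approximates the real stage for simple equality matches.

```python
# Hypothetical stage: attach matching "users" documents to each order.
lookup_stage = {
    "$lookup": {
        "from": "users",          # collection to join with (assumed name)
        "localField": "user_id",  # field in the orders documents
        "foreignField": "_id",    # field in the users documents
        "as": "user",             # output array field
    }
}


def emulate_lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    """Toy left outer join: every left document is kept, and unmatched
    ones get an empty list."""
    joined = []
    for doc in left_docs:
        matches = [
            r for r in right_docs if r.get(foreign_field) == doc.get(local_field)
        ]
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "user_id": 10}, {"_id": 2, "user_id": 99}]
users = [{"_id": 10, "name": "Asha"}]
for row in emulate_lookup(orders, users, "user_id", "_id", "user"):
    print(row)
```

Order 2 has no matching user, so it still appears in the result with an empty `user` array, which is the “left outer” part of the join.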



17. What are some common use cases for MongoDB?

    ---> MongoDB is used for applications like e-commerce (storing product catalogs), content management (flexible schemas for blogs), IoT (handling sensor data), real-time analytics (fast aggregations), and social media platforms (user profiles and posts). Its flexibility and scalability suit dynamic, high-traffic systems.



18. What are the advantages of using MongoDB for horizontal scaling?

    ---> MongoDB supports horizontal scaling through sharding, distributing data across multiple servers to handle large datasets and high traffic. This improves performance, allows adding servers as needed, and avoids the limitations of vertical scaling (upgrading a single server), making it cost-effective and efficient.



19. How do MongoDB transactions differ from SQL transactions?

    ---> Both offer ACID guarantees, but MongoDB transactions are built around the document model: writes to a single document are always atomic, so embedding related data often removes the need for a multi-document transaction (or a join) altogether. SQL transactions operate on normalized tables, where one logical change usually touches several rows and tables. Multi-document transactions are also newer in MongoDB: they arrived in version 4.0 for replica sets and in 4.2 for sharded clusters.



20. What are the main differences between capped collections and regular collections?

    ---> Capped collections have a fixed size, automatically overwriting old data when full, like a circular buffer. They’re great for logs or caching. Regular collections have no size limit and store data indefinitely. Capped collections don’t allow document updates that increase size or deletions, unlike regular ones.
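
A short sketch of creating a capped collection, plus an in-memory analogy for the circular-buffer behaviour, follows. The `logs` name and the size limits are arbitrary choices, and `db` is assumed to be a PyMongo `Database`.

```python
from collections import deque


def create_log_collection(db, max_bytes=1024 * 1024, max_docs=1000):
    """Create a capped "logs" collection; `db` is assumed to be a pymongo
    Database. size is in bytes, max is an optional document-count limit."""
    return db.create_collection("logs", capped=True, size=max_bytes, max=max_docs)


# Analogy only: a bounded deque drops its oldest entry when full,
# much like a capped collection overwrites its oldest documents.
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)
print(list(recent))  # [3, 4, 5]
```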



21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

    ---> The $match stage filters documents in the aggregation pipeline based on conditions, like selecting documents where “age > 18”. It reduces the dataset early in the pipeline, improving performance by passing only relevant documents to subsequent stages like $group or $sort.



22. How can you secure access to a MongoDB database?

    ---> To secure MongoDB, enable authentication to require user credentials, use role-based access control (RBAC) to restrict permissions, enable TLS/SSL for encrypted connections, and use firewall rules to limit network access. Regularly update passwords and monitor access with auditing.
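
On the client side, the authentication and TLS parts can be sketched as a connection string. The host, user and password below are placeholders, and the helper name is made up; `authSource` and `tls` are standard connection-string options.

```python
from urllib.parse import quote_plus


def build_uri(user, password, host, port=27017, auth_db="admin"):
    """Build a MongoDB URI with escaped credentials and TLS enabled."""
    # Credentials must be percent-escaped so characters like "@" or "/"
    # do not break the URI.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={auth_db}&tls=true"
    )


# Placeholder values, not real credentials.
print(build_uri("app_user", "p@ss/word", "db.example.com"))
```

The resulting string can be handed to `MongoClient(...)`; in practice the password would come from an environment variable or a secrets manager rather than being hard-coded.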



23. What is MongoDB’s WiredTiger storage engine, and why is it important?

    ---> WiredTiger is MongoDB’s default storage engine, offering high performance, compression, and efficient data storage. It supports document-level concurrency, reducing contention during writes, and provides crash recovery and durability. Its efficiency and scalability make it critical for modern MongoDB deployments.



# **Practical Questions**

---



1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

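```python
import pandas as pd
from pymongo import MongoClient

# Load the CSV file and convert each row to a dict (one document per row)
df = pd.read_csv("superstore.csv", encoding="ISO-8859-1")
data = df.to_dict(orient="records")

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

# Optional: clear old data so reruns do not create duplicates
orders_collection.delete_many({})

# Insert data into the collection (insert_many rejects an empty list)
if data:
    result = orders_collection.insert_many(data)
    print(f"✅ Inserted {len(result.inserted_ids)} documents successfully!")
else:
    print("No rows found in superstore.csv; nothing inserted.")
```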


2. Retrieve and print all documents from the Orders collection.