### 1. What are the key differences between SQL and NoSQL databases?
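- **SQL**: Relational; data lives in tables with a fixed, predefined schema; queried with SQL (including joins); strong support for ACID transactions; traditionally scaled vertically.  
- **NoSQL**: Non-relational; flexible schema; data stored as documents (JSON/BSON), key-value pairs, wide columns, or graphs depending on the database; generally designed for horizontal scaling.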


### 2. What makes MongoDB a good choice for modern applications?
- Flexible schema for quick changes.  
- High scalability through horizontal scaling (sharding).  
- Stores data as JSON-like BSON documents, which map naturally to objects in application code.  
- Strong community and cloud support.


### 3. Explain the concept of collections in MongoDB.
- A **collection** is a group of MongoDB documents, similar to a table in SQL.  
- Documents in a collection can have different fields, because MongoDB does not enforce a schema by default (schema validation can be added optionally).
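Here is a minimal sketch of a collection whose documents have different fields. The `Products` collection and the product values are hypothetical. `insert_many` is PyMongo's standard bulk insert, and `load_products` expects a PyMongo `Collection` such as `db["Products"]`.

```python
# Documents in one collection do not need identical fields (values are hypothetical).
products = [
    {"name": "Stapler", "category": "Office Supplies", "price": 12.5},
    {"name": "Bookcase", "category": "Furniture", "price": 180.0,
     "dimensions": {"height_cm": 180, "width_cm": 80}},  # extra nested field
    {"name": "Headset", "category": "Technology", "price": 45.0,
     "tags": ["audio", "wireless"]},  # extra array field
]


def field_names(docs):
    """Return the set of top-level field names used across the documents."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return names


def load_products(collection, docs=products):
    """Insert the documents into a PyMongo Collection (e.g. db["Products"])."""
    # insert_many adds an "_id" to each dict it receives, so pass copies.
    result = collection.insert_many([dict(d) for d in docs])
    return result.inserted_ids


print(sorted(field_names(products)))
```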


### 4. How does MongoDB ensure high availability using replication?
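- Uses **replica sets**: a group of `mongod` servers that hold copies of the same data.  
- One **primary** node receives all writes and records them in its oplog. **Secondary** nodes copy the oplog and apply the same operations.  
- If the primary becomes unavailable, the remaining members hold an election and promote a secondary to primary (automatic failover).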
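Clients connect to the replica set as a whole. A connection string therefore usually lists several members plus the set name, so the driver can find the current primary and follow it after a failover. This sketch assumes hypothetical hosts `mongo1` to `mongo3` and a set named `rs0`. `replicaSet`, `readPreference`, and `w` are standard connection-string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary", write_concern="majority"):
    """Build a MongoDB connection string that targets every replica set member."""
    options = urlencode({
        "replicaSet": replica_set,          # name of the replica set
        "readPreference": read_preference,  # which members may serve reads
        "w": write_concern,                 # acknowledge writes once a majority has them
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


def connect(uri):
    """Open a PyMongo client; the driver discovers the primary and follows failovers."""
    from pymongo import MongoClient  # imported lazily so building the URI needs no driver
    return MongoClient(uri)


uri = replica_set_uri(["mongo1:27017", "mongo2:27017", "mongo3:27017"], "rs0",
                      read_preference="secondaryPreferred")
print(uri)
```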


### 5. What are the main benefits of MongoDB Atlas?
- Fully managed cloud service.  
- Automatic backups, scaling, and monitoring.  
- Multi-cloud and global distribution.


### 6. What is the role of indexes in MongoDB, and how do they improve performance?
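- Indexes store the values of selected fields in sorted order, with references to the matching documents. Queries can then locate results without a full collection scan.  
- This works like the index at the back of a book: look up the term, then jump to the page.  
- Trade-off: each index uses extra storage and must be updated on every write, so unused indexes slow down inserts and updates.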
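Here is a sketch of creating and using a compound index on the Superstore `Orders` fields. It assumes `collection` is a PyMongo `Collection` like `orders_collection`. The `default_index_name` helper only mirrors the `field_direction` naming MongoDB applies when no name is given. It is illustrative and not part of PyMongo.

```python
# Index keys are (field, direction) pairs: 1 = ascending, -1 = descending
# (the same values as pymongo.ASCENDING and pymongo.DESCENDING).
REGION_SALES_INDEX = [("Region", 1), ("Sales", -1)]


def create_region_sales_index(collection):
    """Create the compound index on a PyMongo Collection; returns the index name."""
    return collection.create_index(REGION_SALES_INDEX)


def big_west_orders(collection):
    """A query the index can serve: equality on Region, then a range and sort on Sales."""
    return list(collection.find({"Region": "West", "Sales": {"$gt": 500}}).sort("Sales", -1))


def default_index_name(keys):
    """Mirror MongoDB's default index name: field_direction pairs joined by underscores."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


print(default_index_name(REGION_SALES_INDEX))  # Region_1_Sales_-1
```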


### 7. Describe the stages of the MongoDB aggregation pipeline.
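Documents flow through the stages in order, and each stage works on the output of the previous one. Common stages:
- **$match** – Filters documents.  
- **$group** – Groups documents and performs aggregations.  
- **$project** – Selects and reshapes fields.  
- **$sort** – Sorts documents.  
- **$limit / $skip** – Limits or skips documents.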
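These stages can be combined into one pipeline. This sketch computes total West-region sales per category using the Superstore fields from this notebook (`Region`, `Category`, `Sales`). `$round` assumes MongoDB 4.2 or later, and the `top_n`/`skip` parameters are illustrative.

```python
def west_category_pipeline(top_n=3, skip=0):
    """Aggregation pipeline over Superstore orders: total West-region sales per category."""
    return [
        {"$match": {"Region": "West"}},                  # filter first
        {"$group": {"_id": "$Category",                  # one output doc per category
                    "total_sales": {"$sum": "$Sales"},
                    "orders": {"$sum": 1}}},
        {"$project": {"_id": 0, "category": "$_id",      # rename and reshape fields
                      "total_sales": {"$round": ["$total_sales", 2]},
                      "orders": 1}},
        {"$sort": {"total_sales": -1}},                  # largest totals first
        {"$skip": skip},
        {"$limit": top_n},
    ]


def run_pipeline(collection, pipeline):
    """Execute the pipeline on a PyMongo Collection and return the results as a list."""
    return list(collection.aggregate(pipeline))


print([next(iter(stage)) for stage in west_category_pipeline()])
```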


### 8. What is sharding in MongoDB? How does it differ from replication?
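- **Sharding**: Partitions a collection across multiple servers (shards) by a shard key, so each shard holds only part of the data. This scales storage and throughput.  
- **Replication**: Copies the same data to every member of a replica set for redundancy and failover. In a sharded cluster, each shard is itself a replica set.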
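The difference can be shown with a toy model in plain Python. Sharding partitions documents so each shard holds a subset, while replication gives every member a full copy. This is a conceptual sketch only. Real MongoDB sharding routes documents by ranges of the shard key (or of its hash) organized into chunks, not by the simple modulo used here. The `Order ID` values are made up.

```python
import zlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key (deterministic CRC32, unlike Python's salted hash())."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards


def shard(docs, key_field, num_shards):
    """Sharding (toy): each document goes to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[key_field], num_shards)].append(doc)
    return shards


def replicate(docs, num_members):
    """Replication (toy): every replica set member stores a full copy of the data."""
    return [list(docs) for _ in range(num_members)]


orders = [{"Order ID": f"ORD-{i}", "Sales": 10 * i} for i in range(1, 13)]
shards = shard(orders, "Order ID", 3)
replicas = replicate(orders, 3)

print([len(s) for s in shards], sum(len(s) for s in shards))  # sizes vary; total is 12
print([len(r) for r in replicas])                              # [12, 12, 12]
```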


### 9. What is PyMongo, and why is it used?
- **PyMongo** is the official MongoDB driver for Python.  
- It is used to connect to MongoDB from Python code and run CRUD operations, aggregations, transactions, and index management.


### 10. What are the ACID properties in the context of MongoDB transactions?
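- **Atomicity** – All operations in the transaction succeed, or none are applied.  
- **Consistency** – The transaction moves the data from one valid state to another.  
- **Isolation** – Uncommitted changes are not visible to other operations.  
- **Durability** – Committed changes survive crashes (with a write concern such as `majority`).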
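Here is a minimal sketch of a multi-document transaction using PyMongo's `ClientSession.with_transaction`. The transaction commits only if every operation in the callback succeeds, and aborts otherwise. The sketch assumes a replica set or sharded cluster, because transactions are not available on a standalone server. It also assumes a hypothetical `inventory` collection with `product` and `stock` fields alongside `Orders`.

```python
def stock_update(order):
    """Filter/update pair that decrements stock only if enough units remain."""
    qty = order["Quantity"]
    stock_filter = {"product": order["Product Name"], "stock": {"$gte": qty}}
    return stock_filter, {"$inc": {"stock": -qty}}


def place_order(client, order, db_name="superstore_db"):
    """Insert an order and decrement stock in one multi-document transaction."""
    db = client[db_name]

    def callback(session):
        # Both writes pass the same session, so they belong to the same transaction.
        stock_filter, stock_change = stock_update(order)
        result = db["inventory"].update_one(stock_filter, stock_change, session=session)
        if result.modified_count != 1:
            # Raising inside the callback aborts the transaction: nothing is written.
            raise ValueError("insufficient stock")
        db["Orders"].insert_one(order, session=session)

    with client.start_session() as session:
        # with_transaction commits if the callback returns normally, aborts if it raises,
        # and retries on transient errors.
        session.with_transaction(callback)
```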


### 11. What is the purpose of MongoDB’s explain() function?
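- Shows the query plan MongoDB selects, for example an index scan (`IXSCAN`) versus a full collection scan (`COLLSCAN`).  
- In `executionStats` mode, it also reports the number of documents and index keys examined and the execution time. This helps find missing indexes and tune slow queries.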
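Here is a small helper for reading `explain()` output, as a sketch. PyMongo's `Cursor.explain()` returns the server's explain document as a dict. The nesting inside `winningPlan` differs between server versions and sharded deployments, so `plan_stages` collects every `stage` name it finds rather than assuming a fixed shape.

```python
def plan_stages(plan):
    """Collect every "stage" name found anywhere in an explain() result."""
    stages = []
    if isinstance(plan, dict):
        if "stage" in plan:
            stages.append(plan["stage"])
        for value in plan.values():
            stages.extend(plan_stages(value))
    elif isinstance(plan, list):
        for item in plan:
            stages.extend(plan_stages(item))
    return stages


def explain_query(collection, query):
    """Return the stage names of the winning plan for `query` on a PyMongo Collection."""
    explanation = collection.find(query).explain()
    return plan_stages(explanation["queryPlanner"]["winningPlan"])
```

For example, `explain_query(orders_collection, {"Region": "West"})` would typically include `"COLLSCAN"` before an index on `Region` exists and `"IXSCAN"` after one is created.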


### 13. What is the difference between a primary and a secondary node in a replica set?
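- **Primary**: The only member that accepts writes. By default, it also serves all reads.  
- **Secondary**: Replicates the primary's oplog to keep a copy of the data. It serves reads only when the client's read preference allows it (e.g., `secondaryPreferred`), and it can be elected primary during failover.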


### 14. What security mechanisms does MongoDB provide for data protection?
- Authentication (e.g., SCRAM username/password or x.509 certificates).  
- Authorization through role-based access control (RBAC).  
- Encryption in transit (TLS/SSL) and at rest.
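Here is role-based access control in practice, as a hedged sketch: an administrator creates a user that holds only the built-in `read` role on `superstore_db`. The user name and password are placeholders. The caller must already be authenticated with permission to create users.

```python
def read_only_user_command(username, password, db_name):
    """Build a createUser command granting only the built-in read role on one database."""
    return {
        "createUser": username,   # command name must be the first key
        "pwd": password,
        "roles": [{"role": "read", "db": db_name}],
    }


def create_user(client, command, auth_db="admin"):
    """Run the command through PyMongo's Database.command; the user is stored in auth_db."""
    return client[auth_db].command(command)


cmd = read_only_user_command("analyst", "change-me", "superstore_db")
print(cmd["roles"])
```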


### 15. Explain the concept of embedded documents and when they should be used.
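- Documents stored as field values inside another document, such as an order that contains its customer details and line items.  
- Use them for one-to-one or one-to-few relationships where the data is usually read together. This avoids joins and lets a single write update everything atomically.  
- Avoid embedding data that can grow without bound, because a document is limited to 16 MB.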
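Here is an illustrative embedded order document. The nested `customer` and `items` fields and their values are hypothetical, since the Superstore CSV itself is flat. The filters and the `$push` update use standard MongoDB query and update syntax and could be passed to `find()` and `update_one()`.

```python
# One order with its customer and line items embedded, instead of three joined tables.
order = {
    "Order ID": "ORD-1001",
    "customer": {"name": "Alex Doe", "segment": "Consumer"},  # one-to-one: subdocument
    "items": [                                                # one-to-few: embedded array
        {"product": "Stapler", "quantity": 2, "price": 12.5},
        {"product": "Paper", "quantity": 5, "price": 4.0},
    ],
}

# Dot notation reaches into embedded documents and arrays in queries.
consumer_filter = {"customer.segment": "Consumer"}
stapler_filter = {"items.product": "Stapler"}

# A single update_one can modify the whole order atomically, e.g. append an item.
add_item = {"$push": {"items": {"product": "Binder", "quantity": 1, "price": 7.0}}}


def order_total(doc):
    """Compute the order total from its embedded line items (no join needed)."""
    return sum(item["quantity"] * item["price"] for item in doc["items"])


print(order_total(order))  # 45.0
```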


### 16. What is the purpose of MongoDB’s $lookup stage in aggregation?
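- Performs a left outer join with another collection in the same database.  
- Adds the matching documents from that collection as an array field on each input document (an empty array if nothing matches).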
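Here is a sketch of `$lookup` that assumes a hypothetical `Returns` collection whose documents share the `Order ID` field with `Orders`. The follow-up `$match` keeps only orders with at least one match. It relies on the empty array that the left outer join produces for unmatched orders.

```python
def orders_with_returns_pipeline():
    """Join each order to matching documents in a hypothetical Returns collection."""
    return [
        {"$lookup": {
            "from": "Returns",           # collection to join (same database)
            "localField": "Order ID",    # field in Orders
            "foreignField": "Order ID",  # field in Returns
            "as": "returns",             # array field added to each order
        }},
        # Unmatched orders get returns == []; keep only orders with at least one return.
        {"$match": {"returns": {"$ne": []}}},
    ]


def returned_orders(orders_collection):
    """Run the join on a PyMongo Collection and return the joined documents."""
    return list(orders_collection.aggregate(orders_with_returns_pipeline()))
```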


### 17. What are some common use cases for MongoDB?
- Real-time analytics.  
- Content management systems.  
- IoT data storage.  
- E-commerce catalogs.


### 18. What are the advantages of using MongoDB for horizontal scaling?
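- Distributes data across multiple servers (shards).  
- Spreads reads and writes over the shards, so large datasets and high traffic do not depend on a single, ever-larger machine.  
- Capacity grows by adding shards, and the balancer migrates data between shards to keep them even.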


### 19. How do MongoDB transactions differ from SQL transactions?
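- **SQL**: Multi-row, multi-table transactions are a core feature used routinely, because normalized data spans several tables.  
- **MongoDB**: Operations on a single document are always atomic, and multi-document ACID transactions are supported (since 4.0 on replica sets and 4.2 on sharded clusters). Transactions add overhead, so schemas usually embed related data to keep most updates within one document.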
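The sketch below illustrates single-document atomicity. It adds a line item and updates running totals in one `update_one` call, so no transaction is needed. The `items`, `item_count`, and `total` fields are hypothetical and assume an embedded order layout.

```python
def add_item_update(product, quantity, price):
    """One update document that appends a line item and adjusts totals together."""
    return {
        "$push": {"items": {"product": product, "quantity": quantity, "price": price}},
        "$inc": {"item_count": quantity, "total": quantity * price},
    }


def add_item(orders, order_id, product, quantity, price):
    """Apply the update to one order; MongoDB applies all operators in it atomically."""
    return orders.update_one({"Order ID": order_id},
                             add_item_update(product, quantity, price))


print(add_item_update("Binder", 2, 7.0)["$inc"])
```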


### 20. What are the main differences between capped collections and regular collections?
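- **Capped**: Fixed size. Documents are kept in insertion order, and the oldest ones are overwritten when the collection is full (useful for logs and caches).  
- **Regular**: Grows dynamically. Documents are removed only when deleted explicitly (or expired by a TTL index).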
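Here is a sketch of creating a capped collection through PyMongo's `Database.create_collection`. That method forwards options such as `capped`, `size` (bytes), and `max` (document count) to the server. The `logs` name and the limits are arbitrary. The `deque` part is only a plain-Python analogy for the overwrite-oldest behavior, not how MongoDB stores data.

```python
from collections import deque

# Options forwarded to the server's create command (size is in bytes).
CAPPED_OPTIONS = {"capped": True, "size": 1_048_576, "max": 1000}


def create_log_collection(db, name="logs"):
    """Create a capped collection on a PyMongo Database."""
    return db.create_collection(name, **CAPPED_OPTIONS)


# Toy analogy only: a deque with maxlen also keeps insertion order and drops the oldest entry.
recent = deque(maxlen=3)
for event in ["login", "view", "add_to_cart", "checkout"]:
    recent.append(event)
print(list(recent))  # ['view', 'add_to_cart', 'checkout']
```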


### 21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
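- Filters documents using the same query syntax as `find()`.  
- Similar to SQL’s WHERE clause. Placing it at the start of the pipeline lets it use indexes and reduces the number of documents that later stages must process.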
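The SQL comparison can be made concrete. This sketch translates a hypothetical WHERE clause over the Superstore fields into a `$match` stage. The same filter dict would also work with `find()`.

```python
# SQL: SELECT * FROM Orders
#      WHERE Region = 'West' AND Sales > 500 AND Category IN ('Furniture', 'Technology')
def match_stage():
    """$match uses find() query syntax; conditions in one dict are combined with AND."""
    return {"$match": {
        "Region": "West",
        "Sales": {"$gt": 500},
        "Category": {"$in": ["Furniture", "Technology"]},
    }}


def high_value_west(collection):
    """Run the filter first, then sort, on a PyMongo Collection."""
    # $match at the start can use an index and shrinks the input of later stages.
    pipeline = [match_stage(), {"$sort": {"Sales": -1}}]
    return list(collection.aggregate(pipeline))
```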


### 22. How can you secure access to a MongoDB database?
- Enable authentication.  
- Use role-based permissions.  
- Enable TLS/SSL encryption.  
- Restrict network access (bind to specific interfaces, use firewalls, or Atlas IP access lists).
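Here is a sketch of the client side of the first three measures. It uses standard PyMongo `MongoClient` keyword arguments (`username`, `password`, `authSource`, `tls`, `tlsCAFile`). The credentials, the `admin` auth database, and the CA file path are placeholders. Network restrictions are configured on the server or in Atlas, not in client code.

```python
def secure_client_options(username, password, ca_file):
    """Keyword arguments for an authenticated, TLS-encrypted PyMongo connection."""
    return {
        "username": username,
        "password": password,
        "authSource": "admin",  # database where the user was created (placeholder)
        "tls": True,            # encrypt traffic in transit
        "tlsCAFile": ca_file,   # CA certificate used to verify the server
    }


def connect_securely(host, **options):
    """Open a client with the given options."""
    from pymongo import MongoClient  # imported lazily so the options builder needs no driver
    return MongoClient(host, **options)


opts = secure_client_options("analyst", "change-me", "/path/to/ca.pem")
print(sorted(opts))
```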


### 23. What is MongoDB’s WiredTiger storage engine, and why is it important?
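- Default storage engine since MongoDB 3.2.  
- Provides document-level concurrency control, data compression (snappy by default, with zlib and zstd available), and checkpoints plus journaling for durability.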
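As a sketch, compression can be chosen per collection by passing a WiredTiger configuration string when the collection is created. The running engine can be checked with `serverStatus`. The collection name is arbitrary, and `zstd` assumes MongoDB 4.2 or later.

```python
# WiredTiger configuration selecting zstd block compression for one collection.
ZSTD_STORAGE = {"wiredTiger": {"configString": "block_compressor=zstd"}}


def create_compressed_collection(db, name):
    """Create a collection on a PyMongo Database with a non-default compressor."""
    return db.create_collection(name, storageEngine=ZSTD_STORAGE)


def storage_engine_name(db):
    """Ask the server which storage engine it runs (normally "wiredTiger")."""
    return db.command("serverStatus")["storageEngine"]["name"]


print(ZSTD_STORAGE["wiredTiger"]["configString"])
```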


```python
# 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB
# Requires the pymongo and pandas packages: pip install pymongo pandas
import pandas as pd
from pymongo import MongoClient

# Connect to a local MongoDB server
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
orders_collection = db["Orders"]

# Load the CSV (if the file is not UTF-8 encoded, pass e.g. encoding="latin1")
df = pd.read_csv("superstore.csv")

# Convert each row to a dict and insert all rows into MongoDB
orders_collection.insert_many(df.to_dict(orient="records"))

print("Data inserted into MongoDB successfully.")
```

```python
# 2. Retrieve and print all documents from the Orders collection
for order in orders_collection.find():
    print(order)
```


```python
# 3. Count and display the total number of documents in the Orders collection
count = orders_collection.count_documents({})
print("Total number of documents:", count)
```

```python
# 4. Write a query to fetch all orders from the "West" region
west_orders = orders_collection.find({"Region": "West"})
for order in west_orders:
    print(order)
```

```python
# 5. Write a query to find orders where Sales is greater than 500
high_sales = orders_collection.find({"Sales": {"$gt": 500}})
for order in high_sales:
    print(order)
```

```python
# 6. Fetch the top 3 orders with the highest Profit
top_profit = orders_collection.find().sort("Profit", -1).limit(3)
for order in top_profit:
    print(order)
```

```python
# 7. Update all orders with Ship Mode as "First Class" to "Premium Class"
result = orders_collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
print(f"Updated {result.modified_count} documents.")
```

```python
# 8. Delete all orders where Sales is less than 50
delete_result = orders_collection.delete_many({"Sales": {"$lt": 50}})
print(f"Deleted {delete_result.deleted_count} documents.")
```

```python
# 9. Use aggregation to group orders by Region and calculate total sales per region
pipeline = [
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
]
for doc in orders_collection.aggregate(pipeline):
    print(doc)
```

```python
# 10. Fetch all distinct values for Ship Mode from the collection
ship_modes = orders_collection.distinct("Ship Mode")
print("Distinct Ship Modes:", ship_modes)
```

```python
# 11. Count the number of orders for each category
pipeline = [
    {"$group": {"_id": "$Category", "count": {"$sum": 1}}}
]
for doc in orders_collection.aggregate(pipeline):
    print(doc)
```