1. What are the key differences between SQL and NoSQL databases?  
Answer: SQL databases (e.g., MySQL) enforce a fixed schema, support JOINs, and provide ACID transactions. NoSQL databases trade the fixed schema for more flexible data models (document, key-value, wide-column, graph). MongoDB, for example, is document-oriented and designed to scale horizontally.

2. What makes MongoDB a good choice for modern applications?  
Answer: MongoDB does not enforce a fixed schema by default, scales horizontally, and handles semi-structured data well. It stores JSON-like documents, supports automatic sharding and replication, and integrates easily with modern web frameworks.

3. Explain the concept of collections in MongoDB.  
Answer: A collection in MongoDB is analogous to a table in SQL. It stores multiple documents (records), and documents in a collection do not need to have the same structure.
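
To make the point about differing structures concrete, here is a small sketch using plain Python dictionaries, which is how PyMongo represents documents. The field names and the `insert_products` helper are illustrative assumptions; the helper would need a real collection handle to run against a server.

```python
# Two documents destined for the same collection, with different fields.
# Field names are illustrative, not taken from a real dataset.
products = [
    {"name": "Bookcase", "price": 261.96, "dimensions": {"height_cm": 180}},
    {"name": "E-book", "price": 9.99, "download_url": "https://example.com/book"},
]


def insert_products(collection, docs):
    """Insert documents of differing shapes into one collection.

    `collection` is assumed to be a pymongo Collection.
    """
    result = collection.insert_many(docs)
    return len(result.inserted_ids)


# Only some keys are shared; a SQL table would need nullable columns for the rest.
shared_keys = set(products[0]) & set(products[1])
print(sorted(shared_keys))  # ['name', 'price']
```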

4. How does MongoDB ensure high availability using replication?  
Answer: MongoDB uses replica sets with one primary and one or more secondaries. Secondaries replicate the primary's oplog asynchronously, and if the primary becomes unavailable the remaining members automatically elect a new primary.

5. What are the main benefits of MongoDB Atlas?  
Answer: MongoDB Atlas offers managed hosting, automated backups, scaling, global distribution, security, and monitoring tools, which simplifies deployment and management.

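6. What is the role of indexes in MongoDB, and how do they improve performance?  
Answer: Indexes store the indexed fields in sorted order so that MongoDB can locate matching documents without reading every document. Without a usable index, MongoDB performs a full collection scan. The trade-off is that each index takes extra storage and adds work to every write.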
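
As a minimal sketch of how an index might be created from Python, the helper below builds a compound index on `Region` and `Sales` (field names borrowed from the Superstore exercise further down). The helper name and the choice of fields are assumptions for illustration.

```python
# Index specification: ascending on Region, descending on Sales.
# 1 and -1 are the values behind pymongo.ASCENDING and pymongo.DESCENDING.
REGION_SALES_INDEX = [("Region", 1), ("Sales", -1)]


def ensure_region_sales_index(collection):
    """Create the compound index (a no-op if it already exists) and return its name.

    `collection` is assumed to be a pymongo Collection.
    """
    return collection.create_index(REGION_SALES_INDEX)
```

A query that filters on `Region` and sorts by `Sales` in descending order is the kind of query this index is meant to serve.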

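7. Describe the stages of the MongoDB aggregation pipeline.  
Answer: Common stages include `$match` (filter documents), `$group` (aggregate by a key), `$sort` (order results), `$project` (reshape fields), `$limit` (cap the number of results), `$lookup` (join another collection), and `$unwind` (flatten arrays). Documents flow through the stages in order, with each stage consuming the output of the previous one.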
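
The following sketch chains several of these stages into one pipeline. It assumes documents with `Region` and `Sales` fields, as in the Superstore exercise further down; the function name and thresholds are hypothetical.

```python
def top_regions_pipeline(min_sales, top_n):
    """Build a pipeline ranking regions by total sales of orders above `min_sales`."""
    return [
        # Filter first so later stages see fewer documents.
        {"$match": {"Sales": {"$gt": min_sales}}},
        # Sum Sales per Region.
        {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
        # Highest total first.
        {"$sort": {"TotalSales": -1}},
        {"$limit": top_n},
        # Rename _id to Region in the output.
        {"$project": {"_id": 0, "Region": "$_id", "TotalSales": 1}},
    ]


pipeline = top_regions_pipeline(min_sales=500, top_n=3)
print([next(iter(stage)) for stage in pipeline])
# ['$match', '$group', '$sort', '$limit', '$project']
```

With a PyMongo collection handle, the list would be passed to `collection.aggregate(pipeline)`.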

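8. What is sharding in MongoDB? How does it differ from replication?  
Answer: Sharding is horizontal partitioning: each shard holds a different subset of the data, which lets capacity grow by adding servers. Replication keeps full copies of the same data on several servers for redundancy and high availability. The two are complementary, and in a sharded cluster each shard is typically itself a replica set.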
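
For illustration, the sketch below builds the admin commands that shard a collection on a hashed key. It assumes a connection to a sharded cluster (through `mongos`), which a single replica set or a free Atlas tier does not provide; the helper names and the choice of shard key are assumptions.

```python
def shard_commands(db_name, coll_name, shard_key_field):
    """Return the admin command documents for sharding one collection."""
    namespace = f"{db_name}.{coll_name}"
    return [
        {"enableSharding": db_name},
        # A hashed key spreads documents evenly across shards.
        {"shardCollection": namespace, "key": {shard_key_field: "hashed"}},
    ]


def shard_collection(client, db_name, coll_name, shard_key_field):
    """Run the commands; `client` is assumed to be a pymongo MongoClient on a sharded cluster."""
    for command in shard_commands(db_name, coll_name, shard_key_field):
        client.admin.command(command)


print(shard_commands("superstore_db", "orders", "Order ID")[1]["shardCollection"])
# superstore_db.orders
```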

9. What is PyMongo, and why is it used?  
Answer: PyMongo is the official Python driver for MongoDB. It allows Python programs to connect, read, write, and query MongoDB databases.

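10. What are the ACID properties in the context of MongoDB transactions?  
Answer: MongoDB supports multi-document ACID transactions (on replica sets since version 4.0 and on sharded clusters since 4.2). ACID stands for Atomicity, Consistency, Isolation, and Durability: a transaction's writes are applied all or nothing, leave the data valid, do not expose partial results to other operations, and survive a crash once committed.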
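
A minimal sketch of a multi-document transaction with PyMongo follows. The `bank` database, the `accounts` collection, and the `transfer` helper are hypothetical, and the client is assumed to be connected to a replica set, since transactions are not available on a standalone server.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents atomically.

    `client` is assumed to be a pymongo MongoClient connected to a replica set.
    """
    accounts = client["bank"]["accounts"]  # hypothetical database and collection
    with client.start_session() as session:
        # The transaction commits when the block exits normally
        # and aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Passing `session=session` to each operation is what ties it to the transaction; an operation without it would run outside the transaction.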

11. What is the purpose of MongoDB’s `explain()` method?  
Answer: `explain()` provides information on how MongoDB will execute a query, helping developers optimize queries by understanding query plans and index usage.
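
The sketch below shows one way to read a query plan from Python. The helper names are assumptions, and the exact layout of `explain()` output varies between server versions, so the `queryPlanner` / `winningPlan` / `inputStage` path used here should be checked against the actual output.

```python
def plan_stages(plan):
    """Return the stage names of a plan, outermost first, by following `inputStage`."""
    stages = []
    while plan is not None:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")
    return stages


def explain_region_query(collection, region):
    """Explain a filter on Region; `collection` is assumed to be a pymongo Collection."""
    explanation = collection.find({"Region": region}).explain()
    return plan_stages(explanation["queryPlanner"]["winningPlan"])


# Hand-written sample in the shape of a winning plan (not real server output).
sample_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
}
print(plan_stages(sample_plan))  # ['FETCH', 'IXSCAN']
```

An `IXSCAN` stage indicates that an index was used, whereas `COLLSCAN` indicates a full collection scan.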

12. How does MongoDB handle schema validation?  
Answer: MongoDB allows optional schema validation using JSON Schema. You can define required fields, data types, and constraints on document structure.
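
As an illustration, a validator for order documents might look like the sketch below. The required fields and types are assumptions based on the Superstore columns used later, and `orders_validated` is a hypothetical collection name.

```python
# JSON Schema validator: required fields, types, and a simple constraint.
ORDER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Order ID", "Region", "Sales"],
        "properties": {
            "Order ID": {"bsonType": "string"},
            "Region": {"bsonType": "string"},
            # Sales must be numeric and non-negative.
            "Sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}


def create_validated_orders(db, name="orders_validated"):
    """Create a collection that rejects documents failing the validator.

    `db` is assumed to be a pymongo Database.
    """
    return db.create_collection(name, validator=ORDER_VALIDATOR)
```

Documents that omit a required field or use the wrong type would then be rejected on insert or update.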

13. What is the difference between a primary and a secondary node in a replica set?  
Answer: The primary node receives all write operations. Secondary nodes replicate the primary's data and can serve reads when the client's read preference allows it, although such reads may be slightly stale.

14. What security mechanisms does MongoDB provide for data protection?  
Answer: MongoDB provides authentication, role-based access control (RBAC) for authorization, encryption in transit (TLS/SSL) and at rest, network restrictions such as IP access lists, and auditing.

15. Explain the concept of embedded documents and when they should be used.  
Answer: Embedded documents are documents nested inside another document. This denormalized layout is ideal when the related data is usually read or updated together with its parent, since a single query returns everything.
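
A short sketch of what embedding could look like is given below. The values are taken from the sample Superstore output, but the nested `customer` and `items` layout is a hypothetical reshaping, not how the exercise stores the data.

```python
# One order with the customer and line items embedded in the same document.
order = {
    "Order ID": "CA-2016-152156",
    "customer": {"Customer ID": "CG-12520", "name": "Claire Gute", "city": "Henderson"},
    "items": [
        {"Product ID": "FUR-BO-10001798", "Quantity": 2, "Sales": 261.96},
    ],
}


def find_orders_in_city(collection, city):
    """Query a nested field with dot notation; `collection` is assumed to be a pymongo Collection."""
    return list(collection.find({"customer.city": city}))


# Everything about the order is available without a join.
print(order["customer"]["name"], len(order["items"]))  # Claire Gute 1
```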

16. What is the purpose of MongoDB’s `$lookup` stage in aggregation?  
Answer: `$lookup` performs a left outer join between collections, allowing related documents to be merged within the aggregation pipeline.
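
The join can be sketched as follows, assuming a separate `customers` collection keyed by `Customer ID`. That collection and the `customer_docs` output field are hypothetical; the exercise below stores everything in a single `orders` collection.

```python
def orders_with_customers_pipeline():
    """Build a pipeline that attaches the matching customer document to each order."""
    return [
        {
            "$lookup": {
                "from": "customers",            # collection to join (assumed to exist)
                "localField": "Customer ID",    # field in the orders documents
                "foreignField": "Customer ID",  # field in the customers documents
                "as": "customer_docs",          # output array field
            }
        },
        # Flatten the array but keep orders with no matching customer,
        # which preserves the left outer join semantics.
        {"$unwind": {"path": "$customer_docs", "preserveNullAndEmptyArrays": True}},
    ]


print(orders_with_customers_pipeline()[0]["$lookup"]["from"])  # customers
```

Without the `$unwind` stage, each order carries `customer_docs` as an array, which is empty when no customer matches.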

17. What are some common use cases for MongoDB?  
Answer: Content management, product catalogs, IoT, user profiles, mobile app backends, real-time analytics, and large-scale document storage.

18. What are the advantages of using MongoDB for horizontal scaling?  
Answer: Sharding distributes data and load across multiple machines, so capacity grows by adding shards instead of upgrading a single server. How well this scales in practice depends on choosing a shard key that spreads reads and writes evenly.

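19. How do MongoDB transactions differ from SQL transactions?  
Answer: Relational databases have supported multi-statement transactions for decades and use them routinely. In MongoDB, single-document operations have always been atomic, and multi-document ACID transactions were added later (version 4.0 for replica sets, 4.2 for sharded clusters). Because related data is often embedded in one document, many MongoDB workloads need multi-document transactions less often than an equivalent SQL schema would.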

20. What are the main differences between capped collections and regular collections?  
Answer: Capped collections have a fixed maximum size, preserve insertion order, and automatically overwrite the oldest documents once full. Regular collections grow dynamically and keep documents until they are explicitly deleted.
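
A capped collection has to be created explicitly. The sketch below shows one way to do that with PyMongo; the `event_log` name and the size limits are arbitrary assumptions.

```python
# Options for a capped collection: at most 1 MiB of data or 1000 documents,
# whichever limit is reached first.
CAPPED_OPTIONS = {"capped": True, "size": 1024 * 1024, "max": 1000}


def create_event_log(db, name="event_log"):
    """Create a capped collection; `db` is assumed to be a pymongo Database."""
    return db.create_collection(name, **CAPPED_OPTIONS)
```

Once either limit is reached, each new insert displaces the oldest document, which suits logs and other rolling buffers.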

21. What is the purpose of the `$match` stage in MongoDB’s aggregation pipeline?  
Answer: `$match` filters documents based on conditions. It is often used early in the pipeline to reduce the number of documents processed.

22. How can you secure access to a MongoDB database?  
Answer: Use authentication, enable role-based access control, enforce TLS/SSL encryption, restrict IPs, and enable audit logging.
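
One practical step is to keep credentials out of source code. The sketch below reads them from environment variables and percent-encodes them before building the connection string; the variable names (`MONGO_USER`, `MONGO_PASSWORD`, `MONGO_HOST`, `MONGO_DB`) and the helper names are assumptions.

```python
import os
from urllib.parse import quote_plus


def build_uri(user, password, host, db_name):
    """Build an SRV connection string with percent-encoded credentials and TLS enabled."""
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{host}/{db_name}"
        "?retryWrites=true&w=majority&tls=true"
    )


def uri_from_env():
    """Read credentials from the environment instead of hard-coding them."""
    return build_uri(
        os.environ["MONGO_USER"],
        os.environ["MONGO_PASSWORD"],
        os.environ["MONGO_HOST"],
        os.environ.get("MONGO_DB", "superstore_db"),
    )


# Special characters in the password are escaped so the URI stays valid.
print(build_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net", "superstore_db"))
```

The database user behind such a URI should hold only the roles the application needs, rather than an administrative role.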

23. What is MongoDB’s WiredTiger storage engine, and why is it important?  
Answer: WiredTiger is MongoDB's default storage engine (since version 3.2). It provides document-level concurrency control, compression, and journaling, which helps write-intensive workloads in particular.


# 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB
# 2. Retrieve and print all documents from the Orders collection
# 3. Count and display the total number of documents in the Orders collection
# 4. Write a query to fetch all orders from the "West" region
# 5. Write a query to find orders where Sales is greater than 500
# 6. Fetch the top 3 orders with the highest Profit
# 7. Update all orders with Ship Mode as "First Class" to "Premium Class"
# 8. Delete all orders where Sales is less than 50
# 9. Use aggregation to group orders by Region and calculate total sales per region
# 10. Fetch all distinct values for Ship Mode from the collection
# 11. Count the number of orders for each category

# 📌 Run this in Google Colab

!pip install pymongo pandas dnspython --quiet

import pandas as pd
from pymongo import MongoClient


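import os

# Prefer a connection string from the environment (MONGODB_URI is an assumed variable name);
# the hard-coded value is kept only as a fallback for this exercise.
uri = os.environ.get(
    "MONGODB_URI",
    "mongodb+srv://admin:admin123@cluster0.jbvxpl5.mongodb.net/superstore_db?retryWrites=true&w=majority&appName=Cluster0",
)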


# Connect to MongoDB Atlas
client = MongoClient(uri)
db = client["superstore_db"]
orders_col = db["orders"]

# 1. Load CSV into MongoDB
df = pd.read_csv("superstore.csv", encoding='ISO-8859-1')
df = df.dropna(subset=["Order ID"])
orders_col.delete_many({})  # Clear previous data
orders_col.insert_many(df.to_dict(orient="records"))
print("✅ 1. Inserted data into MongoDB.")

# 2. Retrieve documents (only the first 3 are printed as a sample)
print("\n2. Sample Documents:")
for doc in orders_col.find().limit(3):
    print(doc)

# 3. Count total documents
print("\n3. Total documents:", orders_col.count_documents({}))

# 4. Orders from 'West' region (first 3 shown)
print("\n4. Orders from West region:")
for doc in orders_col.find({"Region": "West"}).limit(3):
    print(doc)

# 5. Orders with Sales > 500 (first 3 shown)
print("\n5. Orders with Sales > 500:")
for doc in orders_col.find({"Sales": {"$gt": 500}}).limit(3):
    print(doc)

# 6. Top 3 orders by Profit
print("\n6. Top 3 orders by Profit:")
for doc in orders_col.find().sort("Profit", -1).limit(3):
    print(doc)

# 7. Update Ship Mode from "First Class" to "Premium Class"
result = orders_col.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
print(f"\n7. Updated {result.modified_count} documents.")

# 8. Delete orders where Sales < 50
result = orders_col.delete_many({"Sales": {"$lt": 50}})
print(f"\n8. Deleted {result.deleted_count} documents.")

# 9. Total sales per Region
print("\n9. Total Sales per Region:")
pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
]
for doc in orders_col.aggregate(pipeline):
    print(doc)

# 10. Distinct values of Ship Mode
print("\n10. Distinct Ship Modes:")
print(orders_col.distinct("Ship Mode"))

# 11. Count of orders per Category
print("\n11. Orders per Category:")
pipeline = [
    {"$group": {"_id": "$Category", "Count": {"$sum": 1}}}
]
for doc in orders_col.aggregate(pipeline):
    print(doc)


✅ 1. Inserted data into MongoDB.

2. Sample Documents:
{'_id': ObjectId('6895979665aa769a4bacd217'), 'Row ID': 1, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-BO-10001798', 'Category': 'Furniture', 'Sub-Category': 'Bookcases', 'Product Name': 'Bush Somerset Collection Bookcase', 'Sales': 261.96, 'Quantity': 2, 'Discount': 0.0, 'Profit': 41.9136}
{'_id': ObjectId('6895979665aa769a4bacd218'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Produc