1. Key differences between SQL and NoSQL databases
SQL: Structured, uses tables, fixed schema, supports JOINs.

NoSQL (like MongoDB): Flexible schema, stores data in JSON-like documents, better for hierarchical data and horizontal scaling.

2. Why is MongoDB good for modern applications?
Schema flexibility, horizontal scalability, fast development cycles, JSON-like documents align well with modern app data models.

3. What is a collection in MongoDB?
A collection is a group of MongoDB documents, similar to a table in SQL.

4. How does MongoDB ensure high availability using replication?
Through replica sets: multiple copies of data across nodes (1 primary, multiple secondaries) for automatic failover and redundancy.

5. Benefits of MongoDB Atlas
Fully managed cloud database

Automated backups, scaling, monitoring

High availability and built-in security

6. Role of indexes in MongoDB
Indexes speed up query performance by allowing faster access to documents. Without indexes, MongoDB performs a full collection scan.
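
As a minimal sketch of the point above, the helper below creates a single-field index through PyMongo's `create_index`. The function name `ensure_region_index` and the choice of the `Region` field (taken from the Superstore data used in the practical part) are assumptions for illustration; `collection` is expected to be a PyMongo `Collection`.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def ensure_region_index(collection, field="Region"):
    """Create an ascending single-field index and return its name.

    `collection` is assumed to be a pymongo Collection. Calling
    create_index again with the same specification does not build a
    second index.
    """
    return collection.create_index([(field, ASCENDING)])
```

With the index in place, a filter such as `{"Region": "West"}` can be answered from the index instead of scanning every document.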

7. Stages of the MongoDB aggregation pipeline
Common stages:

$match (filter)

$group (aggregate)

$sort (order)

$project (reshape)

$lookup (join)

$limit, $skip
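
The stages listed above can be chained in one pipeline. The sketch below is illustrative only: the function name `top_regions_pipeline` and its default thresholds are assumptions, while the field names `Sales` and `Region` come from the Superstore data used in the practical part.

```python
def top_regions_pipeline(min_sales=500, limit=3):
    """Build an aggregation pipeline as a plain list of stage documents."""
    return [
        # $match: keep only orders above the sales threshold
        {"$match": {"Sales": {"$gt": min_sales}}},
        # $group: one output document per region with summed sales
        {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
        # $sort: highest total first
        {"$sort": {"TotalSales": -1}},
        # $limit: keep only the first `limit` regions
        {"$limit": limit},
        # $project: rename _id to Region and hide the raw _id
        {"$project": {"_id": 0, "Region": "$_id", "TotalSales": 1}},
    ]
```

The list can be passed directly to a collection's `aggregate` method, for example `orders_collection.aggregate(top_regions_pipeline())`.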

8. What is sharding? Difference from replication
Sharding: Distributes data across multiple servers for scalability

Replication: Copies data for high availability
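
To make the sharding side concrete, here is a hedged sketch that issues the `enableSharding` and `shardCollection` admin commands through PyMongo. It only works when `client` is connected to a `mongos` router of a sharded cluster. The helper name `shard_collection` and the choice of a hashed shard key are assumptions for illustration.

```python
def shard_collection(client, namespace, shard_key_field):
    """Enable sharding for a database and shard one of its collections.

    `client` is assumed to be a pymongo MongoClient connected to mongos;
    `namespace` has the form "database.collection".
    """
    db_name = namespace.split(".", 1)[0]
    # Allow collections in this database to be sharded
    client.admin.command("enableSharding", db_name)
    # Distribute documents across shards by a hash of the chosen field
    return client.admin.command(
        "shardCollection", namespace, key={shard_key_field: "hashed"}
    )
```

Replication needs no such call from the application: members of a replica set copy the primary's data automatically once the set is configured.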

9. What is PyMongo and why is it used?
PyMongo is the Python driver for MongoDB, used to interact with MongoDB from Python applications.

10. ACID properties in MongoDB
Atomicity, Consistency, Isolation, Durability. Single-document writes are always atomic; multi-document ACID transactions are supported on replica sets since v4.0 and on sharded clusters since v4.2.
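
A multi-document transaction in PyMongo can be sketched as follows. The `transfer` helper, the `accounts` collection and the `balance` field are hypothetical names used only for illustration; the code assumes `client` is a `MongoClient` connected to a replica set or sharded cluster, since transactions are not available on a standalone server.

```python
def transfer(client, accounts, from_id, to_id, amount):
    """Move `amount` between two account documents in one transaction."""
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Every operation that should belong to the transaction must receive the same `session` argument; an operation without it runs outside the transaction.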

11. Purpose of explain() in MongoDB
Shows query execution plan, helping optimize queries by understanding how MongoDB processes them.
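
As a rough illustration, the helper below runs `explain()` on a PyMongo cursor and returns the top-level stage of the winning plan. The name `winning_stage` is an assumption, and the exact shape of the explain output varies with server version and deployment type (sharded clusters in particular report it differently), so treat the key lookups as a sketch rather than a stable contract.

```python
def winning_stage(collection, query):
    """Return the top-level stage name of the winning query plan.

    `collection` is assumed to be a pymongo Collection.
    """
    plan = collection.find(query).explain()
    winning = plan["queryPlanner"]["winningPlan"]
    # Some newer servers nest the readable plan under "queryPlan"
    winning = winning.get("queryPlan", winning)
    return winning["stage"]
```

A result of `COLLSCAN` means the query scanned the whole collection, which usually signals a missing index.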

12. How does MongoDB handle schema validation?
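Via JSON Schema rules ($jsonSchema) defined per collection with the validator option, either when the collection is created or later by modifying it with collMod. Writes that break the rules are rejected or logged as warnings, depending on the validation action.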
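
A validator can be sketched in PyMongo as below. The collection name `ValidatedOrders`, the helper `create_validated_collection` and the specific rules are assumptions chosen for illustration; `db` is expected to be a PyMongo `Database`.

```python
# Hypothetical rules: every order needs a string Region and a numeric Sales.
ORDER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["double", "int", "long"]},
        },
    }
}


def create_validated_collection(db, name="ValidatedOrders"):
    """Create a collection whose inserts and updates are checked
    against ORDER_VALIDATOR."""
    return db.create_collection(name, validator=ORDER_VALIDATOR)
```

With the default validation settings, inserting a document without `Region` into this collection fails with a write error.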

13. Primary vs Secondary node in a replica set
Primary: Handles all write operations

Secondary: Replicates data from the primary; can serve reads when the client's read preference allows it, and can be elected primary if the current primary fails

14. MongoDB security mechanisms
Authentication

Role-based access control

Encryption (in-transit and at-rest)

IP whitelisting and auditing

15. What are embedded documents and when to use them?
Documents within documents — useful when data is tightly related and often queried together (e.g., user with address info).
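
The user-with-address example can be written out as follows. The sample values, the `Users` collection implied by the helper, and the name `find_users_in_city` are all hypothetical; `collection` is expected to be a PyMongo `Collection`.

```python
# Hypothetical user document with the address embedded as a sub-document.
user = {
    "name": "Asha",
    "email": "asha@example.com",
    "address": {"street": "12 MG Road", "city": "Pune", "zip": "411001"},
}


def find_users_in_city(collection, city):
    """Query a field inside the embedded document using dot notation."""
    return collection.find({"address.city": city})
```

Because the address lives inside the user document, one read returns both, with no join or second query.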

16. Purpose of $lookup in aggregation
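Performs a left outer join: for each input document, it adds an array field holding the matching documents from another collection in the same database (similar to a SQL LEFT OUTER JOIN).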
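
A `$lookup` stage might look like the sketch below. The `Customers` collection, the `Customer ID` join field and the function name are assumptions for illustration and are not confirmed by the dataset description.

```python
def orders_with_customer_pipeline():
    """Pipeline that attaches the matching customer to each order."""
    return [
        {
            "$lookup": {
                "from": "Customers",           # collection to join with
                "localField": "Customer ID",   # field in the input documents
                "foreignField": "Customer ID", # field in the "from" collection
                "as": "customer",              # output array field
            }
        },
        # Turn the one-element array into a single embedded document.
        # Orders with no matching customer are dropped by this stage.
        {"$unwind": "$customer"},
    ]
```

Without the `$unwind` stage, every order is kept and `customer` is simply an empty array when nothing matches.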

17. Common use cases for MongoDB
Content management

Real-time analytics

Catalogs and inventories

IoT applications

Mobile apps

18. Advantages of MongoDB for horizontal scaling
Sharding allows automatic distribution of data

Handles large volumes

Easily adds more nodes for performance and capacity

19. MongoDB vs SQL transactions
MongoDB supports multi-document transactions but is optimized for single-document atomicity

Most relational (SQL) databases provide ACID transactions across multiple rows and tables by default

20. Capped vs Regular collections
Capped: Fixed size, FIFO insertion order, auto-removal of oldest data

Regular: Unlimited size, no automatic removal
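
Creating a capped collection from PyMongo can be sketched like this. The collection name `EventLog`, the helper name and the default limits are assumptions for illustration; `db` is expected to be a PyMongo `Database`.

```python
def create_capped_log(db, name="EventLog", size_bytes=1024 * 1024, max_docs=1000):
    """Create a capped collection limited by byte size and document count.

    `size` is required for capped collections; `max` is optional and
    the size limit takes precedence over it.
    """
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)
```

Once either limit is reached, each new insert overwrites the oldest documents, which suits logs and other rolling buffers.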

21. Purpose of $match in aggregation
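Filters documents so that only those matching the condition continue down the pipeline. It can appear at any stage, but placing it as early as possible improves performance: later stages process less data, and a $match at the start of the pipeline can use indexes.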

22. How to secure access to MongoDB
Enable authentication

Use strong passwords

Limit network exposure

Use TLS/SSL

Role-based access control

23. What is the WiredTiger storage engine?
The default storage engine in MongoDB since v3.2, offering document-level concurrency control, data compression, and better performance than the older MMAPv1 engine for most workloads.

# Practical
Dataset - https://drive.google.com/file/d/1bJ-X2ONfnE5YbsNe2bCK39IfoBHexYQO/view
1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.
2. Retrieve and print all documents from the Orders collection.
3. Count and display the total number of documents in the Orders collection.
4. Write a query to fetch all orders from the "West" region.
5. Write a query to find orders where Sales is greater than 500.
6. Fetch the top 3 orders with the highest Profit.
7. Update all orders with Ship Mode as "First Class" to "Premium Class".
8. Delete all orders where Sales is less than 50.
9. Use aggregation to group orders by Region and calculate total sales per region.
10. Fetch all distinct values for Ship Mode from the collection.
11. Count the number of orders for each category.

In [None]:
%pip install pymongo pandas


In [None]:
#1.
import pandas as pd
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

# Load dataset
df = pd.read_csv("Superstore.csv")
data = df.to_dict(orient="records")

# Insert into MongoDB
orders_collection.insert_many(data)


In [None]:
#2.
for doc in orders_collection.find():
    print(doc)


In [None]:
#3.
count = orders_collection.count_documents({})
print("Total Documents:", count)


In [None]:
#4.
west_orders = orders_collection.find({"Region": "West"})
for order in west_orders:
    print(order)


In [None]:
#5.
high_sales = orders_collection.find({"Sales": {"$gt": 500}})
for order in high_sales:
    print(order)


In [None]:
#6.
top_profit = orders_collection.find().sort("Profit", -1).limit(3)
for order in top_profit:
    print(order)


In [None]:
#7.
orders_collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)


In [None]:
#8.
orders_collection.delete_many({"Sales": {"$lt": 50}})


In [None]:
#9.
pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
]
for region in orders_collection.aggregate(pipeline):
    print(region)


In [None]:
#10.
ship_modes = orders_collection.distinct("Ship Mode")
print("Distinct Ship Modes:", ship_modes)


In [None]:
#11.
pipeline = [
    {"$group": {"_id": "$Category", "OrderCount": {"$sum": 1}}}
]
for category in orders_collection.aggregate(pipeline):
    print(category)
