Theoretical Questions

Q1. What are the key differences between SQL and NoSQL databases?

Answer

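SQL databases are relational: data lives in tables with a predefined schema, and relationships are expressed through keys and joins (e.g., MySQL, PostgreSQL).
NoSQL databases cover several non-relational models (document, key-value, wide-column, graph). MongoDB is a document database: it stores JSON-like (BSON) documents with a flexible schema and is designed to scale horizontally through sharding.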





Q2. What makes MongoDB a good choice for modern applications?

Answer

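MongoDB's flexible document schema lets the data model evolve alongside the application, sharding provides horizontal scaling, and replica sets provide high availability. It handles large volumes of semi-structured and unstructured data well, and its documents map naturally onto the objects used in application code.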



Q3. Explain the concept of collections in MongoDB.

Answer

A collection is a group of MongoDB documents, similar to a table in SQL.
It stores related data without a fixed schema, allowing flexible fields per document.



Q4. How does MongoDB ensure high availability using replication?

Answer

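MongoDB uses replica sets: one primary node receives all writes, and secondary nodes replicate its operation log (oplog) to keep their own copies of the data.
If the primary becomes unavailable, the remaining members hold an election and promote an eligible secondary to primary automatically.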



Q5. What are the main benefits of MongoDB Atlas?

Answer

MongoDB Atlas is a fully managed cloud service offering automatic backups, scaling, global clusters, monitoring, and built‑in security.



Q6. What is the role of indexes in MongoDB, and how do they improve performance?

Answer

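Indexes speed up query execution by letting MongoDB locate matching documents without examining every document.
Without a suitable index, MongoDB must perform a collection scan, reading every document in the collection. The trade-off is that each index uses storage and adds work to every write.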
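
To make the difference concrete, here is a toy sketch in plain Python, not MongoDB itself. A dictionary stands in for an index (MongoDB's real indexes are B-tree structures), and the sample documents are made up for illustration.

```python
from collections import defaultdict

# Illustrative documents; the values are invented for this sketch.
docs = [
    {"_id": 1, "Region": "West", "Sales": 700},
    {"_id": 2, "Region": "East", "Sales": 120},
    {"_id": 3, "Region": "West", "Sales": 45},
    {"_id": 4, "Region": "South", "Sales": 310},
]

def collection_scan(docs, region):
    """Check every document: the work grows with the collection size."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["Region"] == region:
            matches.append(doc)
    return matches, examined

def build_index(docs, field):
    """Map each field value to the documents that hold it (a toy index)."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index

def index_lookup(index, value):
    """Jump straight to the matching documents; only matches are examined."""
    matches = index.get(value, [])
    return matches, len(matches)

region_index = build_index(docs, "Region")
print(collection_scan(docs, "West")[1])       # documents examined by the scan: 4
print(index_lookup(region_index, "West")[1])  # documents examined via the index: 2
```

With PyMongo, the corresponding real index could be created with `orders.create_index([("Region", 1)])`.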



Q7. Describe the stages of the MongoDB aggregation pipeline.

Answer

The aggregation pipeline processes documents through an ordered sequence of stages, each passing its output to the next. Common stages include:

$match (filter), $group (aggregate), $sort (order), $project (reshape), $lookup (join).
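
As a rough sketch of how these stages chain together, the pipeline below filters, groups, sorts, and reshapes in that order. The field names (`Sales`, `Category`) are taken from the Superstore examples in the practical section; the helper names are hypothetical.

```python
# Each list element is one stage; documents flow through them top to bottom.
pipeline = [
    {"$match": {"Sales": {"$gt": 500}}},                                  # filter first
    {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},  # aggregate
    {"$sort": {"total_sales": -1}},                                       # largest first
    {"$project": {"_id": 0, "category": "$_id", "total_sales": 1}},       # reshape
]

def stage_names(pipeline):
    """Return the operator name of each stage, in execution order."""
    return [next(iter(stage)) for stage in pipeline]

def run_pipeline(collection, pipeline):
    """Run the pipeline on a PyMongo collection and collect the results."""
    return list(collection.aggregate(pipeline))

print(stage_names(pipeline))
```

Given a connected collection, `run_pipeline(orders, pipeline)` would return one document per category.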



Q8. What is sharding in MongoDB? How does it differ from replication?

Answer

Sharding distributes data across multiple servers for horizontal scaling.
Replication copies the same data across servers for fault tolerance.
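
The contrast can be illustrated with a small simulation in plain Python. This is only a toy model: MongoDB's actual routing assigns ranges of shard-key values to chunks, whereas the sketch simply hashes the key modulo the number of shards. All names here are hypothetical.

```python
import hashlib

def shard_for(key, num_shards):
    """Pick a shard by hashing the shard key (a simplification)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def shard(docs, shard_key, num_shards):
    """Sharding: each document is stored on exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards

def replicate(docs, num_members):
    """Replication: every member holds a full copy of the data."""
    return [list(docs) for _ in range(num_members)]

docs = [{"_id": i, "Region": r} for i, r in enumerate(["West", "East", "South", "West"])]

shards = shard(docs, "_id", 3)
replicas = replicate(docs, 3)

print(sum(len(s) for s in shards))  # total across shards equals len(docs)
print([len(r) for r in replicas])   # every replica holds all documents
```

In production the two are combined: each shard is itself deployed as a replica set.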



Q9. What is PyMongo, and why is it used?

Answer

PyMongo is the official Python driver for MongoDB.
It allows Python applications to connect, query, and manage MongoDB databases programmatically.



Q10. What are the ACID properties in the context of MongoDB transactions?

Answer

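Atomicity: All operations in a transaction succeed, or none are applied.

Consistency: Each transaction moves the database from one valid state to another.

Isolation: Concurrent transactions do not see each other's uncommitted changes.

Durability: Committed results survive failures.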
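
A minimal sketch of a multi-document transaction with PyMongo follows. The `bank` database, `accounts` collection, and `balance` field are hypothetical, and the code assumes a deployment that supports transactions (a replica set or sharded cluster).

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two accounts atomically (hypothetical schema)."""
    accounts = client["bank"]["accounts"]  # hypothetical database and collection
    with client.start_session() as session:
        # The block commits on normal exit and aborts if an exception escapes,
        # so either both updates apply or neither does.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Passing `session=session` to each operation is what ties it to the transaction; an operation without it would run outside the transaction.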



Q11. What is the purpose of MongoDB’s explain() function?

Answer

explain() shows how MongoDB executes a query.
It helps analyze query performance and the use of indexes.
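
One possible way to read the output programmatically is sketched below. The helper names are hypothetical, and the code assumes the common layout in which the winning plan is a single chain of `inputStage` entries (optionally nested under `queryPlan` on newer servers); plans that branch are not handled.

```python
def plan_stages(explain_output):
    """Return the stage names of the winning plan, outermost first."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    # Assumption: newer servers may nest the plan tree under "queryPlan".
    node = winning.get("queryPlan", winning)
    stages = []
    while node:
        stages.append(node.get("stage"))
        node = node.get("inputStage")
    return stages

def uses_collection_scan(explain_output):
    """True if the winning plan reads the whole collection (COLLSCAN)."""
    return "COLLSCAN" in plan_stages(explain_output)

# Usage against a live collection (not run here):
#   plan = orders.find({"Region": "West"}).explain()
#   print(plan_stages(plan), uses_collection_scan(plan))
```

A `COLLSCAN` stage suggests the query has no usable index, while `IXSCAN` indicates an index was used.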



Q12. How does MongoDB handle schema validation?

Answer

MongoDB supports JSON Schema validation at the collection level.
It enforces rules for required fields, data types, and value ranges.
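
A validator might look like the sketch below, which requires `Region` and `Sales` and constrains their types. The field names come from the Superstore examples; the collection name `orders_validated` and the specific rules are assumptions made for illustration.

```python
# $jsonSchema rules: both fields are required, Region must be a string,
# and Sales must be a non-negative number.
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["double", "int", "long"], "minimum": 0},
        },
    }
}

def create_validated_orders(db, name="orders_validated"):
    """Create a collection whose inserts and updates are checked by the validator."""
    return db.create_collection(name, validator=order_validator)
```

With this in place, inserting a document that lacks `Sales` or has a negative value would be rejected by the server.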



Q13. What is the difference between a primary and a secondary node in a replica set?

Answer

Primary: Receives all write operations and, by default, serves reads as well.

Secondary: Replicates data from the primary and can serve read requests when the client's read preference allows it.
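
Reads are directed to secondaries through the client's read preference, which can be set in the connection string. The sketch below only builds such a string; the host names and the replica set name `rs0` are placeholders.

```python
from urllib.parse import urlencode

def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string that names the replica set and read preference."""
    query = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + query

# Placeholder hosts; replace with the members of a real replica set.
uri = replica_set_uri(["host1:27017", "host2:27017", "host3:27017"], "rs0")
print(uri)
```

The resulting string can be passed to `MongoClient(uri)`. With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise; writes always go to the primary.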



Q14. What security mechanisms does MongoDB provide for data protection?

Answer

MongoDB offers authentication, role‑based access control (RBAC), TLS/SSL encryption, auditing, and IP whitelisting for secure access.



Q15. Explain the concept of embedded documents and when they should be used.

Answer

Embedded documents store related data inside a single document.
They are best used when data is mostly accessed together (e.g., customer info + address).
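
The two modelling options can be compared side by side. The documents below are invented for illustration, and plain dictionaries stand in for collections.

```python
# Embedded: the address lives inside the customer document, so one read
# returns everything that is usually needed together.
customer_embedded = {
    "_id": "C-100",
    "name": "Ada Lopez",
    "address": {"street": "12 Elm St", "city": "Seattle", "state": "WA"},
}

# Referenced: the address is a separate document, found by a second lookup.
customer_referenced = {"_id": "C-100", "name": "Ada Lopez", "address_id": "ADDR-7"}
addresses = {"ADDR-7": {"street": "12 Elm St", "city": "Seattle", "state": "WA"}}

def city_embedded(customer):
    """One document access is enough."""
    return customer["address"]["city"]

def city_referenced(customer, addresses):
    """Requires following the reference to a second document."""
    return addresses[customer["address_id"]]["city"]

print(city_embedded(customer_embedded))
print(city_referenced(customer_referenced, addresses))
```

In a real collection, embedded fields are queried with dot notation, for example `customers.find({"address.city": "Seattle"})`.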



Q16. What is the purpose of MongoDB’s $lookup stage in aggregation?

Answer

$lookup performs a left outer join between collections.
It combines related documents from different collections into a single result.
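
The stage and its left-outer-join behaviour can be sketched as follows. The `customers` collection and the `Customer ID` and `Segment` fields are assumptions for illustration, and `left_outer_join` is a plain-Python stand-in that mimics what the server does for simple equality matches.

```python
# The $lookup stage as it would appear in a pipeline run on the orders collection.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # hypothetical collection to join with
        "localField": "Customer ID",  # field in the orders documents
        "foreignField": "Customer ID",  # field in the customers documents
        "as": "customer",             # output array field
    }
}

def left_outer_join(left_docs, right_docs, local_field, foreign_field, as_field):
    """Keep every left document; attach a (possibly empty) list of matches."""
    joined = []
    for left in left_docs:
        matches = [r for r in right_docs if r.get(foreign_field) == left.get(local_field)]
        joined.append({**left, as_field: matches})
    return joined

orders_docs = [{"_id": 1, "Customer ID": "C-1"}, {"_id": 2, "Customer ID": "C-9"}]
customer_docs = [{"Customer ID": "C-1", "Segment": "Consumer"}]

for doc in left_outer_join(orders_docs, customer_docs, "Customer ID", "Customer ID", "customer"):
    print(doc)
```

The order with no matching customer is still returned, with an empty `customer` array, which is what makes the join "left outer".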



Q17. What are some common use cases for MongoDB?

Answer

Real‑time analytics

Content management systems

IoT & sensor data

Social media platforms

E‑commerce product catalogs



Q18. What are the advantages of using MongoDB for horizontal scaling?

Answer

MongoDB uses sharding to distribute data across servers, allowing:

High throughput

Large dataset handling

Load balancing and cost‑efficient scaling



Q19. How do MongoDB transactions differ from SQL transactions?

Answer

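SQL databases treat multi-statement, multi-row ACID transactions as a core, long-established feature.
In MongoDB, single-document operations have always been atomic; multi-document transactions were added in version 4.0 (replica sets) and 4.2 (sharded clusters). They carry extra overhead, so the usual advice is to keep related data in one document where possible and reserve transactions for the cases that need them.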



Q20. What are the main differences between capped collections and regular collections?

Answer

Capped collections: Fixed size, preserve insertion order, and automatically overwrite the oldest documents when full.

Regular collections: Grow without a size limit and allow documents to be updated and deleted freely.
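
The overwrite behaviour resembles a bounded queue, which the sketch below imitates with `collections.deque`. The `event_log` name and the size limits in `create_event_log` are arbitrary choices for illustration.

```python
from collections import deque

# Analogy only: a deque with maxlen drops its oldest entry when full,
# much like a capped collection overwrites its oldest documents.
recent_events = deque(maxlen=3)
for seq in range(1, 6):
    recent_events.append({"seq": seq})

print(list(recent_events))  # [{'seq': 3}, {'seq': 4}, {'seq': 5}]

def create_event_log(db):
    """Create a capped collection limited to 1 MiB or 1000 documents."""
    return db.create_collection("event_log", capped=True, size=1024 * 1024, max=1000)
```

`size` is the limit in bytes and is required for capped collections; `max` is an optional cap on the number of documents.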



Q21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

Answer

$match filters documents based on conditions, similar to SQL’s WHERE clause.
It’s usually placed early to reduce processing load.



Q22. How can you secure access to a MongoDB database?

Answer

Enable authentication, enforce role‑based permissions, use TLS/SSL encryption, configure firewalls/IP whitelisting, and keep MongoDB updated.
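
As one small piece of this, a least-privilege user can be created through the `createUser` database command. This is a sketch: the helper name is hypothetical, and it assumes the caller is already authenticated with rights to manage users.

```python
def create_read_only_user(db, username, password):
    """Create a user that can only read from the given database."""
    return db.command(
        "createUser",
        username,
        pwd=password,
        roles=[{"role": "read", "db": db.name}],  # built-in read-only role
    )

# Usage against a live database (not run here); avoid hard-coding the password:
#   create_read_only_user(client["superstore_db"], "report_reader", password)
```

Applications then connect as this user rather than as an administrator, so a leaked credential cannot modify or delete data.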



Q23. What is MongoDB’s WiredTiger storage engine, and why is it important?

Answer

WiredTiger has been MongoDB's default storage engine since version 3.2.
It provides document-level concurrency control, data compression, and journaling (write-ahead logging), which improves write throughput, reduces storage use, and supports crash recovery.

Practical Questions

In [14]:
!pip install pymongo dnspython

import pandas as pd
from pymongo import MongoClient
from google.colab import files

client = MongoClient("mongodb+srv://<username>:<password>@<cluster-url>/")
db = client["superstore_db"]
orders = db["orders"]





In [None]:
# Q1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.
uploaded = files.upload()  # upload Superstore.csv
df = pd.read_csv("Superstore.csv")
orders.insert_many(df.to_dict("records"))
print("Data inserted successfully!")


In [None]:
# Q2. Retrieve and print all documents from the Orders collection.
for order in orders.find():
    print(order)


In [None]:
# Q3. Count and display the total number of documents in the Orders collection.
count = orders.count_documents({})
print("Total documents:", count)


In [None]:
# Q4. Write a query to fetch all orders from the "West" region.
for order in orders.find({"Region": "West"}):
    print(order)


In [None]:
# Q5. Write a query to find orders where Sales is greater than 500.
for order in orders.find({"Sales": {"$gt": 500}}):
    print(order)


In [None]:
# Q6. Fetch the top 3 orders with the highest Profit.
for order in orders.find().sort("Profit", -1).limit(3):
    print(order)


In [None]:
# Q7. Update all orders with Ship Mode as "First Class" to "Premium Class".
result = orders.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})
print(result.modified_count, "documents updated")


In [None]:
# Q8. Delete all orders where Sales is less than 50.
result = orders.delete_many({"Sales": {"$lt": 50}})
print(result.deleted_count, "documents deleted")


In [None]:
# Q9. Use aggregation to group orders by Region and calculate total sales per region.
pipeline = [
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
]

for result in orders.aggregate(pipeline):
    print(result)


In [None]:
# Q10. Fetch all distinct values for Ship Mode from the collection.
ship_modes = orders.distinct("Ship Mode")
print("Distinct Ship Modes:", ship_modes)


In [None]:
# Q11. Count the number of orders for each category.
pipeline = [
    {"$group": {"_id": "$Category", "order_count": {"$sum": 1}}}
]

for result in orders.aggregate(pipeline):
    print(result)
