In [12]:
##Theoretical Questions

Que1: What are the key differences between SQL and NoSQL databases?

Ans1: SQL databases are relational, enforce a predefined schema, and are queried with SQL. NoSQL databases such as MongoDB are non-relational, allow dynamic schemas, and store data in flexible formats (e.g., JSON-like documents). SQL databases are built around ACID transactions; NoSQL databases typically prioritize horizontal scalability and high performance for unstructured or semi-structured data.

Que2: What makes MongoDB a good choice for modern applications?

Ans2: MongoDB offers flexible document-based storage, horizontal scalability, and a powerful query language. It handles unstructured and semi-structured data efficiently, making it ideal for real-time analytics, content management, IoT, and mobile apps.

Que3: Explain the concept of collections in MongoDB.

Ans3: Collections in MongoDB are analogous to tables in SQL. They store groups of related documents (records), each having a dynamic schema. Collections enable flexible, schema-less storage for varied data types within the same dataset.

Que4: How does MongoDB ensure high availability using replication?

Ans4: MongoDB uses replica sets to ensure high availability. A replica set has one primary node and one or more secondary nodes. Writes to the primary are replicated to the secondaries automatically, and if the primary fails, the remaining members hold an election and an eligible secondary becomes the new primary.

Que5: What are the main benefits of MongoDB Atlas?

Ans5: MongoDB Atlas is a fully managed cloud database offering auto-scaling, backups, real-time performance monitoring, global distribution, and built-in security. It eliminates infrastructure management and ensures high availability and scalability across major cloud providers.

Que6: What is the role of indexes in MongoDB, and how do they improve performance?

Ans6: Indexes improve query performance by allowing MongoDB to locate data efficiently without scanning the entire collection. Common indexes include single-field, compound, and text indexes. Without indexes, queries are slower and more resource-intensive.
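
The index types named above can be sketched with PyMongo as follows. This is a minimal illustration, assuming a `collection` handle such as the `Orders` collection used in the practical section; the field names `Region`, `Sales`, and `Product Name` are assumed to match the dataset's columns, and `create_order_indexes` is a hypothetical helper name.

```python
def create_order_indexes(collection):
    """Create one index of each kind named above and return their names."""
    names = []
    # Single-field index: helps filters such as {"Region": "West"}.
    names.append(collection.create_index("Region"))
    # Compound index: Region ascending (1), then Sales descending (-1).
    names.append(collection.create_index([("Region", 1), ("Sales", -1)]))
    # Text index: enables $text searches over the product name (assumed field).
    names.append(collection.create_index([("Product Name", "text")]))
    return names
```

Calling `create_order_indexes(collection)` on a live collection returns the names MongoDB assigned to the three indexes.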

Que7: Describe the stages of the MongoDB aggregation pipeline.

Ans7: The aggregation pipeline processes data through stages like $match (filtering), $group (aggregation), $sort, $project (reshaping), and $lookup (joins). Each stage transforms the documents, enabling powerful data analysis and reporting.
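
As a rough illustration of how these stages chain together, the pipeline below is just a Python list of stage documents. The field names (`Region`, `Category`, `Sales`) are assumed from the Superstore dataset used later, and the particular combination of stages is only an example.

```python
# Each dict is one stage; documents flow through the list from top to bottom.
pipeline = [
    # Filter first so later stages see fewer documents.
    {"$match": {"Region": "West"}},
    # Aggregate: one output document per category with its summed sales.
    {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},
    # Order the groups, largest total first.
    {"$sort": {"total_sales": -1}},
    # Reshape: rename _id to category and keep the total.
    {"$project": {"_id": 0, "category": "$_id", "total_sales": 1}},
]

stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # ['$match', '$group', '$sort', '$project']
```

With a live collection, the list would be passed as `collection.aggregate(pipeline)`.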

Que8: What is sharding in MongoDB? How does it differ from replication?

Ans8: Sharding partitions data across multiple servers for horizontal scaling, improving performance and storage. Replication copies data across servers to ensure high availability. Sharding handles large datasets; replication ensures fault tolerance.
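
A hedged sketch of what enabling sharding can look like from PyMongo, assuming `client` is connected to a `mongos` router of a sharded cluster (not a standalone server or plain replica set). The helper name `shard_orders` and the choice of a hashed `Order ID` shard key are illustrative assumptions, not a recommendation.

```python
def shard_orders(client, namespace="SuperstoreDB.Orders"):
    """Ask a mongos router to shard `namespace` on a hashed Order ID key."""
    db_name = namespace.split(".", 1)[0]
    # Required on servers older than 6.0; no longer required from 6.0 onward.
    client.admin.command("enableSharding", db_name)
    # The hashed key spreads documents across shards by hash of "Order ID".
    return client.admin.command(
        "shardCollection", namespace, key={"Order ID": "hashed"}
    )
```

Replication needs no equivalent call from the application: it is configured when the replica set is deployed.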

Que9: What is PyMongo, and why is it used?

Ans9: PyMongo is the official Python driver for MongoDB. It enables Python applications to interact with MongoDB databases for CRUD operations, aggregation, indexing, and more, making it essential for integrating MongoDB with Python-based apps.

Que10: What are the ACID properties in the context of MongoDB transactions?

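Ans10: ACID stands for Atomicity, Consistency, Isolation, and Durability. Operations on a single document are always atomic in MongoDB. Multi-document transactions with ACID guarantees are supported since version 4.0 on replica sets and 4.2 on sharded clusters, ensuring reliable and consistent updates across documents, which is especially important for financial and enterprise applications.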
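
A minimal sketch of a multi-document transaction in PyMongo, assuming the deployment is a replica set or sharded cluster. The `accounts` collection, the `balance` field, and the `transfer` helper are hypothetical names used only for illustration.

```python
def transfer(client, accounts, from_id, to_id, amount):
    """Move `amount` between two account documents in one transaction."""
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Passing `session=session` to each operation is what ties it to the transaction; operations without it run outside the transaction.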

Que11: What is the purpose of MongoDB's explain() function?

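Ans11: The explain() function reveals how MongoDB executes a query: the winning query plan, whether an index was used (IXSCAN) or the whole collection was scanned (COLLSCAN), and, in the executionStats verbosity mode, the number of documents examined and the execution time. It helps developers optimize performance by identifying inefficiencies in query plans.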
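
The helper below is a small sketch of how the plan returned by `explain()` might be inspected. It assumes the classic output layout with `queryPlanner.winningPlan` and nested `inputStage` entries; some server versions nest the plan differently, so treat the shape as an assumption. `plan_stages` is a hypothetical name, and `sample` is a hand-written illustration, not real server output.

```python
def plan_stages(explain_output):
    """Return the stage names of the winning plan, outermost first."""
    stages = []
    node = explain_output["queryPlanner"]["winningPlan"]
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")  # None once the innermost stage is reached
    return stages


# Illustrative shape only, not real server output.
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}
print(plan_stages(sample))  # ['FETCH', 'IXSCAN']
```

With a live collection, the input would come from something like `collection.find({"Region": "West"}).explain()`; a result containing `COLLSCAN` suggests the query is not using an index.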

Que12: How does MongoDB handle schema validation?

Ans12: MongoDB allows schema validation using JSON Schema. It enforces rules on document structure and field types within a collection, ensuring data consistency while still supporting flexibility where needed.
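
A sketch of a `$jsonSchema` validator, assuming a few Superstore-style fields. The required fields, the list of regions, the collection name `ValidatedOrders`, and the helper `create_validated_orders` are all illustrative assumptions.

```python
# Rules applied to every insert and update on the collection.
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Order ID", "Region", "Sales"],
        "properties": {
            "Order ID": {"bsonType": "string"},
            # Assumed set of regions; adjust to the actual data.
            "Region": {"enum": ["West", "East", "Central", "South"]},
            "Sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}


def create_validated_orders(db, name="ValidatedOrders"):
    """Create a collection that rejects documents violating order_validator."""
    return db.create_collection(name, validator=order_validator)
```

Fields not listed under `properties` are still accepted, which is how validation and schema flexibility coexist.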

Que13: What is the difference between a primary and a secondary node in a replica set?

Ans13: The primary node handles all write operations, while secondary nodes replicate the data from the primary and can serve read queries when the client's read preference allows it. In failover scenarios, a secondary can be elected as the new primary.

Que14: What security mechanisms does MongoDB provide for data protection?

Ans14: MongoDB provides authentication, role-based access control, TLS/SSL encryption, auditing, IP whitelisting, and encryption at rest. These features ensure secure access, data protection, and compliance with security standards.

Que15: Explain the concept of embedded documents and when they should be used.

Ans15: Embedded documents store related data within a single document, improving read performance and avoiding joins. Use them when related data is accessed together and when the document stays within the 16 MB BSON document size limit.
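
To make the trade-off concrete, here is a small illustration in plain Python dictionaries. All field names and values are invented for the example.

```python
# Embedded: the customer and line items travel with the order in one document.
embedded_order = {
    "_id": "order-1",
    "customer": {"name": "A. Example", "segment": "Consumer"},
    "items": [
        {"product": "Stapler", "quantity": 2, "price": 7.5},
        {"product": "Binder", "quantity": 1, "price": 3.0},
    ],
}

# Referenced: the order stores only a key; the customer lives in another collection.
referenced_order = {"_id": "order-1", "customer_id": "cust-42"}

# One read of the embedded document is enough to compute the order total.
total = sum(item["quantity"] * item["price"] for item in embedded_order["items"])
print(total)  # 18.0
```

The referenced form keeps documents small and avoids duplicating customer data, at the cost of a second query or a `$lookup` when both are needed.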

Que16: What is the purpose of MongoDB's $lookup stage in aggregation?

Ans16: $lookup performs left outer joins between collections. It allows combining documents from multiple collections in a single aggregation pipeline, useful for referencing related data like orders and customers.
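
A minimal `$lookup` sketch for the orders-and-customers case mentioned above. The `customers` collection and the `customer_id` field are hypothetical; the Superstore data used later is a single flat collection.

```python
lookup_pipeline = [
    {
        "$lookup": {
            "from": "customers",          # collection to join with (hypothetical)
            "localField": "customer_id",  # field in the orders documents
            "foreignField": "_id",        # field in the customers documents
            "as": "customer",             # output array of matching customers
        }
    },
    # Left outer join: orders without a match get an empty "customer" array.
    # preserveNullAndEmptyArrays keeps those orders when unwinding the array.
    {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
]
```

The pipeline would be run against the orders collection, for example `orders.aggregate(lookup_pipeline)`.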

Que17: What are some common use cases for MongoDB?

Ans17: MongoDB is widely used for content management, IoT, real-time analytics, e-commerce platforms, mobile apps, catalog systems, and any application needing flexible data models and scalable storage.

Que18: What are the advantages of using MongoDB for horizontal scaling?

Ans18: MongoDB supports horizontal scaling through sharding, which distributes data across multiple nodes. This lets a cluster absorb growing data volumes and traffic by adding shards instead of upgrading a single server, so both storage capacity and throughput can grow with demand.

Que19: How do MongoDB transactions differ from SQL transactions?

Ans19: MongoDB supports multi-document ACID transactions, similar to SQL. However, SQL databases are inherently transactional across rows and tables, while in MongoDB single-document operations are atomic by default and multi-document transactions, introduced later, are typically used only when an update must span several documents.

Que20: What are the main differences between capped collections and regular collections?

Ans20: Capped collections have a fixed size, automatically overwrite the oldest documents once full, and preserve insertion order, which makes them suitable for logs or caches. Regular collections have no fixed size, grow as documents are added, and place no such restrictions on document deletion and updates.
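
Creating a capped collection from PyMongo can be sketched as below. The collection name `app_logs`, the limits, and the helper `create_log_collection` are illustrative assumptions.

```python
def create_log_collection(db, name="app_logs"):
    """Create a capped collection limited to 1 MiB or 1000 documents."""
    # size is in bytes and is mandatory for capped collections;
    # max is an optional cap on the number of documents.
    return db.create_collection(name, capped=True, size=1024 * 1024, max=1000)
```

Once either limit is reached, inserting a new document removes the oldest one.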

Que21: What is the purpose of the $match stage in MongoDB's aggregation pipeline?

Ans21: $match filters documents based on specific criteria, similar to a SQL WHERE clause. It's often placed early in the pipeline to reduce the number of documents processed in later stages.

Que22: How can you secure access to a MongoDB database?

Ans22: Secure MongoDB by enabling authentication, using strong passwords, enforcing role-based access control, encrypting connections with TLS/SSL, restricting network access via firewalls/IP whitelisting, and using encrypted storage.
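
One practical piece of this is keeping credentials out of source code. The sketch below builds a connection string from environment variables; the variable names `MONGO_USER` and `MONGO_PASSWORD`, the helper `build_mongo_uri`, and the host are assumptions made for illustration.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(host):
    """Build a mongodb+srv URI from credentials kept in environment variables."""
    # quote_plus escapes characters such as '@' or '/' that would break the URI.
    user = quote_plus(os.environ["MONGO_USER"])          # hypothetical variable name
    password = quote_plus(os.environ["MONGO_PASSWORD"])  # hypothetical variable name
    # mongodb+srv connections use TLS by default.
    return f"mongodb+srv://{user}:{password}@{host}/?retryWrites=true&w=majority"
```

The result can be passed to `MongoClient(...)`, so the password never appears in the notebook itself.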

Que23: What is MongoDB's WiredTiger storage engine, and why is it important?

Ans23: WiredTiger is MongoDB's default storage engine. It provides document-level concurrency, data compression, and efficient memory usage. It improves performance and scalability, especially in write-intensive and concurrent workloads.

##Practical Questions

In [13]:
pip install pymongo pandas



In [14]:
pip install gdown



Que1: Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

In [15]:
#Ans1:
import pandas as pd
from pymongo import MongoClient
import gdown

# Download the CSV from Google Drive
gdrive_url = "https://drive.google.com/uc?id=1bJ-X2ONfnE5YbsNe2bCK39IfoBHexYQO"
output_file = "Superstore.csv"
gdown.download(gdrive_url, output_file, quiet=False)

# Connect to the MongoDB Atlas cluster
client = MongoClient("mongodb+srv://Google-ColabProjects:12test34@cluster0.edvxzj6.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0")
db = client["SuperstoreDB"]
collection = db["Orders"]

# Load CSV with correct encoding
df = pd.read_csv(output_file, encoding="ISO-8859-1")  # or encoding="latin1"
data = df.to_dict(orient="records")

# Insert into MongoDB
collection.insert_many(data)
print("Data loaded into MongoDB.")

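# NOTE: this runs after insert_many, so it deletes the documents just loaded
# (see the DeleteResult in the output); the later queries therefore return nothing.
collection.delete_many({})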

Downloading...
From: https://drive.google.com/uc?id=1bJ-X2ONfnE5YbsNe2bCK39IfoBHexYQO
To: /content/Superstore.csv
100%|██████████| 2.29M/2.29M [00:00<00:00, 194MB/s]


Data loaded into MongoDB.


DeleteResult({'n': 9994, 'electionId': ObjectId('7fffffff00000000000000f0'), 'opTime': {'ts': Timestamp(1752909940, 922), 't': 240}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1752909940, 924), 'signature': {'hash': b'"\x0fz\xb3\xec*\xd4X>)c\xdb\xa9\xe8\xbd;\xec\xb3\x85H', 'keyId': 7482727795873808386}}, 'operationTime': Timestamp(1752909940, 922)}, acknowledged=True)
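
The DeleteResult above shows that all 9994 freshly inserted documents were removed, which is why the queries that follow print nothing. A possible alternative, sketched under the assumption that the intent was to start from an empty collection, is to clear first and insert second. `reload_collection` is a hypothetical helper, not part of PyMongo.

```python
def reload_collection(collection, records):
    """Replace the collection's contents with `records`; return how many were inserted."""
    collection.delete_many({})                # clear old documents first
    result = collection.insert_many(records)  # then load the new ones
    return len(result.inserted_ids)
```

In the cell above this would be called as `reload_collection(collection, df.to_dict(orient="records"))`.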

Que2: Retrieve and print all documents from the Orders collection.

In [16]:
#Ans2:
print("\nAll documents:")
for doc in collection.find():
    print(doc)


All documents:


Que3: Count and display the total number of documents in the Orders collection.

In [17]:
#Ans3:
count = collection.count_documents({})
print(f"\nTotal number of documents: {count}")


Total number of documents: 0


Que4: Write a query to fetch all orders from the "West" region.

In [18]:
#Ans4:
west_orders = collection.find({"Region": "West"})
print("\nOrders from West region:")
for doc in west_orders:
    print(doc)


Orders from West region:


Que5: Write a query to find orders where Sales is greater than 500.

In [19]:
#Ans5:
high_sales = collection.find({"Sales": {"$gt": 500}})
print("\nOrders with Sales > 500:")
for doc in high_sales:
    print(doc)


Orders with Sales > 500:


Que6: Fetch the top 3 orders with the highest Profit.

In [20]:
#Ans6:
top_profit_orders = collection.find().sort("Profit", -1).limit(3)
print("\nTop 3 profitable orders:")
for doc in top_profit_orders:
    print(doc)


Top 3 profitable orders:


Que7: Update all orders with Ship Mode as "First Class" to "Premium Class".

In [21]:
#Ans7:
update_result = collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
print(f"\nUpdated {update_result.modified_count} documents.")


Updated 0 documents.


Que8: Delete all orders where Sales is less than 50.

In [22]:
#Ans8:
delete_result = collection.delete_many({"Sales": {"$lt": 50}})
print(f"\nDeleted {delete_result.deleted_count} documents.")


Deleted 0 documents.


Que9: Use aggregation to group orders by Region and calculate total sales per region.

In [23]:
#Ans9:
pipeline = [
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
]
region_sales = collection.aggregate(pipeline)
print("\nTotal Sales per Region:")
for doc in region_sales:
    print(doc)


Total Sales per Region:


Que10: Fetch all distinct values for Ship Mode from the collection.

In [24]:
#Ans10:
distinct_modes = collection.distinct("Ship Mode")
print("\nDistinct Ship Modes:")
print(distinct_modes)


Distinct Ship Modes:
[]


Que11: Count the number of orders for each category.

In [25]:
#Ans11:
category_count = collection.aggregate([
    {"$group": {"_id": "$Category", "order_count": {"$sum": 1}}}
])
print("\nNumber of orders per Category:")
for doc in category_count:
    print(doc)


Number of orders per Category:
