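**1. What are the key differences between SQL and NoSQL databases?**

SQL (relational) databases store data in tables whose columns are defined in advance by a fixed schema, and they combine related tables with joins. NoSQL databases such as MongoDB typically use a flexible schema. MongoDB stores JSON-like documents (encoded as BSON), so documents in the same collection can have different fields. SQL databases are a strong fit for complex queries, joins, and strict ACID compliance, whereas NoSQL databases are designed to scale horizontally and to handle unstructured or semi-structured data.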

**2. What makes MongoDB a good choice for modern applications?**

MongoDB has a flexible schema, scales horizontally through sharding, provides high availability through replica sets, and offers a rich query language and aggregation framework. This makes it well suited to applications whose data model changes often or that need to grow beyond a single server.

**3. Explain the concept of collections in MongoDB.**

Collections in MongoDB are equivalent to tables in SQL. They hold multiple JSON-like documents, but unlike tables, they do not enforce a rigid schema.

**4. How does MongoDB ensure high availability using replication?**

MongoDB uses replica sets: groups of mongod processes that maintain the same data set. One member is the primary and the others are secondaries that copy its writes. If the primary becomes unavailable, the remaining members hold an election and one secondary is automatically promoted to primary, so the set can keep accepting writes.
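
To make the failover rule concrete, here is a small Python sketch. The rs_config dictionary shows the shape of a three-member replica set configuration (the set name rs0 and the host names are placeholders), and the two helper functions, which are hypothetical and not part of PyMongo, encode the rule that a new primary can only be elected while a majority of voting members can reach each other.

```python
# Hypothetical three-member replica set configuration (host names are placeholders).
# This is the shape of the document passed to the replSetInitiate command, for
# example client.admin.command("replSetInitiate", rs_config) on the first member.
rs_config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo1:27017"},
        {"_id": 1, "host": "mongo2:27017"},
        {"_id": 2, "host": "mongo3:27017"},
    ],
}


def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """An election succeeds only if a majority of voting members can see each other."""
    return reachable >= majority(voting_members)


n = len(rs_config["members"])
print(majority(n))              # 2
print(can_elect_primary(n, 2))  # True: one member down, a new primary can be elected
print(can_elect_primary(n, 1))  # False: no majority, so no member becomes primary
```

This is why replica sets usually have an odd number of voting members: with three members, the set survives the loss of any one of them.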

**5. What are the main benefits of MongoDB Atlas?**

MongoDB Atlas provides a fully managed cloud database solution with automated scaling, backup, monitoring, and security features.

**6. What is the role of indexes in MongoDB, and how do they improve performance?**

Indexes in MongoDB work like indexes in books. They allow the database to find documents faster without scanning the entire collection, improving query performance.
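
The effect of an index can be illustrated without a server. The pure-Python sketch below uses made-up documents and a dict as a stand-in for MongoDB's B-tree index, and counts how many documents each approach has to examine; the function names are hypothetical.

```python
from collections import defaultdict

# Made-up documents standing in for the Orders collection.
docs = [
    {"_id": 1, "Region": "West", "Sales": 250.0},
    {"_id": 2, "Region": "South", "Sales": 700.0},
    {"_id": 3, "Region": "West", "Sales": 15.0},
    {"_id": 4, "Region": "East", "Sales": 950.0},
]


def collection_scan(docs, field, value):
    """Examine every document, as a query without a usable index must (COLLSCAN)."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc.get(field)].append(pos)
    return index


def index_lookup(docs, index, value):
    """Jump straight to the matching documents; only those are examined."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


region_index = build_index(docs, "Region")
print(collection_scan(docs, "Region", "West")[1])   # 4 documents examined
print(index_lookup(docs, region_index, "West")[1])  # 2 documents examined
# The real equivalent with PyMongo would be: orders.create_index([("Region", 1)])
```

The gap grows with collection size: the scan always touches every document, while the index lookup touches only the matches.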

**7. Describe the stages of the MongoDB aggregation pipeline.**

The aggregation pipeline consists of stages such as $match, $group, $project, $sort, $limit, and $lookup. Each stage processes documents and passes results to the next stage.
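
As a sketch, several of these stages can be chained into a pipeline for the Superstore data used in the code below. The builder function is hypothetical and only returns the list of stage documents that PyMongo's aggregate() accepts, so it runs without a database; the field names Region, Category, and Sales are assumed to match the CSV columns.

```python
def region_category_pipeline(region: str, top_n: int = 3) -> list:
    """Total Sales per Category within one Region, highest first."""
    return [
        # $match: keep only this region's orders (like SQL WHERE)
        {"$match": {"Region": region}},
        # $group: one output document per Category with summed Sales
        {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},
        # $sort: largest totals first
        {"$sort": {"total_sales": -1}},
        # $limit: keep only the top N groups
        {"$limit": top_n},
        # $project: rename _id to category and round the total for display
        {"$project": {"_id": 0, "category": "$_id",
                      "total_sales": {"$round": ["$total_sales", 2]}}},
    ]


pipeline = region_category_pipeline("West")
print([next(iter(stage)) for stage in pipeline])
# ['$match', '$group', '$sort', '$limit', '$project']
# With a running server: for doc in orders.aggregate(pipeline): print(doc)
```

Order matters: putting $match first shrinks the input for every later stage.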

**8. What is sharding in MongoDB? How does it differ from replication?**

Sharding is the process of distributing data across multiple servers to handle large datasets and high throughput. Replication ensures high availability by duplicating data across nodes, whereas sharding improves scalability by splitting data.
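
The difference can be sketched in plain Python. In this simplified illustration (hypothetical function names, made-up order IDs), sharding routes each document to exactly one shard by hashing its shard key, while replication copies every document to every node. Real hashed sharding in MongoDB assigns ranges of hash values (chunks) to shards through the balancer rather than using a modulo, so treat this only as a model of where data ends up.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick one shard for a shard-key value using a stable hash (simplified)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place_sharded(keys, num_shards):
    """Sharding: each document lives on exactly one shard."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


def place_replicated(keys, num_nodes):
    """Replication: every node holds a full copy of the data."""
    return {i: list(keys) for i in range(num_nodes)}


order_ids = [f"ORDER-{n}" for n in range(1, 10)]  # hypothetical shard-key values
sharded = place_sharded(order_ids, 3)
replicated = place_replicated(order_ids, 3)
print(sum(len(v) for v in sharded.values()))     # 9: each document stored once
print(sum(len(v) for v in replicated.values()))  # 27: every node stores all 9
```

In production the two are combined: each shard is itself a replica set, so data is both split and copied.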

**9. What is PyMongo, and why is it used?**

PyMongo is the official Python driver for MongoDB. It is used to interact with MongoDB from Python applications by performing CRUD operations and queries.

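**10. What are the ACID properties in the context of MongoDB transactions?**

ACID stands for Atomicity, Consistency, Isolation, and Durability. Operations on a single document have always been atomic in MongoDB. Multi-document ACID transactions were added in version 4.0 for replica sets and extended to sharded clusters in version 4.2, so a group of writes across several documents or collections either commits entirely or not at all.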
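
A multi-document transaction in PyMongo can be sketched with ClientSession.with_transaction. The function below is hypothetical: it assumes a replica set or sharded cluster (transactions are not available on a standalone server) and an illustrative inventory collection with product_id and stock fields that is not part of the Superstore data. Both writes commit together or not at all.

```python
def place_order(client, product_id: str, quantity: int, order_doc: dict):
    """Decrement stock and record the order atomically (sketch).

    Assumes a replica set or sharded cluster and a hypothetical
    'inventory' collection with fields product_id and stock.
    """
    db = client["superstore_db"]

    def callback(session):
        # Both writes join the same transaction by passing session=session.
        result = db["inventory"].update_one(
            {"product_id": product_id, "stock": {"$gte": quantity}},
            {"$inc": {"stock": -quantity}},
            session=session,
        )
        if result.modified_count != 1:
            # Raising aborts the transaction, so no order is recorded.
            raise ValueError("insufficient stock")
        db["Orders"].insert_one(order_doc, session=session)

    with client.start_session() as session:
        # with_transaction commits on success, aborts and re-raises on error,
        # and retries when the server reports a transient transaction error.
        session.with_transaction(callback)
```

It would be called as place_order(MongoClient(uri), ...) with a uri pointing at a replica set.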

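**11. What is the purpose of MongoDB’s explain() function?**

The explain() function shows how MongoDB executes a query: the winning query plan, whether it uses an index (an IXSCAN stage) or scans the whole collection (a COLLSCAN stage), and, at the "executionStats" verbosity level, statistics such as the number of documents examined and the execution time. It is the main tool for checking whether a query actually uses an index.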
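
Reading explain() output programmatically is a common way to confirm index use. The sketch below searches the winning plan for any stage named IXSCAN, assuming the classic layout in which stages nest under inputStage or inputStages; the recursive search does not depend on the exact nesting depth, which can differ between server versions. The sample dictionary is a hand-written, heavily trimmed stand-in, not real server output, and the helper names are hypothetical.

```python
def plan_stages(node) -> list:
    """Collect every 'stage' name found anywhere in an explain() plan tree."""
    stages = []
    if isinstance(node, dict):
        if "stage" in node:
            stages.append(node["stage"])
        for value in node.values():
            stages.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            stages.extend(plan_stages(item))
    return stages


def uses_index(explain_output: dict) -> bool:
    """True if the winning plan contains an index scan (IXSCAN)."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    return "IXSCAN" in plan_stages(winning)


# Hand-written, trimmed example of explain() output (not real server output).
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "keyPattern": {"Region": 1}},
        }
    }
}
print(plan_stages(sample["queryPlanner"]["winningPlan"]))  # ['FETCH', 'IXSCAN']
print(uses_index(sample))                                  # True
# Real call with PyMongo: orders.find({"Region": "West"}).explain()
```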

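**12. How does MongoDB handle schema validation?**

MongoDB lets developers attach a validator, typically written with the $jsonSchema operator, when creating a collection or later through the collMod command. The validator enforces rules on document structure and data types, such as required fields, allowed values, and BSON types. The validationLevel and validationAction options control which documents are checked and whether invalid writes are rejected or only logged as warnings.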
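
As an example, a validator for a few Superstore fields might look like the dictionary below. The choice of required fields and the allowed Region values are assumptions about the dataset; the commented lines show how PyMongo would apply it to a new or an existing collection.

```python
# $jsonSchema validator for a few Superstore fields (field choice is an assumption).
orders_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Order ID", "Region", "Sales"],
        "properties": {
            "Order ID": {"bsonType": "string"},
            "Region": {"enum": ["Central", "East", "South", "West"]},
            # "number" matches any numeric BSON type (int, long, double, decimal)
            "Sales": {"bsonType": "number", "minimum": 0},
        },
    }
}

# With PyMongo (needs a running server), for a new collection:
# db.create_collection("Orders", validator=orders_validator,
#                      validationLevel="strict", validationAction="error")
# or for an existing collection:
# db.command("collMod", "Orders", validator=orders_validator)
```

With validationAction="error", an insert missing Sales or using an unknown Region would be rejected.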

**13. What is the difference between a primary and a secondary node in a replica set?**

The primary node handles all write operations, while secondary nodes replicate the primary's operation log (oplog) and can serve read operations depending on the client's read preference.

**14. What security mechanisms does MongoDB provide for data protection?**

MongoDB provides authentication, authorization through role-based access control, TLS/SSL encryption for data in transit, auditing (in MongoDB Enterprise and Atlas), and client-side field level encryption.

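**15. Explain the concept of embedded documents and when they should be used.**

Embedded documents are documents stored inside other documents, either as a subdocument field or as elements of an array. They fit best when the related data is read together with its parent and belongs to it (one-to-one or one-to-few relationships), because a single read returns everything without a join. Data that grows without bound or is shared by many parents is usually better kept in a separate collection and referenced, since a single document cannot exceed 16 MB.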
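
The hypothetical order below (all names and values are made up) embeds its customer and line items, so one read returns the whole order and its total can be computed without a join. The commented query shows dot notation reaching into embedded fields.

```python
# Hypothetical order document: the customer and line items are embedded
# because they are read together with the order and belong to it alone.
order = {
    "_id": "ORDER-1001",
    "customer": {"id": "CUST-7", "name": "Example Customer", "segment": "Consumer"},
    "items": [
        {"product": "Binder", "quantity": 3, "unit_price": 4.50},
        {"product": "Desk Lamp", "quantity": 1, "unit_price": 29.99},
    ],
}

# One read returns everything; no join is needed to total the order.
total = sum(item["quantity"] * item["unit_price"] for item in order["items"])
print(round(total, 2))  # 43.49

# Dot notation reaches into embedded fields; "items.product" matches if any
# array element has that product. With PyMongo: orders.find(embedded_filter)
embedded_filter = {"customer.segment": "Consumer", "items.product": "Binder"}
```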

**16. What is the purpose of MongoDB’s $lookup stage in aggregation?**

The $lookup stage is used to perform left outer joins between collections, allowing data from different collections to be combined.
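
The stage itself is a short document. The join fields below assume a hypothetical Customers collection keyed by Customer ID, which the Superstore data does not provide as a separate collection. The emulate_lookup function is a pure-Python illustration of the left outer join semantics, not how the server implements it: every left document is kept, and unmatched ones get an empty array.

```python
# $lookup stage joining Orders to a hypothetical "Customers" collection.
lookup_stage = {
    "$lookup": {
        "from": "Customers",          # collection to join (assumed to exist)
        "localField": "Customer ID",  # field in Orders
        "foreignField": "_id",        # field in Customers
        "as": "customer",             # output array field
    }
}


def emulate_lookup(left, right, local_field, foreign_field, as_field):
    """Pure-Python sketch of $lookup's equality-match left outer join."""
    out = []
    for doc in left:
        matches = [r for r in right if r.get(foreign_field) == doc.get(local_field)]
        out.append({**doc, as_field: matches})  # unmatched docs get an empty array
    return out


orders_docs = [{"_id": 1, "Customer ID": "C1"}, {"_id": 2, "Customer ID": "C9"}]
customers = [{"_id": "C1", "name": "Ana"}]
for row in emulate_lookup(orders_docs, customers, "Customer ID", "_id", "customer"):
    print(row["_id"], row["customer"])
# 1 [{'_id': 'C1', 'name': 'Ana'}]
# 2 []
```

In a real pipeline the stage would be used as orders.aggregate([lookup_stage]).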

**17. What are some common use cases for MongoDB?**

MongoDB is commonly used for content management systems, e-commerce platforms, IoT applications, real-time analytics, and mobile applications.

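**18. What are the advantages of using MongoDB for horizontal scaling?**

MongoDB supports sharding natively: data is split into chunks by a shard key, and a balancer distributes those chunks across shards automatically. Storage capacity and read/write throughput can therefore grow by adding shards instead of moving to a larger single server.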

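**19. How do MongoDB transactions differ from SQL transactions?**

Both provide ACID guarantees, but they play different roles. Relational databases are built around multi-row, multi-table transactions. In MongoDB, single-document operations are already atomic, and a document model that embeds related data avoids most multi-document transactions. Multi-document transactions, available since version 4.0, carry extra performance overhead and have a default runtime limit of 60 seconds, so MongoDB recommends them only where the data model cannot avoid them.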

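**20. What are the main differences between capped collections and regular collections?**

Capped collections have a fixed maximum size (and, optionally, a maximum number of documents) and preserve insertion order; when full, they automatically overwrite the oldest documents, which makes them useful for logs and other rolling data. Regular collections grow dynamically without overwriting data.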
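
Capped collections must be created explicitly. The commented call shows the PyMongo form with a 1 MB byte limit and an optional 1,000-document cap (the collection name logs and both limits are illustrative). The runnable part uses collections.deque as a pure-Python analogy for the document-count cap only; the byte-size limit is not modeled.

```python
from collections import deque

# With PyMongo (needs a running server):
# db.create_collection("logs", capped=True, size=1_048_576, max=1000)
# size is the byte limit (required); max optionally caps the document count.

# Analogy for the document-count cap: once full, each new insert evicts
# the oldest entry, and insertion order is preserved.
log = deque(maxlen=3)
for n in range(1, 6):
    log.append({"event": n})
print([doc["event"] for doc in log])  # [3, 4, 5]
```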

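**21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?**

The $match stage filters documents based on specific conditions, similar to the WHERE clause in SQL. Placing $match as early as possible lets it use indexes and reduces the number of documents that later stages must process.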

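**22. How can you secure access to a MongoDB database?**

Access can be secured by enabling authentication, enforcing role-based access control with least-privilege users, enabling TLS/SSL encryption for connections, and restricting network access with IP access lists (allowlists) or firewalls.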
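
On the client side, authentication and TLS are usually configured through the connection string. The hypothetical helper below builds one with credentials escaped by urllib.parse.quote_plus, which is needed when a password contains characters such as @ or :. The user name, password, and host are placeholders, and the commented createUser call sketches a least-privilege account that can only read superstore_db.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str,
                    auth_source: str = "admin", tls: bool = True) -> str:
    """Build a connection string with escaped credentials and TLS enabled."""
    # Credentials must be percent-escaped: characters such as '@', ':' or '/'
    # would otherwise break URI parsing.
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}/"
        f"?authSource={auth_source}&tls={'true' if tls else 'false'}"
    )


uri = build_mongo_uri("report_reader", "p@ss:word/1", "db.example.com:27017")
print(uri)
# mongodb://report_reader:p%40ss%3Aword%2F1@db.example.com:27017/?authSource=admin&tls=true
# Usage: client = MongoClient(uri)

# A least-privilege user that can only read one database (run by an admin):
# client.admin.command("createUser", "report_reader", pwd="...",
#                      roles=[{"role": "read", "db": "superstore_db"}])
```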

**23. What is MongoDB’s WiredTiger storage engine, and why is it important?**

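WiredTiger has been MongoDB's default storage engine since version 3.2. It provides document-level concurrency control, compression of data and indexes, and checkpoints with write-ahead journaling for durability, which together improve performance under concurrent workloads.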

# Import required libraries
import pandas as pd
from pymongo import MongoClient

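# 1. Load the Superstore dataset into MongoDB
# Requires a MongoDB server listening on localhost:27017. If none is running,
# the first operation fails with ServerSelectionTimeoutError ("Connection refused").
client = MongoClient("mongodb://localhost:27017/")
client.admin.command("ping")  # fail fast with a clear error if the server is unreachable
db = client["superstore_db"]
orders = db["Orders"]

# Load CSV into Pandas
df = pd.read_csv("superstore.csv", encoding="ISO-8859-1")

# Empty the collection first so re-running this cell does not insert duplicates
orders.delete_many({})

# Insert into MongoDB (convert to list of dictionaries)
data = df.to_dict(orient="records")
orders.insert_many(data)
print("✅ 1. Data loaded into MongoDB successfully")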


# 2. Retrieve and print documents from the Orders collection (first 5 shown for brevity)
print("\n✅ 2. All documents (first 5 shown):")
for order in orders.find().limit(5):
    print(order)


# 3. Count and display the total number of documents
total_docs = orders.count_documents({})
print("\n✅ 3. Total documents:", total_docs)


# 4. Fetch all orders from the "West" region
print("\n✅ 4. Orders from West region (first 5 shown):")
for order in orders.find({"Region": "West"}).limit(5):
    print(order)


# 5. Find orders where Sales > 500
print("\n✅ 5. Orders with Sales > 500 (first 5 shown):")
for order in orders.find({"Sales": {"$gt": 500}}).limit(5):
    print(order)


# 6. Top 3 orders with highest Profit
print("\n✅ 6. Top 3 highest profit orders:")
for order in orders.find().sort("Profit", -1).limit(3):
    print(order)


# 7. Update Ship Mode "First Class" → "Premium Class"
result = orders.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
print(f"\n✅ 7. Updated {result.modified_count} documents")


# 8. Delete all orders where Sales < 50
result = orders.delete_many({"Sales": {"$lt": 50}})
print(f"\n✅ 8. Deleted {result.deleted_count} documents")


# 9. Group orders by Region and calculate total sales
print("\n✅ 9. Total sales per Region:")
pipeline = [
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
]
for doc in orders.aggregate(pipeline):
    print(doc)


# 10. Fetch all distinct values of Ship Mode
ship_modes = orders.distinct("Ship Mode")
print("\n✅ 10. Distinct Ship Modes:", ship_modes)


# 11. Count the number of orders per Category
print("\n✅ 11. Orders per Category:")
pipeline = [
    {"$group": {"_id": "$Category", "count": {"$sum": 1}}}
]
for doc in orders.aggregate(pipeline):
    print(doc)


ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 68d4c8e7c6be400963bd8187, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>