# **Theoretical Questions**

1) What are the key differences between SQL and NoSQL databases?

- SQL (Relational Databases):

Use structured, tabular data (rows and columns).

Data is organized into tables with predefined schemas.

Examples: MySQL, PostgreSQL, Oracle, SQL Server.

- NoSQL (Non-Relational Databases):

Use flexible models: key-value pairs, documents, column families, or graphs.

Schema-less or schema-flexible, allowing unstructured or semi-structured data.

Examples: MongoDB, Cassandra, Redis, Neo4j.

2) What makes MongoDB a good choice for modern applications?
- Flexible Data Model (Schema-less)

Stores data in JSON-like documents (BSON).

No need to predefine a schema; each document can have different fields.

Ideal for apps where data structures evolve frequently (e.g., startups, agile development).

- Scalability

Built for horizontal scaling using sharding.

Handles large amounts of data and high user traffic by distributing data across multiple servers.

Perfect for cloud-native and global applications.

- High Performance

Optimized for fast reads and writes.

Uses indexes and in-memory caching for efficient queries; a dedicated in-memory storage engine is available only in MongoDB Enterprise.

Supports large-scale, real-time applications (e.g., social media feeds, IoT data ingestion).

- Developer-Friendly

Stores data in a JSON-like format, which feels natural to developers.

Reduces the “impedance mismatch” between application objects and database records.

Rich query language with support for ad-hoc queries, aggregation framework, geospatial queries, and full-text search.

- Supports Modern Application Needs

Handles unstructured, semi-structured, and structured data.

Works well with big data, analytics, content management, catalogs, IoT, and mobile apps.

Easy integration with modern programming languages and cloud platforms.

- High Availability & Reliability

Replica sets provide automatic failover and data redundancy.

Ensures your app stays available even if a server goes down.

- Cloud-Native & Ecosystem

MongoDB Atlas (managed cloud service) simplifies deployment, scaling, and monitoring.

3) Explain the concept of collections in MongoDB.
- A collection is a group of MongoDB documents, roughly analogous to a table in a relational database.

- Documents inside a collection are stored in BSON (binary JSON) format.

- Unlike SQL tables, collections do not enforce a schema by default, so documents in the same collection can have different structures.
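
To make the last point concrete, here is a minimal sketch that stores two differently shaped documents in one collection. The field names and values are hypothetical, and the function accepts any PyMongo-style collection object, so nothing connects to a server until one is passed in.

```python
def insert_mixed_documents(collection):
    """Insert two documents with different fields into the same collection."""
    documents = [
        # A book has an author and a page count.
        {"name": "Intro to Databases", "type": "book", "author": "A. Writer", "pages": 320},
        # A laptop has no author; it carries a nested specs object instead.
        {"name": "Laptop 14", "type": "laptop", "specs": {"ram_gb": 16, "ssd_gb": 512}},
    ]
    # insert_many accepts both shapes because no schema is enforced by default.
    result = collection.insert_many(documents)
    return result.inserted_ids

# Usage with a live server (assumed to run on localhost, not executed here):
# from pymongo import MongoClient
# products = MongoClient("mongodb://localhost:27017/")["superstore_db"]["products"]
# print(insert_mixed_documents(products))
```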

4) How does MongoDB ensure high availability using replication?
- MongoDB ensures high availability using replica sets, which provide data redundancy, automatic replication, and automatic failover. This allows applications to stay online with minimal interruption, even if servers fail.

5) What are the main benefits of MongoDB Atlas?

- MongoDB Atlas is a managed cloud service that handles deployment, scaling, and monitoring. It offers high availability, scalability, global reach, built-in security, and developer-friendly tools, which makes it a strong choice for cloud-native and modern applications.

6) What is the role of indexes in MongoDB, and how do they improve performance?
- Indexes in MongoDB make queries faster by avoiding full collection scans. They are crucial for improving performance, sorting, uniqueness, and complex queries, but should be used wisely to balance read and write performance.
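
As a rough illustration, the sketch below creates one single-field and one compound index with PyMongo's `create_index`. The field names are taken from the Superstore dataset used later in this notebook; which fields deserve an index in practice depends on the queries an application actually runs.

```python
def create_order_indexes(collection):
    """Create two indexes on a PyMongo collection and return their names."""
    # Single-field ascending index: helps filters such as {"Region": "West"}.
    region_index = collection.create_index([("Region", 1)])
    # Compound index: helps filtering by Category and sorting by Sales descending.
    category_sales_index = collection.create_index([("Category", 1), ("Sales", -1)])
    return [region_index, category_sales_index]

# Usage with a live collection (not executed here):
# print(create_order_indexes(collection))
```

Every index must be updated on each insert, update, and delete, which is why indexing every field tends to slow writes down.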

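7) Describe the stages of the MongoDB aggregation pipeline.
- An aggregation pipeline passes documents through an ordered list of stages, each of which transforms the output of the previous one. Common stages are $match (filter documents), $project (select or reshape fields), $group (aggregate by a key), $sort, $limit, and $skip (order and paginate), $unwind (emit one document per array element), $lookup (join with another collection), and $out/$merge (write results to a collection). Together they support transformations similar to SQL’s WHERE, GROUP BY, JOIN, and analytic queries.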
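
A pipeline is just an ordered list of stage documents. The sketch below builds one that chains four of the stages named above; the field names come from the Superstore dataset used later in this notebook, while the function name and default thresholds are hypothetical.

```python
def top_regions_pipeline(min_sales=100, top_n=3):
    """Build a pipeline that returns the top regions by total sales."""
    return [
        # 1. Filter first so later stages process fewer documents.
        {"$match": {"Sales": {"$gte": min_sales}}},
        # 2. Group the remaining orders by region and sum their sales.
        {"$group": {"_id": "$Region", "totalSales": {"$sum": "$Sales"}}},
        # 3. Order the groups from highest to lowest total.
        {"$sort": {"totalSales": -1}},
        # 4. Keep only the first top_n groups.
        {"$limit": top_n},
    ]

# Usage with a live collection (not executed here):
# for row in collection.aggregate(top_regions_pipeline()):
#     print(row)
```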

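8) What is sharding in MongoDB? How does it differ from replication?
- Sharding is MongoDB’s way of horizontally scaling a database by distributing data across multiple servers (called shards).

- Each shard holds a subset of the data, allowing the database to handle very large datasets and high throughput.

- Replication, by contrast, keeps full copies of the same data on several nodes to provide redundancy and failover. Sharding adds capacity, while replication adds availability. In production, each shard is usually deployed as a replica set so that a cluster gets both.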
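
The difference between partitioning and copying can be shown with a toy model in plain Python. This is only an illustration: the hash-and-modulo routing rule below is an assumption made for the example and is not how MongoDB assigns chunks to shards.

```python
import hashlib

def shard_for(key, shard_count):
    """Pick a shard index from a hash of the key (toy routing rule)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

def distribute(order_ids, shard_count):
    """Sharding: every order lands on exactly one shard."""
    shards = [[] for _ in range(shard_count)]
    for order_id in order_ids:
        shards[shard_for(order_id, shard_count)].append(order_id)
    return shards

def replicate(order_ids, member_count):
    """Replication: every member holds a full copy of the data."""
    return [list(order_ids) for _ in range(member_count)]

orders = [f"ORD-{n}" for n in range(10)]
print([len(shard) for shard in distribute(orders, 3)])   # sizes add up to 10
print([len(copy) for copy in replicate(orders, 3)])      # each copy has all 10
```
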
9) What is PyMongo, and why is it used?
- PyMongo is the official Python driver for MongoDB.

- It allows Python applications to connect to, query, and interact with MongoDB databases.

- Essentially, it provides a Python interface to perform MongoDB operations like inserting, updating, deleting, and querying documents.

10) What are the ACID properties in the context of MongoDB transactions?
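- ACID stands for Atomicity (all operations in a transaction succeed or none do), Consistency (data moves from one valid state to another), Isolation (concurrent transactions do not see each other’s partial changes), and Durability (committed changes survive failures).

- Single-document operations in MongoDB have always been atomic. Multi-document transactions (available on replica sets since MongoDB 4.0 and on sharded clusters since 4.2) extend these guarantees to operations that touch several documents or collections, much as in traditional relational databases.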
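
A minimal sketch of a multi-document transaction with PyMongo follows. The `inventory` collection and its `qty` field are hypothetical, and the code assumes the client is connected to a replica set or sharded cluster, because transactions are not available on a standalone server.

```python
def transfer_quantity(client, from_id, to_id, amount):
    """Move `amount` units between two documents inside one transaction."""
    inventory = client["superstore_db"]["inventory"]  # hypothetical collection
    with client.start_session() as session:
        # The transaction commits if the block finishes normally
        # and is aborted if an exception is raised inside it.
        with session.start_transaction():
            inventory.update_one(
                {"_id": from_id}, {"$inc": {"qty": -amount}}, session=session
            )
            inventory.update_one(
                {"_id": to_id}, {"$inc": {"qty": amount}}, session=session
            )

# Usage with a live replica set (not executed here):
# from pymongo import MongoClient
# transfer_quantity(MongoClient("mongodb://localhost:27017/?replicaSet=rs0"), "A", "B", 5)
```

Either both updates become visible or neither does, which is the atomicity guarantee described above.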

11) What is the purpose of MongoDB’s explain() function?

- The explain() function in MongoDB is used to analyze how a query is executed. It helps developers understand query performance and optimize database operations.
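
One common use is checking whether a query scans the whole collection (a COLLSCAN stage) or uses an index (an IXSCAN stage). The helper below is a sketch that walks the `queryPlanner.winningPlan` part of an explain result; the exact layout of that output varies between server versions, so treat the structure as an assumption.

```python
def plan_stages(explain_output):
    """Collect stage names from the winning plan of an explain() result."""
    stages = []
    node = explain_output.get("queryPlanner", {}).get("winningPlan")
    # Follow the chain of nested inputStage documents, outermost stage first.
    while isinstance(node, dict):
        if "stage" in node:
            stages.append(node["stage"])
        node = node.get("inputStage")
    return stages

# Usage with a live collection (not executed here):
# plan = collection.find({"Region": "West"}).explain()
# print(plan_stages(plan))
```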

12) How does MongoDB handle schema validation?
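- Unlike relational databases, MongoDB does not enforce a schema by default, so documents in a collection can have different structures. When consistency matters, a collection can be given validation rules, typically written as a $jsonSchema validator, that specify required fields, types, and value ranges. The validationLevel and validationAction options control which writes are checked and whether a violation is rejected or only logged.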
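
A validator is an ordinary document passed when the collection is created. The sketch below assumes two Superstore-style fields, `Order ID` and `Sales`; the rules and the `orders_validated` collection name are hypothetical.

```python
def order_validator():
    """Build a $jsonSchema validator for order documents."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["Order ID", "Sales"],
            "properties": {
                "Order ID": {"bsonType": "string"},
                # Accept floating-point or integer values, but never negative ones.
                "Sales": {"bsonType": ["double", "int"], "minimum": 0},
            },
        }
    }

def create_validated_collection(db, name="orders_validated"):
    """Create a collection whose writes are checked against the validator."""
    return db.create_collection(name, validator=order_validator())

# Usage with a live database (not executed here):
# create_validated_collection(client["superstore_db"])
```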

13) What is the difference between a primary and a secondary node in a replica set?
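- Primary: Receives all write operations and is the source that secondaries replicate from. A replica set has at most one primary at a time.

- Secondary: Replicates the primary’s data for redundancy, can optionally serve reads, and can be elected primary if the current primary becomes unavailable.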
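
Whether reads go to the primary or to a secondary is chosen by the client through a read preference. The sketch below only builds a connection string; the host names and the `rs0` replica set name are placeholders.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a connection string that lists replica set members."""
    host_list = ",".join(hosts)
    return (
        f"mongodb://{host_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )

# Writes always go to the primary; secondaryPreferred sends reads to a
# secondary when one is available and falls back to the primary otherwise.
uri = replica_set_uri(["db1:27017", "db2:27017", "db3:27017"], "rs0", "secondaryPreferred")
print(uri)

# Usage with a live replica set (not executed here):
# from pymongo import MongoClient
# client = MongoClient(uri)
```
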
14) What security mechanisms does MongoDB provide for data protection?
- MongoDB provides security at multiple layers:

- Authentication – who can access the database

- Authorization / RBAC – what users can do

- Encryption – protect data in transit and at rest

- Auditing – monitor and log database activity

- Network Security – restrict access and secure communication

- Data Integrity – ensure reliability and detect tampering
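
15) Explain the concept of embedded documents and when they should be used.
- An embedded document is a document nested inside another document in a MongoDB collection.

- This allows related data to be stored together in a single record, rather than in separate collections.

- MongoDB stores data in BSON format, so documents can contain nested objects or arrays.

- Embedding works best when the related data is usually read together with its parent and stays small, as in one-to-one or one-to-few relationships. Data that grows without bound or is shared by many parents is better kept in a separate collection and referenced, since a single document is limited to 16 MB.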
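
As an illustration, the order below embeds its customer and line items instead of pointing to other collections. All names and values are made up for the example.

```python
# One order document with an embedded customer object and an embedded items array.
order = {
    "order_id": "ORD-1001",
    "customer": {
        "name": "Asha Rao",
        "address": {"city": "Seattle", "state": "Washington"},
    },
    "items": [
        {"product": "Stapler", "quantity": 2, "price": 7.5},
        {"product": "Desk Lamp", "quantity": 1, "price": 24.0},
    ],
}

# Nested fields are addressed with dot notation in query filters.
city_filter = {"customer.address.city": "Seattle"}

# Everything needed to total the order is already in the one document.
order_total = sum(item["quantity"] * item["price"] for item in order["items"])
print(order_total)

# Usage with a live collection (not executed here):
# collection.insert_one(order)
# collection.find_one(city_filter)
```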

16) What is the purpose of MongoDB’s $lookup stage in aggregation?
- The $lookup stage in MongoDB’s aggregation pipeline performs a left outer join: for each input document it finds the matching documents in another collection of the same database and adds them as an array field. This combines related data into a single result set, effectively enabling relational-style queries in a NoSQL database.
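
A sketch of such a join is shown below. It assumes a hypothetical `customers` collection whose `_id` matches the `Customer ID` field stored on each order.

```python
def orders_with_customers_pipeline():
    """Build a pipeline that attaches the matching customer to each order."""
    return [
        {
            "$lookup": {
                "from": "customers",          # collection to join with (hypothetical)
                "localField": "Customer ID",  # field on the order documents
                "foreignField": "_id",        # field on the customer documents
                "as": "customer",             # output array of matches
            }
        },
        # Turn the one-element array into a plain object, and keep orders
        # that have no matching customer.
        {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
    ]

# Usage with a live collection (not executed here):
# for row in collection.aggregate(orders_with_customers_pipeline()):
#     print(row)
```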

17) What are some common use cases for MongoDB?
- MongoDB is especially suited for applications that require:

- Flexible schema for evolving data.

- High scalability for large datasets and heavy traffic.

- Real-time analytics and fast reads/writes.

- Complex hierarchical or nested data.

18) What are the advantages of using MongoDB for horizontal scaling?
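- Large dataset handling: data is partitioned across shards, so no single server has to store or index all of it.

- Parallel query processing: a query that spans several shards is executed on those shards in parallel.

- Elastic growth: shards can be added as data volume or traffic grows.

- High availability: each shard is typically deployed as a replica set, so the loss of one server does not take its data offline.

- Global distribution: zone sharding can keep specific ranges of data on servers close to the users who need it.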

19) How do MongoDB transactions differ from SQL transactions?
- MongoDB transactions: Designed for multi-document operations in a NoSQL environment; single-document operations are already atomic, so many workloads need no explicit transaction.

- SQL transactions: Standard for multi-row operations in relational databases, fully ACID by default.

- Key difference: MongoDB emphasizes flexibility and performance, while SQL relies on strict schema and mature transactional support.

20) What are the main differences between capped collections and regular collections?
- Capped collections are fixed-size, circular, and insertion-ordered, ideal for logs and queues.

- Regular collections are dynamic, flexible, and general-purpose, suitable for most application data.
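
A capped collection has to be created explicitly with a size limit. The sketch below uses PyMongo's `create_collection`; the `app_logs` name and both limits are hypothetical.

```python
def create_log_collection(db, name="app_logs", size_bytes=1024 * 1024, max_documents=1000):
    """Create a capped collection limited by total size and document count."""
    # Once a limit is reached, the oldest documents are overwritten first.
    return db.create_collection(
        name, capped=True, size=size_bytes, max=max_documents
    )

# Usage with a live database (not executed here):
# logs = create_log_collection(client["superstore_db"])
# logs.insert_one({"level": "info", "message": "service started"})
```
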
21) What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
- $match filters documents in the aggregation pipeline.

- Acts like SQL’s WHERE clause.

- Optimizes performance when placed early in the pipeline.
22) How can you secure access to a MongoDB database?
- Enable authentication and require users to log in.

- Use RBAC to control what users can access.

- Encrypt data in transit and at rest.

- Restrict network access via IP whitelists and firewalls.

- Monitor activity with auditing.

- Follow security best practices like strong passwords and least privilege.
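
On the client side, several of these measures show up as connection options. The sketch below collects them in one place; the user name, host, and the `MONGO_PASSWORD` environment variable are assumptions made for the example.

```python
import os

def secure_client_options(username, password_env="MONGO_PASSWORD"):
    """Return MongoClient keyword arguments for an authenticated TLS connection."""
    return {
        "username": username,
        # Read the password from the environment instead of hard-coding it.
        "password": os.environ[password_env],
        "authSource": "admin",  # database that stores the user's credentials
        "tls": True,            # encrypt traffic between client and server
    }

# Usage with a live server (not executed here):
# from pymongo import MongoClient
# client = MongoClient("mongodb://db.example.com:27017/", **secure_client_options("app_user"))
```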

23) What is MongoDB’s WiredTiger storage engine, and why is it important?
- WiredTiger is MongoDB’s high-performance, default storage engine that provides:

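- Document-level concurrency control, so multiple clients can modify different documents at the same time

- Data compression (Snappy by default for collections) for storage efficiency

- Durability through checkpoints and journaling

- Support for modern features like multi-document transactions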

# **Practical Questions**

1) Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

import csv
from pymongo import MongoClient

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]          # Database
collection = db["orders"]             # Collection

# 2. Load CSV and insert documents
csv_file_path = "Superstore.csv"      # Path to your CSV file

with open(csv_file_path, mode='r', encoding='utf-8-sig') as file:
    reader = csv.DictReader(file)
    data = []
    for row in reader:
        # Optional: convert numeric fields
        if 'Sales' in row:
            row['Sales'] = float(row['Sales']) if row['Sales'] else 0.0
        if 'Quantity' in row:
            row['Quantity'] = int(row['Quantity']) if row['Quantity'] else 0
        if 'Profit' in row:
            row['Profit'] = float(row['Profit']) if row['Profit'] else 0.0
        data.append(row)

# 3. Insert all documents into MongoDB
if data:
    result = collection.insert_many(data)
    print(f"{len(result.inserted_ids)} documents inserted successfully!")
else:
    print("No data found in the CSV file.")


2) Retrieve and print all documents from the Orders collection.

from pymongo import MongoClient
import pprint  # Pretty-print for nicer output

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]     # Replace with your database name
collection = db["orders"]        # Replace with your collection name

# 2. Retrieve all documents
documents = collection.find()    # Returns a cursor

# 3. Print each document
for doc in documents:
    pprint.pprint(doc)


3) Count and display the total number of documents in the Orders collection.

from pymongo import MongoClient

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]   # Replace with your database name
collection = db["orders"]      # Replace with your collection name

# 2. Count total documents
total_docs = collection.count_documents({})  # Empty filter counts all documents

# 3. Display the count
print(f"Total number of documents in Orders collection: {total_docs}")


4) Write a query to fetch all orders from the "West" region.


from pymongo import MongoClient
import pprint

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Query orders from the "West" region
west_orders = collection.find({"Region": "West"})

# 3. Print the results
for order in west_orders:
    pprint.pprint(order)


5) Write a query to find orders where Sales is greater than 500.

from pymongo import MongoClient
import pprint

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Query orders where Sales > 500
high_sales_orders = collection.find({"Sales": {"$gt": 500}})

# 3. Print the results
for order in high_sales_orders:
    pprint.pprint(order)


6) Fetch the top 3 orders with the highest Profit.

from pymongo import MongoClient
import pprint

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Query top 3 orders with highest Profit
top_profit_orders = collection.find().sort("Profit", -1).limit(3)

# 3. Print the results
for order in top_profit_orders:
    pprint.pprint(order)


7) Update all orders with Ship Mode as "First Class" to "Premium Class".

from pymongo import MongoClient

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Update all matching documents
update_result = collection.update_many(
    {"Ship Mode": "First Class"},           # Filter
    {"$set": {"Ship Mode": "Premium Class"}} # Update operation
)

# 3. Print the result
print(f"Matched documents: {update_result.matched_count}")
print(f"Modified documents: {update_result.modified_count}")


8) Delete all orders where Sales is less than 50.

from pymongo import MongoClient

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Delete all matching documents
delete_result = collection.delete_many({"Sales": {"$lt": 50}})

# 3. Print the result
print(f"Number of orders deleted: {delete_result.deleted_count}")


9) Use aggregation to group orders by Region and calculate total sales per region.

from pymongo import MongoClient
import pprint

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Aggregation pipeline
pipeline = [
    {
        "$group": {
            "_id": "$Region",           # Group by Region
            "totalSales": {"$sum": "$Sales"}  # Calculate total Sales per region
        }
    },
    {
        "$sort": {"totalSales": -1}     # Optional: sort by totalSales descending
    }
]

# 3. Execute the aggregation
results = collection.aggregate(pipeline)

# 4. Print the results
for result in results:
    pprint.pprint(result)


10) Fetch all distinct values for Ship Mode from the collection.

from pymongo import MongoClient

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Fetch distinct Ship Mode values
distinct_ship_modes = collection.distinct("Ship Mode")

# 3. Print the results
print("Distinct Ship Modes:")
for mode in distinct_ship_modes:
    print(mode)


11) Count the number of orders for each category.

from pymongo import MongoClient
import pprint

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]    # Replace with your database name
collection = db["orders"]       # Replace with your collection name

# 2. Aggregation pipeline to count orders per category
pipeline = [
    {
        "$group": {
            "_id": "$Category",       # Group by Category
            "orderCount": {"$sum": 1} # Count the number of orders
        }
    },
    {
        "$sort": {"orderCount": -1}   # Optional: sort descending by order count
    }
]

# 3. Execute aggregation
results = collection.aggregate(pipeline)

# 4. Print results
for result in results:
    pprint.pprint(result)
