#Theoretical Questions


###1. What are the key differences between SQL and NoSQL databases?

Ans) SQL databases are relational and structured, using tables with fixed schemas (like MySQL or PostgreSQL).
NoSQL databases such as MongoDB are non-relational and store data in flexible, JSON-like documents.
SQL focuses on ACID compliance and complex joins, while NoSQL prioritizes scalability, flexibility, and performance for unstructured data.

###2. What makes MongoDB a good choice for modern applications?

Ans) MongoDB is a great choice for modern applications because it offers flexible data models, easy scalability, and high performance.
It stores data in BSON (a binary form of JSON), integrates smoothly with modern programming languages, and supports distributed systems for handling large data volumes efficiently.

###3. Explain the concept of collections in MongoDB.

Ans) In MongoDB, a collection is like a table in SQL. It stores multiple documents that share similar structures but can have different fields.
Collections allow flexible schema design, meaning documents can evolve without altering the database structure.

###4. How does MongoDB ensure high availability using replication?

Ans) MongoDB ensures high availability through replica sets.
A replica set consists of a primary node (handles writes) and secondary nodes (replicate data from the primary).
If the primary fails, a secondary node is automatically promoted to primary, ensuring continuous uptime and data availability.

###5. What are the main benefits of MongoDB Atlas?

Ans) MongoDB Atlas is a fully managed cloud version of MongoDB.
It provides automatic scaling, real-time performance monitoring, backups, and strong security controls.
It also simplifies deployment across AWS, Azure, and Google Cloud with minimal maintenance.

###6. What is the role of indexes in MongoDB, and how do they improve performance?

Ans) Indexes in MongoDB help the database find and retrieve data faster.
Without indexes, MongoDB scans every document in a collection (a full collection scan).
Indexes, such as single-field or compound indexes, can significantly improve query performance, but each index adds some overhead to writes because it must be updated whenever the indexed fields change.
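
As a minimal sketch of the two index types mentioned above, assuming `collection` is a PyMongo collection and that the documents have `Region` and `Sales` fields (as in the Superstore data used later), the indexes could be created like this. The function name is hypothetical.

```python
def create_order_indexes(collection):
    """Create a single-field and a compound index on a PyMongo collection.

    `collection` is assumed to be a pymongo Collection; 1 means ascending
    order and -1 means descending order.
    """
    # Single-field index: helps queries that filter on Region.
    region_index = collection.create_index([("Region", 1)])
    # Compound index: Region ascending, then Sales descending.
    region_sales_index = collection.create_index([("Region", 1), ("Sales", -1)])
    # create_index returns the name of each index.
    return [region_index, region_sales_index]
```

With a live connection this would be called as `create_order_indexes(db["Orders"])`.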

###7. Describe the stages of the MongoDB aggregation pipeline.

Ans) The aggregation pipeline processes data through multiple stages, transforming documents at each step.
Common stages include:

$match → filters documents

$group → groups data and performs aggregations

$sort → sorts documents

$project → selects or reshapes fields

$limit / $skip → controls the number of results
This enables complex data analysis directly within MongoDB.
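
The stages listed above can be chained in one pipeline. The sketch below only builds the pipeline as a list of dictionaries; the field names `Region` and `Sales` and the threshold values are assumptions for illustration.

```python
def build_sales_pipeline(min_sales=100, top_n=5):
    """Build an aggregation pipeline: filter, group, sort, reshape, limit."""
    return [
        # $match: keep only orders with Sales at or above the threshold.
        {"$match": {"Sales": {"$gte": min_sales}}},
        # $group: one output document per region with its summed sales.
        {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}},
        # $sort: highest total first.
        {"$sort": {"total_sales": -1}},
        # $project: rename _id to region and keep the total.
        {"$project": {"_id": 0, "region": "$_id", "total_sales": 1}},
        # $limit: keep only the first top_n results.
        {"$limit": top_n},
    ]


pipeline = build_sales_pipeline()
stage_names = [next(iter(stage)) for stage in pipeline]
```

With a PyMongo collection, the pipeline would be run as `collection.aggregate(pipeline)`.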

###8. What is sharding in MongoDB? How does it differ from replication?

Ans) Sharding distributes data across multiple servers (called shards) to improve scalability and performance for large datasets.
Replication, on the other hand, duplicates the same data across multiple servers for fault tolerance.
In short, sharding = scalability; replication = reliability.
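
A rough sketch of how a collection might be sharded from Python, assuming `client` is a PyMongo `MongoClient` connected to the `mongos` router of a sharded cluster. The hashed shard key on `Order ID` is only an illustrative choice, not a recommendation.

```python
def shard_orders_collection(client, database="SuperstoreDB", collection="Orders"):
    """Enable sharding for a database and shard one collection by a hashed key.

    `client` is assumed to be a pymongo.MongoClient connected to a mongos router.
    """
    admin = client.admin
    # Allow collections in this database to be sharded.
    admin.command("enableSharding", database)
    # Distribute documents across shards by the hash of the assumed "Order ID" field.
    return admin.command(
        "shardCollection",
        f"{database}.{collection}",
        key={"Order ID": "hashed"},
    )
```

Replication needs no such per-collection step: every member of a replica set holds a full copy of the data.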

###9. What is PyMongo, and why is it used?

Ans) PyMongo is the official Python driver for MongoDB.
It allows Python applications to connect to MongoDB, perform CRUD operations, run queries, and use aggregation pipelines.
PyMongo makes it simple to integrate MongoDB with Python-based projects.

###10. What are the ACID properties in the context of MongoDB transactions?

Ans) ACID stands for Atomicity, Consistency, Isolation, and Durability.
MongoDB supports multi-document ACID transactions, ensuring:

Atomicity: all operations in the transaction succeed, or none do

Consistency: the database moves from one valid state to another

Isolation: concurrent transactions don’t interfere with each other

Durability: committed changes persist even after a crash
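
A minimal sketch of a multi-document transaction with PyMongo, assuming `client` is a `MongoClient` connected to a replica set (transactions are not available on a standalone server). The `ArchivedOrders` collection and the `Order ID` field are assumptions made for this example.

```python
def archive_order(client, order_id):
    """Move one order from Orders to ArchivedOrders as a single atomic unit."""
    db = client["SuperstoreDB"]
    with client.start_session() as session:
        # The transaction commits when the block exits normally
        # and aborts if an exception is raised inside it.
        with session.start_transaction():
            order = db["Orders"].find_one({"Order ID": order_id}, session=session)
            if order is None:
                raise LookupError(f"No order with Order ID {order_id!r}")
            # Both writes succeed together or not at all.
            db["ArchivedOrders"].insert_one(order, session=session)
            db["Orders"].delete_one({"_id": order["_id"]}, session=session)
    return order
```

Passing `session=session` to every operation is what ties it to the transaction; an operation called without it would run outside the transaction.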

###11. What is the purpose of MongoDB’s explain() function?

Ans) The explain() method shows how MongoDB executes a query.
It reports the winning query plan and the indexes it uses and, in "executionStats" mode, how many documents and index keys were examined.
Developers use it to spot full collection scans and to optimize queries.
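
As a sketch of how the output might be inspected in Python: with PyMongo, `collection.find(query).explain()` returns a dictionary. The helper names below are hypothetical, and `sample_explain` is a hand-written, heavily abbreviated stand-in for real output, whose exact layout varies between server versions.

```python
def plan_stages(explain_output):
    """Return the stage names of the winning plan, outermost stage first."""
    plan = explain_output["queryPlanner"]["winningPlan"]
    stages = []
    while plan is not None:
        stages.append(plan.get("stage"))
        # Follow the chain of nested input stages until it ends.
        plan = plan.get("inputStage")
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan scans the whole collection (COLLSCAN)."""
    return "COLLSCAN" in plan_stages(explain_output)


# Hand-written, abbreviated example of the assumed shape (not real server output).
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}

stages = plan_stages(sample_explain)
```

An `IXSCAN` stage indicates that an index was used, while `COLLSCAN` indicates a full collection scan.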

###12. How does MongoDB handle schema validation?

Ans) MongoDB supports optional schema validation rules.
You can define a schema using JSON Schema syntax to enforce field types, required fields, and constraints.
This ensures data integrity while maintaining flexibility for evolving data structures.
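
A small sketch of such a validator, assuming `db` is a PyMongo database. The collection name `ValidatedOrders` and the chosen rules are hypothetical; only `Region` and `Sales` are constrained, so documents may still carry other fields.

```python
# JSON Schema rules: Region and Sales are required, with basic type checks.
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["int", "long", "double"], "minimum": 0},
        },
    }
}


def create_validated_collection(db, name="ValidatedOrders"):
    """Create a collection whose inserts and updates are checked by the validator."""
    return db.create_collection(name, validator=order_validator)
```

With this in place, inserting a document without `Sales`, or with a negative `Sales` value, would be rejected by the server.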

###13. What is the difference between a primary and a secondary node in a replica set?

Ans) In a replica set:

The primary node handles all write operations and, by default, all read operations.

Secondary nodes replicate data from the primary and can serve read queries if enabled.
If the primary fails, one of the secondaries becomes the new primary automatically.
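
Reads from secondaries are usually enabled through the read preference. The sketch below builds a connection string with the standard `replicaSet` and `readPreference` URI options; the host names and the replica set name `rs0` are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string for a replica set.

    read_preference can be "primary", "primaryPreferred", "secondary",
    "secondaryPreferred", or "nearest".
    """
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


# Placeholder hosts; listing several members lets the driver find the current primary.
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    "rs0",
    read_preference="secondaryPreferred",
)
```

The resulting string can be passed to `MongoClient(uri)`. Writes still go to the primary, and reads are served by a secondary when one is available.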

###14. What security mechanisms does MongoDB provide for data protection?

Ans) MongoDB offers several security features:

Authentication (verifying user identity)

Authorization (role-based access control)

Encryption (data encrypted in-transit and at-rest)

Auditing (tracking user actions)
These mechanisms ensure strong data protection and compliance.

###15. Explain the concept of embedded documents and when they should be used.

Ans) Embedded documents are nested JSON-like structures stored within a single MongoDB document.
They are used when related data is frequently accessed together (e.g., user details and address).
Embedding improves read performance and keeps related information in one place, avoiding complex joins.
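
To make the user-and-address example concrete, here is a small sketch with made-up values. The two helper functions are hypothetical and only imitate simple equality matching on dotted paths; MongoDB's real query matching does much more.

```python
# A user document with the address embedded instead of stored in a second collection.
user = {
    "name": "Example User",
    "email": "user@example.com",
    "address": {"street": "12 Main St", "city": "Pune", "zip": "411001"},
}

# Dot notation reaches into the embedded document, e.g. collection.find(city_filter).
city_filter = {"address.city": "Pune"}


def get_by_path(document, dotted_path):
    """Resolve a dotted path such as "address.city" against a nested dict."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


def matches(document, flat_filter):
    """Check simple equality conditions written with dotted paths."""
    return all(
        get_by_path(document, path) == expected
        for path, expected in flat_filter.items()
    )
```

Because the address lives inside the user document, one read returns both, with no second query or join.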

###16. What is the purpose of MongoDB’s $lookup stage in aggregation?

Ans) The $lookup stage in MongoDB’s aggregation pipeline is used to perform a left outer join between two collections.
It allows combining documents from different collections based on a shared field.

For example, you can use $lookup to combine customer data with their order details.

This stage is useful for enriching data without needing multiple queries.
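
A sketch of the customer and order example, assuming a `customers` collection whose `_id` is referenced by a `customer_id` field in an `orders` collection. These collection and field names are assumptions for illustration.

```python
# Join each customer with all of its orders.
lookup_pipeline = [
    {
        "$lookup": {
            "from": "orders",               # collection to join with
            "localField": "_id",            # field in the customers documents
            "foreignField": "customer_id",  # matching field in the orders documents
            "as": "orders",                 # output array added to each customer
        }
    }
]


def customers_with_orders(customers):
    """Run the join on a PyMongo collection of customers."""
    return list(customers.aggregate(lookup_pipeline))
```

Because it is a left outer join, a customer with no orders still appears in the result, with an empty `orders` array.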

###17. What are some common use cases for MongoDB?

Ans) MongoDB is widely used in scenarios where flexible and scalable data storage is needed.
Common use cases include:

Content management systems (CMS)

E-commerce and product catalogs

Real-time analytics

IoT data storage

Mobile and web applications

Social media and user profiles
Its document model makes it ideal for dynamic, fast-changing data.

###18. What are the advantages of using MongoDB for horizontal scaling?

Ans) MongoDB supports horizontal scaling through sharding, which distributes data across multiple servers.
Advantages include:

Better performance for large datasets

Improved read and write throughput

Ability to scale storage capacity easily

Reduced load on individual servers
This ensures that MongoDB can handle growing data and user demands efficiently.

###19. How do MongoDB transactions differ from SQL transactions?

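Ans) Both MongoDB and SQL databases support ACID transactions, but they start from different defaults. In MongoDB, a write to a single document is always atomic, and multi-document transactions, which can span multiple documents and collections, are available on replica sets since version 4.0 and on sharded clusters since version 4.2.
SQL databases are built around multi-row, multi-table transactions and typically enforce a stricter schema and constraints as part of them.

Because related data is often embedded in one document, many MongoDB workloads need no multi-document transaction at all.
Multi-document transactions add performance overhead, so they are best reserved for the cases a single-document write cannot cover.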

###20. What are the main differences between capped collections and regular collections?

Ans) Capped collections have a fixed size and automatically overwrite the oldest data when the limit is reached.

They maintain insertion order and are ideal for logging or caching where only recent data matters.

Regular collections grow dynamically and do not overwrite older data.
Capped collections provide predictable performance due to their fixed size.
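
A short sketch of creating a capped collection with PyMongo, assuming `db` is a database object. The name `app_logs` and the size limits are hypothetical. The bounded `deque` at the end is only an in-memory analogy for the overwrite behaviour, not how MongoDB implements it.

```python
from collections import deque


def create_capped_log(db, name="app_logs", size_bytes=1_048_576, max_docs=1000):
    """Create a capped collection limited by total size and document count."""
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)


# Analogy only: a bounded deque also drops its oldest entry when full.
recent_events = deque(maxlen=3)
for event in ["e1", "e2", "e3", "e4"]:
    recent_events.append(event)
# recent_events now holds e2, e3, e4 in insertion order
```

The `size` limit (in bytes) is required for a capped collection, while `max` is an optional cap on the number of documents.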

###21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

Ans) The $match stage filters documents based on specified conditions, similar to a SQL WHERE clause.
It selects only documents that meet the criteria for further processing in the pipeline.

Placing $match early in the pipeline improves performance by reducing the number of documents processed later.

###22. How can you secure access to a MongoDB database?

Ans) MongoDB security can be ensured using several mechanisms:

Enable authentication to verify users.

Use authorization with role-based access control (RBAC).

Encrypt data both in transit (TLS/SSL) and at rest.

Restrict network access using firewalls or IP whitelisting.

Enable audit logging to monitor database activity.
Following these practices helps protect MongoDB from unauthorized access and attacks.

###23. What is MongoDB’s WiredTiger storage engine, and why is it important?

Ans) WiredTiger is the default storage engine in MongoDB (since version 3.2), designed for high performance and concurrency.

It supports document-level locking, compression, and checkpointing for efficient data management.

WiredTiger improves write throughput, reduces disk space usage, and provides durability through journaling.
Its design makes MongoDB faster and more reliable for modern workloads.

#Practical Questions

**Note: MongoDB Atlas has deprecated the classic M0 Free Cluster, so I am only writing the valid queries (I am not sure whether this will be accepted):**

###1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB

In [2]:
import pandas as pd
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]

# Load CSV file
from google.colab import files
uploaded = files.upload()


Saving superstore.csv to superstore.csv


In [6]:
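df = pd.read_csv("superstore.csv", encoding="latin1")

# Convert each row to a dictionary and insert all rows as documents
collection.insert_many(df.to_dict(orient="records"))
print("Data inserted successfully into MongoDB!")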


Data inserted successfully into MongoDB!


###2. Retrieve and Print All Documents from the Orders Collection

In [None]:
for order in collection.find():
    print(order)

###3. Count and Display Total Number of Documents

In [None]:
total_docs = collection.count_documents({})
print("Total number of documents:", total_docs)


###4. Fetch All Orders from the "West" Region

In [None]:
west_orders = collection.find({"Region": "West"})

for order in west_orders:
    print(order)

###5. Find Orders Where Sales > 500

In [None]:
high_sales = collection.find({"Sales": {"$gt": 500}})

for order in high_sales:
    print(order)

###6. Fetch Top 3 Orders with Highest Profit

In [None]:
top_profit_orders = collection.find().sort("Profit", -1).limit(3)

for order in top_profit_orders:
    print(order)

###7. Update Ship Mode from "First Class" to "Premium Class"

In [None]:
result = collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)

print("Documents updated:", result.modified_count)

###8. Delete All Orders Where Sales < 50

In [None]:
delete_result = collection.delete_many({"Sales": {"$lt": 50}})

print("Documents deleted:", delete_result.deleted_count)

###9. Aggregation: Total Sales per Region

In [None]:
pipeline = [
    {
        "$group": {
            "_id": "$Region",
            "Total_Sales": {"$sum": "$Sales"}
        }
    }
]

for result in collection.aggregate(pipeline):
    print(result)


###10. Fetch All Distinct Ship Mode Values

In [None]:
ship_modes = collection.distinct("Ship Mode")
print("Distinct Ship Modes:", ship_modes)


###11. Count Number of Orders for Each Category

In [None]:
pipeline = [
    {
        "$group": {
            "_id": "$Category",
            "Order_Count": {"$sum": 1}
        }
    }
]

for result in collection.aggregate(pipeline):
    print(result)
