1. What are the key differences between SQL and NoSQL databases?

### SQL (Relational) Databases

Structure:- Tables with rows & columns	

Schema:- Fixed schema (predefined)	

Scalability:- Vertical scaling (scaling up)	

Query Language:- SQL (Structured Query Language)

Joins:- Uses joins between tables

Transactions:- Strong ACID compliance	

### NoSQL (MongoDB)

Structure:- Collections with JSON-like documents

Schema:- Flexible schema (dynamic)

Scalability:- Horizontal scaling (sharding)

Query Language:- MongoDB Query Language (MQL), written as JSON-like query documents

Joins:- Embeds data, uses $lookup for joins

Transactions:- Single-document operations are always atomic; multi-document ACID transactions are supported since MongoDB 4.0 (replica sets) and 4.2 (sharded clusters)


2. What makes MongoDB a good choice for modern applications?


-> Schema flexibility: Adapts to changing requirements.

-> High scalability: Supports horizontal scaling with sharding.

-> Fast performance: Keeps frequently used data in an in-memory cache and uses indexes to avoid full collection scans.

-> Cloud integration: Works well with MongoDB Atlas.

-> JSON-like documents: Easy for developers using JavaScript, Python, etc.

3. Explain the concept of collections in MongoDB.

A collection is a group of MongoDB documents, similar to a table in a relational database. By default, collections don’t enforce a fixed schema, so documents in the same collection can have different fields.

4. How does MongoDB ensure high availability using replication?

MongoDB uses Replica Sets, consisting of:

Primary Node (receives all writes and, by default, serves reads).

Secondary Nodes (replicate data from the primary).

Arbiter (optional, votes in elections but doesn't store data).

If the primary fails, the remaining members hold an election and a secondary becomes the new primary.
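A client usually connects to a replica set by listing several members and naming the set, so the driver can find the current primary after a failover. The sketch below only builds such a connection string. The host names and the set name `rs0` are hypothetical placeholders; `replicaSet` and `readPreference` are standard connection-string options.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string that lists several replica set members."""
    host_list = ",".join(hosts)
    return (
        f"mongodb://{host_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


# Hypothetical hosts and set name, used only for illustration
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
print(uri)
```

Passing a string like this to `MongoClient(uri)` should let the driver discover the members and follow elections without a code change.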

5. What are the main benefits of MongoDB Atlas?

Fully managed cloud service.

Automated backups & security.

Scalability with sharding & replication.

Multi-cloud support (AWS, GCP, Azure).

Monitoring & performance optimization.


6. What is the role of indexes in MongoDB, and how do they improve performance?

Indexes improve query performance by allowing MongoDB to locate data faster. Without an index, MongoDB performs a collection scan (slow). Index types:

Single field ({field: 1})

Compound ({field1: 1, field2: -1})

Text indexes for searching.
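The three index types above could be created from PyMongo roughly as follows. This is a sketch, assuming a Superstore-style collection with `Region`, `Category`, `Sales`, and `Product Name` fields (the last one is an assumption about the dataset); the function name is hypothetical.

```python
def create_superstore_indexes(collection):
    """Create one index of each type on a PyMongo collection.

    `collection` is expected to be a pymongo Collection; only its
    create_index() method is used here.
    """
    names = []
    # Single-field index, ascending on Region
    names.append(collection.create_index([("Region", 1)]))
    # Compound index: Category ascending, then Sales descending
    names.append(collection.create_index([("Category", 1), ("Sales", -1)]))
    # Text index for keyword search (field name is an assumption)
    names.append(collection.create_index([("Product Name", "text")]))
    return names
```

With a live connection, the call would be `create_superstore_indexes(db["transactions"])`; `create_index` returns the name of each index it creates.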

7. Describe the stages of the MongoDB aggregation pipeline.



$match – Filters data.

$group – Groups data (e.g., total sales per region).

$sort – Orders results.

$project – Reshapes output.

$limit – Restricts the number of documents.

$lookup – Joins collections.

$unwind – Splits arrays into multiple documents.
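Stages run in the order they appear in the pipeline list, each one consuming the output of the previous stage. A minimal sketch that combines several of the stages above, assuming Superstore-style fields (`Region`, `Category`, `Sales`); the output field names are made up for illustration.

```python
# Top 3 categories by total sales in the West region
pipeline = [
    {"$match": {"Region": "West"}},                        # filter first
    {"$group": {"_id": "$Category",                        # one group per category
                "total_sales": {"$sum": "$Sales"}}},
    {"$sort": {"total_sales": -1}},                        # highest total first
    {"$project": {"_id": 0, "category": "$_id",            # rename _id to category
                  "total_sales": 1}},
    {"$limit": 3},                                         # keep the top three
]

# Each stage is a single-key dict; the key is the stage name
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

The list would be passed to `collection.aggregate(pipeline)` on a connected collection.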

8. What is sharding in MongoDB? How does it differ from replication?



### Replication	

Purpose:- High availability	

How it Works:- Copies data across nodes

Failure Handling:- Elects a new primary	

### Sharding

Purpose:- Horizontal scaling

How it Works:- Distributes data across multiple servers

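Failure Handling:- The remaining shards keep serving their own data; each shard is usually deployed as a replica set so it can fail over independently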
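The idea of distributing data by a shard key can be illustrated with a toy router. This is a simplified sketch, not MongoDB's actual algorithm: MongoDB assigns ranges of (optionally hashed) key values to chunks rather than using a modulo, and the function names here are hypothetical.

```python
import hashlib
from collections import defaultdict


def pick_shard(shard_key_value, shard_count):
    """Map a shard key value to a shard number with a stable hash (toy version)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(documents, shard_key, shard_count):
    """Group documents by the shard their key value hashes to."""
    shards = defaultdict(list)
    for doc in documents:
        shards[pick_shard(doc[shard_key], shard_count)].append(doc)
    return dict(shards)


orders = [{"order_id": i, "Region": "West"} for i in range(1, 9)]
placement = distribute(orders, "order_id", shard_count=3)
for shard, docs in sorted(placement.items()):
    print(shard, [d["order_id"] for d in docs])
```

Each document lands on exactly one shard, whereas replication would place every document on every member.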

9. What is PyMongo, and why is it used?



PyMongo is a Python library for interacting with MongoDB, allowing CRUD operations and aggregation queries.

10. What are the ACID properties in the context of MongoDB transactions?



Atomicity: Transactions complete fully or not at all.

Consistency: Data remains valid after transactions.

Isolation: Transactions don’t interfere with each other.

Durability: Changes persist after execution.

11. What is the purpose of MongoDB's explain() function?



db.collection.find().explain() shows the query execution plan (for example, whether an index scan or a full collection scan was chosen), which helps when optimizing queries.
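In PyMongo, `cursor.explain()` returns a nested dictionary. A small helper, sketched below, can pull out the stage names of the winning plan to check for a collection scan. The two sample dictionaries are hand-written, simplified stand-ins, not captured server output; real output has more fields, varies by server version, and some plans use `inputStages` (plural), which this sketch does not follow.

```python
def plan_stages(explain_output):
    """Return the stage names of the winning plan, outermost stage first."""
    stages = []
    node = explain_output.get("queryPlanner", {}).get("winningPlan")
    while node:
        stages.append(node.get("stage"))
        node = node.get("inputStage")  # follow the single-input chain only
    return stages


def uses_collection_scan(explain_output):
    """True if any stage of the winning plan is a full collection scan."""
    return "COLLSCAN" in plan_stages(explain_output)


# Simplified, hand-written examples of the relevant part of explain() output
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(plan_stages(indexed), uses_collection_scan(indexed))
print(plan_stages(unindexed), uses_collection_scan(unindexed))
```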

12. How does MongoDB handle schema validation?



Using the $jsonSchema operator in a collection's validator, you can enforce required fields, data types, and value constraints on inserts and updates.
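A validator is an ordinary document passed when the collection is created. The sketch below assumes Superstore-style `Region` and `Sales` fields; the list of allowed regions, the collection name, and the function name are assumptions for illustration.

```python
# Validator document: Region and Sales are required, Sales must be non-negative
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            # Allowed region values are an assumption about the dataset
            "Region": {"enum": ["West", "East", "Central", "South"]},
            "Sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}


def create_validated_collection(db, name="validated_orders"):
    """Create a collection whose writes are checked against order_validator.

    `db` is expected to be a pymongo Database.
    """
    return db.create_collection(name, validator=order_validator)
```

With this in place, an insert that omits `Sales` or uses a negative value should be rejected by the server with a validation error.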

13. What is the difference between a primary and a secondary node in a replica set?



Primary: Receives all writes and, by default, serves all reads.

Secondary: Replicates the primary's data and can serve read queries when the client's read preference allows it.

14. What security mechanisms does MongoDB provide for data protection?



Authentication (SCRAM, LDAP, x.509).

Role-based access control (RBAC).

Encryption (TLS/SSL, field-level encryption).

IP whitelisting.

15. Explain the concept of embedded documents and when they should be used.



Embedded documents store related data within a single document, reducing the need for joins.

{
   "customer": {
       "name": "John",
       "address": { "city": "NYC", "zip": "10001" }
   }
}

Use embedded documents when:

The related data is frequently accessed together.

The document size remains manageable.
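Fields inside embedded documents are addressed with dot notation in query filters, for example `{"customer.address.city": "NYC"}`. The helper below is a hypothetical, pure-Python illustration of how such a path resolves against a document; it is not part of PyMongo.

```python
def get_path(document, dotted_path):
    """Follow a dot-notation path through nested dicts; return None if missing."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


order = {"customer": {"name": "John",
                      "address": {"city": "NYC", "zip": "10001"}}}

# The same filter document could be passed to collection.find()
city_filter = {"customer.address.city": "NYC"}
matches = all(get_path(order, path) == expected
              for path, expected in city_filter.items())
print(matches)
```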

16. What is the purpose of MongoDB's $lookup stage in aggregation?



$lookup joins two collections, similar to a SQL left outer join:
{
   $lookup: {
      from: "products",
      localField: "product_id",
      foreignField: "_id",
      as: "product_details"
   }
}
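The shape of the result is easier to see with a small emulation. The function below is a pure-Python sketch of the left-outer-join behavior, not the server's implementation; the sample orders and products are made up.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each local document the list of matching foreign documents."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        # Unmatched local documents are kept, with an empty array
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "product_id": "P1"}, {"_id": 2, "product_id": "P9"}]
products = [{"_id": "P1", "name": "Desk"}]

result = lookup(orders, products, "product_id", "_id", "product_details")
for row in result:
    print(row)
```

Every input order appears in the output; the matched products arrive as an array, which is why `$lookup` is often followed by `$unwind`.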



17. What are some common use cases for MongoDB?


E-commerce applications.

IoT & real-time analytics.

Content management systems.

Financial data storage.

18. What are the advantages of using MongoDB for horizontal scaling?

Uses sharding for distributing data.

Increases throughput.

Supports high availability when each shard is deployed as a replica set.

19. How do MongoDB transactions differ from SQL transactions?



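MongoDB supports multi-document ACID transactions (since v4.0 on replica sets and v4.2 on sharded clusters). In SQL databases, transactions are the normal way to group related writes. In MongoDB they add overhead, so related data is usually modeled in a single document (single-document writes are always atomic) and transactions are reserved for cases that truly span documents.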
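In PyMongo, a multi-document transaction runs inside a client session. The sketch below assumes a hypothetical `bank` database with an `accounts` collection and a deployment that supports transactions (a replica set or sharded cluster); the function name is made up.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents atomically.

    `client` is expected to be a pymongo MongoClient. Database and
    collection names are hypothetical.
    """
    accounts = client["bank"]["accounts"]
    with client.start_session() as session:
        # Commits when the block exits normally, aborts if it raises
        with session.start_transaction():
            accounts.update_one({"_id": from_id},
                                {"$inc": {"balance": -amount}},
                                session=session)
            accounts.update_one({"_id": to_id},
                                {"$inc": {"balance": amount}},
                                session=session)
```

Passing `session=session` to each operation is what ties it to the transaction; an operation called without it would run outside the transaction.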

20. What are the main differences between capped collections and regular collections?



### Capped Collections

Size:- Fixed

Order:- Preserves insertion order

Use Case:- Logging, real-time data

### Regular Collections

Size:- Dynamic

Order:- No strict order

Use Case:- General-purpose storage
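A capped collection behaves like a fixed-size circular buffer: once it is full, the oldest documents are overwritten by new ones. A bounded `deque` gives a rough in-memory analogy. This is only an illustration; a real capped collection is limited by size in bytes, with an optional document-count limit.

```python
from collections import deque

# A "capped" buffer that holds at most three entries
capped_log = deque(maxlen=3)

for n in range(1, 6):
    capped_log.append({"seq": n, "msg": f"event {n}"})

# Only the three most recent entries remain, in insertion order
remaining = [entry["seq"] for entry in capped_log]
print(remaining)  # [3, 4, 5]
```

In PyMongo, a capped collection would be created with something like `db.create_collection("log", capped=True, size=1048576)`, where the name and size are placeholders.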


21. What is the purpose of the $match stage in MongoDB's aggregation pipeline?



$match filters documents based on conditions. Placing it early in the pipeline improves performance because later stages process fewer documents.

22. How can you secure access to a MongoDB database?




Enable authentication.

Restrict network access.

Use encryption.

Set strong access controls.

23. What is MongoDB's WiredTiger storage engine, and why is it important?


WiredTiger is MongoDB’s default storage engine that provides:

Document-level locking (better concurrency).

Compression (reduces storage usage).

Better performance for most workloads than the older MMAPv1 engine.


### Practical

1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.



In [4]:
pip install pymongo

Collecting pymongo
  Downloading pymongo-4.11-cp312-cp312-win_amd64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.11-cp312-cp312-win_amd64.whl (882 kB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.11
Note: you may need to restart the kernel to use updated packages.


In [9]:
import pandas as pd
from pymongo import MongoClient


In [10]:
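# Connect to a local MongoDB server (it must already be running on port 27017)
client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=60000)
db = client["SuperstoreDB"]
collection = db["transactions"]

# Read the CSV and convert each row into a dictionary (one document per row)
df = pd.read_csv("C:/Users/Hp/Downloads/superstore.csv", encoding='ISO-8859-1')
data = df.to_dict(orient="records")

# Insert all rows in a single batch
collection.insert_many(data)
print("Data loaded successfully!")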


ServerSelectionTimeoutError: localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 60.0s, Topology Description: <TopologyDescription id: 67a9dde4999d747c8cc766d3, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

2. Retrieve and print all documents from the Orders collection.



In [11]:
for doc in collection.find():
    print(doc)


ServerSelectionTimeoutError: localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 60.0s, Topology Description: <TopologyDescription id: 67a9dde4999d747c8cc766d3, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [WinError 10061] No connection could be made because the target machine actively refused it (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>

3. Count and display the total number of documents in the Orders collection.



In [12]:
count = collection.count_documents({})
print("Total Orders:", count)


KeyboardInterrupt: 

4. Write a query to fetch all orders from the "West" region.



In [None]:
# find() returns a lazy cursor; iterate over it (or wrap it in list()) to fetch the matches
west_orders = collection.find({"Region": "West"})


5. Write a query to find orders where Sales is greater than 500.



In [None]:
# $gt matches documents whose Sales value is strictly greater than 500
high_sales = collection.find({"Sales": {"$gt": 500}})


6. Fetch the top 3 orders with the highest Profit.



In [None]:
# Sort by Profit in descending order (-1) and keep the first three documents
top_profit = collection.find().sort("Profit", -1).limit(3)


7. Update all orders with Ship Mode as "First Class" to "Premium Class."



In [None]:
# $set overwrites only the named field in every matching document
collection.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})


8. Delete all orders where Sales is less than 50.



In [None]:
# Permanently removes every document whose Sales value is below 50
collection.delete_many({"Sales": {"$lt": 50}})


9. Use aggregation to group orders by Region and calculate total sales per region.



In [None]:
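# Group by Region (the group key goes in _id) and sum Sales within each group
pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
]
# aggregate() returns a cursor; iterate over it to read one result per region
region_sales = collection.aggregate(pipeline)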


10. Fetch all distinct values for Ship Mode from the collection.



In [None]:
# distinct() returns a plain Python list of the unique values
ship_modes = collection.distinct("Ship Mode")


11. Count the number of orders for each category.

In [None]:
# {"$sum": 1} adds 1 per document, giving a count for each Category
category_counts = collection.aggregate([
    {"$group": {"_id": "$Category", "count": {"$sum": 1}}}
])
