# Theoretical Questions

1. What are the key differences between SQL and NoSQL databases?
- SQL databases store data in structured tables with fixed schemas and use SQL for queries. They are best for complex relationships and transactions.

- NoSQL databases store data flexibly as documents, key-value pairs, or graphs, and are ideal for handling large, unstructured, or quickly changing data.

2. What makes MongoDB a good choice for modern applications?
- MongoDB is a good choice for modern applications because:
    - It stores data in a flexible, JSON-like format (BSON), making it easy to handle changing or unstructured data.
    - It supports horizontal scaling, making it ideal for high-traffic, large-scale apps.
    - It's fast, developer-friendly, and integrates well with modern tech stacks (like Node.js, Python, etc.).

3. Explain the concept of collections in MongoDB.
- In MongoDB, a collection is like a table in SQL databases.
    - It stores documents (records) in a flexible, JSON-like format.
    - Unlike tables, collections don’t require a fixed schema, so each document can have different fields.
    - Example: A users collection might store user profiles, with each document having details like name, email, and age.
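To make the flexible-schema point concrete, here is a minimal sketch. The names, emails, and the `newsletter` field are made up for illustration, and `db` is assumed to be a PyMongo `Database` object.

```python
# Two documents destined for the same hypothetical `users` collection.
# They share some fields but not all, which a fixed-schema table would reject.
user_docs = [
    {"name": "Asha", "email": "asha@example.com", "age": 29},
    {"name": "Ben", "email": "ben@example.com", "newsletter": True},
]


def insert_users(db):
    """Insert both documents; `db` is assumed to be a PyMongo Database."""
    # db["users"] is created lazily on the first insert.
    return db["users"].insert_many(user_docs).inserted_ids


# Fields that appear in one document but not in the other.
differing_fields = set(user_docs[0]) ^ set(user_docs[1])
print(differing_fields)
```

Both documents can live in the same collection even though only one has `age` and only the other has `newsletter`.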

4. How does MongoDB ensure high availability using replication?
- MongoDB uses replica sets, in which secondary nodes continuously replicate data from the primary. If the primary fails, the remaining members elect a secondary as the new primary, so the database stays available.

5. What are the main benefits of MongoDB Atlas?
- MongoDB Atlas is a managed cloud database with features like auto-scaling, backups, monitoring, and global distribution, reducing manual management.

6. What is the role of indexes in MongoDB, and how do they improve performance?
- Indexes allow MongoDB to quickly locate data, significantly improving read performance by avoiding full collection scans.
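As a rough sketch of how an index might be created from Python: the choice of the `Region` field is an assumption borrowed from the Superstore data used in the practical section, and `collection` is assumed to be a PyMongo `Collection`.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def add_region_index(collection):
    """Create a single-field ascending index on `Region` and return its name."""
    return collection.create_index([("Region", ASCENDING)])


def region_query(region):
    """Build an equality filter that the index above can serve."""
    return {"Region": region}
```

With the index in place, a query such as `collection.find(region_query("West"))` can use the index instead of scanning every document. The trade-off is that each index adds some cost to writes.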

7. Describe the stages of the MongoDB aggregation pipeline.
- Stages like $match, $group, $project, and $sort process and transform documents step by step, enabling powerful data aggregation and analysis.
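The stages named above could be chained as in the following sketch. The field names (`Region`, `Category`, `Sales`) are taken from the Superstore data used in the practical section; the specific filter and output names are illustrative assumptions.

```python
pipeline = [
    # $match: keep only orders from the West region (filter as early as possible).
    {"$match": {"Region": "West"}},
    # $group: one output document per category, with sales summed.
    {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},
    # $project: expose the group key as `category` and drop `_id`.
    {"$project": {"_id": 0, "category": "$_id", "total_sales": 1}},
    # $sort: highest total first.
    {"$sort": {"total_sales": -1}},
]


def run(collection):
    """Run the pipeline; `collection` is assumed to be a PyMongo Collection."""
    return list(collection.aggregate(pipeline))


# Each stage is a single-key dict, so the first key names the stage.
stage_order = [next(iter(stage)) for stage in pipeline]
print(stage_order)  # ['$match', '$group', '$project', '$sort']
```

Each stage receives the output of the previous one, so the order of the list is the order of processing.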

8. What is sharding in MongoDB? How does it differ from replication?
- Sharding splits data across servers for scalability, while replication copies data to ensure availability and fault tolerance.

9. What is PyMongo, and why is it used?
- PyMongo is the official MongoDB driver for Python, allowing developers to connect and interact with MongoDB from Python applications.

10. What are the ACID properties in the context of MongoDB transactions?
- ACID stands for Atomicity, Consistency, Isolation, and Durability, ensuring safe and reliable multi-document transactions in MongoDB.
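A multi-document transaction in PyMongo might look like the sketch below. The `bank` database, the `accounts` collection, and the `balance` field are hypothetical, `client` is assumed to be a `MongoClient`, and transactions require a replica set or sharded cluster rather than a standalone server.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two accounts atomically.

    `client` is assumed to be a pymongo.MongoClient connected to a replica set.
    """
    accounts = client["bank"]["accounts"]  # hypothetical database and collection
    with client.start_session() as session:
        # The transaction commits if the block exits normally
        # and aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Either both updates become visible or neither does, which is the atomicity part of ACID.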

11. What is the purpose of MongoDB’s explain() function?
- The explain() function shows how a query is executed, helping developers optimize performance by analyzing index use and execution stages.
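A small helper for inspecting a query plan could look like this. It assumes `collection` is a PyMongo `Collection` and that the server returns the usual `queryPlanner.winningPlan` structure; the exact shape of the output can differ between server versions and deployment types.

```python
def winning_plan(collection, query):
    """Return the plan the query planner selected for `query`."""
    plan = collection.find(query).explain()
    return plan["queryPlanner"]["winningPlan"]
```

In the returned plan, a `COLLSCAN` stage indicates a full collection scan, while an `IXSCAN` stage indicates that an index was used.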

12. How does MongoDB handle schema validation?
- MongoDB allows defining validation rules using JSON Schema, enabling optional enforcement of data structure while keeping flexibility.
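As an illustration, a validator for a hypothetical `users` collection might be defined as follows. The field names and rules are assumptions, and `db` is assumed to be a PyMongo `Database`.

```python
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],  # these fields must be present
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
            "age": {"bsonType": "int", "minimum": 0},  # optional, but typed
        },
    }
}


def create_validated_collection(db, name="users"):
    """Create a collection whose writes are checked against the schema."""
    # validationAction="error" rejects invalid documents; "warn" only logs them.
    return db.create_collection(
        name, validator=user_validator, validationAction="error"
    )
```

Fields not listed under `properties` are still allowed, so the collection keeps its flexibility while the required fields are enforced.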

13. What is the difference between a primary and a secondary node in a replica set?
- The primary node handles all write operations, while secondary nodes replicate the data and can be used for read operations or failover.
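One way to direct reads to secondaries is through the connection string, as in this sketch. The host names and the replica set name `rs0` are hypothetical.

```python
# Hypothetical replica set members.
hosts = ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"]

# replicaSet names the set; readPreference=secondaryPreferred sends reads to a
# secondary when one is available and falls back to the primary otherwise.
uri = (
    "mongodb://"
    + ",".join(hosts)
    + "/?replicaSet=rs0&readPreference=secondaryPreferred"
)


def connect(connection_uri=uri):
    """Open a client; writes still go to the primary regardless of read preference."""
    from pymongo import MongoClient  # imported here so the sketch loads without PyMongo

    return MongoClient(connection_uri)
```

Reads served by a secondary may lag slightly behind the primary, so this setting suits workloads that tolerate slightly stale data.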

14. What security mechanisms does MongoDB provide for data protection?
- MongoDB offers authentication, authorization (RBAC), TLS/SSL encryption, IP whitelisting, and auditing to protect data.

15. Explain the concept of embedded documents and when they should be used.
- Embedded documents are nested documents inside another document, useful when related data is accessed together, reducing the need for joins.
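A document with embedded data might be shaped like the sketch below. All field names and values are made up for illustration.

```python
order = {
    "order_id": "ORD-1001",  # hypothetical identifier
    # Embedded document: customer details stored inside the order.
    "customer": {"name": "Asha", "city": "Pune"},
    # Array of embedded documents: line items read together with the order.
    "items": [
        {"product": "Stapler", "quantity": 2, "price": 7.5},
        {"product": "Paper", "quantity": 5, "price": 3.0},
    ],
}

# Nested fields are queried with dot notation, e.g. collection.find(city_filter).
city_filter = {"customer.city": "Pune"}

# Everything needed to total the order is already in one document.
order_total = sum(item["quantity"] * item["price"] for item in order["items"])
print(order_total)  # 30.0
```

Because the customer and the line items travel with the order, a single read returns everything and no join is needed.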

16. What is the purpose of MongoDB’s $lookup stage in aggregation?
- $lookup performs a join-like operation between two collections, allowing you to combine data based on a related field.
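A `$lookup` stage could be written as in this sketch. The `customers` collection and the `Customer ID` join field are assumptions made for illustration, and `orders` is assumed to be a PyMongo `Collection`.

```python
lookup_pipeline = [
    {
        "$lookup": {
            "from": "customers",            # hypothetical collection to join with
            "localField": "Customer ID",    # field on the orders documents
            "foreignField": "Customer ID",  # matching field on the customers documents
            "as": "customer_info",          # output array holding the matches
        }
    }
]


def orders_with_customers(orders):
    """Return each order with its matching customer documents attached."""
    return list(orders.aggregate(lookup_pipeline))
```

Each output document is an order with an extra `customer_info` array, which is empty when no customer matches.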

17. What are some common use cases for MongoDB?
- MongoDB is great for real-time analytics, CMS, product catalogs, IoT applications, and use cases needing flexible or evolving schemas.



18. What are the advantages of using MongoDB for horizontal scaling?
- MongoDB supports automatic sharding, allowing it to scale out across servers, handling more traffic and larger datasets efficiently.

19. How do MongoDB transactions differ from SQL transactions?
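- MongoDB supports multi-document ACID transactions (since v4.0 on replica sets and v4.2 on sharded clusters), but they are optional and may add overhead. Single-document operations are always atomic, whereas SQL databases typically rely on built-in transactions that span rows and tables.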



20. What are the main differences between capped collections and regular collections?
- Capped collections are fixed-size, preserve insertion order, and overwrite the oldest documents once full, while regular collections grow dynamically and have no guaranteed order.
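Creating a capped collection from Python might look like this sketch. The collection name and the size limits are arbitrary assumptions, and `db` is assumed to be a PyMongo `Database`.

```python
capped_options = {
    "capped": True,
    "size": 1024 * 1024,  # maximum size in bytes (required for capped collections)
    "max": 1000,          # optional cap on the number of documents
}


def create_log_collection(db, name="recent_events"):
    """Create a fixed-size collection suited to rolling logs or recent events."""
    return db.create_collection(name, **capped_options)
```

Once either limit is reached, each new insert displaces the oldest document.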

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
- $match filters documents based on specified conditions, like a WHERE clause, improving performance by reducing documents early in the pipeline.



22. How can you secure access to a MongoDB database?
- You can secure access using authentication, IP whitelisting, encryption (TLS), role-based access control, and disabling unused network interfaces.

23. What is MongoDB’s WiredTiger storage engine, and why is it important?
- WiredTiger is the default storage engine that offers high performance, compression, and concurrent access, improving speed and storage efficiency.

# Practical Questions

In [None]:
# 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB

import pandas as pd
from pymongo import MongoClient

# Step 1: Read the CSV file into a DataFrame
csv_file = "superstore.csv"  # Use full path if needed
df = pd.read_csv(csv_file, encoding='ISO-8859-1')

# Step 2: Convert each row into a dictionary (one document per row)
data_dict = df.to_dict(orient='records')

# Step 3: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 4: Insert all documents into the collection
collection.insert_many(data_dict)

print(f"{len(data_dict)} records inserted into MongoDB successfully.")


In [None]:
# 2. Retrieve and print all documents from the Orders collection

# Use a name other than `list` so the built-in type is not shadowed
docs = collection.find()
for doc in docs:
    print(doc)

In [None]:
# 3. Count and display the total number of documents in the Orders collection

ct = collection.count_documents({})
print(f"The total number of documents is {ct}")

In [None]:
# 4. Write a query to fetch all orders from the "West" region
docs = collection.find({'Region':'West'})
for doc in docs:
    print(doc)

In [None]:
# 5. Write a query to find orders where Sales is greater than 500
docs = collection.find({'Sales':{'$gt':500}})
for doc in docs:
    print(doc)

In [None]:
# 6. Fetch the top 3 orders with the highest Profit
docs = collection.find().sort('Profit', -1).limit(3)
for doc in docs:
    print(doc)

In [None]:
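# 7. Update all orders with Ship Mode as "First Class" to "Premium Class".

result = collection.update_many({'Ship Mode': 'First Class'}, {'$set': {'Ship Mode': 'Premium Class'}})
print(f"{result.modified_count} documents updated")

# Print the updated documents, one per line
for doc in collection.find({'Ship Mode': 'Premium Class'}):
    print(doc)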

In [None]:
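# 8. Delete all orders where Sales is less than 50.
# The comparison operator is '$lt'; operators in MongoDB queries start with '$'
result = collection.delete_many({"Sales": {'$lt': 50}})
print(f"Deletion successful: {result.deleted_count} documents removed")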

In [None]:
# 9. Use aggregation to group orders by Region and calculate total sales per region
pipeline = [
    {
        '$group': {
            '_id': '$Region',
            'total_sales': {'$sum': '$Sales'}
        }
    }
]

results = collection.aggregate(pipeline)

for result in results:
    print(f"Region: {result['_id']}, Total Sales: {result['total_sales']}")


In [None]:
# 10. Fetch all distinct values for Ship Mode from the collection
ship_modes = collection.distinct("Ship Mode")
print("Distinct Ship Modes:")
for mode in ship_modes:
    print("-", mode)


In [None]:
# 11. Count the number of orders for each category.
pipeline = [
    {
        '$group' : {
            '_id': "$Category",
            'count': {'$sum': 1}
        }
    }
]
result = collection.aggregate(pipeline)
for res in result:
    print(f"Category: {res['_id']}, Count: {res['count']}")
