# THEORETICAL QUESTIONS
1. What are the key differences between SQL and NoSQL databases?

>>SQL databases are relational and schema-based: they store data in tables with predefined columns. NoSQL databases are non-relational and schema-flexible, storing data as documents, key-value pairs, or graphs. SQL databases support complex joins and have traditionally scaled vertically (adding CPU, memory, or storage to a single server), while NoSQL databases are generally built to scale horizontally across many servers, which makes them well suited to big data and distributed systems.

2. What makes MongoDB a good choice for modern applications?

>>MongoDB offers a flexible document schema, fast reads and writes, horizontal scaling through sharding, and easy integration with cloud platforms and microservices, which makes it a good fit for many modern applications.

3. Explain the concept of collections in MongoDB.

>>A collection in MongoDB is a group of documents stored together. It is similar to a table in SQL but does not enforce a fixed schema, allowing documents with different structures.

4. How does MongoDB ensure high availability using replication?

>>MongoDB uses replica sets, in which the same data is copied to multiple nodes. One node acts as the primary and the others as secondaries. If the primary fails, the remaining members hold an election and one of the secondaries automatically becomes the new primary.

5. What are the main benefits of MongoDB Atlas?

>>MongoDB Atlas provides a fully managed cloud database with automatic backups, scaling, monitoring, security, and global deployment support.

6. What is the role of indexes in MongoDB, and how do they improve performance?

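>>Indexes let MongoDB locate matching documents quickly without scanning every document in the collection (a collection scan), which significantly improves query performance. The trade-off is extra storage and slightly slower writes, because each index must be updated whenever a document changes.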
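
To make the idea concrete, here is a small pure-Python sketch that counts how many documents a query has to examine with and without an index. It is only an analogy: MongoDB's real indexes are B-trees, and the sample documents and the helpers `collection_scan`, `build_index` and `index_lookup` are hypothetical. In PyMongo the real index would be created with something like `db.orders.create_index("Region")`.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for the orders collection
orders = [
    {"_id": 1, "Region": "West", "Sales": 120.0},
    {"_id": 2, "Region": "East", "Sales": 80.5},
    {"_id": 3, "Region": "West", "Sales": 640.0},
    {"_id": 4, "Region": "South", "Sales": 45.0},
    {"_id": 5, "Region": "Central", "Sales": 300.0},
]

def collection_scan(docs, field, value):
    """Check every document; returns (matches, documents_examined)."""
    matches = [d for d in docs if d.get(field) == value]
    return matches, len(docs)

def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc.get(field)].append(pos)
    return index

def index_lookup(docs, index, value):
    """Jump straight to the matching documents; only those are examined."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)

region_index = build_index(orders, "Region")
scan_hits, scan_examined = collection_scan(orders, "Region", "West")
idx_hits, idx_examined = index_lookup(orders, region_index, "West")
print(scan_examined, idx_examined)  # 5 2
```

Both approaches return the same two documents; the difference is that the scan touches all five while the index lookup touches only the matches, and that gap grows with the size of the collection.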

7. Describe the stages of the MongoDB aggregation pipeline.

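>>An aggregation pipeline is an ordered list of stages: documents flow through the stages in sequence, and each stage transforms the output of the previous one. Common stages include:
- `$match`: filter documents
- `$group`: group documents by a key and compute aggregates such as sums or counts
- `$sort`: sort results
- `$project`: reshape output (include, exclude, or compute fields)
- `$lookup`: join with another collection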
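
A pipeline is just a list of stage documents. The sketch below writes one as the Python list that PyMongo's `aggregate()` accepts, then runs a tiny pure-Python stand-in for `$match`, `$group` (with `$sum`) and `$sort` over hypothetical in-memory documents, so the stage-by-stage flow can be seen without a server. The `run_pipeline` helper is an illustrative assumption, not part of PyMongo, and supports only these three stages with simple equality matches.

```python
# The same list could be passed to db.orders.aggregate(pipeline)
pipeline = [
    {"$match": {"Category": "Furniture"}},
    {"$group": {"_id": "$Region", "totalSales": {"$sum": "$Sales"}}},
    {"$sort": {"totalSales": -1}},
]

# Hypothetical sample documents
orders = [
    {"Region": "West", "Category": "Furniture", "Sales": 200.0},
    {"Region": "East", "Category": "Furniture", "Sales": 150.0},
    {"Region": "West", "Category": "Technology", "Sales": 900.0},
    {"Region": "West", "Category": "Furniture", "Sales": 50.0},
    {"Region": "East", "Category": "Furniture", "Sales": 300.0},
]

def run_pipeline(docs, stages):
    """Toy evaluator: equality $match, $group with $sum, single-key $sort."""
    for stage in stages:
        (op, spec), = stage.items()
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            groups = {}
            for d in docs:
                out = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    operand = acc["$sum"]
                    # "$Field" sums that field; a number (e.g. 1) counts documents
                    value = d[operand.lstrip("$")] if isinstance(operand, str) else operand
                    out[name] = out.get(name, 0) + value
            docs = list(groups.values())
        elif op == "$sort":
            (field, direction), = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        else:
            raise ValueError(f"unsupported stage {op}")
    return docs

result = run_pipeline(orders, pipeline)
print(result)
# [{'_id': 'East', 'totalSales': 450.0}, {'_id': 'West', 'totalSales': 250.0}]
```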

8. What is sharding in MongoDB? How does it differ from replication?

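>>Sharding partitions data across multiple servers (shards) according to a shard key, so each shard stores only a subset of the data; its goal is horizontal scalability. Replication keeps full copies of the same data on several servers; its goal is high availability and fault tolerance. In production, each shard is usually itself a replica set.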
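
The difference can be sketched in plain Python: with sharding each document lands on exactly one shard, chosen here by hashing a shard key (loosely modelled on MongoDB's hashed sharding, not its actual algorithm), while with replication every node holds a full copy. The three-shard and three-node layout, the `Customer ID` shard key and the sample data are all assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 3
NUM_REPLICAS = 3

# Hypothetical documents using "Customer ID" as the shard key
docs = [{"Customer ID": f"CUST-{i}", "Sales": i * 10} for i in range(12)]

def shard_for(key: str) -> int:
    """Pick a shard from a stable hash of the shard key value."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Sharding: partition the documents, each stored on exactly one shard
shards = {i: [] for i in range(NUM_SHARDS)}
for doc in docs:
    shards[shard_for(doc["Customer ID"])].append(doc)

# Replication: every replica-set member keeps a full copy
replicas = [list(docs) for _ in range(NUM_REPLICAS)]

print({i: len(s) for i, s in shards.items()})  # documents per shard (sum is 12)
print([len(r) for r in replicas])              # [12, 12, 12]
```

`hashlib` is used instead of Python's built-in `hash()` because string hashing in Python is randomized per process, and a shard assignment must stay the same across runs.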

9. What is PyMongo, and why is it used?

>>PyMongo is the official Python driver for MongoDB. It allows Python applications to connect, query, and manipulate MongoDB databases.

10. What are the ACID properties in MongoDB transactions?

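>>ACID stands for Atomicity (all operations in a transaction succeed or none do), Consistency (data moves from one valid state to another), Isolation (concurrent transactions do not see each other's uncommitted changes), and Durability (committed changes survive failures). Operations on a single document are always atomic in MongoDB, and since version 4.0 (replica sets) and 4.2 (sharded clusters) MongoDB also supports multi-document ACID transactions.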
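
A minimal sketch of a multi-document transaction with PyMongo, assuming a replica set (transactions are not available on a standalone server) and a hypothetical `bank.accounts` collection whose documents have a `balance` field. Using `ClientSession.start_transaction()` as a context manager commits when the block exits normally and aborts if an exception escapes it, so either both updates are applied or neither is.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two hypothetical account documents atomically."""
    accounts = client["bank"]["accounts"]  # hypothetical database/collection
    with client.start_session() as session:
        # Commits on normal exit, aborts if an exception escapes the block
        with session.start_transaction():
            debit = accounts.update_one(
                {"_id": from_id, "balance": {"$gte": amount}},
                {"$inc": {"balance": -amount}},
                session=session,
            )
            if debit.modified_count != 1:
                # Raising here aborts the transaction, so nothing is changed
                raise ValueError("insufficient funds or unknown account")
            accounts.update_one(
                {"_id": to_id},
                {"$inc": {"balance": amount}},
                session=session,
            )
```

Calling it needs a running replica set, for example `transfer(MongoClient("mongodb://localhost:27017/?replicaSet=rs0"), "A", "B", 100)`, where `rs0`, `"A"` and `"B"` are placeholders.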

11. What is the purpose of MongoDB’s explain() function?

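>>The explain() method shows how MongoDB executes a query: the winning query plan, whether it uses an index (IXSCAN) or scans the whole collection (COLLSCAN), and, with the "executionStats" verbosity, how many documents were examined and how long execution took. This helps in optimizing query performance.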
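
In PyMongo, `Cursor.explain()` returns the server's explain output as a dictionary. The hypothetical helper `plan_stages` below walks that dictionary and collects every `stage` value, which makes it easy to check whether the winning plan uses an index. The sample dictionary is a heavily trimmed, illustrative shape of the `queryPlanner` section, not real server output; the exact layout varies between server versions, which is why the walker searches recursively instead of following fixed keys.

```python
def plan_stages(node):
    """Recursively collect every 'stage' name found in an explain() document."""
    stages = []
    if isinstance(node, dict):
        if "stage" in node:
            stages.append(node["stage"])
        for value in node.values():
            stages.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            stages.extend(plan_stages(item))
    return stages

# Trimmed, illustrative explain() shape for find({"Region": "West"}) with an index
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        },
        "rejectedPlans": [],
    }
}

winning = plan_stages(sample_explain["queryPlanner"]["winningPlan"])
print(winning)                # ['FETCH', 'IXSCAN']
print("COLLSCAN" in winning)  # False

# Against a live server (not executed here):
# plan = db.orders.find({"Region": "West"}).explain()
# print(plan_stages(plan["queryPlanner"]["winningPlan"]))
```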

12. How does MongoDB handle schema validation?

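>>MongoDB supports schema validation through collection-level validators, most commonly written with the $jsonSchema operator, to enforce rules such as required fields, data types, and allowed values. The validationAction setting decides whether invalid documents are rejected ("error") or only logged ("warn").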
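
A validator is an ordinary document built around the `$jsonSchema` operator. The sketch below builds one for an orders collection; the required fields, types and region values are assumptions based on the Superstore columns used in the practical section. With a running server it could be applied through `Database.create_collection`, which passes extra keyword arguments such as `validator` and `validationAction` on to the server's `create` command.

```python
# Validator document for a hypothetical schema of the orders collection
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Order ID", "Region", "Sales"],
        "properties": {
            "Order ID": {"bsonType": "string"},
            "Region": {
                "enum": ["West", "East", "Central", "South"],
                "description": "must be one of the assumed Superstore regions",
            },
            "Sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}

# With a running server (not executed here):
# db.create_collection(
#     "orders_validated",
#     validator=order_validator,
#     validationAction="error",  # reject invalid inserts and updates
# )
```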

13. What is the difference between a primary and a secondary node in a replica set?

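>>The primary node accepts all write operations, and secondary nodes replicate the primary's operations to keep their own copies of the data. Reads go to the primary by default; secondaries serve reads only when the client's read preference allows it (for example, secondaryPreferred).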

14. What security mechanisms does MongoDB provide for data protection?

>>MongoDB provides authentication, role-based access control, encryption at rest and in transit, auditing, and network security.

15. Explain embedded documents and when they should be used.

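>>Embedded documents store related data inside a single parent document, for example an order's line items inside the order itself. Use them for one-to-one or one-to-few relationships where the data is usually read and updated together; large or unbounded related data is better stored in a separate collection and referenced, because a single document is limited to 16 MB.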
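
The two modelling styles can be compared with plain Python dictionaries shaped like MongoDB documents. All field names and values below are hypothetical.

```python
# Embedded model: line items live inside the order, so one read returns everything
order_embedded = {
    "_id": "ORD-1",
    "customer_name": "Alice",
    "items": [
        {"product": "Bookcase", "quantity": 2, "sales": 100.0},
        {"product": "Chair", "quantity": 3, "sales": 250.5},
    ],
}

# Referenced model: the order stores only the customer's _id, so getting the
# customer's details needs a second query or a $lookup stage
customer = {"_id": "CUST-1", "name": "Alice", "segment": "Consumer"}
order_referenced = {"_id": "ORD-1", "customer_id": customer["_id"]}

# Working with embedded data needs no extra lookup
order_total = sum(item["sales"] for item in order_embedded["items"])
print(order_total)  # 350.5
```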

16. What is the purpose of MongoDB’s $lookup stage?

>>The $lookup stage performs a left outer join between collections in aggregation pipelines.
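
Here is a `$lookup` stage as it would appear in a PyMongo pipeline, followed by a small pure-Python imitation of its left-outer-join behaviour: every input order is kept, and orders without a match get an empty array. The `customers` collection, the `customer_id` field, the sample data and the `simulate_lookup` helper are hypothetical.

```python
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # field in the input (orders) documents
        "foreignField": "_id",        # field in the customers documents
        "as": "customer",             # name of the output array field
    }
}

orders = [
    {"_id": 1, "customer_id": "C1"},
    {"_id": 2, "customer_id": "C9"},  # no matching customer
]
customers = [{"_id": "C1", "name": "Alice"}]

def simulate_lookup(docs, foreign_docs, spec):
    """Left outer join: attach an array of matches to every input document."""
    joined = []
    for doc in docs:
        matches = [f for f in foreign_docs
                   if f.get(spec["foreignField"]) == doc.get(spec["localField"])]
        joined.append({**doc, spec["as"]: matches})
    return joined

result = simulate_lookup(orders, customers, lookup_stage["$lookup"])
print(result[0]["customer"])  # [{'_id': 'C1', 'name': 'Alice'}]
print(result[1]["customer"])  # []
```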

17. What are some common use cases for MongoDB?

>>Common use cases include e-commerce, real-time analytics, IoT applications, content management systems, and mobile apps.

18. What are the advantages of using MongoDB for horizontal scaling?

>>MongoDB supports automatic sharding, load balancing, and distributed data storage, making horizontal scaling efficient.

19. How do MongoDB transactions differ from SQL transactions?

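>>Both provide ACID guarantees, but they play different roles. In relational databases, transactions are the normal way to change related rows across several tables. In MongoDB, related data is often embedded in one document and single-document writes are already atomic, so multi-document transactions are needed less often; when they are used, they add performance overhead and have limits such as a default maximum runtime of 60 seconds.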

20. What are the differences between capped collections and regular collections?

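>>Capped collections have a fixed maximum size (and optionally a maximum number of documents), preserve insertion order, and automatically remove the oldest documents when full, which makes them useful for logs. Regular collections have no such limit and grow as documents are added.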
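
A capped collection behaves like a fixed-size circular buffer. The sketch below models its first-in, first-out eviction with `collections.deque(maxlen=...)`. Note the simplification: MongoDB caps by size in bytes (`size`) and optionally by document count (`max`), whereas this stand-in caps only by count. The `app_log` name and the limits are hypothetical.

```python
from collections import deque

# Stand-in for a capped collection limited to 3 documents
capped_log = deque(maxlen=3)
for n in range(1, 6):
    capped_log.append({"event": n})  # the oldest entry is dropped once full

print(list(capped_log))  # [{'event': 3}, {'event': 4}, {'event': 5}]

# Real capped collection via PyMongo (needs a running server, not executed here):
# db.create_collection("app_log", capped=True, size=1_048_576, max=3)
```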

21. What is the purpose of the $match stage in aggregation?

>>$match filters documents; placing it as early as possible in the pipeline reduces the number of documents later stages must process, and a $match at the start of the pipeline can also use indexes.
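
A rough way to see the benefit is to count how many documents reach `$group` when the filter runs first versus last. The pure-Python sketch below uses hypothetical data and a stand-in `group_sales` helper; it only illustrates the workload difference, and MongoDB's optimizer can sometimes reorder stages on its own.

```python
# Hypothetical orders: 1,000 documents spread evenly across four regions
regions = ["West", "East", "Central", "South"]
orders = [{"Region": regions[i % 4], "Sales": 10.0} for i in range(1000)]

def group_sales(docs):
    """Stand-in for {"$group": {"_id": "$Region", "totalSales": {"$sum": "$Sales"}}}."""
    totals = {}
    for d in docs:
        totals[d["Region"]] = totals.get(d["Region"], 0) + d["Sales"]
    return totals, len(docs)  # (result, documents processed by $group)

# $match first: only West orders reach $group
early, early_processed = group_sales([d for d in orders if d["Region"] == "West"])

# $match last: every order is grouped, then the other regions are discarded
all_totals, late_processed = group_sales(orders)
late = {k: v for k, v in all_totals.items() if k == "West"}

print(early == late, early_processed, late_processed)  # True 250 1000
```

Both orderings give the same answer, but filtering first means `$group` processes a quarter of the documents.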

22. How can you secure access to a MongoDB database?

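>>Enable authentication, use strong passwords, grant only the roles each user needs (role-based access control with least privilege), restrict network access with IP access lists and firewalls, and encrypt data in transit with TLS and at rest.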
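
One concrete piece of this is connecting with credentials over TLS. PyMongo's documentation asks for usernames and passwords containing special characters to be percent-escaped with `urllib.parse.quote_plus` before they go into a connection string. The host, user, password and `authSource` value below are placeholders, not real settings.

```python
from urllib.parse import quote_plus

# Placeholder credentials; load real ones from environment variables or a secret store
username = "app_user"
password = "p@ss:w/rd"  # special characters must be escaped inside a URI
host = "db.example.com:27017"

uri = (
    f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}/"
    "?authSource=admin&tls=true"
)
print(uri)
# mongodb://app_user:p%40ss%3Aw%2Frd@db.example.com:27017/?authSource=admin&tls=true

# With a reachable server (not executed here):
# from pymongo import MongoClient
# client = MongoClient(uri, serverSelectionTimeoutMS=5000)
```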

23. What is MongoDB’s WiredTiger storage engine and why is it important?

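>>WiredTiger is MongoDB’s default storage engine (since version 3.2). It provides document-level concurrency control, compression of data and indexes, and checkpointing with journaling for durability, which together give good performance and efficient storage.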

In [16]:
# PRACTICAL
#1. Write a Python script to load the Superstore dataset into MongoDB.
import pandas as pd
from pymongo import MongoClient

# Read the Superstore CSV; latin1 handles non-UTF-8 characters in the file
df = pd.read_csv("mongo DB assim.csv", encoding="latin1")

# Requires a MongoDB server listening on localhost:27017; without one, every
# database call raises ServerSelectionTimeoutError (connection refused)
client = MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=5000)
client.admin.command("ping")  # fail fast if the server is not reachable

db = client["superstoreDB"]
# Convert each DataFrame row into a dict and insert the rows as documents
result = db.orders.insert_many(df.to_dict("records"))

print(f"Inserted {len(result.inserted_ids)} documents into superstoreDB.orders")

In [5]:
#2. Retrieve and print all documents from the Orders collection.
# find() only returns a lazy Cursor; iterate over it to print each document
for doc in db.orders.find():
    print(doc)

In [7]:
#3. Count the total number of documents in the Orders collection.
# An empty filter {} matches every document
db.orders.count_documents({})

In [8]:
#4. Fetch all orders from the "West" region.
# In Python, field names must be quoted strings (unlike the mongo shell)
for doc in db.orders.find({"Region": "West"}):
    print(doc)

In [9]:
#5. Find orders where Sales is greater than 500.
# Query operators such as $gt are also quoted string keys in Python
for doc in db.orders.find({"Sales": {"$gt": 500}}):
    print(doc)

In [10]:
#6. Fetch the top 3 orders with the highest Profit.
# PyMongo's sort() takes (key, direction); -1 means descending
for doc in db.orders.find().sort("Profit", -1).limit(3):
    print(doc)

In [11]:
#7. Update Ship Mode from "First Class" to "Premium Class".
# PyMongo uses snake_case method names: update_many, not updateMany
result = db.orders.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}},
)
print(f"Modified {result.modified_count} documents")

In [12]:
#8. Delete all orders where Sales is less than 50.
# delete_many is the PyMongo equivalent of the shell's deleteMany
result = db.orders.delete_many({"Sales": {"$lt": 50}})
print(f"Deleted {result.deleted_count} documents")

In [13]:
#9. Group orders by Region and calculate total sales per region.
# Stage and operator names ($group, $sum) must be quoted strings in Python
pipeline = [
    {"$group": {"_id": "$Region", "totalSales": {"$sum": "$Sales"}}},
]
for row in db.orders.aggregate(pipeline):
    print(row)

In [14]:
#10. Fetch all distinct values for Ship Mode.
# distinct() returns a Python list of the unique values of the field
db.orders.distinct("Ship Mode")

In [None]:
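#11. Count the number of orders for each category.
# {"$sum": 1} adds 1 per document, i.e. counts the documents in each group
pipeline = [
    {"$group": {"_id": "$Category", "count": {"$sum": 1}}},
]
for row in db.orders.aggregate(pipeline):
    print(row)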
