# MongoDB Assignment
This notebook contains both theoretical and practical questions related to MongoDB using the Superstore dataset.

### 1. What are the key differences between SQL and NoSQL databases?
- SQL databases are relational, are queried with Structured Query Language (SQL), and have fixed, predefined schemas.
- NoSQL databases are non-relational, support dynamic schemas, and are often used for big data and real-time applications.

### 2. What makes MongoDB a good choice for modern applications?
- Flexible schema (documents in the same collection need not share the same fields)
- High performance and scalability
- Built-in replication and sharding
- Powerful querying and indexing features

### 3. Explain the concept of collections in MongoDB.
- A collection is a group of MongoDB documents, similar to a table in SQL databases.

### 4. How does MongoDB ensure high availability using replication?
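- MongoDB uses replica sets, in which data is replicated across multiple nodes. One node acts as the primary and the others as secondaries; if the primary becomes unavailable, the secondaries elect a new primary automatically.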

### 5. What are the main benefits of MongoDB Atlas?
- Fully managed cloud database
- Automated backups
- Scalability
- Security and compliance features
- Integrated monitoring and analytics

### 6. What is the role of indexes in MongoDB, and how do they improve performance?
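- Indexes are ordered data structures built over one or more fields. They let MongoDB locate matching documents directly instead of scanning every document in the collection, which speeds up queries and sorts on the indexed fields at the cost of extra storage and some write overhead.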
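As a minimal sketch of how such indexes might be created from Python, the helper below assumes `collection` is a PyMongo `Collection` holding the Superstore rows. The function name and the choice of indexed fields are illustrative assumptions, not part of the assignment.

```python
def add_superstore_indexes(collection):
    """Create two example indexes and return their names.

    `collection` is assumed to be a pymongo Collection; the field
    choices below are illustrative only.
    """
    # Single-field index: 1 means ascending order.
    single = collection.create_index([("Region", 1)])
    # Compound index: ascending Category, then descending Sales (-1).
    compound = collection.create_index([("Category", 1), ("Sales", -1)])
    return [single, compound]
```

A query such as `collection.find({"Region": "West"})` can then use the first index rather than a full collection scan; `cursor.explain()` shows which plan was actually chosen.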

### 7. Describe the stages of the MongoDB aggregation pipeline.
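- A pipeline is an ordered list of stages; each stage transforms the documents it receives and passes its output to the next stage.
- Common stages: `$match` (filter documents), `$group` (aggregate by a key), `$project` (reshape or select fields), `$sort` (order results), `$limit` (cap the number of results), `$lookup` (join another collection), and `$unwind` (flatten an array into one document per element).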
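The stage sequence can be sketched as a plain Python list, which is the form PyMongo's `Collection.aggregate` accepts. The field names (`Region`, `Category`, `Sales`) come from the Superstore dataset used later in this notebook; the function name and the particular question asked are assumptions made for illustration.

```python
# Each stage is a dict; stages run in list order, and each one
# consumes the output of the stage before it.
pipeline = [
    {"$match": {"Region": "West"}},                       # keep West orders only
    {"$group": {"_id": "$Category",                       # one group per category
                "total_sales": {"$sum": "$Sales"}}},
    {"$sort": {"total_sales": -1}},                       # highest total first
    {"$limit": 3},                                        # keep the top three
]


def top_categories_in_west(collection):
    """Run the pipeline on a pymongo Collection and return the results as a list."""
    return list(collection.aggregate(pipeline))
```

Placing `$match` first is a common choice because it reduces the number of documents that later stages have to process.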

### 8. What is sharding in MongoDB? How does it differ from replication?
- Sharding distributes data across multiple servers.
- Replication copies data for redundancy and availability.
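A rough sketch of enabling sharding from Python is shown below. It assumes `client` is a PyMongo `MongoClient` connected to a `mongos` router of an existing sharded cluster; the database name `superstore`, the collection name `orders`, and the hashed `Order ID` shard key are hypothetical choices for illustration.

```python
def shard_orders(client):
    """Shard the (hypothetical) superstore.orders collection on a hashed key.

    Assumes `client` is connected to a mongos router; these admin
    commands fail on a standalone server or a plain replica set.
    """
    # Allow collections in this database to be sharded.
    client.admin.command("enableSharding", "superstore")
    # Distribute documents across shards by the hash of "Order ID".
    client.admin.command(
        "shardCollection", "superstore.orders", key={"Order ID": "hashed"}
    )
```

Each shard is itself usually deployed as a replica set, so the two mechanisms are complementary: sharding splits the data, replication copies each piece.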

### 9. What is PyMongo, and why is it used?
- PyMongo is the official MongoDB driver for Python. It lets Python programs connect to a MongoDB deployment and run queries, updates, and aggregations.

### 10. What are the ACID properties in the context of MongoDB transactions?
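- ACID stands for Atomicity, Consistency, Isolation, and Durability. MongoDB has supported multi-document ACID transactions on replica sets since v4.0 and on sharded clusters since v4.2; operations on a single document have always been atomic.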
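A multi-document transaction might look like the sketch below, which moves one order into an archive collection so that either both writes happen or neither does. It assumes `client` is a PyMongo `MongoClient` connected to a replica set; the database and collection names and the `Order ID` field are hypothetical.

```python
def archive_order(client, order_id):
    """Move one order to an archive collection inside a transaction.

    Returns True if an order was moved, False if no order matched.
    Names ("superstore", "orders", "orders_archive") are illustrative.
    """
    orders = client["superstore"]["orders"]
    archive = client["superstore"]["orders_archive"]
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            doc = orders.find_one({"Order ID": order_id}, session=session)
            if doc is None:
                return False
            archive.insert_one(doc, session=session)
            orders.delete_one({"_id": doc["_id"]}, session=session)
    return True
```

Every operation must receive the same `session` argument; operations issued without it run outside the transaction.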

### 12. How does MongoDB handle schema validation?
- MongoDB supports schema validation using JSON Schema. You can define required fields, data types, and validation rules using the `$jsonSchema` operator.
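A validator for the Superstore data could be sketched as follows. The field names are taken from the columns used later in this notebook; which fields are required, the collection name `orders`, and the non-negative `Sales` rule are assumptions made for illustration.

```python
# A $jsonSchema validator expressed as a Python dict.
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales", "Ship Mode"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Ship Mode": {"bsonType": "string"},
            # Accept any numeric BSON type, but reject negative sales.
            "Sales": {"bsonType": ["double", "int", "long", "decimal"],
                      "minimum": 0},
        },
    }
}


def create_validated_orders(db):
    """Create an 'orders' collection on a pymongo Database with the validator attached."""
    return db.create_collection("orders", validator=order_validator)
```

With the default validation settings, an insert or update that violates the schema is rejected by the server with a write error.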

### 13. What is the difference between a primary and a secondary node in a replica set?
- The primary node receives all write operations.
- Secondary nodes replicate the primary’s data and can serve read operations if configured.
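One way reads from secondaries can be configured is through the connection string. The helper below is a small sketch that builds such a URI using the standard `replicaSet` and `readPreference` options; the helper name, host names, and replica set name are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB URI that names the replica set and a read preference.

    `hosts` is a list of "host:port" strings (placeholders here).
    """
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return f"mongodb://{','.join(hosts)}/?{options}"
```

Passing the resulting string to `pymongo.MongoClient` would route reads to a secondary when one is available. Writes still go to the primary, and reads from secondaries may return slightly stale data.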

### 14. What security mechanisms does MongoDB provide for data protection?
- Authentication and authorization
- Role-Based Access Control (RBAC)
- TLS/SSL for encrypted connections
- Field-level encryption
- Auditing

### 15. Explain the concept of embedded documents and when they should be used.
- Embedded documents are nested documents inside another document. Use them for related data that is often queried together to reduce the need for joins.
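As an illustration, an order with its customer details embedded might be modelled like this. The nested field names and all values are made up for the example and are not taken from the dataset.

```python
# One order document with the customer embedded as a sub-document.
# All values here are invented for illustration.
order = {
    "Region": "West",
    "Ship Mode": "First Class",
    "Sales": 120.5,
    "customer": {               # embedded document
        "name": "Example Customer",
        "segment": "Consumer",
    },
}


def segment_filter(segment):
    """Build a query filter on an embedded field using dot notation."""
    return {"customer.segment": segment}
```

With PyMongo, `collection.find(segment_filter("Consumer"))` would match orders by the embedded field in a single query, with no join needed.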

### 16. What is the purpose of MongoDB's $lookup stage in aggregation?
- `$lookup` performs a left outer join to combine documents from two collections based on a related field.
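A `$lookup` stage could be sketched as below. It assumes a hypothetical `customers` collection whose `_id` matches a `Customer ID` field on each order; neither the collection nor that relationship appears in the original notebook.

```python
lookup_pipeline = [
    {"$lookup": {
        "from": "customers",          # collection to join (hypothetical)
        "localField": "Customer ID",  # field on the order documents
        "foreignField": "_id",        # field on the customer documents
        "as": "customer",             # output array of matching customers
    }},
    # Turn the one-element array into a plain sub-document, and keep
    # orders with no matching customer (left outer join semantics).
    {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
]


def orders_with_customers(orders):
    """Run the join on a pymongo Collection of orders."""
    return list(orders.aggregate(lookup_pipeline))
```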

### 17. What are some common use cases for MongoDB?
- Content management systems
- Real-time analytics
- Product catalogs
- IoT applications
- Mobile apps and personalization engines

### 18. What are the advantages of using MongoDB for horizontal scaling?
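- Sharding distributes data and load across nodes, so capacity grows by adding servers rather than by upgrading a single machine. This lets the database handle more traffic and larger datasets, provided the shard key spreads reads and writes evenly.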

### 19. How do MongoDB transactions differ from SQL transactions?
- MongoDB has supported multi-document ACID transactions since v4.0, but its data model has traditionally relied on single-document atomicity, with related data embedded in one document.
- SQL databases are inherently transaction-oriented.

### 20. What are the main differences between capped collections and regular collections?
- Capped collections have fixed size, preserve insertion order, and automatically overwrite oldest data when full. Regular collections grow dynamically.
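Creating a capped collection from Python might look like the sketch below. It assumes `db` is a PyMongo `Database`; the collection name and the size limits are arbitrary illustrative values.

```python
def create_recent_events(db):
    """Create a capped collection limited to 1 MiB or 1000 documents.

    Once either limit is reached, the oldest documents are overwritten.
    The name and limits are illustrative.
    """
    return db.create_collection(
        "recent_events",
        capped=True,
        size=1024 * 1024,  # maximum size in bytes (required for capped)
        max=1000,          # optional maximum number of documents
    )
```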

### 21. What is the purpose of the $match stage in MongoDB's aggregation pipeline?
- `$match` filters documents to pass only those that meet the specified condition(s) to the next pipeline stage.

### 22. How can you secure access to a MongoDB database?
- Enable authentication and create users with roles.
- Use firewalls and IP whitelisting.
- Use TLS for encrypted connections.
- Regularly update and patch MongoDB.

### 23. What is MongoDB's WiredTiger storage engine, and why is it important?
- WiredTiger has been the default storage engine since MongoDB 3.2. It provides document-level concurrency control, data compression, and better throughput under concurrent workloads.

---
## Additional Practical Work
---

**Name:** Your Name Here  
**Dataset:** Superstore Dataset

---

1. **What are the key differences between SQL and NoSQL databases?**

- SQL databases are relational; NoSQL databases are non-relational.
- SQL databases use fixed, predefined schemas; NoSQL databases typically allow dynamic schemas.
- SQL databases typically scale vertically; NoSQL databases are designed to scale horizontally.
- SQL databases enforce ACID properties; NoSQL databases often favor availability and partition tolerance (CAP theorem).


2. **What makes MongoDB a good choice for modern applications?**

- Flexible document-based schema
- High performance and scalability
- Rich querying and indexing features
- Built-in replication and sharding
- Integrates well with modern development stacks


3. **Explain the concept of collections in MongoDB.**

- Collections are analogous to tables in relational databases.
- They store documents in BSON (binary JSON) format.
- By default, a collection does not enforce a schema, so different documents can have different fields.


4. **How does MongoDB ensure high availability using replication?**

- MongoDB uses replica sets (groups of `mongod` instances that hold the same data).
- One primary node handles writes, and secondary nodes replicate its data.
- If the primary node goes down, the replica set fails over automatically by electing a new primary.


5. **What are the main benefits of MongoDB Atlas?**

- Fully managed cloud database
- Automatic backups and updates
- Global clusters and multi-cloud support
- Built-in monitoring, alerting, and security features


## Practical Questions with Code and Output

In [None]:
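import pandas as pd
# Load the Superstore CSV; ISO-8859-1 avoids decode errors on non-UTF-8 characters
df = pd.read_csv('superstore.csv', encoding='ISO-8859-1')
# Preview the first 5 rows
df.head()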
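The cells in this section work on the pandas DataFrame directly. To run the same questions against MongoDB, the rows would first need to be inserted into a collection. A minimal sketch, assuming `collection` is a PyMongo `Collection` and `df` is the DataFrame loaded above (the helper name is hypothetical):

```python
import json


def load_dataframe(collection, df):
    """Insert every DataFrame row as one document; return the number inserted."""
    # Round-tripping through JSON yields plain Python types (and None
    # for missing values), which the driver can always encode.
    records = json.loads(df.to_json(orient="records"))
    if not records:
        return 0  # insert_many raises on an empty list
    result = collection.insert_many(records)
    return len(result.inserted_ids)
```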

In [None]:
# Retrieve and print first 5 documents
df.head()

In [None]:
# Count total number of documents
len(df)

In [None]:
# Fetch all orders from the 'West' region
df[df['Region'] == 'West']

In [None]:
# Orders where Sales is greater than 500
df[df['Sales'] > 500]

In [None]:
# Top 3 orders with the highest Profit
df.sort_values(by='Profit', ascending=False).head(3)

In [None]:
# Update 'First Class' to 'Premium Class'
df_updated = df.copy()
df_updated.loc[df_updated['Ship Mode'] == 'First Class', 'Ship Mode'] = 'Premium Class'
df_updated['Ship Mode'].value_counts()

In [None]:
# Delete all orders where Sales < 50
df_deleted = df[df['Sales'] >= 50]
print('Deleted records:', len(df) - len(df_deleted))
df_deleted.head()

In [None]:
# Group orders by Region and calculate total sales per region
df.groupby('Region')['Sales'].sum().reset_index()

In [None]:
# Fetch all distinct Ship Mode values
df['Ship Mode'].unique()

In [None]:
# Count number of orders for each category
df['Category'].value_counts()
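For comparison, the pandas operations above could be expressed against MongoDB roughly as follows. This is a sketch, assuming `collection` is a PyMongo `Collection` already populated with the Superstore rows and using the same column names as field names; the function names are hypothetical.

```python
def superstore_reads(collection):
    """MongoDB counterparts of the read-only pandas cells above."""
    return {
        "first_five": list(collection.find().limit(5)),
        "total": collection.count_documents({}),
        "west": list(collection.find({"Region": "West"})),
        "sales_gt_500": list(collection.find({"Sales": {"$gt": 500}})),
        "top3_profit": list(collection.find().sort("Profit", -1).limit(3)),
        "sales_by_region": list(collection.aggregate([
            {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}},
        ])),
        "ship_modes": collection.distinct("Ship Mode"),
        # $sortByCount groups by the field and sorts by descending count.
        "orders_per_category": list(collection.aggregate([
            {"$sortByCount": "$Category"},
        ])),
    }


def superstore_writes(collection):
    """MongoDB counterparts of the update and delete cells above.

    Unlike the pandas cells, which work on copies, these calls modify
    the stored data. Returns (modified_count, deleted_count).
    """
    updated = collection.update_many(
        {"Ship Mode": "First Class"},
        {"$set": {"Ship Mode": "Premium Class"}},
    )
    deleted = collection.delete_many({"Sales": {"$lt": 50}})
    return updated.modified_count, deleted.deleted_count
```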