# MongoDB Assignment

**Name:** Your Name Here  
**Dataset:** Superstore Dataset

---

1. **What are the key differences between SQL and NoSQL databases?**

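- SQL databases are relational and store data in tables of rows and columns; NoSQL databases are non-relational and use models such as documents, key-value pairs, wide columns, or graphs.
- SQL databases require a predefined schema and are queried with Structured Query Language; NoSQL databases typically allow dynamic schemas and expose database-specific query APIs.
- SQL databases have traditionally scaled vertically (a more powerful server); NoSQL databases are designed to scale horizontally (more servers).
- SQL databases enforce ACID properties; many NoSQL systems relax consistency in favor of availability and partition tolerance (CAP theorem), although some, including MongoDB, also support ACID transactions.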
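
The fixed-versus-dynamic schema difference listed above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for a SQL database and plain dictionaries as stand-ins for NoSQL documents; the table and field names are illustrative assumptions, not part of the assignment dataset.

```python
import sqlite3

# SQL side: the table's columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, region TEXT, sales REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", ("A-1", "West", 120.5))

try:
    # 'coupon' is not a declared column, so the database rejects the insert.
    conn.execute(
        "INSERT INTO orders (order_id, coupon) VALUES (?, ?)", ("A-2", "SAVE10")
    )
    sql_accepted_new_field = True
except sqlite3.OperationalError:
    sql_accepted_new_field = False

# Document side: each record carries its own fields, so shapes can differ.
documents = [
    {"order_id": "A-1", "region": "West", "sales": 120.5},
    {"order_id": "A-2", "coupon": "SAVE10"},  # extra field, no schema change
]
all_fields = sorted({field for doc in documents for field in doc})

print(sql_accepted_new_field)
print(all_fields)
```

In the SQL case the new field would need an `ALTER TABLE` first; in the document case the second record is simply stored with a different shape.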


2. **What makes MongoDB a good choice for modern applications?**

- Flexible document-based schema
- High performance and scalability
- Rich querying and indexing features
- Built-in replication and sharding
- Integrates well with modern development stacks


3. **Explain the concept of collections in MongoDB.**

- Collections are analogous to tables in relational databases.
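- They store documents in BSON (Binary JSON) format.
- A collection does not enforce a schema by default, so documents in the same collection can have different fields; optional validation rules can be added when stricter structure is needed.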
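
Collections are schemaless by default, but MongoDB can optionally validate documents on write. The following is a minimal sketch of how such a rule might be attached with PyMongo's `create_collection`. The collection name `orders`, the chosen rules, and the helper name are hypothetical; `db` is assumed to be a PyMongo `Database` connected to a running server.

```python
# Hypothetical rule set: every order must have a string "Region" and a
# non-negative numeric "Sales"; all other fields remain free-form.
ORDER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["double", "int", "long"], "minimum": 0},
        },
    }
}


def create_validated_orders(db, name="orders"):
    """Create a collection whose writes are checked against ORDER_VALIDATOR.

    `db` is assumed to be a pymongo Database; `name` is a hypothetical
    collection name.
    """
    return db.create_collection(name, validator=ORDER_VALIDATOR)
```

Without a validator, the same collection would accept documents of any shape.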


4. **How does MongoDB ensure high availability using replication?**

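- MongoDB uses replica sets: groups of `mongod` instances that maintain the same data set.
- One primary node receives all writes, and secondary nodes replicate its data.
- If the primary becomes unavailable, the remaining members automatically elect a new primary (automatic failover).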
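
Failover depends on the surviving members being able to elect a new primary, which requires a majority of the voting members. The arithmetic can be sketched as follows; the function names are illustrative and not part of any MongoDB API.

```python
def votes_needed(voting_members: int) -> int:
    """Votes required to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many members can be lost while a primary can still be elected."""
    return voting_members - votes_needed(voting_members)


for n in (3, 4, 5):
    print(n, votes_needed(n), fault_tolerance(n))
```

A four-member set tolerates the same single failure as a three-member set, which is one reason odd member counts are commonly recommended.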


5. **What are the main benefits of MongoDB Atlas?**

- Fully managed cloud database
- Automatic backups and updates
- Global clusters and multi-cloud support
- Built-in monitoring, alerting, and security features


## Practical Questions with Code and Output

In [None]:
import pandas as pd
df = pd.read_csv('superstore.csv', encoding='ISO-8859-1')
df.head()
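
The cells in this section answer the questions with pandas. Since the questions are phrased as MongoDB operations, the same rows could also be loaded into a collection first. The sketch below assumes PyMongo and a running server; the connection URI, the database name `superstore`, the collection name `orders`, and the helper name are all hypothetical.

```python
def load_orders(collection, records):
    """Insert a list of row dictionaries and return how many were stored.

    `collection` is assumed to be a pymongo Collection.
    """
    if not records:  # insert_many raises on an empty list
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)


# With the DataFrame from the cell above and a running MongoDB server:
#   from pymongo import MongoClient
#   orders = MongoClient("mongodb://localhost:27017")["superstore"]["orders"]
#   load_orders(orders, df.to_dict("records"))
```

`df.to_dict("records")` turns each DataFrame row into one dictionary, which maps naturally onto one document per row.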

In [None]:
# Retrieve and print first 5 documents
df.head()

In [None]:
# Count total number of documents
len(df)
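
For comparison, a PyMongo version of the two cells above might look like this sketch. It assumes `collection` is a PyMongo collection already holding the Superstore rows; the helper names are illustrative.

```python
def first_five(collection):
    """Return the first five documents (in natural order) as a list."""
    return list(collection.find().limit(5))


def total_documents(collection):
    """Count every document; the empty filter {} matches all of them."""
    return collection.count_documents({})
```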

In [None]:
# Fetch all orders from the 'West' region
df[df['Region'] == 'West']

In [None]:
# Orders where Sales is greater than 500
df[df['Sales'] > 500]
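
The two pandas filters above correspond to MongoDB query documents. A minimal sketch, assuming `collection` is a PyMongo collection holding the same rows with the same field names (the constant and helper names are illustrative):

```python
WEST_FILTER = {"Region": "West"}             # equality match on Region
HIGH_SALES_FILTER = {"Sales": {"$gt": 500}}  # $gt means strictly greater than


def find_orders(collection, query):
    """Run a find() with the given query document and return a list."""
    return list(collection.find(query))
```

For example, `find_orders(collection, WEST_FILTER)` would return the West-region orders, and `find_orders(collection, HIGH_SALES_FILTER)` the orders with sales above 500.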

In [None]:
# Top 3 orders with the highest Profit
df.sort_values(by='Profit', ascending=False).head(3)
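
In MongoDB the same "sort descending, keep the top rows" pattern is a `sort` followed by a `limit` on the cursor. A sketch, assuming `collection` is a PyMongo collection with a numeric `Profit` field (the helper name is illustrative):

```python
def top_profit_orders(collection, n=3):
    """Return the n documents with the highest Profit."""
    # -1 requests descending order (pymongo.DESCENDING has the same value).
    return list(collection.find().sort("Profit", -1).limit(n))
```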

In [None]:
# Update 'First Class' to 'Premium Class'
df_updated = df.copy()
df_updated.loc[df_updated['Ship Mode'] == 'First Class', 'Ship Mode'] = 'Premium Class'
df_updated['Ship Mode'].value_counts()
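
A MongoDB counterpart of this update could use `update_many` with the `$set` operator. The sketch below assumes `collection` is a PyMongo collection holding the same rows; the helper name is illustrative.

```python
def rename_ship_mode(collection, old="First Class", new="Premium Class"):
    """Set 'Ship Mode' to `new` on every document where it equals `old`.

    Returns the number of documents that were actually modified.
    """
    result = collection.update_many(
        {"Ship Mode": old},            # filter: which documents to touch
        {"$set": {"Ship Mode": new}},  # update: overwrite only this field
    )
    return result.modified_count
```

Unlike the pandas cell, which works on a copy, this changes the stored documents in place.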

In [None]:
# Delete all orders where Sales < 50
df_deleted = df[df['Sales'] >= 50]
print('Deleted records:', len(df) - len(df_deleted))
df_deleted.head()
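
In MongoDB the deletion itself would be a `delete_many` with a `$lt` filter. A sketch, assuming `collection` is a PyMongo collection holding the same rows (the helper name and the `threshold` parameter are illustrative):

```python
def delete_small_orders(collection, threshold=50):
    """Delete every document whose Sales is below `threshold`.

    Returns the number of deleted documents.
    """
    result = collection.delete_many({"Sales": {"$lt": threshold}})
    return result.deleted_count
```

The pandas cell keeps the original DataFrame intact, whereas this permanently removes the matching documents.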

In [None]:
# Group orders by Region and calculate total sales per region
df.groupby('Region')['Sales'].sum().reset_index()
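
The `groupby` above maps onto a `$group` stage in MongoDB's aggregation pipeline. A sketch, assuming `collection` is a PyMongo collection holding the same rows; the output field name `total_sales` and the helper name are illustrative choices.

```python
SALES_BY_REGION = [
    # One output document per distinct Region, summing its Sales values.
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}},
    # Sort by region name, mirroring the sorted keys pandas groupby returns.
    {"$sort": {"_id": 1}},
]


def sales_by_region(collection):
    """Return a list of {'_id': region, 'total_sales': total} documents."""
    return list(collection.aggregate(SALES_BY_REGION))
```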

In [None]:
# Fetch all distinct Ship Mode values
df['Ship Mode'].unique()

In [None]:
# Count number of orders for each category
df['Category'].value_counts()
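
The last two cells also have direct MongoDB counterparts: `distinct` for unique values and the `$sortByCount` aggregation stage for per-value counts. A sketch, assuming `collection` is a PyMongo collection holding the same rows (the helper and constant names are illustrative):

```python
def ship_modes(collection):
    """Return the distinct values of the 'Ship Mode' field."""
    return collection.distinct("Ship Mode")


# $sortByCount groups by the given expression and sorts by count, descending,
# producing documents of the form {"_id": <category>, "count": <n>}.
ORDERS_PER_CATEGORY = [{"$sortByCount": "$Category"}]


def orders_per_category(collection):
    """Return per-category document counts, largest first."""
    return list(collection.aggregate(ORDERS_PER_CATEGORY))
```

Like `value_counts()`, `$sortByCount` returns the most frequent category first.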