# MongoDB

## Theoretical Questions

### **1. What are the key differences between SQL and NoSQL databases?**
| Feature            | **SQL Databases**                                                                       | **NoSQL Databases**                                                                                                         |
| ------------------ | --------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| **Data Model**     | Structured data stored in **tables** (rows and columns). | Unstructured, semi-structured, or structured data stored as **documents**, **key-value pairs**, **graphs**, or **wide-column** families. |
| **Schema**         | **Fixed schema**: tables and columns must be defined before inserting data. | **Dynamic schema**: records can carry different fields, so the structure can evolve without a migration. |
| **Scalability**    | Traditionally scales **vertically** (by increasing server power); horizontal scaling is possible but usually harder. | Designed to scale **horizontally** (by adding more servers). |
| **Query Language** | Uses **SQL (Structured Query Language)**. | Uses **non-SQL queries** (JSON-like, API-based, or custom query syntax). |
| **Transactions**   | Supports complex **ACID (Atomicity, Consistency, Isolation, Durability)** transactions. | Often follows **BASE (Basically Available, Soft state, Eventually consistent)** for flexibility; some systems, including MongoDB, also offer ACID transactions. |
| **Examples**       | MySQL, PostgreSQL, Oracle, SQL Server                                                   | MongoDB, Cassandra, CouchDB, Redis                                                                                          |


### **2. What makes MongoDB a good choice for modern applications?**

MongoDB is a **document-oriented NoSQL database** designed for high performance, scalability, and flexibility — ideal for modern web and mobile apps.

**Key Advantages:**

1. **Flexible Schema:**

   * Stores data as JSON-like documents, encoded in a binary format called BSON.
   * Allows fields to vary between documents, making it perfect for agile development.

2. **Scalability:**

   * Uses **horizontal scaling (sharding)** to distribute data across servers.

3. **High Performance:**

   * Optimized for **fast read and write** operations with efficient indexing.

4. **Built-in Replication:**

   * Provides automatic **failover and redundancy** using replica sets.

5. **Rich Query Language:**

   * Supports powerful **filtering, aggregation, and indexing** similar to SQL features.

6. **Integration Friendly:**

   * Official drivers are available for most popular languages and runtimes (Node.js, Python, Java, Go, and others), so it fits into typical web stacks.

### **3. Explain the concept of collections in MongoDB.**

* In **MongoDB**, a **collection** is a **group of related documents**, similar to a **table** in SQL.
* Each collection contains multiple **documents (records)**, and each document is a **JSON-like object** with key-value pairs.

**Example:**

```javascript
// A sample document in a "users" collection
{
  _id: 1,
  name: "Alice",
  email: "alice@example.com",
  age: 25
}
```

**Key Points:**

* Collections **do not enforce a strict schema** by default, so documents in the same collection can have different fields.
* A collection is created automatically the first time you insert a document into it.
* They make it easy to group similar data (e.g., `users`, `orders`, `products`).

### **4. How does MongoDB ensure high availability using replication?**

MongoDB ensures **high availability** through a feature called **replica sets**.

**Replica Set Structure:**

* A **replica set** consists of:

  * **1 Primary node** – handles all write operations.
  * **Multiple Secondary nodes** – replicate data from the primary.
  * Optionally, an **Arbiter** – votes in elections but stores no data.

**How It Works:**

1. Data written to the **primary** node is recorded in its operation log (oplog), and the **secondary** nodes replicate it asynchronously.
2. If the primary becomes unreachable, the remaining members hold an election and one eligible secondary is **automatically promoted** to primary (automatic failover).
3. During the election, writes are briefly unavailable; reads can continue from secondaries if the client's **read preference** allows it.

**Illustration:**

```
Primary → Secondary 1
         → Secondary 2
         ↳ Automatic Failover on Failure
```

**Result:**
MongoDB provides **fault tolerance and data redundancy** and keeps downtime short during server or network failures, as long as a majority of voting members can still reach each other.
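
The majority requirement behind automatic failover can be illustrated with a toy model. The sketch below is not MongoDB's election protocol; it only shows the rule that a primary can be elected while a strict majority of voting members is reachable. The function name `can_elect_primary` is made up for this illustration.

```python
def can_elect_primary(voting_members: int, reachable: int) -> bool:
    """Return True if the reachable members form a strict majority."""
    if not 0 <= reachable <= voting_members:
        raise ValueError("reachable must be between 0 and voting_members")
    return reachable > voting_members // 2


# A 3-member set survives the loss of one member...
print(can_elect_primary(voting_members=3, reachable=2))  # True
# ...but not two: the lone survivor cannot become (or stay) primary.
print(can_elect_primary(voting_members=3, reachable=1))  # False
# A 4-member set still tolerates only one failure.
print(can_elect_primary(voting_members=4, reachable=2))  # False
```

This is why replica sets are usually deployed with an odd number of voting members: adding a fourth member to a three-member set does not increase the number of failures it can tolerate.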

### **5. What are the main benefits of MongoDB Atlas?**

**MongoDB Atlas** is the **fully managed cloud version** of MongoDB that automates database operations and scales easily.

**Key Benefits:**

1. **Fully Managed Service:**

   * Handles setup, backup, monitoring, and maintenance automatically.

2. **High Availability & Global Distribution:**

   * Built-in **replication** across multiple cloud regions and providers (AWS, Azure, GCP).

3. **Auto Scaling:**

   * Automatically scales storage and compute resources based on workload.

4. **Security:**

   * Offers **encryption at rest and in transit**, **VPC peering**, and **fine-grained access control**.

5. **Performance Monitoring:**

   * Provides an integrated dashboard for real-time performance analytics.

6. **Serverless & Integration Support:**

   * Works seamlessly with **MongoDB Realm**, APIs, and cloud functions for modern app architectures.

### **6. What is the role of indexes in MongoDB, and how do they improve performance?**

#### **Role of Indexes:**

Indexes in MongoDB are special data structures that **store a small portion of the collection’s data** in a way that makes queries faster.
They work similarly to indexes in books — instead of scanning every page (or document), MongoDB can directly jump to the required data.

#### **How Indexes Improve Performance:**

1. ⚡ **Faster Query Execution:**

   * Without indexes, MongoDB performs a **collection scan** — checking every document.
   * With an index, it searches only relevant entries, significantly speeding up query performance.

2. 🧭 **Efficient Sorting:**

   * Indexes also improve the speed of `sort()` operations.

3. 🎯 **Improved Query Optimization:**

   * The query planner uses indexes to choose the fastest path to fetch results.


#### **Example:**

```javascript
// Creating an index on the 'email' field
db.users.createIndex({ email: 1 });

// Query using the index
db.users.find({ email: "alice@example.com" });
```

Here, MongoDB will use the `email` index to quickly locate the matching document instead of scanning the whole `users` collection.
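
To see why an index helps, the following pure-Python sketch compares a full scan with a lookup through a sorted structure. It is a simplified model, not how MongoDB stores indexes internally; the sample data and the helper names `collection_scan` and `index_lookup` are assumptions made for the illustration.

```python
from bisect import bisect_left

# Made-up sample data: 1,000 user documents.
users = [{"_id": i, "email": f"user{i}@example.com"} for i in range(1000)]


def collection_scan(docs, email):
    """Check documents one by one; return (doc, number of documents examined)."""
    examined = 0
    for doc in docs:
        examined += 1
        if doc["email"] == email:
            return doc, examined
    return None, examined


# A sorted list of (key, position) pairs stands in for a B-tree index.
email_index = sorted((doc["email"], pos) for pos, doc in enumerate(users))


def index_lookup(docs, index, email):
    """Binary-search the index, then fetch a single document."""
    i = bisect_left(index, (email,))
    if i < len(index) and index[i][0] == email:
        return docs[index[i][1]], 1
    return None, 0


doc, examined = collection_scan(users, "user999@example.com")
print(examined)  # 1000
doc, examined = index_lookup(users, email_index, "user999@example.com")
print(examined)  # 1
```

The trade-off is that the sorted structure has to be maintained on every insert, update, and delete, which is why indexes speed up reads at some cost to writes.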

### **7. Describe the stages of the MongoDB aggregation pipeline.**

The **Aggregation Pipeline** in MongoDB is a powerful framework for **transforming, filtering, and analyzing data** in multiple stages — similar to SQL’s `GROUP BY` and `HAVING` clauses, but more flexible.

#### **Main Stages of Aggregation Pipeline:**

1. **`$match`** – Filters documents based on specified conditions.

   ```javascript
   { $match: { status: "Active" } }
   ```

2. **`$group`** – Groups documents by a key and computes aggregates with accumulator operators such as `$sum`, `$avg`, or `$max`; counting is commonly written as `{ $sum: 1 }`.

   ```javascript
   { $group: { _id: "$category", totalSales: { $sum: "$amount" } } }
   ```

3. **`$project`** – Reshapes documents by including, excluding, or computing new fields.

   ```javascript
   { $project: { name: 1, totalSales: 1, _id: 0 } }
   ```

4. **`$sort`** – Sorts documents in ascending or descending order.

   ```javascript
   { $sort: { totalSales: -1 } }
   ```

5. **`$limit`** – Restricts the number of output documents.

   ```javascript
   { $limit: 5 }
   ```

6. **`$lookup`** – Performs joins between collections (similar to SQL joins).

   ```javascript
   { $lookup: { from: "customers", localField: "cust_id", foreignField: "_id", as: "customerInfo" } }
   ```
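
The stages are normally chained, with each stage receiving the output of the previous one. The sketch below writes a four-stage pipeline in PyMongo style (a list of dictionaries) and then reproduces its effect in plain Python on a few made-up documents, so the flow can be followed without a running server. The sample data and field values are assumptions.

```python
from collections import defaultdict

# Made-up sample documents standing in for an "orders" collection.
orders = [
    {"category": "Books", "amount": 20, "status": "Active"},
    {"category": "Books", "amount": 35, "status": "Active"},
    {"category": "Toys", "amount": 50, "status": "Inactive"},
    {"category": "Toys", "amount": 15, "status": "Active"},
    {"category": "Games", "amount": 80, "status": "Active"},
]

# The pipeline as it would be passed to collection.aggregate(pipeline).
pipeline = [
    {"$match": {"status": "Active"}},
    {"$group": {"_id": "$category", "totalSales": {"$sum": "$amount"}}},
    {"$sort": {"totalSales": -1}},
    {"$limit": 2},
]

# Plain-Python equivalent of each stage, applied in order.
matched = [o for o in orders if o["status"] == "Active"]  # $match
totals = defaultdict(int)
for o in matched:  # $group with $sum
    totals[o["category"]] += o["amount"]
grouped = [{"_id": k, "totalSales": v} for k, v in totals.items()]
ranked = sorted(grouped, key=lambda d: d["totalSales"], reverse=True)  # $sort
result = ranked[:2]  # $limit

print(result)
# [{'_id': 'Games', 'totalSales': 80}, {'_id': 'Books', 'totalSales': 55}]
```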

### **8. What is sharding in MongoDB? How does it differ from replication?**

#### **Definition of Sharding:**

Sharding in MongoDB is the process of **horizontally partitioning data** across multiple servers (called shards) to handle large datasets and high throughput applications.

Each shard contains a subset of the data — together they form the complete dataset.

#### **How It Works:**

* A **shard key** (e.g., `user_id`, `region`) determines how data is distributed across shards.
* **Config servers** (deployed as a replica set) store the cluster metadata, including which ranges of shard key values live on which shard.
* A **mongos router** directs queries to the appropriate shard(s).
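
The routing idea can be sketched in a few lines. This is a deliberately simplified model: MongoDB assigns ranges of (optionally hashed) shard key values to shards and moves them between shards as needed, whereas the toy below just takes a hash modulo the number of shards. The shard names and the `route` function are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def route(shard_key_value: str) -> str:
    """Pick a shard from a stable hash of the shard key value."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Distribute nine made-up user IDs and show where each one lands.
placement = {}
for user_id in (f"user{i}" for i in range(9)):
    placement.setdefault(route(user_id), []).append(user_id)

for shard, keys in sorted(placement.items()):
    print(shard, keys)
```

Because the same key value always maps to the same shard, a query that includes the shard key can be sent to a single shard, while a query without it has to be broadcast to all of them.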

#### **Replication vs Sharding:**

| Feature               | **Replication**                                        | **Sharding**                                                       |
| --------------------- | ------------------------------------------------------ | ------------------------------------------------------------------ |
| **Purpose**           | Provides **data redundancy** and **high availability** | Provides **horizontal scalability** and handles large data volumes |
| **Data Distribution** | Copies the same data across multiple nodes             | Splits data into **smaller parts (shards)** across servers         |
| **Write Capacity**    | All writes go to the **primary node**                  | Writes are **distributed** across shards                           |
| **Usage**             | For **fault tolerance** and **failover** (not a substitute for backups) | For **scaling performance** and **big data handling**              |

### **9. What is PyMongo, and why is it used?**

**PyMongo** is the **official Python driver for MongoDB**, which allows Python applications to interact directly with MongoDB databases.

#### **Why It Is Used:**

1. **Database Connectivity:**

   * Enables Python developers to connect to MongoDB and perform operations like insert, update, delete, and query.

2. **Ease of Use:**

   * Provides a simple API to handle MongoDB documents as Python dictionaries.

3. **Supports Advanced Features:**

   * Works with aggregation pipelines, indexing, transactions, and more.

4. **Integration with Python Frameworks:**

   * Commonly used with frameworks like **Flask**, **Django**, and **FastAPI** for building web apps.

#### **Example:**

```python
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["SchoolDB"]
students = db["Students"]

# Insert a document
students.insert_one({"name": "Alice", "age": 21})

# Fetch a document
print(students.find_one({"name": "Alice"}))
```

### **10. What are the ACID properties in the context of MongoDB transactions?**

ACID stands for **Atomicity**, **Consistency**, **Isolation**, and **Durability** — four essential properties that ensure reliability and correctness in database transactions.
MongoDB supports **multi-document ACID transactions** on replica sets since version 4.0 and on sharded clusters since version 4.2, which brings its transactional guarantees close to those of traditional relational databases.

#### **ACID Properties in MongoDB:**

1. **Atomicity:**

   * All operations within a transaction either **complete successfully** or **none are applied**.
   * Example: If transferring money between two accounts, both debit and credit must succeed or both fail.

2. **Consistency:**

   * The database moves from one valid state to another.
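   * Document validation rules and unique-index constraints, where defined, are enforced on the writes made inside the transaction.

3. **Isolation:**

   * Changes made inside a transaction are not visible to operations outside it until the transaction commits; with read concern `snapshot`, the transaction also reads from a consistent snapshot of the data.

4. **Durability:**

   * Once a transaction is committed, the changes survive a crash or power failure, provided they were written with a durable write concern (for example `w: "majority"` with journaling enabled).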
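
The money-transfer example can be sketched with PyMongo's session API. This is a minimal sketch, assuming a `pymongo.MongoClient` connected to a replica set (transactions are not available on a standalone `mongod`); the `bank` database, the `accounts` collection, and the `transfer` function are hypothetical names.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents atomically.

    `client` is expected to be a pymongo.MongoClient connected to a
    replica set; "bank" and "accounts" are hypothetical names.
    """
    accounts = client["bank"]["accounts"]

    def apply_transfer(session):
        # Both updates carry the session, so they commit or abort together.
        accounts.update_one(
            {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
        )
        accounts.update_one(
            {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
        )

    with client.start_session() as session:
        # with_transaction commits on success and aborts if the callback raises.
        session.with_transaction(apply_transfer)


# Usage (needs a running replica set; the URI is an assumption):
# from pymongo import MongoClient
# transfer(MongoClient("mongodb://localhost:27017/?replicaSet=rs0"), "A", "B", 100)
```

Passing `session=session` to every operation is what ties it to the transaction; an operation issued without the session would run outside it.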

### **11. What is the purpose of MongoDB’s explain() function?**

The `explain()` function in MongoDB is used to **analyze and understand how a query is executed**.
It helps developers **optimize query performance** by showing how MongoDB’s query planner chooses indexes, scans, and operations.

#### **Key Features:**

1. **Execution Plan Insight:**

   * Shows whether the query used an index or performed a full collection scan.

2. **Performance Optimization:**

   * Helps identify slow queries and improve them by adding or adjusting indexes.

3. **Query Statistics:**

   * Displays details like documents scanned, execution time, and index usage.


#### **Example:**

```javascript
db.users.find({ age: { $gt: 25 } }).explain("executionStats");
```

**Output Includes:**

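* Query execution time (`executionTimeMillis`)
* Number of documents and index keys examined (`totalDocsExamined`, `totalKeysExamined`)
* The winning plan, including any index used (an `IXSCAN` stage) or a full collection scan (`COLLSCAN`)
* Total documents returned (`nReturned`)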
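
When reading `executionStats`, a useful habit is to compare how many documents were examined with how many were returned. The helper below is a small sketch that does this for an explain result held as a Python dictionary; the `sample` values are hand-written for illustration (not real server output), and `summarize_explain` is a made-up name.

```python
def summarize_explain(explain_output: dict) -> dict:
    """Pull the headline numbers out of an executionStats explain result."""
    stats = explain_output["executionStats"]
    returned = stats["nReturned"]
    examined = stats["totalDocsExamined"]
    return {
        "returned": returned,
        "docs_examined": examined,
        "keys_examined": stats["totalKeysExamined"],
        "millis": stats["executionTimeMillis"],
        # Near 1 means few wasted reads; a large ratio suggests a missing index.
        "examined_per_returned": examined / returned if returned else None,
    }


# Hand-written sample, shaped like a collection scan over 10,000 documents.
sample = {
    "executionStats": {
        "nReturned": 40,
        "totalDocsExamined": 10000,
        "totalKeysExamined": 0,
        "executionTimeMillis": 12,
    }
}
print(summarize_explain(sample)["examined_per_returned"])  # 250.0
```

Here 250 documents are read for every one returned and no index keys are examined, which is the typical signature of a query that would benefit from an index on the filtered field.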

### **12. How does MongoDB handle schema validation?**

Although MongoDB is **schema-flexible**, it provides **schema validation rules** to enforce data consistency and integrity when needed.

#### **How Schema Validation Works:**

MongoDB allows defining **JSON Schema validation rules** at the **collection level**.
These rules ensure that only documents following a certain structure can be inserted or updated.

#### **Example:**

```javascript
db.createCollection("students", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age", "email"],
      properties: {
        name: { bsonType: "string" },
        age: { bsonType: "int", minimum: 18 },
        email: { bsonType: "string", pattern: "^.+@.+$" }
      }
    }
  }
});
```

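If a document violates the validation rules, MongoDB rejects the insert or update by default (`validationAction: "error"`); setting `validationAction: "warn"` logs the violation but lets the write through.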
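
To make the rules concrete, the sketch below re-implements the same three checks in plain Python and reports why a document would be rejected. It is only an illustration of the rules in the `students` validator, not a JSON Schema engine, and `validation_errors` is a made-up helper.

```python
import re

EMAIL_PATTERN = re.compile(r"^.+@.+$")


def validation_errors(doc: dict) -> list:
    """Return the reasons a document would fail the `students` rules."""
    errors = [f"missing required field: {f}"
              for f in ("name", "age", "email") if f not in doc]
    if "name" in doc and not isinstance(doc["name"], str):
        errors.append("name must be a string")
    if "age" in doc:
        age = doc["age"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            errors.append("age must be an integer")
        elif age < 18:
            errors.append("age must be at least 18")
    if "email" in doc and not (
        isinstance(doc["email"], str) and EMAIL_PATTERN.search(doc["email"])
    ):
        errors.append("email must be a string matching ^.+@.+$")
    return errors


print(validation_errors({"name": "Alice", "age": 21, "email": "alice@example.com"}))
# []
print(validation_errors({"name": "Bob", "age": 16}))
# ['missing required field: email', 'age must be at least 18']
```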

#### **Benefits:**

* Maintains **data consistency**
* Prevents **invalid or incomplete data**
* Offers flexibility: rules can be tightened or relaxed later with the `collMod` command

### **13. What is the difference between a primary and a secondary node in a replica set?**

In a **MongoDB replica set**, data is replicated across multiple servers (nodes) to ensure **high availability** and **data redundancy**.
Each replica set has one **primary node** and one or more **secondary nodes**.

#### **Differences:**

| Feature              | **Primary Node**                                           | **Secondary Node**                                |
| -------------------- | ---------------------------------------------------------- | ------------------------------------------------- |
| **Role**             | Handles all **write** and **read (by default)** operations | Maintains a **copy** of the data from the primary |
| **Replication**      | Source of truth; all changes are written here              | Continuously **syncs** from the primary           |
| **Failover**         | Automatically replaced if it fails                         | Can be **promoted to primary** during failover    |
| **Read Operations**  | Default read target                                        | Can serve reads if **read preference** is set     |
| **Write Capability** | Yes                                                        | No (replica only)                                 |


#### **Example Scenario:**

* Primary Node: `Node A` (accepts writes)
* Secondary Nodes: `Node B`, `Node C` (replicate from A)

If `Node A` fails, the remaining members hold an election and promote one of them, for example `Node B`, to be the new primary.

### **14. What security mechanisms does MongoDB provide for data protection?**

MongoDB includes several built-in **security mechanisms** to ensure data protection, prevent unauthorized access, and maintain confidentiality.

#### **Main Security Mechanisms:**

1. **Authentication:**

   * Verifies the identity of users using credentials (username & password).
   * Supports SCRAM (the default) and x.509 certificate-based authentication; LDAP and Kerberos are available in MongoDB Enterprise.

2. **Authorization (Access Control):**

   * Role-Based Access Control (RBAC) defines what actions a user can perform.
   * Example roles: `read`, `readWrite`, `dbAdmin`, `clusterAdmin`.

3. **Encryption:**

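   * **Encryption at Rest:** Data stored on disk can be encrypted with AES-256 through the encrypted storage engine (MongoDB Enterprise) or by Atlas; Community deployments typically rely on disk-level or filesystem-level encryption.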
   * **Encryption in Transit:** Data exchanged between clients and servers is encrypted using TLS/SSL.

4. **Auditing:**

   * Logs database events (like logins or data modifications) for compliance and monitoring; available in MongoDB Enterprise and Atlas.

5. **Network Security:**

   * IP access lists, firewalls, and VPC peering help restrict access to trusted networks.

### **15. Explain the concept of embedded documents and when they should be used.**

An **embedded document** in MongoDB is a document **nested inside another document**.
It allows related data to be stored together in a single document, so it can be read in one operation without a join.

#### **Example:**

```javascript
{
  _id: 1,
  name: "Alice",
  contact: {
    email: "alice@example.com",
    phone: "9876543210"
  },
  address: {
    city: "Mumbai",
    zip: "400001"
  }
}
```

Here, `contact` and `address` are **embedded documents** inside the main document.


#### **When to Use Embedded Documents:**

1. **One-to-One or One-to-Few Relationships:**

   * Example: A user and their address, or an order and its shipping details.

2. **Data That Is Accessed Together Frequently:**

   * If related information is always queried together, embedding improves read efficiency.

3. **Performance Optimization:**

   * Reduces the need for joins and additional queries — faster data retrieval.


#### **Avoid Embedding When:**

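* Data grows without bound (e.g., user posts), since a single document is limited to 16 MB.
* The nested data is updated frequently and independently of the parent, or is shared by many parent documents.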
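
The two modelling choices can be compared side by side using plain Python dictionaries. This is a sketch with made-up data: the `posts` list stands in for a separate collection, and `get_path` is a hypothetical helper that mimics the dot notation MongoDB queries use to reach embedded fields (for example `{"address.city": "Mumbai"}`).

```python
def get_path(doc: dict, dotted_path: str):
    """Resolve a dot-notation path such as "contact.email" in a nested dict."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Embedded: one read returns the user together with contact and address.
user_embedded = {
    "_id": 1,
    "name": "Alice",
    "contact": {"email": "alice@example.com", "phone": "9876543210"},
    "address": {"city": "Mumbai", "zip": "400001"},
}

# Referenced: unbounded data (posts) lives separately and points back by ID.
user_referenced = {"_id": 1, "name": "Alice"}
posts = [
    {"_id": 101, "user_id": 1, "title": "First post"},
    {"_id": 102, "user_id": 1, "title": "Second post"},
]

print(get_path(user_embedded, "address.city"))  # Mumbai
print([p["title"] for p in posts if p["user_id"] == user_referenced["_id"]])
# ['First post', 'Second post']
```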

### **16. What is the purpose of MongoDB’s $lookup stage in aggregation?**

The `$lookup` stage in MongoDB’s **aggregation pipeline** performs a **left outer join** between two collections.
It allows combining related data stored in different collections — similar to an SQL `JOIN` operation.

#### **Syntax:**

```javascript
{
  $lookup: {
    from: "orders",          // The target collection
    localField: "_id",       // Field from the current collection
    foreignField: "customer_id", // Field from the target collection
    as: "customerOrders"     // Name of the new output array
  }
}
```

#### **Example:**

If we have two collections — `customers` and `orders` — we can use `$lookup` to get each customer’s orders:

```javascript
db.customers.aggregate([
  {
    $lookup: {
      from: "orders",
      localField: "_id",
      foreignField: "customer_id",
      as: "orders"
    }
  }
]);
```

**Result:**
Each customer document will now contain an array `orders` with their related order details.
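
The "left outer" behaviour is easiest to see on a tiny data set. The sketch below emulates the join in plain Python with made-up `customers` and `orders` lists; the `lookup` function is a simplified stand-in for the stage, not MongoDB's implementation.

```python
# Made-up sample data for the two collections.
customers = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]
orders = [
    {"_id": 10, "customer_id": 1, "total": 120},
    {"_id": 11, "customer_id": 1, "total": 45},
]


def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each local document the list of matching foreign documents."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


for row in lookup(customers, orders, "_id", "customer_id", "orders"):
    print(row["name"], [o["_id"] for o in row["orders"]])
# Alice [10, 11]
# Bob []
```

Bob has no orders but still appears in the result with an empty array, which is what distinguishes a left outer join from an inner join.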

### **17. What are some common use cases for MongoDB?**

MongoDB is a versatile NoSQL database widely used across industries for various modern applications.

#### **Common Use Cases:**

1. **Content Management Systems (CMS):**

   * Ideal for blogs, media storage, and e-commerce content where data structures vary frequently.

2. **Real-Time Analytics:**

   * Used in IoT, monitoring dashboards, and financial analytics where large data volumes update continuously.

3. **Mobile and Web Applications:**

   * Flexible schema supports frequent feature updates without downtime.

4. **Product Catalogs and Inventory Systems:**

   * Handles diverse product data with dynamic attributes (common in e-commerce).

5. **Social Networks and Messaging Apps:**

   * Efficiently stores user profiles, messages, and relationships.

6. **Geospatial Applications:**

   * Built-in geospatial indexing supports location-based queries (used in delivery, ride-hailing, maps).

### **18. What are the advantages of using MongoDB for horizontal scaling?**

**Horizontal scaling** means adding more servers (nodes) to handle increased data and traffic — rather than upgrading a single server.

#### **Advantages of MongoDB’s Horizontal Scaling (Sharding):**

1. **Massive Data Handling:**

   * Splits data across multiple shards, allowing storage of **huge datasets** beyond a single machine’s capacity.

2. **Improved Performance:**

   * Distributes both **read and write loads** among multiple servers, reducing bottlenecks.

3. **High Availability:**

   * Each shard can be a replica set, combining **sharding + replication** for performance and fault tolerance.

4. **Automatic Balancing:**

   * MongoDB automatically **balances data** across shards as the dataset grows.

5. **Cost Efficiency:**

   * Instead of expensive high-end hardware, you can use multiple **low-cost commodity servers**.

### **19. How do MongoDB transactions differ from SQL transactions?**

Both MongoDB and SQL databases support transactions, but they differ in how they handle data consistency, structure, and performance.

#### **Differences:**

| Feature              | **SQL Transactions**                                  | **MongoDB Transactions**                                                                 |
| -------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Data Model**       | Operate on structured, relational tables.             | Operate on document-based collections.                                                   |
| **Default Behavior** | Every operation is transactional (ACID by default).   | Single-document operations are atomic; multi-document transactions require explicit use. |
| **Performance**      | Optimized for complex, multi-table transactions.      | Multi-document transactions add overhead and are meant to be short; by default a transaction is aborted after 60 seconds. |
| **Implementation**   | Managed through the relational engine (e.g., InnoDB). | Uses session-based transactions, available on replica sets since MongoDB 4.0 and on sharded clusters since 4.2. |
| **Use Case**         | Banking, accounting systems needing strict ACID.      | Multi-document consistency in document-based apps.                                       |

### **20. What are the main differences between capped collections and regular collections?**

| Feature              | **Capped Collection**                                                               | **Regular Collection**                                         |
| -------------------- | ----------------------------------------------------------------------------------- | -------------------------------------------------------------- |
| **Definition**       | Fixed-size collection that automatically overwrites the oldest documents when full. | Standard collection that grows dynamically with no size limit. |
| **Size Limit**       | Defined during creation: `size` in bytes (required) and, optionally, `max` documents. | No fixed size; can grow indefinitely.                          |
| **Use Case**         | Ideal for logs, caching, and real-time data streams.                                | Suitable for general data storage.                             |
| **Insertion Order**  | Maintains insertion order.                                                          | No guaranteed order.                                           |
| **Deletion**         | Old documents are automatically removed when full.                                  | Must delete manually.                                          |
| **Example Creation** | `db.createCollection("logs", { capped: true, size: 100000 })`                       | `db.createCollection("users")`                                 |
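
The overwrite-oldest behaviour of a capped collection resembles a fixed-length queue. The toy below uses Python's `collections.deque` to show it; note that it caps by document count only, whereas a real capped collection is bounded primarily by its `size` in bytes. The limit of 3 and the sample events are made up.

```python
from collections import deque

# A capped "collection" that keeps at most 3 documents (made-up limit).
logs = deque(maxlen=3)

for n in range(1, 6):
    logs.append({"seq": n, "msg": f"event {n}"})

# The two oldest entries were dropped; insertion order is preserved.
print([doc["seq"] for doc in logs])  # [3, 4, 5]
```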

### **21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?**

The `$match` stage filters documents in the aggregation pipeline — it’s equivalent to a `WHERE` clause in SQL.
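Only documents that satisfy the specified condition move to the next stage. Placing `$match` as early as possible in the pipeline lets it use indexes and reduces the work done by later stages.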

#### **Example:**

```javascript
db.orders.aggregate([
  { $match: { status: "Delivered", total: { $gt: 100 } } },
  { $group: { _id: "$customer_id", totalSpent: { $sum: "$total" } } }
]);
```

This example filters orders where `status = "Delivered"` and `total > 100` before grouping.

### **22. How can you secure access to a MongoDB database?**

MongoDB offers several methods to **secure access** and protect data from unauthorized users.

#### **1. Enable Authentication:**

* Require users to log in with a username and password.
* Example:

  ```bash
  mongod --auth
  ```
* Create an admin user:

  ```javascript
  db.getSiblingDB("admin").createUser({ user: "admin", pwd: passwordPrompt(), roles: ["userAdminAnyDatabase"] });
  ```
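
Once authentication is enabled, clients have to present credentials when they connect. A minimal sketch for building the connection string in Python follows; it percent-encodes the username and password so that characters such as `@`, `:` or `/` do not break the URI. The function name `build_mongo_uri` and the credentials are made up, and in practice the password would come from the environment or a secret store rather than source code.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str = "localhost",
                    port: int = 27017, auth_db: str = "admin") -> str:
    """Build a connection string with percent-encoded credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={auth_db}"
    )


# Made-up credentials, for illustration only.
print(build_mongo_uri("admin", "p@ss:word/1"))
# mongodb://admin:p%40ss%3Aword%2F1@localhost:27017/?authSource=admin
```

The resulting string can be passed to `MongoClient(...)` in the same way as the unauthenticated URI used elsewhere in this document.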

#### **2. Use Role-Based Access Control (RBAC):**

* Assign specific roles (`read`, `readWrite`, `dbAdmin`, etc.) to users to limit privileges.

#### **3. Enable Network Security:**

* Bind MongoDB to a specific IP or localhost in the `net` section of `mongod.conf`:

  ```yaml
  net:
    bindIp: 127.0.0.1
  ```
* Use firewalls and VPNs to restrict network access.

#### **4. Use TLS/SSL Encryption:**

* Encrypt data **in transit** to prevent eavesdropping.

#### **5. Encrypt Data at Rest:**

* Enable **disk encryption** to secure stored data.

#### **6. Enable Auditing:**

* Track all user activities and database operations for compliance.

### **23. What is MongoDB’s WiredTiger storage engine, and why is it important?**

**WiredTiger** is the **default storage engine** used by MongoDB (since version 3.2).
It manages how data is stored, compressed, and accessed on disk.

#### **Key Features:**

1. **Document-Level Concurrency:**

   * Allows multiple write operations at the same time without locking the entire collection.

2. **Compression:**

   * Uses snappy (the default), zlib, or zstd block compression to reduce disk usage.

3. **Checkpointing:**

   * Writes a consistent snapshot of the data to disk at regular intervals (every 60 seconds by default), which serves as a recovery point.

4. **Caching:**

   * Uses an internal cache to improve read and write performance.

5. **Journaling:**

   * Protects data against power failure or crashes by maintaining a write-ahead log.

#### **Why It’s Important:**

* Improves **performance** and **scalability**.
* Ensures **data durability** and **consistency**.
* Provides **efficient memory and disk usage** — ideal for large-scale databases.

## Practical Questions

### **1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.**

```python
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

data = pd.read_csv("superstore.csv", encoding="latin1")

records = data.to_dict(orient="records")

orders_collection.insert_many(records)

print("Superstore dataset successfully loaded into MongoDB!")
```

**Output:**

```
Superstore dataset successfully loaded into MongoDB!
```


### **2. Retrieve and print all documents from the Orders collection.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

for order in orders_collection.find():
    print(order)
```

**Output (truncated):**

```
{'_id': ObjectId('691608b46e9a9c77527a4fcc'), 'Row ID': 1, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-BO-10001798', 'Category': 'Furniture', 'Sub-Category': 'Bookcases', 'Product Name': 'Bush Somerset Collection Bookcase', 'Sales': 261.96, 'Quantity': 2, 'Discount': 0.0, 'Profit': 41.9136}
{'_id': ObjectId('691608b46e9a9c77527a4fcd'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub
```

### **3. Count and display the total number of documents in the Orders collection.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

# Count total documents
total_orders = orders_collection.count_documents({})
print("Total number of orders:", total_orders)
```

**Output:**

```
Total number of orders: 9994
```


### **4. Write a query to fetch all orders from the "West" region.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

west_orders = orders_collection.find({"Region": "West"})

for order in west_orders:
    print(order)
```

**Output (truncated):**

```
{'_id': ObjectId('691608b46e9a9c77527a4fce'), 'Row ID': 3, 'Order ID': 'CA-2016-138688', 'Order Date': '6/12/2016', 'Ship Date': '6/16/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'DV-13045', 'Customer Name': 'Darrin Van Huff', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code': 90036, 'Region': 'West', 'Product ID': 'OFF-LA-10000240', 'Category': 'Office Supplies', 'Sub-Category': 'Labels', 'Product Name': 'Self-Adhesive Address Labels for Typewriters by Universal', 'Sales': 14.62, 'Quantity': 2, 'Discount': 0.0, 'Profit': 6.8714}
{'_id': ObjectId('691608b46e9a9c77527a4fd1'), 'Row ID': 6, 'Order ID': 'CA-2014-115812', 'Order Date': '6/9/2014', 'Ship Date': '6/14/2014', 'Ship Mode': 'Standard Class', 'Customer ID': 'BH-11710', 'Customer Name': 'Brosina Hoffman', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code': 90032, 'Region': 'West', 'Product ID': 'FUR-FU-1
```

### **5. Write a query to find orders where Sales is greater than 500.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

high_sales = orders_collection.find({"Sales": {"$gt": 500}})

for order in high_sales:
    print(order)
```

**Output (truncated):**

```
{'_id': ObjectId('691608b46e9a9c77527a4fcd'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub-Category': 'Chairs', 'Product Name': 'Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back', 'Sales': 731.94, 'Quantity': 3, 'Discount': 0.0, 'Profit': 219.582}
{'_id': ObjectId('691608b46e9a9c77527a4fcf'), 'Row ID': 4, 'Order ID': 'US-2015-108966', 'Order Date': '10/11/2015', 'Ship Date': '10/18/2015', 'Ship Mode': 'Standard Class', 'Customer ID': 'SO-20335', 'Customer Name': "Sean O'Donnell", 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Fort Lauderdale', 'State': 'Florida', 'Postal Code': 33311, 'Region': 'South', 'Product ID': 'FUR-TA-100005
```

### **6. Fetch the top 3 orders with the highest Profit.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

top_orders = orders_collection.find().sort("Profit", -1).limit(3)

for order in top_orders:
    print(order)
```

**Output (truncated):**

```
{'_id': ObjectId('691608b46e9a9c77527a6a76'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category': 'Technology', 'Sub-Category': 'Copiers', 'Product Name': 'Canon imageCLASS 2200 Advanced Copier', 'Sales': 17499.95, 'Quantity': 5, 'Discount': 0.0, 'Profit': 8399.976}
{'_id': ObjectId('691608b46e9a9c77527a6fa5'), 'Row ID': 8154, 'Order ID': 'CA-2017-140151', 'Order Date': '3/23/2017', 'Ship Date': '3/25/2017', 'Ship Mode': 'First Class', 'Customer ID': 'RB-19360', 'Customer Name': 'Raymond Buch', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Seattle', 'State': 'Washington', 'Postal Code': 98115, 'Region': 'West', 'Product ID': 'TEC-CO-10004722', 'Category': 'Te
```

### **7. Update all orders with Ship Mode as "First Class" to "Premium Class".**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

result = orders_collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)

print("Documents updated:", result.modified_count)
```

**Output:**

```
Documents updated: 1538
```


### **8. Delete all orders where Sales is less than 50.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

result = orders_collection.delete_many({"Sales": {"$lt": 50}})

print("Documents deleted:", result.deleted_count)
```

**Output:**

```
Documents deleted: 4849
```


### **9. Use aggregation to group orders by Region and calculate total sales per region.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
    {"$sort": {"TotalSales": -1}}
]

region_sales = orders_collection.aggregate(pipeline)

for region in region_sales:
    print(region)
```

**Output:**

```
{'_id': 'West', 'TotalSales': 694686.6195}
{'_id': 'East', 'TotalSales': 651137.705}
{'_id': 'Central', 'TotalSales': 479611.8458}
{'_id': 'South', 'TotalSales': 376023.312}
```


### **10. Fetch all distinct values for Ship Mode from the collection.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

distinct_modes = orders_collection.distinct("Ship Mode")

print("Distinct Ship Modes:", distinct_modes)
```

**Output:**

```
Distinct Ship Modes: ['Premium Class', 'Same Day', 'Second Class', 'Standard Class']
```


### **11. Count the number of orders for each category.**

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

pipeline = [
    {"$group": {"_id": "$Category", "TotalOrders": {"$sum": 1}}},
    {"$sort": {"TotalOrders": -1}}
]

category_orders = orders_collection.aggregate(pipeline)

for category in category_orders:
    print(category)
```

**Output:**

```
{'_id': 'Office Supplies', 'TotalOrders': 2076}
{'_id': 'Furniture', 'TotalOrders': 1573}
{'_id': 'Technology', 'TotalOrders': 1496}
```
