## Q 1. What are the key differences between SQL and NoSQL databases?
**Ans** - The differences between **SQL** and **NoSQL** databases lie in their data models, schemas, scalability, transaction guarantees, query languages, and use cases.

1. **Data Model**
* **SQL**: Relational Databases
  * Structured data stored in tables.
  * Relationships between tables are defined using keys.

* **NoSQL**: Non-Relational or Distributed Databases
  * Includes document, key-value, wide-column, and graph models.
  * Flexible schema, often semi-structured or unstructured data.

2. **Schema**
* **SQL**: Fixed Schema
  * Requires predefined schemas.
  * Altering the schema can be complex and requires migrations.

* **NoSQL**: Dynamic Schema
  * Schema-less or flexible.
  * Fields can vary between records.

3. **Scalability**
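* **SQL**: Primarily Vertical Scaling
  * Usually scaled up by adding hardware resources (CPU, RAM, storage) to a single server; read replicas and sharding are possible but typically need extra tooling.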

* **NoSQL**: Horizontal Scaling
  * Scaling out by adding more servers or nodes to the database cluster.

4. **Transactions**
* **SQL**: Fully ACID-compliant
  * Strong consistency, isolation, and durability for complex transactions.

* **NoSQL**: BASE model
  * Many prioritize availability and partition tolerance over strict consistency, although some (including MongoDB) support ACID transactions.

5. **Query Language**
* **SQL**: Uses Structured Query Language
  * Powerful, standardized querying capabilities.

* **NoSQL**: Varies by database
  * Each NoSQL database has its own query methods.

6. **Use Cases**
* **SQL**:
  * Traditional enterprise applications, CRM, ERP
  * Applications with complex queries and transactional data

* **NoSQL**:
  * Real-time web apps, big data, IoT, content management
  * Applications requiring fast development, scalability, and flexible data handling

7. **Examples**
* **SQL**:
  * MySQL, PostgreSQL, Microsoft SQL Server, Oracle Database

* **NoSQL**:
  * MongoDB, Redis, Cassandra, Neo4j
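To make the data-model difference concrete, here is a minimal sketch in plain Python with no database involved. The `customers` and `orders` rows, the `customer_doc` document, and the helper `items_for_customer` are all made-up illustrations: the first half mimics flat relational rows linked by a key, the second half nests the same data in one document.

```python
# Relational style: two "tables" of flat rows linked by a key.
customers = [{"id": 1, "name": "Alice"}]
orders = [
    {"id": 10, "customer_id": 1, "item": "Laptop"},
    {"id": 11, "customer_id": 1, "item": "Mouse"},
]


def items_for_customer(customer_id):
    """Emulate a join by following the foreign key stored on each order."""
    return [o["item"] for o in orders if o["customer_id"] == customer_id]


# Document style: the same data nested inside one self-contained document.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "orders": [{"item": "Laptop"}, {"item": "Mouse"}],
}

relational_items = items_for_customer(1)
document_items = [o["item"] for o in customer_doc["orders"]]
print(relational_items == document_items)
```

Both approaches return the same items; the document version simply avoids the join at read time, at the cost of duplicating data if orders must be shared between documents.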

## Q 2. What makes MongoDB a good choice for modern applications?
**Ans** - MongoDB is a popular **NoSQL document database** whose features make it a strong choice for **modern applications**, especially those requiring flexibility, scalability, and fast development.

1. **Flexible Schema**
  * MongoDB stores data as **BSON** documents, and documents with varying structures can live in the same collection.
  * Well suited to **agile development**: the data model can evolve without a table-wide schema migration.

2. **High Scalability**
  * MongoDB supports **automatic sharding**, distributing data across multiple servers or clusters.
  * Enables applications to handle large volumes of data and high user loads, ideal for cloud-native and big data applications.

3. **Document-Oriented Data Model**
  * Stores data as **JSON-like documents**, making it intuitive to work with objects in modern programming languages.
  * Easier to represent **nested data and hierarchical relationships** compared to relational tables.

4. **High Performance**
  * Optimized for **read and write performance**.
  * Supports **indexing**, **aggregation pipelines**, and **in-memory processing** to ensure fast queries and analytics.

5. **Powerful Query Language**
  * Supports rich and expressive queries with support for:
    * Filtering, projections, and sorting
    * Text search, geospatial queries
    * Aggregation pipelines for analytics

6. **Built-In Replication & High Availability**
  * MongoDB supports **replica sets**, ensuring data redundancy and high availability.
  * Failover happens automatically if a primary node goes down.

7. **Developer-Friendly Ecosystem**
  * Official drivers for many languages
  * Integration with **modern frameworks**
  * Seamless use with cloud platforms and DevOps tools

8. **Strong Community & Cloud Support**
  * **MongoDB Atlas** provides a fully managed, scalable cloud database service.
  * Large community, extensive documentation, and active support.

9. **Ideal for Modern Use Cases**

MongoDB is a great choice for:
  * Real-time analytics dashboards
  * Mobile and web applications
  * Content management systems
  * IoT applications
  * E-commerce platforms
  * Social media and messaging apps

## Q 3. Explain the concept of collections in MongoDB.
**Ans** - In **MongoDB**, a **collection** is a grouping of **documents**, and it is conceptually similar to a **table in relational databases**. However, unlike tables, collections do **not enforce a strict schema**, allowing documents within the same collection to have different structures.

**Concepts of a Collection in MongoDB**
1. **Document Container**
  * A **collection** stores **BSON documents**.
  * Each document is a set of **field-value (key-value) pairs**, similar to a JSON object.

2. **Schema-Free**
  * Collections are **schema-less**, meaning:

    * Documents in the same collection can have **different fields**, **data types**, or **nested structures**.
    * This flexibility is ideal for agile development and handling diverse or evolving data formats.

3. **Naming a Collection**
  * Collection names are **case-sensitive** and can include most UTF-8 characters.
  * Names cannot be empty, cannot contain `$` or the null character (`\0`), and cannot begin with the reserved `system.` prefix.

4. **Creating a Collection**
  * Collections are typically created **implicitly** when we insert the first document.
  * We can also create them **explicitly** using `db.createCollection()` if we want to specify options.

5. **Accessing a Collection**

In [None]:
// In the Mongo shell
db.users.find()

// "users" is the collection name

6. **Common Operations on Collections**
  * `insertOne()` / `insertMany()` – Add documents
  * `find()` – Query documents
  * `updateOne()` / `updateMany()` – Modify documents
  * `deleteOne()` / `deleteMany()` – Remove documents
  * `createIndex()` – Create indexes for better performance

**Example: Collection and Documents**

**Collection Name:** `products`

**Documents:**

In [None]:
// Document 1
{
  "_id": 1,
  "name": "Laptop",
  "price": 75000,
  "brand": "Dell"
}

// Document 2
{
  "_id": 2,
  "name": "Smartphone",
  "brand": "Samsung",
  "features": ["5G", "128GB Storage"]
}

As we can see:
* Both documents are in the **same collection** (`products`)
* Their structure can vary
* MongoDB handles this gracefully
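As a rough illustration of what "handles this gracefully" means, the sketch below models the `products` collection as a plain Python list and filters it with a toy `find` helper. The helper is hypothetical and only supports equality matches; MongoDB's real query semantics (operators, array matching, and so on) are much richer.

```python
products = [
    {"_id": 1, "name": "Laptop", "price": 75000, "brand": "Dell"},
    {"_id": 2, "name": "Smartphone", "brand": "Samsung",
     "features": ["5G", "128GB Storage"]},
]


def find(collection, query):
    """Toy equality-only filter: a document matches when every queried
    field is present and equal. Documents missing a field just don't match."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


dell = find(products, {"brand": "Dell"})
# Only some documents carry a price; the others are skipped, not rejected.
priced = [doc["name"] for doc in products if "price" in doc]
print(dell, priced)
```

The point is that a query on a field some documents lack neither fails nor requires a schema change.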

**Advantages of Using Collections**
  * **Flexibility**: Store varied data without redefining schemas
  * **Scalability**: Collections can grow very large and span multiple shards in distributed setups
  * **Ease of Use**: No need to predefine data formats; quick iteration

## Q 4. How does MongoDB ensure high availability using replication?
**Ans** - MongoDB ensures **high availability** using a mechanism called **replication**, which is implemented through **replica sets**. A replica set is a **group of MongoDB servers** that maintain the same data, providing redundancy and fault tolerance.

A **replica set** consists of:
  * **Primary node**: Accepts all write operations.
  * **Secondary nodes**: Maintain copies of the primary's data by replicating changes from its **oplog** (operations log).
  * **Arbiter** (optional): A node that does **not store data** but **votes in elections** to help choose a new primary.

**Features of Replication in MongoDB**

1. **Automatic Failover**
  * If the **primary** becomes unavailable, an **election** is triggered among secondaries.
  * One of the eligible secondaries is **automatically promoted to primary** within seconds, ensuring continuous availability.

2. **Data Redundancy**
  * All data is **replicated** from the primary to the secondaries.
  * This protects against **hardware failure** or **data loss**.

3. **Read Scalability**
  * All **writes go to the primary**, and by default reads do too, but we can **read from secondaries** if eventual consistency is acceptable.
  * Use cases: analytics, reporting, backups.

4. **Write Concerns & Read Preferences**
  * We can configure **write concern** to ensure writes are acknowledged by multiple nodes.
  * We can set **read preferences** to read from primary, secondary, or nearest node depending on consistency and latency needs.
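Elections and majority write concerns both depend on a majority of voting members being reachable. The small sketch below works out that arithmetic; the function names `majority` and `fault_tolerance` are our own, not part of any MongoDB API.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of the voters."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be lost while a majority remains."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(f"{n} members: majority={majority(n)}, tolerates={fault_tolerance(n)} failure(s)")
```

Note that a 4-member set tolerates no more failures than a 3-member set, which is one reason odd member counts are the usual recommendation.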

**Diagram: Simple 3-Node Replica Set**

In [None]:
      +------------+
      |  Primary   |
      +------------+
           /  \
          /    \
         v      v
+---------------+    +---------------+
|  Secondary 1  |    |  Secondary 2  |
+---------------+    +---------------+

* Writes go to Primary.
* Secondaries replicate data continuously.
* If Primary fails, one Secondary is elected as the new Primary.

**Steps in Replication Process**
1. **Primary receives write** → adds operation to its **oplog**.
2. **Secondaries tail the primary’s oplog** and apply changes in order.
3. If primary fails → **election** occurs → new primary is elected.
4. Clients are automatically **re-routed** to the new primary.
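The first two steps above can be sketched as a toy simulation. This is a simplified model under our own assumptions (the `Node` class, the tuple-shaped operations, and the `write`/`sync` helpers are invented for illustration); it is not how MongoDB implements the oplog internally.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}     # _id -> document
        self.oplog = []    # ordered list of applied operations
        self.applied = 0   # number of oplog entries applied so far

    def apply(self, op):
        kind, _id, doc = op
        if kind == "insert":
            self.data[_id] = dict(doc)
        elif kind == "update":
            self.data[_id].update(doc)
        elif kind == "delete":
            self.data.pop(_id, None)


def write(primary, op):
    """Step 1: the primary applies the write and records it in its oplog."""
    primary.apply(op)
    primary.oplog.append(op)
    primary.applied = len(primary.oplog)


def sync(secondary, primary):
    """Step 2: the secondary replays, in order, the entries it has not seen."""
    for op in primary.oplog[secondary.applied:]:
        secondary.apply(op)
        secondary.oplog.append(op)
    secondary.applied = len(secondary.oplog)


primary = Node("primary")
secondary = Node("secondary")
write(primary, ("insert", 1, {"name": "Alice", "age": 25}))
write(primary, ("update", 1, {"age": 26}))

lagging = dict(secondary.data)  # the secondary has not caught up yet
sync(secondary, primary)
print(lagging, secondary.data)
```

The window in which `lagging` is still empty is the replication lag that makes secondary reads eventually consistent.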

**Benefits of MongoDB Replication**

| Feature | Benefit |
|---|---|
| Automatic Failover | High availability without manual action |
| Redundancy | Protection against hardware/data failure |
| Horizontal Read Scaling | Distribute read load across secondaries |
| Geo-Redundancy | Replicas can be placed in multiple regions for disaster recovery |

**Example Configuration (Replica Set Init)**

In [None]:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb1:27017" },
    { _id: 1, host: "mongodb2:27017" },
    { _id: 2, host: "mongodb3:27017" }
  ]
})

## Q 5. What are the main benefits of MongoDB Atlas?
**Ans** - **MongoDB Atlas** is the **fully managed cloud version of MongoDB**, and it offers a wide range of features that make deploying, managing, and scaling MongoDB much easier and more efficient for developers and businesses.

**Main benefits of using MongoDB Atlas:**

1. **Fully Managed Service**
  * **No server management** required — MongoDB Atlas takes care of deployment, upgrades, backups, monitoring, and patches.
  * Saves time and reduces DevOps overhead.

2. **Automated Scaling**
  * **Vertical Scaling**: Easily increase instance size as needed.
  * **Horizontal Scaling**: Automatically distribute data across multiple nodes to handle large datasets and traffic.
  * **Auto-scaling**: Enables automatic resource scaling based on workload.

3. **High Availability**
  * Atlas uses **replica sets** with **automatic failover** to ensure 99.995%+ uptime.
  * Supports **multi-region replication** for geographic redundancy and disaster recovery.

4. **Global Distribution**
  * Deploy clusters in **80+ regions** across **AWS**, **Azure**, and **Google Cloud Platform**.
  * Store data close to users to reduce latency and improve app performance.

5. **Built-In Security**
  * End-to-end **encryption at rest and in transit**
  * **IP whitelisting**, **VPC peering**, **role-based access control**, **LDAP/SAML integration**
  * **Compliance certifications**: SOC 2, ISO 27001, GDPR, HIPAA

6. **Automated Backups and Snapshots**
  * Continuous and point-in-time backups
  * Easy recovery and restore with minimal effort

7. **Real-Time Performance Monitoring**
  * Built-in dashboard for metrics like:
    * Query performance
    * Read/write latency
    * CPU/RAM usage
  * Allows fine-tuning and quick debugging

8. **Serverless and Data API Options**
  * **Atlas Data API** lets us interact with our database over HTTPS without managing drivers.
  * **Serverless instances** scale to zero automatically — ideal for infrequent workloads or prototypes.

9. **Integrated Search and Analytics**
  * **Atlas Search**: Built on Apache Lucene, allows full-text search with powerful indexing and ranking.
  * **Atlas Charts**: Native data visualization tool.
  * Integrates with **BI tools** and **Data Lakes** for analytics on live or archived data.

10. **Developer-Friendly Tools**
  * Integrates with MongoDB Compass, VS Code, Realm, and CLI tools.
  * Access through Atlas UI or programmatically via APIs and SDKs.
  * Supports modern development stacks: MERN, MEAN, JAMstack, etc.

11. **Free Tier for Learning and Prototyping**
  * Offers a **free shared cluster** with limited resources:
    * Good for small apps, demos, or learning MongoDB

## Q 6. What is the role of indexes in MongoDB, and how do they improve performance?
**Ans** - Indexes play a **crucial role in MongoDB** by significantly improving the **performance** of query operations. Without indexes, MongoDB has to **scan every document** in a collection to find those that match a query — a process known as a **collection scan**.

An **index** in MongoDB is a **data structure** (a B-tree) that stores the values of a **specific field or set of fields**, ordered by value. This ordering lets MongoDB efficiently **search, sort, and filter** data.

**Benefits of Using Indexes**
1. **Faster Query Performance**
  * Indexes allow MongoDB to quickly **locate matching documents** without scanning the entire collection.
  * Especially useful in large datasets where full scans would be too slow.

2. **Efficient Sorting**
  * If the query uses a **sort** on an indexed field, MongoDB can return sorted results **without additional computation**.

3. **Support for Uniqueness**
  * **Unique indexes** enforce data integrity by preventing duplicate values for the indexed field.

4. **Improved Aggregation Performance**
  * Indexes speed up the **`$match`** and **`$sort`** stages of aggregation pipelines, particularly when those stages come first.

5. **Support for Geospatial and Text Queries**
  * MongoDB supports **`2d`/`2dsphere` indexes** for geospatial data and **text indexes** for full-text search.

**Types of Indexes in MongoDB**

| Index Type | Description |
|---|---|
| **Single Field** | Index on one field (e.g., `name`) |
| **Compound** | Index on multiple fields (e.g., `firstName`, `lastName`) |
| **Multikey** | Created automatically when an indexed field holds array values |
| **Text Index** | Supports text search across string fields |
| **Hashed Index** | Indexes the hash of a field's value; used for hashed shard keys in sharded clusters |
| **Geospatial Index** | Enables geospatial queries (`2d` or `2dsphere`) |
| **Wildcard Index** | Indexes all fields, or fields matching a path pattern, without naming each one in advance |
| **Unique Index** | Ensures values are unique across the collection |

**Example: Creating an Index**

In [None]:
// Create a single-field index
db.users.createIndex({ email: 1 })

// Create a compound index
db.orders.createIndex({ customerId: 1, orderDate: -1 })

// Create a text index
db.articles.createIndex({ title: "text", content: "text" })

**How Indexes Work Internally**
1. MongoDB maintains each index as a **sorted data structure** (a B-tree).
2. When a query is issued, MongoDB checks if a suitable index exists.
3. If yes, it uses the index to **narrow down matching documents quickly**.
4. If not, it performs a **collection scan**, which is slower.
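The difference between steps 3 and 4 can be sketched in plain Python. Here a sorted list searched with `bisect` stands in for the B-tree (an assumption made for brevity, not MongoDB's actual structure), and the `users` data is generated purely for the example.

```python
import bisect

# 1000 sample documents; each age from 18 to 67 appears exactly 20 times.
users = [{"_id": i, "age": 18 + (i * 7) % 50} for i in range(1000)]


def collection_scan(docs, age):
    """No index: every document must be examined."""
    examined, matches = 0, []
    for doc in docs:
        examined += 1
        if doc["age"] == age:
            matches.append(doc)
    return matches, examined


# Toy index: (age, position) pairs kept in sorted order.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(users))


def index_scan(docs, index, age):
    """With an index: binary-search the sorted keys, then fetch only the hits."""
    lo = bisect.bisect_left(index, (age, -1))
    hi = bisect.bisect_right(index, (age, len(docs)))
    matches = [docs[pos] for _, pos in index[lo:hi]]
    return matches, hi - lo


scan_matches, scan_examined = collection_scan(users, 25)
idx_matches, idx_examined = index_scan(users, age_index, 25)
print(scan_examined, idx_examined)
```

Both calls return the same documents, but the scan touches every document while the index lookup touches only the matching entries.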

**Viewing and Managing Indexes**
  * View indexes on a collection:

In [None]:
db.collection.getIndexes()

* Drop an index:

In [None]:
db.collection.dropIndex("index_name")

* Drop all indexes (except the default `_id` index):

In [None]:
db.collection.dropIndexes()

**Important Considerations**
  * **Too many indexes** can slow down write operations (inserts/updates) because MongoDB must also update the indexes.
  * Always **analyze our queries** using:

In [None]:
db.collection.find({...}).explain("executionStats")

* Use **indexing strategically** based on access patterns, query frequency, and data volume.

## Q 7. Describe the stages of the MongoDB aggregation pipeline.
**Ans** - The **MongoDB Aggregation Pipeline** is a powerful framework used to process and transform data from collections in stages. Each stage performs a specific operation on the input documents and passes the results to the next stage, allowing complex data transformations similar to SQL `GROUP BY`, `JOIN`, `WHERE`, and even custom computations.

**Overview: What Is an Aggregation Pipeline?**
  * An aggregation pipeline is an **array of stages**.
  * Each stage **takes input documents**, **transforms them**, and **outputs new documents**.
  * Think of it like a **conveyor belt** where each step adds, filters, groups, or reshapes data.

**Common Stages in the Aggregation Pipeline**
1. **`$match`** – Filter Documents
  * Filters the documents to pass only those that match the condition.
  * Should be used **early** to reduce the amount of data processed in later stages.

In [None]:
{ $match: { status: "active" } }

2. **`$project`** – Reshape Each Document
  * Used to include, exclude, or compute new fields.
  * Can also rename or reformat fields.

In [None]:
{ $project: { name: 1, email: 1, fullName: { $concat: ["$firstName", " ", "$lastName"] } } }

3. **`$group`** – Group Documents by a Field
  * Groups documents by a specified key and computes per-group values with accumulators such as `$sum` and `$avg` (a count is commonly written as `{ $sum: 1 }`).

In [None]:
{
  $group: {
    _id: "$category",
    totalSales: { $sum: "$amount" }
  }
}

4. **`$sort`** – Sort Documents
  * Sorts the documents in ascending (`1`) or descending (`-1`) order.

In [None]:
{ $sort: { createdAt: -1 } }

5. **`$limit`** – Limit Number of Documents
  * Restricts the number of documents that pass to the next stage.

In [None]:
{ $limit: 10 }

6. **`$skip`** – Skip Documents
  * Skips a specified number of documents (used with `$limit` for pagination).

In [None]:
{ $skip: 10 }

7. **`$unwind`** – Deconstruct Arrays
  * Breaks up an array field so that each element becomes a separate document.

In [None]:
{ $unwind: "$tags" }

8. **`$lookup`** – Perform Joins (Like SQL JOIN)
  * Joins documents from another collection.


In [None]:
{
  $lookup: {
    from: "orders",
    localField: "_id",
    foreignField: "userId",
    as: "userOrders"
  }
}

9. **`$addFields`** – Add or Modify Fields
  * Similar to `$project`, but **adds or overwrites the named fields while keeping every other field** in the document.

In [None]:
{ $addFields: { discount: { $multiply: ["$price", 0.1] } } }

10. **`$count`** – Count Number of Documents
  * Outputs a single document with a count of the number of documents processed.

In [None]:
{ $count: "total" }

11. **`$facet`** – Run Multiple Pipelines in Parallel
  * Allows multiple aggregations to run on the same input set, often for reporting or analytics.

In [None]:
{
  $facet: {
    byCategory: [ { $group: { _id: "$category", count: { $sum: 1 } } } ],
    topSales: [ { $sort: { amount: -1 } }, { $limit: 5 } ]
  }
}

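12. **`$merge` / `$out`** – Store Results in a Collection
  * Both write the pipeline output to a collection and must be the last stage: `$out` replaces the target collection's contents, while `$merge` can insert into or update an existing collection.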

In [None]:
{ $out: "reportResults" }

**Example: Aggregation Pipeline in Action**

In [None]:
db.orders.aggregate([
  { $match: { status: "confirmed" } },
  { $group: { _id: "$customerId", totalSpent: { $sum: "$amount" } } },
  { $sort: { totalSpent: -1 } },
  { $limit: 5 }
])

**What it does**:
Filters confirmed orders → groups by customer → sums their spend → sorts descending → returns top 5 spenders.
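For readers who think more easily in Python, the same four stages can be mirrored on an in-memory list. This is only an analogy: the `orders` sample data and the `top_spenders` function are invented here, and the real pipeline runs inside the database server.

```python
from collections import defaultdict

orders = [
    {"customerId": "c1", "status": "confirmed", "amount": 120},
    {"customerId": "c2", "status": "confirmed", "amount": 300},
    {"customerId": "c1", "status": "cancelled", "amount": 999},
    {"customerId": "c1", "status": "confirmed", "amount": 80},
    {"customerId": "c3", "status": "confirmed", "amount": 50},
]


def top_spenders(docs, limit=5):
    # $match: keep confirmed orders only
    confirmed = [d for d in docs if d["status"] == "confirmed"]
    # $group: sum the amount per customerId
    totals = defaultdict(int)
    for d in confirmed:
        totals[d["customerId"]] += d["amount"]
    grouped = [{"_id": key, "totalSpent": total} for key, total in totals.items()]
    # $sort: highest spend first
    grouped.sort(key=lambda g: g["totalSpent"], reverse=True)
    # $limit: keep the top N
    return grouped[:limit]


result = top_spenders(orders)
print(result)
```

Each stage consumes the previous stage's output, which is why filtering first keeps the later stages cheap.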

**Summary Table: Aggregation Stages**

| Stage | Purpose |
|---|---|
| `$match` | Filter documents |
| `$project` | Shape/format documents |
| `$group` | Aggregate by a key |
| `$sort` | Sort documents |
| `$limit` | Restrict number of documents |
| `$skip` | Skip documents (for pagination) |
| `$unwind` | Split arrays into multiple documents |
| `$lookup` | Join with other collections |
| `$addFields` | Add or modify fields |
| `$count` | Count the number of documents |
| `$facet` | Run multiple aggregations in parallel |
| `$merge` / `$out` | Output results to a collection |
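Two stages in the table, `$unwind` and `$lookup`, are easier to grasp with a small model of their default behaviour. The sketch below is plain Python written for illustration: `unwind` and `lookup` are our own helper names, and they ignore options such as `preserveNullAndEmptyArrays` or pipeline-style lookups.

```python
def unwind(docs, field):
    """One output document per array element; documents whose field is
    missing, null, or an empty array produce no output (the default)."""
    out = []
    for doc in docs:
        values = doc.get(field)
        if values is None:
            continue
        if not isinstance(values, list):
            values = [values]  # a non-array value behaves like a one-element array
        for element in values:
            out.append({**doc, field: element})
    return out


def lookup(docs, foreign_docs, local_field, foreign_field, as_field):
    """Left outer join: every input document is kept and gains an array
    of matching foreign documents (empty when nothing matches)."""
    out = []
    for doc in docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        out.append({**doc, as_field: matches})
    return out


users = [{"_id": 1, "tags": ["a", "b"]}, {"_id": 2, "tags": []}, {"_id": 3}]
orders = [{"userId": 1, "total": 10}, {"userId": 1, "total": 5}]

unwound = unwind(users, "tags")
joined = lookup(users, orders, "_id", "userId", "userOrders")
print(unwound)
print([(u["_id"], len(u["userOrders"])) for u in joined])
```

So `$unwind` can shrink or grow the document count, while `$lookup` never drops input documents.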

## Q 8. What is sharding in MongoDB? How does it differ from replication?
**Ans** - **Sharding** in MongoDB is a method of **horizontal scaling** used to handle **large datasets** and **high-throughput applications** by distributing data across multiple machines or clusters.

Although **sharding** and **replication** both involve multiple MongoDB servers and improve performance and reliability, they serve **different purposes**.

**Sharding** is the process of **splitting data across multiple servers** so that each server stores only a portion of the data.

**Components of a Sharded Cluster:**
1. **Shards**: The actual data storage nodes; each shard holds a subset of the data and is normally deployed as a replica set.
2. **mongos**: The **query router**, which routes queries to the appropriate shard.
3. **Config Servers**: Store metadata and configuration about the cluster.

**Why Use Sharding**
* To **scale horizontally** as data grows beyond the capacity of a single server.
* To **improve performance** by parallelizing query execution and data storage.
* To avoid performance bottlenecks and storage limits.

**How Sharding Works**
1. We define a **shard key**.
2. MongoDB partitions the data based on this key using **ranges** or **hashed values**.
3. Data is then distributed across multiple shards accordingly.
4. When we query, **mongos** determines which shards contain the relevant data.
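The routing decision in steps 2 to 4 can be approximated with a short sketch. Everything here is an assumption made for illustration: the shard names, the range boundaries, and the use of MD5 with a modulo. Real MongoDB assigns ranges of the (hashed or raw) key space to shards and rebalances them over time.

```python
import bisect
import hashlib

SHARDS = ["shardA", "shardB", "shardC"]


def hashed_shard(key_value: str) -> str:
    """Hashed strategy: hash the shard key, then map the hash to a shard.
    The modulo is a stand-in for MongoDB's hashed key ranges."""
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Ranged strategy: keys below "h" go to shardA, "h" up to (not including) "p"
# go to shardB, and everything from "p" upward goes to shardC.
RANGE_BOUNDS = ["h", "p"]


def ranged_shard(key_value: str) -> str:
    return SHARDS[bisect.bisect_right(RANGE_BOUNDS, key_value)]


for name in ("alice", "mallory", "zed"):
    print(name, ranged_shard(name), hashed_shard(name))
```

Ranged keys keep neighbouring values together, which helps range queries, while hashed keys spread neighbouring values out, which helps even write distribution.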

**Example: Sharded Collection**

If we shard a `users` collection on the key `region`, the data might be divided as:
  * Shard A: users from North and East
  * Shard B: users from South and West
  * Shard C: users from Central

**Replication**

**Replication** is the process of maintaining **identical copies** of our data across multiple MongoDB servers to ensure **high availability** and **data redundancy**.

**Features:**
  * One **primary node** handles all writes.
  * **Secondary nodes** replicate data from the primary.
  * If the primary fails, an **election** promotes a secondary to primary.

**Sharding vs Replication: Differences**

| Feature | **Sharding** | **Replication** |
|---|---|---|
| **Purpose** | **Scalability** – distribute data | **Availability & redundancy** – duplicate data |
| **Data Distribution** | Each shard holds a **subset** of data | Each replica holds a **full copy** of the data |
| **Scaling** | **Horizontal scaling** of storage and write throughput (more shards = more capacity) | Scales **reads** only; every member still stores all data and writes go to one primary |
| **Use Case** | Handling **big data**, **high throughput** apps | Ensuring **uptime**, **backup**, **failover support** |
| **Failure Handling** | A shard's data becomes unavailable if that shard fails, which is why each shard is usually a replica set | Automatic failover to a secondary if the primary fails |
| **Components** | Shards, mongos, config servers | Primary, secondaries, optional arbiter |
| **Query Routing** | Queries go through **mongos** | Writes go to the **primary**; reads go to the primary by default or to secondaries if configured |

## Q 9. What is PyMongo, and why is it used?
**Ans** - **PyMongo** is the **official Python driver for MongoDB**, developed and maintained by MongoDB Inc. It allows Python applications to **connect to, interact with, and manipulate** MongoDB databases programmatically.

**Use of PyMongo**

PyMongo provides a **simple and intuitive API** for performing all the essential MongoDB operations, such as:
* Connecting to MongoDB
* Inserting, querying, updating, and deleting documents
* Creating indexes
* Running aggregation pipelines
* Managing databases and collections

**Features of PyMongo**

1. **Easy Connection to MongoDB**
* Connect to local or remote MongoDB instances, including **MongoDB Atlas**.

In [None]:
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")

2. **CRUD Operations**

Perform **Create, Read, Update, and Delete** operations using simple Python methods.

In [None]:
# Access database and collection
db = client['mydatabase']
collection = db['users']

# Insert a document
collection.insert_one({"name": "Alice", "age": 25})

# Query documents
user = collection.find_one({"name": "Alice"})

# Update a document
collection.update_one({"name": "Alice"}, {"$set": {"age": 26}})

# Delete a document
collection.delete_one({"name": "Alice"})

3. **Support for Aggregation Pipeline**

Run powerful aggregation queries directly from Python.

In [None]:
pipeline = [
    { "$match": { "status": "active" } },
    { "$group": { "_id": "$department", "count": { "$sum": 1 } } }
]
result = collection.aggregate(pipeline)

4. **Index Management**

Create and manage indexes to optimize performance.

In [None]:
collection.create_index("email", unique=True)

5. **BSON Support**

PyMongo uses **BSON** internally, the same format used by MongoDB, enabling support for advanced data types like `ObjectId`, `datetime`, `binary`, etc.

6. **Asynchronous Support**

For async applications, we can use **Motor**, MongoDB's asyncio-based Python driver built on top of PyMongo.

**When to Use PyMongo**
* We're building a **Python application** that interacts with MongoDB.
* We need to store or retrieve **structured or semi-structured data**.
* We're using **MongoDB as our primary database** and want to leverage Python's simplicity.

## Q 10. What are the ACID properties in the context of MongoDB transactions?
**Ans** - In **MongoDB transactions**, **ACID** properties ensure that database operations are **reliable, consistent, and safe**, even in the event of failures or concurrent access. This is especially important in multi-document transactions or when working with financial, inventory, or mission-critical data.

**A — Atomicity**

**All-or-nothing execution** of a transaction.
* In MongoDB, **either all operations in a transaction succeed or none of them are applied**.
* If an error occurs, the transaction is **aborted**, and all changes are rolled back.
* Ensures **no partial updates** occur, preventing data corruption.

Example: Transferring money between two accounts — both the debit and credit must succeed together.
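The all-or-nothing behaviour in this example can be modelled without a database. The sketch below stages changes on a private copy and only publishes them if every step succeeds; the `transfer` function, the `InsufficientFunds` exception, and the balance rule are hypothetical and exist only to illustrate atomicity.

```python
import copy


class InsufficientFunds(Exception):
    pass


def transfer(accounts, source, target, amount):
    working = copy.deepcopy(accounts)    # stage changes on a private copy
    working[source]["balance"] -= amount
    if working[source]["balance"] < 0:
        # Abort: nothing has been published, so the original is untouched.
        raise InsufficientFunds(source)
    working[target]["balance"] += amount
    accounts.clear()
    accounts.update(working)             # "commit": both changes appear together


accounts = {"A": {"balance": 150}, "B": {"balance": 0}}
transfer(accounts, "A", "B", 100)        # succeeds: A=50, B=100

try:
    transfer(accounts, "A", "B", 100)    # would overdraw A, so it aborts
except InsufficientFunds:
    aborted = True
else:
    aborted = False

print(accounts, aborted)
```

After the failed second transfer the balances are exactly what the first transfer left behind, with no half-applied debit.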

**C — Consistency**

The database must remain in a **valid state before and after** a transaction.

* MongoDB enforces the rules we define, such as **schema validation** and **unique indexes**.
* If a transaction would violate data integrity or constraints, it is **aborted**.
* Ensures application logic and business rules are **always respected**.

**I — Isolation**

**Concurrent transactions** must not interfere with each other.

* MongoDB provides **snapshot isolation**:

  * Each transaction sees a **consistent snapshot of the data** at the start.
  * Other writes are **invisible** until the transaction commits.
* Prevents issues like **dirty reads**, **non-repeatable reads**, and **phantoms**.
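A toy model can show what "sees a consistent snapshot" means in practice. The `Store` and `Transaction` classes below are invented for this sketch and deliberately omit write-conflict detection, which a real database needs.

```python
import copy


class Store:
    def __init__(self, data):
        self.committed = data

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = copy.deepcopy(store.committed)  # frozen view taken at start
        self.writes = {}                                # private, uncommitted changes

    def read(self, key):
        # A transaction sees its own writes first, then its snapshot.
        return self.writes.get(key, self.snapshot.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.committed.update(self.writes)


store = Store({"stock": 10})
reader = store.begin()
writer = store.begin()

writer.write("stock", 9)
before_commit = reader.read("stock")  # writer's change is not visible
writer.commit()
after_commit = reader.read("stock")   # reader still sees its original snapshot
fresh = store.begin().read("stock")   # a new transaction sees the committed value

print(before_commit, after_commit, fresh)
```

The reader never observes the uncommitted write, and its view does not shift mid-transaction even after the writer commits.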

**D — Durability**

Once a transaction is **committed**, the changes are **permanently saved** — even in the event of a crash or power failure.

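* MongoDB records writes in an on-disk **journal** (a write-ahead log) so they can be recovered after a crash.
* Committing with write concern `w: "majority"` means the commit has been replicated to a majority of nodes and will not be rolled back after a failover.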

**ACID in MongoDB:**
* **Single-document operations** in MongoDB have always been **atomic**, even without explicit transactions.
* **Multi-document ACID transactions** were introduced in:

  * **MongoDB 4.0** for replica sets
  * **MongoDB 4.2** for sharded clusters

**Example of a Multi-Document Transaction:**

In [None]:
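# Assumes `client` is a MongoClient connected to a replica set (transactions
# are not available on standalone servers) and `accounts` is a collection.
with client.start_session() as session:
    # The transaction commits when the block exits normally and aborts on an exception
    with session.start_transaction():
        accounts.update_one({"_id": "A"}, {"$inc": {"balance": -100}}, session=session)
        accounts.update_one({"_id": "B"}, {"$inc": {"balance": 100}}, session=session)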

## Q 11. What is the purpose of MongoDB’s explain() function?
**Ans** - The purpose of MongoDB’s **`explain()`** function is to help us **analyze and optimize query performance** by showing **how MongoDB executes a query**. It provides detailed information about the query execution plan, including:
* Whether indexes are used
* How many documents were scanned
* Execution time
* Query stages involved

**When to Use `explain()`**

It is especially useful for:

* **Performance tuning**
* **Index optimization**
* Understanding **why a query is slow**
* **Comparing different query strategies**

**Syntax**

In [None]:
db.collection.find(query).explain()

We can also specify verbosity levels:

In [None]:
db.collection.find(query).explain("queryPlanner")   // Default
db.collection.find(query).explain("executionStats") // Includes run stats
db.collection.find(query).explain("allPlansExecution") // Most detailed

**Output Fields**

| Field | Description |
|---|---|
| `queryPlanner` | Shows the query plan, stages, and indexes used |
| `winningPlan` | The plan that MongoDB chose to execute the query |
| `stage` | The execution stage (e.g., `COLLSCAN`, `IXSCAN`, `FETCH`) |
| `indexName` | Name of the index used (if any) |
| `nReturned` | Number of documents returned |
| `totalDocsExamined` | Number of documents scanned in total |
| `totalKeysExamined` | Number of index entries scanned |
| `executionTimeMillis` | Time taken to execute the query, in milliseconds |

**Example: Basic Usage**

In [None]:
db.users.find({ age: 25 }).explain("executionStats")

**Output (simplified):**

In [None]:
{
  "queryPlanner": {
    "winningPlan": {
      "stage": "IXSCAN",
      "indexName": "age_1"
    }
  },
  "executionStats": {
    "nReturned": 5,
    "totalKeysExamined": 5,
    "totalDocsExamined": 5,
    "executionTimeMillis": 1
  }
}

**Interpretation**:
* MongoDB used the `age_1` index.
* It scanned 5 index keys and 5 documents to return 5 results.
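* Keys examined, documents examined, and documents returned are all equal, so no work was wasted on non-matching documents.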
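One common way to read these numbers is as a ratio of documents examined to documents returned. The helper below is a hypothetical convenience, not a PyMongo function; the `indexed` dictionary mirrors the output above, while the `unindexed` figures are invented for contrast.

```python
def docs_examined_per_result(explain_output):
    """Ratio of documents examined to documents returned (1.0 is ideal)."""
    stats = explain_output["executionStats"]
    returned = stats["nReturned"]
    if returned == 0:
        return float(stats["totalDocsExamined"])
    return stats["totalDocsExamined"] / returned


indexed = {"executionStats": {"nReturned": 5, "totalKeysExamined": 5,
                              "totalDocsExamined": 5, "executionTimeMillis": 1}}

# Made-up numbers for the same query running without an index.
unindexed = {"executionStats": {"nReturned": 5, "totalKeysExamined": 0,
                                "totalDocsExamined": 10000, "executionTimeMillis": 12}}

print(docs_examined_per_result(indexed))
print(docs_examined_per_result(unindexed))
```

A ratio far above 1 suggests the query is reading many documents it ends up discarding, which is usually a hint that an index is missing or not selective enough.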

**Common Stages We'll See**

| Stage | Meaning |
|---|---|
| `COLLSCAN` | Collection scan — no index used (slow) |
| `IXSCAN` | Index scan — index was used (fast) |
| `FETCH` | Retrieved documents after index match |
| `SORT` | Sorting stage (can be slow if no index) |
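In full `explain()` output these stages are nested rather than listed flat. The sketch below walks such a tree, assuming the classic plan layout in which child stages sit under `inputStage` (or `inputStages` for stages with several children). The helper names and the sample plans are ours.

```python
def plan_stages(plan):
    """Return stage names from the outermost stage down to the leaves."""
    stages = [plan["stage"]]
    children = []
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    children.extend(plan.get("inputStages", []))
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_collection_scan(plan):
    return "COLLSCAN" in plan_stages(plan)


# Hypothetical winning plans shaped like the ones explain() reports.
indexed_plan = {"stage": "FETCH",
                "inputStage": {"stage": "IXSCAN", "indexName": "age_1"}}
unindexed_plan = {"stage": "SORT", "inputStage": {"stage": "COLLSCAN"}}

print(plan_stages(indexed_plan), uses_collection_scan(indexed_plan))
print(plan_stages(unindexed_plan), uses_collection_scan(unindexed_plan))
```

A check like `uses_collection_scan` could be run against `explain()["queryPlanner"]["winningPlan"]` in a test suite to flag queries that silently lose their index.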

**Best Practices Using `explain()`**
* Treat a `COLLSCAN` on a large collection as a red flag and consider creating an index.
* Use `executionStats` to measure **real performance impact**.
* Regularly run `explain()` on frequently-used or slow queries.

## Q 12. How does MongoDB handle schema validation?
**Ans** - MongoDB is traditionally known as a **schema-less** database, but it **does support schema validation** to enforce **data integrity** when needed. This allows developers to define **rules** for the structure and contents of documents within a collection.

Schema validation in MongoDB allows us to **define rules** using **JSON Schema** syntax through the `$jsonSchema` operator. These rules can restrict:
* Required fields
* Data types
* Field value ranges or patterns
* Allowed enum values
* Field structures

MongoDB enforces validation **when documents are inserted or updated**.

**Defining Schema Validation**

We can add schema validation using the `validator` option when creating or modifying a collection.

**Example: Require `name` and `age` fields**

In [None]:
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        age: {
          bsonType: "int",
          minimum: 18,
          description: "must be an integer >= 18 and is required"
        }
      }
    }
  }
})
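To see which documents the validator above would accept, here is a plain-Python approximation of the same three rules (required fields, string `name`, integer `age` of at least 18). The function `validate_user` and its error messages are our own; the server performs the real check, and this sketch does not cover the rest of JSON Schema.

```python
def validate_user(doc):
    """Return a list of rule violations; an empty list means the document passes."""
    errors = []
    for field in ("name", "age"):
        if field not in doc:
            errors.append(f"missing required field: {field}")
    if "name" in doc and not isinstance(doc["name"], str):
        errors.append("name must be a string")
    if "age" in doc:
        age = doc["age"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            errors.append("age must be an integer")
        elif age < 18:
            errors.append("age must be >= 18")
    return errors


print(validate_user({"name": "Alice", "age": 25}))
print(validate_user({"name": "Bob"}))
print(validate_user({"name": "Eve", "age": 17}))
```

Extra fields are still allowed, since the schema only constrains the fields it names.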

**Validator Levels**

We can control how strictly MongoDB enforces the rules with `validationLevel` and `validationAction`.

| Option | Values | Description |
|---|---|---|
| `validationLevel` | `"strict"` (default), `"moderate"` | `strict` validates all inserts and updates; `moderate` validates inserts, and updates only to documents that already satisfy the rules |
| `validationAction` | `"error"` (default), `"warn"` | Whether to **reject** invalid documents or accept them and **log a warning** |

**Example:**

In [None]:
db.runCommand({
  collMod: "users",
  validator: { /* schema here */ },
  validationLevel: "moderate",
  validationAction: "warn"
})
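The interaction between `validationLevel` and `validationAction` can be summarised as a small decision function. This is a sketch of the documented behaviour as we understand it, with a made-up function name and return values, not server code.

```python
def write_outcome(is_valid, is_insert, was_valid_before,
                  level="strict", action="error"):
    """Decide what happens to a write under the given validation settings."""
    # strict: every insert and update is checked.
    # moderate: inserts are checked; updates are checked only if the
    # existing document already satisfied the rules.
    checked = level == "strict" or is_insert or was_valid_before
    if not checked or is_valid:
        return "accept"
    return "reject" if action == "error" else "accept_with_warning"


# Invalid insert under the defaults is rejected.
print(write_outcome(False, True, False))
# Invalid update to an already-invalid document passes under "moderate".
print(write_outcome(False, False, False, level="moderate"))
# With "warn", the invalid insert is accepted but logged.
print(write_outcome(False, True, False, action="warn"))
```

The `moderate` level is mainly useful when rules are added to a collection that already contains documents that do not conform.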

**Benefits of Schema Validation**

* **Data integrity**: Prevents bad or malformed data from entering the database.
* **Consistency**: Ensures all documents follow a predictable structure.
* **Flexibility**: Unlike SQL, we can still keep optional fields or allow loose validation.
* **Documentation**: The schema itself acts as documentation for developers.

**Caveats and Considerations**

* Adding or changing validation rules does **not** check existing documents; they are validated only when they are next modified.
* Too strict validation may cause insert/update failures if not properly planned.
* Avoid overly complex schemas that could affect performance.

## Q 13. What is the difference between a primary and a secondary node in a replica set?
**Ans** - In a **MongoDB replica set**, the **primary** and **secondary** nodes serve distinct but complementary roles to ensure **high availability**, **fault tolerance**, and **data redundancy**.

**Key Differences Between Primary and Secondary Nodes**

| Feature | **Primary Node** | **Secondary Node(s)** |
|-|||
| **Role** | Main node that handles **all write operations** | Backup node(s) that **replicate data** from primary |
| **Write Capability** | Accepts **write and read** operations | Accepts **read** operations (if configured), no writes |
| **Election Role** | Holds the primary role until it steps down or becomes unavailable | Eligible members can be promoted to primary via election |
| **Data Source** | Original source of data changes | Pulls and applies changes from primary’s **oplog** |
| **Failure Handling** | If fails, a new primary is elected | One may become the new primary |
| **Read Preference** | Clients read from primary by default | Can be used for reads if `readPreference` is set |

1. **Primary Node**
* The **only node** in the replica set that receives **write operations**.
* Maintains an **oplog** to record all changes.
* Secondaries **replicate** this oplog to stay in sync.

**Example:**

In [None]:
db.collection.insertOne({ name: "Alice" })  // Sent to primary

2. **Secondary Node(s)**

* **Replica(s)** of the primary: they maintain **copies** of the primary's data, which can briefly lag behind because replication is asynchronous.
* Continuously **sync with the primary's oplog**.
* Can be configured to accept **read operations** for **load balancing** or **geographic distribution**.

**Example Read from Secondary:**

In [None]:
db.getMongo().setReadPref("secondary")
db.collection.find({ name: "Alice" })

**Failover and Elections**
* If the **primary fails**, the replica set **automatically elects a new primary** from the eligible secondaries.
* This ensures **high availability** with **minimal downtime**.
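
An election succeeds only when a candidate gathers votes from a majority of the voting members. The arithmetic can be sketched in a few lines of Python. The function names are illustrative, and the model deliberately ignores priorities, arbiters, and member eligibility.

```python
def majority_needed(voting_members):
    """Votes required to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_members):
    """True if the members that can still see each other form a majority."""
    return reachable_members >= majority_needed(voting_members)


for n in (3, 4, 5):
    tolerated = n - majority_needed(n)
    print(f"{n} voting members: majority={majority_needed(n)}, tolerated failures={tolerated}")

# A 3-member set that loses one node can still elect a primary...
print(can_elect_primary(3, 2))
# ...but losing two leaves the survivor unable to win an election.
print(can_elect_primary(3, 1))
```

The loop shows why adding a fourth member does not tolerate more failures than three: both configurations survive the loss of only one node.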

##Q 14. What security mechanisms does MongoDB provide for data protection?
**Ans**- MongoDB provides several robust **security mechanisms** to protect our data at rest, in transit, and during access. These mechanisms ensure **confidentiality**, **integrity**, **authentication**, and **authorization** for secure data management.

**MongoDB Security Mechanisms**
1. **Authentication**

Ensures that only **verified users or applications** can access the database.
* **SCRAM**: Salted Challenge Response Authentication Mechanism.
* **x.509 certificates**: For internal cluster node authentication and external clients.
* **LDAP**: Integrate with enterprise directory services.
* **Kerberos**: For enterprise-level single sign-on.
* **AWS IAM**: Role-based access via cloud identity.

**Example:**

In [None]:
mongod --auth

2. **Authorization**

Controls **what authenticated users can do** using **role-based access control**.
* Built-in roles: `read`, `readWrite`, `dbAdmin`, etc.
* Custom roles: Define fine-grained permissions.
* Permissions can be scoped to:

  * Specific databases
  * Collections
  * Actions (e.g., `find`, `insert`, `update`, `createIndex`, `dropCollection`)

**Example:**

In [None]:
db.createUser({
  user: "analyst",
  pwd: "securePass123",
  roles: [ { role: "read", db: "salesDB" } ]
})

3. **Encryption**

* At Rest
  * **MongoDB Enterprise**: Native **Encrypted Storage Engine** encrypts all data at rest.
  * Use **FIPS 140-2 validated encryption** standards.
  * Option to integrate with **Key Management Systems**.

* In Transit
  * TLS/SSL encryption for all client-server and inter-node communications.
  * Prevents data sniffing, tampering, and man-in-the-middle attacks.

* Enable with:

In [None]:
mongod --tlsMode requireTLS --tlsCertificateKeyFile cert.pem

4. **Auditing**
  * Available in **MongoDB Enterprise**.
  * Logs access and operations for compliance.
  * Tracks who did what, when, and how.

5. **IP Whitelisting**
  * Allows access only from **trusted IP addresses**.
  * Prevents unauthorized network-level access.

6. **Field-Level Redaction & Client-Side Encryption**
  * **Field-Level Encryption**: Encrypt individual fields at the application level.
  * **Only client apps** with keys can decrypt sensitive fields.
  * Even DB admins can't view encrypted data.

**FLE Use Cases:**
  * Healthcare records
  * Financial transactions
  * Personally identifiable information

7. **Security Best Practices**
  * Disable direct access to ports using **firewalls**.
  * Run MongoDB with **authentication enabled**.
  * Avoid using the `admin` database for app users.
  * Keep MongoDB **up to date** with security patches.
  * Monitor access with **MongoDB Atlas monitoring tools** or third-party integrations.

##Q 15. Explain the concept of embedded documents and when they should be used?
**Ans** - In **MongoDB**, an **embedded document** is a document **nested inside another document** as a field value. This is one of the core features of MongoDB’s **document-oriented** data model, allowing related data to be stored together in a **single document** rather than in separate collections.

An embedded document is a **subdocument** contained within a parent document. MongoDB supports documents with complex, hierarchical data structures using **BSON**.

**Example:**

In [None]:
{
  "_id": 1,
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "Mumbai",
    "zip": "400001"
  }
}

Here, the `address` field is an **embedded document**.
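
Fields inside an embedded document are addressed with dot notation in queries, for example `{"address.city": "Mumbai"}`. The sketch below mimics that lookup on a plain Python dict so the idea can be tried without a server. `get_path` is a hypothetical helper, and it handles nested documents only, not arrays.

```python
def get_path(document, dotted_path):
    """Follow a dot-separated path through nested dicts; return None if any part is missing."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


user = {
    "_id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "address": {"street": "123 Main St", "city": "Mumbai", "zip": "400001"},
}

print(get_path(user, "address.city"))     # value of the embedded field
print(get_path(user, "address.country"))  # missing field

# Equivalent PyMongo filter (needs a live collection):
# collection.find({"address.city": "Mumbai"})
```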

**Advantages of Embedded Documents**

| Benefit | Description |
|---|---|
| **Data locality** | Related data is stored together for faster reads |
| **Fewer joins or lookups** | Reduces the need for expensive `$lookup` queries (like SQL joins) |
| **Atomic operations** | Updates to a document and its embedded fields are atomic |
| **Simpler application logic** | Easier to retrieve complete records in one query |

**When to Use Embedded Documents**

We should **embed** when:

1. **One-to-One Relationships**
  * Example: A user with a single profile or settings document.

2. **One-to-Few Relationships**
  * Example: A blog post with a few comments (e.g., fewer than 10).

3. **Data that is always accessed together**
  * Improves performance by retrieving all needed data in a single read.

4. **Data that doesn’t grow unbounded**
  * MongoDB has a **16MB document size limit**.

**When NOT to Use Embedded Documents**

Avoid embedding when:

1. **Data grows indefinitely**
  * E.g., user comments, order history, or activity logs with unbounded growth.

2. **Data is accessed separately**
  * If subdocuments are queried independently, separate collections might be better.

3. **Many-to-Many relationships**
  * Embedding complex, multi-linked data structures leads to data duplication or inconsistencies.

4. **Large or deeply nested structures**
  * MongoDB limits:

     * Document size: **16MB**
     * Nesting depth: **100 levels**

**Embedded vs Referenced Documents**

| Feature | **Embedded Documents** | **Referenced Documents** |
|---|---|---|
| Storage | Single document | Separate documents with references (ObjectId) |
| Read performance | Faster (no join needed) | May require `$lookup` or multiple queries |
| Write/update | Atomic and simpler | May involve multiple documents/transactions |
| Relationship type | One-to-one or one-to-few | One-to-many or many-to-many |

**Example Use Case: Order with Items**

**Embedded:**

In [None]:
{
  "_id": 1001,
  "customer": "Vivek",
  "items": [
    { "product": "Laptop", "qty": 1 },
    { "product": "Mouse", "qty": 2 }
  ]
}

**Referenced (if items are many or reused):**

In [None]:
{
  "_id": 1001,
  "customer": "Vivek",
  "item_ids": [ ObjectId("..."), ObjectId("...") ]
}
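
With references, the application (or a `$lookup`) has to resolve the ids in a second step. A minimal in-memory sketch, using made-up integer ids and an `items_by_id` dict as a stand-in for a separate `items` collection:

```python
# Stand-in for an `items` collection, keyed by _id (ids and fields are made up).
items_by_id = {
    501: {"_id": 501, "product": "Laptop"},
    502: {"_id": 502, "product": "Mouse"},
}

# The last id deliberately points at a document that does not exist.
order = {"_id": 1001, "customer": "Vivek", "item_ids": [501, 502, 999]}


def resolve_items(order_doc, lookup):
    """Return the referenced documents, skipping dangling references."""
    return [lookup[item_id] for item_id in order_doc["item_ids"] if item_id in lookup]


resolved = resolve_items(order, items_by_id)
print([item["product"] for item in resolved])
```

The dangling id `999` is a reminder that MongoDB does not enforce foreign-key constraints, so referenced designs need this kind of handling in application code.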

##Q 16. What is the purpose of MongoDB’s $lookup stage in aggregation?
**Ans** - The **`$lookup`** stage in MongoDB’s **aggregation pipeline** is used to **perform a left outer join** between documents in one collection and documents in another collection. It allows us to **combine data from multiple collections** in a way that’s similar to the `JOIN` operation in SQL.

**Purpose of `$lookup`**
* To **enrich documents** by embedding related data from a different collection.
* To **simulate JOINs** in MongoDB, which is otherwise non-relational.
* Useful in scenarios like:

  * Merging `orders` with customer details.
  * Showing products with their supplier information.
  * Displaying users with their roles or permissions.

**Syntax of `$lookup`**

In [None]:
{
  $lookup: {
    from: "otherCollection",           // The collection to join with
    localField: "localFieldName",      // Field in the current collection
    foreignField: "foreignFieldName",  // Field in the other collection
    as: "outputArrayField"             // Name of the new array field with matched documents
  }
}

**Example**

Assume two collections:

**orders**

In [None]:
{
  "_id": 1,
  "product": "Laptop",
  "customerId": 101
}

**customers**

In [None]:
{
  "_id": 101,
  "name": "Vivek",
  "email": "vivek@example.com"
}

**Aggregation with `$lookup`:**

In [None]:
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDetails"
    }
  }
])

**Result:**

In [None]:
{
  "_id": 1,
  "product": "Laptop",
  "customerId": 101,
  "customerDetails": [
    {
      "_id": 101,
      "name": "Vivek",
      "email": "vivek@example.com"
    }
  ]
}
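
The left-outer-join behaviour can be emulated on plain Python lists, which makes the "array output" and "unmatched documents are kept" rules easy to see. This is a simplified sketch of the equality form of `$lookup` only; the function name `lookup` and the second order are made up for illustration.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate the equality form of $lookup: a left outer join that outputs an array."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        # Every input document is kept; unmatched ones get an empty array.
        joined.append({**doc, as_field: matches})
    return joined


orders = [
    {"_id": 1, "product": "Laptop", "customerId": 101},
    {"_id": 2, "product": "Phone", "customerId": 999},  # no such customer
]
customers = [{"_id": 101, "name": "Vivek", "email": "vivek@example.com"}]

for row in lookup(orders, customers, "customerId", "_id", "customerDetails"):
    print(row["_id"], row["customerDetails"])
```

Order 2 still appears in the output with an empty `customerDetails` array, which is what distinguishes a left outer join from an inner join.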

**Advanced `$lookup` with Pipeline (MongoDB 3.6+)**

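Allows more control over matching and transformation during the join. The stage below is meant to run on the `customers` collection: it binds each customer's `_id` to the variable `user_id` and collects only the `product` field of that customer's orders.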

In [None]:
{
  $lookup: {
    from: "orders",
    let: { user_id: "$_id" },
    pipeline: [
      { $match: { $expr: { $eq: ["$customerId", "$$user_id"] } } },
      { $project: { product: 1, _id: 0 } }
    ],
    as: "orders"
  }
}

**Characteristics**

| Feature | Description |
|---|---|
| Join Type | Left outer join only (not inner or full joins) |
| Output Format | Matched documents appear as an **array** (`as` field) |
| Performance Impact | Can be heavy if joining large collections |
| Supports Pipelines | Yes, advanced syntax allows use of pipelines inside `$lookup` |
| Sharded Support | Available with restrictions (the `from` collection can be sharded starting in MongoDB 5.1) |

**Best Use Cases**

* One-to-one or one-to-few relationships between collections.
* When we need to enrich or denormalize data for reporting or display.
* Reducing the need for multiple client-side queries.

**Performance Tips**

* Create an **index** on the `foreignField` in the joined collection for faster joins.
* Avoid joining very large datasets unless necessary.
* Use `$project` after `$lookup` to trim down large documents.

##Q 17. What are some common use cases for MongoDB?
**Ans**- MongoDB is a highly flexible, scalable, and schema-less NoSQL database that supports a wide range of real-world applications. Its document-oriented data model, rich query capabilities, and ability to scale horizontally make it ideal for many modern use cases.

**Common Use Cases for MongoDB**
1. **Content Management Systems**

MongoDB excels in CMS platforms due to its flexible schema and support for diverse content types.
* Blogs, news sites, product catalogs
* Easily store metadata, user-generated content, media references
* Example: Storing articles with tags, author info, embedded comments

2. **Real-Time Analytics**

MongoDB can store, process, and analyze large volumes of semi-structured data in real-time.
* Tracking user behavior, clickstream data
* IoT sensor data aggregation
* Mobile app usage patterns

3. **E-commerce Platforms**

MongoDB is great for handling complex and rapidly changing product catalogs and orders.
* Product details with variable attributes
* Customer profiles and wishlists
* Orders with embedded items and payment history

Its flexible data model allows adding new product fields without schema migrations.

4. **Mobile and Web Applications**

MongoDB’s JSON-like document structure pairs naturally with frontend frameworks and REST APIs.
* Offline-first apps with local data sync
* Session storage and user preferences
* Fast prototyping with dynamic data structures

5. **Internet of Things**

IoT devices generate high-velocity and high-volume data, which MongoDB handles well.

* Time-series data from devices
* Geo-location tracking
* Device configurations and logs

Integration with time-series collections (MongoDB 5.0+) further improves performance.

6. **Catalogs and Inventory Management**

MongoDB's document model supports dynamic product schemas, variant handling, and hierarchical categories.
* Store complex objects with pricing, stock, and supplier details
* Track changes and restocks efficiently

7. **Gaming Applications**

MongoDB is used to store player profiles, achievements, scores, and game state in a flexible way.
* Leaderboards and multiplayer state tracking
* Inventory systems with item data
* Live chat systems

8. **Customer Data Platforms & CRMs**

Easily manage customer 360° views with various attributes (purchases, interactions, support logs).
* Single view of customer across channels
* User segmentation and personalization

9. **Log Management**

MongoDB is frequently used as a log storage engine for applications and servers.
* Centralized logging for microservices
* Structured log storage and querying
* Integrated with ELK or custom analytics tools

10. **Data Lake and Archival Systems**

MongoDB can serve as a lightweight data lake or archive solution for JSON or semi-structured data.
* Ingest from multiple sources (APIs, events)
* Long-term storage of unstructured documents

##Q 18. What are the advantages of using MongoDB for horizontal scaling?
**Ans**- Using **MongoDB** for **horizontal scaling** offers several advantages, especially for applications that deal with **large datasets**, require **high availability**, and experience **variable or unpredictable traffic patterns**. Horizontal scaling in MongoDB is primarily achieved through **sharding**.

**Horizontal scaling** means **adding more servers** to distribute the database load, as opposed to vertical scaling, which means upgrading the existing server’s hardware.

In MongoDB, this is done using **sharding** — partitioning the data across multiple servers or clusters.

**Advantages of Using MongoDB for Horizontal Scaling**
1. **Handles Massive Data Volumes**

* MongoDB can store **billions of documents** across multiple servers.
* As our data grows, we simply **add more shards** — no need to re-architect the application.

2. **Improved Read and Write Performance**

* Distributes read and write loads across shards.
* **Parallel query execution** boosts performance for large datasets.
* Helps maintain low latency even under high traffic.

3. **Automatic Data Distribution**

* MongoDB’s sharded cluster automatically balances data based on the **shard key**.
* When data is uneven, it automatically **rebalances chunks** to keep load distribution optimal.

4. **Elastic Scalability**

* Scale out **on demand** — add or remove shards without taking the system offline.
* Supports **manual scaling** on self-managed clusters and, on MongoDB Atlas, **auto-scaling** of cluster resources.

5. **Cost-Effective Scaling**

* Rather than investing in expensive high-performance machines, we can use **commodity hardware**.
* Lower total cost of ownership, especially for startups or fast-growing apps.

6. **Fault Isolation and Availability**

* If one shard goes down, **only part of the data is affected**, not the entire database.
* With **replica sets on each shard**, MongoDB ensures **high availability and failover support**.

7. **Geographical Distribution**

* We can shard based on **region**, placing data physically closer to users.
* Reduces latency for global applications and complies with **data locality regulations**.

8. **Custom Shard Keys for Optimization**

* We can choose a **shard key** that aligns with our query pattern.
* This helps distribute load effectively and avoids **hotspots**.
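
To see why the shard key matters, the toy model below routes 1,000 new, monotonically increasing order ids to four shards in two ways: by a hash of the id and by contiguous id ranges. It is only an illustration. MongoDB's real hashed sharding uses its own hash function and chunk ranges, and the function names, shard count, and range size here are assumptions.

```python
import hashlib
from collections import Counter


def pick_shard(shard_key_value, shard_count):
    """Map a shard key value to a shard index via a stable hash (toy model)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def range_shard(order_id, shard_count, ids_per_shard):
    """Toy range partitioning: contiguous id ranges map to the same shard."""
    return min(order_id // ids_per_shard, shard_count - 1)


new_ids = range(9000, 10000)  # the 1,000 most recent, monotonically increasing ids

hashed = Counter(pick_shard(i, 4) for i in new_ids)
ranged = Counter(range_shard(i, 4, 2500) for i in new_ids)

print("hashed:", dict(sorted(hashed.items())))
print("ranged:", dict(sorted(ranged.items())))
```

With range partitioning every new id lands on the last shard (a hotspot), while hashing spreads the same writes across all four.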

**Real-World Scenarios Benefiting from Horizontal Scaling**

| Scenario | Why Horizontal Scaling Helps |
|---|---|
| E-commerce site with 10M+ users | Handles concurrent checkouts and product views |
| IoT platform with billions of events | Efficient ingestion and querying of time-series data |
| Social media app with global users | Region-based sharding and latency reduction |
| Log aggregation system | Continuous write-heavy workloads handled smoothly |

##Q 19. How do MongoDB transactions differ from SQL transactions?
**Ans** - MongoDB and SQL databases both support **transactions** to ensure data consistency and integrity, but they differ significantly in **scope**, **design philosophy**, and **default behavior** due to the **NoSQL vs SQL** data model.

Here’s a detailed comparison:

1. **Data Model**

| Feature | **MongoDB** | **SQL (Relational DBs)** |
|---|---|---|
| Model | Document-oriented (JSON/BSON) | Table-based (rows and columns) |
| Structure | Nested/embedded documents | Strictly normalized relational schema |
| Transaction Use | Often avoidable due to embedded documents | Required for cross-table integrity |

**MongoDB's flexible schema often reduces the need for multi-document transactions**, unlike SQL where normalization often forces multiple tables and thus requires transactions.

2. **Transaction Scope**

| Scope | **MongoDB** | **SQL** |
|---|---|---|
| Single-document | Always atomic | A single statement is atomic; multi-statement changes need an explicit transaction |
| Multi-document  | Supported since v4.0 (replica sets), v4.2 (sharded clusters) | Native support for multi-table transactions |

**SQL supports multi-table transactions by default**.
MongoDB supports **multi-document ACID transactions**, but they’re more recent and **less efficient** for high-volume workloads compared to single-document ops.

3. **ACID Properties**

| Property | MongoDB | SQL |
|---|---|---|
| Atomicity | Always atomic at document level | Always atomic in transactions |
| Consistency | Enforced via validation rules | Enforced via schema & constraints |
| Isolation | Snapshot isolation for transactions | Configurable isolation levels (e.g., read committed through serializable) |
| Durability  | Journaling + write concern | Transaction logs (WAL) |

Both systems **support full ACID transactions**, but **SQL has more mature isolation levels** with fine-tuning options.

4. **Write Behavior and Performance**

| Behavior | **MongoDB** | **SQL** |
|---|---|---|
| Write overhead | Lightweight unless using transactions    | Depends on transaction complexity       |
| Best for       | High-throughput, denormalized workloads  | Highly structured, relational workloads |
| Default write  | No transaction unless explicitly started | Often batched inside a transaction      |

In MongoDB, **most operations are outside transactions by default** unless explicitly defined with `startTransaction()` in a session.

5. **Usage Syntax Comparison**

**SQL:**

In [None]:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

**MongoDB (PyMongo):**

In [None]:
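from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")  # must be a replica set or sharded cluster
collection = client["bank"]["accounts"]  # placeholder database/collection names
with client.start_session() as session:
    with session.start_transaction():  # commits on normal exit, aborts if an exception is raised
        collection.update_one({"_id": 1}, {"$inc": {"balance": -100}}, session=session)
        collection.update_one({"_id": 2}, {"$inc": {"balance": 100}}, session=session)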

6. **Sharding and Transactions**

| Feature | **MongoDB** | **SQL (e.g., PostgreSQL, MySQL)** |
|---|---|---|
| Sharded cluster | Multi-document transactions supported from v4.2+ | Typically no built-in sharding |
| Distributed TXs | More complex, slower across shards | Requires third-party tools or clusters |

MongoDB supports **distributed transactions**, but they are **less performant** than single-shard or single-document operations.

##Q 20. What are the main differences between capped collections and regular collections?
**Ans** - **Capped collections** and **regular collections** in MongoDB serve different purposes, especially in terms of **data lifecycle**, **performance**, and **storage behavior**. Here is a comparison of the main differences:

1. **Definition**

| Collection Type | Description |
|---|---|
| **Regular Collection** | A standard MongoDB collection that stores documents with no size or order constraints. |
| **Capped Collection** | A fixed-size collection that **automatically overwrites the oldest documents** when it reaches its size limit. |

2. **Storage Behavior**

| Feature | **Regular Collection** | **Capped Collection** |
|---|---|---|
| **Size Limit** | Unlimited (depends on disk size) | Fixed-size (defined at creation) |
| **Document Overwrite** | No automatic deletion or overwrite | Oldest documents are overwritten when full |
| **Insert Order** | No guaranteed order | Maintains **insertion order** |
| **Deletion** | You can delete any document | **Explicit deletions not allowed** (except via overwrite) |

3. **Use Cases**

| Use Case | Capped Collection | Regular Collection |
|---|---|---|
| **Logging or event streaming** | Ideal — keeps latest logs | Not efficient |
| **Real-time dashboards** | Efficient with fixed data window | Works, but less efficient |
| **User profiles, orders, products** | Not suitable | Perfect use case |
| **Data archiving** | Not suitable | Recommended |

4. **Performance**

| Feature | **Capped Collection** | **Regular Collection** |
|---|---|---|
| **Write Performance** | High-throughput inserts (documents are stored in insertion order, with only the `_id` index to maintain by default) | Fast but depends on indexes |
| **Query Performance** | Optimized for **circular buffer-like reads** | Flexible but may require indexes |
| **Update Limitations** | Cannot change document size after insert | Can modify document freely |

5. **Creation Example**

Regular Collection:

In [None]:
db.createCollection("users")

Capped Collection:

In [None]:
db.createCollection("logs", {
  capped: true,
  size: 5242880,   // Size in bytes (e.g., 5MB)
  max: 5000        // Optional: Max number of documents
})
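
The "overwrite the oldest" behaviour is the same idea as a fixed-length ring buffer. As a rough analogy only, Python's `collections.deque` with `maxlen` behaves the same way for a document-count limit. The byte-based `size` limit is not modelled here, and the log entries are made up.

```python
from collections import deque

# maxlen plays the role of the `max` option; the byte-based `size` limit is not modelled.
capped_logs = deque(maxlen=3)

for n in range(1, 6):
    capped_logs.append({"seq": n, "msg": f"event {n}"})

# Only the three most recent entries remain, still in insertion order.
print([entry["seq"] for entry in capped_logs])

# The PyMongo call for the real collection (needs a live server):
# db.create_collection("logs", capped=True, size=5242880, max=5000)
```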

6. **Feature Limitations of Capped Collections**

| Limitation | Description |
|---|---|
| No document deletion | Cannot delete specific documents manually |
| No document resizing | Cannot update documents to increase their size |
| Minimal default indexing | Only the `_id` index is created by default |
| No sharding | Capped collections cannot be sharded |

##Q 21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
**Ans** - The **`$match`** stage in MongoDB’s **aggregation pipeline** is used to **filter documents** based on specified criteria—**similar to the `find()` query filter**. It allows us to include **only the documents that match a condition** for further processing in the pipeline.

**Purpose of `$match`**

* Filters input documents and **passes only the matching documents** to the next pipeline stage.
* Helps **reduce the amount of data processed** in later stages, improving **efficiency and performance**.
* Often placed **early in the pipeline** to limit the scope of processing.

**Syntax**

In [None]:
{ $match: <query> }

The `<query>` uses standard MongoDB query operators.

**Example: Filter users older than 30**

In [None]:
db.users.aggregate([
  {
    $match: { age: { $gt: 30 } }
  }
])

This filters and passes only documents where `age > 30` to the next stage.

**Example: Combine with other pipeline stages**

In [None]:
db.orders.aggregate([
  { $match: { status: "delivered" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } }
])

* `$match` filters only delivered orders.
* `$group` calculates the total amount spent by each customer.
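
What the two stages do can be reproduced on a small in-memory list, which is a handy way to check our expectations before running the pipeline. The sample orders below are made up, and the code only emulates the stages; it is not how the server executes them.

```python
from collections import defaultdict

orders = [
    {"customerId": "C1", "status": "delivered", "amount": 120},
    {"customerId": "C1", "status": "pending", "amount": 80},
    {"customerId": "C2", "status": "delivered", "amount": 50},
    {"customerId": "C1", "status": "delivered", "amount": 30},
]

# $match: keep only delivered orders.
delivered = [o for o in orders if o["status"] == "delivered"]

# $group: sum `amount` per customerId.
totals = defaultdict(int)
for o in delivered:
    totals[o["customerId"]] += o["amount"]

result = [{"_id": cid, "total": total} for cid, total in totals.items()]
print(result)
```

Because the filter runs first, the grouping loop never sees the pending order, which is the same saving `$match` gives the later pipeline stages.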

**Benefits of Using `$match`**

| Benefit | Description |
|---|---|
| **Improved performance** | Reduces documents early in the pipeline |
| **Index utilization** | `$match` at the beginning can use indexes |
| **Flexible filtering** | Supports complex conditions (`$and`, `$or`, `$in`, etc.) |
| **Optimized aggregation** | Filters data before grouping, projecting, or sorting |

**Complex Query Example**

In [None]:
db.products.aggregate([
  {
    $match: {
      category: "Electronics",
      price: { $gte: 5000, $lte: 50000 },
      inStock: true
    }
  },
  { $sort: { price: -1 } }
])

This filters for:

* Electronics category
* Price between ₹5000 and ₹50000
* In-stock items only

It then sorts the filtered results in descending order by price.

**Best Practices**

* Place `$match` as **early as possible** in the pipeline.
* Combine multiple `$match` conditions into one stage to reduce overhead.
* Use `$match` instead of `$redact` when simple filtering is enough.

##Q 22. How can you secure access to a MongoDB database?
**Ans** - Securing access to a MongoDB database is **critical** for protecting sensitive data and preventing unauthorized access. MongoDB offers a comprehensive set of **security features** that can be combined to build a robust defense.

We can secure MongoDB step by step:

1. **Enable Authentication**

By default, MongoDB does **not require authentication**. Always enable it in production.

* Start MongoDB with `--auth` or set `security.authorization: enabled` in `mongod.conf`.
* Create an **admin user** to manage the database.

In [None]:
use admin
db.createUser({
  user: "admin",
  pwd: "strongPassword123",
  roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})

2. **Use Role-Based Access Control**

Assign users **only the permissions they need** using predefined or custom roles.

* Example:

In [None]:
db.createUser({
  user: "appUser",
  pwd: "secureAppPwd",
  roles: [ { role: "readWrite", db: "myAppDB" } ]
})
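
Once authentication is on, the application has to present these credentials when it connects. Usernames and passwords that contain characters such as `@`, `:`, or `/` must be percent-escaped in the connection string. A small sketch follows; the helper `build_uri`, the sample password, and the assumption that the user was created in `myAppDB` are all illustrative.

```python
from urllib.parse import quote_plus


def build_uri(user, password, host="localhost", port=27017, auth_db="admin"):
    """Build a mongodb:// URI, percent-escaping credentials that contain reserved characters."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/?authSource={auth_db}"
    )


# Made-up password containing reserved characters.
uri = build_uri("appUser", "p@ss:w/rd", auth_db="myAppDB")
print(uri)

# With PyMongo installed and a server running: MongoClient(uri)
```

In a real deployment the password would normally come from an environment variable or a secrets manager rather than from source code.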

3. **Enable TLS/SSL for Encrypted Connections**

Encrypt all client-server and inter-node traffic using **TLS/SSL** to prevent man-in-the-middle attacks.

How:

* Generate TLS certificates.
* Start MongoDB with:

In [None]:
mongod --tlsMode requireTLS --tlsCertificateKeyFile /path/cert.pem

4. **Use Strong Passwords and Keyfiles**

* Enforce strong passwords for all users.
* For replica sets, use **keyfiles** to authenticate communication between nodes.

* Keyfile setup:

In [None]:
mongod --keyFile /path/to/keyfile

5. **Network Access Control**

Limit who can connect to our MongoDB instance.

**Recommendations:**

* Bind MongoDB to `localhost` unless remote access is needed:

In [None]:
net:
  bindIp: 127.0.0.1

* Use firewalls or cloud security groups to allow access **only from trusted IPs**.

6. **IP Whitelisting**

If you're using **MongoDB Atlas**, configure a list of **trusted IP addresses** or CIDR ranges that are allowed to connect.

7. **Enable Auditing**

Track user activity for compliance (e.g., GDPR, HIPAA, PCI).

* Logs actions like authentication, CRUD operations, schema changes.
* Useful for **forensics and accountability**.

8. **Use Field-Level Encryption**

Encrypt sensitive fields **on the client side** before sending data to the database.

* Data remains encrypted even to DBAs or sysadmins.
* Decryption requires **client-side keys**.

9. **Disable Unused Interfaces**

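Disable unused network interfaces such as the HTTP status interface and REST API. These were deprecated in MongoDB 3.2 and removed in 3.6, so the setting below applies only to older deployments.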

In [None]:
net:
  http:
    enabled: false

10. **Keep MongoDB Up-to-Date**

Always use the **latest stable release** to patch known vulnerabilities and bugs.

11. **Monitor & Alert**

* Use MongoDB Atlas monitoring or external tools (like Prometheus + Grafana).
* Set up alerts for suspicious activity, slow queries, or connection anomalies.

##Q 23. What is MongoDB’s WiredTiger storage engine, and why is it important?
**Ans** - **WiredTiger** is MongoDB's default **storage engine**. It was introduced in **MongoDB 3.0** and has been the default since **version 3.2**. It plays a central role in how MongoDB **stores, manages, and accesses data on disk and in memory**.

**WiredTiger** is a high-performance, modern storage engine designed for:
* **Concurrency**: Supports multiple simultaneous read and write operations.
* **Compression**: Uses data compression to reduce disk space.
* **Caching**: Efficient memory usage and cache management.
* **Checkpointing and Journaling**: Ensures durability and crash recovery.

**Why WiredTiger is Important**
1. **Improved Performance and Scalability**

* Uses **document-level concurrency control** (vs older MMAPv1's collection-level lock), allowing multiple users to update different documents in the same collection simultaneously.
* Ideal for **multi-core systems**, enabling parallel processing.

2. **Data Compression**

* Supports **snappy** (the default), **zlib**, and, from MongoDB 4.2, **zstd** compression.
* Reduces disk space usage significantly, which also improves I/O performance.

In [None]:
storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: snappy
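
The effect of block compression on repetitive document data can be illustrated with Python's standard `zlib` module. This only shows the general principle on made-up documents serialized as JSON. It is not WiredTiger's block compressor, and real ratios depend on the data and the chosen algorithm.

```python
import json
import zlib

# 1,000 similar documents, as a log or orders collection might contain.
docs = [{"orderId": i, "status": "delivered", "region": "West"} for i in range(1000)]
raw = json.dumps(docs).encode("utf-8")

compressed = zlib.compress(raw)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {len(raw) / len(compressed):.1f}x")

# Compression is lossless: the original bytes come back unchanged.
assert zlib.decompress(compressed) == raw
```

Repeated field names and values are what make document data compress well, which is why the saving on disk can be substantial.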

3. **Efficient Use of RAM (Cache)**

* Uses a **built-in cache** that dynamically balances memory use between indexes and data.
* Ensures efficient performance even with large working sets.

4. **Crash-Safe with Journaling**

* Provides **checkpointing** and **write-ahead journaling** for durability.
* Ensures that MongoDB can recover cleanly after an unplanned shutdown or crash.

5. **Fine-Grained Concurrency**

* Unlike older engines that locked the entire database or collection, WiredTiger offers **concurrent access at the document level**, which is crucial for modern applications with high throughput.

6. **Support for Encryption at Rest**

* WiredTiger enables **encryption of stored data** when using MongoDB Enterprise or Atlas.

7. **Customizable Configuration**

You can tune WiredTiger for different workloads:

* Change **compression settings**
* Adjust **cache size**
* Modify **journal and checkpoint intervals**

**WiredTiger vs MMAPv1 (Old Engine)**

| Feature | **WiredTiger** | **MMAPv1 (Deprecated)** |
|---|---|---|
| Lock Granularity | Document-level | Collection-level |
| Compression | Yes (snappy, zlib, zstd) | No |
| Write Performance | Much higher | Lower |
| Cache Management  | Intelligent, dynamic | Limited |
| Concurrency | Highly concurrent | Limited concurrency |
| Durability | Journaling + checkpointing | Journaling only |

# Practical

##Q 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.
**Ans** - The script below **loads the Superstore dataset from a CSV file into MongoDB** using **`pymongo`** and **`pandas`**. It assumes the following:

* MongoDB is running locally on the default port.
* The collection will be created in a database called `superstore_db` and named `orders`.
* We're using the uploaded file: `/mnt/data/superstore.csv`.

**Requirements**

Make sure we have the following libraries installed:

In [None]:
pip install pymongo pandas

**Python Script**

In [None]:
import pandas as pd
from pymongo import MongoClient

# Load CSV into a pandas DataFrame
csv_path = '/mnt/data/superstore.csv'
df = pd.read_csv(csv_path)

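# Replace NaN with None so missing values are stored as null in MongoDB.
# Casting to object first keeps pandas from coercing None back to NaN in numeric columns.
df = df.astype(object).where(pd.notnull(df), None)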

# Convert DataFrame to list of dictionaries (records)
records = df.to_dict(orient='records')

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")  # Change URI if using MongoDB Atlas or remote

# Create/use the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Optional: Drop collection if re-running the script to avoid duplicates
collection.drop()

# Insert the data
collection.insert_many(records)

print(f"Inserted {len(records)} records into 'superstore_db.orders' collection.")
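
For larger CSV files it can help to insert the records in batches rather than in one call. The helper below is a hypothetical sketch of that batching step. It runs on made-up sample records, and the `insert_many` call is shown only as a comment because it needs a live collection.

```python
from itertools import islice


def chunked(records, batch_size):
    """Yield successive lists of at most `batch_size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Made-up stand-in for the list produced by df.to_dict(orient='records').
sample = [{"Row ID": i} for i in range(1, 8)]

for batch in chunked(sample, 3):
    # In the loading script this line would be: collection.insert_many(batch)
    print(len(batch), [record["Row ID"] for record in batch])
```

A batch size in the low thousands is a common starting point, but the right value depends on document size and available memory.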

##Q 2. Retrieve and print all documents from the Orders collection
**Ans** - To retrieve and print all documents from the `orders` collection in MongoDB, we can use the following **Python script with PyMongo**:

**Python Script to Read and Print All Documents**

In [None]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Retrieve and print all documents
all_orders = collection.find()

for order in all_orders:
    print(order)

**Notes:**

* The `find()` method returns a **cursor**, so we can iterate over it efficiently.
* Each document is printed as a Python dictionary, including the `_id` field automatically generated by MongoDB.

**Optional: Pretty Print (Better Formatting)**

In [None]:
from pprint import pprint

for order in collection.find():
    pprint(order)

This will make the output easier to read.

##Q 3. Count and display the total number of documents in the Orders collection.
**Ans** - To **count and display the total number of documents** in the `orders` collection of the `superstore_db` database using Python and `pymongo`, we can call `count_documents` with an empty filter:

**Python Code to Count Documents**

In [None]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Count documents
total_orders = collection.count_documents({})

# Display result
print(f"Total number of documents in 'orders' collection: {total_orders}")

##Q 4. Write a query to fetch all orders from the "West" region
**Ans** - To **fetch all orders from the "West" region** in the `orders` collection of our MongoDB database using Python and `pymongo`, we can pass an equality filter on the `Region` field to `find()`.

**Python Query to Filter by Region**

In [None]:
from pymongo import MongoClient
from pprint import pprint

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Query: Find all orders from the "West" region
west_orders = collection.find({ "Region": "West" })

# Print each matching document
for order in west_orders:
    pprint(order)
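
The same equality filter extends to several regions with the `$in` operator. A minimal sketch that only builds the filter document; `region_filter` is a hypothetical helper name, and the second region value is illustrative:

```python
def region_filter(*regions):
    """Build a filter document matching orders from one or more regions."""
    if not regions:
        raise ValueError("at least one region is required")
    if len(regions) == 1:
        # Single value: plain equality match, as in the query above.
        return {"Region": regions[0]}
    # Several values: match any of them with $in.
    return {"Region": {"$in": list(regions)}}


print(region_filter("West"))
print(region_filter("West", "East"))
```

Either result can be passed directly to `collection.find(...)`.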

##Q 5. Write a query to find orders where Sales is greater than 500.
**Ans** - To **find all orders where the `Sales` value is greater than 500** in our MongoDB `orders` collection using Python and `pymongo`, we can use a **query with the `$gt` operator**.

**Python Query: Sales > 500**

In [None]:
from pymongo import MongoClient
from pprint import pprint

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Query: Find all orders where Sales > 500
high_sales_orders = collection.find({ "Sales": { "$gt": 500 } })

# Print matching documents
for order in high_sales_orders:
    pprint(order)
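
Comparison operators can be combined on one field to express a range. A minimal sketch that builds such a filter with `$gt` and `$lte`; `sales_filter` and the upper bound of 1000 are hypothetical choices for illustration:

```python
def sales_filter(min_exclusive=None, max_inclusive=None):
    """Build a filter on Sales with an optional lower and upper bound."""
    condition = {}
    if min_exclusive is not None:
        condition["$gt"] = min_exclusive   # strictly greater than
    if max_inclusive is not None:
        condition["$lte"] = max_inclusive  # less than or equal to
    # With no bounds, an empty filter matches every document.
    return {"Sales": condition} if condition else {}


print(sales_filter(500))
print(sales_filter(500, 1000))
```

Calling `collection.find(sales_filter(500))` would run the same query as above, while `sales_filter(500, 1000)` narrows it to orders above 500 and up to 1000.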

##Q 6. Fetch the top 3 orders with the highest Profit.
**Ans** - To **fetch the top 3 orders with the highest `Profit`** from the MongoDB `orders` collection using Python and `pymongo`, we can **sort the documents by `Profit` in descending order** and **limit** the result to 3.

**Python Query: Top 3 Profitable Orders**

In [None]:
from pymongo import MongoClient
from pprint import pprint

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Query: Get top 3 orders by Profit in descending order
top_profit_orders = collection.find().sort("Profit", -1).limit(3)

# Print the results
for order in top_profit_orders:
    pprint(order)
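
To sanity-check the result on a small sample without a database, the same "sort descending, keep the first three" logic can be reproduced locally. A minimal sketch using the standard library's `heapq.nlargest`, assuming `Profit` values are numeric; `top_n_by` and the sample documents are hypothetical:

```python
import heapq


def top_n_by(documents, field, n=3):
    """Return the n documents with the largest numeric value in `field`."""
    # Documents that lack the field are ranked last.
    return heapq.nlargest(
        n, documents, key=lambda doc: doc.get(field, float("-inf"))
    )


# Hypothetical sample data for illustration only.
sample = [
    {"Order": "A", "Profit": 10.0},
    {"Order": "B", "Profit": 250.5},
    {"Order": "C", "Profit": -3.2},
    {"Order": "D", "Profit": 99.9},
]
for doc in top_n_by(sample, "Profit"):
    print(doc)
```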

##Q 7. Update all orders with Ship Mode as "First Class" to "Premium Class."
**Ans** - To **update all orders** in our MongoDB `orders` collection where `Ship Mode` is `"First Class"` and change it to `"Premium Class"`, we can use the `update_many()` method with a filter and `$set` update operator.

**Python Code: Update Ship Mode**

In [None]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Update all documents where Ship Mode is "First Class"
result = collection.update_many(
    { "Ship Mode": "First Class" },
    { "$set": { "Ship Mode": "Premium Class" } }
)

# Output the result
print(f"Modified {result.modified_count} documents.")
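
The filter and the `$set` update are two separate documents, which makes the rename easy to parameterize. A minimal sketch that builds both; `ship_mode_rename` is a hypothetical helper name:

```python
def ship_mode_rename(old_value, new_value):
    """Return the (filter, update) pair for renaming a Ship Mode value."""
    query = {"Ship Mode": old_value}
    # $set overwrites only the named field and leaves the rest of the document alone.
    update = {"$set": {"Ship Mode": new_value}}
    return query, update


query, update = ship_mode_rename("First Class", "Premium Class")
print(query)
print(update)
```

The pair can be passed as `collection.update_many(query, update)`. Running the update a second time should report zero modified documents, because no order has `"First Class"` left to match.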

##Q 8. Delete all orders where Sales is less than 50.
**Ans**- To **delete all orders where the `Sales` value is less than 50** from our MongoDB `orders` collection using Python and `pymongo`, we can use the `delete_many()` method with a `$lt` (less than) filter.

**Python Code: Delete Orders with Sales < 50**

In [None]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Delete all documents where Sales < 50
result = collection.delete_many({ "Sales": { "$lt": 50 } })

# Output the result
print(f"Deleted {result.deleted_count} documents.")

##Q 9. Use aggregation to group orders by Region and calculate total sales per region
**Ans** - To **group orders by `Region`** and **calculate total sales per region** using MongoDB's **aggregation pipeline** with Python and `pymongo`, we can use the `$group` stage along with `$sum`.

**Python Aggregation Query: Total Sales by Region**

In [None]:
from pymongo import MongoClient
from pprint import pprint

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Aggregation pipeline to group by Region and sum Sales
pipeline = [
    {
        "$group": {
            "_id": "$Region",
            "total_sales": { "$sum": "$Sales" }
        }
    },
    {
        "$sort": { "total_sales": -1 }  # Optional: Sort regions by total sales descending
    }
]

# Run the aggregation
results = collection.aggregate(pipeline)

# Display the results
print("Total Sales per Region:")
for result in results:
    pprint(result)
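
What the `$group` and `$sum` stages compute can be checked against a plain-Python version on a handful of documents. A minimal sketch, assuming each document is a dictionary with `Region` and `Sales` keys; `total_sales_by_region` and the sample data are hypothetical:

```python
from collections import defaultdict


def total_sales_by_region(documents):
    """Group documents by Region and sum Sales, largest total first."""
    totals = defaultdict(float)
    for doc in documents:
        sales = doc.get("Sales")
        # Like $sum, treat values that are not numbers as contributing nothing.
        totals[doc.get("Region")] += sales if isinstance(sales, (int, float)) else 0
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [{"_id": region, "total_sales": total} for region, total in ranked]


# Hypothetical sample data for illustration only.
sample = [
    {"Region": "West", "Sales": 100.0},
    {"Region": "East", "Sales": 50.5},
    {"Region": "West", "Sales": 20.25},
]
for row in total_sales_by_region(sample):
    print(row)
```

The output has the same shape as the aggregation result: one dictionary per region, with the region name under `_id`.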

##Q 10. Fetch all distinct values for Ship Mode from the collection.
**Ans** - To **fetch all distinct values** for the `Ship Mode` field from our MongoDB `orders` collection using Python and `pymongo`, we can use the `distinct()` method.

**Python Code: Get Distinct `Ship Mode` Values**

In [None]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Fetch all distinct Ship Mode values
ship_modes = collection.distinct("Ship Mode")

# Print the result
print("Distinct Ship Mode values:")
for mode in ship_modes:
    print("-", mode)

##Q 11. Count the number of orders for each category.
**Ans** - To **count the number of orders for each `Category`** in our MongoDB `orders` collection using Python and `pymongo`, we can use the **aggregation pipeline** with `$group` and `$sum`.

**Python Code: Count Orders by Category**

In [None]:
from pymongo import MongoClient
from pprint import pprint

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Select the database and collection
db = client["superstore_db"]
collection = db["orders"]

# Aggregation pipeline
pipeline = [
    {
        "$group": {
            "_id": "$Category",
            "order_count": { "$sum": 1 }
        }
    },
    {
        "$sort": { "order_count": -1 }  # Optional: sort descending
    }
]

# Run aggregation
results = collection.aggregate(pipeline)

# Display results
print("Order count per Category:")
for category in results:
    pprint(category)

**Explanation:**

* `$group`: Groups documents by `Category`.
* `$sum: 1`: Counts one for each document in the group.
* `$sort`: (Optional) Orders the output from most to least.

**Example Output (Illustrative):**

In [None]:
{'_id': 'Office Supplies', 'order_count': 1900}
{'_id': 'Furniture', 'order_count': 1100}
{'_id': 'Technology', 'order_count': 900}
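
The `$group`, `$sum: 1`, and `$sort` pattern explained above works for any field, and a `$match` stage can be placed first to count within a subset. A minimal sketch that builds such a pipeline; `count_by` and its `match` parameter are hypothetical names:

```python
def count_by(field, match=None):
    """Build a pipeline that counts documents per distinct value of `field`."""
    pipeline = []
    if match:
        # Filter first so that only matching documents are grouped.
        pipeline.append({"$match": match})
    pipeline.append({"$group": {"_id": f"${field}", "order_count": {"$sum": 1}}})
    pipeline.append({"$sort": {"order_count": -1}})
    return pipeline


print(count_by("Category"))
print(count_by("Category", match={"Region": "West"}))
```

Passing `count_by("Category")` to `collection.aggregate(...)` would run the same pipeline as the code above.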