1. What are the key differences between SQL and NoSQL databases?

Ans:

| Feature            | SQL (Relational DB)                          | NoSQL (Non-Relational DB)                        |
| ------------------ | -------------------------------------------- | ------------------------------------------------ |
| **Data Model**     | Structured data with fixed schemas (tables)  | Flexible schema, unstructured or semi-structured |
| **Examples**       | MySQL, PostgreSQL, Oracle                    | MongoDB, Cassandra, CouchDB, Redis               |
| **Query Language** | SQL (Structured Query Language)              | Varies, e.g., MongoDB uses JSON-like query documents (MQL) |
| **Scalability**    | Primarily vertical scaling (add more power to server) | Designed for horizontal scaling (add more servers) |
| **Transactions**   | Strong ACID compliance                       | Varies; many favor BASE (eventual consistency), some support ACID |
| **Relationships**  | Designed for complex joins between tables    | Join support is limited; prefers denormalization |
| **Best for**       | Structured, consistent data and transactions | Big data, real-time apps, flexible data storage  |
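
To make the data-model and relationship rows concrete, here is a minimal sketch using plain Python structures as stand-ins (no database involved). It contrasts two "tables" joined at read time with one denormalized document that embeds the related records. All names and values are made up for illustration.

```python
# Relational style: two "tables" linked by a foreign key (customer_id).
customers = [{"id": 1, "name": "Alice"}]
orders = [
    {"id": 10, "customer_id": 1, "amount": 40},
    {"id": 11, "customer_id": 1, "amount": 60},
]

def total_spent_relational(customer_id):
    # Equivalent of a JOIN + SUM: look across the orders table at read time.
    return sum(o["amount"] for o in orders if o["customer_id"] == customer_id)

# Document style: the orders are embedded inside the customer document.
customer_doc = {
    "_id": 1,
    "name": "Alice",
    "orders": [{"id": 10, "amount": 40}, {"id": 11, "amount": 60}],
}

def total_spent_document(doc):
    # No join needed: everything lives in one document.
    return sum(o["amount"] for o in doc["orders"])

print(total_spent_relational(1), total_spent_document(customer_doc))
```

Both functions return the same total; the difference is whether related data is combined at query time or stored together up front.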


2. What makes MongoDB a good choice for modern applications?

Ans:


MongoDB is widely used in modern web and mobile applications due to several key features:

* Document-oriented Storage: Stores data in flexible, JSON-like BSON documents.

* Dynamic Schema: Allows changes in data structure without downtime.

* Horizontal Scalability: Easily scale out using sharding across multiple nodes.

* High Availability: Built-in replication and automatic failover.

* Rich Query Language: Supports powerful ad-hoc queries, aggregation, and indexing.

* Developer-Friendly: Good integration with modern development stacks (MEAN/MERN).

* Real-Time Analytics: Efficient for handling large-scale, high-throughput workloads.

* Cloud-native (MongoDB Atlas): Fully managed database-as-a-service (DBaaS).



3. Explain the concept of collections in MongoDB

Ans:


* A collection in MongoDB is analogous to a table in a relational database.

* It is a group of documents (records), where:

  * Each document is a JSON-like object (BSON).

  * Documents in the same collection can have different fields and structures.

* No predefined schema: Collections don’t require all documents to have the same set of fields.

Example:

{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "name": "Alice",
  "email": "alice@example.com"
}


4. How does MongoDB ensure high availability using replication?

Ans:


MongoDB uses a replica set to ensure high availability:

* A replica set is a group of MongoDB servers that maintain the same data.

* It includes:

  * Primary node: Receives all writes and, by default, serves all reads.

  * Secondary nodes: Maintain copies of the primary’s data via oplog replication.

* If the primary fails:

  * An automatic failover happens.

  * A secondary node is elected as the new primary.

* Replication provides:

  * Redundancy

  * Data backup

  * Automatic recovery

  * Read scalability (optionally read from secondaries)
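
The failover flow above can be sketched with a toy model in plain Python. This is not how MongoDB implements elections (real replica sets use heartbeats, priorities, and majority voting); the `Node` and `ToyReplicaSet` classes and node names are hypothetical and only show the idea of oplog copying followed by promotion of an up-to-date secondary.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.oplog = []      # ordered list of applied write operations
        self.alive = True

class ToyReplicaSet:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, op):
        # Only the primary accepts writes; it records them in its oplog.
        self.primary.oplog.append(op)

    def replicate(self):
        # Each live secondary copies the oplog entries it has not applied yet.
        for node in self.nodes:
            if node is not self.primary and node.alive:
                node.oplog.extend(self.primary.oplog[len(node.oplog):])

    def fail_primary(self):
        # Simulate a crash, then promote the most up-to-date live secondary.
        self.primary.alive = False
        candidates = [n for n in self.nodes if n.alive]
        self.primary = max(candidates, key=lambda n: len(n.oplog))

rs = ToyReplicaSet(["node-a", "node-b", "node-c"])
rs.write({"insert": "users", "doc": {"name": "Alice"}})
rs.replicate()
rs.fail_primary()
print(rs.primary.name, len(rs.primary.oplog))
```

After the simulated crash, the new primary still holds the replicated write, which is the point of keeping secondaries in sync.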

5. What are the main benefits of MongoDB Atlas?

Ans:


MongoDB Atlas is the fully managed cloud version of MongoDB, offering:

* Automated Deployment: One-click provisioning of clusters on AWS, GCP, or Azure.

* Auto-scaling: Scales cluster size based on usage.

* Built-in Security:

  * Encryption at rest and in transit

  * Role-based access control

  * IP whitelisting

* Global Clusters: Serve users from the nearest data center.

* Real-time Monitoring: Dashboards, performance alerts, and logs.

* Automated Backups: Continuous and point-in-time recovery.

* Serverless Options: Pay-as-you-go based on usage.

* Data Tiering: Hot/cold storage management to optimize cost.

6. What is the role of indexes in MongoDB, and how do they improve performance?

Ans:

Indexes in MongoDB are special data structures that store a small portion of the collection's data to improve query speed:

* Without an index, MongoDB must scan every document (collection scan).

* With indexes:

  * MongoDB can quickly find matching documents.

  * Drastically reduces I/O operations.

* Types of indexes:

  * Single field: e.g., { name: 1 }

  * Compound index: e.g., { name: 1, age: -1 }

  * Text index: For full-text search

  * Geospatial index: For location-based queries

  * Hashed index: For hash-based sharding and equality lookups

Example (PyMongo):

db.users.create_index([("email", 1)])
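
To illustrate why an index reduces the work per query, the sketch below compares a full scan with a lookup through a prebuilt mapping. It is a simplification in plain Python: a dictionary stands in for the index (MongoDB's indexes are B-tree based), and the sample documents are generated for illustration.

```python
from collections import defaultdict

users = [{"_id": i, "email": f"user{i}@example.com"} for i in range(1000)]

def collection_scan(docs, email):
    # No index: examine every document in the collection.
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["email"] == email:
            matches.append(doc)
    return matches, examined

def build_index(docs, field):
    # Map each field value to the positions of the documents that hold it.
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc[field]].append(pos)
    return index

def index_lookup(docs, index, value):
    # With an index: jump straight to the matching positions.
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)

email_index = build_index(users, "email")
scan_result, scan_examined = collection_scan(users, "user999@example.com")
idx_result, idx_examined = index_lookup(users, email_index, "user999@example.com")
print(scan_examined, idx_examined)
```

Both approaches return the same document, but the scan examines all 1000 documents while the indexed lookup touches only the match.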


7. Describe the stages of the MongoDB aggregation pipeline

Ans:


The aggregation pipeline is a framework for data transformation and analysis. Data passes through a series of stages:

Common Stages:

1. $match: Filters documents (like WHERE in SQL)

2. $group: Groups documents and performs aggregations (sum, avg, etc.)

3. $project: Shapes the output (select specific fields, computed fields)

4. $sort: Sorts documents by field

5. $limit / $skip: Limits or skips documents

6. $unwind: Deconstructs arrays into individual documents

7. $lookup: Performs a join with another collection

Example:

db.orders.aggregate([
  { $match: { status: "complete" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 5 }
])
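
To show what each stage of the pipeline above does to the data, here is a plain-Python equivalent run over a small in-memory list. The sample orders are invented, and this mimics only the semantics of the four stages, not how the server executes them.

```python
from collections import defaultdict

orders = [
    {"customerId": "c1", "status": "complete", "amount": 50},
    {"customerId": "c2", "status": "complete", "amount": 30},
    {"customerId": "c1", "status": "pending", "amount": 99},
    {"customerId": "c1", "status": "complete", "amount": 25},
]

def top_customers(docs, limit=5):
    # $match: keep only completed orders.
    matched = [d for d in docs if d["status"] == "complete"]
    # $group: sum amount per customerId.
    totals = defaultdict(int)
    for d in matched:
        totals[d["customerId"]] += d["amount"]
    grouped = [{"_id": cid, "total": total} for cid, total in totals.items()]
    # $sort: descending by total.
    grouped.sort(key=lambda g: g["total"], reverse=True)
    # $limit: keep the first `limit` results.
    return grouped[:limit]

print(top_customers(orders))
```

The pending order is dropped by the match step, so customer c1 ends up with a total of 75 and c2 with 30.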


8. What is sharding in MongoDB? How does it differ from replication?

Ans:


Sharding is a method for horizontally scaling MongoDB:

* Large datasets are split across multiple machines (shards).

* Each shard contains a subset of the data.

* A shard key determines how data is distributed.

* A mongos router directs queries to the correct shard(s).

Sharding vs Replication:

| Feature      | Sharding                                                     | Replication                       |
| ------------ | ------------------------------------------------------------ | --------------------------------- |
| **Purpose**  | Scale data horizontally                                      | Provide high availability         |
| **Data**     | Partitioned across shards                                    | Duplicated across replicas        |
| **Failure**  | Losing a shard makes its subset of data unavailable, so each shard is normally deployed as a replica set | Automatic failover to a secondary |
| **Used For** | Large datasets, high write loads                             | Redundancy, read scaling, backups |
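
The role of the shard key can be sketched as a routing function. The example below is a simplified assumption: it hashes the key and takes a modulo, whereas MongoDB assigns ranges of (optionally hashed) key values to chunks that are placed on shards. The shard names are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def route(shard_key_value, shards=SHARDS):
    # Hash the shard key value and map it to exactly one shard.
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def distribute(docs, shard_key):
    # Place every document on the shard chosen by its shard key value.
    placement = {name: [] for name in SHARDS}
    for doc in docs:
        placement[route(doc[shard_key])].append(doc)
    return placement

docs = [{"userId": i} for i in range(100)]
placement = distribute(docs, "userId")
print({name: len(held) for name, held in placement.items()})
```

Each document lands on exactly one shard, and the same key always routes to the same shard, which is what lets a router send a query only where the data lives.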


9. What is PyMongo, and why is it used?

Ans:


PyMongo is the official Python driver for MongoDB.

* It provides an interface to:

  * Connect to MongoDB

  * Perform CRUD operations (Create, Read, Update, Delete)

  * Use aggregation pipelines

  * Manage indexes

  * Handle transactions

Example usage:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")

db = client["mydb"]

users = db["users"]

users.insert_one({"name": "Alice", "age": 25})

* Why use PyMongo?

  * Enables Python applications to work seamlessly with MongoDB.

  * Supports MongoDB features such as GridFS, index management, and transactions, and connects to sharded clusters through mongos.



10. What are the ACID properties in the context of MongoDB transactions?

Ans:


ACID stands for:

| Property        | Meaning                                                      |
| --------------- | ------------------------------------------------------------ |
| **Atomicity**   | All operations in a transaction succeed or none do           |
| **Consistency** | Database moves from one valid state to another               |
| **Isolation**   | Transactions are isolated from one another (no interference) |
| **Durability**  | Once committed, changes are permanent (even after crashes)   |


In MongoDB:

Starting with version 4.0, MongoDB supports multi-document ACID transactions (for replica sets).

From version 4.2, transactions are supported across sharded clusters.

Transactions are used when multiple documents/collections need to be updated together safely.

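# Reuses the `client` and `users` objects from the PyMongo example in question 9.
with client.start_session() as session:
    # Both updates commit together, or neither is applied.
    with session.start_transaction():
        users.update_one({"_id": 1}, {"$set": {"balance": 90}}, session=session)
        users.update_one({"_id": 2}, {"$set": {"balance": 110}}, session=session)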
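
The atomicity property can be illustrated without a database. The sketch below is a toy model, not MongoDB's mechanism: it applies operations to a private copy and only hands back the new state if every operation succeeds. The helper names (`run_transaction`, `debit`, `credit`) are hypothetical.

```python
import copy

def run_transaction(accounts, operations):
    # Work on a private copy so partial changes are never visible.
    working = copy.deepcopy(accounts)
    try:
        for op in operations:
            op(working)
    except Exception:
        # Abort: discard the copy and return the original state.
        return accounts, False
    # Commit: return the fully updated state.
    return working, True

def debit(account_id, amount):
    def op(state):
        if state[account_id] < amount:
            raise ValueError("insufficient funds")
        state[account_id] -= amount
    return op

def credit(account_id, amount):
    def op(state):
        state[account_id] += amount
    return op

accounts = {1: 100, 2: 100}
accounts, ok = run_transaction(accounts, [debit(1, 10), credit(2, 10)])
failed_state, failed_ok = run_transaction(accounts, [credit(2, 500), debit(1, 500)])
print(accounts, ok, failed_state, failed_ok)
```

The second call fails on the debit, so the earlier credit in the same transaction is discarded as well and the balances stay at 90 and 110.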


11. What is the purpose of MongoDB’s explain() function?

Ans:


The explain() function is used to analyze and understand how MongoDB executes a query.

Purpose:

* Shows query execution plan and performance details.

* Helps identify:

  * Whether indexes are being used.

  * Which stage is taking the most time.

  * Whether a collection scan (COLLSCAN) or index scan (IXSCAN) is occurring.

Example:

db.users.find({ age: 25 }).explain("executionStats")

Modes:

* "queryPlanner" – Default, shows logical plan.

* "executionStats" – Adds timing and performance stats.

* "allPlansExecution" – Includes all considered plans with stats.

explain() is essential for query optimization and debugging.
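
Since drivers such as PyMongo return explain() output as a nested dictionary, checking for a collection scan can be automated. The sketch below assumes the classic plan shape (`queryPlanner.winningPlan` with nested `inputStage` entries); newer server versions can nest the plan differently. The two sample dictionaries are hand-written, abbreviated stand-ins rather than real server output, and the helper names are hypothetical.

```python
def plan_stages(plan):
    # Collect stage names from the outermost stage down to the innermost.
    stages = [plan["stage"]]
    if "inputStage" in plan:
        stages.extend(plan_stages(plan["inputStage"]))
    for child in plan.get("inputStages", []):
        stages.extend(plan_stages(child))
    return stages

def uses_collection_scan(explain_output):
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)

# Abbreviated stand-in for a query served by an index on `age`.
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "age_1"},
}}}

# Abbreviated stand-in for a query with no usable index.
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(uses_collection_scan(indexed), uses_collection_scan(unindexed))
```

A check like this could be run in tests to flag queries that fall back to scanning the whole collection.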


12. How does MongoDB handle schema validation?

Ans:

Although MongoDB does not require a fixed schema, it supports optional validation rules at the collection level using JSON Schema (the $jsonSchema operator).

How it works:

* Define validation rules when creating or modifying a collection.

* Enforces rules on fields, types, required properties, value ranges, etc.

Example:

db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        price: {
          bsonType: "double",
          minimum: 0,
          description: "must be a non-negative number"
        }
      }
    }
  }
})
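
To show what rules like these check, here is a toy validator in plain Python. It covers only `required`, `bsonType`, and `minimum` for a few types and is an illustration of the idea, not MongoDB's validator; the `BSON_TYPES` mapping and the `validate` function are assumptions made for this sketch.

```python
# Python types standing in for a few BSON types (illustrative subset).
BSON_TYPES = {"string": str, "double": float, "object": dict}

def validate(doc, schema):
    """Return a list of validation errors (an empty list means valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in doc:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field not in doc:
            continue
        value = doc[field]
        expected = BSON_TYPES[rules["bsonType"]]
        if not isinstance(value, expected):
            errors.append(f"{field}: expected {rules['bsonType']}")
            continue
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{field}: below minimum {rules['minimum']}")
    return errors

product_schema = {
    "bsonType": "object",
    "required": ["name", "price"],
    "properties": {
        "name": {"bsonType": "string"},
        "price": {"bsonType": "double", "minimum": 0},
    },
}

print(validate({"name": "Pen", "price": 1.5}, product_schema))
print(validate({"name": "Pen", "price": -1.0}, product_schema))
```

Note that an integer price fails the `double` check in this sketch, which mirrors the fact that a BSON type rule is stricter than "any number".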


13. What is the difference between a primary and a secondary node in a replica set?

Ans:

Primary Node:

* Handles all write and read operations (by default).

* Only one primary exists in a replica set.

* All write operations are replicated to secondaries.

Secondary Node:

* Does not accept client writes; can serve reads when the client's read preference allows it (e.g., secondary or secondaryPreferred).

* Syncs data from the primary using the oplog (operation log).

* Can become primary during failover.

Key Differences:

| Feature     | Primary                  | Secondary                     |
| ----------- | ------------------------ | ----------------------------- |
| Role        | Accepts writes           | Replicates from primary       |
| Read access | Default source for reads | Can serve reads if configured |
| Failover    | If it becomes unavailable, an election is triggered | Eligible to be elected as the new primary |


14. What security mechanisms does MongoDB provide for data protection?

Ans:


MongoDB offers multiple layers of security:

Authentication:

* Supports SCRAM, LDAP, x.509 certificates, Kerberos, and AWS IAM.

* Controls who can access the database.

Authorization:

* Role-Based Access Control (RBAC).

* Defines user roles (read-only, admin, dbOwner, etc.).

Encryption:

* TLS/SSL for encryption in transit.

Encryption at Rest:

* MongoDB Enterprise: Native encryption.

* MongoDB Atlas: Fully managed encryption.

Auditing:

* Tracks access and activity (available in MongoDB Enterprise).

IP Whitelisting:

* Allows only certain IP addresses to connect.

17. What are some common use cases for MongoDB?

Ans:


MongoDB is best suited for:

* Real-Time Analytics: Log processing, sensor data, IoT.

* Content Management Systems: Blogs, product catalogs, CMS platforms.

* Mobile Applications: Offline-first sync, flexible schemas.

* E-commerce Platforms: Product info, user data, order histories.

* Social Networks: Posts, comments, user profiles.

* Catalogs and Metadata Repositories: Dynamic fields, document-based storage.

* Gaming: Player profiles, scores, match histories.

18. What are the advantages of using MongoDB for horizontal scaling?

Ans:


MongoDB uses sharding for horizontal scaling:

Advantages:

* Distributes data across multiple machines.

* Handles large-scale datasets and high throughput.

* Enables cost-effective scaling (commodity hardware).

* Shards can be added while the cluster is online; the balancer redistributes data automatically.

* Improved write and read performance under heavy load.

* MongoDB's architecture supports multi-region clusters, ensuring faster access from different locations.

19. How do MongoDB transactions differ from SQL transactions?

Ans:

SQL Transactions:

* Always ACID-compliant.

* Built-in to the RDBMS.

* Support complex operations across multiple tables.

MongoDB Transactions:

* Introduced in MongoDB 4.0+ (replica sets), and 4.2+ (sharded clusters).

* Multi-document transactions are supported.

* Managed through the driver API: the application opens a session, then starts and commits (or aborts) the transaction.

Differences:

| Feature      | SQL                        | MongoDB                         |
| ------------ | -------------------------- | ------------------------------- |
| ACID Support | Always (built-in)          | Supported in modern versions    |
| Complexity   | Seamless across tables     | Slightly more verbose to manage |
| Speed        | Optimized                  | Multi-document transactions add overhead; single-document writes are already atomic |
| Use Case     | Complex relational systems | Occasional multi-document updates that must succeed or fail together |


20. What are the main differences between capped collections and regular collections?

Ans:

| Feature                | **Capped Collections**                              | **Regular Collections**                   |
| ---------------------- | --------------------------------------------------- | ----------------------------------------- |
| **Size Limit**         | Fixed size (specified at creation)                  | No fixed size                             |
| **Overwrite Behavior** | Automatically overwrites oldest documents when full | Keeps adding new documents unless deleted |
| **Insert Order**       | Preserves insert order                              | Not guaranteed                            |
| **Use Case**           | Logging, real-time analytics, caching               | General-purpose                           |
| **Deletion Allowed**   | No deletion of specific documents                   | Yes                                       |
| **Indexing**           | `_id` index by default; extra indexes are possible but slow inserts | `_id` index by default; additional indexes are common |


Capped collections are useful for:

* Logging

* Monitoring

* Circular buffers

* Where only the most recent data is important
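
The circular-buffer behavior can be sketched with Python's `collections.deque`. This is an analogy only: the toy class below caps by document count, whereas a real capped collection is created with a maximum size in bytes and an optional maximum document count. The class name is hypothetical.

```python
from collections import deque

class ToyCappedCollection:
    def __init__(self, max_docs):
        # A deque with maxlen drops the oldest item when a new one arrives.
        self._docs = deque(maxlen=max_docs)

    def insert_one(self, doc):
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion order.
        return list(self._docs)

log = ToyCappedCollection(max_docs=3)
for i in range(5):
    log.insert_one({"seq": i})
print(log.find())
```

After five inserts into a three-document cap, only the three most recent entries (seq 2, 3, 4) remain, in the order they were written.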

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

Ans:

The $match stage is used to filter documents in an aggregation pipeline, similar to the WHERE clause in SQL.

Purpose:

* Filters documents before passing them to the next stages.

* Reduces the number of documents processed in later stages, improving performance.

* Uses the same syntax as find() queries.

Syntax Example:

db.orders.aggregate([
  { $match: { status: "shipped", amount: { $gt: 100 } } }
])


Why it’s important:

* Helps optimize performance by narrowing down the dataset early.

* When placed at the start of the pipeline, it can use indexes just like a find() query.

✅ Tip: Place $match early in the pipeline to reduce data volume quickly.

22. How can you secure access to a MongoDB database?

Ans:


Securing a MongoDB database involves a multi-layered security strategy:

🔐 1. Authentication

* Ensures only authorized users can access the database.

* Supported mechanisms:

  * SCRAM-SHA-1 / SCRAM-SHA-256 (default)

  * LDAP

  * x.509 Certificates

  * Kerberos

  * AWS IAM roles (MongoDB Atlas)

🧑‍💻 2. Authorization

* Role-Based Access Control (RBAC):

  * Limits what actions users can perform.

  * Built-in roles: read, readWrite, dbAdmin, userAdmin, etc.

🌐 3. Network Access Control

* IP whitelisting (especially in MongoDB Atlas).

* Use firewalls to restrict access to MongoDB ports.

* Run MongoDB on a private/internal network.

🔒 4. Encryption

* In Transit: Use TLS/SSL to encrypt data between clients and the database.

* At Rest:

  * MongoDB Enterprise supports built-in encryption.

  * MongoDB Atlas encrypts data at rest by default.

📜 5. Auditing (Enterprise only)

* Tracks who accessed what and when.

* Helps with compliance and forensic analysis.

🛡️ 6. Avoid Default Configuration Risks

* Don't expose the database to the public internet.

* Enable authentication (access control is not enabled by default on self-managed deployments).

* On versions before 3.6, disable the HTTP status interface and REST API (both were removed in 3.6).
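
Several of these layers show up in the connection string a client uses. The sketch below builds a URI with percent-encoded credentials, an `authSource`, and `tls=true`, using only the standard library. The function name, host, user, and password are placeholders; in practice credentials would come from a secret store or environment variable rather than source code.

```python
from urllib.parse import quote_plus

def build_mongo_uri(username, password, host, port=27017, auth_db="admin"):
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote_plus(username)
    pwd = quote_plus(password)
    # tls=true requests an encrypted connection; authSource names the
    # database that stores the user's credentials.
    return (
        f"mongodb://{user}:{pwd}@{host}:{port}/"
        f"?authSource={auth_db}&tls=true"
    )

uri = build_mongo_uri("app_user", "p@ss:word", "db.internal.example")
print(uri)
```

The resulting string can be passed to a driver's client constructor, for example `MongoClient(uri)` in PyMongo.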



23. What is MongoDB’s WiredTiger storage engine, and why is it important?

Ans:

WiredTiger is MongoDB’s default storage engine since version 3.2.

🔧 What is a Storage Engine?

* The part of the database that manages how data is stored, updated, and retrieved from disk.

🔥 Key Features of WiredTiger:

| Feature                    | Description                                                                       |
| -------------------------- | --------------------------------------------------------------------------------- |
| **Document-level Locking** | Higher concurrency, allows multiple writes in different documents simultaneously. |
| **Compression**            | Uses **snappy** (default), **zlib**, or **zstd** to reduce disk usage (less storage = less I/O). |
| **Checkpointing**          | Regularly writes data to disk in snapshots to ensure durability.                  |
| **Caching**                | Uses an **internal in-memory cache**, alongside the filesystem cache, to speed up frequent queries. |
| **Journaling**             | Ensures crash recovery by recording writes in a write-ahead log between checkpoints. |


📈 Why It’s Important:

* Improves performance under high-concurrency workloads.

* Reduces disk I/O through compression and efficient caching.

* Enables scalable applications with faster writes and reads.

* Supports ACID transactions for replica sets and sharded clusters.
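
To give a feel for why compression helps with document data, the sketch below compresses a batch of repetitive JSON records with Python's `zlib` module. It only illustrates the general effect: WiredTiger compresses its own on-disk blocks rather than JSON text, and the sample sensor records are invented.

```python
import json
import zlib

# Repetitive, made-up documents: field names repeat in every record.
docs = [{"sensor": "temp-01", "reading": 20 + i % 5, "unit": "C"} for i in range(500)]
raw = json.dumps(docs).encode("utf-8")

# Compress, then decompress to confirm nothing is lost.
compressed = zlib.compress(raw)
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))

print(len(raw), len(compressed))
```

Because field names and values repeat across documents, the compressed form is much smaller than the raw bytes while decompressing to identical data.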

💡 Other Engines:

* MMAPv1: Legacy engine (deprecated in 4.0 and removed in 4.2).

* In-Memory Engine: For high-speed, ephemeral data (Enterprise only).

* Encrypted Storage Engine: Built on WiredTiger with at-rest encryption (Enterprise).

