# Theory


1. What are the key differences between SQL and NoSQL databases?

**SQL (Relational Databases)**

* Structured data stored in tables (rows + columns)

* Requires a fixed schema

* Uses SQL queries

* Supports ACID transactions

* Traditionally scales vertically (a bigger server)

**NoSQL (Non-Relational Databases)**

* Stores data as documents, key-value pairs, graphs, or wide columns

* Schema-less, flexible structure

* Uses different query mechanisms

* Often eventually or tunably consistent; ACID support varies by database

* Scales horizontally across multiple servers

**When to use what?**

* SQL → Complex relationships, transactions (banking, finance)

* NoSQL → Big data, fast development, flexible schema, microservices

2. What makes MongoDB a good choice for modern applications?

* Document-based model → Stores JSON-like (BSON) documents that map naturally to JavaScript/Node.js objects

* Flexible schema → Easy to iterate and modify fields anytime

* High performance → Fast reads/writes, especially when queries are backed by indexes

* Horizontal scalability → Built-in sharding for large-scale systems

* Cloud-native → Works well with microservices, distributed systems

* Easy integration → Fits naturally into the MERN and MEAN stacks

3. Explain the concept of collections in MongoDB

In MongoDB:

* Collection = Table in SQL

* Document = Row in SQL

Key points:

* A collection stores multiple documents.

* Documents inside a collection do not require the same structure.

  * Example: one document may have an `age` field while another does not.

* Collections are created automatically when you first insert a document into them.

* Every collection has a default index on `_id`; you can add more indexes for fast access.

4. How does MongoDB ensure high availability using replication?

**Replica Set Components**

* Primary node → receives all writes

* Secondary nodes → copy data from the primary

* Arbiter (optional) → helps in elections but stores no data

**How high availability works**

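* If the primary fails, the remaining members automatically elect a new primary

* Applications continue after a short election (typically a few seconds); drivers with retryable writes can retry interrupted writes

* No manual intervention is needed, which provides fault tolerance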
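To make the election idea concrete, here is a deliberately simplified model. Real replica sets use a Raft-inspired protocol with terms, heartbeats, and vote requests; this sketch only captures two rules: a new primary needs a reachable majority of voting members, and the member with the most recent data is preferred. The `Member` class and `elect_primary` function are hypothetical and are not part of MongoDB or PyMongo.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    healthy: bool              # reachable by the other members (heartbeats succeed)
    last_optime: int = 0       # position of the last oplog entry this member applied
    priority: float = 1.0      # priority 0 members can never become primary
    arbiter: bool = False      # arbiters vote but hold no data


def elect_primary(members: List[Member]) -> Optional[str]:
    """Toy election: require a majority quorum, then pick the freshest electable member."""
    # Assumption of this sketch: every member has exactly one vote
    reachable = [m for m in members if m.healthy]
    if len(reachable) <= len(members) // 2:
        return None  # no majority, so no primary and writes are refused
    candidates = [m for m in reachable if not m.arbiter and m.priority > 0]
    if not candidates:
        return None
    # Most up-to-date oplog wins; priority breaks ties
    return max(candidates, key=lambda m: (m.last_optime, m.priority)).name


# The old primary (node1) fails; node3 has applied more of the oplog than node2
replica_set = [
    Member("node1", healthy=False, last_optime=105),
    Member("node2", healthy=True, last_optime=104),
    Member("node3", healthy=True, last_optime=105),
]
print(elect_primary(replica_set))  # node3
```

The majority rule is what prevents two primaries from being elected on opposite sides of a network split.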

5. What are the main benefits of MongoDB Atlas?

MongoDB Atlas = MongoDB’s fully managed cloud service.

Benefits:

* Fully automated backups, updates, scaling

* Global clusters → deploy databases near your users

* Built-in security → encryption, network isolation, access control

* Performance optimization tools included

* Serverless & on-demand scaling

* Monitoring dashboard with alerts

* Works on AWS, Azure, GCP

6. What is the role of indexes in MongoDB, and how do they improve performance?

**Role of Indexes**

Indexes in MongoDB are data structures that allow the database to quickly locate documents without scanning the entire collection.

How Indexes Improve Performance

* Reduce query execution time

* Avoid full collection scans (COLLSCAN)

* Improve filtering, sorting, and searching

* Support efficient range queries (`$gt`, `$lt`, `$gte`, `$lte`)

* Compound indexes support queries on multiple fields

Example

* Without an index → MongoDB may scan all 1 million documents

* With an index → it jumps directly to the matching documents
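The difference between a scan and an index lookup can be sketched in plain Python, scaled down to 10,000 documents. Real MongoDB indexes are B-trees managed by the storage engine; this sketch stands in a sorted list plus binary search, which gives the same "jump straight to the matching keys" behaviour. `collection_scan` and `SortedIndex` are hypothetical helpers, and the range lookup also covers the range-query point above.

```python
import bisect
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


def collection_scan(docs: List[Doc], field: str, low: Any, high: Any) -> Tuple[List[Doc], int]:
    """COLLSCAN: examine every document and keep those with low <= field <= high."""
    matches = [d for d in docs if field in d and low <= d[field] <= high]
    return matches, len(docs)  # every document was examined


class SortedIndex:
    """Toy single-field index: (key, position) pairs kept in key order."""

    def __init__(self, docs: List[Doc], field: str):
        self.docs = docs
        self.entries = sorted((d[field], pos) for pos, d in enumerate(docs) if field in d)
        self.keys = [key for key, _ in self.entries]

    def find_range(self, low: Any, high: Any) -> Tuple[List[Doc], int]:
        """Return matching documents and how many documents were fetched."""
        start = bisect.bisect_left(self.keys, low)   # binary search, no full scan
        end = bisect.bisect_right(self.keys, high)
        matches = [self.docs[pos] for _, pos in self.entries[start:end]]
        return matches, len(matches)  # only matching documents are fetched


docs = [{"_id": i, "age": i % 50} for i in range(10_000)]
age_index = SortedIndex(docs, "age")
_, scanned = collection_scan(docs, "age", 30, 31)
found, fetched = age_index.find_range(30, 31)
print(scanned, fetched)  # 10000 400
```

An equality query is simply `find_range(value, value)`.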

7. Describe the stages of the MongoDB aggregation pipeline

MongoDB’s aggregation pipeline processes data step-by-step using stages (similar to data pipelines).

**Common Stages**

| Stage | Purpose |
|---|---|
| `$match` | Filters documents (like WHERE) |
| `$group` | Groups data, performs sums, counts, averages |
| `$project` | Selects or reshapes fields |
| `$sort` | Sorts results |
| `$limit` | Limits number of documents |
| `$skip` | Skips a number of documents |
| `$lookup` | Performs a left outer join with another collection |
| `$unwind` | Deconstructs arrays into separate documents |
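To show how documents flow from one stage to the next, here is a tiny in-memory interpreter for a few of the stages in the table. It is a hypothetical teaching aid, not MongoDB's implementation: `$match` handles equality only, `$group` supports just `$sum` and `$avg`, and `$lookup`/`$unwind` are not modelled. The `pipeline` list itself uses real aggregation syntax and could be passed unchanged to PyMongo's `collection.aggregate(pipeline)`.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def _value(doc: Doc, expr: Any) -> Any:
    """"$field" reads a field; anything else is a literal (e.g. 1 in {"$sum": 1})."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def run_pipeline(docs: List[Doc], pipeline: List[Doc]) -> List[Doc]:
    """Apply a small subset of aggregation stages to in-memory documents."""
    out = list(docs)
    for stage in pipeline:
        (name, spec), = stage.items()  # each stage has exactly one operator
        if name == "$match":  # equality conditions only
            out = [d for d in out if all(d.get(k) == v for k, v in spec.items())]
        elif name == "$group":
            groups: Dict[Any, List[Doc]] = {}
            for d in out:
                groups.setdefault(_value(d, spec["_id"]), []).append(d)
            out = []
            for key, members in groups.items():
                row: Doc = {"_id": key}
                for field, acc in spec.items():
                    if field == "_id":
                        continue
                    (op, arg), = acc.items()
                    nums = [v for v in (_value(m, arg) for m in members) if isinstance(v, (int, float))]
                    if op == "$sum":
                        row[field] = sum(nums)
                    elif op == "$avg":
                        row[field] = sum(nums) / len(nums) if nums else None
                    else:
                        raise NotImplementedError(op)
                out.append(row)
        elif name == "$project":  # inclusion projection; _id kept unless set to 0
            keep = {k for k, v in spec.items() if v}
            if spec.get("_id", 1):
                keep.add("_id")
            out = [{k: v for k, v in d.items() if k in keep} for d in out]
        elif name == "$sort":
            # Python's sort is stable, so sorting by the last key first gives a multi-key sort
            for field, direction in reversed(list(spec.items())):
                out.sort(key=lambda d: d.get(field), reverse=direction == -1)
        elif name == "$skip":
            out = out[spec:]
        elif name == "$limit":
            out = out[:spec]
        else:
            raise NotImplementedError(name)  # $lookup, $unwind, ... not modelled here
    return out


orders = [
    {"_id": 1, "customer": "A", "status": "shipped", "amount": 50},
    {"_id": 2, "customer": "B", "status": "shipped", "amount": 20},
    {"_id": 3, "customer": "A", "status": "pending", "amount": 99},
    {"_id": 4, "customer": "A", "status": "shipped", "amount": 30},
]
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}, "orders": {"$sum": 1}}},
    {"$sort": {"total": -1}},
    {"$limit": 5},
]
print(run_pipeline(orders, pipeline))
# [{'_id': 'A', 'total': 80, 'orders': 2}, {'_id': 'B', 'total': 20, 'orders': 1}]
```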

8. What is sharding in MongoDB? How does it differ from replication?

**Sharding**

Sharding splits a large dataset across multiple servers (shards) based on a shard key.
It provides horizontal scaling.

**Why sharding is needed**

* When data is too large for one machine

* When traffic is too high

**Replication vs Sharding**

| Feature | Replication | Sharding |
|---|---|---|
| Purpose | High availability & fault tolerance | Horizontal scaling & load distribution |
| Data | Same data on all nodes | Different portions of data on different nodes |
| Components | Primary + secondaries | Shard servers + config server + mongos router |
| Benefit | Prevents downtime | Handles big data & high traffic |
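The routing role of `mongos` can be sketched for hashed sharding: the shard key is hashed, the hash space is cut into chunks, and each chunk is owned by one shard. This is a simplified model under stated assumptions: the hash below is MD5-based but is not MongoDB's exact function, the chunk layout is fixed at construction, and `Router`, `hash_key`, and the shard names are hypothetical.

```python
import bisect
import hashlib
from typing import Any, Dict, List

HASH_SPACE = 2 ** 64


def hash_key(value: Any) -> int:
    """64-bit hash of a shard-key value (MD5-based, but not MongoDB's exact function)."""
    return int.from_bytes(hashlib.md5(repr(value).encode()).digest()[:8], "big")


class Router:
    """Toy mongos: the hashed key space is cut into chunks; each chunk lives on one shard."""

    def __init__(self, shards: List[str], chunks_per_shard: int = 2):
        n_chunks = len(shards) * chunks_per_shard
        step = HASH_SPACE // n_chunks
        # Exclusive upper bounds of chunks 0..n-2; the last chunk runs to the end of the space
        self.bounds = [i * step for i in range(1, n_chunks)]
        self.owner = [shards[i % len(shards)] for i in range(n_chunks)]

    def shard_for(self, key_value: Any) -> str:
        chunk = bisect.bisect_right(self.bounds, hash_key(key_value))
        return self.owner[chunk]

    def move_chunk(self, chunk: int, shard: str) -> None:
        """Roughly what a chunk migration does: reassign one chunk, leave the rest alone."""
        self.owner[chunk] = shard


router = Router(["shard-a", "shard-b", "shard-c"])
counts: Dict[str, int] = {}
for user_id in range(3000):
    shard = router.shard_for(user_id)
    counts[shard] = counts.get(shard, 0) + 1
print(counts)  # roughly 1000 documents per shard
```

Mapping key ranges to shards (rather than taking `hash % number_of_shards`) is what lets data move between shards one chunk at a time without rehashing everything.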

9. What is PyMongo, and why is it used?

What is PyMongo?

PyMongo is the official Python driver for MongoDB.

Why it is used

* To connect Python applications with MongoDB

* Perform CRUD operations (insert, update, delete, find)

* Run aggregation pipelines

* Create indexes

* Access MongoDB features such as

  * Transactions

  * Replica sets

  * Sharding
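A minimal sketch of typical PyMongo calls for the tasks listed above (index creation and CRUD). It assumes a running MongoDB server and a `pymongo` Collection passed in as `users`; the helper function names, the database/collection names, and the `email` field are hypothetical. `create_index`, `insert_one`, `find_one`, `update_one`, and `delete_one` are standard `Collection` methods.

```python
from typing import Any, Optional

# `users` is assumed to be a pymongo Collection, for example (names hypothetical):
#   from pymongo import MongoClient
#   users = MongoClient("mongodb://localhost:27017")["appdb"]["users"]


def ensure_email_index(users) -> str:
    """Create a unique ascending index on "email"; PyMongo returns the index name."""
    return users.create_index([("email", 1)], unique=True)  # 1 == pymongo.ASCENDING


def create_user(users, name: str, email: str) -> Any:
    """Insert one document and return its generated _id."""
    return users.insert_one({"name": name, "email": email}).inserted_id


def find_user(users, email: str) -> Optional[dict]:
    """Fetch one user by email, hiding the _id field (None if not found)."""
    return users.find_one({"email": email}, {"_id": 0})


def rename_user(users, email: str, new_name: str) -> int:
    """Set a new name; returns how many documents were modified (0 or 1)."""
    return users.update_one({"email": email}, {"$set": {"name": new_name}}).modified_count


def delete_user(users, email: str) -> int:
    """Delete one user; returns how many documents were deleted (0 or 1)."""
    return users.delete_one({"email": email}).deleted_count
```

Passing the collection in, instead of creating a client inside each function, keeps connection settings in one place and makes the helpers easy to test.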

10. What are the ACID properties in the context of MongoDB transactions?

**ACID Properties**

1. Atomicity

* All operations in a transaction succeed or none do.

* “All or nothing.”

2. Consistency

* Database remains in a valid state before and after the transaction.

* Respects validation rules and constraints such as unique indexes.

3. Isolation

* Transactions don’t interfere with each other.

* Intermediate changes are not visible to others.

4. Durability

* Once a transaction is committed, its changes persist even after crashes (the write concern controls how many nodes must acknowledge the commit first).

Why ACID matters

* Ensures safe operations for financial apps

* Maintains correctness in multi-user environments
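A classic ACID example is a money transfer: the debit and the credit must both happen or neither. The sketch below uses PyMongo's `ClientSession.with_transaction`, which commits when the callback returns, aborts when it raises, and retries on transient errors. It assumes `client` is a `pymongo.MongoClient` connected to a replica set or sharded cluster (standalone servers do not support transactions) and that `accounts` holds documents shaped like `{"_id": ..., "balance": int}`; the `transfer` function and that document shape are hypothetical.

```python
def transfer(client, accounts, src, dst, amount: int) -> None:
    """Move `amount` from account `src` to `dst` inside one multi-document transaction."""
    if amount <= 0:
        raise ValueError("amount must be positive")

    def callback(session):
        # Debit only if the balance covers the amount (the filter checks it atomically)
        debit = accounts.update_one(
            {"_id": src, "balance": {"$gte": amount}},
            {"$inc": {"balance": -amount}},
            session=session,
        )
        if debit.modified_count != 1:
            raise ValueError("insufficient funds or unknown source account")
        credit = accounts.update_one(
            {"_id": dst}, {"$inc": {"balance": amount}}, session=session
        )
        if credit.modified_count != 1:
            # Raising aborts the transaction, so the debit above is rolled back (atomicity)
            raise ValueError("unknown destination account")

    with client.start_session() as session:
        # Commits if callback returns normally, aborts if it raises
        session.with_transaction(callback)
```

Other sessions never observe the intermediate state where money has left `src` but not yet reached `dst` (isolation).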

11. What is the purpose of MongoDB’s explain() function?

**Purpose**

explain() is used to analyze how MongoDB executes a query.

What it tells you

* Whether the query uses an index or full COLLSCAN

* The query execution plan

* Number of index keys and documents actually examined (with `executionStats` verbosity)

* Execution time (`executionTimeMillis`)

* Which candidate plans the query planner considered and which one it chose

Why it's useful

* Helps developers optimize slow queries

* Useful in indexing decisions

* Essential for debugging performance issues
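A small helper can pull the most useful fields out of an explain result. With PyMongo you can get one from `cursor.explain()` or, with an explicit verbosity, `db.command("explain", {"find": "users", "filter": {...}}, verbosity="executionStats")`. The helper names and the sample document below are hypothetical; the sample is hand-written and abbreviated, not captured from a server. Because plan shapes can differ between server versions (some newer versions nest stages under an extra `queryPlan` key), the helper walks the plan recursively instead of assuming a fixed path.

```python
from typing import Any, Dict, Iterator


def plan_stages(node: Any) -> Iterator[str]:
    """Yield every "stage" name found anywhere in a (nested) explain plan."""
    if isinstance(node, dict):
        if isinstance(node.get("stage"), str):
            yield node["stage"]
        for value in node.values():
            yield from plan_stages(value)
    elif isinstance(node, list):
        for item in node:
            yield from plan_stages(item)


def summarize_explain(explain_doc: Dict[str, Any]) -> Dict[str, Any]:
    """Extract index usage and execution counters from an explain() result."""
    stages = list(plan_stages(explain_doc["queryPlanner"]["winningPlan"]))
    stats = explain_doc.get("executionStats", {})  # only present with executionStats verbosity or higher
    return {
        "stages": stages,
        "uses_index": "IXSCAN" in stages,
        "collscan": "COLLSCAN" in stages,
        "returned": stats.get("nReturned"),
        "keys_examined": stats.get("totalKeysExamined"),
        "docs_examined": stats.get("totalDocsExamined"),
        "millis": stats.get("executionTimeMillis"),
    }


# Hand-written, abbreviated document shaped like explain output
sample = {
    "queryPlanner": {
        "winningPlan": {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "email_1"}}
    },
    "executionStats": {"nReturned": 1, "totalKeysExamined": 1, "totalDocsExamined": 1, "executionTimeMillis": 0},
}
print(summarize_explain(sample)["stages"])  # ['FETCH', 'IXSCAN']
```

A healthy indexed query examines roughly as many documents as it returns; a `COLLSCAN` stage or `docs_examined` far above `returned` usually points to a missing index.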

12. How does MongoDB handle schema validation?

MongoDB is schema-flexible, but it supports schema validation using JSON Schema rules.

How it works

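* You define validation rules when creating a collection, or add them later with the `collMod` command.

* MongoDB enforces field types, allowed values, required fields, etc.

* By default (`validationAction: "error"`), invalid documents are rejected on insert or update; with `"warn"`, they are accepted but the violation is logged.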

Example rules

* Field type check (string, int, array)

* Required fields

* Minimum and maximum values

* Pattern matching (regex)

Purpose

* Ensures data consistency

* Helps maintain structure in large applications
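A `$jsonSchema` validator covering each rule type listed above, written as a Python dict so it can be passed straight to PyMongo. The collection name and fields (`students`, `name`, `email`, `age`, `tags`) are hypothetical; the keywords (`bsonType`, `required`, `minimum`, `maximum`, `pattern`, `items`) are standard `$jsonSchema` keywords.

```python
# $jsonSchema validator illustrating type checks, required fields, ranges, and regex
student_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email", "age"],                 # required fields
        "properties": {
            "name": {"bsonType": "string"},                    # type check
            "email": {
                "bsonType": "string",
                "pattern": r"^[^@\s]+@[^@\s]+\.[a-z]+$",       # pattern matching (regex)
            },
            # PyMongo stores Python ints that fit in 32 bits as BSON int, so "int" matches
            "age": {"bsonType": "int", "minimum": 18, "maximum": 120},  # min/max values
            "tags": {"bsonType": "array", "items": {"bsonType": "string"}},
        },
    }
}

# Usage with PyMongo (assumes `db` is a pymongo Database):
#   db.create_collection("students", validator=student_validator)
#   db.command("collMod", "students", validator=student_validator)  # existing collection
```

Once the validator is in place, inserting `{"name": "Atharv", "email": "a@x.com", "age": 15}` would fail because `age` is below the minimum.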

13. What is the difference between a primary and a secondary node in a replica set?

Primary Node

* Accepts all write operations

* Records every change in its oplog, which secondaries copy and apply

* Only one primary exists at a time

Secondary Nodes

* Maintain copies of the data

* Replicate changes from the primary by applying its oplog (operations log)

* Can serve read operations when the client's read preference allows it (e.g., `secondaryPreferred`)

* Can become primary if the current primary fails (via election)

Simple difference

* Primary = writes

* Secondary = read-only copies + failover backup
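Whether secondaries serve reads is decided by the client's read preference, usually set in the connection string. This sketch builds such a URI; the host names, the replica set name `rs0`, and the `replica_set_uri` helper are hypothetical, while `replicaSet` and `readPreference` are standard connection-string options. Writes always go to the primary regardless of this setting, and reads from secondaries may return slightly stale data.

```python
from typing import List
from urllib.parse import urlencode

READ_PREFERENCES = {"primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"}


def replica_set_uri(hosts: List[str], replica_set: str, read_preference: str = "primary") -> str:
    """Build a replica-set connection string with an explicit read preference."""
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    query = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{query}"


uri = replica_set_uri(["db1:27017", "db2:27017", "db3:27017"], "rs0", "secondaryPreferred")
print(uri)
# mongodb://db1:27017,db2:27017,db3:27017/?replicaSet=rs0&readPreference=secondaryPreferred
# Usage (assumes pymongo is installed): MongoClient(uri)
```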

14. What security mechanisms does MongoDB provide for data protection?

MongoDB has multiple built-in security layers.

**Key Security Mechanisms**

1. Authentication

* Verifies users using SCRAM, LDAP, Kerberos, X.509.

2. Authorization (Role-Based Access Control – RBAC)

* Grants built-in roles such as `read`, `readWrite`, `dbAdmin`, and `userAdmin`, or custom roles.

3. Encryption

* In-transit: TLS/SSL

* At-rest: Encrypted storage engine

4. Auditing

* Logs important security events (access, schema changes).

5. IP Whitelisting & Network Isolation

* Only allowed IPs can access the database.

6. Field-Level Encryption

* Sensitive fields (e.g., credit card numbers) are encrypted on the client before they are sent, so the server never sees them in plaintext.
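RBAC in practice means giving each application user only the roles it needs. This sketch builds a `createUser` command document with per-database roles; the helper name, its allow-list of roles (a least-privilege choice made for this sketch, not a MongoDB rule), and the example user and database names are hypothetical. With PyMongo the resulting dict could be run as `client["admin"].command(cmd)`, which creates the user in the `admin` database.

```python
from typing import Dict

# Assumption of this sketch: application users never get cluster-wide or root roles
ALLOWED_ROLES = {"read", "readWrite", "dbAdmin", "userAdmin"}


def create_user_command(username: str, password: str, grants: Dict[str, str]) -> dict:
    """Build a createUser command; `grants` maps database name -> built-in role."""
    roles = []
    for db_name, role in grants.items():
        if role not in ALLOWED_ROLES:
            raise ValueError(f"role {role!r} is not allowed for application users")
        roles.append({"role": role, "db": db_name})
    # The command name must be the first key; Python dicts keep insertion order
    return {"createUser": username, "pwd": password, "roles": roles}


cmd = create_user_command("reporter", "change-me", {"sales": "read"})
print(cmd["roles"])  # [{'role': 'read', 'db': 'sales'}]
```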

15. Explain the concept of embedded documents and when they should be used

**What are Embedded Documents?**

Embedded documents are documents nested inside another document.

Example:

```json
{
  "name": "Atharv",
  "address": {
    "city": "Mumbai",
    "pincode": 400001
  }
}
```

When to Use Embedded Documents

Use embedding when:

* Data is closely related

* Data is always accessed together

* The embedded part does not grow indefinitely

* You want faster reads by avoiding JOIN-like operations

Best for

* User profile + address

* Order + list of order items

* Blog post + comments (small number)

Avoid embedding when

* The embedded array can grow without bound (e.g., millions of comments), risking the 16 MB document size limit

* Data is not always required together

* Data needs to be shared across multiple documents
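Embedded fields are queried with dot notation, e.g. `users.find({"address.city": "Mumbai"})` in PyMongo. The sketch below emulates that path resolution in plain Python to show how one document read brings the user and the address together. `get_path` and `find` are hypothetical helpers; they handle nested documents and equality only, not arrays or MongoDB's `null`-matches-missing rule.

```python
from typing import Any, Dict, List

_MISSING = object()  # sentinel so a missing field never equals a real value


def get_path(doc: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Resolve a dotted path like "address.city" through embedded documents."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def find(docs: List[Dict[str, Any]], filter_: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Equality filter on dotted paths, mimicking {"address.city": "Mumbai"}."""
    return [d for d in docs if all(get_path(d, p, _MISSING) == v for p, v in filter_.items())]


user = {"name": "Atharv", "address": {"city": "Mumbai", "pincode": 400001}}
print(get_path(user, "address.city"))  # Mumbai
```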

16. What is the purpose of MongoDB’s $lookup stage in aggregation?

Purpose

$lookup performs a left outer join between two MongoDB collections.

Why it is used

* To combine related data stored in different collections

* To simulate SQL-like JOIN operations

* To fetch nested/related documents in one aggregation pipeline

Example

Joining orders with customer details:

```javascript
{
  $lookup: {
    from: "customers",
    localField: "customer_id",
    foreignField: "_id",
    as: "customerDetails"
  }
}
```
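The "left outer" part matters: every order is kept, and orders without a matching customer get an empty array. The sketch below reproduces those semantics for the stage above in plain Python; `lookup` is a hypothetical helper supporting simple equality matches only (no arrays, dotted paths, or sub-pipelines), and the sample data is invented for illustration.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def lookup(local_docs: List[Doc], foreign_docs: List[Doc],
           local_field: str, foreign_field: str, as_field: str) -> List[Doc]:
    """Left outer join in the spirit of $lookup: attach all matches as an array."""
    index: Dict[Any, List[Doc]] = {}
    for f in foreign_docs:
        index.setdefault(f.get(foreign_field), []).append(f)
    # Every local document is kept; unmatched ones get an empty array
    return [{**doc, as_field: list(index.get(doc.get(local_field), []))} for doc in local_docs]


orders = [{"_id": 1, "customer_id": 10, "total": 250}, {"_id": 2, "customer_id": 99, "total": 80}]
customers = [{"_id": 10, "name": "Atharv"}]
for row in lookup(orders, customers, "customer_id", "_id", "customerDetails"):
    print(row["_id"], row["customerDetails"])
# 1 [{'_id': 10, 'name': 'Atharv'}]
# 2 []
```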

17. What are some common use cases for MongoDB?

Common Use Cases

MongoDB is used for modern, large-scale, and rapidly changing applications:

* E-commerce platforms (products, orders, carts)

* Content management systems (CMS)

* Real-time analytics

* Social media applications

* Mobile and gaming applications

* IoT and sensor data

* Location-based services

* Chat and messaging apps

* Machine learning pipelines (unstructured data)

Why?

Because MongoDB handles:

* Unstructured/semistructured data

* High read/write throughput

* Flexible data models

18. What are the advantages of using MongoDB for horizontal scaling?

MongoDB supports horizontal scaling through sharding.

Advantages

1. Data distribution across multiple servers
→ No single server becomes a bottleneck.

2. High write scalability
→ Writes are spread across shards.

3. Supports massive datasets
→ Petabytes of data can be handled.

4. Automatic balancing
→ MongoDB moves data between shards automatically.

5. Better performance under heavy load
→ Read & write requests are distributed.

6. Cost-effective scaling
→ Cheaper commodity hardware instead of one big server.

19. How do MongoDB transactions differ from SQL transactions?

SQL Transactions

* Designed for strong ACID guarantees across all operations

* Often span multiple tables and rows

* Default in relational databases

* Built for complex joins and relationships

MongoDB Transactions

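* Multi-document transactions were introduced in MongoDB 4.0 (replica sets) and extended to sharded clusters in 4.2

* Provide ACID guarantees when you opt in; single-document writes are always atomic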

* Typically used for multi-document or multi-collection operations

* MongoDB encourages embedding, reducing the need for transactions

* Transactions add overhead → used only when necessary

**Key Differences**

| SQL | MongoDB |
|---|---|
| Transactions are core | Transactions optional |
| Used frequently | Used rarely (due to document model) |
| Multi-table joins common | Usually avoided via embedding |
| Transaction overhead is expected | Transactions add overhead, so single-document atomic writes are preferred |
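Because a write to a single document is atomic (even when it changes several embedded fields), embedding often removes the need for a transaction. This sketch builds one update that appends a line item and bumps the order's totals together; the helper name and the fields `items`, `total`, and `itemCount` are hypothetical, and prices are assumed to be in the smallest currency unit to avoid floating-point rounding. With PyMongo it would be applied as `orders.update_one({"_id": order_id}, add_item_update("pen", 3, 20))`.

```python
from typing import Any, Dict


def add_item_update(sku: str, qty: int, unit_price: int) -> Dict[str, Any]:
    """One update document: push a line item and increment the totals in the same write."""
    if qty <= 0:
        raise ValueError("qty must be positive")
    return {
        "$push": {"items": {"sku": sku, "qty": qty, "price": unit_price}},
        "$inc": {"total": qty * unit_price, "itemCount": qty},
    }


print(add_item_update("pen", 3, 20)["$inc"])  # {'total': 60, 'itemCount': 3}
```

In a relational schema the same change would touch an `orders` row and an `order_items` row, which is why it needs a transaction there.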

20. What are the main differences between capped collections and regular collections?

Capped Collections

* Fixed-size collections

* Automatically overwrite oldest documents when full

* Documents are stored in insertion order

* High-throughput inserts; storage never grows beyond the configured size

* No document deletion except overwrite

* Do not allow updates that increase document size

Regular Collections

* Can grow dynamically

* Do not overwrite anything automatically

* Allow full CRUD operations (insert, update, delete)

* No size restriction

* Support indexes, validation, and schema flexibility fully

When to Use Capped

* Log data

* Sensor data

* Real-time streams

* Cache-like systems
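The "overwrite the oldest" behaviour can be pictured with a bounded `deque`. In PyMongo a real capped collection would be created with something like `db.create_collection("logs", capped=True, size=1_048_576, max=1000)` (`size` in bytes is required, `max` documents is optional) and read newest-first with `find().sort("$natural", -1)`. The `CappedLog` class below is a hypothetical in-memory analogue that caps only the document count, not the byte size.

```python
from collections import deque
from typing import Any, Dict, List


class CappedLog:
    """In-memory analogue of a capped collection, bounded by document count."""

    def __init__(self, max_docs: int):
        self._docs: deque = deque(maxlen=max_docs)  # deque drops the oldest item when full
        self._next_id = 0

    def insert(self, doc: Dict[str, Any]) -> None:
        self._next_id += 1
        self._docs.append({"_id": self._next_id, **doc})

    def find(self, newest_first: bool = False) -> List[Dict[str, Any]]:
        """Documents in insertion order (or reversed, like sorting on $natural: -1)."""
        docs = list(self._docs)
        return docs[::-1] if newest_first else docs


log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"msg": f"event {i}"})
print([d["msg"] for d in log.find()])  # ['event 2', 'event 3', 'event 4']
```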

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

**Purpose of $match**

$match filters documents in the pipeline based on specified conditions—similar to the WHERE clause in SQL.

Why $match is important

* Reduces the number of documents passed to later stages

* Improves performance by filtering early

* Can use indexes for faster searching

* Essential for selecting specific records before grouping, sorting, etc.

Example:

```javascript
{
  $match: { status: "active" }
}
```

22. How can you secure access to a MongoDB database?

MongoDB provides multiple layers of security.

Key Security Measures

1. Authentication

Verifies user identity.
Methods:

* SCRAM (default)

* X.509 certificates

* LDAP

* Kerberos

2. Authorization (RBAC – Role-Based Access Control)

Defines what each user can do.
Examples:

* read

* readWrite

* dbAdmin

* clusterAdmin

3. Network Security

* Enable firewalls

* Bind MongoDB to specific IP addresses

* Use IP whitelisting

* Run behind a VPN or private network

4. Encryption

* Encryption-at-rest (WiredTiger Encrypted Storage Engine)

* Encryption-in-transit using TLS/SSL

5. Auditing

Tracks access and actions taken on the database.

6. Security Best Practices

* Enable access control (`security.authorization: enabled`); self-managed deployments start with it disabled

* Never expose MongoDB directly to the public internet

* Change default ports if required

* Use environment variables for credentials
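Several of these measures meet in the client configuration: credentials come from the environment, authentication uses SCRAM, and traffic is encrypted with TLS. The sketch returns keyword arguments for `pymongo.MongoClient` (`host`, `username`, `password`, `authSource`, `authMechanism`, `tls`, and `tlsCAFile` are real PyMongo options); the environment variable names and the helper itself are hypothetical.

```python
import os
from typing import Any, Dict, Mapping


def secure_client_options(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Keyword arguments for pymongo.MongoClient, read from environment variables."""
    options: Dict[str, Any] = {
        "host": env["MONGO_HOST"],                  # no credentials in source code
        "username": env["MONGO_USER"],
        "password": env["MONGO_PASSWORD"],
        "authSource": env.get("MONGO_AUTH_DB", "admin"),
        "authMechanism": "SCRAM-SHA-256",
        "tls": True,                                # encrypt traffic in transit
    }
    ca_file = env.get("MONGO_TLS_CA_FILE")
    if ca_file:
        options["tlsCAFile"] = ca_file              # custom CA; otherwise the system CA store is used
    return options

# Usage (assumes pymongo is installed and the variables are set):
#   client = MongoClient(**secure_client_options())
```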

23. What is MongoDB’s WiredTiger storage engine, and why is it important?

What is WiredTiger?

WiredTiger is MongoDB’s default storage engine (since MongoDB 3.2). It is responsible for how MongoDB stores data on disk, manages compression, caching, and concurrency.

Why it is important

1. Document-Level Concurrency

* Multiple clients can modify different documents in the same collection at the same time

* Reduces lock contention compared with collection-level locking

* Improves write throughput

2. Compression

* Supports snappy (the default for collection data), zlib, and zstd block compression; indexes use prefix compression

* Reduces disk usage significantly

* Improves I/O performance

3. Checkpointing & Crash Safety

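* Writes a consistent snapshot of in-memory data to disk (a checkpoint), every 60 seconds by default

* Together with the journal (write-ahead log), lets MongoDB recover after a crash without losing journaled writes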

4. Better Memory Management

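* Maintains an internal cache (by default the larger of 50% of (RAM - 1 GB) or 256 MB)

* Frequently accessed data is served from memory, so reads are faster

* Designed to take advantage of modern multi-core hardware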

5. Encryption at Rest

* Built-in encryption options (encrypted storage engine in MongoDB Enterprise)

* Provides secure storage (especially in Atlas)
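Compression pays off because documents in one collection repeat the same field names and similar values. The sketch illustrates that with Python's standard `zlib` on JSON text; WiredTiger actually compresses BSON pages on disk (snappy by default), so the ratio here is only indicative. A per-collection compressor can be chosen at creation time, shown in the comment using PyMongo's `create_collection` with the `storageEngine` option; the collection name and the `compression_ratio` helper are hypothetical.

```python
import json
import zlib
from typing import Any, Dict, List

# Per-collection compressor override (assumes a pymongo Database `db`; name hypothetical):
#   db.create_collection(
#       "events",
#       storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
#   )


def compression_ratio(docs: List[Dict[str, Any]]) -> float:
    """How many times smaller a block of similar JSON documents gets under zlib."""
    raw = json.dumps(docs).encode()
    return len(raw) / len(zlib.compress(raw))


readings = [{"sensor": "temp-01", "reading": i % 10, "unit": "celsius"} for i in range(1000)]
print(f"{compression_ratio(readings):.1f}x smaller")
```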

