Q1. What are the key differences between SQL and NoSQL databases?
1. Data Model
- SQL: Uses tables with rows and columns (structured, relational data).
- NoSQL: Uses key-value pairs, documents, graphs, or wide-columns (non-relational, flexible structure).
2. Schema
- SQL: Has a fixed schema — the structure must be defined before inserting data.
- NoSQL: Has a dynamic schema — you can store data without defining the structure first.
3. Scalability
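- SQL: Traditionally scales vertically (add more CPU, RAM, or storage to a single server). Horizontal scaling is possible with read replicas or sharding, but it usually takes more effort.
- NoSQL: Typically designed to scale horizontally (add more servers and distribute the data across them).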
4. Query Language
- SQL: Uses Structured Query Language (SQL) to interact with the database.
- NoSQL: Uses varied query methods, often depending on the database (e.g., MongoDB uses JSON-like queries).
5. Transactions
- SQL: Fully supports ACID transactions (Atomicity, Consistency, Isolation, Durability) for reliability.
- NoSQL: Many NoSQL systems favor eventual consistency, and ACID support varies by database. MongoDB, for example, has supported multi-document ACID transactions since version 4.0.
6. Use Cases
- SQL: Best for structured data and complex relationships (e.g., ERP, banking systems).
- NoSQL: Best for unstructured or semi-structured data, big data, and real-time apps (e.g., social media, logs).
7. Joins
- SQL: Supports joins to relate data from multiple tables.
- NoSQL: Joins are limited or absent (MongoDB provides the $lookup aggregation stage). Related data is usually embedded or denormalized instead.
8. Performance at Scale
- SQL: Can slow down with very large datasets due to joins and normalization.
- NoSQL: Designed for high performance and fast read/write in distributed systems.
9. Examples
- SQL: MySQL, PostgreSQL, Oracle, SQL Server.
- NoSQL: MongoDB, Cassandra, CouchDB, Redis, Neo4j.


Q2. What makes MongoDB a good choice for modern applications?
1. Flexible Schema (Schema-less Design)
- You can store documents with different fields and structures in the same collection.
- Great for rapid development, where data structures might evolve frequently.
2. JSON-like Documents (BSON Format)
- Stores data in a format similar to JSON, which is easy to read and integrates well with JavaScript and web applications.
- Allows for embedded/nested documents, making queries and data modeling more intuitive.
3. High Scalability
- MongoDB supports horizontal scaling using sharding (splitting data across multiple servers).
- Ideal for handling large volumes of data and high-throughput workloads.
4. Fast Performance
- Optimized for read and write performance, especially for semi-structured or unstructured data.
- Suitable for real-time analytics, logging systems, or rapidly growing data sets.
5. Built-in High Availability
- Supports replica sets (automatic failover and data redundancy).
- Provides fault tolerance and keeps redundant copies of the data on multiple nodes.
6. Rich Query Language
- Supports powerful queries using filters, projections, aggregations, text search, and geospatial queries.
- The aggregation pipeline handles complex data processing, often without needing joins.
7. Developer-Friendly
- Integrates easily with popular programming languages and runtimes (JavaScript/Node.js, Python, Java, etc.).
- Great support through MongoDB drivers, official documentation, and Atlas cloud platform.
8. Ideal for Modern Use Cases
- Works well for:
 - Real-time analytics
 - IoT and mobile apps
 - Content management systems (CMS)
 - E-commerce platforms
 - Chat applications and social networks
9. MongoDB Atlas (Cloud Platform)
- Fully managed cloud service with:
 - Auto-scaling
 - Backups
 - Monitoring
 - Global distribution


Q3. Explain the concept of collections in MongoDB?
- In MongoDB, a collection is roughly equivalent to a table in relational databases (like MySQL or PostgreSQL). It is a container for documents, which are stored in BSON (Binary JSON) format.
- Key Characteristics of Collections:
1. Stores Documents
- A collection holds documents (like rows in a table).
- Documents are written in JSON-like format with key-value pairs.
  {
    "name": "Alice",
    "age": 28,
    "email": "alice@example.com"
  }
2. No Fixed Schema
- Unlike SQL tables, collections in MongoDB don’t require a predefined schema.
- Each document in the same collection can have a different structure.
  // Document 1
  { "name": "John", "age": 30 }
  // Document 2
  { "name": "Jane", "city": "Delhi", "hobbies": ["reading", "cycling"] }
3. Created Automatically
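- Collections are created automatically when you insert the first document into them.
- You do not need an explicit CREATE TABLE statement as in SQL. db.createCollection() is available when you need options such as validation rules or a capped size.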
- db.users.insertOne({ name: "Tom", age: 25 }); // Creates `users` collection
4. Can Be Indexed
- Collections support indexes to improve query performance, just like SQL.
5. Grouped by Database
- Collections exist inside databases.
- One MongoDB database can have many collections:
  db_name
  ├── users (collection)
  ├── orders (collection)
  └── products (collection)
6. Common Operations on Collections
- insertOne(), insertMany()
- find(), updateOne(), deleteOne()
- aggregate()


Q4. How does MongoDB ensure high availability using replication?
1. Replica Set: Core of MongoDB Replication
- A replica set is a group of MongoDB servers (nodes) that maintain the same data.
- It usually consists of:
 - 1 Primary node: handles all writes and reads (by default)
 - 1 or more Secondary nodes: replicate data from the primary
2. Automatic Failover for High Availability
- If the primary node goes down, MongoDB automatically:
 - Holds an election and promotes one of the secondaries to primary
 - Lets client drivers discover the new primary and send writes to it
- This keeps the database available without manual intervention.
3. Data Replication
- Secondaries continuously sync with the primary by replaying its operation log (oplog).
- All changes made to the primary are recorded in the oplog, and secondaries apply these changes to stay updated.
4. Read & Write Behavior
- Writes: Always go to the primary node.
- Reads: By default, go to the primary, but you can configure read preferences to allow reads from secondaries (for load balancing).
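The oplog replay in point 3 can be sketched as a toy model in plain Python. This is an illustration under stated assumptions, not MongoDB code. The real oplog is the capped collection local.oplog.rs, which holds BSON entries with fields such as op, ns, o, and ts. Here each entry is a hypothetical tuple: ('i', doc), ('u', _id, changes), or ('d', _id). The names Node, apply_op, primary_write, and sync_secondary are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A toy replica-set member holding documents keyed by _id (hypothetical model)."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of oplog entries this node has already applied

def apply_op(node, op):
    """Apply one oplog-style entry: ('i', doc), ('u', _id, changes) or ('d', _id)."""
    kind = op[0]
    if kind == "i":
        doc = op[1]
        node.data[doc["_id"]] = dict(doc)
    elif kind == "u":
        node.data[op[1]].update(op[2])  # like $set: replaying it twice gives the same result
    elif kind == "d":
        node.data.pop(op[1], None)

def primary_write(primary, oplog, op):
    """The primary applies the write, then records it in the oplog."""
    apply_op(primary, op)
    oplog.append(op)
    primary.applied = len(oplog)

def sync_secondary(secondary, oplog):
    """A secondary replays, in order, every oplog entry it has not applied yet."""
    while secondary.applied < len(oplog):
        apply_op(secondary, oplog[secondary.applied])
        secondary.applied += 1

primary, secondary = Node("primary"), Node("secondary")
oplog = []
primary_write(primary, oplog, ("i", {"_id": 1, "name": "Alice", "age": 28}))
primary_write(primary, oplog, ("u", 1, {"age": 29}))
primary_write(primary, oplog, ("i", {"_id": 2, "name": "Bob"}))
primary_write(primary, oplog, ("d", 2))
sync_secondary(secondary, oplog)
print(secondary.data)  # {1: {'_id': 1, 'name': 'Alice', 'age': 29}}
```

Each secondary only tracks how far it has read (applied). A lagging secondary therefore catches up by replaying the remaining entries in order.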


Q5. What are the main benefits of MongoDB Atlas?
1. Fully Managed Database as a Service (DBaaS)
- No need to install, update, or manage servers.
- MongoDB Atlas handles patching, backups, monitoring, and scaling for you, and suggests performance improvements such as index recommendations.
2. High Availability and Global Distribution
- Atlas automatically sets up replica sets for redundancy and high availability.
- You can deploy your database in multiple regions across cloud providers (AWS, Azure, GCP).
- Serves users from nearby regions for low latency, and can fail over to another region if one becomes unavailable.
3. Automatic Scaling
- Storage and compute can scale automatically with the workload when auto-scaling is enabled.
- You can also scale vertically (a larger cluster tier) or horizontally (sharding) with a few clicks.
4. Built-in Security
- Atlas provides encryption in transit (TLS/SSL) and at rest, IP access lists (IP whitelisting), VPC peering, and role-based access control (RBAC).
- Helps meet compliance requirements such as SOC 2, GDPR, and HIPAA.
5. Powerful Monitoring and Performance Tools
- Real-time dashboards for query performance, index suggestions, system health, and slow query analysis.
- Integrated alerts and notifications for proactive monitoring.
6. Easy Backup and Restore
- Automated snapshots, plus continuous backups with point-in-time recovery.
- Allows for disaster recovery or rollback if needed.
7. Serverless and On-Demand Options
- Offers serverless instances where you're charged only for usage.
- Great for unpredictable or low-volume workloads.
8. Integration with Popular Tools and Services
- Easily integrates with:
 - MongoDB Compass (GUI client)
 - BI connectors
 - Data Lake, Charts, and Realm (mobile backend)
 - Infrastructure and serverless tools such as Terraform and AWS Lambda
9. Multi-Cloud Flexibility
- Deploy across AWS, Azure, and Google Cloud.
- You can even run multi-cloud clusters, with data replicated across cloud providers.
10. Free Tier to Get Started
- MongoDB Atlas offers a free shared cluster (M0) with:
 - 512 MB storage
 - 100 concurrent connections
- Ideal for prototyping and learning.


Q6. What is the role of indexes in MongoDB, and how do they improve performance?
- What is the Role of Indexes in MongoDB?
 - Indexes in MongoDB are data structures that improve the speed and efficiency of query operations by allowing the database to find data without scanning every document in a collection.
 - They are similar to the index in a book, which helps you quickly find a topic rather than reading every page.
- How Indexes Improve Performance
1. Faster Query Execution
- Without an index, MongoDB performs a collection scan (checks every document).
- With an index, MongoDB can jump directly to the documents that match the query criteria.
- This significantly reduces query time on large collections.
2. Efficient Filtering and Sorting
- Indexes make equality filters, range filters ($gt, $lt, etc.), and sorts much faster.
- Example:
 - db.users.find({ age: { $gt: 25 } }).sort({ name: 1 });
 - On a large collection this is slow without a suitable index. MongoDB must scan every document to filter on age, then sort the results by name in memory.
3. Support for Aggregation Pipelines
- Stages such as $match and $sort can use indexes when they appear at the beginning of the pipeline.
- Allows efficient data transformation and analytics on large datasets.
4. Improves Performance of Join-like Operations
- $lookup runs faster when the foreignField in the joined collection is indexed.
5. Reduces Resource Usage
- Scanning fewer documents means less CPU, memory, and disk I/O.
- Helps with scalability and performance under load.
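The difference between a collection scan and an index scan can be sketched in Python. MongoDB indexes are B-trees. This sketch instead uses a sorted list searched with bisect as a simplified stand-in. The sample users and the function names are hypothetical.

```python
import bisect

# Hypothetical sample data: 1,000 users with ages between 18 and 67.
users = [{"_id": i, "name": f"user{i}", "age": 18 + (i * 7) % 50} for i in range(1000)]

def collection_scan(docs, min_age):
    """COLLSCAN-style: examine every document, keep those with age > min_age."""
    matches = [doc for doc in docs if doc["age"] > min_age]
    return matches, len(docs)  # every document was examined

def build_index(docs, field):
    """Simplified single-field index: parallel sorted lists of keys and document positions."""
    entries = sorted((doc[field], pos) for pos, doc in enumerate(docs))
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions

def index_scan(docs, index, min_age):
    """IXSCAN-style: binary-search to the first key > min_age, then read only matching entries."""
    keys, positions = index
    start = bisect.bisect_right(keys, min_age)  # O(log n) seek, like descending a B-tree
    matches = [docs[pos] for pos in positions[start:]]
    return matches, len(matches)  # only matching index entries were examined

age_index = build_index(users, "age")
scan_hits, scan_examined = collection_scan(users, 60)
idx_hits, idx_examined = index_scan(users, age_index, 60)
print(len(scan_hits), scan_examined, idx_examined)  # 140 1000 140
```

The index scan also returns matches already ordered by age. This is why an index on the sort field can let MongoDB skip an in-memory sort.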


Q7. Describe the stages of the MongoDB aggregation pipeline?
1. $match — Filtering Data
- Filters documents based on conditions (like SQL's WHERE).
- Reduces the number of documents passed to the next stage.
- { $match: { status: "active" } }
2. $project — Reshaping Documents
- Includes, excludes, or reshapes fields.
- Can compute new fields or rename existing ones.
- { $project: { name: 1, total: { $add: ["$price", "$tax"] } } }
3. $group — Grouping and Aggregation
- Groups documents by a field and applies aggregation functions.
- Similar to SQL's GROUP BY.
  {
    $group: {
      _id: "$category",
      totalSales: { $sum: "$amount" },
      avgPrice: { $avg: "$price" }
    }
  }
4. $sort — Sorting Results
- Sorts documents by one or more fields (ascending or descending).
- { $sort: { totalSales: -1 } }
5. $limit — Restricting Output
- Limits the number of documents passed to the next stage.
- { $limit: 5 }
6. $skip — Skipping Documents
- Skips a specified number of documents (used for pagination).
- { $skip: 10 }
7. $lookup — Joining Collections
- Performs a left outer join between documents from different collections.
  {
    $lookup: {
      from: "orders",
      localField: "customer_id",
      foreignField: "customer_id",
      as: "orders"
    }
  }
8. $unwind — Flattening Arrays
- Breaks apart arrays into separate documents (1 document per array element).
- { $unwind: "$items" }
9. $addFields — Add/Compute New Fields
- Adds new fields or updates existing fields with computed values.
- { $addFields: { discounted: { $subtract: ["$price", "$discount"] } } }
10. $count — Count Documents
- Counts the number of documents that pass through the pipeline.
- { $count: "total" }
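Here is a tiny in-memory Python model of a few stages, showing how documents flow from one stage to the next. It is a hypothetical sketch, not MongoDB's engine. $match handles only equality, field references resolve only top-level fields, and $group supports only $sum and $avg. The sales data is invented.

```python
from collections import defaultdict

def field_value(doc, ref):
    """Resolve a "$field" reference (top-level fields only); other values are literals."""
    return doc.get(ref[1:]) if isinstance(ref, str) and ref.startswith("$") else ref

def stage_match(docs, cond):
    # Equality-only filter; the real $match supports the full query language.
    return [d for d in docs if all(d.get(k) == v for k, v in cond.items())]

def stage_group(docs, spec):
    groups = defaultdict(list)
    for d in docs:
        groups[field_value(d, spec["_id"])].append(d)
    out = []
    for key, members in groups.items():
        row = {"_id": key}
        for name, acc in spec.items():
            if name == "_id":
                continue
            (op, ref), = acc.items()  # e.g. {"$sum": "$amount"}
            values = [field_value(m, ref) for m in members]
            if op == "$sum":
                row[name] = sum(values)
            elif op == "$avg":
                row[name] = sum(values) / len(values)
            else:
                raise ValueError(f"unsupported accumulator {op}")
        out.append(row)
    return out

def stage_sort(docs, spec):
    # Sort by the last key first so the first key ends up with the highest precedence.
    result = list(docs)
    for key, direction in reversed(list(spec.items())):
        result.sort(key=lambda d: d[key], reverse=(direction == -1))
    return result

STAGES = {
    "$match": stage_match,
    "$group": stage_group,
    "$sort": stage_sort,
    "$limit": lambda docs, n: docs[:n],
    "$skip": lambda docs, n: docs[n:],
}

def aggregate(docs, pipeline):
    """Feed the output of each stage into the next one, in order."""
    for stage in pipeline:
        (name, arg), = stage.items()
        docs = STAGES[name](docs, arg)
    return docs

sales = [
    {"category": "books", "amount": 30, "price": 10, "status": "active"},
    {"category": "books", "amount": 20, "price": 20, "status": "active"},
    {"category": "games", "amount": 90, "price": 60, "status": "active"},
    {"category": "games", "amount": 50, "price": 40, "status": "cancelled"},
    {"category": "music", "amount": 15, "price": 5, "status": "active"},
]
pipeline = [
    {"$match": {"status": "active"}},
    {"$group": {"_id": "$category", "totalSales": {"$sum": "$amount"}, "avgPrice": {"$avg": "$price"}}},
    {"$sort": {"totalSales": -1}},
    {"$limit": 2},
]
print(aggregate(sales, pipeline))
# [{'_id': 'games', 'totalSales': 90, 'avgPrice': 60.0}, {'_id': 'books', 'totalSales': 50, 'avgPrice': 15.0}]
```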


Q8. What is sharding in MongoDB? How does it differ from replication?
- Sharding is MongoDB’s method for horizontal scaling, where data is split and distributed across multiple servers (called shards).
- This allows MongoDB to:
 - Handle very large datasets
 - Support high-throughput operations
 - Maintain performance as data volume grows
- Differences between sharding and replication:
1. Purpose
- Sharding: Used to scale horizontally by distributing data across multiple servers.
- Replication: Used to ensure data redundancy and high availability by copying data across servers.
2. Data Storage
- Sharding: Each shard stores a subset of the total data.
- Replication: Each replica node stores a full copy of the data.
3. Write and Read Scalability
- Sharding: Improves write and read scalability by dividing the workload across shards.
- Replication: Limited write scalability; all writes go to the primary node, reads can be distributed.
4. Fault Tolerance
- Sharding: Not designed for automatic failover by itself.
- Replication: Provides automatic failover — if the primary fails, a secondary becomes primary.
5. Query Routing
- Sharding: Requires a mongos router to determine which shard to query.
- Replication: No router is needed. Clients connect to the replica set directly, and which member serves a request depends on its role and the read preference.
6. Redundancy
- Sharding: Does not provide redundancy by itself; data is split, not copied.
- Replication: Keeps identical copies of the data on multiple nodes.
7. Use Case
- Sharding: Ideal for very large datasets or high-throughput applications (e.g., analytics, IoT).
- Replication: Ideal for disaster recovery, read-heavy apps, or ensuring uptime.
8. Can They Be Combined?
- Sharding: Yes. Each shard is deployed as a replica set (required since MongoDB 3.6).
- Replication: Yes, can be used within a sharded cluster.
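The routing decision a mongos router makes (point 5) can be sketched for range-based sharding. This is a simplified, hypothetical model. A real router reads chunk ranges from the config servers, and those chunks split and move over time. Hashed sharding would also hash the key before the lookup. ToyRouter and its split points are made up.

```python
import bisect

class ToyRouter:
    """Hypothetical mongos-like router for range-based sharding on one shard-key field."""

    def __init__(self, shard_key, split_points, shards):
        # split_points divide the key space into len(split_points) + 1 ranges (chunks);
        # shards[i] owns chunk i, i.e. keys in [split_points[i-1], split_points[i]).
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one shard per chunk")
        self.shard_key = shard_key
        self.split_points = sorted(split_points)
        self.shards = shards

    def shard_for(self, key_value):
        """Find the chunk whose range contains key_value and return its shard."""
        return self.shards[bisect.bisect_right(self.split_points, key_value)]

    def target(self, query):
        """Targeted query if the filter has equality on the shard key, else scatter-gather."""
        if self.shard_key in query:
            return [self.shard_for(query[self.shard_key])]
        return list(dict.fromkeys(self.shards))  # broadcast to every distinct shard

router = ToyRouter("customer_id", [1000, 2000], ["shardA", "shardB", "shardC"])
print(router.target({"customer_id": 1500}))  # ['shardB']
print(router.target({"status": "active"}))   # ['shardA', 'shardB', 'shardC']
```

Queries that include the shard key reach a single shard. Other queries must be sent to every shard, which is one reason the choice of shard key matters so much.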


Q9. What is PyMongo, and why is it used?
- PyMongo is the official Python driver for MongoDB, developed and maintained by MongoDB, Inc. It allows Python applications to connect to MongoDB databases and to query and modify their data.
- Why is PyMongo Used?
 - PyMongo is used to:
1. Connect Python to MongoDB
- Establishes a connection to a MongoDB server or cluster using Python.
  from pymongo import MongoClient
  client = MongoClient("mongodb://localhost:27017/")
2. Perform CRUD Operations
- Supports all Create, Read, Update, and Delete operations in Python.
  db = client["my_database"]  # database handle used by the calls below
  # Insert
  db.users.insert_one({"name": "Alice", "age": 25})
  # Read
  db.users.find_one({"name": "Alice"})
  # Update
  db.users.update_one({"name": "Alice"}, {"$set": {"age": 26}})
  # Delete
  db.users.delete_one({"name": "Alice"})
3. Work with Collections and Databases
- Easily create or access collections (tables) and databases.
- db = client["my_database"]
- collection = db["my_collection"]
4. Support Aggregation Pipelines
- Use MongoDB’s powerful aggregation framework in Python.
  pipeline = [
      {"$match": {"status": "active"}},
      {"$group": {"_id": "$category", "total": {"$sum": "$amount"}}},
  ]
  results = list(db.orders.aggregate(pipeline))  # aggregate() returns a cursor
5. Handle Indexing and Performance
- Create and manage indexes from your Python code.
- collection.create_index("email", unique=True)
6. Work with MongoDB Atlas
- PyMongo supports cloud-hosted MongoDB (Atlas) and allows secure connections via URI.


Q10. What are the ACID properties in the context of MongoDB transactions?
- ACID stands for Atomicity, Consistency, Isolation, and Durability, a set of properties that ensure reliable, safe database transactions. Single-document operations in MongoDB have always been atomic. Multi-document ACID transactions are supported starting in version 4.0 for replica sets and 4.2 for sharded clusters. Here’s what each ACID property means in the context of MongoDB transactions:
1. Atomicity
- Definition: All operations within a transaction are executed as a single unit — either all succeed or none are applied.
- MongoDB: Ensures multi-document transactions either fully commit or rollback if something fails.
- Example:
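 - If you're transferring money between two accounts (PyMongo, with client and db set up as in Q9):
   with client.start_session() as session:
       with session.start_transaction():
           db.accounts.update_one({"_id": 1}, {"$inc": {"balance": -100}}, session=session)
           db.accounts.update_one({"_id": 2}, {"$inc": {"balance": 100}}, session=session)
       # Leaving the start_transaction() block commits; an exception aborts the transaction.
 - If either update raises an error, the transaction is aborted and neither change is applied.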
2. Consistency
- Definition: A transaction brings the database from one valid state to another, maintaining all data rules and constraints.
- MongoDB: Enforces unique indexes and schema validation rules (if configured) within a transaction. It does not enforce foreign-key-style referential integrity, so cross-document rules must be maintained by the application.
- You won’t be left with half-updated data that breaks business rules.
3. Isolation
- Definition: Concurrent transactions are isolated from each other — they don’t interfere or see each other’s intermediate changes.
- MongoDB: Ensures that one transaction doesn’t affect another, even when they run at the same time.
- For example, while one transaction updates data, other operations cannot see those changes until it commits.
4. Durability
- Definition: Once a transaction is committed, its changes are permanently saved, even if there’s a crash or power failure.
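- MongoDB: Changes are written to the WiredTiger journal on disk. With write concern "majority", a committed transaction is also replicated to a majority of replica set members, so it survives crashes and failovers.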
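The all-or-nothing behaviour of the money-transfer example can be modelled without a server. This Python sketch is an illustration, not how MongoDB implements transactions: WiredTiger uses snapshots and write-conflict detection, not deep copies. The sketch applies every operation to a private copy and publishes the copy only if all operations succeed. It also mirrors isolation, because anything reading the original dict keeps seeing the old state until the commit. run_transaction, debit, credit, and InsufficientFunds are hypothetical names.

```python
import copy

class InsufficientFunds(Exception):
    pass

def run_transaction(db, operations):
    """All-or-nothing: apply every operation to a private snapshot, publish only if all succeed."""
    snapshot = copy.deepcopy(db)  # isolated working copy, invisible to other readers
    for op in operations:
        op(snapshot)              # any exception aborts; the original db is untouched
    db.clear()
    db.update(snapshot)           # "commit": all changes become visible at once

def debit(account_id, amount):
    def op(state):
        if state[account_id]["balance"] < amount:
            raise InsufficientFunds(account_id)
        state[account_id]["balance"] -= amount
    return op

def credit(account_id, amount):
    def op(state):
        state[account_id]["balance"] += amount
    return op

accounts = {1: {"balance": 500}, 2: {"balance": 50}}
run_transaction(accounts, [debit(1, 100), credit(2, 100)])
print(accounts)  # {1: {'balance': 400}, 2: {'balance': 150}}

try:
    # The credit runs first, then the debit fails, so the whole transaction is discarded.
    run_transaction(accounts, [credit(2, 1000), debit(1, 1000)])
except InsufficientFunds:
    pass
print(accounts)  # {1: {'balance': 400}, 2: {'balance': 150}}
```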


Q11. What is the purpose of MongoDB’s explain() function?
- Purpose of explain()
1. Understand the Query Execution Plan
- Shows how MongoDB processes your query step by step.
- Helps identify whether the query uses:
 - Indexes (IXSCAN stage)
 - Collection scans (COLLSCAN stage)
 - Filters, sorts, etc.
2. Performance Tuning
- Pinpoints slow queries and shows why they are slow.
- Helps with index optimization and query restructuring.
3. Check Index Usage
- Reveals whether the query uses the best available index.
- Helps you avoid unnecessary collection scans (COLLSCAN).
4. Debugging Queries
- Especially helpful when:
 - Queries return no results
 - Queries are unexpectedly slow
 - You want to compare query plans before and after adding an index
- Example: db.users.find({ age: { $gt: 25 } }).explain("executionStats")
 - The verbosity can be "queryPlanner", "executionStats", or "allPlansExecution".
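When you read explain() output programmatically, for instance the dict that PyMongo's Cursor.explain() returns, a small helper can walk the winning plan and flag a COLLSCAN. This sketch assumes the queryPlanner.winningPlan layout, in which stages nest through inputStage or inputStages. The layout varies across server versions, and the sample dicts below are hand-written, not captured output. The helper names are hypothetical.

```python
def plan_stages(plan):
    """Yield every stage name in an explain() plan tree, depth-first."""
    yield plan["stage"]
    if "inputStage" in plan:                    # stage with a single child
        yield from plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):   # e.g. OR stages with several children
        yield from plan_stages(child)

def winning_stages(explain_output):
    """Stage names of the winning plan, outermost first."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    # Assumption: some newer servers nest the tree under "queryPlan"; accept both shapes.
    winning = winning.get("queryPlan", winning)
    return list(plan_stages(winning))

def uses_collection_scan(explain_output):
    return "COLLSCAN" in winning_stages(explain_output)

# Hand-written, trimmed examples of the shape explain() returns (not real server output).
without_index = {"queryPlanner": {"winningPlan": {
    "stage": "SORT", "inputStage": {"stage": "COLLSCAN"}}}}
with_index = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "age_1"}}}}

print(uses_collection_scan(without_index), uses_collection_scan(with_index))  # True False
```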


Q12. How does MongoDB handle schema validation?
- Although MongoDB is schema-less by default, it supports schema validation to enforce rules on the structure and content of documents in a collection, similar to defining schemas in SQL databases. This helps ensure data quality, consistency, and integrity while retaining MongoDB’s flexibility.
1. Using JSON Schema Validation
- MongoDB uses the $jsonSchema operator to define validation rules. It is based on JSON Schema draft 4, with MongoDB-specific extensions such as bsonType.
- You can:
 - Enforce required fields
 - Specify data types
 - Limit values (enums, ranges)
 - Apply nested rules for embedded documents
- Example: Schema Validation
  db.createCollection("students", {
    validator: {
      $jsonSchema: {
        bsonType: "object",
        required: ["name", "age", "email"],
        properties: {
          name: {
            bsonType: "string",
            description: "must be a string and is required"
          },
          age: {
            bsonType: "int",
            minimum: 18,
            description: "must be an integer >= 18"
          },
          email: {
            bsonType: "string",
            pattern: "^.+@.+$",
            description: "must be a valid email"
          }
        }
      }
    }
  });
- This ensures:
 - Every document has name, age, and email
 - age is an integer (BSON int) ≥ 18
 - email matches a simple email pattern
2. Validation Levels
- MongoDB lets you define how strictly the rules are applied (validationLevel):
 - strict (default): Rules apply to all inserts and updates
 - moderate: Rules apply to inserts and to updates of documents that already satisfy the rules; updates to existing invalid documents are not checked
 - off: Validation is disabled
 - validationLevel: "strict" // or "moderate", "off"
3. Validation Actions
- error (default): Reject documents that violate the schema
- warn: Allow invalid documents but log a warning
- validationAction: "error" // or "warn"
4. Updating Validation on Existing Collection
  db.runCommand({
    collMod: "students",
    validator: { ... },
    validationLevel: "strict",
    validationAction: "error"
  });
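5. Disabling Schema Validation (optional)
- A collection created without rules (or with an empty validator) accepts any kind of document:
  db.createCollection("flexible", {
    validator: {}
  });
- To stop enforcing rules on an existing collection: db.runCommand({ collMod: "students", validationLevel: "off" });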
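A small Python sketch shows the kind of checks the students validator performs. It is not MongoDB's validator. It models only the bsonType, required, properties, minimum, and pattern keywords used above, and it ignores validationLevel and validationAction. validate and TYPE_CHECKS are hypothetical names.

```python
import re

# Only the few keywords used in the "students" example are modelled here.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),  # bool subclasses int
}

def validate(value, schema, path="document"):
    """Return a list of rule violations; an empty list means the value passes."""
    bson_type = schema.get("bsonType")
    if bson_type and not TYPE_CHECKS[bson_type](value):
        return [f"{path}: expected {bson_type}"]  # wrong type: skip the remaining checks
    errors = []
    for name in schema.get("required", []):
        if name not in value:
            errors.append(f"{path}: missing required field '{name}'")
    for name, sub_schema in schema.get("properties", {}).items():
        if name in value:
            errors.extend(validate(value[name], sub_schema, name))
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{path}: must be >= {schema['minimum']}")
    if "pattern" in schema and not re.search(schema["pattern"], value):
        errors.append(f"{path}: does not match {schema['pattern']}")
    return errors

students_schema = {
    "bsonType": "object",
    "required": ["name", "age", "email"],
    "properties": {
        "name": {"bsonType": "string"},
        "age": {"bsonType": "int", "minimum": 18},
        "email": {"bsonType": "string", "pattern": "^.+@.+$"},
    },
}

print(validate({"name": "Ben", "age": 16}, students_schema))
# ["document: missing required field 'email'", 'age: must be >= 18']
```

With validationAction "error", MongoDB would reject such an insert. With "warn", it would store the document and log the violation.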



Q13. What is the difference between a primary and a secondary node in a replica set?
1. Role in the Replica Set
- Primary Node : Receives all write operations and, by default, all reads in the replica set. Only one primary exists at a time.
- Secondary Node : Replicates data from the primary node. Can serve read operations if the client's read preference allows it. Multiple secondaries can exist.
2. Data Handling
- Primary Node : All data changes (insert, update, delete) are written here first.
- Secondary Node : Receives copies of operations from the primary via the oplog and applies them.
3. Election and Failover
- Primary Node : If it fails, the replica set holds an election to promote a secondary.
- Secondary Node : Can be elected to become primary during failover (if eligible and up-to-date).
4. Read/Write Access
- Primary Node : Supports read and write operations.
- Secondary Node : Supports reads only, and only when the read preference allows it. Writes are not allowed unless it becomes the new primary.
5. Usage in High Availability
- Primary Node : Main point of data access and control. Needs to be reliable.
- Secondary Node :
 - Used for:
  - High availability
  - Disaster recovery
  - Read scaling
  - Backups (to reduce primary load)
6. Priority Control
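- Primary Node: Members with higher priority are preferred in elections. All members default to priority 1, so no node outranks the others unless you configure it.
- Secondary Node: Can have priority set to 0 to make it unelectable (e.g., a dedicated backup or archive node).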
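Points 3 and 6 can be illustrated with a simplified election model. It is loosely inspired by MongoDB's Raft-style protocol but is not that protocol: real elections also involve terms, heartbeats, timeouts, and priority takeover. In this model a member can win only if it is up, has priority above 0, and gets a strict majority of all votes from reachable members whose oplog is not ahead of its own. Among the members that can win, the highest priority wins, with the most recent optime as the tie-break. Member and elect_primary are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    priority: float = 1.0  # default member priority is 1; 0 means never primary
    optime: int = 0        # position of the last applied oplog entry (higher = more recent)
    up: bool = True
    votes: int = 1

def elect_primary(members):
    """Return the name of the elected member, or None if no one can gather a majority."""
    total_votes = sum(m.votes for m in members)

    def can_win(candidate):
        # A voter only supports a candidate whose oplog is at least as recent as its own.
        support = sum(m.votes for m in members if m.up and m.optime <= candidate.optime)
        return support > total_votes // 2

    eligible = [m for m in members if m.up and m.priority > 0 and can_win(m)]
    if not eligible:
        return None  # without a majority the set has no primary and cannot accept writes
    return max(eligible, key=lambda m: (m.priority, m.optime)).name

replica_set = [
    Member("A", optime=100, up=False),    # old primary, now down
    Member("B", optime=100),
    Member("C", optime=98, priority=2),   # preferred, but lagging behind B
    Member("D", optime=100, priority=0),  # votes, but can never become primary
]
print(elect_primary(replica_set))  # B
```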


Q14. What security mechanisms does MongoDB provide for data protection?
1. Authentication
- Verifies the identity of users and applications connecting to the database.
- Supported methods:
 - SCRAM (default) – Secure challenge-response authentication
 - LDAP – Integrate with corporate identity systems (MongoDB Enterprise or Atlas)
 - x.509 Certificates – Certificate-based client authentication over TLS
 - Kerberos – Enterprise single sign-on (e.g., with Active Directory); requires MongoDB Enterprise
2. Authorization (Role-Based Access Control - RBAC)
- Determines what users are allowed to do.
- You can define custom roles with specific privileges on:
 - Collections
 - Databases
 - Admin operations
- Example:
   db.createUser({
     user: "analyst",
     pwd: passwordPrompt(),  // prompts for the password instead of hard-coding it
     roles: [{ role: "read", db: "sales" }]
   });
3. Encryption
- In Transit (TLS/SSL):
 - Encrypts data sent between clients and MongoDB servers to prevent eavesdropping.
- At Rest (Encryption on Disk):
 - Uses AES-256 encryption to protect data stored on disk.
 - Requires MongoDB Enterprise or MongoDB Atlas.
 - Supports KMIP (Key Management Interoperability Protocol) for external key storage.
4. Auditing
- Tracks who did what and when in your database (available in MongoDB Enterprise and Atlas).
- Useful for compliance (GDPR, HIPAA, etc.)
- Captures:
 - Login attempts
 - Role changes
 - Query operations
 - Administrative actions
5. IP Whitelisting & Network Security
- Limit access to MongoDB servers using:
 - IP whitelisting (IP access lists)
 - VPC (Virtual Private Cloud) peering
 - Firewalls and network rules
- MongoDB Atlas lets you easily control which IPs or services can connect.
6. Client-Side Field Level Encryption (CSFLE)
- Encrypts specific fields on the client before the data ever reaches MongoDB.
- Only clients with the correct decryption keys can read the sensitive fields (e.g., card numbers or medical data).
7. User & Role Separation
- You can create:
 - Admin users
 - Read-only users
 - App users with limited access
 - Any other role, each granted only the privileges it needs (principle of least privilege).
8. Secure Defaults in MongoDB Atlas
- Atlas (MongoDB’s cloud service) includes:
 - TLS enabled by default
 - Authentication required
 - Encryption at rest
 - Built-in backup and monitoring



Q15. Explain the concept of embedded documents and when they should be used?
- In MongoDB, embedded documents (also called nested documents) are documents stored within other documents, like a mini-document inside a larger one. This is a key feature of MongoDB’s document-oriented model, allowing you to store related data together.
- Example:
 - A user with multiple addresses:
   {
     "_id": 1,
     "name": "Aarav",
     "email": "aarav@example.com",
     "addresses": [
       { "type": "home", "city": "Delhi", "pincode": 110001 },
       { "type": "work", "city": "Gurgaon", "pincode": 122001 }
     ]
   }
- Here, the addresses field contains embedded documents.
- When to Use Embedded Documents
 - Use embedded documents when you want to store related data together, and when:
1. One-to-Few Relationships
- Example: A user has a few phone numbers or emails.
- Good use case for embedding.
2. Data is Accessed Together
- If you often read or write the main document and its nested data together (e.g., order + items).
- Embedding avoids joins and speeds up queries.
3. High Read Performance is Needed
- Embedding keeps related data in a single document, which means:
 - Fewer disk I/O operations
 - Faster reads
4. Data is Not Updated Frequently or Independently
- Embedded data should not require frequent or independent updates.
- If it does, separate collections might be better.
- When NOT to Use Embedded Documents
- Avoid embedding when:
 - The nested data grows without bounds (risk of exceeding MongoDB's 16 MB document size limit)
 - The relationship is many-to-many, or one-to-many with high cardinality (a very large number of related items)
 - Embedded data is frequently queried or updated separately
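Unbounded growth is the main risk listed above, so an application might check a document's size before appending to an embedded array. The Python sketch below is a hypothetical rule of thumb, not a MongoDB feature. It estimates size from compact JSON, which differs from the real BSON size, and its 50% headroom default is an arbitrary assumption. approx_size and can_embed are made-up names.

```python
import json

MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB document size limit

def approx_size(doc):
    """Rough estimate only: compact UTF-8 JSON length, not the real BSON size."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

def can_embed(parent, array_field, new_item, headroom=0.5):
    """Hypothetical guard: allow embedding only while the parent stays below
    `headroom` x the document limit, leaving room for future growth."""
    candidate = dict(parent)  # shallow copy; the parent itself is not modified
    candidate[array_field] = parent.get(array_field, []) + [new_item]
    return approx_size(candidate) <= MAX_BSON_DOCUMENT_BYTES * headroom

user = {
    "_id": 1,
    "name": "Aarav",
    "addresses": [{"type": "home", "city": "Delhi", "pincode": 110001}],
}
print(can_embed(user, "addresses", {"type": "work", "city": "Gurgaon", "pincode": 122001}))  # True
```

When such a check starts failing, that usually signals the data belongs in a separate collection with references.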


Q16. What is the purpose of MongoDB’s $lookup stage in aggregation?
- The $lookup stage in MongoDB’s aggregation pipeline is used to perform a left outer join between documents in the current collection and documents from another collection. It allows you to combine related data from multiple collections, similar to a SQL JOIN.
- Key Benefits of $lookup:
 - Joins without SQL: Brings SQL-style (left outer) joins to a NoSQL database like MongoDB.
 - Combines related data: Easily brings together data from multiple collections (e.g., users and orders).
 - Powerful in reporting and analytics: Ideal for building dashboards, summaries, and data views.
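The left-outer-join semantics of $lookup can be sketched in Python. Every input document is kept, and its matches are collected into an array that is empty when nothing matches. This hypothetical helper covers only the localField/foreignField equality form. It ignores details such as matching array elements and the pipeline form of $lookup. The sample data is invented.

```python
def lookup(local_docs, from_docs, local_field, foreign_field, as_field):
    """Left outer join with $lookup-like equality semantics (sketch, not MongoDB's code)."""
    # Group the foreign documents by join key, much as an index on foreignField would.
    by_key = {}
    for doc in from_docs:
        by_key.setdefault(doc.get(foreign_field), []).append(doc)
    joined = []
    for doc in local_docs:
        out = dict(doc)  # inputs are not modified
        out[as_field] = list(by_key.get(doc.get(local_field), []))  # [] when nothing matches
        joined.append(out)
    return joined

customers = [{"customer_id": 1, "name": "Asha"}, {"customer_id": 2, "name": "Ravi"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 250},
    {"order_id": 11, "customer_id": 1, "total": 90},
]
result = lookup(customers, orders, "customer_id", "customer_id", "orders")
print([len(c["orders"]) for c in result])  # [2, 0]
```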


Q17. What are some common use cases for MongoDB?
1. Content Management Systems (CMS)
- Why MongoDB? Flexible schema supports dynamic content (e.g., blog posts, product pages, metadata).
- Example: WordPress-like systems, news websites, headless CMS platforms.
2. Real-Time Analytics and Big Data Applications
- Why MongoDB? Can handle high write throughput, large datasets, and real-time aggregation.
- Example: Dashboards for financial analytics, IoT sensor tracking, and operational monitoring.
3. Catalogs and E-commerce Platforms
- Why MongoDB? Schema flexibility allows for different product types with varying attributes.
- Example: Online stores, product listings, and marketplaces similar to Amazon or Flipkart.
4. Internet of Things (IoT)
- Why MongoDB? Handles high-velocity data ingestion from devices and sensors.
- Example: Smart home systems, health tracking wearables, industrial machines.
5. Mobile and Web Applications
- Why MongoDB? JSON-like documents map well to front-end data formats (like JavaScript objects).
- Example: Social media apps, real-time messaging apps, ride-sharing platforms.
6. User Profiles and Personalization
- Why MongoDB? Easily stores diverse and evolving user data, preferences, and activity logs.
- Example: Netflix-style recommendations, Spotify-style listening history, user dashboards.
7. Gaming Applications
- Why MongoDB? Supports rapid game state updates, player progress tracking, and leaderboards.
- Example: Online multiplayer games, in-game inventory systems, player matching.
8. Log and Event Data Storage
- Why MongoDB? Optimized for write-heavy workloads and time-series data with flexible structure.
- Example: Application logs, server events, security monitoring tools.
9. Healthcare and Medical Records
- Why MongoDB? Stores unstructured patient records, images, prescriptions, and test results.
- Example: Electronic Health Records (EHR), telemedicine platforms.
10. Distributed Applications / Microservices
- Why MongoDB? Scales easily across regions and supports service-specific data models.
- Example: Backend services in microservices architecture using MongoDB for data storage.



Q18. What are the advantages of using MongoDB for horizontal scaling?
1. Sharding for Data Distribution
- What it means: MongoDB uses sharding to split data across multiple servers (called shards).
- Advantage: Handles massive volumes of data by distributing the load.
- Result: Increased storage capacity and better performance across a cluster.
2. Linear Scalability
- What it means: As your data or traffic grows, you can add more nodes.
- Advantage: Throughput can grow roughly in proportion to the number of shards, provided the shard key spreads data and queries evenly.
- Result: You avoid performance bottlenecks seen in vertically scaled (single-node) databases.
3. High Availability
- What it means: Each shard can have its own replica set.
- Advantage: Even if one node fails, the other members of its replica set continue serving data.
- Result: Ensures fault tolerance and business continuity.
4. Cost-Effective Scaling
- What it means: You can scale using commodity hardware or cloud instances.
- Advantage: No need for expensive high-end servers.
- Result: Great for startups and enterprise applications alike.
5. Write and Read Scalability
- What it means: Shards can handle write and read operations in parallel.
- Advantage: Avoids single-write-node bottlenecks common in traditional databases.
- Result: Faster write performance and distributed query load.
6. Geo-Distribution Support
- What it means: You can place shards in different geographic locations and pin data to them (zone sharding).
- Advantage: Local users get faster responses.
- Result: Better user experience and data sovereignty compliance.
7. Automated Data Balancing
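- What it means: MongoDB’s balancer migrates chunks (ranges of shard-key values) between shards to keep data evenly distributed.
- Advantage: Prevents any single shard from holding a disproportionate share of the data, though a poorly chosen shard key can still create write hotspots.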
- Result: Maintains consistent performance without manual intervention.
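Point 7 can be illustrated with a greedy toy balancer that moves one chunk at a time from the fullest shard to the emptiest. This is a simplification. MongoDB's balancer uses its own thresholds (recent versions compare the amount of data per shard rather than chunk counts) and migrates chunks in the background. balance() is a hypothetical helper.

```python
def balance(chunk_counts, threshold=2):
    """Greedy toy balancer: repeatedly move one chunk from the fullest shard to the
    emptiest one until their difference is at most `threshold`.
    Returns (final_counts, migrations), where migrations lists (from_shard, to_shard)."""
    if threshold < 1:
        raise ValueError("threshold must be >= 1, otherwise the loop may never settle")
    counts = dict(chunk_counts)
    migrations = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= threshold:
            return counts, migrations
        counts[fullest] -= 1   # one chunk migrates...
        counts[emptiest] += 1  # ...to the least-loaded shard
        migrations.append((fullest, emptiest))

final_counts, moves = balance({"shardA": 10, "shardB": 2, "shardC": 3})
print(final_counts, len(moves))  # {'shardA': 6, 'shardB': 5, 'shardC': 4} 4
```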


Q19. How do MongoDB transactions differ from SQL transactions?
1. Data Model
- SQL Transactions:
 - Operate on structured tables with fixed schemas and relationships (rows, columns, foreign keys).
- MongoDB Transactions:
 - Operate on JSON-like documents (BSON) in collections with flexible schemas and embedded documents.
2. Multi-Document Transactions
- SQL:
 - Has supported multi-row, multi-table transactions as a core feature for decades.
- MongoDB:
 - Introduced multi-document transactions in version 4.0 (for replica sets) and 4.2+ (for sharded clusters).
3. Schema Enforcement
- SQL:
 - Schema is strictly enforced (data types, constraints, foreign keys).
- MongoDB:
 - Schema-less by default (unless you enable validation rules); transactions operate on flexible documents.
4. Use Cases
- SQL:
 - Ideal for highly structured, relational systems (e.g., banking, ERP).
- MongoDB:
 - Best for document-based apps with denormalized data (e.g., content management, catalogs).
5. Isolation and Concurrency Control
- SQL:
 - Uses well-established isolation levels (READ COMMITTED, SERIALIZABLE, etc.) with locking mechanisms.
- MongoDB:
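 - Multi-document transactions use snapshot isolation. A write that conflicts with another transaction fails with a retryable write-conflict error instead of waiting on a lock, and you cannot choose SQL-style isolation levels such as SERIALIZABLE.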
6. Performance Overhead
- SQL:
 - Designed for heavy transaction use — optimized over decades.
- MongoDB:
 - Transactions add overhead and are slower than single-document operations — which are atomic by default.
7. Default Behavior
- SQL:
 - Every statement runs inside a transaction, either explicit or autocommit.
- MongoDB:
 - Designed to work without transactions by leveraging atomic single-document writes and data embedding.


Q20. What are the main differences between capped collections and regular collections?
1. Storage Size and Document Limit
- Regular Collection:
 - No fixed size limit.
 - Grows dynamically as more documents are added.
- Capped Collection:
 - Has a fixed maximum size in bytes (required) and, optionally, a maximum number of documents.
 - When the limit is reached, oldest documents are overwritten (FIFO behavior).
2. Insertion Behavior
- Regular Collection:
 - Documents can be inserted, updated, and deleted freely.
- Capped Collection:
 - Only supports appending new documents.
 - No deletion of individual documents (automatically handled by overwriting).
 - No reordering or resizing of documents.
3. Use Cases
- Regular Collection:
 - General-purpose use (e.g., user data, products, orders).
- Capped Collection:
 - Ideal for logging, caching, or sensor data, where:
  - You only need recent data
  - Order of insertion matters
4. Performance
- Regular Collection:
 - Performance varies based on indexing, document size, and queries.
- Capped Collection:
 - Faster write performance due to:
  - Pre-allocated space
  - No document relocation
  - Simpler storage pattern
5. Document Updates
- Regular Collection:
 - Allows updates that change document size
- Capped Collection:
 - Only allows updates that don’t change the document's size.
 - If an update would change the size, the operation fails.
6. Document Ordering
- Regular Collection:
 - No guarantee of insertion order unless explicitly sorted.
- Capped Collection:
 - Maintains natural insertion order (important for logs).
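In mongosh, a capped collection is created with db.createCollection("logs", { capped: true, size: 4096, max: 3 }), where size is in bytes. The eviction behaviour can be modelled in a few lines of Python. CappedCollection is a hypothetical class, not a MongoDB API, and it approximates document size with JSON length instead of BSON.

```python
import json
from collections import deque

class CappedCollection:
    """Toy capped collection: fixed byte budget, optional maximum document count,
    insertion order preserved, oldest documents evicted first (FIFO)."""

    def __init__(self, size_bytes, max_docs=None):
        self.size_bytes = size_bytes
        self.max_docs = max_docs
        self.docs = deque()
        self.used = 0

    @staticmethod
    def _size(doc):
        # Approximation: UTF-8 JSON length stands in for the real BSON size.
        return len(json.dumps(doc).encode("utf-8"))

    def _over_limit(self):
        too_many = self.max_docs is not None and len(self.docs) > self.max_docs
        return self.used > self.size_bytes or too_many

    def insert(self, doc):
        size = self._size(doc)
        if size > self.size_bytes:
            raise ValueError("document is larger than the whole collection")
        self.docs.append(doc)
        self.used += size
        while self._over_limit():  # FIFO: evict from the oldest end
            self.used -= self._size(self.docs.popleft())

    def find(self):
        return list(self.docs)  # natural order = insertion order

logs = CappedCollection(size_bytes=4096, max_docs=3)
for i in range(5):
    logs.insert({"seq": i, "msg": f"event {i}"})
print([d["seq"] for d in logs.find()])  # [2, 3, 4]
```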


Q21.What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
- The $match stage is used to filter documents in the aggregation pipeline, similar to how the WHERE clause works in SQL. It passes only those documents that meet the specified condition(s) to the next stage in the pipeline.
- Why Use $match?
 - Filter Data Early
  - Reduces the number of documents passed through the pipeline, improving performance. When $match is the first stage, it can also use indexes.
 - Works with Standard Query Operators
  - Uses the same query syntax as .find(), with operators such as $gt, $lt, $in, and $and ($where and $near are not allowed inside $match).
 - Used in Combination with Other Stages
  - Commonly placed before grouping ($group), sorting ($sort), or projecting ($project) data.
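The "filter early" pattern can be sketched as a pipeline. It assumes the Superstore field names used in the code cells of this notebook (`Region`, `Sales`, `Category`); the function name is illustrative. $match comes first so that $group only sees the documents that survive the filter.

```python
def west_high_sales_by_category(min_sales=500):
    """Pipeline: filter with $match first, then group and sort the remaining documents."""
    return [
        # First stage: narrows the input and can use an index on Region or Sales.
        {"$match": {"Region": "West", "Sales": {"$gt": min_sales}}},
        # Only matching orders reach this stage.
        {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},
        {"$sort": {"total_sales": -1}},
    ]
```

Usage with the `collection` from the code cells: `for row in collection.aggregate(west_high_sales_by_category()): print(row)`.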


Q22. How can you secure access to a MongoDB database?
1. Enable Authentication
- Purpose: Requires users to log in before accessing the database.
- How: Start MongoDB with authentication enabled.
 - mongod --auth --port 27017 --dbpath /data/db
 - Create an admin user (in mongosh):
   use admin
   db.createUser({
     user: "admin",
     pwd: "securepassword",
     roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
   });
2. Use Role-Based Access Control (RBAC)
- Purpose: Assigns specific privileges to users.
- How: Create users with appropriate built-in roles (e.g., read, readWrite, dbAdmin).
   db.createUser({
     user: "analyst",
     pwd: "analyst123",
     roles: [ { role: "read", db: "sales" } ]
   });
3. Enable TLS/SSL Encryption (For Data in Transit)
- Purpose: Encrypts data transmitted between MongoDB clients and servers.
- How:
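 - Generate or obtain TLS certificates for the server.
 - Start MongoDB with TLS options (option names from MongoDB 4.2 onward; the older --ssl* options are deprecated aliases):
 - mongod --tlsMode requireTLS --tlsCertificateKeyFile /etc/ssl/mongodb.pem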
4. Use Encryption at Rest
- Purpose: Protects stored data on disk.
- How:
 - Available in MongoDB Enterprise Edition and Atlas.
 - Uses AES-256 encryption.
 - Supports integration with external Key Management Systems (KMS).
5. Network Access Control (IP Access List)
- Purpose: Limits access to trusted IP addresses only.
- How (MongoDB Atlas):
 - Add trusted addresses to the IP Access List in the Network Access settings.
- How (Self-Hosted):
 - Use firewall rules (e.g., iptables, ufw) to restrict port 27017.
6. Bind MongoDB to Localhost or Specific IP
- Purpose: Prevents open access to the entire internet.
- How:
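 - Set bindIp in the config file (mongod.conf):
   net:
     bindIp: 127.0.0.1
 - MongoDB 3.6 and later bind to localhost by default; add other addresses only for trusted network interfaces.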
7. Use Strong Passwords and Change Them Regularly
- Purpose: Reduce the risk of brute-force and password-guessing attacks.
- How: Enforce password complexity, and keep credentials in a password manager or environment variables instead of hardcoding them in scripts.
8. Enable Auditing
- Purpose: Track who did what and when (compliance and forensic tracking).
- How: Available in MongoDB Enterprise Edition.
- Logs: Authentication attempts, role changes, CRUD operations, etc.
9. Regular Backups & Security Patching
- Purpose: Protect against data loss and vulnerabilities.
- How:
 - Use mongodump, Atlas backups, or Ops Manager.
 - Regularly update MongoDB to the latest version with security patches.
10. Use MongoDB Atlas (Cloud) for Built-in Security
- Comes with:
 - TLS enabled by default
 - IP access control
 - Encryption at rest
 - Backup automation
 - Multi-factor authentication (MFA) for Atlas user accounts
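On the client side, authentication, TLS, and "no hardcoded passwords" can be combined as in the sketch below. It builds keyword arguments for pymongo's `MongoClient` (`username`, `password`, `authSource`, `tls`, `tlsCAFile`) from environment variables. The variable names `MONGO_USER`, `MONGO_PASSWORD`, and `MONGO_CA_FILE` are hypothetical, and `authSource="admin"` assumes the user was created in the admin database.

```python
import os


def secure_client_kwargs(env=None):
    """Build MongoClient keyword arguments from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    kwargs = {
        "username": env["MONGO_USER"],
        "password": env["MONGO_PASSWORD"],  # read from the environment, not hardcoded
        "authSource": "admin",              # database in which the user was created
        "tls": True,                        # encrypt traffic between client and server
    }
    ca_file = env.get("MONGO_CA_FILE")
    if ca_file:
        kwargs["tlsCAFile"] = ca_file       # CA bundle used to verify the server certificate
    return kwargs
```

Usage: `client = MongoClient("mongodb://localhost:27017/", **secure_client_kwargs())`. The same least-privilege user from step 2 can also be created from Python with `client["sales"].command("createUser", "analyst", pwd=os.environ["ANALYST_PASSWORD"], roles=[{"role": "read", "db": "sales"}])`. `ANALYST_PASSWORD` is also a hypothetical variable name.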



Q23 What is MongoDB’s WiredTiger storage engine, and why is it important?
- WiredTiger is the default storage engine in MongoDB (since version 3.2) and is responsible for how data is stored, managed, and accessed on disk. It is a high-performance storage engine designed for large-scale, high-throughput applications.
- Importance :
1. Document-Level Concurrency
- Allows multiple write operations on different documents at the same time.
- Increases performance for high-concurrency applications.
2. Data Compression
- Compresses collection data (Snappy by default; zlib and, from MongoDB 4.2, zstd are also available) and indexes (prefix compression by default).
- Saves disk space and reduces disk I/O.
3. Efficient Caching
- Keeps frequently accessed data and indexes in an internal cache (by default, the larger of 50% of RAM minus 1 GB, or 256 MB).
- Speeds up read operations significantly.
4. Durability with Journaling
- Uses a write-ahead log (journal) to recover from crashes.
- Ensures data is not lost during unexpected shutdowns.
5. Checkpointing
- Periodically writes a consistent checkpoint of the data to disk (every 60 seconds by default).
- Enables faster recovery and consistency after crashes.
6. ACID Compliance
- Supports Atomicity, Consistency, Isolation, and Durability for transactions.
- Ensures data reliability in critical applications.
7. Tunable Performance
- Developers can customize settings (e.g., cache size, compression type).
- Adapts well to various workloads like read-heavy or write-heavy apps.
8. Scalability
- Designed to support large datasets and high-throughput operations.
- Suitable for everything from small apps to enterprise systems.
9. Default and Actively Supported
- WiredTiger has been the default engine since MongoDB 3.2; the older MMAPv1 engine was removed in MongoDB 4.2.
- All modern MongoDB features are optimized for WiredTiger.
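The "tunable" point can be illustrated with a per-collection compression override. The sketch below builds the `storageEngine` option for pymongo's `Database.create_collection`, which passes it to the `create` command as a WiredTiger `configString`. The helper name and the `orders_zstd` collection name are illustrative, and zstd requires MongoDB 4.2 or later.

```python
VALID_COMPRESSORS = {"none", "snappy", "zlib", "zstd"}


def wiredtiger_collection_options(compressor="zstd"):
    """Options for Database.create_collection() that set the block compressor for one collection."""
    if compressor not in VALID_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {compressor}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        }
    }
```

Usage: `db.create_collection("orders_zstd", **wiredtiger_collection_options("zstd"))`. On a WiredTiger deployment, cache usage can be inspected with `client.admin.command("serverStatus")["wiredTiger"]["cache"]`, for example its `"bytes currently in the cache"` entry.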

In [4]:
%pip install pandas pymongo


Collecting pymongo
  Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.14.0


In [None]:
#Q3.Count and display the total number of documents in the Orders collection
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Count the documents
total_docs = collection.count_documents({})

# Step 3: Display the count
print(f"📦 Total number of documents in 'orders' collection: {total_docs}")
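`count_documents({})` gives an exact total. pymongo also offers a filtered count and a faster estimate read from collection metadata. The sketch below puts the three side by side. It assumes the `Region` field from the Superstore data, and the function name is illustrative.

```python
def count_summary(collection, region="West"):
    """Return exact, filtered, and estimated document counts for a pymongo collection."""
    return {
        # Exact count of all documents (scans the collection or an index).
        "exact_total": collection.count_documents({}),
        # Exact count of documents matching a filter (same syntax as find()).
        "in_region": collection.count_documents({"Region": region}),
        # Fast count from collection metadata; it can be approximate.
        "estimated_total": collection.estimated_document_count(),
    }
```

Usage with the cell's `collection`: `print(count_summary(collection))`.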


In [None]:
#Q4.Write a query to fetch all orders from the "West" region
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Query for orders from the "West" region
west_orders = collection.find({ "Region": "West" })

# Step 3: Print the results
for order in west_orders:
    print(order)


In [None]:
#Q5.Write a query to find orders where Sales is greater than 500
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Query: Find orders where Sales > 500
high_sales_orders = collection.find({ "Sales": { "$gt": 500 } })

# Print the results
for order in high_sales_orders:
    print(order)
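The same filter syntax extends to ranges. A projection can also trim the printed output. The sketch below returns a filter and a projection for `find()`. It assumes the Superstore column names `Order ID` and `Region` are present in the documents, and the function name is illustrative.

```python
def sales_range_query(low, high):
    """Filter and projection for orders with low < Sales <= high."""
    # Two operators on one field are combined with AND.
    query = {"Sales": {"$gt": low, "$lte": high}}
    # Return only these fields and hide _id to keep the output readable.
    projection = {"_id": 0, "Order ID": 1, "Region": 1, "Sales": 1}
    return query, projection
```

Usage: `query, projection = sales_range_query(500, 1000)`, then `for order in collection.find(query, projection).sort("Sales", -1): print(order)`.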


In [None]:
#Q6. Fetch the top 3 orders with the highest Profit
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Query - Sort by Profit descending, limit to 3
top_profit_orders = collection.find().sort("Profit", -1).limit(3)

# Step 3: Print the results
for order in top_profit_orders:
    print(order)
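The same top-3 query can be written as an aggregation pipeline, which is useful when more stages follow. When $sort is immediately followed by $limit, MongoDB only needs to keep the top n documents while sorting. The projected field names (`Order ID`, `Product Name`) are assumed Superstore columns.

```python
def top_n_by_profit_pipeline(n=3):
    """Aggregation equivalent of find().sort("Profit", -1).limit(n), with a projection."""
    return [
        {"$sort": {"Profit": -1}},  # highest profit first
        {"$limit": n},              # adjacent $sort + $limit lets the server keep only n docs
        {"$project": {"_id": 0, "Order ID": 1, "Product Name": 1, "Profit": 1}},
    ]
```

Usage: `for order in collection.aggregate(top_n_by_profit_pipeline()): print(order)`.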


In [None]:
#Q7.Update all orders with Ship Mode as "First Class" to "Premium Class"
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Update Ship Mode from "First Class" to "Premium Class"
result = collection.update_many(
    { "Ship Mode": "First Class" },          # Filter condition
    { "$set": { "Ship Mode": "Premium Class" } }  # Update action
)

# Step 3: Display result
print(f"✅ Modified {result.modified_count} documents.")


In [None]:
#Q8. Delete all orders where Sales is less than 50
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Delete orders where Sales < 50
result = collection.delete_many({ "Sales": { "$lt": 50 } })

# Step 3: Display result
print(f"🗑️ Deleted {result.deleted_count} documents where Sales < 50.")
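`delete_many` cannot be undone, so it can help to see how many documents a filter matches before deleting them. The sketch below adds a dry-run mode. The function name and the `confirm` flag are illustrative. Other writers can change the data between the count and the delete, so the two numbers may differ.

```python
def delete_with_preview(collection, query, confirm=False):
    """Report how many documents match `query`; delete them only when confirm=True."""
    matched = collection.count_documents(query)
    if not confirm:
        # Dry run: nothing is removed.
        return {"matched": matched, "deleted": 0}
    result = collection.delete_many(query)
    return {"matched": matched, "deleted": result.deleted_count}
```

Usage: call `delete_with_preview(collection, {"Sales": {"$lt": 50}})` first, then repeat it with `confirm=True` once the count looks right.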


In [None]:
#Q9.Use aggregation to group orders by Region and calculate total sales per region
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Define aggregation pipeline
pipeline = [
    {
        "$group": {
            "_id": "$Region",               # Group by Region
            "total_sales": { "$sum": "$Sales" }  # Sum of Sales per Region
        }
    },
    {
        "$sort": { "total_sales": -1 }      # Optional: sort by sales descending
    }
]

# Step 3: Execute aggregation
results = collection.aggregate(pipeline)

# Step 4: Print results
print("📊 Total Sales by Region:")
for result in results:
    print(f"Region: {result['_id']}, Total Sales: {round(result['total_sales'], 2)}")
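The plain-Python sketch below shows what the $group/$sum/$sort pipeline computes and can spot-check its output on a small sample. The sample rows are hypothetical, not real Superstore data. Unlike `$sum`, which ignores non-numeric values, this version raises `TypeError` if a `Sales` value is not a number.

```python
from collections import defaultdict


def total_sales_by_region(records):
    """Plain-Python equivalent of grouping by Region and summing Sales."""
    totals = defaultdict(float)
    for row in records:
        totals[row["Region"]] += row["Sales"]
    # Sort by total descending, like {"$sort": {"total_sales": -1}}.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


sample = [  # hypothetical rows for illustration only
    {"Region": "West", "Sales": 100.0},
    {"Region": "East", "Sales": 50.0},
    {"Region": "West", "Sales": 25.5},
]
print(total_sales_by_region(sample))  # [('West', 125.5), ('East', 50.0)]
```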


In [None]:
#Q10. Fetch all distinct values for Ship Mode from the collection
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Get distinct Ship Mode values
distinct_ship_modes = collection.distinct("Ship Mode")

# Step 3: Print results
print("🚚 Distinct Ship Modes:")
for mode in distinct_ship_modes:
    print(f"- {mode}")



In [None]:
#Q11.Count the number of orders for each category.
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Define aggregation pipeline
pipeline = [
    {
        "$group": {
            "_id": "$Category",         # Group by Category
            "order_count": { "$sum": 1 }  # Count each order
        }
    },
    {
        "$sort": { "order_count": -1 }   # Optional: sort by count descending
    }
]

# Step 3: Execute aggregation
results = collection.aggregate(pipeline)

# Step 4: Print results
print("📦 Number of Orders by Category:")
for result in results:
    print(f"Category: {result['_id']}, Orders: {result['order_count']}")
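MongoDB 3.4 and later also provide `$sortByCount`, which combines the $group-with-count and $sort stages above into one stage. Its output documents have the shape `{"_id": <category>, "count": <n>}`. The function name below is illustrative.

```python
def orders_per_category_pipeline():
    """Group by Category, count the orders, and sort by count descending, in one stage."""
    return [{"$sortByCount": "$Category"}]
```

Usage: `for r in collection.aggregate(orders_per_category_pipeline()): print(r["_id"], r["count"])`.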


In [None]:
#Q1.Write a Python script to load the Superstore dataset from a CSV file into MongoDB
import pandas as pd
from pymongo import MongoClient

# Step 1: Load the CSV file into a pandas DataFrame
file_path = '/mnt/data/superstore.csv'  # Update this path if needed
df = pd.read_csv(file_path)

# Step 2: Strip stray whitespace from column names, but keep inner spaces:
# the later queries use field names such as "Ship Mode" exactly as they appear in the CSV
df.columns = df.columns.str.strip()

# Step 3: Convert DataFrame to dictionary format
records = df.to_dict(orient='records')

# Step 4: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")  # Adjust if using remote MongoDB
db = client["superstore_db"]                        # Database name
collection = db["orders"]                           # Collection name

# Step 5: Insert data into MongoDB (note: re-running this cell inserts the rows a second time)
result = collection.insert_many(records)

# Step 6: Print result
print(f"✅ Inserted {len(result.inserted_ids)} documents into 'orders' collection in 'superstore_db' database.")
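The loader has two common pitfalls. First, pandas represents empty CSV cells as float NaN, which MongoDB stores as a NaN number rather than null, so `{"field": None}` queries will not match them. Second, re-running the cell duplicates every row. The sketch below addresses both. It assumes the `records` list from Step 3, and the helper names `clean_records` and `reload_orders` are hypothetical. Note that `reload_orders` deletes everything in the collection first.

```python
import math


def clean_records(records):
    """Replace float NaN values (pandas' marker for empty cells) with None, stored as null."""
    return [
        {
            # numpy.float64 subclasses float, so pandas NaN values are caught too.
            key: None if isinstance(value, float) and math.isnan(value) else value
            for key, value in row.items()
        }
        for row in records
    ]


def reload_orders(collection, records):
    """Empty the collection, then insert cleaned records, so re-runs do not duplicate data."""
    collection.delete_many({})  # destructive: removes all existing documents
    result = collection.insert_many(clean_records(records))
    return len(result.inserted_ids)
```

Usage: `print(reload_orders(collection, records))`.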


In [None]:
#Q2. Retrieve and print all documents from the Orders collection
from pymongo import MongoClient

# Step 1: Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["orders"]

# Step 2: Fetch all documents from the collection
all_orders = collection.find()

# Step 3: Print each document
for order in all_orders:
    print(order)
