# **Theoretical Questions and Answers**


## Question 1 :- What are the key differences between SQL and NoSQL databases?

## Answer :-

## The key differences between SQL and NoSQL databases are :-

### **SQL database :-**

- SQL databases are relational.
- Their structure is table-based (rows and columns).
- The schema is fixed (predefined structure).
- They typically scale vertically (adding resources to a single server).
- Data is stored in a structured format with strict relationships between tables (queried with joins).
- They provide strong ACID compliance (Atomicity, Consistency, Isolation, Durability).
- They use SQL (Structured Query Language) as the query language.
- Flexibility is limited: the structure is rigid, and changing it requires schema changes.
- They are best suited to complex queries, transactions, and structured data.


### **NoSQL database :-**

- NoSQL databases are non-relational.
- Their structure is document-based, key-value, graph, or column-family.
- The schema is dynamic (flexible structure).
- They typically scale horizontally (distributing data across multiple servers).
- They can store unstructured, semi-structured, or structured data.
- Many follow the BASE model (Basically Available, Soft state, Eventually consistent).
- They use various query methods (e.g., MongoDB Query Language, CQL for Cassandra).
- Flexibility is high: they can easily accommodate changing data models.
- They are best suited to big data, real-time analytics, scalability, and semi-structured data.
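
The fixed-versus-dynamic schema difference can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module for the relational side and a plain list of dictionaries as a stand-in for a document collection; the table, field names, and values are made up for illustration, and no MongoDB server is involved.

```python
import sqlite3

# Relational side: the table's columns are fixed when it is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")

try:
    # 'nickname' is not a column, so the insert is rejected until the schema changes.
    conn.execute("INSERT INTO users (id, name, nickname) VALUES (2, 'Ravi', 'rv')")
    sql_rejected = False
except sqlite3.OperationalError:
    sql_rejected = True

# Document side: each record describes itself; a list of dicts stands in
# for a collection here (illustrative only).
users = [
    {"_id": 1, "name": "Asha"},
    {"_id": 2, "name": "Ravi", "nickname": "rv", "tags": ["admin"]},
]
fields_per_document = [sorted(doc) for doc in users]

print(sql_rejected)
print(fields_per_document)
conn.close()
```

The relational insert fails because the extra column is unknown to the table, while the two documents simply carry different sets of fields.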



## Question 2 :- What makes MongoDB a good choice for modern applications?

## Answer :-

### The following points make MongoDB a good choice for modern applications :-

- **1) Flexible Schema (Schema-less Design) :-**

 - Unlike SQL databases, MongoDB stores data in JSON-like BSON format, allowing dynamic and flexible structures.

 - Developers can modify data structures without affecting existing records, making it ideal for agile development.

- **2) Scalability & Performance :-**

 - Supports horizontal scaling using sharding, allowing large-scale applications to handle massive data volumes.

 - High-speed reads/writes due to its document-based model and indexing capabilities.

- **3) High Availability & Replication :-**

 - Replica sets ensure automatic failover and data redundancy, improving availability.

 - Data is replicated across multiple servers, making it resilient against failures.

- **4) Developer-Friendly & Easy Integration :-**

 - Uses MongoDB Query Language (MQL), which is intuitive and similar to JSON.

 - Supports multiple programming languages (Python, JavaScript, Java, etc.), making it highly versatile.

- **5) Supports Big Data & Real-Time Analytics :-**

 - Handles semi-structured & unstructured data efficiently.

 - Works well with big data applications and integrates with tools like Apache Spark.

- **6) Cloud & Microservices Ready :-**

 - MongoDB Atlas provides a fully managed cloud database with automated scaling and security.

 - Ideal for microservices architecture, where each service can store data independently.

- **7) ACID Transactions (Since MongoDB 4.0) :-**

 - Ensures Atomicity, Consistency, Isolation, and Durability (ACID) for multi-document transactions, making it suitable for financial applications.



## Question 3 :- Explain the concept of collections in MongoDB.

## Answer :-

### A collection in MongoDB is equivalent to a table in relational databases. It is a grouping of documents (records), where each document is stored in BSON (Binary JSON) format.

### MongoDB is a popular NoSQL database that offers a flexible, scalable, and high-performance way to store data.

### Key points about collections in MongoDB :-

- **Structure :-** A collection is a container that holds multiple documents, each representing a single data record.

- **Schema-less nature :-** Unlike relational databases, documents within a collection don't need to have the same fields or data types, allowing for dynamic data modeling.

### Types of Collections in MongoDB :-

- **Regular Collections :-** Store documents normally.

- **Capped Collections :-** Fixed-size collections that automatically overwrite their oldest documents when full.

- **Time-Series Collections :-** Optimized for storing time-based data.
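
The schema-less nature and the overwrite behaviour of a capped collection can be imitated in plain Python. This is only an analogy, not MongoDB itself: a `deque` with `maxlen` plays the role of the capped collection, and the capacity of three documents is an arbitrary choice (in MongoDB the cap is given as a size in bytes, optionally with a maximum document count).

```python
from collections import deque

# A deque with maxlen stands in for a capped collection: it has a fixed
# capacity and drops the oldest entry when a new one arrives.
capped_log = deque(maxlen=3)  # capacity of 3 is an arbitrary, illustrative choice

for i in range(1, 6):
    # Documents in one collection need not share the same fields.
    doc = {"_id": i, "msg": f"event {i}"}
    if i % 2 == 0:
        doc["level"] = "warn"
    capped_log.append(doc)

remaining_ids = [doc["_id"] for doc in capped_log]
print(remaining_ids)
```

Only the three most recent documents remain, and they do not all have the same fields.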

## Question 4 :- How does MongoDB ensure high availability using replication?

## Answer :-

### MongoDB ensures high availability through a feature called replication, which is implemented using replica sets. A replica set consists of multiple MongoDB servers (nodes) that maintain identical copies of the database, ensuring redundancy and fault tolerance.

### **Key Components of Replication in MongoDB :-**

- **Primary Node (Leader) :-**

 - The only node that accepts write operations.

 - Elected automatically from among the remaining nodes if the current primary fails.

- **Secondary Nodes (Followers) :-**

 - Maintain copies of the data from the primary node.

 - Synchronize with the primary asynchronously.

 - Can be used for read operations to distribute load.

### **How MongoDB Ensures High Availability :-**

- **Automatic Failover :-**

 - If the primary node fails, one of the secondary nodes is automatically elected as the new primary.

 - Election takes place within seconds to minimize downtime.

- **Data Redundancy :-**

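 - Secondary nodes hold copies of the primary's data, kept up to date through asynchronous replication.

 - This protects against data loss if the primary node fails, particularly for writes already acknowledged by a majority of nodes.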

- **Read Scalability :-**

 - Clients can read from secondary nodes (if enabled).

 - Helps distribute read operations, improving performance.

- **Write Operations Continue Seamlessly :-**

 - When a primary node fails, a secondary becomes the new primary.

 - Clients automatically reconnect to the new primary, ensuring continued operations.

## Question 5 :- What are the main benefits of MongoDB Atlas ?

## Answer :-

### MongoDB Atlas is a fully managed cloud database service that simplifies deployment, scaling, and maintenance of MongoDB databases. It is available on major cloud platforms like AWS, Azure, and Google Cloud.

### **Key Benefits of MongoDB Atlas :-**

- **Fully Managed Service :-**

 - No need for manual installation, configuration, or maintenance.

 - Automated backups, monitoring, and patching.

- **High Availability and Automatic Failover :-**

 - Built-in replica sets ensure redundancy.

 - If a node fails, automatic failover switches to a healthy node.

- **Seamless Scalability :-**

 - **Vertical Scaling :-** Increase instance size as needed.

 - **Horizontal Scaling (Sharding) :-** Distribute data across multiple servers for better performance.

- **Advanced Security Features :-**

 - End-to-End Encryption for data at rest and in transit.

 - Access control and role-based permissions.

 - IP whitelisting and VPC peering for network security.

- **Real-time Performance Monitoring :-**

 - Built-in monitoring tools provide insights into performance, slow queries, and system health.

 - Alerts for performance issues.

- **Automated Backups & Recovery :-**

 - Continuous backups with point-in-time restore options.

 - Prevents data loss in case of accidental deletion or system failure.

- **Multi-Cloud and Multi-Region Support :-**

 - Deploy across multiple cloud providers (AWS, GCP, Azure).

 - Global clusters allow for faster data access worldwide.

- **Integration with Development Tools :-**

 - Supports PyMongo, Mongoose, Node.js, Python, Java, and more.

 - Built-in Data Explorer for querying and managing data directly.

- **Cost Optimization :-**

 - Pay-as-you-go pricing with different plans based on workload.

 - Auto-scaling helps manage costs efficiently.

- **Serverless and API Access :-**

 - MongoDB Atlas Serverless offers a pay-per-use model for unpredictable workloads.

 - Data API allows direct database access without managing servers.

## Question 6 :-  What is the role of indexes in MongoDB, and how do they improve performance ?

## Answer :-

### Indexes in MongoDB enhance query performance by allowing the database to quickly locate and retrieve documents instead of scanning the entire collection. They function similarly to indexes in books, helping MongoDB find data efficiently.

### **Role of Indexes in MongoDB :-**

- **Speeds Up Query Execution :-**

 - Without indexes, MongoDB performs a collection scan, checking every document.

 - With indexes, MongoDB finds documents much faster.

- **Enhances Read Performance :-**

 - Queries that use indexes run significantly faster than those without.

- **Improves Sorting Operations :-**

 - Indexes help optimize queries using $sort, avoiding in-memory sorting.

- **Optimizes Filtering ($match) in Aggregation :-**

 - Indexes speed up filtering by reducing the number of documents scanned.

- **Supports Unique Constraints :-**

 - Unique indexes ensure that duplicate values are not allowed in a field.


### **Indexes Improve Performance in MongoDB :-**

Indexes in MongoDB significantly improve performance by optimizing query execution and reducing the time required to retrieve data. Without indexes, MongoDB must perform a collection scan, meaning it checks every document in the collection to find a match. This approach is inefficient, especially for large datasets.

With indexes, MongoDB can efficiently locate documents, minimizing the number of records it needs to examine.

### **Here's how indexes enhance performance :-**

- **Faster Query Execution :-**

 - Indexes allow MongoDB to find data quickly by referencing an ordered structure instead of scanning all documents.

 - For example, searching for a specific customer in a database with an index on the customer_name field retrieves results much faster than scanning the entire collection.

- **Efficient Sorting :-**

 - Queries that use the $sort operation can benefit from indexes because MongoDB does not need to sort results in memory.

 - If an index exists on the sorting field, the query retrieves results in the required order directly from the index.

- **Optimized Filtering in Aggregation :-**

 - When filtering data using the $match stage in an aggregation pipeline, indexes allow MongoDB to quickly exclude irrelevant documents, reducing processing time.

- **Reduced Disk I/O and Memory Usage :-**

 - Since indexed queries scan fewer documents, MongoDB requires fewer disk reads and less RAM, improving overall system performance.

- **Faster Join Operations Using $lookup :-**

 - In scenarios where collections are joined using $lookup, indexed fields enhance lookup speed, making operations more efficient.

- **Unique Constraint Enforcement :-**

 - Unique indexes enforce uniqueness inside the database, preventing duplicate entries without separate validation queries from the application.
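
The difference between a collection scan and an indexed lookup can be sketched in plain Python. This is a toy model under stated assumptions, not MongoDB's implementation: a sorted list searched with binary search plays the role of the ordered index structure, and the `customer_name` values are generated for the example.

```python
import bisect

# Hypothetical data: 10,000 customer documents with unique names.
customers = [{"_id": i, "customer_name": f"customer_{i:05d}"} for i in range(10_000)]

def collection_scan(docs, name):
    """Check every document in order, counting how many were examined."""
    examined = 0
    for doc in docs:
        examined += 1
        if doc["customer_name"] == name:
            return doc, examined
    return None, examined

# A toy "index": sorted (key, position) pairs, searched with binary search.
index = sorted((doc["customer_name"], pos) for pos, doc in enumerate(customers))
index_keys = [key for key, _ in index]

def index_lookup(docs, name):
    """Locate the key in the sorted index, then fetch a single document."""
    i = bisect.bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        return docs[index[i][1]]
    return None

scanned_doc, examined = collection_scan(customers, "customer_09999")
indexed_doc = index_lookup(customers, "customer_09999")
print(examined, indexed_doc == scanned_doc)
```

The scan has to examine every document to reach the last customer, while the indexed lookup needs only a handful of comparisons before fetching one document.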

## Question 7 :- Describe the stages of the MongoDB aggregation pipeline.

## Answer :-

#### The MongoDB aggregation pipeline is a powerful framework for performing complex data transformations and analysis. It processes documents through a sequence of stages, where each stage applies an operation on the data.

Each stage in the pipeline modifies documents before passing them to the next stage, allowing for complex transformations and computations.


### **Key Stages of the Aggregation Pipeline :-**

- **"$match" - Filtering Data :-**

 - This stage filters documents based on specified conditions, similar to the WHERE clause in SQL.

 - It improves efficiency by reducing the number of documents processed in subsequent stages.

- **"$group" - Grouping and Aggregating Data :-**

 - Groups documents based on a specified field and performs aggregation functions such as sum, average, count, min, and max.

 - Commonly used for calculating totals, averages, and grouped statistics.

- **"$project" - Reshaping Documents :-**

 - Modifies the structure of documents by including, excluding, or transforming fields.

 - Useful for renaming fields, computing new fields, or displaying only necessary information.

- **"$sort" - Sorting Data :-**

 - Sorts documents in ascending or descending order based on a specified field.

 - Helps in ranking and ordering data for better analysis.

- **"$limit" - Restricting Output :-**

 - Limits the number of documents returned by the query.

 - Useful when retrieving only the top results, such as the top 5 highest sales orders.

- **"$skip" - Skipping Documents :-**

 - Skips a specified number of documents before returning results.

 - Often used for pagination in large datasets.

- **"$lookup" - Joining Collections :-**

 - Performs a left outer join between two collections, similar to SQL joins.

 - Allows combining related data from different collections into a single result set.

- **"$unwind" - Decomposing Arrays :-**

 - Splits an array field into multiple documents, creating a separate document for each element in the array.

 - Useful when working with embedded arrays that need to be processed individually.

- **"$out" - Storing Results in a Collection :-**

 - Saves the output of the aggregation pipeline into a collection, creating it if needed or replacing its contents if it already exists.

 - Useful for creating summary tables or exporting processed data.

- **"$count" - Counting Documents :-**

 - Returns the number of documents that reach this stage, typically after a $match has applied a condition.

 - Helps in generating statistical reports and summaries.
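
A few of these stages can be seen working together in a short sketch. The pipeline below is written as the list of dictionaries that PyMongo's `collection.aggregate()` accepts; because no server is assumed here, the same four stages are then reproduced by hand on made-up sample documents to show what each one does.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for an "orders" collection.
orders = [
    {"customer": "A", "status": "paid", "amount": 120},
    {"customer": "B", "status": "paid", "amount": 80},
    {"customer": "A", "status": "paid", "amount": 50},
    {"customer": "C", "status": "cancelled", "amount": 500},
    {"customer": "B", "status": "paid", "amount": 40},
]

# The pipeline as it would be passed to PyMongo: collection.aggregate(pipeline)
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 2},
]

# Hand-written equivalent of those four stages, applied in the same order.
matched = [o for o in orders if o["status"] == "paid"]             # $match
totals = defaultdict(int)
for o in matched:                                                  # $group
    totals[o["customer"]] += o["amount"]
grouped = [{"_id": c, "total": t} for c, t in totals.items()]
ranked = sorted(grouped, key=lambda d: d["total"], reverse=True)   # $sort
top = ranked[:2]                                                   # $limit

print(top)
```

Placing `$match` first means the cancelled order never reaches the grouping step, which is the efficiency point made above.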

## Question 8 :- What is sharding in MongoDB? How does it differ from replication?

## Answer :-

### Sharding is a database partitioning technique used in MongoDB to distribute large datasets across multiple servers. It helps scale horizontally by splitting data into smaller, manageable chunks and storing them across multiple machines, called shards.

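### Sharding is essential when a single server cannot handle large volumes of data or high query loads, ensuring that MongoDB can efficiently handle big data and high-traffic applications. It differs from replication in purpose: replication keeps full copies of the same data on several nodes for availability, whereas sharding splits the data so that each shard stores only a subset.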


### **How Sharding Works**

### **Sharding in MongoDB is implemented using three key components :-**

- **Shards :-**

 - Each shard is an individual MongoDB instance that stores a subset of the data.

 - Collectively, all shards make up the complete dataset.

- **Config Servers :-**

 - Store metadata about the cluster, such as which shard contains specific data.

 - Help in directing queries to the correct shard.

- **Query Routers (mongos) :-**

 - Act as an interface between applications and the database.

 - Directs client requests to the appropriate shard(s).


### **How is Data Distributed in Sharding**

### **Data is divided into shards based on a shard key, which determines how documents are distributed. There are three types of sharding strategies :-**

- **Range-Based Sharding :-**

 - Data is distributed based on a range of values in a selected field.

 - Example :- Orders are sharded based on OrderID ranges.

- **Hash-Based Sharding :-**

 - A hash function is applied to the shard key, distributing data evenly across shards.

 - Ensures uniform data distribution, avoiding hotspots.

- **Zone-Based Sharding :-**

 - Data is partitioned into specific zones based on defined criteria.

 - Example :- Storing customer data in shards based on geographic location.
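
The contrast between range-based and hash-based distribution can be sketched as follows. The shard names, the range boundaries, and the use of MD5 with a modulo are all illustrative assumptions; this is not the hashing or chunk logic MongoDB actually uses.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def range_shard(order_id):
    """Range-based: consecutive key ranges map to consecutive shards."""
    if order_id < 1000:
        return "shard0"
    if order_id < 2000:
        return "shard1"
    return "shard2"

def hash_shard(order_id):
    """Hash-based: hash the key, then map the hash onto a shard."""
    digest = hashlib.md5(str(order_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

recent_orders = range(2000, 2300)  # monotonically increasing keys

range_targets = {range_shard(o) for o in recent_orders}
hash_targets = {hash_shard(o) for o in recent_orders}
print(range_targets, sorted(hash_targets))
```

With steadily increasing OrderID values, the range strategy sends every new order to the same shard (a hotspot), while hashing spreads them across shards.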

## Question 9 :-  What is PyMongo, and why is it used ?

## Answer :-

### PyMongo is the official Python driver for MongoDB. It allows Python applications to interact with MongoDB databases by providing an interface to perform CRUD (Create, Read, Update, Delete) operations, run queries, and manage collections.

### PyMongo simplifies working with MongoDB in Python, making it easier to integrate NoSQL databases into Python-based applications.


### **Why PyMongo Is Used :-**

- **Connects Python Applications to MongoDB :-**

 - PyMongo enables Python programs to communicate with MongoDB databases over a network.

- **Performs CRUD Operations :-**

 - Allows inserting, retrieving, updating, and deleting documents easily.

- **Supports Query Execution :-**

 - Provides methods to run complex queries, filter data, and aggregate results efficiently.

- **Handles Indexing and Aggregation :-**

 - Supports MongoDB features like indexing, aggregation pipelines, and text search.

- **Supports Bulk Data Processing :-**

 - Efficiently handles large datasets with batch inserts and updates.

- **Works with BSON (Binary JSON) :-**

 - PyMongo automatically converts Python dictionaries into BSON format, which MongoDB uses for storage.

- **Supports Connection Pooling and Authentication :-**

 - Manages multiple database connections and secures access with authentication mechanisms.
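
A minimal CRUD sketch with PyMongo might look like the function below. It relies only on the standard collection methods `insert_one`, `find_one`, `update_one`, `delete_one`, and `count_documents`; the database name, collection name, server address, and sample document are hypothetical, and the connection code is left in comments because it needs PyMongo installed and a running server.

```python
def crud_demo(collection):
    """Run one create/read/update/delete cycle on a PyMongo-style collection."""
    # Create: insert a single document (a plain Python dict).
    collection.insert_one({"name": "Asha", "city": "Pune"})

    # Read: fetch the first document matching the filter.
    city_before = collection.find_one({"name": "Asha"})["city"]

    # Update: modify a field with the $set operator.
    collection.update_one({"name": "Asha"}, {"$set": {"city": "Mumbai"}})
    city_after = collection.find_one({"name": "Asha"})["city"]

    # Delete: remove the document again and confirm nothing is left.
    collection.delete_one({"name": "Asha"})
    remaining = collection.count_documents({"name": "Asha"})

    return city_before, city_after, remaining


# Typical wiring, assuming PyMongo is installed and a server is reachable at
# this (hypothetical) address:
#
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   print(crud_demo(client["demo_db"]["people"]))
```

Passing the collection in as an argument keeps the CRUD logic separate from connection details, which also makes it easy to exercise against a test double.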



## Question 10 :- What are the ACID properties in the context of MongoDB transactions ?



## Answer :-

### ACID properties ensure data integrity, consistency, and reliability in database transactions. MongoDB supports ACID transactions, allowing multiple operations to be executed safely and atomically across multiple documents and collections.


### **The ACID properties are described below :-**

- **Atomicity (All or Nothing Execution) :-**

 - A transaction must be fully completed or fully rolled back in case of failure.

 - If a transaction modifies multiple documents and fails halfway, all changes are discarded to prevent partial updates.

- **Consistency (Valid State Before and After Transaction) :-**

 - Ensures that data remains valid before and after a transaction.

 - MongoDB can enforce schema validation rules and constraints such as unique indexes to maintain consistency.

- **Isolation (Transactions Do Not Interfere with Each Other) :-**

 - Transactions are executed independently, ensuring that concurrent transactions do not affect each other’s results.

 - MongoDB uses snapshot isolation to provide a consistent view of data during a transaction.

- **Durability (Committed Transactions are Permanent) :-**

 - Once a transaction is committed, changes are permanently stored in the database, even if the system crashes.

 - MongoDB ensures durability using the WiredTiger storage engine and journaling.
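
The all-or-nothing idea behind atomicity can be imitated in a few lines of plain Python. This is a conceptual sketch on an in-memory dictionary, not MongoDB's transaction API: the account names, balances, and the `transfer` helper are invented for illustration.

```python
import copy

class InsufficientFunds(Exception):
    pass

def transfer(accounts, src, dst, amount):
    """Apply both updates or neither, emulating an atomic transaction."""
    working = copy.deepcopy(accounts)      # changes are staged on a private copy
    working[dst]["balance"] += amount      # first write
    if working[src]["balance"] < amount:
        # Failure halfway through: the staged copy is discarded (rollback).
        raise InsufficientFunds(src)
    working[src]["balance"] -= amount      # second write
    accounts.clear()
    accounts.update(working)               # "commit": publish all changes together

accounts = {"alice": {"balance": 100}, "bob": {"balance": 20}}

transfer(accounts, "alice", "bob", 30)       # succeeds: both balances change
try:
    transfer(accounts, "bob", "alice", 500)  # fails: no balance changes
except InsufficientFunds:
    pass

print(accounts)
```

The failed transfer had already credited the destination in the staged copy, yet none of that is visible afterwards, which is the behaviour atomicity promises.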



## Question 11 :- What is the purpose of MongoDB's explain() function ?

## Answer :-

### MongoDB's explain() function is used to analyze and understand the execution plan of a query. It helps developers and database administrators optimize queries by providing insights into how MongoDB processes the query, which indexes are used, and the execution time.


### **Why explain() Is Used :-**

- **Query Performance Optimization :-**

 - It helps identify slow queries and optimize them using indexes.

- **Understanding Query Execution Steps :-**

 - It shows whether MongoDB is performing a collection scan (slow) or using an index (fast).

- **Choosing the Best Index :-**

 - It determines which indexes are used and whether new indexes should be created.

- **Reducing Resource Usage :-**

 - It helps in optimizing queries to use less CPU and memory.

### **How explain() Works in MongoDB**

### **MongoDB provides three verbosity modes for execution analysis :-**

- **"queryPlanner" Mode (Default) :-**

 - It shows how MongoDB plans to execute the query before running it.

- **"executionStats" Mode :-**

 - It executes the query and provides details about performance, execution time, and index usage.

- **"allPlansExecution" Mode :-**

 - Provides detailed information about all possible execution plans MongoDB considered.
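
Reading an explain result usually comes down to checking which stages appear in the winning plan. The sketch below walks a plan's nested `inputStage` entries and reports whether a collection scan (`COLLSCAN`) is present. The two sample dictionaries are hand-written and heavily trimmed rather than captured from a server, the index name is made up, and real output contains many more fields whose exact shape can vary between MongoDB versions.

```python
def plan_stages(plan):
    """Return stage names from the outermost stage down to the innermost."""
    stages = []
    while plan is not None:
        stages.append(plan["stage"])
        plan = plan.get("inputStage")
    return stages

def uses_collection_scan(explain_output):
    """True if the winning plan contains a full collection scan."""
    return "COLLSCAN" in plan_stages(explain_output["queryPlanner"]["winningPlan"])

# Hand-written, simplified examples of explain output (illustrative only).
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "customer_name_1"},
}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(plan_stages(indexed["queryPlanner"]["winningPlan"]))
print(uses_collection_scan(indexed), uses_collection_scan(unindexed))
```

With PyMongo, a dictionary of this general kind can be obtained from a cursor's `explain()` method, for example `collection.find({...}).explain()`.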

## Question 12 :- How does MongoDB handle schema validation ?

## Answer :-

### MongoDB is a schema-less NoSQL database, meaning documents in a collection can have different structures. However, to ensure data consistency and integrity, MongoDB provides schema validation mechanisms that enforce rules on document structure.



### **Methods of Schema Validation in MongoDB are :-**

- **1) JSON Schema Validation :-**

 - MongoDB supports JSON Schema-based validation, allowing users to define rules for documents in a collection.

 - The schema can specify data types, required fields, value ranges, and patterns to maintain structured data.

 - If a document does not meet the defined rules, MongoDB rejects it during insertion or update.

- **2) Validation Levels :-**

 - MongoDB allows controlling how strictly schema rules are enforced:

 - **Strict Validation :-** Applies the rules to all inserts and updates (the default).

 - **Moderate Validation :-** Applies the rules to inserts and to updates of documents that already satisfy them; existing documents that do not match are left alone.

- **3) Validation Actions :-**

 - MongoDB provides two options when a document violates validation rules:

 - **Error :-** Prevents insertion or update if the document does not meet the schema.

 - **Warn :-** Logs a warning but allows the document to be inserted.

- **4) Schema Validation in Mongoose :-**

 - For applications using Mongoose (MongoDB ODM for Node.js), schema validation is built-in. It enforces strict data validation before storing documents, ensuring compliance with predefined rules.
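
A JSON Schema validator is just a nested document, so it can be written as a Python dictionary. The sketch below defines hypothetical rules for a `users` collection together with a validation level and action; the field names and limits are assumptions made for the example, and the call that would apply them is shown only in a comment because it needs a running server.

```python
# Hypothetical rules for a "users" collection.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
        },
    }
}

# Options mirroring the validation level and action described above.
collection_options = {
    "validator": user_validator,
    "validationLevel": "strict",   # or "moderate"
    "validationAction": "error",   # or "warn"
}

# With PyMongo and a reachable server this could be applied as:
#   db.create_collection("users", **collection_options)
print(sorted(collection_options))
```

Keeping the validator in one dictionary makes it easy to review the required fields, types, and ranges in a single place.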



## Question 13 :- What is the difference between a primary and a secondary node in a replica set ?

## Answer :-


### In MongoDB, a replica set is a group of MongoDB servers that maintain identical copies of the data to ensure high availability and fault tolerance. A replica set consists of one primary node and one or more secondary nodes.


### **Primary Node :-**

- The main node in the replica set that handles all write operations.

- By default, clients send both reads and writes to the primary.

- Applies all changes first, and secondary nodes replicate from it.

- If the primary node fails, a new primary is elected automatically from the secondaries.

### **Secondary Node :-**

- Hold replicated copies of the primary's data, used for redundancy and read load balancing.

- Synchronize data from the primary using the oplog (operations log).

- Do not accept write operations from clients.

- Can be used for read scaling by allowing read queries (readPreference : secondary).

- Participates in the election process to become the new primary if needed.
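
How a client chooses between primary and secondary nodes is usually expressed in the connection string. The sketch below builds such a URI with the standard `replicaSet` and `readPreference` options; the host names and the replica set name `rs0` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical hosts and replica set name.
hosts = ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"]
options = {
    "replicaSet": "rs0",
    # Prefer secondaries for reads; fall back to the primary if none is available.
    "readPreference": "secondaryPreferred",
}

uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

Only reads are affected by this setting; writes still go to the primary.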


## Question 14 :- What security mechanisms does MongoDB provide for data protection ?

## Answer :-


### MongoDB provides multiple security features to protect data from unauthorized access, breaches, and vulnerabilities. These mechanisms ensure authentication, encryption, access control, and network security to maintain data integrity and confidentiality.


### **Key Security Mechanisms in MongoDB :-**

- **1) Authentication (User Access Control) :-**

 - Ensures only authorized users can access the database.

 - **Supports multiple authentication methods :-**

   - **SCRAM (Salted Challenge Response Authentication Mechanism) :-** Default authentication method.

   - **LDAP (Lightweight Directory Access Protocol) :-** Integrates with enterprise authentication systems (MongoDB Enterprise).

   - **X.509 Certificates :-** Uses SSL/TLS certificates for authentication.

   - **Kerberos Authentication :-** Secure authentication for enterprise environments (MongoDB Enterprise).

- **2) Role-Based Access Control (RBAC) :-**

 - Assigns specific roles to users with restricted permissions.

 - **Built-in roles include :-**

   - **read :-** Can only read data.

   - **readWrite :-** Can read and write data.

   - **dbAdmin :-** Manages database but cannot access data.

   - **userAdmin :-** Manages user accounts and roles.

- **3) Data Encryption :-**

 - **Encryption at Rest**

   - Uses WiredTiger storage encryption to encrypt data on disk (available in MongoDB Enterprise).

 - **Encryption in Transit**

   - Uses TLS/SSL encryption to secure data while transferring between clients and servers.

- **4) Network Security :-**

 - **IP Whitelisting :-** Restricts database access to specific IP addresses.

 - **VPC Peering :-** Securely connects MongoDB with cloud-based applications.

 - **Firewall Rules :-** Blocks unauthorized connections using firewalls.

- **5) Auditing and Logging :-**

 - **MongoDB provides audit logs to track database activities, including :-**

   - Authentication attempts.

   - Data modifications.

   - Query execution logs.

- **6) Security for MongoDB Atlas :-**

 - **Managed security features include :-**

   - Automatic backup encryption.

   - End-to-end data encryption.

   - Private networking for secure cloud access.

## Question 15 :-  Explain the concept of embedded documents and when they should be used.

## Answer :-

### Embedded documents in MongoDB allow nesting one document inside another, storing related data within a single document instead of using multiple collections.

### This structure follows a denormalized data model: related data is retrieved in a single read, which improves query performance and reduces the need for joins.



### **When to Use Embedded Documents :-**

- **Strong Parent-Child Relationship :-**

 - When one entity is always dependent on another, embedding is suitable.

 - Example : A blog post with comments, where comments always belong to a specific post.

- **Frequent Joint Access :-**

 - If data is often retrieved together, embedding improves performance.

 - Example : A user profile with addresses, where addresses are always needed when retrieving user details.

- **Limited Growth of Data :-**

 - If embedded data remains small and manageable, embedding is efficient.

 - Example : A product with a few fixed reviews.


### **When Not to Use Embedded Documents :-**

- **Excessive Data Growth :-**

 - MongoDB has a 16 MB document size limit, so embedded data that grows without bound is inefficient and can eventually exceed it.

 - Example: Millions of orders under a single customer document.

- **Independent Access to Child Data :-**

 - If the child data is frequently accessed without the parent, it should be stored separately.

 - Example : Orders being queried separately from customer details.
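
The two modelling choices can be compared side by side as Python dictionaries, which is how documents appear in PyMongo. The posts, customers, and orders below are made-up sample data.

```python
# Embedded: comments live inside the post, so one read returns everything.
post_embedded = {
    "_id": 1,
    "title": "Intro to MongoDB",
    "comments": [
        {"author": "Asha", "text": "Helpful!"},
        {"author": "Ravi", "text": "Thanks."},
    ],
}

# Referenced: orders are separate documents that point back to the customer,
# so they can keep growing and can be queried on their own.
customer = {"_id": 10, "name": "Asha"}
orders = [
    {"_id": 100, "customer_id": 10, "total": 250},
    {"_id": 101, "customer_id": 10, "total": 90},
]

comment_authors = [c["author"] for c in post_embedded["comments"]]
customer_orders = [o["_id"] for o in orders if o["customer_id"] == customer["_id"]]
print(comment_authors, customer_orders)
```

The embedded form needs no second lookup to list a post's comments, while the referenced form keeps the customer document small no matter how many orders accumulate.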

## Question 16 :- What is the purpose of MongoDB’s $lookup stage in aggregation ?

## Answer :-


### The **$lookup** stage in MongoDB's aggregation pipeline is used to perform JOIN operations between two collections, similar to the SQL JOIN. It allows documents from one collection to be merged with related documents from another collection based on a common field.


### **The purposes of $lookup in MongoDB are :-**

- **Joins Data from Multiple Collections :-**

 - Enables combining related data without requiring a relational database.

 - Example : Joining orders with customers to fetch customer details along with orders.

- **Reduces Multiple Queries :-**

 - Instead of querying multiple collections separately, $lookup retrieves related data in a single query, improving efficiency.

- **Enhances Data Representation :-**

 - Provides a structured view by embedding related documents within a single result.

- **Supports Complex Queries :-**

 - "$lookup" can be combined with other aggregation stages ($match, $group, etc.) to perform advanced data analysis.



### **When to Use "$lookup" :-**

- When data is stored in separate collections but needs to be retrieved together.

- When modelling one-to-many or many-to-one relationships, such as fetching orders along with customer details.

- When optimizing read performance by reducing multiple queries.
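
The left outer join performed by "$lookup" can be sketched as follows. The stage document uses the standard `from`, `localField`, `foreignField`, and `as` fields; the collections and the small `left_outer_join` helper are hypothetical and only imitate the stage's result on in-memory data.

```python
customers = [
    {"_id": 1, "name": "Asha"},
    {"_id": 2, "name": "Ravi"},
]
orders = [
    {"_id": 100, "customer_id": 1, "total": 250},
    {"_id": 101, "customer_id": 3, "total": 40},   # no matching customer
]

# Stage as it would appear in a pipeline run on the orders collection.
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}

def left_outer_join(left_docs, right_docs, local_field, foreign_field, as_field):
    """Attach to each left document an array of matching right documents."""
    joined = []
    for doc in left_docs:
        matches = [r for r in right_docs if r[foreign_field] == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined

spec = lookup_stage["$lookup"]
result = left_outer_join(orders, customers, spec["localField"], spec["foreignField"], spec["as"])
print(result)
```

Because the join is a left outer join, the order without a matching customer is still returned, with an empty array in the joined field.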

## Question 17 :- What are some common use cases for MongoDB ?

## Answer :-


### MongoDB is a NoSQL database designed for scalability, flexibility, and high performance, making it suitable for various applications across different industries. It is widely used for handling large volumes of unstructured and semi-structured data.


### **Some common use cases of MongoDB are mentioned below :-**

- **1) Real-Time Analytics and Big Data Processing :-**

 - MongoDB's scalability and high-speed reads/writes make it ideal for real-time data analysis.

 - Used in financial services, fraud detection, and IoT analytics.

- **2) Content Management Systems (CMS) :-**

 - Handles text, images, videos, and metadata efficiently.

 - Used by media, publishing, and e-commerce platforms for managing dynamic content.

- **3) E-Commerce and Retail Applications :-**

 - Stores product catalogs, customer data, and order history with flexible schema.

 - Supports personalized recommendations and real-time inventory tracking.

- **4) Internet of Things (IoT) Applications :-**

 - Efficiently manages high-frequency sensor data from connected devices.

 - Used in smart homes, industrial automation, and healthcare monitoring.

- **5) Mobile and Web Applications :-**

 - Ideal for applications requiring high-speed user interactions and flexible data models.

 - Used in social media, gaming, and on-demand services (ride-sharing, food delivery).

- **6) Financial and Banking Systems :-**

 - Manages transactions, customer profiles, and risk analysis securely.

 - Supports high-performance queries and real-time fraud detection.

- **7) Healthcare and Medical Records :-**

 - Stores patient records, prescriptions, and medical imaging data.

 - Ensures fast retrieval and secure access to medical data.

- **8) Geospatial Applications :-**

 - Supports location-based services, mapping, and logistics tracking.

 - Used in ride-hailing apps, delivery services, and GPS navigation.

## Question 18 :- What are the advantages of using MongoDB for horizontal scaling ?

## Answer :-

### Horizontal scaling means distributing data across multiple servers to handle large-scale applications; MongoDB implements it through sharding. MongoDB is designed for horizontal scalability, making it well suited to high traffic, large datasets, and distributed workloads.


### **Advantages of Horizontal Scaling in MongoDB are :-**

- **1) Handles Large Datasets Efficiently :-**

 - Data is partitioned across multiple servers (shards), preventing any single server from becoming a bottleneck.

 - Supports big data applications that require massive storage.

- **2) Improved Read and Write Performance :-**

 - Read and write operations are distributed across shards, reducing load on individual servers.

 - Supports parallel processing, ensuring faster query execution.

- **3) Cost-Effective Scaling :-**

 - Instead of upgrading a single powerful machine (vertical scaling), MongoDB allows adding multiple lower-cost servers, making it more affordable.

- **4) High Availability and Fault Tolerance :-**

 - If one shard fails, the other shards continue to serve their portion of the data.

 - Combined with replication (each shard deployed as a replica set), MongoDB provides automatic failover and data redundancy, avoiding a single point of failure.

- **5) Automatic Balancing of Data :-**

 - MongoDB's sharded clusters automatically balance data across shards to prevent overload.

 - New servers can be added dynamically without downtime.

- **6) Geographic Distribution :-**

 - Shards can be deployed across multiple geographic locations, improving performance for globally distributed applications.

- **7) Supports High Traffic Applications :-**

 - Ideal for applications with millions of users, such as social media, e-commerce, and streaming platforms.

 - Efficiently handles high-concurrency workloads.



## Question 19 :- How do MongoDB transactions differ from SQL transactions ?

## Answer :-


### Transactions in databases ensure data integrity by following ACID (Atomicity, Consistency, Isolation, Durability) properties. While both MongoDB and SQL databases support transactions, their implementation and behavior differ due to their underlying architectures.



### **Differences Between MongoDB and SQL Transactions :-**

#### **MongoDB Transactions :-**

- The database type is NoSQL (document-based).

- The data model is a flexible schema (JSON-like documents).

- Multi-document transactions are supported from MongoDB 4.0.

- Transactions are fully ACID-compliant.

- The architecture is designed for horizontal scaling (sharding).

- Write operations in a transaction can span multiple documents within a replica set.

- The isolation level is snapshot isolation, provided by the WiredTiger storage engine.

- Performance is best for individual document writes, which are atomic without a transaction.

- Joins and relationships are handled with embedded documents and "$lookup".


#### **SQL Transactions :-**

- Database type: relational (table-based).

- Data model: fixed schema (tables, rows, columns).

- Multi-row transactions: native support for transactions across multiple rows and tables.

- ACID compliance: fully ACID-compliant.

- Scaling: traditionally designed for vertical scaling (a single powerful machine).

- Write operations: a transaction can span multiple tables.

- Isolation level: supports multiple isolation levels (Read Committed, Repeatable Read, Serializable).

- Performance: optimized for structured transactional workloads.

- Joins and relationships: uses foreign keys and JOINs for data relationships.



### **How MongoDB Handles Transactions Differently :-**

- **1) Multi-Document Transactions (Introduced in MongoDB 4.0) :-**

 - Earlier versions of MongoDB only supported atomicity at the document level.

 - Now, MongoDB allows transactions across multiple documents and collections, similar to SQL databases.

- **2) Optimized for Scalability :-**

 - MongoDB transactions are designed to work with replica sets and, since MongoDB 4.2, with sharded clusters.

 - SQL databases, in contrast, traditionally run transactions across multiple tables within a single database server.

- **3) Joins vs. Embedded Documents :-**

 - SQL uses JOINs to combine related tables.

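 - MongoDB uses embedded documents or $lookup to combine related data. Because a write to a single document is atomic, embedding related data in one document often removes the need for a multi-document transaction.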
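A multi-document transaction with PyMongo can be sketched as below. This is a minimal example under stated assumptions: the function name `transfer`, the `bank` database, the `accounts` collection, and the `balance` field are hypothetical and do not come from the Superstore data used later. It relies on `start_session()` and `start_transaction()`, which commit when the `with` block ends normally and abort if an exception escapes.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents in one transaction."""
    accounts = client["bank"]["accounts"]  # hypothetical database and collection
    with client.start_session() as session:
        # Commits on normal exit; aborts if an exception leaves the block.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Calling `transfer(client, "A", "B", 25)` requires a `MongoClient` connected to a replica set or sharded cluster, because a standalone server does not support transactions.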

## Question 20 :- What are the main differences between capped collections and regular collections ?

## Answer :-


### MongoDB provides two types of collections: capped collections and regular collections. Each serves different purposes based on storage and performance requirements.


### A capped collection is a fixed-size, high-performance collection that maintains data in insertion order and automatically removes the oldest documents when it reaches its size limit.


### **Differences Between Capped and Regular Collections :-**

#### **Capped Collection :-**

- Size limit: fixed size, defined when the collection is created.

- Automatic deletion: the oldest documents are removed automatically once the size limit is reached.

- Insertion order: preserved.

- Performance: faster writes due to fixed storage allocation.

- Update behavior: updates that change a document's size are not allowed.

- Use case: ideal for logging, caching, and real-time event tracking.

- TTL (Time-To-Live) support: none built in.


### **When to Use Capped Collections :-**

- Log files where old data can be automatically discarded.

- Real-time analytics where recent data is more relevant.

- Message queues where only the latest messages are needed.


#### **Regular Collections :-**

- Size limit: none by default; the collection grows dynamically.

- Automatic deletion: none; documents persist until they are deleted explicitly or expired by a TTL index.

- Insertion order: not guaranteed unless results are sorted.

- Performance: slightly slower writes due to dynamic growth.

- Update behavior: allows full document updates, including updates that change document size.

- Use case: general-purpose storage.

- TTL (Time-To-Live) support: TTL indexes can be used to expire documents.



### **When to Use Regular Collections :-**

- When data needs to persist indefinitely.

- When updates require changing document size.

- For traditional document storage.
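The behavior of a capped collection can be sketched in two parts. The helper `create_capped_log` shows how such a collection might be created through PyMongo's `create_collection` with the `capped`, `size`, and `max` options; its name and default values are assumptions for this example. The bounded `deque` is only a pure-Python analogy for the "drop the oldest entry when full" rule, not how MongoDB stores the data.

```python
from collections import deque


def create_capped_log(db, name="app_log", size_bytes=1_048_576, max_docs=1000):
    """Create a capped collection; a byte size is required, a document limit is optional."""
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)


# Pure-Python analogy: a bounded deque also discards its oldest entry when full.
recent_events = deque(maxlen=3)
for event_id in range(1, 6):
    recent_events.append({"event": event_id})

print(list(recent_events))  # only the three most recent events remain
```

With a live connection, `create_capped_log(client["SuperstoreDB"])` would create the collection; the database name here is taken from the practical section and the rest is hypothetical.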



## Question 21 :- What is the purpose of the $match stage in MongoDB's aggregation pipeline ?

## Answer :-


### The $match stage in MongoDB's aggregation pipeline is used to filter documents based on specified conditions, similar to the WHERE clause in SQL. It reduces the number of documents processed in subsequent stages, improving performance.


### **Key Purposes of $match are :-**

- **Filters Documents at the Start :-**

 - Eliminates unnecessary documents early in the pipeline, reducing processing time.

- **Improves Performance :-**

 - When placed at the beginning of the aggregation pipeline, it speeds up processing by reducing the dataset size.

- **Supports Complex Query Conditions :-**

 - Allows filtering using multiple conditions ($gt, $lt, $eq, $in, $or, etc.).

- **Optimizes Index Usage :-**

 - Can use indexes when it is placed at the start of the pipeline, making queries faster.


### **When to Use $match :-**

- When you only need specific documents from a collection.

- At the start of an aggregation pipeline to optimize performance.

- When performing complex filtering before grouping or transforming data.
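As a minimal sketch of the points above, the pipeline below filters first and only then groups and sorts. The field names `Region`, `Sales`, and `Category` are taken from the Superstore dataset used in the practical section; the function name `high_value_west_pipeline`, the threshold, and the output field `total_sales` are assumptions for this example.

```python
def high_value_west_pipeline(min_sales=500):
    """Build a pipeline that filters with $match before grouping."""
    return [
        # Stage 1: keep only West-region orders above the sales threshold.
        {"$match": {"Region": "West", "Sales": {"$gt": min_sales}}},
        # Stage 2: group the remaining documents by category and sum their sales.
        {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},
        # Stage 3: show the largest totals first.
        {"$sort": {"total_sales": -1}},
    ]


pipeline = high_value_west_pipeline()
print(pipeline)
# With a live collection: results = list(collection.aggregate(pipeline))
```

Placing `$match` first means the later stages only see the documents that passed the filter.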



## Question 22 :- How can you secure access to a MongoDB database ?

## Answer :-

### Securing a MongoDB database is essential to prevent unauthorized access, data breaches, and cyber threats. MongoDB provides multiple security mechanisms to ensure data protection.


### **Key Security Measures in MongoDB are :-**

- **1) Enable Authentication and Authorization :-**

 - Authentication ensures only authorized users can access the database.

 - Role-Based Access Control (RBAC) restricts users to specific roles and permissions.

- **2) Use TLS/SSL Encryption :-**

 - Encrypts data in transit to prevent interception.

 - Ensures secure communication between clients and the database.

- **3) Restrict Network Access :-**

 - Bind MongoDB to specific IP addresses (the bindIp setting) to limit exposure.

 - Use firewalls and IP whitelisting to prevent unauthorized access.

- **4) Implement Data Encryption at Rest :-**

 - Stores data encrypted on disk using WiredTiger's encrypted storage engine (a MongoDB Enterprise feature).

 - Protects sensitive data from unauthorized access.

- **5) Enable Audit Logging :-**

 - Tracks database activities, including logins and modifications (auditing is available in MongoDB Enterprise).

 - Helps in detecting and responding to security breaches.

- **6) Use Strong Passwords and Avoid Default Settings :-**

 - Enforce strong and unique passwords for database users.

 - Access control is not enabled by default, so turn it on and disable any settings that allow unrestricted access.

- **7) Backup Data Regularly :-**

 - Regular backups prevent data loss in case of cyberattacks or system failures.

 - Store backups in secure and encrypted locations.
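Two of the measures above, authentication and TLS, show up directly in how a client connects. The sketch below builds a connection URI with escaped credentials, an authentication database, and TLS enabled, and shows how a user with a limited role might be created. The function names, the user name, the password, and the host `db.example.com` are hypothetical; `authSource` and `tls` are standard MongoDB connection-string options, and `createUser` is a standard database command.

```python
from urllib.parse import quote_plus


def build_secure_uri(user, password, host, port=27017, auth_db="admin"):
    """Build a MongoDB URI with escaped credentials, an auth database, and TLS enabled."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
        f"?authSource={auth_db}&tls=true"
    )


def create_read_write_user(client, user, password, db_name):
    """Create a user limited to the readWrite role on one database (RBAC)."""
    return client[db_name].command(
        "createUser", user, pwd=password, roles=[{"role": "readWrite", "db": db_name}]
    )


# Hypothetical credentials; special characters must be percent-escaped.
uri = build_secure_uri("app_user", "p@ss:word/1", "db.example.com")
print(uri)
```

The resulting string can be passed to `MongoClient(uri)`; the server must have access control and TLS configured for the options to take effect.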



## Question 23 :- What is MongoDB's WiredTiger storage engine, and why is it important ?


## Answer :-


### **WiredTiger** is the default storage engine in MongoDB. It was introduced in MongoDB 3.0 and became the default in MongoDB 3.2. It provides high performance, concurrency control, and efficient storage management, making MongoDB more scalable and efficient.



### **Importance of WiredTiger :-**

- **1) Improved Performance and Concurrency :-**

 - Uses multi-threaded architecture, allowing multiple operations to execute in parallel.

 - Handles high read and write throughput efficiently.

- **2) Compression for Reduced Storage Usage :-**

 - Supports snappy (the default), zlib, and, since MongoDB 4.2, zstd compression, reducing the storage footprint of documents.

 - Helps optimize disk space usage.

- **3) Document-Level Locking for Faster Transactions :-**

 - Unlike the older MMAPv1 storage engine, which locked at the collection level (and at the database level before MongoDB 3.0), WiredTiger provides document-level concurrency control.

 - Increases database responsiveness and reduces contention in multi-user environments.

- **4) Support for Multi-Document ACID Transactions :-**

 - Ensures Atomicity, Consistency, Isolation, and Durability (ACID) in multi-document transactions.

 - Makes MongoDB suitable for financial and transactional applications.

- **5) Efficient Memory and Cache Management :-**

 - Uses a write-ahead log (the journal) together with periodic checkpoints, improving data durability.

 - Efficiently manages cache and RAM utilization, leading to better performance.

- **6) Enhanced Scalability :-**

 - Well-suited for large-scale applications requiring high availability and distributed systems.

 - Works effectively with MongoDB sharding and replication.
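The compression point can be illustrated with a small sketch that builds the options for creating a collection with a specific WiredTiger block compressor. The function name `compressed_collection_options` and the collection name `archive` are assumptions for this example; the `storageEngine` option with a `block_compressor` setting in `configString` is the documented way to choose a compressor per collection.

```python
def compressed_collection_options(compressor="zlib"):
    """Return create_collection keyword arguments selecting a WiredTiger block compressor."""
    allowed = {"snappy", "zlib", "zstd", "none"}
    if compressor not in allowed:
        raise ValueError(f"unsupported compressor: {compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        }
    }


options = compressed_collection_options("zlib")
print(options)
# With a live database: db.create_collection("archive", **options)
```

zlib usually compresses more tightly than snappy at a higher CPU cost, so it tends to suit data that is written once and read rarely.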



# Practical Questions and Answers :-

## Question 1 :- Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

In [1]:
!pip install pymongo






In [2]:
import pandas as pd

#Import pymongo after installation :-
import pymongo
from pymongo import MongoClient

import json

In [3]:
# Connecting to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017")


In [4]:
# Reading the CSV file :-

df = pd.read_csv("superstore.csv")

In [5]:
df

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2018-152156,11/8/2018,11/11/2018,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600,2,0.00,41.9136
1,2,CA-2018-152156,11/8/2018,11/11/2018,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400,3,0.00,219.5820
2,3,CA-2018-138688,6/12/2018,6/16/2018,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.6200,2,0.00,6.8714
3,4,US-2017-108966,10/11/2017,10/18/2017,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.0310
4,5,US-2017-108966,10/11/2017,10/18/2017,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.3680,2,0.20,2.5164
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9989,9990,CA-2016-110422,1/21/2016,1/23/2016,Second Class,TB-21400,Tom Boeckenhauer,Consumer,United States,Miami,...,33180.0,South,FUR-FU-10001889,Furniture,Furnishings,Ultra Door Pull Handle,25.2480,3,0.20,4.1028
9990,9991,CA-2019-121258,2/26/2019,3/3/2019,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627.0,West,FUR-FU-10000747,Furniture,Furnishings,Tenex B1-RE Series Chair Mats for Low Pile Car...,91.9600,2,0.00,15.6332
9991,9992,CA-2019-121258,2/26/2019,3/3/2019,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627.0,West,TEC-PH-10003645,Technology,Phones,Aastra 57i VoIP phone,258.5760,2,0.20,19.3932
9992,9993,CA-2019-121258,2/26/2019,3/3/2019,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627.0,West,OFF-PA-10004041,Office Supplies,Paper,"It's Hot Message Books with Stickers, 2 3/4"" x 5""",29.6000,4,0.00,13.3200


In [6]:
df.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2018-152156,11/8/2018,11/11/2018,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2018-152156,11/8/2018,11/11/2018,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2018-138688,6/12/2018,6/16/2018,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714
3,4,US-2017-108966,10/11/2017,10/18/2017,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.031
4,5,US-2017-108966,10/11/2017,10/18/2017,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,2,0.2,2.5164


In [7]:
df.tail()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country/Region,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
9989,9990,CA-2016-110422,1/21/2016,1/23/2016,Second Class,TB-21400,Tom Boeckenhauer,Consumer,United States,Miami,...,33180.0,South,FUR-FU-10001889,Furniture,Furnishings,Ultra Door Pull Handle,25.248,3,0.2,4.1028
9990,9991,CA-2019-121258,2/26/2019,3/3/2019,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627.0,West,FUR-FU-10000747,Furniture,Furnishings,Tenex B1-RE Series Chair Mats for Low Pile Car...,91.96,2,0.0,15.6332
9991,9992,CA-2019-121258,2/26/2019,3/3/2019,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627.0,West,TEC-PH-10003645,Technology,Phones,Aastra 57i VoIP phone,258.576,2,0.2,19.3932
9992,9993,CA-2019-121258,2/26/2019,3/3/2019,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627.0,West,OFF-PA-10004041,Office Supplies,Paper,"It's Hot Message Books with Stickers, 2 3/4"" x 5""",29.6,4,0.0,13.32
9993,9994,CA-2019-119914,5/4/2019,5/9/2019,Second Class,CC-12220,Chris Cortes,Consumer,United States,Westminster,...,92683.0,West,OFF-AP-10002684,Office Supplies,Appliances,"Acco 7-Outlet Masterpiece Power Center, Wihtou...",243.16,2,0.0,72.948


In [8]:
df.shape

(9994, 21)

In [9]:
# Converting the DataFrame into a list of dictionaries (one per row),
# which is the format insert_many() expects :-

data = df.to_dict(orient= "records" )

In [10]:
data

[{'Row ID': 1,
  'Order ID': 'CA-2018-152156',
  'Order Date': '11/8/2018',
  'Ship Date': '11/11/2018',
  'Ship Mode': 'Second Class',
  'Customer ID': 'CG-12520',
  'Customer Name': 'Claire Gute',
  'Segment': 'Consumer',
  'Country/Region': 'United States',
  'City': 'Henderson',
  'State': 'Kentucky',
  'Postal Code': 42420.0,
  'Region': 'South',
  'Product ID': 'FUR-BO-10001798',
  'Category': 'Furniture',
  'Sub-Category': 'Bookcases',
  'Product Name': 'Bush Somerset Collection Bookcase',
  'Sales': 261.96,
  'Quantity': 2,
  'Discount': 0.0,
  'Profit': 41.9136},
 {'Row ID': 2,
  'Order ID': 'CA-2018-152156',
  'Order Date': '11/8/2018',
  'Ship Date': '11/11/2018',
  'Ship Mode': 'Second Class',
  'Customer ID': 'CG-12520',
  'Customer Name': 'Claire Gute',
  'Segment': 'Consumer',
  'Country/Region': 'United States',
  'City': 'Henderson',
  'State': 'Kentucky',
  'Postal Code': 42420.0,
  'Region': 'South',
  'Product ID': 'FUR-CH-10000454',
  'Category': 'Furniture',
  'Su

In [11]:
# Saving this record into Mongodb and defining the database :-

db = client["SuperstoreDB"]

In [12]:
# Printing the variable db :-

print(db)

Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'SuperstoreDB')


In [13]:
import pandas as pd
from pymongo import MongoClient
import numpy as np

# Load the CSV
df = pd.read_csv("superstore.csv")
df = df.replace({np.nan: None})  # Replace NaN with None for MongoDB

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["sales_data"]

# Optional: Clear old data first
collection.drop()

# Insert into MongoDB
data = df.to_dict(orient="records")
collection.insert_many(data)

print("Data inserted successfully!")


Data inserted successfully!


In [14]:
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
client.server_info()  # Will raise an exception if connection fails
print("Successfully connected to MongoDB.")

Successfully connected to MongoDB.


In [15]:
# Insert the records into the "superstore" collection :-
# Note: db was reassigned in In [13], so this writes to "superstore_db", not "SuperstoreDB".

db.superstore.insert_many(data)

InsertManyResult([ObjectId('67f21bebfd9ecb017bc87fdd'), ObjectId('67f21bebfd9ecb017bc87fde'), ObjectId('67f21bebfd9ecb017bc87fdf'), ObjectId('67f21bebfd9ecb017bc87fe0'), ObjectId('67f21bebfd9ecb017bc87fe1'), ObjectId('67f21bebfd9ecb017bc87fe2'), ObjectId('67f21bebfd9ecb017bc87fe3'), ObjectId('67f21bebfd9ecb017bc87fe4'), ObjectId('67f21bebfd9ecb017bc87fe5'), ObjectId('67f21bebfd9ecb017bc87fe6'), ObjectId('67f21bebfd9ecb017bc87fe7'), ObjectId('67f21bebfd9ecb017bc87fe8'), ObjectId('67f21bebfd9ecb017bc87fe9'), ObjectId('67f21bebfd9ecb017bc87fea'), ObjectId('67f21bebfd9ecb017bc87feb'), ObjectId('67f21bebfd9ecb017bc87fec'), ObjectId('67f21bebfd9ecb017bc87fed'), ObjectId('67f21bebfd9ecb017bc87fee'), ObjectId('67f21bebfd9ecb017bc87fef'), ObjectId('67f21bebfd9ecb017bc87ff0'), ObjectId('67f21bebfd9ecb017bc87ff1'), ObjectId('67f21bebfd9ecb017bc87ff2'), ObjectId('67f21bebfd9ecb017bc87ff3'), ObjectId('67f21bebfd9ecb017bc87ff4'), ObjectId('67f21bebfd9ecb017bc87ff5'), ObjectId('67f21bebfd9ecb017bc87f
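The loading steps above can also be sketched without pandas, using only the standard library. This is an illustration under stated assumptions: the function names `csv_to_documents` and `load_orders`, the `NUMERIC_FIELDS` mapping, and the one-row sample are made up for this example (the sample values are copied from the first row shown above). `load_orders` clears the target collection first so that running it twice does not duplicate rows.

```python
import csv
import io

# Columns converted from text to numbers; all other columns stay as strings.
NUMERIC_FIELDS = {"Sales": float, "Quantity": int, "Discount": float, "Profit": float}


def csv_to_documents(csv_text):
    """Parse CSV text into dictionaries, converting numeric columns and empty cells."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for field, value in row.items():
            if value is None or value == "":
                doc[field] = None  # store missing values as null
            elif field in NUMERIC_FIELDS:
                doc[field] = NUMERIC_FIELDS[field](value)
            else:
                doc[field] = value
        documents.append(doc)
    return documents


def load_orders(collection, documents):
    """Replace the collection contents so repeated runs do not duplicate rows."""
    collection.delete_many({})
    if documents:
        collection.insert_many(documents)
    return len(documents)


sample = (
    "Order ID,Region,Sales,Quantity,Discount,Profit\n"
    "CA-2018-152156,South,261.96,2,0.0,41.9136\n"
)
docs = csv_to_documents(sample)
print(docs)
```

Passing `client["SuperstoreDB"]["Orders"]` as the collection would place the documents where the following questions look for them.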

## Question 2 :- Retrieve and print all documents from the Orders collection.

In [16]:
# Connect to MongoDB
client = pymongo.MongoClient("mongodb://localhost:27017/")  # Ensure MongoDB is running
db = client["SuperstoreDB"]  # Select the database
collection = db["Orders"]  # Select the collection


In [17]:
# Retrieve all documents :-

orders = collection.find()


In [18]:
# Print all documents :-
# Prints each document

for order in orders:
    print(order)


In [19]:
# Retrieve only the first 10 records (find() returns a lazy cursor; iterate it or wrap it in list() to see the documents) :-


orders = collection.find().limit(10)

In [20]:
orders

<pymongo.synchronous.cursor.Cursor at 0x1ce7f287990>

## Question 3 :- Count and display the total number of documents in the Orders collection.

In [21]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")  # MongoDB must already be running
db = client["SuperstoreDB"]  # Selecting the database
collection = db["Orders"]  # Selecting the collection


In [36]:
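# Count total documents :-
# Note: the count below is 0 because the earlier cells inserted the data into "superstore_db"
# (collections "sales_data" and "superstore"), not into "SuperstoreDB.Orders".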

total_documents = collection.count_documents({})


In [37]:
total_documents


0

In [38]:
# Display the count :-

print(f"Total number of documents in the Orders collection: {total_documents}")

Total number of documents in the Orders collection: 0
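When a count comes back as 0, it helps to check which databases and collections actually hold documents. The helper below is a small diagnostic sketch; the name `collection_counts` is an assumption, and it uses only the PyMongo calls `list_database_names()`, `list_collection_names()`, and `count_documents()`.

```python
def collection_counts(client):
    """Return {"database.collection": document_count} for every visible collection."""
    counts = {}
    for db_name in client.list_database_names():
        db = client[db_name]
        for coll_name in db.list_collection_names():
            counts[f"{db_name}.{coll_name}"] = db[coll_name].count_documents({})
    return counts
```

Calling `collection_counts(client)` with the `MongoClient` from the cells above lists every collection with its size, including the system databases, which makes a mismatch between the collection that was loaded and the collection that is queried easy to spot.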


## Question 4 :- Write a query to fetch all orders from the "West" region.

In [39]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")  # MongoDB must already be running
db = client["SuperstoreDB"]  # Selecting the database
collection = db["Orders"]  # Selecting the collection


# Query to fetch all orders where Region is "West" :-

west_orders = collection.find({"Region": "West"})


# Print the retrieved orders :-
for order in west_orders:
    print(order)

In [40]:
west_orders = collection.find({"Region": "West"}).limit(5)

In [41]:
west_orders

<pymongo.synchronous.cursor.Cursor at 0x1ce0b799f90>

## Question 5 :- Write a query to find orders where Sales is greater than 500.

In [42]:
# Connect to MongoDB :-
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Code to find orders where Sales > 500 :-

high_sales_orders = collection.find({"Sales": {"$gt": 500}})


# Printing the retrieved orders :-

for order in high_sales_orders:
    print(order)

In [28]:
# Limiting the result with 5 :-

high_sales_orders = collection.find({"Sales": {"$gt": 500}}).limit(5)


## Question 6 :- Fetch the top 3 orders with the highest Profit.

In [43]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Code to fetch top 3 orders with highest Profit :-
# Sort by Profit in descending order :-

top_profit_orders = collection.find().sort("Profit", -1).limit(3)


# Printing the top 3 orders :-
for order in top_profit_orders:
    print(order)

## Question 7 :- Update all orders with Ship Mode as "First Class" to "Premium Class".

In [45]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Updating all documents where Ship Mode is "First Class" to "Premium Class" :-

result = collection.update_many(
    {"Ship Mode": "First Class"},  # Condition applied :- Find documents with "First Class"
    {"$set": {"Ship Mode": "Premium Class"}}  # Updating :- Change to "Premium Class"
)


# Printing the number of updated documents :-

print(f"Total documents updated: {result.modified_count}")

Total documents updated: 0


In [31]:
one_doc = collection.find_one()
print(one_doc)


None


## Question 8 :- Delete all orders where Sales is less than 50.

In [32]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Deleting all orders where Sales is less than 50 :-
# Here, $lt means "less than" :-

result = collection.delete_many({"Sales": {"$lt": 50}})


# Printing the number of deleted documents :-

print(f"Total documents deleted: {result.deleted_count}")

Total documents deleted: 0


## Question 9 :- Use aggregation to group orders by Region and calculate total sales per region.

In [33]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Aggregation Code to Group by Region and Sum Sales :-
# Here, Group by Region, sum Sales :-

pipeline = [
    {"$group": {"_id": "$Region", "Total Sales": {"$sum": "$Sales"}}}
]



# Executing Aggregation Code :-

results = collection.aggregate(pipeline)



# Printing the Total Sales per Region :-
for result in results:
    print(result)
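To see what the `$group` and `$sum` stages compute, the sketch below pairs the pipeline with a pure-Python equivalent that can be run on a few in-memory documents. The function names and the added `$sort` stage are assumptions for this example; the three sample documents reuse Region and Sales values from the first rows of the dataset shown earlier.

```python
from collections import defaultdict


def sales_by_region_pipeline():
    """Group by Region, sum Sales, and list the largest totals first."""
    return [
        {"$group": {"_id": "$Region", "Total Sales": {"$sum": "$Sales"}}},
        {"$sort": {"Total Sales": -1}},
    ]


def sales_by_region_local(documents):
    """Pure-Python equivalent of the pipeline, for checking results on small samples."""
    totals = defaultdict(float)
    for doc in documents:
        totals[doc["Region"]] += doc["Sales"]
    rows = [{"_id": region, "Total Sales": total} for region, total in totals.items()]
    return sorted(rows, key=lambda row: row["Total Sales"], reverse=True)


sample = [
    {"Region": "South", "Sales": 261.96},
    {"Region": "South", "Sales": 731.94},
    {"Region": "West", "Sales": 14.62},
]
local_totals = sales_by_region_local(sample)
print(local_totals)
# With a live collection: list(collection.aggregate(sales_by_region_pipeline()))
```

Each output row has the same shape as an aggregation result: the group key in `_id` and the sum in `Total Sales`.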

## Question 10 :- Fetch all distinct values for Ship Mode from the collection.


In [34]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Fetching all distinct values of "Ship Mode" :-

distinct_ship_modes = collection.distinct("Ship Mode")


# Printing the distinct Ship Mode values :-

print("Distinct Ship Modes:", distinct_ship_modes)


Distinct Ship Modes: []


## Question 11 :- Count the number of orders for each category.


In [35]:
# Connect to MongoDB :-

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
collection = db["Orders"]


# Aggregation Code to Count Orders per Category :-
# Here, Group by Category, count orders :-

pipeline = [
    {"$group": {"_id": "$Category", "Total Orders": {"$sum": 1}}}
]



# Executing Aggregation Code :-

results = collection.aggregate(pipeline)



# Printing the Number of Orders per Category :-
for result in results:
    print(result)
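An alternative sketch of the same count uses the `$sortByCount` stage, which groups by a field, counts the documents in each group, and sorts by that count in descending order. The function names and the sample documents are assumptions for this example; note that `$sortByCount` names its output field `count` rather than `Total Orders`.

```python
from collections import Counter


def orders_per_category_pipeline():
    """Equivalent to a $group with {"$sum": 1} followed by a descending $sort on the count."""
    return [{"$sortByCount": "$Category"}]


def orders_per_category_local(documents):
    """Pure-Python equivalent, for checking results on small samples."""
    counts = Counter(doc["Category"] for doc in documents)
    return [{"_id": category, "count": n} for category, n in counts.most_common()]


sample = [
    {"Category": "Furniture"},
    {"Category": "Furniture"},
    {"Category": "Office Supplies"},
]
local_counts = orders_per_category_local(sample)
print(local_counts)
# With a live collection: list(collection.aggregate(orders_per_category_pipeline()))
```

The explicit `$group` version above is preferable when the output field needs a custom name.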