# Theoretical Questions

1. What are the key differences between SQL and NoSQL databases

ANS-SQL and NoSQL databases differ primarily in the way they store, organize, and manage data. SQL databases are relational: they use a structured, tabular model of rows and columns, with data organized into predefined schemas. This makes them well suited to applications with structured data and complex relationships. NoSQL databases are non-relational and offer more flexible data models, such as documents, key-value pairs, wide-column stores, or graphs, which lets them handle unstructured and semi-structured data more effectively.

Another key distinction is schema rigidity. SQL databases require a fixed schema that must be defined in advance, and altering it can be costly, whereas NoSQL databases are schema-less or schema-flexible, making it easier to adapt to changing data requirements. Scalability also differs: SQL databases have traditionally scaled vertically by adding capacity to a single server, while NoSQL databases are designed to scale horizontally across multiple servers, which suits distributed systems and very large data volumes.

From a transaction perspective, SQL databases adhere to the ACID properties, ensuring strong consistency and reliability, which is crucial in domains such as banking and finance. Many NoSQL databases instead follow the BASE model (Basically Available, Soft state, Eventual consistency), favoring high availability over immediate consistency, which suits applications that can tolerate briefly stale reads in exchange for performance (although some, including MongoDB, now also support multi-document ACID transactions). Querying methods further highlight the difference: SQL databases rely on the standardized Structured Query Language (SQL) for complex queries and joins, whereas NoSQL databases use varied approaches such as driver APIs, JSON-like query documents, or proprietary query languages, depending on the database.
In terms of use cases, SQL databases are preferred in systems where data integrity, consistency, and relational complexity are essential, while NoSQL databases excel in handling large-scale, real-time, and rapidly evolving datasets like social media platforms, IoT applications, and big data analytics. Examples of SQL databases include MySQL, PostgreSQL, Oracle, and Microsoft SQL Server, whereas popular NoSQL databases include MongoDB, Cassandra, Redis, CouchDB, Neo4j, and DynamoDB.
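To make the data-model difference concrete, the sketch below stores the same customer and their orders twice: once as two related tables in Python's built-in `sqlite3` module (standing in for any SQL database), and once as a single nested dict standing in for a MongoDB document. The table names, field names, and values are illustrative assumptions, not part of any real schema.

```python
import json
import sqlite3

# Relational model: customers and orders live in separate tables linked by a
# foreign key, so reading them together requires a JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'pen', 3), (11, 1, 'notebook', 1);
    """
)
rows = conn.execute(
    """
    SELECT c.name, o.item, o.qty
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 1
    ORDER BY o.id
    """
).fetchall()

# Document model: the same data as one self-contained, JSON-like document.
# Another document in the same collection could carry different fields
# without any table definition having to change.
customer_doc = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"item": "pen", "qty": 3},
        {"item": "notebook", "qty": 1},
    ],
}

print(rows)  # [('Ada', 'pen', 3), ('Ada', 'notebook', 1)]
print(json.dumps(customer_doc))
```

The relational version needs a join to reassemble the customer's orders, while the document version returns everything in one read; the trade-off is that the document duplicates nothing only as long as orders are never needed independently of their customer.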

2. What makes MongoDB a good choice for modern applications

ANS-MongoDB is considered a strong choice for modern applications because it is designed for the flexibility, scalability, and speed that today’s data-driven systems demand. Unlike traditional relational databases, MongoDB is a NoSQL, document-oriented database: it stores data as flexible, JSON-like documents (encoded internally as BSON). This lets developers work with semi-structured or unstructured data without a rigid schema, making it much easier to adapt to changing business requirements and rapidly evolving application features. Horizontal scalability is another key advantage: MongoDB can distribute data across multiple servers (sharding) to support massive workloads, and it keeps replicated copies of the data on several servers for high availability, which is crucial for global, cloud-native applications.

Additionally, MongoDB is built for high read and write throughput, which makes it suitable for real-time applications such as social networks, IoT systems, and e-commerce platforms. Through replication, read preferences, and configurable read and write concerns, it lets applications tune the trade-off between consistency and availability, helping them stay resilient when individual servers fail. MongoDB also comes with a powerful query language that supports rich queries, indexing, and aggregation, allowing developers to work efficiently with complex datasets.

From a developer’s perspective, MongoDB is highly appealing because it integrates seamlessly with modern programming languages, frameworks, and cloud environments. Its flexible schema enables faster development cycles, reducing the time and effort required to launch new features. Moreover, MongoDB Atlas, its fully managed cloud service, simplifies deployment, monitoring, and scaling, making it easier to run production-grade applications without heavy database administration.

3. Explain the concept of collections in MongoDB

ANS-In MongoDB, a collection is the equivalent of a table in a relational database, but it is much more flexible. A collection is essentially a group of documents, where each document is a JSON-like object containing key–value pairs. Unlike relational tables that enforce a fixed schema, collections in MongoDB are schema-less, meaning the documents inside a collection do not need to have the same fields or structure. For example, one document in a collection might include a field for "address," while another might not, and MongoDB will still store them together in the same collection.

Collections are used to organize documents that are related to one another. For instance, in an e-commerce application, you might have one collection for users, another for products, and another for orders. This grouping makes it easy to query, index, and manage related sets of data. MongoDB also allows developers to create indexes on collections to speed up query performance, and collections can be sharded or replicated to ensure scalability and availability in distributed systems.

4. How does MongoDB ensure high availability using replication

ANS-MongoDB ensures high availability through a feature called replication, which is implemented using replica sets. A replica set is a group of MongoDB servers that maintain the same dataset, providing redundancy and automatic failover in case of server failures.

In a replica set, one node acts as the primary and the others are secondaries. All write operations go to the primary, and by default so do all reads, while the secondary nodes maintain copies of the data by continuously replicating the primary’s oplog (operations log). If the primary becomes unavailable because of a crash, network issue, or maintenance, the remaining members automatically hold an election to choose a new primary from the eligible secondaries. This happens without manual intervention, typically within seconds, so the database remains available with minimal downtime.

Clients can also route read operations to secondaries by choosing an appropriate read preference (read scaling), which helps distribute workloads across the replica set at the cost of possibly reading slightly stale data. Additionally, MongoDB supports arbiter nodes: lightweight members that don’t store data but vote in elections, giving a replica set with an even number of data-bearing members an odd number of votes so that a majority can still be reached.

Overall, by maintaining multiple synchronized copies of data across servers and enabling automatic failover through elections, MongoDB replication provides fault tolerance, data redundancy, and high availability, which are critical for modern, always-on applications.
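Elections only succeed when a strict majority of voting members can reach each other. The plain-Python arithmetic below (no driver required) shows why MongoDB recommends an odd number of voting members and why adding an arbiter to a two-member set helps. On the client side, PyMongo would normally connect with a URI that lists several members and the set name, for example `mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0` (hypothetical hosts and set name), so the driver can discover and follow the new primary after a failover.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members: int) -> int:
    """How many voting members can be lost while an election can still succeed."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(f"{n} voters: majority={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

A two-member set tolerates no failures, but adding an arbiter raises it to three voters, which tolerates one. Going from three to four voters adds no extra fault tolerance, which is why odd sizes are preferred.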

5. What are the main benefits of MongoDB Atlas

ANS-MongoDB Atlas is the fully managed cloud database service for MongoDB, and it offers several important benefits that make it a strong choice for modern applications. One of the main advantages is ease of management: Atlas handles complex database administration tasks such as provisioning, configuration, patching, backups, and monitoring, allowing developers to focus on building applications rather than managing infrastructure.

Another major benefit is global scalability and availability. Atlas runs on major cloud providers like AWS, Azure, and Google Cloud, and it lets you deploy clusters across multiple regions, enabling low-latency access for global users. Its built-in replication and automated failover ensure high availability and fault tolerance, while auto-scaling makes it easy to adjust compute and storage resources as workloads grow without downtime.

Atlas also provides strong security features, including encryption in transit (TLS) and at rest, network isolation (VPC peering, private endpoints), role-based access control, and support for compliance requirements such as GDPR, HIPAA, and SOC 2. For performance, it offers real-time monitoring, performance dashboards, and index recommendations through its Performance Advisor.

A further benefit is its integration with modern application ecosystems. Atlas supports features like full-text search, real-time analytics, serverless triggers, and data visualization through MongoDB Charts. It also connects seamlessly with cloud services and popular programming frameworks, making it developer-friendly.

6. What is the role of indexes in MongoDB, and how do they improve performance

ANS-In MongoDB, indexes play a crucial role in improving the efficiency of data retrieval. An index is a special data structure (a B-tree in MongoDB) that stores a small portion of the collection’s data in an easy-to-traverse form, similar to how an index in a book helps you find topics quickly without scanning every page. Without a suitable index, MongoDB must perform a collection scan when executing a query, checking each document one by one, which becomes slow as the data grows. With an index, MongoDB can directly locate the documents that match the query conditions, drastically reducing the number of documents it needs to examine.

Indexes improve performance by speeding up queries such as equality matches, range queries, sorting, and even geospatial searches. For example, if you create an index on a username field in a user collection, queries searching for a specific username will be executed much faster compared to scanning the entire collection. MongoDB supports different types of indexes, such as single-field indexes, compound indexes (on multiple fields), text indexes, hashed indexes, and geospatial indexes, each optimized for different use cases.

While indexes significantly boost read performance, they come with trade-offs: they require additional disk space and can slightly slow down write operations (inserts, updates, deletes) because the index must also be updated whenever the data changes. Therefore, choosing the right indexes based on query patterns is important for maintaining overall database efficiency.
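The sketch below is a simplified, in-memory model of the difference between a collection scan and an index lookup; it is not how MongoDB implements its B-tree indexes. A sorted list of `(username, position)` pairs plays the role of a single-field index, and `bisect` finds matching keys without touching every document. The 10,000 `userNNNNN` documents are made up for illustration. In PyMongo, the real index from the `username` example would be created with `collection.create_index("username")`.

```python
import bisect

# A toy "collection": documents stored in insertion order.
users = [{"_id": i, "username": f"user{i:05d}"} for i in range(10_000)]


def collection_scan(docs, username):
    """Check every document, like a COLLSCAN. Returns (matches, docs examined)."""
    matches = [d for d in docs if d["username"] == username]
    return matches, len(docs)


# Toy single-field index: sorted (key, position) pairs kept apart from the
# documents, the way an index stores only a small slice of each document.
index = sorted((d["username"], pos) for pos, d in enumerate(users))
keys = [k for k, _ in index]


def index_lookup(docs, username):
    """Binary-search the index, then fetch only the matching documents."""
    lo = bisect.bisect_left(keys, username)
    hi = bisect.bisect_right(keys, username)
    matches = [docs[pos] for _, pos in index[lo:hi]]
    return matches, len(matches)


print(collection_scan(users, "user04242")[1])  # 10000
print(index_lookup(users, "user04242")[1])     # 1
```

The write-side cost described above is visible here too: every insert into `users` would also require inserting into `index` (and `keys`) at the right sorted position.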

7. Describe the stages of the MongoDB aggregation pipeline

ANS-In MongoDB, the **aggregation pipeline** is a powerful framework used to process, transform, and analyze documents within a collection. It works by passing data through a sequence of stages, where each stage performs a specific operation and passes the transformed output to the next stage, similar to an assembly line. The key stages of the aggregation pipeline include:

1. **\$match** – Filters documents based on specific criteria, similar to the `WHERE` clause in SQL. This stage helps reduce the amount of data passed to subsequent stages.

2. **\$project** – Reshapes documents by including, excluding, or adding new fields. For example, you can display only certain fields or create computed fields.

3. **\$group** – Groups documents by a specific field and applies accumulator operators such as `$sum`, `$avg`, `$min`, and `$max` (documents are commonly counted with `{ $sum: 1 }`). This is comparable to SQL’s `GROUP BY`.

4. **\$sort** – Sorts documents in ascending or descending order based on one or more fields.

5. **\$limit** – Restricts the number of documents passed to the next stage, useful for returning only the top N results.

6. **\$skip** – Skips a specified number of documents, often used with `$limit` for pagination.

7. **\$unwind** – Deconstructs an array field into multiple documents, creating one document for each element of the array.

8. **\$lookup** – Performs a left outer join with another collection, bringing in related data.

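9. **\$out** – Writes the results of the pipeline to a collection, creating it or replacing its contents if it already exists. It must be the last stage in the pipeline and is useful for storing aggregated results.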

10. **\$count** – Counts the number of documents that pass through the pipeline and outputs the result.

These stages can be combined in different ways depending on the query or analysis. For example, you might first `$match` documents to filter, then `$group` them by category, and finally `$sort` the results.
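The pipeline below is written in the same list-of-stage-documents shape that PyMongo's `collection.aggregate()` accepts. To show how each stage hands its output to the next without needing a server, it is run through a toy in-memory executor that supports only a tiny subset of the real semantics (equality `$match`, `$group` with a single `$sum` over a field path, single-key `$sort`, and `$limit`). The `orders` data and field names are hypothetical.

```python
from collections import defaultdict

# Filter, group, sort, then keep the top two categories by total.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$category", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
    {"$limit": 2},
]


def run_pipeline(docs, stages):
    """Toy executor for a small subset of aggregation stages (not MongoDB's engine)."""
    for stage in stages:
        (op, spec), = stage.items()  # each stage document has exactly one key
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            out_name, acc = next((k, v) for k, v in spec.items() if k != "_id")
            sum_field = acc["$sum"].lstrip("$")
            totals = defaultdict(int)
            for d in docs:
                totals[d.get(key_field)] += d.get(sum_field, 0)
            docs = [{"_id": k, out_name: v} for k, v in totals.items()]
        elif op == "$sort":
            (field, direction), = spec.items()
            docs = sorted(docs, key=lambda d: d[field], reverse=direction == -1)
        elif op == "$limit":
            docs = docs[:spec]
        else:
            raise NotImplementedError(op)
    return docs


orders = [
    {"category": "Furniture", "amount": 300, "status": "shipped"},
    {"category": "Technology", "amount": 900, "status": "shipped"},
    {"category": "Furniture", "amount": 200, "status": "pending"},
    {"category": "Office Supplies", "amount": 50, "status": "shipped"},
    {"category": "Furniture", "amount": 400, "status": "shipped"},
]
print(run_pipeline(orders, pipeline))
# [{'_id': 'Technology', 'total': 900}, {'_id': 'Furniture', 'total': 700}]
```

Against a real server, the same `pipeline` list would be passed unchanged to `collection.aggregate(pipeline)`.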


8. What is sharding in MongoDB? How does it differ from replication

ANS-In MongoDB, sharding is a method of horizontal scaling that allows a database to handle large amounts of data and high traffic by distributing data across multiple servers, called shards. Each shard holds a subset of the data, and together all shards make up the complete dataset. MongoDB uses a shard key to determine how data is partitioned across the shards. Queries that include the shard key can be routed by the `mongos` query router to only the relevant shard(s), while queries that omit it must be broadcast to every shard. Sharding is particularly useful for applications with massive datasets or high write and read throughput, such as social media platforms, e-commerce systems, or IoT applications.

While sharding focuses on scaling out the database to handle large volumes of data and traffic, replication is primarily about high availability and fault tolerance. In replication, data is copied across multiple servers in a replica set (primary and secondary nodes), so that if the primary fails, a secondary can take over after a brief election, without manual intervention. Replication ensures redundancy and data safety, whereas sharding ensures that the database can efficiently store and process data that exceeds the capacity of a single server.
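As a rough illustration of hashed sharding and query targeting, the toy router below hashes a shard key value and maps it onto one of three hypothetical shards, then decides whether a query can be targeted or must be scattered. This is a simplification: MongoDB's hashed sharding assigns ranges of hash values (chunks) to shards and a balancer moves chunks between them, rather than using a plain modulo. The `customer_id` shard key and the ids are made up.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(shard_key_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Toy router: hash the shard key value and map it onto one shard."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route_query(query: dict, shard_key: str = "customer_id") -> list:
    """Targeted if the filter includes the shard key, scatter-gather otherwise."""
    if shard_key in query:
        return [shard_for(str(query[shard_key]))]
    return list(range(NUM_SHARDS))


# Spread 9,000 hypothetical customer ids across the shards.
placement = Counter(shard_for(f"cust-{i}") for i in range(9_000))
print(sorted(placement.items()))                # roughly 3,000 per shard
print(route_query({"customer_id": "cust-42"}))  # a single shard
print(route_query({"region": "West"}))          # [0, 1, 2]
```

Hashing spreads sequential ids evenly, which balances writes, but range queries on the shard key then touch every shard; that trade-off is why the choice between hashed and ranged shard keys depends on the query pattern.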

9. What is PyMongo, and why is it used

ANS-PyMongo is the official Python driver for MongoDB. It provides a way for Python applications to connect to, interact with, and manipulate MongoDB databases using Python code. With PyMongo, developers can perform operations such as creating databases and collections, inserting, querying, updating, and deleting documents, and running advanced queries or aggregation pipelines—all directly from Python.

PyMongo is widely used because it bridges Python and MongoDB, making it easy to integrate MongoDB into Python applications, whether for web development, data analysis, or machine learning projects. It supports features like connection pooling, authentication, and transactions, and allows developers to work with MongoDB’s flexible document structure (BSON/JSON-like) in a Pythonic way. Essentially, PyMongo turns MongoDB into a natural extension of Python, enabling efficient database operations without needing to write low-level database code.

10. What are the ACID properties in the context of MongoDB transactions

ANS-In the context of MongoDB, **ACID properties** ensure that database transactions are **reliable and consistent**, even in multi-document operations. ACID stands for **Atomicity, Consistency, Isolation, and Durability**, and each property plays a specific role:

1. **Atomicity** – A transaction is treated as a single unit of work. Either **all operations succeed** or **none are applied**, so the database never ends up in a partial or inconsistent state.

2. **Consistency** – Transactions take the database from one valid state to another, ensuring that **all data integrity rules and constraints** are maintained before and after the transaction.

3. **Isolation** – Operations within a transaction are **isolated from other concurrent transactions**, preventing interference and ensuring that intermediate states are not visible to other operations.

4. **Durability** – Once a transaction is committed, the changes are **permanently recorded** in the database, even if there is a system crash or power failure.

MongoDB supports multi-document transactions on replica sets starting from **version 4.0** and on sharded clusters starting from **version 4.2**, extending ACID guarantees across multiple documents and collections (operations on a single document have always been atomic). This makes MongoDB suitable for applications requiring strong consistency and reliability, such as financial or inventory systems, while still benefiting from the flexibility of a NoSQL database.
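A minimal sketch of a multi-document transaction with PyMongo, assuming a hypothetical `bank.accounts` collection whose documents have an `_id` and a numeric `balance`, and a `MongoClient` connected to a replica set (transactions are not available on a standalone server). It uses `ClientSession.with_transaction()`, which commits if the callback returns and aborts if it raises, so the debit and credit are applied together or not at all. The `client` object is passed in rather than created here.

```python
from typing import Any


def transfer_funds(client: Any, from_id: str, to_id: str, amount: float) -> None:
    """Move `amount` between two account documents in one transaction.

    `client` is assumed to be a pymongo.MongoClient connected to a replica set;
    the `bank.accounts` collection and its fields are hypothetical.
    """
    accounts = client["bank"]["accounts"]

    def callback(session: Any) -> None:
        # Debit only if the balance covers the amount (consistency rule).
        debit = accounts.update_one(
            {"_id": from_id, "balance": {"$gte": amount}},
            {"$inc": {"balance": -amount}},
            session=session,
        )
        if debit.modified_count != 1:
            # Raising inside the callback aborts the whole transaction (atomicity).
            raise ValueError("insufficient funds or unknown account")
        accounts.update_one(
            {"_id": to_id},
            {"$inc": {"balance": amount}},
            session=session,
        )

    with client.start_session() as session:
        # Starts the transaction, runs callback, then commits; both updates
        # must pass the same session to take part in the transaction.
        session.with_transaction(callback)
```

Usage would look like `transfer_funds(MongoClient("mongodb://.../?replicaSet=rs0"), "alice", "bob", 50)`, with the URI elided here because it depends on the deployment.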

11. What is the purpose of MongoDB’s explain() function

ANS-In MongoDB, the explain() function is used to analyze and understand how a query is executed. It provides detailed information about the query execution plan, including which indexes were used, how documents were scanned, the number of documents examined versus returned, and the stages of the query pipeline. By using explain(), developers can identify performance bottlenecks, optimize queries, and decide whether creating or modifying indexes would improve efficiency.

For example, db.collection.find({ field: value }).explain("executionStats") not only shows the plan MongoDB used but also provides statistics like execution time, number of keys examined, and number of documents scanned. This makes it a crucial tool for query optimization and debugging, especially in large collections where inefficient queries can significantly impact performance.
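Explain output is a nested document, so it helps to pull out the few numbers that matter. The helper below is a sketch that assumes an explain document produced with `executionStats` verbosity against a standalone server or replica set (sharded clusters wrap per-shard plans differently). It walks the winning plan's `stage` / `inputStage` / `inputStages` tree, flags a `COLLSCAN`, and computes documents examined per document returned. The `example` dict is hand-written for illustration and is not real server output.

```python
def plan_stages(plan):
    """Yield each `stage` name in an explain plan tree, parent before child."""
    if "stage" in plan:
        yield plan["stage"]
    if "inputStage" in plan:                    # a single child stage
        yield from plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):   # several children, e.g. OR
        yield from plan_stages(child)


def summarize_explain(explain):
    """Reduce an executionStats-level explain document to a few key numbers."""
    winning = explain["queryPlanner"]["winningPlan"]
    # Some newer server versions wrap the plan in a `queryPlan` sub-document.
    winning = winning.get("queryPlan", winning)
    stats = explain["executionStats"]
    stages = list(plan_stages(winning))
    returned = stats["nReturned"]
    examined = stats["totalDocsExamined"]
    return {
        "stages": stages,
        "collection_scan": "COLLSCAN" in stages,
        "keys_examined": stats["totalKeysExamined"],
        "docs_examined": examined,
        "docs_examined_per_returned": examined / returned if returned else float(examined),
        "time_ms": stats["executionTimeMillis"],
    }


# Hand-written, illustrative explain fragment (not real server output).
example = {
    "queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}},
    "executionStats": {
        "nReturned": 2,
        "executionTimeMillis": 5,
        "totalKeysExamined": 0,
        "totalDocsExamined": 1000,
    },
}
print(summarize_explain(example))
```

A high `docs_examined_per_returned` ratio combined with a `COLLSCAN` stage is the usual signal that an index on the filtered field would help; after adding one, the plan should show `IXSCAN` under a `FETCH` stage.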

12. How does MongoDB handle schema validation

ANS-In MongoDB, **schema validation** allows you to enforce rules on the structure and content of documents within a collection, even though MongoDB is schema-less by default. This ensures that documents meet certain criteria before being inserted or updated, helping maintain data consistency and integrity. Schema validation is defined using **validation rules** and **validation levels** when creating or modifying a collection.

Validation rules are expressed using **MongoDB’s query expressions** or **JSON Schema**, specifying requirements for fields, data types, value ranges, and even nested structures. For example, you can require that a `name` field is a string, an `age` field is an integer greater than 0, or that certain fields are mandatory.

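Validation levels determine which writes the rules are applied to:

* **Strict** (the default): The rules are applied to all inserts and updates.
* **Moderate**: The rules are applied to inserts and to updates of existing documents that already satisfy them; updates to existing documents that do not satisfy the rules are not checked.
* **Off**: Validation is disabled.

Separately, the validation action determines what happens to an invalid write: `error` (the default) rejects it, while `warn` allows it but records a warning in the log.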

By combining schema validation with indexes and other MongoDB features, developers can ensure data quality and reduce errors while still benefiting from the flexibility of a schema-less database.
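As a sketch of the `name`/`age` rules described above, the code below defines a `$jsonSchema` validator and passes it to PyMongo's `Database.create_collection()`, which forwards extra keyword arguments (here `validator`, `validationLevel`, `validationAction`) to the server's `create` command. The `users` collection name and the function name are hypothetical; `db` is assumed to be a `pymongo.database.Database`.

```python
from typing import Any

# `name` must be a string and `age` a positive integer (32- or 64-bit);
# both fields are required.
USER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "age"],
        "properties": {
            "name": {"bsonType": "string", "description": "must be a string"},
            "age": {
                "bsonType": ["int", "long"],
                "minimum": 1,
                "description": "must be an integer greater than 0",
            },
        },
    }
}


def create_users_collection(db: Any, level: str = "strict", action: str = "error") -> Any:
    """Create the hypothetical `users` collection with the validator attached."""
    return db.create_collection(
        "users",
        validator=USER_VALIDATOR,
        validationLevel=level,
        validationAction=action,
    )
```

For a collection that already exists, the same options can be applied with the `collMod` command, e.g. `db.command("collMod", "users", validator=USER_VALIDATOR, validationLevel="moderate")`. With the defaults, inserting `{"name": "Ada", "age": 0}` should then be rejected with a validation error.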

13. What is the difference between a primary and a secondary node in a replica set

ANS-In a MongoDB **replica set**, the **primary** and **secondary** nodes have distinct roles that together ensure **high availability and data redundancy**.

* The **primary node** is the only member that **receives write operations** (inserts, updates, deletes), and under the default read preference it also serves all reads. It maintains the authoritative copy of the data.

* **Secondary nodes** are **replicas of the primary**. They continuously **synchronize with the primary** by copying its oplog (operations log), keeping their datasets up to date. Secondary nodes can also serve read operations if configured to do so, helping distribute read workloads and improve performance.

If the primary node fails, the replica set automatically triggers an **election** among the secondary nodes to select a new primary, ensuring the database remains available without manual intervention.

14. What security mechanisms does MongoDB provide for data protection

ANS-MongoDB provides a range of **security mechanisms** to protect data, both at rest and in transit, ensuring that only authorized users can access or modify data. The main security features include:

1. **Authentication** – Verifies the identity of users before granting access. MongoDB supports multiple authentication mechanisms, such as SCRAM (username/password), LDAP, x.509 certificates, and Kerberos.

2. **Authorization & Role-Based Access Control (RBAC)** – Controls what authenticated users can do. Users are assigned roles that define their permissions, such as read-only, read-write, or administrative privileges on specific databases or collections.

3. **Encryption** – Protects data both **in transit** and **at rest**. TLS/SSL encrypts data traveling between clients and servers, while **encrypted storage engines** (like WiredTiger encryption) secure data on disk.

4. **Auditing** – Tracks and logs database activities, helping administrators monitor access and detect suspicious operations.

5. **Network Security** – Includes **IP whitelisting**, Virtual Private Cloud (VPC) peering, and firewall rules to control which clients or servers can connect to the database.

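6. **Field-Level Encryption** – With **Client-Side Field Level Encryption** (and the newer Queryable Encryption), supported by the official drivers, applications can encrypt sensitive fields (e.g., credit card numbers) before they leave the client, so the server stores only ciphertext and only authorized applications holding the keys can decrypt them. Automatic encryption requires MongoDB Atlas or MongoDB Enterprise.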

15. Explain the concept of embedded documents and when they should be used

ANS-In MongoDB, embedded documents are documents stored within other documents as nested objects, allowing related data to be kept together in a single document. For example, a user document might contain an embedded address document with fields like street, city, and zip. This approach is different from relational databases, where related data is typically stored in separate tables and joined using foreign keys.

Embedded documents are useful when the related data is tightly coupled and usually accessed together. They improve performance by reducing the need for multiple queries or joins, as all the relevant information is stored in a single document. However, they are not ideal for data that grows without bound or is accessed independently: very large documents hurt performance and memory usage, and a single document cannot exceed MongoDB’s 16 MB size limit. In those cases, storing the related data in a separate collection and referencing it by `_id` is usually the better design.
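Using the user-with-address example, the sketch below shows an embedded document as a Python dict and how fields inside it are addressed with MongoDB's dot notation (a filter such as `{"address.city": "Springfield"}` passed to `collection.find()`). The `get_path` and `matches` helpers are a hypothetical, toy in-memory matcher covering plain equality only (no operators, no arrays), and the address values are made up.

```python
# A user document with an embedded address (values are illustrative).
user = {
    "_id": 1,
    "name": "Ada",
    "address": {"street": "12 Main St", "city": "Springfield", "zip": "12345"},
}


def get_path(doc: dict, dotted: str):
    """Resolve a dot-notation path such as 'address.city' in a plain dict."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(doc: dict, flt: dict) -> bool:
    """Toy equality matcher for dot-notation filters."""
    return all(get_path(doc, path) == expected for path, expected in flt.items())


print(matches(user, {"address.city": "Springfield"}))  # True
print(matches(user, {"address.zip": "99999"}))          # False
```

Because the address travels with the user, one read returns both; if each user instead had an unbounded history of orders, those would normally live in their own collection with a `user_id` field pointing back to the user.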

16. What is the purpose of MongoDB’s $lookup stage in aggregation

ANS-In MongoDB, the $lookup stage in an aggregation pipeline is used to perform a left outer join between two collections. It allows you to combine documents from a source collection with matching documents from a foreign collection based on a specified field, effectively bringing related data together in a single query.

For example, if you have an orders collection and a customers collection, you can use $lookup to join each order with its corresponding customer information. The result of $lookup adds an array of matching documents from the foreign collection to each document in the source collection.

The $lookup stage is especially useful when related data is stored in separate collections (referenced rather than embedded): it gives MongoDB a relational-style join at query time, so developers can keep a flexible, document-based schema and still combine related data in a single query.
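For the orders/customers example, the `$lookup` stage below is written as it would appear in a PyMongo pipeline (`db.orders.aggregate([lookup_stage])`). To show the left-outer-join behavior without a server, `toy_lookup` is a hypothetical in-memory model of the equality form of `$lookup`: every local document is kept, and each gains an array of matching foreign documents, which is empty when nothing matches. The sample documents are made up.

```python
# For each order, find customers whose `_id` equals the order's `customer_id`.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # foreign collection (same database)
        "localField": "customer_id",  # field in the orders documents
        "foreignField": "_id",        # field in the customers documents
        "as": "customer",             # name of the output array field
    }
}


def toy_lookup(local_docs, foreign_docs, spec):
    """In-memory model of $lookup's left outer join semantics."""
    out = []
    for doc in local_docs:
        found = [f for f in foreign_docs
                 if f.get(spec["foreignField"]) == doc.get(spec["localField"])]
        out.append({**doc, spec["as"]: found})
    return out


orders = [{"_id": 100, "customer_id": 1}, {"_id": 101, "customer_id": 9}]
customers = [{"_id": 1, "name": "Ada"}]
for row in toy_lookup(orders, customers, lookup_stage["$lookup"]):
    print(row)
# {'_id': 100, 'customer_id': 1, 'customer': [{'_id': 1, 'name': 'Ada'}]}
# {'_id': 101, 'customer_id': 9, 'customer': []}
```

Order 101 survives with an empty `customer` array, which is exactly what distinguishes a left outer join from an inner join; a following `$unwind` stage is often used to flatten the array when exactly one match is expected.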

17. What are some common use cases for MongoDB

ANS-MongoDB is widely used for modern applications that require **flexible, scalable, and high-performance data storage**. Some common use cases include:

1. **Content Management Systems (CMS)** – MongoDB’s flexible schema allows storing articles, media, and metadata without rigid table structures, making it ideal for blogs, news sites, and digital publishing platforms.

2. **E-commerce Applications** – It handles product catalogs, user profiles, orders, and inventory efficiently, especially when product attributes vary widely and the application requires high read/write throughput.

3. **Real-Time Analytics** – MongoDB can store and analyze large volumes of semi-structured data from IoT devices, social media, or logs, supporting dashboards and real-time insights.

4. **Mobile and Web Applications** – Its document model aligns naturally with JSON-based APIs, simplifying development for apps that need rapid iteration and dynamic data structures.

5. **Gaming** – MongoDB is used to store player profiles, game state, leaderboards, and session data due to its high performance and scalability.

6. **Social Networks** – It supports unstructured, rapidly changing data such as posts, comments, likes, and connections, while handling high traffic loads.

7. **IoT Applications** – MongoDB efficiently stores sensor data, device metadata, and time-series information in a scalable and flexible way.

18. What are the advantages of using MongoDB for horizontal scaling

ANS-MongoDB offers several advantages for **horizontal scaling** due to its design as a distributed, NoSQL database. The primary mechanism for horizontal scaling in MongoDB is **sharding**, which allows data to be distributed across multiple servers (shards), enabling the database to handle large volumes of data and high traffic.

Key advantages include:

1. **Elastic Scalability** – You can add or remove shards dynamically without downtime, allowing the database to grow with application demands.

2. **High Performance** – By distributing data and queries across multiple servers, MongoDB reduces the load on individual machines, improving read and write throughput.

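3. **Automatic Data Distribution** – MongoDB uses a **shard key** to partition data into chunks, and a background balancer migrates chunks between shards to keep storage and workload evenly distributed, provided the shard key has high cardinality and spreads writes well.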

4. **Fault Tolerance** – Each shard can be a replica set, combining horizontal scaling with **high availability and redundancy**, so even if one node fails, the system remains operational.

5. **Supports Large Datasets** – Sharding enables MongoDB to store and manage data that exceeds the capacity of a single server, making it suitable for big data applications.

6. **Flexible Architecture** – Developers can scale out without major application changes, as MongoDB handles routing queries to the correct shard internally.

19. How do MongoDB transactions differ from SQL transactions

ANS-MongoDB transactions differ from SQL transactions mainly in terms of **scope, flexibility, and underlying architecture**, even though both follow **ACID principles**.

1. **Scope** – SQL transactions have long supported **multi-table, multi-row operations** since relational databases inherently enforce a structured schema. MongoDB, as a document-oriented NoSQL database, initially supported **single-document atomic operations**. Starting from version 4.0, MongoDB supports **multi-document transactions** across one or more collections, and in sharded clusters from version 4.2.

2. **Schema Flexibility** – SQL transactions operate on rigid schemas, requiring all tables and columns to be predefined. MongoDB transactions operate on **schema-less, JSON-like documents**, allowing greater flexibility in data structure while still maintaining atomicity.

3. **Implementation** – SQL databases typically use **locking mechanisms** and transaction logs at the table or row level. MongoDB transactions run with **snapshot isolation** on top of WiredTiger’s multi-version concurrency control: a transaction that hits a write conflict with another transaction is aborted and can be retried, and committed transactions are recorded in the **oplog (operations log)** for replication.

4. **Performance Considerations** – SQL transactions are generally mature and optimized for multi-row operations but can suffer from contention in highly distributed environments. MongoDB transactions add **some performance overhead** due to the need to track changes across multiple documents and shards, but they allow ACID guarantees in a NoSQL context.

5. **Use Cases** – SQL transactions are ideal for structured data with complex joins and strong consistency requirements (e.g., banking systems). MongoDB transactions are useful when **multi-document operations** are necessary but the application benefits from a flexible schema and horizontal scalability.

20. What are the main differences between capped collections and regular collections

ANS-In MongoDB, **capped collections** and **regular collections** differ mainly in terms of **storage behavior, performance, and use cases**:

1. **Storage Limits** – Capped collections have a **fixed size** defined at creation, and when they reach that size, they **overwrite the oldest documents** in a circular manner. Regular collections have **no size limit** and grow dynamically as new documents are added.

2. **Insertion Order and Performance** – Capped collections **preserve insertion order** and support **high-throughput inserts** and fast reads in insertion order. Regular collections do not guarantee any particular natural order, and their storage grows as documents are added.

3. **Deletion and Updates** – Documents in a capped collection are normally removed only when they are overwritten (older MongoDB versions did not allow deleting individual documents at all), and an update must not change a document’s size. Regular collections allow full deletion and unrestricted updates.

4. **Use Cases** – Capped collections are ideal for **logs, caches, or real-time feeds**, where only the most recent data matters. Regular collections are suitable for general-purpose storage where **all historical data** needs to be preserved and queried.
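A minimal sketch of creating a capped collection for the logging use case with PyMongo, assuming `db` is a `pymongo.database.Database` and `app_logs` is a hypothetical collection name. `size` (in bytes) is required for a capped collection and `max` (a document count) is an optional second limit; reaching either one causes the oldest documents to be overwritten. The `deque` part is only an analogy for the overwrite-oldest behavior, not how MongoDB stores data.

```python
from collections import deque
from typing import Any


def create_log_collection(db: Any, max_bytes: int = 1_048_576, max_docs: int = 5_000) -> Any:
    """Create a hypothetical capped `app_logs` collection (1 MiB / 5,000 docs)."""
    return db.create_collection("app_logs", capped=True, size=max_bytes, max=max_docs)


# Analogy only: a deque with maxlen drops its oldest entry once full,
# much like a capped collection limited by `max`.
recent = deque(maxlen=3)
for i in range(5):
    recent.append({"event": i})
print(list(recent))  # [{'event': 2}, {'event': 3}, {'event': 4}]
```

Reading the newest entries first from the real collection can be done with natural order, for example `db["app_logs"].find().sort("$natural", -1).limit(10)`.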

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline

ANS-In MongoDB’s aggregation pipeline, the $match stage is used to filter documents based on specified criteria, similar to the WHERE clause in SQL. It allows only the documents that meet the given conditions to pass to the next stage of the pipeline, reducing the amount of data processed downstream and improving performance.

For example, if you have a sales collection, you can use $match to select only the documents where region is "North" or amount is greater than 1000. Placing $match early in the pipeline is a best practice because it limits the dataset for subsequent stages like $group, $sort, or $project, making the aggregation more efficient.

22. How can you secure access to a MongoDB database

ANS-Securing access to a MongoDB database involves several mechanisms that ensure only authorized users can connect and perform actions on the database. The key methods include:

1. **Authentication** – Verifies the identity of users before granting access. MongoDB supports mechanisms such as **SCRAM (username/password)**, LDAP, Kerberos, and x.509 certificates.

2. **Authorization & Role-Based Access Control (RBAC)** – Controls what authenticated users can do. Users are assigned **roles** that define permissions, such as read-only, read-write, or administrative privileges, which can be scoped to specific databases or collections.

3. **Network Security** – Restricts which clients or IP addresses can connect to the database using **IP whitelisting, firewalls, and Virtual Private Cloud (VPC) peering**.

4. **Encryption** – Protects data **in transit** using TLS/SSL and **at rest** using encrypted storage engines. MongoDB Atlas also offers **client-side field-level encryption** for sensitive fields.

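5. **Auditing** – Records configurable database events, such as authentication attempts, user and role changes, and optionally CRUD operations, enabling monitoring, detection of suspicious activity, and compliance with regulatory standards. Auditing is available in MongoDB Enterprise and MongoDB Atlas.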
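Two of these mechanisms can be sketched with PyMongo: RBAC, by creating a read-only user scoped to one database through the `createUser` command, and authentication plus TLS, by building a connection string. `db` is assumed to be a `pymongo.database.Database` opened by a user allowed to create users; the user name, password, and host below are hypothetical. Because the user is created in `db`, that database must be given as `authSource` when connecting.

```python
from typing import Any
from urllib.parse import quote_plus


def create_report_user(db: Any, username: str, password: str) -> Any:
    """Create a user whose only role is `read` on this one database (RBAC)."""
    return db.command(
        "createUser",
        username,
        pwd=password,
        roles=[{"role": "read", "db": db.name}],
    )


def secure_uri(host: str, username: str, password: str, auth_db: str) -> str:
    """Build a connection string that authenticates and requires TLS."""
    # quote_plus escapes characters such as '@' or ':' inside credentials.
    return (
        f"mongodb://{quote_plus(username)}:{quote_plus(password)}@{host}/"
        f"?authSource={auth_db}&tls=true"
    )


print(secure_uri("db.example.internal:27017", "report", "p@ss:word", "superstore_db"))
# mongodb://report:p%40ss%3Aword@db.example.internal:27017/?authSource=superstore_db&tls=true
```

In practice the password would come from a secret store or environment variable rather than source code, and the server must have authorization and TLS enabled for these settings to have any effect.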

23. What is MongoDB’s WiredTiger storage engine, and why is it important

ANS-MongoDB’s **WiredTiger** is the **default storage engine** used for storing data and managing how it is read and written on disk. It is important because it provides **high performance, concurrency, and data compression**, making MongoDB efficient and scalable for modern applications.

Key features of WiredTiger include:

1. **Document-Level Concurrency** – Unlike earlier engines that used collection- or database-level locks, WiredTiger allows multiple operations to occur simultaneously on different documents, improving throughput and reducing contention.

2. **Compression** – WiredTiger supports data compression for both collections and indexes, which reduces disk usage and improves I/O efficiency.

3. **Durability** – It uses **write-ahead logging** to ensure that all changes are durable, supporting MongoDB’s ACID guarantees in transactions.

4. **Memory Management** – WiredTiger efficiently uses RAM for caching frequently accessed data, balancing memory usage and performance.
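WiredTiger's compression can also be chosen per collection. The sketch below assumes `db` is a `pymongo.database.Database` and passes a `storageEngine` option through `create_collection()` to select a block compressor different from the server's default (snappy); values such as `"zlib"`, `"zstd"`, or `"none"` are accepted depending on the server version. The function name and any collection name used with it are hypothetical.

```python
from typing import Any


def create_compressed_collection(db: Any, name: str, compressor: str = "zstd") -> Any:
    """Create a collection whose WiredTiger block compressor is `compressor`."""
    return db.create_collection(
        name,
        storageEngine={"wiredTiger": {"configString": f"block_compressor={compressor}"}},
    )
```

Stronger compressors such as zstd or zlib usually save more disk space at the cost of extra CPU on reads and writes, so the choice is a trade-off best measured on the actual workload.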

# Practical Questions

1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB


ANS-

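import pandas as pd
from pymongo import MongoClient

# Load the CSV; the path is a placeholder for wherever the file was saved.
# Some copies of the Superstore file are not UTF-8 encoded; if read_csv raises
# a UnicodeDecodeError, try passing encoding='latin-1'.
df = pd.read_csv('/path/to/your/Downloads/Superstore.csv')

# Convert each row into a dict so it can be stored as a MongoDB document.
data_dict = df.to_dict(orient='records')

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

# insert_many returns the _ids of the new documents; counting those (rather than
# the whole collection) avoids including rows left over from an earlier run.
result = collection.insert_many(data_dict)

print(f"{len(result.inserted_ids)} documents inserted into the collection.")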


2. Retrieve and print all documents from the Orders collection

ANS-

from pymongo import MongoClient
import pprint

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

documents = collection.find()

for doc in documents:
    pprint.pprint(doc)


3. Count and display the total number of documents in the Orders collection

ANS-

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

total_documents = collection.count_documents({})
print(total_documents)


4. Write a query to fetch all orders from the "West" region

ANS-

from pymongo import MongoClient
import pprint

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

west_orders = collection.find({"Region": "West"})

for order in west_orders:
    pprint.pprint(order)


5. Write a query to find orders where Sales is greater than 500

ANS-

from pymongo import MongoClient
import pprint

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

high_sales_orders = collection.find({"Sales": {"$gt": 500}})

for order in high_sales_orders:
    pprint.pprint(order)


6. Fetch the top 3 orders with the highest Profit

ANS-

from pymongo import MongoClient
import pprint

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

top_profit_orders = collection.find().sort("Profit", -1).limit(3)

for order in top_profit_orders:
    pprint.pprint(order)


7. Update all orders with Ship Mode as "First Class" to "Premium Class".

ANS-

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

# Rename the shipping mode on every matching document.
result = collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)

print(f"{result.modified_count} orders updated.")


8. Delete all orders where Sales is less than 50

ANS-

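from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

# Remove every order whose Sales value is below 50 and report how many went.
result = collection.delete_many({"Sales": {"$lt": 50}})
print(f"{result.deleted_count} orders deleted.")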


9. Use aggregation to group orders by Region and calculate total sales per region

ANS-

from pymongo import MongoClient
import pprint

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

pipeline = [
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
]

results = collection.aggregate(pipeline)

for result in results:
    pprint.pprint(result)


10. Fetch all distinct values for Ship Mode from the collection

ANS-

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

distinct_ship_modes = collection.distinct("Ship Mode")
print(distinct_ship_modes)


11. Count the number of orders for each category

ANS-

from pymongo import MongoClient
import pprint

client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
collection = db['orders']

pipeline = [
    {"$group": {"_id": "$Category", "order_count": {"$sum": 1}}}
]

results = collection.aggregate(pipeline)

for result in results:
    pprint.pprint(result)
