# ***MongoDB***

1. What are the key differences between SQL and NoSQL databases?
-> SQL databases are relational, enforce a predefined schema, and are queried with Structured Query Language, which makes them a good fit for complex joins. NoSQL databases are non-relational, use flexible schemas, and handle unstructured or semi-structured data; they are typically designed to scale horizontally across large datasets.
2. What makes MongoDB a good choice for modern applications?
-> MongoDB's flexible schema, horizontal scalability, JSON-like document model, and rich query language suit rapidly evolving, data-intensive applications that need to change their data model quickly.
3. Explain the concept of collections in MongoDB.
-> Collections in MongoDB are analogous to tables in relational databases. A collection is a group of BSON documents (records) and does not enforce a rigid schema by default, so documents within the same collection can have different fields.
4. How does MongoDB ensure high availability using replication?
-> MongoDB uses replica sets: groups of mongod instances that maintain the same data set. One node is the primary (handles writes), and the others are secondaries (replicate the primary's data). If the primary fails, an election promotes a secondary to primary, so the deployment keeps operating.
5. What are the main benefits of MongoDB Atlas?
-> MongoDB Atlas is a fully managed, cloud-based MongoDB service with automated scaling, backups, monitoring, and security features, which simplifies database deployment and management.
6. What is the role of indexes in MongoDB, and how do they improve performance?
-> Indexes in MongoDB are special data structures that store a small portion of the collection's data in an easy-to-traverse form. They improve query performance by letting MongoDB locate matching documents without scanning the entire collection.
7. Describe the stages of the MongoDB aggregation pipeline.
-> The MongoDB aggregation pipeline passes documents through a series of stages (e.g., $match, $group, $project, $sort, $limit). Each stage transforms the documents it receives and hands its output to the next stage, and the final stage produces the aggregated results.
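
As a rough illustration of how stages chain together, the sketch below builds a pipeline as a plain Python list of stage documents, which is the form PyMongo's `aggregate()` accepts. The field names (`Region`, `Category`, `Sales`) are assumed from the Superstore dataset used in the practical part of this notebook, and the helper name `build_category_sales_pipeline` is hypothetical.

```python
def build_category_sales_pipeline(region, top_n=3):
    """Return aggregation stages: filter, group, sort, limit, then reshape."""
    return [
        # $match: keep only orders from the requested region.
        {"$match": {"Region": region}},
        # $group: one output document per category, with sales summed.
        {"$group": {"_id": "$Category", "totalSales": {"$sum": "$Sales"}}},
        # $sort: highest total first.
        {"$sort": {"totalSales": -1}},
        # $limit: keep only the first top_n groups.
        {"$limit": top_n},
        # $project: expose the group key as "category" and hide _id.
        {"$project": {"_id": 0, "category": "$_id", "totalSales": 1}},
    ]


pipeline = build_category_sales_pipeline("West")
print([next(iter(stage)) for stage in pipeline])
# ['$match', '$group', '$sort', '$limit', '$project']
```

With a live connection, the list would be passed as `collection.aggregate(pipeline)`. Placing `$match` first is a common choice because later stages then process fewer documents.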

8. What is sharding in MongoDB? How does it differ from replication?
-> Sharding is MongoDB's method of horizontal scaling: it distributes data across multiple machines (shards). Replication maintains copies of the same data for high availability and redundancy, whereas sharding partitions the data to scale write capacity and accommodate larger datasets.
9. What is PyMongo, and why is it used?
-> PyMongo is the official MongoDB driver for Python. It is used to interact with MongoDB databases from Python applications, allowing developers to perform CRUD operations, create indexes, and execute aggregation pipelines.
10. What are the ACID properties in the context of MongoDB transactions?
-> ACID stands for Atomicity, Consistency, Isolation, and Durability. MongoDB supports multi-document ACID transactions on replica sets (since version 4.0) and on sharded clusters (since version 4.2), preserving data integrity for operations that touch multiple documents.
11. What is the purpose of MongoDB’s explain() function?
-> The explain() function reports how MongoDB executes a query, including the chosen plan and index usage and, in its more verbose modes, the number of documents examined and the execution time. This information helps optimize query performance.
12. How does MongoDB handle schema validation?
-> MongoDB supports schema validation at the collection level, allowing you to enforce rules on document structure, data types, and required fields using JSON Schema (the $jsonSchema operator), which helps keep data consistent.
13. What is the difference between a primary and a secondary node in a replica set?
-> The primary node in a replica set receives all write operations, and the secondaries replicate them. Secondary nodes maintain a copy of the primary's data and can serve read queries (if configured) but cannot accept writes directly.
14. What security mechanisms does MongoDB provide for data protection?
-> MongoDB offers authentication (SCRAM-SHA-1 and SCRAM-SHA-256, x.509, LDAP, Kerberos), authorization (role-based access control), encryption (at rest and in transit via TLS/SSL), auditing, and network restrictions such as IP access lists (IP whitelisting).
15. Explain the concept of embedded documents and when they should be used.
-> Embedded documents are sub-documents nested within a parent document. They are used when data has a "contains" or one-to-one/one-to-few relationship, which avoids joins and improves read performance for related data that is frequently accessed together.
16. What is the purpose of MongoDB’s $lookup stage in aggregation?
-> The $lookup stage performs a left outer join to another collection in the same database, combining documents from the two collections based on specified fields. Before MongoDB 5.1, the joined collection had to be unsharded.
17. What are some common use cases for MongoDB?
-> Common use cases include content management systems, e-commerce, real-time analytics, mobile applications, IoT, gaming, and personalized user experiences, all of which benefit from its flexibility and scalability.
18. What are the advantages of using MongoDB for horizontal scaling?
-> MongoDB's sharding mechanism scales horizontally by distributing data and load across multiple servers, so applications can handle very large datasets and high traffic. Deploying each shard as a replica set avoids a single point of failure.
19. How do MongoDB transactions differ from SQL transactions?
-> MongoDB transactions are multi-document: they can span multiple operations and collections on a replica set (and, since version 4.2, on a sharded cluster) while preserving ACID properties. Traditional SQL transactions typically operate on rows within a relational schema.
20. What are the main differences between capped collections and regular collections?
-> Capped collections are fixed-size collections that maintain insertion order and automatically remove the oldest documents when the maximum size is reached. Regular collections have no fixed size limit, never remove documents automatically, and do not guarantee that queries return documents in insertion order.
21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
-> The $match stage filters documents, passing only those that match the specified query conditions to the next stage in the aggregation pipeline, similar to a WHERE clause in SQL.
22. How can you secure access to a MongoDB database?
-> Secure access by enabling authentication, creating users with specific roles (least privilege), enabling network encryption (TLS/SSL), configuring firewalls or IP access lists, and regularly auditing access logs.
23. What is MongoDB’s WiredTiger storage engine, and why is it important?
-> WiredTiger is MongoDB's default storage engine. It matters for its performance benefits, including document-level concurrency control (optimistic concurrency), compression, and journaling, which improve efficiency and data integrity.
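
The schema validation described in question 12 can be sketched as follows. This is a minimal example under stated assumptions: the field names come from the Superstore data used below, the rules themselves (required fields, allowed regions, non-negative sales) are illustrative rather than taken from the dataset's documentation, and the helper names are hypothetical. `db` is assumed to be a PyMongo `Database`, whose `create_collection()` accepts a `validator` option.

```python
def build_orders_validator():
    """Return a $jsonSchema validator for a hypothetical orders collection."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            # Documents missing any of these fields are rejected.
            "required": ["Order ID", "Region", "Sales"],
            "properties": {
                "Order ID": {"bsonType": "string"},
                # Assumed set of regions; adjust to the real data.
                "Region": {"enum": ["West", "East", "Central", "South"]},
                "Sales": {"bsonType": ["double", "int", "long"], "minimum": 0},
            },
        }
    }


def create_validated_collection(db, name):
    """Create collection `name` on `db` with the validator attached."""
    return db.create_collection(name, validator=build_orders_validator())
```

Once such a collection exists, inserts that violate the schema are rejected by the server under the default validation settings.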

1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

In [33]:
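!pip install pymongo
import os

import pandas as pd
from pymongo import MongoClient

# Read the connection string from an environment variable (assumed here to be
# named MONGO_URI) instead of hard-coding a username and password in the notebook.
# Expected form: "mongodb+srv://<username>:<password>@<cluster-host>/"
MONGO_URI = os.environ["MONGO_URI"]
DATABASE_NAME = "superstore_db"
COLLECTION_NAME = "orders"
CSV_FILE_PATH = "superstore.csv"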

def load_csv_to_mongodb(csv_file, mongo_uri, db_name, collection_name):
    """Read a CSV file with pandas and insert every row as one document."""
    print(f"Attempting to load data from '{csv_file}' into MongoDB...")
    client = None  # Defined up front so the finally block can always check it.
    try:
        # latin1 can decode any byte sequence, which avoids UnicodeDecodeError
        # on files that are not valid UTF-8.
        df = pd.read_csv(csv_file, encoding='latin1')
        print(f"Successfully read {len(df)} rows from '{csv_file}'.")
        # One dict per row, keyed by column name.
        data_to_insert = df.to_dict(orient='records')
        print(f"Converted DataFrame to {len(data_to_insert)} documents for insertion.")
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")

        # insert_many rejects an empty list, so guard against an empty CSV.
        if data_to_insert:
            result = collection.insert_many(data_to_insert)
            print(f"Successfully inserted {len(result.inserted_ids)} documents into MongoDB.")
        else:
            print("No data to insert.")

    except FileNotFoundError:
        print(f"Error: The CSV file '{csv_file}' was not found. Please check the path.")
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if client is not None:
            client.close()
            print("MongoDB connection closed.")

if __name__ == "__main__":
    load_csv_to_mongodb(CSV_FILE_PATH, MONGO_URI, DATABASE_NAME, COLLECTION_NAME)




Attempting to load data from 'superstore.csv' into MongoDB...
Successfully read 9994 rows from 'superstore.csv'.
Converted DataFrame to 9994 documents for insertion.
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Successfully inserted 9994 documents into MongoDB.
MongoDB connection closed.
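
For files too large to hold in memory as a DataFrame, one alternative is to stream rows and insert them in batches. The sketch below swaps pandas for the standard library's `csv.DictReader`, which is a different technique from the cell above: every value arrives as a string, so numeric columns would need converting separately. The helper names are hypothetical, and `collection` is assumed to expose `insert_many()` as a PyMongo `Collection` does.

```python
import csv
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_csv_in_batches(file_obj, collection, batch_size=1000):
    """Insert CSV rows into `collection` batch by batch; return the row count."""
    total = 0
    # DictReader uses the header row as keys, giving one dict per data row.
    for batch in iter_batches(csv.DictReader(file_obj), batch_size):
        collection.insert_many(batch)
        total += len(batch)
    return total
```

A call might look like `load_csv_in_batches(open("superstore.csv", encoding="latin1", newline=""), collection)`, with the file closed afterwards.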


2. Retrieve and print all documents from the Orders collection.

In [32]:
def retrieve_all_documents(mongo_uri, db_name, collection_name):
    print(f"\nAttempting to retrieve all documents from '{collection_name}' collection...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        documents = collection.find({})
        count = 0
        for doc in documents:
            print(doc)
            count += 1
        print(f"Successfully retrieved and printed {count} documents.")

    except Exception as e:
        print(f"An error occurred during document retrieval: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after retrieval.")


if __name__ == "__main__":
    retrieve_all_documents(MONGO_URI, DATABASE_NAME, COLLECTION_NAME)


Streaming output truncated to the last 5000 lines.
{'_id': ObjectId('687ff256c10ebde3de2d039c'), 'Row ID': 265, 'Order ID': 'CA-2016-125318', 'Order Date': '6/6/2016', 'Ship Date': '6/13/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'RC-19825', 'Customer Name': 'Roy Collins', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Chicago', 'State': 'Illinois', 'Postal Code': 60610, 'Region': 'Central', 'Product ID': 'TEC-PH-10001433', 'Category': 'Technology', 'Sub-Category': 'Phones', 'Product Name': 'Cisco Small Business SPA 502G VoIP phone', 'Sales': 328.224, 'Quantity': 4, 'Discount': 0.2, 'Profit': 28.7196}
{'_id': ObjectId('687ff256c10ebde3de2d039d'), 'Row ID': 266, 'Order ID': 'CA-2015-155040', 'Order Date': '11/10/2015', 'Ship Date': '11/15/2015', 'Ship Mode': 'Standard Class', 'Customer ID': 'AH-10210', 'Customer Name': 'Alan Hwang', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Brentwood', 'State': 'California', 'Postal Code': 94513, 'R
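
Printing every document produced enough output to be truncated. When only a quick look at the data is needed, a projection and a limit keep the output manageable. The sketch below is one way to do that; the helper name and default field list are hypothetical, and `collection` is assumed to be a PyMongo `Collection`.

```python
def preview_orders(collection, limit=5, fields=("Order ID", "Region", "Sales")):
    """Return up to `limit` documents containing only `fields` (no _id)."""
    # 1 includes a field; _id is included by default and must be excluded explicitly.
    projection = {name: 1 for name in fields}
    projection["_id"] = 0
    cursor = collection.find({}, projection).limit(limit)
    return list(cursor)
```

The limit is applied by the server, so only the requested number of documents is transferred.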

3. Count and display the total number of documents in the Orders collection.

In [31]:
def count_all_documents(mongo_uri, db_name, collection_name):
    print(f"\nAttempting to count documents in '{collection_name}' collection...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        # An empty filter counts every document in the collection.
        count = collection.count_documents({})
        print(f"Total number of documents in '{collection_name}': {count}")

    except Exception as e:
        print(f"An error occurred while counting documents: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after retrieval.")


if __name__ == "__main__":
    count_all_documents(MONGO_URI, DATABASE_NAME, COLLECTION_NAME)



Attempting to count documents in 'orders' collection...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Total number of documents in 'orders': 5145
MongoDB connection closed after retrieval.


4. Write a query to fetch all orders from the "West" region.

In [30]:
def retrieve_orders_by_region(mongo_uri, db_name, collection_name, region):
    print(f"\nAttempting to retrieve documents from '{collection_name}' collection for the '{region}' region...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        # Equality filter: match documents whose Region field equals the given value.
        documents = collection.find({"Region": region})
        count = 0
        for doc in documents:
            print(doc)
            count += 1
        print(f"Successfully retrieved and printed {count} documents from the '{region}' region.")

    except Exception as e:
        print(f"An error occurred during document retrieval: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after retrieval.")


if __name__ == "__main__":
    retrieve_orders_by_region(MONGO_URI, DATABASE_NAME, COLLECTION_NAME, "West")



Attempting to retrieve documents from 'orders' collection for the 'West' region...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
{'_id': ObjectId('687ff256c10ebde3de2d029b'), 'Row ID': 8, 'Order ID': 'CA-2014-115812', 'Order Date': '6/9/2014', 'Ship Date': '6/14/2014', 'Ship Mode': 'Standard Class', 'Customer ID': 'BH-11710', 'Customer Name': 'Brosina Hoffman', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code': 90032, 'Region': 'West', 'Product ID': 'TEC-PH-10002275', 'Category': 'Technology', 'Sub-Category': 'Phones', 'Product Name': 'Mitel 5320 IP Phone VoIP phone', 'Sales': 907.152, 'Quantity': 6, 'Discount': 0.2, 'Profit': 90.7152}
{'_id': ObjectId('687ff256c10ebde3de2d029d'), 'Row ID': 10, 'Order ID': 'CA-2014-115812', 'Order Date': '6/9/2014', 'Ship Date': '6/14/2014', 'Ship Mode': 'Standard Class', 'Customer ID': 'BH-11710', 'Customer Name': 'Brosina Hoffman', 'Segment': 'Consumer', 'Country': '

5. Write a query to find orders where Sales is greater than 500.

In [29]:
def retrieve_orders_with_sales_above(mongo_uri, db_name, collection_name, min_sales):
    print(f"\nAttempting to retrieve documents from '{collection_name}' collection where Sales > {min_sales}...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        # $gt is a strict comparison: Sales equal to min_sales is not matched.
        documents = collection.find({"Sales": {"$gt": min_sales}})
        count = 0
        for doc in documents:
            print(doc)
            count += 1
        print(f"Successfully retrieved and printed {count} documents where Sales > {min_sales}.")

    except Exception as e:
        print(f"An error occurred during document retrieval: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after retrieval.")


if __name__ == "__main__":
    retrieve_orders_with_sales_above(MONGO_URI, DATABASE_NAME, COLLECTION_NAME, 500)



Attempting to retrieve documents from 'orders' collection where Sales > 500...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
{'_id': ObjectId('687ff256c10ebde3de2d0295'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub-Category': 'Chairs', 'Product Name': 'Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back', 'Sales': 731.94, 'Quantity': 3, 'Discount': 0.0, 'Profit': 219.582}
{'_id': ObjectId('687ff256c10ebde3de2d0297'), 'Row ID': 4, 'Order ID': 'US-2015-108966', 'Order Date': '10/11/2015', 'Ship Date': '10/18/2015', 'Ship Mode': 'Standard Class', 'Customer ID': 'SO-20335', 'Customer Name': "Sean O'Donnell", 'Segment': 'Consum
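
Comparison operators such as `$gt` can be combined on the same field to express a range. The helper below is a small sketch of building such a filter as a plain dictionary; its name and the choice of an exclusive lower bound with an inclusive upper bound are assumptions made for illustration.

```python
def sales_range_filter(min_sales=None, max_sales=None):
    """Build a filter for min_sales < Sales <= max_sales; either bound is optional."""
    bounds = {}
    if min_sales is not None:
        bounds["$gt"] = min_sales   # strictly greater than
    if max_sales is not None:
        bounds["$lte"] = max_sales  # less than or equal to
    # With no bounds, an empty filter matches every document.
    return {"Sales": bounds} if bounds else {}


print(sales_range_filter(500))        # {'Sales': {'$gt': 500}}
print(sales_range_filter(500, 1000))  # {'Sales': {'$gt': 500, '$lte': 1000}}
```

The resulting dictionary can be passed directly to `collection.find()`.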

6. Fetch the top 3 orders with the highest Profit.

In [28]:
def retrieve_top_profit_orders(mongo_uri, db_name, collection_name, limit):
    print(f"\nAttempting to retrieve the top {limit} documents from '{collection_name}' collection with the highest Profit...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        # Sort by Profit in descending order (-1), then keep the first `limit` results.
        documents = collection.find({}).sort("Profit", -1).limit(limit)
        count = 0
        for doc in documents:
            print(doc)
            count += 1
        print(f"Successfully retrieved and printed {count} documents with the highest Profit.")

    except Exception as e:
        print(f"An error occurred during document retrieval: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after retrieval.")


if __name__ == "__main__":
    retrieve_top_profit_orders(MONGO_URI, DATABASE_NAME, COLLECTION_NAME, 3)



Attempting to retrieve the top 3 documents from 'orders' collection with the highest Profit...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
{'_id': ObjectId('687ff256c10ebde3de2d1d3e'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category': 'Technology', 'Sub-Category': 'Copiers', 'Product Name': 'Canon imageCLASS 2200 Advanced Copier', 'Sales': 17499.95, 'Quantity': 5, 'Discount': 0.0, 'Profit': 8399.976}
{'_id': ObjectId('687ff256c10ebde3de2d226d'), 'Row ID': 8154, 'Order ID': 'CA-2017-140151', 'Order Date': '3/23/2017', 'Ship Date': '3/25/2017', 'Ship Mode': 'Premium Class', 'Customer ID': 'RB-19360', 'Customer Name': 'Raymond Buch', 'Segment': 'Co

7. Update all orders with Ship Mode as "First Class" to "Premium Class".

In [27]:
def update_ship_mode(mongo_uri, db_name, collection_name, old_mode, new_mode):
    print(f"\nAttempting to update documents in '{collection_name}' collection from '{old_mode}' to '{new_mode}'...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        query = {"Ship Mode": old_mode}
        new_values = {"$set": {"Ship Mode": new_mode}}
        result = collection.update_many(query, new_values)
        print(f"Successfully updated {result.modified_count} documents from '{old_mode}' to '{new_mode}'.")

    except Exception as e:
        print(f"An error occurred during document update: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after update.")


if __name__ == "__main__":
    update_ship_mode(MONGO_URI, DATABASE_NAME, COLLECTION_NAME, "First Class", "Premium Class")


Attempting to update documents in 'orders' collection from 'First Class' to 'Premium Class'...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Successfully updated 0 documents from 'First Class' to 'Premium Class'.
MongoDB connection closed after update.
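
The run above reports 0 modified documents. The output alone cannot say why, but one plausible reading is that the update had already been applied in an earlier run, since the distinct Ship Mode values listed later in this notebook include "Premium Class" and not "First Class". PyMongo's `UpdateResult` exposes both `matched_count` and `modified_count`, and reporting both helps tell "nothing matched" apart from "matched but unchanged". The helper below is a hypothetical sketch of that idea.

```python
def describe_update(result):
    """Summarise an update result using its matched and modified counts."""
    if result.matched_count == 0:
        return "No documents matched the filter."
    if result.modified_count == 0:
        return f"{result.matched_count} matched, but all already had the new value."
    return f"{result.matched_count} matched, {result.modified_count} modified."
```

It would be called as `print(describe_update(collection.update_many(query, new_values)))`.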


8. Delete all orders where Sales is less than 50.

In [26]:
def delete_orders_by_sales(mongo_uri, db_name, collection_name, max_sales):
    print(f"\nAttempting to delete documents from '{collection_name}' collection where Sales < {max_sales}...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        query = {"Sales": {"$lt": max_sales}}
        result = collection.delete_many(query)
        print(f"Successfully deleted {result.deleted_count} documents where Sales < {max_sales}.")

    except Exception as e:
        print(f"An error occurred during document deletion: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after deletion.")


if __name__ == "__main__":
    delete_orders_by_sales(MONGO_URI, DATABASE_NAME, COLLECTION_NAME, 50)



Attempting to delete documents from 'orders' collection where Sales < 50...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Successfully deleted 0 documents where Sales < 50.
MongoDB connection closed after deletion.
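
Because `delete_many` is irreversible, it can be useful to count the matching documents before removing them. The sketch below adds a dry-run switch around the same calls used in the cell above (`count_documents` and `delete_many`); the helper name and the `dry_run` parameter are hypothetical.

```python
def delete_with_preview(collection, query, dry_run=True):
    """Count matches for `query`; delete them only when dry_run is False.

    Returns a (matched, deleted) tuple.
    """
    matched = collection.count_documents(query)
    if dry_run or matched == 0:
        # Nothing is removed in dry-run mode or when no document matches.
        return matched, 0
    result = collection.delete_many(query)
    return matched, result.deleted_count
```

Calling it first with the default `dry_run=True` shows how many orders a filter such as `{"Sales": {"$lt": 50}}` would remove.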


9. Use aggregation to group orders by Region and calculate total sales per region.

In [25]:
def aggregate_sales_by_region(mongo_uri, db_name, collection_name):
    print(f"\nAttempting to group orders by Region and calculate total sales in '{collection_name}' collection...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        pipeline = [
            {"$group": {"_id": "$Region", "totalSales": {"$sum": "$Sales"}}},
            {"$sort": {"totalSales": -1}}
        ]
        results = collection.aggregate(pipeline)

        print("Total Sales by Region:")
        for doc in results:
            print(f"  Region: {doc['_id']}, Total Sales: {doc['totalSales']:.2f}")

    except Exception as e:
        print(f"An error occurred during aggregation: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after aggregation.")


if __name__ == "__main__":
    aggregate_sales_by_region(MONGO_URI, DATABASE_NAME, COLLECTION_NAME)



Attempting to group orders by Region and calculate total sales in 'orders' collection...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Total Sales by Region:
  Region: West, Total Sales: 694686.62
  Region: East, Total Sales: 651137.70
  Region: Central, Total Sales: 479611.85
  Region: South, Total Sales: 376023.31
MongoDB connection closed after aggregation.
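
To make the semantics of the `$group` and `$sort` stages concrete, the sketch below reproduces the same computation in plain Python over a few made-up documents. It illustrates what the pipeline computes, not how MongoDB executes it; the function name and sample values are hypothetical.

```python
from collections import defaultdict


def total_sales_by_region(orders):
    """Emulate {"$group": {"_id": "$Region", "totalSales": {"$sum": "$Sales"}}}."""
    totals = defaultdict(float)
    for order in orders:
        totals[order["Region"]] += order["Sales"]
    # Emulate {"$sort": {"totalSales": -1}}: largest total first.
    return sorted(
        ({"_id": region, "totalSales": total} for region, total in totals.items()),
        key=lambda doc: doc["totalSales"],
        reverse=True,
    )


sample = [
    {"Region": "West", "Sales": 100.0},
    {"Region": "East", "Sales": 40.0},
    {"Region": "West", "Sales": 25.5},
]
print(total_sales_by_region(sample))
# [{'_id': 'West', 'totalSales': 125.5}, {'_id': 'East', 'totalSales': 40.0}]
```

In the real pipeline the grouping runs on the server, so only one small document per region is sent back to the client.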


10. Fetch all distinct values for Ship Mode from the collection.

In [24]:
def get_distinct_ship_modes(mongo_uri, db_name, collection_name):
    print(f"\nAttempting to fetch distinct 'Ship Mode' values from '{collection_name}' collection...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        distinct_modes = collection.distinct("Ship Mode")

        print("Distinct Ship Modes:")
        for mode in distinct_modes:
            print(f"  - {mode}")

    except Exception as e:
        print(f"An error occurred while fetching distinct Ship Modes: {e}")
    finally:
        if client:
            client.close()
            print("MongoDB connection closed after fetching distinct Ship Modes.")


if __name__ == "__main__":
    get_distinct_ship_modes(MONGO_URI, DATABASE_NAME, COLLECTION_NAME)



Attempting to fetch distinct 'Ship Mode' values from 'orders' collection...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Distinct Ship Modes:
  - Premium Class
  - Same Day
  - Second Class
  - Standard Class
MongoDB connection closed after fetching distinct Ship Modes.
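
For intuition, `distinct` collapses a field down to its unique values. The following is a minimal sketch in plain Python that needs no database; `sample_orders` and `distinct_values` are hypothetical names introduced only for illustration, and the sorting is an assumption made here to keep the output stable rather than a documented guarantee of MongoDB.

```python
# Hypothetical sample documents standing in for the 'orders' collection.
sample_orders = [
    {"Order ID": "A-1", "Ship Mode": "Second Class"},
    {"Order ID": "A-2", "Ship Mode": "Standard Class"},
    {"Order ID": "A-3", "Ship Mode": "Second Class"},
    {"Order ID": "A-4", "Ship Mode": "Same Day"},
    {"Order ID": "A-5"},  # document without the field
]


def distinct_values(documents, field):
    """Return the unique values of `field`, skipping documents that lack it."""
    # A set keeps each value once; sorted() only makes the order predictable.
    return sorted({doc[field] for doc in documents if field in doc})


modes = distinct_values(sample_orders, "Ship Mode")
print("Distinct Ship Modes:")
for mode in modes:
    print(f"  - {mode}")
```

Each value appears once no matter how many documents share it, which is the same idea `collection.distinct("Ship Mode")` applies on the server side.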


11.  Count the number of orders for each category.

In [23]:
def count_orders_by_category(mongo_uri, db_name, collection_name):
    """Print the number of orders in each Category, largest count first."""
    print(f"\nAttempting to count orders by Category in '{collection_name}' collection...")
    client = None
    try:
        client = MongoClient(mongo_uri)
        db = client[db_name]
        collection = db[collection_name]
        print(f"Connected to MongoDB: Database '{db_name}', Collection '{collection_name}'.")
        pipeline = [
            # Group by Category and add 1 per document to get the order count.
            {"$group": {"_id": "$Category", "count": {"$sum": 1}}},
            # Sort the groups by count in descending order.
            {"$sort": {"count": -1}}
        ]
        results = collection.aggregate(pipeline)

        print("Order Count by Category:")
        for doc in results:
            print(f"  Category: {doc['_id']}, Count: {doc['count']}")

    except Exception as e:
        print(f"An error occurred during aggregation: {e}")
    finally:
        # Always release the connection, even if the aggregation failed.
        if client:
            client.close()
            print("MongoDB connection closed after aggregation.")


if __name__ == "__main__":
    count_orders_by_category(MONGO_URI, DATABASE_NAME, COLLECTION_NAME)



Attempting to count orders by Category in 'orders' collection...
Connected to MongoDB: Database 'superstore_db', Collection 'orders'.
Order Count by Category:
  Category: Office Supplies, Count: 2076
  Category: Furniture, Count: 1573
  Category: Technology, Count: 1496
MongoDB connection closed after aggregation.
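
The `$group` stage with `{"$sum": 1}` followed by a descending `$sort` can be pictured as a frequency count. The following is a minimal sketch in plain Python that needs no database; `sample_orders` and `count_by_field` are hypothetical names used only for illustration, and the sample data is made up rather than taken from the `orders` collection.

```python
from collections import Counter

# Hypothetical sample documents standing in for the 'orders' collection.
sample_orders = [
    {"Category": "Furniture"},
    {"Category": "Office Supplies"},
    {"Category": "Technology"},
    {"Category": "Office Supplies"},
    {"Category": "Furniture"},
    {"Category": "Office Supplies"},
]


def count_by_field(documents, field):
    """Count documents per value of `field`, highest count first."""
    # doc.get() yields None for a missing field, much as $group places
    # documents without the field under a null _id.
    counts = Counter(doc.get(field) for doc in documents)
    # most_common() orders by count descending, like {"$sort": {"count": -1}}.
    return counts.most_common()


category_counts = count_by_field(sample_orders, "Category")
print("Order Count by Category:")
for category, count in category_counts:
    print(f"  Category: {category}, Count: {count}")
```

MongoDB also offers a `$sortByCount` stage, which is shorthand for this `$group` and `$sort` pair; a pipeline of `[{"$sortByCount": "$Category"}]` should produce the same `_id` and `count` fields.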
