                                   Theoretical Questions and Answers

#1.What are the key differences between SQL and NoSQL databases?
ans:1. Data Model

SQL: Relational model → data stored in tables with rows and columns.
Examples: MySQL, PostgreSQL, Oracle.

NoSQL: Non-relational → data stored as documents, key-value pairs, wide-column stores, or graphs.
Examples: MongoDB (document), Redis (key-value), Cassandra (wide-column), Neo4j (graph).

2. Schema

SQL: Fixed schema → structure must be defined before inserting data. Changes require migrations.

NoSQL: Flexible schema → fields can vary between records; easy to add new fields.

3. Scalability

SQL: Traditionally vertically scalable → scale by increasing CPU, RAM, or storage of a single server. Read replicas and sharding are possible but usually add operational complexity.

NoSQL: Horizontally scalable → scale by adding more servers or nodes; good for large distributed datasets.

4. Query Language

SQL: Uses Structured Query Language (SQL) for querying and managing data.

NoSQL: Querying depends on the database type (e.g., MongoDB uses JSON-like queries, Redis uses commands).

5. Transactions & Consistency

SQL: Strong ACID (Atomicity, Consistency, Isolation, Durability) support → reliable for financial and critical data.

NoSQL: Often eventually consistent (BASE model: Basically Available, Soft state, Eventually consistent) → trades off strict consistency for speed and scalability.

6. Use Cases

SQL: Best for structured data with complex relationships and transactions.
Examples: Banking systems, ERP, accounting, e-commerce.

NoSQL: Best for large-scale, unstructured or semi-structured data, flexible schema, or high-velocity data.
Examples: Real-time analytics, social media feeds, IoT data, content management.

7. Examples
Type        SQL Databases                          NoSQL Databases
Relational  MySQL, PostgreSQL, Oracle, SQL Server  N/A
Document    N/A                                    MongoDB, CouchDB
Key-Value   N/A                                    Redis, DynamoDB
Column      N/A                                    Cassandra, HBase
Graph       N/A                                    Neo4j, ArangoDB

#2.What makes MongoDB a good choice for modern applications?
ans:Schema flexibility: MongoDB stores data in JSON-like documents (BSON), so fields can vary between documents.

Scalability: Supports horizontal scaling via sharding.

High performance: Related data can be embedded in one document and read in a single operation, and indexes keep common queries fast.

Rich query capabilities: Supports complex queries, indexing, and aggregation.

Cloud-friendly: Works well with cloud-native architectures and microservices.

Developer-friendly: No need for rigid schemas, which speeds up development.


#3.Explain the concept of collections in MongoDB.
ans-A collection is similar to a table in SQL databases.

It’s a grouping of documents (records).

Collections don’t require a predefined schema, so documents in the same collection can have different fields.
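As a small illustration of that flexibility, the sketch below models a collection as a plain Python list of dictionaries. The `products` name and every field in it are made up for the example; no MongoDB server is involved.

```python
# A "collection" modelled as a list of dict documents (hypothetical data).
products = [
    {"_id": 1, "name": "Desk", "price": 120.0},
    {"_id": 2, "name": "Lamp", "price": 25.5, "tags": ["lighting", "office"]},
    {"_id": 3, "name": "E-book", "download_url": "https://example.com/book"},
]


def field_names(documents):
    """Return the set of field names used by each document, in order."""
    return [set(doc) for doc in documents]


# Each document carries its own set of fields; nothing forces them to match.
for doc, fields in zip(products, field_names(products)):
    print(doc["_id"], sorted(fields))
```

With PyMongo, a list shaped like `products` could be passed to `collection.insert_many(...)` and each dictionary would be stored as its own document.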


#4.How does MongoDB ensure high availability using replication?
ans-MongoDB uses replica sets.

A replica set has:

Primary node: Handles writes and reads by default.

Secondary nodes: Replicate data from the primary.

If the primary fails, an automatic election chooses a new primary.

This ensures continuous availability.

#5.What are the main benefits of MongoDB Atlas?
ans-Fully managed cloud MongoDB service.

Automatic backups and recovery.

Scaling made easy: Both vertical and horizontal.

High availability with built-in replica sets.

Security features like encryption and access controls.

Monitoring and alerting out-of-the-box.


#6.What is the role of indexes in MongoDB, and how do they improve performance?
ans-Indexes allow fast data retrieval without scanning the entire collection.

Common types: single field, compound, text, geospatial.

Without indexes, MongoDB must do a collection scan, which is slow.
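A minimal sketch of how such indexes might be declared from Python. It assumes a PyMongo-style collection object whose `create_index(keys, **options)` method returns the index name; the field names are borrowed from a Superstore-style orders collection, and `Product Name` plus all index names are assumptions for illustration.

```python
# Index definitions as (keys, options) pairs. Direction 1 is ascending,
# -1 is descending, and "text" requests a text index.
INDEX_SPECS = [
    ("Region", {"name": "region_idx"}),                                   # single field
    ([("Category", 1), ("Sales", -1)], {"name": "category_sales_idx"}),   # compound
    ([("Product Name", "text")], {"name": "product_text_idx"}),           # text
]


def ensure_indexes(collection, specs=INDEX_SPECS):
    """Create each index on a PyMongo-style collection and return the names."""
    return [collection.create_index(keys, **options) for keys, options in specs]
```

Calling `ensure_indexes(db["orders"])` on a live collection would build the three indexes. Every index also adds some cost to writes, so it is usually worth indexing only the fields that queries actually filter or sort on.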


#7.Describe the stages of the MongoDB aggregation pipeline.
ans-$match: Filter documents.

$group: Group documents and perform calculations (sum, avg).

$project: Select or reshape fields.

$sort: Order results.

$limit/$skip: Pagination.

$lookup: Join with another collection.

Pipeline stages are executed in order.
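The stages above can be sketched as one pipeline, written as the list of dictionaries that PyMongo's `collection.aggregate()` accepts. The field names (`Region`, `Category`, `Sales`, `Profit`) follow the Superstore data used later in these notes; the function name and the output field names are illustrative assumptions.

```python
def top_categories_pipeline(region, limit=3):
    """Build a pipeline: filter, group, reshape, sort, then cap the result."""
    return [
        # $match first, so later stages see fewer documents.
        {"$match": {"Region": region}},
        # One output document per category, with a sum and an average.
        {"$group": {
            "_id": "$Category",
            "totalSales": {"$sum": "$Sales"},
            "avgProfit": {"$avg": "$Profit"},
        }},
        # Rename _id to a friendlier field and keep the computed values.
        {"$project": {"_id": 0, "category": "$_id", "totalSales": 1, "avgProfit": 1}},
        {"$sort": {"totalSales": -1}},
        {"$limit": limit},
    ]


pipeline = top_categories_pipeline("West")
print([next(iter(stage)) for stage in pipeline])
```

Against a live collection this would be run as `collection.aggregate(top_categories_pipeline("West"))`.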

#8.What is sharding in MongoDB? How does it differ from replication?
ans-Sharding: Distributes data across multiple servers (horizontal scaling).

Replication: Copies data across nodes for high availability.

Difference: Sharding scales data volume and throughput, while replication ensures availability and redundancy. In production the two are combined: each shard is itself a replica set.

#9.What is PyMongo, and why is it used?
ans-PyMongo is the official Python driver for MongoDB.

Allows Python apps to:

Connect to MongoDB.

Insert, query, update, and delete documents.

Use aggregation, indexes, and transactions.

#10.What are the ACID properties in the context of MongoDB transactions?
ans-Atomicity: All operations in a transaction succeed or fail together.

Consistency: Data remains valid according to rules.

Isolation: Transactions don’t interfere with each other.

Durability: Committed changes are permanent.

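MongoDB has supported multi-document ACID transactions on replica sets since version 4.0 and on sharded clusters since version 4.2. Operations on a single document have always been atomic.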
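A minimal sketch of a multi-document transaction with PyMongo, assuming `client` is a `MongoClient` connected to a replica set or sharded cluster (transactions are not available on a standalone server). The `bank` database, the `accounts` collection, and the `balance` field are hypothetical.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents inside one transaction."""
    accounts = client["bank"]["accounts"]  # hypothetical database/collection

    def apply_transfer(session):
        # Both updates carry the session, so they commit or abort together.
        accounts.update_one(
            {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
        )
        accounts.update_one(
            {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
        )

    with client.start_session() as session:
        # with_transaction starts the transaction, runs the callback,
        # and commits; if the callback raises, the transaction is aborted.
        session.with_transaction(apply_transfer)
```

If the second update fails, the first one is rolled back as well, which is the atomicity property described above.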

#11.What is the purpose of MongoDB’s explain() function?
ans-explain() shows query execution details.

Helps identify performance issues.

Shows whether the query used indexes or collection scans.
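As a rough sketch, the helper below inspects the dictionary returned by `collection.find(...).explain()` and reports whether the winning plan contains a collection scan. The two sample dictionaries are hand-written to resemble explain output, not captured from a server, and the exact layout differs between server versions.

```python
def plan_stages(plan):
    """Collect stage names from a winning plan, outermost stage first."""
    stages = []
    while plan:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")  # descend into the nested stage, if any
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan scans the whole collection (COLLSCAN)."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hand-written samples shaped like explain output (illustrative only).
without_index = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "region_idx"},
        }
    }
}

print(uses_collection_scan(without_index), uses_collection_scan(with_index))
```

An `IXSCAN` stage means an index was used; a `COLLSCAN` on a large collection is usually the signal to add or adjust an index.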

#12.How does MongoDB handle schema validation?
ans-MongoDB validates documents against JSON Schema rules, defined with the $jsonSchema operator in a collection's validator.

You can enforce:

Required fields

Data types

Field patterns

By default, inserts and updates that violate the rules are rejected; setting validationAction to "warn" logs the violation instead.
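A minimal sketch of such a validator, assuming a PyMongo `Database` object is passed in as `db`. `Region` and `Sales` follow the Superstore data used later in these notes; the `Order ID` field, its pattern, the list of regions, and the collection name are assumptions for illustration.

```python
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        # Required fields
        "required": ["Order ID", "Region", "Sales"],
        "properties": {
            # Field pattern (assumed format such as "CA-2016-152156")
            "Order ID": {"bsonType": "string", "pattern": "^[A-Z]{2}-[0-9]{4}-[0-9]+$"},
            # Allowed values (assumed region list)
            "Region": {"enum": ["West", "East", "Central", "South"]},
            # Data type plus a range check
            "Sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}


def create_validated_collection(db, name="orders_validated"):
    """Create a collection whose writes are checked against the validator."""
    return db.create_collection(name, validator=order_validator)
```

After this, inserting a document without `Region`, or with a negative `Sales`, would be rejected by the server under the default validation settings.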


#13.What is the difference between a primary and a secondary node in a replica set?
ans-Primary: Accepts writes and replicates to secondaries.

Secondary: Copies data from primary; can serve reads if configured.

Failover: If primary fails, a secondary becomes the new primary.


#14.What security mechanisms does MongoDB provide for data protection?
ans-Authentication: SCRAM (username and password, the default), x.509 certificates, and LDAP or Kerberos in MongoDB Enterprise.

Authorization: Role-based access control (RBAC).

Encryption: TLS/SSL for transit, encryption at rest.

Auditing: Logs security-related operations.


#15.Explain the concept of embedded documents and when they should be used.
ans-Embedded documents: Nested documents inside a parent document.

When to use:

One-to-few relationships (e.g., a customer with a handful of addresses).

When data is usually accessed together.

Avoid if the embedded document can grow unbounded, which may affect performance.
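To make the trade-off concrete, here is a small sketch with made-up data: a customer whose few addresses are embedded, next to orders that are kept as separate documents because their number can grow without bound. All names and values are hypothetical.

```python
# Embedded: a small, bounded list that is always read with the customer.
customer = {
    "_id": "C-1001",
    "name": "Asha Rao",
    "addresses": [
        {"type": "home", "city": "Pune"},
        {"type": "work", "city": "Mumbai"},
    ],
}

# Referenced: each order is its own document and points back to the customer.
orders = [
    {"_id": "O-1", "customer_id": "C-1001", "total": 250.0},
    {"_id": "O-2", "customer_id": "C-1001", "total": 75.5},
    {"_id": "O-3", "customer_id": "C-2002", "total": 12.0},
]


def cities(customer_doc):
    """Embedded data needs no second lookup: it travels with the parent."""
    return [address["city"] for address in customer_doc["addresses"]]


def orders_for(customer_id, order_docs):
    """Referenced data needs a separate query (here, a simple filter)."""
    return [order for order in order_docs if order["customer_id"] == customer_id]


print(cities(customer))
print([order["_id"] for order in orders_for("C-1001", orders)])
```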

#16.What is the purpose of MongoDB’s $lookup stage in aggregation?
ans-Performs a join-like operation between two collections.

Adds related documents from another collection to the result set.
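A short sketch of the stage, followed by a pure-Python emulation that shows the shape of its result. The `customers` collection, the `customer_id` field, and the sample documents are hypothetical, and the emulation is only a picture of the behaviour, not how the server implements it.

```python
def orders_with_customer_pipeline():
    """Pipeline for an orders collection that attaches the matching customer."""
    return [
        {"$lookup": {
            "from": "customers",          # collection to join with
            "localField": "customer_id",  # field in the orders documents
            "foreignField": "_id",        # field in the customers documents
            "as": "customer",             # output array field
        }},
    ]


def emulate_lookup(orders, customers):
    """Attach to each order the list of customers whose _id matches."""
    return [
        {**order, "customer": [c for c in customers if c["_id"] == order.get("customer_id")]}
        for order in orders
    ]


customers = [{"_id": "C-1", "name": "Asha"}]
orders = [
    {"_id": "O-1", "customer_id": "C-1"},
    {"_id": "O-2", "customer_id": "C-9"},  # no matching customer
]
for row in emulate_lookup(orders, customers):
    print(row["_id"], row["customer"])
```

Note that the joined field is always an array: an order with no match gets an empty list rather than being dropped, much like a left outer join.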

#17.What are some common use cases for MongoDB?
ans-Content management systems (CMS)

Real-time analytics

IoT applications

E-commerce catalogs

Mobile and web apps

Event logging


#18.What are the advantages of using MongoDB for horizontal scaling?
ans-Sharding distributes data across multiple servers.

Handles large datasets beyond a single machine.

Maintains high availability with replication.


#19.How do MongoDB transactions differ from SQL transactions?
ans-MongoDB supports multi-document ACID transactions (like SQL).

Relational databases treat every statement as an ACID transaction by default, and multi-statement transactions are a routine part of application code.

MongoDB transactions are optional; many apps use single-document atomic writes without transactions.

#20.What are the main differences between capped collections and regular collections?
ans-Capped collections:

Fixed size

Circular, automatically overwrites oldest documents

Fast insert performance

Regular collections:

Grow dynamically

No automatic deletion.
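The circular behaviour can be pictured with a toy stand-in built on `collections.deque`. This is only an analogy: a real capped collection is bounded by a size in bytes (optionally also by a document count) and is created in PyMongo with `db.create_collection(name, capped=True, size=...)`. The `CappedLog` class is hypothetical.

```python
from collections import deque


class CappedLog:
    """Toy stand-in for a capped collection limited by document count."""

    def __init__(self, max_documents):
        # A deque with maxlen silently drops the oldest item when full.
        self._docs = deque(maxlen=max_documents)

    def insert_one(self, doc):
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion order, oldest first.
        return list(self._docs)


log = CappedLog(max_documents=3)
for n in range(1, 6):
    log.insert_one({"event": n})

print(log.find())  # → [{'event': 3}, {'event': 4}, {'event': 5}]
```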


#21.What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
ans-Filters documents based on criteria, like a WHERE clause in SQL.

Often used first to reduce the dataset early for efficiency.


#22.How can you secure access to a MongoDB database?
ans-Enable authentication and RBAC.

Use TLS/SSL encryption for connections.

Enable firewall rules or IP whitelisting.

Keep MongoDB updated and avoid exposing the default port (27017) to the public internet; bind the server to trusted interfaces only.
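On the client side, a minimal sketch of these points is to keep credentials out of the source code and to request TLS explicitly. The environment variable names below are hypothetical; the resulting keys (`host`, `username`, `password`, `authSource`, `tls`) are keyword arguments that PyMongo's `MongoClient` accepts.

```python
import os


def secure_client_options(env=os.environ):
    """Build MongoClient keyword arguments from environment variables."""
    return {
        "host": env.get("MONGO_HOST", "localhost"),
        # Credentials are read from the environment, never hard-coded.
        "username": env["MONGO_USER"],
        "password": env["MONGO_PASSWORD"],
        # Database that stores the user's credentials.
        "authSource": env.get("MONGO_AUTH_DB", "admin"),
        # Encrypt traffic between the application and the server.
        "tls": True,
    }
```

A client would then be created with `MongoClient(**secure_client_options())`. A missing `MONGO_USER` or `MONGO_PASSWORD` raises `KeyError` immediately instead of silently connecting without authentication.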


#23.What is MongoDB’s WiredTiger storage engine, and why is it important?
ans-Default storage engine since MongoDB 3.2.

Features:

Document-level locking → high concurrency

Compression → saves disk space

Journaling → ensures durability

Improves both performance and reliability.

                                             Practical Questions and Answers

#1.Write a Python script to load the Superstore dataset from a CSV file into MongoDB

In [7]:
import sys
!{sys.executable} -m pip install pymongo pandas



In [8]:
import pandas as pd
from pymongo import MongoClient

# MongoDB connection details
# IMPORTANT: Replace with your actual MongoDB connection string
# For a local MongoDB instance, it might be: 'mongodb://localhost:27017/'
# For MongoDB Atlas, it will be a string like: 'mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/<database-name>?retryWrites=true&w=majority'
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, please update this!
database_name = 'superstore_db'
collection_name = 'orders'
csv_file_path = '/content/superstore.csv'

try:
    # Establish connection to MongoDB
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    # Load CSV data into a pandas DataFrame
    # Specify encoding to handle potential errors, 'latin1' is a common fix
    df = pd.read_csv(csv_file_path, encoding='latin1')

    # Convert DataFrame to a list of dictionaries (MongoDB document format)
    # .to_dict('records') converts each row into a dictionary
    data_to_insert = df.to_dict('records')

    # Insert data into MongoDB collection
    if data_to_insert:
        # Clear existing data in the collection if you want to start fresh each time
        # Uncomment the line below if you want to clear the collection before inserting
        # collection.delete_many({})
        insert_result = collection.insert_many(data_to_insert)
        print(f"Successfully inserted {len(insert_result.inserted_ids)} documents into '{collection_name}' collection.")
    else:
        print("No data to insert.")

    # Optional: Verify data count
    print(f"Total documents in '{collection_name}' collection: {collection.count_documents({})}")

except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Close the MongoDB connection
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

An error occurred: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 695936d59c464fa5f87e6aed, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.
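The `Connection refused` output above indicates that no MongoDB server was listening on `localhost:27017`, and the driver waited out its 30-second server selection timeout before reporting it. A minimal sketch for failing faster is to shorten that timeout and send a `ping` before doing any real work. The helper name `connect_or_none` and the 2-second default are assumptions, not part of the original script.

```python
def connect_or_none(uri, timeout_ms=2000):
    """Return a connected MongoClient, or None if no server answers in time."""
    # Imported here so the helper can be pasted into any notebook cell.
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    # serverSelectionTimeoutMS caps how long the driver searches for a server.
    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # MongoClient connects lazily; ping forces a round trip to the server.
        client.admin.command("ping")
    except ConnectionFailure:
        client.close()
        return None
    return client
```

Usage might look like `client = connect_or_none('mongodb://localhost:27017/')`, followed by a check for `None` before loading the CSV. In a hosted notebook such as Colab there is normally no local MongoDB server, so either a server has to be started in the same environment or `mongo_uri` has to point at a reachable deployment such as an Atlas cluster.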


#2.Retrieve and print all documents from the Orders collection

In [10]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running or mongo_uri is correct.
# Use the same connection details as before
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Retrieving all documents from '{collection_name}' collection...")

    # Retrieve all documents
    documents = collection.find({})

    # Print each document
    for i, doc in enumerate(documents):
        print(f"Document {i+1}: {doc}")
        if i >= 9: # Print only first 10 documents to avoid overwhelming output
            print("... (showing first 10 documents only)")
            break

    print(f"Total documents found in '{collection_name}' collection: {collection.count_documents({})}")

except Exception as e:
    print(f"An error occurred while retrieving documents: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Retrieving all documents from 'orders' collection...
An error occurred while retrieving documents: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 6959378c9c464fa5f87e91f8, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#3.Count and display the total number of documents in the Orders collection.

In [11]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    count = collection.count_documents({})  # Count all documents in the collection
    print(f"Total number of documents in '{collection_name}' collection: {count}")

except Exception as e:
    print(f"An error occurred while counting documents: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

An error occurred while counting documents: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 695937e99c464fa5f87e91f9, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#4.Write a query to fetch all orders from the "West" region.

In [12]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Fetching orders from the 'West' region in '{collection_name}' collection...")

    # Query for documents where the 'Region' field is 'West'
    west_region_orders = collection.find({'Region': 'West'})

    # Print each document (limiting to the first 10 to avoid excessive output)
    for i, order in enumerate(west_region_orders):
        print(f"Order {i+1}: {order}")
        if i >= 9:
            print("... (showing first 10 orders from the West region only)")
            break

    # Count total documents for 'West' region
    count_west_orders = collection.count_documents({'Region': 'West'})
    print(f"Total orders found in the 'West' region: {count_west_orders}")

except Exception as e:
    print(f"An error occurred while fetching orders: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Fetching orders from the 'West' region in 'orders' collection...
An error occurred while fetching orders: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 695938739c464fa5f87e91fa, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#5.Write a query to find orders where Sales is greater than 500.

In [13]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Fetching orders where Sales > 500 from '{collection_name}' collection...")

    # Query for documents where the 'Sales' field is greater than 500
    high_sales_orders = collection.find({'Sales': {'$gt': 500}})

    # Print each document (limiting to the first 10 to avoid excessive output)
    for i, order in enumerate(high_sales_orders):
        print(f"Order {i+1}: {order}")
        if i >= 9:
            print("... (showing first 10 orders with sales > 500 only)")
            break

    # Count total documents for 'Sales' > 500
    count_high_sales_orders = collection.count_documents({'Sales': {'$gt': 500}})
    print(f"Total orders found with Sales > 500: {count_high_sales_orders}")

except Exception as e:
    print(f"An error occurred while fetching orders: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Fetching orders where Sales > 500 from 'orders' collection...
An error occurred while fetching orders: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 695938cc9c464fa5f87e91fb, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#6.Fetch the top 3 orders with the highest Profit.

In [14]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Fetching top 3 orders with the highest Profit from '{collection_name}' collection...")

    # Query for documents, sort by 'Profit' in descending order, and limit to 3
    top_profit_orders = collection.find({}).sort('Profit', -1).limit(3)

    # Print each document
    for i, order in enumerate(top_profit_orders):
        print(f"Top Order {i+1}: {order}")

except Exception as e:
    print(f"An error occurred while fetching orders: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Fetching top 3 orders with the highest Profit from 'orders' collection...
An error occurred while fetching orders: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 695939199c464fa5f87e91fc, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#7.Update all orders with Ship Mode as "First Class" to "Premium Class".

In [15]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Attempting to update 'Ship Mode' from 'First Class' to 'Premium Class' in '{collection_name}' collection...")

    # Query to find documents with 'Ship Mode' as 'First Class'
    query = {'Ship Mode': 'First Class'}

    # Update operation to set 'Ship Mode' to 'Premium Class'
    new_values = {'$set': {'Ship Mode': 'Premium Class'}}

    # Perform the update
    update_result = collection.update_many(query, new_values)

    print(f"Successfully updated {update_result.modified_count} documents.")

    # Optional: Verify a few updated documents (if any were modified)
    if update_result.modified_count > 0:
        print("Verifying a few updated documents...")
        updated_docs = collection.find({'Ship Mode': 'Premium Class'}).limit(5)
        for i, doc in enumerate(updated_docs):
            print(f"Updated Document {i+1}: {doc['Ship Mode']}")

except Exception as e:
    print(f"An error occurred while updating orders: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Attempting to update 'Ship Mode' from 'First Class' to 'Premium Class' in 'orders' collection...
An error occurred while updating orders: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 6959396e9c464fa5f87e91fd, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#8.Delete all orders where Sales is less than 50.

In [17]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Attempting to delete orders where Sales < 50 from '{collection_name}' collection...")

    # Query to find documents with 'Sales' less than 50
    query = {'Sales': {'$lt': 50}}

    # Perform the delete operation
    delete_result = collection.delete_many(query)

    print(f"Successfully deleted {delete_result.deleted_count} documents.")

    # Optional: Verify remaining document count
    print(f"Total documents remaining in '{collection_name}' collection: {collection.count_documents({})}")

except Exception as e:
    print(f"An error occurred while deleting orders: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Attempting to delete orders where Sales < 50 from 'orders' collection...
An error occurred while deleting orders: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 69593a089c464fa5f87e91fe, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#9.Use aggregation to group orders by Region and calculate total sales per region

In [18]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Aggregating total sales by region in '{collection_name}' collection...")

    # Aggregation pipeline to group by Region and calculate total sales
    pipeline = [
        {
            '$group': {
                '_id': '$Region',
                'TotalSales': { '$sum': '$Sales' }
            }
        },
        {
            '$sort': { 'TotalSales': -1 } # Sort by TotalSales in descending order
        }
    ]

    # Execute the aggregation pipeline
    results = collection.aggregate(pipeline)

    print("Total Sales per Region:")
    for result in results:
        print(f"  Region: {result['_id']}, Total Sales: {result['TotalSales']:.2f}")

except Exception as e:
    print(f"An error occurred during aggregation: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Aggregating total sales by region in 'orders' collection...
An error occurred during aggregation: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 69593a739c464fa5f87e91ff, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#10.Fetch all distinct values for Ship Mode from the collection

In [19]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Fetching distinct 'Ship Mode' values from '{collection_name}' collection...")

    # Use distinct() to get all unique values for the 'Ship Mode' field
    distinct_ship_modes = collection.distinct('Ship Mode')

    print("Distinct Ship Modes:")
    for mode in distinct_ship_modes:
        print(f"- {mode}")

except Exception as e:
    print(f"An error occurred while fetching distinct ship modes: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Fetching distinct 'Ship Mode' values from 'orders' collection...
An error occurred while fetching distinct ship modes: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 69593ae49c464fa5f87e9200, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.


#11.Count the number of orders for each category.

In [20]:
from pymongo import MongoClient

# IMPORTANT: Ensure your MongoDB server is running and the mongo_uri is correct.
mongo_uri = 'mongodb://localhost:27017/' # Placeholder, update if needed
database_name = 'superstore_db'
collection_name = 'orders'

try:
    client = MongoClient(mongo_uri)
    db = client[database_name]
    collection = db[collection_name]

    print(f"Counting orders by category in '{collection_name}' collection...")

    # Aggregation pipeline to group by Category and count orders
    pipeline = [
        {
            '$group': {
                '_id': '$Category',
                'OrderCount': { '$sum': 1 }
            }
        },
        {
            '$sort': { 'OrderCount': -1 } # Sort by OrderCount in descending order
        }
    ]

    # Execute the aggregation pipeline
    results = collection.aggregate(pipeline)

    print("Order Count per Category:")
    for result in results:
        print(f"  Category: {result['_id']}, Order Count: {result['OrderCount']}")

except Exception as e:
    print(f"An error occurred during aggregation: {e}")
finally:
    if 'client' in locals() and client:
        client.close()
        print("MongoDB connection closed.")

Counting orders by category in 'orders' collection...
An error occurred during aggregation: localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 69593b479c464fa5f87e9201, topology_type: Unknown, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>]>
MongoDB connection closed.
