# Theoretical Questions


Ans1: Key differences between SQL and NoSQL databases

Data Model: SQL uses structured tables with rows & columns; NoSQL uses flexible models (documents, key-value, graph, columnar).

Schema: SQL is schema-based (fixed structure), NoSQL is schema-less (dynamic fields).

Scalability: SQL databases typically scale vertically (adding more resources to one machine); NoSQL databases are designed to scale horizontally (adding more machines).

Transactions: SQL databases fully support ACID transactions; many NoSQL databases relax consistency in favor of availability and partition tolerance (BASE), though MongoDB supports multi-document ACID transactions.

Query Language: SQL uses SQL syntax; MongoDB uses a query API in JSON-like format.
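To make the query-language difference concrete, the sketch below places a SQL statement next to a roughly equivalent MongoDB filter document. The `Region` and `Sales` fields are borrowed from the practical section. The `matches` helper is a hypothetical toy that only understands equality and `$gt`, so the filter can be tried without a server.

```python
# SQL version, shown only for comparison
sql_query = "SELECT * FROM Orders WHERE Region = 'West' AND Sales > 500"

# Roughly equivalent MongoDB filter, as it would be passed to collection.find(...)
mongo_filter = {"Region": "West", "Sales": {"$gt": 500}}


def matches(doc, flt):
    """Toy evaluator: supports plain equality and the $gt operator only."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            if "$gt" in cond and not (value is not None and value > cond["$gt"]):
                return False
        elif value != cond:
            return False
    return True


# Made-up sample rows for illustration
orders = [
    {"Region": "West", "Sales": 750.0},
    {"Region": "West", "Sales": 120.0},
    {"Region": "East", "Sales": 900.0},
]
selected = [o for o in orders if matches(o, mongo_filter)]
print(selected)
```

Only the first sample order satisfies both conditions, which mirrors what the SQL `WHERE` clause would select.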

Ans2: Flexible schema for evolving requirements.

Horizontal scaling via sharding.

Rich query capabilities with JSON-like syntax.

Native support for geospatial queries, full-text search, and aggregation pipelines.

Strong integration with modern programming languages.

Cloud-based management via MongoDB Atlas.

Ans3: A collection is the equivalent of a table in SQL but without a fixed schema. It stores documents (BSON objects) and can hold documents with different fields.
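As a small illustration of documents with different fields sharing one collection, the sketch below defines two such documents. The field names are made up for the example, and `collection` is assumed to be a PyMongo `Collection` object.

```python
# Two documents destined for the same collection; their fields differ.
documents = [
    {"name": "Laptop", "price": 1200, "specs": {"ram_gb": 16}},
    {"name": "Notebook", "price": 4, "pages": 200},
]


def insert_mixed(collection):
    """Insert both documents; `collection` is assumed to be a PyMongo Collection."""
    return collection.insert_many(documents)


# Fields that appear in one document but not in the other
only_in_one = set(documents[0]) ^ set(documents[1])
print(sorted(only_in_one))
```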



Ans4: MongoDB uses replica sets. A replica set is a group of MongoDB servers that provides:

Primary node (handles writes)

Secondary nodes (replicate data, handle reads if enabled)

Automatic failover: if the primary goes down, the remaining members elect a new primary.
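From the application side, a replica set is usually addressed through a single connection string that lists several members. The sketch below builds such a string. The host names and the set name `rs0` are hypothetical; `replicaSet` and `readPreference` are standard connection-string options.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a connection string listing several replica-set members."""
    host_list = ",".join(hosts)
    return (
        f"mongodb://{host_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


# Hypothetical three-member set; secondaries may serve reads when available
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
print(uri)
# The string would then be passed to pymongo.MongoClient(uri).
```

Listing more than one member lets the driver find the current primary even if one of the listed hosts is down.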

Ans5: Main benefits of MongoDB Atlas

Fully managed cloud MongoDB service.

Automated backups and scaling.

Built-in monitoring and alerts.

Global cluster deployment.

Security features like encryption and access control.

Ans6: Indexes allow MongoDB to quickly locate data without scanning every document in a collection, improving query performance. Common types include single-field, compound, text, and geospatial indexes.
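A minimal sketch of how three of these index types could be declared with PyMongo follows. `Region` and `Sales` come from the practical section; `Product Name` is an assumed text column, and `collection` is assumed to be a PyMongo `Collection`.

```python
# Key specifications in the shape accepted by Collection.create_index
INDEX_KEYS = [
    "Region",                          # single-field index, ascending
    [("Region", 1), ("Sales", -1)],    # compound: Region ascending, Sales descending
    [("Product Name", "text")],        # text index (assumed column name)
]


def create_indexes(collection):
    """Create each index and return the index names reported by the server."""
    return [collection.create_index(keys) for keys in INDEX_KEYS]
```

With the compound index in place, a query that filters on `Region` and sorts by `Sales` can be answered from the index instead of a full collection scan.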

Ans7: Stages of MongoDB aggregation pipeline

$match → Filter documents.

$group → Group by fields, apply aggregations.

$project → Shape output fields.

$sort → Sort results.

$limit / $skip → Pagination.

$lookup → Join with another collection.
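The stages above can be chained in one pipeline. The sketch below builds a pipeline as a plain list of stage documents, using the `Region` and `Sales` fields from the practical section; the function name and its parameters are hypothetical.

```python
def top_regions_pipeline(min_sales=0, top_n=3):
    """Return an aggregation pipeline as a list of stage documents."""
    return [
        {"$match": {"Sales": {"$gt": min_sales}}},         # filter first
        {"$group": {"_id": "$Region",
                    "TotalSales": {"$sum": "$Sales"}}},    # one document per region
        {"$project": {"_id": 0, "Region": "$_id",
                      "TotalSales": 1}},                   # rename _id to Region
        {"$sort": {"TotalSales": -1}},                     # largest total first
        {"$limit": top_n},                                 # keep only top_n rows
    ]


pipeline = top_regions_pipeline(min_sales=50, top_n=3)
print([next(iter(stage)) for stage in pipeline])
# With PyMongo this would run as: orders_collection.aggregate(pipeline)
```

Stage order matters: each stage receives the output of the one before it, so the sort and limit apply to the grouped totals rather than to the raw orders.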



Ans8:

Sharding: Distributes data across multiple machines for horizontal scaling.

Replication: Duplicates data across servers for redundancy and high availability.

Sharding = scalability, Replication = availability.
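The contrast can be shown with a toy in-memory model. This is only an illustration of the idea, not MongoDB's actual routing or replication mechanism: each "server" is a Python list, and an MD5 hash stands in for a hashed shard key.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key (toy stand-in for a hashed shard key)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def write_sharded(shards, doc, key_field):
    """Sharding: each document is stored on exactly one shard."""
    shards[shard_for(doc[key_field], len(shards))].append(doc)


def write_replicated(replicas, doc):
    """Replication: every member ends up with a copy of the document."""
    for member in replicas:
        member.append(dict(doc))


shards = [[], [], []]
replicas = [[], [], []]
for order_id in range(6):
    doc = {"order_id": order_id}
    write_sharded(shards, doc, "order_id")
    write_replicated(replicas, doc)

print(sum(len(s) for s in shards))    # 6: each document is stored once overall
print([len(r) for r in replicas])     # [6, 6, 6]: a full copy on every member
```

In a real deployment the two are combined: each shard is itself a replica set.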

Ans9: PyMongo is the official Python driver for MongoDB. It is used to connect, query, and manipulate MongoDB databases from Python applications.

Ans10: Atomicity – All or nothing execution.

Consistency – Maintains valid data state.

Isolation – Prevents interference between concurrent transactions.

Durability – Changes persist after commit.

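Ans11: The explain() method reports how MongoDB plans to execute a query, including the winning plan and which indexes it uses. In `executionStats` mode it also runs the query and reports figures such as execution time and the number of documents examined, which helps with optimization.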
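As a sketch of how the output might be inspected from Python, the helper below walks the winning plan and checks for an index scan. The `sample` document is hand-written in the classic explain layout, not real server output, and the layout can differ between server versions. With PyMongo, a document of this kind is returned by `orders_collection.find({...}).explain()`.

```python
def plan_stages(explain_doc):
    """Return the stage names of the winning plan, outermost first."""
    plan = explain_doc["queryPlanner"]["winningPlan"]
    stages = []
    while plan is not None:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")   # follows the single-child chain only
    return stages


def uses_index(explain_doc):
    """True if any stage in the winning plan is an index scan."""
    return "IXSCAN" in plan_stages(explain_doc)


# Hand-written sample for illustration
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}
print(plan_stages(sample))
print(uses_index(sample))
```

A plan whose only stage is `COLLSCAN` means every document was scanned, which is usually the cue to add an index.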



Ans12: MongoDB supports JSON Schema validation rules to enforce constraints on documents before insertion or update.
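A validator might look like the sketch below. The rules (a required `Region` string and a non-negative numeric `Sales`) and the collection name `ValidatedOrders` are assumptions chosen for illustration; `db` is assumed to be a PyMongo `Database`.

```python
# Validator document using the $jsonSchema operator
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {
                "bsonType": ["int", "long", "double", "decimal"],
                "minimum": 0,
            },
        },
    }
}


def create_validated_collection(db, name="ValidatedOrders"):
    """Create a collection that enforces the validator on inserts and updates."""
    return db.create_collection(name, validator=order_validator)


print(order_validator["$jsonSchema"]["required"])
```

Documents that fail the rules are rejected by the server under the default validation settings.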



Ans13: Primary: Accepts writes and replicates changes to secondaries.

Secondary: Maintains a copy of primary data, can handle read queries if configured.



Ans14: Authentication (SCRAM, LDAP, Kerberos).

Authorization (role-based access control).

TLS/SSL encryption in transit.

Encryption at rest.

IP whitelisting and firewall rules.

Ans15: Embedded documents are a way to nest related data inside a document, reducing the need for joins. Use them when data is tightly coupled and frequently queried together.
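The sketch below contrasts an embedded layout with a referenced one. All field names and values are made up for illustration.

```python
# Embedded: the address and line items live inside the order document
order_embedded = {
    "_id": 1,
    "customer": "Alice",
    "shipping_address": {"city": "Seattle", "zip": "98101"},
    "items": [
        {"sku": "PEN-01", "qty": 3},
        {"sku": "PAD-02", "qty": 1},
    ],
}

# Referenced alternative: the order only stores the customer's id
order_referenced = {"_id": 1, "customer_id": 42}

# One read returns everything needed to display the order
total_qty = sum(item["qty"] for item in order_embedded["items"])
print(order_embedded["shipping_address"]["city"], total_qty)
```

Nested fields can be queried with dot notation, for example `{"shipping_address.city": "Seattle"}`.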



Ans16: $lookup performs a left outer join between the documents in the aggregation pipeline and another collection.
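A minimal sketch of such a stage follows. The `Customers` collection and the `Customer ID` field are assumptions made for the example, not something defined elsewhere in these notes.

```python
def orders_with_customers_pipeline():
    """Pipeline that attaches matching customer documents to each order."""
    return [
        {
            "$lookup": {
                "from": "Customers",           # hypothetical collection to join
                "localField": "Customer ID",   # assumed field in Orders
                "foreignField": "_id",         # assumed field in Customers
                "as": "customer",              # name of the output array field
            }
        }
    ]


lookup_stage = orders_with_customers_pipeline()[0]["$lookup"]
print(sorted(lookup_stage))
```

Because the join is a left outer join, orders without a matching customer are kept and simply receive an empty `customer` array.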



Ans17: Content management systems.

Real-time analytics.

IoT data storage.

E-commerce catalogs.

Social media apps.



Ans18: Sharding allows distributing data and load across servers.

Handles high read/write throughput.

Supports massive datasets without a single-server bottleneck.

Ans19: Sharding allows distributing data and load across servers.

Handles high read/write throughput.

Supports massive datasets without a single-server bottleneck.

Ans20: Capped: Fixed size, overwrites oldest data when full, maintains insertion order, fast writes.

Regular: No fixed size limit; documents are not removed automatically.
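The overwrite behaviour of a capped collection can be sketched as follows. The options are in the shape PyMongo's `create_collection` accepts; the collection name and limits are hypothetical. The bounded `deque` is only an in-memory analogy for how the oldest entries are dropped.

```python
from collections import deque

# size is the byte limit (required for capped collections); max is an optional document limit
capped_options = {"capped": True, "size": 1024 * 1024, "max": 3}


def create_capped(db, name="recent_events"):
    """Create the capped collection; `db` is assumed to be a PyMongo Database."""
    return db.create_collection(name, **capped_options)


# In-memory analogy: a bounded deque also drops its oldest entry when full
recent = deque(maxlen=capped_options["max"])
for event_id in range(1, 6):
    recent.append({"event": event_id})

print([e["event"] for e in recent])
```

After five inserts into a three-slot buffer, only events 3, 4, and 5 remain, in insertion order.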

Ans21: $match filters documents early in the pipeline, which reduces the data processed in later stages and improves performance.

Ans22: Enable authentication and authorization.

Use strong passwords & role-based access control.

Enable TLS/SSL encryption.

Restrict network access to trusted IPs.
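Several of these steps show up directly in the connection string. The sketch below builds one with escaped credentials and TLS enabled. The user name, password, and host are placeholders; `authSource` and `tls` are standard connection-string options.

```python
from urllib.parse import quote_plus


def secure_uri(user, password, host, auth_db="admin"):
    """Build a connection string with escaped credentials and TLS enabled."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}/"
        f"?authSource={auth_db}&tls=true"
    )


# Placeholder credentials for illustration only; real ones belong in the environment
uri = secure_uri("app_user", "p@ss:word/1", "db.example.com:27017")
print(uri)
```

Escaping matters because characters such as `@`, `:` and `/` in a password would otherwise be read as part of the URI structure.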



Ans23: WiredTiger is the default storage engine in MongoDB.

Supports document-level concurrency.

Uses compression to reduce storage.

Provides checkpointing for crash recovery.

# Practical Questions

In [4]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.14.0


In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd
from pymongo import MongoClient

# 1. Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")  # Change URI if using MongoDB Atlas
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

# Load CSV file
df = pd.read_csv("/mnt/data/superstore.csv")

# Insert data into MongoDB
orders_collection.delete_many({})  # Clear existing data
orders_collection.insert_many(df.to_dict(orient="records"))
print("✅ Data inserted successfully.")

# 2. Retrieve and print all documents
print("\n📄 All documents in Orders collection:")
for doc in orders_collection.find():
    print(doc)

# 3. Count and display total number of documents
total_docs = orders_collection.count_documents({})
print(f"\n📊 Total number of documents: {total_docs}")

# 4. Fetch all orders from the "West" region
print("\n🌍 Orders from West region:")
for doc in orders_collection.find({"Region": "West"}):
    print(doc)

# 5. Find orders where Sales > 500
print("\n💰 Orders with Sales > 500:")
for doc in orders_collection.find({"Sales": {"$gt": 500}}):
    print(doc)

# 6. Top 3 orders with highest Profit
print("\n🏆 Top 3 orders by Profit:")
for doc in orders_collection.find().sort("Profit", -1).limit(3):
    print(doc)

# 7. Update Ship Mode from "First Class" to "Premium Class"
update_result = orders_collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
print(f"\n✏ Updated documents: {update_result.modified_count}")

# 8. Delete orders where Sales < 50
delete_result = orders_collection.delete_many({"Sales": {"$lt": 50}})
print(f"🗑 Deleted documents: {delete_result.deleted_count}")

# 9. Aggregation: Group by Region and calculate total sales
print("\n📊 Total sales per region:")
pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
]
for result in orders_collection.aggregate(pipeline):
    print(result)

# 10. Fetch distinct values for Ship Mode
ship_modes = orders_collection.distinct("Ship Mode")
print(f"\n🚢 Distinct Ship Modes: {ship_modes}")

# 11. Count number of orders for each Category
print("\n📦 Orders per Category:")
pipeline = [
    {"$group": {"_id": "$Category", "Count": {"$sum": 1}}}
]
for result in orders_collection.aggregate(pipeline):
    print(result)
