
Theoretical Questions


1. What are the key differences between SQL and NoSQL databases?


SQL: Relational and table-based, with a fixed schema; ACID-compliant; well suited to complex queries and joins.

NoSQL: Non-relational, with a flexible schema (document, key-value, wide-column, graph, etc.); designed to scale horizontally; well suited to semi-structured or unstructured data.
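
The schema difference can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database and plain dictionaries as stand-ins for MongoDB documents; the table, field names, and values are hypothetical.

```python
import sqlite3

# Relational side: the table schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, sales REAL)")
conn.execute("INSERT INTO orders (id, region, sales) VALUES (1, 'West', 120.5)")


def try_insert_with_extra_column(connection):
    """Return True if a row with an undeclared column is accepted."""
    try:
        connection.execute(
            "INSERT INTO orders (id, region, sales, coupon) "
            "VALUES (2, 'East', 80.0, 'SPRING')"
        )
        return True
    except sqlite3.OperationalError:
        # The table has no 'coupon' column, so the insert is rejected.
        return False


sql_accepted = try_insert_with_extra_column(conn)

# Document side: two documents in the same collection may carry
# different fields without any schema change.
documents = [
    {"_id": 1, "region": "West", "sales": 120.5},
    {"_id": 2, "region": "East", "sales": 80.0, "coupon": "SPRING"},
]
```

In the relational case the extra field would need an ALTER TABLE first; in the document case the second document simply carries one more key.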

2. What makes MongoDB a good choice for modern applications?

Flexible schema for evolving data.

Scalable (horizontal scaling via sharding).

High performance for read/write operations.

Supports JSON-like documents for easy integration with modern apps.

3. Explain the concept of collections in MongoDB

Collections are groups of MongoDB documents (similar to tables in SQL).

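By default they do not enforce a schema, so documents in the same collection can have different fields; optional validation rules can be added when more structure is needed.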

4. How does MongoDB ensure high availability using replication?

MongoDB uses replica sets: a primary node accepts writes, while secondary nodes replicate its data by applying the primary's operation log (oplog).

If the primary becomes unavailable, the remaining members elect a secondary as the new primary, which keeps downtime to a minimum.

5. What are the main benefits of MongoDB Atlas?

Fully managed cloud database service.

Automated backups, scaling, and monitoring.

Global clusters for low-latency access.

Built-in security and compliance features.

6. What is the role of indexes in MongoDB, and how do they improve performance?

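Indexes improve query performance by storing the values of selected fields in sorted order (a B-tree by default), with references back to the full documents.

Without a suitable index, MongoDB must examine every document in the collection (a full collection scan).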
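
As a minimal sketch, an index could be created from PyMongo roughly as follows. The function name is hypothetical, `collection` is assumed to be a PyMongo Collection, and the field names are taken from the Superstore dataset used in the practical questions.

```python
# Compound index: ascending on Region, descending on Sales.
# 1 and -1 are the values of PyMongo's ASCENDING and DESCENDING constants.
REGION_SALES_INDEX = [("Region", 1), ("Sales", -1)]


def ensure_region_sales_index(collection):
    """Create the index if it does not already exist and return its name.

    `collection` is assumed to be a pymongo Collection object.
    """
    return collection.create_index(REGION_SALES_INDEX)
```

A call such as `ensure_region_sales_index(db["Orders"])` should let queries that filter on Region (and optionally sort on Sales) use the index instead of scanning the whole collection.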

7. Describe the stages of the MongoDB aggregation pipeline.

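Common stages include $match (filter), $group (aggregate), $sort (order), $project (reshape), and $lookup (join).

Documents flow through the stages in order; each stage transforms its input and passes the result to the next stage.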
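
A pipeline is written as an ordered list of stage documents. The sketch below chains four stages; the function names and the `min_sales` parameter are hypothetical, and the field names assume the Superstore dataset.

```python
def build_region_sales_pipeline(min_sales):
    """Return a pipeline that totals sales per region, largest first."""
    return [
        # 1. Keep only orders at or above the threshold.
        {"$match": {"Sales": {"$gte": min_sales}}},
        # 2. Group the remaining orders by region and sum their sales.
        {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}},
        # 3. Sort the groups by total sales, descending.
        {"$sort": {"total_sales": -1}},
        # 4. Rename _id to region in the output documents.
        {"$project": {"_id": 0, "region": "$_id", "total_sales": 1}},
    ]


def run_region_sales(collection, min_sales=0):
    """Run the pipeline; `collection` is assumed to be a pymongo Collection."""
    return list(collection.aggregate(build_region_sales_pipeline(min_sales)))
```

Placing $match first is a common design choice because it reduces the number of documents the later stages have to process.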

8. What is sharding in MongoDB? How does it differ from replication?

Sharding splits data across multiple servers (shards) for horizontal scaling.

Replication duplicates data across nodes for high availability.
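
The contrast can be shown with a toy model in plain Python. This is only an illustration of the idea: MongoDB itself partitions data into ranges of a shard key (ranged or hashed) and balances them across shards, which this sketch does not attempt to reproduce. All names are hypothetical.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Sharding: map a key to exactly one shard (toy hash-based routing)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def replicate(document, replica_count):
    """Replication: every node holds its own full copy of the document."""
    return [dict(document) for _ in range(replica_count)]


order = {"_id": "CA-2017-152156", "Region": "West"}
shard_index = shard_for(order["_id"], 3)  # the order is stored on one shard
copies = replicate(order, 3)              # the order is stored on all nodes
```

Sharding therefore increases capacity (each shard stores a subset of the data), while replication increases redundancy (each node stores everything). In production each shard is typically itself a replica set.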

9. What is PyMongo, and why is it used?

PyMongo is the official Python driver for MongoDB.

It allows Python applications to interact with MongoDB databases.

10. What are the ACID properties in the context of MongoDB transactions?

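MongoDB supports ACID (Atomicity, Consistency, Isolation, Durability) guarantees for multi-document transactions, on replica sets since version 4.0 and on sharded clusters since version 4.2. Operations on a single document are always atomic.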

Ensures data integrity in complex operations.
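
A minimal sketch of a multi-document transaction with PyMongo is shown below. It assumes `client` is a MongoClient connected to a replica set or sharded cluster (transactions are not available on a standalone server); the `accounts` collection, the `balance` field, and the function name are hypothetical.

```python
def transfer(client, db_name, from_id, to_id, amount):
    """Move `amount` between two account documents atomically."""
    accounts = client[db_name]["accounts"]
    with client.start_session() as session:
        # Leaving this block normally commits the transaction;
        # an exception inside it aborts both updates.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )
```

Passing `session=session` to each operation is what ties it to the transaction; an operation issued without the session would run outside it.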

11. What is the purpose of MongoDB’s explain() function?

Provides details on query execution (e.g., indexes used, execution time).

Helps optimize queries for better performance.
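
As a rough sketch, the explain output can be inspected from PyMongo to check whether a query falls back to a collection scan. The helper names are hypothetical, and the shape of the explain document varies between server versions (for example, some plans nest several inputs under other keys), so this walks only the simple `inputStage` chain.

```python
def plan_stages(winning_plan):
    """Collect stage names by following the 'inputStage' chain of a plan."""
    stages = []
    node = winning_plan
    while node is not None:
        stages.append(node.get("stage"))
        node = node.get("inputStage")
    return stages


def uses_collection_scan(collection, query):
    """Return True if the winning plan for `query` contains a COLLSCAN stage.

    `collection` is assumed to be a pymongo Collection.
    """
    explanation = collection.find(query).explain()
    winning_plan = explanation["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning_plan)
```

A COLLSCAN stage indicates a full collection scan, while an IXSCAN stage indicates that an index was used.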

12. How does MongoDB handle schema validation?

MongoDB allows schema validation rules (for example, a $jsonSchema validator attached to a collection) to enforce document structure on inserts and updates.

Ensures data consistency without rigid schema enforcement.
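
A validator could look roughly like the sketch below. The collection name, the chosen required fields, and the function name are assumptions made for illustration; `db` is assumed to be a PyMongo Database.

```python
# Require Region (string) and Sales (non-negative number) on every document.
ORDER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}


def create_validated_orders(db, name="OrdersValidated"):
    """Create a collection that rejects documents failing the validator."""
    return db.create_collection(
        name,
        validator=ORDER_VALIDATOR,
        validationAction="error",  # reject invalid writes instead of warning
    )
```

Fields not listed under `properties` are still allowed, which is how validation coexists with a flexible schema.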

13. What is the difference between a primary and a secondary node in a replica set?

Primary: Receives all write operations and records them in its oplog.

Secondary: Replicates the primary's oplog to keep its own copy of the data, and can serve read operations if the client's read preference allows it.
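
From the client side, the replica set and the read preference are usually selected through connection string options. The helper below is a hypothetical sketch that only builds such a URI; the host names and the replica set name are placeholders.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB URI that lists several replica set members."""
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return "mongodb://" + ",".join(hosts) + "/?" + options


uri = replica_set_uri(
    ["node1.example.com:27017", "node2.example.com:27017"],
    "rs0",
    read_preference="secondaryPreferred",
)
```

With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise; writes always go to the primary.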

14. What security mechanisms does MongoDB provide for data protection?

Authentication (e.g., SCRAM, LDAP, Kerberos).

Encryption (at rest and in transit).

Role-based access control (RBAC).

Auditing for tracking database activity.

15. Explain the concept of embedded documents and when they should be used.

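Documents nested within other documents.

Useful for one-to-one and small one-to-many relationships where the related data is read together, which avoids joins. Each document is limited to 16 MB, so unbounded lists are better kept in a separate collection.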
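
As an illustration, an order with an embedded customer and embedded line items might look like the dictionary below. All field names and values are hypothetical, and `get_path` is a toy helper that mimics how dot notation addresses nested fields (it does not handle arrays).

```python
order = {
    "_id": "CA-0001",
    # One-to-one: the customer is embedded in the order.
    "customer": {
        "name": "Ada Lovelace",
        "address": {"city": "Seattle", "state": "Washington"},
    },
    # One-to-many: the line items are embedded as an array.
    "items": [
        {"product": "Stapler", "quantity": 2},
        {"product": "Binder", "quantity": 5},
    ],
}

# A MongoDB filter that reaches into the embedded document with dot notation.
CITY_FILTER = {"customer.address.city": "Seattle"}


def get_path(document, dotted_path):
    """Return the nested value at `dotted_path`, or None if it is missing."""
    value = document
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value
```

With PyMongo, a filter such as `CITY_FILTER` would be passed to `find()` unchanged; no join is needed because the customer data lives inside each order.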

16. What is the purpose of MongoDB’s $lookup stage in aggregation?

Performs a left outer join between two collections.

Combines data from related documents.
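
The sketch below builds a $lookup stage and, next to it, a plain-Python emulation of the left outer join semantics. The collection and field names are hypothetical, and the emulation covers only simple equality matches.

```python
def lookup_stage(from_collection, local_field, foreign_field, as_field):
    """Return a $lookup stage document for an equality join."""
    return {
        "$lookup": {
            "from": from_collection,
            "localField": local_field,
            "foreignField": foreign_field,
            "as": as_field,
        }
    }


def emulate_lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Toy left outer join: every local document is kept, and its matches
    (possibly none) are attached as a list under `as_field`."""
    joined = []
    for doc in local_docs:
        matches = [
            other for other in foreign_docs
            if other.get(foreign_field) == doc.get(local_field)
        ]
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "customer_id": "C1"}, {"_id": 2, "customer_id": "C9"}]
customers = [{"_id": "C1", "name": "Ada"}]
joined = emulate_lookup(orders, customers, "customer_id", "_id", "customer")
```

The second order has no matching customer, so it is still returned with an empty `customer` list; that is what makes the join a left outer join.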

17. What are some common use cases for MongoDB?

Real-time analytics.

Content management systems.

IoT applications.

Mobile and social apps.

E-commerce platforms.

18. What are the advantages of using MongoDB for horizontal scaling?

Sharding distributes data across multiple servers.

Handles large datasets and high traffic efficiently.

19. How do MongoDB transactions differ from SQL transactions?

MongoDB supports multi-document ACID transactions (similar to SQL).

However, MongoDB transactions are typically used sparingly due to performance considerations.

20. What are the main differences between capped collections and regular collections?

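Capped Collections: Fixed size; documents are kept in insertion order, and the oldest documents are overwritten once the size limit is reached (FIFO). Well suited to logs or caches.

Regular Collections: No fixed size limit; suitable for general use.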
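
A capped collection has to be created explicitly with its limits. A minimal sketch with PyMongo, assuming `db` is a Database object and using a hypothetical function name:

```python
def create_capped_collection(db, name, size_bytes, max_documents=None):
    """Create a capped collection limited by size and, optionally, by count."""
    options = {"capped": True, "size": size_bytes}
    if max_documents is not None:
        # 'max' caps the number of documents; 'size' (bytes) is always required.
        options["max"] = max_documents
    return db.create_collection(name, **options)
```

For example, `create_capped_collection(db, "app_logs", 1024 * 1024, 1000)` would keep at most 1 MiB or 1,000 log documents, whichever limit is reached first.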

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

Filters documents to pass only those that match specified conditions.

Similar to the WHERE clause in SQL.

22. How can you secure access to a MongoDB database?

Enable authentication and authorization.

Use TLS/SSL for encryption.

Restrict network access (firewalls, IP whitelisting).

Regularly update MongoDB and apply security patches.
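
On the client side, the first two points usually show up in the connection string. The helper below is a hypothetical sketch that builds an authenticated, TLS-enabled URI; the host, user, and database names are placeholders, and it assumes the user is defined in the `admin` database.

```python
from urllib.parse import quote_plus, urlencode


def build_secure_uri(username, password, host, database, port=27017):
    """Build a MongoDB URI with escaped credentials and TLS enabled."""
    # Credentials must be percent-escaped so characters such as '@' or ':'
    # are not mistaken for URI delimiters.
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    options = urlencode({"authSource": "admin", "tls": "true"})
    return f"mongodb://{credentials}@{host}:{port}/{database}?{options}"


uri = build_secure_uri("app_user", "p@ss:word", "db.example.com", "superstore_db")
```

In practice the password would be read from an environment variable or a secrets manager rather than written in the source code.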

23. What is MongoDB’s WiredTiger storage engine, and why is it important?

MongoDB’s default storage engine since version 3.2.

Provides document-level concurrency control, compression, and efficient storage management.

Practical Questions


1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB


In [None]:
from pymongo import MongoClient
import pandas as pd

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['superstore_db']
orders_collection = db['Orders']

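def load_dataset_to_mongodb(file_path):
    """Read a CSV file and insert each row as one document into Orders."""
    df = pd.read_csv(file_path)
    records = df.to_dict('records')
    if not records:
        # insert_many() rejects an empty list, so stop early.
        print("No rows found; nothing was inserted.")
        return
    result = orders_collection.insert_many(records)
    print(f"Loaded {len(result.inserted_ids)} documents into MongoDB.")

# Path to the local copy of the dataset; adjust for your machine.
file_path = "C:/Users/prith/Downloads/superstore.csv"
load_dataset_to_mongodb(file_path)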

2. Retrieve and print all documents from the Orders collection


In [None]:
def print_all_documents():
    for document in orders_collection.find():
        print(document)

print_all_documents()

3. Count and display the total number of documents in the Orders collection


In [None]:
def count_documents():
    count = orders_collection.count_documents({})
    print(f"Total number of documents: {count}")

count_documents()

4. Write a query to fetch all orders from the "West" region


In [None]:
def fetch_orders_from_west():
    west_orders = orders_collection.find({"Region": "West"})
    for order in west_orders:
        print(order)

fetch_orders_from_west()

5. Write a query to find orders where Sales is greater than 500


In [None]:
def fetch_orders_with_sales_gt_500():
    high_sales_orders = orders_collection.find({"Sales": {"$gt": 500}})
    for order in high_sales_orders:
        print(order)

fetch_orders_with_sales_gt_500()

6. Fetch the top 3 orders with the highest Profit


In [None]:
def fetch_top_3_profit_orders():
    top_orders = orders_collection.find().sort("Profit", -1).limit(3)
    for order in top_orders:
        print(order)

fetch_top_3_profit_orders()

7. Update all orders with Ship Mode as "First Class" to "Premium Class"


In [None]:
def update_ship_mode():
    result = orders_collection.update_many(
        {"Ship Mode": "First Class"},
        {"$set": {"Ship Mode": "Premium Class"}}
    )
    print(f"Updated {result.modified_count} documents.")

update_ship_mode()

8. Delete all orders where Sales is less than 50


In [None]:
def delete_orders_with_sales_lt_50():
    result = orders_collection.delete_many({"Sales": {"$lt": 50}})
    print(f"Deleted {result.deleted_count} documents.")

delete_orders_with_sales_lt_50()

9. Use aggregation to group orders by Region and calculate total sales per region


In [None]:
def calculate_total_sales_per_region():
    pipeline = [
        {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
    ]
    result = orders_collection.aggregate(pipeline)
    for doc in result:
        print(doc)

calculate_total_sales_per_region()

10. Fetch all distinct values for Ship Mode from the collection


In [None]:
def fetch_distinct_ship_modes():
    ship_modes = orders_collection.distinct("Ship Mode")
    print("Distinct Ship Modes:", ship_modes)

fetch_distinct_ship_modes()

11. Count the number of orders for each category


In [None]:
def count_orders_per_category():
    pipeline = [
        {"$group": {"_id": "$Category", "count": {"$sum": 1}}}
    ]
    result = orders_collection.aggregate(pipeline)
    for doc in result:
        print(doc)

count_orders_per_category()