1. NoSQL Databases:

a. Write a Python program that connects to a MongoDB database and inserts a new document into a collection named "students". The document should include fields such as "name", "age", and "grade". Print a success message after the insertion.

In [None]:
from pymongo import MongoClient

def insert_student(name, age, grade):
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")
    db = client["mydatabase"]
    collection = db["students"]

    # Create the student document
    student = {
        "name": name,
        "age": age,
        "grade": grade
    }

    # Insert the document into the collection
    collection.insert_one(student)

    # Print success message
    print("Student inserted successfully.")

# Usage example
insert_student("John Doe", 18, "A")


b. Implement a Python function that connects to a Cassandra database and inserts a new record into a table named "products". The record should contain fields like "id", "name", and "price". Handle any potential errors that may occur during the insertion.

In [None]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra import ConsistencyLevel

def insert_product(id, name, price):
    # Connection settings (secure-connect bundle and credentials are placeholders)
    cloud_config = {
        'secure_connect_bundle': '/path/to/secure-connect-database.zip'
    }
    auth_provider = PlainTextAuthProvider('username', 'password')

    cluster = None
    try:
        # Connect to Cassandra; "mykeyspace" is a placeholder for the
        # keyspace that contains the "products" table
        cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
        session = cluster.connect("mykeyspace")

        # Prepare the INSERT statement. "?" placeholders require a prepared
        # statement (SimpleStatement expects "%s" placeholders instead)
        insert_statement = session.prepare(
            "INSERT INTO products (id, name, price) VALUES (?, ?, ?)"
        )
        insert_statement.consistency_level = ConsistencyLevel.QUORUM

        # Execute the INSERT statement
        session.execute(insert_statement, (id, name, price))
        print("Product inserted successfully.")
    except Exception as e:
        print("Error inserting product:", str(e))
    finally:
        # Close the connection (this also shuts down any open sessions)
        if cluster is not None:
            cluster.shutdown()


insert_product(1, "Example Product", 9.99)


2. Document-oriented NoSQL Databases:

a. Given a MongoDB collection named "books", write a Python function that fetches all the books published in the last year and prints their titles and authors.


In [None]:
from pymongo import MongoClient
from datetime import datetime, timedelta, timezone

def fetch_recent_books():
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")
    db = client["mydatabase"]
    collection = db["books"]

    # Calculate the date one year ago (MongoDB stores dates in UTC)
    one_year_ago = datetime.now(timezone.utc) - timedelta(days=365)

    # Construct the query to fetch recent books
    query = {"publication_date": {"$gte": one_year_ago}}

    # Fetch the recent books from the collection
    recent_books = collection.find(query)

    # Print the titles and authors of the recent books
    for book in recent_books:
        print("Title:", book["title"])
        print("Author:", book["author"])
        print()

fetch_recent_books()


b. Design a schema for a document-oriented NoSQL database to store customer information for an e-commerce platform. Write a Python program to insert a new customer document into the database and handle any necessary validations.

In [None]:
from pymongo import MongoClient

def insert_customer(customer):
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")
    db = client["mydatabase"]
    collection = db["customers"]

    # Insert the customer document into the collection
    collection.insert_one(customer)

    # Print success message
    print("Customer inserted successfully.")

# Usage example
customer = {
    "name": "John Doe",
    "email": "johndoe@example.com",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "state": "NY",
        "postal_code": "10001",
        "country": "USA"
    }
}
insert_customer(customer)
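
The exercise also asks for validation, which the program above does not perform. The following is a minimal sketch of one way to check a customer document before inserting it. The list of required fields, the helper name validate_customer, and the deliberately loose email pattern are all assumptions for illustration, not part of any MongoDB API.

```python
import re

# Hypothetical required fields, mirroring the example customer document
REQUIRED_FIELDS = ("name", "email", "address")
REQUIRED_ADDRESS_FIELDS = ("street", "city", "state", "postal_code", "country")

# Deliberately loose pattern: one "@", no whitespace, a dot in the domain part
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_customer(customer):
    """Return a list of validation error messages (empty if the document is valid)."""
    errors = []

    # Every top-level field must be present and non-empty
    for field in REQUIRED_FIELDS:
        if not customer.get(field):
            errors.append(f"Missing required field: {field}")

    # Only check the email format when an email was supplied
    email = customer.get("email")
    if email and not (isinstance(email, str) and EMAIL_PATTERN.match(email)):
        errors.append("Invalid email address")

    # The address must be an embedded document with all of its fields
    address = customer.get("address")
    if isinstance(address, dict):
        for field in REQUIRED_ADDRESS_FIELDS:
            if not address.get(field):
                errors.append(f"Missing required address field: {field}")
    elif address:
        errors.append("Address must be a document (dict)")

    return errors
```

A caller could run validate_customer(customer) first and call insert_one only when the returned list is empty, printing the messages otherwise.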


3. High Availability and Fault Tolerance:


a. Explain the concept of replica sets in MongoDB. Write a Python program that connects to a MongoDB replica set and retrieves the status of the primary and secondary nodes.

Ans - A replica set is a group of MongoDB servers (mongod processes) that maintain the same data set. One member, the primary, receives all writes; the other members, the secondaries, replicate the primary's operations and apply them to their own copies. Replica sets provide several benefits:

High Availability: If the primary node fails, one of the secondary nodes is automatically elected as the new primary. This ensures that the system remains available even in the event of a node failure.

Data Redundancy: Every data-bearing member holds a full copy of the data, so the loss of a single node does not lose data. If a secondary fails, the remaining members keep serving and replicating the data.

Automatic Failover: Members exchange heartbeats, detect when the primary becomes unavailable, and elect a new primary automatically. This minimizes downtime and reduces the need for manual intervention.

Read Scalability: Clients can choose to read from secondaries by setting a read preference, which spreads read operations across multiple nodes. Because replication is asynchronous, such reads may return slightly stale data.



In [None]:
from pymongo import MongoClient

def check_replica_set_status():
    # Connect to MongoDB replica set
    client = MongoClient("mongodb://localhost:27017/")
    db = client["admin"]

    # Get the replica set status
    status = db.command("replSetGetStatus")

    # Print the host and state (PRIMARY, SECONDARY, ...) of each member
    for member in status["members"]:
        print("Member ID:", member["_id"])
        print("Host:", member["name"])
        print("State:", member["stateStr"])
        print()

    # Close the connection
    client.close()

# Usage example
check_replica_set_status()


b. Describe how Cassandra ensures high availability and fault tolerance in a distributed database system. Write a Python program that connects to a Cassandra cluster and fetches the status of the nodes.


Ans - Cassandra ensures high availability and fault tolerance in a distributed database system through the following mechanisms:

Replication: Cassandra uses a peer-to-peer distributed architecture with no primary node, so there is no single point of failure. Each piece of data is stored on several nodes (as many as the replication factor specifies) rather than on every node, which provides redundancy and fault tolerance. Replication can be configured to span multiple data centers for additional resilience.

Consistent Hashing: Cassandra uses consistent hashing to distribute data across the cluster. This ensures that each node in the cluster is responsible for a specific range of data, allowing for efficient read and write operations. Consistent hashing also helps in maintaining data distribution and load balancing as nodes are added or removed from the cluster.

Replication Factor: Cassandra allows configuring the replication factor, which determines the number of replicas for each piece of data. By having multiple replicas, Cassandra ensures that data remains available even if some nodes fail. Replicas can be placed on different physical nodes, racks, or data centers to provide fault tolerance and availability.

Data Repair: Cassandra reconciles differences between replicas through anti-entropy repair, which is normally run on a schedule (for example with nodetool repair) and is supplemented by read repair and hinted handoff. The repair process identifies inconsistencies and synchronizes data across replicas to maintain data integrity and prevent data loss.

Gossip Protocol: Cassandra uses a gossip protocol for node discovery and failure detection. Nodes in the cluster communicate with each other, exchanging information about their status and the status of other nodes. This allows Cassandra to detect and respond to node failures quickly, ensuring high availability.

Tunable Consistency: Cassandra provides tunable consistency, allowing users to trade off consistency for availability or vice versa based on their application requirements. Consistency levels can be set at the request level, allowing developers to choose the desired level of consistency for each operation.
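
The exercise also asks for a program that fetches the status of the nodes. The sketch below is one possible approach using the cluster metadata exposed by the DataStax Python driver (cassandra-driver). The contact point 127.0.0.1 and the helper names describe_hosts and fetch_node_status are assumptions for illustration; the stand-in host objects at the end only mimic the driver's Host attributes so that the formatting logic can be tried without a running cluster.

```python
from types import SimpleNamespace

def describe_hosts(hosts):
    """Return one status line per host, sorted by address."""
    lines = []
    for host in sorted(hosts, key=lambda h: str(h.address)):
        # is_up can be None when the driver has not yet determined the state
        if host.is_up is None:
            state = "UNKNOWN"
        else:
            state = "UP" if host.is_up else "DOWN"
        lines.append(f"{host.address} (dc={host.datacenter}, rack={host.rack}): {state}")
    return lines

def fetch_node_status(contact_points=("127.0.0.1",)):
    """Connect to the cluster and print the status of every node the driver knows about."""
    # Imported here so describe_hosts can be used without the driver installed
    from cassandra.cluster import Cluster

    cluster = Cluster(list(contact_points))
    try:
        cluster.connect()  # populates cluster.metadata
        for line in describe_hosts(cluster.metadata.all_hosts()):
            print(line)
    finally:
        cluster.shutdown()

# Offline demonstration with stand-in host objects (no cluster required)
fake_hosts = [
    SimpleNamespace(address="10.0.0.2", datacenter="dc1", rack="rack1", is_up=False),
    SimpleNamespace(address="10.0.0.1", datacenter="dc1", rack="rack1", is_up=True),
]
for line in describe_hosts(fake_hosts):
    print(line)
```

Against a real cluster, calling fetch_node_status() with reachable contact points would print the same kind of lines for the actual nodes. The state shown is the driver's view of each node, which may lag behind what nodetool status reports.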




4. Sharding in MongoDB:
a. Explain the concept of sharding in MongoDB and how it improves performance and scalability. Write a Python program that sets up sharding for a MongoDB cluster and inserts multiple documents into a sharded collection.

Ans - Sharding is MongoDB's method of horizontal partitioning: a collection's documents are split across several servers so that no single machine has to store or serve the whole data set. In a sharded cluster, data is divided into chunks based on a shard key, which is a field or set of fields chosen to determine the target shard for each document. Each shard, typically deployed as a replica set, stores a subset of the data. Clients connect through a mongos router, which directs each operation to the shards that hold the relevant data. The primary purpose of sharding is to distribute data evenly across multiple shards so that the cluster can handle large amounts of data and high throughput.

Sharding improves performance and scalability in MongoDB in several ways:

Horizontal Scalability: By distributing data across multiple shards, sharding enables MongoDB to handle larger data sets and high traffic loads. Each shard can be hosted on a separate machine or replica set, allowing for increased storage capacity and processing power.

Parallel Query Execution: Sharding enables parallel query execution across multiple shards, leading to improved query performance and reduced response times. Queries that involve large datasets can be distributed and executed in parallel across shards, taking advantage of the processing power of multiple machines.

Load Balancing: MongoDB's sharding architecture automatically balances the data distribution across shards. As data grows or the cluster expands, MongoDB redistributes the chunks across shards to maintain an even distribution and avoid hotspots or overloading of specific shards.
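
To make the idea of a hashed shard key concrete, here is a small pure-Python simulation of how hashing spreads keys across shards. It is a simplified illustration only: MongoDB assigns ranges of hashed values (chunks) to shards and uses its own hash function, whereas this sketch uses MD5 with a modulo, and the names shard_for and distribute are hypothetical.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Map a shard-key value to a shard index using a stable hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a shard
    return int.from_bytes(digest[:8], "big") % num_shards

def distribute(keys, num_shards):
    """Count how many keys land on each shard."""
    counts = Counter(shard_for(key, num_shards) for key in keys)
    return [counts.get(shard, 0) for shard in range(num_shards)]

# Sequential keys still spread roughly evenly once they are hashed
counts = distribute(range(10_000), num_shards=4)
print(counts)
```

Because consecutive key values hash to unrelated positions, monotonically increasing keys (such as ObjectId values) do not all pile onto one shard, which is the usual reason for choosing a hashed shard key.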



In [None]:
from pymongo import MongoClient
from bson import ObjectId

def setup_sharding():
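    # Connect to the cluster through a mongos router (sharding commands
    # are not available on a standalone mongod)
    client = MongoClient("mongodb://localhost:27017/")

    # Enable sharding for the database
    admin_db = client["admin"]
    admin_db.command("enableSharding", "mydatabase")

    # Shard the collection on a hashed _id key; like enableSharding, this
    # command must be run against the "admin" database
    admin_db.command("shardCollection", "mydatabase.students", key={"_id": "hashed"})
    db = client["mydatabase"]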

    # Insert multiple documents into the sharded collection
    collection = db["students"]
    students = [
        {"_id": ObjectId(), "name": "John Doe", "age": 18, "grade": "A"},
        {"_id": ObjectId(), "name": "Jane Smith", "age": 19, "grade": "B"},
        {"_id": ObjectId(), "name": "Mark Johnson", "age": 20, "grade": "C"}
    ]
    collection.insert_many(students)

    # Print success message
    print("Sharding set up and documents inserted successfully.")

# Usage example
setup_sharding()


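b. To design a sharding strategy for a social media application where user data needs to be distributed across multiple shards, you can use a shard key based on the user ID or username. The program below demonstrates how data is inserted and retrieved. It assumes that the "users" collection has already been sharded (for example on a hashed "_id") and that the client connects through a mongos router. Note that with "_id" as the shard key, a lookup by username is broadcast to every shard; if lookups by username dominate, sharding on "username" lets mongos target a single shard.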

In [None]:
from pymongo import MongoClient
from bson import ObjectId

def insert_user(user_id, username, email):
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")

    # Insert the user document into the sharded collection
    db = client["mydatabase"]
    collection = db["users"]
    user = {
        "_id": user_id,
        "username": username,
        "email": email
    }
    collection.insert_one(user)

    # Print success message
    print("User inserted successfully.")

def find_user(username):
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")

    # Find the user document based on username
    db = client["mydatabase"]
    collection = db["users"]
    user = collection.find_one({"username": username})

    # Print user details
    if user:
        print("User details:")
        print("_id:", user["_id"])
        print("username:", user["username"])
        print("email:", user["email"])
    else:
        print("User not found.")

# Usage example
insert_user(ObjectId(), "john123", "john@example.com")
insert_user(ObjectId(), "jane456", "jane@example.com")

find_user("john123")
find_user("jane456")
find_user("nonexistentuser")


5. Indexing in MongoDB:
a. Describe the concept of indexing in MongoDB and its importance in query optimization. Write a Python program that creates an index on a specific field in a MongoDB collection and executes a query using that index.

Ans - Indexing in MongoDB is the process of creating an index on a specific field or set of fields in a collection. An index is a data structure that improves the speed of data retrieval operations, such as querying, by allowing MongoDB to quickly locate the desired data based on the indexed field(s).

Indexes in MongoDB are implemented using B-tree data structures. When an index is created on a field, MongoDB creates an index entry for each document in the collection, storing the indexed field's value and a reference to the document. This allows MongoDB to perform index-based lookups and efficiently filter data based on the indexed field(s).

Indexing is central to query optimization. Without a suitable index, MongoDB must perform a full collection scan and examine every document; with an index on the queried field, it examines only the matching index entries and fetches the corresponding documents. The difference grows with the size of the collection. Indexes are not free, however: each one consumes storage and must be updated on every write, so they are best created on fields that are queried frequently.
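
One way to confirm that a query actually uses an index is to inspect its explain output, which PyMongo exposes through cursor.explain(). The sketch below walks a winning plan and reports whether it contains an index scan (IXSCAN) stage. The helper names plan_stages and uses_index are hypothetical, the sample plan is hand-written rather than real server output, and the exact layout of explain output varies between server versions, so treat the "queryPlan" handling as an assumption.

```python
def plan_stages(plan):
    """Collect stage names from a nested query plan, outermost stage first."""
    stages = []
    if "stage" in plan:
        stages.append(plan["stage"])
    # A stage has either a single child ("inputStage") or several ("inputStages")
    if "inputStage" in plan:
        children = [plan["inputStage"]]
    else:
        children = plan.get("inputStages", [])
    for child in children:
        stages.extend(plan_stages(child))
    return stages

def uses_index(explain_output):
    """Return True if the winning plan contains an index scan stage."""
    winning_plan = explain_output["queryPlanner"]["winningPlan"]
    # Assumption: some server versions nest the stage tree under "queryPlan"
    winning_plan = winning_plan.get("queryPlan", winning_plan)
    return "IXSCAN" in plan_stages(winning_plan)

# Hand-written sample shaped like explain output for an indexed equality match
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "name_1"},
        }
    }
}
print(plan_stages(sample["queryPlanner"]["winningPlan"]))
print(uses_index(sample))
```

With a live collection, the same check could be applied as uses_index(collection.find({"name": "example_product"}).explain()). A plan whose only stage is COLLSCAN indicates that no index was used.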

In [None]:
from pymongo import MongoClient

def create_index_and_query():
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")

    # Get the database and collection
    db = client["mydatabase"]
    collection = db["products"]

    # Create an index on the "name" field
    collection.create_index("name")

    # Execute a query using the index
    query = {"name": "example_product"}
    result = collection.find(query)

    # Print the query result
    for product in result:
        print("Name:", product["name"])
        print("Description:", product["description"])
        print()

    # Close the connection
    client.close()

# Usage example
create_index_and_query()


b. To optimize the query for searching products with a specific keyword in the name or description in the "products" collection, we can add appropriate indexes. Here's a Python function that performs the search and optimizes the query by adding indexes:

In [None]:
from pymongo import MongoClient

def search_products(keyword):
    # Connect to MongoDB
    client = MongoClient("mongodb://localhost:27017/")

    # Get the database and collection
    db = client["mydatabase"]
    collection = db["products"]

    # Add indexes on the "name" and "description" fields
    collection.create_index([("name", "text"), ("description", "text")])

    # Perform the search query using the indexes
    query = {"$text": {"$search": keyword}}
    result = collection.find(query)

    # Print the search result
    for product in result:
        print("Name:", product["name"])
        print("Description:", product["description"])
        print()

    # Close the connection
    client.close()

# Usage example
search_products("example_keyword")
