![image](https://user-images.githubusercontent.com/57321948/196933065-4b16c235-f3b9-4391-9cfe-4affcec87c35.png)

# Submitted by: Mohammad Wasiq

## Email: `gl0427@myamu.ac.in`

# Pre-Placement Training Assignment - `Big Data` 

## NOSQL

**Q1. NoSQL Databases:**

**a. Write a Python program that connects to a MongoDB database and inserts a new document into a collection named "students". The document should include fields such as "name", "age", and "grade". Print a success message after the insertion.**

In [None]:
from pymongo import MongoClient
from pymongo.errors import PyMongoError

def insert_student(name, age, grade):
    client = MongoClient('mongodb://localhost:27017')
    try:
        # Select the database and the "students" collection (both are created lazily on first insert)
        collection = client['your_database']['students']

        # Insert the new student document
        result = collection.insert_one({'name': name, 'age': age, 'grade': grade})

        print(f"Student inserted successfully with _id: {result.inserted_id}")
    except PyMongoError as e:
        print(f"Error inserting student: {e}")
    finally:
        # Always release the client's connection pool
        client.close()

# Usage example
insert_student('Alice', 20, 'A')
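A document should only be inserted if its fields make sense. Below is a minimal sketch of a client-side check. It assumes `grade` is a letter grade stored as a string, since the assignment does not specify its type. `build_student_document` is a hypothetical helper and is not part of PyMongo.

```python
def build_student_document(name, age, grade):
    """Validate inputs and return a dict ready for collection.insert_one()."""
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(age, bool) or not isinstance(age, int) or age <= 0:
        raise ValueError("age must be a positive integer")
    # Assumption: grade is a string such as "A" or "B+"
    if not isinstance(grade, str) or not grade.strip():
        raise ValueError("grade must be a non-empty string")
    return {"name": name.strip(), "age": age, "grade": grade.strip()}


# Usage: build and validate the document first, then pass it to insert_one()
student = build_student_document(" Alice ", 20, "A")
print(student)  # {'name': 'Alice', 'age': 20, 'grade': 'A'}
```

The returned dict can be passed straight to `collection.insert_one(...)`. When several programs write to the same collection, server-side validation with `$jsonSchema` is the sturdier option.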

**b. Implement a Python function that connects to a Cassandra database and inserts a new record into a table named "products". The record should contain fields like "id", "name", and "price". Handle any potential errors that may occur during the insertion.**

In [None]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

def insert_product_record(product_id, name, price):
    cluster = None
    try:
        # Connect to the Cassandra cluster with username/password authentication
        auth_provider = PlainTextAuthProvider(username='your_username', password='your_password')
        cluster = Cluster(['your_host'], auth_provider=auth_provider)
        session = cluster.connect('your_keyspace')

        # Use a prepared statement: the driver binds the values instead of pasting them
        # into the CQL string, so a quote in `name` cannot break (or inject into) the query
        insert_stmt = session.prepare("INSERT INTO products (id, name, price) VALUES (?, ?, ?)")
        session.execute(insert_stmt, (product_id, name, price))

        print("Record inserted successfully!")
    except Exception as e:
        print(f"Error inserting record: {e}")
    finally:
        # Shutting down the cluster also closes its sessions, even when the insert failed
        if cluster is not None:
            cluster.shutdown()

# Usage example
insert_product_record(1, 'Product A', 9.99)
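Catching every exception and printing it treats a momentary network blip the same way as a malformed query. Below is a minimal sketch of a generic retry helper. `with_retries` is hypothetical and is not part of the DataStax driver. With the driver, you might pass transient error types such as `cassandra.cluster.NoHostAvailable` or `cassandra.OperationTimedOut` as `retry_on`, and let errors like an invalid query fail immediately.

```python
import time


def with_retries(operation, attempts=3, base_delay=0.5, retry_on=(Exception,)):
    """Call operation(); retry on the given exception types with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage with a simulated flaky operation that succeeds on its third call
calls = {"count": 0}


def flaky_insert():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "inserted"


result = with_retries(flaky_insert, attempts=3, base_delay=0, retry_on=(ConnectionError,))
print(result, calls["count"])  # inserted 3
```

Retrying a plain INSERT is safe here. In Cassandra, an INSERT with the same primary key overwrites the existing row (it is an upsert), so a repeated attempt does not create a second record. The driver also ships configurable retry policies in `cassandra.policies` (for example `RetryPolicy`), which may be preferable for driver-level timeouts.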

**Q2. Document-oriented NoSQL Databases:**

**a. Given a MongoDB collection named "books", write a Python function that fetches all the books published in the last year and prints their titles and authors.**

In [None]:
from pymongo import MongoClient
from datetime import datetime, timedelta

def fetch_recent_books():
    try:
        # Connect to MongoDB; the `with` block closes the client even if an error occurs
        with MongoClient('mongodb://localhost:27017') as client:
            collection = client['your_database']['books']

            # Calculate the date one year ago from the current date
            # (assumes publish_date is stored as a BSON date, not as a string)
            one_year_ago = datetime.now() - timedelta(days=365)

            # Query books published in the last year, returning only title and author
            query = {"publish_date": {"$gte": one_year_ago}}
            projection = {"_id": 0, "title": 1, "author": 1}

            # Print the titles and authors of the fetched books
            for book in collection.find(query, projection):
                print(f"Title: {book.get('title')}")
                print(f"Author: {book.get('author')}")
                print("-------------------")
    except Exception as e:
        print(f"Error fetching recent books: {e}")

# Usage example
fetch_recent_books()
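`timedelta(days=365)` is one day short whenever the window spans a Feb 29, so "the last year" can drift slightly. The sketch below steps back one calendar year instead. `one_year_before` and `recent_books_query` are hypothetical helpers. Like the cell above, the sketch assumes `publish_date` is stored as a BSON date (a Python `datetime`). Dates stored as strings would be compared lexicographically.

```python
from datetime import datetime


def one_year_before(moment):
    """Return the same calendar date and time one year earlier."""
    try:
        return moment.replace(year=moment.year - 1)
    except ValueError:
        # Only Feb 29 fails here: the previous year is not a leap year
        return moment.replace(year=moment.year - 1, day=28)


def recent_books_query(now=None):
    """Filter for books whose publish_date falls within the last calendar year."""
    now = now or datetime.now()
    return {"publish_date": {"$gte": one_year_before(now)}}


print(one_year_before(datetime(2024, 2, 29)))  # 2023-02-28 00:00:00
print(recent_books_query(datetime(2024, 6, 1)))
```

The returned filter can be passed to `collection.find(...)` in place of the hand-built query.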

**b. Design a schema for a document-oriented NoSQL database to store customer information for an e-commerce platform. Write a Python program to insert a new customer document into the database and handle any necessary validations.**

In [None]:
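from pymongo import MongoClient
from pymongo.errors import DuplicateKeyError

def insert_customer(customer_data):
    client = MongoClient('mongodb://localhost:27017')
    try:
        collection = client['your_database']['customers']

        # Validate required fields before touching the database
        for field in ('customer_id', 'name', 'email'):
            if not customer_data.get(field):
                raise ValueError(f"{field} is required.")
        if '@' not in customer_data['email']:
            raise ValueError("email must contain '@'.")
        # Add more validations as per your requirements

        # Enforce one document per customer_id (creating an existing index is a no-op)
        collection.create_index('customer_id', unique=True)

        # Insert the customer document into the database
        collection.insert_one(customer_data)
        print("Customer inserted successfully.")
    except DuplicateKeyError:
        print(f"Customer {customer_data.get('customer_id')} already exists.")
    except Exception as e:
        print(f"Error inserting customer: {e}")
    finally:
        # Close the MongoDB connection whether or not the insert succeeded
        client.close()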

# Usage example
customer = {
    'customer_id': '123',
    'name': 'John Doe',
    'email': 'john@example.com',
    'address': '123 Main St',
    'phone': '123-456-7890'
}
insert_customer(customer)
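The program above validates on the client only. The schema itself can also be written down and enforced by the server. Below is a minimal sketch of one possible customer schema as a `$jsonSchema` validator. The field layout is a hypothetical design choice. In particular, `addresses` is an embedded array because addresses are usually read together with the customer. Fields not listed, such as the example's `address` string, are still allowed. `create_customers_collection` expects a PyMongo `Database`.

```python
import re

# Hypothetical customer schema for an e-commerce platform ($jsonSchema validator)
CUSTOMER_SCHEMA = {
    "bsonType": "object",
    "required": ["customer_id", "name", "email"],
    "properties": {
        "customer_id": {"bsonType": "string"},
        "name": {"bsonType": "string", "minLength": 1},
        "email": {"bsonType": "string", "pattern": r"^[^@\s]+@[^@\s]+$"},
        "phone": {"bsonType": "string"},
        "addresses": {
            "bsonType": "array",
            "items": {
                "bsonType": "object",
                "required": ["street", "city"],
                "properties": {
                    "street": {"bsonType": "string"},
                    "city": {"bsonType": "string"},
                    "postal_code": {"bsonType": "string"},
                },
            },
        },
    },
}


def create_customers_collection(db):
    """Create 'customers' with server-side validation and unique keys.

    Run once: create_collection raises if the collection already exists.
    """
    collection = db.create_collection("customers", validator={"$jsonSchema": CUSTOMER_SCHEMA})
    collection.create_index("customer_id", unique=True)
    collection.create_index("email", unique=True)
    return collection


# The same email pattern can be checked client-side before inserting
EMAIL_RE = re.compile(CUSTOMER_SCHEMA["properties"]["email"]["pattern"])
print(bool(EMAIL_RE.match("john@example.com")))  # True
```

With the validator in place, MongoDB rejects a document that is missing `customer_id`, `name`, or `email` at insert time, even if a client skips its own checks.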

**Q3. High Availability and Fault Tolerance:**

**a. Explain the concept of replica sets in MongoDB. Write a Python program that connects to a MongoDB replica set and retrieves the status of the primary and secondary nodes.**

In [None]:
from pymongo import MongoClient

def get_replica_set_status():
    try:
        # Connect to MongoDB replica set
        client = MongoClient('mongodb://localhost:27017,localhost:27018,localhost:27019/?replicaSet=my_replica_set')

        # Get the replica set status
        rs_status = client.admin.command('replSetGetStatus')

        # Print the status of each member
        for member in rs_status['members']:
            print(f"Member ID: {member['_id']}")
            print(f"Member Host: {member['name']}")
            print(f"Member State: {member['stateStr']}")
            print("")

        # Close the MongoDB connection
        client.close()
    except Exception as e:
        print(f"Error retrieving replica set status: {str(e)}")

# Usage example
get_replica_set_status()
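A replica set is a group of `mongod` processes that keep copies of the same data. One primary accepts all writes and records them in its oplog. The secondaries replicate that oplog. If the primary becomes unreachable, the remaining members elect a new primary, which is what provides high availability. The cell above prints every member. The sketch below separates the primary from the secondaries and could be fed the result of `client.admin.command('replSetGetStatus')` directly. `summarize_replica_set` is a hypothetical helper, and the sample input is made up in the shape of that command's `members` array.

```python
def summarize_replica_set(rs_status):
    """Split replSetGetStatus output into primary, secondaries and other members."""
    primary, secondaries, others = None, [], {}
    for member in rs_status.get("members", []):
        state = member.get("stateStr")
        if state == "PRIMARY":
            primary = member["name"]
        elif state == "SECONDARY":
            secondaries.append(member["name"])
        else:
            # e.g. ARBITER, RECOVERING, or members that are down
            others[member["name"]] = state
    return primary, secondaries, others


# Made-up sample shaped like the 'members' array of replSetGetStatus
sample_status = {
    "members": [
        {"_id": 0, "name": "localhost:27017", "stateStr": "PRIMARY"},
        {"_id": 1, "name": "localhost:27018", "stateStr": "SECONDARY"},
        {"_id": 2, "name": "localhost:27019", "stateStr": "RECOVERING"},
    ]
}
print(summarize_replica_set(sample_status))
```

PyMongo also exposes the driver's own view through the `client.primary` and `client.secondaries` properties, which avoids running an admin command.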

**b. Describe how Cassandra ensures high availability and fault tolerance in a distributed database system. Write a Python program that connects to a Cassandra cluster and fetches the status of the nodes.**

In [None]:
from cassandra.cluster import Cluster

def get_cluster_status():
    cluster = Cluster(['localhost'])  # Replace with the appropriate contact points
    try:
        # cluster.metadata is only populated once the driver connects,
        # so connect before asking it for hosts
        cluster.connect()

        # Print the driver's view of each node
        for host in cluster.metadata.all_hosts():
            status = {True: 'UP', False: 'DOWN'}.get(host.is_up, 'UNKNOWN')
            print(f"Host: {host.address}")
            print(f"Datacenter: {host.datacenter}")
            print(f"Rack: {host.rack}")
            print(f"Status: {status}")
            print("")
    except Exception as e:
        print(f"Error retrieving cluster status: {e}")
    finally:
        # Close all connections held by the driver
        cluster.shutdown()

# Usage example
get_cluster_status()
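Cassandra has no single master. Each partition is copied to `replication_factor` (RF) nodes chosen by the keyspace's replication strategy, and nodes learn about failures through gossip. A coordinator that cannot reach a replica stores a hint and replays it later (hinted handoff). How many failures a request survives depends on its consistency level. QUORUM means a majority of the partition's replicas, `RF // 2 + 1`. The arithmetic below is a minimal sketch of that trade-off; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Replicas that must respond for consistency level QUORUM."""
    return replication_factor // 2 + 1


def overlapping(replication_factor, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                      # 2
print(rf - q)                 # 1 replica can be down and QUORUM still succeeds
print(overlapping(rf, q, q))  # True: QUORUM writes + QUORUM reads see the latest write
print(overlapping(rf, 1, 1))  # False: ONE + ONE may read a stale replica
```

In the Python driver, the level is chosen per statement. For example, `SimpleStatement(query, consistency_level=ConsistencyLevel.QUORUM)` uses `SimpleStatement` from `cassandra.query` and `ConsistencyLevel` from `cassandra`.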

**Q4. Sharding in MongoDB:**

**a. Explain the concept of sharding in MongoDB and how it improves performance and scalability. Write a Python program that sets up sharding for a MongoDB cluster and inserts multiple documents into a sharded collection.**

In [None]:
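from pymongo import MongoClient

# Connect to a mongos router of the sharded cluster
# (sharding commands fail against a standalone mongod or a plain replica set)
client = MongoClient('mongodb://localhost:27017')

# Enable sharding for the database
# (required before MongoDB 6.0; later versions no longer need this step)
admin_db = client.admin
admin_db.command('enableSharding', 'mydatabase')

# Shard the collection on a ranged 'shard_key'; for an empty collection,
# MongoDB creates the supporting shard key index automatically
admin_db.command('shardCollection', 'mydatabase.mycollection', key={'shard_key': 1})
sharded_collection = client['mydatabase']['mycollection']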

# Insert multiple documents into the sharded collection
documents = [
    {'shard_key': 'A', 'data': 'Document 1'},
    {'shard_key': 'B', 'data': 'Document 2'},
    {'shard_key': 'C', 'data': 'Document 3'},
    # Add more documents as needed
]
sharded_collection.insert_many(documents)

# Print the documents in the sharded collection
for doc in sharded_collection.find():
    print(doc)

# Close the MongoDB connection
client.close()
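Sharding splits a collection into chunks, which are contiguous ranges of shard key values, and spreads those chunks across shards. The `mongos` router looks up which shard owns a key in the config servers' metadata. Queries on the shard key therefore touch a single shard, and capacity grows by adding shards. Below is a toy sketch of range-based routing. It is a simulation, not MongoDB code: `route_to_shard` and the split points are hypothetical, and it assumes one chunk per shard. A real cluster holds many chunks per shard, and the balancer moves them around.

```python
import bisect


def route_to_shard(shard_key, split_points, shards):
    """Return the shard owning shard_key under range-based sharding.

    split_points are sorted chunk boundaries; chunk i covers
    [split_points[i-1], split_points[i]) and lives on shards[i].
    """
    if len(shards) != len(split_points) + 1:
        raise ValueError("need exactly one shard per chunk")
    # bisect_right makes each chunk's lower bound inclusive and upper bound exclusive
    return shards[bisect.bisect_right(split_points, shard_key)]


splits = ["H", "P"]  # hypothetical chunk boundaries
shards = ["shard0", "shard1", "shard2"]
for key in ["A", "H", "M", "Z"]:
    print(key, "->", route_to_shard(key, splits, shards))
# A -> shard0, H -> shard1, M -> shard1, Z -> shard2
```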

**b. Design a sharding strategy for a social media application where user data needs to be distributed across multiple shards. Write a Python program to demonstrate how data is distributed and retrieved from the sharded cluster.**


In [None]:
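from pymongo import MongoClient

# Connect to a mongos router of the sharded cluster
client = MongoClient('mongodb://localhost:27017')

# Sharding strategy (run once during setup): a hashed shard key on the user id
# spreads monotonically increasing ids evenly across shards instead of sending
# every new user to the shard that owns the highest key range
client.admin.command('shardCollection', 'mydatabase.users', key={'_id': 'hashed'})

# Get a reference to the sharded collection
sharded_collection = client['mydatabase']['users']

# Insert user data; mongos routes each document to a shard by the hash of its _id
def insert_user(user_id, username, email):
    user = {
        '_id': user_id,
        'username': username,
        'email': email,
        # Other user data fields
    }
    # Upsert so that re-running the demo does not raise DuplicateKeyError
    sharded_collection.replace_one({'_id': user_id}, user, upsert=True)

# Query user data by the shard key, so mongos can target a single shard
def get_user(user_id):
    return sharded_collection.find_one({'_id': user_id})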

# Perform data distribution and retrieval operations
insert_user(1, 'user1', 'user1@example.com')
insert_user(2, 'user2', 'user2@example.com')
insert_user(3, 'user3', 'user3@example.com')

user1 = get_user(1)
user2 = get_user(2)
user3 = get_user(3)

# Print the retrieved user data
print(user1)
print(user2)
print(user3)

# Close the MongoDB connection
client.close()
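With a range shard key, sequential user ids all land in the chunk holding the highest values, so one shard absorbs every new sign-up. Hashing the key spreads them out. The toy simulation below shows the effect. It uses MD5 modulo the shard count purely for illustration, which is not how MongoDB assigns hashed keys. `shard_for_user` is hypothetical.

```python
import hashlib
from collections import Counter


def shard_for_user(user_id, num_shards):
    """Map a user id to a shard index through a stable hash (illustrative only)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Sequential ids, as an auto-incrementing user id would produce
counts = Counter(shard_for_user(uid, 3) for uid in range(1, 10_001))
print(sorted(counts.items()))  # roughly 3,333 users per shard
```

MongoDB instead maps hashed values onto chunk ranges, so adding a shard only migrates some chunks. A plain modulo like this one would move most keys whenever `num_shards` changes. Hashing has a cost: range queries on the user id, such as ids 1 to 100, must be broadcast to every shard.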

**Q5. Indexing in MongoDB:**

**a. Describe the concept of indexing in MongoDB and its importance in query optimization. Write a Python program that creates an index on a specific field in a MongoDB collection and executes a query using that index.**

In [None]:
from pymongo import MongoClient
from pymongo import DESCENDING

# Connect to the MongoDB instance
client = MongoClient('mongodb://localhost:27017')

# Get a reference to the MongoDB database and collection
db = client['mydatabase']
collection = db['mycollection']

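# Create a descending index on the 'name' field
# (MongoDB can walk a single-field index in either direction, so 1 or -1 both serve this query)
collection.create_index([('name', DESCENDING)])

# Execute an equality query on the indexed field
query = {'name': 'John'}
result = collection.find(query)

# Check that the planner used the index: the winning plan should contain IXSCAN, not COLLSCAN
print(collection.find(query).explain()['queryPlanner']['winningPlan'])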

# Print the query results
for doc in result:
    print(doc)

# Close the MongoDB connection
client.close()
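An index lets MongoDB look up matching keys in a sorted B-tree instead of reading every document (a collection scan). This matters most for selective queries on large collections. Whether a query actually used an index is visible in `explain()`. The sketch below walks the winning plan looking for an `IXSCAN` stage. `uses_index` is a hypothetical helper. The sample plans are hand-written in the shape of explain output, not captured from a server.

```python
def plan_stages(plan):
    """Yield every stage name in an explain() winningPlan tree."""
    yield plan.get("stage")
    if "inputStage" in plan:
        yield from plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from plan_stages(child)


def uses_index(explain_output):
    """True if the winning plan scans an index (IXSCAN) rather than the whole collection."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    # Servers using the slot-based engine nest the tree under 'queryPlan'
    winning = winning.get("queryPlan", winning)
    return "IXSCAN" in plan_stages(winning)


indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "name_-1"},
}}}
full_scan = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
print(uses_index(indexed), uses_index(full_scan))  # True False
```

Against a live server, `uses_index(collection.find({'name': 'John'}).explain())` should return `True` once the index above exists.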

**b. Given a MongoDB collection named "products", write a Python function that searches for products with a specific keyword in the name or description. Optimize the query by adding appropriate indexes.**


In [None]:
from pymongo import MongoClient, TEXT

# Connect once and reuse the client across searches
client = MongoClient('mongodb://localhost:27017')
collection = client['mydatabase']['products']

# Create one text index covering 'name' and 'description'
# (a collection can have only one text index, so do this once, outside the search function)
collection.create_index([('name', TEXT), ('description', TEXT)])

def search_products(keyword):
    # Search the text index and rank matches by relevance score
    query = {'$text': {'$search': keyword}}
    projection = {'score': {'$meta': 'textScore'}}
    result = collection.find(query, projection).sort([('score', {'$meta': 'textScore'})])

    # Print the query results
    for doc in result:
        print(doc)

# Usage example
search_products('laptop')

# Close the MongoDB connection
client.close()
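A text index matches whole words after stemming, so a search for `lapt` will not find "laptop". When partial matches matter, a case-insensitive `$regex` is a common fallback. A minimal sketch follows; `substring_search_query` is a hypothetical helper. The cost is speed: an unanchored, case-insensitive regex generally cannot use an index efficiently. Reserve it for small collections or combine it with other selective filters.

```python
import re


def substring_search_query(keyword, fields=("name", "description")):
    """Case-insensitive substring match on any of the given fields."""
    # Escape the keyword so characters like '.', '+' or '(' are matched literally
    pattern = re.escape(keyword)
    return {"$or": [{field: {"$regex": pattern, "$options": "i"}} for field in fields]}


query = substring_search_query("usb-c")
print(query)
```

The result is an ordinary filter, for example `collection.find(substring_search_query('usb-c'))`.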