### Q1. What is MongoDB? Explain non-relational databases in short. In which scenarios is it preferred to use MongoDB over SQL databases?

MongoDB is a popular document-oriented NoSQL database and falls under the category of non-relational databases. Non-relational (NoSQL) databases are database management systems that do not use the tabular, relational model that SQL databases are built on. Instead, they provide a more flexible and dynamic way to store and retrieve data, using one of several data models: document, key-value, column-family, or graph.

__Key characteristics of non-relational databases (e.g., MongoDB) are:__

__Schema Flexibility:__ Non-relational databases don't require a fixed schema, which means you can store data without a predefined structure. This flexibility is especially useful when dealing with unstructured or semi-structured data.

__Scalability:__ NoSQL databases are designed to scale out horizontally, which makes them suitable for handling large amounts of data and high traffic loads.

__High Performance:__ They often offer fast read and write operations, making them a good choice for applications that require low-latency responses.

__Replication and High Availability:__ Many NoSQL databases, including MongoDB, support data replication and provide mechanisms for ensuring high availability and fault tolerance.

__Diverse Data Models:__ NoSQL databases support various data models, allowing you to choose the one that best fits your specific use case. MongoDB, for example, is known for its document-based data model.
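
To make the schema-flexibility point concrete, here is a minimal sketch in plain Python (no database involved). The two records and their field names are made up for illustration; they only show that documents meant for the same collection need not share the same fields.

```python
import json

# Two hypothetical "customer" documents with different shapes.
# In a document store, both could live in the same collection.
customers = [
    {"name": "Asha", "phone": "5550100"},
    {"name": "Ravi", "email": "ravi@example.com", "addresses": [{"city": "Pune"}]},
]

# Each document carries its own structure, so readers must tolerate missing fields.
emails = [doc.get("email") for doc in customers]

# Nested documents serialize naturally to JSON-like text.
as_json = json.dumps(customers[1], sort_keys=True)

print(emails)
print(as_json)
```

The trade-off is that the application, rather than the database schema, becomes responsible for handling fields that may be absent.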


__Scenarios where MongoDB is preferred over traditional SQL databases (RDBMS) include:__

__Flexible Schema Requirements:__ If your application's data structure is subject to frequent changes or if you need to store semi-structured or unstructured data, MongoDB's schema-less design is more accommodating.

__Large Amounts of Data:__ MongoDB's horizontal scalability and support for sharding make it suitable for handling large datasets and high traffic loads, which can be challenging for traditional SQL databases.

__High Write Throughput:__ MongoDB is well-suited for applications that require high-speed writes, such as real-time analytics or data logging.

__JSON-Like Documents:__ When you work with data that can be represented in a JSON-like format, MongoDB's document-oriented data model can simplify development.

__Geo-Spatial Data:__ If your application needs to work with geospatial data or location-based services, MongoDB has built-in support for geospatial indexing and queries.

__Scalability Requirements:__ MongoDB can be easily distributed across multiple servers or cloud instances, providing scalability and redundancy.

### Q2. State and Explain the features of MongoDB.

__1.) Ad-hoc queries for optimized, real-time analytics__

When designing the schema of a database, it is impossible to know in advance all the queries that will be performed by end users. An ad hoc query is a short-lived command whose value depends on a variable. Each time an ad hoc query is executed, the result may be different, depending on the variables in question.

Optimizing the way ad-hoc queries are handled can make a significant difference at scale, when thousands to millions of variables may need to be considered. As a document-oriented database with a flexible schema, MongoDB is often chosen for enterprise applications that require real-time analytics, because developers can write and adjust ad-hoc queries as requirements change.

MongoDB supports field queries, range queries, and regular expression searches. Queries can return only specific fields (a projection) and can also include user-defined JavaScript functions. Queries are written in the MongoDB Query Language (MQL) and run against documents stored as BSON, using indexes where they exist.
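
The three query styles can be sketched as MQL-style filter dictionaries, which is how PyMongo expects them. The `matches` helper below is a hypothetical, pure-Python evaluator that understands only equality, `$gte`, `$lt`, and `$regex`; it is an illustration for in-memory dictionaries, not how MongoDB evaluates queries. The sample documents and the `amount` field are invented.

```python
import re

# Filter documents in MQL style, written as Python dicts.
field_query = {"payment_method": "CASH"}               # exact field match
range_query = {"amount": {"$gte": 100, "$lt": 500}}    # range on a numeric field
regex_query = {"last_name": {"$regex": "^S"}}          # regular expression search


def matches(doc, flt):
    """Toy evaluator: equality, $gte, $lt and $regex only (illustration)."""
    for field, cond in flt.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$gte" and not (value is not None and value >= arg):
                    return False
                if op == "$lt" and not (value is not None and value < arg):
                    return False
                if op == "$regex" and not (isinstance(value, str) and re.search(arg, value)):
                    return False
        elif value != cond:
            return False
    return True


docs = [
    {"last_name": "Shapiro", "payment_method": "CC", "amount": 250},
    {"last_name": "Nebel", "payment_method": "CASH", "amount": 80},
]

cash = [d["last_name"] for d in docs if matches(d, field_query)]
mid_range = [d["last_name"] for d in docs if matches(d, range_query)]
starts_with_s = [d["last_name"] for d in docs if matches(d, regex_query)]
```

With PyMongo, the same dictionaries would be passed directly to `find()`, for example `customer_collection.find(field_query)`.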


__2.) Indexing appropriately for better query executions__

In our experience, the number one issue that many technical support teams fail to address with their users is indexing. Done right, indexes are intended to improve search speed and performance. A failure to properly define appropriate indices can and usually will lead to a myriad of accessibility issues, such as problems with query execution and load balancing.

Without the right indices, a database is forced to scan documents one by one to identify the ones that match the query statement. But if an appropriate index exists for each query, user requests can be optimally executed by the server. MongoDB offers a broad range of indices and features with language-specific sort orders that support complex access patterns to datasets.

Notably, MongoDB indices can be created on demand to accommodate real-time, ever-changing query patterns and application requirements. They can also be declared on any field within any of your documents, including those nested within arrays.
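
The difference between scanning every document and using an index can be modeled with a plain Python dictionary. This is a conceptual sketch only: the documents are invented, and the dictionary stands in for an index without reflecting MongoDB's actual index structure.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "payment_method": "CASH"},
    {"_id": 2, "payment_method": "CC"},
    {"_id": 3, "payment_method": "CASH"},
]


def scan(documents, field, value):
    """Collection scan: inspect every document one by one."""
    return [d["_id"] for d in documents if d.get(field) == value]


def build_index(documents, field):
    """Toy index: field value -> list of matching _id values."""
    index = defaultdict(list)
    for d in documents:
        index[d.get(field)].append(d["_id"])
    return index


index = build_index(docs, "payment_method")
via_scan = scan(docs, "payment_method", "CASH")
via_index = index["CASH"]  # direct lookup, no pass over all documents
```

In PyMongo, an index on a field can be created with `customer_collection.create_index("payment_method")`.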


__3.) Replication for better data availability and stability__

When your data only resides in a single database, it is exposed to multiple potential points of failure, such as a server crash, service interruptions, or even good old hardware failure. Any of these events would make accessing your data nearly impossible.

Replication allows you to sidestep these vulnerabilities by deploying multiple servers for disaster recovery and backup. Horizontal scaling across multiple servers that house the same data (or shards of that same data) means greatly increased data availability and stability. Naturally, replication also helps with load balancing. When multiple users access the same data, the load can be distributed evenly across servers.

In MongoDB, replica sets are employed for this purpose. A primary server or node accepts all write operations and applies those same operations across secondary servers, replicating the data. If the primary server should ever experience a critical failure, any one of the secondary servers can be elected to become the new primary node. And if the former primary node comes back online, it does so as a secondary server for the new primary node.
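
The failover behaviour described above can be sketched with a toy model. `ToyReplicaSet` and the node names are hypothetical, and the promotion rule (pick the reachable node holding the most writes) is a deliberate simplification; real replica set elections involve voting among members and are not modeled here.

```python
class ToyReplicaSet:
    """Simplified model: one primary, the remaining nodes are secondaries."""

    def __init__(self, node_names):
        self.data = {name: [] for name in node_names}
        self.primary = node_names[0]
        self.down = set()

    def write(self, value):
        # Simplification: apply the write on every reachable node at once.
        for name in self.data:
            if name not in self.down:
                self.data[name].append(value)

    def fail_primary(self):
        # Mark the current primary as down and promote a reachable secondary.
        self.down.add(self.primary)
        candidates = [n for n in self.data if n not in self.down]
        # Simplification: promote the node holding the most writes.
        self.primary = max(candidates, key=lambda n: len(self.data[n]))


rs = ToyReplicaSet(["node-a", "node-b", "node-c"])
rs.write("order-1")
old_primary = rs.primary
rs.fail_primary()
rs.write("order-2")  # accepted by the new primary while node-a is down
```

After the failure, the old primary is missing the later write, which is why a returning node has to catch up as a secondary.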


__4.) Sharding__

When dealing with particularly large datasets, sharding (the process of splitting a large dataset across multiple servers, or <b><font color = 'blue'> "Shards" </font></b>) helps the database distribute and better execute what might otherwise be problematic and cumbersome queries. Without sharding, scaling a growing web application with millions of daily users becomes very difficult.

Like replication via replica sets, sharding in MongoDB allows for much greater horizontal scalability. Horizontal scaling here means that each shard in a cluster houses a portion of the dataset, essentially functioning as a separate database. Together, the shards form a single, comprehensive database that is much better suited to handling the needs of a popular, growing application with minimal downtime.

All operations in a sharded environment are routed through a lightweight process called <b><font color = 'blue'>mongos</font></b>.<br>
<b><font color = 'blue'>Mongos</font></b> can direct queries to the correct shard based on the shard key. Naturally, proper sharding also contributes significantly to better load balancing.
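
Routing by shard key can be illustrated with a small, self-contained sketch. The shard names, the `route` function, and the choice of `customer_id` as shard key are assumptions made for this example. MongoDB itself assigns ranges of (ranged or hashed) shard key values to shards; the hash-and-modulo rule below is only a simplified stand-in for that idea.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def route(shard_key_value, shards=SHARDS):
    """Toy router: hash the shard key value and map it to one shard."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Place a few documents using customer_id as the (assumed) shard key.
placement = {}
for customer_id in range(1, 7):
    placement.setdefault(route(customer_id), []).append(customer_id)

# The same key value is always routed to the same shard.
same_target = route(42) == route(42)
```

Because routing depends only on the shard key value, a query that includes the shard key can be sent to a single shard instead of all of them.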


__5.) Load balancing__

At the end of the day, optimal load balancing remains one of the holy grails of large-scale database management for growing enterprise applications. Properly distributing millions of client requests to hundreds or thousands of servers can lead to a noticeable (and much appreciated) difference in performance.

Fortunately, via horizontal scaling features like replication and sharding, MongoDB supports large-scale load balancing. The platform can handle multiple concurrent read and write requests for the same data, with concurrency control and locking protocols that keep the data consistent. In many deployments this removes the need for a separate external load balancer, since the database itself gives every user a consistent view of the data they need to access.

### Q3. Write a code to connect MongoDB to Python. Also, create a database and a collection in MongoDB.

In [3]:
import pymongo


In [4]:
# Read the connection string from a local file so credentials stay out of the notebook
with open(r"/home/jovyan/work/mongoDB_conn_str.txt") as fh:
    connection_string = fh.read().strip()

    
# Create a new client and connect to the server
client = pymongo.MongoClient(connection_string)

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")

except Exception as e:
    print("Traceback >>\n",e)


# Get a handle to the database 'wallmart'; MongoDB creates it lazily on the first write
db = client['wallmart']

# Get a handle to the collection 'customer' inside 'wallmart' (also created on the first insert)
customer_collection = db['customer']

Pinged your deployment. You successfully connected to MongoDB!


### Q4. Using the database and the collection created in question number 3, write a code to insert one record, and insert many records. Use the find() and find_one() methods to print the inserted record.

In [14]:
cust_record = {
    'first_name' : 'Daniel',
    'middle_name' : None,
    'last_name': 'Butcher',
    'phone' : "4587855663",
    'receipt_number' : "Mart-M-85236P-Digital",
    'payment_method' : 'Debit Card'
}

# insert_one() adds a single document to the collection
customer_collection.insert_one(cust_record)

many_customer_records = [
    {
        'first_name' : 'Sophia',
        'middle_name' : None,
        'last_name': 'Nebel',
        'phone' : "3647139856",
        'receipt_number' : "Mart-J-331J0202",
        'payment_method' : 'CASH'
    },

    {
        'first_name' : 'Nickolas',
        'middle_name' : None,
        'last_name': 'Shapiro',
        'phone' : "665544771",
        'receipt_number' : "Mart-L-ef685k6-Digital",
        'payment_method' : 'CC'
    },

    {
        'first_name' : 'Blake',
        'middle_name' : None,
        'last_name': 'Cunningham',
        'phone' : "666654245",
        'receipt_number' : "Mart-L-310245",
        'payment_method' : 'CASH'
    }
]

customer_collection.insert_many(many_customer_records)

<pymongo.results.InsertManyResult at 0x7f015811af50>

In [15]:
for doc in customer_collection.find():
    print(type(doc))
    print(doc)    
    print("\n")

<class 'dict'>
{'_id': ObjectId('653118fd1fb5e7869e1fb5e9'), 'first_name': 'Adarsh', 'middle_name': None, 'last_name': 'Namdev', 'phone': '7507157178', 'receipt_number': 'Mart-M-711458-Digital', 'payment_method': 'UPI'}


<class 'dict'>
{'_id': ObjectId('65311b45f46b68d5939c2094'), 'first_name': 'Abhay', 'middle_name': None, 'last_name': 'Verma', 'phone': '267478524', 'receipt_number': 'Mart-M-711458', 'payment_method': 'CASH'}


<class 'dict'>
{'_id': ObjectId('65311b45f46b68d5939c2093'), 'first_name': 'Mercedes', 'middle_name': 'Ann', 'last_name': 'Tyler', 'phone': '4584566', 'receipt_number': 'Mart-L-1er58-Digital', 'payment_method': 'CC'}


<class 'dict'>
{'_id': ObjectId('6533d92b4e4a0e6665c8334a'), 'first_name': 'Daniel', 'middle_name': None, 'last_name': 'Butcher', 'phone': '4587855663', 'receipt_number': 'Mart-M-85236P-Digital', 'payment_method': 'Debit Card'}


<class 'dict'>
{'_id': ObjectId('6533e1714e4a0e6665c83353'), 'first_name': 'Daniel', 'middle_name': None, 'last_name'

In [16]:
customer_collection.find_one({'payment_method': 'CASH'})

{'_id': ObjectId('65311b45f46b68d5939c2094'),
 'first_name': 'Abhay',
 'middle_name': None,
 'last_name': 'Verma',
 'phone': '267478524',
 'receipt_number': 'Mart-M-711458',
 'payment_method': 'CASH'}

### Q5. Explain how you can use the find() method to query the MongoDB database. Write a simple code to demonstrate this.

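In short, the `find()` method returns a cursor over the documents that match its `filter` argument, a dictionary describing the query criteria. If no filter is given, `find()` matches every document in the collection. The cursor fetches results from the server in batches (the first batch holds up to 101 documents by default), and methods such as `limit()`, `skip()`, and `sort()` can be chained onto it to shape the result.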

In [18]:
# fetching only the first 3 documents
for document in customer_collection.find().limit(3):
    print(document)

{'_id': ObjectId('653118fd1fb5e7869e1fb5e9'), 'first_name': 'Adarsh', 'middle_name': None, 'last_name': 'Namdev', 'phone': '7507157178', 'receipt_number': 'Mart-M-711458-Digital', 'payment_method': 'UPI'}
{'_id': ObjectId('65311b45f46b68d5939c2094'), 'first_name': 'Abhay', 'middle_name': None, 'last_name': 'Verma', 'phone': '267478524', 'receipt_number': 'Mart-M-711458', 'payment_method': 'CASH'}
{'_id': ObjectId('65311b45f46b68d5939c2093'), 'first_name': 'Mercedes', 'middle_name': 'Ann', 'last_name': 'Tyler', 'phone': '4584566', 'receipt_number': 'Mart-L-1er58-Digital', 'payment_method': 'CC'}


In [5]:
# skip the first 5 documents, then fetch the next 3
for document in customer_collection.find().skip(5).limit(3):
    print(document)

{'_id': ObjectId('6533e1714e4a0e6665c83354'), 'first_name': 'Sophia', 'middle_name': None, 'last_name': 'Nebel', 'phone': '3647139856', 'receipt_number': 'Mart-J-331J0202', 'payment_method': 'CASH'}
{'_id': ObjectId('6533e1714e4a0e6665c83355'), 'first_name': 'Nickolas', 'middle_name': None, 'last_name': 'Shapiro', 'phone': '665544771', 'receipt_number': 'Mart-L-ef685k6-Digital', 'payment_method': 'CC'}
{'_id': ObjectId('6533e1714e4a0e6665c83356'), 'first_name': 'Blake', 'middle_name': None, 'last_name': 'Cunningham', 'phone': '666654245', 'receipt_number': 'Mart-L-310245', 'payment_method': 'CASH'}
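
A common use of `skip()` and `limit()` together is pagination. The sketch below uses a hypothetical helper, `page_window`, and an in-memory list as a stand-in for the collection, so it runs without a database; list slicing plays the role of the skip/limit window.

```python
def page_window(page_number, page_size):
    """Return (skip, limit) for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be >= 1")
    return (page_number - 1) * page_size, page_size


# In-memory stand-in for eight documents in a collection.
docs = [{"n": i} for i in range(1, 9)]

skip, limit = page_window(page_number=2, page_size=3)
page = docs[skip:skip + limit]  # same window as find().skip(skip).limit(limit)
```

With PyMongo, the two values would be passed as `customer_collection.find().skip(skip).limit(limit)`. For stable pages it is usually worth adding a `sort()`, since results without one have no guaranteed order.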


### Q6. Explain the sort() method. Give an example to demonstrate sorting in MongoDB.

The `sort()` method is chained onto the cursor returned by `find()` and orders the results by a field, either ascending (`pymongo.ASCENDING`, i.e. `1`) or descending (`pymongo.DESCENDING`, i.e. `-1`). In this collection `phone` is stored as a string, so the documents are ordered character by character rather than by numeric value.

In [11]:
for document in customer_collection.find().sort('phone', pymongo.ASCENDING):
    print(document)

{'_id': ObjectId('65311b45f46b68d5939c2094'), 'first_name': 'Abhay', 'middle_name': None, 'last_name': 'Verma', 'phone': '267478524', 'receipt_number': 'Mart-M-711458', 'payment_method': 'CASH'}
{'_id': ObjectId('6533e1714e4a0e6665c83354'), 'first_name': 'Sophia', 'middle_name': None, 'last_name': 'Nebel', 'phone': '3647139856', 'receipt_number': 'Mart-J-331J0202', 'payment_method': 'CASH'}
{'_id': ObjectId('65311b45f46b68d5939c2093'), 'first_name': 'Mercedes', 'middle_name': 'Ann', 'last_name': 'Tyler', 'phone': '4584566', 'receipt_number': 'Mart-L-1er58-Digital', 'payment_method': 'CC'}
{'_id': ObjectId('6533d92b4e4a0e6665c8334a'), 'first_name': 'Daniel', 'middle_name': None, 'last_name': 'Butcher', 'phone': '4587855663', 'receipt_number': 'Mart-M-85236P-Digital', 'payment_method': 'Debit Card'}
{'_id': ObjectId('6533e1714e4a0e6665c83353'), 'first_name': 'Daniel', 'middle_name': None, 'last_name': 'Butcher', 'phone': '4587855663', 'receipt_number': 'Mart-M-85236P-Digital', 'payment_m

In [18]:
for document in customer_collection.find().sort('phone', pymongo.DESCENDING):
    print(document)

{'_id': ObjectId('653118fd1fb5e7869e1fb5e9'), 'first_name': 'Adarsh', 'middle_name': None, 'last_name': 'Namdev', 'phone': '7507157178', 'receipt_number': 'Mart-M-711458-Digital', 'payment_method': 'UPI'}
{'_id': ObjectId('6533e1714e4a0e6665c83356'), 'first_name': 'Blake', 'middle_name': None, 'last_name': 'Cunningham', 'phone': '666654245', 'receipt_number': 'Mart-L-310245', 'payment_method': 'CASH'}
{'_id': ObjectId('6533e1714e4a0e6665c83355'), 'first_name': 'Nickolas', 'middle_name': None, 'last_name': 'Shapiro', 'phone': '665544771', 'receipt_number': 'Mart-L-ef685k6-Digital', 'payment_method': 'CC'}
{'_id': ObjectId('6533d92b4e4a0e6665c8334a'), 'first_name': 'Daniel', 'middle_name': None, 'last_name': 'Butcher', 'phone': '4587855663', 'receipt_number': 'Mart-M-85236P-Digital', 'payment_method': 'Debit Card'}
{'_id': ObjectId('6533e1714e4a0e6665c83353'), 'first_name': 'Daniel', 'middle_name': None, 'last_name': 'Butcher', 'phone': '4587855663', 'receipt_number': 'Mart-M-85236P-Digi
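
Because the `phone` values in these documents are strings, the ordering shown above is lexicographic. The following plain-Python sketch, using phone values taken from the outputs above, contrasts string ordering with numeric ordering; it illustrates the comparison rule only and does not query the database.

```python
phones = ["7507157178", "267478524", "4584566", "4587855663", "3647139856"]

# Strings compare character by character, as in the sorted output above.
as_strings = sorted(phones)

# Converting to int compares by numeric value instead.
as_numbers = sorted(phones, key=int)

print(as_strings)
print(as_numbers)
```

If numeric ordering is what the application needs, storing the field as a number rather than a string is one way to get it from `sort()`.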

### Q7. Explain why `delete_one()`, `delete_many()`, and `drop()` are used.

`delete_one()`:

This method deletes a single document (or record): the first one that matches the specified condition or filter.

You use `delete_one()` when you want to remove a specific document from a collection based on a particular criterion, such as deleting a user by their unique identifier or removing a specific order by order ID.


`delete_many()`:

This method is used to delete multiple documents that match a specified condition or filter.

`delete_many()` is employed when you want to delete several documents that meet a certain condition, such as deleting all users who have not logged in for the past year or removing all completed orders from a database.


`drop()`:

The `drop()` method is used to remove an entire collection (equivalent to a table in a relational database) from the database. It effectively deletes all documents in the collection and the collection itself.

`drop()` is typically used when you want to completely discard a collection and all its data, often as part of administrative or cleanup tasks. This should be used with caution as it's a destructive operation.
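
The difference between the three operations can be sketched on an in-memory list. The `delete_one` and `delete_many` functions below are hypothetical stand-ins that support equality filters only; they model the semantics and are not PyMongo's implementation. The sample documents reuse names from the earlier cells.

```python
def _matches(doc, flt):
    """True if the document has every key/value pair in the filter."""
    return all(doc.get(k) == v for k, v in flt.items())


def delete_one(documents, flt):
    """Remove the first matching document; return how many were removed (0 or 1)."""
    for i, doc in enumerate(documents):
        if _matches(doc, flt):
            del documents[i]
            return 1
    return 0


def delete_many(documents, flt):
    """Remove all matching documents in place; return how many were removed."""
    keep = [d for d in documents if not _matches(d, flt)]
    removed = len(documents) - len(keep)
    documents[:] = keep
    return removed


collection = [
    {"name": "Sophia", "payment_method": "CASH"},
    {"name": "Nickolas", "payment_method": "CC"},
    {"name": "Blake", "payment_method": "CASH"},
    {"name": "Abhay", "payment_method": "CASH"},
]

removed_one = delete_one(collection, {"payment_method": "CASH"})    # removes Sophia only
removed_many = delete_many(collection, {"payment_method": "CASH"})  # removes Blake and Abhay
remaining = [d["name"] for d in collection]

# Rough analogue of drop(): every document is gone afterwards
# (a real drop() also removes the collection itself and its indexes).
collection.clear()
```

In PyMongo the corresponding calls are `customer_collection.delete_one(filter)`, `customer_collection.delete_many(filter)`, and `customer_collection.drop()`; the two delete methods return a result object whose `deleted_count` attribute reports how many documents were removed.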