Q1. What is MongoDB? Explain non-relational databases in short. In which scenarios is it preferred to use
MongoDB over SQL databases?

MongoDB is a non-relational document database that stores data as JSON-like documents. Its flexible data model lets you store unstructured data, and it provides full indexing support, replication, and rich, intuitive APIs.

A non-relational database is a database that does not use the tabular schema of rows and columns found in most traditional database systems. Instead, non-relational databases use a storage model that is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.

NoSQL databases like MongoDB are a good choice when your data is document-centric and doesn’t fit well into the schema of a relational database, when you need to accommodate massive scale, when you are rapidly prototyping, and a few other use cases.

Q2. State and Explain the features of MongoDB.

Document Model

MongoDB has been designed with developer productivity and flexibility in mind. It is a document-oriented database, which means that data is stored as documents, and documents are grouped in collections. The document model is a lot more natural for developers to work with because documents are self-contained and can be treated as objects. This means that developers can focus on the data they need to store and process, rather than worrying about how to split the data across different rigid tables.

Sharding
Sharding is the process of splitting larger datasets across multiple distributed instances, or “shards.” When applied to particularly large datasets, sharding helps the database distribute and better execute what might otherwise be problematic and cumbersome queries. Without sharding, scaling a growing web application with millions of daily users is nearly impossible.

Sharding in MongoDB allows for much greater horizontal scalability. Horizontal scaling means that each shard in every cluster houses a portion of the dataset in question, essentially functioning as a separate database. Combining the data of the distributed shards forms a single, comprehensive database much better suited to handling the needs of a popular, growing application with zero downtime.

All client operations in a sharding environment are handled through a lightweight process called mongos. The mongos can direct queries to the correct shard based on the shard key. Proper sharding also contributes significantly to better load balancing.
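The routing idea can be illustrated with a toy sketch. This is not how MongoDB itself places data (MongoDB splits shard key ranges into chunks and balances those across shards); the function name, the modulo scheme, and the user ids below are assumptions made purely for illustration.

```python
import hashlib


def pick_shard(shard_key_value: str, num_shards: int) -> int:
    """Map a shard key value to a shard index using a stable hash (toy scheme)."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Distribute a few hypothetical user ids across three hypothetical shards.
shards = {0: [], 1: [], 2: []}
for user_id in ["u1001", "u1002", "u1003", "u1004", "u1005", "u1006"]:
    shards[pick_shard(user_id, 3)].append(user_id)
```

Because the hash is deterministic, a router that knows the shard key can send a query for one user to a single shard instead of asking every shard.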


Replication
When your data only resides in a single server, it is exposed to multiple potential points of failure, such as a server crash, service interruptions, or even good old hardware failure. Any of these events would make accessing your data nearly impossible.

Replication allows you to sidestep these vulnerabilities by deploying multiple servers for disaster recovery and backup. Horizontal scaling across multiple servers greatly increases data availability, reliability, and fault tolerance. Potentially, replication can help spread the read load to the secondary members of the replica set with the use of read preference.

In MongoDB, replica sets are employed for this purpose. A primary server or node accepts all write operations and applies those same operations across secondary servers, replicating the data. If the primary server should ever experience a critical failure, any one of the secondary servers can be elected to become the new primary node. And if the former primary node comes back online, it does so as a secondary server for the new primary node.
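From the application side, connecting to a replica set and allowing reads from secondaries could look roughly like the sketch below. The host names and the replica set name `rs0` are hypothetical; `replicaSet` and `readPreference` are standard connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string that lists the replica set members."""
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Hypothetical three-member replica set.
uri = replica_set_uri(
    ["db0.example.com:27017", "db1.example.com:27017", "db2.example.com:27017"],
    "rs0",
)
# The URI would then be passed to the driver, e.g. pymongo.MongoClient(uri).
```

With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise; writes always go to the primary.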

MongoDB Atlas, MongoDB’s DBaaS (Database-as-a-Service) platform, uses replica sets with a minimum of three members. They can span multiple regions or even multiple cloud providers of your choice.


Authentication
Authentication is a critical security feature in MongoDB. Authentication ensures that only authorized users can access the database. Without authentication, anyone can access your data.

MongoDB provides a number of authentication mechanisms for users to access the database. The most common is the Salted Challenge Response Authentication Mechanism (SCRAM), which is the default. When used, SCRAM requires the user to provide an authentication database, username, and password.
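A connection string that supplies those three pieces of information might be assembled as in this sketch. The user name, password, and host are made up for illustration; special characters in the credentials are percent-escaped with `quote_plus`, as the PyMongo documentation advises.

```python
from urllib.parse import quote_plus


def scram_uri(username, password, host, auth_db="admin"):
    """Build a URI with escaped credentials and an explicit authentication database."""
    return "mongodb://%s:%s@%s/?authSource=%s&authMechanism=SCRAM-SHA-256" % (
        quote_plus(username),
        quote_plus(password),
        host,
        quote_plus(auth_db),
    )


# Hypothetical credentials; never hard-code real passwords in a notebook.
uri = scram_uri("app_user", "p@ss/word", "localhost:27017")
# The URI would then be passed to the driver, e.g. pymongo.MongoClient(uri).
```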


Database Triggers
Database triggers in MongoDB Atlas are a powerful feature that allow you to execute code when certain events occur in your database. For example, you can use triggers to execute a script when a document is inserted, updated, or deleted. Triggers can also be scheduled to execute at specific times.

MongoDB Atlas allows you to create and manage triggers in a simple, intuitive way. You can control your triggers through the Atlas UI.

Database triggers are a great way to perform audits, ensure data consistency and data integrity, and to perform complex event processing.

Time Series Data
Time series data is most commonly generated by a device, such as a sensor, that records data over time. The data is stored in a collection of documents, each of which contains a timestamp and a value. MongoDB provides a number of features to help you manage time series data.

The native time series collections in MongoDB are designed to be storage-efficient and perform well with sequences of measurements. You have a number of parameters to control the storage of time series data, including the granularity (the time span between measurements) and the expiration threshold of old data.
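As a rough sketch, a time series collection could be created from PyMongo as follows. It assumes MongoDB 5.0 or later and a `db` object like the one created in Q3; the collection and field names are hypothetical.

```python
def timeseries_options(time_field, meta_field, granularity="minutes",
                       expire_after_seconds=None):
    """Assemble the keyword arguments for Database.create_collection()."""
    kwargs = {
        "timeseries": {
            "timeField": time_field,     # field holding each measurement's timestamp
            "metaField": meta_field,     # field identifying the source, e.g. a sensor
            "granularity": granularity,  # "seconds", "minutes", or "hours"
        }
    }
    if expire_after_seconds is not None:
        # Documents older than this many seconds are removed automatically.
        kwargs["expireAfterSeconds"] = expire_after_seconds
    return kwargs


def create_timeseries_collection(db, name, **options):
    """Create the collection on a PyMongo Database object."""
    return db.create_collection(name, **options)
```

A call such as `create_timeseries_collection(db, "sensor_readings", **timeseries_options("timestamp", "sensor_id", "minutes", 86400))` would keep one day of readings.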


Ad-Hoc Queries
When designing the schema of a database, it is impossible to know in advance all the queries that will be performed by end users. An ad-hoc query is a short-lived command whose value depends on a variable. Each time an ad-hoc query is executed, the result may be different, depending on the variables in question.

Optimizing the way in which ad-hoc queries are handled can make a significant difference at scale, when thousands to millions of variables may need to be considered. This is why MongoDB, a document-oriented, flexible schema database, stands apart as the cloud database platform of choice for enterprise applications that require real-time analytics. With ad-hoc query support that allows developers to update ad-hoc queries in real time, the improvement in performance can be game-changing.

MongoDB supports field queries, geo queries, and regular expression searches. Queries can return specific fields and also account for user-defined functions. This is made possible with MongoDB indexes, BSON documents, and the MongoDB Query Language (MQL). MongoDB also supports aggregations via the Aggregation Framework.
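To make this concrete, here is a small sketch of a regular expression query and an aggregation. It assumes documents shaped like those inserted in Q3 (with `Name` and `Course` fields) and a PyMongo collection object passed in as `collection`; the helper names are illustrative.

```python
import re


def name_prefix_filter(prefix):
    """Field query with a regular expression: names that start with `prefix`."""
    return {"Name": {"$regex": "^" + re.escape(prefix)}}


def count_by_course_pipeline():
    """Aggregation pipeline: count documents per course, largest group first."""
    return [
        {"$group": {"_id": "$Course", "students": {"$sum": 1}}},
        {"$sort": {"students": -1}},
    ]


def run_queries(collection, prefix):
    """Run both queries; the projection returns only the Name field."""
    matches = list(collection.find(name_prefix_filter(prefix), {"_id": 0, "Name": 1}))
    counts = list(collection.aggregate(count_by_course_pipeline()))
    return matches, counts
```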


Indexing
In our experience, the number one issue that many technical support teams fail to address with their users is indexing. Done right, indexes are intended to improve search speed and performance. A failure to properly define appropriate indexes can and usually will lead to a myriad of accessibility issues, such as problems with query execution and load balancing.

Without the right indexes, a database is forced to scan documents one by one to identify the ones that match the query statement. But if an appropriate index exists for each query, user requests can be optimally executed by the server. MongoDB offers a broad range of indexes and features with language-specific sort orders that support complex access patterns to datasets.

Notably, MongoDB indexes can be created on demand to accommodate real-time, ever-changing query patterns and application requirements. They can also be declared on any field within any of your documents, including those nested within arrays.
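In PyMongo, indexes are created with `create_index()`. The sketch below assumes a collection object like the one from Q3; the nested field `address.city` is hypothetical and only shows the dot notation for embedded fields.

```python
def index_specs():
    """Key specifications: 1 means ascending, -1 means descending."""
    return [
        [("Name", 1)],                  # single-field index
        [("Course", 1), ("Name", -1)],  # compound index
        [("address.city", 1)],          # index on a nested (embedded) field
    ]


def ensure_indexes(collection):
    """Create each index and return the index names reported by the server."""
    return [collection.create_index(keys) for keys in index_specs()]
```

Creating an index that already exists with the same definition is harmless, so a helper like this can be run whenever the application starts.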


Q3. Write a code to connect MongoDB to Python. Also, create a database and a collection in MongoDB.

In [2]:
pip install pymongo

Note: you may need to restart the kernel to use updated packages.


In [3]:
import pymongo
# Connect to the cluster; the connection string carries the username and password
client = pymongo.MongoClient("mongodb+srv://pwskills:pwskills@cluster0.5bwjf8n.mongodb.net/?retryWrites=true&w=majority")


In [4]:
#Create Database
db=client['PWSkills']


In [9]:
#Create Collection ~ Table
collection=db['My_Collection']

In [16]:
data={ 'Name' : "Meena" ,"Course" :"Data Science"}

In [9]:
data

{'Name': 'Lalitha', 'Course': 'Data Science'}

In [17]:
collection.insert_one(data)

<pymongo.results.InsertOneResult at 0x7f5388723880>

In [11]:
collection.find_one()

{'_id': ObjectId('63fcbd070ddcd7815398b6c9'),
 'Name': 'Lalitha',
 'Course': 'Data Science'}
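Note that MongoDB creates a database and a collection lazily: they only appear on the server after the first document has been written. A small sketch for checking this, assuming the `client` and `db` objects from the cells above (the helper names are illustrative):

```python
def database_exists(client, db_name):
    """True once the database holds data and is listed by the server."""
    return db_name in client.list_database_names()


def collection_exists(db, collection_name):
    """True once the collection has been created, e.g. by the first insert."""
    return collection_name in db.list_collection_names()
```

After the `insert_one()` call above, `database_exists(client, 'PWSkills')` and `collection_exists(db, 'My_Collection')` would both be expected to return True.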

Q4. Using the database and the collection created in question number 3, write a code to insert one record,
and insert many records. Use the find() and find_one() methods to print the inserted record.

In [11]:
data1={ 'Name' : "Lalitha" ,"Course" :"Data Science"}
data2={ 'Name' : "Lalitha" ,"Course" :"Data Science"}
data3={ 'Name' : "Lalitha" ,"Course" :"Data Science"}

In [12]:
my_list=[data1,data2,data3]

In [13]:
collection.insert_many(my_list)

<pymongo.results.InsertManyResult at 0x7f5389437c10>

In [16]:
collection.find()

<pymongo.cursor.Cursor at 0x7fb7dd39ca00>

In [17]:
for i in collection.find() :
    print(i)

{'_id': ObjectId('63fcbd070ddcd7815398b6c9'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6ca'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cb'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cc'), 'Name': 'Lalitha', 'Course': 'Data Science'}


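In [18]:
# find_one() returns a single document (a dict), not a cursor, so print it directly
print(collection.find_one())

{'_id': ObjectId('63fcbd070ddcd7815398b6c9'), 'Name': 'Lalitha', 'Course': 'Data Science'}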


Q5. Explain how you can use the find() method to query the MongoDB database. Write a simple code to
demonstrate this.

In MongoDB, the find() method selects documents in a collection and returns a cursor to the selected documents. A cursor is a pointer to the result set of a query: find() does not return the documents themselves, and iterating the cursor yields the matching documents one by one. All of the method's parameters are optional. The first is the selection criteria (a filter) that documents must match; to return all documents in a collection, omit it or pass an empty document ({}). The method can also query fields inside embedded documents, and it can be used inside multi-document transactions.

If you call find() in the mongo shell, the shell automatically iterates the cursor to display up to the first 20 documents; type it to continue, or iterate manually by assigning the returned cursor to a variable with the var keyword. In PyMongo, you iterate the cursor with a for loop. You can also modify the behavior of this method using cursor methods such as sort() and limit().

db.Collection_name.find(selection_criteria, projection,options)

selection_criteria: It specifies selection criteria. To return all documents in a collection use empty document({}). The type of this parameter is document.

projection: It specifies the fields to return in the documents that match the selection criteria. To return all fields in the matching documents, omit this parameter. It is of the document type.

options: It specifies additional options for the query. These modify the behavior of the query and affect how the results are returned.

In [6]:
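# Use the collection object from Q3; db.collection would refer to a collection literally named "collection"
collection.find()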

<pymongo.cursor.Cursor at 0x7f5388650310>

Find all the documents present in the collection by passing an empty document:

In [18]:
for i in collection.find({}) :
    print(i)

{'_id': ObjectId('63fcbd070ddcd7815398b6c9'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6ca'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cb'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cc'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd48'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd49'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd4a'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc084d30cbe40fc3fdd4b'), 'Name': 'Meena', 'Course': 'Data Science'}


Find all the documents that match the given filter query:

In [20]:
for i in collection.find({'Name' : "Meena"}) :
    print(i)

{'_id': ObjectId('63fcc084d30cbe40fc3fdd4b'), 'Name': 'Meena', 'Course': 'Data Science'}
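The filter can also be combined with a projection and a cursor method. The following is a minimal sketch, assuming the same `collection` object; the helper names are illustrative.

```python
def find_args(course):
    """Build the filter and projection documents passed to find()."""
    query = {"Course": course}          # selection criteria
    projection = {"_id": 0, "Name": 1}  # return only the Name field
    return query, projection


def first_names(collection, course, limit=2):
    """Return the names of at most `limit` documents for the given course."""
    query, projection = find_args(course)
    cursor = collection.find(query, projection).limit(limit)
    return [doc["Name"] for doc in cursor]
```

For example, `first_names(collection, "Data Science")` would return the `Name` values of at most two matching documents.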


Q6. Explain the sort() method. Give an example to demonstrate sorting in MongoDB.

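The sort() method specifies the order in which a query returns the matching documents. In PyMongo it is called on the cursor returned by find() and takes a field name and a direction, either pymongo.ASCENDING (1, the default) or pymongo.DESCENDING (-1). A list of (field, direction) pairs sorts by several fields.

The following example uses the collection created in Q3. It first prints the documents in their natural order and then sorted by the "Name" field in ascending order.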

In [21]:
for i in collection.find({}) :
    print(i)

{'_id': ObjectId('63fcbd070ddcd7815398b6c9'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6ca'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cb'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cc'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd48'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd49'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd4a'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc084d30cbe40fc3fdd4b'), 'Name': 'Meena', 'Course': 'Data Science'}


In [36]:
for i in collection.find().sort("Name") :
    print(i)
    

{'_id': ObjectId('63fcbd070ddcd7815398b6c9'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6ca'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cb'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcbdd60ddcd7815398b6cc'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd48'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd49'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc04ed30cbe40fc3fdd4a'), 'Name': 'Lalitha', 'Course': 'Data Science'}
{'_id': ObjectId('63fcc084d30cbe40fc3fdd4b'), 'Name': 'Meena', 'Course': 'Data Science'}
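Sorting in descending order, or by more than one field, could be sketched as follows. The sketch assumes the same `collection` object; the constants mirror `pymongo.ASCENDING` and `pymongo.DESCENDING`, and the helper names are illustrative.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def names_descending(collection):
    """Return all names sorted from Z to A."""
    cursor = collection.find({}, {"_id": 0, "Name": 1}).sort("Name", DESCENDING)
    return [doc["Name"] for doc in cursor]


def by_course_then_name(collection):
    """Sort by Course ascending, then by Name descending within each course."""
    spec = [("Course", ASCENDING), ("Name", DESCENDING)]
    return list(collection.find({}).sort(spec))
```

With the documents shown above, `names_descending(collection)` would list "Meena" before the "Lalitha" entries.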


Q7. Explain why delete_one(), delete_many(), and drop() are used.

delete_many()
delete_many() is used when more than one document needs to be deleted. A query object (filter) describing which documents to delete is passed as the first parameter to delete_many(), and every document that matches it is removed.

Syntax:

collection.delete_many(filter, collation=None, hint=None, session=None)



delete_one()
In MongoDB, a single document can be deleted with the delete_one() method. The first parameter is a query object that defines the document to be deleted. If multiple documents match the filter, only the first matching document is deleted.

Note: Deleting a document is the same as deleting a record in the case of SQL.

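drop()
The drop() method removes an entire collection from the database, including all of its documents and its indexes. It is called on the collection object and takes no filter.

Use delete_one() or delete_many() to remove individual documents while keeping the collection, and drop() when the collection itself is no longer needed. Dropping a collection is comparable to dropping a table in SQL.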
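A minimal PyMongo sketch of the three calls, assuming `collection` is the collection object from Q3 (the function name is illustrative). Note that in PyMongo, `drop()` called on a collection removes the whole collection.

```python
def delete_examples(collection):
    """Delete one document, then many, then drop the whole collection."""
    # Removes only the first document whose Name is "Meena".
    one = collection.delete_one({"Name": "Meena"})
    # Removes every document whose Name is "Lalitha".
    many = collection.delete_many({"Name": "Lalitha"})
    # Removes the collection itself, including its indexes.
    collection.drop()
    # Each delete call returns a result object with a deleted_count attribute.
    return one.deleted_count, many.deleted_count
```

The returned counts show how many documents each delete call actually removed, which is a quick way to confirm that the filter matched what was intended.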