## 1. Introduction to MongoDB
 What is MongoDB?
 
 - MongoDB is a NoSQL database that stores data as documents: JSON-like structures that are
 encoded as BSON (Binary JSON).
 
 - It's designed to handle large volumes of unstructured, semi-structured, or structured
 data.

 - MongoDB is flexible and scalable, which makes it a common choice for applications such as
 e-commerce platforms, social media, and real-time analytics.

 #### Key Features of MongoDB
 1. Document-Oriented: Stores data in key-value pairs within documents (JSON-like
 structure).
 2. Schema-less: Unlike SQL databases, MongoDB doesn’t require a fixed schema.
 3. High Scalability: Horizontal scaling using sharding.
 4. Indexing: Supports various types of indexes to optimize query performance.
 5. Aggregation Framework: For advanced data analysis and transformation.
 6. Replication: Provides high availability via replica sets.
 7. Rich Query Language: Supports CRUD operations, filtering, sorting, and joins via the
 $lookup aggregation stage.

## What is BSON Format?
 BSON (Binary JSON) is the data storage and network transfer format used by MongoDB.
 While it's similar to JSON (JavaScript Object Notation), it has a few key differences:

 Key Characteristics of BSON:
 - Binary Representation: BSON is a binary encoding designed to be fast to encode, decode, and
 traverse. It is not always smaller than the equivalent JSON text, because it stores type
 and length information alongside the values.
 - Supports Extra Data Types: While JSON supports basic types like string, number, and
 array, BSON supports additional types such as:
   - Date: A native date type.
   - Binary Data: For files or blobs.
   - ObjectId: A unique ID for MongoDB documents.
 - Length Prefixes: BSON stores metadata (like the length of strings and embedded documents)
 so a parser can skip over values it does not need, which speeds up parsing and searching.


 - When MongoDB stores a document, it is converted into binary. The example below shows:

 1. The document as you write it, in JSON format.
 2. How MongoDB stores it internally, in BSON format.

- You don't see BSON directly; the driver and the server convert documents to and from it
 behind the scenes.

In [None]:
# JSON format
{
  "name": "Dhanunjaya",
  "age": 25,
  "date": "2024-11-29T12:00:00Z"
}

# BSON format (illustrative and truncated; not an exact byte dump)
\x16\x00\x00\x00\x02name\x00\x0a\x00\x00\x00Dhanunjaya\x00\x10age\x00\x19\x00\x00\x00\
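
To make the byte layout concrete, here is a minimal sketch that hand-encodes a two-field document using only Python's `struct` module. It assumes just two BSON element types (UTF-8 string, type `0x02`, and 32-bit integer, type `0x10`); the helper name `encode_simple_bson` is hypothetical and exists only for this illustration. In real code the driver performs this encoding for you.

```python
import struct

def encode_simple_bson(doc):
    """Hand-encode a flat dict of str/int32 values as BSON (illustrative only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"          # field name as a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"    # string bytes plus terminator
            # type 0x02, name, int32 byte length (terminator included), bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # type 0x10, name, little-endian int32
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")
    # total size = 4-byte length prefix + elements + 1-byte terminator
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"

encoded = encode_simple_bson({"name": "Dhanunjaya", "age": 25})
print(len(encoded))
print(encoded)
```

The length prefixes (for the whole document and for each string) are what allow a parser to skip over a field without reading its contents.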

## 2. Key Features of MongoDB Explained

 Here’s an easy-to-understand breakdown of the key features:

 1. Document-Oriented

 - MongoDB stores data in documents instead of rows and columns.
 - Each document is a JSON-like object containing key-value pairs.
 - Advantage: Flexible structure, so you can store complex nested data easily.

 2. Schema-less
 - No predefined structure is required for data.
 - Each document in a collection can have a different structure.
 - Advantage: Easier to adapt to changes in data models. Validation rules are optional and
 can be added per collection when you want to enforce a structure.

# using mongoshell

- Download and install the MongoDB Community Server.
- Download the MongoDB Shell (mongosh), place it in the MongoDB folder, and add its bin folder to the PATH environment variable.
- Open your terminal (or the embedded shell in MongoDB Compass) and run mongosh.

In [None]:
// Switch to a database (it is created when data is first stored)
use testdb

// Create a collection
db.createCollection("users")

// Insert a document
db.users.insertOne({
  "name": "Dhanunjaya",
  "role": "QA Tester",
  "experience": 0
})

// View all documents
db.users.find()


# using python

In [2]:
pip install pymongo

Collecting pymongo
  Using cached pymongo-4.10.1-cp311-cp311-win_amd64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Using cached dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Using cached pymongo-4.10.1-cp311-cp311-win_amd64.whl (876 kB)
Using cached dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.10.1
Note: you may need to restart the kernel to use updated packages.





In [1]:
# connection to the mongodb
from pymongo import MongoClient
# Connect to the MongoDB server (default: localhost:27017)
client = MongoClient("mongodb://localhost:27017/")

In [2]:
# Select a database and a collection; MongoDB creates both lazily on the first insert
db = client['dj_test']  # database name; a database contains collections

collection = db['tp_one']  # collection name; a collection contains documents

In [None]:
# Insert a document
user_data = {
"name": "Dhanunjaya",
"role": "QA Tester",
"experience": 0
}

result = collection.insert_one(user_data)   #insert_one document 
print(f"Inserted Document ID: {result.inserted_id}")

Inserted Document ID: 67497e6603b835eca6c32adc
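
The `_id` printed above is not random noise: the first 4 bytes of an ObjectId hold its creation time as seconds since the Unix epoch. The following is a small sketch that decodes that timestamp from the hex string with the standard library alone. The function name `objectid_created_at` is hypothetical; with PyMongo you would normally read the `generation_time` attribute of the `ObjectId` instead.

```python
from datetime import datetime, timezone

def objectid_created_at(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_created_at("67497e6603b835eca6c32adc")
print(created.isoformat())
```

Because the timestamp comes first, sorting by `_id` roughly follows insertion order.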


In [None]:
# Query the collection
for user in collection.find(): #find to retrieve the documents 
    print(user)

{'_id': ObjectId('67497e6603b835eca6c32adc'), 'name': 'Dhanunjaya', 'role': 'QA Tester', 'experience': 0}


In [11]:
#CRUD Operations , create, read, update, delete

In [None]:
#create - insert one document

collection.insert_one({
    "name": "Dhanunjaya",
    "role": "QA Tester",
    "experience": 0
})

# create - insert many documents

collection.insert_many([
    {"name": "Alice", "role": "Developer", "experience": 2},
    {"name": "Bob", "role": "Tester", "experience": 1}
])

InsertManyResult([ObjectId('6749850b03b835eca6c32ade'), ObjectId('6749850b03b835eca6c32adf')], acknowledged=True)

In [None]:
# read operations

for i in collection.find(): #collection with find()
    print(i)

{'_id': ObjectId('67497e6603b835eca6c32adc'), 'name': 'Dhanunjaya', 'role': 'QA Tester', 'experience': 0}
{'_id': ObjectId('674984d703b835eca6c32add'), 'name': 'Dhanunjaya', 'role': 'QA Tester', 'experience': 0}
{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('6749850b03b835eca6c32adf'), 'name': 'Bob', 'role': 'Tester', 'experience': 1}


In [18]:
# read operations with specific query

for i in collection.find({'name' : 'Alice', 'role':'Developer' }): #collection with find()
    print(i)

{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}


In [None]:
# To get specific fields only, pass a projection as the second argument.

# This call fails: it includes 'name' ('name': 1) and excludes 'role' ('role': 0).
# MongoDB does not allow mixing inclusion and exclusion in one projection,
# except for the '_id' field.
for i in collection.find({}, {'name': 1, 'role': 0}):
    print(i)

OperationFailure: Cannot do exclusion on field role in inclusion projection, full error: {'ok': 0.0, 'errmsg': 'Cannot do exclusion on field role in inclusion projection', 'code': 31254, 'codeName': 'Location31254'}
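
The rule behind this error can be sketched in a few lines of plain Python. This is a simplified, client-side illustration only: the function name `projection_mode` is hypothetical, it handles only 0/1 flags, and the real validation happens on the server, which supports more projection forms.

```python
def projection_mode(projection):
    """Classify a projection as 'inclusion' or 'exclusion', rejecting mixes.

    Simplified illustration of the rule; not the server's implementation.
    """
    # '_id' is exempt: it may be excluded inside an inclusion projection.
    flags = {bool(v) for field, v in projection.items() if field != "_id"}
    if len(flags) > 1:
        raise ValueError("cannot mix inclusion and exclusion (except for '_id')")
    if not flags:
        # Only '_id' (or nothing) was specified.
        return "inclusion" if projection.get("_id") else "exclusion"
    return "inclusion" if flags.pop() else "exclusion"

print(projection_mode({"name": 1}))
print(projection_mode({"name": 1, "_id": 0}))
print(projection_mode({"experience": 0}))
try:
    projection_mode({"name": 1, "role": 0})
except ValueError as exc:
    print("rejected:", exc)
```

In practice, pick one mode per query: list the fields you want, or list the fields you want to drop.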

In [20]:
# Include only 'name' and exclude everything else
for i in collection.find({}, {'name': 1}):
    print(i)


{'_id': ObjectId('67497e6603b835eca6c32adc'), 'name': 'Dhanunjaya'}
{'_id': ObjectId('674984d703b835eca6c32add'), 'name': 'Dhanunjaya'}
{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice'}
{'_id': ObjectId('6749850b03b835eca6c32adf'), 'name': 'Bob'}




In [None]:
# Include only 'name' and drop '_id' as well
for i in collection.find({}, {'name': 1, '_id': 0}):  # '_id' is the one field that may be excluded here
    print(i)


{'name': 'Dhanunjaya'}
{'name': 'Dhanunjaya'}
{'name': 'Alice'}
{'name': 'Bob'}


In [22]:
# Exclude 'experience'; all other fields, including '_id', are returned by default
for i in collection.find({}, {'experience': 0}):
    print(i)


{'_id': ObjectId('67497e6603b835eca6c32adc'), 'name': 'Dhanunjaya', 'role': 'QA Tester'}
{'_id': ObjectId('674984d703b835eca6c32add'), 'name': 'Dhanunjaya', 'role': 'QA Tester'}
{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer'}
{'_id': ObjectId('6749850b03b835eca6c32adf'), 'name': 'Bob', 'role': 'Tester'}


In [46]:
# No projection: every field of each matching document is returned
for i in collection.find({}):
    print(i)


{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}


In [35]:
# updating the documents

# update one document: $set changes only the listed fields
collection.update_one(
    {"name": "Dhanunjaya"},
    {"$set": {"experience": 1, "role": "QA Tester"}}
)

UpdateResult({'n': 1, 'nModified': 0, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [39]:
# update many
collection.update_many(
    {"role": "QA Tester"},
    {"$set": {"experience": 2}}  # store a number, not the string '2', so numeric comparisons keep working
)

UpdateResult({'n': 2, 'nModified': 2, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [41]:
# delete the documents

collection.delete_one({'name':'Bob'})

DeleteResult({'n': 1, 'ok': 1.0}, acknowledged=True)

In [45]:
# delete many documents
collection.delete_many({'name':'Dhanunjaya'})

DeleteResult({'n': 2, 'ok': 1.0}, acknowledged=True)

In [21]:
collection.insert_one({'name':'dhanunjaya','date':True,'age':12})

InsertOneResult(ObjectId('674dc299131081ea648bb998'), acknowledged=True)

In [22]:
collection.find_one({})

{'_id': ObjectId('6749850b03b835eca6c32ade'),
 'name': 'Alice',
 'role': 'Developer',
 'experience': 2}

In [23]:
for i in collection.find({}):
    print(i)

{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('674dc299131081ea648bb998'), 'name': 'dhanunjaya', 'date': True, 'age': 12}


In [18]:
for i in collection.find({},{'name':1,'_id':0}):
    print(i)

{'name': 'Alice'}
{'name': 'dhanunjaya'}


In [19]:
collection.update_one({'name':'dhanunjaya'},{'$set':{'date':False}})

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [20]:
collection.delete_one({'name':'dhanunjaya'})

DeleteResult({'n': 1, 'ok': 1.0}, acknowledged=True)

In [24]:
collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'dj_test'), 'tp_one')

In [None]:
1. CREATE (Insert Data)
 Task 1:
 You need to validate if new user data is correctly inserted into the database. Test the
 following scenarios:
 - Insert a user with all required fields (name, role, experience).
 - Insert a user with missing fields and ensure that it still inserts correctly. MongoDB
 does not fill in defaults on its own, so any defaults for missing fields must be applied
 by the application before the insert.
 - Insert a user with extra, non-required fields (e.g., email), and check if MongoDB
 handles it without breaking.
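
Since MongoDB stores exactly the fields it is given, one way to get defaults for missing fields is to merge them in before the insert. The sketch below assumes hypothetical default values and a hypothetical helper name, `with_defaults`; neither comes from the original notebook.

```python
DEFAULTS = {"role": "unknown", "experience": 0}  # hypothetical default values

def with_defaults(user, defaults=DEFAULTS):
    """Return a copy of `user` with any missing keys filled from `defaults`."""
    # Keys present in `user` win, even when their value is None.
    return {**defaults, **user}

doc = with_defaults({"name": "dj"})
print(doc)
```

The resulting dictionary can then be passed to `collection.insert_one(doc)`. Note that a key explicitly set to `None` counts as present, so it is stored as null and not replaced by the default.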

In [3]:
#task one

data = {'name':'dj','role':'qa','experience':4}

collection.insert_one(data)

InsertOneResult(ObjectId('67556aa98ea3ab4410643f0d'), acknowledged=True)

In [4]:
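# 'experience' is sent as an explicit None (stored as null); to test a truly missing field, leave the key out
data = {'name':'dj','role':'qa','experience':None}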

collection.insert_one(data)

InsertOneResult(ObjectId('67556ad48ea3ab4410643f0e'), acknowledged=True)

In [6]:
data = {'name':'dj','role':'qa','email':'dhanunjaaya@gmail.com'}

collection.insert_one(data)

InsertOneResult(ObjectId('67556b2d8ea3ab4410643f0f'), acknowledged=True)

In [14]:
users = [
    {"name": "John Doe", "role": "Developer", "experience": 3},
    {"name": "Jane Smith", "role": "QA Tester", "experience": 2},
    {"name": "Alice Johnson", "role": "DevOps Engineer", "experience": 4},
]


collection.insert_many(users)

InsertManyResult([ObjectId('67556c168ea3ab4410643f16'), ObjectId('67556c168ea3ab4410643f17'), ObjectId('67556c168ea3ab4410643f18')], acknowledged=True)

In [15]:
for i in collection.find():
    print(i)

{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('674dc299131081ea648bb998'), 'name': 'dhanunjaya', 'date': True, 'age': 12}
{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556ad48ea3ab4410643f0e'), 'name': 'dj', 'role': 'qa', 'experience': None}
{'_id': ObjectId('67556b2d8ea3ab4410643f0f'), 'name': 'dj', 'role': 'qa', 'email': 'dhanunjaaya@gmail.com'}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Test

In [None]:
2. READ (Query Data)
 Task 3:
 Test if the system retrieves the correct users based on specific queries.
 - Query all users with a specific role, e.g., all QA Testers.
 - Query users who have experience greater than a certain value, e.g., greater than 1 year.
 - Check if the projection works: retrieve users' names but exclude their _id.

In [21]:
for i in collection.find({'name':'dj','role':'qa'}):
    print(i)

{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556ad48ea3ab4410643f0e'), 'name': 'dj', 'role': 'qa', 'experience': None}
{'_id': ObjectId('67556b2d8ea3ab4410643f0f'), 'name': 'dj', 'role': 'qa', 'email': 'dhanunjaaya@gmail.com'}


In [22]:
for i in collection.find({'role':'qa'}):
    print(i)

{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556ad48ea3ab4410643f0e'), 'name': 'dj', 'role': 'qa', 'experience': None}
{'_id': ObjectId('67556b2d8ea3ab4410643f0f'), 'name': 'dj', 'role': 'qa', 'email': 'dhanunjaaya@gmail.com'}


In [30]:
for i in collection.find({'experience':{'$gt':1}}):  # $gt is strictly "greater than"; $gte would also match exactly 1
    print(i)

{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f16'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 

In [31]:
# Fetch all users, projecting only the 'name' field and excluding '_id'
for user in collection.find({}, {"name": 1, "_id": 0}):
    print(user)


{'name': 'Alice'}
{'name': 'dhanunjaya'}
{'name': 'dj'}
{'name': 'dj'}
{'name': 'dj'}
{'name': 'John Doe'}
{'name': 'Jane Smith'}
{'name': 'Alice Johnson'}
{'name': 'John Doe'}
{'name': 'Jane Smith'}
{'name': 'Alice Johnson'}
{'name': 'John Doe'}
{'name': 'Jane Smith'}
{'name': 'Alice Johnson'}


In [None]:
Task 4:
 Test the edge cases for querying:
 - Query using a field that doesn't exist in any document.
 - Test querying with partial matches (e.g., use regular expressions to find users with
 a role containing the word "Tester").

In [32]:
# No document has an 'id' field, so the cursor is empty: nothing is printed and no error is raised
for i in collection.find({'id':1}):
    print(i)

In [33]:
# Query for users with a role containing the word "Tester"
regex_query = collection.find({"role": {"$regex": "Tester", "$options": "i"}})  # 'i' for case-insensitive

# Print matching users
for user in regex_query:
    print(user)

# Optional validation (if you want to count matches)
matched_count = collection.count_documents({"role": {"$regex": "Tester", "$options": "i"}})
print(f"Number of matches: {matched_count}")


{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
Number of matches: 3


# regex

In [None]:
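Summary of Patterns:
^...: Starts with.
...$: Ends with.
... (no anchors): Contains; $regex matches anywhere in the string unless anchored.
\b...\b: Word boundary.
[...]: Specific character sets.
\d: Digits.
|: OR condition.
^.{n,m}$: Length constraints (the anchors make the count apply to the whole string).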
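
These patterns can be tried out without a database. The sketch below runs them with Python's `re` module against a few role strings taken from this notebook. It assumes that, for these particular constructs, Python's regex dialect behaves like the one MongoDB uses; `re.search` is used because `$regex` also matches anywhere in the string unless the pattern is anchored.

```python
import re

roles = ["Developer", "QA Tester", "DevOps Engineer", "qa"]

patterns = {
    "starts with QA (case-insensitive)": re.compile(r"^QA", re.IGNORECASE),
    "ends with Engineer": re.compile(r"Engineer$"),
    "contains Dev": re.compile(r"Dev"),
    "whole word QA (case-sensitive)": re.compile(r"\bQA\b"),
    "Tester or Engineer": re.compile(r"Tester|Engineer"),
    "does not contain QA (case-insensitive)": re.compile(r"^(?!.*QA).*$", re.IGNORECASE),
    "exactly 9 characters": re.compile(r"^.{9}$"),
}

# For each pattern, collect the roles it matches.
matches = {
    label: [role for role in roles if pattern.search(role)]
    for label, pattern in patterns.items()
}
for label, found in matches.items():
    print(f"{label}: {found}")
```

Checking a pattern locally like this is a quick way to catch mistakes before using it in a `$regex` query.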

In [None]:
# Match roles starting with "QA"; "$options": "i" makes the match case-insensitive
regex_query = collection.find({"role": {"$regex": "^QA", "$options": "i"}})

for user in regex_query:
    print(user)


{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556ad48ea3ab4410643f0e'), 'name': 'dj', 'role': 'qa', 'experience': None}
{'_id': ObjectId('67556b2d8ea3ab4410643f0f'), 'name': 'dj', 'role': 'qa', 'email': 'dhanunjaaya@gmail.com'}
{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}


In [36]:
# Match roles ending with "Engineer"
regex_query = collection.find({"role": {"$regex": "Engineer$", "$options": "i"}})

for user in regex_query:
    print(user)


{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f18'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}


In [37]:
# Match roles containing "Dev"
regex_query = collection.find({"role": {"$regex": "Dev", "$options": "i"}})

for user in regex_query:
    print(user)


{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f16'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c168ea3ab4410643f18'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}


In [39]:
# Match roles containing the exact word "QA"
regex_query = collection.find({"role": {"$regex": r"\bQA\b"}})

for user in regex_query:
    print(user)


{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}


In [40]:
# Match roles containing "Tester" or "Engineer"
regex_query = collection.find({"role": {"$regex": "Tester|Engineer", "$options": "i"}})

for user in regex_query:
    print(user)


{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c168ea3ab4410643f18'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}


In [None]:
# Match roles starting with "D" or "T" and ending with "r"
regex_query = collection.find({"role": {"$regex": "^[DT].*r$", "$options": "i"}})

for user in regex_query:
    print(user)


"""
Explanation
^[DT]: Matches roles starting with "D" or "T".
.*: Matches any number of characters between the start and the end.
r$: Matches roles ending with "r" (e.g., "Developer", "Tester").
"""


{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f16'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c168ea3ab4410643f18'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}


In [48]:
# Case-sensitive match for "Tester"
regex_query = collection.find({"role": {"$regex": "Tester"}})

for user in regex_query:
    print(user)


{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}


In [49]:
# Match roles not containing "QA"
regex_query = collection.find({"role": {"$regex": "^(?!.*QA).*$", "$options": "i"}})

for user in regex_query:
    print(user)

# Explanation
# ^        : Start of the string.
# (?!.*QA) : Negative lookahead; fails the match if "QA" appears anywhere after this position.
# .*       : Matches any number of characters (zero or more).
# $        : End of the string.

{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f16'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c168ea3ab4410643f18'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}

In [None]:
# Match roles containing any numbers
regex_query = collection.find({"role": {"$regex": r"\d", "$options": "i"}})

for user in regex_query:
    print(user)


"""Explanation
\d: Matches any numeric character (0-9).
Useful for identifying entries with numeric annotations (e.g., "Engineer1").
"""


In [None]:
# Match roles with exactly 6 characters
regex_query = collection.find({"role": {"$regex": r"^.{6}$", "$options": "i"}})

for user in regex_query:
    print(user)


"""Explanation
^.{6}$: Matches strings with exactly six characters.
Replace 6 with n,m for ranges, e.g., ^.{4,8}$ matches strings between 4 to 8 characters long."""

In [47]:
# Match roles that are 1 to 10 characters long
regex_query = collection.find({"role": {"$regex": r"^.{1,10}$", "$options": "i"}})

for user in regex_query:
    print(user)

{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556ad48ea3ab4410643f0e'), 'name': 'dj', 'role': 'qa', 'experience': None}
{'_id': ObjectId('67556b2d8ea3ab4410643f0f'), 'name': 'dj', 'role': 'qa', 'email': 'dhanunjaaya@gmail.com'}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c168ea3ab4410643f16'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane Smith', 'role': 'QA Te

In [50]:
query = collection.find({"role": {"$exists": True}})

for user in query:
    print(user)


{'_id': ObjectId('6749850b03b835eca6c32ade'), 'name': 'Alice', 'role': 'Developer', 'experience': 2}
{'_id': ObjectId('67556aa98ea3ab4410643f0d'), 'name': 'dj', 'role': 'qa', 'experience': 4}
{'_id': ObjectId('67556ad48ea3ab4410643f0e'), 'name': 'dj', 'role': 'qa', 'experience': None}
{'_id': ObjectId('67556b2d8ea3ab4410643f0f'), 'name': 'dj', 'role': 'qa', 'email': 'dhanunjaaya@gmail.com'}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane Smith', 'role': 'QA Tester', 'experience': 2}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 

In [51]:
query = collection.find({"role": {"$exists": False}})

for user in query:
    print(user)


{'_id': ObjectId('674dc299131081ea648bb998'), 'name': 'dhanunjaya', 'date': True, 'age': 12}


In [None]:
3. UPDATE (Modify Data)
 Task 5:
 - Update the experience of a specific user (e.g., "Dhanunjaya") and verify if the data is
 updated correctly.
 - Test updating multiple users at once, e.g., all users with the role "Tester" to update their
 experience to a higher value.
 Task 6:
 - Test if the system handles missing fields when updating. For instance, try updating a
 document without including some fields, and verify that the unchanged fields remain
 intact.
 - Test partial updates (using $set) and check if only the specified fields are modified,
 while others remain unchanged.
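
The expected behaviour of a partial update can be modelled on a plain dictionary. This is a sketch of the semantics only, covering top-level fields and just two operators (`$set` and `$inc`); the helper name `apply_update` is hypothetical and is not part of PyMongo.

```python
import copy

def apply_update(document, update):
    """Apply a tiny subset of update operators ($set, $inc) to a copy of a dict."""
    unknown = set(update) - {"$set", "$inc"}
    if unknown:
        raise ValueError(f"operators not covered by this sketch: {sorted(unknown)}")
    result = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        result[field] = value                          # overwrite or create the field
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount  # a missing field starts at 0
    return result

before = {"name": "Jane Smith", "role": "QA Tester", "experience": 2}
after = apply_update(
    before,
    {"$set": {"email": "jane.smith@example.com"}, "$inc": {"experience": 1}},
)
print(after)
```

Fields that the update does not mention (`name` and `role` here) come through unchanged, which is the property Task 6 asks you to verify against the real collection.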

In [53]:
collection.update_one({'name':'dhanunjaya'},{'$set':{'experience':3}})

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [66]:
collection.update_many({'role':'QA Tester'},{'$set':{'experience':10,'name':'Jane'}})

UpdateResult({'n': 3, 'nModified': 3, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [68]:
from bson import ObjectId  # needed to build an '_id' filter from its hex string

collection.update_one({'_id': ObjectId('67556b8e8ea3ab4410643f11')},{'$set': {'email': 'jane.smith@example.com'}})

UpdateResult({'n': 1, 'nModified': 0, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [69]:
for i in collection.find({'role':'QA Tester'}):
    print(i)

{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane', 'role': 'QA Tester', 'experience': 10, 'email': 'jane.smith@example.com'}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane', 'role': 'QA Tester', 'experience': 10}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane', 'role': 'QA Tester', 'experience': 10}

In [None]:
4. DELETE (Remove Data)
 Task 7:
 - Delete a specific user by their name and verify that the document is deleted from the
 database.
 - Test deleting multiple users based on a condition, such as deleting all users with the role
 "Tester".
 - After deleting, query the database to ensure that the users are no longer present.
 Task 8:
 Test edge cases:
 - Try deleting a user that doesn't exist and check how MongoDB responds (no errors, just
 no deletion).
 - Attempt deleting documents from an empty collection and verify no errors occur.

In [72]:
collection.delete_one({'_id': ObjectId('6749850b03b835eca6c32ade')})

DeleteResult({'n': 1, 'ok': 1.0}, acknowledged=True)

In [75]:
for i in collection.find():
    print(i)

{'_id': ObjectId('674dc299131081ea648bb998'), 'name': 'dhanunjaya', 'date': True, 'age': 12, 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f10'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556b8e8ea3ab4410643f11'), 'name': 'Jane', 'role': 'QA Tester', 'experience': 10, 'email': 'jane.smith@example.com'}
{'_id': ObjectId('67556b8e8ea3ab4410643f12'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c128ea3ab4410643f13'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c128ea3ab4410643f14'), 'name': 'Jane', 'role': 'QA Tester', 'experience': 10}
{'_id': ObjectId('67556c128ea3ab4410643f15'), 'name': 'Alice Johnson', 'role': 'DevOps Engineer', 'experience': 4}
{'_id': ObjectId('67556c168ea3ab4410643f16'), 'name': 'John Doe', 'role': 'Developer', 'experience': 3}
{'_id': ObjectId('67556c168ea3ab4410643f17'), 'name': 'Jane', 'role': 'QA Tester', 'experience': 10}
{'_id': Ob

In [79]:
# No document has role 'qa' at this point, so nothing is deleted: 'n': 0 and no error
collection.delete_many({'role':'qa'})

DeleteResult({'n': 0, 'ok': 1.0}, acknowledged=True)

In [None]:
Task 9:
 You're testing a new MongoDB-backed feature in an e-commerce platform, where users can
 add products to their shopping cart. Test the following:
 - Insert product documents with different fields (price, quantity, and category).
 - Update the product quantities when users add or remove products from the cart.
 - Delete a product from the cart and verify if the remaining products are not affected.

In [81]:
collection_shop = db['shop_cart']

In [None]:
collection_shop.insert_one({'item':'pant','price':23.45,'country':'india'})

InsertOneResult(ObjectId('675574588ea3ab4410643f19'), acknowledged=True)

In [83]:
# Insert product documents
products = [
    {'name': 'Laptop', 'price': 1200, 'quantity': 1, 'category': 'Electronics'},
    {'name': 'Shoes', 'price': 50, 'quantity': 2, 'category': 'Fashion'},
    {'name': 'Smartphone', 'price': 800, 'quantity': 1, 'category': 'Electronics'}
]

# Insert products into the collection
collection_shop.insert_many(products)

# Verify the insertion
for product in collection_shop.find():
    print(product)

{'_id': ObjectId('675574588ea3ab4410643f19'), 'item': 'pant', 'price': 23.45, 'country': 'india'}
{'_id': ObjectId('6755748c8ea3ab4410643f1a'), 'name': 'Laptop', 'price': 1200, 'quantity': 1, 'category': 'Electronics'}
{'_id': ObjectId('6755748c8ea3ab4410643f1b'), 'name': 'Shoes', 'price': 50, 'quantity': 2, 'category': 'Fashion'}
{'_id': ObjectId('6755748c8ea3ab4410643f1c'), 'name': 'Smartphone', 'price': 800, 'quantity': 1, 'category': 'Electronics'}


In [85]:
# Update the quantity of 'Laptop' by adding 1
# Note: $inc is applied on every run, so re-running this cell increments the quantity again
collection_shop.update_one(
    {'name': 'Laptop'},  # Find the product by name
    {'$inc': {'quantity': 1}}  # Increment the quantity by 1
)

# Verify the update
updated_product = collection_shop.find_one({'name': 'Laptop'})
print(updated_product)


{'_id': ObjectId('6755748c8ea3ab4410643f1a'), 'name': 'Laptop', 'price': 1200, 'quantity': 3, 'category': 'Electronics'}


In [86]:
# Delete the 'Smartphone' product from the cart
collection_shop.delete_one({'name': 'Smartphone'})

# Verify the deletion
deleted_product = collection_shop.find_one({'name': 'Smartphone'})
print(deleted_product)  # find_one returns None when no document matches

# Verify that the remaining products are still there
remaining_products = collection_shop.find()
for product in remaining_products:
    print(product)


None
{'_id': ObjectId('675574588ea3ab4410643f19'), 'item': 'pant', 'price': 23.45, 'country': 'india'}
{'_id': ObjectId('6755748c8ea3ab4410643f1a'), 'name': 'Laptop', 'price': 1200, 'quantity': 3, 'category': 'Electronics'}
{'_id': ObjectId('6755748c8ea3ab4410643f1b'), 'name': 'Shoes', 'price': 50, 'quantity': 2, 'category': 'Fashion'}


# 3. Indexing in MongoDB

- Indexing improves query performance in MongoDB (as in any database) by allowing faster
 data retrieval. Without an index, MongoDB has to scan every document in a collection to
 find the matching data, which can be slow when dealing with large datasets.

### What is Indexing?
 An index is a special data structure that stores the values of a specific field or set of fields
 in sorted order, together with references to the documents that contain them. By using
 indexes, MongoDB can quickly locate and retrieve data without having to scan the entire
 collection.
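
To make the idea concrete, here is a minimal pure-Python sketch of the difference between a full scan and a sorted-index lookup. It is only an illustration: the sample documents and the helper names (`collection_scan`, `build_index`, `index_lookup`) are made up for this example, and MongoDB's real indexes are B-tree structures managed by the server, not Python lists.

```python
from bisect import bisect_left

# Hypothetical in-memory "collection" of documents
documents = [
    {"_id": 1, "name": "Shoes", "price": 50},
    {"_id": 2, "name": "Laptop", "price": 1200},
    {"_id": 3, "name": "Smartphone", "price": 800},
    {"_id": 4, "name": "Laptop", "price": 999},
]

def collection_scan(docs, field, value):
    """Check every document: the work grows with the size of the collection."""
    return [d for d in docs if d.get(field) == value]

def build_index(docs, field):
    """Build a sorted list of (value, position) pairs for one field."""
    return sorted((d[field], pos) for pos, d in enumerate(docs) if field in d)

def index_lookup(docs, index, value):
    """Binary-search the sorted index, then fetch only the matching documents."""
    results = []
    i = bisect_left(index, (value,))
    while i < len(index) and index[i][0] == value:
        results.append(docs[index[i][1]])
        i += 1
    return results

name_index = build_index(documents, "name")
print(index_lookup(documents, name_index, "Laptop"))
```

Both functions return the same documents; the indexed version simply jumps to the right place in the sorted list instead of inspecting every document.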

In [None]:
 Types of Indexes in MongoDB
 1. Single Field Index: The most common type, where an index is created on a single field in a
 collection.
 2. Compound Index: An index created on multiple fields. It's useful when you frequently query
 by a combination of fields.
 3. Multikey Index: Used when a field contains an array. MongoDB creates an index entry for
 each element in the array.
 4. Text Index: Allows for efficient searching of text in string fields, often used for full-text
 search.
 5. Hashed Index: Based on a hash of the field's value. It's often used for sharding in
 distributed MongoDB clusters.
 6. Geospatial Index: Used for queries that involve geographic data, such as finding nearby
 locations.
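
As a rough guide, the key specifications for these index types could look like the following when written in the list-of-tuples form that PyMongo's `create_index()` accepts. The field names `tags`, `user_id`, and `location` are hypothetical and do not appear elsewhere in this notebook; the snippet only builds and prints the specifications, so it runs without a database connection.

```python
# Key specifications in the list-of-tuples form used with create_index().
# Field names 'tags', 'user_id' and 'location' are assumptions for illustration.
INDEX_SPECS = {
    "single_field": [("name", 1)],
    "compound": [("role", 1), ("experience", -1)],
    # Same shape as a single-field index; it becomes multikey
    # automatically when the indexed field holds arrays.
    "multikey": [("tags", 1)],
    "text": [("description", "text")],
    "hashed": [("user_id", "hashed")],
    # '2dsphere' indexes GeoJSON data such as points on the globe.
    "geospatial": [("location", "2dsphere")],
}

for kind, spec in INDEX_SPECS.items():
    print(f"{kind:>12}: {spec}")
    # On a live server each one could be created with: collection.create_index(spec)
```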

In [90]:
# Single Field Index

collection.create_index([('name',1)])

'name_1'

In [91]:
# Create a compound index on "role" (ascending) and "experience" (descending; -1 means descending order)

collection.create_index([("role", 1), ("experience", -1)])

'role_1_experience_-1'
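
The strings returned by `create_index` above are the default index names. They appear to follow a simple pattern: each field name is joined to its direction (or index type) with an underscore. The sketch below reproduces that pattern; `default_index_name` is a hypothetical helper written for this illustration, not a PyMongo function.

```python
def default_index_name(keys):
    """Join each field name and its direction/type with underscores."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)

print(default_index_name([("name", 1)]))
print(default_index_name([("role", 1), ("experience", -1)]))
```

Knowing the pattern is handy later when an index has to be referenced by name, for example when dropping it.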

##### Why Indexing is Important
- Performance: Queries are much faster with indexes, especially when dealing with large
 datasets.

- Efficiency: A matching index prevents MongoDB from performing a full collection scan for
 each query.

- Cost: While indexes improve read performance, they add a cost to write operations
 (insert/update/delete) because every index must be updated as well.

In [None]:
# Query to find all products in a specific category
products_in_electronics = collection.find({'category': 'Electronics'})

# If an index on 'category' exists, MongoDB's query planner can use it automatically;
# without one, the query falls back to a full collection scan (COLLSCAN).
for product in products_in_electronics:
    print(product)



In [None]:
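# Force MongoDB to use the 'category' index. The index must already exist
# (e.g. via collection.create_index([('category', 1)])); otherwise the server rejects the hint.
products_in_electronics = collection.find({'category': 'Electronics'}).hint([('category', 1)])

# MongoDB uses the hinted index explicitly instead of letting the query planner choose.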
for product in products_in_electronics:
    print(product)


In [93]:
# Use explain to see which index is being used
explain_output = collection.find({'category': 'Electronics'}).explain()

# Print the explanation to see which index MongoDB is using
print(explain_output)


{'explainVersion': '1', 'queryPlanner': {'namespace': 'dj_test.shop_cart', 'parsedQuery': {'category': {'$eq': 'Electronics'}}, 'indexFilterSet': False, 'queryHash': '421A7F3B', 'planCacheKey': '0AB69667', 'optimizationTimeMillis': 0, 'maxIndexedOrSolutionsReached': False, 'maxIndexedAndSolutionsReached': False, 'maxScansToExplodeReached': False, 'prunedSimilarIndexes': False, 'winningPlan': {'isCached': False, 'stage': 'COLLSCAN', 'filter': {'category': {'$eq': 'Electronics'}}, 'direction': 'forward'}, 'rejectedPlans': []}, 'executionStats': {'executionSuccess': True, 'nReturned': 1, 'executionTimeMillis': 2, 'totalKeysExamined': 0, 'totalDocsExamined': 3, 'executionStages': {'isCached': False, 'stage': 'COLLSCAN', 'filter': {'category': {'$eq': 'Electronics'}}, 'nReturned': 1, 'executionTimeMillisEstimate': 0, 'works': 4, 'advanced': 1, 'needTime': 2, 'needYield': 0, 'saveState': 0, 'restoreState': 0, 'isEOF': 1, 'direction': 'forward', 'docsExamined': 3}, 'allPlansExecution': []}, '
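
The explain output is easier to read when only the plan stages are pulled out. The sketch below is one possible way to do that for the `explainVersion: '1'` layout shown above; other server versions may nest the plan differently. `plan_stages` and `uses_index` are hypothetical helpers, and the second sample dictionary is an assumed plan shape for the same query once a `category` index exists, not output captured from this notebook.

```python
def plan_stages(plan):
    """Return the stage names in a winning plan, outermost stage first."""
    stages = [plan["stage"]]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages

def uses_index(explain_output):
    """True if any stage of the winning plan is an index scan."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "IXSCAN" in plan_stages(winning)

# Trimmed copy of the explain output shown above
collscan_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "COLLSCAN",
            "filter": {"category": {"$eq": "Electronics"}},
            "direction": "forward",
        }
    }
}

# Assumed plan shape once an index on 'category' exists (hypothetical)
ixscan_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "category_1"},
        }
    }
}

print(plan_stages(collscan_explain["queryPlanner"]["winningPlan"]))
print(uses_index(collscan_explain), uses_index(ixscan_explain))
```

A `COLLSCAN` stage, as in the recorded output, means no index was used: the collection has no index on `category`.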

In [92]:
# List all indexes in the 'cart' collection
indexes = collection.list_indexes()

# Print all indexes
for index in indexes:
    print(index)


SON([('v', 2), ('key', SON([('_id', 1)])), ('name', '_id_')])
SON([('v', 2), ('key', SON([('name', 1)])), ('name', 'name_1')])
SON([('v', 2), ('key', SON([('role', 1), ('experience', -1)])), ('name', 'role_1_experience_-1')])


In [None]:
 1. Basic Index Creation
 Task 1: Create a simple index on the `name` field in the `users` collection. Run a query to
 find a user by name and check if MongoDB uses the index.
 2. Compound Index
 Task 2: Create a compound index on `role` and `experience` in the `users` collection. Run a
 query to find users with a specific role and experience. Verify that MongoDB uses the
 compound index.
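
When verifying a compound index, it helps to remember that such an index serves queries on a leading prefix of its fields. The sketch below is a deliberately simplified model of that rule for equality filters; the real query planner considers more factors, and `usable_prefix` is a hypothetical helper, not a MongoDB or PyMongo API.

```python
def usable_prefix(index_keys, query_fields):
    """Return the leading index fields that the query filters on."""
    prefix = []
    for field, _direction in index_keys:
        if field not in query_fields:
            break
        prefix.append(field)
    return prefix

role_experience = [("role", 1), ("experience", -1)]

print(usable_prefix(role_experience, {"role", "experience"}))
print(usable_prefix(role_experience, {"role"}))
print(usable_prefix(role_experience, {"experience"}))
```

An empty result for the last call reflects that a filter on `experience` alone skips the first indexed field, so the compound index cannot narrow the search efficiently.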

In [94]:
for i in collection.find():
    print(i)

{'_id': ObjectId('675574588ea3ab4410643f19'), 'item': 'pant', 'price': 23.45, 'country': 'india'}
{'_id': ObjectId('6755748c8ea3ab4410643f1a'), 'name': 'Laptop', 'price': 1200, 'quantity': 3, 'category': 'Electronics'}
{'_id': ObjectId('6755748c8ea3ab4410643f1b'), 'name': 'Shoes', 'price': 50, 'quantity': 2, 'category': 'Fashion'}


In [95]:
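# Caution: this indexes a field named 'users' in the current collection, not the 'name' field of a 'users' collection as Task 1 asks
collection.create_index([('users',1)])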

'users_1'

In [101]:
# No index covers 'country', so the winning plan below is a COLLSCAN
a = collection.find({'country': 'india'}).explain()

a

{'explainVersion': '1',
 'queryPlanner': {'namespace': 'dj_test.shop_cart',
  'parsedQuery': {'country': {'$eq': 'india'}},
  'indexFilterSet': False,
  'queryHash': '1E6E91D5',
  'planCacheKey': '5B4AF4B1',
  'optimizationTimeMillis': 0,
  'maxIndexedOrSolutionsReached': False,
  'maxIndexedAndSolutionsReached': False,
  'maxScansToExplodeReached': False,
  'prunedSimilarIndexes': False,
  'winningPlan': {'isCached': False,
   'stage': 'COLLSCAN',
   'filter': {'country': {'$eq': 'india'}},
   'direction': 'forward'},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 1,
  'executionTimeMillis': 0,
  'totalKeysExamined': 0,
  'totalDocsExamined': 3,
  'executionStages': {'isCached': False,
   'stage': 'COLLSCAN',
   'filter': {'country': {'$eq': 'india'}},
   'nReturned': 1,
   'executionTimeMillisEstimate': 0,
   'works': 4,
   'advanced': 1,
   'needTime': 2,
   'needYield': 0,
   'saveState': 0,
   'restoreState': 0,
   'isEOF': 1,
   'direction': 

In [None]:

 3. Text Index
 Task 3: Create a text index on a `description` field in a `products` collection. Run a query
 to search for a product by a text string and validate the results.
 4. Drop Indexes
 Task 4: Drop the index on the `name` field in the `users` collection. Verify that the index is
 removed by running the `getIndexes()` command (`list_indexes()` in PyMongo).

In [102]:
# Create a text index on the 'description' field
collection.create_index([('description', 'text')])

# Verify the index creation
indexes = collection.list_indexes()
for index in indexes:
    print(index)

SON([('v', 2), ('key', SON([('_id', 1)])), ('name', '_id_')])
SON([('v', 2), ('key', SON([('name', 1)])), ('name', 'name_1')])
SON([('v', 2), ('key', SON([('role', 1), ('experience', -1)])), ('name', 'role_1_experience_-1')])
SON([('v', 2), ('key', SON([('users', 1)])), ('name', 'users_1')])
SON([('v', 2), ('key', SON([('_fts', 'text'), ('_ftsx', 1)])), ('name', 'description_text'), ('weights', SON([('description', 1)])), ('default_language', 'english'), ('language_override', 'language'), ('textIndexVersion', 3)])


In [107]:
products_collection = db['products']  # Replace with your collection name

# Sample data to insert
products = [
    {"name": "Laptop", "price": 1200, "quantity": 10, "category": "Electronics", "description": "A high-performance laptop with a fast processor and large storage."},
    {"name": "Smartphone", "price": 800, "quantity": 20, "category": "Electronics", "description": "A sleek smartphone with a powerful camera and long battery life."},
    {"name": "Tablet", "price": 400, "quantity": 15, "category": "Electronics", "description": "A versatile tablet perfect for reading and multimedia."},
    {"name": "Wireless Mouse", "price": 25, "quantity": 50, "category": "Accessories", "description": "A smooth and responsive wireless mouse."},
    {"name": "Wireless Keyboard", "price": 50, "quantity": 30, "category": "Accessories", "description": "A compact and ergonomic wireless keyboard."}
]

# Insert data into the 'products' collection
products_collection.insert_many(products)

# Verify data insertion
for product in products_collection.find():
    print(product)

{'_id': ObjectId('675578dd8ea3ab4410643f27'), 'name': 'Laptop', 'price': 1200, 'quantity': 10, 'category': 'Electronics', 'description': 'A high-performance laptop with a fast processor and large storage.'}
{'_id': ObjectId('675578dd8ea3ab4410643f28'), 'name': 'Smartphone', 'price': 800, 'quantity': 20, 'category': 'Electronics', 'description': 'A sleek smartphone with a powerful camera and long battery life.'}
{'_id': ObjectId('675578dd8ea3ab4410643f29'), 'name': 'Tablet', 'price': 400, 'quantity': 15, 'category': 'Electronics', 'description': 'A versatile tablet perfect for reading and multimedia.'}
{'_id': ObjectId('675578dd8ea3ab4410643f2a'), 'name': 'Wireless Mouse', 'price': 25, 'quantity': 50, 'category': 'Accessories', 'description': 'A smooth and responsive wireless mouse.'}
{'_id': ObjectId('675578dd8ea3ab4410643f2b'), 'name': 'Wireless Keyboard', 'price': 50, 'quantity': 30, 'category': 'Accessories', 'description': 'A compact and ergonomic wireless keyboard.'}


In [108]:
# Sample data to insert into 'users' collection
users = [
    {"name": "Alice", "email": "alice@example.com", "role": "Admin", "experience": 5},
    {"name": "Bob", "email": "bob@example.com", "role": "Tester", "experience": 3},
    {"name": "Charlie", "email": "charlie@example.com", "role": "Developer", "experience": 4},
    {"name": "David", "email": "david@example.com", "role": "Manager", "experience": 6},
    {"name": "Eve", "email": "eve@example.com", "role": "Designer", "experience": 2}
]

# Connect to the 'users' collection
users_collection = db['users']  # Replace with your collection name

# Insert data into the 'users' collection
users_collection.insert_many(users)

# Verify data insertion
for user in users_collection.find():
    print(user)


{'_id': ObjectId('675578e48ea3ab4410643f2c'), 'name': 'Alice', 'email': 'alice@example.com', 'role': 'Admin', 'experience': 5}
{'_id': ObjectId('675578e48ea3ab4410643f2d'), 'name': 'Bob', 'email': 'bob@example.com', 'role': 'Tester', 'experience': 3}
{'_id': ObjectId('675578e48ea3ab4410643f2e'), 'name': 'Charlie', 'email': 'charlie@example.com', 'role': 'Developer', 'experience': 4}
{'_id': ObjectId('675578e48ea3ab4410643f2f'), 'name': 'David', 'email': 'david@example.com', 'role': 'Manager', 'experience': 6}
{'_id': ObjectId('675578e48ea3ab4410643f30'), 'name': 'Eve', 'email': 'eve@example.com', 'role': 'Designer', 'experience': 2}


In [None]:
# Create a text index on the 'description' field
products_collection.create_index([('description', 'text')])


Products matching the search:
{'_id': ObjectId('675578dd8ea3ab4410643f27'), 'name': 'Laptop', 'price': 1200, 'quantity': 10, 'category': 'Electronics', 'description': 'A high-performance laptop with a fast processor and large storage.'}


In [118]:

# Search the descriptions for the word 'with'. It is an English stop word, so the text index
# ignores it and the query returns no documents.
query = products_collection.find({"$text": {"$search": "with"}})

# Print search results
print("\nProducts matching the search:")
for product in query:
    print(product)



Products matching the search:


In [119]:
# Search for an exact phrase
query = products_collection.find({"$text": {"$search": "\"high-performance laptop\""}})

# Print the results
for product in query:
    print(product)


{'_id': ObjectId('675578dd8ea3ab4410643f27'), 'name': 'Laptop', 'price': 1200, 'quantity': 10, 'category': 'Electronics', 'description': 'A high-performance laptop with a fast processor and large storage.'}
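
The empty result for the search term "with" can be understood with a small model of how a text index treats stop words. The sketch below is a simplification under stated assumptions: the `STOP_WORDS` set is a tiny hypothetical subset of the server's English list, and real text indexes also apply stemming and relevance scoring, which are left out here. The helper names are made up for this example.

```python
import re

# Tiny, hypothetical subset of English stop words (the server's list is longer)
STOP_WORDS = {"a", "an", "and", "the", "with", "for"}

def index_terms(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def text_search(docs, search):
    """Return documents whose description shares at least one term with the search."""
    terms = index_terms(search)
    return [d for d in docs if terms & index_terms(d["description"])]

docs = [
    {"name": "Laptop",
     "description": "A high-performance laptop with a fast processor and large storage."},
    {"name": "Smartphone",
     "description": "A sleek smartphone with a powerful camera and long battery life."},
]

print([d["name"] for d in text_search(docs, "with")])
print([d["name"] for d in text_search(docs, "laptop")])
```

Because "with" is removed both when indexing and when searching, no terms are left to match, even though the word appears in two descriptions.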


In [120]:
# Create an index on the 'name' field (if not already present)
users_collection.create_index([('name', 1)])

# Drop the index on the 'name' field
index_name = 'name_1'  # Default index name for a single-field ascending index on 'name'
users_collection.drop_index(index_name)

# Verify that the index has been removed
indexes = users_collection.list_indexes()
print("\nIndexes in 'users' collection after dropping the index:")
for index in indexes:
    print(index)



Indexes in 'users' collection after dropping the index:
SON([('v', 2), ('key', SON([('_id', 1)])), ('name', '_id_')])


In [None]:
5. Testing Performance with Indexes
 Task 5:
 Test the performance of a query with and without an index.
 First, run a query without creating any indexes on a large dataset.
 Then, create an index and rerun the same query. Compare the response time.

In [122]:
import time
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['ecommerce']  # Use your database name
products_collection = db['products']  # Use your collection name

# Insert a large number of documents (for performance testing)
# Example: insert 100,000 products in one batch (you can adjust the number).
# Note: every document gets the same category, so a filter on
# category == "Electronics" matches every document inserted here.
products_collection.insert_many([
    {
        "name": f"Product {i}",
        "category": "Electronics",
        "price": 100 + i,
        "description": "A great product.",
        "quantity": 10
    }
    for i in range(100000)
])

# Step 1: Run the query without an index
start_time = time.time()

# Example query: Search for products in a specific category
query = products_collection.find({"category": "Electronics"})

# Iterate through the results (just to force the query to run)
for product in query:
    pass  # Do nothing with the result, just force MongoDB to execute the query

end_time = time.time()
no_index_time = end_time - start_time

print(f"Time without index: {no_index_time:.4f} seconds")

# Step 2: Create an index on the 'category' field
products_collection.create_index('category')

# Step 3: Run the query again after creating the index
start_time = time.time()

# Example query: Search for products in a specific category
query = products_collection.find({"category": "Electronics"})

# Iterate through the results (just to force the query to run)
for product in query:
    pass  # Do nothing with the result, just force MongoDB to execute the query

end_time = time.time()
with_index_time = end_time - start_time

print(f"Time with index: {with_index_time:.4f} seconds")

# Compare the times (a ratio below 1.0 means the indexed run was slower)
print(f"Performance improvement: {no_index_time / with_index_time:.2f}x faster with the index")


Time without index: 0.3437 seconds
Time with index: 0.4129 seconds
Performance improvement: 0.83x faster with the index
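
Single wall-clock measurements like the ones above are noisy, so a small timing helper that repeats the workload can give steadier numbers. The sketch below is one possible approach using only the standard library; `best_of` is a hypothetical helper, and the `workload` function is a stand-in so the example runs without a database.

```python
import time

def best_of(fn, repeats=5):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Stand-in workload. With PyMongo this could instead be, for example:
# lambda: list(products_collection.find({"category": "Electronics"}))
def workload():
    return sum(i * i for i in range(10_000))

elapsed = best_of(workload)
print(f"Best of 5 runs: {elapsed:.6f} seconds")
```

`time.perf_counter()` is intended for measuring short durations, and taking the minimum of several runs reduces the influence of caching and background activity on the comparison.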


In [None]:
Real-Life Testing Scenarios for Indexing
 1. Scenario 1: Slow Query Issue
 Your application’s search feature is slow when users search for products by category.
 Test adding an index on the `category` field and measure the performance improvement.
 2. Scenario 2: Compound Index for Sorting
 You’re testing a reporting system where data is sorted by multiple fields (e.g., `date` and
 `status`). Add a compound index and validate that the reports load faster after the index
 is created.
 3. Scenario 3: Dealing with Large Collections
 As part of performance testing, you need to validate that the system can handle large
 collections efficiently. Create indexes on fields that are frequently queried (e.g., `email`,
 `name`, `createdAt`), and run performance tests to ensure the system remains fast.

In [125]:
import time
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['ecommerce']  # Use your database name
products_collection = db['products']  # Use your collection name

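# Step 1: Baseline run. Note: the 'category' index created in the previous cell still exists,
# so drop it first (products_collection.drop_index('category_1')) for a true no-index baseline.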
start_time = time.time()

# Search for products in a specific category (e.g., 'Electronics')
query = products_collection.find({"category": "Electronics"})
for product in query:
    pass  # Just executing the query

end_time = time.time()
no_index_time = end_time - start_time
print(f"Time without index: {no_index_time:.4f} seconds")

# Step 2: Create an index on the 'category' field
products_collection.create_index('category')

# Step 3: Run the query again after creating the index
start_time = time.time()

# Search for products in a specific category
query = products_collection.find({"category": "Electronics"})
for product in query:
    pass  # Just executing the query

end_time = time.time()
with_index_time = end_time - start_time
print(f"Time with index: {with_index_time:.4f} seconds")

# Performance ratio (a value below 1.0 means the indexed run was slower)
print(f"Performance improvement: {no_index_time / with_index_time:.2f}x faster with the index")


Time without index: 0.5632 seconds
Time with index: 0.6215 seconds
Performance improvement: 0.91x faster with the index
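
One likely reason the index does not help in these runs is selectivity: every document inserted for the benchmark has the category "Electronics", so the filter matches all of them. The pure-Python sketch below counts how many documents each strategy would have to touch. It uses synthetic documents shaped like the benchmark data and hypothetical helper names; it models the idea only and does not measure MongoDB itself.

```python
from collections import defaultdict

# Synthetic documents shaped like the ones inserted for the benchmark
docs = [{"name": f"Product {i}", "category": "Electronics"} for i in range(1000)]

def build_lookup(docs, field):
    """Map each field value to the positions of the documents holding it."""
    lookup = defaultdict(list)
    for pos, d in enumerate(docs):
        lookup[d[field]].append(pos)
    return lookup

def docs_touched_with_index(lookup, value):
    """Number of documents an index lookup would need to fetch."""
    return len(lookup.get(value, []))

category_lookup = build_lookup(docs, "category")
name_lookup = build_lookup(docs, "name")

scan_cost = len(docs)  # a full scan always touches every document
print("category filter:", docs_touched_with_index(category_lookup, "Electronics"), "vs scan", scan_cost)
print("name filter:    ", docs_touched_with_index(name_lookup, "Product 42"), "vs scan", scan_cost)
```

The category filter touches as many documents with the index as without it, while the name filter touches a single document. An index pays off when the filter narrows the result to a small part of the collection.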


In [126]:
# Step 1: Run the sorting query without an index
start_time = time.time()

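# Sort products by 'date' and 'status'. The documents inserted above have neither field,
# so every sort key is missing and this run says little about real sort performance.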
query = products_collection.find().sort([("date", 1), ("status", 1)])
for product in query:
    pass  # Just executing the query

end_time = time.time()
no_index_sort_time = end_time - start_time
print(f"Time without compound index: {no_index_sort_time:.4f} seconds")

# Step 2: Create a compound index on 'date' and 'status'
products_collection.create_index([('date', 1), ('status', 1)])

# Step 3: Run the sorting query again after creating the index
start_time = time.time()

# Sorting products by 'date' and 'status'
query = products_collection.find().sort([("date", 1), ("status", 1)])
for product in query:
    pass  # Just executing the query

end_time = time.time()
with_index_sort_time = end_time - start_time
print(f"Time with compound index: {with_index_sort_time:.4f} seconds")

# Performance ratio (a value below 1.0 means the indexed run was slower)
print(f"Performance improvement: {no_index_sort_time / with_index_sort_time:.2f}x faster with the compound index")


Time without compound index: 0.5714 seconds
Time with compound index: 0.5662 seconds
Performance improvement: 1.01x faster with the compound index
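
Whether a compound index can serve a sort depends on field order and direction. The sketch below encodes a simplified version of that rule: the sort fields must form a leading prefix of the index, with every direction either matching the index or reversed. It ignores cases where equality filters on earlier fields relax the rule, and `index_supports_sort` is a hypothetical helper, not part of PyMongo.

```python
def index_supports_sort(index_keys, sort_keys):
    """Simplified rule: sort fields are a leading prefix of the index and
    all directions either match the index or are all reversed."""
    if not sort_keys or len(sort_keys) > len(index_keys):
        return False
    prefix = index_keys[:len(sort_keys)]
    if [f for f, _ in prefix] != [f for f, _ in sort_keys]:
        return False
    same = all(i == s for (_, i), (_, s) in zip(prefix, sort_keys))
    reverse = all(i == -s for (_, i), (_, s) in zip(prefix, sort_keys))
    return same or reverse

date_status = [("date", 1), ("status", 1)]

print(index_supports_sort(date_status, [("date", 1), ("status", 1)]))
print(index_supports_sort(date_status, [("date", -1), ("status", -1)]))
print(index_supports_sort(date_status, [("date", 1), ("status", -1)]))
print(index_supports_sort(date_status, [("status", 1)]))
```

Under this model the first two sorts can read documents in index order, while the mixed-direction sort and the sort on `status` alone would still need an in-memory sort.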


In [127]:
# Step 1: Run queries without indexes
start_time = time.time()

# Example query: Find user by email (none of the sample users has this address, so nothing matches)
query = users_collection.find({"email": "user@example.com"})
for user in query:
    pass  # Just executing the query

end_time = time.time()
no_index_email_time = end_time - start_time
print(f"Time without index (email): {no_index_email_time:.4f} seconds")

# Step 2: Create indexes on 'email', 'name', and 'createdAt'
users_collection.create_index('email')
users_collection.create_index('name')
users_collection.create_index('createdAt')

# Step 3: Run the same queries again after creating the indexes
start_time = time.time()

# Query by email
query = users_collection.find({"email": "user@example.com"})
for user in query:
    pass  # Just executing the query

end_time = time.time()
with_index_email_time = end_time - start_time
print(f"Time with index (email): {with_index_email_time:.4f} seconds")

# Performance ratio for the email query (a value below 1.0 means the indexed run was slower)
print(f"Performance improvement for email: {no_index_email_time / with_index_email_time:.2f}x faster with the index")


Time without index (email): 0.0015 seconds
Time with index (email): 0.0090 seconds
Performance improvement for email: 0.16x faster with the index


In [None]:
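Conclusion
Scenario 1 (Slow Search by Category): Indexing the category field speeds up searches when the filter is selective. In the runs above the indexed query was not faster (0.83x and 0.91x): every test document has the category "Electronics", so the query returns all of them either way, and in the second run the index from the previous cell already existed.
Scenario 2 (Sorting by Multiple Fields): A compound index on date and status can let MongoDB return documents in index order instead of sorting them in memory. The measured difference here was negligible (1.01x), since the test documents contain neither field.
Scenario 3 (Large Collections): Creating indexes on frequently queried fields (email, name, createdAt) helps maintain performance as the collection grows. The sample users collection is too small to show this: the recorded timings (0.0015 s without and 0.0090 s with the index) are in the millisecond range and the query matches no documents.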