<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1s5CMSEyiDHnH5XS0cHBypcmE5Dn9TYoa?usp=sharing)
## Master Generative AI in 6 Weeks
**What You'll Learn:**
- Build with Latest LLMs
- Create Custom AI Apps
- Learn from Industry Experts
- Join Innovation Community
Transform your AI ideas into reality through hands-on projects and expert mentorship.
[Start Your Journey](https://www.buildfastwithai.com/genai-course)
*Empowering the Next Generation of AI Innovators*

# 🍃 MongoDB for AI-Powered Applications 🚀  

MongoDB is a **NoSQL** database designed for handling large-scale, high-performance applications. ⚡ With its flexible document model and powerful querying capabilities, it's an excellent choice for AI-driven workflows. 🤖💡  

## 🔥 Why Use MongoDB?  

✅ **Flexible Document Storage:** 📄 Store structured & unstructured data efficiently.  
✅ **Vector Search:** 🧠🔎 Store embeddings next to your data and run similarity searches with Atlas Vector Search.  
✅ **Real-Time Processing:** ⚡ Supports high-speed data retrieval and updates.  
✅ **Scalable & Fast:** 🚀 Handles large-scale applications with ease.  
✅ **Seamless Integration:** 🔗 Works well with AI frameworks like PyTorch, TensorFlow, and LangChain.  
✅ **Cloud & Local Support:** ☁️ Run on MongoDB Atlas (cloud) or locally.  


### **📦 Install Required Packages**

In [None]:
!pip install pymongo langchain langchain-community langchain-openai langchain-text-splitters tiktoken pypdf


### **🔑 Setup API Keys**

In [None]:
import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
MONGODB_URI = userdata.get('MONGODB_URI')
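
The cell above only works inside Colab, where `google.colab.userdata` exists. A minimal sketch for running the same notebook elsewhere, assuming the secrets are exported as environment variables with the same names (`OPENAI_API_KEY`, `MONGODB_URI`); `get_secret` is a hypothetical helper, not part of any library:

```python
import os


def get_secret(name: str) -> str:
    """Hypothetical helper: read a secret from Colab if available, else from os.environ."""
    try:
        # Only importable inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Not in Colab: fall back to an environment variable of the same name.
        value = os.environ.get(name)
    else:
        # In Colab, userdata.get raises its own error if the secret is missing.
        value = userdata.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} is not set")
    return value


# Usage (same names as the Colab secrets above):
# os.environ["OPENAI_API_KEY"] = get_secret("OPENAI_API_KEY")
# MONGODB_URI = get_secret("MONGODB_URI")
```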

### **🔗 Check Connection**


In [None]:
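from pymongo import MongoClient

# ping raises an exception (e.g. ServerSelectionTimeoutError or OperationFailure)
# if the cluster is unreachable or the credentials are rejected.
client = MongoClient(MONGODB_URI)
client.admin.command('ping')
print("✅ Connected to MongoDB Atlas successfully!")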

✅ Connected to MongoDB Atlas successfully!


### **🗄️ Create a Database and Collection**


In [None]:
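# Databases and collections are created lazily: nothing is written to the
# server until the first document is inserted.
db = client["my_database"]
collection = db["users"]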


### **➕ Insert One Document**


In [None]:
user = {"name": "John Doe", "age": 30, "city": "New York"}
collection.insert_one(user)
print("✅ One document inserted!")


✅ One document inserted!


### **📂 Insert Multiple Documents**


In [None]:
users = [
    {"name": "Alice", "age": 25, "city": "Los Angeles"},
    {"name": "Bob", "age": 28, "city": "Chicago"},
]
collection.insert_many(users)
print("✅ Multiple documents inserted!")


✅ Multiple documents inserted!


### **🔍 Find One Document**


In [None]:
user = collection.find_one({"name": "John Doe"})
print("👤 Found user:", user)


👤 Found user: {'_id': ObjectId('67b0e4f9385ffc37fdc11fa7'), 'name': 'John Doe', 'age': 30, 'city': 'New York'}


### **📋 Find All Documents**


In [None]:
for user in collection.find():
    print(user)


### **🎯 Find Documents with a Condition**


In [None]:
for user in collection.find({"city": "New York"}):
    print(user)


{'_id': ObjectId('67b0e4f9385ffc37fdc11fa7'), 'name': 'John Doe', 'age': 30, 'city': 'New York'}
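
Filters are not limited to exact matches; they can use query operators. A minimal sketch, assuming the `users` collection from above; `find_users` is a hypothetical helper. `$gte` and `$in` are standard MongoDB query operators, the second argument to `find()` is a projection, and `sort("age", -1)` orders the cursor by age, descending:

```python
from typing import Any, Dict, List


def find_users(collection: Any, min_age: int, cities: List[str]) -> List[Dict]:
    """Hypothetical helper: users at least `min_age` years old living in any of `cities`."""
    query = {
        "age": {"$gte": min_age},  # age >= min_age
        "city": {"$in": cities},   # city equals any value in the list
    }
    projection = {"_id": 0, "name": 1, "age": 1}  # return only name and age
    return list(collection.find(query, projection).sort("age", -1))


# Usage against the collection above:
# find_users(collection, 26, ["New York", "Chicago"])
```

With the three sample users inserted earlier, this query should match John Doe and then Bob.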


### **✏️ Update One Document**


In [None]:
collection.update_one({"name": "John Doe"}, {"$set": {"age": 31}})
print("✅ User updated!")


✅ User updated!
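
`update_one` returns an `UpdateResult` whose `matched_count` and `modified_count` tell you whether the filter found a document and whether anything changed. A minimal sketch, assuming the same collection; `increment_age` is a hypothetical helper that uses the `$inc` operator instead of `$set`, and `upsert=True` inserts a new document when nothing matches:

```python
from typing import Any


def increment_age(collection: Any, name: str) -> bool:
    """Hypothetical helper: add one year to a user's age, creating the user if missing.

    Returns True if an existing document matched, False if a new one was upserted.
    """
    result = collection.update_one(
        {"name": name},
        # When upserting, the new document gets `name` from the filter and
        # $inc sets the missing `age` field to 1.
        {"$inc": {"age": 1}},
        upsert=True,  # insert a new document if no match is found
    )
    # matched_count is 0 when the filter matched nothing (upserted_id is then set).
    return result.matched_count == 1


# Usage:
# increment_age(collection, "John Doe")
```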


### **❌ Delete One Document**


In [None]:
collection.delete_one({"name": "Alice"})
print("✅ One document deleted!")


✅ One document deleted!


### **⚠️ Drop Collection (Delete Everything)**


In [None]:
collection.drop()
print("🗑️ Collection deleted!")


🗑️ Collection deleted!


### **🤖 Performing RAG (Retrieval-Augmented Generation) with MongoDB**

In [None]:
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel
import time
from typing import List, Dict
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAI

### **🗄️ Create Database and Collection**

In [None]:
db = client.mydatabase

collection = db.mycollection

### **🧠 Generate Embeddings with OpenAI**

In [None]:
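from langchain_openai import OpenAIEmbeddings

# OpenAIEmbeddings' default model returns 1536-dimensional vectors; numDimensions
# in the vector search index below must match whatever model you use.
embeddings = OpenAIEmbeddings()

def get_embedding(text: str) -> List[float]:
    """Embed a single string (a query or a chunk) with OpenAI."""
    return embeddings.embed_query(text)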

### **📄 Load PDF Document from URL**

In [None]:
pdf_url = "https://investors.mongodb.com/node/12236/pdf"
loader = PyPDFLoader(pdf_url)
data = loader.load()

### **✂️ Split Text into Chunks**

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
documents = text_splitter.split_documents(data)
print(f"Loaded and split {len(documents)} documents from PDF.")

Loaded and split 88 documents from PDF.
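
By default `chunk_size` and `chunk_overlap` are measured in characters: each chunk holds at most 400 characters, and consecutive chunks can share up to 20. `RecursiveCharacterTextSplitter` also tries to break on paragraph, line, and word boundaries first. The sketch below is a deliberately simplified fixed-window splitter, not LangChain's algorithm, that only shows how size and overlap interact; `split_fixed` is a hypothetical name:

```python
from typing import List


def split_fixed(text: str, chunk_size: int = 400, chunk_overlap: int = 20) -> List[str]:
    """Hypothetical simplified splitter: windows of `chunk_size` characters,
    each starting `chunk_size - chunk_overlap` characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is what the overlap buys: a sentence cut at a boundary still appears partly in both neighbours.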


### **📥 Generate and Store Embeddings in MongoDB**

In [None]:
docs_to_insert: List[Dict] = []
for doc in documents:
    try:
        embedding = get_embedding(doc.page_content)
        docs_to_insert.append({
            "text": doc.page_content,
            "embedding": embedding
        })
    except Exception as e:
        print(f"Error generating embedding for chunk: {e}")

if docs_to_insert:
    result = collection.insert_many(docs_to_insert)
    print(f"Inserted {len(docs_to_insert)} documents into MongoDB")
else:
    print("No documents to insert.")

Inserted 88 documents into MongoDB
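
The loop above makes one embedding request per chunk. LangChain embedding classes also provide `embed_documents`, which takes a list of texts and returns one vector per text, so chunks can be embedded in batches instead. A minimal sketch, assuming `embeddings` and `documents` from the cells above; `batched`, `build_records`, and the batch size of 64 are illustrative choices, not requirements:

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_records(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[Dict]:
    """Hypothetical helper: embed texts batch by batch and pair each text with its vector."""
    records: List[Dict] = []
    for batch in batched(texts, batch_size):
        vectors = embed_batch(list(batch))  # one vector per input text, same order
        records.extend({"text": t, "embedding": v} for t, v in zip(batch, vectors))
    return records


# Usage with the objects defined earlier in this notebook:
# records = build_records([d.page_content for d in documents], embeddings.embed_documents)
# if records:
#     collection.insert_many(records)
```

The records have the same `text`/`embedding` shape as the loop above, so the vector search index and query cells work unchanged.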


### **🔍 Create Vector Search Index in MongoDB**

In [None]:
index_name = "vector_index"
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "numDimensions": 1536,
                "path": "embedding",
                "similarity": "cosine"
            }
        ]
    },
    name=index_name,
    type="vectorSearch"
)
try:
    collection.create_search_index(model=search_index_model)
    print(f"Created vector search index '{index_name}' on MongoDB collection.")
except Exception as e:
    print(f"Error creating index: {e}")

Created vector search index 'vector_index' on MongoDB collection.
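
With `"similarity": "cosine"`, the index compares the query vector $\mathbf{q}$ with each stored `embedding` $\mathbf{d}$ by the cosine of the angle between them, which depends only on their direction, not their length. For $n$-dimensional vectors ($n = 1536$ here, matching `numDimensions`):

$$
\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2} \, \sqrt{\sum_{i=1}^{n} d_i^2}}
$$

The value ranges from $-1$ (opposite directions) to $1$ (same direction), so a higher value means a more similar chunk.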


### **⏳ Wait for Vector Search Index to be Ready**

In [None]:
print("Polling to check if the index is ready. This may take up to a minute.")
deadline = time.time() + 300  # give up after 5 minutes instead of looping forever
while True:
    # list_search_indexes(name) returns the status document(s) for this index
    indices = list(collection.list_search_indexes(index_name))
    if indices and indices[0].get("queryable") is True:
        break
    if time.time() > deadline:
        raise TimeoutError(f"{index_name} was not queryable after 5 minutes")
    time.sleep(5)
print(index_name + " is ready for querying.")

Polling to check if the index is ready. This may take up to a minute.
vector_index is ready for querying.


### **🔎 Define Function for Vector Search Queries**

In [None]:
def get_query_results(query: str, limit: int = 5) -> List[Dict]:
    """Gets results from a vector search query."""
    query_embedding = get_embedding(query)
    pipeline = [
        {
            "$vectorSearch": {
                "index": index_name,
                "queryVector": query_embedding,
                "path": "embedding",
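                # numCandidates must be >= limit, so a fixed 10 would be rejected
                # for limit > 10; scaling it with limit also improves recall.
                "numCandidates": limit * 20,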
                "limit": limit,
            }
        },
        {"$project": {"_id": 0, "text": 1}}
    ]
    results = list(collection.aggregate(pipeline))
    return results
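
`$vectorSearch` with `numCandidates` runs an approximate nearest-neighbor search. For a collection this small (88 chunks), an exact brute-force search in plain Python is a cheap sanity check on its results. A minimal sketch, assuming the stored documents are loaded into memory as dicts with `text` and `embedding` keys; `cosine` and `exact_search` are hypothetical helpers, and the raw cosine scores they return may not be on the same scale as Atlas's own search score:

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query_vec: Sequence[float], docs: List[Dict], limit: int = 5) -> List[Dict]:
    """Hypothetical helper: score every document and keep the top `limit`."""
    scored = [{"text": d["text"], "score": cosine(query_vec, d["embedding"])} for d in docs]
    scored.sort(key=lambda r: r["score"], reverse=True)  # most similar first
    return scored[:limit]


# Usage with the notebook's objects (loads every chunk into memory):
# docs = list(collection.find({}, {"_id": 0, "text": 1, "embedding": 1}))
# exact_search(get_embedding("What are MongoDB's latest AI announcements?"), docs)
```

If the texts returned here largely agree with `get_query_results`, the index is behaving as expected; on large collections this full scan is exactly the cost the vector index avoids.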

### **💬 Perform Retrieval and Question Answering**

In [None]:
query = "What are MongoDB's latest AI announcements?"
results = get_query_results(query)

print("Query:", query)
print("Results:")
for doc in results:
    print(doc)

### **🧠 Generate Answer Using Retrieved Context**

In [None]:
query = "What are MongoDB's latest AI announcements?"
results = get_query_results(query)

print("Query:", query)
print("Results:")
for doc in results:
    print(doc)

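llm = OpenAI(temperature=0)

# Separate the retrieved chunks so the model sees them as distinct passages
context_string = "\n\n".join(doc["text"] for doc in results)

prompt = f"""Use the following pieces of context to answer the question at the end.

{context_string}

Question: {query}
Answer:"""
# invoke() replaces the deprecated llm(prompt) call style
output = llm.invoke(prompt)

print(output)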

Query: What are MongoDB's latest AI announcements?
Results:
{'text': 'MongoDB  continues to expand its AI ecosystem with the announcement of the MongoDB AI Applications Program (MAAP),'}
{'text': 'more of our customers. We also see a tremendous opportunity to win more legacy workloads, as AI has now become a catalyst to modernize these\napplications. MongoDB\'s document-based architecture is particularly well-suited for the variety and scale of data required by AI-powered applications.\xa0\nWe are confident MongoDB  will be a substantial beneficiary of this next wave of application development."'}
{'text': 'which provides customers with reference architectures, pre-built partner integrations, and professional services to help\nthem quickly build AI-powered applications. Accenture will establish a center of excellence focused on MongoDB  projects,\nand is the first global systems integrator to join MAAP.'}
{'text': 'of MongoDB  8.0—with significant performance improvements such as faste

MongoDB's latest AI announcements include the MongoDB AI Applications Program (MAAP), the release of MongoDB 8.0 with performance improvements, the general availability of Atlas Stream Processing, and a partnership with Accenture to establish a center of excellence focused on MongoDB projects.
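
The prompt above gives the model no instruction for the case where the retrieved context does not contain the answer. A minimal sketch of a more explicit template, assuming the `results` list returned by `get_query_results`; `build_prompt` and its wording are illustrative, not a LangChain or OpenAI API:

```python
from typing import Dict, List


def build_prompt(question: str, results: List[Dict]) -> str:
    """Hypothetical helper: number each retrieved chunk and ask the model to stay within it."""
    context = "\n\n".join(f"[{i}] {doc['text']}" for i, doc in enumerate(results, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Usage with the objects defined above:
# print(llm.invoke(build_prompt(query, results)))
```

Numbering the chunks also makes it easy to ask the model to cite which passage supports each part of its answer.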
