[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/ai-agents-lab-notebooks/blob/main/notebook_template.ipynb)


[![Lab Documentation and Solutions](https://img.shields.io/badge/Lab%20Documentation%20and%20Solutions-purple)](https://mongodb-developer.github.io/rag-lab/)


# Step 1: Install libraries


In [62]:
! pip install -qU pymongo langchain langchain-community langchain-mongodb fireworks-ai bs4 tiktoken sentence_transformers



# Step 2: Setup prerequisites


In [52]:
import os

In [25]:
MONGODB_URI = "<CODE_BLOCK_1"

In [63]:
os.environ["FIREWORKS_API_KEY"] = "CODE_BLOCK_2"
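
Hardcoding credentials in a notebook makes them easy to leak when the notebook is shared. As a minimal sketch, the connection string and API key could instead be read from the environment, with a hidden prompt as a fallback. `get_secret` is a hypothetical helper introduced here for illustration; it is not part of the lab.

```python
import os
from getpass import getpass


def get_secret(name: str) -> str:
    """Return environment variable `name`, prompting for it if it is unset.

    Hypothetical helper for illustration; not part of the lab.
    """
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so it does not appear in the cell output
        value = getpass(f"Enter {name}: ")
        # Cache it so libraries that read the environment can find it too
        os.environ[name] = value
    return value
```

With this in place, `MONGODB_URI = get_secret("MONGODB_URI")` and `get_secret("FIREWORKS_API_KEY")` would replace the two assignments above.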

# Step 3: Prepare the dataset


### Load the dataset


In [20]:
from langchain_community.document_loaders import WebBaseLoader



In [21]:
loader = WebBaseLoader(
    [
        "https://www.mongodb.com/developer/products/atlas/choose-embedding-model-rag/",
        "https://www.mongodb.com/developer/products/atlas/evaluate-llm-applications-rag/",
        "https://www.mongodb.com/developer/products/atlas/choosing-chunking-strategy-rag/",
        "https://www.mongodb.com/developer/products/atlas/gemma-mongodb-huggingface-rag/",
    ]
)
docs = loader.load()

In [22]:
# Preview a document
docs[0]

Document(metadata={'source': 'https://www.mongodb.com/developer/products/atlas/choose-embedding-model-rag/', 'title': 'How to Choose the Right Embedding Model for Your LLM Application | MongoDB', 'description': 'In this tutorial, we will see why embeddings are important for RAG, and how to choose the right embedding model for your RAG application.', 'language': 'en'}, page_content='How to Choose the Right Embedding Model for Your LLM Application | MongoDBBlogAtlas Vector Search voted most loved vector database in 2024 Retool State of AI reportLearn more\xa0>>Developer Articles & TopicsGeneral InformationDocumentationDeveloper Articles & TopicsCommunity ForumsBlogUniversityProductsPlatformAtlasBuild on a developer data platformPlatform ServicesDatabaseDeploy a multi-cloud databaseSearchDeliver engaging search experiencesVector SearchDesign intelligent apps with GenAIStream ProcessingUnify data in motion and data at restToolsCompassWork with MongoDB data in a GUIIntegrationsIntegrations 

In [23]:
docs[0].page_content

'How to Choose the Right Embedding Model for Your LLM Application | MongoDBBlogAtlas Vector Search voted most loved vector database in 2024 Retool State of AI reportLearn more\xa0>>Developer Articles & TopicsGeneral InformationDocumentationDeveloper Articles & TopicsCommunity ForumsBlogUniversityProductsPlatformAtlasBuild on a developer data platformPlatform ServicesDatabaseDeploy a multi-cloud databaseSearchDeliver engaging search experiencesVector SearchDesign intelligent apps with GenAIStream ProcessingUnify data in motion and data at restToolsCompassWork with MongoDB data in a GUIIntegrationsIntegrations with third-party servicesRelational MigratorMigrate to MongoDB with confidenceSelf ManagedEnterprise AdvancedRun and manage MongoDB yourselfCommunity EditionDevelop locally with MongoDBBuild with MongoDB AtlasGet started for free in minutesSign UpTest Enterprise AdvancedDevelop with MongoDB on-premisesDownloadTry Community EditionExplore the latest version of MongoDBDownloadResourc

In [24]:
docs[0].metadata

{'source': 'https://www.mongodb.com/developer/products/atlas/choose-embedding-model-rag/',
 'title': 'How to Choose the Right Embedding Model for Your LLM Application | MongoDB',
 'description': 'In this tutorial, we will see why embeddings are important for RAG, and how to choose the right embedding model for your RAG application.',
 'language': 'en'}

### Chunk up the data


In [26]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [27]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base", chunk_size=200, chunk_overlap=30
)

In [28]:
split_docs = text_splitter.split_documents(docs)

In [29]:
len(split_docs)

202
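
To build intuition for what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter first splits on separators and then merges pieces up to the token budget), and it uses whitespace-separated words as a stand-in for `cl100k_base` tokens. `sliding_window_chunks` is a hypothetical name used only for this illustration.

```python
from typing import List


def sliding_window_chunks(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many tokens after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        # Stop once a window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


words = "a b c d e f g h i j".split()
for chunk in sliding_window_chunks(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each window repeats the last token of the previous one, which is the role the 30-token overlap plays in the splitter configured above.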

In [30]:
split_docs = [doc.dict() for doc in split_docs]

In [31]:
split_docs[0]

{'id': None,
 'metadata': {'source': 'https://www.mongodb.com/developer/products/atlas/choose-embedding-model-rag/',
  'title': 'How to Choose the Right Embedding Model for Your LLM Application | MongoDB',
  'description': 'In this tutorial, we will see why embeddings are important for RAG, and how to choose the right embedding model for your RAG application.',
  'language': 'en'},
 'page_content': 'How to Choose the Right Embedding Model for Your LLM Application | MongoDBBlogAtlas Vector Search voted most loved vector database in 2024 Retool State of AI reportLearn more\xa0>>Developer Articles & TopicsGeneral InformationDocumentationDeveloper Articles & TopicsCommunity ForumsBlogUniversityProductsPlatformAtlasBuild on a developer data platformPlatform ServicesDatabaseDeploy a multi-cloud databaseSearchDeliver engaging search experiencesVector SearchDesign intelligent apps with GenAIStream ProcessingUnify data in motion and data at restToolsCompassWork with MongoDB data in a GUIIntegra

### Generate embeddings


In [32]:
from sentence_transformers import SentenceTransformer

In [33]:
embedding_model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

In [34]:
def get_embedding(text: str):
    embedding = embedding_model.encode(text)
    return embedding.tolist()

In [35]:
embedded_docs = [
    {**d, "embedding": get_embedding(d["page_content"])} for d in split_docs
]

In [36]:
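# Equivalent to the list comprehension above, written as an explicit loop.
# Run one of the two cells, not both, to avoid computing the embeddings twice.
embedded_docs = []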
for doc in split_docs:
    temp = doc.copy()
    temp["embedding"] = get_embedding(temp["page_content"])
    embedded_docs.append(temp)
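
Both cells above call the model once per chunk. `SentenceTransformer.encode` also accepts a list of texts and batches them internally, which is usually faster. The sketch below makes that batching explicit; `embed_in_batches` and its `encode_fn` parameter are hypothetical names, and the default batch size of 32 is an assumption chosen for illustration.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    texts: Sequence[str],
    encode_fn: Callable[[List[str]], Sequence[Sequence[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed `texts` in batches, returning one plain list of floats per text.

    `encode_fn` is any callable mapping a list of strings to a sequence of
    vectors, for example `embedding_model.encode`.
    """
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start : start + batch_size])
        vectors = encode_fn(batch)
        # Convert each vector to built-in floats so the result can be stored in MongoDB
        embeddings.extend([float(x) for x in vector] for vector in vectors)
    return embeddings
```

In this notebook it might be used as `vectors = embed_in_batches([d["page_content"] for d in split_docs], embedding_model.encode)`, followed by attaching each vector to its document.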

# Step 4: Perform Semantic Search on Your Data


In [120]:
from pymongo import MongoClient
from typing import List, Dict

### Ingest documents into MongoDB


In [144]:
# Initialize a MongoDB Python client
mongo_client = MongoClient(MONGODB_URI)

In [145]:
# Name of the database -- Change if needed or leave as is
DB_NAME = "mongodb_rag_lab"
# Name of the collection -- Change if needed or leave as is
COLLECTION_NAME = "knowledge_base"
# Name of the vector search index -- Change if needed or leave as is
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"

In [146]:
# Connect to the collection defined above using the MongoDB client
collection = mongo_client[DB_NAME][COLLECTION_NAME]

In [147]:
# Bulk delete all existing records from the collection defined above -- should be a one-liner
collection.delete_many({})

DeleteResult({'n': 202, 'electionId': ObjectId('7fffffff000000000000000c'), 'opTime': {'ts': Timestamp(1720559750, 201), 't': 12}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1720559750, 202), 'signature': {'hash': b'|\xdd\xc0\xedw\xcd\xfd?\xc7k\xd22\xff\xc3\nC\x0b\x17\xa16', 'keyId': 7353010953081847814}}, 'operationTime': Timestamp(1720559750, 201)}, acknowledged=True)

In [148]:
# Bulk insert `embedded_docs` into the collection defined above -- should be a one-liner
collection.insert_many(embedded_docs)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed


### Create a vector search index

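Follow the instructions in the documentation to create a Vector Search index in the Atlas UI. Name the index `vector_index` (the value of `ATLAS_VECTOR_SEARCH_INDEX_NAME`) and define it on the `embedding` field of the `knowledge_base` collection, because the `vector_search` function below queries that index name and path.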
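
For reference, here is a sketch of what the index definition might look like. The `path` matches the field written during ingestion; the 1024 dimensions assume the output size of `mxbai-embed-large-v1` (you can confirm it with `embedding_model.get_sentence_embedding_dimension()`), and `cosine` similarity is an assumption, not something the lab specifies. `build_vector_index_definition` is a hypothetical helper.

```python
from typing import Any, Dict


def build_vector_index_definition(
    path: str = "embedding",
    num_dimensions: int = 1024,  # assumed output size of mxbai-embed-large-v1
    similarity: str = "cosine",  # assumed; Atlas also offers other similarity functions
) -> Dict[str, Any]:
    """Return an Atlas Vector Search index definition for a single vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = build_vector_index_definition()
print(definition)

# With a recent PyMongo version, the index could also be created from code
# instead of the UI (left commented out because it needs a live Atlas cluster):
# from pymongo.operations import SearchIndexModel
# collection.create_search_index(
#     SearchIndexModel(
#         definition=definition,
#         name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
#         type="vectorSearch",
#     )
# )
```

The printed dictionary has the same shape as the JSON you would enter in the Atlas UI's JSON editor.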


### Define a vector search function


In [149]:
def vector_search(user_query: str) -> List[Dict]:
    """
    Perform a vector search on a MongoDB collection based on the user query.

    Args:
        user_query (str): The user's query string.

    Returns:
        List[Dict]: A list of matching documents.
    """

    # Generate embedding for the user query
    query_embedding = get_embedding(user_query)

    # Define the vector search pipeline
    pipeline = [
        {
            "$vectorSearch": {
                "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
                "queryVector": query_embedding,
                "path": "embedding",
                "numCandidates": 150,
                "limit": 5,
            }
        },
        {
            "$project": {
                "_id": 0,
                "page_content": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]

    # Execute the search
    results = collection.aggregate(pipeline)
    return list(results)
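
Conceptually, the `$vectorSearch` stage ranks stored chunks by how similar their embeddings are to the query embedding. The brute-force sketch below shows that idea in plain Python, assuming cosine similarity. It is not what Atlas does internally: Atlas searches approximately over `numCandidates` candidates, and the scores it reports may be normalized differently, so the numbers will not necessarily match. Both function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(
    query_embedding: List[float], docs: List[Dict], limit: int = 5
) -> List[Dict]:
    """Score every document against the query and return the `limit` best matches."""
    scored = [
        {
            "page_content": d["page_content"],
            "score": cosine_similarity(query_embedding, d["embedding"]),
        }
        for d in docs
    ]
    # Highest similarity first
    scored.sort(key=lambda d: d["score"], reverse=True)
    return scored[:limit]
```

Running `brute_force_search(get_embedding(query), embedded_docs)` on the in-memory documents is one way to sanity-check what the index returns on a small dataset like this one.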

### Run vector search queries


In [122]:
vector_search(
    "What are the important considerations while choosing an embedding model?"
)

[{'page_content': 'models out there, how do we choose the best one for our use case?A good place to start when looking for embedding models to use is the MTEB Leaderboard on Hugging Face. It is the most up-to-date list of proprietary and open-source text embedding models, accompanied by statistics on how each model performs on various embedding tasks such as retrieval, summarization, etc.Evaluations of this magnitude for multimodal models are just emerging (see the MME benchmark) so we will only focus on text embedding models for this tutorial. However, all the guidance here on choosing an embedding model also applies to multimodal models.Benchmarks are a good place to begin but bear in mind that these results are self-reported and have been benchmarked on datasets that might not accurately represent the data you are dealing with. It is also possible that some models may include the MTEB datasets in their training data since they are publicly available. So even if you choose a model ba

In [123]:
vector_search("How to choose a chunking strategy for RAG?")

[{'page_content': 'breaking down large pieces of text into smaller segments or chunks. In the context of RAG, embedding smaller chunks instead of entire documents to create the knowledge base means that given a user query, you only have to retrieve the most relevant document chunks, resulting in fewer input tokens and more targeted context for the LLM to work with.Choosing the right chunking strategy for your RAG applicationThere is no “one size fits all” solution when it comes to choosing a chunking strategy for RAG — it depends on the structure of the documents being used to create the knowledge base and will look different depending on whether you are working with well-formatted text documents or documents with code snippets, tables, images, etc. The three key components of a chunking strategy are as follows:Splitting technique: Determines where the chunk boundaries will be placed — based on paragraph boundaries, programming language-specific separators, tokens, or even semantic bou

# Step 5: Build a RAG Application


### Instantiate a chat model


In [124]:
from fireworks.client import Fireworks

In [70]:
fw_client = Fireworks()
model = "accounts/fireworks/models/llama-v2-7b-chat"

### Define a function to create the chat prompt


In [141]:
def create_prompt(user_query: str) -> str:
    """
    Create a chat prompt that includes the user query and retrieved context.

    Args:
        user_query (str): The user's query string.

    Returns:
        str: The chat prompt string.
    """
    context = vector_search(user_query)
    context = "\n\n".join([d.get("page_content", "") for d in context])
    prompt = f"Answer the question based only on the following context. If the context is empty, say I DON'T KNOW\n\nContext:\n{context}\n\nQuestion: {user_query}"
    return prompt

### Define a function to answer user queries


In [167]:
def generate_answer(user_query: str) -> str:
    """
    Generate an answer to the user query.

    Args:
        user_query (str): The user's query string.

    Returns:
        str: The answer string.
    """
    response = fw_client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": create_prompt(user_query),
            }
        ],
    )
    return response.choices[0].message.content

### Query the RAG application


In [127]:
print(
    generate_answer(
        "What are the important considerations while choosing an embedding model?"
    )
)

  Based on the provided context, the important considerations while choosing an embedding model are:

1. Dataset: The dataset you are working with and the type of data it contains, such as text, images, or audio, can influence the choice of embedding model.
2. Dimensions: The number of dimensions in the embedding model, which can impact the model's ability to capture complexity and nuanced details in the data.
3. Training data: The embedding model may have been trained on a dataset that includes the MTEB datasets, which could affect the model's performance on unseen data.
4. Operational efficiency: The embedding model's size and computational requirements can impact the model's operational efficiency, so it's important to choose a model that balances complexity and efficiency.
5. Trade-off: The choice of embedding model should be based on a trade-off between capturing the complexity of the data and operational efficiency.
6. Self-reported benchmarks: The benchmarks provided by the mode

In [168]:
print(generate_answer("What did I just ask you?"))

  Based on the given context, I cannot answer the question as the context does not provide any information to answer the question. Therefore, I DON'T KNOW.


# Step 6: Add memory to the RAG Application


In [152]:
from datetime import datetime

In [151]:
history_collection = mongo_client[DB_NAME]["chat_history"]
history_collection.create_index("session_id")

'session_id_1'

### Define a function to store chat messages in MongoDB


In [154]:
def store_chat_message(session_id, role, content):
    """Store one chat message for the given session, stamped with the current time."""
    message = {
        "session_id": session_id,
        "role": role,
        "content": content,
        "timestamp": datetime.now(),
    }
    history_collection.insert_one(message)

### Define a function to retrieve chat history from MongoDB


In [155]:
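def retrieve_session_history(session_id):
    """Return the session's messages, oldest first, as role/content dicts."""
    cursor = history_collection.find({"session_id": session_id}).sort("timestamp", 1)

    # The cursor object is truthy even when nothing matches, so no emptiness
    # check is needed; iterating an empty cursor yields an empty list.
    messages = [{"role": msg["role"], "content": msg["content"]} for msg in cursor]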

    return messages
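
As a session grows, sending the full history with every request will eventually exceed the model's context window. One simple mitigation, sketched below, is to keep only the most recent messages. `trim_history` is a hypothetical helper and the default of 10 messages is an arbitrary assumption.

```python
from typing import Dict, List


def trim_history(messages: List[Dict], max_messages: int = 10) -> List[Dict]:
    """Keep only the `max_messages` most recent messages, preserving their order."""
    if max_messages <= 0:
        # Guard needed because messages[-0:] would return the whole list
        return []
    return messages[-max_messages:]
```

It could be applied to the result of `retrieve_session_history(session_id)` before the history is added to the prompt.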

### Handle chat history in the `generate_answer` function


In [162]:
def generate_answer(session_id: str, user_query: str) -> str:
    """
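    Generate an answer to the user query, using the session's chat history.

    Args:
        session_id (str): Identifier of the chat session whose history is used.
        user_query (str): The user's query string.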

    Returns:
        str: The answer string.
    """
    context = vector_search(user_query)
    context = "\n\n".join([d.get("page_content", "") for d in context])

    messages = []

    system_message = {
        "role": "system",
        "content": f"Answer the question based only on the following context. If the context is empty, say I DON'T KNOW\n\nContext:\n{context}",
    }
    messages.append(system_message)

    message_history = retrieve_session_history(session_id)
    messages += message_history

    user_message = {"role": "user", "content": user_query}
    messages.append(user_message)
    print(messages)

    response = fw_client.chat.completions.create(model=model, messages=messages)

    answer = response.choices[0].message.content

    store_chat_message(session_id, "user", user_query)
    store_chat_message(session_id, "assistant", answer)
    return answer
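
The message list built inside `generate_answer` always follows the same order: a system message with the retrieved context, then the stored history, then the new user message. The sketch below pulls that assembly into a pure function so it can be inspected without calling MongoDB or Fireworks. `build_messages` and `SYSTEM_TEMPLATE` are hypothetical names; the prompt text is copied from the function above.

```python
from typing import Dict, List

SYSTEM_TEMPLATE = (
    "Answer the question based only on the following context. "
    "If the context is empty, say I DON'T KNOW\n\nContext:\n{context}"
)


def build_messages(
    context_docs: List[Dict], history: List[Dict], user_query: str
) -> List[Dict]:
    """Assemble chat messages: system prompt with context, history, then the query."""
    # Join the retrieved chunks the same way generate_answer does
    context = "\n\n".join(d.get("page_content", "") for d in context_docs)
    messages = [{"role": "system", "content": SYSTEM_TEMPLATE.format(context=context)}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_query})
    return messages


example = build_messages(
    context_docs=[{"page_content": "Smaller embeddings offer faster inference."}],
    history=[
        {"role": "user", "content": "What is an embedding?"},
        {"role": "assistant", "content": "A vector representation of data."},
    ],
    user_query="Tell me more about embedding size.",
)
for message in example:
    print(message["role"])
```

Inside `generate_answer`, the equivalent call would be `build_messages(vector_search(user_query), retrieve_session_history(session_id), user_query)`.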

In [163]:
print(
    generate_answer(
        "2",
        user_query="What are the important considerations while choosing an embedding model?",
    )
)

[{'role': 'system', 'content': "Answer the question based only on the following context. If the context is empty, say I DON'T KNOW\n\nContext:\nmodels out there, how do we choose the best one for our use case?A good place to start when looking for embedding models to use is the MTEB Leaderboard on Hugging Face. It is the most up-to-date list of proprietary and open-source text embedding models, accompanied by statistics on how each model performs on various embedding tasks such as retrieval, summarization, etc.Evaluations of this magnitude for multimodal models are just emerging (see the MME benchmark) so we will only focus on text embedding models for this tutorial. However, all the guidance here on choosing an embedding model also applies to multimodal models.Benchmarks are a good place to begin but bear in mind that these results are self-reported and have been benchmarked on datasets that might not accurately represent the data you are dealing with. It is also possible that some mo

In [165]:
print(
    generate_answer(
        "2",
        user_query="Tell me more about embedding size.",
    )
)

[{'role': 'system', 'content': 'Answer the question based only on the following context. If the context is empty, say I DON\'T KNOW\n\nContext:\nvector. Smaller embeddings offer faster inference and are more storage-efficient, while more dimensions can capture nuanced details and relationships in the data. Ultimately, we want a good trade-off between capturing the complexity of data and operational efficiency.The top 10 models on the leaderboard contain a mix of small vs large and proprietary vs open-source models. Let’s compare some of these to find the best embedding model for our dataset.Before we beginHere are some things to note about our evaluation experiment.DatasetMongoDB’s cosmopedia-wikihow-chunked dataset is available on Hugging Face, which consists of prechunked WikiHow-style articles.Models evaluatedvoyage-lite-02-instruct: A proprietary embedding model from VoyageAItext-embedding-3-large: One of OpenAI’s latest proprietary embedding modelsUAE-Large-V1: A small-ish (335M p

In [166]:
print(
    generate_answer(
        "2",
        user_query="What did I just ask you?",
    )
)

[{'role': 'system', 'content': 'Answer the question based only on the following context. If the context is empty, say I DON\'T KNOW\n\nContext:\nQuestion: {question}\n    """\n    # Defining the chat prompt\n    prompt = ChatPromptTemplate.from_template(template)\n    # Defining the model to be used for chat completion\n    llm = ChatOpenAI(temperature=0, model=model)\n    # Parse output as a string\n    parse_output = StrOutputParser()\n\ntypes of questions to include in the test set: simple corresponds to straightforward questions that can be easily answered using the source data. multi_context stands for questions that would require information from multiple related sections or chunks to formulate an answer. reasoning questions are those that would require an LLM to reason to effectively answer the question. You will want to set this distribution based on your best guess of the type of questions you would expect from your users.Generates a test set from the dataset (pages) we create