[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/developer/products/atlas/advanced-rag-langchain-mongodb/)


# Adding Semantic Caching and Memory to your RAG Application using MongoDB and LangChain

In this notebook, we will see how to use MongoDBAtlasSemanticCache and MongoDBChatMessageHistory in your RAG application.


## Step 1: Install required libraries

- **datasets**: Python library to get access to datasets available on Hugging Face Hub

- **langchain**: Python toolkit for LangChain

- **langchain-mongodb**: Python package to use MongoDB as a vector store, semantic cache, chat history store etc. in LangChain

- **langchain-openai**: Python package to use OpenAI models with LangChain

- **pymongo**: Python toolkit for MongoDB

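- **pandas**: Python library for data analysis, exploration, and manipulation

- **langchain-community**, **dashscope**: needed for the DashScopeEmbeddings class used in Step 5

- **langchain-experimental**: provides SmartLLMChain, used in Step 5

- **python-dotenv**: loads API keys from a local .env file

In [1]:
! pip install -qU datasets langchain langchain-mongodb langchain-openai langchain-community langchain-experimental dashscope python-dotenv pymongo pandas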

## Step 2: Setup pre-requisites

* Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.

* Set the OpenAI API key. Steps to obtain an API key are [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).

In [1]:
import getpass

In [2]:
MONGODB_URI = getpass.getpass("Enter your connection string:")

Enter your connection string: ········


In [4]:
OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API key:")

Enter your OpenAI API key:········
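
When debugging connection problems it is tempting to print the connection string, but that writes the database password into the notebook output. A minimal sketch of a redaction helper, using only the standard library; `redact_uri` is a hypothetical name and not part of pymongo, and the URI below is a placeholder:

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri: str) -> str:
    """Return the URI with any embedded password replaced by '***'."""
    parts = urlsplit(uri)
    # Split "user:password@hosts" at the last "@" so multi-host strings survive
    userinfo, at, hosts = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        return uri  # no credentials embedded, nothing to hide
    user = userinfo.split(":", 1)[0]
    netloc = f"{user}:***@{hosts}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder URI for illustration only
print(redact_uri("mongodb+srv://app_user:s3cret@cluster0.example.mongodb.net/?retryWrites=true"))
```

Printing `redact_uri(MONGODB_URI)` instead of `MONGODB_URI` keeps the host visible for troubleshooting without exposing the secret.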


In [None]:
# Optional: enable LangSmith tracing, which is useful for debugging
import os

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass()


## Step 3: Download the dataset

We will be using MongoDB's [embedded_movies](https://huggingface.co/datasets/MongoDB/embedded_movies) dataset

In [16]:
import pandas as pd
from datasets import load_dataset

In [17]:
# Ensure you have an HF_TOKEN in your development environment:
# access tokens can be created or copied from the Hugging Face platform (https://huggingface.co/docs/hub/en/security-tokens)

# Load MongoDB's embedded_movies dataset from Hugging Face
# https://huggingface.co/datasets/MongoDB/embedded_movies

data = load_dataset("MongoDB/embedded_movies")

In [18]:
df = pd.DataFrame(data["train"])

## Step 4: Data analysis

Preview the data, drop records with a missing `fullplot`, and rename the embedding field to the name LangChain expects.

In [19]:
# Previewing the contents of the data
df.head(1)

Unnamed: 0,plot,runtime,genres,fullplot,directors,writers,countries,poster,languages,cast,title,num_mflix_comments,rated,imdb,awards,type,metacritic,plot_embedding
0,Young Pauline is left a lot of money when her ...,199.0,[Action],Young Pauline is left a lot of money when her ...,"[Louis J. Gasnier, Donald MacKenzie]","[Charles W. Goddard (screenplay), Basil Dickey...",[USA],https://m.media-amazon.com/images/M/MV5BMzgxOD...,[English],"[Pearl White, Crane Wilbur, Paul Panzer, Edwar...",The Perils of Pauline,0,,"{'id': 4465, 'rating': 7.6, 'votes': 744}","{'nominations': 0, 'text': '1 win.', 'wins': 1}",movie,,"[0.0007293965299999999, -0.026834568000000003,..."


In [20]:
# Only keep records where the fullplot field is not null
df = df[df["fullplot"].notna()]

In [21]:
# Renaming the embedding field to "embedding" -- required by LangChain
df.rename(columns={"plot_embedding": "embedding"}, inplace=True)

In [22]:
df.head(2)

Unnamed: 0,plot,runtime,genres,fullplot,directors,writers,countries,poster,languages,cast,title,num_mflix_comments,rated,imdb,awards,type,metacritic,embedding
0,Young Pauline is left a lot of money when her ...,199.0,[Action],Young Pauline is left a lot of money when her ...,"[Louis J. Gasnier, Donald MacKenzie]","[Charles W. Goddard (screenplay), Basil Dickey...",[USA],https://m.media-amazon.com/images/M/MV5BMzgxOD...,[English],"[Pearl White, Crane Wilbur, Paul Panzer, Edwar...",The Perils of Pauline,0,,"{'id': 4465, 'rating': 7.6, 'votes': 744}","{'nominations': 0, 'text': '1 win.', 'wins': 1}",movie,,"[0.0007293965299999999, -0.026834568000000003,..."
1,A penniless young man tries to save an heiress...,22.0,"[Comedy, Short, Action]",As a penniless man worries about how he will m...,"[Alfred J. Goulding, Hal Roach]",[H.M. Walker (titles)],[USA],https://m.media-amazon.com/images/M/MV5BNzE1OW...,[English],"[Harold Lloyd, Mildred Davis, 'Snub' Pollard, ...",From Hand to Mouth,0,TV-G,"{'id': 10146, 'rating': 7.0, 'votes': 639}","{'nominations': 1, 'text': '1 nomination.', 'w...",movie,,"[-0.022837115, -0.022941574000000003, 0.014937..."
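
Before ingesting, it can also help to confirm that every record carries both the text field and an embedding of the expected length, since a vector index only covers vectors that match its configured dimensionality. A minimal sketch over plain dictionaries (the shape produced by `df.to_dict("records")`); `split_valid_records` is a hypothetical helper, and the 3-dimensional toy vectors stand in for the real embeddings:

```python
from typing import Any


def split_valid_records(
    records: list[dict[str, Any]], expected_dim: int
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate records that are safe to index from those that are not."""
    valid, rejected = [], []
    for record in records:
        text = record.get("fullplot")
        vector = record.get("embedding")
        has_text = isinstance(text, str) and text.strip() != ""
        has_vector = (
            hasattr(vector, "__len__")
            and not isinstance(vector, str)
            and len(vector) == expected_dim
        )
        (valid if has_text and has_vector else rejected).append(record)
    return valid, rejected


# Toy records for illustration only
sample = [
    {"title": "A", "fullplot": "Some plot", "embedding": [0.1, 0.2, 0.3]},
    {"title": "B", "fullplot": None, "embedding": [0.1, 0.2, 0.3]},
    {"title": "C", "fullplot": "Another plot", "embedding": [0.1, 0.2]},
]
valid, rejected = split_valid_records(sample, expected_dim=3)
print(len(valid), [r["title"] for r in rejected])
```

For the real dataset, `expected_dim` would be the length of the vectors in the `embedding` column, which can be read from any row with `len(df["embedding"].iloc[0])`.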


## Step 5: Create a simple RAG chain using MongoDB as the vector store

In [46]:
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
uri = MONGODB_URI
# Create a new client and connect to the server
client = MongoClient(uri, server_api=ServerApi('1'))
# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)

Pinged your deployment. You successfully connected to MongoDB!


In [4]:
from langchain_mongodb import MongoDBAtlasVectorSearch
from pymongo import MongoClient

# Initialize MongoDB python client
client = MongoClient(MONGODB_URI, appname="devrel.content.python")

DB_NAME = "langchain_chatbot"
COLLECTION_NAME = "data"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
collection = client[DB_NAME][COLLECTION_NAME]

In [48]:
# Delete any existing records in the collection
collection.delete_many({})

DeleteResult({'n': 2904, 'electionId': ObjectId('7fffffff0000000000000105'), 'opTime': {'ts': Timestamp(1757230994, 172), 't': 261}, 'ok': 1.0, '$clusterTime': {'clusterTime': Timestamp(1757230994, 172), 'signature': {'hash': b'!9\xbf|thR\xc3\x10\xb6\xeb\x10\x1d>n\xd9N\x96\x87\x93', 'keyId': 7506536139625332738}}, 'operationTime': Timestamp(1757230994, 172)}, acknowledged=True)

In [49]:
# Data Ingestion
records = df.to_dict("records")
collection.insert_many(records)

print("Data ingestion into MongoDB completed")

Data ingestion into MongoDB completed


In [5]:
for doc in collection.find().limit(3):
    print(doc)

{'_id': ObjectId('68bd37979bdd3f7158baece6'), 'plot': "Young Pauline is left a lot of money when her wealthy uncle dies. However, her uncle's secretary has been named as her guardian until she marries, at which time she will officially take ...", 'runtime': 199.0, 'genres': ['Action'], 'fullplot': 'Young Pauline is left a lot of money when her wealthy uncle dies. However, her uncle\'s secretary has been named as her guardian until she marries, at which time she will officially take possession of her inheritance. Meanwhile, her "guardian" and his confederates constantly come up with schemes to get rid of Pauline so that he can get his hands on the money himself.', 'directors': ['Louis J. Gasnier', 'Donald MacKenzie'], 'writers': ['Charles W. Goddard (screenplay)', 'Basil Dickey (screenplay)', 'Charles W. Goddard (novel)', 'George B. Seitz', 'Bertram Millhauser'], 'countries': ['USA'], 'poster': 'https://m.media-amazon.com/images/M/MV5BMzgxODk1Mzk2Ml5BMl5BanBnXkFtZTgwMDg0NzkwMjE@._V1_SY1

In [6]:
from langchain_community.embeddings import DashScopeEmbeddings
import os
from dotenv import load_dotenv
load_dotenv()
embeddings = DashScopeEmbeddings(
    model="text-embedding-v1",
    dashscope_api_key=os.getenv("DASHSCOPE_API_KEY")
)

In [17]:
# Vector Store Creation
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string=MONGODB_URI,
    namespace=DB_NAME + "." + COLLECTION_NAME,
    embedding=embeddings,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    text_key="fullplot",
)
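
The vector store above refers to an Atlas Vector Search index named `vector_index`, which must already exist on the collection; the notebook does not show how it is created. A sketch of what the index definition might look like, assuming the vectors live in the `embedding` field and have 1536 dimensions (an assumption; check the actual vector length in your data). `vector_index_definition` is a hypothetical helper that only builds the dictionary:

```python
import json


def vector_index_definition(path: str, num_dimensions: int, similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for a single vector field."""
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# 1536 is an assumed dimensionality, not taken from this notebook
definition = vector_index_definition(path="embedding", num_dimensions=1536)
print(json.dumps(definition, indent=2))

# With a recent pymongo, the definition could be applied along these lines (not run here):
# from pymongo.operations import SearchIndexModel
# collection.create_search_index(
#     SearchIndexModel(definition=definition, name="vector_index", type="vectorSearch")
# )
```

The same definition can also be pasted into the JSON editor in the Atlas UI when creating the index by hand.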

In [18]:
docs1 = vector_store.similarity_search("money", k=3)
print(f"len docs1 is {len(docs1)}")
for d in docs1:
    print(d.page_content[:200])  # Print the first 200 characters

len docs1 is 3
It's the story about a lazy, irreverent slacker panda, named Po, who is the biggest fan of Kung Fu around...which doesn't exactly come in handy while working every day in his family's noodle shop. Une
Clyde Williams and Billy Foster are a couple of blue-collar workers in Atlanta who have promised to raise funds for their fraternal order, the Brothers and Sisters of Shaka. However, their method for 
Daniel and his mother move from New Jersey to California. She has a wonderful new job, but Daniel quickly discovers that a dark haired Italian boy with a Jersey accent doesn't fit into the blond surfe
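
The snippets returned above are only loosely related to "money". One possible explanation, offered as an assumption rather than a diagnosis: the stored `embedding` vectors ship with the dataset, while the query is embedded with `DashScopeEmbeddings`, and similarity scores are only meaningful when both sides come from the same model. The toy sketch below shows the ranking that a similarity search performs; the 3-dimensional vectors and labels are made up for illustration:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Rank stored vectors by similarity to the query and keep the best k."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up vectors: the first axis loosely stands for "money-related"
stored = {
    "heist": [0.9, 0.1, 0.0],
    "space": [0.0, 0.2, 0.9],
    "comedy": [0.1, 0.9, 0.1],
}
query = [1.0, 0.0, 0.0]
for name, score in top_k(query, stored, k=2):
    print(f"{name}: {score:.3f}")
```

If the query vector came from a model whose axes mean something different, the same arithmetic would still return k results, but the ranking would be close to arbitrary.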


In [19]:
# Using the MongoDB vector store as a retriever in a RAG chain
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})

In [20]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

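# Generate context using the retriever, and pass the user question through
# 1. retriever: takes the user's query (the question) and returns a list of relevant documents (List[Document]).
# 2. lambda docs: "\n\n".join([d.page_content for d in docs]): turns the Document list into a plain-text string
#    (keeping only .page_content), with documents separated by \n\n.
# So the value of "context" is the text assembled from the retriever's output.
# "question": RunnablePassthrough() means the original user input (the query) is passed along unchanged as "question".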
retrieve = {
    "context": retriever | (lambda docs: "\n\n".join([d.page_content for d in docs])),
    "question": RunnablePassthrough(),
}

template = """Answer the question based only on the following context: \
{context}

Question: {question}
"""
# Defining the chat prompt
prompt = ChatPromptTemplate.from_template(template)
# Defining the model to be used for chat completion.
# This points ChatOpenAI at DashScope's OpenAI-compatible endpoint to use a Qwen model.
import os
from dotenv import load_dotenv

load_dotenv()
model = ChatOpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    model="qwen-plus",  # A Qwen model name served by DashScope, e.g. qwen-plus / qwen-max / qwen-turbo
    temperature=0,
)
# To use OpenAI directly instead:
# model = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
# Parse output as a string
parse_output = StrOutputParser()

# Naive RAG chain
naive_rag_chain = retrieve | prompt | model | parse_output
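
To make the data flow concrete, here is a plain-Python approximation of what the first two stages of `naive_rag_chain` do with a query. It is a sketch, not LangChain code: `Doc`, `fake_retriever`, `retrieve_step`, and `prompt_step` are hypothetical stand-ins for the retriever, the `retrieve` dictionary, and the prompt template.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    page_content: str


TEMPLATE = (
    "Answer the question based only on the following context: {context}\n\n"
    "Question: {question}\n"
)


def fake_retriever(query: str) -> list[Doc]:
    """Stand-in for the vector store retriever; returns fixed documents."""
    return [Doc("Plot of movie one."), Doc("Plot of movie two.")]


def retrieve_step(query: str) -> dict[str, str]:
    """Mirror the `retrieve` dict: build the context and pass the question through."""
    docs = fake_retriever(query)
    return {
        "context": "\n\n".join(d.page_content for d in docs),
        "question": query,
    }


def prompt_step(values: dict[str, str]) -> str:
    """Mirror the prompt template: fill in {context} and {question}."""
    return TEMPLATE.format(**values)


prompt_text = prompt_step(retrieve_step("What is the best movie to watch when sad?"))
print(prompt_text)
```

In the real chain, the filled-in prompt is then sent to the model and the reply is reduced to a string by `StrOutputParser`.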

In [21]:
result = naive_rag_chain.invoke("What is the best movie to watch when sad?")
# print(f'retrieve context:{retrieve["context"]}')
# print(f'retrieve question:{retrieve["question"]}')
print(result)

I cannot determine the best movie to watch when sad based on the provided context. The context only describes the plots of several movies but does not provide information about their emotional tone or suitability for specific moods.


- Assemble a simple question template by hand

In [72]:
docs1 = vector_store.similarity_search("What is the best movie to watch when sad?", k=5)
print(f"len docs1 is {len(docs1)}")
test_content = ""
for d in docs1:
    test_content += ("\n\n" + d.page_content)

print(f'retrieve context:{test_content}')
special_template = f"Answer the question based only on the following context: {test_content}  \
Question: What is the best movie to watch when sad?"
print(special_template)

len docs1 is 5
retrieve context:

Air America was the CIA's private airline operating in Laos during the Vietnam War, running anything and everything from soldiers to foodstuffs for local villagers. After losing his pilot's license, Billy Covington is recruited into it, and ends up in the middle of a bunch of lunatic pilots, gun-running by his friend Gene Ryack, and opium smuggling by his own superiors.

It is the year 2029: Astronaut Leo Davidson boards a pod cruiser on a Space Station for a "routine" reconnaissance mission. But an abrupt detour through a space time wormhole lands him on a strange planet where talking apes rule over the human race. With the help of a sympathetic chimpanzee activist named Ari and a small band of human rebels, Leo leads the effort to evade the advancing Gorilla Army led by General Thade and his most trusted warrior Attar. Now the race is on to reach a sacred temple within the planet's Forbidden Zone to discover the shocking secrets of mankind's past - a

- Use a self-critique chain to reflect on and answer this question

In [74]:
from langchain_experimental.smart_llm import SmartLLMChain
special_prompt = ChatPromptTemplate.from_template(special_template)
smart_chain = SmartLLMChain(llm=model, prompt=special_prompt, n_ideas=3, verbose=True)
smart_chain.invoke({})



> Entering new SmartLLMChain chain...
Prompt after formatting:
Human: Answer the question based only on the following context: 

Air America was the CIA's private airline operating in Laos during the Vietnam War, running anything and everything from soldiers to foodstuffs for local villagers. After losing his pilot's license, Billy Covington is recruited into it, and ends up in the middle of a bunch of lunatic pilots, gun-running by his friend Gene Ryack, and opium smuggling by his own superiors.

It is the year 2029: Astronaut Leo Davidson boards a pod cruiser on a Space Station for a "routine" reconnaissance mission. But an abrupt detour through a space time wormhole lands him on a strange planet where talking apes rule over the human race. With the help of a sympathetic chimpanzee activist named Ari and a small band of human rebels, Leo leads the effort to evade the advancing Gorilla Army led by General Thade and his most trusted warrior Attar. Now the race is

{'resolution': 'The question asks for the best movie to watch when sad, based only on the provided context.\n\nStep 1: Understand the rules  \n- The instruction clearly states: **"Answer the question based only on the following context."**  \n- This means **no outside knowledge** or assumptions can be used.  \n- Only information **explicitly stated** in the context can be used to form the answer.\n\nStep 2: Analyze what is in the context  \nThe context includes summaries of four different movie plots:\n1. A Vietnam-era movie involving Air America, pilots, smuggling, and conflict.\n2. A science fiction movie set in 2029 about a planet ruled by talking apes (mentioned twice).\n3. An action movie about a former Navy SEAL stopping a coup.\n4. A sequel to an illegal cross-country race film titled *The Cannonball Run*, described as a comedy with returning stars and new ones.\n\nHowever, **none of these summaries include emotional descriptors** like "funny," "uplifting," "dramatic," or "actio
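
One caveat with the hand-built `special_template`: the retrieved text is pasted into a string that `ChatPromptTemplate.from_template` then parses, and by default that parser treats `{name}` as a variable. A plot containing curly braces would therefore be misread as a placeholder. The sketch below illustrates the problem and one workaround (doubling the braces) using plain `str.format`, which follows the same brace convention; `escape_braces` is a hypothetical helper and the sample text is invented:

```python
def escape_braces(text: str) -> str:
    """Double curly braces so a format-style template treats them as literals."""
    return text.replace("{", "{{").replace("}", "}}")


# Invented retrieved text that happens to contain braces
retrieved = "A hacker finds the string {password} in a config file."

unsafe_template = "Context: " + retrieved + "\nQuestion: {question}"
safe_template = "Context: " + escape_braces(retrieved) + "\nQuestion: {question}"

try:
    unsafe_template.format(question="What does the hacker find?")
    unsafe_failed = False
except KeyError:
    # "{password}" was interpreted as a missing template variable
    unsafe_failed = True

rendered = safe_template.format(question="What does the hacker find?")
print(unsafe_failed)
print(rendered)
```

Passing the context as a template variable, as `naive_rag_chain` does with `{context}`, avoids the issue altogether because the text is substituted rather than parsed.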

## Step 6: Create a RAG chain with chat history

In [9]:
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory

In [11]:
def get_session_history(session_id: str) -> MongoDBChatMessageHistory:
    return MongoDBChatMessageHistory(
        MONGODB_URI, session_id, database_name=DB_NAME, collection_name="history"
    )
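
`get_session_history` is called with the `session_id` supplied in the config, so each session reads and writes its own list of messages. A rough in-memory analogue of that behaviour, for intuition only; `InMemoryHistoryStore` is a hypothetical class and not how `MongoDBChatMessageHistory` is implemented:

```python
class InMemoryHistoryStore:
    """Keeps one ordered message list per session id."""

    def __init__(self) -> None:
        self._sessions: dict[str, list[tuple[str, str]]] = {}

    def add(self, session_id: str, role: str, content: str) -> None:
        # Append to the session's list, creating it on first use
        self._sessions.setdefault(session_id, []).append((role, content))

    def messages(self, session_id: str) -> list[tuple[str, str]]:
        # Return a copy so callers cannot mutate stored history by accident
        return list(self._sessions.get(session_id, []))


store = InMemoryHistoryStore()
store.add("1", "human", "What is the best movie to watch when sad?")
store.add("1", "ai", "None of the descriptions say.")
store.add("2", "human", "Recommend a comedy.")

print(store.messages("1"))
print(store.messages("2"))
```

Reusing the same `session_id` across calls is what lets a follow-up question see the earlier turns; a new id starts with an empty history.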

In [23]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
# Given a follow-up question and history, create a standalone question
standalone_system_prompt = """
Given a chat history and a follow-up question, rephrase the follow-up question to be a standalone question. \
Do NOT answer the question, just reformulate it if needed, otherwise return it as is. \
Only return the final standalone question. \
"""
standalone_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", standalone_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)

question_chain = standalone_question_prompt | model | parse_output

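RunnablePassthrough is one of the simplest components in LangChain's Runnable pipelines. It passes its input through unchanged.

It is commonly used as a placeholder, to accept an input and hand it to downstream steps, or to carry intermediate results along the chain.

RunnablePassthrough.assign(context=...) returns the input dictionary with extra keys added. Each keyword argument becomes a key whose value is the output of the Runnable or function you passed in, for example:

{
    <the original input keys>,
    "context": <output of the Runnable/function you passed in>
}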

In [24]:
# Generate context by passing output of the question_chain i.e. the standalone question to the retriever
retriever_chain = RunnablePassthrough.assign(
    context=question_chain
    | retriever
    | (lambda docs: "\n\n".join([d.page_content for d in docs]))
)
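
The effect of `RunnablePassthrough.assign` can be approximated in plain Python: keep the incoming dictionary and add one key per keyword argument, each computed from the full input. This is a sketch of the behaviour, not LangChain's implementation; `assign` and `fake_context` are hypothetical names.

```python
from typing import Any, Callable


def assign(inputs: dict[str, Any], **steps: Callable[[dict[str, Any]], Any]) -> dict[str, Any]:
    """Return the input dict plus one new key per step, each computed from the input."""
    added = {key: step(inputs) for key, step in steps.items()}
    return {**inputs, **added}


def fake_context(inputs: dict[str, Any]) -> str:
    """Stand-in for question_chain | retriever | join."""
    return f"docs retrieved for: {inputs['question']}"


result = assign(
    {"question": "How about something more light?", "history": []},
    context=fake_context,
)
print(sorted(result))
print(result["context"])
```

This is why `rag_prompt` can still reference `{question}` and `history` after `retriever_chain` runs: they are carried through alongside the new `context` key.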

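MessagesPlaceholder(variable_name="history") inserts the message history at this position in the prompt, so that in a multi-turn conversation the model sees the earlier exchanges and not just the latest question.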

In [25]:
# Create a prompt that includes the context, history and the follow-up question

rag_system_prompt = """Answer the question based only on the following context: \
{context}
"""
rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rag_system_prompt),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)

In [26]:
# RAG chain
rag_chain = retriever_chain | rag_prompt | model | parse_output

In [27]:
# RAG chain with history
with_message_history = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)
with_message_history.invoke(
    {"question": "What is the best movie to watch when sad?"},
    {"configurable": {"session_id": "1"}},
)

'None of the movie descriptions provided offer information about which movie would be best to watch when sad.'

In [28]:
with_message_history.invoke(
    {
        "question": "Hmmm..I don't want to watch that one. Can you suggest something else?"
    },
    {"configurable": {"session_id": "1"}},
)

"Based on the descriptions provided, none of the movies are specifically recommended for when you're sad. However, if you're looking for something uplifting or emotionally resonant, you might consider *We Were Soldiers*, which highlights bravery, camaraderie, and the deep bonds formed in the face of adversity. It can be a powerful and moving experience that acknowledges hardship while celebrating courage and loyalty.\n\nIf you'd like something more lighthearted or joyful, none of the options provided fit that mood. You might want to look elsewhere for a feel-good film."

In [29]:
with_message_history.invoke(
    {"question": "How about something more light?"},
    {"configurable": {"session_id": "1"}},
)

"None of the movie descriptions provided are lighthearted or comedic in tone. If you're looking for something more light-hearted or cheerful, you may want to explore other options outside the ones described."

## Step 7: Get faster responses using Semantic Cache

**NOTE:** The semantic cache is keyed on the input to the LLM. In retrieval chains, the retrieved documents are part of that input and can change between runs, so semantically similar queries may still result in cache misses.

In [61]:
from langchain_core.globals import set_llm_cache
from langchain_mongodb.cache import MongoDBAtlasSemanticCache

set_llm_cache(
    MongoDBAtlasSemanticCache(
        connection_string=MONGODB_URI,
        embedding=embeddings,
        collection_name="semantic_cache",
        database_name=DB_NAME,
        index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
        wait_until_ready=True,  # Optional, waits until the cache is ready to be used
    )
)

In [62]:
%%time
naive_rag_chain.invoke("What is the best movie to watch when sad?")

CPU times: user 87.8 ms, sys: 670 µs, total: 88.5 ms
Wall time: 1.24 s


'Once a Thief'

In [63]:
%%time
naive_rag_chain.invoke("What is the best movie to watch when sad?")

CPU times: user 43.5 ms, sys: 4.16 ms, total: 47.7 ms
Wall time: 255 ms


'Once a Thief'

In [64]:
%%time
naive_rag_chain.invoke("Which movie do I watch when sad?")

CPU times: user 115 ms, sys: 171 µs, total: 115 ms
Wall time: 1.38 s


'I would recommend watching "Last Action Hero" when sad, as it is a fun and action-packed film that can help lift your spirits.'
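
The timings above fit the note at the start of this step: the repeated query returns quickly, while the rephrased one takes about as long as an uncached call. A toy sketch of why that can happen when the cache key is the full LLM input, including retrieved context. Everything here is illustrative: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `ToySemanticCache` with its 0.9 threshold is a hypothetical class, not `MongoDBAtlasSemanticCache`.

```python
import math
from collections import Counter
from typing import Optional


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def update(self, llm_input: str, answer: str) -> None:
        self.entries.append((toy_embed(llm_input), answer))

    def lookup(self, llm_input: str) -> Optional[str]:
        """Return the answer of the most similar cached input, if similar enough."""
        query = toy_embed(llm_input)
        best = max(self.entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None


cache = ToySemanticCache()
first = "context: plot a plot b question: best movie when sad"
cache.update(first, "cached answer")

# Same question, same retrieved context: hit
print(cache.lookup(first))
# Rephrased question that also retrieved different context: miss
print(cache.lookup("context: plot c plot d question: which movie when sad"))
```

Caching at the level of the user question, before retrieval, is one way to make rephrased queries hit more often, at the cost of serving answers built from older context.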