# How to Add Memory of Past Questions in a RAG App: Chat Memory

* [Link to the Documentation Page.](https://python.langchain.com/docs/use_cases/question_answering/chat_history)

## Goal
* Allow the user to have a back-and-forth conversation, meaning the application needs some sort of “memory” of past questions and answers.
* In this guide we focus on adding logic for incorporating historical messages, and NOT on chat history management. Chat history management is covered [here](https://python.langchain.com/docs/expression_language/how_to/message_history).

## Dependencies and Necessary Modules
* Same as in the project we developed for the Quickstart Guide (see notebook 159-rag-quickstart).

## .env file
* OpenAI API key.
* LangSmith credentials.
* Our LangSmith project name: RAGchatMemory
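
Before running the notebook, it can help to confirm that the `.env` values were actually loaded. The check below is a minimal sketch: the LangSmith variable names (`LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`) are assumptions about how the credentials are named, and `missing_vars` is a hypothetical helper, so adjust both to match your own `.env` file.

```python
import os

# Assumed variable names; adjust them to match your own .env file.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


missing = missing_vars(os.environ)
if missing:
    print("Missing from environment:", ", ".join(missing))
```

Run it after `load_dotenv(find_dotenv())` so that values from the `.env` file are already present in `os.environ`.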

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]

## Open LangSmith to track the following operations
* smith.langchain.com

## The initial RAG App without sources

In [2]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [3]:
# Load, chunk and index the contents of the blog.
bs_strainer = bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs_strainer},
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [4]:
rag_chain.invoke("What is Task Decomposition?")

'Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps. This approach helps agents to plan and execute tasks more efficiently by transforming big tasks into manageable components. Task decomposition can be achieved through various methods such as prompting with specific instructions or utilizing human inputs.'

## Contextualizing the question: define a sub-chain that takes historical messages and the latest user question, and reformulates the question when it refers to earlier messages
* We’ll use a prompt that includes a MessagesPlaceholder variable under the name “chat_history”. This allows us to pass in a list of Messages to the prompt using the “chat_history” input key, and these messages will be inserted after the system message and before the human message containing the latest question.
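
The ordering described above can be illustrated without LangChain. The following is only a sketch in plain Python: `build_messages` is a hypothetical helper, and messages are modelled as `(role, content)` tuples rather than LangChain message objects.

```python
def build_messages(system_prompt, chat_history, question):
    """Return the messages in the order the prompt would send them."""
    messages = [("system", system_prompt)]
    messages.extend(chat_history)  # prior turns fill the placeholder slot
    messages.append(("human", question))  # latest question always comes last
    return messages


history = [
    ("human", "What does LLM stand for?"),
    ("ai", "Large language model"),
]
messages = build_messages(
    "Reformulate the question.", history, "What is meant by large"
)
for role, content in messages:
    print(f"{role}: {content}")
```

With an empty history the placeholder contributes nothing, and the prompt reduces to the system message followed by the question.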

In [6]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)
contextualize_q_chain = contextualize_q_prompt | llm | StrOutputParser()

Now we can ask follow-up questions that reference past messages and have them reformulated into standalone questions:

In [7]:
from langchain_core.messages import AIMessage, HumanMessage

contextualize_q_chain.invoke(
    {
        "chat_history": [
            HumanMessage(content="What does LLM stand for?"),
            AIMessage(content="Large language model"),
        ],
        "question": "What is meant by large",
    }
)

'What is the definition of "large" in this context?'

## Chain with chat history
* We will add some routing functionality so that the question-reformulating chain (`contextualize_q_chain`) runs only when our chat history isn’t empty.
* We will take advantage of the fact that if a function in an LCEL (LangChain Expression Language) chain returns another chain, that chain will itself be invoked.
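
To make the routing idea concrete, here is a toy model of that behaviour in plain Python. It is not how LangChain is implemented: `ToyRunnable`, `rewrite_chain`, and `route` are hypothetical names, and the "rewrite" step is a string function standing in for an LLM call.

```python
class ToyRunnable:
    """Stand-in for a chain: wraps a function and exposes invoke()."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        result = self.func(value)
        # Mimic the behaviour described above: a returned runnable is invoked
        # with the same input instead of being passed along as a value.
        if isinstance(result, ToyRunnable):
            return result.invoke(value)
        return result


# Hypothetical stand-in for the question-rewriting chain (no LLM call).
rewrite_chain = ToyRunnable(lambda x: f"standalone({x['question']})")


def route(x):
    # Only rewrite when there is history to resolve references against.
    if x.get("chat_history"):
        return rewrite_chain
    return x["question"]


router = ToyRunnable(route)
print(router.invoke({"question": "What is it?", "chat_history": []}))
print(router.invoke({"question": "What is it?", "chat_history": ["earlier turn"]}))
```

The first call returns the question unchanged; the second passes through the rewriting step because the history is non-empty.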

In [8]:
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{question}"),
    ]
)


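def contextualized_question(inputs: dict):
    # With prior turns, return the sub-chain; LCEL invokes it on the same input.
    if inputs.get("chat_history"):
        return contextualize_q_chain
    # With no history, the question is already standalone.
    return inputs["question"]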


rag_chain = (
    RunnablePassthrough.assign(
        context=contextualized_question | retriever | format_docs
    )
    | qa_prompt
    | llm
)

In [9]:
chat_history = []

question = "What is Task Decomposition?"
ai_msg = rag_chain.invoke({"question": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question), ai_msg])

second_question = "What are common ways of doing it?"
rag_chain.invoke({"question": second_question, "chat_history": chat_history})

AIMessage(content='Task decomposition can be done in common ways such as using Language Model (LLM) with simple prompting, task-specific instructions, or human inputs. LLMs can be prompted with specific instructions like "Steps for XYZ" to break down tasks into smaller steps. Task-specific instructions, such as "Write a story outline," can also help in decomposing tasks into manageable subgoals. Additionally, human inputs can be utilized to decompose complex tasks effectively.', response_metadata={'token_usage': {'completion_tokens': 90, 'prompt_tokens': 640, 'total_tokens': 730}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_3bc1b5746c', 'finish_reason': 'stop', 'logprobs': None})

## Advanced technique
* In a real Q&A application we’ll want some way of persisting chat history and some way of automatically inserting and updating it. You can learn how to do this [here](https://python.langchain.com/docs/expression_language/how_to/message_history).