# Conversational RAG

## Intro
* In most RAG applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers.

## The problem
* How do we handle a user who refers to earlier questions and answers in the conversation?

## The second problem...
* This is probably the topic that is worst explained in the LangChain documentation.

## What we need to solve
* Store the chat conversation.
* When the user enters a new input, put that input in context.
* Rephrase the user input so that it becomes a standalone, contextualized input.
* Send the contextualized input to the retriever.
* Use the retriever to build a conversational RAG chain.
* Add extra features like persisting memory (for example, saving it to a file) and per-session memories.

## The process we will follow
1. Create a basic RAG without memory.
2. Create a ChatPromptTemplate able to contextualize inputs.
3. Create a retriever aware of memory.
4. Create a basic conversational RAG.
5. Create an advanced conversational RAG with persistence and session memories.

## Setup

#### After you download the code from the GitHub repository to your computer
In terminal:
* cd project_name
* pyenv local 3.11.4
* poetry install
* poetry shell

#### To open the notebook with Jupyter Notebooks
In terminal:
* jupyter lab

Go to the notebooks folder and open the corresponding notebook.

#### To see the code in Visual Studio Code or your editor of choice
* Open Visual Studio Code or your editor of choice.
* Open the project folder.
* Open the 001-conversational-rag.py file.

## Create your .env file
* In the GitHub repo we have included a file named .env.example.
* Rename that file to .env; this is where you will add your confidential API keys. Remember to include:
    * OPENAI_API_KEY=your_openai_api_key
    * LANGCHAIN_TRACING_V2=true
    * LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
    * LANGCHAIN_API_KEY=your_langchain_api_key
    * LANGCHAIN_PROJECT=your_project_name

We will call our LangSmith project **001-conversational-rag**.

## Connect with the .env file located in the same directory as this notebook

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [1]:
#!pip install python-dotenv

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
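If a key is missing, `os.environ["OPENAI_API_KEY"]` fails with a bare `KeyError`. As an optional sanity check, a small helper can report every missing variable from the .env list at once. This is a sketch only; `missing_env_vars` is a hypothetical helper, not part of python-dotenv.

```python
import os

# The variables listed in the .env section above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing from .env:", ", ".join(missing))
```

The OpenAI key is what the chains need to call the model; the LANGCHAIN_* variables configure LangSmith tracing.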

## Install LangChain

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [3]:
#!pip install langchain

## Connect with an LLM

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [4]:
#!pip install langchain-openai

* NOTE: Since OpenAI currently offers what we consider the best LLM on the market, we will use it by default. You will see how to connect to open-source LLMs like Llama 3 or Mistral in a later lesson.

In [2]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [6]:
#!pip install langchain-community langchain-chroma bs4


## Step 1: Create a basic RAG without memory
* We will use the RAG process we already know.
* We will use create_stuff_documents_chain to build a QA chain: a chain that inserts the retrieved documents into the prompt and sends the question to the LLM.
* We will use create_retrieval_chain and the QA chain to build the RAG chain: a chain that first queries the retriever and then passes the retrieved documents to the QA chain to generate the answer.
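To make the data flow of these two chains concrete, here is a plain-Python sketch under simplifying assumptions. It is not LangChain's implementation: the toy retriever ranks chunks by shared words instead of embeddings, and `fake_llm` is a hypothetical stand-in for `ChatOpenAI`. The "stuff" step joins all retrieved chunks into the `{context}` slot of the system prompt, which is the idea behind `create_stuff_documents_chain`.

```python
import re
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)

# Shortened version of the system_prompt defined in the cell below.
SYSTEM_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "{context}"
)


def tokenize(text: str) -> set:
    """Lowercase word set, a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank chunks by the number of words shared with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:k]


def stuff_documents(docs: List[str], separator: str = "\n\n") -> str:
    """'Stuff' strategy: concatenate all retrieved chunks into one context string."""
    return separator.join(docs)


def build_messages(question: str, docs: List[str]) -> List[Message]:
    """Fill the {context} slot of the system prompt, then add the user question."""
    system = SYSTEM_PROMPT.format(context=stuff_documents(docs))
    return [("system", system), ("human", question)]


def rag_answer(question: str, chunks: List[str], llm: Callable[[List[Message]], str]) -> Dict:
    docs = retrieve(question, chunks)
    answer = llm(build_messages(question, docs))
    # Same output keys as create_retrieval_chain: input, context, answer.
    return {"input": question, "context": docs, "answer": answer}


def fake_llm(messages: List[Message]) -> str:
    # Hypothetical stand-in for ChatOpenAI: reports the size of the prompt it got.
    return f"(answer generated from {len(messages)} messages)"


chunks = [
    "Startups should make something people want.",
    "Craigslist operates a lot like a charity.",
    "Unrelated text about the weather.",
]
toy_output = rag_answer("What should startups make?", chunks, fake_llm)
print(toy_output["context"])
print(toy_output["answer"])
```

Note that nothing in this flow remembers earlier turns, which is exactly the limitation the next cells demonstrate.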

In [3]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the source article as a document.
loader = TextLoader("./data/be-good.txt")

docs = loader.load()

# Split it into chunks of about 1000 characters, with 200 characters of overlap between neighbors.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

splits = text_splitter.split_documents(docs)

# Embed the chunks with OpenAI and index them in an in-memory Chroma vector store.
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Expose the vector store as a retriever (similarity search by default).
retriever = vectorstore.as_retriever()

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [4]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)

rag_chain = create_retrieval_chain(retriever, question_answer_chain)

* Let's try the app:

In [5]:
output = rag_chain.invoke({"input": "What is this article about?"})

In [6]:
output["answer"]

'The article discusses the principles of starting a successful business, emphasizing the importance of creating something that people want while not overly worrying about immediate profitability. It compares business practices to those of charities, suggesting that running a business with a benevolent approach can lead to success, as illustrated by examples like Craigslist. The author reflects on the insights gained from these principles and their implications for entrepreneurs.'

* As the following question shows, our app has no memory of the conversation.

In [7]:
output = rag_chain.invoke({"input": "What was my previous question about?"})

In [8]:
output["answer"]

"I don't know."

## Step 2: Create a ChatPromptTemplate able to contextualize inputs
* Goal: put the input in context and re-phrase it so we have a contextualized input.
* We will define a new system prompt that instructs the LLM on how to contextualize the input.
* Our new ChatPromptTemplate will include:
    * The new system prompt.
    * MessagesPlaceholder, a placeholder used to pass the list of messages stored in chat_history.

In [9]:
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
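Conceptually, `MessagesPlaceholder("chat_history")` is a slot that expands into however many messages the history holds, between the system prompt and the new question. The sketch below models that expansion in plain Python, assuming (role, text) tuples instead of LangChain message objects; `expand_prompt` is a hypothetical name.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text) stands in for LangChain message objects


def expand_prompt(system_prompt: str, chat_history: List[Message], user_input: str) -> List[Message]:
    """Expand a [system, <chat_history>, human] template into a flat message list."""
    messages: List[Message] = [("system", system_prompt)]
    messages.extend(chat_history)  # the placeholder: zero or more earlier messages
    messages.append(("human", user_input))
    return messages


history = [
    ("human", "What is this article about?"),
    ("ai", "It is about building startups that make something people want."),
]
expanded = expand_prompt(
    "Reformulate the latest question...", history, "What was my previous question about?"
)
for role, text in expanded:
    print(f"{role}: {text}")
```

With an empty history the expansion is just the system prompt plus the question. To inspect the real expansion, `contextualize_q_prompt.format_messages(input=..., chat_history=[...])` returns the formatted message list.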


## Step 3: Create a Retriever aware of the memory
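* We will build our new retriever with create_history_aware_retriever. When there is chat history, it uses the LLM and contextualize_q_prompt to rephrase the latest input into a standalone question and sends that question to the retriever; when the chat history is empty, the input goes to the retriever unchanged.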

In [11]:
from langchain.chains import create_history_aware_retriever

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
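The branching performed by the chain that `create_history_aware_retriever` returns (raw input on the first turn, a rephrased standalone question afterwards) can be sketched in plain Python. This mirrors the documented behavior but is not the library's code: `fake_rephrase` stands in for `contextualize_q_prompt | llm` and `fake_retrieve` for the vector-store retriever; both are hypothetical stubs.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)


def history_aware_retrieve(
    user_input: str,
    chat_history: List[Message],
    rephrase: Callable[[List[Message], str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """First turn: retrieve with the raw input. Later turns: rephrase first."""
    if not chat_history:
        return retrieve(user_input)
    standalone_question = rephrase(chat_history, user_input)
    return retrieve(standalone_question)


queries_seen: List[str] = []


def fake_retrieve(query: str) -> List[str]:
    # Hypothetical retriever stub: records the query it was given.
    queries_seen.append(query)
    return [f"chunk matching: {query}"]


def fake_rephrase(chat_history: List[Message], question: str) -> str:
    # Hypothetical stand-in for contextualize_q_prompt | llm: a real LLM would
    # rewrite the question; this stub just quotes the last human message.
    last_human = next(text for role, text in reversed(chat_history) if role == "human")
    return f'What was the question "{last_human}" about?'


history = [("human", "What is this article about?"), ("ai", "Startups.")]
history_aware_retrieve("What is this article about?", [], fake_rephrase, fake_retrieve)
history_aware_retrieve("What was my previous question about?", history, fake_rephrase, fake_retrieve)
print(queries_seen)
```

The point of the rephrasing is that a query like "What was my previous question about?" shares almost nothing with the article, so searching with it directly would retrieve irrelevant chunks.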

## Step 4: Create a basic Conversational RAG
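* We will use the history-aware retriever, which relies on the contextualization prompt to rephrase the input.
* We will use create_stuff_documents_chain to build a QA chain. Its prompt, qa_prompt, now also includes a MessagesPlaceholder for chat_history, so the LLM sees the conversation when writing the answer.
* We will use create_retrieval_chain with the history-aware retriever and the QA chain to build the conversational RAG chain: it first retrieves documents for the (possibly rephrased) input and then passes them to the QA chain to generate the answer.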

In [12]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
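Putting the two stages together, the conversational chain behaves roughly like the plain-Python sketch below. The stubs `stub_rephrase`, `stub_retrieve`, and `stub_llm` are hypothetical stand-ins for the LLM-based rephrasing, the vector-store retriever, and `ChatOpenAI`. One detail worth noticing: the rephrased question is only used for retrieval, while the QA prompt still receives the user's original input together with the history.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)


def conversational_rag(
    user_input: str,
    chat_history: List[Message],
    rephrase: Callable[[List[Message], str], str],
    retrieve: Callable[[str], List[str]],
    llm: Callable[[List[Message]], str],
) -> Dict:
    # Stage 1 (history-aware retriever): rephrase only when there is history.
    query = rephrase(chat_history, user_input) if chat_history else user_input
    docs = retrieve(query)
    # Stage 2 (QA chain): system prompt with the stuffed context, then the
    # history, then the user's original input (the rephrasing is only for search).
    system = "Answer using this context:\n\n" + "\n\n".join(docs)
    messages = [("system", system), *chat_history, ("human", user_input)]
    return {"input": user_input, "chat_history": chat_history, "context": docs, "answer": llm(messages)}


def stub_rephrase(chat_history: List[Message], question: str) -> str:
    return "standalone: " + question


def stub_retrieve(query: str) -> List[str]:
    return [f"chunk about {query}"]


def stub_llm(messages: List[Message]) -> str:
    return f"answer from {len(messages)} messages"


first = conversational_rag("Q1", [], stub_rephrase, stub_retrieve, stub_llm)
history = [("human", "Q1"), ("ai", first["answer"])]
second = conversational_rag("Q2", history, stub_rephrase, stub_retrieve, stub_llm)
print(first["context"], first["answer"])
print(second["context"], second["answer"])
```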


#### Trying our basic conversational RAG
Below we ask a question and a follow-up question that requires contextualization to return a sensible response. Because our chain includes a "chat_history" input, the caller needs to manage the chat history. We can achieve this by appending input and output messages to a list:

In [15]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is this article about?"

ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

second_question = "What was my previous question about?"

ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=second_question),  # record the question that was actually asked
        AIMessage(content=ai_msg_2["answer"]),
    ]
)

print(ai_msg_2["answer"])

Your previous question was about the content of the article, specifically asking what it is about.


In [16]:
print(chat_history)

[HumanMessage(content='What is this article about?', additional_kwargs={}, response_metadata={}), AIMessage(content='The article discusses the principles of creating successful startups, emphasizing the importance of making something people want without overly focusing on profit at first. It critiques the shortcomings of many well-intentioned projects and compares successful models like Craigslist, which operate with a charitable mindset yet achieve significant results. Overall, it explores the intersection of business success and benevolence in entrepreneurship.', additional_kwargs={}, response_metadata={}), HumanMessage(content='What was my previous question about?', additional_kwargs={}, response_metadata={}), AIMessage(content='Your previous question was about the content of the article, specifically asking what it is about.', additional_kwargs={}, response_metadata={})]
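Because the caller owns this list, it is easy to append the wrong message, for example by reusing the first question's variable on the second turn. A tiny helper that always records the question it actually sent keeps the history consistent. This is a sketch: `ask` and `fake_chain` are hypothetical names, `chain` is assumed to take the same `{"input", "chat_history"}` dict as `rag_chain.invoke` and return a dict with an `"answer"` key, and (role, text) tuples stand in for message objects.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text) stands in for HumanMessage / AIMessage


def ask(chain: Callable[[Dict], Dict], question: str, history: List[Message]) -> str:
    """Run one turn, then append exactly this question and its answer to `history`."""
    result = chain({"input": question, "chat_history": list(history)})
    history.extend([("human", question), ("ai", result["answer"])])
    return result["answer"]


def fake_chain(inputs: Dict) -> Dict:
    # Hypothetical stand-in for rag_chain.invoke: reports how much history it saw.
    return {"answer": f"I saw {len(inputs['chat_history'])} earlier messages."}


toy_history: List[Message] = []
ask(fake_chain, "What is this article about?", toy_history)
ask(fake_chain, "What was my previous question about?", toy_history)
for role, text in toy_history:
    print(f"{role}: {text}")
```

With the real chain you would pass `rag_chain.invoke` as `chain` and append `HumanMessage`/`AIMessage` objects instead of tuples.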


## Step 5: Advanced conversational RAG with persistence and session memories
* We will store the chat histories in a Python dictionary. More advanced apps typically use a persistent store, such as a database.
* Associate each chat history with a user session through the function get_session_history().
* Inject chat history into inputs and update it after each interaction using BaseChatMessageHistory and RunnableWithMessageHistory.
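A dictionary keeps histories only while the Python process runs. The introduction also mentions persisting memory in a file; the sketch below shows that idea, assuming (role, text) pairs saved as JSON. The names `sessions`, `get_history`, `save_sessions`, `load_sessions`, and the file name are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)

sessions: Dict[str, List[Message]] = {}  # session_id -> conversation


def get_history(session_id: str) -> List[Message]:
    """Plays the role of get_session_history(): create the history on first use."""
    return sessions.setdefault(session_id, [])


def save_sessions(path: str) -> None:
    """Persist every session's messages to a JSON file."""
    data = {sid: [list(msg) for msg in msgs] for sid, msgs in sessions.items()}
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_sessions(path: str) -> None:
    """Replace the in-memory sessions with the ones saved at `path`, if it exists."""
    sessions.clear()
    file = Path(path)
    if file.exists():
        data = json.loads(file.read_text(encoding="utf-8"))
        sessions.update({sid: [tuple(m) for m in msgs] for sid, msgs in data.items()})


path = os.path.join(tempfile.mkdtemp(), "chat_sessions.json")  # hypothetical file name
get_history("001").append(("human", "What is this article about?"))
save_sessions(path)
sessions.clear()     # simulate a restart: memory is gone...
load_sessions(path)  # ...and restored from disk
print(sessions)      # {'001': [('human', 'What is this article about?')]}
```

In the real chain, `get_session_history` must return a `BaseChatMessageHistory`; `langchain_community` ships a file-backed `FileChatMessageHistory` that is worth checking before writing your own.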

Let's now focus on how to handle and maintain chat history in the question and answer (Q&A) application to make conversations flow more naturally.

Here’s a simplified explanation:

1. **Stateful Management of Chat History**: Instead of manually passing the earlier turns of the conversation with every new input, the application can automatically manage and update the chat history. This means that the application remembers past interactions and uses that information to understand and respond to new questions more accurately.

2. **Tools for Managing Chat History**:
   - **BaseChatMessageHistory**: This component is responsible for storing the history of the conversation.
   - **RunnableWithMessageHistory**: This acts as a wrapper around the main processing chain (LCEL chain) and the chat history storage (BaseChatMessageHistory). It takes care of adding the historical chat data to new inputs and updating the stored history after each response.

3. **How It Works**: When you use these components in an application:
   - The application automatically retrieves and updates the chat history every time it processes a new input. This helps in maintaining a continuous and coherent conversation flow.
   - `RunnableWithMessageHistory` selects the chat history through a configuration that includes a unique session identifier (`session_id`). This identifier tells the system which conversation history to retrieve and update whenever a user interacts with it.

4. **Alternative ways to store the chat history**: In our simple implementation, chat histories are stored in a plain dictionary. More complex systems might use databases like Redis for more reliable, long-term storage of conversation data.
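The inject-and-update cycle described above can be modeled in a few lines of plain Python. This is a simplified model of what `RunnableWithMessageHistory` does, not its implementation; `histories`, `get_history`, `with_message_history`, and `fake_rag_chain` are hypothetical names, and (role, text) tuples stand in for message objects.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text)
Chain = Callable[[Dict], Dict]

histories: Dict[str, List[Message]] = {}  # session_id -> messages, like `store`


def get_history(session_id: str) -> List[Message]:
    if session_id not in histories:
        histories[session_id] = []
    return histories[session_id]


def with_message_history(chain: Chain) -> Callable[[Dict, Dict], Dict]:
    """Wrap `chain` so history is injected before, and updated after, every call."""

    def invoke(inputs: Dict, config: Dict) -> Dict:
        history = get_history(config["configurable"]["session_id"])
        # Inject: like history_messages_key="chat_history".
        result = chain({**inputs, "chat_history": list(history)})
        # Update: like input_messages_key="input" and output_messages_key="answer".
        history.append(("human", inputs["input"]))
        history.append(("ai", result["answer"]))
        return result

    return invoke


def fake_rag_chain(inputs: Dict) -> Dict:
    # Hypothetical stand-in for rag_chain: reports how much history it received.
    return {"answer": f"{len(inputs['chat_history'])} prior messages"}


chat = with_message_history(fake_rag_chain)
session_a = {"configurable": {"session_id": "001"}}
session_b = {"configurable": {"session_id": "002"}}
chat({"input": "What is this article about?"}, session_a)
second = chat({"input": "What was my previous question about?"}, session_a)
other = chat({"input": "Hello"}, session_b)
print(second["answer"], "|", other["answer"])  # 2 prior messages | 0 prior messages
```

Keying the histories by `session_id` is what keeps sessions "001" and "002" independent: the second session starts with an empty history even though the first one already has two turns.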

In [17]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [18]:
conversational_rag_chain.invoke(
    {"input": "What is this article about?"},
    config={
        "configurable": {"session_id": "001"}
    },  # constructs a key "001" in `store`.
)["answer"]

'The article discusses the balance between creating something that people want and the concern for making money, suggesting that successful businesses can operate similarly to charities. It critiques certain well-intentioned projects for lacking accountability and emphasizes the importance of user satisfaction. The author uses examples like Craigslist to illustrate how a business can thrive while prioritizing user needs over maximizing revenue.'

In [19]:
conversational_rag_chain.invoke(
    {"input": "What was my previous question about?"},
    config={"configurable": {"session_id": "001"}},
)["answer"]

'Your previous question was about the overall topic or content of the article.'

The conversation history can be inspected in the store dict:

In [21]:
for message in store["001"].messages:
    if isinstance(message, AIMessage):
        prefix = "AI"
    else:
        prefix = "User"

    print(f"{prefix}: {message.content}\n")

User: What is this article about?

AI: This article discusses the concept of creating something that people want as a motto for startup success. It explores the idea that focusing on making something people want, rather than worrying about the business model initially, can lead to success. The article suggests that businesses can benefit from operating more like charities in terms of meeting people's needs and achieving success.

User: What was my previous question about?

AI: Your previous question was about the topic or subject of the article that was referenced in the provided text snippets.



In [20]:
print(store)

{'001': InMemoryChatMessageHistory(messages=[HumanMessage(content='What is this article about?', additional_kwargs={}, response_metadata={}), AIMessage(content='The article discusses the balance between creating something that people want and the concern for making money, suggesting that successful businesses can operate similarly to charities. It critiques certain well-intentioned projects for lacking accountability and emphasizes the importance of user satisfaction. The author uses examples like Craigslist to illustrate how a business can thrive while prioritizing user needs over maximizing revenue.', additional_kwargs={}, response_metadata={}), HumanMessage(content='What was my previous question about?', additional_kwargs={}, response_metadata={}), AIMessage(content='Your previous question was about the overall topic or content of the article.', additional_kwargs={}, response_metadata={})])}


## How to execute the code from Visual Studio Code
* In Visual Studio Code, see the file 001-conversational-rag.py
* In the terminal, make sure you are in the file's directory and run:
    * python 001-conversational-rag.py