# Conversational RAG

## Intro
* In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers.

## Setup

#### Recommended: create new virtualenv
* `mkdir your_project_name`
* `cd your_project_name`
* `pyenv virtualenv 3.11.4 your_venv_name`
* `pyenv activate your_venv_name`
* `pip install jupyterlab`
* `jupyter lab`

In [1]:
#!pip install python-dotenv

#### .env File
Remember to include:
OPENAI_API_KEY=your_openai_api_key

LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_API_KEY=your_langchain_api_key
LANGCHAIN_PROJECT=your_project_name

We will call our LangSmith project **conversationalRAG**.

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
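
Because `os.environ["OPENAI_API_KEY"]` raises a `KeyError` when the variable is absent, it can help to check the environment before going further. The following is a minimal sketch, assuming the variable names listed in the `.env` section above; `missing_env_vars` is a hypothetical helper written for this notebook, not part of python-dotenv.

```python
import os

# Variable names taken from the .env section above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report anything that load_dotenv() did not provide.
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing from environment:", ", ".join(missing))
```

Run it after `load_dotenv(find_dotenv())`; an empty result means every listed variable was found.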

#### Install LangChain

In [3]:
#!pip install langchain

## Connect with an LLM

In [4]:
#!pip install langchain-openai

* NOTE: We use OpenAI by default because, at the time of writing, we consider it the strongest LLM on the market. A later lesson shows how to connect to open-source LLMs such as Llama 3 or Mistral.

In [5]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

In [6]:
#!pip install langchain-community langchainhub langchain-chroma bs4

## Basic RAG without memory

In [7]:
import bs4
from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("./data/be-good.txt")

docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

splits = text_splitter.split_documents(docs)

vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [8]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)

rag_chain = create_retrieval_chain(retriever, question_answer_chain)

In [9]:
output = rag_chain.invoke({"input": "What is this article about?"})

In [10]:
output["answer"]

'The article discusses the concept of making something people want as a motto for startups, advising founders not to worry excessively about the business model initially. It explores the idea that creating something people want and not focusing too much on making money can resemble a charity. Examples like Craigslist running successfully with a charitable approach are also provided.'

* As the next question shows, our app has no memory of the conversation: asked about the previous question, it makes up an answer from the retrieved context instead.

In [11]:
output = rag_chain.invoke({"input": "What was my previous question about?"})

In [12]:
output["answer"]

'Your previous question was about the idea of betting against benevolence and how powerful forces like benevolence can lead to successful outcomes in various contexts, such as in the case of Internet startups like Google and Microsoft. The discussion also touched on the success of organizations like Craigslist that operate with a benevolent approach, even though they are not structured as charities.'

## Adding memory

We will now improve the application so that it better understands conversations over time, especially when questions relate to earlier parts of the conversation.

1. **Chat History Context**: When you chat with someone, your questions might refer to things you discussed earlier. For example, if you ask, "What are common ways of doing it?" after discussing "Task Decomposition," the system should understand that "it" means "Task Decomposition."

2. **Improving the App**: To handle this, the app needs two updates:
   - **Prompt Update**: Change the app's input setup so it can use previous messages in the conversation to understand the context of new questions.
   - **Contextualizing Questions**: Create a process where the app takes the latest question and, if necessary, reformulates it by using the conversation history. This helps the app understand what the question is actually about.

3. **Technical Steps**:
   - A new function, `create_history_aware_retriever`, integrates previous messages into the retrieval step.
   - When a new question is asked, the app can look at the earlier conversation to better understand and answer the question. This involves taking the previous chat, analyzing it alongside the new question, and then reformulating the question if needed before looking for answers.

This approach ensures the app responds more accurately to questions based on the full conversation, not just the last question asked.
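
The reformulation step can be sketched in plain Python. This is an illustration only, not LangChain's code: `fake_reformulate` and `fake_retrieve` are hypothetical stubs standing in for the LLM and the vector store, and messages are simplified to `(role, text)` tuples. The branch shown (search with the question as is when there is no history, otherwise search with a rewritten query) follows the behavior described above.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (role, text) pairs; a stand-in for message objects


def history_aware_retrieve(
    question: str,
    chat_history: History,
    reformulate: Callable[[str, History], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve documents, rewriting the question first when history exists."""
    if not chat_history:
        # Nothing to resolve references against: search with the question as is.
        return retrieve(question)
    # Otherwise build a standalone query from the history and search with that.
    return retrieve(reformulate(question, chat_history))


# Hypothetical stubs; a real app would call the LLM and the vector store here.
def fake_reformulate(question: str, chat_history: History) -> str:
    human_turns = [text for role, text in chat_history if role == "human"]
    topic = human_turns[-1] if human_turns else "unknown"
    return f"{question} (follow-up to: {topic})"


def fake_retrieve(query: str) -> List[str]:
    return [f"doc matching '{query}'"]


follow_up = "What are common ways of doing it?"
history = [("human", "What is Task Decomposition?"), ("ai", "Breaking a task into steps.")]

print(history_aware_retrieve(follow_up, [], fake_reformulate, fake_retrieve))
print(history_aware_retrieve(follow_up, history, fake_reformulate, fake_retrieve))
```

The second call searches with a query that mentions Task Decomposition, which is what lets the retriever resolve "it".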

In [13]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

Next, we build the question-answering (QA) chain so that it uses the context of the ongoing conversation when producing answers.

Here's a breakdown of the process in simpler terms:

1. **Rephrasing the Query**: Before looking up information (retrieving), the system reformulates the input question by considering the previous conversation. This helps ensure that the information it gathers is relevant to the current discussion.

2. **Building the QA Chain**:
   - **Updating the Retriever**: The system updates its information retrieval process to a new version that is aware of the conversation history. This new version is called `history_aware_retriever`.
   - **Generating Answers**: A function named `create_stuff_documents_chain` is used to construct a `question_answer_chain`. This chain takes the rephrased query, the conversation history, and any relevant retrieved information to produce an answer.

3. **Final Chain Assembly**:
   - A final process chain, referred to as `rag_chain`, is created using another function called `create_retrieval_chain`.
   - This chain first uses the `history_aware_retriever` to gather information relevant to the rephrased query and the conversation history.
   - Then, the `question_answer_chain` uses this information along with the initial query and chat history to generate an answer.
   - The chain's output keeps the intermediate results, such as the retrieved documents under `context`, alongside the final `answer`, which makes each response easier to inspect.

Essentially, the system enhances the accuracy and relevance of responses in a conversation by ensuring that every new question is considered in the full context of what has been previously discussed.
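
The data flow of the assembled chain can be sketched as follows. This is a simplified model under stated assumptions, not the library's implementation: `run_rag_turn`, `stub_retriever`, and `stub_answerer` are hypothetical names, and the only point is that the inputs, the retrieved context, and the answer all end up in one result dictionary.

```python
from typing import Callable, Dict, List


def run_rag_turn(
    inputs: Dict,
    retrieve: Callable[[Dict], List[str]],
    answer: Callable[[Dict], str],
) -> Dict:
    """Run retrieval, then answering, keeping intermediate results in one dict."""
    state = dict(inputs)                # keeps "input" and "chat_history"
    state["context"] = retrieve(state)  # step 1: history-aware retrieval
    state["answer"] = answer(state)     # step 2: answer from context and history
    return state


# Hypothetical stubs standing in for the retriever and the LLM-backed QA chain.
def stub_retriever(state: Dict) -> List[str]:
    return ["Make something people want."]


def stub_answerer(state: Dict) -> str:
    return f"Based on {len(state['context'])} document(s): ..."


result = run_rag_turn(
    {"input": "What is this article about?", "chat_history": []},
    stub_retriever,
    stub_answerer,
)
print(sorted(result))  # the keys available to the caller
```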

In [14]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

Below we ask a question and a follow-up question that requires contextualization to return a sensible response. Because our chain includes a "chat_history" input, the caller needs to manage the chat history. We can achieve this by appending input and output messages to a list:

In [15]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is this article about?"

ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
    ]
)

second_question = "What was my previous question about?"

ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_2["answer"])

Your previous question was about the general topic or theme of the article.
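
The invoke-then-append pattern from the cell above can be wrapped in a small helper so that the history update is never forgotten. This is a sketch under assumptions: `ask` and `echo_chain` are hypothetical names, and `(role, text)` tuples stand in for `HumanMessage` and `AIMessage` so the example runs without LangChain installed.

```python
from typing import Callable, Dict, List, Tuple


def ask(
    invoke: Callable[[Dict], Dict],
    question: str,
    chat_history: List[Tuple[str, str]],
) -> str:
    """Send one question and record both sides of the exchange in chat_history."""
    result = invoke({"input": question, "chat_history": chat_history})
    # Append only after the call, so the chain sees the history before this turn.
    chat_history.extend([("human", question), ("ai", result["answer"])])
    return result["answer"]


# Hypothetical stub chain: reports how many earlier messages it received.
def echo_chain(inputs: Dict) -> Dict:
    return {"answer": f"Saw {len(inputs['chat_history'])} earlier message(s)."}


history: List[Tuple[str, str]] = []
first_answer = ask(echo_chain, "What is this article about?", history)
second_answer = ask(echo_chain, "What was my previous question about?", history)
print(first_answer)
print(second_answer)
```

With the real chain, `invoke` would be `rag_chain.invoke`, and the helper would append `HumanMessage` and `AIMessage` objects as in the cell above.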


Let's now focus on how to handle and maintain chat history in the question and answer (Q&A) application to make conversations flow more naturally.

Here’s a simplified explanation:

1. **Stateful Management of Chat History**: Instead of manually entering previous parts of the conversation every time a new input is made, the application can automatically manage and update chat history. This means that the application remembers past interactions and uses that information to understand and respond to new questions more accurately.

2. **Tools for Managing Chat History**:
   - **BaseChatMessageHistory**: This component is responsible for storing the history of the conversation.
   - **RunnableWithMessageHistory**: This acts as a wrapper around the main processing chain (LCEL chain) and the chat history storage (BaseChatMessageHistory). It takes care of adding the historical chat data to new inputs and updating the stored history after each response.

3. **How It Works**: When you use these components in an application:
   - The application automatically retrieves and updates the chat history every time it processes a new input. This helps in maintaining a continuous and coherent conversation flow.
   - For example, when using `RunnableWithMessageHistory`, it manages chat history using a configuration that includes a unique session identifier (`session_id`). This identifier helps the system know which conversation history to retrieve and update whenever a user interacts with the system.

4. **Example of Implementation**: In a simple implementation, chat histories might be stored in a basic dictionary. More complex systems might use databases like Redis to ensure more reliable and long-term storage of conversation data.

Overall, these tools and methods help a Q&A application remember and utilize previous interactions, making the conversation feel more natural and informed.
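
To make the mechanism concrete, here is a toy model of what a history-managing wrapper does on each call: look up the history for the session, pass it to the chain, then store the new exchange. It is a sketch, not the implementation of `RunnableWithMessageHistory`; `SessionMemory` and `counting_chain` are hypothetical names, and messages are simplified to `(role, text)` tuples.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text); a stand-in for message objects


class SessionMemory:
    """Toy history-managing wrapper keyed by session_id."""

    def __init__(self, invoke: Callable[[Dict], Dict]) -> None:
        self._invoke = invoke
        self._store: Dict[str, List[Message]] = {}

    def history(self, session_id: str) -> List[Message]:
        # Create an empty history the first time a session is seen.
        return self._store.setdefault(session_id, [])

    def invoke(self, question: str, session_id: str) -> str:
        past = self.history(session_id)
        # The chain receives a copy of the history as it was before this turn.
        result = self._invoke({"input": question, "chat_history": list(past)})
        past.extend([("human", question), ("ai", result["answer"])])
        return result["answer"]


# Hypothetical stub chain: reports how much history it was given.
def counting_chain(inputs: Dict) -> Dict:
    return {"answer": f"{len(inputs['chat_history'])} earlier message(s) in this session"}


memory = SessionMemory(counting_chain)
print(memory.invoke("What is this article about?", "001"))
print(memory.invoke("What was my previous question about?", "001"))
print(memory.invoke("What is this article about?", "002"))  # separate session
```

Each `session_id` gets its own history, so the third call starts from an empty conversation even though session `001` already holds four messages.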

In [16]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

In [17]:
conversational_rag_chain.invoke(
    {"input": "What is this article about?"},
    config={
        "configurable": {"session_id": "001"}
    },  # constructs a key "001" in `store`.
)["answer"]

'The article discusses the motto of Y Combinator, "Make something people want," and how focusing on creating value for users can lead to success. It also explores the idea of running a business like a charity, using examples like Craigslist, and how this approach can still be successful. The author reflects on the surprising connection between creating something people want and operating like a charity in the business world.'

In [18]:
conversational_rag_chain.invoke(
    {"input": "What was my previous question about?"},
    config={"configurable": {"session_id": "001"}},
)["answer"]

'Your previous question was about the topic or subject matter of the article.'

The conversation history can be inspected in the `store` dict:

In [19]:
for message in store["001"].messages:
    if isinstance(message, AIMessage):
        prefix = "AI"
    else:
        prefix = "User"

    print(f"{prefix}: {message.content}\n")

User: What is this article about?

AI: The article discusses the motto of Y Combinator, "Make something people want," and how focusing on creating value for users can lead to success. It also explores the idea of running a business like a charity, using examples like Craigslist, and how this approach can still be successful. The author reflects on the surprising connection between creating something people want and operating like a charity in the business world.

User: What was my previous question about?

AI: Your previous question was about the topic or subject matter of the article.

