# Conversational RAG

## Intro
* In most RAG applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers.

## The problem
* How do we handle the case where the user refers to earlier questions and answers in the conversation?

## The second problem...
* This is probably the most poorly explained topic in the LangChain documentation.

## What we need to solve
* Store the chat conversation.
* When the user enters a new input, put that input in context.
* Re-phrase the user input to have a contextualized input.
* Send the contextualized input to the retriever.
* Use the retriever to build a conversational RAG chain.
* Add extra features such as persisting memory (saving it to a file) and session memories.

## The process we will follow
1. Create a basic RAG without memory.
2. Create a ChatPromptTemplate able to contextualize inputs.
3. Create a retriever aware of memory.
4. Create a basic conversational RAG.
5. Create an advanced conversational RAG with persistence and session memories.
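
The "persistence and session memories" of step 5 can be pictured in plain Python before any LangChain code is involved. The sketch below is only an illustration under stated assumptions: `SessionStore`, its methods, and the JSON file layout are hypothetical names chosen for this example, not part of LangChain. It keeps one message list per session id and rewrites a JSON file after every turn.

```python
import json
import os
import tempfile


class SessionStore:
    """Hypothetical helper: one chat history per session id, saved as JSON."""

    def __init__(self, path):
        self.path = path
        self.sessions = {}
        # Reload earlier conversations if the file already exists.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.sessions = json.load(f)

    def history(self, session_id):
        # Return the message list for this session, creating it on first use.
        return self.sessions.setdefault(session_id, [])

    def add_turn(self, session_id, question, answer):
        # Append one question/answer pair, then write everything back to disk.
        self.history(session_id).extend(
            [
                {"role": "human", "content": question},
                {"role": "ai", "content": answer},
            ]
        )
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.sessions, f, indent=2)


# Usage: two sessions keep separate histories, and a new store reloads them.
demo_path = os.path.join(tempfile.mkdtemp(), "chat_memory.json")
store = SessionStore(demo_path)
store.add_turn("alice", "What is this article about?", "(first answer)")
store.add_turn("bob", "Who wrote it?", "(second answer)")

reloaded = SessionStore(demo_path)
print(len(reloaded.history("alice")))
print(reloaded.history("bob")[0]["content"])
```

Keeping histories keyed by session id is what lets several users talk to the same chain without seeing each other's context; writing to a file is what lets a conversation survive a restart.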

## Setup

#### After you download the code from the GitHub repository to your computer
In terminal:
* cd project_name
* pyenv local 3.11.4
* poetry install
* poetry shell

#### To open the notebook with Jupyter Notebooks
In terminal:
* jupyter lab

Go to the folder of notebooks and open the right notebook.

#### To see the code in Visual Studio Code or your editor of choice
* Open Visual Studio Code or your editor of choice.
* Open the project folder.
* Open the 001-conversational-rag.py file.

## Create your .env file
* In the GitHub repo we have included a file named .env.example.
* Rename that file to .env and add your confidential API keys there. Remember to include:
* OPENAI_API_KEY=your_openai_api_key
* LANGCHAIN_TRACING_V2=true
* LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
* LANGCHAIN_API_KEY=your_langchain_api_key
* LANGCHAIN_PROJECT=your_project_name

We will call our LangSmith project **001-conversational-rag**.

## Connect with the .env file located in the same directory as this notebook

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [1]:
#!pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
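
Reading `os.environ["OPENAI_API_KEY"]` raises a bare `KeyError` when the key is missing. As an optional sketch, the helper below reports which of the variables listed in the .env section are absent, which makes a misconfigured file easier to diagnose. `missing_env_vars` and `REQUIRED_VARS` are hypothetical names introduced for this example; they are not part of python-dotenv.

```python
import os

# Variable names taken from the .env section above.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_ENDPOINT",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
]


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing from .env:", ", ".join(missing))
else:
    print("All expected variables are set.")
```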

## Install LangChain

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [3]:
#!pip install langchain

## Connect with an LLM

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [4]:
#!pip install langchain-openai

* NOTE: We use OpenAI models by default because, at the time of writing, we consider them the best on the market. You will see how to connect with open-source LLMs such as Llama3 or Mistral in a later lesson.

In [5]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")

If you are using the pre-loaded poetry shell, you do not need to install the following package because it is already pre-loaded for you:

In [6]:
#!pip install langchain-community langchain-chroma bs4 faiss-cpu

## Step 1: Create a basic RAG without memory
* We will use the RAG process we already know.
* We will use create_stuff_documents_chain to build a qa chain: a chain able to ask questions to an LLM.
* We will use create_retrieval_chain and the qa chain to build the RAG chain: a chain able to ask questions to the retriever and then format the response with the LLM.

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings

# Load the source document (returns a list of Document objects).
docs = TextLoader("data/be-good.txt").load()

# Embed the document and index it in an in-memory FAISS vector store.
vector_db = FAISS.from_documents(docs, OpenAIEmbeddings())
retriever = vector_db.as_retriever()

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [8]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)

rag_chain = create_retrieval_chain(retriever, question_answer_chain)

* Let's try the app:

In [9]:
output = rag_chain.invoke({"input": "What is this article about?"})

In [10]:
output["answer"]

'The article discusses the motto "Make something people want" coined by Y Combinator founders and how it relates to building successful businesses. It explores the idea of focusing on creating value for users before worrying about monetization, suggesting that this approach could resemble a charity model. Examples like Craigslist are used to illustrate this concept of running a successful business with a focus on user needs over profit.'

* As the next question shows, our app has no memory of the conversation: the answer below is guessed from the retrieved context, not recalled from the previous exchange.

In [11]:
output = rag_chain.invoke({"input": "What was my previous question about?"})

In [12]:
output["answer"]

'Your previous question was about the concept of benevolence in businesses and organizations, specifically how being benevolent can lead to success and growth. The idea was discussed in relation to examples such as Google, Microsoft, and Craigslist, highlighting the potential power of benevolence as a guiding principle.'

## Step 2: Create a ChatPromptTemplate able to contextualize inputs
* Goal: put the input in context and re-phrase it so we have a contextualized input.
* We will define a new system prompt that instructs the LLM on how to contextualize the input.
* Our new ChatPromptTemplate will include:
    * The new system prompt.
    * MessagesPlaceholder, a placeholder used to pass the list of messages included in the chat_history.

In [13]:
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
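
To make the role of MessagesPlaceholder concrete, here is a plain-Python imitation of what happens when a prompt like the one above is filled in. This is a sketch, not LangChain's implementation: the `Placeholder` class and `expand_template` function are hypothetical and only mimic the idea of splicing the whole chat_history list between the system message and the new human message.

```python
class Placeholder:
    """Hypothetical stand-in for a slot that receives a list of messages."""

    def __init__(self, variable_name):
        self.variable_name = variable_name


def expand_template(template, **values):
    """Fill (role, text) pairs and Placeholder slots with the given values."""
    messages = []
    for item in template:
        if isinstance(item, Placeholder):
            # Splice in every stored message, keeping the original order.
            messages.extend(values[item.variable_name])
        else:
            role, text = item
            messages.append((role, text.format(**values)))
    return messages


template = [
    ("system", "Rewrite the latest question as a standalone question."),
    Placeholder("chat_history"),
    ("human", "{input}"),
]

chat_history = [
    ("human", "What is this article about?"),
    ("ai", "(previous answer)"),
]

for role, text in expand_template(
    template, chat_history=chat_history, input="Who coined that motto?"
):
    print(f"{role}: {text}")
```

With an empty chat_history the placeholder contributes nothing, so the model sees only the system message and the new question.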

## Step 3: Create a Retriever aware of the memory
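* We will build our new retriever with create_history_aware_retriever. When chat_history is empty, it passes the input straight to the retriever; otherwise it first uses the LLM and the contextualization prompt to rewrite the input as a standalone query, and sends that query to the retriever.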

In [14]:
from langchain.chains import create_history_aware_retriever

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
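
The control flow of a history-aware retriever (rephrase the question only when there is chat history, then retrieve) can be imitated in plain Python. In this sketch, `history_aware_retrieve`, `fake_rephrase`, and `fake_retrieve` are hypothetical stand-ins for the chain, the LLM call, and the vector store lookup; only the branching is the point, not the quality of the rephrasing.

```python
def history_aware_retrieve(question, chat_history, rephrase, retrieve):
    """Rephrase the question only when there is history, then retrieve."""
    if chat_history:
        # With history: ask the "LLM" for a standalone search query first.
        query = rephrase(question, chat_history)
    else:
        # Without history: the question is already standalone.
        query = question
    return retrieve(query)


def fake_rephrase(question, chat_history):
    # Stand-in for the LLM: attach the latest human message as context.
    humans = [text for role, text in chat_history if role == "human"]
    last_human = humans[-1] if humans else ""
    return f"{question} (follow-up to: {last_human})"


def fake_retrieve(query):
    # Stand-in for the vector store: report which query it received.
    return [f"documents matching: {query}"]


# First turn: no history, so the question goes to the retriever unchanged.
print(history_aware_retrieve(
    "What is this article about?", [], fake_rephrase, fake_retrieve
))

# Follow-up turn: the question is rewritten before retrieval.
history = [("human", "What is this article about?"), ("ai", "(previous answer)")]
print(history_aware_retrieve("Who coined it?", history, fake_rephrase, fake_retrieve))
```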

## Step 4: Create a basic Conversational RAG
* We will use the history-aware retriever, which retrieves documents using the contextualized input.
* We will use create_stuff_documents_chain to build a qa chain: a chain able to ask questions to an LLM.
* We will use create_retrieval_chain and the qa chain to build the RAG chain: a chain able to ask questions to the retriever and then format the response with the LLM.

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

history_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

#### Trying our basic conversational RAG
Below we ask a question and a follow-up question that requires contextualization to return a sensible response. Because our chain includes a "chat_history" input, the caller needs to manage the chat history. We can achieve this by appending input and output messages to a list:

In [17]:
from langchain_core.messages import HumanMessage, AIMessage
chat_history = []

# Chat until the user types "exit" (at most 15 turns).
for _ in range(15):
    query = input("Human : ")
    if query == "exit":
        break
    response = history_rag_chain.invoke({"input": query, "chat_history": chat_history})
    print(response["answer"])
    # Store this turn so the next question can refer back to it.
    chat_history.extend(
        [
            HumanMessage(content=query),
            AIMessage(content=response["answer"]),
        ]
    )

Your previous question was about the topic or subject of the article under discussion.


## How to execute the code from Visual Studio Code
* In Visual Studio Code, see the file Men6.py
* In terminal, make sure you are in the directory of the file and run:
    * python Men6.py