**Reference Link:** [RAG Systems Essentials (Analytics Vidhya)](https://courses.analyticsvidhya.com/courses/take/rag-systems-essentials/lessons/60148017-hands-on-deep-dive-into-rag-evaluation-metrics-generator-metrics-i)

## Multi-user Conversational RAG System

### Load Dependencies

In [2]:
!pip install -qq langchain
!pip install -qq langchain-openai
!pip install -qq langchain-community

In [3]:
!pip install -qq langchain-chroma

## Enter API Tokens

In [4]:
# from getpass import getpass

# OPENAI_KEY = getpass('Enter your OpenAI Key: ')

## Setup Environment Variables


In [5]:
# import os

# os.environ['OPENAI_API_KEY'] = OPENAI_KEY

In [6]:
import os
from dotenv import load_dotenv

load_dotenv()

True

### Load Wikipedia Data

In [7]:
import gzip
import json
import requests
from tqdm import tqdm
import sys
import os

# Download a file from a URL
def http_get(url, path) -> None:
    """
    Downloads a URL to a given path on disc
    """
    if os.path.dirname(path) != "":
        os.makedirs(os.path.dirname(path), exist_ok=True)

    req = requests.get(url, stream=True)
    if req.status_code != 200:
        print("Exception when trying to download {}. Response {}".format(url, req.status_code), file=sys.stderr)
        req.raise_for_status()
        return

    # Write to a temporary "_part" file first so an interrupted download
    # never leaves a truncated file at the final path
    download_filepath = path + "_part"
    with open(download_filepath, "wb") as file_binary:
        content_length = req.headers.get("Content-Length")
        total = int(content_length) if content_length is not None else None
        progress = tqdm(unit="B", total=total, unit_scale=True)
        for chunk in req.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                progress.update(len(chunk))
                file_binary.write(chunk)
        progress.close()

    # Move the completed download into place
    os.rename(download_filepath, path)


wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

http_get('http://sbert.net/datasets/simplewiki-2020-11-01.jsonl.gz', wikipedia_filepath)

100%|██████████| 50.2M/50.2M [00:13<00:00, 3.83MB/s]


In [8]:
import gzip
import json

wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

passages = []
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as fIn:
    for line in fIn:
        data = json.loads(line.strip())
        #Only add the first paragraph
        passages.append(data['paragraphs'][0])

print("Passages:", len(passages))

Passages: 169597


In [9]:
len(passages)

169597

In [10]:
print(passages[0])

Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on "The Addams Family".


### Load OpenAI LLMs


In [11]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

### Generate LLM Embeddings and store them in Chroma Vector DB

**Chroma Vector DB** is an open-source vector database designed for storing and querying vector embeddings. It is easy to set up and integrates with a range of AI tools. Chroma is particularly useful for applications that need fast, accurate retrieval of content represented as embeddings, that is, numeric vector representations of text, images, and other data.

**Key Features:**
- **Integration with AI Tools:** Chroma supports embedding functions from leading providers like OpenAI, Google, and Hugging Face, allowing for flexible and powerful data handling.
- **Ease of Use:** The database provides default embedding functions, or users can integrate external APIs to generate embeddings.
- **Efficient Querying:** Users can create collections to store embeddings, documents, and metadata. These can be queried to retrieve the most similar items, making information retrieval quick and effective.
- **Flexible API:** Chroma offers a straightforward API that supports both standard operations and custom embedding functions.

For more detailed information, visit the official Chroma documentation [here](https://docs.trychroma.com).
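
To make "retrieve the most similar items" concrete, here is a minimal plain-Python sketch of similarity search over a toy collection. The three-dimensional vectors and the helper names `cosine_similarity` and `top_k` are hypothetical and only for illustration; Chroma itself relies on an approximate HNSW index rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, collection, k=2):
    """Return the k (document, score) pairs most similar to query_vec."""
    scored = [(doc, cosine_similarity(query_vec, vec)) for doc, vec in collection]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; real embedding models return far more dimensions
toy_collection = [
    ("cheetah", [0.9, 0.1, 0.0]),
    ("New Delhi", [0.0, 0.2, 0.9]),
    ("peregrine falcon", [0.8, 0.3, 0.1]),
]

results = top_k([1.0, 0.2, 0.0], toy_collection, k=2)
print([doc for doc, _ in results])  # ['cheetah', 'peregrine falcon']
```

The query vector points in roughly the same direction as the two animal vectors, so they rank above the unrelated entry.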


In [12]:
passages[:3]

['Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on "The Addams Family".',
 'Aileen Carol Wuornos Pralle (born Aileen Carol Pittman; February 29, 1956\xa0– October 9, 2002) was an American serial killer. She was born in Rochester, Michigan. She confessed to killing six men in Florida and was executed in Florida State Prison by lethal injection for the murders. Wuornos said that the men she killed had raped her or tried to rape her while she was working as a prostitute.',
 "A crater is a round dent on a planet. They are usually shaped like a circle or an oval. They are usually made by something like a meteor hitting the surface of a planet. Underground activity such as volcanoes or explosions can also cause them but it's not as likely."]

In [13]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

In [14]:
# The vectorstore we'll be using
from langchain_chroma import Chroma

# The splitting and chunking strategy
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [15]:
# Document is defined in langchain_core; importing it from there avoids the legacy docstore path
from langchain_core.documents import Document

docs = [Document(page_content=doc) for doc in passages]

In [16]:
splitter = RecursiveCharacterTextSplitter(chunk_size=3000,
                                          chunk_overlap=200)
chunked_docs = splitter.split_documents(docs)

In [17]:
chunked_docs[:3]

[Document(metadata={}, page_content='Ted Cassidy (July 31, 1932 - January 16, 1979) was an American actor. He was best known for his roles as Lurch and Thing on "The Addams Family".'),
 Document(metadata={}, page_content='Aileen Carol Wuornos Pralle (born Aileen Carol Pittman; February 29, 1956\xa0– October 9, 2002) was an American serial killer. She was born in Rochester, Michigan. She confessed to killing six men in Florida and was executed in Florida State Prison by lethal injection for the murders. Wuornos said that the men she killed had raped her or tried to rape her while she was working as a prostitute.'),
 Document(metadata={}, page_content="A crater is a round dent on a planet. They are usually shaped like a circle or an oval. They are usually made by something like a meteor hitting the surface of a planet. Underground activity such as volcanoes or explosions can also cause them but it's not as likely.")]
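
The first three chunks above are identical to the original passages because each passage is shorter than `chunk_size`. To show what `chunk_size` and `chunk_overlap` mean when a text is longer, here is a simplified sketch that cuts fixed-size character windows. It is not the recursive, separator-aware algorithm that `RecursiveCharacterTextSplitter` uses, and the helper `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=3000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if len(text) <= chunk_size:
        return [text]  # short texts pass through unchanged
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size]
            for start in range(0, len(text) - chunk_overlap, step)]

# Tiny sizes so the overlap is easy to see
chunks = chunk_text("abcdefghijklmnopqrstuvwxy", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxy']
```

Each chunk repeats the last three characters of the previous one, which is the role `chunk_overlap` plays: content near a boundary appears in both neighbouring chunks.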

## Create Vector DB and Retriever

If you already created `wiki_db` in the previous hands-on session, just load the DB from disk and do NOT run the following code to create the database again. Ignore this note when running on Colab.

In [None]:
# create vector DB of docs and embeddings - takes 1 min on Colab
chroma_db = Chroma.from_documents(documents=chunked_docs, collection_name='wiki_db',
                                  embedding=openai_embed_model,
                                  # need to set the distance function to cosine else it uses euclidean by default
                                  # check https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./wiki_db")

## Load Vector DB from disk

Run the following code if your vector DB already exists on disk from the previous hands-on session. It reuses the same `persist_directory` and `collection_name` as the creation step.

In [None]:
# load from disk
chroma_db = Chroma(persist_directory="./wiki_db",
                   collection_name='wiki_db',
                   embedding_function=openai_embed_model)

In [None]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity_score_threshold",
                                              search_kwargs={"k": 5, "score_threshold": 0.2})
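
As a rough illustration of what the `similarity_score_threshold` search type is meant to do, the plain-Python sketch below keeps at most `k` results whose relevance score is at least `score_threshold`. The scores and the helper `filter_by_threshold` are hypothetical, and the sketch assumes scores lie in [0, 1] with higher meaning more similar.

```python
def filter_by_threshold(scored_docs, k=5, score_threshold=0.2):
    """Keep up to k docs whose relevance score is >= score_threshold, best first."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]

# Hypothetical (document, relevance score) pairs
hypothetical_hits = [("doc A", 0.62), ("doc B", 0.15), ("doc C", 0.41), ("doc D", 0.05)]

kept = filter_by_threshold(hypothetical_hits, k=5, score_threshold=0.2)
print(kept)  # [('doc A', 0.62), ('doc C', 0.41)]
```

One consequence of thresholding is that a query unrelated to the corpus can return no documents at all, in which case the context passed to the LLM is empty.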

### Build a QA RAG Chain

In [None]:
from langchain_core.prompts import ChatPromptTemplate

prompt = """You are an assistant for question-answering tasks.
            Use the following pieces of retrieved context to answer the question.
            If the answer is not present in the context, just say that you don't know.
            Keep the answer to the point.

            Question:
            {question}

            Context:
            {context}

            Answer:
         """

prompt_template = ChatPromptTemplate.from_template(prompt)

In [None]:
prompt_template.pretty_print()

## RAG Chain - Using LCEL

In [None]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

qa_rag_chain = (
    {
        "context": (similarity_retriever
                      |
                    format_docs),
        "question": RunnablePassthrough()
    }
      |
    prompt_template
      |
    chatgpt
)
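
The data flow of this chain can be pictured with a plain-Python analogue. This is only a sketch of the flow, not how LCEL is implemented: `fake_retriever`, `fake_llm`, and `join_passages` are hypothetical stand-ins, passages are plain strings rather than `Document` objects, and the prompt is abbreviated.

```python
def join_passages(passages):
    # Plays the role of format_docs, but on plain strings
    return "\n\n".join(passages)

def fake_retriever(question):
    # Hypothetical stand-in for similarity_retriever
    return ["The cheetah is the fastest land animal."]

def fake_llm(prompt_text):
    # Hypothetical stand-in for chatgpt: it only reports the prompt size
    return f"(model would answer from a {len(prompt_text)}-character prompt)"

def qa_rag_chain_sketch(question):
    # Step 1: the dict sends the same input down both branches
    inputs = {
        "context": join_passages(fake_retriever(question)),  # retriever | format_docs
        "question": question,                                 # RunnablePassthrough()
    }
    # Step 2: the prompt template is filled from the dict
    prompt_text = "Question:\n{question}\n\nContext:\n{context}\n\nAnswer:".format(**inputs)
    # Step 3: the filled prompt is sent to the model
    return prompt_text, fake_llm(prompt_text)

prompt_text, answer = qa_rag_chain_sketch("What is the fastest animal?")
print(prompt_text)
print(answer)
```

The key point is step 1: the single query string reaches the retriever branch and the passthrough branch at the same time, so the prompt receives both the retrieved context and the original question.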

In [None]:
query = "What is the fastest animal?"
result = qa_rag_chain.invoke(query)
print(result.content)

In [None]:
query = "Who was the winner of the champions league in 2020?"
result = qa_rag_chain.invoke(query)
print(result.content)

In [None]:
query = "What is the capital of India?"
result = qa_rag_chain.invoke(query)
print(result.content)

# Conversational RAG System with LangChain

In many Q&A applications, the ability to engage in back-and-forth conversations with users is crucial. This requires the application to have some form of "memory" so that it can recall past interactions and apply that context to the current query.

This guide focuses on integrating historical messages into the application's logic. Additional details on managing chat history can be found [here](https://python.langchain.com/docs/expression_language/how_to/message_history/).

![](https://i.imgur.com/8hLJMPl.gif)

### Building on the Q&A RAG System - to a Conversational Q&A RAG System

We will enhance our Q&A RAG System, which utilizes the Wikipedia dataset, by implementing the following updates:

- **Prompt Adjustment:** Our prompt will be modified to include historical messages as inputs, allowing the system to maintain context over the course of a conversation.

- **Contextualizing Questions:** We will introduce a sub-chain mechanism to reformulate the latest user query by considering the chat history. This is crucial for understanding questions that refer back to previous messages. For example, a query like "Can you elaborate on the second point?" relies on the context provided by preceding interactions, which affects the system's ability to retrieve relevant information effectively.





## Contextualizing the Question

To maintain a seamless flow in conversations, especially in a Q&A setting, it's essential to incorporate historical interactions. Here’s how we achieve this:

### Defining a Sub-Chain for Historical Context

1. **Sub-Chain Creation:** We'll define a sub-chain that takes both the historical messages and the latest user query. If the query refers to earlier turns, the sub-chain reformulates it into a standalone question, so that the retriever (and the vector database behind it) can return the documents most relevant to the reworded question.

2. **Using `MessagesPlaceholder`:** Our prompt includes a `MessagesPlaceholder` variable named `chat_history`. This lets us pass in a list of messages under the `chat_history` key. These messages are inserted after the system message and before the human message containing the latest user question.

3. **Helper Function Usage:** We employ the `create_history_aware_retriever` function available [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html). When the chat history is empty, it passes the input directly to the retriever; otherwise it applies the sequence `prompt | llm | StrOutputParser() | retriever`.

4. **Chain Construction:** The `create_history_aware_retriever` constructs a chain that processes inputs under the keys `input` and `chat_history`, ensuring the output schema aligns with that of a retriever.

By implementing these steps, our system can effectively utilize historical context to better understand and respond to user queries, thereby enhancing the conversational experience.
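
The branching described in the steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: `rephrase_with_llm` and `fake_retriever` are hypothetical stand-ins, and chat turns are represented as `(role, text)` tuples.

```python
def rephrase_with_llm(chat_history, user_input):
    # Hypothetical stand-in for `prompt | llm | StrOutputParser()`.
    # A real LLM would rewrite the question; here we just attach the last human turn.
    humans = [text for role, text in chat_history if role == "human"]
    topic = humans[-1] if humans else ""
    return f"{user_input} (in the context of: {topic})"

def fake_retriever(query):
    # Hypothetical stand-in for similarity_retriever
    return [f"<documents matching: {query}>"]

def history_aware_retrieve(inputs):
    chat_history = inputs.get("chat_history", [])
    if not chat_history:
        # Empty history: the raw input goes straight to the retriever
        return fake_retriever(inputs["input"])
    # Otherwise: rephrase into a standalone question first, then retrieve
    standalone_question = rephrase_with_llm(chat_history, inputs["input"])
    return fake_retriever(standalone_question)

first = history_aware_retrieve({"input": "What is the capital of India?", "chat_history": []})
history = [("human", "What is the capital of India?"), ("ai", "New Delhi")]
second = history_aware_retrieve({"input": "Tell me more about this city", "chat_history": history})
print(first)
print(second)
```

The second call shows why the rephrasing step matters: "this city" alone would retrieve poorly, whereas the rewritten query carries the topic of the earlier turn.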


In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

rephrase_system_prompt = """Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is.
"""

rephrase_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", rephrase_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    chatgpt, similarity_retriever, rephrase_prompt
)

This chain places a rephrasing step in front of our retriever, so that retrieval runs on a standalone question that incorporates the context of the conversation.

## Building the QA RAG Chain with Chat History

Now we're ready to construct our comprehensive QA RAG chain, which leverages historical context for more accurate and relevant responses.

### Components of the QA RAG Chain

1. **Creating Document Chains:**
   - We use the `create_stuff_documents_chain` function, which is detailed [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html). It creates a `question_answer_chain` that accepts `context`, `chat_history`, and `input`. The retrieved documents are inserted ("stuffed") into the prompt's `{context}` slot, and the filled prompt, together with the conversation history and the current query, is passed to the LLM to generate the answer.

2. **Building the Final QA RAG Chain:**
   - The entire QA RAG chain is assembled using the `create_retrieval_chain` function, available [here](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html). This chain connects the `history_aware_retriever` to the `question_answer_chain` and keeps intermediate outputs, such as the retrieved context, in its result so they can be inspected alongside the answer.
   - The resulting chain accepts the keys `input` and `chat_history`, and its output includes `input`, `chat_history`, `context`, and `answer`.

With these pieces in place, the system first rewrites the question using the chat history, retrieves documents for the rewritten question, and then answers from the retrieved context and the conversation so far. Follow-up questions that only make sense in context can therefore still be answered from the right documents.
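
The input and output keys listed above can be illustrated with a small plain-Python sketch. It only mimics the shape of the result, under the assumption that the chain adds `context` and then `answer` to its inputs; `fake_retrieve` and `fake_answer` are hypothetical stand-ins for the history-aware retriever and the question-answer chain.

```python
def retrieval_chain_sketch(inputs, retrieve, answer_from):
    """Mimic the output shape: the inputs plus `context` and `answer`."""
    docs = retrieve(inputs)                      # retrieval step
    with_context = {**inputs, "context": docs}   # intermediate output is kept
    answer = answer_from(with_context)           # answer step sees the context too
    return {**with_context, "answer": answer}

def fake_retrieve(inputs):
    # Hypothetical stand-in for history_aware_retriever
    return ["New Delhi is the capital of India."]

def fake_answer(inputs):
    # Hypothetical stand-in for question_answer_chain
    return f"Answered from {len(inputs['context'])} document(s)."

result = retrieval_chain_sketch(
    {"input": "What is the capital of India?", "chat_history": []},
    fake_retrieve,
    fake_answer,
)
print(sorted(result))  # ['answer', 'chat_history', 'context', 'input']
```

Because `context` stays in the result, the retrieved passages can be checked against the answer when debugging.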


In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_system_prompt = """You are an assistant for question-answering tasks.
                      Use the following pieces of retrieved context to answer the question.
                      If the answer is not present in the context, just say that you don't know.
                      Keep the answer to the point.

                      Context:
                      {context}

                  """

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", """Question:
                     {input}

                     Answer:
                  """),
    ]
)

question_answer_chain = create_stuff_documents_chain(chatgpt, qa_prompt)
qa_rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [None]:
chat_history = []

question = "What is the capital of India?"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print(response['answer'])

In [None]:
chat_history

In [None]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
chat_history

In [None]:
question = "Tell me more about this city"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print(response['answer'])

In [None]:
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
chat_history

In [None]:
question = "What is the fastest animal?"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
print(response['answer'])

In [None]:
response

In [None]:
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])

In [None]:
chat_history

In [None]:
chat_history[-2:]

In [None]:
question = "Tell me about its different species"
response = qa_rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend([HumanMessage(content=question),
                     AIMessage(content=response["answer"])])
print(response['answer'])
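
The invoke-then-extend pattern repeated in the cells above can be wrapped in a small helper so that the history is never forgotten between turns. This is a sketch under stated assumptions: `ask` and `EchoChain` are hypothetical names, `EchoChain` only stands in for `qa_rag_chain` so the example runs without an API key, and turns are stored as `(role, text)` tuples to stay dependency-free, whereas the notebook cells append `HumanMessage` and `AIMessage` objects.

```python
def ask(chain, question, chat_history):
    """Invoke the chain, then append the new turn to chat_history in place."""
    response = chain.invoke({"input": question, "chat_history": chat_history})
    chat_history.extend([("human", question), ("ai", response["answer"])])
    return response["answer"]

class EchoChain:
    """Hypothetical stand-in for qa_rag_chain, used only to exercise the helper."""
    def invoke(self, inputs):
        turns_so_far = len(inputs["chat_history"]) // 2
        return {"answer": f"answer #{turns_so_far + 1} to: {inputs['input']}"}

history = []
ask(EchoChain(), "What is the capital of India?", history)
second = ask(EchoChain(), "Tell me more about this city", history)
print(second)        # answer #2 to: Tell me more about this city
print(len(history))  # 4
```

Keeping one history list per conversation is what lets each follow-up question be interpreted against the earlier turns.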