# **Introduction to Retrieval Augmented Generation**

Large language models (LLMs) can generate text on their own, but they aren't perfect.

Even though they're very capable, they sometimes get things wrong, especially when an answer needs to be precise or rely on the latest information. To address this, researchers at Meta AI introduced a technique called retrieval-augmented generation, or RAG for short, in 2020.

Think of it as giving the language model an assistant. This assistant digs through a large pile of up-to-date information and passes the most relevant pieces to the LLM.

**Benefits:**  
1. **Enhanced Factual Accuracy and Domain-Specific Expertise:** Imagine a customer service chatbot trained on general conversation data. It might struggle with technical, domain-specific questions. RAG lets you integrate domain-specific knowledge bases, enabling the chatbot to handle these inquiries with expertise.
2. **Reduced Hallucination:** LLMs can generate false information, a phenomenon known as hallucination. Grounding responses in the provided knowledge base helps support the claims the generative model makes.

**Components of RAG:**  
1. **Retrieval:** When a user asks a question or provides a prompt, the retrieval step first fetches relevant passages from a large knowledge base. This knowledge base could be a company's internal documents or any other source of text data.
2. **Augmentation:** The retrieved passages are then used to "augment" the LLM's knowledge. This can involve techniques such as summarizing the passages or encoding their key information.
3. **Generation:** Finally, the LLM combines its understanding of language with the augmented information to generate a response. This response can be an answer to a question, a piece of creative text based on a prompt, and so on.

**Applications:**  
1. Question Answering: A RAG-powered customer care chatbot can answer customer queries by retrieving product information, FAQs, and guides to provide a well-rounded response.
2. Document Summarization: A research paper summarization tool can use RAG to retrieve relevant sections and then generate a summary highlighting the main points.
3. Creative Text Generation: A story writing assistant can use RAG to retrieve information about historical periods or fictional creations, helping the LLM generate more engaging stories.
4. Code Generation: A code completion tool can use RAG to retrieve relevant code examples and API documentation, helping developers write code more efficiently.


## **What are Retrievals?**
Retrievals (also called retrievers) are components that specialize in searching large amounts of data for information relevant to a specific query or context.

Retrieval models focus on precisely matching the query against the data they have access to, so they rely heavily on the quality and structure of that data. Their performance depends on the relevance and accuracy of the information stored in the sources they query.

In simple terms, a retrieval searches a large corpus and returns the data most relevant to a given query.
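
To make the idea concrete, here is a minimal sketch of a retriever that ranks passages by word overlap with the query. It is an illustration only: the function names, the Jaccard scoring, and the three sample passages are assumptions made for this example, and the RAG system built later in this notebook uses embedding similarity instead.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k passages whose word sets overlap most with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: jaccard(query_tokens, tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


# Made-up passages standing in for a knowledge base.
corpus = [
    "The refund policy allows returns within 30 days.",
    "Our office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]

top = retrieve("refund policy for returns", corpus, k=1)
print(top)
```

Word overlap misses paraphrases and even simple variants: in this sample, the third passage scores zero because "Refunds" does not match "refund". That limitation is one reason to retrieve with embeddings rather than exact words.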

## **Building a RAG System**
Step 1: Create an Index on available Knowledge Base  
- Data in formats such as PDF and HTML is cleaned and converted into plain text. This text is then divided into smaller parts (i.e., chunks), and each chunk is passed through an embedding model to produce a vector representation that makes it easy to find later.

Step 2: Create a Retrieval  
- When someone asks a question, the RAG system turns that question into a vector embedding using the same embedding model used for indexing. It then compares this vector to the vectors of the indexed chunks to find the `k` most similar ones. These `k` chunks are used as context in the next step.

Step 3: Generation  
- The system combines the retrieved text parts (i.e. context) with the original question to create a prompt. The LLM uses this prompt to answer the question.
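
The three steps above can be sketched end to end without any external services. This is a toy illustration, not the implementation used later in this notebook: `embed` is a stand-in that counts words instead of calling an embedding model, and `embed`, `cosine`, `top_k`, `build_prompt`, and the sample chunks are all names and data invented for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Step 1: index. Store every chunk next to its vector.
toy_chunks = [
    "Chroma stores embeddings on disk.",
    "The splitter cuts documents into chunks.",
    "The chat model writes the final answer.",
]
toy_index = [(chunk, embed(chunk)) for chunk in toy_chunks]


# Step 2: retrieval. Embed the question the same way and keep the k best chunks.
def top_k(question: str, index: list, k: int) -> list[str]:
    question_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(question_vector, item[1]), reverse=True)
    return [chunk for chunk, _vector in ranked[:k]]


# Step 3: generation. Combine the retrieved context and the question into a prompt.
def build_prompt(question: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


toy_question = "Where are embeddings stored?"
retrieved = top_k(toy_question, toy_index, k=1)
toy_prompt = build_prompt(toy_question, retrieved)
print(toy_prompt)
```

In a real system the resulting prompt is then sent to the LLM, which produces the answer.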

**Step 1: Create an Index on available Knowledge Base**      
1. **Data Collection:** Carefully ingest the data from various sources. This data forms the basis of the Knowledge Base.
2. **Split and Parse:** Once the data is ingested, it needs to be broken down into manageable chunks. This is important because LLMs have a maximum context window that they can process in one go. During this step the data is not only split but also parsed to extract useful metadata, such as the document title and authors.
3. **Embedding Generation:** The next step is to convert the chunks into vector embeddings. This can be done with embedding models, such as BERT-based models or OpenAI's embedding models, that map text into a vector space, capturing semantic relationships and contextual meaning in a numerical representation.
4. **Vector Database:** The final step is to store the generated embeddings, along with the metadata, in a vector database such as ChromaDB or Pinecone. These databases are optimized for handling large volumes of data and allow efficient querying.
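
The split-and-parse step can be sketched as a sliding character window that records metadata for every chunk. This is a simplified illustration under stated assumptions: the `Chunk` class, the `split_with_overlap` helper, and the `source` and `start` metadata keys are hypothetical, and library splitters typically try to break at natural boundaries such as paragraphs rather than at fixed offsets.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict


def split_with_overlap(text: str, source: str, chunk_size: int = 20, chunk_overlap: int = 5) -> list[Chunk]:
    """Slide a fixed-size character window over the text.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        chunks.append(Chunk(piece, {"source": source, "start": start}))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz" * 2  # 52 characters of made-up text
pieces = split_with_overlap(sample, source="sample.txt")
for piece in pieces:
    print(piece.metadata["start"], repr(piece.text))
```

The overlap repeats the tail of one chunk at the head of the next, so text near a boundary keeps some of its surrounding context.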

## **Step 1: Create an Index on available Knowledge Base**

In [3]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import TextLoader

loader = DirectoryLoader('data/subtitles', glob="*.srt", show_progress=True, loader_cls=TextLoader)

docs = loader.load()

100%|█████████████████████████████████████████| 10/10 [00:00<00:00, 5207.73it/s]


In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

chunks = text_splitter.split_documents(docs)

In [8]:
print("Number of Documents:", len(docs))
print()
print("Number of Chunks:", len(chunks))

Number of Documents: 10

Number of Chunks: 514


In [9]:
from langchain_openai import OpenAIEmbeddings

# Read the API key; strip() removes a trailing newline and `with` closes the file
with open("keys/.openai_api_key.txt") as f:
    OPENAI_API_KEY = f.read().strip()

embeddings_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [16]:
# ! pip install langchain_chroma
from langchain_chroma import Chroma

CHROMA_PATH = "./chroma_db_"

db_chroma = Chroma.from_documents(chunks, embeddings_model, persist_directory=CHROMA_PATH)

In [18]:
# Reconnect to the database persisted on disk

db_chroma = Chroma(persist_directory=CHROMA_PATH, embedding_function=embeddings_model)

## **Step 2: Create a Retrieval**

In [21]:
query = 'Who is Rachem?'

docs_chroma = db_chroma.similarity_search_with_score(query, k=5)

context_text = "\n\n".join([doc.page_content for doc, _score in docs_chroma])

In [23]:
len(docs_chroma)

5

In [24]:
context_text

'242\n00:14:53,433 --> 00:14:55,264\n"Just a waitress"?\n\n243\n00:14:56,569 --> 00:14:58,298\nNow that was....\n\n244\n00:14:58,738 --> 00:15:00,865\nI mean, as opposed to....\n\n245\n00:15:02,375 --> 00:15:04,707\nOkay, is this over yet? Rach?\n\n246\n00:15:05,478 --> 00:15:08,879\nI do not have chubby ankles!\n\n247\n00:15:09,082 --> 00:15:10,242\nNo! I\n\n248\n00:15:10,483 --> 00:15:13,611\nOkay, look at the other side.\nLook at Julie\'s column.\n\n249\n00:15:14,487 --> 00:15:15,954\n"She\'s not Rachem"?\n\n242\n00:14:53,433 --> 00:14:55,264\n"Just a waitress"?\n\n243\n00:14:56,569 --> 00:14:58,298\nNow that was....\n\n244\n00:14:58,738 --> 00:15:00,865\nI mean, as opposed to....\n\n245\n00:15:02,375 --> 00:15:04,707\nOkay, is this over yet? Rach?\n\n246\n00:15:05,478 --> 00:15:08,879\nI do not have chubby ankles!\n\n247\n00:15:09,082 --> 00:15:10,242\nNo! I\n\n248\n00:15:10,483 --> 00:15:13,611\nOkay, look at the other side.\nLook at Julie\'s column.\n\n249\n00:15:14,487 --> 00:15
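
The context printed above contains the same subtitle passage twice. One way to avoid spending prompt space on repeats is to drop duplicate passages before joining them. The sketch below assumes the results have been reduced to `(text, score)` pairs; the helper name `dedupe_passages` is hypothetical, and the sample texts and scores are made up for illustration.

```python
def dedupe_passages(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Keep the first occurrence of each passage text, preserving order."""
    seen = set()
    unique = []
    for text, score in results:
        if text not in seen:
            seen.add(text)
            unique.append((text, score))
    return unique


# Made-up (text, score) pairs; the first passage appears twice.
sample_results = [
    ("Okay, is this over yet? Rach?", 0.41),
    ("Okay, is this over yet? Rach?", 0.41),
    ("Look at Julie's column.", 0.47),
]

unique_results = dedupe_passages(sample_results)
deduped_context = "\n\n".join(text for text, _score in unique_results)
print(deduped_context)
```

With the objects from this notebook, the input pairs could be built as `[(doc.page_content, score) for doc, score in docs_chroma]`.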

## **Step 3: Generation**

In [25]:
from langchain_core.prompts import ChatPromptTemplate

PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}
Answer the question based on the above context: {question}.
Provide a detailed answer.
Don’t justify your answers.
Don’t give information not mentioned in the CONTEXT INFORMATION.
Do not say "according to the context" or "mentioned in the context" or similar.
"""

prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)

prompt = prompt_template.format(context=context_text, question=query)
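
Because the LLM has a maximum context window, it can help to cap how much retrieved text goes into the prompt. The following is a rough sketch under stated assumptions: `fit_to_budget`, `TEMPLATE`, and the sample passages are hypothetical, the budget counts only passage characters (not the separators or the template), and characters are only a proxy for the tokens that models actually count.

```python
TEMPLATE = (
    "Answer the question based only on the following context:\n"
    "{context}\n\n"
    "Question: {question}"
)


def fit_to_budget(passages: list[str], max_chars: int) -> list[str]:
    """Keep passages in order until adding the next one would exceed max_chars."""
    kept = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_chars:
            break
        kept.append(passage)
        used += len(passage)
    return kept


# Made-up passages of 40 characters each.
passages = ["a" * 40, "b" * 40, "c" * 40]
kept = fit_to_budget(passages, max_chars=100)

# Plain string formatting fills the same kind of placeholders as the template above.
demo_prompt = TEMPLATE.format(context="\n\n".join(kept), question="Who is Rachel?")
print(demo_prompt)
```

Since retrieved passages arrive ranked by similarity, cutting from the end drops the least relevant ones first.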

In [31]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)

response_text = model.invoke(prompt)

print(response_text.content)

Rachem is a typo or mispronunciation of the name Rachel, as seen in the dialogue where the characters mention "She's not Rachem" and "It's always been you, Rach."
