# **Introduction to Retrieval Augmented Generation**

Large language models (LLMs) can generate text on their own, but they aren't perfect.

They sometimes get things wrong, especially when an answer has to be precise or depends on recent information. To address this, researchers at Meta AI introduced retrieval-augmented generation, or RAG for short, in 2020.

Think of it as giving the language model an assistant. This assistant searches a large, up-to-date collection of information and passes the most relevant and recent pieces to the LLM.

**Benefits:**  
1. **Enhanced factual accuracy and domain-specific expertise:** Imagine a customer service chatbot trained on general conversation data. It might struggle with technical, domain-specific questions. RAG lets you integrate domain-specific knowledge bases, enabling the chatbot to handle these inquiries with expertise.
2. **Reduced hallucination:** LLMs can generate false information, a phenomenon known as hallucination. Grounding the response in the provided knowledge base helps support the generative model's claims.

**Components of RAG:**  
1. **Retrieval:** When a user asks a question or provides a prompt, the retriever first fetches relevant passages from a large knowledge base. This knowledge base could be the company's internal documents or any other source of text data.
2. **Augmentation:** The retrieved passages are then used to "augment" the LLM's knowledge, typically by adding them to the prompt. This can involve techniques such as summarizing the passages or encoding their key information.
3. **Generation:** Finally, the LLM uses its understanding of language together with the augmented information to generate a response. This response can be an answer to a question, creative text based on a prompt, and so on.

**Applications:**  
1. Question Answering: A RAG-powered customer care chatbot can answer customer queries by retrieving product information, FAQs, and guides to provide a well-rounded response.
2. Document Summarization: A research paper summarization tool can use RAG to retrieve relevant sections and then generate a summary highlighting the main points.
3. Creative Text Generation: A story-writing assistant can use RAG to retrieve information about historical periods or fictional worlds, helping the LLM generate more engaging stories.
4. Code Generation: A code completion tool can use RAG to retrieve relevant code examples and API documentation, helping developers write code more efficiently.


## **What are Retrievers?**
Retrievers specialize in searching large amounts of data to find information that is relevant to a specific query or context.

Retrieval models focus on precisely matching the query against the data they have access to. They rely heavily on the quality and structure of that data: their performance depends on the relevance and accuracy of the information stored in the databases they query.

In simple terms, a retriever searches a large corpus and identifies the data relevant to a given query.
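
To make the idea concrete, here is a minimal sketch of a retriever that ranks passages by how many distinct words they share with the query. It is only an illustration: the `tokenize` and `retrieve` helpers and the three-passage corpus are made up for this example, and the retriever built later in this notebook compares embedding vectors rather than counting words.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, corpus, k=2):
    """Return up to k passages that share the most distinct words with the query."""
    query_terms = set(tokenize(query))
    scored = []
    for passage in corpus:
        overlap = len(query_terms & set(tokenize(passage)))
        scored.append((overlap, passage))
    # Highest overlap first; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop passages that share no words with the query at all.
    return [passage for overlap, passage in scored[:k] if overlap > 0]

# Hypothetical knowledge base for illustration only.
corpus = [
    "Refunds are processed within five business days.",
    "The warranty covers manufacturing defects for two years.",
    "Shipping is free for orders above fifty dollars.",
]
top_passages = retrieve("How long does the warranty last?", corpus, k=1)
print(top_passages)
```

Word overlap is easy to reason about, but it misses synonyms and paraphrases, which is one reason embedding-based retrieval is used in the steps that follow.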

## **Building a RAG System**
Step 1: Create an Index on the available Knowledge Base  
- Data from formats such as PDF and HTML is cleaned and converted into plain text. This text is then divided into smaller parts (i.e., chunks), and each chunk is passed through an embedding model to produce a vector representation that makes it easy to find later.

Step 2: Create a Retriever  
- When someone asks a question, the RAG system turns that question into a vector embedding using the same embedding model used for indexing. Then it compares this vector to the vectors of the indexed chunks to find the `k` most similar chunks. These `k` chunks are used as context in the next step.

Step 3: Generation  
- The system combines the retrieved chunks (i.e., the context) with the original question to create a prompt. The LLM uses this prompt to answer the question.
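
The three steps above can be sketched end to end in plain Python. This is a toy under stated assumptions: `embed` is a stand-in that counts words instead of calling a real embedding model, the three chunks are invented, and the final LLM call is left out so the example stops at the finished prompt.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: index the chunks once (hypothetical example chunks).
chunks = [
    "Paris is the capital of France.",
    "The Nile flows through Egypt.",
    "Mount Everest is the highest mountain on Earth.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=1):
    """Step 2: embed the question the same way and return the k closest chunks."""
    q_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, k=1):
    """Step 3: combine the retrieved context with the question."""
    context = "\n\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("What is the capital of France?")
print(prompt)
```

The important detail is that the question and the chunks go through the same `embed` function; comparing vectors produced by different models would not give meaningful similarities.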

**Step 1: Create an Index on the available Knowledge Base**  
1. **Data Collection:** Carefully ingest the data from the various sources. This data forms the basis of the knowledge base.
2. **Split and Parse:** Once the data is ingested, it needs to be broken down into manageable chunks. This is important because LLMs have a maximum context window that they can process in one go. During this step the data is not only split but also parsed to extract useful metadata, such as the document title or authors.
3. **Embedding Generation:** The next step is to convert the chunks into vector embeddings. This can be done with embedding models, such as BERT-based encoders or OpenAI's embedding models, that map text into a vector space, capturing semantic relationships and contextual meaning in a numerical representation.
4. **Vector Database:** The final step is to store the generated embeddings, along with the metadata, in a vector database such as ChromaDB or Pinecone. These databases are optimized for handling large volumes of data and allow efficient querying.
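
As a rough illustration of the "Split and Parse" step, the sketch below cuts a text into overlapping character windows and attaches metadata to each chunk. The function name `split_with_metadata` and the metadata fields are assumptions made for this example. It is also a simplification of what a library splitter does: `RecursiveCharacterTextSplitter`, used later in this notebook, prefers to break on separators such as blank lines and spaces rather than at fixed offsets.

```python
def split_with_metadata(text, title, chunk_size=40, chunk_overlap=10):
    """Split text into overlapping character windows and attach metadata."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    records = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        records.append({
            "text": piece,
            "metadata": {"title": title, "start": start, "chunk_id": len(records)},
        })
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return records

# A 100-character sample made of repeating digits, so overlaps are easy to see.
sample = "".join(str(i % 10) for i in range(100))
records = split_with_metadata(sample, title="demo.txt")
print(len(records), [r["metadata"]["start"] for r in records])
```

The overlap means the end of one chunk is repeated at the start of the next, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.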



# Step 1: Create an Index on the Available Knowledge Base

In [1]:
!pip install langchain_community

In [4]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import TextLoader

loader = DirectoryLoader(
    "/content/sample_data",  # directory that holds the subtitle files
    glob="*.srt",            # match every file with the .srt extension
    show_progress=True,      # display a progress bar while loading
    loader_cls=TextLoader,   # read each matched file as plain text
)
docs = loader.load()

100%|██████████| 10/10 [00:00<00:00, 6978.88it/s]


In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

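# Split each document into chunks of up to about 500 characters,
# with 50 characters of overlap between neighbouring chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)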

In [6]:
print("Number of Documents:", len(docs))
print()
print("Number of Chunks:", len(chunks))

Number of Documents: 10

Number of Chunks: 514


In [7]:
!pip install langchain-openai


In [8]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from google.colab import userdata

# The OpenAI API key is stored as a Colab secret named 'OpenAIAPIKey'
API_Key = userdata.get('OpenAIAPIKey')

embedding_model = OpenAIEmbeddings(openai_api_key=API_Key)

In [9]:
!pip install langchain_chroma

In [10]:
from langchain_chroma import Chroma

# Initialize the database connection.
# If a database already exists in persist_directory, this connects to the
# collection named collection_name; otherwise a new collection is created.
db = Chroma(
    collection_name="vector_database",
    embedding_function=embedding_model,
    persist_directory="./chroma_db_",
)


In [11]:
db.add_documents(chunks)

['0cf29eec-48d1-4a78-9469-9f4b01d87bac',
 'ff2f4f8d-393b-4c3d-b8f3-3f19d5927612',
 '0841ea41-0b5c-4aff-b0b7-5b4b04a2b022',
 '28386e53-5776-498e-8c58-50c52aec00ee',
 'cbe38680-ca71-413f-99aa-eaffa800dec5',
 '5d12d8a1-5e37-4689-8a2b-0977f4dc12a7',
 '844769eb-a713-442f-8239-996c5c668344',
 'd33020cd-360f-4198-8e92-2a916ec7251a',
 '1b0e8c84-642e-4e6c-b0db-6351ad387c08',
 'a5acf133-2369-4cf9-930b-83af4b35c8e9',
 '39bbebc3-bd62-4f9c-9df3-9eea06e2b241',
 'a54368e4-2a20-4872-a521-a1fd9dfd57ab',
 'ed8f54ea-2a68-4e73-84b9-31976e629f5e',
 '81aca97a-5562-4674-8887-e1c3707cce47',
 'afec6a06-422d-414f-8076-4ee0d4f6ef81',
 '1560854b-70b0-4dde-a48d-07ee77011896',
 '16a2789e-1231-40e4-a94b-e194384dc0f0',
 '82a68a57-6d9b-4465-a113-366389c04588',
 '3939b5d2-7c5d-4f75-bff0-e06b629d5f81',
 '7f86a08a-b3a7-417c-bc5d-a213dc813cd4',
 'db344da4-b72b-4245-81bf-92edc8438c0f',
 'c0e39974-a43f-4fed-b406-1c1738d7ff64',
 '28f98e9c-02c1-4469-be62-3b6e2216bb52',
 '91fa1b8c-b7aa-4d95-bdd7-4609d6b0c6ec',
 'f67ba547-b256-

In [12]:
# Check how many entries the collection currently holds
print(len(db.get()['ids']))

514
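
One thing to watch for: because `persist_directory` keeps the collection on disk, running the `add_documents` cell a second time would most likely store the same chunks again under fresh random IDs, so the count would grow past 514. A common way around this is to derive each chunk's ID from its content, so that a re-run overwrites entries instead of duplicating them. The sketch below shows the idea with a plain dictionary; `stable_id` and `TinyStore` are hypothetical names for illustration and are not part of LangChain or Chroma.

```python
import hashlib

def stable_id(text, source):
    """Derive a repeatable ID from a chunk's source file and content."""
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]

class TinyStore:
    """Minimal dict-backed stand-in for a vector store keyed by ID."""

    def __init__(self):
        self.items = {}

    def add(self, texts, source):
        """Insert texts under content-derived IDs and return the store size."""
        for text in texts:
            # The same (source, text) pair always maps to the same key,
            # so adding it again overwrites rather than duplicates.
            self.items[stable_id(text, source)] = text
        return len(self.items)

store = TinyStore()
first = store.add(["chunk a", "chunk b"], source="ep1.srt")
second = store.add(["chunk a", "chunk b"], source="ep1.srt")  # simulated re-run
print(first, second)
```

LangChain's `add_documents` accepts an `ids` argument, so the same hashing idea should carry over to the real collection, though it is worth confirming the overwrite behaviour against the installed version.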


# Step 2: Create a Retriever

In [13]:
query = "Who is Rachem?"

docs_chroma = db.similarity_search_with_score(query, k=3)  # fetch the 3 most relevant chunks

print(len(docs_chroma))

3
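
Each item in `docs_chroma` is a `(document, score)` pair. As far as the LangChain Chroma integration documents it, the score is a distance, so a lower value means a closer match. The sketch below shows how such pairs could be sorted and filtered; the chunk labels, the distance values, and the `0.4` threshold are all hypothetical and chosen only to make the behaviour visible.

```python
# Hypothetical (text, distance) pairs shaped like the results above.
results = [
    ("chunk B", 0.38),
    ("chunk C", 0.52),
    ("chunk A", 0.31),
]

def filter_by_distance(pairs, max_distance):
    """Keep only results whose distance is at or below the threshold."""
    return [(text, dist) for text, dist in pairs if dist <= max_distance]

# Ascending distance puts the closest match first.
closest_first = sorted(results, key=lambda pair: pair[1])
kept = filter_by_distance(closest_first, max_distance=0.4)
print([text for text, _ in kept])
```

A cut-off like this can keep weakly related chunks out of the prompt, but a sensible threshold depends on the embedding model and distance metric, so it would need to be tuned on real queries.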


In [14]:
context_text = "\n\n".join([doc.page_content for doc, _score in docs_chroma])

print(context_text)

242
00:14:53,433 --> 00:14:55,264
"Just a waitress"?

243
00:14:56,569 --> 00:14:58,298
Now that was....

244
00:14:58,738 --> 00:15:00,865
I mean, as opposed to....

245
00:15:02,375 --> 00:15:04,707
Okay, is this over yet? Rach?

246
00:15:05,478 --> 00:15:08,879
I do not have chubby ankles!

247
00:15:09,082 --> 00:15:10,242
No! I

248
00:15:10,483 --> 00:15:13,611
Okay, look at the other side.
Look at Julie's column.

249
00:15:14,487 --> 00:15:15,954
"She's not Rachem"?

46
00:03:06,910 --> 00:03:11,010
I'm the same way.Show me a bottle of wine.
and I'm like, Wow, who am I?

47
00:03:13,254 --> 00:03:16,553
Wait. We're talking about Rachel.
You and Rachel.

48
00:03:16,758 --> 00:03:19,727
I've been dreaming about
me and Rachel for 10 years.

49
00:03:19,927 --> 00:03:21,656
But now I'm with Julie.

50
00:03:21,896 --> 00:03:24,763
So it's like, me and Julie,
me and Rachel.

51
00:03:24,999 --> 00:03:27,092
Me and Julie, me and Rach

158
00:10:58,068 --> 00:11:01,305
And then one 

# Step 3: Generation

In [None]:
from langchain_core.prompts import ChatPromptTemplate

PROMPT_TEMPLATE = """
Answer the question based only on the following context:
{context}

Answer the question based on the above context: {question}
Provide a detailed answer.
Don't justify your answers.
Don't give information not mentioned in the context.
Do not say "according to the context" or "mentioned in the context" or similar.
"""

# Build a reusable template, then fill in the retrieved context and the query
prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
prompt = prompt_template.format(context=context_text, question=query)


In [20]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(openai_api_key=API_Key)

# invoke() returns a message object; .content holds the generated text
response_text = model.invoke(prompt)

print(response_text.content)

Rachem is a misspelling of the name Rachel.
