# **Building a RAG Application: A Comprehensive Demo**

### **Objective:**
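In this notebook, you will explore **Retrieval-Augmented Generation (RAG)** and its application to **Large Language Models (LLMs)**. You will:

- **Set up the necessary libraries** and tools for implementing RAG with LLMs.
- **Clean, chunk, and embed** a text dataset, then index it with FAISS.
- **Run semantic search** over the index and answer questions with an OpenAI model, finishing with a Gradio chatbot.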

By the end of this demo, you will understand the RAG framework and its practical applications in enhancing LLMs.

\

---

\

### **What is RAG?**

**Retrieval-Augmented Generation (RAG)** is a technique for augmenting **LLMs** with external knowledge. Instead of relying solely on what the model learned during pre-training, RAG lets the LLM retrieve relevant information from external sources (such as vector stores) at inference time, which makes it suitable for dynamic and domain-specific applications.

RAG addresses many challenges faced by LLMs:
- **Domain Knowledge Gaps**: LLMs can be limited in specific knowledge areas, especially in evolving fields.
- **Factuality Issues & Hallucinations**: RAG helps reduce incorrect or fabricated answers by retrieving context from trusted data sources.
- **Real-Time Updates**: RAG enables the integration of continuously updated external knowledge without retraining the model.
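
The retrieve-then-generate flow described above can be sketched without any libraries. The snippet below is a toy illustration, not the pipeline built later in this notebook: the word-overlap scoring, the `tokenize`, `retrieve`, and `build_prompt` helpers, and the sample passages are all assumptions made for the example. A real system would score passages with vector embeddings and send the final prompt to an LLM.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, passages, k=1):
    """Return the k passages sharing the most words with the query (toy scoring)."""
    query_words = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_passages):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(context_passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical mini knowledge base
passages = [
    "A PAN card is issued by the Income Tax Department of India.",
    "FAISS is a library for similarity search over vectors.",
]
question = "Who issues the PAN card?"
prompt = build_prompt(question, retrieve(question, passages))
print(prompt)
```

The model never sees the whole knowledge base, only the passages that the retrieval step selected for this particular question.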

\

---

\


### **Why Use RAG?**

RAG enhances the capabilities of LLMs by:
- **Improving Response Quality**: Provides context from up-to-date knowledge, making the response more accurate.
- **Handling Knowledge Gaps**: Allows LLMs to access external databases, which are often required for specialized tasks.
- **Reducing Hallucinations**: RAG reduces the risk of LLMs making up answers or providing misleading responses.
- **Faster Deployment**: RAG avoids the need for retraining the model for every new dataset or domain.

> For more detailed information, you can refer to this [RAG Paper](https://arxiv.org/abs/2312.10997) or the [Retrieval-Augmented Generation Blog](https://www.promptingguide.ai/research/rag.en#introduction-to-rag).



### **Step 1. Setup and Install Dependencies**

> Run the following cells to install the required dependencies:

In [1]:
# Import the warnings module and suppress any warnings that might appear during execution
import warnings
warnings.filterwarnings("ignore")  # Ignore warnings to keep the output clean and focused

- The `warnings` module is used to manage and control the warning messages in Python.
- `filterwarnings("ignore")` tells Python to ignore all warning messages that may be raised during execution. This is often done to keep the notebook or script output cleaner, especially when warnings are known and not critical to the code's functionality.
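
Ignoring every warning can also hide messages worth reading, such as deprecation notices after a library upgrade. As a hedged alternative, the sketch below silences a single warning category inside a limited scope. The `noisy` function is a hypothetical stand-in for a library call.

```python
import warnings

def noisy():
    """Hypothetical function that emits a deprecation warning and a user warning."""
    warnings.warn("old API", DeprecationWarning)
    warnings.warn("check your input", UserWarning)
    return "done"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start by recording everything
    warnings.filterwarnings("ignore", category=DeprecationWarning)  # then hide one category
    result = noisy()

# Only the UserWarning was recorded; the DeprecationWarning was filtered out
print([str(w.message) for w in caught])
```

Filters set inside `catch_warnings` are discarded when the block exits, so the rest of the notebook is unaffected.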

In [2]:
# Install the packages required to run the notebook
import sys
import subprocess

# Alternative: install everything from a requirements file
# subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt', '--quiet'])

# Note: the pinned versions below target an older stack; the unpinned installs
# further down may upgrade some of them.
!pip install torch==2.1.1
!pip install gradio==3.38.0
!pip install openai==0.27.8
!pip install fasttext==0.9.2
!pip install faiss-cpu==1.7.4
!pip install torchaudio==2.1.1
!pip install langchain==0.0.243
!pip install llama-index==0.6.8
!pip install torchvision==0.16.1
!pip install transformers==4.31.0
!pip install gradio_client==0.2.10
!pip install sentence-transformers
!pip install langchain llama-index
!pip install langchain-community langchain-core langchain-openai
!pip install langchain_huggingface


### **Step 2. Imports and Configuration**

In [3]:
!pip install fastText


In [4]:
import os
import getpass

# Recent LangChain releases ship these classes in separate packages
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

### **Step 3. Add OpenAI API Key**

[Models available](https://platform.openai.com/docs/models)

#### **📌 Best Practices for API Key Safety**


1. Use a unique API key for each team member on your account
2. Never deploy your key in client-side environments such as browsers or mobile apps
3. Never commit your key to your repository
4. Read the key from an environment variable instead of hard-coding it
5. Use a key management service
6. Monitor your account usage and rotate your keys when needed



In [4]:
os.environ['OPENAI_API_KEY'] = getpass.getpass()

··········


#### **🛡️ Further Details**

1. Use .env Files for Local Development
  - Store the API key in a `.env` file and load it using `python-dotenv`.

    ```
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads key=value pairs from .env into the environment
    API_KEY = os.getenv("OPENAI_API_KEY")
    ```
2. Use Secret Managers for Production
  - AWS: AWS Secrets Manager
  - Azure: Azure Key Vault
  - GCP: Google Secret Manager

3. Use Token-Based Authentication if Available
  - Some cloud services provide temporary tokens instead of static API keys.
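
As a small illustration of the environment-variable approach, the helper below reads the key at runtime and fails loudly when it is missing instead of falling back to a hard-coded value. The function name `get_api_key` and the `DEMO_API_KEY` variable are assumptions for this sketch, and the placeholder value exists only to make the example runnable.

```python
import os

def get_api_key(var_name="OPENAI_API_KEY"):
    """Return the key stored in an environment variable, or raise if it is unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or add it to a .env file")
    return key

# Demo with a throwaway variable and a placeholder value (never hard-code a real key)
os.environ["DEMO_API_KEY"] = "sk-placeholder"
print(get_api_key("DEMO_API_KEY")[:3] + "...")  # print only a prefix, not the whole key
```

Raising early keeps a missing key from surfacing later as a less obvious authentication error.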

### **Step 4. Language Detection Model from NLLB**

> A language-identification model makes the pipeline more robust to multilingual input: the detected language of each query is later passed to the LLM so that it can answer in that language.



In [9]:
import fasttext
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id="facebook/fasttext-language-identification", filename="model.bin")
model = fasttext.load_model(model_path)

model.bin:   0%|          | 0.00/1.18G [00:00<?, ?B/s]
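
The model returns labels such as `__label__eng_Latn`, which combine a language code and a script code. The helper below is a sketch of how such a label can be unpacked. `parse_label` is a hypothetical name, and the sample labels are assumed to follow that format rather than being taken from an actual prediction.

```python
def parse_label(label):
    """Split a fastText label like '__label__eng_Latn' into (language, script)."""
    code = label.split("__")[-1]               # e.g. 'eng_Latn'
    language, _, script = code.partition("_")  # split at the first underscore
    return language, script

for sample in ["__label__eng_Latn", "__label__hin_Deva"]:
    print(sample, "->", parse_label(sample))
```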

### **Step 5. Text Cleaning and Preprocessing**

> We now clean and preprocess the data. The text needs to be formatted before it can be processed into vector embeddings.

#### **1. Upload the Dataset**
> (dataset.txt)

In [10]:
from google.colab import files

# Upload the dataset file (dataset.txt)
uploaded = files.upload()

# Verify files uploaded successfully
print(uploaded)


Saving dataset.txt to dataset (1).txt
{'dataset (1).txt': b"# About Pan Card\n\n### What is Pan card?\n\nThe PAN card is a unique ten-digit alphanumeric identification number that is issued by the Income Tax Department of India to track the tax-related transactions of individuals and entities. The PAN card is mandatory for any financial transaction in India, including opening a bank account, buying or selling property, and filing income tax returns.\n\n### Who needs a Pan card?\n\nAll individuals/non-individuals (including foreign citizens/entities) earning taxable income in India\xc2\xa0must have a PAN card.\n\n### Types of PAN cards\n\nIn India, two types of PAN cards are available: e-PAN card and physical PAN card.\n\n1. e-PAN card: An e-PAN card is a digitally-signed PAN card issued in electronic format. It contains the same PAN details as a physical PAN card but is available in a digital format. It can be downloaded online and used as a valid identification document for various pu

In [11]:
# Read and clean the dataset
with open('dataset.txt', 'r', encoding='utf-8') as f:
    data = f.read()

data = data.replace('\n\n', '\n')  # Collapse blank lines
data = data.split('---')  # Split the data into sections at the '---' delimiter

In [12]:
# Preprocess the text (adjust formatting for clean text)
for i in range(len(data)):
    if i == 4:
        # In the section at index 4, turn line-leading bold markers into '###'
        # headings so the question/answer split below picks them up
        data[i] = data[i].replace('\n**', '\n###')
    # Remove the remaining bold markers from every section
    data[i] = data[i].replace('**', '')

In [13]:
# Organize the data into question-answer pairs
ques_ans = dict()
for i in range(0, len(data)):
    topics = data[i].split('\n###')
    for topic in topics[1:]:
      question_answer_pair = topic.split('\n')
      ques_ans[question_answer_pair[0]] = " ".join(question_answer_pair[1:])
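
To make the parsing step concrete, the sketch below applies the same `'\n###'` split to a tiny made-up document. The sample text is an assumption, and the sketch adds one thing the cell above does not do: `.strip()` removes the leading space left behind by the heading marker.

```python
sample = "# About\n### What is X?\nX is a thing.\nIt has two lines.\n### Who needs X?\nEveryone."

pairs = {}
for topic in sample.split("\n###")[1:]:      # skip the text before the first heading
    lines = topic.split("\n")
    question = lines[0].strip()              # first line is the heading text
    answer = " ".join(lines[1:]).strip()     # remaining lines form the answer
    pairs[question] = answer

print(pairs)
```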

In [14]:
# Join each question with its answer, one pair per line
all_content = ""
for key, value in ques_ans.items():
    all_content += key + " " + value + "\n"

# Print cleaned content for verification
print(all_content)

 What is Pan card? The PAN card is a unique ten-digit alphanumeric identification number that is issued by the Income Tax Department of India to track the tax-related transactions of individuals and entities. The PAN card is mandatory for any financial transaction in India, including opening a bank account, buying or selling property, and filing income tax returns.
 Who needs a Pan card? All individuals/non-individuals (including foreign citizens/entities) earning taxable income in India must have a PAN card.
 Types of PAN cards In India, two types of PAN cards are available: e-PAN card and physical PAN card. 1. e-PAN card: An e-PAN card is a digitally-signed PAN card issued in electronic format. It contains the same PAN details as a physical PAN card but is available in a digital format. It can be downloaded online and used as a valid identification document for various purposes. The e-PAN card is usually issued in a PDF format. 2. Physical PAN card: A physical PAN card is a laminated

#### **2. Text Chunking**

> To prepare the text for semantic search, we break the content into smaller chunks. This will make it easier to process and index efficiently.


In [15]:
# Split the large content into smaller chunks for indexing
text_splitter = CharacterTextSplitter(separator='\n', chunk_size=300, chunk_overlap=128, length_function=len)

chunks = text_splitter.split_text(all_content)

# Display the first chunk for verification
print(chunks[0])



What is Pan card? The PAN card is a unique ten-digit alphanumeric identification number that is issued by the Income Tax Department of India to track the tax-related transactions of individuals and entities. The PAN card is mandatory for any financial transaction in India, including opening a bank account, buying or selling property, and filing income tax returns.
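
The first chunk printed above is longer than `chunk_size=300`. That is consistent with a splitter that cuts only at the separator and then merges neighbouring pieces: a single line longer than the limit cannot be shortened. The sketch below is a simplified re-implementation of that idea, not LangChain's actual code. The function `merge_lines` and the sample text are assumptions.

```python
def merge_lines(text, chunk_size=300, chunk_overlap=128, separator="\n"):
    """Split on the separator, then greedily merge pieces into overlapping chunks."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        if current and len(separator.join(current + [piece])) > chunk_size:
            chunks.append(separator.join(current))
            # Drop leading pieces until the carry-over fits the overlap budget
            # and leaves room for the next piece
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks

text = "\n".join(["a" * 100, "b" * 100, "c" * 100, "d" * 400])
for chunk in merge_lines(text, chunk_size=250, chunk_overlap=120):
    print(len(chunk), chunk[:1])
```

In this example the second chunk starts with the `b` line that also ends the first chunk, which is the overlap at work, and the 400-character line is emitted whole even though it exceeds the limit.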


#### **3. Vector Embeddings**

> Next, we use Sentence Transformers to convert text into vector embeddings, the representation that makes semantic search possible. The cell below loads an encoder and embeds a sample string; the FAISS index in the next step is built with OpenAI embeddings instead.

In [16]:
from sentence_transformers import SentenceTransformer
text = "Sample text!"

# Initialize the model for text embeddings
encoder = SentenceTransformer("paraphrase-mpnet-base-v2")

# Encode the sample text into a vector embedding
vectors = encoder.encode(text)

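
An embedding is a list of numbers, and texts with similar meaning should map to vectors that point in similar directions. Cosine similarity is a common way to measure that. The sketch below uses made-up three-dimensional vectors in place of real encoder output, so the numbers are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for three short texts
pan_fee = [0.9, 0.1, 0.0]
pan_cost = [0.8, 0.2, 0.1]
weather = [0.0, 0.1, 0.9]

print(round(cosine_similarity(pan_fee, pan_cost), 3))  # related texts: close to 1
print(round(cosine_similarity(pan_fee, weather), 3))   # unrelated texts: close to 0
```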

#### **4. Store Embeddings in FAISS**

> We use FAISS to store and index the vectors. FAISS is optimized for similarity search, allowing us to retrieve the most relevant vectors during the search phase.

In [17]:
embeddings = OpenAIEmbeddings()

vectorStore = FAISS.from_texts(chunks, embeddings)
vectorStore.save_local("faiss_doc_idx")



#### **5. Perform Semantic Search**

> Now that we have indexed the embeddings, we can perform a semantic search. This search will find the most relevant documents based on a query, considering the meaning rather than exact matches.

In [18]:
docs = vectorStore.similarity_search("How long does it usually take to receive the PAN card after applying?")
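
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors lie closest to it. The brute-force sketch below shows that idea with squared Euclidean distance on toy two-dimensional vectors. The `top_k` helper, the vectors, and the chunk texts are assumptions, and FAISS itself relies on optimized implementations rather than this loop.

```python
def top_k(query_vec, index, k=2):
    """Return the k (distance, text) pairs closest to the query vector."""
    scored = []
    for text, vec in index:
        dist = sum((q - v) ** 2 for q, v in zip(query_vec, vec))  # squared L2 distance
        scored.append((dist, text))
    return sorted(scored)[:k]

# Hypothetical index of (chunk text, embedding) pairs
index = [
    ("PAN card processing time", [0.9, 0.1]),
    ("PAN card fees", [0.6, 0.4]),
    ("Types of PAN cards", [0.1, 0.9]),
]

for dist, text in top_k([1.0, 0.0], index):
    print(f"{dist:.2f}  {text}")
```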

#### **6. Set Up the OpenAI Model**


In [19]:
from langchain_community.llms import HuggingFaceHub

In [20]:
import os

In [21]:
from langchain_openai import OpenAI
from langchain_community.callbacks import get_openai_callback
from langchain.chains.question_answering import load_qa_chain

llm = OpenAI(temperature=0)

chain = load_qa_chain(llm, chain_type="refine")

# Ask the same question that was used for the semantic search
question = "How long does it usually take to receive the PAN card after applying?"

# The callback tracks token usage for the OpenAI calls made inside the block
with get_openai_callback() as cb:
    response = chain.run(input_documents=[docs[0]], question=question)



In [22]:
response

'\nThe time required to issue a PAN card depends on whether or not you have an Aadhaar card. If you do have an Aadhaar card, you can get a PAN card instantly (in under 10 minutes) by applying through ABC. However, if you do not have an Aadhaar card, the process will take longer. Once you make the payment to ABC, they will contact you and initiate the process. In this case, it will take approximately 3 weeks for your PAN card to be issued. If you need to update or correct any information on your PAN card, you can do so by providing additional context and refining your existing answer. The original question is: "How long does it take to issue a PAN card?" We have provided an existing answer: "If you have an Aadhaar card, you can get a PAN card instantly (in under 10 minutes) by applying through ABC. If you do not have an Aadhaar card, it will take approximately 3 weeks for your PAN card to be issued." Given the new context, refine the original answer to better answer the question. If the

### **Step 6. Gradio Chatbot**

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA

In [None]:
from langchain_core.messages import AIMessage, HumanMessage
import gradio as gr

def predict(message, history):
    # Convert Gradio's (user, bot) pairs into LangChain messages.
    # Note: the history is collected here but not passed to the QA chain.
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))

    # Detect the input language; fastText expects a single line, so drop newlines
    language = model.predict(message.replace("\n", " "))[0][0].split('__')[-1]
    template = """I want you to act as a question answering bot which uses the context mentioned and answer in a concise manner and doesn't make stuff up.
            You will answer question based on the context - {context}.
            You will create content in """ + str(language) + """ language.
            Question: {question}
            Answer:
            """
    QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
    qa_chain = RetrievalQA.from_chain_type(llm, retriever=vectorStore.as_retriever(), chain_type_kwargs={"prompt": QA_CHAIN_PROMPT})

    result = qa_chain.invoke({"query": message})

    return result['result']

gr.ChatInterface(predict,
    chatbot=gr.Chatbot(height=300),
    textbox=gr.Textbox(placeholder="Ask me a question related to PAN Services", container=False, scale=7),
    title="DocumentQABot",
    theme="soft",
    examples=["What is the cost/fees of a PAN card?", "How long does it usually take to receive the PAN card after applying?"],
    retry_btn=None,
    undo_btn="Delete Previous",
    clear_btn="Clear",).launch(share=True)
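
Because the prompt is assembled by string concatenation, the spaces around the language name matter: without them the model would see something like `ineng_Latnlanguage`. The sketch below isolates that step. `build_template` is a hypothetical helper with simplified wording, and it leaves `{context}` and `{question}` as literal placeholders for the prompt template to fill later.

```python
def build_template(language):
    """Return a prompt template with the target language filled in."""
    return (
        "Answer the question concisely using only the context below.\n"
        "Context: {context}\n"
        "Write the answer in " + language + " language.\n"
        "Question: {question}\n"
        "Answer:"
    )

template = build_template("eng_Latn")
print(template)

# The placeholders are still present and can be filled later
print(template.format(context="<retrieved text>", question="<user question>"))
```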

---
#  -----------------------------------------------------  **THANK YOU** ------------------------------------------------------------


---