# Building a RAG System on Ramayana PDF

### Goals of this Notebook

We will first load an LLM and check the responses it generates without help from RAG. Then we will:

1. Load a Ramayana PDF document  
2. Convert it into searchable text  
3. Create vector embeddings  
4. Build a Retrieval Augmented Generation (RAG) pipeline  
5. Ask questions to a model **without RAG**  
6. Ask the same questions **with RAG**  
7. Evaluate the quality improvement  



# Ramayana RAG System – Using Mistral (Local LLM)

This notebook builds a full Retrieval Augmented Generation system using:

- Mistral 7B (local via llama-cpp)
- Chroma Vector Database
- Sentence Transformer embeddings
- PyMuPDF PDF loader

In [2]:
# Install llama-cpp-python with GPU (cuBLAS) support
# Run this cell only when a GPU runtime is being used
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.85 --force-reinstall --no-cache-dir -q


In [2]:
# For installing the libraries & downloading models from HF Hub
!pip install --upgrade pip -q

!pip install \
huggingface_hub \
pandas \
tiktoken \
pymupdf \
langchain \
langchain-community \
chromadb \
sentence-transformers \
llama-cpp-python -q


In [3]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

#Libraries for downloading and loading the llm
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

## Question Answering using LLM

#### Downloading and Loading the model

In [4]:
#Using Mistral model
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.2-GGUF"
model_basename = "mistral-7b-instruct-v0.2.Q6_K.gguf"  # the model is in gguf format

In [5]:
# Using hf_hub_download to download a model from the Hugging Face model hub
# The repo_id parameter specifies the model name or path in the Hugging Face repository
# The filename parameter specifies the name of the file to download
model_path = hf_hub_download(
    repo_id=model_name_or_path, # Hugging Face repository that hosts the model
    filename=model_basename # GGUF file to download from that repository
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [6]:
#Load model with GPU support
llm = Llama(
    model_path=model_path,
    n_ctx=2300, # Context window
    n_gpu_layers=38,
    n_batch=512
)

AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 


#### Response

In [7]:
#Reusable function to generate responses from the LLM
#- Handles all inference parameters in one place
#- Returns just the text response
def response(query,max_tokens=128,temperature=0,top_p=0.95,top_k=50):
    model_output = llm(
      prompt=query,
      max_tokens=max_tokens, #Maximum number of tokens to generate
      temperature=temperature, #Controls randomness
      top_p=top_p, #picks from top tokens that make up top_p of total probability
      top_k=top_k #considers only the top_k most likely tokens
    )

    return model_output['choices'][0]['text']

#### Sample Questions


    1. Who is Hanuman and what role does he play in the Ramayana?
    2. What happened in the battle between Rama and Ravana?
    3. Tell me about Sita's character in the Ramayana.
    4. Who is Queen Tara and what happens to her?

In [8]:
#Storing sample questions in variables for ease of use
qn1 = "Who is Hanuman and what role does he play in the Ramayana?"
qn2 = "What happened in the battle between Rama and Ravana?"
qn3 = "Tell me about Sita's character in the Ramayana."
qn4 = "Who is Queen Tara and what happens to her?"

In [9]:
#Question1
response(qn1)

'\n\nHanuman is a central character in the Indian epic Ramayana, which narrates the adventures of Prince Rama, an incarnation of the Hindu god Vishnu. Hanuman is known as the monkey god or the vanara god and is revered for his devotion to Lord Rama, his strength, and his intelligence.\n\nHanuman was born to Anjani, a vanara (monkey) princess, and the wind god, Vayu. He grew up in the forest with other monkeys and apes. Hanuman is best known for his unwavering'

In [10]:
#Question2
response(qn2)

Llama.generate: prefix-match hit


"\n\nRavana, the king of Lanka, was a powerful demon king who had abducted Sita, wife of Rama. Rama, along with his brother Lakshmana, set out on a journey to rescue Sita. They reached the forest of Dandaka where they met Sage Valmiki and received blessings from him.\n\nRavana, who was aware of Rama's mission, sent his brother Vibhishana to invite Rama for peace talks. Rama agreed but on the condition that Sita should be present during the talks. Ravana agreed"

In [11]:
#Question3
response(qn3)

Llama.generate: prefix-match hit


'\n\nSita, also known as Janaki or Seetha, is one of the most revered and beloved characters in Hindu mythology. She is the wife of Lord Rama, an avatar of Vishnu, and is considered an ideal woman and a paragon of virtue and devotion.\n\nAccording to the epic Ramayana, Sita was born in the kingdom of Mithila to King Janaka and Queen Sunanda. She was discovered by sage Valmiki while she was playing in the forest as a child, and he predicted that she would one day become the wife of an avatar'

In [12]:
#Question4
response(qn4)

Llama.generate: prefix-match hit


'\n\nQueen Tara, also known as Taranis, is a character in the 1982 animated film The Dark Crystal. She is the queen of the Gelfling tribe of Em-Kai and the wife of King Kermit. When the Skeksis gain control of the Crystal of Truth, they use it to corrupt the Saplings, which are the source of life for the Gelflings. Queen Tara becomes ill as a result of this corruption and eventually dies.\n\nThe main character of the story, Kermit the Gelfling, sets out on a'

**Observations on the Model**

- This establishes the baseline performance of the **Mistral language model** without any prompt engineering or retrieval augmentation. The intent is to see how well the base model alone answers Ramayana questions from its training data.
- The model generated relevant, coherent responses for the first three queries. For the last query it answered out of context, describing Queen Tara as a character from the film The Dark Crystal instead of the Ramayana.
- While the content was somewhat accurate apart from that one answer, the responses were incomplete and generic. They stop mid-sentence, most likely because `max_tokens` defaults to 128 in the `response` function.
- Overall, the model delivers useful information, but its answers are high-level and read like general knowledge rather than responses grounded in a specific text.

## Question Answering using LLM with Prompt Engineering

#### Defining system prompt ####

- Adds structure and guidance to LLM responses
- Sets expectations for accuracy about the Ramayana
- Improves response format and quality
- Still no external context, just better instructions

#### Creating response function for 5 different combinations ####

Taking **response1** as the baseline, we create 4 other variants, each changing a single parameter, and compare each one against response1.
response1 has the parameters

 **max_tokens** = 1024,   **temperature** = 0.7,   **top_p** = 0.95,   **top_k** = 50

- **response2** - changed temperature from 0.7 to 0.0
- **response3** - changed top_p from 0.95 to 0.85
- **response4** - changed max_tokens from 1024 to 512
- **response5** - changed top_k from 50 to 80
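
The one-parameter-at-a-time comparison described above can also be written as a baseline dictionary plus small overrides. This is only a sketch: the names `BASE_PARAMS`, `VARIANT_OVERRIDES`, `build_params`, and `changed_keys` are illustrative and do not appear in the notebook, and the variant keys are named after the changed parameter rather than after a response function.

```python
# Baseline sampling settings, taken from the response1 description
BASE_PARAMS = {"max_tokens": 1024, "temperature": 0.7, "top_p": 0.95, "top_k": 50}

# Each variant overrides exactly one baseline setting (key names are illustrative)
VARIANT_OVERRIDES = {
    "temperature_0.0": {"temperature": 0.0},
    "top_p_0.85": {"top_p": 0.85},
    "top_k_80": {"top_k": 80},
    "max_tokens_512": {"max_tokens": 512},
}


def build_params(overrides):
    """Return a copy of the baseline settings with the given overrides applied."""
    return {**BASE_PARAMS, **overrides}


def changed_keys(params):
    """List the settings whose values differ from the baseline."""
    return sorted(key for key, value in params.items() if BASE_PARAMS[key] != value)


for name, overrides in VARIANT_OVERRIDES.items():
    params = build_params(overrides)
    print(name, params, "changed:", changed_keys(params))
```

A dictionary built this way could be unpacked into a single response function with `**params`, which keeps every variant identical except for the one setting under test.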

In [13]:
#Creating system instruction
system_message = """You are an expert in Hindu mythology with comprehensive knowledge of the Ramayana.
Your role is to provide accurate, relevant information to enthusiasts who want to learn about the Ramayana.
Guidelines for your responses:
Provide detailed, accurate information
Use appropriate terminology
Structure responses clearly with proper organization
Keep responses concise and to the point
"""

#Create the system prompt with instructions for the model
system_prompt = f"[INST]<<SYS>>\n{system_message}\n<</SYS>>\n"


#### Response 1
This is going to be the baseline response.

In [14]:
# Baseline: max_tokens = 1024, temperature = 0.7, top_p = 0.95, top_k = 50
def response1(query,max_tokens=1024,temperature=0.7,top_p=0.95,top_k=50):

    # Construct the final prompt from the system prompt and the query argument
    query_full = f"{system_prompt}{query}[/INST]"
    # Generate a response using the model
    model_output = llm(
      prompt=query_full,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      repeat_penalty=1.1,
      top_k=top_k,
      stop=['</s>'],
      echo=False,


    )

    # Return only the generated answer text
    return model_output['choices'][0]['text']

In [15]:
# Changed temperature from 0.7 to 0.0
# max_tokens = 1024, temperature = 0.0, top_p = 0.95, top_k = 50
def response2(query,max_tokens=1024,temperature=0.0,top_p=0.95,top_k=50):

    # Construct the final prompt from the system prompt and the query argument
    query_full = f"{system_prompt}{query}[/INST]"
    # Generate a response using the model
    model_output = llm(
      prompt=query_full,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      repeat_penalty=1.1,
      top_k=top_k,
      stop=['</s>'],
      echo=False,


    )

    # Return only the generated answer text
    return model_output['choices'][0]['text']

In [16]:
# Changed top_p from 0.95 to 0.85
# max_tokens = 1024, temperature = 0.7, top_p = 0.85, top_k = 50
def response3(query,max_tokens=1024,temperature=0.7,top_p=0.85,top_k=50):

    # Construct the final prompt from the system prompt and the query argument
    query_full = f"{system_prompt}{query}[/INST]"
    # Generate a response using the model
    model_output = llm(
      prompt=query_full,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      repeat_penalty=1.1,
      top_k=top_k,
      stop=['</s>'],
      echo=False,

    )

    # Return only the generated answer text
    return model_output['choices'][0]['text']

In [17]:
# Changed max_tokens from 1024 to 512
# max_tokens = 512, temperature = 0.7, top_p = 0.95, top_k = 50
def response4(query,max_tokens=512,temperature=0.7,top_p=0.95,top_k=50):

    # Construct the final prompt from the system prompt and the query argument
    query_full = f"{system_prompt}{query}[/INST]"
    # Generate a response using the model
    model_output = llm(
      prompt=query_full,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      repeat_penalty=1.1,
      top_k=top_k,
      stop=['</s>'],
      echo=False,

    )

    # Return only the generated answer text
    return model_output['choices'][0]['text']

In [18]:
# Changed top_k from 50 to 80
# max_tokens = 1024, temperature = 0.7, top_p = 0.95, top_k = 80
def response5(query,max_tokens=1024,temperature=0.7,top_p=0.95,top_k=80):

    # Construct the final prompt from the system prompt and the query argument
    query_full = f"{system_prompt}{query}[/INST]"
    # Generate a response using the model
    model_output = llm(
      prompt=query_full,
      max_tokens=max_tokens,
      temperature=temperature,
      top_p=top_p,
      repeat_penalty=1.1,
      top_k=top_k,
      stop=['</s>'],
      echo=False,

    )

    # Return only the generated answer text
    return model_output['choices'][0]['text']

1. Who is Hanuman and what role does he play in the Ramayana?
2. What happened in the battle between Rama and Ravana?
3. Tell me about Sita's character in the Ramayana.
4. Who is Queen Tara and what happens to her?



### Query 1: Who is Hanuman and what role does he play in the Ramayana?

Comparing response1 and response2 on
**Query 1**: Who is Hanuman and what role does he play in the Ramayana?

In [29]:
#  Response1 max_tokens = 1024, temp = 0.7 top_p = 0.95 top_k = 50
#  Who is Hanuman and what role does he play in the Ramayana?
user_input = "Who is Hanuman and what role does he play in the Ramayana?"
response1(user_input)

Llama.generate: prefix-match hit


" Hanuman is a central character in the Hindu epic Ramayana, renowned for his unwavering devotion to Lord Rama. He is also known by various other names such as Anjaneya, Vayu Putra (son of the wind god), and Mahavira.\n\nHanuman was born to Anjani, a celestial nymph, and Vayu, the god of wind. He grew up in the forest with the Vanaras (monkey-like beings) under the care of Sage Agastya. Hanuman's childhood is marked by his extraordinary strength, agility, intelligence, and unwavering devotion to Rama.\n\nWhen Rama was exiled from Ayodhya along with his wife Sita and brother Lakshmana, Hanuman went with them in disguise as a sage's hermitage servant. During their stay in the forest, Hanuman became friends with Rama, Sita, and Lakshmana. He proved his loyalty and strength several times during their exile.\n\nWhen Sita was abducted by Ravana, the king of Lanka, Hanuman played a crucial role in helping Rama locate her. He traveled to Lanka under the guise of a golden-furred monkey to gathe

In [30]:
#  Response2 max_tokens = 1024, temp = 0.0 top_p = 0.95 top_k = 50
#  Modified temp = 0.0
#  Who is Hanuman and what role does he play in the Ramayana?
user_input = "Who is Hanuman and what role does he play in the Ramayana?"
response2(user_input)

Llama.generate: prefix-match hit


" Hanuman is a central character in the Indian epic Ramayana, renowned for his unwavering devotion to Lord Rama. He is an ardent devotee of Rama and played a pivotal role in helping Rama rescue his wife Sita from the demon king Ravana.\n\nHanuman is also known as Anjaneya or Maruti. He was born to Anjani, the sister of Wind God Vayu, and the monkey king, Kesari. Hanuman's birth was a result of the boon granted to Anjani by Sage Agastya that her son would be invincible and possess extraordinary strength and intelligence.\n\nHanuman is revered for his immense physical strength, courage, loyalty, and wisdom. He is often depicted with distinctive features such as a tail, face markings, and the ability to grow or shrink in size at will. Hanuman's most famous exploit during the Ramayana was when he crossed the ocean to Lanka to search for Sita. He accomplished this by leaping over the sea, demonstrating his incredible power and determination.\n\nIn the epic battle between Rama's army and Rav

**Observations**


### Query 2: What happened in the battle between Rama and Ravana?

Comparing response1 and response3 on
**Query 2**: What happened in the battle between Rama and Ravana?

In [31]:
#  Response1 max_tokens = 1024, temp = 0.7 top_p = 0.95 top_k = 50
# What happened in the battle between Rama and Ravana?
user_input = "What happened in the battle between Rama and Ravana?"
response1(user_input)

Llama.generate: prefix-match hit


" In the epic Hindu narrative of Ramayana, the climax of the story involves the battle between Rama, the prince of Ayodhya, and Ravana, the king of Lanka. This confrontation ensues due to Ravana's abduction of Sita, Rama's wife.\n\nRama, accompanied by his brother Lakshmana, his monkey allies Hanuman, Jambavan, Sugriva, and Vali, and an army of vanaras (monkeys), sets out to rescue Sita. They reach the seashore and build a bridge called Setu or Rama Setu across the ocean with the help of the god Agni and the monkey king Jambavan.\n\nUpon reaching Lanka, Rama challenges Ravana to a duel in front of his assembly. Ravana, despite being aware of Rama's superiority, decides to use various strategies in the battle, including sending his sons Meghnadha (Indrajit) and Kumbhakarna.\n\nRama manages to defeat both Meghnadha and Kumbhakarna with the help of his brother Lakshmana and Hanuman. The battle between Rama and Ravana finally ensues, and it is a fierce fight. Ravana uses various weapons an

In [32]:
#  Response3 max_tokens = 1024, temp = 0.7 top_p = 0.85 top_k = 50
#  Modified top_p = 0.85
# What happened in the battle between Rama and Ravana?
user_input = "What happened in the battle between Rama and Ravana?"
response3(user_input)

Llama.generate: prefix-match hit


" The Battle between Rama and Ravana is a pivotal part of the Hindu epic, Ramayana. After Sita Devi, the wife of Lord Rama, was abducted by Ravana, King of Lanka, Rama sets out on a journey with his brother Lakshmana and monkey allies Hanuman, Jambavan, Sugriva, and Vali to rescue her.\n\nUpon reaching Lanka, they lay siege to the city. During this time, Ravana attempted to negotiate for Sita's return by offering to let her live in his harem with Rama as her husband. However, Rama refused, as he believed that living in adulterous relations was not an option even if it meant being reunited with his wife.\n\nAs negotiations failed, the battle commenced. The initial skirmishes saw the gods taking sides – Indra and other devas supporting Rama while Ravana had the support of the asuras and rakshasas. In the ensuing war, numerous battles took place between the vanaras (monkey allies) and the rakshasas.\n\nOne significant episode was the encounter between Hanuman and Ravana's brother Vibhisha

**Observations**

### Query 3: Tell me about Sita's character in the Ramayana.

Comparing response1 and response4 on
**Query 3**: Tell me about Sita's character in the Ramayana.

In [33]:
#  Response1 max_tokens = 1024, temp = 0.7 top_p = 0.95 top_k = 50
# Tell me about Sita's character in the Ramayana.
user_input = "Tell me about Sita's character in the Ramayana."
response1(user_input)

Llama.generate: prefix-match hit


" Sita, also known as Sita Devi or Mata Sita, is a prominent and beloved character in the Hindu epic poem Ramayana. She is the beloved wife of Lord Rama, one of the avatars (incarnations) of God Vishnu. Sita is revered for her unwavering devotion to her husband, her purity, and her strength.\n\nThe story of Sita begins when she is found as a baby in a field of Ashoka trees by King Janaka of Videha. He adopts her as his daughter and raises her with great care. When Rama, the prince of Ayodhya, visits King Janaka for his daughters' swayamvara (self-choice marriage ceremony), he is captivated by Sita and marries her after defeating the bow called Shiva Dhanush, which was placed there as a challenge for the suitors.\n\nAfter their marriage, they return to Ayodhya where they live happily with Rama's brother Lakshmana. However, their happiness is short-lived when Ravana, the king of Lanka, abducts Sita while Rama and Lakshmana are in exile in the forest. Sita remains steadfast in her loyalty

In [34]:
#  Response4 max_tokens = 512, temp = 0.7 top_p = 0.95 top_k = 50
#  Modified max_tokens = 512
# Tell me about Sita's character in the Ramayana.
user_input = "Tell me about Sita's character in the Ramayana."
response4(user_input)

Llama.generate: prefix-match hit


' Sita, also known as Goddess Sita, is a prominent character in the Hindu epic Ramayana. She is the beloved wife of Prince Rama and is considered an ideal woman and symbol of purity, loyalty, and devotion in Indian culture.\n\nSita was born from the earth goddess, Earth (Prithvi), when King Janaka performed a yajna (sacrifice). When King Janaka discovered Sita, he named her "Sita" which means furrow as she appeared from a furrow in the ground. Sita grew up in King Janaka\'s palace under his protection and was known for her exceptional beauty and virtues.\n\nRama, the prince of Ayodhya, fell in love with Sita at first sight when he visited King Janaka\'s kingdom for performing a yajna. With divine intervention and his father\'s blessings, Rama was married to Sita. Their marriage was considered perfect and symbolized ideal love between husband and wife.\n\nThroughout the Ramayana, Sita demonstrates remarkable strength, courage, and loyalty. When Rama was banished to the forest for 14 yea

**Observations**

### Query 4: Who is Queen Tara and what happens to her?

Comparing response1 and response5 on
**Query 4**: Who is Queen Tara and what happens to her?

In [35]:
#  Response1 max_tokens = 1024, temp = 0.7 top_p = 0.95 top_k = 50
# Who is Queen Tara and what happens to her?
user_input = "Who is Queen Tara and what happens to her?"
response1(user_input)

Llama.generate: prefix-match hit


" Queen Tara, also known as Tara Devi or Tarani, is a minor yet significant character in the Hindu epic Ramayana. She is the queen of the monkey king Vali and the sister of Sugriva.\n\nThe story of Queen Tara unfolds during the Vanavaas Yatra (exile) of Lord Rama, where they reside in the forest of Dandaka under the protection of Sage Valmiki. The monkey king Vali, who was known for his immense strength and valor, had a bitter rivalry with Sugriva.\n\nDuring this period, Sita Devi, the wife of Lord Rama, was abducted by Ravana, the king of Lanka. In an attempt to help Rama locate Sita and reclaim her, Vali offered his sister Tara's hand in marriage to Sugriva as part of a peace agreement. According to the terms of this pact, whoever killed Vali would become the new king of Kishkindha.\n\nSugriva, seeing an opportunity to gain power and protect Rama's interests, agreed to marry Tara but pledged to Avanashi, Goddess of Truth, that he would not consummate the marriage until Sita was safel

In [36]:
#  Response5 max_tokens = 1024, temp = 0.7 top_p = 0.95 top_k = 80
#  Modified top_k = 80
# Who is Queen Tara and what happens to her?
user_input = "Who is Queen Tara and what happens to her?"
response5(user_input)

Llama.generate: prefix-match hit


' Queen Tara, also known as Tara Devi or Tarasundari, is a minor character in the Indian epic Ramayana. She is the queen of King Himavat of the Himalayas and is renowned for her great beauty and piety.\n\nOne day, while Lord Rama was on his exile in the forest, accompanied by Sita and Lakshmana, they encountered Tara Devi. Impressed by their noble qualities and purity, she expressed her desire to serve them as their courtier and guide during their wandering years. Rama accepted her proposal, and Tara became an integral part of their entourage.\n\nWhile in the company of Rama and Sita, Tara Devi was visited by her husband, King Himavat. Upon discovering his wife\'s absence, he grew suspicious and followed them. Eventually, he realized that they had nothing but noble intentions towards his wife and blessed them with his divine wisdom and blessings.\n\nAnother notable incident regarding Queen Tara occurred when she along with Rama, Sita, and Lakshmana visited the celestial city of Amarava

**Observations**

## Data Preparation for RAG

In [19]:
#Libraries for processing dataframes,text
import json,os
import tiktoken
import pandas as pd

#Libraries for Loading Data, Chunking, Embedding, and Vector Databases
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

In [27]:
#Loading the pdf
# Upload your PDF
from google.colab import files
print("📤 Please upload your Ramayana PDF file...")
uploaded = files.upload()

# Get the filename of the uploaded PDF
pdf_filename = list(uploaded.keys())[0]
ramayana_pdf_path = pdf_filename
print(f"\n✅ Uploaded: {pdf_filename}")

# Load the PDF; the loader returns one document per page
pdf_loader = PyMuPDFLoader(ramayana_pdf_path)
ramayana = pdf_loader.load()

if not ramayana:
    print(f"Error: No pages loaded from {ramayana_pdf_path}. Please check the file path and ensure the PDF is not empty or corrupted.")
else:
    print(f"Successfully loaded {len(ramayana)} pages from {ramayana_pdf_path}.")

📤 Please upload your Ramayana PDF file...


Saving Ramayana.pdf to Ramayana.pdf

✅ Uploaded: Ramayana.pdf
Successfully loaded 386 pages from Ramayana.pdf.


### Data Overview

#### Checking the first 5 pages

In [28]:
for i in range(5):
    print(f"Page Number : {i+1}",end="\n")
    print(ramayana[i].page_content,end="\n")

Page Number : 1

Page Number : 2
A TALE ENDLESSLY RETOLD, 
A VISION ETERNALLY BORN ANEW 
Since its original appearance over 2000 years 
ago, Ramayana has served as the model for 
poems, stories, folktales, plays, and films in 
India, Burma, Cambodia, Thailand, Indonesia, 
and the Philippines. In each of these lands, 
writers modified and built upon the original 
epic to enhance its impact and meaning for 
their own cultures. 
With this English version, Ramayana may truly 
be said to have reached the West. As B. A. van 
Nooten says in his fascinating introduction, 
this is "an extraordinary accomplishment. 
. 
• 
• 
In the minds of many people who hear the 
Ramayana a mystery is being presented, and 
slowly, erratically, parts of the mystery 
unfold .... We get glimpses of a higher, purer 
reality that holds out hope for those enmeshed in 
the sorry state of mundane existence. Again 
and again [we] experience this joy of discovery. 
The struggle between good and evil is on our 
behalf, 

#### Checking the number of pages

In [29]:
len(ramayana)

386
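
Page 1 printed no text in the overview above, so it may be worth counting how many pages are blank before chunking. The sketch below uses a small stand-in `Page` class (hypothetical, it only mirrors the `page_content` attribute) so that it runs on its own; `empty_page_numbers` is an illustrative helper, not part of the notebook.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Stand-in for a loaded page; only mirrors the page_content attribute."""
    page_content: str


def empty_page_numbers(pages):
    """Return the 1-based numbers of pages whose extracted text is blank."""
    return [
        number
        for number, page in enumerate(pages, start=1)
        if not page.page_content.strip()
    ]


# Toy pages: the first and third contain no usable text
sample_pages = [Page(""), Page("A TALE ENDLESSLY RETOLD"), Page("  \n")]
print(empty_page_numbers(sample_pages))
```

In the notebook the same helper could be called as `empty_page_numbers(ramayana)`, since each loaded page exposes `page_content`.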

### Data Chunking
Chunking is a fundamental step in Retrieval-Augmented Generation (RAG) systems that enables effective information retrieval from large documents. Since embedding models have token limits, an entire document cannot be processed as a single unit, so it must be segmented. Dividing the text into smaller, semantically meaningful chunks lets each segment be embedded and indexed independently.

This improves retrieval precision because similarity search operates at the chunk level rather than over the entire document. Without chunking, embeddings become too generalized, which reduces the system's ability to locate specific information. Chunking also ensures that only relevant context is passed to the language model, which minimizes noise and reduces hallucinations.

Proper chunk sizing keeps each chunk semantically coherent while preserving enough context for accurate responses. Overlapping chunks further improve continuity across sections and prevent the loss of important information at chunk boundaries. Additionally, chunking enhances system efficiency by reducing embedding computation time and retrieval latency. Overall, chunking is essential for building scalable, accurate, and context-aware RAG pipelines.
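
To make the idea of overlapping chunks concrete, here is a minimal fixed-window sketch. It is a simplification under stated assumptions: whitespace-separated words stand in for tokens, and `chunk_with_overlap` is a hypothetical helper. The splitter used in this notebook counts tiktoken tokens and prefers to break on natural separators, so its boundaries will differ.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Words stand in for tokens in this toy example
words = "Rama crossed the sea with the vanara army to reach Lanka".split()
for chunk in chunk_with_overlap(words, chunk_size=5, overlap=2):
    print(chunk)
```

Each window repeats the last two words of the previous one, which is what keeps a sentence that straddles a boundary retrievable from either side.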

In [30]:
# Initializing a RecursiveCharacterTextSplitter to split the text into manageable chunks for embedding and retrieval
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name='cl100k_base',
    chunk_size=512,
    chunk_overlap= 50
)

In [31]:
#loading the PDF document, extracting its text, and splitting it into smaller chunks
document_chunks = pdf_loader.load_and_split(text_splitter)

In [32]:
#Checking the number of text chunks the pdf has been split into
len(document_chunks)

698

#### Checking whether there is any overlap between chunks

In [33]:
document_chunks[52].page_content

'Born as a Man \n7 \nthe hermit women loved them, and watched those two boys \ninstead of offering worship to the gods. \nAs Valmiki composed Ramayana, and as Kusa and Lava \ngrew old enough to learn, he taught it to them by memory. \nWhen they were twelve years old Valmiki had brought his \nstory nearly up to the present, and Kusa and Lava knew ev\xad\nery foot of it, and sang Ramayana to a lute and a drum, like \nGandharvas, the heavenly musicians. \nThat year \nKing \nRama \nheld \na \nyear\'s \ncelebration \nin \nNaimisha Forest along the river Gomati. At home, Kusa and \nLava rehearsed their song. Deer listened from the wood and \nbirds from the trees. They practiced long, and many forest \nmen came to listen. After each day they brought Kusa and \nLava gifts and presents-a waterjar, a bark-cloth shirt, a \ndeerskin, some thread, a grass belt, red cloth, an axe, a cord \nto tie firewood, a cooking pot, and wild food they bad \ngathered. \nThen they all went to Rama\'s festival. Pe

In [34]:
document_chunks[53].page_content

'Oh Listen . • . .  \n" \nRama, free your mind from malice and ill will; this is Val\xad\nmiki\'s song. \nOn the banks of the Sarayu river is Fair Ayodhya, the \nroyal capital of Kosala. She is a fabled city, famed among \nmen, twelve leagues long and ten wide, with Sala trees, filled \nwith grain and gold. Heaven is fair, Ayodhya is fairer; \nHeaven is cool in summer, but the Kosala hills are better. \nMajesty, when your father Dasaratha was alive he ruled \nfrom the tall white Ayodhya palace built atop a rising hill; he \nwas Lord of the Earth and the Lord of Men; he was a solar \nKing bright as the noonday Sun. In those bright days now \ngone by forever, the gods from the air saw Kosala to be clear'

In [35]:
document_chunks[54].page_content

'8 \nTHE PRINCE OF AYODHY A \nas a mirror, with no least touch of evil to make any black \nshadow \nover \nthe \nland. \nThe \nKosalas were \nwell-fed \nand \nhealthy, the Sarayu was filled with boats, every cow\'s horns \nwere covered with rings of silver and bands of gold. Every \nman could keep what he had in peace and - gain what more he \nwanted. \nThe young people wore elegant clothes, and life was joyful \namong the gardens and in the pleasure-parks. Three-\nand \nseven-storied mansions lined the wide straight streets. The \nKosalas had no enemies and Ayodhya was unconquerable. \nFlowers grew all over. Long-tusked elephants walked the \nstreets wearing bells on their necks. There were rows of full \nshops with open doors, pale white palaces, and lordly trees; \nrattling chariots drove by and there was music; foreign cara\xad\nvans came bringing merchants and rich tribute from lesser \nkings. \n-\nFair Ayodhya was filled with warriors, like a mountain \ncave filled with lions; he
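
Reading three long chunks by eye makes overlap hard to spot. As a hedged alternative, the sketch below finds the longest text shared between the end of one chunk and the start of the next. `shared_boundary` is an illustrative helper, the sample strings are made up, and the exact-match comparison is an assumption: a splitter that trims whitespace at chunk edges could hide a real overlap.

```python
def shared_boundary(previous_text, next_text):
    """Return the longest suffix of previous_text that is also a prefix of next_text."""
    max_len = min(len(previous_text), len(next_text))
    # Try the longest candidate first so the first match is the longest one
    for size in range(max_len, 0, -1):
        candidate = next_text[:size]
        if previous_text.endswith(candidate):
            return candidate
    return ""


# Made-up strings that share a boundary
first = "he ruled from the tall white palace"
second = "the tall white palace built atop a hill"
print(repr(shared_boundary(first, second)))
```

In the notebook this could be called as `shared_boundary(document_chunks[53].page_content, document_chunks[54].page_content)`. An empty result is likely when two neighbouring chunks come from different pages, because the loader returns one document per page and each document is split on its own.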

### Embedding

Embeddings are vector representations of text that capture semantic meaning. Similar meanings = similar vectors.
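
"Similar vectors" is usually measured with cosine similarity. The sketch below computes it for made-up 3-dimensional vectors; the vectors and their names are illustrative only, and real embeddings from the model used in this notebook have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: the first two point in a similar direction, the third does not
hanuman = [0.9, 0.1, 0.2]
vanara_hero = [0.8, 0.2, 0.1]
cooking_pot = [0.1, 0.9, 0.3]

print(cosine_similarity(hanuman, vanara_hero))
print(cosine_similarity(hanuman, cooking_pot))
```

The first score comes out much higher than the second, which is the property semantic search relies on.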

In [36]:
#This model is chosen for generating high-quality embeddings for each text chunk
embedding_model = SentenceTransformerEmbeddings(model_name='all-MiniLM-L6-v2')


BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.



In [37]:
# Generating an embedding for one document chunk (index 51)
embedding_1 = embedding_model.embed_query(document_chunks[51].page_content)
# Generating an embedding for the following document chunk (index 52)
embedding_2 = embedding_model.embed_query(document_chunks[52].page_content)

In [38]:
#Checking if both are of the same size
print("Dimension of the embedding vector ",len(embedding_1))
len(embedding_1)==len(embedding_2)

Dimension of the embedding vector  384


True

### Vector Database

A vector database is a specialized database that stores embeddings and supports semantic search: given a query vector, it quickly finds the stored vectors closest to it.

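How it works:

1. We store our chunk embeddings in ChromaDB.
2. When the user asks a question, we embed the question with the same embedding model.
3. ChromaDB finds the chunk embeddings closest to the question embedding (nearest-neighbour search; for normalized embeddings such as these, the ranking matches cosine similarity).
4. We retrieve those chunks to give context to the LLM.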
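The same store, embed, search, and retrieve loop can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ChromaDB is implemented: the `embed` function is a toy word-count stand-in for a real embedding model, and the vocabulary and chunks are made up.

```python
import math

def embed(text, vocabulary):
    """Toy stand-in for an embedding model: count vocabulary words, then L2-normalize."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in vocabulary]
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts

def top_k(query_vec, store, k):
    """Return the k stored texts with the highest dot product (cosine, for unit vectors)."""
    scored = [(sum(q * v for q, v in zip(query_vec, vec)), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical vocabulary and chunks, for illustration only
vocabulary = ["hanuman", "rama", "sita", "ravana", "ayodhya"]
chunks = [
    "hanuman knelt before rama",
    "ayodhya is the capital of kosala",
    "ravana carried sita to lanka",
]

store = [(chunk, embed(chunk, vocabulary)) for chunk in chunks]  # step 1: store embeddings
query_vec = embed("who is hanuman", vocabulary)                  # step 2: embed the question
results = top_k(query_vec, store, k=1)                           # steps 3-4: search and retrieve
print(results)
```

A real vector database replaces the linear scan in `top_k` with an approximate nearest-neighbour index so that search stays fast as the number of chunks grows.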

In [39]:
import os

In [40]:
# Create the output directory 'ramayana_db' (if it doesn't already exist) to store the vector database files.
out_dir = 'ramayana_db'

os.makedirs(out_dir, exist_ok=True)


In [41]:
vectorstore = Chroma.from_documents( #creating a Chroma vector store from a set of document chunks.
    document_chunks, # List of text chunks to be converted into embeddings
    embedding_model, # Embedding model used to generate vector representations of the chunks
    persist_directory=out_dir #Directory where the Chroma vector store will be saved
)


In [42]:
#Loading Chroma vector store with the given embedding model
vectorstore = Chroma(persist_directory=out_dir,embedding_function=embedding_model)



In [43]:
#Accessing the embedding function used in the Chroma vector store
vectorstore.embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [44]:
#Performing a similarity search in the vector store to find the top 3 most similar documents to "Hanuman"
vectorstore.similarity_search("Hanuman",k=3)

[Document(metadata={'format': 'PDF 1.6', 'subject': '', 'author': 'William Buck', 'moddate': '2014-11-17T13:16:22-05:00', 'page': 380, 'creationDate': "D:20141117125546-05'00'", 'trapped': '', 'source': 'Ramayana.pdf', 'creationdate': '2014-11-17T12:55:46-05:00', 'modDate': "D:20141117131622-05'00'", 'title': 'Ramayana', 'total_pages': 386, 'creator': 'Adobe Acrobat 10.1.3', 'file_path': 'Ramayana.pdf', 'keywords': '', 'producer': 'Adobe Acrobat 10.1.3 Paper Capture Plug-in with ClearScan'}, page_content='Bear King and said, "You will live while Valmika\'s Rama\xad\nyana is heard on Earth." He bent close and gave Jambavan a \ncharm, a present, something precious. "And Hanuman also \nwill live so long, where is he now?" \nHanuman came bounding down from the sky. He hit the \nground with a thud like a thunderstone. He was right close to \nRama, smiling at him, laughing and gay. \n"Oh Hanuman!" \n"My King!" Hanuman knelt before Rama. \nRama said, "As long as men shall speak of you, you wi

### Retriever

In [45]:
retriever = vectorstore.as_retriever( #Converting the Chroma vector store into a retriever for querying.
    search_type='similarity', #Plain nearest-neighbour search; the distance metric is set by the vector store, not by this argument
    search_kwargs={'k': 3} #Retrieving the top 3 most similar documents for a given query.
)
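A note on the distance metric: `search_type='similarity'` only selects plain nearest-neighbour retrieval. The metric itself comes from the vector store configuration, and Chroma collections are, to my knowledge, created with squared L2 distance unless configured otherwise. Because the embedding model printed above ends with a `Normalize()` layer, its vectors have unit length, and for unit vectors the squared L2 distance equals 2 - 2·cos(angle), so both metrics rank documents identically. The sketch below checks this on made-up vectors.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Dot product; equals cosine similarity because inputs are unit-length."""
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical vectors, for illustration only
query = normalize([1.0, 2.0, 0.5])
docs = [normalize(v) for v in ([1.0, 1.8, 0.7], [0.2, 0.1, 3.0], [2.0, 0.5, 0.5])]

# Rank document indices: highest cosine first, smallest distance first
by_cosine = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
by_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))
print(by_cosine == by_l2)
```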

In [46]:
user_input = 'Who is Hanuman and what role does he play in the Ramayana?'
rel_docs = retriever.invoke(user_input)
rel_docs

[Document(metadata={'author': 'William Buck', 'subject': '', 'producer': 'Adobe Acrobat 10.1.3 Paper Capture Plug-in with ClearScan', 'creator': 'Adobe Acrobat 10.1.3', 'creationdate': '2014-11-17T12:55:46-05:00', 'format': 'PDF 1.6', 'keywords': '', 'page': 380, 'source': 'Ramayana.pdf', 'file_path': 'Ramayana.pdf', 'modDate': "D:20141117131622-05'00'", 'total_pages': 386, 'title': 'Ramayana', 'moddate': '2014-11-17T13:16:22-05:00', 'creationDate': "D:20141117125546-05'00'", 'trapped': ''}, page_content='Bear King and said, "You will live while Valmika\'s Rama\xad\nyana is heard on Earth." He bent close and gave Jambavan a \ncharm, a present, something precious. "And Hanuman also \nwill live so long, where is he now?" \nHanuman came bounding down from the sky. He hit the \nground with a thud like a thunderstone. He was right close to \nRama, smiling at him, laughing and gay. \n"Oh Hanuman!" \n"My King!" Hanuman knelt before Rama. \nRama said, "As long as men shall speak of you, you wi

### System and User Prompt Template

Prompts guide the model to generate accurate responses. Here, we define two parts:

1. The system message describing the assistant's role.
2. A user message template including context and the question.

In [49]:
qna_system_message = """You are a scholarly assistant with deep knowledge of the Ramayana.
Use ONLY the provided context from the Ramayana text to answer the question.
Do not add information that is not present in the context.
If the context does not contain enough information to answer the question, clearly say so.
Provide clear, well-structured, and accurate explanations based strictly on the retrieved text."""

qna_user_message_template = """Context:{context}

Question:{question}

Based strictly on the context above, provide a clear and detailed answer."""
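To see what the model actually receives, it can help to fill the template by hand. The sketch below uses a shortened, hypothetical system message and two made-up passages in place of real retriever output; the assembly steps (join the chunks, format the template, prepend the system message) follow the approach used in this notebook.

```python
# Hypothetical, shortened stand-in for the system message
system_message = "Use ONLY the provided context to answer the question."

user_template = """Context:{context}

Question:{question}

Based strictly on the context above, provide a clear and detailed answer."""

# Made-up passages standing in for retrieved chunks
retrieved_chunks = ["First retrieved passage.", "Second retrieved passage."]

# Join chunks with blank lines, fill the placeholders, prepend the system message
context = "\n\n".join(retrieved_chunks)
prompt = system_message + "\n\n" + user_template.format(
    context=context,
    question="Who is Hanuman?",
)
print(prompt)
```

Printing the assembled prompt for a real query is a quick way to check that the retrieved context is relevant before blaming the LLM for a poor answer.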


In [50]:
def generate_rag_response(user_input, k=3, max_tokens=256, temperature=0.0, top_p=0.95, top_k=50):
    """Retrieve the top-k chunks for user_input, build the prompt, and return the LLM's answer as a string."""

    # Create retriever with dynamic k
    retriever_k = vectorstore.as_retriever(
        search_type='similarity',
        search_kwargs={'k': k}
    )

    # Retrieve relevant chunks
    relevant_document_chunks = retriever_k.invoke(user_input)

    # Extract text
    context_list = [doc.page_content for doc in relevant_document_chunks]

    # Combine context
    context_for_query = "\n\n".join(context_list)

    # Format user prompt
    user_message = qna_user_message_template.format(
        context=context_for_query,
        question=user_input
    )

    # Full prompt
    prompt = qna_system_message + "\n\n" + user_message

    try:
        response = llm(
            prompt=prompt,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
            top_k=top_k
        )

        response = response['choices'][0]['text'].strip()

    except Exception as e:
        response = f"Error occurred: {e}"

    return response
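With `max_tokens=256`, long answers can stop mid-sentence. One way to detect this is to inspect the `finish_reason` field of the completion. The sketch below assumes the response follows the OpenAI-style completion format, where `finish_reason` is `"length"` when generation stops at the token limit; I believe llama-cpp-python does this, but treat it as an assumption to verify. The helper name `extract_answer` and the example response are hypothetical.

```python
def extract_answer(response):
    """Return (text, was_truncated) from an OpenAI-style completion dict."""
    choice = response["choices"][0]
    text = choice["text"].strip()
    # "length" is assumed to mean the max_tokens limit was reached
    was_truncated = choice.get("finish_reason") == "length"
    return text, was_truncated

# Hand-written example response, not real model output
fake_response = {
    "choices": [
        {"text": " Hanuman is a loyal devotee of Ra", "finish_reason": "length"}
    ]
}

text, was_truncated = extract_answer(fake_response)
if was_truncated:
    text += " [truncated: consider raising max_tokens]"
print(text)
```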


## Question Answering using RAG
We can tune the chunking, retriever, and LLM parameters and compare the resulting answers. Some changes worth trying:
1) Remove the chunk overlap
2) Raise the temperature from 0.0 to 0.7
3) Reduce k from 3 to 2
4) Lower top_p from 0.95 to 0.8
5) Lower top_k from 50 to 25

We can implement these at a later stage.
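As a rough sketch of how such a comparison could be organized, the loop below runs one question through several parameter settings and collects the answers. The `run_experiments` helper and the `fake_generate` placeholder are hypothetical; in the notebook, `generate_rag_response` would be passed in instead. Removing the chunk overlap is not covered here, because it requires re-chunking the document and rebuilding the vector store.

```python
def run_experiments(question, generate_fn, configs):
    """Call generate_fn once per config and collect (config, answer) pairs."""
    results = []
    for config in configs:
        answer = generate_fn(question, **config)
        results.append((config, answer))
    return results

# One change at a time relative to the baseline (k=3, temperature=0.0, top_p=0.95, top_k=50)
configs = [
    {"k": 3, "temperature": 0.0, "top_p": 0.95, "top_k": 50},  # baseline
    {"k": 3, "temperature": 0.7, "top_p": 0.95, "top_k": 50},
    {"k": 2, "temperature": 0.0, "top_p": 0.95, "top_k": 50},
    {"k": 3, "temperature": 0.0, "top_p": 0.8, "top_k": 50},
    {"k": 3, "temperature": 0.0, "top_p": 0.95, "top_k": 25},
]

def fake_generate(question, k, temperature, top_p, top_k):
    """Placeholder so the sketch runs without a model; swap in generate_rag_response."""
    return f"k={k}, temperature={temperature}, top_p={top_p}, top_k={top_k}"

results = run_experiments("Who is Hanuman?", fake_generate, configs)
for config, answer in results:
    print(config, "->", answer)
```

One caveat: at a temperature of 0.0, decoding is typically greedy, so changing `top_p` or `top_k` alone may produce no visible difference unless the temperature is raised as well.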

### Query 1: Who is Hanuman and what role does he play in the Ramayana?

In [51]:
#Response using basic parameters
user_input = 'Who is Hanuman and what role does he play in the Ramayana?'
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


Hanuman is a character in the Indian epic poem Ramayana. He is described as a bear king who met Rama, the protagonist of the story, and was granted immortality as long as people continue to recite the Ramayana on Earth. Hanuman is also mentioned to have arrived at the scene where Rama was present, and he paid his respects to him by kneeling before him and addressing him as "My King."

Rama acknowledged Hanuman's loyalty and strength, praising him for his true heart and strong arms, and for having done things that couldn't be done. The context does not provide any further information about Hanuman's background or the specific actions he took on Rama's behalf. However, it is mentioned earlier in the text that Hanuman played a significant role in helping Rama rescue his wife Sita from the demon king Ravana.

Therefore, based on the context provided, Hanuman is a loyal and strong devotee of Rama who was granted immortality by him. He is also known for his bravery and selfless service to Ra

### Query 2: What happened in the battle between Rama and Ravana?

In [52]:
user_input = 'What happened in the battle between Rama and Ravana'
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


The battle between Rama and Ravana took place when Rama, accompanied by his brother Lakshmana, Matali, and an army of monkeys led by Hanuman, arrived in Lanka to rescue Sita from Ravana's captivity. The two adversaries engaged each other in a fierce chariot battle.

Ravana, wielding a mace, charged towards Rama, who was protected by Matali. However, Matali skillfully deflected Ravana's blows and managed to knock the mace aside. Ravana then whirled his mace in a circle, causing it to moan with the sound of woe.

Rama, in response, took a spear from Indra's weapons-racks and threw it at Ravana. The spear broke Ravana's flagpole and flag, and also shattered his chariot's arrows. The running demon car lost its rattle and clatter, and its wheels turned on in mournful silence.

Matali was unable to outdrive Ravana's mace, so he dropped the reins and stood up in its path. The mace


### Query 3: Tell me about Sita's character in the Ramayana.


In [53]:
user_input = "Tell me about Sita's character in the Ramayana."
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


Sita is a central character in the Ramayana, known for her unwavering faithfulness and devotion to her husband Rama. She is portrayed as a model of good behavior and piety for Hindu women. However, doubts about her purity arise when she is abducted by Ravana and taken to Lanka. Despite this, Sita remains steadfast in her loyalty to Rama and eventually gives birth to his two sons, Kusa and Lava, in the forest. The great poet-sage Valmiki takes care of them and teaches them about their father's exploits.

At the end of the war, when Sita is brought before Rama, he does not receive her graciously as one might expect. Instead, Rama haughtily rebuffs her, believing that her honor has been compromised in Lanka and that she is unfit to become a queen again. In despair, Sita threatens to immolate herself in a fire, but the God of Fire refuses to burn her. This version of Sita's character contrasts with the one presented by Buck, who describes Rama receiving Sita graciously and tenderly back in

### Query 4: Who is Queen Tara and what happens to her?

In [54]:
user_input = 'Who is Queen Tara and what happens to her?'
print(generate_rag_response(user_input))

Llama.generate: prefix-match hit


Queen Tara, also known as Sita, is the queen of King Janaka in Videha Kingdom. She is described as having golden skin, beautiful dark eyes, firm breasts, a thin waist, round hips, and a soft smile. The text mentions that she was found by Janaka while he was plowing his field fourteen years ago. Sita is held captive in Lanka, the island kingdom of Ravana, after being abducted by him. She is guarded by Rakshasis and is threatened with death if she does not submit to Ravana. Despite her captivity, her heart remains true to her husband, Lord Rama. Hanuman, acting on Rama's behalf, goes to Lanka to rescue Sita. He finds her in an Asoka Grove behind the palace of the Demon King and manages to speak to her, reassuring her that help is on the way. The text does not provide any further information about what happens to Queen Tara after this point.
