<a href="https://colab.research.google.com/github/BalaSree2005/RAG_application/blob/main/rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip install langchain langchain-community langchain-google-genai langchain-chroma pypdf python-dotenv

Collecting langchain-community
  Downloading langchain_community-0.3.27-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain-google-genai
  Downloading langchain_google_genai-2.1.9-py3-none-any.whl.metadata (7.2 kB)
Collecting langchain-chroma
  Downloading langchain_chroma-0.2.5-py3-none-any.whl.metadata (1.1 kB)
Collecting pypdf
  Downloading pypdf-6.0.0-py3-none-any.whl.metadata (7.1 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting filetype<2.0.0,>=1.2.0 (from langchain-google-genai)
  Downloading filetype-1.2.0-py2.py3-none-any.whl.metadata (6.5 kB)
Collecting google-ai-generativelanguage<0.7.0,>=0.6.18 (from langchain-google-genai)
  Downloading google_ai_generativelanguage-0.6.18-py3-none-any.whl.metadata (9.8 kB)
Collecting chromadb>=1.0.9 (from langchain-chroma)
  Downloading chromadb-1.0.20-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)
Collect

In [1]:
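from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_chroma import Chroma  # langchain-chroma is installed above; langchain_community's Chroma is deprecated
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
import os
from dotenv import load_dotenv

# Load variables from a local .env file, if one exists. The False below means no .env
# was found (as in Colab), so the API key is read from Colab secrets in a later cell.
load_dotenv()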


False

In [6]:
# PyPDFLoader returns one Document per PDF page, with page metadata attached
loader = PyPDFLoader("File.pdf")
docs = loader.load()
print(docs[0].metadata)

{'producer': 'Microsoft® Word 2016', 'creator': 'Microsoft® Word 2016', 'creationdate': '2025-08-26T14:08:46+05:30', 'author': 'Windows User', 'moddate': '2025-08-26T14:08:46+05:30', 'source': 'File.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}


In [7]:
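# Split the pages into chunks of roughly 1000 characters; adjacent chunks overlap
# by up to 200 characters so text cut at a chunk boundary keeps some context
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunks = splitter.split_documents(docs)
print(len(chunks))
print(chunks[0].page_content)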

8
Andhra Pradesh is a prominent state located in the southeastern region of India. It shares its borders 
with Telangana, Odisha, Chhattisgarh, Karnataka, Tamil Nadu, and the Bay of Bengal to the east. The 
state boasts one of the longest coastlines in India, approximately 974 kilometers, which plays a crucial 
role in its economy and culture. The geography of Andhra Pradesh is diverse, consisting of fertile plains, 
hills, and plateaus. The Eastern Ghats run through parts of the state, creating picturesque hill stations 
such as Araku Valley and Horsley Hills. The Krishna and Godavari are the two major rivers flowing 
through Andhra Pradesh, providing essential irrigation that supports the state’s predominantly agrarian 
economy. 
 
The state’s capital is Amaravati, a newly developing city conceived to be the political and administrative 
hub after the bifurcation of Telangana in 2014. Prior to this, Hyderabad served as the joint capital for
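To see what `chunk_size` and `chunk_overlap` mean in practice, here is a simplified sliding-window chunker. It is an illustration only, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on paragraph breaks, then line breaks, then spaces, and falls back to single characters only when a piece is still too long. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters.

    Simplified illustration: unlike RecursiveCharacterTextSplitter, it ignores
    paragraph, line, and word boundaries and cuts purely by character count.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(chr(65 + i % 26) for i in range(2500))
print([len(c) for c in sliding_window_chunks(text)])
```

With the notebook's settings each window advances by 1000 - 200 = 800 characters, so a 2,500-character text yields windows of 1000, 1000 and 900 characters, and the last 200 characters of each window reappear at the start of the next.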


In [10]:
import os
from google.colab import userdata

# Set the Google API key as an environment variable (read from Colab's Secrets panel)
os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
embeddings = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001", google_api_key=os.environ["GOOGLE_API_KEY"])
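The cell above only works inside Colab, because `google.colab.userdata` is not importable elsewhere. A minimal sketch of a hypothetical helper, `get_google_api_key`, that checks the environment first (which `load_dotenv()` populates when a `.env` file exists) and falls back to Colab secrets:

```python
import os


def get_google_api_key() -> str:
    """Return GOOGLE_API_KEY from the environment, falling back to Colab secrets.

    Hypothetical helper: locally, load_dotenv() fills os.environ from a .env file;
    in Colab, the key lives in the notebook's Secrets panel.
    """
    key = os.environ.get("GOOGLE_API_KEY")
    if key:
        return key
    try:
        from google.colab import userdata  # only importable inside Google Colab
    except ImportError:
        raise RuntimeError(
            "GOOGLE_API_KEY not found: add it to a .env file or to Colab secrets"
        ) from None
    key = userdata.get("GOOGLE_API_KEY")
    os.environ["GOOGLE_API_KEY"] = key  # make it visible to the LangChain Google integrations
    return key
```

Because the key ends up in `os.environ`, the embeddings and chat model can then be created without passing `google_api_key` explicitly, as the notebook already does for `ChatGoogleGenerativeAI`.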

In [11]:
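# Embed every chunk and index it in Chroma. No persist_directory is given, so the
# collection lives in memory and is lost when the runtime restarts.
vector_store = Chroma.from_documents(documents=chunks, embedding=embeddings)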

In [12]:
# Return the 5 chunks most similar to the query (the PDF yields only 8 chunks in total)
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})
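With `search_type="similarity"` and `k=5`, the retriever embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below ranks toy 3-dimensional vectors by cosine similarity; the vectors and the names `cosine_similarity` and `top_k` are hypothetical. Chroma's default collection metric is L2 distance unless configured otherwise, but for unit-length vectors L2 distance and cosine similarity produce the same ranking.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query, best first."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings"; real Gemini embeddings have far more dimensions.
store = {
    "capital": [0.9, 0.1, 0.0],
    "cuisine": [0.1, 0.9, 0.1],
    "rivers": [0.2, 0.1, 0.9],
}
print(top_k([0.8, 0.2, 0.1], store, k=2))  # → ['capital', 'cuisine']
```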

In [13]:
chat_model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.1,
    max_output_tokens=500
)

In [14]:
prompt = PromptTemplate(
    template="""
      You are a helpful assistant.
      Answer ONLY from the provided context.
      If the context is insufficient, just say you don't know.

      {context}
      Question: {question}
    """,
    input_variables=['context', 'question']
)
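`PromptTemplate` uses f-string-style `{placeholders}` by default, so filling it behaves like Python's `str.format` for simple templates. Because the template above is indented inside a triple-quoted string, every line also carries six leading spaces into the prompt sent to Gemini; this is usually harmless but easy to avoid. A plain-Python sketch of the same prompt with `textwrap.dedent` (the names `RAG_TEMPLATE` and `render_prompt` are hypothetical):

```python
import textwrap

# Plain-Python stand-in for the PromptTemplate above (hypothetical names).
# textwrap.dedent strips the common leading indentation of the triple-quoted string.
RAG_TEMPLATE = textwrap.dedent("""\
    You are a helpful assistant.
    Answer ONLY from the provided context.
    If the context is insufficient, just say you don't know.

    {context}
    Question: {question}
    """)


def render_prompt(context: str, question: str) -> str:
    """Fill the {context} and {question} placeholders, as PromptTemplate does."""
    return RAG_TEMPLATE.format(context=context, question=question)


print(render_prompt("The state's capital is Amaravati.", "What is the capital of Andhra Pradesh?"))
```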

In [15]:
question = 'What is the capital of Andhra Pradesh?'
retrieved_docs = retriever.invoke(question)
context = "\n\n".join([doc.page_content for doc in retrieved_docs])

In [16]:
final_prompt = prompt.invoke({
    "context": context,
    "question": question
})


In [17]:
parser = StrOutputParser()

# Generate the answer
response = chat_model.invoke(final_prompt)

# StrOutputParser accepts the AIMessage directly and returns its text content
parser.invoke(response)

'The capital of Andhra Pradesh is Amaravati.'

In [18]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

In [19]:
def format_docs(retrieved_docs):
    # Concatenate the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in retrieved_docs)
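`format_docs` keeps only `page_content`, so the model never sees which page a passage came from. A possible variant, sketched here as the hypothetical `format_docs_with_pages`, prefixes each chunk with the `page_label` that `PyPDFLoader` stores in the metadata (visible in the metadata printed earlier). A minimal `Doc` dataclass stands in for LangChain's `Document` so the sketch runs on its own; real `Document` objects expose the same `page_content` and `metadata` attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_pages(retrieved_docs) -> str:
    """Join chunks with blank lines, tagging each with its PDF page label."""
    parts = []
    for doc in retrieved_docs:
        # PyPDFLoader stores a human-readable 'page_label'; fall back to '?' if absent
        label = doc.metadata.get("page_label", "?")
        parts.append(f"[page {label}]\n{doc.page_content}")
    return "\n\n".join(parts)


sample_docs = [
    Doc("The state's capital is Amaravati.", {"page_label": "1"}),
    Doc("Andhra cuisine is known for its rich, spicy flavors.", {"page_label": "2"}),
]
print(format_docs_with_pages(sample_docs))
```

Page tags make it possible to ask the model to cite its source page, at the cost of a few extra tokens per chunk.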

In [20]:
parallel_chain = RunnableParallel(
    context=retriever | RunnableLambda(format_docs),  # question -> retrieved chunks -> one string
    question=RunnablePassthrough(),                   # forwards the question unchanged
)
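`RunnableParallel` hands the same input (the question string) to every branch and returns a dict keyed by branch name, which is exactly the `{context, question}` shape the prompt expects. The plain-Python sketch below mimics that behaviour with a thread pool, since LangChain can also run the branches concurrently; `run_parallel` and `fake_retriever` are hypothetical stand-ins, not LangChain APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_parallel(branches: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Call every branch on the same input and collect the results under each key."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, value) for name, fn in branches.items()}
        return {name: future.result() for name, future in futures.items()}


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for the retriever: returns canned chunks."""
    return ["Andhra cuisine is known for its rich, spicy flavors."]


result = run_parallel(
    {
        "context": lambda q: "\n\n".join(fake_retriever(q)),  # like retriever | format_docs
        "question": lambda q: q,  # plays the role of RunnablePassthrough()
    },
    "What is the famous food in Andhra Pradesh?",
)
print(result)
```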

In [21]:
parallel_chain.invoke('What is the famous food in Andhra Pradesh?')

{'context': 'The culture of Andhra Pradesh is vibrant and deeply rooted in traditions and customs. Telugu literature \nis celebrated for its classical and modern works, encompassing poetry, drama, and prose. The state is \nfamous for Kuchipudi, one of the eight classical dance forms of India, which combines graceful \nmovements with expressive storytelling based on Hindu mythology. Carnatic music is widely practiced \nand cherished, with many musicians from Andhra Pradesh earning national and international acclaim. \nThe state celebrates a variety of festivals such as Sankranti, Ugadi, Vinayaka Chaturthi, and Dasara with \nenthusiasm, reflecting the agricultural and religious heritage. Andhra cuisine is known for its rich, spicy \nflavors and unique dishes such as Pesarattu, Gongura pickle, Pulihora, and the famous Andhra biryani. \n \nAgriculture is the backbone of Andhra Pradesh’s economy, employing a large portion of the population.\n\nand religious significance. The famous Tirupati

In [22]:
parser = StrOutputParser()

In [23]:
main_chain = parallel_chain | prompt | chat_model | parser
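The `|` operator composes runnables into a sequence in which each step's output becomes the next step's input: the question becomes a `{context, question}` dict, then a filled prompt, then the model's message, then a plain string. A minimal sketch of that mechanism using a hypothetical `Step` class with `__or__`; it illustrates the idea and is not how LangChain's `RunnableSequence` is implemented.

```python
from typing import Any, Callable


class Step:
    """Hypothetical minimal runnable: wraps a function and supports `a | b` chaining."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # The composed step feeds this step's output into the next step.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value: Any) -> Any:
        return self.fn(value)


# Stand-ins for the four stages of main_chain (no retrieval or LLM call here).
gather = Step(lambda q: {"context": "Pesarattu, Gongura pickle, Pulihora", "question": q})
fill_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(lambda p: {"content": "Dishes include " + p.splitlines()[0][len("Context: "):]})
to_text = Step(lambda message: message["content"])

chain = gather | fill_prompt | fake_model | to_text
print(chain.invoke("famous Andhra cuisine?"))  # → Dishes include Pesarattu, Gongura pickle, Pulihora
```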

In [25]:
main_chain.invoke('famous Andhra cuisine ?')

'Andhra cuisine is known for its rich, spicy flavors and unique dishes such as Pesarattu, Gongura pickle, Pulihora, and the famous Andhra biryani.'