# Chapter 06 – Advanced RAG Systems

In this notebook we will:
1. Use **chunking strategies** for better document ingestion.
2. Improve **retrieval with re-ranking & compression**.
3. Design **advanced prompts with citations**.
4. Evaluate RAG answers with LangChain evaluators.


In [1]:
import os
import sys

# Make the parent directory importable so the local `llm` package can be found
sys.path.append(os.path.abspath(".."))

In [2]:
# Initialize the LLM (helper from the project's local `llm` package)
from llm.load_llm import initialize_llm

llm = initialize_llm()

LLM ready: ChatGoogleGenerativeAI


In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Gemini embedding model used to vectorize the chunks
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

## 6.1 Load Documents and Apply Chunking

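We will use a single plain-text file, `rag.txt`, as the knowledge source. First, check that it exists in the working directory.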

In [8]:
import os
print(os.path.exists("rag.txt"))  # Prints True if the file exists

True


In [11]:
from langchain_community.document_loaders import TextLoader

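loader = TextLoader("rag.txt", encoding="utf-8")
docs = loader.load()

# Chunking: split into chunks of up to 500 characters, with up to 50 characters
# shared between neighboring chunks so text cut at a boundary is not lost
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunked_docs = splitter.split_documents(docs)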

print(f"Original docs: {len(docs)} | After chunking: {len(chunked_docs)}")
print(chunked_docs[0].page_content[:300])

Original docs: 1 | After chunking: 16
## An Introduction to Modern AI Application Development

The world of Large Language Models (LLMs) is rapidly evolving beyond simple chatbots. Developers are now building sophisticated applications that can reason, access external data, and perform complex tasks. Three fundamental concepts at the he
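The effect of `chunk_size` and `chunk_overlap` is easiest to see on a simpler splitter. The sketch below is a plain fixed-size sliding window over characters, not the recursive logic of `RecursiveCharacterTextSplitter` (which first tries to break on paragraphs, then lines, then spaces); `window_chunks` is a hypothetical helper used only for illustration.

```python
def window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows (hypothetical helper).

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Small numbers so the overlap is visible
print(window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

With `chunk_size=500, chunk_overlap=50` this rule would advance 450 characters per chunk. The recursive splitter in the notebook respects the same size limit but prefers natural break points, so its chunk boundaries will differ from a fixed window.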


In [12]:
chunked_docs

[Document(metadata={'source': 'rag.txt'}, page_content='## An Introduction to Modern AI Application Development\n\nThe world of Large Language Models (LLMs) is rapidly evolving beyond simple chatbots. Developers are now building sophisticated applications that can reason, access external data, and perform complex tasks. Three fundamental concepts at the heart of this revolution are **LangChain**, **Embeddings**, and **Retrieval-Augmented Generation (RAG)**. Understanding how these technologies work together is key to unlocking the full potential of AI.'),
 Document(metadata={'source': 'rag.txt'}, page_content='---\n\n### Embeddings: The Language of Machines 🧠\n\nAt its core, a language model doesn\'t understand words like "cat" or "dog" in the way humans do. It understands numbers. **Embeddings** are the bridge that translates our qualitative, nuanced language into the quantitative, numerical language of machines.'),
 Document(metadata={'source': 'rag.txt'}, page_content='An embedding 

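We could also read the file with Python's built-in `open()` and wrap the text in a `Document` ourselves; `TextLoader` handles this step, including the `source` metadata, for us.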
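A minimal sketch of the `open()` route. To keep it self-contained, it returns a plain dict that mirrors the `page_content` and `metadata` fields of a LangChain `Document` instead of a real `Document`. `load_text_file` is a hypothetical name, and the demo writes its own temporary file rather than reading `rag.txt`.

```python
import os
import tempfile


def load_text_file(path: str, encoding: str = "utf-8") -> dict:
    """Read a whole text file into a Document-like dict (hypothetical helper)."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    # Same fields as Document(page_content=..., metadata={"source": ...})
    return {"page_content": text, "metadata": {"source": path}}


# Demo on a temporary file so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("Embeddings turn text into vectors.")
    doc = load_text_file(demo_path)

print(doc["page_content"])
```

If LangChain is installed, `Document(**doc)` (with `Document` imported from `langchain_core.documents`) gives an object of the same kind `TextLoader` returns, which can then go through `splitter.split_documents` as before.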

## 6.2 Create a Vector Database
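Embed every chunk with the Gemini embedding model, store the vectors in a Chroma database on disk, and expose it as a retriever that returns the 3 chunks most similar to a query (`k=3`).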

In [17]:
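# Embed every chunk and store the vectors in a Chroma collection persisted to ./chroma_db
# (re-running this cell against the same directory adds the chunks a second time)
vectorstore = Chroma.from_documents(chunked_docs, embeddings, persist_directory="./chroma_db")

# Return the 3 chunks most similar to each query
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})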
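Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity over tiny hand-written vectors. The vectors and chunk names are made up for illustration (real Gemini embeddings have hundreds of dimensions), and Chroma's own distance function and approximate-nearest-neighbor index are not modeled here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda cid: cosine(query_vec, store[cid]), reverse=True)
    return ranked[:k]


# Hypothetical 3-d "embeddings" for four chunks
store = {
    "embeddings-intro": [0.9, 0.1, 0.0],
    "vector-search": [0.7, 0.3, 0.1],
    "langchain-agents": [0.1, 0.9, 0.2],
    "rag-pipeline": [0.3, 0.4, 0.8],
}
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "What are embeddings for?"
print(top_k(query_vec, store, k=3))
# → ['embeddings-intro', 'vector-search', 'rag-pipeline']
```

Swapping in real vectors from `embeddings.embed_query(...)` and `embeddings.embed_documents(...)` would give the same kind of ranking Chroma computes, up to its choice of distance metric and search index.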


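## 6.3 Advanced Prompt for Retrieval with Citations

The prompt restricts the model to the retrieved context and asks for inline citations. Note that the `stuff` chain used below inserts only each chunk's text into `{context}`, and every chunk carries the same metadata (`{'source': 'rag.txt'}`), so the model has no real source labels to cite. This likely explains the inconsistent citation styles in the 6.4 output.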

In [18]:
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
Answer the question based ONLY on the following context:

{context}

Question: {question}

Answer in 3-4 sentences. Use citations like [source1], [source2].
"""
)
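One way to give the model something concrete to cite is to number the chunks before they go into `{context}`, so that `[1]`, `[2]` refer to actual passages. This is a plain-Python sketch: `format_numbered_context` is a hypothetical helper, the chunks are made up, and the template is a variant of the one above with the citation instruction changed to match the numbering.

```python
def format_numbered_context(chunks: list[str]) -> str:
    """Prefix each chunk with [1], [2], ... and join them with blank lines."""
    return "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))


# Hypothetical retrieved chunks (in the notebook these would be page_content strings)
chunks = [
    "Embeddings translate language into vectors.",
    "Vector search compares query and document embeddings.",
]

template = """Answer the question based ONLY on the following context:

{context}

Question: {question}

Answer in 3-4 sentences. Cite the numbered passages you use, like [1], [2].
"""

filled = template.format(
    context=format_numbered_context(chunks),
    question="What are the advantages of using embeddings?",
)
print(filled)
```

Inside LangChain, a comparable effect could likely be achieved through the `stuff` chain's `document_prompt` setting combined with a per-chunk id stored in each chunk's metadata; check the documentation for your LangChain version before relying on this.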

In [19]:
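qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",                    # put all retrieved chunks into a single prompt
    chain_type_kwargs={"prompt": prompt},  # use the citation prompt defined above
    return_source_documents=True,          # also return the retrieved chunks
    verbose=True
)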

## 6.4 Query with Advanced RAG

In [21]:
query = "What are the advantages of using embeddings?"
result = qa_chain.invoke({"query": query})

print("Answer:\n", result["result"])
print("\nSources:")
for doc in result["source_documents"]:
    print(" -", doc.metadata)




> Entering new RetrievalQA chain...

> Finished chain.
Answer:
 Embeddings translate human language into numerical representations that machines can understand [Embeddings: The Language of Machines 🧠].  This allows for vector search, enabling efficient retrieval of relevant information from a knowledge base by comparing the embedding of a query to the embeddings of documents [1].  The similarity of embeddings reflects the semantic meaning of the text, capturing the "vibe" of the information [An embedding is a vector].  This facilitates more accurate and nuanced information retrieval compared to traditional keyword-based methods.

Sources:
 - {'source': 'rag.txt'}
 - {'source': 'rag.txt'}
 - {'source': 'rag.txt'}
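The overview lists re-ranking and compression, while the chain above passes the top-3 chunks straight to the model. A rough plain-Python sketch of both ideas: re-ranking re-scores the retrieved chunks with a second scorer and keeps the best ones, and compression drops sentences that share no terms with the query. The second scorer here is simple keyword overlap, a stand-in for the model-based re-rankers typically used; `rerank`, `compress`, and the sample chunks are hypothetical.

```python
import re


def terms(text: str) -> set[str]:
    """Lowercased words longer than 4 letters (a crude stop-word filter)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 4}


def rerank(query: str, chunks: list[str], keep: int = 2) -> list[str]:
    """Re-score chunks by how many query terms they contain; keep the best ones."""
    q = terms(query)
    return sorted(chunks, key=lambda c: len(q & terms(c)), reverse=True)[:keep]


def compress(query: str, chunk: str) -> str:
    """Keep only the sentences that share at least one term with the query."""
    q = terms(query)
    sentences = re.split(r"(?<=[.!?])\s+", chunk)
    return " ".join(s for s in sentences if q & terms(s))


query = "What are the advantages of using embeddings?"
retrieved = [
    "LangChain connects models to tools and data.",
    "Embeddings map text to vectors. Cats are nice. Similar meanings give nearby embeddings.",
    "Using embeddings enables semantic search. Keyword search only matches exact words.",
]
for chunk in rerank(query, retrieved):
    print(compress(query, chunk))
```

LangChain provides a `ContextualCompressionRetriever` that wraps a base retriever for this purpose; its compressors and re-rankers use models rather than keyword overlap.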


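## 6.5 Evaluating RAG Answers

`QAEvalChain` uses an LLM as a grader: it compares each predicted answer with a reference answer for the same question and labels it CORRECT or INCORRECT.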

In [24]:
from langchain.evaluation.qa import QAEvalChain

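# Reference (ground-truth) answer for the question asked in 6.4
examples = [
    {"query": query, "answer": "Translates data from human readable form to numerical form."}
]

# The prediction must belong to the same question as its reference example
predictions = [{"query": query, "result": result["result"]}]

# The LLM grades each prediction against its reference answer (here, the same Gemini model)
eval_chain = QAEvalChain.from_llm(llm)
graded = eval_chain.evaluate(examples, predictions)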

print("Evaluation:\n", graded)


Evaluation:
 [{'results': 'GRADE: CORRECT'}]
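With more than one example, it helps to turn the grader's text into a number. A minimal sketch, assuming each item in `graded` has the `{'results': 'GRADE: ...'}` shape shown above; `parse_grade` and `accuracy` are hypothetical helpers, and the extra graded items are invented to illustrate the calculation.

```python
import re
from typing import Optional


def parse_grade(text: str) -> Optional[bool]:
    """Return True for CORRECT, False for INCORRECT, None if no grade is found."""
    match = re.search(r"GRADE:\s*(CORRECT|INCORRECT)", text.upper())
    return None if match is None else match.group(1) == "CORRECT"


def accuracy(graded_items: list) -> float:
    """Share of graded items marked CORRECT; unparseable items count as wrong."""
    if not graded_items:
        return 0.0
    return sum(parse_grade(g["results"]) is True for g in graded_items) / len(graded_items)


# Hypothetical results: the notebook's single grade plus two invented ones
sample_graded = [
    {"results": "GRADE: CORRECT"},
    {"results": "GRADE: INCORRECT"},
    {"results": "GRADE: CORRECT"},
]
print(f"{accuracy(sample_graded):.2f}")
# → 0.67
```

Counting unparseable grades as wrong keeps the score conservative when the grader model drifts from the expected output format.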
