# 🧠 Retrieval-Augmented Generation (RAG) with Re-Ranker – Explained Simply

Welcome!  
This notebook demonstrates how to build a **RAG (Retrieval-Augmented Generation)** system with a **re-ranker**. It's designed to help **AI models answer questions more accurately** by first finding relevant information from a document (like a PDF), then carefully selecting the **best parts** to base the answer on.

### 🧩 What’s Happening Here?

Think of it like this:
1. You ask a question 💬
2. The computer looks inside a PDF to find helpful info 📄
3. It uses a **smart filter (re-ranker)** to choose only the best info ✅
4. Then it asks an AI (like ChatGPT) to answer the question based on that info 🤖

### 🔧 What This Notebook Includes:

- 📥 **Document Ingestion**: Loading and breaking the PDF into smaller chunks
- 🔍 **Retrieval**: Finding the most relevant pieces for a question
- 🏅 **Re-Ranking**: Reordering the retrieved chunks so the most relevant ones come first
- 💬 **Final Answer Generation**: Using a language model (LLM) to give a final answer based only on the filtered results

---

## 🗂️ Sections Breakdown

### 📘 1. Imports
Loading all the tools and libraries needed for processing, retrieval, and generation.

### 📄 2. Load and Split PDF
Read your PDF and break it into smaller readable pieces for the AI.

### 🧠 3. Vector Store with FAISS
Convert the text into searchable numbers (embeddings) and save them in FAISS.

### 📊 4. Reranker Model
Load a cross-encoder model that scores each retrieved chunk against the question and keeps the highest-scoring ones.

### 📝 5. Prompt Setup
Tell the AI how to answer based on the context it gets.

### 🚀 6. Full RAG Pipeline
Connect everything together: retrieve → rerank → generate → display answer.
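
To make step 3 concrete, here is a minimal sketch of what "searchable numbers" means. It is not how FAISS or the notebook's embedding models work internally: the `embed` function below is a hypothetical bag-of-words stand-in for a real embedding model, and the example chunks are invented. The idea it illustrates is the same, though: turn text into vectors, then return the chunks whose vectors are closest to the question's vector.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (hypothetical stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Invented example chunks, not taken from the PDF
chunks = [
    "the warranty covers defects for one year",
    "recycle the battery according to local laws",
    "go to settings to view rf exposure",
]
top = search("how long is the warranty", chunks, k=1)
print(top)
```

A real embedding model captures meaning rather than exact word overlap, which is why the notebook uses one instead of word counts.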


In [None]:
# 📘 Imports:
# This part brings in all the tools we need.
# - PyPDFLoader: Reads PDFs page by page
# - RecursiveCharacterTextSplitter: Breaks large text into smaller, overlapping chunks
# - FAISS: Vector index for fast similarity search over the chunks
# - ChatPromptTemplate and StrOutputParser: Build the prompt and turn the model reply into plain text
# - LLMCall and Embeddings: Helper classes imported from llm_call and embeddings (not part of LangChain)
#   that create the chat model and the text encoder

from operator import itemgetter

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

from llm_call import LLMCall
from embeddings import Embeddings

In [None]:
# This loads a PDF file from your computer.
# Then it breaks the PDF into small parts (called "chunks") so that the AI can read and understand them better.
# Smaller chunks make retrieval more precise and keep each piece within the model's input limits.

pdf_file = 'sample.pdf'
chunk_size = 1000     # maximum characters per chunk
chunk_overlap = 200   # characters shared between neighbouring chunks

loader = PyPDFLoader(pdf_file)
documents = loader.load()

# Split the document into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap
)
texts = text_splitter.split_documents(documents)
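
The effect of `chunk_size` and `chunk_overlap` is easier to see on a tiny string. The sketch below is a simplified fixed-width sliding window, not the actual `RecursiveCharacterTextSplitter` algorithm (which prefers to break on paragraph, line, and word boundaries); the function name `split_with_overlap` is made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one. With real text, that overlap keeps a sentence that straddles a boundary available in full in at least one chunk.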

In [15]:
texts[0]

Document(metadata={'producer': 'Adobe PDF Library 17.0', 'creator': 'Adobe InDesign 19.3 (Macintosh)', 'creationdate': '2024-06-18T14:09:48-07:00', 'moddate': '2024-06-18T14:10:14-07:00', 'trapped': '/False', 'source': 'sample.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='Before using iPhone, review the iPhone User Guide  at  \nsupport.apple.com/guide/iphone .\nSafety and Handling\nSee “Safety, handling, and support” in the iPhone  \nUser Guide .\nExposure to Radio Frequency\nOn iPhone, go to Settings > General > Legal &  \nRegulatory > RF Exposure. Or go to apple.com/  \nlegal/rfexposure .\nBattery and Charging\nAn iPhone battery should only be repaired by a trained \ntechnician to avoid battery damage, which could cause \noverheating, fire, or injury. Batteries should be recycled \nor disposed of separately from household waste and \naccording to local environmental laws and guidelines. For \ninformation about Apple lithium-ion batteries and battery \nservice a

In [16]:
len(texts)

8

## Re-Ranker Code

In [None]:
# This loads a second AI model (a re-ranker).
# It takes the documents returned by the FAISS retriever and scores each one against the question.
# Think of it like ranking Google search results from most to least helpful.
# It uses a pre-trained cross-encoder from BAAI called "bge-reranker-base".

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load reranker model
reranker_tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
reranker_model = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-reranker-base")
reranker_model.eval()  # inference mode: disables dropout

def rerank_documents(query: str, documents: list, top_k: int = 5) -> list:
    """Return the text of the top_k passages most relevant to the query, best first."""
    if not documents:
        return []

    # Convert to list of strings if needed
    passages = [doc.page_content if hasattr(doc, "page_content") else str(doc) for doc in documents]

    # Tokenize each (query, passage) pair together
    inputs = reranker_tokenizer(
        [query] * len(passages),
        passages,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt"
    )

    # One relevance logit per pair; higher means more relevant
    with torch.no_grad():
        scores = reranker_model(**inputs).logits.view(-1).tolist()

    # Sort by score, highest first
    reranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    top_docs = [doc for doc, _ in reranked[:top_k]]

    return top_docs
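
The sorting and cut-off logic can be tried without downloading the model. In the sketch below the logits are hypothetical numbers, not real `bge-reranker-base` outputs, and `top_k_by_score` is an illustrative helper rather than part of the notebook. It also passes the raw logits through a sigmoid, which is a common way to read re-ranker scores as values between 0 and 1; the notebook's function sorts on the raw logits, and the order is the same either way.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def top_k_by_score(passages: list[str], logits: list[float], top_k: int) -> list[tuple[str, float]]:
    """Pair each passage with its normalised score and keep the top_k best."""
    scored = [(p, sigmoid(s)) for p, s in zip(passages, logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


passages = ["battery recycling", "one-year limited warranty", "rf exposure settings"]
logits = [-2.0, 3.0, -4.0]  # hypothetical relevance logits for one question

ranked = top_k_by_score(passages, logits, top_k=2)
for passage, score in ranked:
    print(f"{score:.3f}  {passage}")
```

Note that `top_k` only filters when there are more candidates than `top_k`; with three passages and `top_k=5`, all three come back, just reordered.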


In [None]:
# 📝 Define the custom prompt template used in the final RAG stage
# This sets up how we will ask the AI to answer our question.
# It gives the AI instructions like:
# "Use this context to answer. If you don't know, say you don't know."
# It helps keep answers accurate and based only on your PDF.

rag_template = """
You are a customer service agent for an Apple mobile company. 
You have been given the following information about the customer question and the context.
Customer Query: {question}
Context: {context}

Answer: 
The answer should be based on the context provided.
Your task is to answer the customer question based on the context provided. If the question is not related to the context, do not answer it; instead say "I don't know. Please ask me a question related to Apple mobiles only."
Do not make up any information or provide any personal opinions or experiences.
Please answer in a friendly and professional manner.
"""

In [19]:
rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", rag_template),
    ]
)

print(rag_prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a helpful assistant.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nYou are a customer service agent for an Apple mobile company. \nYou have been given the following information about the customer question and the context.\nCustomer Query: {question}\nContext: {context}\n\nAnswer: \nThe answer should be based on the context provided.\nYour task is to answer the customer question based on the context provided. If the question is not related to the context, do not answer it; instead say "I don\'t know. Please ask me a question related to Apple mobiles only."\nDo not make up any information or provide any personal opinions or experiences.\nPlease answer in a
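
One detail worth knowing about the `{context}` placeholder: with ordinary Python string formatting, a list passed as `context` is rendered in its list form, brackets and quotes included. The sketch below uses plain `str.format` on a shortened template to show the difference; `format_context` is a hypothetical helper and the passages are placeholders. Joining the chunks into one string first is an optional refinement, not something the notebook's chains do.

```python
template = "Customer Query: {question}\nContext: {context}"


def format_context(passages: list[str]) -> str:
    """Join passages into one numbered block of text (hypothetical helper)."""
    return "\n\n".join(f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1))


passages = ["first retrieved chunk ", "second retrieved chunk"]  # placeholder text

# Passing the list directly: the prompt contains Python's list representation
as_list = template.format(question="Is there a warranty?", context=passages)

# Passing a joined string: the prompt contains clean, numbered text
as_text = template.format(question="Is there a warranty?", context=format_context(passages))

print(as_list)
print("---")
print(as_text)
```

Numbering the chunks also makes it easier to ask the model which chunk an answer came from.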

# 📘 Run the Full RAG Pipeline

This is the main part of the program:
1. It takes a question from the user (hard-coded in the cells below).
2. It finds the most closely related text chunks using the retriever.
3. It re-ranks those chunks to bring the most useful ones to the top.
4. It sends those top chunks and the question to the AI model.
5. It prints out the answer from the model!
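
The five steps above can be sketched as one function whose parts are swappable. This is an illustration under stated assumptions, not the notebook's implementation: `answer_question` and the three `fake_*` stand-ins are hypothetical, and the stand-ins replace the real retriever, re-ranker, and LLM so the flow can run anywhere.

```python
from typing import Callable


def answer_question(
    question: str,
    retrieve: Callable[[str], list[str]],
    rerank: Callable[[str, list[str], int], list[str]],
    generate: Callable[[str], str],
    top_k: int = 5,
) -> str:
    """Retrieve candidates, keep the best top_k, and ask the model to answer."""
    candidates = retrieve(question)
    best = rerank(question, candidates, top_k)
    prompt = f"Customer Query: {question}\nContext: " + "\n\n".join(best)
    return generate(prompt)


# --- Hypothetical stand-ins for the real components ---
def fake_retrieve(question: str) -> list[str]:
    return ["chunk about batteries", "chunk about warranty"]


def fake_rerank(question: str, docs: list[str], top_k: int) -> list[str]:
    # Score by shared words instead of a cross-encoder
    words = set(question.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:top_k]


def fake_generate(prompt: str) -> str:
    return f"[stub answer] {prompt}"


result = answer_question("is there a warranty", fake_retrieve, fake_rerank, fake_generate, top_k=1)
print(result)
```

Written this way, switching providers means passing a different `generate` (or `retrieve`) rather than repeating the whole cell.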

## ☁️ Using Azure OpenAI for Embeddings & Generation

In [20]:
# 🧠 Initialize Azure OpenAI Embeddings
open_ai_embeddings = Embeddings.azure_openai()

vectorstore_openai = FAISS.from_documents(
    texts,
    open_ai_embeddings
)

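# as_retriever() returns the 4 most similar chunks by default, so top_k=5 below only reorders them.
# Pass search_kwargs={"k": ...} to give the re-ranker more candidates to choose from.
retriever = vectorstore_openai.as_retriever()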

# 🤖 Initialize Azure OpenAI Chat Model (LLM)
open_ai_llm = LLMCall.azure_openai()

question = "Is there a warranty on the phone?"

retrieved_docs = retriever.invoke(question)
reranked_docs = rerank_documents(question, retrieved_docs, top_k=5)


In [21]:
final_rag_chain = (
    {"context": lambda _: reranked_docs, "question": itemgetter("question")}
    | rag_prompt
    | open_ai_llm
    | StrOutputParser()
)

response = final_rag_chain.invoke({"question": question})
print("📦 Answer:", response)

📦 Answer: Yes, there is a warranty on the phone. Apple offers a One-Year Limited Warranty that covers defects in materials and workmanship for one year from the date of original retail purchase. However, this warranty does not cover normal wear and tear or damage caused by accident or abuse. If you need to obtain service under this warranty, you can call Apple or visit an Apple Store or an Apple Authorized Service Provider. Please remember that you may need to provide proof of purchase when making a claim. For more detailed information, you can visit apple.com/legal/warranty.


## 🤗 Using Hugging Face for Embeddings & Generation

In [None]:
huggingface_embeddings = Embeddings.huggingface()

vectorstore_hf = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_hf = vectorstore_hf.as_retriever()

huggingface_llm = LLMCall.huggingface()

question = "Is there a warranty on the phone?"

retrieved_docs = retriever_hf.invoke(question)
reranked_docs = rerank_documents(question, retrieved_docs, top_k=5)

In [None]:
final_rag_chain = (
    {"context": lambda _: reranked_docs, "question": itemgetter("question")}
    | rag_prompt
    | huggingface_llm
    | StrOutputParser()
)

response = final_rag_chain.invoke({"question": question})
print("📦 Answer:", response)

## 🦙 Using Ollama for Local LLM Inference

In [None]:
ollama_llm = LLMCall.chat_ollama()

vectorstore_ollama = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_ollama = vectorstore_ollama.as_retriever()

question = "Is there a warranty on the phone?"

retrieved_docs = retriever_ollama.invoke(question)
reranked_docs = rerank_documents(question, retrieved_docs, top_k=5)

In [None]:
final_rag_chain = (
    {"context": lambda _: reranked_docs, "question": itemgetter("question")}
    | rag_prompt
    | ollama_llm
    | StrOutputParser()
)

response = final_rag_chain.invoke({"question": question})
print("📦 Answer:", response)

## ⚡ Using Groq Inference API

In [28]:
groq_llm = LLMCall.chat_groq()

# Reuses the Azure OpenAI embeddings from above; pass huggingface_embeddings instead to avoid the Azure dependency.
vectorstore_groq = FAISS.from_documents(
    texts,
    open_ai_embeddings
)

retriever_groq = vectorstore_groq.as_retriever()

question = "Is there a warranty on the phone?"

retrieved_docs = retriever_groq.invoke(question)
reranked_docs = rerank_documents(question, retrieved_docs, top_k=5)

In [None]:
final_rag_chain = (
    {"context": lambda _: reranked_docs, "question": itemgetter("question")}
    | rag_prompt
    | groq_llm
    | StrOutputParser()
)

response = final_rag_chain.invoke({"question": question})
print("📦 Answer:", response)

📦 Answer: Yes, there is a warranty on the phone. According to the Apple One-Year Limited Warranty, the company warrants the included hardware product and accessories against defects in materials and workmanship for one year from the date of original retail purchase. However, please note that this warranty does not cover normal wear and tear, damage caused by accident or abuse. If you have any issues with your device, you can call Apple or visit an Apple Store or an Apple Authorized Service Provider for service. You may be required to furnish proof of purchase details when making a claim under this warranty. For more detailed information, you can visit apple.com/legal/warranty and support.apple.com.


<p style="font-size: 13px; color: black; font-style: italic; margin-top: 8px;">
  <strong>Made with ❤️ by Chirag Bansal</strong>
</p>

