# 🔀 RAG Fusion with Multiple LLM Backends

This notebook implements **RAG Fusion**, an enhancement of standard Retrieval-Augmented Generation (RAG) where **multiple sub-queries** are used to retrieve a more diverse and complete set of context documents.

We fuse the results from each sub-query using **Reciprocal Rank Fusion (RRF)** and pass the unified context to the LLM to generate an accurate answer.

---

## 🔧 LLMs & Embeddings Used:

- **Embeddings:** `sentence-transformers/all-mpnet-base-v2` (Hugging Face)
- **LLMs:**
  - 🧠 Azure OpenAI (`gpt-35-turbo` or `gpt-4`)
  - 🤗 Hugging Face (`gemma-2b-it`)
  - 🦙 Ollama (locally run `llama3.2`)
  - ⚡ Groq Inference API (`llama-3.3-70b-versatile`)

---

## 💡 Highlights:
- Sub-query generation improves recall
- RRF scoring re-ranks retrieved chunks from multiple queries
- You can switch between LLMs by changing one line



In [21]:
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.document_loaders import PyPDFLoader
from langchain.prompts import ChatPromptTemplate
from llm_call import LLMCall
from embeddings import Embeddings
from operator import itemgetter

In [22]:
# Load PDF and split it into chunks
pdf_file = 'sample.pdf'
chunk_size = 1000
chunk_overlap = 200

loader = PyPDFLoader(pdf_file)
documents = loader.load()

# Split the document into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap
)
texts = text_splitter.split_documents(documents)

In [23]:
texts[0]

Document(metadata={'producer': 'Adobe PDF Library 17.0', 'creator': 'Adobe InDesign 19.3 (Macintosh)', 'creationdate': '2024-06-18T14:09:48-07:00', 'moddate': '2024-06-18T14:10:14-07:00', 'trapped': '/False', 'source': 'sample.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='Before using iPhone, review the iPhone User Guide  at  \nsupport.apple.com/guide/iphone .\nSafety and Handling\nSee “Safety, handling, and support” in the iPhone  \nUser Guide .\nExposure to Radio Frequency\nOn iPhone, go to Settings > General > Legal &  \nRegulatory > RF Exposure. Or go to apple.com/  \nlegal/rfexposure .\nBattery and Charging\nAn iPhone battery should only be repaired by a trained \ntechnician to avoid battery damage, which could cause \noverheating, fire, or injury. Batteries should be recycled \nor disposed of separately from household waste and \naccording to local environmental laws and guidelines. For \ninformation about Apple lithium-ion batteries and battery \nservice a

In [24]:
len(texts)

8

## 🧠 RAG Fusion Prompt Template

In RAG Fusion, we enhance the retrieval step by issuing multiple semantically varied versions of a user's question.  
This helps the system fetch a broader and more accurate set of documents across different sub-query perspectives.

### 🔍 Why This Matters:
- Vector similarity search can be overly literal — rephrased queries help surface diverse content
- Reciprocal Rank Fusion (RRF) benefits from having varied but meaningful results from different queries
- It reduces overfitting to a single phrasing and increases the robustness of document retrieval

This template is used to instruct the LLM to generate **5 alternative versions** of the user’s input query.  
Each version expresses the **same intent**, but uses different structure, keywords, or phrasing — ideal for feeding into a multi-query retrieval pipeline, which is later fused via RRF.


In [25]:
rag_fusion_template = """

You are a question(query) generator. You can create multiple questions similar with multiple prespective to the given question with the same meaning.
By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Please generate 5 questions similar to the given question.
Make sure to use different words and phrases to express the same idea.
The questions should be clear and concise.
The questions should be grammatically correct and easy to understand.
The questions should be in English.
Provide these alternative questions separated by newlines.

The question is: {question}

"""

In [26]:
rag_fusion_prompt = ChatPromptTemplate.from_messages(    [
	("system", "You are a helpful assistant."),
	("human", rag_fusion_template),
    ]
)

## 🔁 Reciprocal Rank Fusion (RRF)

RRF is used to merge results from multiple ranked retrievals (from sub-queries).
It improves document diversity and recall, especially when different phrasings produce different top-k results.

The formula used:  
**Score = 1 / (rank + k)**

Where:
- `rank` = position of document in list
- `k` = a constant (default = 60)


In [27]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Retrieve the current score of the document, if any
            previous_score = fused_scores[doc_str]
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

In [28]:
# 📝 Define the custom prompt template used in the final RAG stage

rag_template = """
You are a customer service agent for a apple mobile company. 
You have been given the following information about the customer question and the context.
Customer Query: {question}
Context: {context}

Answer: 
The answer should be based on the context provided.
Your task is to answer the customer question based on the context provided. If the question is not related to the context, please say "I don't know or Do Not Answer it just say please ask me question related to Apple Mobiles only".
Do not make up any information or provide any personal opinions or experiences.
Please answer in a friendly and professional manner.
"""

In [29]:
rag_prompt = ChatPromptTemplate.from_messages(
    [
	("system", "You are a helpful assistant."),
	("human", rag_template),
    ]
)

print(rag_prompt)

input_variables=['context', 'question'] input_types={} partial_variables={} messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a helpful assistant.'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nYou are a customer service agent for a apple mobile company. \nYou have been given the following information about the customer question and the context.\nCustomer Query: {question}\nContext: {context}\n\nAnswer: \nThe answer should be based on the context provided.\nYour task is to answer the customer question based on the context provided. If the question is not related to the context, please say "I don\'t know or Do Not Answer it just say please ask me question related to Apple Mobiles only".\nDo not make up any information or provide any personal opinions or experiences.\nPlease answer in a

## ☁️ Using Azure OpenAI for Embeddings & Generation

In [None]:
# 🧠 Initialize Azure OpenAI Embeddings
open_ai_embeddings = Embeddings.azure_openai()

vectorstore_openai = FAISS.from_documents(
    texts,
    open_ai_embeddings
)

retriever = vectorstore_openai.as_retriever()

# 🤖 Initialize Azure OpenAI Chat Model (LLM)
open_ai_llm = LLMCall.azure_openai()

generate_queries = (
    rag_fusion_prompt 
    | open_ai_llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

question = "Is there a warranty on the phone?"

# 🔄 Build RAG Fusion Retrieval Chain
# 1. Generate sub-queries from original question
# 2. Run document retrieval for each query
# 3. Combine and re-rank results using Reciprocal Rank Fusion (RRF)
retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion

docs = retrieval_chain_rag_fusion.invoke({"question": question})

## 🤖 Azure OpenAI – Final Answer Generation


In [31]:
final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | rag_prompt
    | open_ai_llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'Yes, there is a warranty on the phone. Apple provides a one-year limited warranty that covers defects in materials and workmanship for the included hardware product and accessories, starting from the date of original retail purchase. Please note that this warranty does not cover normal wear and tear or damage caused by accidents or abuse. For more detailed information on obtaining service, you can visit apple.com/legal/warranty or support.apple.com. If you have any further questions, feel free to ask!'

## 🤗 Using Hugging Face for Embeddings & Generation

In [None]:
huggingface_embeddings = Embeddings.huggingface()

vectorstore_hf = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_hf = vectorstore_hf.as_retriever()

huggingface_llm = LLMCall.huggingface()

generate_queries_hf = (
    rag_fusion_prompt 
    | huggingface_llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

# 🔄 Build RAG Fusion Retrieval Chain
# 1. Generate sub-queries from original question
# 2. Run document retrieval for each query
# 3. Combine and re-rank results using Reciprocal Rank Fusion (RRF)
retrieval_chain_rag_fusion_hf = generate_queries_hf | retriever_hf.map() | reciprocal_rank_fusion

docs_hf = retrieval_chain_rag_fusion_hf.invoke({"question": question})

## 🤗 Hugging Face – Final Answer Generation


In [None]:
final_rag_chain_hf = (
    {"context": retrieval_chain_rag_fusion_hf, 
     "question": itemgetter("question")} 
    | rag_prompt
    | huggingface_llm
    | StrOutputParser()
)

final_rag_chain_hf.invoke({"question": question})

## 🦙 Using Ollama for Local LLM Inference

In [None]:
ollama_llm = LLMCall.chat_ollama()

vectorstore_ollama = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_ollama = vectorstore_ollama.as_retriever()

generate_queries_ollama = (
    rag_fusion_prompt 
    | ollama_llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# 🔄 Build RAG Fusion Retrieval Chain
# 1. Generate sub-queries from original question
# 2. Run document retrieval for each query
# 3. Combine and re-rank results using Reciprocal Rank Fusion (RRF)
retrieval_chain_rag_fusion_ollama = (
    generate_queries_ollama 
    | retriever_ollama.map()
    | reciprocal_rank_fusion
)

docs_ollama = retrieval_chain_rag_fusion_ollama.invoke({"question": question})

## 🦙 Ollama – Final Answer Generation


In [None]:
final_rag_chain_ollama = (
    {"context": retrieval_chain_rag_fusion_ollama, 
     "question": itemgetter("question")}
    | rag_prompt
    | ollama_llm
    | StrOutputParser()
)

final_rag_chain_ollama.invoke({"question": question})

## ⚡ Using Groq Inference API

In [None]:
groq_llm = LLMCall.chat_groq()

vectorstore_groq = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_groq = vectorstore_groq.as_retriever()

generate_queries_groq = (
    rag_fusion_prompt 
    | groq_llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# 🔄 Build RAG Fusion Retrieval Chain
# 1. Generate sub-queries from original question
# 2. Run document retrieval for each query
# 3. Combine and re-rank results using Reciprocal Rank Fusion (RRF)
retrieval_chain_rag_fusion_groq = (
    generate_queries_groq 
    | retriever_groq.map()
    | reciprocal_rank_fusion
)

docs_groq = retrieval_chain_rag_fusion_groq.invoke({"question": question})

## ⚡ Groq Inference – Final Answer Generation


In [None]:
final_rag_chain_groq = (
    {"context": retrieval_chain_rag_fusion_groq, 
     "question": itemgetter("question")}
    | rag_prompt
    | groq_llm
    | StrOutputParser()
)

final_rag_chain_groq.invoke({"question": question})

<!-- Font Awesome CDN (Add in <head> if not already included) -->
<link
  rel="stylesheet" 
  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.0/css/all.min.css"
/>

<!-- Social Footer Section -->
<div style="
  background-color:rgb(199, 195, 195);
  padding: 40px 30px;
  border-radius: 20px;
  box-shadow: 0 4px 12px rgba(0,0,0,0.08);
  font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
  font-size: 18px;
  max-width: 900px;
  margin: 60px auto 30px;
  text-align: center;
  color: #444;
">
<!-- End of Notebook Note -->
  <h2 style="margin-bottom: 10px;">📘 End of Notebook</h2>
  <p style="color: #666; font-size: 14px;">
    Thank you for exploring! Feel free to connect via the links below.
  </p>

  <!-- Social Icons -->
<div style="
  display: flex;
  gap: 25px;
  align-items: center;
  flex-wrap: wrap;
  justify-content: center;
  margin-bottom: 25px;
">
  <!-- LinkedIn -->
  <a href="https://www.linkedin.com/in/ChiragB254" target="_blank" style="text-decoration: none; color: #0077b5;">
    <i class="fab fa-linkedin fa-lg"></i> LinkedIn
  </a>

  <!-- GitHub -->
  <a href="https://github.com/ChiragB254" target="_blank" style="text-decoration: none; color: #333;">
    <i class="fab fa-github fa-lg"></i> GitHub
  </a>

  <!-- Instagram -->
  <a href="https://www.instagram.com/data.scientist_chirag" target="_blank" style="text-decoration: none; color: #E1306C;">
    <i class="fab fa-instagram fa-lg"></i> Instagram
  </a>

  <!-- Email -->
  <a href="mailto:devchirag27@gmail.com" style="text-decoration: none; color: #D44638;">
    <i class="fas fa-envelope fa-lg"></i> Email
  </a>

  <!-- X (Twitter) -->
  <a href="https://x.com/ChiragB254" target="_blank" style="text-decoration: none; color: #000;">
    <i class="fab fa-x-twitter fa-lg"></i> X.com
  </a>
  </div>

  <p style="font-size: 13px; color: black; font-style: italic; margin-top: 8px;">
    <strong>Made with ❤️ by Chirag Bansal</strong>
  </p>
</div>

