# 🔍 RAG with Query Decomposition – Simple Explanation

This notebook shows how to build a **RAG (Retrieval-Augmented Generation)** system that uses a special trick called **query decomposition** to answer complex questions better.

---

## 🧠 What's RAG?

RAG means:
1. Take a user question
2. Search for relevant information (chunks) from documents
3. Use a language model (like ChatGPT) to write an answer based on those documents

This helps the model give better answers, especially about information that was not in its training data or that is too large to fit in its context window.
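The three steps can be sketched without any external library. This is a toy illustration rather than the notebook's pipeline: the `tokenize`, `retrieve`, and `build_prompt` helpers and the word-overlap score are assumptions made up for this example, and the final LLM call is left out.

```python
def tokenize(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve(question, chunks, k=2):
    """Return the k chunks that share the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical document chunks, hard-coded for illustration
chunks = [
    "The battery should only be repaired by a trained technician.",
    "The warranty covers defects in materials and workmanship.",
    "Recycle the battery according to local laws.",
]
question = "What does the warranty cover?"

top = retrieve(question, chunks, k=1)   # step 2: search
prompt = build_prompt(question, top)    # step 3: input for the LLM
print(prompt)
```

A real system replaces the word-overlap score with embedding similarity, which is what the rest of this notebook does.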

---

## 🧩 What is Query Decomposition?

Sometimes, the question we ask is **too big** or **too complex** for the system to search properly. So, instead of using just the original question, we break it down into **smaller sub-questions**.

For example:
> **Original Question**: *How does Apple's iPhone differ from Samsung's latest phone in terms of camera and performance?*  
> **Decomposed Queries**:  
> - What is the camera quality of the iPhone?  
> - What is the performance of the iPhone?  
> - What is the camera quality of Samsung's latest phone?  
> - What is the performance of Samsung's latest phone?

By breaking things down, we can search **more effectively**, collect more useful info, and give **better answers**.
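One way to picture the benefit: each sub-question runs its own search, and together the searches cover more of the document than a single query does. The sketch below uses hard-coded, hypothetical search results (no real retriever), and `merge_results` is a helper invented for this example. Later in this notebook each sub-question is answered separately rather than merging chunks like this.

```python
def merge_results(results_per_query):
    """Union the chunks found by each query, keeping first-seen order."""
    seen = set()
    merged = []
    for chunks in results_per_query:
        for chunk in chunks:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


# Hypothetical search results, hard-coded for illustration
single_query_hits = ["iPhone camera specs", "iPhone chip benchmarks"]
sub_query_hits = [
    ["iPhone camera specs"],
    ["iPhone chip benchmarks"],
    ["Samsung camera specs", "iPhone camera specs"],
    ["Samsung chip benchmarks"],
]

merged = merge_results(sub_query_hits)
print(len(single_query_hits), "chunks from the single query")
print(len(merged), "distinct chunks from the sub-queries:", merged)
```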

---

## 🔨 What This Notebook Does

Here’s what happens step-by-step:

1. **Load a PDF file** with content (for example, a manual, research doc, or article)
2. **Split it into chunks** so it's easier to search
3. Take the user's question and **break it into smaller questions** using an LLM (like GPT)
4. Use each sub-question to search for relevant chunks
5. **Answer each sub-question** from the chunks found for it
6. **Combine** the sub-question and answer pairs and feed them to the language model to **generate the final answer**

---

## 🤖 LLMs and Embeddings Used

- **Embeddings** (turn text into vectors): Hugging Face model `sentence-transformers/all-MiniLM-L6-v2`, or Azure OpenAI embeddings
- **LLM for Decomposition and Answering**: Can be Azure OpenAI, Hugging Face, Ollama, or Groq (you choose)
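To make "turn text into vectors" concrete, here is a minimal sketch of comparing a query vector with chunk vectors. The 3-dimensional vectors are hypothetical values chosen by hand (`all-MiniLM-L6-v2` produces 384-dimensional vectors), and cosine similarity is only one common choice of score. The FAISS store used later in the notebook may rank by a different distance.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not produced by a real model
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "warranty chunk": [1.0, 0.0, 1.0],
    "battery chunk": [0.0, 1.0, 0.0],
}

# Score every chunk against the query and pick the closest one
scores = {name: cosine_similarity(query_vec, vec) for name, vec in chunk_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```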

---

## ✅ Why Use Query Decomposition?

- Helps the system understand **complex or multi-part questions**
- Increases the chance of finding **more relevant chunks**
- Makes the final answer **more complete and accurate**

---

> This is a great starting point if you're learning about combining search + AI to build smarter systems!


---

## 📚 Importing Required Libraries

We start by importing all the tools we need from LangChain and standard Python libraries.  
These include document loaders, embedding models, LLM wrappers, and vector databases.


In [1]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate

# Local helper modules from this project (not part of LangChain)
from llm_call import LLMCall
from embeddings import Embeddings

## 📄 Load and Prepare Documents

We load a PDF file and break it into smaller overlapping text chunks using LangChain's text splitter.  
This makes retrieval more accurate, especially for long documents.
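The idea behind `chunk_size` and `chunk_overlap` can be shown with a plain sliding window over characters. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (it also tries to split on separators such as paragraph and line breaks); `chunk_text` is a helper written for this illustration.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.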


In [2]:
# Load PDF and split it into chunks
pdf_file = 'sample.pdf'
chunk_size = 1000
chunk_overlap = 200

loader = PyPDFLoader(pdf_file)
documents = loader.load()

# Split the document into manageable chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=chunk_size, chunk_overlap=chunk_overlap
)
texts = text_splitter.split_documents(documents)



In [None]:
# Show the first text chunk to inspect what the document looks like after splitting

texts[0]

Document(metadata={'producer': 'Adobe PDF Library 17.0', 'creator': 'Adobe InDesign 19.3 (Macintosh)', 'creationdate': '2024-06-18T14:09:48-07:00', 'moddate': '2024-06-18T14:10:14-07:00', 'trapped': '/False', 'source': 'sample.pdf', 'total_pages': 4, 'page': 0, 'page_label': '1'}, page_content='Before using iPhone, review the iPhone User Guide  at  \nsupport.apple.com/guide/iphone .\nSafety and Handling\nSee “Safety, handling, and support” in the iPhone  \nUser Guide .\nExposure to Radio Frequency\nOn iPhone, go to Settings > General > Legal &  \nRegulatory > RF Exposure. Or go to apple.com/  \nlegal/rfexposure .\nBattery and Charging\nAn iPhone battery should only be repaired by a trained \ntechnician to avoid battery damage, which could cause \noverheating, fire, or injury. Batteries should be recycled \nor disposed of separately from household waste and \naccording to local environmental laws and guidelines. For \ninformation about Apple lithium-ion batteries and battery \nservice a

In [None]:
# Check how many chunks were generated from the document

len(texts)

8

## 🧠 Query Decomposition Prompt Template

This prompt helps the LLM take a big or complex question and break it down into three smaller sub-questions.  
Each sub-question focuses on one part of the original, which helps improve retrieval and accuracy.


In [None]:
# Decomposition
# LLM prompt template to decompose user queries into multiple simpler sub-questions

template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

## Decomposition Query Test Cell

In [None]:
# LLM
open_ai_llm = LLMCall.azure_openai()

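# Split the LLM output into one sub-question per line, skipping blank lines
generate_queries_decomposition = (
    prompt_decomposition
    | open_ai_llm
    | StrOutputParser()
    | (lambda x: [line.strip() for line in x.split("\n") if line.strip()])
)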

# Run
question = "Is there a warranty on the phone?"
questions = generate_queries_decomposition.invoke({"question":question})
questions

['1. What is the standard warranty period for smartphones from major manufacturers?',
 '2. How can I check if my phone is still under warranty?',
 '3. What types of damages are typically covered under a smartphone warranty?']
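The output above keeps the list numbering (`1.`, `2.`, ...), so those markers end up in the search queries as well. A possible clean-up step is sketched below. `clean_sub_questions` is a hypothetical helper, not part of the notebook or of LangChain, and the sample string is made up for the example.

```python
import re


def clean_sub_questions(raw_output):
    """Split LLM output into sub-questions, dropping blank lines and list markers."""
    questions = []
    for line in raw_output.split("\n"):
        # Remove leading markers such as "1.", "2)", "-" or "*"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions


# Made-up raw LLM output with numbering, a blank line, and a bullet
raw = (
    "1. What is the standard warranty period?\n"
    "\n"
    "2. How can I check my warranty?\n"
    "- What damages are covered?"
)
cleaned = clean_sub_questions(raw)
print(cleaned)
```

Because LangChain's pipe operator accepts plain callables, a function like this could presumably take the place of the `lambda` at the end of the decomposition chain.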

## Prompts

In [None]:
# 📝 Define the prompt template used to answer each decomposed sub-question

rag_template = """
You are a customer service agent for an Apple mobile company.
Customer Query: \n --- \n {question} \n --- \n

You have been given the following information about the context.
Context: \n --- \n {context} \n --- \n

Use the above context to answer the question: \n {question}

The answer should be based on the context provided.
Your task is to answer the customer question based on the context provided. If the question is not related to the context, do not answer it. Instead say "I don't know. Please ask me a question related to Apple mobiles only."
Do not make up any information or provide any personal opinions or experiences.
Please answer in a friendly and professional manner.
"""

prompt_rag_template = PromptTemplate.from_template(rag_template)

In [None]:
# 📝 Define the prompt template used in the final RAG stage
final_template = """
You are a customer service agent for an Apple mobile company.
Customer Query: {question}

You have been given the following set of question and answer pairs. Use the answers as the context.
Context: {context}

Use the above question and answer pairs to answer the question: \n {question}

The answer should be based on the context provided.
If the question is not related to the context, do not answer it. Instead say "I don't know. Please ask me a question related to Apple mobiles only."
Do not make up any information or provide any personal opinions or experiences.
Please answer in a friendly and professional manner.

Answer:
"""

final_prompt = ChatPromptTemplate.from_template(final_template)

## 🔁 RAG Execution Over Decomposed Queries

This function takes a complex user question, breaks it into sub-questions using a query decomposition chain, and then performs **RAG (Retrieve + Generate)** on each sub-question individually.

### What it does:
1. Generates sub-questions from the original query
2. Retrieves relevant documents for each sub-question using a retriever
3. Passes each sub-question and its context to the LLM to generate an answer
4. Returns all generated answers along with the sub-questions

This helps handle long or multi-topic questions by **splitting and answering them separately**, which improves relevance and precision.
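The data flow can be followed without calling any model by swapping every component for a stub. Everything in this sketch is a hypothetical stand-in: `fake_decompose`, `fake_retrieve`, and `fake_answer` replace the decomposition chain, the retriever, and the LLM, and the canned sentences are invented for the example.

```python
def fake_decompose(question):
    """Stand-in for the decomposition chain: returns fixed sub-questions."""
    return ["How long is the warranty?", "What does the warranty cover?"]


def fake_retrieve(sub_question):
    """Stand-in for the retriever: looks up canned chunks by keyword."""
    store = {
        "long": ["The warranty lasts one year."],
        "cover": ["The warranty covers manufacturing defects."],
    }
    text = sub_question.lower()
    return [chunk for key, chunks in store.items() if key in text for chunk in chunks]


def fake_answer(sub_question, chunks):
    """Stand-in for the LLM: echoes the retrieved context as the answer."""
    return " ".join(chunks) if chunks else "I don't know."


def answer_sub_questions(question):
    """Decompose the question, then retrieve and answer each part separately."""
    sub_questions = fake_decompose(question)
    answers = [fake_answer(q, fake_retrieve(q)) for q in sub_questions]
    return answers, sub_questions


answers, sub_questions = answer_sub_questions("Is there a warranty on the phone?")
for q, a in zip(sub_questions, answers):
    print(f"{q} -> {a}")
```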

In [None]:
def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain, llm, retriever):
    """Run RAG on each sub-question and return (answers, sub_questions)."""

    # Step 1: Use the LLM to generate sub-questions from the main question
    sub_questions = sub_question_generator_chain.invoke({"question": question})

    # Step 2: Build the prompt → LLM → parser chain once and reuse it for every sub-question
    rag_chain = prompt_rag | llm | StrOutputParser()
    rag_results = []

    for sub_question in sub_questions:

        # Step 3: Retrieve chunks relevant to the sub-question
        # (invoke replaces the deprecated get_relevant_documents)
        retrieved_docs = retriever.invoke(sub_question)

        # Step 4: Pass only the chunk text, not the Document metadata, as context
        context = "\n\n".join(doc.page_content for doc in retrieved_docs)
        answer = rag_chain.invoke({"context": context, "question": sub_question})

        # Step 5: Append the result to our answer list
        rag_results.append(answer)

    return rag_results, sub_questions

## 📝 Format Q&A Results for Display

This helper function takes in a list of sub-questions and their answers, and formats them into a readable block of text.

It numbers each question and answer pair for easy review.


In [34]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    
    formatted_string = ""
    
    # Enumerate over each (question, answer) pair
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        # Format the pair and append to the final string
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    
    # Remove trailing newline and return
    return formatted_string.strip()


## ☁️ Using Azure OpenAI for Embeddings & Generation

In [None]:
open_ai_embeddings = Embeddings.azure_openai()

vectorstore_openai = FAISS.from_documents(
    texts,
    open_ai_embeddings
)

azure_retriever = vectorstore_openai.as_retriever()

# 🤖 Initialize Azure OpenAI Chat Model (LLM)

open_ai_llm = LLMCall.azure_openai()

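# Split the LLM output into one sub-question per line, skipping blank lines
generate_queries_decomposition = (
    prompt_decomposition
    | open_ai_llm
    | StrOutputParser()
    | (lambda x: [line.strip() for line in x.split("\n") if line.strip()])
)

# Decompose the question, then retrieve and answer each sub-question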
answers, questions = retrieve_and_rag(question, prompt_rag_template, generate_queries_decomposition, open_ai_llm, azure_retriever)

context = format_qa_pairs(questions, answers)


In [None]:
final_rag_chain = (
    final_prompt
    | open_ai_llm
    | StrOutputParser()
)

ans = final_rag_chain.invoke({"context":context,"question":question})

In [25]:
ans

'Yes, there is a warranty on the phone. Typically, smartphones, including Apple devices, come with a one-year limited warranty that covers defects in materials and workmanship. This warranty does not cover normal wear and tear or damage caused by accidents or abuse. If you have any further questions or need assistance regarding your Apple mobile, feel free to ask!'

## 🤗 Using Hugging Face for Embeddings & Generation

In [None]:
huggingface_embeddings = Embeddings.huggingface()

vectorstore_hf = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_hf = vectorstore_hf.as_retriever()

huggingface_llm = LLMCall.huggingface()

# Split the LLM output into one sub-question per line, skipping blank lines
generate_queries_decomposition = (
    prompt_decomposition
    | huggingface_llm
    | StrOutputParser()
    | (lambda x: [line.strip() for line in x.split("\n") if line.strip()])
)

# Decompose the question, then retrieve and answer each sub-question
answers, questions = retrieve_and_rag(question, prompt_rag_template, generate_queries_decomposition, huggingface_llm, retriever_hf)

context = format_qa_pairs(questions, answers)

In [None]:
final_rag_chain = (
    final_prompt
    | huggingface_llm
    | StrOutputParser()
)

ans = final_rag_chain.invoke({"context":context,"question":question})

In [30]:
ans

'Yes, there is a warranty on the phone. Typically, smartphones, including Apple devices, come with a one-year limited warranty that covers defects in materials and workmanship. This warranty does not cover normal wear and tear or damage caused by accidents or abuse. If you have any further questions or need assistance regarding your Apple mobile, feel free to ask!'

## 🦙 Using Ollama for Local LLM Inference

In [None]:
ollama_llm = LLMCall.chat_ollama()

vectorstore_ollama = FAISS.from_documents(
    texts,
    huggingface_embeddings)

retriever_ollama = vectorstore_ollama.as_retriever()

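# Split the LLM output into one sub-question per line, skipping blank lines
generate_queries_decomposition = (
    prompt_decomposition
    | ollama_llm
    | StrOutputParser()
    | (lambda x: [line.strip() for line in x.split("\n") if line.strip()])
)

# Decompose the question, then retrieve and answer each sub-question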
answers, questions = retrieve_and_rag(question, prompt_rag_template, generate_queries_decomposition, ollama_llm, retriever_ollama)
context = format_qa_pairs(questions, answers)

In [None]:
final_rag_chain = (
    final_prompt
    | ollama_llm
    | StrOutputParser()
)

ans = final_rag_chain.invoke({"context":context,"question":question})

In [29]:
ans

'Yes, there is a warranty on the phone. Typically, smartphones, including Apple devices, come with a one-year limited warranty that covers defects in materials and workmanship. This warranty does not cover normal wear and tear or damage caused by accidents or abuse. If you have any further questions or need assistance regarding your Apple mobile, feel free to ask!'

## ⚡ Using Groq Inference API

In [35]:
groq_llm = LLMCall.chat_groq()

vectorstore_groq = FAISS.from_documents(
    texts,
    open_ai_embeddings)

retriever_groq = vectorstore_groq.as_retriever()

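# Split the LLM output into one sub-question per line, skipping blank lines
generate_queries_decomposition = (
    prompt_decomposition
    | groq_llm
    | StrOutputParser()
    | (lambda x: [line.strip() for line in x.split("\n") if line.strip()])
)

# Decompose the question, then retrieve and answer each sub-question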
answers, questions = retrieve_and_rag(question, prompt_rag_template, generate_queries_decomposition, groq_llm, retriever_groq)
context = format_qa_pairs(questions, answers)

In [36]:
final_rag_chain = (
    final_prompt
    | groq_llm
    | StrOutputParser()
)

ans = final_rag_chain.invoke({"context":context,"question":question})

In [37]:
ans

"Yes, there is a warranty on the phone. The standard warranty period offered by Apple, the phone's manufacturer, is one year from the date of original retail purchase. This warranty covers defects in materials and workmanship, but it does not cover normal wear and tear or damage caused by accident or abuse. If you'd like to know more about what's covered under the warranty or how to file a claim, I'd be happy to help with that as well."

<!-- Font Awesome CDN (Add in <head> if not already included) -->
<link
  rel="stylesheet" 
  href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.0/css/all.min.css"
/>

<!-- Social Footer Section -->
<div style="
  background-color:rgb(199, 195, 195);
  padding: 40px 30px;
  border-radius: 20px;
  box-shadow: 0 4px 12px rgba(0,0,0,0.08);
  font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
  font-size: 18px;
  max-width: 900px;
  margin: 60px auto 30px;
  text-align: center;
  color: #444;
">
<!-- End of Notebook Note -->
  <h2 style="margin-bottom: 10px;">📘 End of Notebook</h2>
  <p style="color: #666; font-size: 14px;">
    Thank you for exploring! Feel free to connect via the links below.
  </p>

  <!-- Social Icons -->
<div style="
  display: flex;
  gap: 25px;
  align-items: center;
  flex-wrap: wrap;
  justify-content: center;
  margin-bottom: 25px;
">
  <!-- LinkedIn -->
  <a href="https://www.linkedin.com/in/ChiragB254" target="_blank" style="text-decoration: none; color: #0077b5;">
    <i class="fab fa-linkedin fa-lg"></i> LinkedIn
  </a>

  <!-- GitHub -->
  <a href="https://github.com/ChiragB254" target="_blank" style="text-decoration: none; color: #333;">
    <i class="fab fa-github fa-lg"></i> GitHub
  </a>

  <!-- Instagram -->
  <a href="https://www.instagram.com/data.scientist_chirag" target="_blank" style="text-decoration: none; color: #E1306C;">
    <i class="fab fa-instagram fa-lg"></i> Instagram
  </a>

  <!-- Email -->
  <a href="mailto:devchirag27@gmail.com" style="text-decoration: none; color: #D44638;">
    <i class="fas fa-envelope fa-lg"></i> Email
  </a>

  <!-- X (Twitter) -->
  <a href="https://x.com/ChiragB254" target="_blank" style="text-decoration: none; color: #000;">
    <i class="fab fa-x-twitter fa-lg"></i> X.com
  </a>
  </div>

  <p style="font-size: 13px; color: black; font-style: italic; margin-top: 8px;">
    <strong>Made with ❤️ by Chirag Bansal</strong>
  </p>
</div>

