## 🔁 **Query Transformations for Smarter Retrieval** | **RAG100X**

This notebook explores how transforming user queries can dramatically improve the quality of retrieval in RAG systems. Instead of relying only on the original (often vague or complex) query, we use LLMs to rewrite, broaden, or break it down — helping the retriever fetch **more accurate and complete results**.

✅ **Key Techniques Covered**
We build and test three query transformation strategies:

- **Query Rewriting** → Makes vague queries more specific  
- **Step-back Prompting** → Broadens narrow queries for better context  
- **Sub-query Decomposition** → Splits complex queries into simpler ones

Each transformed query can then be run through a retriever and compared against the original, which helps show which method returns the most useful chunks.

> 🛠️ **Why this matters in production:**  
In real-world RAG systems (e.g., customer support or legal research), users often ask unclear or overloaded questions. These techniques help reframe the query behind the scenes — so your system doesn’t miss critical documents or generate half-baked answers.

For example, a legal chatbot asked *"What did the 2022 ruling say about tax law in California?"* might miss key background. But a **step-back prompt** like *"What are the major 2022 tax rulings in California?"* ensures broader, more relevant context gets retrieved.

---

### 🔄 **How This Fits into RAG100X**

So far, RAG100X has covered:

1. PDF-based document QA  
2. CSV-based retrieval from structured data  
3. Blog-based RAG with hallucination checks  
4. Chunk-size tuning for better retrieval  
5. Propositional chunking for precision

Now in **Day 6**, we shift focus from *what* you retrieve to *how* you ask for it. Query transformation is a lightweight, modular layer you can add to almost any RAG system — and it works especially well when users submit complex or ambiguous inputs.

> 💡 Think of this as upgrading the "search intent" of your RAG — so your system starts from a better question, not just a better index.


## 📦 Installation & Setup

In [None]:
# Install required packages
!pip install langchain langchain-openai python-dotenv

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate  # stable import path; installed with langchain-openai

import os
from dotenv import load_dotenv

# Load environment variables from a .env file
load_dotenv()

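# Fail early with a clear message if the key is missing.
# load_dotenv() already copies it into os.environ when it is defined in .env;
# assigning os.getenv(...) back would raise a TypeError if the key were absent.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file or environment.")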

## ✍️ 1. Query Rewriting for Better Retrieval

In production RAG systems, user queries are often **too vague** or ambiguous, which leads to poor or irrelevant document retrieval. Query rewriting improves this by **rephrasing queries to be more detailed and retrieval-friendly**.

We use GPT-4o (via OpenAI) to automatically rewrite the user's original question using a tailored prompt.

---

### 🔍 Key Components Explained

- **`ChatOpenAI`**  
  Loads the GPT-4o model. Setting `temperature=0` makes outputs far more repeatable (though not strictly guaranteed to be identical across calls), which helps keep retrieval experiments reproducible.

- **`PromptTemplate`**  
  A reusable template that takes in the original query and formats it into a prompt that tells the LLM *exactly how to rewrite it*.

- **`|` Operator (LangChain's chain pipe)**  
  Chains the prompt and the model into a single callable object — this is syntactic sugar for connecting stages of an LLM pipeline.

- **`.invoke()`**  
  This runs the chained prompt + model on the given input and returns the LLM's response.
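
To make the pipe operator less magical, here is a minimal pure-Python sketch of the idea behind it. It is a toy model, not LangChain's actual implementation: the `Step` class, `format_prompt`, and `fake_model` are hypothetical names invented for illustration, and no LLM is called.

```python
class Step:
    """Toy stand-in for a chain stage: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical stages: a "prompt" that formats text and a fake "model" that upper-cases it
format_prompt = Step(lambda query: f"Rewrite this query: {query}")
fake_model = Step(lambda prompt: prompt.upper())

toy_chain = format_prompt | fake_model
toy_output = toy_chain.invoke("climate impacts")
print(toy_output)
```

The real chain behaves the same way at a high level: the formatted prompt flows into the model, and `.invoke()` runs the whole sequence in one call.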

---

Now let’s define the rewriting chain.


In [None]:
# Load the GPT-4o model with temperature=0 for more repeatable output
re_write_llm = ChatOpenAI(temperature=0, model_name="gpt-4o", max_tokens=4000)

# Create a prompt template that tells the model how to rewrite the query
query_rewrite_template = """You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system. 
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.

Original query: {original_query}

Rewritten query:"""

# Wrap the prompt into a LangChain template object
query_rewrite_prompt = PromptTemplate(
    input_variables=["original_query"],
    template=query_rewrite_template
)

# Chain the prompt and LLM using LangChain's pipe syntax
query_rewriter = query_rewrite_prompt | re_write_llm

# Define a simple function that takes a user query and returns the rewritten version
def rewrite_query(original_query):
    """
    Rewrite the original query to improve retrieval.
    
    Args:
    original_query (str): The original user query
    
    Returns:
    str: The rewritten query
    """
    response = query_rewriter.invoke(original_query)
    return response.content


## 🧪 Demonstration: Query Rewriting in Action

Let’s see how our query rewriting chain performs on a real-world example from the **“Understanding Climate Change”** dataset.

The goal is to show how a vague or general query can be rewritten into a **more specific and information-rich version**, which increases the chance of retrieving relevant chunks.

---

### 🧠 Why this matters

RAG pipelines are only as good as the queries they receive. If a user asks something too broad like _"How does climate change affect the environment?"_, the retriever might return irrelevant or generic chunks.

By rewriting the query to something like:

> “What are the specific environmental consequences of rising global temperatures due to climate change, such as sea-level rise, extreme weather, or biodiversity loss?”

...we give the retriever a **sharper signal** — and improve both recall and answer grounding.

Let’s try it:


In [None]:
# Example query from the "Understanding Climate Change" dataset
original_query = "What are the impacts of climate change on the environment?"

# Use the rewrite function defined earlier
rewritten_query = rewrite_query(original_query)

# Print the before and after
print("Original query:", original_query)
print("\nRewritten query:", rewritten_query)
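
The effect of a rewrite on retrieval can be approximated offline. The sketch below assumes a tiny hand-written corpus and a keyword-overlap scorer standing in for a real embedding retriever. `chunks`, `tokenize`, and `retrieve` are hypothetical helpers, and the rewritten query is hard-coded from the example earlier in this section rather than generated by GPT-4o.

```python
import re

# Hypothetical mini-corpus standing in for indexed chunks
chunks = [
    "Rising global temperatures are driving sea-level rise along coastlines.",
    "Extreme weather events such as heatwaves and floods are becoming more frequent.",
    "The history of environment policy includes several international agreements.",
    "Biodiversity loss accelerates as habitats shift or disappear.",
]

STOPWORDS = {"what", "are", "the", "of", "on", "to", "as", "or",
             "due", "such", "and", "is", "a", "in"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def retrieve(query, k=3):
    """Rank chunks by shared terms with the query; keep the top-k with a non-zero score."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(chunk)), i) for i, chunk in enumerate(chunks)]
    scored = [(score, i) for score, i in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then corpus order
    return [chunks[i] for _, i in scored[:k]]


original = "What are the impacts of climate change on the environment?"
rewritten = ("What are the specific environmental consequences of rising global "
             "temperatures due to climate change, such as sea-level rise, "
             "extreme weather, or biodiversity loss?")

original_hits = retrieve(original)
rewritten_hits = retrieve(rewritten)
print("Original hits:", original_hits)
print("Rewritten hits:", rewritten_hits)
```

In this toy setup the original query only matches the policy chunk through the word "environment", while the rewritten query matches the three chunks about concrete impacts. A real embedding-based retriever would be less brittle, so treat this as an illustration of the mechanism rather than a benchmark.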


## 🔄 2. Step-back Prompting for Broader Context

Another powerful query transformation technique is **Step-back Prompting**.  
Instead of making a query more specific (like query rewriting), this strategy does the opposite — it **zooms out**.

---

### 🧠 Why Step-back Prompting?

Sometimes users ask **narrow or overly specific questions**, which makes it hard for the retriever to return relevant background info. Step-back prompting generates a **more general version** of the query that can surface **high-level context or foundational facts**.

This is especially useful when:
- The user skips context because they assume the system knows it.
- You want to combine fine-grained and broad retrievals to improve answer grounding.

---

### 🔍 Code Breakdown

- **`step_back_llm = ChatOpenAI(...)`**  
  Initializes a GPT-4o model with `temperature=0`, which keeps step-back generation as repeatable as the API allows.

- **`step_back_template`**  
  Defines the prompt used to guide the LLM. It tells the model to generate a **more general version** of the user’s original query. This helps retrieve wider context from the vector store.

- **`PromptTemplate(...)`**  
  Converts the above text into a reusable format. We insert the user’s original query into `{original_query}`.

- **`step_back_chain = step_back_prompt | step_back_llm`**  
  Chains the template with the LLM — meaning we now have a ready-to-use component that takes a user query and outputs a broadened version.

- **`generate_step_back_query()`**  
  A helper function that invokes the LLM chain and extracts the broader query. This is what we’ll call during inference.

> 🛠️ **In practice**:  
A narrow query like _“How does climate change affect coral reefs?”_ might be hard to answer directly if your database lacks reef-specific chunks. A step-back query like _“How does climate change affect marine ecosystems?”_ increases the chance of finding relevant passages — even if the original term doesn’t exist in the index.
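
One way to combine fine-grained and broad retrievals is an order-preserving union: keep the hits for the specific query first, then append step-back hits that are not already present. The sketch below assumes the two hit lists have already been retrieved. `merge_results` and the sample passages are hypothetical and only illustrate the merging step.

```python
def merge_results(specific_hits, broad_hits, limit=4):
    """Order-preserving union: specific hits first, then unseen broad hits."""
    merged, seen = [], set()
    for chunk in list(specific_hits) + list(broad_hits):
        if chunk not in seen:
            seen.add(chunk)
            merged.append(chunk)
    return merged[:limit]


# Hypothetical hits for "How does climate change affect coral reefs?"
specific_hits = ["Ocean warming stresses many marine species."]

# Hypothetical hits for the step-back query about marine ecosystems
broad_hits = [
    "Marine ecosystems are affected by ocean acidification.",
    "Ocean warming stresses many marine species.",
    "Warmer seas alter fish migration patterns.",
]

combined = merge_results(specific_hits, broad_hits)
for chunk in combined:
    print(chunk)
```

Putting the specific hits first is a design choice: it keeps the most on-topic passages at the top of the context while the broader passages fill in background.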


In [None]:
# 1. Initialize the LLM (GPT-4o) with low temperature for consistent outputs
step_back_llm = ChatOpenAI(
    temperature=0,            # Minimizes randomness (near-deterministic, not guaranteed identical)
    model_name="gpt-4o",      # High-quality OpenAI model for reasoning and rewriting
    max_tokens=4000           # Generous token limit for flexibility in generation
)

# 2. Define the prompt template used to generate step-back queries
step_back_template = """You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.

Original query: {original_query}

Step-back query:"""

# 3. Create a PromptTemplate object
# This allows us to format the above prompt dynamically with the user query
step_back_prompt = PromptTemplate(
    input_variables=["original_query"],
    template=step_back_template
)

# 4. Chain the prompt and LLM with the pipe operator (a runnable sequence, not the legacy LLMChain class)
# This lets us treat it like a function: pass a query and get a step-back result
step_back_chain = step_back_prompt | step_back_llm

# 5. Wrap the chain in a Python function for reusability
def generate_step_back_query(original_query):
    """
    Generate a step-back query to retrieve broader context.
    
    Args:
        original_query (str): The original user query
    
    Returns:
        str: A broader, more general version of the query
    """
    response = step_back_chain.invoke(original_query)
    return response.content  # Extract and return the LLM's output

## 🧪 Demonstration: Step-back Prompting for Broader Context

In [None]:
# 🔄 Try step-back prompting on a climate-related question

# Original user query
original_query = "What are the impacts of climate change on the environment?"

# Generate a more general version of this query for background retrieval
step_back_query = generate_step_back_query(original_query)

# Show both queries side-by-side
print("Original query:", original_query)
print("\nStep-back query:", step_back_query)


## 🔍 3. Sub-query Decomposition: Breaking Down Complex Queries

When users ask complex, multi-part questions, a single query often misses important context during retrieval.  
**Sub-query decomposition** is a technique where we break a long query into 2–4 focused, simpler questions — each targeting a different aspect of the original intent.

This has three major benefits:
- ✅ Improves recall by retrieving relevant documents for each sub-question
- ✅ Enables structured, multi-hop reasoning
- ✅ Leads to richer, more complete answers

---

#### 🧠 Why It Works

Think of it like asking a research assistant:

> "Tell me everything about climate change's environmental impact"

...vs...

> - "How does climate change affect oceans?"  
> - "What are the effects on agriculture?"  
> - "What about biodiversity and human health?"

By decomposing, you improve both **retrieval** and **answer generation**.
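
On the generation side, the retrieved context for each sub-query still has to be assembled into one prompt. The sketch below shows one possible layout. `build_answer_prompt`, the prompt wording, and the sample data are assumptions for illustration, not part of the original pipeline.

```python
def build_answer_prompt(original_query, sub_results):
    """Assemble one generation prompt from (sub_query, retrieved_chunks) pairs."""
    sections = []
    for sub_query, retrieved in sub_results:
        # Fall back to a visible marker when a sub-query retrieved nothing
        context = "\n".join(f"- {chunk}" for chunk in retrieved) or "- (no context retrieved)"
        sections.append(f"Sub-question: {sub_query}\nContext:\n{context}")
    body = "\n\n".join(sections)
    return (
        "Answer the question using only the context below.\n\n"
        f"{body}\n\n"
        f"Question: {original_query}\nAnswer:"
    )


# Hypothetical retrieval results for two sub-queries
sub_results = [
    ("How does climate change affect the oceans?", ["Oceans absorb most excess heat."]),
    ("What are the effects on agriculture?", []),
]
prompt = build_answer_prompt(
    "What are the impacts of climate change on the environment?", sub_results
)
print(prompt)
```

Keeping each sub-question next to its own context makes it easier to see which aspect of the original question lacks supporting passages.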

---

#### 🛠️ Key Components Explained

- **`PromptTemplate`**  
  Instructs the LLM to break a query into multiple sub-queries and includes a worked example, which encourages a consistent output format.

- **Prompt `|` LLM chain with GPT-4o**  
  Chains the prompt and LLM together using the pipe operator (the code does not use the legacy `LLMChain` class). GPT-4o is used here for its reasoning ability and structured outputs.

- **`decompose_query()`**  
  This helper function runs the chain and returns a clean list of sub-queries by parsing the model’s output line by line.



In [None]:
# ✅ Load a GPT-4o model to perform sub-query decomposition
sub_query_llm = ChatOpenAI(temperature=0, model_name="gpt-4o", max_tokens=4000)

# 📝 Prompt template to instruct the model on how to break down queries
subquery_decomposition_template = """You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Example query: What are the impacts of climate change on the environment?

Example sub-queries:
1. What are the impacts of climate change on biodiversity?
2. How does climate change affect the oceans?
3. What are the effects of climate change on agriculture?
4. What are the impacts of climate change on human health?

Original query: {original_query}

Sub-queries:"""

# 📦 Wrap the prompt using LangChain's PromptTemplate
subquery_decomposition_prompt = PromptTemplate(
    input_variables=["original_query"],
    template=subquery_decomposition_template
)

# 🔗 Chain the prompt and the LLM with the pipe operator
subquery_decomposer_chain = subquery_decomposition_prompt | sub_query_llm

# 🔍 Function to decompose a query into sub-queries
def decompose_query(original_query: str):
    """
    Decompose the original query into simpler sub-queries.
    
    Args:
    original_query (str): The original complex query
    
    Returns:
    List[str]: A list of simpler sub-queries
    """
    # Invoke the LLM chain
    response = subquery_decomposer_chain.invoke(original_query).content

    # Clean and extract each sub-query line
    sub_queries = [q.strip() for q in response.split('\n') if q.strip() and not q.strip().startswith('Sub-queries:')]
    return sub_queries
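
The parsing step above keeps whatever list markers the model emits (for example `1.`). If downstream code needs clean query strings, a slightly stricter parser can strip them. The sketch below is one possible variant that runs offline on a sample string. `parse_sub_queries` and `sample_output` are hypothetical, and real model output may use formats not covered here.

```python
import re


def parse_sub_queries(raw_output):
    """Split raw LLM text into sub-queries, dropping headers and list markers."""
    sub_queries = []
    for line in raw_output.splitlines():
        line = line.strip()
        if not line or line.lower().startswith("sub-queries"):
            continue
        # Remove leading markers such as "1.", "2)", "-", "*", or a bullet
        cleaned = re.sub(r"^(?:\d+[.)]|[-*•])\s*", "", line)
        if cleaned:
            sub_queries.append(cleaned)
    return sub_queries


# Hypothetical model output mixing several list styles
sample_output = """Sub-queries:
1. What are the impacts of climate change on biodiversity?
2) How does climate change affect the oceans?
- What are the effects of climate change on agriculture?
"""
parsed = parse_sub_queries(sample_output)
for sub_query in parsed:
    print(sub_query)
```

Stripping the markers matters most when each sub-query is embedded separately, since a leading `1.` adds noise to the query text.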


## 🧪 Demonstration: Sub-query Decomposition: Breaking Down Complex Queries

In [None]:
# Example query over the "Understanding Climate Change" dataset
original_query = "What are the impacts of climate change on the environment?"
sub_queries = decompose_query(original_query)
print("\nSub-queries:")
# The model already numbers its output, so print each line as-is
for sub_query in sub_queries:
    print(sub_query)

---

## 📘 Summary & Credits

This notebook is based on the excellent open-source repository [RAG_Techniques by NirDiamant](https://github.com/NirDiamant/RAG_Techniques).  
I referred to that work to understand how the pipeline is structured and then reimplemented the same concept in a **fully self-contained** way, but using recent models — as part of my personal learning journey.

The purpose of this notebook is purely **educational**:  
- To deepen my understanding of Retrieval-Augmented Generation systems  
- To keep a clean, trackable log of what I’ve built and learned  
- And to serve as a future reference for myself or others starting from scratch

To support that, I’ve added clear, concise markdowns throughout the notebook — explaining *why* each package was installed, *why* each line of code exists, and *how* each component fits into the overall RAG pipeline. It’s designed to help anyone (including my future self) grasp the **how** and the **why**, not just the **what**.

## 🔄 Why Improve Queries in RAG?

Most RAG pipelines focus on chunking and retrieval, but **query quality is the real first bottleneck**. Poorly phrased queries lead to irrelevant or incomplete results — even if your chunks and embeddings are perfect.

This notebook explores **three advanced query transformation techniques** to supercharge retrieval:

- ✍️ **Query Rewriting** — Refines vague queries into better-formed, search-optimized versions
- 🔁 **Step-back Prompting** — Adds background context by generating broader, higher-level queries
- 🧩 **Sub-query Decomposition** — Splits complex queries into smaller, atomic sub-queries for multi-hop retrieval

Each method is:
- Powered by **GPT-4o via LangChain**
- Implemented as **modular, reusable LLM chains**
- Tested on the *Understanding Climate Change* dataset

---

## 🧠 What’s New in This Version?

Compared to earlier RAG builds, this version focuses on **query-centric improvements**, offering:

- 🧠 **LLM-first pre-retrieval transformation** — Enhance queries *before* sending them to the retriever  
- 🎯 **Custom prompt templates** — Fine-tuned instructions for consistent, high-quality output  
- 🧪 **Chainable logic** — Each transformation is self-contained and composable with other RAG tools  
- 🧼 **Production-friendly design** — No reliance on external modules, all logic lives inside the notebook for easy reproducibility

This design philosophy enables **plug-and-play enhancements** for production-grade retrieval systems.

---

## 📈 Inferences & Key Takeaways

From running the transformations on the example queries (no retriever is wired up in this notebook, so the retrieval effects below are expectations rather than measurements):

- 🔍 **Rewriting tends to improve keyword alignment**, which should make retrieval from vector stores such as FAISS or Chroma more relevant  
- 🌐 **Step-back prompting adds broader context**, especially for under-specified or abstract queries  
- 🧩 **Sub-query decomposition improves completeness**, which suits multi-hop or reasoning-heavy questions  
- 📊 Overall, combining these techniques should give **higher-quality retrieval inputs**, which in turn support **more accurate, grounded LLM outputs**
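
When several query variants each return a ranked list, the lists need to be fused into one. Reciprocal rank fusion (RRF) is a common way to do that, sketched below. It is not something this notebook implements, and the chunk IDs and rankings are made up for illustration. The constant `k=60` is the value conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse rankings: each item scores sum(1 / (k + rank)) over the lists containing it."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked chunk IDs returned for three query variants
rankings = [
    ["c3", "c1", "c7"],  # rewritten query
    ["c1", "c4", "c3"],  # step-back query
    ["c1", "c7", "c9"],  # one sub-query
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Chunks that appear in several lists rise to the top even if no single variant ranked them first, which is the behavior you want when combining transformations.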

---

## 🚀 What Could Be Added Next?

To evolve this system into a full production-ready layer:

- 🔁 **Evaluate each method automatically**: Use LLM graders to compare retrieval relevance and answer faithfulness across transformed queries  
- 🧠 **Train a lightweight transformer rewriter**: Fine-tune a small sequence-to-sequence model (e.g. T5-small) on pairs of original and rewritten queries for offline use  
- 🔌 **Plug into real retrievers**: Connect to Elastic, Weaviate, or pgvector to observe improvements at scale  
- 📊 **Add scoring dashboards**: Track how each transformation impacts hit rate, relevance, and latency  
- 🧱 **Stack transformations dynamically**: Learn when to apply which technique (e.g. rewrite only if the query is vague, decompose only if it's long)
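
As a starting point for dynamic stacking, a rule-based router can stand in until a learned one exists. The sketch below is a rough heuristic under stated assumptions: the word-count thresholds, the check for " and " or multiple question marks, and the step-back fallback are all hypothetical choices, not tuned values.

```python
def choose_transformations(query, vague_max_words=6, long_min_words=20):
    """Pick transformations with simple heuristics (thresholds are illustrative guesses)."""
    words = query.split()
    lowered = query.lower()
    plan = []
    # Very short queries are treated as vague and get rewritten
    if len(words) <= vague_max_words:
        plan.append("rewrite")
    # Long or multi-part queries get decomposed
    if len(words) >= long_min_words or " and " in lowered or lowered.count("?") > 1:
        plan.append("decompose")
    # Assumed fallback: mid-length, single-part queries get a step-back query
    if not plan:
        plan.append("step_back")
    return plan


print(choose_transformations("climate change effects"))
print(choose_transformations(
    "How does warming affect coral reefs and what can coastal cities do about flooding?"
))
print(choose_transformations("How does climate change affect marine ecosystems over time?"))
```

The returned labels can then be mapped to `rewrite_query`, `decompose_query`, and `generate_step_back_query`. Logging which branch fires for real traffic is a cheap way to find out whether the thresholds need adjusting.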

---

## 💡 Final Word

This notebook is part of my larger personal project, **RAG100X**: a challenge to build and log my journey in RAG from 0 to 100 over the coming months.

It’s not built to impress — it’s built to **progress**.  
Everything here is structured to enable **daily iteration**, focused experimentation, and clean documentation.

If you're exploring RAG from first principles, feel free to use this as a scaffold for your own builds. And of course — check out the original repository for broader implementations and ideas.

