### 📖 Where We Are

**In the last notebook**, we explored **Query Expansion**, a technique to make a simple user query more comprehensive to improve retrieval recall.

**In this notebook**, we'll tackle the opposite problem: what to do when a user's query is too **complex**. We'll learn about **Query Decomposition**, a powerful strategy where we use an LLM to break down a single, multi-part question into several simpler sub-questions. This allows our RAG system to find focused, high-quality context for each part of the original query, leading to more complete and accurate answers.

### 1. What is Query Decomposition?

Query decomposition is the process of taking a complex, multi-part question and breaking it into simpler, atomic sub-questions, each of which can be retrieved for and answered on its own. The individual answers are then combined to address the original query.

#### ✅ Why Use Query Decomposition?

- **Improves Retrieval**: A single vector search for a complex query (e.g., "compare A and B") folds both topics into one query embedding, so the top results may not be the best documents for either A or B individually.
- **Reduces Missed Information**: It ensures that every part of the user's question is addressed.
- **Enables Multi-Hop Reasoning**: It allows the system to gather facts from different documents and combine them to answer a question that isn't explicitly answered in any single chunk.
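
The retrieval point above can be illustrated with a toy sketch. It is not the notebook's retriever: it assumes a made-up three-sentence corpus and uses plain word overlap (Jaccard similarity) in place of embeddings, with `k=1` so the effect is easy to see. All names here (`tokenize`, `score`, `top_k`, `corpus`) are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    # Lowercase the text and keep alphabetic words only.
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, doc: str) -> float:
    # Jaccard similarity: shared words divided by all distinct words.
    q, d = tokenize(query), tokenize(doc)
    return len(q & d) / len(q | d)

def top_k(query: str, corpus: list[str], k: int = 1) -> list[str]:
    # Return the k highest-scoring documents for the query.
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

# Hypothetical mini-corpus, invented for this illustration.
corpus = [
    "LangChain memory stores conversation history",
    "CrewAI agents collaborate through roles",
    "FAISS indexes vectors for similarity search",
]

complex_query = "LangChain memory versus CrewAI agents"
sub_queries = ["LangChain memory", "CrewAI agents"]

# One search with a budget of k=1 can surface only one of the two topics.
single = top_k(complex_query, corpus, k=1)

# One search per sub-query gives each topic its own retrieval budget.
decomposed = [top_k(q, corpus, k=1)[0] for q in sub_queries]

print("Single search:", single)
print("Decomposed   :", decomposed)
```

With a larger `k` the single search would eventually return both documents, but the two topics would still compete for the same slots; decomposition gives each sub-question its own.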

In [1]:
# --- Standard LangChain and Environment Imports ---
import os
from dotenv import load_dotenv
from langchain.chat_models import init_chat_model
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain.chains.combine_documents import create_stuff_documents_chain

In [2]:
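# Load API keys from the .env file into the process environment.
load_dotenv()
# load_dotenv() already exports GROQ_API_KEY, so no reassignment is needed.
# Fail early with a clear message if it is missing (assigning None to os.environ raises a TypeError).
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file.")
llm = init_chat_model(model="groq:gemma2-9b-it")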

In [3]:
# --- 1. Load and Index Documents ---
# This follows the standard RAG indexing pipeline.
loader = TextLoader("langchain_crewai_dataset.txt")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding)
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "lambda_mult": 0.7})


### 2. Creating the Decomposition Chain
This is the first step in our pipeline. We create a dedicated chain that uses an LLM to break down the user's complex question into a list of simpler sub-questions.

In [4]:
# We use a specific prompt to instruct the LLM to act as a query decomposer.
# The prompt guides the model to identify and separate the distinct parts of the original question.
decomposition_prompt = PromptTemplate.from_template("""
You are an expert at query decomposition. Your task is to break down a complex user question into 2-4 simpler, self-contained sub-questions. 
These sub-questions will be used to retrieve relevant documents.

Question: "{question}"

Sub-questions (one per line):
""")

# Create a simple LCEL chain for this decomposition task.
decomposition_chain = decomposition_prompt | llm | StrOutputParser()

In [5]:
# Let's test the decomposition chain in isolation.
query = "How does LangChain use memory and agents compared to CrewAI?"
decomposed_questions = decomposition_chain.invoke({"question": query})

print("Original Query:", query)
print("\nDecomposed Sub-Questions:")
print(decomposed_questions)

Original Query: How does LangChain use memory and agents compared to CrewAI?

Decomposed Sub-Questions:
1. What types of memory mechanisms does LangChain support?
2. How do agents function in LangChain?
3. What memory capabilities does CrewAI offer?
4. How do agents work in CrewAI? 
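
The chain returns one raw string, and the model may add list numbering, bullets, or a preamble line. A minimal sketch of turning that string into a clean list is shown below. The helper name `parse_sub_questions` and the sample text are assumptions for illustration, and the "ends with a question mark" filter is a heuristic that would drop sub-questions phrased as statements.

```python
import re

def parse_sub_questions(raw_text: str) -> list[str]:
    """Turn the decomposer's raw text into a clean list of questions."""
    questions = []
    for line in raw_text.splitlines():
        # Drop leading list markers such as "1.", "2)", "-", "*", or "•".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        # Keep only lines that look like questions. This skips blank lines
        # and preambles such as "Here are the sub-questions:".
        if cleaned.endswith("?"):
            questions.append(cleaned)
    return questions

# Hypothetical model output mixing a preamble, numbering, and a bullet.
sample = """Here are the sub-questions:
1. What types of memory mechanisms does LangChain support?
2. How do agents function in LangChain?

- What memory capabilities does CrewAI offer?
"""

parsed = parse_sub_questions(sample)
for question in parsed:
    print(question)
```

Removing the numbering before retrieval keeps tokens like "1." out of the text that gets embedded.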





### 3. Building the Full RAG Pipeline
Now we'll create a function that orchestrates the entire process:
1.  Take the user's query.
2.  Send it to the `decomposition_chain` to get the sub-questions.
3.  For each sub-question, run a standard RAG process (retrieve -> generate answer).
4.  Combine the answers into a final response.

In [6]:
# First, we need a standard QA chain. This chain will be called for EACH sub-question.
qa_prompt = PromptTemplate.from_template("""
Use the following context to answer the question.
Context: {context}
Question: {input}
Answer:""")
qa_chain = create_stuff_documents_chain(llm=llm, prompt=qa_prompt)

# Now, create the function that ties everything together.
def full_query_decomposition_rag_pipeline(user_query: str) -> str:
    # 1. Decompose the original query into sub-questions.
    sub_qs_text = decomposition_chain.invoke({"question": user_query})
    # 2. Split the LLM's output into non-empty lines, one sub-question per line (any list numbering is kept).
    sub_questions = [q.strip() for q in sub_qs_text.strip().split("\n") if q.strip()]
    
    # 3. Initialize an empty list to store the results from each sub-query.
    results = []
    
    print(f"--- Decomposed into {len(sub_questions)} sub-questions ---")
    
    # 4. Loop through each sub-question and execute a RAG chain for it.
    for subq in sub_questions:
        # a. Retrieve relevant documents for the *current* sub-question.
        docs = retriever.invoke(subq)
        # b. Use the QA chain to generate an answer for the sub-question.
        result = qa_chain.invoke({"input": subq, "context": docs})
        # c. Store the formatted Q&A pair.
        results.append(f"Sub-Question: {subq}\nAnswer: {result}")
    
    # 5. Combine all the individual answers into a single final string.
    return "\n\n".join(results)
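
The function above combines the sub-answers by concatenation. If a single fused answer is wanted, one option is a final LLM call over the collected Q&A pairs. The sketch below is an assumption, not part of the original pipeline: the names `build_synthesis_prompt` and `synthesize_answer` are hypothetical, the prompt wording is illustrative, and `llm_call` stands for any function that maps a prompt string to a response string. A fake model is used so the block runs without API keys.

```python
from typing import Callable

def build_synthesis_prompt(original_question: str,
                           qa_pairs: list[tuple[str, str]]) -> str:
    # Format each (sub-question, answer) pair as a labelled finding.
    findings = "\n\n".join(
        f"Sub-question: {q}\nAnswer: {a}" for q, a in qa_pairs
    )
    return (
        "Using only the findings below, write one coherent answer "
        "to the original question.\n\n"
        f"Original question: {original_question}\n\n"
        f"Findings:\n{findings}\n\n"
        "Final answer:"
    )

def synthesize_answer(original_question: str,
                      qa_pairs: list[tuple[str, str]],
                      llm_call: Callable[[str], str]) -> str:
    # Delegate the final write-up to whatever model function is supplied.
    return llm_call(build_synthesis_prompt(original_question, qa_pairs))

# Stand-in for a real model so the sketch is runnable offline.
def fake_llm(prompt: str) -> str:
    return f"[model response to a {len(prompt)}-character prompt]"

pairs = [
    ("What memory does LangChain support?", "Buffer and summary memory."),
    ("How do agents work in CrewAI?", "Placeholder answer for illustration."),
]
question = "How does LangChain use memory and agents compared to CrewAI?"
prompt = build_synthesis_prompt(question, pairs)
final = synthesize_answer(question, pairs, fake_llm)
print(final)
```

With the notebook's `llm`, something like `lambda p: llm.invoke(p).content` could plausibly serve as `llm_call`. The pipeline would also need to collect `(subq, result)` tuples rather than pre-formatted strings.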

In [7]:
# --- 4. Run the Full Pipeline ---
final_answer = full_query_decomposition_rag_pipeline(query)
print("\n✅ Final Synthesized Answer:\n")
print(final_answer)

--- Decomposed into 4 sub-questions ---

✅ Final Synthesized Answer:

Sub-Question: 1. What types of memory mechanisms does LangChain utilize?
Answer: LangChain uses memory modules like **ConversationBufferMemory** and **ConversationSummaryMemory**.  


Sub-Question: 2. How do agents function within the LangChain framework?
Answer: LangChain agents function using a **planner-executor model**.  Here's a breakdown:

* **Planner:** The agent uses its understanding of the task and available tools to devise a sequence of actions (tool invocations) needed to achieve the goal. This involves reasoning, decision-making, and potentially branching logic based on the situation.
* **Executor:** The agent then carries out the planned actions, interacting with the tools and processing their outputs.  

Crucially, LangChain agents maintain **context-aware memory** across these steps. This means they remember previous actions, tool outputs, and other relevant information to make informed decisions in s

### 🔑 Key Takeaways

* **Decomposition for Complexity**: Query Decomposition is well suited to complex, multi-part user questions that a single search would struggle to cover.
* **LLM as a Query Planner**: The core of this technique is using an LLM with a specific prompt to act as a "query planner," breaking a large task into smaller, manageable sub-tasks (the sub-questions).
* **Divide and Conquer**: The strategy follows a "divide and conquer" approach. By breaking down the problem, the RAG system can retrieve highly focused context for each sub-question, leading to more accurate individual answers.
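* **Orchestration is Key**: The implementation requires a controlling function or a more advanced framework (like LangGraph) to manage the multi-step process: decompose, loop through sub-queries, execute RAG for each, and combine the results. In this notebook the final step is plain concatenation of the Q&A pairs; a further LLM call would be needed for a truly synthesized answer.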
* **Performance Trade-off**: This is a powerful but resource-intensive method. For N sub-questions, the pipeline in this notebook makes 1 + N LLM calls (one decomposition plus one answer per sub-question) and N retrievals, so latency and cost are higher than for a single RAG query.
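
Because the sub-questions are independent of each other, part of the latency can be recovered by answering them concurrently. The sketch below uses the standard library's `ThreadPoolExecutor`. The helper name `answer_sub_questions_concurrently` and the `answer_fn` parameter are assumptions for illustration; in this notebook `answer_fn` would wrap the retrieve-then-answer step for one sub-question.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def answer_sub_questions_concurrently(
    sub_questions: list[str],
    answer_fn: Callable[[str], str],
    max_workers: int = 4,
) -> list[tuple[str, str]]:
    # Threads suit this workload because each call mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        answers = list(pool.map(answer_fn, sub_questions))
    return list(zip(sub_questions, answers))

# Stand-in for "retrieve documents, then ask the LLM" so the sketch runs offline.
def fake_answer(sub_question: str) -> str:
    return f"answer to {sub_question}"

results = answer_sub_questions_concurrently(["q1", "q2", "q3"], fake_answer)
for sub_question, answer in results:
    print(sub_question, "->", answer)
```

This reduces wall-clock time only. The number of LLM calls and tokens is unchanged, and provider rate limits may cap how many workers are useful.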