# **What is Query Decomposition?**
Query decomposition is the process of taking a complex, multi-part question and breaking it into simpler, atomic sub-questions that can each be retrieved and answered individually.

**Why Use Query Decomposition?**

- Complex queries often involve multiple concepts
- LLMs or retrievers may miss parts of the original question
- It enables multi-hop reasoning (answering in steps)
- Allows parallelism (especially in multi-agent frameworks)

In [1]:
from langchain.chat_models import init_chat_model
from langchain.prompts import PromptTemplate
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import RunnableSequence

In [None]:
# Step 1: Load and embed the document
loader = TextLoader("langchain_crewai_dataset.txt")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding)
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "lambda_mult": 0.7})

In [3]:
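# Step 2: Load the API key and initialize the LLM
import os
from dotenv import load_dotenv

# load_dotenv() copies variables from a local .env file into os.environ
load_dotenv()

# Fail early with a clear message instead of a TypeError when the key is missing
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your environment or .env file")

llm = init_chat_model(model="groq:gemma2-9b-it")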
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000001FFD9A0B380>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001FFD97D4050>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [4]:
# Step 3: Query decomposition
decomposition_prompt = PromptTemplate.from_template("""
You are an AI assistant. Decompose the following complex question into 2 to 4 smaller sub-questions for better document retrieval.

Question: "{question}"

Sub-questions:
""")
decomposition_chain = decomposition_prompt | llm | StrOutputParser()

In [None]:
query = "How does LangChain use memory and agents compared to CrewAI?"
decomposition_question=decomposition_chain.invoke({"question": query})

In [6]:
print(decomposition_question)

Here's a breakdown of the complex question into smaller sub-questions:

1. **What types of memory mechanisms does LangChain offer for its models?**  (This focuses on LangChain's specific memory capabilities)
2. **How do LangChain agents utilize memory in their decision-making processes?** (This delves into the practical application of memory within LangChain agents)
3. **What memory and retrieval strategies does CrewAI employ for its AI agents?** (This shifts the focus to CrewAI's approach to memory)
4. **How do the memory and agent functionalities of LangChain and CrewAI differ in terms of implementation and capabilities?** (This encourages a comparative analysis) 


These sub-questions target different aspects of the original query, allowing for more focused and precise document retrieval. 



In [7]:
# Step 4: QA chain per sub-question
qa_prompt = PromptTemplate.from_template("""
Use the context below to answer the question.

Context:
{context}

Question: {input}
""")
qa_chain = create_stuff_documents_chain(llm=llm, prompt=qa_prompt)

In [8]:
# Step 5: Full RAG pipeline logic
def full_query_decomposition_rag_pipeline(user_query):
    # Decompose the query
    sub_qs_text = decomposition_chain.invoke({"question": user_query})
    sub_questions = [q.strip("-•1234567890. ").strip() for q in sub_qs_text.split("\n") if q.strip()]
    
    results = []
    for subq in sub_questions:
        docs = retriever.invoke(subq)
        result = qa_chain.invoke({"input": subq, "context": docs})
        results.append(f"Q: {subq}\nA: {result}")
    
    return "\n\n".join(results)

In [9]:
# Step 6: Run
query = "How does LangChain use memory and agents compared to CrewAI?"
final_answer = full_query_decomposition_rag_pipeline(query)
print("✅ Final Answer:\n")
print(final_answer)

✅ Final Answer:

Q: Here are some sub-questions to break down the complex question:
A: Please provide the complex question! I need to know what question you're trying to break down before I can provide sub-questions. 

For example, you could ask:

* "How does CrewAI work?"
* "What are the advantages of using LangChain agents?"
* "Can you give me an example of a multi-step workflow that could be improved by CrewAI?"


Once you give me the complex question, I can help you break it down into more manageable sub-questions.  



Q: **What types of memory mechanisms does LangChain employ?** (This focuses on LangChain's specific memory capabilities)
A: LangChain uses memory modules such as **ConversationBufferMemory** and **ConversationSummaryMemory**. 


These modules allow the LLM to remember past conversation turns and summarize long interactions. 


Q: **How do LangChain agents utilize memory?** (This explores the application of memory within LangChain's agent framework)
A: LangChain agen
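
The first `Q:` entry in the output above is the model's introductory sentence rather than a real sub-question, because the pipeline treats every non-empty line of the decomposition output as a question. Below is a minimal sketch of a stricter parser, assuming the model returns its sub-questions as a numbered or bulleted list. The helper name `parse_sub_questions` is hypothetical and not part of LangChain.

```python
import re

# Matches lines that start with a list marker such as "1.", "2)", "-", "*" or "•".
_LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.*\S)\s*$")

def parse_sub_questions(llm_output: str) -> list[str]:
    """Keep only list-item lines; strip markdown bold markers and trailing notes."""
    questions = []
    for line in llm_output.splitlines():
        match = _LIST_ITEM.match(line)
        if not match:
            continue  # skip intro/outro sentences and blank lines
        text = match.group(1)
        # Drop a trailing parenthetical comment such as "(This focuses on ...)".
        text = re.sub(r"\s*\([^()]*\)\s*$", "", text)
        questions.append(text.strip("* ").strip())
    return questions

sample = """Here's a breakdown of the complex question:

1. **What types of memory mechanisms does LangChain offer?**  (This focuses on memory)
2. **How do LangChain agents utilize memory?**

These sub-questions target different aspects of the original query."""

print(parse_sub_questions(sample))
```

In `full_query_decomposition_rag_pipeline`, the list comprehension that builds `sub_questions` could call a helper like this instead, so that only genuine sub-questions reach the retriever.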

## **Multi-Hop (Multi-Step) Reasoning in Large Language Models (LLMs)**

**Multi-hop reasoning**, also known as **multi-step reasoning**, refers to an LLM’s ability to **connect multiple pieces of information across different contexts or steps** to arrive at a final answer or decision.
It’s a key capability that separates **surface-level retrieval** from **deep, logical comprehension**: the model combines facts, deduces relationships, and forms conclusions through a sequence of reasoning steps.

---

## **What is Multi-Hop Reasoning?**

Multi-hop reasoning means an LLM performs **sequential inference**, where **each reasoning step builds upon the previous one** to solve complex tasks.

**Example 1:**

> **Question:** “Where was the author of *Pride and Prejudice* born?”

**Reasoning path:**

1. Identify the author of *Pride and Prejudice* → Jane Austen.
2. Retrieve the birthplace of Jane Austen → Steventon, Hampshire, England.
3. **Final Answer:** Steventon, Hampshire, England.

Here, the model needs to **combine two separate facts** — one about the author and another about her birthplace — instead of relying on a single fact lookup.
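
To make the two hops concrete, here is a toy sketch that replaces the LLM and the retriever with two small lookup tables. The dictionaries and the `two_hop_birthplace` helper are illustrative assumptions; only the facts come from the example above.

```python
# Toy "knowledge base": each dictionary answers exactly one kind of single-hop question.
AUTHOR_OF = {"Pride and Prejudice": "Jane Austen"}
BIRTHPLACE_OF = {"Jane Austen": "Steventon, Hampshire, England"}

def two_hop_birthplace(book_title: str) -> str:
    """Answer 'Where was the author of <book> born?' by chaining two lookups."""
    author = AUTHOR_OF[book_title]       # hop 1: book -> author
    birthplace = BIRTHPLACE_OF[author]   # hop 2: author -> birthplace
    return birthplace

print(two_hop_birthplace("Pride and Prejudice"))
```

Neither table can answer the question alone; the answer only becomes reachable once the output of the first hop is used as the input of the second.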

---

**Why Multi-Hop Reasoning Matters**

Many real-world tasks require **chained logic**, that is, reasoning that goes beyond a single retrieval hop.
For example:

* Legal analysis: linking cases and laws.
* Scientific reasoning: combining experimental results with theories.
* RAG systems: synthesizing multi-document knowledge.
* Conversational AI: maintaining long context and logical continuity.

---

**How Multi-Hop Reasoning Works**

There are **two primary mechanisms** by which LLMs perform multi-step reasoning:

**A. Implicit Reasoning (Internal Reasoning)**

The model performs reasoning *within its internal attention and hidden states*, using its trained parameters to simulate reasoning.

**Mechanism:**

* The model encodes the question and relevant context.
* During decoding, it implicitly models dependencies across concepts.
* It “jumps” across related information within its attention mechanism.

**Limitation:** This is opaque (“black-box”) and hard to control or verify.

---

**B. Explicit Reasoning (Externalized or Step-by-Step Reasoning)**

The model is guided to **show its reasoning process explicitly**, usually through **prompt engineering**, **chain-of-thought (CoT)**, or **tool-assisted reasoning**.

**Techniques Include:**

| Method                               | Description                                                                      | Example                         |
| ------------------------------------ | -------------------------------------------------------------------------------- | ------------------------------- |
| **Chain-of-Thought (CoT) Prompting** | Ask the model to "think step by step"                                            | “Let’s think step-by-step: …”   |
| **Self-Consistency**                 | Generate multiple reasoning paths and choose the majority result                 | Reduces random reasoning errors |
| **Tool-Augmented Reasoning**         | Use external tools (like calculators, retrievers, or Python) to assist reasoning | LangChain Agents calling APIs   |
| **Decomposition-based Reasoning**    | Breaks a query into sub-questions and solves each                                | “Who → Where → What” structure  |
| **Tree of Thoughts (ToT)**           | Explore multiple reasoning branches before deciding                              | Simulates search tree reasoning |
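
As an illustration of the self-consistency row, the sketch below picks the majority answer from several reasoning paths. The sampled answers are hard-coded stand-ins for final answers that an LLM would produce when sampled several times at a non-zero temperature; the function name is an assumption made for this example.

```python
from collections import Counter

def self_consistent_answer(sampled_answers: list[str]) -> str:
    """Return the most common final answer among independently sampled reasoning paths."""
    # Normalize case and whitespace so that "Yahoo" and "yahoo " count as the same vote.
    normalized = [answer.strip().lower() for answer in sampled_answers]
    winner, _votes = Counter(normalized).most_common(1)[0]
    return winner

# Stand-ins for the final answers of five sampled reasoning paths.
samples = ["Yahoo", "yahoo ", "Google", "Yahoo", "Meta"]
print(self_consistent_answer(samples))
```

Only the final answers are compared; the intermediate reasoning text of each path is discarded before voting.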

---

**Multi-Hop Reasoning in RAG Systems**

When integrated into **Retrieval-Augmented Generation (RAG)**, multi-hop reasoning allows LLMs to:

1. Retrieve multiple documents across hops (hop-1 → hop-2 → hop-n).
2. Extract relevant parts from each document.
3. Chain them together logically to form a coherent answer.

**Example:**

> **Question:** “Which company acquired the startup founded by the creator of Instagram?”

**Reasoning Path:**

1. Who founded Instagram? → Kevin Systrom, Mike Krieger.
2. What startup did they found later? → Artifact.
3. Which company acquired Artifact? → Yahoo.
4. **Answer:** Yahoo acquired Artifact, the startup founded by the creators of Instagram.

This process requires **multi-document retrieval** and **sequential logic chaining**.
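
The hop-by-hop dependency can be sketched without any LLM. In the toy example below, an exact-match dictionary stands in for the retriever plus answer extraction, and each hop's answer is substituted into the next query. The corpus, the `{prev}` placeholder convention, and the function names are all assumptions made for illustration.

```python
# Toy corpus: each entry holds one fact, keyed by the normalized question it answers.
CORPUS = {
    "who founded instagram": "Kevin Systrom and Mike Krieger",
    "what startup did kevin systrom and mike krieger found later": "Artifact",
    "which company acquired artifact": "Yahoo",
}

def retrieve(query: str) -> str:
    """Stand-in for retrieval plus answer extraction: exact-match lookup."""
    return CORPUS[query.lower().rstrip("?")]

def multi_hop(hop_templates: list[str]) -> list[str]:
    """Run the hops in order, substituting the previous answer into the next query."""
    answers = []
    previous = ""
    for template in hop_templates:
        query = template.format(prev=previous)
        previous = retrieve(query)
        answers.append(previous)
    return answers

hops = [
    "Who founded Instagram?",
    "What startup did {prev} found later?",
    "Which company acquired {prev}?",
]
print(multi_hop(hops))
```

Hop 2 cannot even be phrased until hop 1 has returned, which is why this kind of retrieval is sequential rather than parallel.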

---

**Architectural Mechanisms Behind Multi-Hop Reasoning**

| Component                               | Role                                                                          |
| --------------------------------------- | ----------------------------------------------------------------------------- |
| **Attention Mechanisms**                | Allow the model to connect tokens or concepts across long sequences.          |
| **Transformer Layers**                  | Learn relationships across distant concepts.                                  |
| **Memory Modules (External)**           | Store and retrieve intermediate reasoning states.                             |
| **Tool Integration (LangChain, ReAct)** | Externalizes reasoning by invoking search, calculation, or APIs for each hop. |
| **Reasoning Loops / LCEL**              | Enable iterative “think → act → observe” loops until a final answer is produced. |

---

**Example: Multi-Hop Reasoning Chain with LangChain**

```python
from langchain.chains import SequentialChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # requires the langchain-openai package

# One shared model instance for all three hops
llm = ChatOpenAI(model="gpt-4o-mini")

# Step 1: Find who founded the product named in the question.
# The template must actually use {question}; otherwise the input is ignored.
prompt1 = PromptTemplate(input_variables=["question"],
                         template="Identify the founders of the product mentioned in the question below. "
                                  "Reply with their names only.\n\nQuestion: {question}")
chain1 = LLMChain(llm=llm, prompt=prompt1, output_key="founders")

# Step 2: Find which startup they later founded
prompt2 = PromptTemplate(input_variables=["founders"],
                         template="Which startup was founded later by {founders}?")
chain2 = LLMChain(llm=llm, prompt=prompt2, output_key="startup")

# Step 3: Find which company acquired that startup
prompt3 = PromptTemplate(input_variables=["startup"],
                         template="Which company acquired {startup}?")
chain3 = LLMChain(llm=llm, prompt=prompt3, output_key="acquirer")

# Sequential multi-hop reasoning chain: each output_key feeds the next prompt
multi_hop_chain = SequentialChain(chains=[chain1, chain2, chain3],
                                  input_variables=["question"],
                                  output_variables=["acquirer"])

response = multi_hop_chain.invoke({"question": "Which company acquired the startup founded by the creator of Instagram?"})
# The result is a dict holding the input plus the requested output variable
print(response["acquirer"])
```

This chain decomposes reasoning into **3 logical hops**, each producing an intermediate result.

---

**Strengths of Multi-Hop Reasoning**

- **Improved Depth of Understanding:** Enables LLMs to handle complex, compositional questions.
- **Better Interpretability:** Step-by-step reasoning can be inspected and validated.
- **Generalizable:** Works for text, tables, code, or multi-modal reasoning.
- **Supports RAG Pipelines:** Multi-hop retrieval allows deeper document linking.

---

**Limitations and Challenges**

| Limitation                             | Description                                                                   |
| -------------------------------------- | ----------------------------------------------------------------------------- |
| **Hallucination Propagation**          | Errors early in reasoning can propagate across steps, compounding mistakes.   |
| **Context Window Limitations**         | LLMs may lose context in long reasoning chains.                               |
| **Lack of True Logical Understanding** | Models simulate reasoning but don’t “understand” logic in a symbolic sense.   |
| **Computation Overhead**               | Multi-hop reasoning requires more tokens, leading to higher latency and cost. |
| **Non-Deterministic Outputs**          | Same query may yield different reasoning paths across runs.                   |
| **Difficulty in Evaluation**           | Hard to measure correctness of intermediate reasoning steps.                  |
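
The compounding effect in the first row can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that every hop is correct independently with the same probability; real hops are neither independent nor equally reliable, so treat the numbers as a rough guide.

```python
def chain_accuracy(per_hop_accuracy: float, hops: int) -> float:
    """Probability that every hop is correct, assuming independent and equally reliable hops."""
    return per_hop_accuracy ** hops

# How the chance of a fully correct chain falls as hops are added (90% per hop).
for n in (1, 2, 3, 5):
    print(n, round(chain_accuracy(0.9, n), 3))
```

Under this assumption, a three-hop chain with 90% per-hop accuracy is fully correct only about 73% of the time, which is one reason to keep reasoning chains short.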

---

**Techniques to Improve Multi-Hop Reasoning**

| Technique                       | Description                                                        | Example                                |
| ------------------------------- | ------------------------------------------------------------------ | -------------------------------------- |
| **Chain-of-Thought Prompting**  | Encourage explicit reasoning steps                                 | “Let’s think step by step.”            |
| **Self-Reflection (Reflexion)** | LLM re-evaluates and corrects its previous reasoning               | Used in agentic frameworks             |
| **ReAct Framework**             | Combines reasoning and acting (uses tools during reasoning)        | Think → Search → Answer                |
| **Memory-Augmented RAG**        | Keeps track of reasoning steps for long queries                    | Helps multi-document reasoning         |
| **Tree-of-Thought (ToT)**       | Explores multiple reasoning paths and prunes bad ones              | Used for mathematical or logical tasks |
| **Iterative LCEL Pipelines**    | LangChain Expression Language enables dynamic multi-step execution | `RunnableSequence`, `RunnableBranch`   |

---

**Problems in Current LLM Multi-Hop Reasoning**

* **Brittleness**: Minor rephrasing of a question may break reasoning chains.
* **Overfitting to Patterns**: Models follow memorized reasoning templates instead of generalizing.
* **Opaque Intermediate Steps**: Implicit reasoning steps are hard to trace.
* **Error Amplification**: Misstep at one hop corrupts all later reasoning.
* **Tool Misuse**: In ReAct-style agents, LLMs sometimes call irrelevant tools or repeat actions.

---

**Future Directions and Research Trends**

1. **Neural-Symbolic Reasoning:**
Combining symbolic logic systems with neural LLMs for verifiable reasoning.

2. **Memory-Augmented Transformers:**
Integrating persistent long-term memory modules for multi-hop consistency.

3. **Dynamic RAG Pipelines:**
Adaptive retrievers that perform iterative multi-hop document linking.

4. **Tree-Based Reasoning Models:**
Tree of Thoughts (ToT) and Graph of Thoughts (GoT) architectures for exploring branching reasoning paths.

5. **Self-Verification Loops:**
LLMs validating their own intermediate results using secondary models.

---

**Summary Table**

| Aspect              | Description                                                |
| ------------------- | ---------------------------------------------------------- |
| **Definition**      | Connecting multiple reasoning steps to reach an answer     |
| **Goal**            | Combine different pieces of knowledge logically            |
| **Mechanisms**      | Attention, CoT, ToT, external tools                        |
| **Advantages**      | Deeper understanding, better retrieval, improved RAG       |
| **Limitations**     | Hallucinations, context loss, cost                         |
| **Frameworks and Techniques** | LangChain (LCEL), ReAct, Tree-of-Thoughts        |
| **Use Cases**       | Question answering, scientific discovery, reasoning agents |

---

**Key Takeaways**

* Multi-hop reasoning enables **deep, compositional logic** in LLMs.
* It’s central to **advanced RAG**, **agentic reasoning**, and **knowledge synthesis**.
* Best implemented via **structured prompting**, **tool integration**, and **iterative reasoning loops**.
* Still limited by hallucinations and lack of explainability — ongoing research aims to bridge the gap between **neural reasoning** and **true logical inference**.

## **Query Decomposition in RAG Systems**

**What is Query Decomposition?**

**Query Decomposition** is the process of **breaking down a complex question into simpler, smaller sub-queries**, so that each sub-query can be answered (or retrieved) independently — and then **combined logically** to form a final, coherent answer.

It’s a **core technique in multi-hop RAG systems**, used when a single retrieval operation cannot fetch all the relevant information.

---

**Why Query Decomposition Is Needed in RAG**

A traditional RAG pipeline:

1. Sends a single query to a retriever (e.g., a vector store such as Pinecone or FAISS).
2. Passes the retrieved chunks to the LLM for synthesis.

This **fails when:**

* The question requires *multiple reasoning hops.*
* Needed facts are *spread across different documents or sources.*
* The retriever retrieves semantically similar but *incomplete* information.

**Example:**

> ❓“Which startup founded by the creators of Instagram was later acquired by Yahoo?”

→ Single retrieval may fail because:

* “Instagram creators” and “Yahoo acquisition” occur in *different documents.*

---

**What Query Decomposition Solves**

| Problem                    | Solution via Query Decomposition             |
| -------------------------- | -------------------------------------------- |
| Complex, multi-hop queries | Break into smaller, focused sub-queries      |
| Scattered facts            | Retrieve relevant context for each sub-part  |
| Retrieval inefficiency     | Perform multiple, precise searches           |
| LLM confusion              | Simplify reasoning chain for better accuracy |

---

**How Query Decomposition Works**

**Step-by-Step Process**

1. **Input Query:**
   A complex or multi-hop natural language question.

2. **Decomposition Phase (LLM-based):**
   The question is *decomposed* into multiple, smaller questions — often using an LLM (like GPT-4) or a custom prompt.

3. **Retrieval Phase:**
   Each sub-question is sent to a retriever (e.g., Pinecone, ChromaDB, Elasticsearch) to fetch relevant context.

4. **Synthesis Phase:**
   Retrieved contexts are **merged or chained** and fed into the generator (LLM) to compose the final answer.

---

**Example: Step-by-Step**

**Input Question:**

> “Which company acquired the startup founded by the creator of Instagram?”

**Decomposition:**

1. Who founded Instagram?
2. What startup did that person later found?
3. Which company acquired that startup?

**Retrieval:**

Each sub-question retrieves its own context:

* Q1 → [Kevin Systrom, Mike Krieger]
* Q2 → [Artifact startup]
* Q3 → [Yahoo acquisition]

**Synthesis:**

LLM combines results:

> “Yahoo acquired Artifact, a startup founded by Instagram’s creators Kevin Systrom and Mike Krieger.”
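
The four phases can be sketched as one function with pluggable components. In the example below the decomposer, retriever, and synthesizer are hard-coded stand-ins (assumptions for the demo) so the control flow can run without an LLM or a vector store; in a real pipeline each would be replaced by a model or retriever call.

```python
from typing import Callable

def decomposition_rag(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    synthesize: Callable[[str, list[str]], str],
) -> str:
    """Decompose -> retrieve per sub-question -> merge context -> synthesize."""
    context: list[str] = []
    for sub_question in decompose(question):
        for passage in retrieve(sub_question):
            if passage not in context:  # do not pass the same passage twice
                context.append(passage)
    return synthesize(question, context)

# --- Hypothetical stand-ins for the LLM and the retriever ---
FAKE_INDEX = {
    "Who founded Instagram?": ["Instagram was founded by Kevin Systrom and Mike Krieger."],
    "What startup did they found later?": ["Systrom and Krieger later founded Artifact."],
    "Which company acquired that startup?": ["Yahoo acquired Artifact."],
}

def fake_decompose(question: str) -> list[str]:
    return list(FAKE_INDEX)  # pretend the LLM produced these three sub-questions

def fake_retrieve(sub_question: str) -> list[str]:
    return FAKE_INDEX.get(sub_question, [])

def fake_synthesize(question: str, context: list[str]) -> str:
    return " ".join(context)  # a real synthesizer would prompt an LLM with the context

answer = decomposition_rag(
    "Which company acquired the startup founded by the creator of Instagram?",
    fake_decompose, fake_retrieve, fake_synthesize,
)
print(answer)
```

Keeping the three phases behind plain callables makes it easy to swap in a different retriever or prompt without touching the orchestration logic.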

---

**Decomposition Techniques**

| Technique                              | Description                                                  | Example / Implementation                                       |
| -------------------------------------- | ------------------------------------------------------------ | -------------------------------------------------------------- |
| **Rule-based**                         | Predefined templates split compound questions                | Detect connectives (“and”, “after”) and question words (“who”) |
| **LLM-based Decomposition**            | Use an LLM to rewrite a query into sub-queries               | `Prompt: “Decompose this question into smaller sub-questions”` |
| **Dependency Parsing**                 | Use NLP parse trees to detect relationships                  | “Who founded X?” → “Who acquired Y?”                           |
| **Semantic Graph Decomposition**       | Convert query into graph nodes and edges                     | Uses knowledge graphs                                          |
| **Iterative Decomposition (Self-Ask)** | LLM answers one sub-question, generates the next dynamically | Used in “Self-Ask with Search” approach                        |
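
As an illustration of the rule-based row, the sketch below splits a compound question on "and" only when the word that follows is a question word. The function name and the list of question words are assumptions for this example; a production rule set would need many more patterns.

```python
import re

# Split on "and" only when it is followed by a question word, so that
# noun phrases such as "memory and agents" are left intact.
_SPLIT_PATTERN = r"\s+and\s+(?=(?:who|what|when|where|which|why|how)\b)"

def split_compound_question(question: str) -> list[str]:
    """Split a compound question into stand-alone sub-questions."""
    body = question.strip().rstrip("?")
    parts = [part.strip() for part in re.split(_SPLIT_PATTERN, body, flags=re.IGNORECASE)]
    # Re-capitalize each part and restore the question mark.
    return [part[0].upper() + part[1:] + "?" for part in parts if part]

print(split_compound_question("Who founded Instagram and what startup did they found later?"))
print(split_compound_question("How does LangChain use memory and agents compared to CrewAI?"))
```

The second call returns the question unchanged, which shows the main weakness of rule-based splitting: comparative or implicit multi-part questions have no surface marker to split on, so an LLM-based decomposer is usually needed for them.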

---

**Example Implementation in LangChain**

Here’s how to perform **LLM-driven Query Decomposition** in a RAG pipeline:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # requires the langchain-openai package

# One shared model instance for decomposition and synthesis
llm = ChatOpenAI(model="gpt-4o-mini")

# Step 1: LLM to decompose the query into one sub-question per line
decomposition_prompt = PromptTemplate(
    input_variables=["question"],
    template="Decompose the following question into smaller, logically connected sub-questions, one per line:\n\n{question}"
)

decomposition_chain = LLMChain(
    llm=llm,
    prompt=decomposition_prompt,
    output_key="sub_questions"
)

# Step 2: Retrieve for each sub-question
def retrieve_for_sub_questions(sub_questions, retriever):
    """Collect documents for every non-empty line of the decomposition output."""
    docs = []
    for q in sub_questions.split("\n"):
        q = q.strip()
        if q:  # skip blank lines so the retriever never sees an empty query
            docs.extend(retriever.invoke(q))
    return docs

# Step 3: Synthesize final answer
synthesis_prompt = PromptTemplate(
    input_variables=["question", "context"],
    template="Using the context below, answer the question:\nQuestion: {question}\n\nContext: {context}"
)

synthesis_chain = LLMChain(
    llm=llm,
    prompt=synthesis_prompt,
    output_key="final_answer"
)
```

These components can be composed with **LCEL**, using `RunnableSequence` for the sequential steps and `RunnableMap` (an alias of `RunnableParallel`) to build the inputs for synthesis.

---

**Query Decomposition in LCEL (LangChain Expression Language)**

LCEL allows creating structured workflows like:

```python
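from langchain_core.runnables import RunnableMap, RunnableSequence

# RunnableSequence takes its steps as positional arguments, not as a list.
# `retriever` is assumed to be defined elsewhere (e.g. vectorstore.as_retriever()).
query_decomposition_rag = RunnableSequence(
    decomposition_chain,  # returns {"question": ..., "sub_questions": ...}
    RunnableMap({
        "context": lambda x: retrieve_for_sub_questions(x["sub_questions"], retriever),
        "question": lambda x: x["question"]
    }),
    synthesis_chain,  # adds "final_answer" to the result
)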
```

This lets you **decompose, retrieve, and synthesize** in a single expression graph.

---

**Integration with Hybrid Retrieval**

Query decomposition can work alongside **hybrid retrieval** (dense + sparse retrievers):

* Sub-query 1 → Vector DB (semantic context)
* Sub-query 2 → ElasticSearch/BM25 (keyword facts)
* Combine results → Final synthesis

This improves **recall** (fetching all relevant data) and **precision** (focusing on relevance).
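
The text above does not say how the dense and sparse results are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the method this pipeline uses. The document IDs are placeholders, and `k = 60` is the conventional default for the smoothing constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense_results = ["doc_a", "doc_b", "doc_c"]   # placeholder output of a vector retriever
sparse_results = ["doc_b", "doc_d", "doc_a"]  # placeholder output of a BM25 retriever
print(reciprocal_rank_fusion([dense_results, sparse_results]))
```

RRF uses only rank positions, so it avoids having to normalize cosine similarities and BM25 scores onto a common scale.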

---

**Advantages**

| Benefit                                | Explanation                                            |
| -------------------------------------- | ------------------------------------------------------ |
| **Handles Multi-Hop Reasoning**        | Enables reasoning over multi-document knowledge        |
| **Improves Retrieval Accuracy**        | Smaller, targeted sub-queries lead to higher precision |
| **Simplifies LLM Processing**          | Each hop is easier for the model to understand         |
| **Reduces Hallucination**              | Facts are retrieved explicitly, not guessed            |
| **Scalable for Large Knowledge Bases** | Works across multiple retrievers or databases          |

---

**Limitations**

| Limitation                     | Description                                                  |
| ------------------------------ | ------------------------------------------------------------ |
| **Error Propagation**          | Mistake in one sub-query can affect the entire chain         |
| **Increased Latency**          | Multiple retrievals = higher time/cost                       |
| **Ambiguous Decomposition**    | LLM may generate incorrect or overlapping sub-queries        |
| **Context Merging Complexity** | Combining retrieved chunks can cause redundancy or conflicts |

---

**Best Practices**

- **Prompt-Driven Decomposition:**
Guide the LLM explicitly — e.g., “Break this into minimal, logically dependent sub-questions.”

- **Limit Sub-Query Count:**
Keep decomposition to 3–5 hops to control cost and drift.

- **Filter Duplicates:**
Avoid retrieving overlapping content.

- **Score and Merge Results:**
Use reranking (BM25 + cosine similarity) before synthesis.

- **Add Memory Tracking:**
Store intermediate answers in memory to support iterative reasoning.
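
Two of these practices, limiting the sub-query count and filtering duplicates, can be applied before any retrieval happens. The sketch below is one possible approach; the helper name and the normalization rule (lowercase, collapsed whitespace, no trailing question mark) are assumptions.

```python
def prepare_sub_queries(sub_queries: list[str], max_queries: int = 5) -> list[str]:
    """Drop empty and near-duplicate sub-queries, then cap the list length."""
    seen: set[str] = set()
    unique: list[str] = []
    for query in sub_queries:
        # Normalize so that case, spacing, and a trailing "?" do not hide duplicates.
        key = " ".join(query.lower().split()).rstrip("?")
        if key and key not in seen:
            seen.add(key)
            unique.append(query.strip())
    return unique[:max_queries]

raw = [
    "Who founded Instagram?",
    "who founded  instagram",
    "",
    "Which company acquired Artifact?",
]
print(prepare_sub_queries(raw))
```

This only catches textual duplicates; sub-queries that are phrased differently but mean the same thing would need an embedding-based similarity check.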

---

**Real-World Use Cases**

| Domain                 | Application                                                                 |
| ---------------------- | --------------------------------------------------------------------------- |
| **Legal AI**           | Breaking long legal questions into smaller precedent lookups                |
| **Finance**            | Decomposing investment analysis queries into company and market sub-queries |
| **Biomedical RAG**     | Multi-step retrieval across research papers                                 |
| **Customer Support**   | Chaining user issues → product → solution database queries                  |
| **Knowledge Graph QA** | Mapping sub-queries to nodes/edges in KG                                    |

---

**Query Decomposition + Multi-Hop Reasoning (Synergy)**

These two concepts often **coexist** in advanced RAG systems:

| Stage               | Operation                                  |
| ------------------- | ------------------------------------------ |
| Decomposition       | Break complex question → Sub-questions     |
| Multi-hop Reasoning | Connect sub-answers → Logical chain        |
| Final Generation    | Compose final, contextually aware response |

Together, they enable **deep retrieval and structured reasoning**, the foundation of **agentic RAG systems**.

---

**In Summary**

| Aspect             | Description                                                     |
| ------------------ | --------------------------------------------------------------- |
| **Definition**     | Splitting a complex question into smaller, simpler sub-queries  |
| **Goal**           | Enable accurate retrieval and synthesis for multi-hop reasoning |
| **Key Techniques** | LLM-based decomposition, self-ask, LCEL chains                  |
| **Benefits**       | Better accuracy, less hallucination, improved retrieval recall  |
| **Challenges**     | Latency, error propagation, merging results                     |
| **Used In**        | Advanced RAG, agent frameworks, multi-hop QA                    |

---

**In short:**

> *Query decomposition is the “divide and conquer” strategy of RAG systems — breaking down complex user questions into smaller retrieval problems, each solvable with precision, and then recomposed through multi-hop reasoning for a complete, reliable answer.*
