### 🧠 What is Query Decomposition?
Query decomposition is the process of taking a complex, multi-part question and breaking it into simpler, atomic sub-questions that can each be retrieved and answered individually.

#### ✅ Why Use Query Decomposition?

- Complex queries often involve multiple concepts

- LLMs or retrievers may miss parts of the original question

- It enables multi-hop reasoning (answering in steps)

- Allows parallelism (especially in multi-agent frameworks)

In [1]:
from pydantic import BaseModel
import os
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_classic.prompts import PromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.runnables import RunnableMap
from langchain_classic.output_parsers import OutputFixingParser
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np



In [3]:
# Step 1: Load and embed the document
loader = TextLoader('langchain_crewai_dataset.txt')
docs = loader.load()

In [4]:
# Step 2: Chunk the text semantically
# Custom semantic chunker with a similarity threshold

class ThresholdSematicChunker:
    def __init__(self, model_name="all-MiniLM-L6-v2", threshold=0.7):
        self.model = SentenceTransformer(model_name)
        self.threshold = threshold
    
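    def split(self, text: str):
        # Naive sentence split on periods; blank fragments are dropped
        sentences = [s.strip() for s in text.split('.') if s.strip()]
        if not sentences:
            # Nothing to embed, so avoid indexing into an empty list below
            return []
        embeddings = self.model.encode(sentences)
        chunks = []
        current_chunk = [sentences[0]]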

        for i in range(1, len(sentences)):
            sim = cosine_similarity([embeddings[i - 1]], [embeddings[i]])[0][0]
            if sim >= self.threshold:
                current_chunk.append(sentences[i])
            else:
                chunks.append(". ".join(current_chunk) + ".")
                current_chunk = [sentences[i]]

        chunks.append(". ".join(current_chunk) + ".")
        return chunks
    
    def split_document(self, docs):
        result = []
        for doc in docs:
            for chunk in self.split(doc.page_content):
                result.append(
                    Document(
                        page_content=chunk,
                        metadata=dict(doc.metadata),  # copy so chunks do not share one dict
                    )
                )
        return result

In [5]:
# Step 2.1: Split Documents
semantic_chunker = ThresholdSematicChunker()
semantic_chunk = semantic_chunker.split_document(docs)
len(semantic_chunk)

378
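To make the threshold rule easier to see, here is a minimal sketch of the same grouping logic run on hand-made vectors instead of real sentence embeddings. The helper names `cosine` and `group_by_threshold` and the 2-D vectors are assumptions for illustration only; the chunker above uses `SentenceTransformer` and scikit-learn instead.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def group_by_threshold(sentences, vectors, threshold=0.7):
    """Start a new chunk whenever adjacent sentences fall below the threshold."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            chunks[-1].append(sentences[i])   # similar enough: extend the chunk
        else:
            chunks.append([sentences[i]])     # topic shift: open a new chunk
    return [". ".join(chunk) + "." for chunk in chunks]

# Hypothetical 2-D "embeddings": the first two sentences point the same way,
# the third points in a different direction.
sentences = ["LangChain has memory", "Memory stores chat history", "CrewAI uses roles"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
chunks = group_by_threshold(sentences, vectors)
print(chunks)
```

Only adjacent sentences are compared, so a chunk can drift gradually across topics; raising the threshold produces more, smaller chunks.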

In [6]:
# Step 3: Create the embedding model
embeding = OpenAIEmbeddings(
    model = "text-embedding-3-small"
)

In [7]:
# Step 4: Vector store

vector_store = FAISS.from_documents(
    semantic_chunk,
    embeding
)

In [8]:
# Step 5: Set up the retriever

retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "lambda_mult": 0.7}
)
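The `lambda_mult` setting trades relevance against diversity in MMR (maximal marginal relevance) search. The sketch below is a simplified, pure-Python illustration of that trade-off, not the FAISS or LangChain implementation; `mmr_select` and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.7):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to anything already picked
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data: doc 1 is a near-duplicate of doc 0, doc 2 covers something else.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
print(mmr_select(query, docs, k=2, lambda_mult=0.7))  # favours relevance
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # favours diversity
```

With these toy vectors, the higher `lambda_mult` keeps the near-duplicate, while the lower value swaps it for the more distinct document.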

In [9]:
# Step 6: Set LLM
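# load_dotenv() has already exported OPENAI_API_KEY; fail early with a clear message if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")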

llm=init_chat_model("openai:o4-mini")

In [11]:
# Step 7: Set JSON output parser

json_output_parser = JsonOutputParser()

In [12]:
# Step 7.2: Query decomposition
output_format = JsonOutputParser().get_format_instructions()
decomposition_prompt = PromptTemplate.from_template(
"""You are an AI assistant. Decompose the following complex question into 2 to 4 smaller sub-questions for better document retrieval.
Return ONLY a valid JSON object (no surrounding text, no explanation) with a "sub_questions" key that holds a list of strings.
{output_format}
Question: "{question}"
"""
)

decomposition_chain = decomposition_prompt | llm | json_output_parser

In [13]:
# Step 7.3: Test decomposition chain
query = "How does LangChain use memory and agents compared to CrewAI?"
decomposition_question=decomposition_chain.invoke({"question": query, "output_format": output_format})
print(decomposition_question)


{'question': 'How does LangChain use memory and agents compared to CrewAI?', 'sub_questions': ['How does LangChain implement and utilize memory in its architecture?', 'How does CrewAI implement and utilize memory in its architecture?', 'How does LangChain employ agents for task orchestration and decision making?', 'How does CrewAI employ agents for task orchestration and decision making?']}
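Because the model's JSON is free-form, it may help to check its shape before using it. The helper below is a hedged sketch: `extract_sub_questions` is a hypothetical name, and the 2-to-4 bounds simply mirror the wording of the prompt.

```python
def extract_sub_questions(payload, min_items=2, max_items=4):
    """Return the cleaned sub-question list, or raise ValueError on a bad shape."""
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    items = payload.get("sub_questions")
    if not isinstance(items, list):
        raise ValueError("missing 'sub_questions' list")
    # Keep only non-empty strings, trimmed of surrounding whitespace
    cleaned = [q.strip() for q in items if isinstance(q, str) and q.strip()]
    if not min_items <= len(cleaned) <= max_items:
        raise ValueError(
            f"expected {min_items}-{max_items} sub-questions, got {len(cleaned)}"
        )
    return cleaned

# Example payload shaped like the parsed output printed above
payload = {
    "question": "How does LangChain use memory and agents compared to CrewAI?",
    "sub_questions": [
        "How does LangChain implement memory?",
        "  How does CrewAI implement memory?  ",
    ],
}
print(extract_sub_questions(payload))
```

Raising on a malformed payload makes a bad decomposition fail loudly instead of surfacing later as a `KeyError`.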


In [14]:
# Step 8: QA chain per sub question
qa_prompt = PromptTemplate.from_template(
"""Use the context below to answer the question.
Context:
{context}
Question: {input}
""") 

qa_chain = create_stuff_documents_chain(
    llm=llm,
    prompt=qa_prompt
)
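A "stuff" chain places all retrieved passages into the prompt's `{context}` slot at once. The sketch below imitates that idea with plain strings; `stuff_prompt` is a hypothetical helper, and the blank-line separator is an assumption rather than a statement about the library's internals.

```python
QA_TEMPLATE = """Use the context below to answer the question.
Context:
{context}
Question: {input}
"""

def stuff_prompt(question, passages, separator="\n\n"):
    """Join retrieved passages and drop them into the prompt's {context} slot."""
    context = separator.join(p.strip() for p in passages if p.strip())
    return QA_TEMPLATE.format(context=context, input=question)

prompt_text = stuff_prompt(
    "What memory does LangChain offer?",
    [
        "LangChain provides ConversationBufferMemory.",
        "It also provides ConversationSummaryMemory.",
    ],
)
print(prompt_text)
```

Since every passage goes into a single prompt, the number of retrieved chunks (`k`) directly affects prompt length.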



In [22]:
# Step 9: Full RAG pipeline logic

def full_query_decomposition_rag_pipeline(user_query):
    """
    Decompose the query, then retrieve and answer each sub-question separately.
    """
    results = []
    decomposition = decomposition_chain.invoke({"question": user_query, "output_format": output_format})
    for sub_question in decomposition["sub_questions"]:
        # Retrieve context for this sub-question only, then answer it
        docs = retriever.invoke(sub_question)
        answer = qa_chain.invoke({"input": sub_question, "context": docs})
        results.append(f"Q: {sub_question}\nA: {answer}")

    return "\n\n".join(results)
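The pipeline above answers sub-questions one after another. Since they are independent, they could also run concurrently. The following is a minimal sketch using only the standard library, with `answer_sub_question` as a hypothetical stand-in for the retrieve-then-answer step.

```python
from concurrent.futures import ThreadPoolExecutor

def answer_sub_question(sub_question):
    """Stand-in for the retrieve-then-answer step (hypothetical stub)."""
    return f"(answer to: {sub_question})"

def answer_in_parallel(sub_questions, answer_fn=answer_sub_question, max_workers=4):
    """Answer each sub-question concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the inputs
        answers = list(pool.map(answer_fn, sub_questions))
    return "\n\n".join(f"Q: {q}\nA: {a}" for q, a in zip(sub_questions, answers))

report = answer_in_parallel(["What is A?", "What is B?"])
print(report)
```

Threads suit this workload because each call mostly waits on network I/O; in the real pipeline `answer_fn` would wrap the retriever and QA chain calls.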

In [23]:
# Step 10: Run query
query = "How does LangChain use memory and agents compared to CrewAI?"
final_answer = full_query_decomposition_rag_pipeline(query)
print("✅ Final Answer:\n")
print(final_answer)

✅ Final Answer:

Q: What memory capabilities does LangChain provide?
A: LangChain today ships with built-in “chat memory” modules, most notably:

1. ConversationBufferMemory  
   – Keeps the entire back-and-forth in memory so you can feed the full transcript (or a sliding window of it) back into your next prompt.  

2. ConversationSummaryMemory  
   – As the dialog grows, it periodically condenses earlier turns into a running summary, so you maintain context without blowing out token limits.

Q: How does LangChain implement agents?
A: LangChain’s agents are built around a “planner–executor” architecture:  
1. Planner: the LLM breaks down a user’s goal into a sequence of steps (i.e. which tools to call, in what order).  
2. Executor: each planned step is dispatched to the appropriate tool (web searches, calculators, code-execution sandboxes, custom APIs, etc.) and the results are fed back into the planner until the overall task is complete.

Q: What memory capabi