### 🧠 What is Query Decomposition?
Query decomposition is the process of breaking a complex, multi-part question into simpler, atomic sub-questions. Each sub-question is then used for its own retrieval step and answered individually.

#### ✅ Why Use Query Decomposition?

- Complex queries often involve multiple concepts

- LLMs or retrievers may miss parts of the original question

- It enables multi-hop reasoning (answering in steps)

- It allows sub-questions to be processed in parallel (especially in multi-agent frameworks)
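
The parallelism point can be illustrated with a minimal sketch that uses only the standard library. Here `answer_sub_question` is a hypothetical stand-in for a retrieve-then-answer call, not part of the notebook. Because sub-questions are independent, they can be dispatched concurrently and collected in their original order.

```python
from concurrent.futures import ThreadPoolExecutor


def answer_sub_question(sub_question: str) -> str:
    """Hypothetical stand-in for a retrieve-then-answer call."""
    return f"(answer to: {sub_question})"


def answer_in_parallel(sub_questions: list[str], max_workers: int = 4) -> list[str]:
    """Answer independent sub-questions concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(answer_sub_question, sub_questions))


sub_questions = [
    "How does LangChain use memory?",
    "How does LangChain use agents?",
    "How does CrewAI use memory and agents?",
]
answers = answer_in_parallel(sub_questions)
for q, a in zip(sub_questions, answers):
    print(f"Q: {q}\nA: {a}\n")
```

Threads suit this case because a real retrieve-then-answer call would spend most of its time waiting on network I/O.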

In [2]:
from langchain.chat_models import init_chat_model
from langchain_classic.prompts import PromptTemplate
from langchain_classic.document_loaders import TextLoader
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import RunnableSequence

In [3]:
# Step 1: Load and embed the document
loader = TextLoader("langchain_crewai_dataset.txt")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(docs)

embedding = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding)
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 4, "lambda_mult": 0.7})
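
The retriever above uses `search_type="mmr"` (maximal marginal relevance) with `lambda_mult=0.7`. The sketch below is a simplified pure-Python illustration of the idea, not the library's implementation: each candidate is scored as `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, so values near 1 favor relevance and values near 0 favor diversity. The helper names `cosine` and `mmr_select` and the toy vectors are assumptions made for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.7):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: documents 0 and 1 are near-duplicates; document 2 is different.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
diversity_heavy = mmr_select(query, docs, k=2, lambda_mult=0.3)
print(relevance_only, diversity_heavy)
```

With a low `lambda_mult`, the second pick skips the near-duplicate in favor of the less similar document, which is the behavior MMR is meant to provide when several chunks say almost the same thing.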

In [11]:
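# Step 2: Load the API key and initialize the LLM
import os
from dotenv import load_dotenv
load_dotenv()

# The model below uses the OpenAI provider, so OPENAI_API_KEY is the key that must be available.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")

llm = init_chat_model(model="gpt-5-nano", model_provider="openai", temperature=0)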
llm

ChatOpenAI(profile={'max_input_tokens': 272000, 'max_output_tokens': 128000, 'image_inputs': True, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True, 'structured_output': True, 'image_url_inputs': True, 'pdf_inputs': True, 'pdf_tool_message': True, 'image_tool_message': True, 'tool_choice': True}, client=<openai.resources.chat.completions.completions.Completions object at 0x155c3c830>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x155cab800>, root_client=<openai.OpenAI object at 0x13d450b30>, root_async_client=<openai.AsyncOpenAI object at 0x155cabfe0>, model_name='gpt-5-nano', model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)

In [12]:
# Step 3: Query decomposition
decomposition_prompt = PromptTemplate.from_template("""
You are an AI assistant. Decompose the following complex question into 2 to 4 smaller sub-questions for better document retrieval.

Question: "{question}"

Sub-questions:
""")
decomposition_chain = decomposition_prompt | llm | StrOutputParser()

In [13]:
query = "How does LangChain use memory and agents compared to CrewAI?"
decomposition_question = decomposition_chain.invoke({"question": query})


In [14]:
print(decomposition_question)

Here are four sub-questions to guide document retrieval:

- What memory concepts and implementations do LangChain and CrewAI expose (e.g., short-term vs. long-term memory, persistence, vector stores, retrieval methods, summarization), and how are they configured?

- How are agents designed in LangChain and CrewAI (types of agents, planning vs. reactive approaches, tool invocation patterns, reasoning chains), and what built-in options do they provide?

- How do memory and agents interact in practice (how past interactions influence decisions, session continuity, context/window management, and any limits or trade-offs)?

- What are the customization and extension options for memory and agents in each framework (APIs, docs, examples, tooling for integration with external systems, debugging, and performance considerations)?
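
The model's reply is free text: it opens with an introductory line and then lists the sub-questions as bullets. Turning it into a list takes a little parsing. The sketch below is one possible approach, assuming each sub-question sits on its own line and ends with a question mark. `parse_sub_questions` is a hypothetical helper, not part of LangChain.

```python
import re

# Leading list markers such as "-", "*", "•", "1." or "2)".
_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")
# Optional label such as "Sub-question 1:".
_LABEL = re.compile(r"^sub-?question\s*\d*\s*[:.)-]\s*", re.IGNORECASE)


def parse_sub_questions(text: str) -> list[str]:
    """Extract sub-questions from an LLM reply, one per line."""
    questions = []
    for line in text.splitlines():
        line = _MARKER.sub("", line.strip())
        line = _LABEL.sub("", line).strip()
        # Heuristic: keep only lines that look like questions, which drops
        # introductory lines such as "Here are four sub-questions ...:".
        if line.endswith("?"):
            questions.append(line)
    return questions


raw = """Here are four sub-questions to guide document retrieval:

- What memory concepts does LangChain expose?
2. How are agents designed in CrewAI?
Sub-question 3: How do memory and agents interact?
"""
parsed = parse_sub_questions(raw)
for q in parsed:
    print(q)
```

The question-mark filter is a heuristic; if the model phrases a sub-question as a statement, that line is dropped.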


In [15]:
# Step 4: QA chain per sub-question
qa_prompt = PromptTemplate.from_template("""
Use the context below to answer the question.

Context:
{context}

Question: {input}
""")
qa_chain = create_stuff_documents_chain(llm=llm, prompt=qa_prompt)
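
To make the "stuff" strategy concrete: the retrieved documents are concatenated into a single string and substituted into the `{context}` slot of the prompt. The sketch below mimics that step in plain Python. `Doc` and `stuff_prompt` are hypothetical stand-ins, and the blank-line separator is an assumption about the default behavior rather than a statement of the library's internals.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a retrieved document (hypothetical, not LangChain's class)."""
    page_content: str


QA_TEMPLATE = (
    "Use the context below to answer the question.\n\n"
    "Context:\n{context}\n\n"
    "Question: {input}\n"
)


def stuff_prompt(docs: list[Doc], question: str, separator: str = "\n\n") -> str:
    """Join all document texts into one context string and fill the template."""
    context = separator.join(d.page_content for d in docs)
    return QA_TEMPLATE.format(context=context, input=question)


prompt = stuff_prompt(
    [Doc("LangChain agents use a planner-executor model."), Doc("CrewAI assigns roles to agents.")],
    "How are agents designed?",
)
print(prompt)
```

Because every retrieved chunk lands in one prompt, this strategy works best with a small `k` and modest chunk sizes, as in the retriever configured earlier.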

In [16]:
# Step 5: Full RAG pipeline logic
def full_query_decomposition_rag_pipeline(user_query):
    # Decompose the query
    sub_qs_text = decomposition_chain.invoke({"question": user_query})
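    # Strip list markers and keep only lines ending with "?", so that an
    # introductory line such as "Here are four sub-questions ...:" is not
    # treated as a sub-question (a heuristic, assuming each one ends with "?").
    sub_questions = [
        q.strip().lstrip("-•*0123456789.) ").strip()
        for q in sub_qs_text.split("\n")
        if q.strip().endswith("?")
    ]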
    
    results = []
    for subq in sub_questions:
        docs = retriever.invoke(subq)
        result = qa_chain.invoke({"input": subq, "context": docs})
        results.append(f"Q: {subq}\nA: {result}")
    
    return "\n\n".join(results)

In [17]:
# Step 6: Run
query = "How does LangChain use memory and agents compared to CrewAI?"
final_answer = full_query_decomposition_rag_pipeline(query)
print("✅ Final Answer:\n")
print(final_answer)

✅ Final Answer:

Q: Sub-question 1: What memory models and storage options does LangChain offer for agents (e.g., specific memory primitives, how they store, retrieve, and summarize context)?
A: - The provided context notes that LangChain agents use a planner-executor model with “context-aware memory use across steps.”
- It does not specify any memory models, memory primitives, storage options, or how to store, retrieve, or summarize context.

In short: There is a mention of memory capability, but no concrete details about memory primitives or storage/retrieval/summarization mechanisms in the given text. If you can share more detailed docs or specify the LangChain version, I can give a precise answer.

Q: Sub-question 2: How do LangChain agents handle reasoning, planning, and tool use (e.g., ReAct-style loops), and where does memory participate in that flow?
A: - Core pattern: LangChain agents use a planner-executor loop. The planner decides a sequence of tool invocations (action