### 📖 Where We Are

**In the previous section on Advanced Retrieval**, we focused on improving the *retrieval process itself* with techniques like Hybrid Search, Reranking, and MMR. We learned how to find and prioritize better documents from our vector store.

**In this new section on Query Enhancement**, we shift our focus to the beginning of the pipeline: the **user's query**. We'll learn techniques that transform a user's initial, often terse, query into a more effective one *before* it reaches the retriever. This notebook introduces the first of these techniques: **Query Expansion**.

### 1. Query Expansion

In a RAG pipeline, the quality of the query sent to the retriever determines the quality of the retrieved context, which in turn dictates the accuracy of the LLM’s final answer. **Garbage in, garbage out.**

Query Expansion is a technique that uses an LLM to reformulate a user's query to be more specific, comprehensive, and better aligned with the language in the source documents. This is especially useful when the original query is short, ambiguous, or uses different terminology than your documents.
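
To make the terminology-mismatch problem concrete, here is a minimal sketch that uses plain keyword overlap as a stand-in for retrieval. The document chunk and both queries are invented for illustration, and the expanded query is hand-written rather than produced by an LLM. The retriever used later in this notebook is embedding-based, so treat this only as an analogy for why extra vocabulary can help.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query: str, document: str) -> int:
    """Count the distinct tokens shared by the query and the document."""
    return len(tokenize(query) & tokenize(document))


# Hypothetical document chunk and queries, invented for this example.
document = "ConversationBufferMemory stores the full chat history between turns."
original_query = "LangChain memory"
# Hand-written stand-in for what an LLM expander might return.
expanded_query = "LangChain memory, conversation history, chat buffer, stored turns"

print("original overlap:", overlap(original_query, document))
print("expanded overlap:", overlap(expanded_query, document))
```

The short query shares no whole-word tokens with the chunk, while the expanded one matches on `chat`, `history`, and `turns`.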

In [1]:
# --- Standard LangChain and Environment Imports ---
import os
from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chat_models import init_chat_model
from langchain.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableMap

In [2]:
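# Load API keys (e.g. GROQ_API_KEY) from a local .env file.
load_dotenv()
# "groq:<model>" selects the Groq provider; this requires the langchain-groq package.
llm = init_chat_model("groq:openai/gpt-oss-20b")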

In [3]:
# --- 1. Load, Split, and Index Documents ---
# This follows the standard RAG indexing pipeline.
loader = TextLoader("langchain_crewai_dataset.txt")
raw_docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(raw_docs)

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embedding_model)

# Create our base retriever.
retriever = vectorstore.as_retriever(search_type="mmr", search_kwargs={"k": 5})


### 2. Creating the Query Expansion Chain
This is a small, dedicated chain whose only job is to take a user's query and use an LLM to make it better for retrieval.

In [4]:
# We create a prompt that specifically instructs the LLM to expand the query.
# It's crucial to tell it NOT to answer the question, but to improve it for a search.
query_expansion_prompt = PromptTemplate.from_template("""
You are an AI assistant. Your task is to expand the user's query to improve document retrieval.
Add relevant synonyms, technical terms, and broader concepts related to the original query.
Do not answer the query, only expand it.

Original query: "{query}"

Expanded query:""")

# Create a simple LCEL chain for this task.
query_expansion_chain = query_expansion_prompt | llm | StrOutputParser()

In [5]:
# Let's test the expansion chain in isolation.
original_query = "Langchain memory"
expanded_query = query_expansion_chain.invoke({"query": original_query})

print("Original Query:", original_query)
print("\nExpanded Query:", expanded_query)

Original Query: Langchain memory

Expanded Query: LangChain memory, LangChain memory modules, LangChain memory interface, LangChain memory adapters, LangChain memory providers, LangChain memory store, LangChain memory buffer, LangChain memory persistence, LangChain memory caching, LangChain memory management, LangChain memory architecture, LangChain memory implementation, LangChain memory component, LangChain memory classes, LangChain memory objects, LangChain memory state, LangChain memory context, LangChain memory usage patterns, LangChain memory optimization, LangChain memory plugin, LangChain memory chain, LangChain memory persistence layer, LangChain memory retrieval, LangChain memory system, LangChain memory repository, LangChain memory database, LangChain memory features, LangChain memory usage, LangChain memory management patterns, LangChain memory in conversational AI, LangChain memory in RAG, LangChain memory in chatbot, LangChain memory for LLM, LangChain memory in LLM appli
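
The expansion printed above is long and repetitive, which may dilute the embedding of the query. One possible mitigation, sketched here under the assumption that the LLM returns a comma-separated list, is to de-duplicate the terms and cap their number before retrieval. `trim_expansion` and `max_terms` are hypothetical names introduced for this example, not part of LangChain.

```python
def trim_expansion(expanded: str, max_terms: int = 8) -> str:
    """Split a comma-separated expansion, drop duplicates, keep the first max_terms."""
    seen = set()
    kept = []
    for term in expanded.split(","):
        if len(kept) >= max_terms:
            break
        cleaned = term.strip()
        key = cleaned.lower()  # compare case-insensitively
        if not cleaned or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return ", ".join(kept)


# Shortened, hand-made sample in the style of the output above.
sample = (
    "LangChain memory, LangChain memory modules, langchain memory, "
    "LangChain memory buffer, LangChain memory store"
)
print(trim_expansion(sample, max_terms=3))
```

In the notebook, a helper like this could be appended to the expansion chain, for example as `query_expansion_chain | RunnableLambda(trim_expansion)` with `RunnableLambda` imported from `langchain_core.runnables`. Tightening the prompt (for instance, asking for a fixed number of terms) is an alternative worth trying first.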

### 3. Building the Full RAG Pipeline with Expansion
Now, we'll construct a complete RAG pipeline that incorporates our `query_expansion_chain`. We use a `RunnableMap` to manage the flow of data, ensuring the *original* query is used for the final answer while the *expanded* query is used for retrieval.

In [6]:
# First, define the final answering chain, which takes context and a question.
answer_prompt = PromptTemplate.from_template("""
Answer the question based on the context below.
Context: {context}
Question: {input}
Answer:""")
document_chain = create_stuff_documents_chain(llm=llm, prompt=answer_prompt)

# Now, build the full pipeline using a RunnableMap.
rag_pipeline = (
    RunnableMap({
        # The 'context' is generated by a sub-chain:
        # 1. The original 'input' from the user is passed to the `query_expansion_chain`.
        # 2. The result (the expanded query) is then used to invoke the retriever.
        "context": lambda x: retriever.invoke(query_expansion_chain.invoke({"query": x["input"]})),
        # The 'input' is passed through directly from the original user query.
        # This ensures the LLM answers the user's actual question.
        "input": lambda x: x["input"],
    })
    | document_chain
)
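
To see the data flow without any LangChain machinery, the following plain-Python sketch mimics what the map step does: every entry receives the same input dictionary, and the results are collected under the entry's key. `run_map`, `fake_expand`, and `fake_retrieve` are hypothetical stand-ins written for this illustration. The real `RunnableMap` can also run its entries concurrently, which this sequential sketch does not attempt.

```python
from typing import Any, Callable


def run_map(steps: dict[str, Callable[[dict], Any]], inputs: dict) -> dict:
    """Call every step with the same inputs and collect results under each step's key."""
    return {key: step(inputs) for key, step in steps.items()}


def fake_expand(query: str) -> str:
    """Stub for the LLM-based expansion chain (hypothetical)."""
    return query + ", conversation buffer, chat history"


def fake_retrieve(query: str) -> list[str]:
    """Stub for the retriever (hypothetical): echoes the query it was given."""
    return [f"doc matching: {query}"]


state = run_map(
    {
        # Retrieval sees the expanded query...
        "context": lambda x: fake_retrieve(fake_expand(x["input"])),
        # ...while the answering step still receives the original question.
        "input": lambda x: x["input"],
    },
    {"input": "LangChain memory"},
)
print(state)
```

The resulting dictionary has exactly the `context` and `input` keys that the answering prompt expects.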

In [7]:
# Run the full pipeline with a simple query.
query = {"input": "What types of memory does LangChain support?"}
response = rag_pipeline.invoke(query)
print("✅ Answer:\n", response)

✅ Answer:
 LangChain provides two built‑in memory types that help an LLM keep track of or condense prior dialogue:

| Memory type | What it does |
|-------------|--------------|
| **ConversationBufferMemory** | Stores the entire conversation history (or a configurable buffer) so the model can refer to earlier turns. |
| **ConversationSummaryMemory** | Keeps a running summary of the conversation, compressing long interactions into a concise recap that stays within token limits. |

These modules let you choose whether to retain the full transcript or a summarized version when feeding context back into the LLM.


In [8]:
# Run the pipeline with a more ambiguous query.
query = {"input": "CrewAI agents?"}

# First, let's see how the query gets expanded.
expanded_query = query_expansion_chain.invoke({"query": query['input']})
print("Expanded Query:\n", expanded_query)

# Now, run the full RAG pipeline.
response = rag_pipeline.invoke(query)
print("\n✅ Answer:\n", response)

Expanded Query:
 CrewAI agents, crew AI assistants, crew artificial intelligence agents, crew AI bots, crew AI software agents, crew AI autonomous agents, crew AI multi‑agent systems, crew AI agent architecture, crew AI agent framework, crew AI agent development, crew AI agent integration, crew AI agent use cases, crew AI agent applications, crew AI agent technologies, crew AI agent software, crew AI agent platform, crew AI agent solutions, crew AI agent system, crew AI agent design, crew AI agent capabilities, crew AI agent benefits, crew AI agent challenges, crew AI agent examples, crew AI agent research, crew AI agent industry, crew AI agent deployment, crew AI agent training, crew AI agent evaluation, crew AI agent monitoring, crew AI agent performance, crew AI agent safety, crew AI agent reliability, crew AI agent ethics, crew AI agent regulation, crew AI agent policy, crew AI agent governance, crew AI in aviation, crew AI in maritime, crew AI in space missions, crew AI for crew m

### 🔑 Key Takeaways

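* **Query Expansion Can Boost Recall**: Reformulating a user's query to be more comprehensive increases the chance of retrieving relevant documents, especially for short or ambiguous queries. It can also drift: in the `CrewAI agents?` run above, the expansion wandered into aviation and maritime crews, so inspect the expanded text before trusting it.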
* **LLMs Make Capable Expanders**: An LLM with a specific prompt can automatically generate synonyms, related technical terms, and clearer phrasings for your retriever. The prompt must tell it to expand the query, not answer it.
* **Isolate the Expansion Logic**: It's a best practice to create a dedicated chain for query expansion. This keeps your pipeline modular and easy to debug.
* **`RunnableMap` for Complex Flows**: LangChain's `RunnableMap` (an alias of `RunnableParallel`) is a convenient way to orchestrate RAG pipelines. It passes the same input to several runnables, can run them in parallel, and collects their outputs into the dictionary the next step needs (here, the `context` and `input` keys).
* **Balance Performance and Cost**: Remember that query expansion adds an extra LLM call, which introduces latency and cost. It's a powerful tool that should be used when the improvement in retrieval quality justifies the trade-off.