## Query Enhancement - Query Expansion Techniques
In a RAG pipeline, the quality of the query sent to the retriever determines how good the retrieved context is - and therefore, how accurate the LLM's final answer will be.
That's where Query Expansion/ Enhancement comes in.

### üéØ What is Query Enhancement? 
Query enhancement refers to techniques used to improve or reformulate the user query to retrieve better, more relevant documents from the knowledge base. It is especially useful when:
- The original query is short, ambiguous , or under-specified
- You want to broaden the scope to catch synonyms, related phrases, or spelling variants

In [1]:
from langchain_classic.document_loaders import TextLoader
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chat_models import init_chat_model
from langchain_classic.prompts import PromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableMap


In [None]:
# Step 1: TextLoader
loader = TextLoader("langchain.txt", encoding="utf-8", autodetect_encoding=True)
raw_docs = loader.load()

In [6]:
# Step 2: Split the documents
splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)
chunks = splitter.split_documents(raw_docs)

In [8]:
# Step 3: Embedding model and vectore store
embedding_model = HuggingFaceEmbeddings(model="all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embedding_model)

# make MMR retriever
retriever = vector_store.as_retriever(
    search_type = "mmr",
    search_kwargs = {"k":5}
)


In [13]:
# Step 4: LLM and Prompt
from dotenv import load_dotenv
load_dotenv()

llm = init_chat_model("groq:openai/gpt-oss-120b")
llm


ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 32768, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x000001F1608337A0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001F160833CB0>, model_name='openai/gpt-oss-120b', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [15]:
# Query expansion
query_expansion_prompt = PromptTemplate.from_template("""You are a helpful assitant. Expand the following query to improve document retrieval by adding relevent synonyms, technical terms, and useful context.
Original query: "{query}" 
Expanded query:                                                                                                                                                                    
""")

query_expansion_chain = query_expansion_prompt | llm | StrOutputParser()
query_expansion_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='You are a helpful assitant. Expand the following query to improve document retrieval by adding relevent synonyms, technical terms, and useful context.\nOriginal query: "{query}" \nExpanded query:                                                                                                                                                                    \n')
| ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 32768, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x000001F1608337A0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001F160833CB0>, model_name='openai/gpt-oss-120b', model_kwargs={}, groq_api_key=SecretStr('**********'))
| StrOutputPar

In [16]:
query = {"query": "Langchain memory"}
query_expansion_chain.invoke(query)

'**Expanded Search Query**\n\n```\n(LangChain OR "Langchain") AND (memory OR "context management" OR "conversation memory" OR "state persistence" OR "session memory" OR "chatbot memory" OR "memory buffer" OR "memory store" OR "memory module") \nAND ( "ConversationBufferMemory" OR "ConversationSummaryMemory" OR "ConversationEntityMemory" OR "ConversationKGMemory" OR "ConversationReplayMemory" OR "ConversationChain" OR "LLMChain" OR "ChatPromptTemplate" OR "PromptTemplate" ) \nAND ( "vector store" OR "vector database" OR FAISS OR Chroma OR Pinecone OR Weaviate OR Milvus OR Qdrant OR "embedding store" ) \nAND ( "retrieval‚Äëaugmented generation" OR RAG OR "retrieval‚Äëaugmented" OR "knowledge retrieval" ) \nAND ( "state management" OR "session state" OR "persistent storage" OR Redis OR "SQL database" OR "PostgreSQL" OR "MongoDB" OR "DynamoDB" OR "Azure Cosmos DB" ) \nAND ( "LLM" OR "large language model" OR "OpenAI" OR "ChatGPT" OR "Claude" OR "Gemini" OR "Llama" ) \nAND ( "prompt enginee

In [17]:
# RAG answering prompt
answer_prompt = PromptTemplate.from_template(
    """
Answer the question based on the context below.
Context : {context}
Qusetion: {input}
"""
)

document_chain = create_stuff_documents_chain(llm, answer_prompt)


In [19]:
# Full Rag pipeline with query expansion
rag_pipeline = (
    RunnableMap({
        "input": lambda x: x["input"],
        "context": lambda x : retriever.invoke(query_expansion_chain.invoke({"query": x["input"]}))
    })
    | document_chain
)

In [21]:
# Run Query
# query = {"input": "What types of memeory does Langchain support?"}
query = {"input": "What types of memeory does CrewAI support?"}
print(query_expansion_chain.invoke({"query": query}))
response = rag_pipeline.invoke(query)
print("Ansewer:\n", response)


**Expanded query**

```json
{
  "input": "What kinds of memory or data storage does the CrewAI platform support? Specifically, which types of short‚Äëterm (in‚Äëmemory, cache, session) and long‚Äëterm (persistent, vector store, knowledge‚Äëbase, database) memory are available? Does CrewAI provide support for semantic memory (embeddings), episodic memory (conversation history), persistent vector databases (e.g., Pinecone, Weaviate, Milvus), relational storage (PostgreSQL, MySQL), NoSQL stores (MongoDB, Redis), file‚Äëbased storage, or any custom memory back‚Äëends? Please include any technical terms such as ‚Äòvector store‚Äô, ‚Äòembedding cache‚Äô, ‚Äòknowledge graph‚Äô, ‚Äòstate management‚Äô, ‚Äòpersistent store‚Äô, ‚Äòephemeral cache‚Äô, and relevant APIs or configuration options for selecting or extending memory types in CrewAI."
}
```
Ansewer:
 CrewAI inherits the same flexible‚ÄØ*memory*‚ÄØabstractions that LangChain provides, so you can plug in any of the standard memory types t