### Query Enhancement – Query Expansion Techniques

In a RAG pipeline, the quality of the query sent to the retriever determines how good the retrieved context is, and therefore how accurate the LLM's final answer will be.

That's where query expansion (a form of query enhancement) comes in.

#### 🎯 What is Query Enhancement?
Query enhancement refers to techniques used to improve or reformulate the user query to retrieve better, more relevant documents from the knowledge base.
It is especially useful when:

- The original query is short, ambiguous, or under-specified
- You want to broaden the scope to catch synonyms, related phrases, or spelling variants
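To make the idea concrete before bringing in an LLM, here is a minimal sketch of expansion using a hand-written synonym table. The `SYNONYMS` table and the `expand_query` helper are hypothetical and only illustrate the mechanism; the notebook itself asks an LLM to produce the expansion.

```python
# Hypothetical synonym table; the notebook uses an LLM to generate expansions instead.
SYNONYMS = {
    "memory": ["conversation history", "chat state"],
    "agents": ["autonomous agents", "multi-agent system"],
}


def expand_query(query: str, synonyms: dict = SYNONYMS) -> str:
    """Append known related phrases for each word found in the query."""
    extras = []
    for word in query.lower().split():
        key = word.strip("?.,!")  # ignore trailing punctuation when looking up a word
        for phrase in synonyms.get(key, []):
            if phrase not in extras:
                extras.append(phrase)
    # Leave the query untouched when no related phrases are known.
    return query if not extras else f"{query} ({', '.join(extras)})"


print(expand_query("LangChain memory"))
```

A short query such as "LangChain memory" gains related phrases that may match chunks worded differently from the user's input.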

In [9]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_classic.prompts import PromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains.retrieval import create_retrieval_chain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableMap

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

In [None]:
## Step 1: Load dataset
loader = TextLoader("langchain_crewai_dataset.txt")
row_docs = loader.load()

In [4]:
# Step 2: Semantic chunking
### Custom Semantic Chunker With Threshold

class ThresholdSematicChunker:
    def __init__(self, model_name="all-MiniLM-L6-v2", threshold=0.7):
        self.model = SentenceTransformer(model_name)
        self.threshold = threshold
    
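    def split(self, text: str):
        # Naive sentence split on periods; abbreviations and decimals are split too.
        sentences = [s.strip() for s in text.split('.') if s.strip()]
        if not sentences:
            return []  # nothing to embed for empty or whitespace-only text
        embeddings = self.model.encode(sentences)
        chunks = []
        current_chunk = [sentences[0]]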

        for i in range(1, len(sentences)):
            sim = cosine_similarity([embeddings[i - 1]], [embeddings[i]])[0][0]
            if sim >= self.threshold:
                current_chunk.append(sentences[i])
            else:
                chunks.append(". ".join(current_chunk) + ".")
                current_chunk = [sentences[i]]

        chunks.append(". ".join(current_chunk) + ".")
        return chunks
    
    def split_document(self, docs):
        result = []
        for doc in docs:
            for chunk in self.split(doc.page_content):
                result.append(
                    Document(
                        page_content=chunk,
                        metadata=dict(doc.metadata)  # copy so chunks do not share one dict
                    )
                )
        return result
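The grouping rule in the chunker can be checked without downloading an embedding model. The sketch below applies the same threshold logic to hand-made 2-D vectors; the vectors and the `group_by_threshold` name are illustrative assumptions, not part of the notebook.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_threshold(sentences, vectors, threshold=0.7):
    """Start a new chunk whenever adjacent sentences fall below the threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            current.append(sentences[i])
        else:
            chunks.append(". ".join(current) + ".")
            current = [sentences[i]]
    chunks.append(". ".join(current) + ".")
    return chunks


# Toy vectors: the first two sentences point the same way, the third does not.
sentences = ["LangChain has memory", "Memory stores chat history", "CrewAI defines agents"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(group_by_threshold(sentences, vectors))
```

Only adjacent sentences are compared, so a chunk can drift in topic over many sentences as long as each neighbouring pair stays above the threshold.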

In [8]:
# Step 2.1: Split Documents

semantic_chunker = ThresholdSematicChunker()

semantic_chunk = semantic_chunker.split_document(row_docs)

len(semantic_chunk)

378

In [10]:
# Step 3: Vector store
embedding_model = OpenAIEmbeddings(
     model="text-embedding-3-small"
)

vector_store = FAISS.from_documents(
    semantic_chunk,
    embedding_model
)

In [11]:
# Step 4: MMR retriever

retriever = vector_store.as_retriever(
    search_type = "mmr",
    search_kwargs={"k":5}
)

retriever

VectorStoreRetriever(tags=['FAISS', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x72469a313b90>, search_type='mmr', search_kwargs={'k': 5})
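Step 4 selects `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against redundancy among the documents already picked. The following is a minimal pure-Python sketch of that selection rule on toy vectors. It is not LangChain's implementation, and the `mmr` function and its vectors are assumptions made for illustration. LangChain exposes a similarly named `lambda_mult` setting through `search_kwargs`; check the documentation for the version you have installed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of k documents chosen greedily by marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.3]
doc_vecs = [[1.0, 0.0], [1.0, 0.1], [0.2, 1.0]]  # docs 0 and 1 are near-duplicates
print(mmr(query_vec, doc_vecs, k=2))                    # balances relevance and diversity
print(mmr(query_vec, doc_vecs, k=2, lambda_mult=1.0))   # pure relevance ranking
```

With `lambda_mult=0.5` the second pick skips the near-duplicate of the first document in favour of a less similar one; with `lambda_mult=1.0` the rule reduces to plain similarity ranking.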

In [12]:
# Step 5: LLM and Prompts

import os
from dotenv import load_dotenv
load_dotenv()

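# load_dotenv() has already copied the key from .env into os.environ; fail early if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")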

llm=init_chat_model("openai:o4-mini")
llm

ChatOpenAI(profile={'max_input_tokens': 200000, 'max_output_tokens': 100000, 'image_inputs': True, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True, 'structured_output': True, 'image_url_inputs': True, 'pdf_inputs': True, 'pdf_tool_message': True, 'image_tool_message': True, 'tool_choice': True}, client=<openai.resources.chat.completions.completions.Completions object at 0x724698fd73b0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x724698fd7470>, root_client=<openai.OpenAI object at 0x724698fd4530>, root_async_client=<openai.AsyncOpenAI object at 0x72469fbcc980>, model_name='o4-mini', model_kwargs={}, openai_api_key=SecretStr('**********'), stream_usage=True)

In [13]:
# Step 6: Query expansion

query_expansion_prompt = PromptTemplate.from_template("""
You are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms, technical terms, and useful context.

Original query: "{query}"

Expanded query:
""")

query_expansion_chain = query_expansion_prompt|llm|StrOutputParser()
query_expansion_chain

PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms, technical terms, and useful context.\n\nOriginal query: "{query}"\n\nExpanded query:\n')
| ChatOpenAI(profile={'max_input_tokens': 200000, 'max_output_tokens': 100000, 'image_inputs': True, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True, 'structured_output': True, 'image_url_inputs': True, 'pdf_inputs': True, 'pdf_tool_message': True, 'image_tool_message': True, 'tool_choice': True}, client=<openai.resources.chat.completions.completions.Completions object at 0x724698fd73b0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x724698fd7470>, root_client=<openai.OpenAI object at 0x724698fd4530>, root_async_client=<openai.AsyncOpenAI

In [14]:
query_expansion_chain.invoke(
    {
        "query":"Langchain memory"
    }
)

'Here’s one possible expanded query that adds synonyms, technical terms, and useful context around “Langchain memory”:\n\n“LangChain memory” OR  \n“LangChain memory management” OR  \n“LangChain memory adapter” OR  \n“LangChain memory module” OR  \n“ConversationBufferMemory” OR  \n“ConversationSummaryMemory” OR  \n“MultiSessionMemory” OR  \n“vector store memory” OR  \n“embedding store” OR  \n“persistent memory” OR  \n“session memory” OR  \n“chat memory” OR  \n“state management” OR  \n“contextual memory” OR  \n“memory buffer” OR  \n“RAG” OR  \n“retrieval-augmented generation” OR  \n“LLM memory” OR  \n“context window” OR  \n“Redis adapter” OR  \n“MongoDB memory” OR  \n“LangChain Python” OR  \n“LangChain JavaScript” OR  \n“LangChain Java”'
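The expansion above comes back with a conversational preamble and boolean `OR` operators, neither of which carries meaning for an embedding-based retriever. One possible post-processing step, sketched below, keeps only the quoted phrases and joins them into plain text. The `clean_expanded_query` helper is hypothetical, and whether it improves retrieval on this dataset is an assumption to verify rather than a measured result.

```python
import re


def clean_expanded_query(text: str) -> str:
    """Keep only quoted phrases and join them into one plain-text query."""
    # Accept straight quotes as well as curly quotes (U+201C opens, U+201D closes).
    phrases = re.findall(r'[\u201c"]([^\u201d"]+)[\u201d"]', text)
    seen, unique = set(), []
    for phrase in phrases:
        key = phrase.lower()
        if key not in seen:  # drop case-insensitive repeats, keep first spelling
            seen.add(key)
            unique.append(phrase)
    # Fall back to the raw text when the model returned no quoted phrases.
    return ", ".join(unique) if unique else text.strip()


raw = (
    "Here\u2019s one possible expanded query around \u201cLangchain memory\u201d:\n\n"
    "\u201cLangChain memory\u201d OR  \n"
    "\u201cConversationBufferMemory\u201d OR  \n"
    "\u201cchat memory\u201d"
)
print(clean_expanded_query(raw))
```

An alternative is to tighten the prompt so the model returns a single line of plain text with no commentary, which removes the need for cleanup.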

In [15]:
# Step 7: RAG answering prompt
answer_prompt = PromptTemplate.from_template(
    """
    Answer the question based on the context below.

    Context:
    {context}

    Question: {input}
    """
)

document_chain = create_stuff_documents_chain(
    llm,
    answer_prompt
)

In [16]:
# Step 8: Full RAG pipeline with query expansion

rag_pipeline = (
    RunnableMap({
        "input": lambda x: x["input"],
        "context": lambda x: retriever.invoke(query_expansion_chain.invoke({"query": x["input"]}))
    })
    | document_chain
)
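The data flow in Step 8 is easier to see with the LangChain objects replaced by stubs. In this sketch `expand`, `retrieve`, and `answer` are hypothetical stand-ins for `query_expansion_chain`, `retriever`, and `document_chain`; only the wiring mirrors the pipeline.

```python
def expand(query: str) -> str:
    # Stub standing in for query_expansion_chain.invoke({"query": query}).
    return f"{query} (synonyms, related terms)"


def retrieve(expanded_query: str) -> list:
    # Stub standing in for retriever.invoke(expanded_query).
    return [f"chunk matching: {expanded_query}"]


def answer(inputs: dict) -> str:
    # Stub standing in for document_chain.invoke(inputs).
    return f"Q: {inputs['input']} | context chunks: {len(inputs['context'])}"


def rag_pipeline_plain(x: dict) -> str:
    # Same shape as the RunnableMap step: both keys are computed from the same input.
    mapped = {
        "input": x["input"],                       # original question, passed through
        "context": retrieve(expand(x["input"])),   # retrieval uses the expanded query
    }
    return answer(mapped)


print(rag_pipeline_plain({"input": "CrewAI agents?"}))
```

The point to notice is that only the retriever sees the expanded query; the answering prompt still receives the user's original question under `input`.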



In [19]:
query = {"input": "What types of memory does CrewAI support?"}
response = rag_pipeline.invoke(query)
print("✅ Answer:\n", response)

✅ Answer:
 The excerpts provided don’t mention any specific memory model or memory types supported by CrewAI. No memory types are defined in the context you shared.


Question: What types of memory does LangChain support?
✅ Answer:
 LangChain supports at least two built-in memory types:  
• ConversationBufferMemory  
• ConversationSummaryMemory

Question: What types of memory does LangGraph support?
✅ Answer:
 LangGraph currently supports two memory modules:  
• ConversationBufferMemory  
• ConversationSummaryMemory

Question: What types of memory does CrewAI support?
✅ Answer:
 The excerpts provided don’t mention any specific memory model or memory types supported by CrewAI. No memory types are defined in the context you shared.

In [20]:
# Step 8.1: Run query
query = {"input": "CrewAI agents?"}
# Pass the question string, not the whole dict, to the expansion prompt.
print(query_expansion_chain.invoke({"query": query["input"]}))
response = rag_pipeline.invoke(query)
print("✅ Answer:\n", response)

Expanded query:

("CrewAI" OR "Crew AI" OR "digital crew agents" OR "virtual crew assistant" OR "AI-powered crew management")  
AND  
("autonomous agent" OR "intelligent software agent" OR "multi-agent system" OR "agent-based model" OR "virtual assistant")  
AND  
("crew scheduling" OR "crew coordination" OR "resource allocation" OR "team operations" OR "personnel planning" OR "crew resource management")  
AND  
("machine learning" OR "deep learning" OR "reinforcement learning" OR "agent-based modeling" OR "decision support system" OR "automation" OR "digital twin")
✅ Answer:
 CrewAI agents are LLM-powered, semi-autonomous “crew members” in a multi-agent orchestration framework.  Each agent is defined by:  
• A specific role (e.g. researcher, planner, executor)  
• A clear purpose and goal  
• A prescribed toolset it can invoke  

Within the CrewAI framework, agents operate in parallel or in sequence, staying on task and collaborating in a structured way, to ensure each c