Query Enhancement:
Query enhancement refers to techniques used to improve or reformulate the user query to retrieve better, more relevant documents from the knowledge base.
It is especially useful when:

- The original query is short, ambiguous, or under-specified
- The search scope needs to be broadened to catch synonyms, related phrases, or spelling variants
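
To make the idea concrete before bringing in an LLM, here is a minimal sketch of query expansion using a hand-written synonym table. The `SYNONYMS` table and the `expand_query` helper are hypothetical and exist only for illustration; the notebook below asks a language model to produce the expansion instead.

```python
# Hypothetical synonym table; the notebook uses an LLM to generate expansions instead.
SYNONYMS = {
    "employees": ["staff", "personnel"],
    "experience": ["tenure", "work history"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Append known synonyms of each query word to the original query."""
    extra_terms = []
    for word in query.lower().split():
        for term in synonyms.get(word, []):
            if term not in extra_terms:
                extra_terms.append(term)
    if not extra_terms:
        return query  # nothing to add, keep the query unchanged
    return f"{query} ({', '.join(extra_terms)})"


print(expand_query("employees with experience", SYNONYMS))
# employees with experience (staff, personnel, tenure, work history)
```

The original wording is kept and the extra terms are appended, so a short query gains more surface area for matching without losing its intent.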

In [40]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser  # extracts the plain text from the model's message so the chain returns a str



In [41]:
load_dotenv(override=True)

True

In [42]:
root_path =r"C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base"

In [43]:
#load documents
loader = DirectoryLoader(path=root_path,
                         glob="**/*.md",
                         loader_cls=TextLoader,
                         loader_kwargs={"encoding":"utf-8"})

try:
    docs = loader.load()
    print(f"Document loaded with {len(docs)} from {root_path}")

except Exception as e:
    print(f'Error occurred: {e}')


Document loaded with 76 from C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base


In [44]:
docs[60]

Document(metadata={'source': 'C:\\Users\\Mohamed Arshad\\Downloads\\My_RAG_Lab\\llm_engineering\\RAG\\knowledge-base\\employees\\Oliver Spencer.md'}, page_content='# HR Record\n\n# Oliver Spencer\n\n## Summary\n- **Date of Birth**: May 14, 1990\n- **Job Title**: Backend Software Engineer\n- **Location**: Austin, Texas\n- **Current Salary**: $125,000  \n\n## Insurellm Career Progression\n- **March 2018**: Joined Insurellm as a Backend Developer I, focusing on API development for customer management systems.\n- **July 2019**: Promoted to Backend Developer II after successfully leading a team project to revamp the claims processing system, reducing response time by 30%.\n- **June 2021**: Transitioned to Backend Software Engineer with a broader role in architecture and system design, collaborating closely with the DevOps team.\n- **September 2022**: Assigned as the lead engineer for the new "Innovate" initiative, aimed at integrating AI-driven solutions into existing products.\n- **January

In [45]:
# Text Splitter / Chunking
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents=docs)
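
To show what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries paragraph, line, and word separators before falling back to characters); the hypothetical `sliding_window_chunks` helper only illustrates how consecutive chunks share text.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each chunk repeats the last 4 characters of the previous one, so a sentence that straddles a boundary is still fully visible in at least one chunk.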

In [46]:
# Embedding generation and vector store
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(documents=chunks, embedding=embedding_model)

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 10})
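
The `search_type="mmr"` setting asks for maximal marginal relevance, which balances similarity to the query against redundancy among the results. Here is a minimal sketch of that selection rule on toy 2-D vectors. The `cosine` and `mmr_select` helpers are hypothetical and are not Chroma's implementation, which also fetches a larger candidate pool before re-ranking.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document (0 if none yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
doc_vectors = [[0.9, 0.1], [0.9, 0.12], [0.8, -0.6]]  # the first two are near-duplicates
print(mmr_select(query, doc_vectors, k=2))                   # [0, 2]
print(mmr_select(query, doc_vectors, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the rule reduces to plain similarity ranking and returns both near-duplicates; lower values trade some relevance for diversity.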

In [47]:
openai_api_key = os.getenv('OPENAI_API_KEY')
if openai_api_key:
    print('key_ exists')
else:
    print('key does not exist')

key_ exists


In [48]:
# initialize model
llm = ChatOpenAI(model='o4-mini')

In [49]:
query_expansion_prompt = PromptTemplate.from_template(
    """
    You are a helpful assistant. Expand the following query to improve document retrieval by adding relevant synonyms and useful context.

Original query: "{query}"

Expanded query:
    """
)

In [50]:
user_query = "Show me all employees in the HR department with more than 5 years experience"

In [51]:
# Overrides the query from the previous cell; this is the one used in the rest of the notebook
user_query = "Show me all employees with more than 2 years experience"

In [52]:
query_expansion_chain = query_expansion_prompt|llm|StrOutputParser()

In [53]:
expanded_query = query_expansion_chain.invoke({"query": user_query})
print(f"Expanded query :{expanded_query}")

Expanded query :Expanded query:

(Show OR List OR Find OR Retrieve)
AND (employees OR staff OR personnel OR “team members” OR workforce)
AND (experience OR tenure OR “length of service” OR “work history”)
AND (“more than 2 years” OR “at least two years” OR “over two years” OR “>2 years” OR “≥24 months”)
OR (senior OR “mid-level” OR experienced OR seasoned OR veteran)
OR (hire_date ≤ CURRENT_DATE – INTERVAL '2 years' OR years_of_experience ≥ 2)

Putting it all together as a single Boolean search string:

(Show OR List OR Find OR Retrieve)
AND (employees OR staff OR personnel OR “team members” OR workforce)
AND (experience OR tenure OR “length of service” OR “work history”)
AND (“more than 2 years” OR “at least two years” OR “over two years” OR “>2 years” OR “≥24 months” OR senior OR mid-level OR experienced OR seasoned OR veteran)
AND (hire_date ≤ CURRENT_DATE – INTERVAL '2 years' OR years_of_experience ≥ 2)
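
In the output above the model echoed the prompt's trailing `Expanded query:` label at the start of its answer. Here is a minimal sketch of stripping such an echo before the text is sent to the retriever. The `strip_echoed_label` helper is hypothetical and is not used elsewhere in this notebook.

```python
def strip_echoed_label(text: str, label: str = "Expanded query:") -> str:
    """Remove a leading copy of the prompt's label from the model output."""
    cleaned = text.strip()
    if cleaned.lower().startswith(label.lower()):
        cleaned = cleaned[len(label):]
    return cleaned.strip()


raw_output = "Expanded query:\n\n(employees OR staff OR personnel)"
print(strip_echoed_label(raw_output))
# (employees OR staff OR personnel)
```

Text that does not start with the label is returned unchanged apart from surrounding whitespace.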


In [54]:
# RAG answering prompt
answer_prompt = PromptTemplate.from_template("""
Answer the question based on the context below.

Context:
{context}

Question: {query}
""")


In [55]:
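# Retrieve with the expanded query computed above (avoids a second LLM call)
retrieved_docs = retriever.invoke(expanded_query)

# Pass only the chunk text as context, not the repr of the Document objects
context_text = "\n\n".join(doc.page_content for doc in retrieved_docs)

# Final answer chain
document_chain = answer_prompt | llm | StrOutputParser()
final_answer = document_chain.invoke({"context": context_text, "query": user_query})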

In [56]:
final_answer

'Here are the employees whose total professional tenure exceeds two years:\n\n• Tyler Brooks  \n   – Role: Junior Backend Developer (Austin, TX)  \n   – Insurellm Tenure: June 2021 – Present (~3 years)\n\n• Jennifer Adams  \n   – Role: Sales Development Representative (Remote – Denver, CO)  \n   – Insurellm Tenure: June 2022 – Present (incl. SDR Intern June 2022–Feb 2023 & SDR Mar 2023–Present) (~2.1 years)  \n   – Prior Experience: Customer Service Rep at RetailCo Aug 2019–May 2022 (~2.8 years)\n\n• Amanda Foster  \n   – Role: HR Business Partner (San Francisco, CA)  \n   – Insurellm Tenure: September 2016 – Present (~7.8 years)  \n   – Prior HR Roles: Senior HR Generalist (2013–2016), HR Coordinator (2009–2013)\n\n• Emily Carter  \n   – Role: Account Executive (Austin, TX)  \n   – Insurellm Tenure: 2021 – Present (~3 years)  \n   – Prior Role: Sales Coordinator (2019–2021)\n\n• Lisa Anderson  \n   – Role: Marketing Manager (Austin, TX)  \n   – Insurellm Tenure: April 2019 – Present (

In [None]:
"""
Here are the employees whose total professional tenure exceeds two years:

• Tyler Brooks
   – Role: Junior Backend Developer (Austin, TX)
   – Insurellm Tenure: June 2021 – Present (~3 years)

• Jennifer Adams
   – Role: Sales Development Representative (Remote – Denver, CO)
   – Insurellm Tenure: June 2022 – Present (incl. SDR Intern June 2022–Feb 2023 & SDR Mar 2023–Present) (~2.1 years)
   – Prior Experience: Customer Service Rep at RetailCo Aug 2019–May 2022 (~2.8 years)

• Amanda Foster
   – Role: HR Business Partner (San Francisco, CA)
   – Insurellm Tenure: September 2016 – Present (~7.8 years)
   – Prior HR Roles: Senior HR Generalist (2013–2016), HR Coordinator (2009–2013)

• Emily Carter
   – Role: Account Executive (Austin, TX)
   – Insurellm Tenure: 2021 – Present (~3 years)
   – Prior Role: Sales Coordinator (2019–2021)

• Lisa Anderson
   – Role: Marketing Manager (Austin, TX)
   – Insurellm Tenure: April 2019 – Present (~5 years)
   – Prior Roles: Senior Marketing Specialist (2016–2019), Marketing Coordinator (2012–2015)
"""