This Python code implements a Retrieval-Augmented Generation (RAG) pipeline that combines semantic embeddings with keyword-based ranking to answer questions over a custom knowledge base.

Dense Retrieval: Uses HuggingFaceEmbeddings and Chroma to capture semantic meaning of documents.

Sparse Retrieval (BM25): Uses BM25Retriever to rank documents based on term frequency, inverse document frequency, and length normalization.

Ensemble Retrieval: Combines dense and sparse retrievers via EnsembleRetriever with configurable weights (e.g., 0.7 dense, 0.3 sparse).
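
To make the dense side concrete, here is a minimal sketch of ranking by vector similarity. The three-dimensional vectors and document names are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and cosine similarity is only one common choice; the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy embeddings, not produced by any real model.
toy_vectors = {
    "award": [0.9, 0.1, 0.0],
    "salary": [0.1, 0.9, 0.0],
    "office": [0.0, 0.1, 0.9],
}
print(top_k([0.8, 0.2, 0.0], toy_vectors))
```

A query that shares no keywords with a document can still rank it highly here, which is the property the dense retriever contributes to the ensemble.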

In [11]:
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever  # combines the dense and sparse retrievers
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
import os
from dotenv import load_dotenv


BM25 is a keyword-based ranking function that scores relevance using term frequency, inverse document frequency, and document-length normalization.
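
The three ingredients can be seen in a small pure-Python sketch. This uses one common form of the Okapi BM25 formula with typical defaults (k1=1.5, b=0.75) and naive whitespace tokenization; it is an illustration only and is not the code that BM25Retriever or rank_bm25 runs, which differs in details such as how the IDF term is computed.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` (a non-empty list of strings) against `query`."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Inverse document frequency: rare terms count for more.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: long documents are penalized via b.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

toy_corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "award winners announced this year",
]
print(bm25_scores("award winners", toy_corpus))
```

Only the third toy document contains the query terms, so it is the only one with a non-zero score. Unlike the dense retriever, BM25 gives no credit for synonyms or paraphrases.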

In [12]:
load_dotenv(override=True)


True

In [13]:
openai_api_key =os.getenv('OPENAI_API_KEY')

Load Documents

In [14]:
root_path =r"C:\Users\Mohamed Arshad\Downloads\My_RAG_Lab\llm_engineering\RAG\knowledge-base"

In [15]:
loader = DirectoryLoader(
    path=root_path,
    glob="**/*.md",
    loader_cls=TextLoader,
    loader_kwargs={"encoding": "utf-8"},
)

try:
    docs = loader.load()
    print(f"docs loaded with {len(docs)} documents")
except Exception as e:
    print(f"error occurred: {e}")

docs loaded with 76 documents


Create Chunks

In [16]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents=docs)
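
To show what chunk_size and chunk_overlap mean, here is a simplified sketch that slices text into fixed-size windows sharing a fixed overlap. It is an assumption-laden simplification: RecursiveCharacterTextSplitter first tries to split on separators such as paragraph and line breaks and only then packs the pieces up to chunk_size, so its real chunk boundaries will differ. The function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

sample = "".join(str(i % 10) for i in range(2500))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
```

The overlap means the last 200 characters of one window reappear at the start of the next, so a sentence that straddles a boundary is still retrievable in one piece.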

Initialize Dense Retriever

In [17]:
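embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Index the chunks (not the whole documents) so both retrievers search the same units
dense_vector_store = Chroma.from_documents(documents=chunks, embedding=embedding_model)
dense_retreiver = dense_vector_store.as_retriever()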

Initialize Sparse Retriever

In [18]:
sparse_retreiver = BM25Retriever.from_documents(documents=chunks, k=3)  # return the top 3 chunks

Combine into an Ensemble Retriever

Mostly semantic queries (concepts, paraphrases, open-ended questions) → more weight on dense.

Mostly factual or keyword-heavy queries → more weight on sparse/BM25.

Mixed queries → start with 0.7 dense / 0.3 sparse, then tweak based on results.
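
The weights act on ranks rather than on raw scores. LangChain documents EnsembleRetriever as fusing results with weighted Reciprocal Rank Fusion; the sketch below illustrates that idea under the assumption of a smoothing constant c=60 and uses hypothetical document IDs, so treat it as an approximation of the behavior rather than the library's code.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns weight / (rank + c) from every list it appears in,
    with ranks starting at 1; documents are returned by descending total.
    """
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two retrievers for one query.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]
print(weighted_rrf([dense_hits, sparse_hits], weights=[0.7, 0.3]))
```

With 0.7/0.3 weights, a document found by both retrievers generally outranks one found by only the dense retriever, and a document found only by BM25 lands last. Shifting weight toward the sparse list moves keyword-only matches up.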

In [19]:
hybrid_retreiver = EnsembleRetriever(
    retrievers=[dense_retreiver, sparse_retreiver],
    weights=[0.7, 0.3],
)

In [20]:
hybrid_retreiver

EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001DE79267FB0>, search_kwargs={}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001DE78D40530>, k=3)], weights=[0.7, 0.3])

In [21]:
Prompt =PromptTemplate.from_template(
    """
    Answer the question based on the context below

    Context:{context}

    Question: {input}

    """
)

In [28]:
llm =ChatOpenAI(model='gpt-4.1-nano')

In [29]:
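# "Stuff" chain: inserts all retrieved documents into the prompt's {context} and calls the LLM
document_chain = create_stuff_documents_chain(llm=llm, prompt=Prompt)

# Full RAG chain: retrieve with the hybrid retriever, then answer with the document chain
rag_chain = create_retrieval_chain(retriever=hybrid_retreiver, combine_docs_chain=document_chain)
rag_chain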

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | EnsembleRetriever(retrievers=[VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x000001DE79267FB0>, search_kwargs={}), BM25Retriever(vectorizer=<rank_bm25.BM25Okapi object at 0x000001DE78D40530>, k=3)], weights=[0.7, 0.3]), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template='\n    Answer the question based on the context below\n\n    Context:{context}\n\n    Question: {input}\n\n    ')
            | ChatOpenAI

In [None]:
query = {"input": "who received the prestigious IIOTY award in 2023?"}
response = rag_chain.invoke(query)

print(f"Answer: {response['answer']}\n")
#print(response)


# Show the retrieved chunks that were passed to the LLM as context
for i, doc in enumerate(response['context']):
    print(f"\n Doc {i+1} :{doc.page_content}")

Answer: Maxine received the prestigious IIOTY Innovator of the Year (IIOTY 2023) award in 2023.


 Doc 1 :# HR Record

# Alex Chen

## Summary
- **Date of Birth:** March 15, 1990
- **Job Title:** Backend Software Engineer
- **Location:** San Francisco, California
- **Current Salary:** $115,000  

## Insurellm Career Progression
- **April 2020:** Joined Insurellm as a Junior Backend Developer. Focused on building APIs to enhance customer data security.
- **October 2021:** Promoted to Backend Software Engineer. Took on leadership for a key project developing a microservices architecture to support the company's growing platform.
- **March 2023:** Awarded the title of Senior Backend Software Engineer due to exemplary performance in scaling backend services, reducing downtime by 30% over six months.

## Annual Performance History
- **2020:**  
  - Completed onboarding successfully.  
  - Met expectations in delivering project milestones.  
  - Received positive feedback from the team lead