### Reranking Hybrid Search Strategies

Re-ranking is a second-stage filtering process in retrieval systems, especially in RAG pipelines, where we:

1. First use a fast retriever (such as BM25, FAISS, or a hybrid of the two) to fetch the top-k documents quickly.

2. Then use a more accurate but slower model (like a cross-encoder or LLM) to re-score and reorder those documents by relevance to the query.

👉 This moves the most relevant documents to the top of the list, which tends to improve the final answer from the LLM.
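
The two stages can be illustrated without any model at all. The following is a minimal, dependency-free sketch, assuming a shared-token count as a stand-in for the fast retriever and a simple match-density score as a stand-in for the slower re-ranker. `tokenize`, `first_stage`, `second_stage_score`, and `retrieve_then_rerank` are hypothetical names for illustration, not LangChain APIs.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,?!()").lower() for t in text.split())
    return [t for t in tokens if t]


def first_stage(query, corpus, k):
    """Fast, coarse retrieval: rank by shared-token count (stand-in for BM25/FAISS)."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, ties by position
    return [i for _, i in scored[:k]]


def second_stage_score(query, doc):
    """Slower, finer score (stand-in for a cross-encoder): matched tokens per document token."""
    q = set(tokenize(query))
    d = tokenize(doc)
    return len(q & set(d)) / max(len(d), 1)


def retrieve_then_rerank(query, corpus, k, top_n):
    """Fetch k candidates cheaply, then re-score only those and keep the best top_n."""
    candidates = first_stage(query, corpus, k)
    reranked = sorted(candidates, key=lambda i: second_stage_score(query, corpus[i]), reverse=True)
    return reranked[:top_n]


corpus = [
    "LangChain has memory and tools but this chunk also rambles about many other unrelated topics",
    "FAISS is a library for nearest neighbor search",
    "LangChain supports memory and tools for agents",
    "Bananas are yellow",
]
query = "LangChain memory tools"

print(first_stage(query, corpus, k=3))                     # [0, 2, 1]
print(retrieve_then_rerank(query, corpus, k=3, top_n=2))   # [2, 0]
```

The first stage cannot tell the focused chunk from the rambling one because both share three tokens with the query. The second stage only looks at the three candidates, so it can afford a finer score, and it promotes the focused chunk to rank 1.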

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [4]:
## load text file
loader=TextLoader("langchain_sample.txt")
raw_docs=loader.load()

# Split text into document chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(raw_docs)
docs


[Document(metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(metadata={'source': 'langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(metadata={'source': 'langchain_sample.txt'}, page_content='Retrieval-Augmented Generation (RAG) is a powerful technique where external knowledge is retrieved and passed into the prompt to ground LLM responses. LangChain makes it easy to implement RAG using vector databases like FAISS, Chroma, and Pinecone.\nBM25 is a traditional 

In [5]:
## user query (Lao): "How do I use LangChain to build an application that has memory and tools?"
query="ວິທີການໃຊ້ LangChain ເພື່ອສ້າງແອັບ application ທີ່ມີຄວາມຈຳ ແລະ ເຄື່ອງມື ?"

In [64]:
### FAISS and Huggingface model Embeddings

from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model=HuggingFaceEmbeddings(model_name="D:\\model\\BAAI-bge-m3")
vectorstore=FAISS.from_documents(docs,embedding_model)
retriever=vectorstore.as_retriever(search_kwargs={"k":6})

In [65]:
## HuggingFace Embedding
from langchain_huggingface import HuggingFaceEmbeddings

## Initialize a local embedding model (no API key needed)
embeddings=HuggingFaceEmbeddings(
    model_name="D:\\model\\BAAI-bge-m3"
)
vectorstore_hugging=FAISS.from_documents(docs,embeddings)
## Note: despite its name, vectorstore_hugging holds a retriever after this line, not the vector store
vectorstore_hugging=vectorstore_hugging.as_retriever(search_kwargs={"k":6})




In [66]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000274DBEB0CB0>, search_kwargs={'k': 6})

In [67]:
vectorstore_hugging

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000274DBE89150>, search_kwargs={'k': 6})

In [68]:
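## prompt and use the llm
import os
from langchain.chat_models import init_chat_model

# The Groq client reads GROQ_API_KEY from the environment; fail early with a clear message if it is missing
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set")
llm=init_chat_model("groq:meta-llama/llama-4-maverick-17b-128e-instruct",max_tokens=2000)
llm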

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000274DBEB1D90>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000274DBEB2690>, model_name='meta-llama/llama-4-maverick-17b-128e-instruct', model_kwargs={}, groq_api_key=SecretStr('**********'), max_tokens=2000)

In [69]:
# Prompt Template
prompt = PromptTemplate.from_template("""
You are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user's question.

User Question: "{question}"

Documents:
{documents}

Instructions:
- Think about the relevance of each document to the user's question.
- Return a list of document indices in ranked order, starting from the most relevant.

Important: Respond in the Lao language only

Output format: comma-separated document indices (e.g., 2,1,3,4,...)
""")

In [70]:
retrieved_docs=retriever.invoke(query)
retrieved_docs

[Document(id='37722509-db76-486a-b5ff-b9fb7dba5544', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='91875ee3-cdf2-45a3-8938-7abc230f320d', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(id='8e047901-e13d-4ec1-bae3-32488811979c', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond mo

In [71]:
chain=prompt| llm | StrOutputParser()
chain

PromptTemplate(input_variables=['documents', 'question'], input_types={}, partial_variables={}, template='\nYou are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user\'s question.\n\nUser Question: "{question}"\n\nDocuments:\n{documents}\n\nInstructions:\n- Think about the relevance of each document to the user\'s question.\n- Return a list of document indices in ranked order, starting from the most relevant.\n\nImportant: Response in Lao Languae only                                     \n\nOutput format: comma-separated document indices (e.g., 2,1,3,0,...)\n')
| ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000274DBEB1D90>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000274DBEB2690>, model_name='meta-llama/llama-4-maverick-17b-128e-instruct', model_kwargs={}, groq_api_key=SecretStr('**********'), max_tokens=2000)
| StrOutputParser()

In [72]:
doc_lines = [f"{i+1}. {doc.page_content}" for i, doc in enumerate(retrieved_docs)]
formatted_docs = "\n".join(doc_lines)

In [73]:
doc_lines

['1. LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.',
 '2. LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.',
 '3. LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.',
 '4. FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and comp

In [74]:
formatted_docs

'1. LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.\n2. LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.\n3. LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.\n4. FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed ind

In [77]:
response=chain.invoke({"question":query,"documents":formatted_docs})
response

'ເພື່ອຕອບຄຳຖາມຂອງຜູ້ໃຊ້ "ວິທີການໃຊ້ LangChain ເພື່ອສ້າງແອັບ application ທີ່ມີຄວາມຈຳ ແລະ ເຄື່ອງມື ?", ຂ້ອຍຈະຕ້ອງວິເຄາະເອກະສານທີ່ໃຫ້ມາເພື່ອຈັດອັນດັບຄວາມກ່ຽວຂ້ອງ.\n\nເອກະສານ 1 ໄດ້ກ່າວເຖິງ LangChain ວ່າເປັນກອບການພັດທະນາແອັບພລິເຄຊັນໂດຍໃຊ້ LLMs, ມີເຄື່ອງມື ແລະ ອົງປະກອບສໍາລັບການຈັດການຄວາມຊົງຈຳ. ມັນແມ່ນຄວາມກ່ຽວຂ້ອງກັບຄຳຖາມ.\n\nເອກະສານ 3 ກ່າວເຖິງຄວາມສາມາດຂອງ LangChain ໃນການລວມເອົາເຄື່ອງມືພາຍນອກ ແລະ ຄວາມຈຳ, ເຊິ່ງເປັນສ່ວນຫນຶ່ງທີ່ສໍາຄັນຂອງຄໍາຖາມ.\n\nເອກະສານ 4 ອະທິບາຍເຖິງ Agents ໃນ LangChain, ເຊິ່ງສາມາດໃຊ້ເພື່ອຕັດສິນໃຈວ່າເຄື່ອງມືໃດທີ່ຈະໃຊ້ ແລະ ໃນລໍາດັບໃດ, ເຊິ່ງກ່ຽວຂ້ອງກັບການນໍາໃຊ້ LangChain ເພື່ອສ້າງແອັບພລິເຄຊັນ.\n\nເອກະສານ 2 ແລະ 6 ໃຫ້ຂໍ້ມູນກ່ຽວກັບການປະສົມປະສານກັບບໍລິການພາຍນອກ ແລະ ເຕັກນິກ Retrieval-Augmented Generation (RAG), ເຊິ່ງເປັນປະໂຫຍດແຕ່ບໍ່ໄດ້ກ່ຽວຂ້ອງໂດຍກົງກັບຄໍາຖາມຫຼັກ.\n\nເອກະສານ 5 ກ່າວເຖິງ Dense retrieval ແລະ hybrid retrieval, ເປັນຂໍ້ມູນທີ່ກ່ຽວຂ້ອງກັບການຄົ້ນຫາເອກະສານ ແຕ່ບໍ່ແມ່ນຈຸດສໍາຄັນຂອງຄໍາຖາມ.\n\nອັນດັບຄວາມກ່ຽວຂ້ອງແມ່ນ: 1,3,4,2,6,5 (ຫຼື 3,1,4,2,6,5 ອີງຕາມການພິຈາລນາເລັກນ້ອຍກ່ຽວກັບຄວາມສໍ

In [78]:
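# Step 5: Parse and rerank
# Convert the 1-based indices in the reply to 0-based positions. dict.fromkeys drops
# duplicates but keeps the LLM's order; set() would discard that order and undo the ranking.
indices = list(dict.fromkeys(int(x.strip()) - 1 for x in response.split(",") if x.strip().isdigit()))
indices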

[0, 1, 2, 3, 4, 5]
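
The response recorded above is free-form prose that offers two alternative rankings, so splitting the whole string on commas is fragile. One way to harden this step, offered here as a sketch and not as the notebook's own method, is to extract the first run of comma-separated integers with a regular expression, drop duplicates while keeping their order, and append any documents the model left out. `parse_ranking` is a hypothetical helper, and it assumes the prompt numbered the documents from 1.

```python
import re


def parse_ranking(response, n_docs):
    """Return 0-based document indices in the order the LLM ranked them.

    Assumes documents were numbered from 1 in the prompt and that the ranking
    appears as a comma-separated run of integers somewhere in the reply.
    """
    # Find runs such as "1,3,4,2,6,5": at least two integers separated by commas.
    runs = re.findall(r"\d+(?:\s*,\s*\d+)+", response)
    numbers = [int(n) for n in re.findall(r"\d+", runs[0])] if runs else []

    ranked = []
    for n in numbers:
        idx = n - 1  # convert the 1-based label to a 0-based list index
        if 0 <= idx < n_docs and idx not in ranked:
            ranked.append(idx)

    # Append anything the model omitted, in original retrieval order.
    ranked.extend(i for i in range(n_docs) if i not in ranked)
    return ranked


reply = "The relevance ranking is: 1,3,4,2,6,5 (or 3,1,4,2,6,5 depending on emphasis)."
print(parse_ranking(reply, n_docs=6))  # [0, 2, 3, 1, 5, 4]
```

Taking the first run is a design choice: in the recorded reply the primary ranking comes first and the alternative follows in parentheses. Falling back to the original retrieval order for missing indices keeps every document in the result even when the reply is incomplete.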

In [79]:
retrieved_docs

[Document(id='37722509-db76-486a-b5ff-b9fb7dba5544', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='91875ee3-cdf2-45a3-8938-7abc230f320d', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(id='8e047901-e13d-4ec1-bae3-32488811979c', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond mo

In [80]:
reranked_docs = [retrieved_docs[i] for i in indices if 0 <= i < len(retrieved_docs)]
reranked_docs

[Document(id='37722509-db76-486a-b5ff-b9fb7dba5544', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='91875ee3-cdf2-45a3-8938-7abc230f320d', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(id='8e047901-e13d-4ec1-bae3-32488811979c', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond mo

In [81]:
# Step 6: Show results
print("\n📊 Final Reranked Results:")
for i, doc in enumerate(reranked_docs, 1):
    print(f"\nRank {i}:\n{doc.page_content}")


📊 Final Reranked Results:

Rank 1:
LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.

Rank 2:
LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.

Rank 3:
LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.

Rank 4:
FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional sp

### Optional: Using a Cross-Encoder Re-Ranking Model Instead of LLM Generation

In [45]:
from sentence_transformers import CrossEncoder
import numpy as np

# Load local cross-encoder model
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L12-v2")

In [88]:
# Lao query: "How do I use LangChain to build an application that has memory, tools, and FAISS?"
query="ວິທີການໃຊ້ LangChain ເພື່ອສ້າງແອັບ application ທີ່ມີຄວາມຈຳ ແລະ ເຄື່ອງມື ແລະ FAISS ?"
retrieved_docs = retriever.invoke(query)
retrieved_docs

[Document(id='390e44b0-495b-42b3-855a-797141e90dcb', metadata={'source': 'langchain_sample.txt'}, page_content='FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed indexes, which makes it scalable for large document stores.\nAgents in LangChain are chains that use LLMs to decide which tools to use and in what order. This makes them suitable for multi-step tasks like question answering with search and code execution.'),
 Document(id='37722509-db76-486a-b5ff-b9fb7dba5544', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='91875ee3-cdf2-45a3-8938-7abc230f320d', metadata={'source': 'langchain_sample.txt'}, page_content='Lan

In [89]:
# Use a cross-encoder instead of LLM re-ranking
def rerank_with_cross_encoder(query, documents, cross_encoder, top_k=None):
    """
    Re-rank documents using a cross-encoder model

    Args:
        query: User question
        documents: List of retrieved documents
        cross_encoder: Loaded CrossEncoder model
        top_k: Number of top documents to return (None = all)

    Returns:
        List of (score, document, original_index) tuples, highest score first
    """
    # Build (query, passage) pairs for the cross-encoder
    pairs = [(query, doc.page_content) for doc in documents]

    # Compute relevance scores
    scores = cross_encoder.predict(pairs)

    # Build tuples of (score, document, original_index)
    scored_docs = [(scores[i], documents[i], i) for i in range(len(documents))]

    # Sort by score (highest to lowest)
    scored_docs.sort(key=lambda x: x[0], reverse=True)

    # Truncate to top_k if specified
    if top_k is not None:
        scored_docs = scored_docs[:top_k]

    return scored_docs

In [90]:
print("\n🔄 Re-ranking with Cross-Encoder...")
scored_reranked = rerank_with_cross_encoder(
    query=query, 
    documents=retrieved_docs, 
    cross_encoder=cross_encoder,
    top_k=4
)


🔄 Re-ranking with Cross-Encoder...


In [91]:
# Extract documents and scores
reranked_docs = [item[1] for item in scored_reranked]  # documents
rerank_scores = [item[0] for item in scored_reranked]  # scores
original_indices = [item[2] for item in scored_reranked]  # original positions

print("\n📊 Cross-Encoder Reranked Results:")
print(f"Query: {query}")
print("="*50)

for i, (doc, score, orig_idx) in enumerate(zip(reranked_docs, rerank_scores, original_indices)):
    print(f"\n🏆 Rank {i+1} (Original position: {orig_idx+1})")
    print(f"📈 Relevance Score: {score:.4f}")
    print(f"📄 Content: {doc.page_content}")
    print("-" * 40)


📊 Cross-Encoder Reranked Results:
Query: ວິທີການໃຊ້ LangChain ເພື່ອສ້າງແອັບ application ທີ່ມີຄວາມຈຳ ແລະ ເຄື່ອງມື ແລະ FAISS ?

🏆 Rank 1 (Original position: 1)
📈 Relevance Score: 4.5714
📄 Content: FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed indexes, which makes it scalable for large document stores.
Agents in LangChain are chains that use LLMs to decide which tools to use and in what order. This makes them suitable for multi-step tasks like question answering with search and code execution.
----------------------------------------

🏆 Rank 2 (Original position: 2)
📈 Relevance Score: 4.0050
📄 Content: LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.
---------------------------------------
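
The relevance scores printed above (for example 4.5714) are unbounded, which is what one would expect if the model returns raw logits. If a value between 0 and 1 is easier to threshold or display, a logistic sigmoid can be applied afterwards; it is monotonic, so the ranking does not change. This is a sketch under that assumption, and `to_probability_like` is a hypothetical helper, not part of sentence-transformers.

```python
import math


def to_probability_like(score):
    """Map an unbounded relevance logit to (0, 1) with the logistic sigmoid."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


# The two scores recorded in the output above.
raw_scores = [4.5714, 4.0050]
squashed = [to_probability_like(s) for s in raw_scores]

for raw, p in zip(raw_scores, squashed):
    print(f"{raw:.4f} -> {p:.4f}")
```

The squashed values are convenient for display, but they should not be read as calibrated probabilities unless the model was trained and validated for that purpose.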