### Reranking Hybrid Search Strategies

Re-ranking is a second-stage filtering step in a retrieval system, especially in RAG pipelines, where we:
1. First use a fast retriever (such as BM25, FAISS, or a hybrid retriever) to fetch the top-k documents.
2. Then use a more accurate but slower model (such as a cross-encoder or an LLM) to re-score and reorder those documents by relevance to the query.

👉🏿 This ensures that the most relevant documents appear at the top, which improves the final answer from the LLM.

In [1]:
from langchain_classic.document_loaders import TextLoader
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_classic.prompts import PromptTemplate
from langchain_classic.schema import Document
from langchain_core.output_parsers import StrOutputParser

In [5]:
## Load the text file
loader = TextLoader(
    "langchain_sample.txt"
)
raw_docs = loader.load()

# Split the text into overlapping document chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap=50
)
docs = splitter.split_documents(raw_docs)
docs

[Document(metadata={'source': 'langchain_sample.txt'}, page_content='In modern information retrieval systems, understanding the difference between sparse and dense retrievers is essential. Sparse retrieval methods are based on exact or near-exact term matching. Techniques such as TF-IDF and BM25 fall into this category. These methods represent documents as sparse vectors where each dimension corresponds to a word or token in the vocabulary. If a word does not appear in a document, its value is zero. Sparse retrievers are highly interpretable and work'),
 Document(metadata={'source': 'langchain_sample.txt'}, page_content='retrievers are highly interpretable and work exceptionally well when queries contain exact keywords that also appear in documents. BM25 improves upon earlier sparse methods by incorporating term frequency saturation and document length normalization. This makes BM25 more robust across documents of varying sizes. In practice, sparse retrievers are very effective for tec
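
The phrase repeated at the boundary of the first two chunks in the output above comes from `chunk_overlap`. A simplified fixed-window sketch of the idea follows. It is only an illustration: `sliding_window_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to split on separators such as paragraphs and spaces, so its chunk boundaries will differ.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining text is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = sliding_window_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last three characters of one chunk reappear as the first three of the next.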

In [3]:
## user query
query = "How can i use langchain to build application with memory and tools?"

In [6]:
## FAISS vector store with Hugging Face embeddings

from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(docs, embedding_model)
retriever = vector_store.as_retriever(search_kwargs={"k":8})


In [7]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x0000025DA1911910>, search_kwargs={'k': 8})
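
The retriever above is dense-only. Where a hybrid first stage is wanted, one common way to merge a sparse ranking and a dense ranking is reciprocal rank fusion (RRF). The sketch below is a swapped-in illustration, not something this notebook does: the document IDs and the two input rankings are invented, and `k=60` is the conventional smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranked list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it sits in each list.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


sparse_ranking = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 order
dense_ranking = ["doc_b", "doc_c", "doc_a"]   # hypothetical FAISS order
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

RRF uses only rank positions, so the raw BM25 and cosine scores never need to be put on a common scale. The fused list could then be passed to a re-ranker in the same way as the FAISS results.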

In [8]:
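## Configure the Groq API key and initialise the LLM
import os
from langchain.chat_models import init_chat_model

# Fail early with a clear message instead of assigning None to os.environ
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set in the environment.")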

llm = init_chat_model(
    model='groq:openai/gpt-oss-20b'
)
llm

ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 32768, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x0000025DA345B1D0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000025DA34BF860>, model_name='openai/gpt-oss-20b', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [9]:
prompt = PromptTemplate.from_template("""
You are  a helpful assistant. Your task is to rank the following documents from most to least relevant to the user's question.
User Question: "{question}"
Documents:{documents}

Instruction: 
- Think about the relevance of each document to the user's question.
- Return the list of document indices in ranked order, starting from the most relevant.

Output format : comma-separated document indices (e.g,2,1,3,0,..)                                                                                                                                                                                            

""")



In [11]:
retrieve_docs = retriever.invoke(query)
retrieve_docs

[Document(id='f15a1ffd-670e-4fbb-8863-76ca0c11cd43', metadata={'source': 'langchain_sample.txt'}, page_content='ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a hybrid approach. A re-ranker then refines the order of these chunks before they are sent to the language model as context.'),
 Document(id='05abae1e-f581-443b-9a7c-1d6ea247f6ef', metadata={'source': 'langchain_sample.txt'}, page_content='using embeddings. The results are then merged or scored together. LangChain provides abstractions that make building hybrid retrievers straightforward, allowing developers to experiment with different weighting strategies. Within LangChain, retrievers are treated as modular co

In [12]:
chain = prompt | llm | StrOutputParser()
chain

PromptTemplate(input_variables=['documents', 'question'], input_types={}, partial_variables={}, template='\nYou are  a helpful assistant. Your task is to rank the following documents from most to least relevant to the user\'s question.\nUser Question: "{question}"\nDocuments:{documents}\n\nInstruction: \n- Think about the relevance of each document to the user\'s question.\n- Return the list of document indices in ranked order, starting from the most relevant.\n\nOutput format : comma-separated document indices (e.g,2,1,3,0,..)                                                                                                                                                                                            \n\n')
| ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 32768, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': True, 'tool_calling': True}, client

In [13]:
doc_lines = [f"{i+1}. {doc.page_content}" for i, doc in enumerate(retrieve_docs)]
formatted_docs = "\n".join(doc_lines)

In [14]:
doc_lines

['1. ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a hybrid approach. A re-ranker then refines the order of these chunks before they are sent to the language model as context.',
 '2. using embeddings. The results are then merged or scored together. LangChain provides abstractions that make building hybrid retrievers straightforward, allowing developers to experiment with different weighting strategies. Within LangChain, retrievers are treated as modular components. A retriever takes a query and returns relevant documents. These documents are then passed as context to the language model. This context is critical in Retrieval-Augmented Generation workflows, as it groun

In [15]:
formatted_docs

'1. ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a hybrid approach. A re-ranker then refines the order of these chunks before they are sent to the language model as context.\n2. using embeddings. The results are then merged or scored together. LangChain provides abstractions that make building hybrid retrievers straightforward, allowing developers to experiment with different weighting strategies. Within LangChain, retrievers are treated as modular components. A retriever takes a query and returns relevant documents. These documents are then passed as context to the language model. This context is critical in Retrieval-Augmented Generation workflows, as it grounds t

In [16]:
response = chain.invoke({"question": query, "documents":formatted_docs})
response

'1,5,6,7,2,3,4,8'

In [17]:
# Step 5: Parse the reply and convert the 1-based labels to 0-based indices
indices = [int(x.strip()) - 1 for x in response.split(",") if x.strip().isdigit()]
indices

[0, 4, 5, 6, 1, 2, 3, 7]
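
The parse above works when the model returns a clean list. A slightly more defensive variant is sketched here, assuming the reply consists essentially of the ranking itself. `parse_ranking` is a hypothetical helper: it drops duplicates and out-of-range labels (for example a `0`, which the prompt's sample output format contains even though the documents are numbered from 1) and appends any documents the model omitted, so nothing retrieved is silently lost.

```python
import re


def parse_ranking(response: str, n_docs: int) -> list[int]:
    """Turn a reply such as '2, 1, 3' into a complete list of 0-based indices."""
    seen: set[int] = set()
    order: list[int] = []
    for match in re.findall(r"\d+", response):
        idx = int(match) - 1  # labels in the prompt start at 1
        if 0 <= idx < n_docs and idx not in seen:
            seen.add(idx)
            order.append(idx)
    # Append anything the model left out, keeping the retriever's original order.
    order.extend(i for i in range(n_docs) if i not in seen)
    return order


print(parse_ranking("1,5,6,7,2,3,4,8", 8))
print(parse_ranking("2, 2, 9, 0, 1", 3))
```

Because the regular expression picks up every integer in the reply, free-form reasoning that mentions numbers would distort the result. Restricting the model to the bare list, as the prompt already requests, keeps this approach workable.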

In [19]:
retrieve_docs

[Document(id='f15a1ffd-670e-4fbb-8863-76ca0c11cd43', metadata={'source': 'langchain_sample.txt'}, page_content='ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a hybrid approach. A re-ranker then refines the order of these chunks before they are sent to the language model as context.'),
 Document(id='05abae1e-f581-443b-9a7c-1d6ea247f6ef', metadata={'source': 'langchain_sample.txt'}, page_content='using embeddings. The results are then merged or scored together. LangChain provides abstractions that make building hybrid retrievers straightforward, allowing developers to experiment with different weighting strategies. Within LangChain, retrievers are treated as modular co

In [18]:
# Reorder the retrieved documents, skipping any index that is out of range
reranked_docs = [retrieve_docs[i] for i in indices if 0 <= i < len(retrieve_docs)]
reranked_docs

[Document(id='f15a1ffd-670e-4fbb-8863-76ca0c11cd43', metadata={'source': 'langchain_sample.txt'}, page_content='ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a hybrid approach. A re-ranker then refines the order of these chunks before they are sent to the language model as context.'),
 Document(id='b37f05a0-9680-4261-afa5-54e1ab387946', metadata={'source': 'langchain_sample.txt'}, page_content='strategies, embedding models, and re-rankers, developers can observe how result ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and sp
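
The format, invoke, parse, and reorder steps from the cells above can be gathered into one function. The sketch below is an assumption-laden convenience wrapper, not part of LangChain: `rerank_with_llm` and `fake_rank_fn` are hypothetical names, and `rank_fn` is any callable that accepts the same dictionary as `chain.invoke` and returns the comma-separated ranking string.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def rerank_with_llm(
    query: str,
    items: Sequence[T],
    rank_fn: Callable[[dict], str],
    get_text: Callable[[T], str] = str,
) -> list[T]:
    """Number the items, ask rank_fn for an ordering, and return the items reordered."""
    # Number the candidates from 1, as in the cells above.
    formatted = "\n".join(f"{i + 1}. {get_text(item)}" for i, item in enumerate(items))
    response = rank_fn({"question": query, "documents": formatted})
    # Convert 1-based labels to 0-based indices, skipping non-numeric pieces.
    indices = [int(x.strip()) - 1 for x in response.split(",") if x.strip().isdigit()]
    return [items[i] for i in indices if 0 <= i < len(items)]


def fake_rank_fn(inputs: dict) -> str:
    """Stand-in for chain.invoke so the sketch runs without an API key."""
    return "3,1,2"


reranked = rerank_with_llm("example question", ["alpha", "beta", "gamma"], fake_rank_fn)
print(reranked)
```

With the objects from this notebook, the call would presumably look like `rerank_with_llm(query, retrieve_docs, chain.invoke, get_text=lambda d: d.page_content)`.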

In [20]:
# Step 6: Show results
print("\nüìä Final Reranked Results:")
for i, doc in enumerate(reranked_docs,1):
    print(f"\nRank: {i}: \n{doc.page_content}")


üìä Final Reranked Results:

Rank: 1: 
ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a hybrid approach. A re-ranker then refines the order of these chunks before they are sent to the language model as context.

Rank: 2: 
strategies, embedding models, and re-rankers, developers can observe how result ordering changes. LangChain supports re-ranking integrations, making it a suitable framework for experimentation and learning. In a typical LangChain RAG pipeline, documents are first loaded and split into chunks. Each chunk is embedded and stored. During query time, a retriever fetches candidate chunks. These chunks may come from sparse retrieval, dense retrieval, or a