# Re-Ranking Hybrid Search Strategies

#### Re-ranking is a second-stage filtering step in retrieval systems, especially in RAG pipelines, where we:
1. First use a fast retriever (like BM25, FAISS, or a hybrid retriever) to quickly fetch the top-k documents.
2. Then use a more accurate but slower model (like a cross-encoder or an LLM) to re-score those documents and reorder them by relevance to the query.
#### 👉 This helps put the most relevant documents at the top, which in turn improves the LLM's final answer.
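The two stages can be illustrated with a toy, dependency-free sketch. It does not reproduce how BM25, FAISS, or a cross-encoder actually score documents: `first_stage` is a crude keyword counter standing in for the fast retriever, and `coverage_score` is a hypothetical stand-in for the slower, more accurate model. Both are assumptions made for illustration; the point is the shape of the pipeline, where stage 2 only re-scores the k candidates that stage 1 returned.

```python
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Lowercase and split on whitespace (deliberately crude)
    return text.lower().split()


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    # Stage 1 (fast, coarse): count every occurrence of a query term, so repeated
    # keywords inflate the score; keep the top k (ties keep corpus order)
    q_terms = set(tokenize(query))
    scores = [sum(tok in q_terms for tok in tokenize(doc)) for doc in corpus]
    order = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in order[:k]]


def rerank(query: str, candidates: List[str],
           scorer: Callable[[str, str], float]) -> List[str]:
    # Stage 2 (slower, finer): score only the k candidates and reorder them
    return sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)


def coverage_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder or LLM judge:
    # the fraction of distinct query terms the document contains
    q_terms = set(tokenize(query))
    return len(q_terms & set(tokenize(doc))) / max(len(q_terms), 1)


query = "agents with memory and tools"
corpus = [
    "memory memory memory memory notes",  # keyword-stuffed
    "agents use tools and memory",
    "faiss is a vector index",
]
candidates = first_stage(query, corpus, k=2)
final = rerank(query, candidates, coverage_score)
print(final)  # → ['agents use tools and memory', 'memory memory memory memory notes']
```

In this example the keyword-stuffed first document ties with the second in stage 1 and keeps its corpus position, and stage 2 then moves the second document to the top.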

In [1]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_classic.chat_models import init_chat_model
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser

In [14]:
## Load text file
loader=TextLoader("langchain_sample.txt")
raw_doc=loader.load()

# Split documents into chunks
splitter=RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs=splitter.split_documents(raw_doc)
docs


[Document(metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(metadata={'source': 'langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.'),
 Document(metadata={'source': 'langchain_sample.txt'}, page_content='Retrieval-Augmented Generation (RAG) is a powerful technique where external knowledge is retrieved and passed into the prompt to ground LLM responses. LangChain makes it easy to implement RAG using vector databases like FAISS, Chroma, and Pinecone.\nBM25 is a traditional 
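`chunk_size=500` and `chunk_overlap=50` mean that each chunk holds at most 500 characters and that consecutive chunks can share up to 50 characters of context. `RecursiveCharacterTextSplitter` additionally tries to break on separators (paragraphs, lines, spaces) before falling back to raw characters, so its real output differs from what is shown here. The sketch below is a simplified fixed-window splitter, written only to show what size and overlap control; `sliding_window_chunks` is a hypothetical helper, not a LangChain function.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    # Simplified fixed-window splitter (NOT RecursiveCharacterTextSplitter's algorithm):
    # each window is chunk_size characters, and the next one starts chunk_overlap
    # characters before the previous one ended.
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# 1200 characters of synthetic text
text = "".join(chr(ord("a") + i % 26) for i in range(1200))
chunks = sliding_window_chunks(text, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])  # → [500, 500, 300]
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in the neighbouring chunk, which gives the retriever a second chance to match it.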

In [3]:
## User query
query="How can I use langchain to build an application with memory and tools?"

In [4]:
### FAISS vector store and HuggingFace models for embeddings

from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model=HuggingFaceEmbeddings(model="all-MiniLM-L6-v2")
vectorstore=FAISS.from_documents(docs, embedding_model)
retriever=vectorstore.as_retriever(search_kwargs={"k":8})

In [5]:
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x169014b10>, search_kwargs={'k': 8})
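Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors lie closest to it. LangChain's `FAISS.from_documents` builds an exact L2 (Euclidean distance) index by default; the sketch below performs the same brute-force L2 search in plain Python. The 3-dimensional vectors are invented for illustration and stand in for the 384-dimensional `all-MiniLM-L6-v2` embeddings; `top_k_l2` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance between two vectors of equal length
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k_l2(query_vec: Sequence[float], doc_vecs: List[Sequence[float]],
             k: int) -> List[Tuple[int, float]]:
    # Exact (brute-force) nearest-neighbour search: score every stored vector,
    # then keep the k closest as (index, distance) pairs, smallest distance first
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy unit-length vectors invented for illustration
doc_vecs = [
    [1.0, 0.0, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.8, 0.6, 0.0],  # doc 2
]
query_vec = [0.6, 0.8, 0.0]
hits = top_k_l2(query_vec, doc_vecs, k=2)
print([i for i, _ in hits])  # → [2, 1]
```

For unit-length vectors such as these, ranking by smallest L2 distance gives the same order as ranking by largest cosine similarity, since the squared distance equals 2 minus twice the cosine.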

In [20]:
## Reranking LLM
from langchain_classic.chat_models import init_chat_model
import os
# ChatGroq (loaded via the "groq:" prefix, from the langchain-groq package) reads GROQ_API_KEY from the environment
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("Set the GROQ_API_KEY environment variable before running this cell")
llm=init_chat_model("groq:llama-3.1-8b-instant")

In [7]:
# Prompt template
prompt=PromptTemplate.from_template("""You are a helpful assistant. Your task is to rank the following documents
                                    from most to least relevant to the user's question.
                                    
                                    User Question: {question}
                                    Documents:
                                    {documents}
                                    
                                    Instructions:
                                    - Think about the relevance of each document to the user's question.
                                    - Return the list of document indices in ranked order, starting from the most relevant.

                                    Output format: comma-separated document indices
                                    """ )

In [8]:
retrieved_docs=retriever.invoke(query)
retrieved_docs

[Document(id='3b42a25f-a2d2-42d4-82bb-b4cdb003e2e8', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='4b103b8b-8237-47f4-ada8-87656a758f3b', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='8b709901-34d5-45d9-bf85-40d9a8933dbf', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain integrates with many third-party services such as OpenAI, Hugging F

In [9]:
# Number the documents from 1 so the LLM can refer to each one by index
doc_lines=[f"Document {i+1}: {doc.page_content}" for i, doc in enumerate(retrieved_docs)]
formatted_docs="\n".join(doc_lines)
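Listwise LLM reranking only works if the model's answer can be mapped back to `retrieved_docs`: documents are numbered from 1 in the prompt, and the indices the model returns are later shifted back to 0-based list positions with `int(x)-1`. The sketch below shows that round trip in plain Python. It uses `str.format`, which substitutes `{question}` and `{documents}` much like `PromptTemplate.from_template` does with its default f-string format; `TEMPLATE` (a shortened version of the prompt) and `number_documents` are hypothetical names, and the `Document N:` label is a choice made for this sketch.

```python
from typing import List

# Shortened, hypothetical version of the reranking prompt
TEMPLATE = (
    "Rank the following documents from most to least relevant.\n"
    "User Question: {question}\n"
    "Documents:\n{documents}\n"
    "Output format: comma-separated document indices"
)


def number_documents(texts: List[str]) -> str:
    # Label documents 1..n so the LLM can refer to them by a short index
    return "\n".join(f"Document {i}: {text}" for i, text in enumerate(texts, start=1))


texts = ["LangChain memory keeps context.", "FAISS stores vectors."]
rendered = TEMPLATE.format(question="How do I add memory?", documents=number_documents(texts))
print(rendered)

# An answer of "2, 1" uses the 1-based labels; subtract 1 to index into `texts`
answer = "2, 1"
reordered = [texts[int(x) - 1] for x in answer.split(",")]
```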

In [16]:
chain=prompt|llm|StrOutputParser()
response=chain.invoke({"question":query, "documents": formatted_docs })
print(response)

Based on the user's question, I would rank the documents from most to least relevant as follows:

3, 1, 2, 5, 4, 6

Here's the reasoning behind the ranking:

- Document 3 directly mentions tool integration, which is the key aspect of the user's question.
- Document 1 explains memory in LangChain, which is a crucial component for building a stateful application, aligning with the user's goal.
- Document 2 provides a general overview of LangChain, but it's still relevant as it mentions components that are relevant to the user's question.
- Document 5 discusses Retrieval-Augmented Generation (RAG), which is a specific technique that can be used in building an application with memory and tools.
- Document 4 is about FAISS, a library that's mentioned in Document 5, but it's not directly related to the user's question.
- Document 6 discusses dense retrieval, which is a specific retrieval method that's mentioned in Document 5, but it's not as directly relevant to the user's question as the ot

In [17]:
# Parse the comma-separated indices and convert them from 1-based to 0-based
indices=[int(x.strip())-1 for x in response.split(",") if x.strip().isdigit()]
indices


[0, 1, 4, 3]
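The parsed list has four entries and starts with document 1, while the ranking line in the response was `3, 1, 2, 5, 4, 6`. Splitting the whole response on commas is the cause: the piece holding `3` also contains the preceding sentence (`... as follows:\n\n3`), and the piece holding `6` runs into the following paragraph, so neither passes `isdigit()` and both documents are silently dropped. A more defensive parser, sketched below as one possible approach rather than the notebook's own, searches for a line made up only of comma-separated integers, converts them to 0-based indices, and skips duplicates and out-of-range values. `parse_ranking` and `RANKING_LINE` are hypothetical names.

```python
import re
from typing import List, Optional

# A line made up only of integers separated by commas, e.g. "3, 1, 2, 5, 4, 6"
RANKING_LINE = re.compile(r"^\s*\d+(?:\s*,\s*\d+)*\s*$", re.MULTILINE)


def parse_ranking(response: str, n_docs: int) -> Optional[List[int]]:
    """Return 0-based indices from the first all-integer line, or None if there is none."""
    match = RANKING_LINE.search(response)
    if match is None:
        return None
    indices: List[int] = []
    for token in match.group(0).split(","):
        i = int(token) - 1  # int() ignores surrounding whitespace; the prompt numbers from 1
        if 0 <= i < n_docs and i not in indices:
            indices.append(i)  # keep the first occurrence, skip duplicates and out-of-range values
    return indices


sample = (
    "Based on the user's question, I would rank the documents as follows:\n\n"
    "3, 1, 2, 5, 4, 6\n\n"
    "Here's the reasoning behind the ranking:\n"
    "- Document 3 directly mentions tool integration."
)
print(parse_ranking(sample, n_docs=6))  # → [2, 0, 1, 4, 3, 5]
```

When `parse_ranking` returns `None`, falling back to the retriever's original order is a reasonable default.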

In [18]:
reranked_docs=[retrieved_docs[i] for i in indices if 0<=i<len(retrieved_docs)]
reranked_docs

[Document(id='3b42a25f-a2d2-42d4-82bb-b4cdb003e2e8', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.\nMemory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.'),
 Document(id='4b103b8b-8237-47f4-ada8-87656a758f3b', metadata={'source': 'langchain_sample.txt'}, page_content='LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.'),
 Document(id='b6e90e31-7ff4-4e0e-8cbc-5ed68a47356c', metadata={'source': 'langchain_sample.txt'}, page_content='Retrieval-Augmented Generation (RAG) is a powerful technique where external k
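`reranked_docs` contains only the documents whose indices were parsed, so every retrieved chunk missing from `indices` (whether the LLM omitted it or the parser dropped it) disappears from the final list. A hedged alternative is to place the ranked documents first and then append the rest in the retriever's original order, so reranking can reorder the candidates but never shrink them. `apply_ranking` is a hypothetical helper, and plain strings stand in for `Document` objects.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_ranking(items: Sequence[T], indices: Sequence[int]) -> List[T]:
    # Ranked items first (valid indices, first occurrence only), then every item
    # the ranking did not mention, in its original order
    seen = set()
    ranked: List[T] = []
    for i in indices:
        if 0 <= i < len(items) and i not in seen:
            seen.add(i)
            ranked.append(items[i])
    ranked.extend(item for j, item in enumerate(items) if j not in seen)
    return ranked


# Placeholder strings standing in for retrieved Document objects
docs_in_retriever_order = ["d0", "d1", "d2", "d3", "d4", "d5"]
result = apply_ranking(docs_in_retriever_order, [0, 1, 4, 3])
print(result)  # → ['d0', 'd1', 'd4', 'd3', 'd2', 'd5']
```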

In [19]:
# Show result
print("\n Final Reranked result:\n")
for i, doc in enumerate(reranked_docs, 1):
    print(f"\nRank {i}: \n{doc.page_content}")


 Final Reranked result:


Rank 1: 
LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.

Rank 2: 
LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.

Rank 3: 
Retrieval-Augmented Generation (RAG) is a powerful technique where external knowledge is retrieved and passed into the prompt to ground LLM responses. LangChain makes it easy to implement RAG using vector databases like FAISS, Chroma, and Pinecone.
BM25 is a traditional sparse retrieval method that scores documents based on keyword matching. Although fast, it of