### 📖 Where We Are

**In the last notebook**, we explored **Hybrid Search**, a technique that combines dense (semantic) and sparse (keyword) retrievers to improve recall and get a more robust set of initial documents.

**In this notebook**, we'll learn how to take our retrieved results and make them even better with **Reranking**. Reranking is a second-stage process where a more powerful model scrutinizes the initial list of documents to re-order them based on true relevance. This adds a layer of intelligence to our RAG pipeline, ensuring the final LLM gets the highest possible quality context to work with.

### 1. Reranking with an LLM

Reranking is a second-stage filtering process in retrieval systems. The typical workflow is:

1.  **Retrieve**: Use a fast, efficient retriever (like FAISS or a hybrid retriever) to fetch an initial, larger set of top-k documents (e.g., k=10).
2.  **Rerank**: Use a more accurate but slower model (like a powerful LLM or a cross-encoder) to re-score and reorder those 10 documents based on their relevance to the specific query.
3.  **Generate**: Pass only the top-n reranked documents (e.g., the best 3 of the original 10) to the LLM for answer generation.

👉 This moves the most relevant documents to the top of the context, which typically improves the quality of the final answer.
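
The three-step workflow above can be sketched in plain Python before any real models are wired in. This is a minimal illustration under stated assumptions, not the notebook's implementation: `retrieve`, `rerank`, `overlap`, and `density_score` are hypothetical helpers, word overlap stands in for the vector retriever, and the density score stands in for an LLM or cross-encoder reranker.

```python
from typing import Callable, List


def overlap(query: str, doc: str) -> int:
    """Count distinct lowercase words shared by the query and the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Stage 1: fast, rough scoring over the whole corpus (stand-in for FAISS)."""
    return sorted(corpus, key=lambda d: overlap(query, d), reverse=True)[:k]


def density_score(query: str, doc: str) -> float:
    """Hypothetical reranker score: shared words divided by document length."""
    return overlap(query, doc) / max(1, len(doc.split()))


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[str]:
    """Stage 2: re-score only the k candidates and keep the best top_n."""
    return sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)[:top_n]


corpus = [
    "memory keeps chat context",
    "tools let agents call APIs",
    "memory and tools power stateful agents",
    "faiss indexes vectors",
    "langchain has memory",
]
toy_query = "memory and tools"

candidates = retrieve(toy_query, corpus, k=4)                     # retrieve more than needed
top_docs = rerank(toy_query, candidates, density_score, top_n=2)  # keep only the best few
print(top_docs)
```

The point of the split is cost: the cheap scorer sees every document, while the more expensive scorer only sees the `k` candidates. Swapping `density_score` for a call to a real reranking model leaves the rest of the pipeline unchanged.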

In [1]:
# --- Standard LangChain Imports ---
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.chat_models import init_chat_model
from langchain_core.prompts import PromptTemplate
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser

In [2]:
# --- 1. Load and Split Documents ---
# Load a sample text file containing information about LangChain.
loader = TextLoader("langchain_sample.txt")
raw_docs = loader.load()

# Split the text into smaller, manageable document chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(raw_docs)

In [3]:
# Define the user query we will be working with.
query = "How can I use LangChain to build an application with memory and tools?"

In [4]:
# --- 2. Initial Retrieval ---
# We'll set up a standard FAISS vector store retriever.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embedding_model)

# Note: We retrieve a larger number of documents (k=6) than we might actually need.
# This gives the reranker more candidates to choose from, improving the final selection.
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

In [5]:
# --- 3. Reranking with an LLM ---
# Initialize the LLM that will act as our reranker.
import os
from dotenv import load_dotenv
load_dotenv()
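# load_dotenv() has already copied GROQ_API_KEY from .env into os.environ.
# Fail early with a clear message if the key is missing.
if not os.getenv("GROQ_API_KEY"):
    raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file.")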
llm = init_chat_model("groq:openai/gpt-oss-20b")

In [6]:
# We will use prompt engineering to instruct the LLM to perform the reranking task.
# The prompt provides the query and the list of retrieved documents.
# It explicitly asks the LLM to return only a comma-separated list of the document indices in their new order of relevance.
rerank_prompt = PromptTemplate.from_template("""
You are a helpful assistant. Your task is to rank the following documents from most to least relevant to the user's question.

User Question: "{question}"

Documents:
{documents}

Instructions:
- Carefully evaluate the relevance of each document to the user's question.
- Return the document indices in their new ranked order, from most to least relevant.
- Output only a comma-separated list of numbers (e.g., 2,1,3,0), with no other text.
""")

In [7]:
# Create the reranking chain using LCEL.
reranker_chain = rerank_prompt | llm | StrOutputParser()

In [8]:
# First, perform the initial retrieval.
retrieved_docs = retriever.invoke(query)

# Format the retrieved documents to be passed into the reranking prompt.
# We enumerate them so the LLM can refer to them by index.
doc_list = [f"{i}. {doc.page_content}" for i, doc in enumerate(retrieved_docs)]
formatted_docs = "\n".join(doc_list)

print("--- Initially Retrieved Documents (in order) ---")
print(formatted_docs)

--- Initially Retrieved Documents (in order) ---
0. LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.
1. LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.
2. LangChain integrates with many third-party services such as OpenAI, Hugging Face, and Cohere. This enables developers to experiment with different models and optimize performance for specific use cases like summarization, question answering, or translation.
3. FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spac
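
In the output above, document 0 spans two lines, so its second line could be mistaken for a separate entry. One way to make the boundaries unambiguous is to collapse whitespace inside each chunk and separate entries with blank lines. The sketch below assumes a hypothetical helper `format_for_rerank` and a hypothetical `max_chars` cap; neither is part of LangChain.

```python
from typing import List


def format_for_rerank(texts: List[str], max_chars: int = 400) -> str:
    """Number each chunk, flatten it to a single line, and cap its length."""
    blocks = []
    for i, text in enumerate(texts):
        flat = " ".join(text.split())  # collapse newlines and repeated spaces
        if len(flat) > max_chars:
            flat = flat[:max_chars].rstrip() + " ..."
        blocks.append(f"[{i}] {flat}")
    # A blank line between entries keeps the document boundaries visible.
    return "\n\n".join(blocks)


sample_texts = [
    "Tools call APIs.\nMemory keeps context.",
    "LangChain is a framework.",
]
formatted_sample = format_for_rerank(sample_texts)
print(formatted_sample)
```

With the notebook's objects, this could be called as `format_for_rerank([doc.page_content for doc in retrieved_docs])`. Capping the length also bounds the prompt size, at the cost of hiding text past the cut-off from the reranker.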

In [9]:
# Invoke the reranker chain with the query and the formatted documents.
reranked_order_str = reranker_chain.invoke({"question": query, "documents": formatted_docs})
print("LLM's Reranked Order:", reranked_order_str)

LLM's Reranked Order: 0,1,3,2,4,5


In [10]:
# Parse the LLM's string output into a list of integer indices.
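# This is brittle: any stray number in the reply is treated as an index,
# so production code needs stricter parsing.
import re
reranked_indices = [int(x) for x in re.findall(r'\d+', reranked_order_str)]

# Drop duplicate indices while preserving the LLM's order.
reranked_indices = list(dict.fromkeys(reranked_indices))

# Use the ranked indices to reorder our original list of retrieved documents,
# skipping any index that is out of range.
reranked_docs = [retrieved_docs[i] for i in reranked_indices if i < len(retrieved_docs)]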

print("--- Final Reranked Documents ---")
for i, doc in enumerate(reranked_docs, 1):
    # Print each document under its new rank.
    print(f"\nRank {i}:\n{doc.page_content}")

--- Final Reranked Documents ---

Rank 1:
LangChain supports tool integration including web search, calculators, and APIs, allowing LLMs to interact with external systems and respond more accurately to dynamic queries.
Memory in LangChain enables context retention across multiple steps in a conversation or task, making the application more coherent and stateful.

Rank 2:
LangChain is a flexible framework designed for developing applications powered by large language models (LLMs). It provides tools and abstractions to work with LLMs more effectively and includes components for prompt management, chains, memory, and agents.

Rank 3:
FAISS is a popular library used for fast approximate nearest neighbor search in high-dimensional spaces. It supports both flat and compressed indexes, which makes it scalable for large document stores.
Agents in LangChain are chains that use LLMs to decide which tools to use and in what order. This makes them suitable for multi-step tasks like question answe
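
Because the LLM's reply is free text, the parsing step is the most fragile part of this approach. The sketch below shows one defensive option, assuming a hypothetical helper `parse_ranking`: it ignores out-of-range and repeated indices, appends any documents the model left out in their original retrieval order, and optionally keeps only the first `top_n`.

```python
import re
from typing import List, Optional


def parse_ranking(raw: str, n_docs: int, top_n: Optional[int] = None) -> List[int]:
    """Turn an LLM reply such as '2,0,1' into a complete, valid index order."""
    seen = set()
    order: List[int] = []
    for token in re.findall(r"\d+", raw):
        idx = int(token)
        # Skip indices that do not exist or were already ranked.
        if idx < n_docs and idx not in seen:
            seen.add(idx)
            order.append(idx)
    # Fall back to retrieval order for anything the model omitted.
    order.extend(i for i in range(n_docs) if i not in seen)
    return order if top_n is None else order[:top_n]


print(parse_ranking("0,1,3,2,4,5", n_docs=6))
print(parse_ranking("Ranking: 2, 2, 9, 0", n_docs=4))
print(parse_ranking("3,1", n_docs=4, top_n=2))
```

In the notebook this could replace the list comprehension, for example `reranked_docs = [retrieved_docs[i] for i in parse_ranking(reranked_order_str, len(retrieved_docs), top_n=3)]`. It still treats every number in the reply as an index, so a reply like "the top 3 are 2,1,0" would be misread; asking the model for structured output is a more robust alternative.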

### 🔑 Key Takeaways

* **Reranking Boosts Precision**: Reranking is a two-stage process (fast retrieval, then accurate re-scoring) that moves the most relevant documents to the top, improving the precision of your context.
* **LLMs as Rerankers**: You can effectively use a powerful LLM as a sophisticated reranker. By providing the initial documents and a carefully crafted prompt, you can leverage the LLM's deep understanding to re-evaluate relevance.
* **Retrieve More, Then Filter**: A common pattern is to retrieve more documents than you need in the first stage (e.g., top 10) and then use the reranker to select and order the final, most relevant set (e.g., top 3).
* **Trade-offs**: This method adds an extra LLM call, so latency and cost increase with the number and length of the candidate documents. The resulting improvement in context quality often leads to more accurate final answers, which can justify the trade-off.