### Dedicated Re-ranking

Standard retrieval methods give us a ranked list of documents, but this initial ranking isn't always reliable. Re-ranking is a second-pass step: we take the initial set of retrieved documents and use a more sophisticated (and usually more expensive) model to re-order them by their relevance to the query.
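The two-pass idea can be sketched in plain Python. This is a toy illustration, not what any library does. `cheap_score` (shared terms) stands in for the first-pass retriever, and `expensive_score` is a hypothetical scorer that rewards matching adjacent word pairs. It stands in for a real re-ranking model such as a cross-encoder. What matters is the shape of the pipeline: the expensive scorer only ever sees the `k` candidates that the first pass keeps.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cheap_score(query: str, doc: str) -> int:
    # Hypothetical first-pass scorer: number of distinct shared terms.
    return len(set(tokenize(query)) & set(tokenize(doc)))


def bigrams(tokens: list[str]) -> set[tuple[str, str]]:
    # Adjacent word pairs, so word order starts to matter.
    return set(zip(tokens, tokens[1:]))


def expensive_score(query: str, doc: str) -> float:
    # Hypothetical second-pass scorer: shared word pairs count heavily,
    # shared single terms only lightly (a stand-in for a re-ranking model).
    q, d = tokenize(query), tokenize(doc)
    return 2.0 * len(bigrams(q) & bigrams(d)) + 0.1 * len(set(q) & set(d))


def retrieve_then_rerank(query: str, corpus: list[str], k: int = 3, top_n: int = 2) -> list[str]:
    # Pass 1: cheap scoring over the whole corpus, keep the top-k candidates.
    candidates = sorted(corpus, key=lambda doc: cheap_score(query, doc), reverse=True)[:k]
    # Pass 2: expensive scoring over only those k candidates, then re-order.
    reranked = sorted(candidates, key=lambda doc: expensive_score(query, doc), reverse=True)
    return reranked[:top_n]


corpus = [
    "llm agents need a plan for every decomposition of a task",
    "task decomposition for complex goals",
    "llm agents can call external tools",
    "memory lets agents store past observations",
]
print(retrieve_then_rerank("task decomposition for LLM agents", corpus))
# ['task decomposition for complex goals', 'llm agents need a plan for every decomposition of a task']
```

The first pass ranks the scrambled first sentence highest because it shares all five query terms. The second pass promotes the sentence that actually contains the phrase "task decomposition for", and the fourth document never reaches the expensive scorer at all.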

In [1]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter



In [8]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [3]:
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEndpointEmbeddings

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="Qwen/Qwen3-Embedding-8B",
    task="feature-extraction",
    huggingfacehub_api_token=os.getenv("HUGGINGFACE_API_KEY")
)

In [4]:
# Load, split, and index the document
loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",))
blog_docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(blog_docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=hf_embeddings)

# First-pass retriever: get the top 10 potentially relevant documents
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
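The first pass is a plain similarity search: embed the query, compare it with every stored chunk vector, and keep the `k` closest. Below is a minimal sketch with made-up three-dimensional vectors. It assumes cosine similarity as the metric, although the distance Chroma actually uses depends on how the collection is configured. It also scans every vector by brute force, whereas a real vector store would typically use an index.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    # Score every stored vector against the query and keep the ids of the k best.
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Made-up 3-dimensional "embeddings"; real ones come from the embedding model.
doc_vecs = {
    "planning": [0.9, 0.1, 0.0],
    "tool_use": [0.2, 0.9, 0.1],
    "memory": [0.1, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], doc_vecs, k=2))  # ['planning', 'tool_use']
```

This ranking is exactly what the re-ranker later reshuffles. Embedding similarity is cheap enough to run against every chunk, but it only approximates relevance.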

Next, we introduce the `ContextualCompressionRetriever`. It wraps our base retriever and adds a "compressor" step; here, the compressor is the `CohereRerank` model.

It takes the 10 documents from the base retriever, re-orders them by relevance to the query, and returns only the top few (3 by default, controlled by `CohereRerank`'s `top_n` parameter).
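Conceptually, the wrapper chains two calls: fetch candidates from the base retriever, then hand them to the compressor along with the query. The sketch below mimics that flow with hypothetical toy classes (`ToyDoc`, `ToyReranker`, `ToyCompressionRetriever`) rather than LangChain's own. The `compress_documents` method name loosely follows LangChain's document-compressor interface. The `top_n` cutoff mirrors `CohereRerank`'s parameter, and the `relevance_score` key matches the one read in `pretty_print_docs`. The scoring itself (the fraction of query words present) is made up and is not how Cohere's model scores documents.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyDoc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def overlap_score(query: str, text: str) -> float:
    # Hypothetical relevance score: fraction of query words present in the text.
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q)


class ToyReranker:
    """Hypothetical compressor: re-scores candidates and keeps the top_n best."""

    def __init__(self, top_n: int = 3):
        self.top_n = top_n

    def compress_documents(self, docs: list[ToyDoc], query: str) -> list[ToyDoc]:
        scored = []
        for doc in docs:
            # Build new documents so the caller's metadata is not mutated.
            meta = {**doc.metadata, "relevance_score": overlap_score(query, doc.page_content)}
            scored.append(ToyDoc(doc.page_content, meta))
        scored.sort(key=lambda d: d.metadata["relevance_score"], reverse=True)
        return scored[: self.top_n]


class ToyCompressionRetriever:
    """Hypothetical wrapper: run the base retriever, then the compressor."""

    def __init__(self, base_retriever: Callable[[str], list[ToyDoc]], compressor: ToyReranker):
        self.base_retriever = base_retriever
        self.compressor = compressor

    def invoke(self, query: str) -> list[ToyDoc]:
        candidates = self.base_retriever(query)  # first-pass hits (10 in the notebook)
        return self.compressor.compress_documents(candidates, query)


docs = [
    ToyDoc("memory stores past observations"),
    ToyDoc("task decomposition breaks a task into steps"),
    ToyDoc("tool use lets agents call apis"),
    ToyDoc("agents plan via task decomposition"),
]
# Stand-in base retriever that returns every doc as a candidate.
toy_retriever = ToyCompressionRetriever(lambda q: docs, ToyReranker(top_n=2))
for d in toy_retriever.invoke("task decomposition for agents"):
    print(f"{d.metadata['relevance_score']:.2f} {d.page_content}")
# 0.75 agents plan via task decomposition
# 0.50 task decomposition breaks a task into steps
```

Keeping the wrapper this thin makes the second stage swappable. In this sketch, any object that takes the candidates plus the query and returns a shorter, re-ordered list can fill the compressor slot.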

In [9]:
from langchain_classic.retrievers.contextual_compression import (
    ContextualCompressionRetriever,
)
from langchain_cohere import CohereRerank

# CohereRerank reads COHERE_API_KEY from the environment (loaded above via load_dotenv)
compressor = CohereRerank(model="rerank-english-v3.0")
# Wrap the first-pass retriever: fetch 10 candidates, then let Cohere re-order them
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

In [13]:
def pretty_print_docs(compressed_docs):
    print("--- Re-ranked and Compressed Documents ---")
    for doc in compressed_docs:
        print(f"Relevance Score: {doc.metadata['relevance_score']:.4f}")
        print(f"Content: {doc.page_content[:150]}...\n")
        print("-"*100)

In [14]:
# Let's test it with our query
question = "What is task decomposition for LLM agents?"

reranked_docs = compression_retriever.invoke(question)
pretty_print_docs(reranked_docs)

--- Re-ranked and Compressed Documents ---
Relevance Score: 0.9988
Content: Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chai...

----------------------------------------------------------------------------------------------------
Relevance Score: 0.9931
Content: LLM Powered Autonomous Agents | Lil'Log

Lil'Log

|

Posts

Archive

Search

Tags
...

----------------------------------------------------------------------------------------------------
Relevance Score: 0.9898
Content: Component Three: Tool Use

Case Studies

Scientific Discovery Agent

Generative Agents Simulation

Proof-of-Concept Examples


Challenges

Citation

R...

----------------------------------------------------------------------------------------------------
