Welcome! This quick notebook introduces you to using rerankers in LangChain, as the final step of a retrieval pipeline. It's heavily inspired by existing LangChain examples.

First, let's define a helper function for printing docs:

In [1]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

Then, let's set up a standard document retrieval pipeline, using the common OpenAI embeddings + FAISS combo. If you want to run this example yourself, you'll need FAISS installed. The document is very small, so `faiss-cpu` is more than enough.

In [2]:
# Vanilla retrieval
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = TextLoader("../data/state_of_the_union.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
retriever = FAISS.from_documents(texts, OpenAIEmbeddings()).as_retriever()

# `invoke` replaces the deprecated `get_relevant_documents`
docs = retriever.invoke(
    "What did the president say about the minimum wage?"
)
pretty_print_docs(docs)

Document 1:

And so many families are living paycheck to paycheck, struggling to keep up with the rising cost of food, gas, housing, and so much more. 

I understand. 

I remember when my Dad had to leave our home in Scranton, Pennsylvania to find work. I grew up in a family where if the price of food went up, you felt it. 

That’s why one of the first things I did as President was fight to pass the American Rescue Plan.  

Because people were hurting. We needed to act, and we did. 

Few pieces of legislation have done more in a critical moment in our history to lift us out of crisis. 

It fueled our efforts to vaccinate the nation and combat COVID-19. It delivered immediate economic relief for tens of millions of Americans.  

Helped put food on their table, keep a roof over their heads, and cut the cost of health insurance. 

And as my Dad used to say, it gave people a little breathing room.
---------------------------------------------------------------------------------------------

These results are on topic, but the top two documents say nothing about the actual new minimum wage pledge! Let's see if a reranker can help...

First, let's load a reranker. Which one you load doesn't matter for the code in this notebook: they are all loaded and called through the same `Reranker` API. For this example, we're using MixedBread's excellent [mixedbread-ai/mxbai-rerank-base-v1](https://huggingface.co/mixedbread-ai/mxbai-rerank-base-v1).

In [3]:
# Load a reranker (it gets converted to a LangChain compressor in the next cell)
from rerankers import Reranker

ranker = Reranker("mixedbread-ai/mxbai-rerank-base-v1", verbose=0)

Loading TransformerRanker model mixedbread-ai/mxbai-rerank-base-v1


Converting it to a LangChain compressor is straightforward: all you have to do is call `as_langchain_compressor`. You can pass a `k` argument to set how many documents it should return; otherwise, `k` defaults to 5.

In [4]:
compressor = ranker.as_langchain_compressor(k=3)
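
If it helps to picture what the compressor does with `k`, here is a minimal, library-free sketch of the rerank step: re-score every candidate against the query, sort, and keep the best `k`. Everything in it is an assumption made for illustration: `Doc` is a hypothetical stand-in for a LangChain document, and `overlap_score` is a toy word-overlap score, not what `rerankers` or the mxbai model actually computes.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical stand-in for a LangChain Document; only page_content is modelled.
    page_content: str


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, docs: list[Doc], k: int = 5) -> list[Doc]:
    """Re-score every candidate against the query and keep the k best."""
    ranked = sorted(
        docs,
        key=lambda d: overlap_score(query, d.page_content),
        reverse=True,
    )
    return ranked[:k]


# Pretend these came back from the first-stage retriever, in its original order.
candidates = [
    Doc("Families are struggling with the rising cost of food and gas"),
    Doc("We will set higher standards for nursing homes"),
    Doc("Raise the minimum wage to $15 an hour"),
    Doc("Let's increase Pell Grants"),
]

top = rerank("minimum wage", candidates, k=2)
for doc in top:
    print(doc.page_content)
```

A real reranker swaps the toy score for a learned model that reads the query and the document together, but the shape of the step is the same: the first-stage retriever proposes candidates, and the reranker decides their final order and how many survive.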

You're all set! Let's just add it to our pipeline and retrieve+rerank documents:

In [5]:
from langchain.retrievers import ContextualCompressionRetriever

# Reuse the `retriever` built in the vanilla retrieval cell above;
# rebuilding it here would only re-embed the same chunks.

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)


compressed_docs = compression_retriever.invoke(
    "What did the president say about the minimum wage?"
)

pretty_print_docs(compressed_docs)

Document 1:

Tonight, I’m announcing a crackdown on these companies overcharging American businesses and consumers. 

And as Wall Street firms take over more nursing homes, quality in those homes has gone down and costs have gone up.  

That ends on my watch. 

Medicare is going to set higher standards for nursing homes and make sure your loved ones get the care they deserve and expect. 

We’ll also cut costs and keep the economy going strong by giving workers a fair shot, provide more training and apprenticeships, hire them based on their skills not degrees. 

Let’s pass the Paycheck Fairness Act and paid leave.  

Raise the minimum wage to $15 an hour and extend the Child Tax Credit, so no one has to raise a family in poverty. 

Let’s increase Pell Grants and increase our historic support of HBCUs, and invest in what Jill—our First Lady who teaches full-time—calls America’s best-kept secret: community colleges.
-------------------------------------------------------------------------

There it is! The minimum wage pledge now shows up in the first document. There's not much more to show -- just load any reranker you want and try it out!

Remember, not all rerankers work in the same way. It's important to experiment to find out which one works best for your data, and even to fine-tune them if you have the data to do so. The point of this library is to make it easy to try many different approaches to find the best one for your use case!
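
One way to run that kind of experiment is to collect a few queries for which you already know the relevant document, then check how high each candidate reranker places it. The sketch below does this with mean reciprocal rank. It is only an illustration under stated assumptions: the two "rerankers" are hypothetical toy functions (`by_word_overlap`, `by_length`) and the labelled examples are made up from the speech excerpts above; in practice you would wrap real rerankers behind the same `(query, docs) -> ranked docs` signature.

```python
from typing import Callable

# Assumption: a "reranker" is any function that returns docs sorted best-first.
RerankFn = Callable[[str, list[str]], list[str]]


def by_word_overlap(query: str, docs: list[str]) -> list[str]:
    """Toy reranker: documents sharing more words with the query rank higher."""
    query_words = set(query.lower().split())
    return sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )


def by_length(query: str, docs: list[str]) -> list[str]:
    """Deliberately weak baseline: ignore the query and prefer shorter documents."""
    return sorted(docs, key=len)


def mean_reciprocal_rank(
    rerank: RerankFn, labelled: list[tuple[str, list[str], str]]
) -> float:
    """Average of 1 / (position of the known-relevant document) over all queries."""
    total = 0.0
    for query, docs, relevant in labelled:
        ranked = rerank(query, docs)
        total += 1.0 / (ranked.index(relevant) + 1)
    return total / len(labelled)


# Toy labelled data: (query, candidate documents, the one known-relevant document).
labelled = [
    (
        "minimum wage",
        ["Let's increase Pell Grants", "Raise the minimum wage to $15 an hour"],
        "Raise the minimum wage to $15 an hour",
    ),
    (
        "nursing homes",
        ["Medicare will set higher standards for nursing homes", "That ends on my watch"],
        "Medicare will set higher standards for nursing homes",
    ),
]

scores = {
    "word_overlap": mean_reciprocal_rank(by_word_overlap, labelled),
    "length": mean_reciprocal_rank(by_length, labelled),
}
print(scores)
```

A score of 1.0 means the relevant document always came first. Even a handful of labelled queries from your own data is usually enough to tell you whether one reranker is clearly ahead of another.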