https://blog.langchain.dev/improving-document-retrieval-with-contextual-compression/

https://github.com/langchain-ai/langchain/tree/master/libs/langchain/langchain/retrievers/document_compressors

In [2]:
from dotenv import load_dotenv
import os

load_dotenv()

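# load_dotenv() already exports OPENAI_API_KEY into os.environ; fail early if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")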

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

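# Load the PDF (one Document per page) and split it into overlapping chunks.
docs = PyPDFLoader("state_of_union.pdf").load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(docs)

# Reuse a single embeddings object for the vector store and the filters further down.
embeddings = OpenAIEmbeddings()
retriever = FAISS.from_documents(texts, embeddings).as_retriever()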

In [4]:
retriever.invoke("What is President Biden's message to President Putin?")

[Document(id='ec6d5f2c-c4e8-4cde-833f-eafae5e8b4fc', metadata={'producer': 'Skia/PDF m122', 'creator': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36', 'creationdate': '2024-03-08T23:43:36+00:00', 'moddate': '2024-03-08T23:43:36+00:00', 'source': 'state_of_union.pdf', 'total_pages': 26, 'page': 2, 'page_label': '3'}, page_content='3/8/24, 6:43 PM Biden’s 2024 State of the Union Address: Read the Full Transcript - The New York Times\nhttps://www.nytimes.com/2024/03/08/us/politics/state-of-the-union-transcript-biden.html 3/26\nMy message to President Putin, who I have known for a long time, is simple: We\nwill not walk away. We will not bow down. I will not bow down.\nIn a literal sense, history is watching. History is watching. Just like history\nwatched three years ago on Jan. 6, when insurrectionists stormed this very Capitol\nand placed a dagger to the throat of American democracy.\nMany of you were here on that darkes

In [12]:
# Helper function for printing docs

def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

pretty_print_docs(docs)

Document 1:

3/8/24, 6:43 PM Biden’s 2024 State of the Union Address: Read the Full Transcript - The New York Times
https://www.nytimes.com/2024/03/08/us/politics/state-of-the-union-transcript-biden.html 1/26
In an address that previewed the issues his campaign will focus on in the
November election, President Biden made the case for a second term.
By The New York Times
March 8, 2024Updated 10:19 a.m. ET
President Biden delivered his annual State of the Union address on Thursday to a
joint session of Congress. The following is a transcript of his remarks, as recorded by
The New York Times.
Good evening. Good evening. If I were smart, I would go home now.
Mr. Speaker, Madam Vice President, members of Congress, my fellow Americans,
in January 1941, Franklin Roosevelt came to this chamber to speak to the nation,
and he said, “I address you in a moment, unprecedented in the history of the
union.”
Hitler was on the march. War was raging in Europe. President Roosevelt’s purpose
was to wake u

# Normal retrieval

In [6]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4.1-mini")
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

In [8]:
query = "What is President Biden's message to President Putin?"
result = chain.invoke(query)
result

{'query': "What is President Biden's message to President Putin?",
 'result': 'President Biden\'s message to President Putin is simple and clear: "We will not walk away. We will not bow down. I will not bow down."'}

# Contextual Compression Retriever

### compressor = LLMChainExtractor

In [14]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

compressor = LLMChainExtractor.from_llm(llm)
compression_retriever=ContextualCompressionRetriever(base_compressor=compressor, base_retriever=retriever)
compressed_docs = compression_retriever.invoke("What is President Biden's message to President Putin?")

In [15]:
pretty_print_docs(compressed_docs)

Document 1:

My message to President Putin, who I have known for a long time, is simple: We
will not walk away. We will not bow down. I will not bow down.
----------------------------------------------------------------------------------------------------
Document 2:

What makes our moment rare is that freedom and democracy are under attack both at home and overseas at the very same time. Overseas, Putin of Russia is on the march, invading Ukraine and sowing chaos throughout Europe and beyond. If anybody in this room thinks Putin will stop at Ukraine, I assure you, he will not. But Ukraine, Ukraine can stop Putin. Ukraine can stop Putin, if we stand with Ukraine and provide the weapons they need to defend itself. That is all — that is all Ukraine is asking. They’re not asking for American soldiers. In fact, there are no American soldiers at war in Ukraine, and I’m determined to keep it that way.
-------------------------------------------------------------------------------------------

In [20]:
chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
chain.invoke("What is President Biden's message to President Putin?")

{'query': "What is President Biden's message to President Putin?",
 'result': 'President Biden\'s message to President Putin is clear and resolute: "We will not walk away. We will not bow down. I will not bow down."'}
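
One way to see how much text the extractor removed is to compare total character counts before and after compression. The sketch below is only an illustration: `Doc`, `total_chars`, and `compression_ratio` are hypothetical helpers (not part of LangChain), and the toy strings stand in for real chunks so the cell runs without an API key. In this notebook you would pass `retriever.invoke(query)` and `compressed_docs` instead.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def total_chars(docs):
    """Total number of characters across all documents."""
    return sum(len(d.page_content) for d in docs)


def compression_ratio(original_docs, compressed_docs):
    """Fraction of the original text kept after compression (0.0 if there was no text)."""
    original = total_chars(original_docs)
    if original == 0:
        return 0.0
    return total_chars(compressed_docs) / original


# Toy data: two retrieved chunks, of which one short extract survives.
original = [Doc("a" * 800), Doc("b" * 200)]
compressed = [Doc("a" * 100)]
ratio = compression_ratio(original, compressed)
print(f"kept {ratio:.0%} of the retrieved text")
```

A low ratio means fewer tokens are passed to the answering model, at the cost of one extra LLM call per retrieved chunk for the extraction itself.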

### compressor = LLMChainFilter

In [16]:
from langchain.retrievers.document_compressors import LLMChainFilter

# Named llm_filter to avoid shadowing Python's built-in filter().
llm_filter = LLMChainFilter.from_llm(llm)
compression_retriever2 = ContextualCompressionRetriever(base_compressor=llm_filter, base_retriever=retriever)
compressed_docs2 = compression_retriever2.invoke("What does Biden say about defending democracy?")

In [17]:
pretty_print_docs(compressed_docs2)

Document 1:

3/8/24, 6:43 PM Biden’s 2024 State of the Union Address: Read the Full Transcript - The New York Times
https://www.nytimes.com/2024/03/08/us/politics/state-of-the-union-transcript-biden.html 2/26
What makes our moment rare is that freedom and democracy are under attack
both at home and overseas at the very same time. Overseas, Putin of Russia is on
the march, invading Ukraine and sowing chaos throughout Europe and beyond. If
anybody in this room thinks Putin will stop at Ukraine, I assure you, he will not.
But Ukraine, Ukraine can stop Putin. Ukraine can stop Putin, if we stand with
Ukraine and provide the weapons they need to defend itself. That is all — that is all
Ukraine is asking. They’re not asking for American soldiers. In fact, there are no
American soldiers at war in Ukraine, and I’m determined to keep it that way.
But now, assistance to Ukraine is being blocked by those who want to walk away
from our world leadership. Wasn’t long ago when a Republican president n

In [21]:
chain2 = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever2)
chain2.invoke("What does Biden say about defending democracy?")

{'query': 'What does Biden say about defending democracy?',
 'result': 'In his 2024 State of the Union Address, Biden emphasizes the critical importance of defending democracy both at home and overseas. He highlights that democracy and freedom are under attack, citing the Russian invasion of Ukraine and the January 6 insurrection as grave threats. Biden states that the insurrectionists who stormed the Capitol were not patriots but had aimed to overturn the will of the people. He calls for honesty about the truth of those events and insists on burying the lies connected to them.\n\nBiden urges all members of Congress, regardless of party, to join together in defending democracy by respecting free and fair elections, restoring trust in institutions, and making it clear that political violence has no place in America. He conveys that history is watching how the nation responds to these assaults on freedom and stresses the need to stand strong for democracy. Ultimately, Biden sees a future
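
Note that the filtered document above still carries its full page header: unlike the extractor, the filter keeps or drops whole documents and does not rewrite their content. A minimal sketch of that keep/drop logic follows. It is an assumption-laden illustration, not LangChain's implementation: `filter_documents` is a hypothetical helper, and `keyword_judge` is a crude stand-in for the yes/no relevance call that the real filter delegates to the LLM.

```python
from typing import Callable, List


def filter_documents(
    docs: List[str],
    query: str,
    judge: Callable[[str, str], bool],
) -> List[str]:
    """Keep each document unchanged if judge(query, doc) is True; drop it otherwise."""
    return [doc for doc in docs if judge(query, doc)]


def keyword_judge(query: str, doc: str) -> bool:
    """Hypothetical stand-in for the LLM: True if query and doc share a word longer than 4 characters."""
    query_words = {w.strip("?.,").lower() for w in query.split() if len(w) > 4}
    doc_words = {w.strip("?.,").lower() for w in doc.split()}
    return bool(query_words & doc_words)


# Toy documents, made up for illustration.
toy_docs = [
    "Freedom and democracy are under attack at home and overseas.",
    "The price of insulin is capped at $35 a month.",
]
kept = filter_documents(toy_docs, "What does Biden say about defending democracy?", keyword_judge)
print(kept)
```

Because surviving documents are returned verbatim, the filter reduces the number of chunks but not the length of each one.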

### compressor = EmbeddingsFilter

In [18]:
from langchain.retrievers.document_compressors import EmbeddingsFilter

embeddings_filter = EmbeddingsFilter(embeddings=embeddings)

compression_retriever3 = ContextualCompressionRetriever(base_compressor=embeddings_filter, base_retriever=retriever)
compressed_docs3 = compression_retriever3.invoke("What does Biden say about defending democracy?")

In [19]:
pretty_print_docs(compressed_docs3)

Document 1:

3/8/24, 6:43 PM Biden’s 2024 State of the Union Address: Read the Full Transcript - The New York Times
https://www.nytimes.com/2024/03/08/us/politics/state-of-the-union-transcript-biden.html 2/26
What makes our moment rare is that freedom and democracy are under attack
both at home and overseas at the very same time. Overseas, Putin of Russia is on
the march, invading Ukraine and sowing chaos throughout Europe and beyond. If
anybody in this room thinks Putin will stop at Ukraine, I assure you, he will not.
But Ukraine, Ukraine can stop Putin. Ukraine can stop Putin, if we stand with
Ukraine and provide the weapons they need to defend itself. That is all — that is all
Ukraine is asking. They’re not asking for American soldiers. In fact, there are no
American soldiers at war in Ukraine, and I’m determined to keep it that way.
But now, assistance to Ukraine is being blocked by those who want to walk away
from our world leadership. Wasn’t long ago when a Republican president n

In [22]:
chain3 = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever3)
chain3.invoke("What does Biden say about defending democracy?")

{'query': 'What does Biden say about defending democracy?',
 'result': 'In his 2024 State of the Union Address, Biden emphasizes that democracy is under significant threat both domestically and internationally. He highlights the attack on American democracy posed by the January 6 insurrection and election-related lies, calling them the gravest threat since the Civil War. Biden insists that these threats were met with resilience as "America stood strong and democracy prevailed." He stresses the importance of defending democracy honestly, speaking the truth, and burying lies. Biden urges all members of government, regardless of party, to join together to defend democracy by respecting free and fair elections, restoring trust in institutions, and rejecting political violence entirely. He insists that defending democracy involves protecting freedoms and ensuring fairness, such as by restoring the right to choose and ensuring the wealthy pay their fair share in taxes. Throughout the address
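
The embeddings-based filter avoids the extra LLM calls by comparing the query embedding with each document embedding. A rough sketch of the thresholding idea, assuming cosine similarity and using made-up 2-D vectors instead of real OpenAI embeddings; `cosine_similarity` and `embeddings_filter` here are hypothetical helpers written for this illustration, not the library's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embeddings_filter(query_vec, doc_vecs, similarity_threshold):
    """Return the indices of documents whose similarity to the query exceeds the threshold."""
    return [
        i
        for i, vec in enumerate(doc_vecs)
        if cosine_similarity(query_vec, vec) > similarity_threshold
    ]


# Toy 2-D "embeddings" (invented for illustration, not real model output).
query_vec = [1.0, 0.0]
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
kept_indices = embeddings_filter(query_vec, doc_vecs, similarity_threshold=0.76)
print(kept_indices)
```

Only the first vector points in nearly the same direction as the query, so it is the only one kept. The right threshold depends on the embedding model and usually has to be tuned by inspecting results.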

### compressor = DocumentCompressorPipeline

In [24]:
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter

# Stage 1: drop retrieved chunks that are near-duplicates of one another.
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
# Stage 2: keep only chunks that are similar enough to the query.
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)

# Transformers run in the order listed.
pipeline_compressor = DocumentCompressorPipeline(transformers=[redundant_filter, relevant_filter])
compression_retriever4 = ContextualCompressionRetriever(base_compressor=pipeline_compressor, base_retriever=retriever)
compressed_docs4 = compression_retriever4.invoke("What does Biden say about defending democracy?")


In [25]:
pretty_print_docs(compressed_docs4)

Document 1:

3/8/24, 6:43 PM Biden’s 2024 State of the Union Address: Read the Full Transcript - The New York Times
https://www.nytimes.com/2024/03/08/us/politics/state-of-the-union-transcript-biden.html 2/26
What makes our moment rare is that freedom and democracy are under attack
both at home and overseas at the very same time. Overseas, Putin of Russia is on
the march, invading Ukraine and sowing chaos throughout Europe and beyond. If
anybody in this room thinks Putin will stop at Ukraine, I assure you, he will not.
But Ukraine, Ukraine can stop Putin. Ukraine can stop Putin, if we stand with
Ukraine and provide the weapons they need to defend itself. That is all — that is all
Ukraine is asking. They’re not asking for American soldiers. In fact, there are no
American soldiers at war in Ukraine, and I’m determined to keep it that way.
But now, assistance to Ukraine is being blocked by those who want to walk away
from our world leadership. Wasn’t long ago when a Republican president n

In [26]:
chain4 = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever4)
chain4.invoke("What does Biden say about defending democracy?")

{'query': 'What does Biden say about defending democracy?',
 'result': 'In his 2024 State of the Union address, President Biden emphasizes the importance of defending democracy against threats both foreign and domestic. He highlights that democracy and freedom are under attack, referencing events such as the January 6 insurrection as the gravest threat to U.S. democracy since the Civil War. Biden insists on speaking the truth and burying lies about these events, stating that political violence has no place in America. He calls on all, regardless of party, to join together to defend democracy, respect free and fair elections, restore trust in institutions, and remember their oath of office. Biden stresses that defending democracy is crucial for the future, where freedoms are protected and opportunities made fairer for all.'}
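
The two pipeline stages can be pictured as a redundancy pass followed by a relevance pass. The sketch below is a simplified illustration under stated assumptions: it uses toy 2-D vectors, keeps the first member of each near-duplicate group, and the 0.95 redundancy threshold is a value chosen for this example. All function names are hypothetical and this is not the library's actual implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def drop_redundant(doc_vecs, redundancy_threshold):
    """Return indices of vectors that are not near-duplicates of an earlier kept vector."""
    kept = []
    for i, vec in enumerate(doc_vecs):
        if all(cosine_similarity(vec, doc_vecs[j]) < redundancy_threshold for j in kept):
            kept.append(i)
    return kept


def keep_relevant(query_vec, doc_vecs, indices, relevance_threshold):
    """From the given indices, keep those similar enough to the query."""
    return [
        i for i in indices
        if cosine_similarity(query_vec, doc_vecs[i]) > relevance_threshold
    ]


def run_pipeline(query_vec, doc_vecs, redundancy_threshold=0.95, relevance_threshold=0.76):
    """Apply the redundancy pass first, then the relevance pass."""
    survivors = drop_redundant(doc_vecs, redundancy_threshold)
    return keep_relevant(query_vec, doc_vecs, survivors, relevance_threshold)


# Toy vectors, invented for illustration.
query_vec = [1.0, 0.0]
doc_vecs = [
    [0.9, 0.1],   # relevant
    [0.91, 0.1],  # near-duplicate of the first
    [0.0, 1.0],   # unrelated to the query
]
result = run_pipeline(query_vec, doc_vecs)
print(result)
```

Running the redundancy pass first means the relevance pass scores fewer candidates, and the final context does not repeat overlapping chunks produced by `chunk_overlap=200`.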