🧠 What is Context Compression?<br>
In Retrieval-Augmented Generation (RAG), context compression means reducing the size of retrieved documents before passing them to the LLM, while keeping only the most relevant parts.<br>
<br>
✅ Why use it?<br>
LLMs have context limits (e.g., 4k, 8k, 32k tokens). If you retrieve a lot of documents, the prompt can exceed this limit.<br>
<br>
So, context compression:<br>
🗜️ Shrinks the retrieved docs<br>
🎯 Keeps only the important sections<br>
🚀 Improves efficiency and relevance<br>
💰 Saves tokens = money if you're using paid APIs<br>
<br>
🔧 How it works:<br>
Usually involves a second LLM pass that does one of the following:<br>
🔹 Summarizes the documents<br>
🔹 Extracts only the chunks relevant to the query<br>
🔹 Scores the text and drops the unimportant parts<br>
<br>
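The "score and drop" idea can be sketched without any LLM at all. The snippet below is only an illustration, assuming a crude word-overlap score; the helper names `tokenize` and `compress`, the `min_overlap` parameter, and the sample text are all made up for this example and are not part of LangChain.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def compress(query: str, document: str, min_overlap: int = 1) -> str:
    """Keep only the sentences that share at least `min_overlap` words with the query."""
    query_words = tokenize(query)
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if len(tokenize(s) & query_words) >= min_overlap]
    return " ".join(kept)


# Made-up sample document: only the middle sentence mentions the query terms.
document = (
    "Transformers were introduced in 2017. "
    "The attention mechanism lets each token weigh every other token. "
    "Training used eight GPUs."
)
compressed = compress("attention mechanism", document)
print(compressed)
```

A real compressor would replace the word-overlap score with an LLM call or an embedding similarity, but the shape stays the same: score each piece against the query, then drop what falls below the threshold.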
🧪 Example:<br>
User query: "Explain the attention mechanism in transformers"<br>
🔹 Normal retriever fetches 10 full pages of transformer papers.<br>
🔹 With context compression, only the few paragraphs about attention are kept and passed to the LLM.

In [3]:
from langchain_chroma.vectorstores import Chroma
from langchain_ollama.embeddings import OllamaEmbeddings
from langchain_ollama.chat_models import ChatOllama
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain.retrievers.document_compressors.chain_extract import LLMChainExtractor

In [2]:
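# Embedding model used to embed queries against the persisted Chroma index
embeddings = OllamaEmbeddings(model="llama3.2:1b")
db = Chroma(persist_directory="./vectorsMKUltra", embedding_function=embeddings)
# Chat model that will perform the compression step
chat = ChatOllama(model="llama3.2:1b", temperature=0.2)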

In [5]:
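# The extractor asks the LLM to pull out only the parts of each document relevant to the query
compressor = LLMChainExtractor.from_llm(chat)
# Wrap the plain retriever so every retrieved document passes through the compressor
compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=db.as_retriever())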

In [6]:
docs = db.similarity_search("What was this declassified?")

In [9]:
len(docs)

4

In [13]:
print(docs[0].metadata["summary"])

MKNAOMI is the code name for a joint Department of Defense/CIA research program from the 1950s through to the 1970s. Unclassified information about the MKNAOMI program and the related Special Operations Division is scarce. It is generally reported to be a successor to the MKULTRA project focusing on biological projects including biological warfare agents—specifically, to store materials that could either incapacitate or kill a test subject and to develop devices for the diffusion of such materials.
During its first twenty years, the CIA engaged in projects designed to increase U.S. biological and chemical warfare capabilities. Project MKNAOMI was initiated to provide the CIA with a covert support base to meet its top-secret operational requirements. The goal was to have a robust arsenal of lethal and incapacitating materials within the CIA's Technical Services Division (TSD). This would enable the TSD to serve as a center for supplying biological and chemical materials.
Surveillance, t

In [14]:
compressedDocs = compression_retriever.invoke("What was this declassified?")

In [15]:
len(compressedDocs)

3

In [24]:
print(compressedDocs[2].metadata["summary"])

The Central Intelligence Agency (CIA ) is a civilian foreign intelligence service of the federal government of the United States tasked with advancing national security through collecting and analyzing intelligence from around the world and conducting covert operations. The agency is headquartered in the George Bush Center for Intelligence in Langley, Virginia, and is sometimes metonymously called "Langley". A major member of the United States Intelligence Community (IC), the CIA has reported to the director of national intelligence since 2004, and is focused on providing intelligence for the president and the Cabinet.
The CIA is headed by a director and is divided into various directorates, including a Directorate of Analysis and Directorate of Operations. Unlike the Federal Bureau of Investigation (FBI), the CIA has no law enforcement function and focuses on intelligence gathering overseas, with only limited domestic intelligence collection. The CIA is responsible for coordinating al

In [25]:
print(compressedDocs[2].page_content)

* CIA
* Langley
* George Bush Center for Intelligence
* United States Intelligence Community (IC)
* Directorate of Analysis
* Directorate of Operations
* Human intelligence (HUMINT) activities
* Paramilitary operations units
* Special Activities Center
* World War II
* Office of Strategic Services (OSS)
* National Security Act of 1947
* Central Intelligence Agency Act of 1949


# That's all. As the output above shows, the compressed document holds a short list of key terms instead of the full text. Context compression sends only this reduced context to the LLM, which cuts token cost and context size: faster, cheaper, and more relevant.