# Technique 2: Contextual Compression

## The Problem
Basic RAG passes entire retrieved chunks (often 500-1000 tokens each) to the LLM, even though most of that content is usually **irrelevant** to the query. As a result, you are:
- Wasting tokens (higher cost)
- Adding noise (unrelated text can confuse the model)
- Risking context-window limits

## The Solution
Use an LLM to **compress** retrieved chunks, keeping only information relevant to the query.

**Difficulty:** ⭐⭐⭐☆☆

## Step 1: Imports

In [1]:
from utils_openai import (
    setup_openai_api,
    create_embeddings,
    create_llm,
    load_msme_data,
    create_vectorstore,
    get_baseline_prompt,
    count_tokens_approximate,
)
from langchain_classic.retrievers.document_compressors import LLMChainExtractor
from langchain_classic.retrievers import ContextualCompressionRetriever
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

print('[OK] Imports done!')

[OK] Imports done!


## Step 2: Setup

In [2]:
api_key = setup_openai_api()
embeddings = create_embeddings(api_key)
llm = create_llm(api_key)
docs, metas, ids = load_msme_data('msme.csv')
vectorstore = create_vectorstore(docs, metas, ids, embeddings, 'msme_t3', './chroma_db_t3')
base_retriever = vectorstore.as_retriever(search_kwargs={'k': 5})
print('[OK] Base retriever ready!')

[OK] Initialized embeddings: text-embedding-3-small
[OK] Initialized LLM: gpt-4o-mini (temp=0)
[OK] Loaded 14 documents from msme.csv
[OK] Created vector store: msme_t3 (14 docs)
[OK] Base retriever ready!


## Step 3: Create Compressor
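`LLMChainExtractor` sends each retrieved chunk to the LLM together with the query and keeps only the passages relevant to it. Chunks with nothing relevant are dropped entirely: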

In [3]:
compressor = LLMChainExtractor.from_llm(llm)
print('[OK] Compressor created!')

[OK] Compressor created!
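To build intuition for what "compressing" a chunk means, here is a minimal sketch that runs without an LLM. It is a keyword-overlap stand-in, not how `LLMChainExtractor` works internally: the function names `compress_chunk` and `compress_chunks` and the word-length filter are assumptions made for illustration only.

```python
import re

def compress_chunk(chunk: str, query: str, min_overlap: int = 1) -> str:
    """Keep only the sentences that share at least `min_overlap` query words.

    Hypothetical stand-in for an LLM extractor: relevance is approximated
    by word overlap instead of a model call.
    """
    # Ignore very short words ("how", "do", "a") so they do not count as matches.
    query_words = {w for w in re.findall(r"[a-z]+", query.lower()) if len(w) > 3}
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [
        s for s in sentences
        if len(query_words & set(re.findall(r"[a-z]+", s.lower()))) >= min_overlap
    ]
    return " ".join(kept)

def compress_chunks(chunks, query):
    """Compress every chunk and drop those with nothing relevant left."""
    compressed = (compress_chunk(c, query) for c in chunks)
    return [c for c in compressed if c]

chunks = [
    "Register your construction company with the CAC. The weather in Lagos is humid.",
    "Cassava farming requires fertile soil.",
]
result = compress_chunks(chunks, "How do I register a construction business?")
print(result)
```

The second chunk disappears because none of its sentences overlap with the query, which mirrors why a compression retriever can return fewer documents than the base retriever.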


## Step 4: Wrap with ContextualCompressionRetriever

In [4]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=base_retriever
)
print('[OK] Compression retriever ready!')

[OK] Compression retriever ready!


## Step 5: Build RAG Chain

In [5]:
prompt = get_baseline_prompt()

compression_rag_chain = (
    {'context': compression_retriever, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print('[OK] Compression RAG chain ready!')

[OK] Compression RAG chain ready!
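A retriever returns a list of document objects, so the `context` slot above receives that list as-is. Assuming `get_baseline_prompt()` returns a standard prompt template with a `{context}` placeholder, it can be cleaner to join the page contents into plain text first. The helper below is a sketch; `format_docs` is a hypothetical name, and the `SimpleNamespace` objects only stand in for retrieved documents.

```python
from types import SimpleNamespace

def format_docs(docs) -> str:
    """Join the page_content of each document with blank lines between them."""
    return "\n\n".join(d.page_content for d in docs)

# Stand-in objects with a `page_content` attribute, used here instead of real
# retrieved documents so the example runs without an API key.
demo_docs = [
    SimpleNamespace(page_content="CAC registration steps."),
    SimpleNamespace(page_content="COREN requirements."),
]
context = format_docs(demo_docs)
print(context)

# In the notebook, the helper could be piped after the retriever, for example:
# {'context': compression_retriever | format_docs, 'question': RunnablePassthrough()}
```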


## Step 6: Test and Compare

In [6]:
question = 'How do I register a construction business?'

# Get baseline (uncompressed) docs
baseline_docs = base_retriever.invoke(question)
baseline_text = '\n\n'.join([d.page_content for d in baseline_docs])
baseline_tokens = count_tokens_approximate(baseline_text)

# Get compressed docs
compressed_docs = compression_retriever.invoke(question)
compressed_text = '\n\n'.join([d.page_content for d in compressed_docs])
compressed_tokens = count_tokens_approximate(compressed_text)

print(f'BASELINE: {len(baseline_docs)} docs, ~{baseline_tokens} tokens')
print(f'COMPRESSED: {len(compressed_docs)} docs, ~{compressed_tokens} tokens')
print(f'REDUCTION: {((baseline_tokens-compressed_tokens)/baseline_tokens*100):.1f}%')

# Get answer
answer = compression_rag_chain.invoke(question)
print(f'\nANSWER:\n{answer}')

BASELINE: 5 docs, ~20391 tokens
COMPRESSED: 3 docs, ~1819 tokens
REDUCTION: 91.1%

ANSWER:
To register a construction business in Nigeria, you must first complete the registration process with the Corporate Affairs Commission (CAC) by submitting the necessary documents, including completed CAC forms, valid identification of shareholders/directors, and the Memorandum and Articles of Association (MEMART). Additionally, you will need to register with the Council for the Regulation of Engineering in Nigeria (COREN) and the Council of Registered Builders of Nigeria (CORBON), which require specific documentation such as proof of engineering qualifications and company profiles. Ensure compliance with industry-specific licenses and permits before commencing operations. For detailed requirements, refer to the COREN and CORBON guidelines available on their respective websites.
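The reduction figure printed above is a plain percentage change. As a worked check using the token counts from this run, with a guard for the case where the baseline retrieves nothing (the function name `reduction_percent` is an assumption for illustration):

```python
def reduction_percent(baseline_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed by compression; 0.0 if the baseline is empty."""
    if baseline_tokens <= 0:
        return 0.0
    return (baseline_tokens - compressed_tokens) / baseline_tokens * 100

# Token counts taken from the output above.
print(f"{reduction_percent(20391, 1819):.1f}%")  # prints 91.1%
```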


## When to Use
**Use when:**
- Retrieved documents are long
- Token costs are high
- You need focused context

**Avoid when:**
- Documents are already short
- The extra LLM calls (one per retrieved chunk) add unacceptable latency or cost
- The answer needs the full context
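One trade-off worth quantifying: the token savings apply to the final answering call, but the compressor still has to read every retrieved chunk. The sketch below estimates the tokens processed per query with and without compression. The per-chunk `prompt_overhead` of 100 tokens is a made-up placeholder, and the function name is hypothetical.

```python
def tokens_processed(baseline_tokens, compressed_tokens, num_chunks, prompt_overhead=100):
    """Return (without_compression, with_compression) token totals for one query.

    Assumes one compressor call per chunk, each adding `prompt_overhead`
    instruction tokens (a placeholder value, not a measured one).
    """
    without = baseline_tokens  # final call reads all retrieved text
    compressor_input = baseline_tokens + num_chunks * prompt_overhead
    compressor_output = compressed_tokens  # extracted passages are generated tokens
    final_input = compressed_tokens  # final call reads only the extracts
    return without, compressor_input + compressor_output + final_input

# Figures from the run above: 5 chunks, ~20391 tokens before, ~1819 after.
print(tokens_processed(20391, 1819, num_chunks=5))
```

Under these assumptions, compression pays off mainly when the compressor is cheaper than the answering model, when compressed results are reused, or when the focused context improves answer quality enough to justify the extra calls.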

## Exercise
1. Compare token usage before/after compression
2. Test with different queries
3. Check if quality improves or degrades
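As a possible starting point for the first two items, the helper below compares token counts across several queries. It is a sketch: `compare_retrievers` is a hypothetical name, it only assumes each retriever exposes `.invoke(query)` returning objects with `page_content`, and the default token counter (characters divided by 4) is a rough approximation rather than the notebook's `count_tokens_approximate`.

```python
def compare_retrievers(queries, base, compressed, count_tokens=lambda text: len(text) // 4):
    """Return one row per query with baseline/compressed token counts and reduction."""
    rows = []
    for q in queries:
        base_text = "\n\n".join(d.page_content for d in base.invoke(q))
        comp_text = "\n\n".join(d.page_content for d in compressed.invoke(q))
        b, c = count_tokens(base_text), count_tokens(comp_text)
        # Guard against an empty baseline to avoid division by zero.
        pct = (b - c) / b * 100 if b else 0.0
        rows.append({"query": q, "baseline": b, "compressed": c, "reduction_pct": round(pct, 1)})
    return rows
```

In the notebook this could be called as `compare_retrievers(['How do I register a construction business?'], base_retriever, compression_retriever, count_tokens_approximate)`, adding further queries to the list.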


In [7]:
# Your code here

**Next:** Technique 3 - Semantic Chunking