#### **Contextual Compression**

- When we retrieve chunks from a retriever, most of the text within those chunks is often irrelevant to the query.

- So, on top of the base retriever, we apply a contextual compression retriever that compresses the retrieved chunks down to the parts relevant to the query.

#### **Advantages**

- **Improved relevancy** -> The compressed chunks are more relevant to the query asked.
- **Increased efficiency** -> Smaller chunks mean lower token consumption and less time spent processing the retrieved context.

<p align="center">
    <img src="../contextual_compression.png" width="400" height="700"/>
</p>
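The compression step can be illustrated without an LLM. The sketch below is a toy stand-in, not the LangChain compressor: it assumes a hypothetical `compress_chunk` helper that keeps only the sentences sharing at least one word with the query. Because it does no stop-word filtering, common words such as "of" also count as matches. An LLM-based compressor would make this decision semantically instead.

```python
import re


def compress_chunk(chunk: str, query: str) -> str:
    """Keep only the sentences of `chunk` that share a word with `query`.

    Hypothetical helper for illustration; an LLM would replace this logic.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    # split on sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [
        sentence
        for sentence in sentences
        if query_words & set(re.findall(r"\w+", sentence.lower()))
    ]
    return " ".join(kept)


chunk = (
    "Paris is the capital of France. "
    "The city hosts many museums. "
    "Bananas are rich in potassium."
)
query = "capital of France"

compressed = compress_chunk(chunk, query)
print(compressed)
print(f"{len(chunk)} -> {len(compressed)} characters")
```

The shorter output is what gets passed to the LLM, which is where the savings in tokens and processing time come from.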

---

##### **LLM used**

In [1]:
from langchain_ollama import ChatOllama 

llm = ChatOllama(
    model='llama3.2',
    temperature=0,
    verbose=True
)

print(llm.invoke("What are you doing?"))



content="I'm an AI designed to assist and communicate with users like you. I'm a large language model, which means I can understand and respond to natural language inputs.\n\nRight now, I'm waiting for your next question or prompt. You can ask me anything, from general knowledge questions to creative writing prompts, and I'll do my best to provide helpful and accurate responses.\n\nIf you need help with something specific, feel free to ask, and I'll get started!" additional_kwargs={} response_metadata={'model': 'llama3.2', 'created_at': '2025-12-19T16:41:45.946513Z', 'done': True, 'done_reason': 'stop', 'total_duration': 26135700666, 'load_duration': 3636857958, 'prompt_eval_count': 30, 'prompt_eval_duration': 12807944417, 'eval_count': 94, 'eval_duration': 6692792916, 'logprobs': None, 'model_name': 'llama3.2', 'model_provider': 'ollama'} id='lc_run--dfb3093d-4223-4548-9f56-b7eb1b2776de-0' usage_metadata={'input_tokens': 30, 'output_tokens': 94, 'total_tokens': 124}


##### **Embedding Model used**

We use `HuggingFaceEmbeddings` (from `langchain_huggingface`) with the sentence-transformers model `all-MiniLM-L6-v2`.

In [2]:
from langchain_huggingface import HuggingFaceEmbeddings 
import time 

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

text = "This is a test document."

start = time.time()
query_result = embedding_model.embed_query(text)
total_time = time.time() - start
# len(text) is the number of characters in the input text, not the embedding size
print(f"Length of text (characters) : {len(text)}")
print(f"Time taken to convert text to embedding : {total_time :.2f} sec")
# show only the first 100 characters of the stringified vector
print(str(query_result)[:100] + "...")

Length of text (characters) : 24
Time taken to convert text to embedding : 4.93 sec
[-0.0383385606110096, 0.1234646886587143, -0.02864295430481434, 0.05365273356437683, 0.0088453618809...
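A retriever typically ranks chunks by comparing their embedding vectors with the query's. As a rough sketch of that comparison, the example below computes cosine similarity on small hand-written vectors. The 3-dimensional vectors and the names `chunk_a` and `chunk_b` are made up for illustration; in the notebook the vectors would come from `embedding_model.embed_query` or `embedding_model.embed_documents`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# toy vectors standing in for real embeddings (assumption for illustration)
query_vec = [1.0, 0.0, 1.0]
chunk_vecs = {
    "chunk_a": [1.0, 0.0, 0.9],
    "chunk_b": [0.0, 1.0, 0.0],
}

# rank chunk names from most to least similar to the query
ranked = sorted(
    chunk_vecs,
    key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
    reverse=True,
)
print(ranked)
```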


In [None]:
# use this embedding model to create embeddings of text present

from langchain_core.prompts import PromptTemplate 
from pydantic import BaseModel, Field 
from typing import Annotated, List 

# data validation class for the structured output
class CompressedChunks(BaseModel):
    """
    A compressed version of an original chunk: shorter in length, but still containing its context.
    """

    compressed_chunk: Annotated[str, Field(description="The compressed chunk, which keeps most of the context of the original chunk but is shorter in length.")]


# configure the llm with structured output 
compress_chunk_llm = llm.with_structured_output(CompressedChunks)

# system prompt for the compression task
compress_chunks = """ 

"""