# Contextual Compression

https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/contextual_compression/

https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.contextual_compression.ContextualCompressionRetriever.html

Community compressors

https://api.python.langchain.com/en/latest/community_api_reference.html#module-langchain_community.document_compressors


* Uses ChromaDB as the base retriever
* Uses Cohere Command as the LLM
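Conceptually, contextual compression runs in two stages. A base retriever first fetches candidate documents, and then a compressor uses the query to shorten or drop them. The plain-Python sketch below mirrors that flow with hypothetical functions (`contextual_compression`, `toy_base_retriever`, `toy_compressor`) and plain strings in place of LangChain `Document` objects. It illustrates the idea only and is not LangChain's implementation.

```python
from typing import Callable, List

# Hypothetical type aliases: a "document" is just a string in this sketch
Retriever = Callable[[str], List[str]]
Compressor = Callable[[List[str], str], List[str]]


def contextual_compression(base_retriever: Retriever, compressor: Compressor) -> Retriever:
    """Wrap a base retriever so its results are always post-processed by a compressor."""
    def retrieve(query: str) -> List[str]:
        docs = base_retriever(query)    # stage 1: broad, cheap retrieval
        return compressor(docs, query)  # stage 2: query-aware filtering/extraction
    return retrieve


corpus = [
    "RAG grounds answers in your data.",
    "SFT updates the model weights.",
    "RAG retrieves new context per query.",
]


def toy_base_retriever(query: str) -> List[str]:
    # Returns every document, standing in for a loose first-pass vector search
    return list(corpus)


def toy_compressor(docs: List[str], query: str) -> List[str]:
    # Keeps only documents that mention the last word of the query
    keyword = query.lower().rstrip("?").split()[-1]
    return [d for d in docs if keyword in d.lower()]


retriever = contextual_compression(toy_base_retriever, toy_compressor)
print(retriever("what is rag?"))
```

Because the wrapper returns another retriever, callers do not need to know that compression happens. This is the same design choice that lets `ContextualCompressionRetriever` be used anywhere a plain retriever is expected.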

## Import packages

In [4]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_core.documents import Document
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_cohere import CohereEmbeddings

from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain.retrievers.document_compressors import LLMChainFilter
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline

import warnings
# Suppress warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

## 1. Create an LLM
The compression strategy classes use these models:

* Cohere Command chat model: used by `LLMChainExtractor` and `LLMChainFilter`
* Cohere embedding model: used by `EmbeddingsFilter`

#### Note
* Adjust the path passed to `load_dotenv` so that it points to your own `.env` file containing the API key(s).

In [5]:
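from dotenv import load_dotenv
import sys

# Load the file that contains the API keys (this notebook needs COHERE_API_KEY)
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')

# Add the parent directory to the import path so the local utils package can be found
sys.path.append('../')

from utils.create_chat_llm import create_gpt_chat_llm, create_cohere_chat_llm

# Cohere Command chat model (create_gpt_chat_llm is the GPT alternative)
llm = create_cohere_chat_llm()

# Cohere embedding model, used by the EmbeddingsFilter
llm_embeddings = CohereEmbeddings()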
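`load_dotenv` does not raise an error when the `.env` file is missing, so a wrong path usually surfaces later as an authentication error. A small check right after loading can fail fast instead. The helper `missing_env_vars` is hypothetical, and the variable name `COHERE_API_KEY` assumes that both `CohereEmbeddings` and the local `create_cohere_chat_llm` helper read the key from the environment.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumption: COHERE_API_KEY is the variable the Cohere integrations read
missing = missing_env_vars(["COHERE_API_KEY"])
if missing:
    print("Missing API keys:", ", ".join(missing))
```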

## 2. Utility function

* Pretty-prints the retrieved documents before and after compression

In [6]:
def print_documents(docs):
    """Print each document's index followed by its content."""
    for i, doc in enumerate(docs):
        print("#", i)
        print(doc.page_content)

def dump_before_after_compression(base_retriever, compression_retriever, question):
    """Run the same question through both retrievers and print the two result sets."""
    results_before = base_retriever.invoke(question)
    results_after = compression_retriever.invoke(question)

    print("BEFORE. Doc count = ", len(results_before))
    print("--------------------------------------------------")
    print_documents(results_before)
    print("--------------------------------------------------")
    print("AFTER. Doc count = ", len(results_after))
    print_documents(results_after)
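Document counts alone do not show how much text an extractor removed. A minimal sketch of a summary helper follows. `compression_stats` is hypothetical and works on plain strings; with LangChain results, pass `[d.page_content for d in results]`.

```python
from typing import Dict, List


def compression_stats(before: List[str], after: List[str]) -> Dict[str, float]:
    """Summarize how much text a compressor removed (hypothetical helper)."""
    chars_before = sum(len(t) for t in before)
    chars_after = sum(len(t) for t in after)
    return {
        "docs_before": len(before),
        "docs_after": len(after),
        "chars_before": chars_before,
        "chars_after": chars_after,
        # Fraction of characters kept; 1.0 means nothing was removed
        "kept_ratio": chars_after / chars_before if chars_before else 1.0,
    }


stats = compression_stats(["abcd", "efgh"], ["ab"])
print(stats)
```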

## 3. Set up the base retriever

* Uses ChromaDB as the base retriever

In [7]:
# Create the Chroma vector store
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = Chroma(collection_name="full_documents", embedding_function=embedding_function) 

# Load sample docs
loader = DirectoryLoader('./util', glob="**/*.txt")
docs = loader.load()

# Chunking
doc_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
chunked_documents = doc_splitter.split_documents(docs)

# Add to vector DB
vector_store.add_documents(chunked_documents)

# Base retriever (similarity search; returns the top 4 chunks by default)
vector_store_retriever = vector_store.as_retriever()
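Called with no arguments, `as_retriever()` performs a similarity search and returns a fixed number of nearest chunks. In LangChain the default is 4, which matches the doc count printed in the tests. The sketch below shows the same top-k ranking idea in plain Python. A toy bag-of-words `embed` function stands in for `all-MiniLM-L6-v2`, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real sentence-transformer)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "rag retrieves data",
    "fine tuning updates weights",
    "rag grounds answers",
    "prompt engineering basics",
]
print(top_k("what is rag", chunks, k=2))
```

Note that top-k retrieval always returns k chunks, even weakly related ones. The compressors in the next sections exist to prune or trim those results.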

## 4. LLMChainExtractor

Uses an LLM to extract only the parts of each retrieved document that are relevant to the query. Documents with no relevant content are dropped.

https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.chain_extract.LLMChainExtractor.html
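For each retrieved document, `LLMChainExtractor` prompts the LLM to return the relevant parts of the context verbatim, or a sentinel string (`NO_OUTPUT` in LangChain's default prompt) when nothing is relevant. It then drops documents that produce the sentinel. The sketch below reproduces that control flow. `fake_llm_extract` is a hypothetical keyword-matching stand-in for the real LLM call, so only the keep/trim/drop logic is representative.

```python
import re
from typing import Callable, List

NO_OUTPUT = "NO_OUTPUT"  # sentinel meaning "nothing relevant in this document"
STOPWORDS = {"what", "is", "the", "a", "an", "of", "to", "and", "in"}


def _content_words(text: str) -> set:
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS


def fake_llm_extract(question: str, context: str) -> str:
    """Hypothetical LLM stand-in: keep sentences sharing a content word with the question."""
    q_words = _content_words(question)
    sentences = re.split(r"(?<=[.!?])\s+", context.strip())
    hits = [s for s in sentences if q_words & _content_words(s)]
    return " ".join(hits) if hits else NO_OUTPUT


def extract_relevant(docs: List[str], question: str,
                     llm: Callable[[str, str], str] = fake_llm_extract) -> List[str]:
    """Replace each document with its extracted parts; drop documents with no relevant parts."""
    compressed = []
    for doc in docs:
        extracted = llm(question, doc).strip()
        if extracted and extracted != NO_OUTPUT:
            compressed.append(extracted)
    return compressed


docs = [
    "RAG grounds answers in your data. SFT changes the model weights.",
    "SFT needs labeled examples.",
]
print(extract_relevant(docs, "what is rag?"))
```

This costs one LLM call per retrieved document, which is why this strategy is the slowest of those shown here.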

In [8]:
# Create the compressor
llm_chain_extractor_compressor = LLMChainExtractor.from_llm(llm)

# Create the retriever
llm_chain_extractor_compressor_retriever = ContextualCompressionRetriever(
    base_retriever=vector_store_retriever, 
    base_compressor=llm_chain_extractor_compressor)

### Test
* Apply compression to retrieved results
* Print the before/after results for comparison

In [9]:
question = "what is rag?"

dump_before_after_compression(vector_store_retriever, llm_chain_extractor_compressor_retriever, question)


BEFORE. Doc count =  4
--------------------------------------------------
# 0
Retrieval augmented generation (RAG)

Retrieval augmented generation, or RAG, helps ensure model outputs are grounded on your data. Instead of relying on the model’s training knowledge, AI apps architected for RAG can search your data for information relevant to a query, then pass that information into the prompt. This is similar to prompt engineering, except that the system can find and retrieve new context from your data with each interaction.
# 1
The RAG approach supports fresh data that’s constantly updated, private data that you connect, large-scale and multimodal data, and more — and it’s supported by an increasingly robust ecosystem of products, from simple integrations with databases to embedding APIs and other components for bespoke systems.

Supervised fine

tuning (SFT)
# 2
Which one to go for?

The first question to consider is: do you need the model to always give a citation of a source grounded 

## 5. LLM Chain Filter

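Uses an LLM to decide, document by document, whether each retrieved document is relevant to the query, and drops those that are not. Unlike `LLMChainExtractor`, it does not modify the content of the documents it keeps.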

https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.chain_filter.LLMChainFilter.html
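`LLMChainFilter` asks the LLM a YES/NO relevance question for each document and keeps only the documents whose answer parses as YES. A minimal sketch of that loop follows. `fake_llm_judge` is a hypothetical stand-in for the LLM, and `parse_yes_no` is a simplified version of turning the reply into a boolean.

```python
from typing import Callable, List


def fake_llm_judge(question: str, doc: str) -> str:
    """Hypothetical stand-in for an LLM asked 'Is this relevant? Answer YES or NO.'"""
    keyword = question.lower().rstrip("?").split()[-1]
    return "YES" if keyword in doc.lower() else "NO"


def parse_yes_no(answer: str) -> bool:
    """Treat any reply starting with YES as relevant; everything else as not relevant."""
    return answer.strip().upper().startswith("YES")


def llm_filter(docs: List[str], question: str,
               judge: Callable[[str, str], str] = fake_llm_judge) -> List[str]:
    # Documents are kept or dropped whole; the kept text is never rewritten
    return [d for d in docs if parse_yes_no(judge(question, d))]


docs = [
    "RAG grounds answers in your data.",
    "SFT changes the model weights.",
    "Which one to go for?",
]
print(llm_filter(docs, "what is rag?"))
```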

In [None]:
# Create the compressor
llm_chain_filter_compressor = LLMChainFilter.from_llm(llm)

# Create the retriever
llm_chain_filter_compressor_retriever = ContextualCompressionRetriever(
    base_retriever=vector_store_retriever, 
    base_compressor=llm_chain_filter_compressor)

In [None]:
question = "what is rag?"

dump_before_after_compression(vector_store_retriever, llm_chain_filter_compressor_retriever, question)

## 6. Embeddings Filter

Uses embeddings to drop documents unrelated to the query.

https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.embeddings_filter.EmbeddingsFilter.html

Making an extra LLM call for each retrieved document is expensive and slow. The `EmbeddingsFilter` provides a cheaper and faster option: it embeds the documents and the query, and returns only the documents whose embeddings are sufficiently similar to the query embedding (cosine similarity compared against `similarity_threshold`).
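The thresholding step can be sketched in plain Python. The 2-D vectors below are made-up stand-ins for real embeddings, and `embeddings_filter` here is a hypothetical function, not the LangChain class. Running it with two thresholds shows why the notebook suggests experimenting with `similarity_threshold`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embeddings_filter(docs: List[Tuple[str, Sequence[float]]],
                      query_vector: Sequence[float],
                      similarity_threshold: float) -> List[str]:
    """Keep documents whose embedding is more similar to the query than the threshold."""
    return [text for text, vector in docs
            if cosine_similarity(vector, query_vector) > similarity_threshold]


# Hypothetical 2-D embeddings; real models produce hundreds of dimensions
query_vector = [1.0, 0.0]
docs = [
    ("RAG definition", [0.9, 0.1]),         # cosine ~0.99
    ("RAG vs SFT comparison", [0.6, 0.6]),  # cosine ~0.71
    ("SFT details", [0.1, 0.9]),            # cosine ~0.11
]
print(embeddings_filter(docs, query_vector, 0.5))
print(embeddings_filter(docs, query_vector, 0.8))
```

Raising the threshold from 0.5 to 0.8 drops the partially related document. Only one embedding per document plus one for the query is needed, and no LLM call is made.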

In [None]:
# Create the compressor
# Play with the threshold to understand the behavior
similarity_threshold = 0.5
embeddings_filter = EmbeddingsFilter(embeddings=llm_embeddings, similarity_threshold=similarity_threshold)

# Create the retriever
llm_embeddings_filter_compressor_retriever = ContextualCompressionRetriever(
    base_retriever=vector_store_retriever, 
    base_compressor=embeddings_filter)

In [None]:
question = "what is rag?"

dump_before_after_compression(vector_store_retriever, llm_embeddings_filter_compressor_retriever, question)

## 7. Compressor pipeline

A document compressor that chains several compressors and document transformers: each stage receives the documents produced by the previous stage.

https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.document_compressors.base.DocumentCompressorPipeline.html
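The sequential behavior can be sketched as a loop that feeds each stage's output into the next. The stage functions are hypothetical. `drop_exact_duplicates` is a simplified stand-in for a redundancy filter such as `EmbeddingsRedundantFilter` (imported above), which compares embeddings rather than exact text.

```python
from typing import Callable, List

Stage = Callable[[List[str], str], List[str]]


def run_pipeline(stages: List[Stage], docs: List[str], query: str) -> List[str]:
    """Apply each stage to the output of the previous one, in order."""
    for stage in stages:
        docs = stage(docs, query)
    return docs


def drop_exact_duplicates(docs: List[str], query: str) -> List[str]:
    """Hypothetical redundancy stage: keep the first copy of each (case-insensitive) text."""
    seen, out = set(), []
    for d in docs:
        key = d.strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(d)
    return out


def keep_keyword(docs: List[str], query: str) -> List[str]:
    """Hypothetical relevance stage: keep documents mentioning the query's last word."""
    keyword = query.lower().rstrip("?").split()[-1]
    return [d for d in docs if keyword in d.lower()]


docs = ["RAG grounds answers.", "rag grounds answers.", "SFT changes weights."]
print(run_pipeline([drop_exact_duplicates, keep_keyword], docs, "what is rag?"))
```

Because each stage only sees what survived the previous one, placing cheap stages early reduces the work done by expensive ones.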



In [None]:
# Run the cheap embeddings filter first so the LLM filter only judges the documents that pass it
transformers = [embeddings_filter, llm_chain_filter_compressor]
pipeline_compressor = DocumentCompressorPipeline(transformers=transformers)
# Create the retriever
pipeline_compressor_retriever = ContextualCompressionRetriever(
    base_retriever=vector_store_retriever, 
    base_compressor=pipeline_compressor)

In [None]:
question = "what is rag?"

dump_before_after_compression(vector_store_retriever, pipeline_compressor_retriever, question)