### Contextual Compression

The main idea of RAG with Contextual Compression is to improve the quality of the retrieved documents before sending them to an LLM. Instead of passing the raw chunks from the retrieval step straight to the model, the document chunks are first processed and the information that is not useful for the query is discarded.

![contextual_comp](img/contextual-compression.png)

- Reference 1: https://medium.com/@SrGrace_/contextual-compression-langchain-llamaindex-7675c8d1f9eb
- Reference 2: https://python.langchain.com/docs/how_to/contextual_compression/

![img1](img/img1.png)
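
The retrieve-then-compress idea can be sketched without any library. The snippet below is a toy illustration only: `Chunk`, `words`, `retrieve` and `compress` are hypothetical names, and word overlap stands in for both the vector search and the LLM-based compressor used later in this notebook.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str


def words(text):
    """Lowercase alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=2):
    """Toy retriever: rank chunks by how many query words they contain."""
    q = words(query)
    ranked = sorted(corpus, key=lambda c: len(q & words(c.text)), reverse=True)
    return ranked[:k]


def compress(query, chunks):
    """Toy compressor: keep only sentences that share a word with the query."""
    q = words(query)
    compressed = []
    for chunk in chunks:
        kept = [s.strip() for s in chunk.text.split(".") if q & words(s)]
        if kept:  # chunks with no relevant sentence are dropped entirely
            compressed.append(Chunk(". ".join(kept) + "."))
    return compressed


corpus = [
    Chunk("Root is an asymmetric board game. A player wins at 30 victory points."),
    Chunk("Splendor uses gem tokens. Players buy development cards."),
    Chunk("Shuffle the deck before setup. Victory points are tracked on the map edge."),
]
query = "victory points"

raw_chunks = retrieve(query, corpus)            # what a plain retriever returns
compressed_chunks = compress(query, raw_chunks)  # what would be sent to the LLM
for chunk in compressed_chunks:
    print(chunk.text)
```

The compressed chunks keep only the sentences related to the query, so the prompt built from them is shorter and more focused than one built from `raw_chunks`.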

In [1]:
import os
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, TextLoader, PyMuPDFLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI

In [2]:
os.environ['SENTENCE_TRANSFORMERS_HOME'] = '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/huggingface-models'

### Load documents

In [3]:
loader = DirectoryLoader(
    path = '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/data/boardgames_rulebooks/',
    glob="./*.pdf",
    loader_cls=PyMuPDFLoader,
    show_progress=True
)
documents = loader.load()
print('Documents: ',len(documents))

100%|██████████| 4/4 [00:00<00:00, 11.52it/s]

Documents:  45

In [4]:
documents[0]

Document(metadata={'producer': '1jour-1jeu.com', 'creator': '1jour-1jeu.com', 'creationdate': '2013-12-11T15:42:07+01:00', 'source': '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/data/boardgames_rulebooks/ba-splendor-rulebook copy.pdf', 'file_path': '/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/data/boardgames_rulebooks/ba-splendor-rulebook copy.pdf', 'total_pages': 4, 'format': 'PDF 1.4', 'title': 'Splendor Rulebook - 1jour-1jeu.com', 'author': '1jour-1jeu.com', 'subject': 'Splendor Rulebook - 1jour-1jeu.com', 'keywords': 'Splendor Rulebook - 1jour-1jeu.com', 'moddate': '2018-08-31T16:32:10+02:00', 'trapped': '', 'modDate': "D:20180831163210+02'00'", 'creationDate': "D:20131211154207+01'00'", 'page': 0}, page_content='')

### Split the documents

In [5]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1200,
    chunk_overlap = 100
)
docs = text_splitter.split_documents(documents)

In [6]:
len(docs)

173

### Embeddings

In [7]:
embeddings = HuggingFaceEmbeddings(
    model_name='BAAI/bge-small-en-v1.5', 
    model_kwargs={'device': 'cpu'},
    show_progress=True
)

In [8]:
db = FAISS.load_local('/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/rag-optimizations/faiss_pdfs', embeddings, allow_dangerous_deserialization=True)

In [None]:
# db = FAISS.from_documents(docs, embeddings)
# db.save_local('/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/rag-optimizations/faiss_pdfs')

In [10]:
retriever = db.as_retriever()

### LLM

In [11]:
model = ChatOpenAI(model='/Users/wangzeyu/Desktop/Github projects/legalai-chatbot/huggingface-models/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/7ae557604adf67be50417f59c2c2f167def9a775', base_url='http://0.0.0.0:8000/v1', api_key='n')

### 1. LLMChainExtractor

In [12]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [13]:
# LLMChainExtractor iterates over the initially returned documents and
# extracts from each only the content that is relevant to the query.
compressor = LLMChainExtractor.from_llm(model)
retriever1 = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

### 2. LLMChainFilter

In [14]:
from langchain.retrievers.document_compressors import LLMChainFilter

In [15]:
# LLMChainFilter: slightly simpler but more robust compressor that uses 
# an LLM chain to decide which of the initially retrieved documents to 
# filter out and which ones to return, without manipulating the document contents.
_filter = LLMChainFilter.from_llm(model)
retriever2 = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=retriever
)
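
The difference between extracting and filtering can be shown with plain functions. This is a minimal sketch, not the LangChain implementation: the two keyword-based callables below are hypothetical stand-ins for the LLM calls, and documents are plain strings.

```python
def filter_documents(docs, query, is_relevant):
    """Filter style: keep or drop whole documents, never edit their content."""
    return [d for d in docs if is_relevant(d, query)]


def extract_from_documents(docs, query, extract):
    """Extractor style: rewrite each document down to its relevant part."""
    extracted = []
    for d in docs:
        part = extract(d, query)
        if part:  # documents with nothing relevant are dropped
            extracted.append(part)
    return extracted


# Hypothetical stand-ins for the LLM judgement (assumption: keyword matching).
def keyword_relevant(doc, query):
    return query.lower() in doc.lower()


def keyword_extract(doc, query):
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    hits = [s for s in sentences if query.lower() in s.lower()]
    return ". ".join(hits) + "." if hits else ""


sample_docs = [
    "Root: reach 30 victory points to win. The box contains 4 factions.",
    "Splendor: collect gems to buy cards.",
]
sample_query = "win"

filtered = filter_documents(sample_docs, sample_query, keyword_relevant)
extracted = extract_from_documents(sample_docs, sample_query, keyword_extract)
print(filtered)
print(extracted)
```

The filter returns the first document unchanged, while the extractor returns only its relevant sentence.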

### 3. LLMListwiseRerank

In [16]:
from langchain.retrievers.document_compressors import LLMListwiseRerank

In [17]:
# LLMListwiseRerank uses zero-shot listwise document reranking and functions
# similarly to LLMChainFilter as a robust but more expensive option. It is
# recommended to use a more powerful LLM.
_filter = LLMListwiseRerank.from_llm(model, top_n=2)
retriever3 = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=retriever
)
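
As a rough sketch of what listwise reranking with `top_n` means: the ranker sees the whole candidate list at once, returns an ordering, and only the first `top_n` documents are kept. The `overlap_ranker` below is a hypothetical stand-in for the LLM (an assumption for illustration), and the function names are not part of LangChain.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_ranker(docs, query):
    """Stand-in for the LLM: return document indices, most relevant first."""
    q = tokens(query)
    return sorted(range(len(docs)), key=lambda i: len(q & tokens(docs[i])), reverse=True)


def listwise_rerank(docs, query, rank_fn, top_n):
    """Ask the ranker for an ordering of the whole list and keep the top_n."""
    order = rank_fn(docs, query)
    return [docs[i] for i in order[:top_n]]


candidates = [
    "Setup: shuffle the deck",
    "You win at 30 victory points",
    "Victory points come from crafting",
]
top_docs = listwise_rerank(candidates, "win victory points", overlap_ranker, top_n=2)
print(top_docs)
```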

### 4. EmbeddingsFilter

In [18]:
from langchain.retrievers.document_compressors import EmbeddingsFilter

In [19]:
# EmbeddingsFilter provides a cheaper and faster option by embedding the
# documents and the query and returning only those documents whose
# embeddings are sufficiently similar to the query embedding.
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
retriever4 = ContextualCompressionRetriever(
    base_compressor=embeddings_filter, base_retriever=retriever
)
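
The thresholding idea behind an embeddings-based filter can be sketched with toy vectors. This is an illustration under stated assumptions: the 2-dimensional vectors are made up, cosine similarity is assumed as the similarity measure, and the exact comparison used inside the library may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_by_similarity(query_vec, doc_vecs, threshold):
    """Return the indices of documents whose similarity exceeds the threshold."""
    return [
        i for i, vec in enumerate(doc_vecs)
        if cosine_similarity(query_vec, vec) > threshold
    ]


# Made-up embeddings: doc 0 points the same way as the query, doc 1 is at
# 45 degrees (similarity ~0.707), doc 2 is orthogonal (similarity 0).
query_vec = [1.0, 0.0]
doc_vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

strict = filter_by_similarity(query_vec, doc_vecs, threshold=0.76)
loose = filter_by_similarity(query_vec, doc_vecs, threshold=0.70)
print(strict, loose)
```

A higher threshold returns fewer documents, and with a strict enough value the filter can return nothing at all, so the threshold is worth tuning per embedding model.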

### 5. Combining Methods

Using the `DocumentCompressorPipeline` we can also easily combine multiple compressors in sequence. Along with compressors we can add `BaseDocumentTransformers` to our pipeline, which don't perform any contextual compression but simply perform some transformation on a set of documents. For example `TextSplitters` can be used as document transformers to split documents into smaller pieces, and the `EmbeddingsRedundantFilter` can be used to filter out redundant documents based on embedding similarity between documents.

In [20]:
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain_text_splitters import CharacterTextSplitter

In [21]:
# create a compressor pipeline by first splitting our docs into smaller 
# chunks, then removing redundant documents, and then filtering based 
# on relevance to the query.
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)

In [22]:
retriever_combine = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, base_retriever=retriever
)
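
Conceptually, the pipeline applies its stages in order, each one receiving the output of the previous stage. The sketch below shows that control flow with plain functions. All names are hypothetical, documents are plain strings, and exact-match deduplication and word overlap stand in for the embedding-based filters.

```python
import re


def split_stage(docs, query):
    """Split each document into sentence-sized pieces."""
    return [s.strip() for d in docs for s in d.split(". ") if s.strip()]


def dedupe_stage(docs, query):
    """Drop repeated pieces (stand-in for embedding-based redundancy removal)."""
    seen, unique = set(), []
    for d in docs:
        key = d.lower()
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


def relevance_stage(docs, query):
    """Keep pieces sharing a word with the query (stand-in for similarity)."""
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [d for d in docs if q & set(re.findall(r"[a-z0-9]+", d.lower()))]


def run_pipeline(stages, docs, query):
    """Apply every stage in sequence to the running list of documents."""
    for stage in stages:
        docs = stage(docs, query)
    return docs


retrieved = [
    "Score 30 points to win. Shuffle the deck",
    "Shuffle the deck. Crafting items can score points",
]
result = run_pipeline(
    [split_stage, dedupe_stage, relevance_stage], retrieved, "score points"
)
print(result)
```

Ordering matters: splitting first lets the later stages drop individual sentences instead of whole chunks.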

### Retrieval

In [23]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

In [24]:
query = 'How to Win in Root?'

In [25]:
base_docs = retriever.invoke(query)
chain_extractor_docs = retriever1.invoke(query)
chain_filter_docs = retriever2.invoke(query)
rerank_docs = retriever3.invoke(query)
embed_filter_docs = retriever4.invoke(query)
combine_docs = retriever_combine.invoke(query)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [26]:
pretty_print_docs(rerank_docs)

Document 1:

II     Recruit. Place one warrior in the Burrow.
III     Move. Take a move.
IV     Battle. Initiate a battle.
V     Dig. Spend one card to place a tunnel token 
in a matching clearing without a tunnel token. 
Then, move up to four warriors from the Bur-
row to that clearing. (If all three tunnels are on 
the map, you may remove a tunnel first.)
12.5.2     Parliament. You may take the action of each 
swayed minister once in any order. 
I     Foremole. Reveal any card to place a citadel or 
market in any clearing (matching or not) you rule.
II     Captain. Initiate a battle.
III     Marshal. Take a move.
IV     Brigadier. Take up to two moves or initiate up 
to two battles.
V     Banker. Spend any number of cards (even one) 
of the same suit to score victory points in equal 
number.
VI     Mayor. Take the action of any swayed noble 
or squire.
VII     Duchess of Mud. Score two victory points if 
all three tunnels are on the map.
VIII     Baron of Dirt. Score one victory poin

### RAG

In [27]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [28]:
prompt_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Keep the answer as concise as possible.

Context: {context}

Question: {question}

Helpful Answer:"""
prompt = PromptTemplate.from_template(prompt_template)

In [29]:
rag_chain = (prompt | model | StrOutputParser())
res1 = rag_chain.invoke({"context": base_docs, "question": query})
res2 = rag_chain.invoke({"context": chain_extractor_docs, "question": query})
res3 = rag_chain.invoke({"context": chain_filter_docs, "question": query})
res4 = rag_chain.invoke({"context": embed_filter_docs, "question": query})
res5 = rag_chain.invoke({"context": rerank_docs, "question": query})
res6 = rag_chain.invoke({"context": combine_docs, "question": query})
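
Note that the calls above pass lists of `Document` objects as `context`, so the prompt likely receives the string form of the whole list, metadata included. A small helper that joins only the `page_content` fields gives the model a cleaner context. This is a sketch: `format_docs` is a hypothetical helper, and `SimpleNamespace` objects stand in for real documents so the snippet runs on its own.

```python
from types import SimpleNamespace


def format_docs(docs, separator="\n\n"):
    """Join the text of each document, leaving the metadata out of the prompt."""
    return separator.join(d.page_content for d in docs)


# Stand-ins for retrieved documents (assumption: any object with .page_content).
sample_docs = [
    SimpleNamespace(page_content="Reach 30 victory points to win.", metadata={"page": 3}),
    SimpleNamespace(page_content="Crafting items scores points.", metadata={"page": 7}),
]
context = format_docs(sample_docs)
print(context)
```

With such a helper, a call would look like `rag_chain.invoke({"context": format_docs(base_docs), "question": query})`.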

In [31]:
print(f'\x1b[0m{"*"*50}\nBase Model\n{"*"*50}\n', res1)

print(f'\x1b[0m{"*"*50}\nChain Extractor\n{"*"*50}\n', res2)

print(f'\x1b[0m{"*"*50}\nChain Filter\n{"*"*50}\n', res3)

print(f'\x1b[0m{"*"*50}\nEmbedding Filter\n{"*"*50}\n', res4)

print(f'\x1b[0m{"*"*50}\nRerank Docs\n{"*"*50}\n', res5)

print(f'\x1b[0m{"*"*50}\nPipeline Filter\n{"*"*50}\n', res6)

**************************************************
Base Model
**************************************************
 To win in Root, you need to reach 30 victory points immediately after playing. This means every time you remove an enemy's building or token, you get one victory point. You can also craft items to increase your victory points.
**************************************************
Chain Extractor
**************************************************
 The variant setup allows players to choose whether to replace their standard deck with the Exiles and Partisans deck or to use bots according to the Law of Rootbotics.
**************************************************
Chain Filter
**************************************************
 To win in root (rootless), it's crucial to understand and utilize the features provided by your operating system. This includes:

1. **Advanced System Administration**: Learn how to manage systems using commands like `sudo`, `chroot`, and `moun