## Single Pass Rerank and Contextual Compression using Recursive Reranking

### Overview

This notebook demonstrates how to use a **Reranker Model** to perform both _Reranking_ and _Contextual Compression_ in a single pass. We will go over the **Intuition** behind the technique, the **Implementation** details and, finally, a short **Conclusion**.

### 1. Intuition

The use of a Reranker Model has become essential in most modern RAG pipelines, especially when dealing with large or complex datasets. It significantly improves the **Precision** of retrieved results by re-evaluating and reordering the initially retrieved documents based on deeper semantic understanding. A Reranker Model is typically used in a pipeline with the following steps:

1. Use a fast **Embedding Model** to retrieve a set of $top_K$ candidate documents based on query similarity. Usually, $top_K$ is set to a relatively high number to ensure high **Recall**;

2. Use a **Reranker Model** to re-evaluate the $top_K$ documents and select the $top_N$ most relevant ones. $top_N$ is usually set to a much lower number to ensure high **Precision**.
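
The two steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `embed`, `cosine`, `rerank_score` and `retrieve_then_rerank` are hypothetical names, and the scoring functions are toy word-overlap stand-ins for a real embedding model and a real reranker model.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a reranker: fraction of query tokens found in the document."""
    q_tokens = query.lower().split()
    d_tokens = set(doc.lower().split())
    return sum(t in d_tokens for t in q_tokens) / len(q_tokens)


def retrieve_then_rerank(query, corpus, top_k=3, top_n=1):
    q_vec = embed(query)
    # Step 1: cheap vector similarity with a generous top_k (favours recall).
    # A real system would precompute and index the document vectors.
    candidates = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)), reverse=True)[:top_k]
    # Step 2: costlier pairwise scoring on the small candidate set, small top_n (favours precision).
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_n]


corpus = [
    "rerankers reorder retrieved documents",
    "embedding models map text to vectors",
    "bananas are rich in potassium",
    "cross encoders score query document pairs",
]
result = retrieve_then_rerank("how do rerankers reorder documents", corpus, top_k=3, top_n=1)
print(result)
```

The point of the split is cost: the first step touches the whole corpus but only compares vectors, while the second step runs the expensive scorer on just `top_k` candidates.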

You might wonder why a reranker model is necessary at all: after all, the initial retrieval step already returns a set of seemingly relevant documents. The reason is that embedding models, while effective for initial retrieval, rely on an **Encoder** architecture, which compresses the semantic meaning of each document into a fixed-size vector, independently of the query. Relevance is then estimated using a _similarity function_ (such as cosine similarity) over the calculated vectors. While this approach is efficient, it can miss subtle semantic nuances and contextual cues of the original texts.
Reranker Models, on the other hand, use a **Cross-Encoder** architecture, which jointly processes the query and each candidate document at the _token level_, allowing for a more fine-grained understanding of their relationship. This two-stage process, while more computationally expensive, ensures a higher quality of the final results.
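
For reference, the cosine similarity used in the embedding-based step can be written as follows, where $\mathbf{q}$ and $\mathbf{d}$ are the query and document embedding vectors (symbols introduced here for illustration only):

$$
\operatorname{sim}(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
$$

Each vector is computed without seeing the other text, so the score depends only on two fixed-size summaries. A cross-encoder instead takes the query and the document together as a single input and outputs the relevance score directly.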

To further enhance the quality of the results, we can also apply a **Contextual Compression** step. This step involves breaking down the retrieved documents into smaller, more manageable chunks and keeping only the chunks that are relevant to the query. This allows us to not only select the most relevant documents but also to extract only the most relevant pieces from them, effectively compressing the context while retaining essential information.
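
As a rough sketch of what a compression step does, the snippet below splits a document into sentence-level chunks and keeps only those that score above a threshold. The names `split_into_chunks`, `relevance` and `compress`, the naive sentence splitter, and the `threshold` value are all assumptions made for illustration; `relevance` is a toy word-overlap stand-in for a real model.

```python
import re


def split_into_chunks(document):
    """Split a document into sentence-level chunks (naive punctuation-based split)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def relevance(query, chunk):
    """Toy stand-in for a relevance model: fraction of query tokens present in the chunk."""
    q_tokens = set(re.findall(r"\w+", query.lower()))
    c_tokens = set(re.findall(r"\w+", chunk.lower()))
    return len(q_tokens & c_tokens) / len(q_tokens) if q_tokens else 0.0


def compress(query, document, threshold=0.5):
    """Keep only chunks whose relevance meets the threshold, preserving original order."""
    chunks = split_into_chunks(document)
    return " ".join(c for c in chunks if relevance(query, c) >= threshold)


doc = (
    "Rerankers score query and document pairs. "
    "The weather was pleasant that day. "
    "A reranker score reflects how relevant a document is."
)
compressed = compress("reranker score document", doc)
print(compressed)
```

Here the off-topic middle sentence is dropped while the two relevant sentences are kept in their original order.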

The problem with this pipeline is that it now requires three separate steps: an initial retrieval step, a reranking step, and a compression step. Using traditional methods, this can be inefficient and highly time-consuming. What if we could combine both Reranking and Compression into a single step? This is where the **Recursive Reranking** technique comes into play, which functions as follows:

1. Use a fast Embedding Model to retrieve a set of $top_K$ candidate documents;

2. Using a Reranker Model, calculate a relevance score for each sub-section of each document;

3. Use the calculated sub-section scores to both rerank the documents and select the most relevant sub-sections of each document.
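
Steps 2 and 3 above can be sketched as follows, taking the candidate documents from step 1 as given. This is an illustration under stated assumptions rather than the notebook's actual implementation: sub-sections are taken to be sentences, a document's score is assumed to be the maximum of its sub-section scores, and `score` is a toy word-overlap stand-in for a real reranker model. All function names and the `threshold` parameter are hypothetical.

```python
import re


def split_sections(document):
    """Split a document into sub-sections (here: sentences, as an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def score(query, text):
    """Toy stand-in for a reranker model: fraction of query tokens present in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def rerank_and_compress(query, documents, top_n=2, threshold=0.5):
    results = []
    for doc in documents:
        # Step 2: one relevance score per sub-section.
        scored = [(score(query, s), s) for s in split_sections(doc)]
        # Step 3b: compression keeps only sub-sections at or above the threshold.
        kept = [s for sc, s in scored if sc >= threshold]
        if kept:
            # Step 3a: document score derived from the same scores (assumed: max).
            doc_score = max(sc for sc, _ in scored)
            results.append((doc_score, " ".join(kept)))
    # Rerank documents by the aggregated score and keep the top_n.
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_n]


documents = [
    "Bananas are rich in potassium. They grow in tropical climates.",
    "Embedding models retrieve candidates quickly. A reranker scores each passage. Lunch was served at noon.",
    "The reranker model scores passage relevance. Scores drive the final ordering.",
]
ranked = rerank_and_compress("reranker model scores passage", documents)
for doc_score, text in ranked:
    print(doc_score, text)
```

The same sub-section scores serve both purposes, so the scorer runs once per sub-section and no separate compression pass is needed. Other aggregation choices (mean, or a sum over the kept sub-sections) would fit the same structure.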