## Estimating the K Hyperparameter Using the Query Complexity Score

### Overview

This notebook explains how to dynamically estimate the **K hyperparameter** (the number of documents to retrieve) during the retrieval phase using the **Query Complexity Score** (QCS).

### 1. Intuition

When building a comprehensive RAG system, one of the most important hyperparameters to tune is **K**, the number of documents (before reranking, if any) to retrieve for each query. This choice can significantly impact the performance of the system, especially in terms of **precision** and **answer feasibility**:
 - A **low K** leads to _higher precision_ (fewer documents are retrieved, so fewer irrelevant ones reach the context), but may miss relevant information, which leads to incomplete answers;
 - A **high K** leads to _lower precision_, but is more likely to include all the relevant information, which leads to more complete answers.
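
The trade-off can be made concrete with the standard precision@K and recall@K metrics on a toy ranked list. Recall is used here only as a rough stand-in for answer completeness, and the document ids and relevance labels below are invented for illustration.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant_ids) / len(top_k)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical ranking for a query whose relevant chunks are d1, d2 and d6.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2", "d6"}

for k in (1, 3, 5, 10):
    print(f"K={k:>2}  precision={precision_at_k(ranking, relevant, k):.2f}  "
          f"recall={recall_at_k(ranking, relevant, k):.2f}")
```

On this toy ranking, precision falls from 1.00 at K=1 to 0.30 at K=10, while recall rises from 0.33 to 1.00.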

Unfortunately, **not all queries are created equal**: some queries are more complex than others and may therefore require more documents (a higher K) to produce a complete answer, while simpler ones need fewer documents (a lower K) to achieve a satisfactory answer. For example, the query "What's the capital of Italy?" is simple and can be answered with a single chunk containing the phrase "Rome is the capital of Italy"; on the other hand, the query "What are the main differences between Italian and French cuisine?" is more complex and may require multiple chunks to be retrieved in order to be answered.

This leads to the problem of **estimating the K hyperparameter** for each query, which is not a simple task. What if we could assign each query a score that roughly reflects its complexity and derive that query's K value from the score, so that higher complexity scores lead to higher K values? This is the intuition behind the **Query Complexity Score** (QCS) discussed in this notebook.
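
Before the QCS itself is defined, the score-to-K step can be sketched on its own. This minimal sketch assumes, hypothetically, that the complexity score is normalised to [0, 1] and that a linear mapping between a chosen `k_min` and `k_max` is acceptable; the function name, the bounds and the example scores are illustrative and not part of the method described later.

```python
import math


def k_from_complexity(score: float, k_min: int = 2, k_max: int = 20) -> int:
    """Map a complexity score to a retrieval depth K (hypothetical helper).

    Assumes the score is normalised to [0, 1]; values outside that range are
    clamped. The mapping is linear and non-decreasing, so a more complex query
    never receives a smaller K than a simpler one.
    """
    if k_min < 1 or k_max < k_min:
        raise ValueError("expected 1 <= k_min <= k_max")
    clamped = min(max(score, 0.0), 1.0)
    # Round up so that any non-zero complexity adds at least one document.
    return k_min + math.ceil(clamped * (k_max - k_min))


# Hypothetical scores for the two example queries above.
simple_k = k_from_complexity(0.1)   # "What's the capital of Italy?"
complex_k = k_from_complexity(0.8)  # Italian vs French cuisine
print(simple_k, complex_k)
```

A linear mapping is only the simplest monotone choice; a step function or a logarithmic curve would fit the same requirement that higher scores never yield a lower K.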

### 2. Solutions

#### 2.1 High Static K

The simplest approach to _"solve"_ this problem is to set a high, fixed K value. If the range of complexity of the queries that our system is going to receive is known a priori, K can be fixed high enough to provide a satisfactory answer to the most complex queries.
As already discussed, this approach has the drawback of lowering precision, as well as increasing costs and latency, even for the simplest queries.

> Please note that these problems could be somewhat mitigated by using a **reranker** model on the retrieved chunks, but this notebook focuses on another approach entirely.

#### 2.2 Training a Model to Estimate K for Each Query

Another approach is to train a model to predict the value of K for a given query. The model can be trained on a dataset of queries and their corresponding K values, which can be obtained by generating synthetic queries of varying complexity and manually annotating each with an appropriate value of K. This approach works well if the training dataset is highly representative of the queries that the system is going to receive. However, it requires a good dataset (which is difficult to obtain, especially with synthetic queries) and a model that generalizes well to unseen queries. It is also costly and time-consuming, and keeping the model up to date requires ongoing effort in a system that is constantly evolving.
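
A minimal sketch of the idea, assuming a small hand-annotated set of (query, K) pairs. Instead of a trained neural model it uses a similarity-weighted nearest-neighbour regressor over bag-of-words Jaccard similarity, chosen only because it needs no dependencies; the class name, the training queries and their K labels are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens: a deliberately crude feature extractor."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class NearestNeighbourK:
    """Hypothetical k-NN regressor: predicts K as the similarity-weighted
    mean of the K labels of the most similar annotated queries."""

    def __init__(self, n_neighbours: int = 3):
        if n_neighbours < 1:
            raise ValueError("n_neighbours must be >= 1")
        self.n_neighbours = n_neighbours
        self.examples = []  # list of (token set, annotated K)

    def fit(self, queries: list[str], k_labels: list[int]) -> "NearestNeighbourK":
        self.examples = [(tokenize(q), k) for q, k in zip(queries, k_labels)]
        return self

    def predict(self, query: str) -> int:
        if not self.examples:
            raise ValueError("call fit() before predict()")
        tokens = tokenize(query)
        # Keep the n most similar annotated queries.
        scored = sorted(
            ((jaccard(tokens, toks), k) for toks, k in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )[: self.n_neighbours]
        total = sum(sim for sim, _ in scored)
        if total == 0:
            # No lexical overlap at all: fall back to the mean annotated K.
            return round(sum(k for _, k in self.examples) / len(self.examples))
        return round(sum(sim * k for sim, k in scored) / total)


# Hypothetical synthetic queries annotated with K values.
train_queries = [
    "What's the capital of Italy?",
    "Who wrote the Divine Comedy?",
    "What are the main differences between Italian and French cuisine?",
    "Compare the economic policies of Italy and France since 1990",
]
train_k = [2, 2, 10, 15]

# With so few examples, a single neighbour is used.
model = NearestNeighbourK(n_neighbours=1).fit(train_queries, train_k)
print(model.predict("What's the capital of France?"))
```

Lexical overlap is a weak proxy for complexity; in practice sentence embeddings and a properly trained regressor or classifier would be the more natural choice, with the dataset and maintenance costs described above.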

> For a thorough implementation of this approach, you can check out this [Medium Article](https://medium.com/@sauravjoshi23/optimizing-retrieval-augmentation-with-dynamic-top-k-tuning-for-efficient-question-answering-11961503d4ae) by Saurav Joshi.

#### 2.3 Ask an LLM

Of course, the **JALM** (Just Ask a Language Model) approach is always an option and, in most cases, the best one in terms of quality: a sufficiently capable LLM may be able to estimate the K value for a given query based on its complexity and, optionally, the context of the system.
Another approach relies on **Query Decomposition**: starting from the original query, the LLM is _kindly_ asked to generate a set of small, atomic sub-queries (additionally, each with an associated K value) that reflect the decomposition of the original query into smaller, more manageable parts. The retrieval results for each sub-query are then merged using techniques like **Reciprocal Rank Fusion**; alternatively, each sub-query is answered separately and the partial answers are merged. This approach is particularly useful when the original query is too complex to be answered in a single step, but it requires a good LLM and adds considerable complexity and latency to the system.
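
Reciprocal Rank Fusion itself fits in a few lines. The sketch below follows the usual formulation, in which each document scores the sum of 1 / (k + rank) over every ranked list it appears in (1-based ranks), with k = 60 as proposed in the original RRF paper by Cormack, Clarke and Büttcher; the sub-queries and document ids in the usage example are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document receives sum(1 / (k + rank)) over the lists it appears in.
    A larger k flattens the difference between top and lower positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical retrieval results for three sub-queries of the cuisine question.
sub_query_results = [
    ["it_pasta", "it_regions", "fr_sauces"],   # "What characterises Italian cuisine?"
    ["fr_sauces", "fr_regions", "it_pasta"],   # "What characterises French cuisine?"
    ["fr_sauces", "it_olive_oil"],             # "Which cooking fats do they use?"
]
fused = reciprocal_rank_fusion(sub_query_results)
top_docs = [doc_id for doc_id, _ in fused[:4]]
print(top_docs)
```

If the LLM also returns a K for each sub-query, one simple option (an assumption, not something prescribed above) is to truncate each sub-query's ranking to its own K before fusing.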


### 3. Query Complexity Score

(Explaining the QCS)