# Improving Retrieval Performance in RAG Applications

## Introduction
Embedding models have revolutionized the field of natural language processing (NLP). These models map high-dimensional data (such as text) into a lower-dimensional vector space while preserving relevant informational and relational properties. This transformation supports a range of NLP tasks, including search, recommendation systems, and information retrieval.


### Retrieval-Augmented Generation (RAG)


Retrieval-Augmented Generation (RAG) combines the strengths of retrieval-based methods and generation-based methods to improve the performance of NLP systems. RAG applications retrieve relevant information from a large corpus of documents and use this information to generate more accurate and contextually appropriate responses. This approach has found applications in numerous domains, including chatbots, search engines, recommendation systems, and knowledge management systems.


The effectiveness of RAG applications heavily relies on the quality of the embedding models used for information retrieval. Embedding models must accurately capture the semantic meaning of queries and documents to ensure that the most relevant information is retrieved. However, the reliance on general-purpose embedding models often limits the performance of RAG applications in specific domains.
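
To make the retrieval step concrete, here is a minimal sketch of ranking documents by cosine similarity to a query. The three-dimensional vectors and document names are made up for illustration; in a real RAG system they would come from an embedding model, and the helper names `cosine_similarity` and `retrieve` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical 3-dimensional embeddings; real models produce far more dimensions.
doc_vecs = {
    "dosage_guidelines": [0.9, 0.1, 0.0],
    "billing_faq": [0.0, 0.2, 0.9],
    "drug_interactions": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]

top_docs = retrieve(query_vec, doc_vecs, top_k=2)
print(top_docs)
```

If the embedding model places a relevant document far from the query, no amount of ranking logic recovers it, which is why embedding quality dominates retrieval quality.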


Embedding models are mostly trained on extensive corpora of general knowledge, such as Wikipedia or Common Crawl. This broad approach can be limiting when applied to specialized domains. For example, models trained on general data may not perform well in technical domains without additional tuning, because general-knowledge embeddings may not capture the nuances and specialized terminology unique to those domains.




Customizing embedding models to capture domain-specific knowledge is crucial for enhancing the performance of RAG applications. Domain-specific embeddings are trained on specialized corpora that reflect the language and terminology used in a particular field. By doing so, these embeddings can better capture the semantic nuances and context-specific meanings that are essential for accurate information retrieval. This customization process involves training or fine-tuning models on specialized datasets, incorporating domain-specific vocabularies, and possibly adjusting model architectures to better handle the characteristics of the data.




### Boosting Retrieval Performance in RAG Applications


Enhancing retrieval performance is crucial for the success of RAG applications, as the quality of retrieved documents significantly impacts the quality of the generated content. Customizing embeddings can lead to more accurate and relevant data retrieval, which in turn improves the overall output of the RAG system. For instance, a RAG application in the medical field, trained with domain-specific embeddings, would be able to retrieve and generate more precise and clinically relevant information than one using a generic embedding model.


Sentence Transformers is a Python library for using and training embedding models for a wide range of applications, such as retrieval augmented generation, semantic search, semantic textual similarity, paraphrase mining, and more. Its v3.0 update is the largest since the project's inception, introducing a new training approach.


Developed by UKPLab, the Sentence Transformers library builds on Hugging Face Transformers implementations of models such as BERT (Bidirectional Encoder Representations from Transformers), with a focus on producing better sentence-level embeddings. Unlike a standard BERT model, which outputs a vector for each token in the input text, a Sentence Transformers model generates a single fixed-size vector for the entire input sentence or paragraph, making it more practical for tasks that require sentence-level comparisons.

The v3.0 update introduces a new trainer that makes it easier to train and fine-tune embedding models. It also brings enhanced components, such as broader dataset support, updated loss functions, and a streamlined training process, which improve the efficiency and flexibility of model development. In this post, I'll show you how to fine-tune a Sentence Transformer model on a specific task using the Sentence Transformers library.
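
The difference between per-token outputs and a single sentence vector can be illustrated with mean pooling, one common way to collapse token vectors into a fixed-size embedding. The sketch below uses made-up token vectors and plain Python; it shows the idea only and is not the library's implementation. The `mean_pool` helper is hypothetical.

```python
def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size sentence vector."""
    n_tokens = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n_tokens for i in range(dim)]


# Hypothetical 4-dimensional token vectors for sentences of different lengths.
short_sentence = [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 0.0, 0.0]]
long_sentence = [
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0, 3.0],
]

short_vec = mean_pool(short_sentence)
long_vec = mean_pool(long_sentence)

# Both sentences end up with one vector of the same size,
# regardless of how many tokens they contain.
print(short_vec)
print(long_vec)
```

Because every input maps to a vector of the same length, two sentences can be compared directly with a similarity measure such as cosine similarity.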


## Training a Sentence Transformer

Training a Sentence Transformer model involves between three and five components.



<center><figure><img src="../imgs/Sentence Transformer Training.png" alt="drawing" width="1100"/><figcaption>Fig. 1: Sentence Transformer training components</figcaption></figure></center> 

### Dataset

You can load a local dataset or one from the Hugging Face Hub using `datasets.load_dataset()`. One important consideration is that the dataset format must match your loss function. If your loss function requires a label (see Table 1), your dataset must have a column named “label” or “score”. All other columns are treated as inputs, and their number must match the number of inputs your chosen loss expects. The names of the input columns are irrelevant; only their order matters. Table 1 shows the requirements for the loss functions in Sentence Transformers v3.0.
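
The column rules above can be sketched in plain Python. The datasets below are hypothetical dictionaries of column name to values (the same kind of mapping that `datasets.Dataset.from_dict` accepts), and `split_columns` is an illustrative helper, not part of any library.

```python
LABEL_COLUMNS = {"label", "score"}


def split_columns(column_names):
    """Split dataset columns into ordered inputs and an optional label column."""
    inputs = [name for name in column_names if name not in LABEL_COLUMNS]
    labels = [name for name in column_names if name in LABEL_COLUMNS]
    return inputs, (labels[0] if labels else None)


# Hypothetical pair dataset with a float similarity score: two inputs plus a label.
pair_dataset = {
    "sentence_A": ["A man is eating.", "A plane is taking off."],
    "sentence_B": ["Someone eats food.", "A dog runs in a park."],
    "score": [0.9, 0.1],
}

# Hypothetical triplet dataset without a label column. Column order matters:
# anchor first, then positive, then negative.
triplet_dataset = {
    "anchor": ["How do I reset my password?"],
    "positive": ["Steps to change a forgotten password"],
    "negative": ["Pricing plans overview"],
}

pair_inputs, pair_label = split_columns(list(pair_dataset))
triplet_inputs, triplet_label = split_columns(list(triplet_dataset))
print(pair_inputs, pair_label)
print(triplet_inputs, triplet_label)
```

Renaming `anchor` to `query` would change nothing, but swapping the positions of the `positive` and `negative` columns would silently train the model on the wrong signal.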

#### Table 1: Loss function requirements




<table border="1">
    <caption>Table 1: Requirements for Loss Functions </caption>
    <tr>
        <th>Inputs</th>
        <th>Labels</th>
        <th>Appropriate Loss Functions</th>
    </tr>
    <tr>
        <td>single sentences</td>
        <td>class</td>
        <td>BatchAllTripletLoss<br> BatchHardSoftMarginTripletLoss<br> BatchHardTripletLoss<br> BatchSemiHardTripletLoss</td>
    </tr>
    <tr>
        <td>single sentences</td>
        <td>none</td>
        <td>ContrastiveTensionLoss<br> DenoisingAutoEncoderLoss</td>
    </tr>
    <tr>
        <td>(anchor, anchor) pairs</td>
        <td>none</td>
        <td>ContrastiveTensionLossInBatchNegatives</td>
    </tr>
    <tr>
        <td>(damaged_sentence, original_sentence) pairs</td>
        <td>none</td>
        <td>DenoisingAutoEncoderLoss</td>
    </tr>
    <tr>
        <td>(sentence_A, sentence_B) pairs</td>
        <td>class</td>
        <td>SoftmaxLoss</td>
    </tr>
    <tr>
        <td>(anchor, positive) pairs</td>
        <td>none</td>
        <td>CachedMultipleNegativesRankingLoss<br> MultipleNegativesRankingLoss<br> MultipleNegativesSymmetricRankingLoss<br> MegaBatchMarginLoss<br> CachedGISTEmbedLoss<br> GISTEmbedLoss</td>
    </tr>
    <tr>
        <td>(anchor, positive/negative) pairs</td>
        <td>1 if positive, 0 if negative</td>
        <td>ContrastiveLoss<br> OnlineContrastiveLoss</td>
    </tr>
    <tr>
        <td>(sentence_A, sentence_B) pairs</td>
        <td>float similarity score</td>
        <td>CoSENTLoss<br> AnglELoss<br> CosineSimilarityLoss</td>
    </tr>
    <tr>
        <td>(anchor, positive, negative) triplets</td>
        <td>none</td>
        <td>CachedMultipleNegativesRankingLoss<br> MultipleNegativesRankingLoss<br> TripletLoss<br>CachedGISTEmbedLoss<br>GISTEmbedLoss</td>
    </tr>
</table>

### Loss Function

The loss function is at the core of training in machine learning algorithms. Unfortunately, there is no single loss function that works best for all use cases. Choose your loss function based on your data, or curate your dataset based on your loss function. You can consult Table 1 for the loss function requirements.
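
One way to work from your data toward a loss function is to treat Table 1 as a lookup. The sketch below transcribes a subset of its rows into a dictionary; the `LOSSES_BY_FORMAT` mapping and the `candidate_losses` helper are hypothetical conveniences, not library features.

```python
# (input format, label type) -> candidate losses, transcribed from a subset of Table 1.
LOSSES_BY_FORMAT = {
    ("(anchor, positive) pairs", "none"): [
        "CachedMultipleNegativesRankingLoss",
        "MultipleNegativesRankingLoss",
        "MultipleNegativesSymmetricRankingLoss",
        "MegaBatchMarginLoss",
        "CachedGISTEmbedLoss",
        "GISTEmbedLoss",
    ],
    ("(anchor, positive, negative) triplets", "none"): [
        "CachedMultipleNegativesRankingLoss",
        "MultipleNegativesRankingLoss",
        "TripletLoss",
        "CachedGISTEmbedLoss",
        "GISTEmbedLoss",
    ],
    ("(sentence_A, sentence_B) pairs", "float similarity score"): [
        "CoSENTLoss",
        "AnglELoss",
        "CosineSimilarityLoss",
    ],
    ("(sentence_A, sentence_B) pairs", "class"): ["SoftmaxLoss"],
}


def candidate_losses(input_format, label_type="none"):
    """Return the losses listed in Table 1 for this data shape, or an empty list."""
    return LOSSES_BY_FORMAT.get((input_format, label_type), [])


scored_pairs = candidate_losses("(sentence_A, sentence_B) pairs", "float similarity score")
triplets = candidate_losses("(anchor, positive, negative) triplets")
print(scored_pairs)
print(triplets)
```

The lookup only narrows the field. Choosing among the remaining candidates still depends on your task and usually on some experimentation.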

### Training Arguments

You can specify training arguments to control how training runs. They are optional, but experimenting with them can improve both training speed and the quality of the resulting model. Table 2 lists some of the training arguments worth looking at.

#### Table 2: Training arguments

| Training Argument                   | Explanation                                                                                                                                  | Data type |
|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|-----------|
| learning_rate                       | The learning rate of the optimizer                                                                                                           | float     |
| lr_scheduler_type                   | The scheduler type to use. Possible values include “linear”, “cosine”, “cosine_with_restarts”, “polynomial”, “constant”, “constant_with_warmup”, and “inverse_sqrt”. See [SchedulerType](https://huggingface.co/docs/transformers/main/en/main_classes/optimizer_schedules#transformers.SchedulerType) for more details | str       |
| warmup_ratio                        | For schedulers with warmup, the ratio of total training steps used for a linear warmup from 0 to learning_rate                                | float     |
| num_train_epochs                    | Total number of training epochs to perform                                                                                                   | float     |
| max_steps                           | If set to a positive number, the total number of training steps to perform. Overrides num_train_epochs                                       | int       |
| per_device_train_batch_size         | The batch size per GPU core/CPU for training                                                                                                 | int       |
| per_device_eval_batch_size          | The batch size per GPU core/CPU for evaluation                                                                                               | int       |
| auto_find_batch_size                | Whether to find a batch size that will fit into memory automatically through exponential decay, avoiding CUDA Out-of-Memory errors            | bool      |
| fp16                                | Whether to use fp16                                                                                                                          | bool      |
| bf16                                | Whether to use bf16                                                                                                                          | bool      |
| gradient_accumulation_steps         | Number of update steps to accumulate the gradients for, before performing a backward/update pass                                             | int       |
| gradient_checkpointing              | If True, use gradient checkpointing to save memory at the expense of a slower backward pass                                                  | bool      |
| eval_accumulation_steps             | Number of prediction steps to accumulate the output tensors for, before moving the results to the CPU                                        | int       |
| optim                               | The optimizer to use. Some of the optimizers are: “adamw_hf”, “sgd”, “adamw_8bit”, “paged_adamw_32bit”, “paged_adamw_8bit”, “adagrad”, “rmsprop”, “rmsprop_bnb_32bit”. For the full list of optimizers available in Training Arguments see [Training Arguments](https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py) | str       |

*Check [Training Arguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) on Hugging Face <img src="../imgs/hf-logo.svg" alt="drawing" width="25"/> for a complete list.  
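
Several of the arguments in Table 2 interact, so it can help to reason about them together before launching a run. The sketch below uses a plain dictionary with illustrative, untuned values and a hypothetical dataset of 10,000 examples; the helper functions are assumptions for this example, and the step counts are approximations rather than the exact values the trainer would compute.

```python
import math

# Hypothetical values chosen for illustration, not tuned recommendations.
training_kwargs = {
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "warmup_ratio": 0.1,
    "num_train_epochs": 1,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "fp16": True,
    "bf16": False,
}


def effective_batch_size(kwargs, num_devices=1):
    """Number of examples contributing to each optimizer update."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def optimizer_steps(num_examples, kwargs, num_devices=1):
    """Approximate number of optimizer updates over the whole run."""
    per_epoch = math.ceil(num_examples / effective_batch_size(kwargs, num_devices))
    return per_epoch * kwargs["num_train_epochs"]


def warmup_steps(total_steps, kwargs):
    """Approximate number of steps spent warming up the learning rate."""
    return math.ceil(total_steps * kwargs["warmup_ratio"])


# fp16 and bf16 are alternative mixed-precision modes; enable at most one.
assert not (training_kwargs["fp16"] and training_kwargs["bf16"])

batch = effective_batch_size(training_kwargs)
total = optimizer_steps(10_000, training_kwargs)  # hypothetical dataset size
warmup = warmup_steps(total, training_kwargs)
print(batch, total, warmup)
```

In Sentence Transformers v3.0, a dictionary like this would typically be unpacked into `SentenceTransformerTrainingArguments` together with an `output_dir`. That step is left out here so the sketch runs without the library installed.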