# Chapter 6: Stage 4: Selection of Fine-Tuning Techniques and Appropriate Model Configurations

## Steps Involved in Fine-Tuning

1. **Initialise the Pre-Trained Tokenizer and Model**
2. **Modify the Model’s Output Layer**
3. **Choose an Appropriate Fine-Tuning Strategy**: Select the fine-tuning strategy that best fits the task and the model architecture. Options include:
   + Task-Specific Fine-Tuning: For tasks such as text summarisation, code generation, classification, and question answering, adapt the model using relevant datasets.
   + Domain-Specific Fine-Tuning: Tailor the model to comprehend and generate text relevant to specific domains, such as medical, financial, or legal fields.
   + Parameter-Efficient Fine-Tuning (PEFT): Techniques such as LoRA, QLoRA, and adapters reduce computational cost by updating only a small subset of parameters (often newly added ones) while the pre-trained weights stay frozen.
   + Half Fine-Tuning (HFT): Balance retaining pre-trained knowledge against learning new tasks by updating only half of the model’s parameters in each fine-tuning round while the other half stays frozen.

4. **Set Up the Training Loop**
5. **Incorporate Techniques for Handling Multiple Tasks**
6. **Monitor Performance on a Validation Set**
7. **Optimise the Model Using Advanced Techniques**: Employ techniques such as Proximal Policy Optimisation (PPO), the reinforcement learning algorithm commonly used in reinforcement learning from human feedback (RLHF), or Direct Preference Optimisation (DPO), which aligns model outputs with human preferences directly from pairwise preference data without training a separate reward model. These techniques are particularly useful when fine-tuning models for tasks that require nuanced decision-making or human-like responses.

8. **Prune and Optimise the Model** (if necessary)
9. **Continuous Evaluation and Iteration**
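
To make step 7 more concrete, the DPO objective for a single preference pair can be sketched in plain Python. This is a minimal sketch, assuming the summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses have already been computed under both the model being trained and a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta = 0.1` are illustrative choices, not part of any particular library.

```python
import math

def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Hypothetical helper: DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being fine-tuned or the frozen reference model.
    """
    # Implicit reward of each response: beta * log(pi_theta / pi_ref)
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably for both signs
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2)
print(round(dpo_loss(-12.0, -15.0, -12.0, -15.0), 4))      # 0.6931
# Policy raises the chosen response relative to the reference: loss drops
print(dpo_loss(-10.0, -15.0, -12.0, -15.0) < math.log(2))  # True
```

In practice the loss would be averaged over a batch and minimised by gradient descent inside an autodiff framework; this sketch only evaluates it for fixed inputs.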

## Fine-Tuning Strategies for LLMs

### Task-Specific Fine-Tuning

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:400px">
    <img src="image/task_specific.png" alt="" />
</div>

### Domain-Specific Fine-Tuning

## Parameter-Efficient Fine-Tuning (PEFT) Techniques

Parameter-Efficient Fine-Tuning (PEFT) adapts pre-trained language models to a wide range of applications at a fraction of the cost of full fine-tuning. PEFT methods train only a small subset of model parameters, often newly added ones, while keeping most of the pre-trained LLM parameters frozen, which significantly reduces computational and storage costs. Because the original weights are left largely untouched, this approach can also mitigate catastrophic forgetting, the phenomenon in which a neural network loses previously acquired knowledge and performs markedly worse on earlier tasks after being trained on a new dataset. PEFT methods can match or even exceed full fine-tuning, particularly in low-data scenarios, and often generalise better to out-of-domain contexts.

### Adapters

Adapter-based methods introduce additional trainable parameters after the attention and fully connected layers of a frozen pre-trained model, aiming to reduce memory usage and accelerate training.

The specific approach varies by adapter: it might involve adding an extra layer, or representing the weight update (the delta) as a low-rank decomposition instead of a full-size matrix.

Regardless of the method, adapters are generally small yet can achieve performance comparable to fully fine-tuned models, allowing larger models to be trained with fewer resources.
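
The "extra layer" variant can be sketched as a bottleneck adapter: a down-projection, a non-linearity, an up-projection, and a residual connection, placed after a frozen sublayer. This NumPy sketch is illustrative only; the class name, the ReLU activation, the bottleneck size of 64, and the zero-initialised up-projection are assumptions, and training code is omitted.

```python
import numpy as np

class BottleneckAdapter:
    """Hypothetical bottleneck adapter: down-project, ReLU, up-project, add residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Trainable down-projection (hidden_dim -> bottleneck_dim)
        self.w_down = rng.normal(0.0, 0.02, size=(hidden_dim, bottleneck_dim))
        self.b_down = np.zeros(bottleneck_dim)
        # Trainable up-projection, zero-initialised so the adapter starts as an identity map
        self.w_up = np.zeros((bottleneck_dim, hidden_dim))
        self.b_up = np.zeros(hidden_dim)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # h: (batch, hidden_dim), the output of a frozen attention or feed-forward sublayer
        z = np.maximum(h @ self.w_down + self.b_down, 0.0)  # ReLU bottleneck
        return h + z @ self.w_up + self.b_up                  # residual connection

    def num_params(self) -> int:
        # Only these parameters would be trained; the surrounding model stays frozen
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


hidden, bottleneck = 768, 64
adapter = BottleneckAdapter(hidden, bottleneck)
h = np.random.default_rng(1).normal(size=(2, hidden))
print(np.allclose(adapter(h), h))  # True: identity at initialisation
print(adapter.num_params())        # 99136
print(hidden * hidden)             # 589824 weights in one frozen 768 x 768 matrix
```

Zero-initialising the up-projection means fine-tuning starts exactly from the pre-trained model's behaviour, and the adapter adds roughly 99 thousand trainable values per insertion point.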

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:600px">
    <img src="image/peft.png" alt="" />
</div>

### Low-Rank Adaptation (LoRA)

Low-Rank Adaptation (LoRA) is a technique for fine-tuning large language models that freezes the original model weights and trains a separate set of weights whose update is added to the original parameters.

Rather than updating a full weight matrix, LoRA represents the weight update as the product of two low-rank matrices, which reduces the number of trainable parameters, speeds up training, and lowers costs.

This method is particularly useful when multiple clients require fine-tuned models for different applications: each use case gets its own small set of LoRA weights on top of a single shared base model, without the need for separate full models.
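
A minimal NumPy sketch of a LoRA-adapted linear layer follows, assuming the standard LoRA formulation: the output is W0 x + (alpha / r) B A x, with W0 frozen, A randomly initialised, and B initialised to zero so the layer initially behaves exactly like the base layer. The class name, the initialisation scale, and the sizes in the usage lines are illustrative assumptions; gradient updates are omitted.

```python
import numpy as np

class LoRALinear:
    """Hypothetical LoRA layer: frozen W0 plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, w0: np.ndarray, r: int, alpha: float, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                    # frozen pre-trained weight
        self.a = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable, small random init
        self.b = np.zeros((d_out, r))                   # trainable, zero init => no change at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in); the low-rank path never builds a d_out x d_in matrix
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def merged_weight(self) -> np.ndarray:
        # Fold the update into one matrix so inference costs the same as the base layer
        return self.w0 + self.scale * self.b @ self.a

    def trainable_params(self) -> int:
        return self.a.size + self.b.size


rng = np.random.default_rng(1)
w0 = rng.normal(size=(1024, 1024))
layer = LoRALinear(w0, r=8, alpha=16)
x = rng.normal(size=(2, 1024))
print(np.allclose(layer(x), x @ w0.T))                     # True at initialisation
layer.b = rng.normal(0.0, 0.01, size=layer.b.shape)        # pretend B has been trained
print(np.allclose(layer(x), x @ layer.merged_weight().T))  # True
print(layer.trainable_params(), w0.size)                   # 16384 1048576
```

Because only A and B are trained, each client or use case can store its own (A, B) pair while all of them share one copy of W0; `merged_weight()` folds a chosen adapter into the base weight when extra inference latency is unwanted.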

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:500px">
    <img src="image/lora.png" alt="" />
</div>

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:300px">
    <img src="image/lora_weight.png" alt="" />
</div>

### QLoRA

QLoRA is an extension of LoRA designed for greater memory efficiency when fine-tuning large language models (LLMs). It quantises the frozen pre-trained weights to 4-bit precision using a data type called NormalFloat (NF4). LLM weights are typically stored in 16-bit or 32-bit floating-point formats, so compressing them to 4 bits significantly reduces the memory footprint and allows fine-tuning on less powerful hardware, including consumer GPUs. The LoRA adapters themselves are kept in 16-bit precision and are the only weights updated during training. QLoRA further reduces memory with double quantisation, which quantises the quantisation constants themselves. Despite the reduction in bit precision, QLoRA maintains performance comparable to traditional 16-bit fine-tuning.
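
The core idea of block-wise 4-bit weight quantisation can be sketched in NumPy. This is a simplified illustration, not QLoRA's actual scheme: QLoRA uses the NF4 data type, whereas this sketch assumes evenly spaced symmetric integer levels in [-7, 7] with one absmax scale per block of 64 weights. The function names and the block size are hypothetical.

```python
import numpy as np

def quantize_blockwise_int4(w: np.ndarray, block_size: int = 64):
    """Hypothetical symmetric absmax quantisation to 4-bit codes, one scale per block."""
    flat = w.reshape(-1, block_size)                  # assumes w.size % block_size == 0
    absmax = np.abs(flat).max(axis=1, keepdims=True)  # one quantisation constant per block
    absmax[absmax == 0] = 1.0                         # avoid dividing by zero for all-zero blocks
    q = np.round(flat / absmax * 7).astype(np.int8)   # codes in [-7, 7], stored in int8 here
    return q, absmax

def dequantize_blockwise_int4(q: np.ndarray, absmax: np.ndarray, shape) -> np.ndarray:
    # Map codes back to floats for the forward pass
    return (q.astype(np.float32) / 7 * absmax).reshape(shape)


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, absmax = quantize_blockwise_int4(w)
w_hat = dequantize_blockwise_int4(q, absmax, w.shape)
# Rounding error is at most half a quantisation step (absmax / 14) per element
err = np.abs(w.reshape(-1, 64) - w_hat.reshape(-1, 64))
print(np.all(err <= absmax / 14 + 1e-5))  # True

# Memory for the weights alone of an illustrative 7-billion-parameter model
n_params = 7_000_000_000
for bits in (32, 16, 4):
    print(bits, "bit:", n_params * bits / 8 / 1e9, "GB")  # 28.0, 14.0, 3.5
```

The memory figures cover weights only; they ignore the per-block constants (which double quantisation shrinks), activations, and optimiser state for the adapters.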

### Weight-Decomposed Low-Rank Adaptation (DoRA)

Weight-Decomposed Low-Rank Adaptation (DoRA) is a fine-tuning method that decomposes pre-trained weights into magnitude and direction components and fine-tunes both.

It applies Low-Rank Adaptation (LoRA) to the directional component, so the direction can receive substantial updates while the number of trainable parameters stays small and the model architecture is unchanged.

DoRA avoids the computational cost of traditional full fine-tuning (FT) while preserving model simplicity and inference efficiency, and it aims to close the performance gap typically observed between LoRA and FT.
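
The decomposition can be sketched in NumPy as W' = m * (W0 + BA) / ||W0 + BA||, where the norm is taken over each output unit's weight vector, m is a trainable magnitude initialised to ||W0||, and BA is a LoRA update to the direction. This is a minimal sketch under those assumptions; the class name, the (d_out, d_in) weight layout, and the initialisation scale are illustrative, and it is not a reference implementation.

```python
import numpy as np

def row_norm(w: np.ndarray) -> np.ndarray:
    # L2 norm of each output unit's weight vector, shape (d_out, 1)
    return np.linalg.norm(w, axis=1, keepdims=True)

class DoRALinear:
    """Hypothetical DoRA layer: W' = m * (W0 + B @ A) / ||W0 + B @ A||."""

    def __init__(self, w0: np.ndarray, r: int, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                    # frozen pre-trained weight
        self.m = row_norm(w0)                           # trainable magnitude, initialised to ||W0||
        self.a = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable LoRA factor
        self.b = np.zeros((d_out, r))                   # trainable LoRA factor, zero init

    def weight(self) -> np.ndarray:
        v = self.w0 + self.b @ self.a     # direction, updated through the LoRA factors
        return self.m * v / row_norm(v)   # unit-norm directions rescaled by learned magnitudes

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight().T


rng = np.random.default_rng(1)
w0 = rng.normal(size=(64, 32))
layer = DoRALinear(w0, r=4)
print(np.allclose(layer.weight(), w0))                 # True at initialisation
layer.b = rng.normal(0.0, 0.1, size=layer.b.shape)     # pretend the direction was trained
layer.m = layer.m * 1.5                                # pretend the magnitude was trained
print(np.allclose(row_norm(layer.weight()), layer.m))  # True: norms are set by m alone
```

The trainable parameters are m (one value per output unit) plus the r * (d_in + d_out) LoRA values. Because `weight()` returns an ordinary matrix, it can be computed once after training and used in place of the original weight at inference time.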

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:600px">
    <img src="image/dora.png" alt="" />
</div>

**Comparison between LoRA and DoRA**

<div style="background-color:white; padding:10px; display:flex; justify-content:center;height:300px">
    <img src="image/lora_dora.png" alt="" />
</div>

### Fine-Tuning with Multiple Adapters