# LoRA

## Low-Rank Adaptation

Fine-tuning large language models is a resource-intensive process. LoRA is a technique that lets us fine-tune large language models by training only a small number of parameters. It adds small trainable matrices alongside existing weights, usually the attention weights, and optimizes only those. This typically cuts the number of trainable parameters by well over 90%. The exact saving depends on the rank and on which layers are adapted.

LoRA (Low-Rank Adaptation) is a `parameter-efficient fine-tuning technique` that `freezes` the pre-trained model weights and injects trainable `rank decomposition` matrices into the model’s layers.

Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance.
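The savings come from a simple parameter count. Assume the standard LoRA factorization ΔW = BA, where B is d×r and A is r×k. Full fine-tuning updates all d·k entries of a weight matrix, while LoRA trains only r·(d + k). A minimal sketch of that arithmetic follows. The 4096 × 4096 matrix and rank 8 are illustrative sizes, not taken from a specific model.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # full fine-tuning updates every entry of W
    lora = r * (d + k)  # LoRA trains B (d x r) and A (r x k) instead
    return full, lora

# Illustrative sizes: one 4096 x 4096 projection adapted with rank 8
full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

For this single matrix, LoRA trains about 0.39% of the parameters that full fine-tuning would.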

LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable.
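The "no additional latency" claim follows from linearity. The adapter path adds (lora_alpha / r)·BAx to the frozen output W0·x, so a single merged matrix W0 + (lora_alpha / r)·BA gives the same result. The minimal PyTorch sketch below checks this numerically with random tensors. The shapes, rank, and alpha are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
d_out, d_in, r, alpha = 16, 32, 4, 8
scaling = alpha / r  # standard LoRA scales the low-rank update by alpha / r

W0 = torch.randn(d_out, d_in)     # frozen pre-trained weight
A = torch.randn(r, d_in) * 0.01   # trainable down-projection
B = torch.randn(d_out, r)         # trainable up-projection (zero at init in real LoRA;
                                  # random here so the comparison is not trivial)
x = torch.randn(5, d_in)          # a batch of 5 input vectors

# Training-time forward pass: frozen path plus the separate low-rank adapter path
h_adapter = x @ W0.T + scaling * (x @ A.T @ B.T)

# Inference: fold the update into one matrix, W = W0 + scaling * B @ A
W_merged = W0 + scaling * (B @ A)
h_merged = x @ W_merged.T

print(torch.allclose(h_adapter, h_merged, atol=1e-4))  # True
```

After merging, the layer runs one ordinary matrix multiply, exactly as the base model did.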

# Parameter-efficient fine-tuning (PEFT)

PEFT is Hugging Face's library that provides a unified interface for loading and managing parameter-efficient fine-tuning methods, including LoRA.

It makes it easy to load and switch between different PEFT methods, so you can experiment with different fine-tuning techniques.

Adapters can be loaded onto a pretrained model with `load_adapter()`. This is useful for trying out different adapters whose weights have not been merged into the base model. Use `set_adapter()` to choose which adapter is active. To get back the base model, call `unload()`, which removes all of the LoRA modules. Together, these functions make it easy to switch between different task-specific weights.

We’ll use the `LoraConfig` class. The setup requires just a few steps:

- Define the LoRA configuration (rank, alpha, dropout)
- Create the SFTTrainer with PEFT config
- Train and save the adapter weights

| Parameter       | Description |
|----------------|-------------|
| **r (rank)**         | Dimension of the low-rank matrices used for weight updates. Typically between 4–32. Lower values provide more compression but potentially less expressiveness. |
| **lora_alpha**       | Scaling factor for LoRA layers; the low-rank update is multiplied by `lora_alpha / r`. Often set to 2× the rank value. Higher values result in stronger adaptation effects. |
| **lora_dropout**     | Dropout probability for LoRA layers, typically 0.05–0.1. Higher values help prevent overfitting during training. |
| **bias**             | Controls training of bias terms. Options are “none”, “all”, or “lora_only”. “none” is most common for memory efficiency. |
| **target_modules**   | Specifies which model modules to apply LoRA to. Can be `"all-linear"` or a list of specific module names such as `["q_proj", "v_proj"]`. More modules enable greater adaptability but increase memory usage. |
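Standard LoRA multiplies the update `B @ A` by `lora_alpha / r`, so the rank and alpha rows in the table interact. The sketch below shows that arithmetic. The helper name `lora_scaling` is hypothetical, not a PEFT function.

```python
def lora_scaling(r: int, lora_alpha: float) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / r

# The "alpha = 2 x rank" heuristic keeps the factor at 2 whatever the rank
for r in (4, 8, 16, 32):
    print(r, lora_scaling(r, 2 * r))  # each line ends in 2.0

# A fixed alpha shrinks the factor as the rank grows
print(lora_scaling(6, 8), lora_scaling(32, 8))  # 1.3333333333333333 0.25
```

With the 2× rule, you can compare different ranks without also changing the update's scaling factor. `LoraConfig` also accepts `use_rslora=True`, which scales by `lora_alpha / sqrt(r)` instead.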


When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use a validation set to catch overfitting, and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key.

```python
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
```

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# The adapter's config records which base model it was trained on
config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
# Wrap the base model and load the LoRA adapter weights from the Hub
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")
```

```python
from peft import LoraConfig

# r: rank dimension for LoRA update matrices (smaller = more compression)
rank_dimension = 6
# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)
lora_alpha = 8
# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)
lora_dropout = 0.05

peft_config = LoraConfig(
    r=rank_dimension,
    lora_alpha=lora_alpha,  # updates are scaled by lora_alpha / r
    lora_dropout=lora_dropout,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```

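```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
# Example dataset with a plain "text" column; swap in your own task data
dataset = load_dataset("stanfordnlp/imdb")
# Fresh base model: PeftModel.from_pretrained above injected the Hub adapter into `model`
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# max_length is called max_seq_length in older TRL releases
args = SFTConfig(output_dir="opt-350m-lora-sft", max_length=512)
trainer = SFTTrainer(
    model=base_model,
    args=args,
    train_dataset=dataset["train"],
    peft_config=peft_config,
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model()  # for a PEFT model, this writes only the adapter weights
```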