## LoRA: Low-Rank Adaptation of Large Language Models

### **Introduction to LoRA**

LoRA adapts a pre-trained language model by adding a trainable low-rank update to selected weight matrices while keeping the original weights frozen. Only the small low-rank matrices are trained, so far fewer parameters need gradients and optimizer state. This saves memory and computation, which makes the method well suited to large models. The model can therefore be adapted to a specific task without updating all of its weights.
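The sketch below makes the savings concrete. It compares a full update of a single 768 × 768 projection (the hidden size of `bert-base-uncased`) with a rank-8 LoRA update. It also checks that the product of the two low-rank factors has rank at most 8. The sizes are illustrative assumptions.

```python
import torch

d_in, d_out, r = 768, 768, 8  # illustrative sizes: BERT-base hidden size, LoRA rank 8

full_update_params = d_in * d_out   # updating W directly: one value per weight
lora_params = d_in * r + r * d_out  # training only the two factors A and B
print(f"full: {full_update_params}, LoRA: {lora_params}, "
      f"ratio: {100 * lora_params / full_update_params:.2f}%")

# The update Delta W = A @ B has the full (d_in, d_out) shape but rank at most r
torch.manual_seed(0)
A = torch.randn(d_in, r, dtype=torch.float64)
B = torch.randn(r, d_out, dtype=torch.float64)
delta_w = A @ B
print(delta_w.shape, torch.linalg.matrix_rank(delta_w).item())
```

The rank-8 factors hold 12,288 parameters, about 2.08% of the 589,824 in the full matrix.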


```python
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForSequenceClassification
```

```python
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )
```

### 1. Implement LoRA from scratch on a BERT model

In this section, we implement Low-Rank Adaptation (LoRA) from scratch on a BERT model to better understand the concept and its benefits. We use the `transformers` library to load the pre-trained BERT model. We then wrap the query, key, and value projections of its self-attention layers with low-rank matrices.

Next, we fine-tune the modified model on a downstream task to compare LoRA's efficiency with standard fine-tuning. We reuse the training pipeline from `lab03` to fine-tune BERT for sentiment classification on the IMDB dataset, so the results can be compared directly with the previous ones.

```python
model_name = "bert-base-uncased"
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

```python
print_trainable_parameters(model)
```

The `LoRA` class inherits from `nn.Module`, the base class for all neural network modules in PyTorch. Its constructor takes an `original_layer` (e.g., a linear layer from BERT) and a `rank` parameter that sets the rank of the low-rank matrices. It creates two trainable matrices: `A`, of shape `(in_features, rank)`, and `B`, of shape `(rank, out_features)`. Both `in_features` and `out_features` are taken from the original layer.

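The `reset_parameters` method initializes `A` with values drawn from a normal distribution (`nn.init.normal_` defaults to mean 0 and standard deviation 1). `B` stays at zero, following the original LoRA paper (https://arxiv.org/abs/2106.09685), which uses a random Gaussian initialization for `A` and zero for `B`. Because `B` is zero, the product `AB` is zero at the start of training, so the adapted model initially behaves exactly like the pre-trained one.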

The `forward` method defines how the input `x` flows through the LoRA layer. First, `x` is multiplied by `A`, which projects it down to `rank` dimensions. Then the result is multiplied by `B`, which projects it back up to `out_features`. Finally, this low-rank output is added to the output of the original (frozen) layer.
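The following minimal sketch, using toy sizes chosen for illustration, checks two properties of this forward pass. First, with `B` initialized to zero the LoRA path contributes nothing, so the wrapped layer starts out identical to the pre-trained one. Second, computing `(x @ A) @ B` in two steps gives the same result as applying a single dense update `A @ B` of shape `(in_features, out_features)`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_features, out_features, rank = 16, 12, 4  # toy sizes, for illustration only

base = nn.Linear(in_features, out_features)  # stands in for a frozen BERT projection
A = torch.randn(in_features, rank)           # Gaussian init for A
B = torch.zeros(rank, out_features)          # B = 0 at initialization

x = torch.randn(2, 5, in_features)           # (batch, seq_len, hidden)

# At initialization x @ A @ B is exactly zero, so the output equals the base layer's
lora_out = base(x) + x @ A @ B
print(torch.equal(lora_out, base(x)))

# Once B is non-zero (e.g. after training), the two-step product still equals
# one dense low-rank update Delta W = A @ B applied to x
B = torch.randn(rank, out_features)
two_step = (x @ A) @ B
dense = x @ (A @ B)
print(torch.allclose(two_step, dense, atol=1e-4))
```

Computing `(x @ A) @ B` avoids materializing the full `(in_features, out_features)` matrix during training. The two forms differ only in floating-point rounding.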

```python
class LoRA(nn.Module):
    def __init__(self, original_layer, rank=8):
        super().__init__()
        self.original_layer = original_layer  # pre-trained nn.Linear, kept frozen
        self.rank = rank
        self.in_features = original_layer.in_features
        self.out_features = original_layer.out_features

        # Low-rank matrices: A projects down to `rank` dims, B projects back up
        self.A = nn.Parameter(torch.zeros(self.in_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, self.out_features))

        self.reset_parameters()

    def reset_parameters(self):
        # Gaussian init for A; B is zero, so A @ B = 0 and the wrapped layer
        # initially reproduces the pre-trained layer exactly
        nn.init.normal_(self.A)
        nn.init.zeros_(self.B)

    def forward(self, x):
        # LoRA path: (..., in_features) -> (..., rank) -> (..., out_features)
        lora_output = torch.matmul(x, self.A)
        lora_output = torch.matmul(lora_output, self.B)

        # Layer output: the original (frozen) output plus the low-rank update
        return self.original_layer(x) + lora_output
```

```python
# We will apply LoRA only to BERT's attention layers (query, key, value)
# Loop through each layer of BERT and replace query, key, and value with LoRA

for layer in model.bert.encoder.layer:
    layer.attention.self.query = LoRA(layer.attention.self.query)
    layer.attention.self.key = LoRA(layer.attention.self.key)
    layer.attention.self.value = LoRA(layer.attention.self.value)
```
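The loop above depends on BERT's exact module paths. A more generic alternative walks `named_modules()` and swaps every `nn.Linear` whose attribute name appears in a target set. The sketch below demonstrates this on a toy model. `replace_linear_by_name`, `ToyAttention`, and `Wrapper` are hypothetical names for illustration, and `Wrapper` only stands in for the `LoRA` class.

```python
import torch.nn as nn


def replace_linear_by_name(model, target_names, wrap):
    """Hypothetical helper: replace each nn.Linear whose attribute name is in
    target_names with wrap(layer). Returns the dotted names that were replaced."""
    # Collect matches first: the module tree must not change while we iterate over it
    matches = [
        (name, module)
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and name.rsplit(".", 1)[-1] in target_names
    ]
    for name, module in matches:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, wrap(module))
    return [name for name, _ in matches]


class ToyAttention(nn.Module):
    """Hypothetical stand-in for a BERT self-attention block (no forward needed here)."""

    def __init__(self, d):
        super().__init__()
        self.query = nn.Linear(d, d)
        self.key = nn.Linear(d, d)
        self.value = nn.Linear(d, d)


class Wrapper(nn.Module):
    """Placeholder for the LoRA class: it only calls the wrapped layer."""

    def __init__(self, layer):
        super().__init__()
        self.original_layer = layer

    def forward(self, x):
        return self.original_layer(x)


toy = nn.Sequential(ToyAttention(8), ToyAttention(8))
replaced = replace_linear_by_name(toy, {"query", "value"}, Wrapper)
print(replaced)
```

With the notebook's class, the equivalent call would be `replace_linear_by_name(model, {"query", "key", "value"}, LoRA)`.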

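```python
# Freeze every parameter of the model...
for param in model.parameters():
    param.requires_grad = False

# ...then unfreeze only the LoRA matrices. Iterating over layer.parameters() would
# also unfreeze the wrapped original_layer, because parameters() is recursive
for module in model.modules():
    if isinstance(module, LoRA):
        module.A.requires_grad = True
        module.B.requires_grad = True

# The classification head is newly initialized rather than pre-trained,
# so it is usually trained as well
for param in model.classifier.parameters():
    param.requires_grad = True
```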
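`nn.Module.parameters()` is recursive. For a wrapper module, it also yields the parameters of every submodule, including the wrapped `original_layer`. The toy check below lists which tensors each approach reaches. `ToyLoRA` is a hypothetical minimal stand-in with the same attribute layout as `LoRA`.

```python
import torch
import torch.nn as nn


class ToyLoRA(nn.Module):
    """Hypothetical minimal wrapper with the same attributes as the LoRA class."""

    def __init__(self, original_layer, rank=2):
        super().__init__()
        self.original_layer = original_layer
        self.A = nn.Parameter(torch.randn(original_layer.in_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, original_layer.out_features))


wrapped = ToyLoRA(nn.Linear(4, 4))

# named_parameters() walks submodules too, so the base weight and bias appear
recursive = [name for name, _ in wrapped.named_parameters()]
# recurse=False keeps only the parameters registered directly on the wrapper
direct = [name for name, _ in wrapped.named_parameters(recurse=False)]
print(recursive)
print(direct)
```

Unfreezing only `A` and `B`, or iterating with `recurse=False`, keeps the pre-trained projection frozen.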

```python
print_trainable_parameters(model)
```
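As a sanity check on the output of `print_trainable_parameters`, the expected number of LoRA parameters can be worked out by hand. The sketch assumes the standard `bert-base-uncased` configuration (12 encoder layers, hidden size 768) and the class's default `rank=8`. The classifier term only applies if the classification head is left trainable.

```python
# Sizes assumed from bert-base-uncased: 12 encoder layers, hidden size 768
num_layers, hidden = 12, 768
rank = 8                   # default rank of the LoRA class above
projections_per_layer = 3  # query, key and value

lora_per_projection = hidden * rank + rank * hidden  # A: (768, 8), B: (8, 768)
lora_total = num_layers * projections_per_layer * lora_per_projection
classifier = hidden * 2 + 2  # weight (2, 768) + bias (2,) of the 2-label head
print(lora_total, lora_total + classifier)
```

This gives 442,368 LoRA parameters, or 443,906 with a trainable classification head.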

From this point onwards, the classic training pipeline can be applied to the model!

```python
from datasets import load_dataset

# Load a sentiment analysis dataset
dataset = load_dataset('imdb')
train_dataset = dataset['train'].shuffle(seed=42).select(range(2000))
test_dataset = dataset['test'].shuffle(seed=42).select(range(1000))
```

```python
from sklearn.metrics import accuracy_score

# Function to compute accuracy
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    accuracy = accuracy_score(labels, preds)
    return {"accuracy": accuracy}
```
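During evaluation, `Trainer` passes `compute_metrics` an object that exposes `predictions` (here, logits of shape `(N, 2)`) and `label_ids`. The sketch below calls the same function on made-up logits for four reviews. `types.SimpleNamespace` stands in for that object, and the function is repeated so the snippet runs on its own.

```python
from types import SimpleNamespace

import numpy as np
from sklearn.metrics import accuracy_score


def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)  # logits (N, 2) -> predicted class ids (N,)
    return {"accuracy": accuracy_score(labels, preds)}


# Made-up logits; SimpleNamespace mimics the .predictions / .label_ids attributes
fake_pred = SimpleNamespace(
    predictions=np.array([[2.0, -1.0], [0.1, 0.9], [-0.5, 0.3], [1.2, 0.4]]),
    label_ids=np.array([0, 1, 0, 0]),
)
print(compute_metrics(fake_pred))
```

The argmax gives predictions `[0, 1, 1, 0]`. Three of the four match the labels, so the accuracy is 0.75.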

```python
# Tokenize the dataset
def tokenize_function(sample):
    return tokenizer(sample['text'], padding="max_length", truncation=True)

train_dataset = train_dataset.map(tokenize_function, batched=True)
test_dataset = test_dataset.map(tokenize_function, batched=True)
```

```python
from transformers import Trainer, TrainingArguments

batch_size = 32
num_train_epochs = 2

learning_rate = 2e-5
weight_decay = 0.01

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy="steps",
    eval_steps=10,
    learning_rate=learning_rate,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_train_epochs,
    weight_decay=weight_decay,
    logging_dir='./logs',  # Directory for storing logs
    logging_steps=10,  # Log every 10 steps
)

# Initialize the Trainer object
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics
)
```
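It is worth knowing how often the `Trainer` will evaluate with these settings. The back-of-the-envelope sketch below assumes a single device, no gradient accumulation, and that the last partial batch is kept. Each evaluation runs over all 1,000 test reviews, so evaluation can take a noticeable share of the run time.

```python
import math

num_train_samples = 2000  # size of the shuffled IMDB training subset
batch_size = 32
num_train_epochs = 2
eval_steps = 10

# Assumes one device and no gradient accumulation; the partial last batch counts as a step
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * num_train_epochs
num_evaluations = total_steps // eval_steps
print(steps_per_epoch, total_steps, num_evaluations)
```

Under these assumptions, training runs 63 steps per epoch, 126 steps in total, with 12 evaluations along the way.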

```python
# Evaluate the model before training, as a baseline
results = trainer.evaluate()
print(f"Accuracy on the validation set: {results['eval_accuracy']:.4f}")
```

```python
results = trainer.train()
```

```python
# Evaluate the model after training
results = trainer.evaluate()
print(f"Accuracy on the validation set: {results['eval_accuracy']:.4f}")
```

### 2. Implement LoRA with Hugging Face Transformers

Instead of writing the LoRA layers by hand, we can use Hugging Face's **PEFT** library, which applies LoRA to a model through `LoraConfig` and `get_peft_model`.

```python
model_name = "bert-base-uncased"
# Reload a fresh model: the one above has already been modified and trained
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = BertTokenizer.from_pretrained(model_name)
```

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


**PEFT (Parameter-Efficient Fine-Tuning)** is a Hugging Face library for fine-tuning large pre-trained models efficiently. It implements several parameter-efficient techniques, including LoRA, prefix tuning, and prompt tuning. Each of these adapts a pre-trained model to a specific task by training only a small number of parameters, which greatly reduces memory and compute requirements compared with full fine-tuning.

```python
from peft import LoraConfig, get_peft_model
```

`LoraConfig` is the configuration class in the Hugging Face `peft` library that defines how LoRA is applied. It sets the rank of the LoRA matrices (`r`), the scaling factor (`lora_alpha`), the modules where LoRA is inserted (`target_modules`), and the dropout applied to the LoRA path (`lora_dropout`).
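`lora_alpha` adds no parameters. In PEFT's default (non-rank-stabilized) setting, it rescales the low-rank update by `lora_alpha / r`, and `lora_dropout` is applied to the input of the LoRA path. The sketch below mimics that combination with plain tensors to show the effect of the scaling. It is an illustration under these assumptions, not PEFT's actual layer code, and `lora_delta` is a hypothetical helper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_features, out_features, r = 16, 16, 4  # toy sizes


def lora_delta(x, A, B, r, lora_alpha, dropout):
    # Hypothetical helper: low-rank update scaled by lora_alpha / r,
    # with dropout applied only to the input of the LoRA path
    scaling = lora_alpha / r
    return (dropout(x) @ A @ B) * scaling


A = torch.randn(in_features, r)
B = torch.randn(r, out_features)  # non-zero, as after some training
x = torch.randn(3, in_features)
dropout = nn.Dropout(p=0.1)
dropout.eval()  # disable dropout so the comparison is deterministic

delta_alpha_4 = lora_delta(x, A, B, r, lora_alpha=4, dropout=dropout)  # scaling 1.0
delta_alpha_8 = lora_delta(x, A, B, r, lora_alpha=8, dropout=dropout)  # scaling 2.0
print(torch.allclose(delta_alpha_8, 2 * delta_alpha_4))
```

Doubling `lora_alpha` at a fixed rank doubles the update. With `r=32` and `lora_alpha=32`, as in the configuration used here, the scaling factor is 1.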

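```python
config = LoraConfig(
    r=32,              # rank of the low-rank matrices
    lora_alpha=32,     # the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.1,  # dropout applied to the input of the LoRA path
    # target_modules is not set, so PEFT uses its default target modules for BERT
)
```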

```python
model = get_peft_model(model, config)
```

```python
print_trainable_parameters(model)
```

trainable params: 1179648 || all params: 110663426 || trainable%: 1.07
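The reported count can be reproduced by hand. `LoraConfig` above does not set `target_modules`, so PEFT falls back to its built-in default for BERT. The sketch assumes that default covers only the `query` and `value` projections, which should be checked against the target-module mapping of your `peft` version. The count also shows that the newly initialized classifier head is not among the trainable parameters.

```python
num_layers, hidden = 12, 768  # bert-base-uncased
r = 32                        # rank from the LoraConfig above
targets_per_layer = 2         # assumed default targets for BERT: query and value

lora_params = num_layers * targets_per_layer * (hidden * r + r * hidden)
all_params = 110663426        # "all params" from the output above
base_params = all_params - lora_params
print(lora_params, base_params, f"{100 * lora_params / all_params:.2f}%")
```

The result, 1,179,648 LoRA parameters on top of 109,483,778 base parameters (1.07%), matches the output above exactly.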
