### Parameter-efficient fine-tuning with LoRA

<i>Low-rank adaptation (LoRA)</i> is one of the most widely used techniques for parameter-efficient fine-tuning. It adapts a pretrained model to better suit a specific, often smaller dataset by adjusting only a small subset of the model's weight parameters. The "low-rank" aspect refers to the mathematical concept of limiting model adjustments to a lower-dimensional subspace of the total weight parameter space, which captures the most influential directions of the weight changes during training. LoRA is popular because it enables efficient fine-tuning of large models on task-specific data, significantly cutting the computational cost and resources that fine-tuning usually requires.

Suppose a large weight matrix W is associated with a specific layer. (LoRA can be applied to all linear layers in an LLM, but we focus on a single layer for illustration.) During backpropagation, we learn a matrix ΔW, which specifies how much to update the original weights in order to minimise the loss function during training. (From now on, "weight" is shorthand for the model's weight parameters.) In regular training and fine-tuning, the weight update is defined as follows:

W<sub>updated</sub> = W + ΔW

The LoRA method offers a more efficient alternative to computing the full weight update ΔW directly: it learns an approximation of it instead:

ΔW ≈ AB

where A and B are two matrices much smaller than W, and AB denotes the matrix product of A and B. Using LoRA, we can reformulate the weight update defined earlier:

W<sub>updated</sub> = W + AB
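
To see why this factorisation saves work, it helps to count parameters. The sketch below assumes W has shape d_out × d_in, so that A has shape d_out × r and B has shape r × d_in for some small rank r. The function name `lora_param_counts` and the values 768 and 8 are illustrative assumptions, not taken from the model used in this chapter.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_out * d_in                # every entry of a full update matrix ΔW
    lora = d_out * rank + rank * d_in  # entries of A (d_out x rank) plus B (rank x d_in)
    return full, lora


# Hypothetical layer size and rank, chosen only for illustration
full, lora = lora_param_counts(d_out=768, d_in=768, rank=8)
print(f"Full update: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters ({lora / full:.2%} of full)")
```

Under these assumed sizes, the two LoRA matrices together hold about 2% as many trainable parameters as a full ΔW, and the saving grows as the layer gets larger relative to the rank.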

#### Preparing the dataset

In [2]:
import pandas as pd
train_df = pd.read_parquet("../Datasets/train.parquet")
valid_df = pd.read_parquet("../Datasets/valid.parquet")
test_df = pd.read_parquet("../Datasets/test.parquet")

In [3]:
import torch
from torch.utils.data import Dataset
from Chapter05 import tokeniser
from Chapter06 import SpamDataset

train_dataset = SpamDataset(
    "../Datasets/train.parquet",
    max_length=None,
    tokeniser=tokeniser
)
val_dataset = SpamDataset(
    "../Datasets/valid.parquet",
    max_length=None,
    tokeniser=tokeniser
)
test_dataset = SpamDataset(
    "../Datasets/test.parquet",
    max_length=None,
    tokeniser=tokeniser
)

In [4]:
from torch.utils.data import DataLoader

num_workers = 0
batch_size = 8
torch.manual_seed(42)

train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=num_workers,
    drop_last=True
)
val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    drop_last=False
)
test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    drop_last=False
)

In [5]:
print("Train loader:")
# Run through the loader once; the loop variables keep the last batch afterwards
for input_batch, target_batch in train_loader:
    pass

print("Input batch dimensions:", input_batch.shape)
print("Target batch dimensions:", target_batch.shape)

Train loader:
Input batch dimensions: torch.Size([8, 109])
Target batch dimensions: torch.Size([8])


In [6]:
print(f"{len(train_loader)} training batches")
print(f"{len(val_loader)} validation batches")
print(f"{len(test_loader)} test batches")

130 training batches
19 validation batches
38 test batches
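
The batch counts above follow from the dataset sizes, the batch size, and the `drop_last` setting: with `drop_last=True` a trailing partial batch is discarded, while with `drop_last=False` it is kept. A minimal sketch of that arithmetic is shown here. The helper `expected_num_batches` is a hypothetical name, and 1,045 is a made-up sample count used only for illustration, not the actual size of the training set.

```python
import math


def expected_num_batches(num_samples, batch_size, drop_last):
    """Number of batches a DataLoader yields for a map-style dataset."""
    if drop_last:
        # A trailing partial batch is discarded
        return num_samples // batch_size
    # A trailing partial batch is kept as a smaller final batch
    return math.ceil(num_samples / batch_size)


# Hypothetical sample count, for illustration only
print(expected_num_batches(1045, batch_size=8, drop_last=True))
print(expected_num_batches(1045, batch_size=8, drop_last=False))
```

Comparing `len(dataset)` against `len(loader)` in this way is a quick check that the loaders are configured as intended.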


#### Initialising the model