# Efficient Fine-Tuning with LoRA (Tutorial)

This notebook contains a brief example of applying LoRA to fine-tune DistilBERT for sequence classification on limited hardware resources (v5e-1 TPU in Colab). The model weights and data are loaded with Hugging Face libraries, including the Parameter-Efficient Fine-Tuning (PEFT) library used for fine-tuning with LoRA.

### What is LoRA?

Low-Rank Adaptation (LoRA) is a fine-tuning technique used for Large Language Models (LLMs).

LoRA seeks to reduce the enormous computational cost of fine-tuning, and to simplify the process of adapting a pre-trained LLM for various downstream tasks. This is achieved by:

1.   Freezing the pre-trained weights and attaching a pair of small low-rank matrices to some of them (originally, the attention weights)

2.   Training only these low-rank matrices during fine-tuning

3.   Adding their product back to the pre-trained weights for inference
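
The three steps above can be sketched with plain NumPy. This is only a toy illustration under assumed sizes and made-up variable names (`W`, `A`, `B`, `scale`), following the formulation in the LoRA paper where the update is the product `B @ A` scaled by `alpha / r`; it is not how PEFT implements it internally.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 4, 2, 4   # toy sizes; r is the LoRA rank

W = rng.normal(size=(d_out, d_in))   # pre-trained weight, kept frozen
A = rng.normal(size=(r, d_in))       # trainable, random init
B = np.zeros((d_out, r))             # trainable, zero init so the update starts at 0
scale = alpha / r

def forward(x, W, A, B):
    # frozen path plus the scaled low-rank path
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.normal(size=(3, d_in))

# Before any training the adapter contributes nothing, because B is zero.
assert np.allclose(forward(x, W, A, B), x @ W.T)

# Stand-in for step 2: pretend training has moved B away from zero.
B = rng.normal(size=(d_out, r))

# Step 3: fold the low-rank product into the frozen weight for inference.
W_merged = W + scale * (B @ A)
assert np.allclose(forward(x, W, A, B), x @ W_merged.T)
```

Because the merged weight has the same shape as the original one, inference after merging costs the same as inference with the base model.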

### Why use LoRA?

The pre-trained model is preserved. This is preferable to retraining the entire model during fine-tuning for a couple of reasons:
1. Improved performance on downstream tasks (the model does not "forget" the general knowledge acquired during pretraining)
2. Straightforward, memory-efficient fine-tuning on other tasks (the fine-tuned weights are low-rank, so they can be saved separately and added on to the pre-trained "base" model as necessary)

LoRA also has advantages over previous fine-tuning methods, such as:
1. Adapter layers, which are typically linear layers added to each transformer module. They increase model depth and therefore slow down inference.
2. Prefix tuning, in which prefix tokens are prepended to the prompt. A smaller share of the input is then directly relevant to the prompt, which can hurt performance. Moreover, the authors of the LoRA paper note that this method is "difficult to optimize" and that its performance with respect to model size is difficult to predict.

### Original LoRA Paper:

PDF: https://arxiv.org/pdf/2106.09685


# Imports

In [None]:
# useful huggingface packages for training transformers
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# huggingface packages for loading data
from datasets import load_from_disk

# finetuning-specific packages
from peft import LoraConfig, get_peft_model

# for subsampling data
import random

# for loading checkpoints
import os

# used for inference - scroll to bottom for demo!!
import torch

# Model

Due to computational constraints, this notebook uses DistilBERT, a smaller version of BERT trained via knowledge distillation. Edit the model name here to use a different model.

## About this model

DistilBERT has 66 million parameters, 6 layers, and 12 attention heads. It ignores case, like the "teacher" BERT model.

More about DistilBERT:

https://huggingface.co/distilbert/distilbert-base-uncased

https://huggingface.co/docs/transformers/en/model_doc/distilbert

In [None]:
name = "distilbert-base-uncased"

# Loading data

The data was tokenized and saved (in case of disconnected runtime) using this script:



```
from datasets import load_dataset
from transformers import AutoTokenizer

# the tokenizer must match the model that is fine-tuned below
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

amazon_reviews = load_dataset("amazon_polarity")

def tokenize_function(batch):
    return tokenizer(
        batch["content"],       # content col contains actual review text
        truncation=True,
        padding="max_length",   # keeping batch-length uniform
        max_length=128,         # reasonable for seq class.
    )

# using parallelization to speed up tokenization
tokenized_reviews = amazon_reviews.map(tokenize_function, batched=True, num_proc=8)

tokenized_reviews.save_to_disk("./tokenized_amazon")
```
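
To make the effect of `truncation=True`, `padding="max_length"` and `max_length=128` concrete, here is a small pure-Python sketch. The helper `pad_or_truncate`, the pad id of 0 and the token ids are made up for illustration; the real tokenizer also handles special tokens when truncating, which this toy version ignores.

```python
PAD_ID = 0  # assumed pad id for this toy example

def pad_or_truncate(ids, max_length=128, pad_id=PAD_ID):
    """Cut ids to max_length, then right-pad and build the attention mask."""
    ids = list(ids[:max_length])
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 0 marks padding
    }

short = pad_or_truncate([11, 12, 13], max_length=8)
long = pad_or_truncate(list(range(1, 21)), max_length=8)

print(short["input_ids"], short["attention_mask"])
print(long["input_ids"], long["attention_mask"])
```

Every example ends up with exactly `max_length` entries, which is why the saved dataset has uniform sequence lengths.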



In [None]:
# loading tokenized data
tokenized_reviews = load_from_disk("./tokenized_amazon")

In [None]:
# load the model-compatible tokenizer from HuggingFace
tokenizer = AutoTokenizer.from_pretrained(name)

# load the data collator (creates uniform batches) with the tokenizer
# (preserves compatible padding tokens, attention mask rules, etc)
collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Subsampling

Run this cell to train on less data, then edit the train/eval dataset names per the comments in the training script.

This script was not used to fine-tune the provided model.

In [None]:
# take random subset of tokenized data (full dataset takes ~ 10h/epoch on T4 GPU available in Colab)

ratio = 0.1

len_subsample_train = int(ratio * len(tokenized_reviews["train"]))
len_subsample_test = int(ratio * len(tokenized_reviews["test"]))

print("The new train set will have ", len_subsample_train, "samples")
print("The new test set will have ", len_subsample_test, "samples")

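# .select() keeps these as Dataset objects; slicing with [:n] would return a plain dict of columns
train = tokenized_reviews["train"].shuffle(seed=7).select(range(len_subsample_train))
eval = tokenized_reviews["test"].shuffle(seed=7).select(range(len_subsample_test))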

The new train set will have  360000 samples
The new test set will have  40000 samples


# Prepare model for LoRA Fine-Tuning

In [None]:
# loading the model

model = AutoModelForSequenceClassification.from_pretrained(name)

# The output below may say something like "some weights were not initialized"
# This indicates that the task-specific head was randomly initialized, as the model loaded
# from checkpoint was trained for a different task.
# This may lead to worse performance downstream, but it does not mean the entire model is trained from scratch

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["q_lin", "v_lin"]
)
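
As a rough back-of-the-envelope check of what this config implies, the sketch below counts the LoRA parameters it would add. It assumes a hidden size of 768 for DistilBERT (not stated in this notebook) and that `q_lin` and `v_lin` are square 768x768 projections in each of the 6 layers; the variable names are illustrative only.

```python
hidden = 768            # assumed DistilBERT hidden size
n_layers = 6            # from the model description above
targets_per_layer = 2   # q_lin and v_lin
r, lora_alpha = 8, 16   # values from lora_config

scaling = lora_alpha / r                      # factor applied to the low-rank update
params_per_module = r * (hidden + hidden)     # A is r x hidden, B is hidden x r
lora_params = n_layers * targets_per_layer * params_per_module
full_params = n_layers * targets_per_layer * hidden * hidden  # weights LoRA leaves frozen

print("scaling:", scaling)
print("LoRA parameters:", lora_params)
print("share of targeted weights:", round(100 * lora_params / full_params, 2), "%")
```

Under these assumptions the adapters hold roughly 2% as many values as the attention projections they modify.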

In [None]:
peft_model = get_peft_model(model, lora_config)

# Uncomment and run the code below to see how much the number of trainable params is reduced!!
# peft_model.print_trainable_parameters()

## Fine-tuning!!

In [None]:
checkpoints_dir = "./lora_results"

# check if checkpoints exist to determine if we need to load them, rather than re-fine-tuning
# checkpoints directory must exist and contain checkpoints
has_checkpoints = os.path.exists(checkpoints_dir) and len(os.listdir(checkpoints_dir))>0
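
The check above treats any file in the directory as a checkpoint. A stricter variant, sketched here as a hypothetical helper `latest_checkpoint` (not part of this notebook's original code), looks only for `checkpoint-<step>` subfolders, which is the naming the `Trainer` uses by default. The `transformers.trainer_utils` module ships a similar `get_last_checkpoint` helper that may be preferable in practice.

```python
import os
import re
import tempfile

def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    steps = []
    for entry in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", entry)
        if match and os.path.isdir(os.path.join(output_dir, entry)):
            steps.append(int(match.group(1)))
    if not steps:
        return None
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")

# small demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    found_empty = latest_checkpoint(tmp)            # nothing saved yet
    for step in (500, 1500, 1000):
        os.mkdir(os.path.join(tmp, f"checkpoint-{step}"))
    open(os.path.join(tmp, "notes.txt"), "w").close()  # unrelated file is ignored
    found_latest = os.path.basename(latest_checkpoint(tmp))

print(found_empty, found_latest)
```

The returned path (or `None`) could be passed as `resume_from_checkpoint`, so a directory that contains only unrelated files does not trigger a failed resume.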

In [None]:
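# NOTE: run this cell only after fine-tuning has finished (see the training cell below).
# merge_and_unload() folds the LoRA weights into the base model and removes the adapter
# layers, so running it before training would leave nothing to train.
finetuned_model = peft_model.merge_and_unload()

# save for later use + in case of disconnected runtime
finetuned_model.save_pretrained("merged_distilbert_amazon")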

In [None]:
# set model args
training_args = TrainingArguments(
    output_dir=checkpoints_dir,
    learning_rate=1e-3,  # increased from 2e-4 to 1e-3 due to loss plateau at about 0.2
    num_train_epochs=1,  # originally set to one because the model used is relatively small
    per_device_train_batch_size=4,  # chosen heuristically; you may want to change it (and adjust the LR accordingly)
)

# init trainer
trainer = Trainer(
    model=peft_model,  # fine-tuning the low-rank adapters only
    args=training_args,
    train_dataset=tokenized_reviews["train"],  # use `train` instead if you ran the subsampling cell
    eval_dataset=tokenized_reviews["test"],    # use `eval` instead if you ran the subsampling cell
    tokenizer=tokenizer,  # data is already tokenized; HF asks for this because its metadata is used during training
    data_collator=collator,
)

# train and resume from checkpoint if available
trainer.train(resume_from_checkpoint=has_checkpoints)


Step,Training Loss
73500,0.2273
74000,0.2381
74500,0.2324
75000,0.2415
75500,0.2614
76000,0.2228
76500,0.2315
77000,0.2379
77500,0.2246
78000,0.2596



KeyboardInterrupt: 
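
For a sense of scale, the step counts in the log above can be converted into a fraction of an epoch. This sketch assumes a single device, no gradient accumulation, and a full training set of 3,600,000 reviews (implied by the subsampling output, where 10% is 360000 samples).

```python
train_size = 3_600_000     # assumed full train split size
batch_size = 4             # per_device_train_batch_size from the training cell
last_logged_step = 78_000  # last step shown in the log before the interrupt

steps_per_epoch = train_size // batch_size
samples_seen = last_logged_step * batch_size
fraction = samples_seen / train_size

print("steps per epoch:", steps_per_epoch)
print("samples seen:", samples_seen)
print("fraction of one epoch:", round(fraction, 4))
```

Under these assumptions the interrupted run had seen under a tenth of the training data by the last logged step.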

# Saving Adapters

In [None]:
peft_model.save_pretrained("distilbert_amazon_adapters")

# Inference Example:

In [None]:
# Replace with whatever you want. Then run the cell below!
pos_prompt = "This is a very good laptop. It runs my model without crashing. Five stars"
neg_prompt = "This is a terrible laptop. It crashes constantly. Would give zero stars if I could"

# Failure case - the model cannot detect sarcasm - possibly due to untrained classifier head,
# short training, and/or small model size -> even smaller adapters
# Could also be a lack of domain-specific knowledge, since the fine-tuning dataset is not specific to tech products
sarcasm = "I love the way this laptop crashes constantly. I can run VS code for a whole second. Exactly what I was looking for"

In [None]:
# change pos_prompt to whatever prompt you want to use
# inputs are tokenized and moved to the model's device for inference
inputs = tokenizer(pos_prompt, return_tensors="pt").to(finetuned_model.device)

# get model output, logits, prediction, prob tensor
finetuned_model.eval()  # disable dropout so predictions are deterministic
with torch.no_grad():
  out = finetuned_model(**inputs)  # use the merged model the inputs were moved to
  logits = out.logits
  predicted_class = logits.argmax(dim=-1)
  probs = torch.softmax(logits, dim=-1)

# small function to make outputs more human-readable
def polarity(pred):
  return "Positive" if pred[0] else "Negative"

# print results!!
print("Review Polarity: ", polarity(predicted_class))
print("Probabilities:", probs)
print("Raw logits:", logits)

Review Polarity:  Positive
Probabilities: tensor([[0.0230, 0.9770]], device='cuda:0')
Raw logits: tensor([[-1.6168,  2.1302]], device='cuda:0')
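
The probabilities printed above follow directly from the raw logits. As a quick sanity check, here is a minimal pure-Python softmax (the helper name `softmax` is illustrative, not taken from the notebook) applied to the logged values.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.6168, 2.1302]   # copied from the output above
probs = softmax(logits)
predicted = probs.index(max(probs))  # index 1 corresponds to "Positive"

print([round(p, 4) for p in probs], predicted)
```

The gap between the two logits, not their absolute values, determines how confident the prediction is.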
