# LoRA Guide

This notebook walks you through LoRA (Low-Rank Adaptation) fine-tuning of a FLAN-T5 model on a bill-summarization dataset.

# First, let's make sure all the needed packages are installed and imported

In [2]:
%pip install --upgrade pip
%pip install --disable-pip-version-check \
    torch==1.13.1 \
    torchdata==0.5.1 --quiet

%pip install \
    transformers==4.27.2 \
    datasets==2.11.0 \
    evaluate==0.4.0 \
    rouge_score==0.1.2 \
    loralib==0.1.1 \
    peft==0.3.0 --quiet


In [3]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

## Next, we need to load in the data

The dataset we'll use, BillSum, is one of the many datasets available on the Hugging Face Hub. It contains three columns: text, summary, and title. For this notebook we will use only the first two: text (the full bill) and summary (its reference summary).

As the output below shows, the dataset has three splits: train, test, and ca_test. In this notebook we'll use only the train split.

In [5]:
from datasets import load_dataset

billsum = load_dataset("billsum")

Downloading and preparing dataset billsum/default to /root/.cache/huggingface/datasets/billsum/default/3.0.0/75cf1719d38d6553aa0e0714c393c74579b083ae6e164b2543684e3e92e0c4cc...


Dataset billsum downloaded and prepared to /root/.cache/huggingface/datasets/billsum/default/3.0.0/75cf1719d38d6553aa0e0714c393c74579b083ae6e164b2543684e3e92e0c4cc. Subsequent calls will reuse this data.



In [6]:
billsum

DatasetDict({
    train: Dataset({
        features: ['text', 'summary', 'title'],
        num_rows: 18949
    })
    test: Dataset({
        features: ['text', 'summary', 'title'],
        num_rows: 3269
    })
    ca_test: Dataset({
        features: ['text', 'summary', 'title'],
        num_rows: 1237
    })
})

In [8]:
billsum = billsum['train']

billsum

Dataset({
    features: ['text', 'summary', 'title'],
    num_rows: 18949
})

# Now let's load the model

For this guide, we will be using the base version of the pre-trained [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5).
<br><br>

Short overview of the model: FLAN-T5 is a variant of T5 (Text-to-Text Transfer Transformer) that has been instruction-tuned, i.e. fine-tuned on a large collection of tasks phrased as natural-language instructions (FLAN stands for "Fine-tuned LAnguage Net"). This makes it good at handling new tasks from a prompt with few or no examples. The "base" variant (about 250M parameters) is larger than "small" but much smaller than the "large", "XL", and "XXL" variants, which keeps it computationally cheap while retaining the capabilities of the T5 architecture.

In [9]:
model_name='google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)


# Analysis of Number of Parameters

Now let's take a look at how many parameters the model has. These are all of the parameters that we would need to update if we were doing full fine-tuning.

In [10]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

trainable model parameters: 247577856
all model parameters: 247577856
percentage of trainable model parameters: 100.00%


Now let's see how many parameters we would need to train if we used LoRA fine-tuning instead.
<br><br>
For reference, we are using the [peft](https://pypi.org/project/peft/) package, which implements several Parameter-Efficient Fine-Tuning (PEFT) methods. In this guide, we focus only on LoRA.
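For intuition, this is the update LoRA learns for each targeted weight matrix, following the formulation in the LoRA paper. The scaling factor $\alpha / r$ corresponds to lora_alpha / r in peft, and $B$ is initialized to zero so that $\Delta W = 0$ when training starts:

$$
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

$W_0$ stays frozen and only $A$ and $B$ are trained, so each adapted matrix costs $r(d + k)$ trainable parameters instead of $dk$. Assuming the $768 \times 768$ attention projections of FLAN-T5-base and $r = 32$:

$$
r(d + k) = 32 \times (768 + 768) = 49{,}152 \quad \text{vs.} \quad dk = 768 \times 768 = 589{,}824
$$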

In [11]:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32, # Rank of the low-rank update matrices; the LoRA paper reports that even very small ranks perform competitively
    lora_alpha=32, # Scaling factor: the LoRA update is multiplied by lora_alpha / r
    target_modules=["q", "v"], # Attention query and value projections; the LoRA paper found adapting W_q and W_v worked best for a fixed parameter budget
    lora_dropout=0.05, # Dropout applied to the input of the LoRA path
    bias="none", # Keep all bias terms frozen
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5 is a sequence-to-sequence transformer
)
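To make r and lora_alpha concrete, here is a minimal pure-Python sketch of what a LoRA-adapted linear layer computes. It is not peft's implementation: it omits dropout, and the small-Gaussian initialization of A is an assumption. The essential parts are the frozen W0, the zero-initialized B, and the lora_alpha / r scaling.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


class LoRALinear:
    """Toy LoRA layer computing y = W0 x + (alpha / r) * B (A x), with W0 frozen."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0):
        d, k = len(w0), len(w0[0])
        self.w0 = w0  # frozen pre-trained weight, shape (d, k); never updated
        rng = random.Random(seed)
        # A (r x k): small random values (assumed init); B (d x r): zeros,
        # so the product B A is zero and the layer starts out identical to W0.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d)]
        self.scaling = alpha / r  # corresponds to lora_alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)                   # frozen path
        update = matvec(self.b, matvec(self.a, x))  # trainable low-rank path
        return [h + self.scaling * u for h, u in zip(base, update)]

    def num_trainable_parameters(self) -> int:
        # Only A and B are trained: r * k + d * r = r * (d + k)
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], r=1, alpha=1.0)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0], same as W0 x because B is zero
print(layer.num_trainable_parameters())  # 4
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly; training can then change the output only through the rank-r product B A.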

In [12]:
peft_model = get_peft_model(original_model,
                            lora_config)
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 3538944
all model parameters: 251116800
percentage of trainable model parameters: 1.41%
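The 3,538,944 figure can be reproduced by hand. The sketch below assumes FLAN-T5-base's dimensions (hidden size 768, 12 heads of size 64, 12 encoder and 12 decoder layers). It also assumes that "q" and "v" match the query and value projections in every attention block: the encoder's self-attention, plus the decoder's self-attention and cross-attention.

```python
# Assumed FLAN-T5-base dimensions (not printed by the notebook itself)
D_MODEL = 768            # hidden size: input dimension of each q/v projection
INNER_DIM = 12 * 64      # 12 attention heads x 64 dims per head: output dimension
NUM_ENCODER_LAYERS = 12  # each has one self-attention block
NUM_DECODER_LAYERS = 12  # each has a self-attention and a cross-attention block
TARGETS_PER_BLOCK = 2    # target_modules=["q", "v"]
RANK = 32                # r in the LoraConfig above


def lora_params_per_matrix(r: int, d_in: int, d_out: int) -> int:
    """A is r x d_in and B is d_out x r, so one adapter adds r * (d_in + d_out) parameters."""
    return r * (d_in + d_out)


attention_blocks = NUM_ENCODER_LAYERS * 1 + NUM_DECODER_LAYERS * 2
n_adapted = attention_blocks * TARGETS_PER_BLOCK
lora_params = n_adapted * lora_params_per_matrix(RANK, D_MODEL, INNER_DIM)

base_params = 247_577_856  # "all model parameters" before wrapping with LoRA
total_params = base_params + lora_params
print(n_adapted, lora_params, total_params)  # 72 3538944 251116800
print(f"{100 * lora_params / total_params:.2f}%")  # 1.41%
```

These totals are consistent with the notebook's output, including the 251,116,800 parameters reported for the wrapped model.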


# Training a LoRA Model

With the transformers library, we can train the LoRA model just as we would train any other model: define the TrainingArguments, then pass them to a Trainer (see the documentation [here](https://huggingface.co/docs/transformers/main_classes/trainer)) to run the training.
- The training hyperparameters were chosen experimentally.

<br><br>
*Note on learning rate*: LoRA usually tolerates a higher learning rate than full fine-tuning. The pre-trained weights stay frozen and only the small low-rank matrices are updated, so the changes are confined to a low-rank subspace and are less likely to damage the model's pre-trained knowledge. This allows faster convergence, whereas full fine-tuning typically needs a lower learning rate to keep the weights from drifting too far from their pre-trained values.

In [13]:
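def tokenize_function(example):
    # BillSum contains legislative bills, so the prompt asks for a bill summary
    start_prompt = 'Summarize the following bill.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + text + end_prompt for text in example["text"]]
    # Pad/truncate every sequence to the tokenizer's maximum length
    inputs = tokenizer(prompt, padding="max_length", truncation=True)
    example['input_ids'] = inputs.input_ids
    example['attention_mask'] = inputs.attention_mask
    labels = tokenizer(example["summary"], padding="max_length", truncation=True).input_ids
    # Set padding positions in the labels to -100 so the loss ignores them
    example['labels'] = [[tok if tok != tokenizer.pad_token_id else -100 for tok in seq] for seq in labels]

    return example

tokenized_billsum = billsum.map(tokenize_function, batched=True)

# At this point we only need input_ids, attention_mask, and labels, so drop the raw text columns
tokenized_billsum = tokenized_billsum.remove_columns(['text', 'summary', 'title'])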


In [None]:
output_dir = f'./lora-summary-training-{str(int(time.time()))}'

# Initialize Necessary Arguments
lora_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True, # Lower the batch size automatically if training runs out of memory
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=1,
    logging_steps=1,
    max_steps=1 # Overrides num_train_epochs: runs a single step as a quick demo; remove for a real run
)

# Initialize Trainer with the LoRA-wrapped model (not the training arguments)
lora_trainer = Trainer(
    model=peft_model,
    args=lora_training_args,
    train_dataset=tokenized_billsum,
)
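Because max_steps=1 overrides num_train_epochs, the cell above performs only one optimizer step. As a rough sketch of what a full epoch would involve, the arithmetic below assumes a single device, no gradient accumulation, and the transformers default per-device batch size of 8 (auto_find_batch_size only lowers it after an out-of-memory error). The helper names are hypothetical.

```python
import math

NUM_TRAIN_EXAMPLES = 18_949  # rows in the BillSum train split (from the output above)


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """One optimizer step per batch; the last, partial batch still counts as a step."""
    return math.ceil(num_examples / batch_size)


def effective_steps(num_examples: int, batch_size: int, num_epochs: int, max_steps: int = -1) -> int:
    """A positive max_steps takes precedence over num_train_epochs."""
    if max_steps > 0:
        return max_steps
    return steps_per_epoch(num_examples, batch_size) * num_epochs


print(steps_per_epoch(NUM_TRAIN_EXAMPLES, batch_size=8))  # 2369
print(effective_steps(NUM_TRAIN_EXAMPLES, 8, num_epochs=1, max_steps=1))  # 1
```

With these assumptions, one real epoch is about 2,369 steps, which is why removing max_steps makes the training cell take much longer.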

The next cell may take some time, so please be patient and ignore any warning messages that appear.

In [None]:
# Perform Training
lora_trainer.train()

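Now that we have trained the model, let's save it so we can use it later. Calling save_pretrained on the PEFT model saves only the LoRA adapter weights and their config, not a full copy of the base model: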

In [None]:
# Initialize Path for Final Model
lora_model_path="./lora-summary-checkpoint-local"

# Save Trained Model & Tokenizer
lora_trainer.model.save_pretrained(lora_model_path)
tokenizer.save_pretrained(lora_model_path)

# Evaluation

There are a few different approaches to evaluating a model:
1. Manual qualitative evaluation: run the model on a couple of different prompts, and judge how much better or worse it does
2. Compute metrics such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [ROUGE was designed for summarization, so it fits this task, but it is not a useful metric for every natural-language task]
3. Use another model, such as GPT-4, to rate the outputs [make sure the evaluation prompt passed to that model is well crafted]

Since this is meant as a short guide, we leave evaluation as an exercise to do at home. You can compare the base model's outputs before and after LoRA training, or train the model with full fine-tuning and see how LoRA performs in comparison.
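As a starting point for option 2, here is a simplified ROUGE-1 F1 score in plain Python. It is only a sketch: the rouge_score package installed at the top of the notebook uses its own tokenizer and optional stemming, so its numbers can differ. In practice you would call evaluate.load("rouge") and its compute(predictions=..., references=...) method instead.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased, whitespace-split tokens."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each unigram counts at most min(count in candidate, count in reference)
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)  # fraction of generated words found in the reference
    recall = overlap / len(ref_tokens)      # fraction of reference words recovered
    return 2 * precision * recall / (precision + recall)


print(round(rouge1_f1("the cat sat", "the cat sat on the mat"), 4))  # 0.6667
```

Here precision is 1.0 (every generated word appears in the reference) but recall is only 0.5, so the F1 score of about 0.67 penalizes the summary for leaving out half of the reference.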

Thank you for following along with this guide; I hope it was helpful and enjoyable!