# Lightweight Fine-Tuning Project

**PEFT technique:**

I chose LoRA (Low-Rank Adaptation). LoRA adapts a large pre-trained model while adding very few trainable parameters: the original weights stay frozen, and a pair of small low-rank matrices is trained alongside selected weight matrices. The update is therefore targeted, and the full model is never retrained. This was especially useful given the limited GPU memory available.
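To make the low-rank idea concrete, here is a minimal NumPy sketch of a single adapted projection. It is an illustration only, not the PEFT implementation: the sizes mirror one GPT-2 attention projection, the names `W`, `A`, `B`, and `lora_forward` are hypothetical, and the zero initialisation of one factor is the usual LoRA convention, assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): one 768 -> 2304 projection, rank 8, alpha 32
d_in, d_out, r, alpha = 768, 2304, 8, 32

W = rng.normal(size=(d_in, d_out))            # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(d_in, r))    # trainable down-projection
B = np.zeros((r, d_out))                      # trainable up-projection, zero at start


def lora_forward(x):
    """Frozen projection plus the scaled low-rank correction."""
    return x @ W + (alpha / r) * (x @ A @ B)


x = rng.normal(size=(2, d_in))

# Because B starts at zero, the adapted layer initially equals the frozen layer.
starts_unchanged = np.allclose(lora_forward(x), x @ W)

full_params = W.size            # parameters in the frozen matrix
lora_params = A.size + B.size   # parameters that would actually be trained

print(starts_unchanged, full_params, lora_params)
```

In this sketch only `A` and `B` would receive gradients, which is why the trainable parameter count is a small fraction of the frozen matrix.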


**Model:**

The model selected is GPT-2 (124M), the smallest variant of the GPT-2 family, with about 124 million parameters. It is a good choice for experiments where computational resources are constrained, and it still retains strong language modeling capabilities.
    
**Evaluation approach:**

The evaluation metrics I have chosen are accuracy, precision, recall, and F1 score. Together they assess the model's performance from several angles: overall correctness (accuracy), how many predicted positives are actually positive (precision), how many actual positives are found (recall), and the harmonic mean of precision and recall (F1 score).
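As a small worked example of these four definitions, the sketch below computes them from made-up confusion counts for the positive class. The counts are hypothetical and are not results from this notebook; the `compute_metrics` function defined later additionally macro-averages precision and recall over both classes.

```python
# Hypothetical confusion counts for the positive class (not notebook results)
tp, fp, fn, tn = 40, 10, 20, 30

accuracy = (tp + tn) / (tp + fp + fn + tn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # share of predicted positives that are right
recall = tp / (tp + fn)                      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```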
    
**Fine-tuning dataset:**

The IMDb dataset is selected for fine-tuning. This dataset, which contains movie reviews labeled as positive or negative, is widely used for sentiment analysis tasks. It is an appropriate choice for fine-tuning a language model like GPT-2 with a classification head, since the aim here is to predict the sentiment of a movie review.

## Loading and Evaluating a Foundation Model

In [49]:
import transformers
import torch
import torch.nn.functional as F
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments, DataCollatorWithPadding

In [50]:
print(transformers.__version__) # Current version

4.37.1


In [51]:
tokenizer = AutoTokenizer.from_pretrained(
    "gpt2"
)

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1}
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [52]:
ds = load_dataset("imdb")

In [53]:
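## tokenize dataset
splits = ["train", "test"]

# GPT-2 has no padding token by default, so reuse the end-of-sequence token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

tokenized_ds = {}

for split in splits:
    tokenized_ds[split] = ds[split].map(
        lambda x: tokenizer(
            x["text"],
            truncation=True,  # cut reviews longer than max_length
            padding="max_length",  # pad every review to max_length
            max_length=1024,  # GPT-2's maximum context length
        ),
        batched=True,
    )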

In [54]:
## function to calculate metrics only using numpy
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    logits = np.array(logits)
    labels = np.array(labels)
    predictions = np.argmax(logits, axis=-1)

    # confusion matrix: cm[i, j] counts samples with true label i predicted as j
    unique_labels = np.unique(labels)
    cm = np.zeros((len(unique_labels), len(unique_labels)), dtype=int)
    for i, label in enumerate(unique_labels):
        for j, pred in enumerate(unique_labels):
            cm[i, j] = np.sum((labels == label) & (predictions == pred))

    # eps guards against division by zero when a class is never predicted
    eps = 1e-9
    tp = np.diag(cm) + eps
    fp = np.sum(cm, axis=0) - tp + eps
    fn = np.sum(cm, axis=1) - tp + eps

    # precision and recall are macro-averaged over the classes
    precision = np.mean(tp / (tp + fp))
    recall = np.mean(tp / (tp + fn))
    accuracy = np.mean(labels == predictions)
    # F1 is derived from the macro-averaged precision and recall,
    # not from the mean of per-class F1 scores
    f1 = 2 * (precision * recall) / (precision + recall)

    metrics = {
        'accuracy': accuracy,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }
    return metrics


In [55]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device) # Check if GPU is available

cuda


In [56]:
def print_cuda_memory_usage():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    if device.type == 'cuda':
        print(f"Used memory: {torch.cuda.memory_allocated(device)} bytes")
    else:
        print("CUDA is not available")

print_cuda_memory_usage() # Check memory usage

Used memory: 756926976 bytes


In [57]:
eval_ds_debug = tokenized_ds["test"].train_test_split(test_size=0.01) # Split off a small debug set
train_ds_debug = tokenized_ds["train"].train_test_split(test_size=0.01) # Split off a small debug set
model.config.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token) # Set pad token id for model

In [10]:
# using trainer class for evaluation of full test set
training_args_eval = TrainingArguments(
    output_dir="./outputs",
    do_train = False,
    do_eval = True,
    per_device_eval_batch_size=16,
)

eval_trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    args=training_args_eval,
    eval_dataset=tokenized_ds["test"],
    compute_metrics=compute_metrics
)

In [12]:
results = eval_trainer.evaluate() # Evaluate model on test set

1615it [06:38,  4.05it/s]                          


In [13]:
results # Print results

{'eval_loss': 1.502259373664856,
 'eval_accuracy': 0.4998,
 'eval_f1': 0.4711475401793022,
 'eval_precision': 0.44560212788392706,
 'eval_recall': 0.49980000000004005,
 'eval_runtime': 383.8466,
 'eval_samples_per_second': 65.13,
 'eval_steps_per_second': 4.072}

In [14]:
print_cuda_memory_usage() # Check memory usage

Used memory: 519954944 bytes


## Performing Parameter-Efficient Fine-Tuning

In [58]:
# import peft module
import gc
from peft import LoraConfig
from peft import get_peft_model

In [59]:
# function to get model parameters with peft
def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )

In [60]:
from peft import TaskType
# set config for peft
config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
)

In [61]:
# get model with peft
model = get_peft_model(
    model,
    config,
)

print_trainable_parameters(model) # Print trainable parameters

trainable params: 296448 || all params: 124737792 || trainable%: 0.24
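The trainable count above can be reproduced with a little arithmetic. The sketch below assumes that PEFT's default LoRA target for GPT-2 is the fused attention projection `c_attn` (768 to 2304) in each of the 12 blocks, and that the bias-free classification head `score` stays trainable for the sequence-classification task type. Both are assumptions about library defaults, although the numbers they produce agree with the printed output.

```python
n_layers = 12           # transformer blocks in GPT-2 (124M)
d_model = 768           # hidden size
d_qkv = 3 * d_model     # output width of the fused c_attn projection (assumed target)
r = 8                   # LoRA rank from LoraConfig
num_labels = 2

lora_per_layer = d_model * r + r * d_qkv   # down-projection + up-projection
lora_total = n_layers * lora_per_layer
score_head = d_model * num_labels          # classification head, no bias (assumed trainable)

trainable = lora_total + score_head
all_params = 124_737_792                   # total reported in the output above

print(trainable)                                  # -> 296448
print(f"{100 * trainable / all_params:.2f}")      # -> 0.24
```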




In [62]:
# clear memory if needed
gc.collect()
torch.cuda.empty_cache()

In [63]:
print_cuda_memory_usage() # Check memory usage

Used memory: 756645888 bytes


In [65]:
## set up trainer for training using peft model

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./training/data", # The output directory
        learning_rate=2e-5, # learning rate
        per_device_train_batch_size=4, # batch size for training
        per_device_eval_batch_size=2, # batch size for evaluation
        evaluation_strategy="epoch", # evaluate after each epoch
        save_strategy="epoch", # save after each epoch
        num_train_epochs=3, # number of training epochs
        weight_decay=0.01, # strength of weight decay
        load_best_model_at_end=True, # load the best model when finished training (default metric is loss)
        label_names=["labels"] # name of the labels
    ),
    train_dataset=tokenized_ds["train"], # training dataset
    eval_dataset=eval_ds_debug["test"],# used smaller debug set for evaluation during training
    tokenizer=tokenizer, # tokenizer
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer), # data collator
    compute_metrics=compute_metrics # metrics function
)

In [66]:
trainer.train() # train model

{'loss': 0.8515, 'learning_rate': 1.9466666666666668e-05, 'epoch': 0.08}
{'loss': 0.6485, 'learning_rate': 1.8933333333333334e-05, 'epoch': 0.16}
{'loss': 0.4161, 'learning_rate': 1.8400000000000003e-05, 'epoch': 0.24}
{'loss': 0.3825, 'learning_rate': 1.7866666666666666e-05, 'epoch': 0.32}
{'loss': 0.4554, 'learning_rate': 1.7333333333333336e-05, 'epoch': 0.4}
{'loss': 0.4202, 'learning_rate': 1.6800000000000002e-05, 'epoch': 0.48}
{'loss': 0.4648, 'learning_rate': 1.6266666666666668e-05, 'epoch': 0.56}
{'loss': 0.426, 'learning_rate': 1.5733333333333334e-05, 'epoch': 0.64}
{'loss': 0.4546, 'learning_rate': 1.5200000000000002e-05, 'epoch': 0.72}
{'loss': 0.4056, 'learning_rate': 1.4666666666666666e-05, 'epoch': 0.8}
{'loss': 0.4062, 'learning_rate': 1.4133333333333334e-05, 'epoch': 0.88}
{'loss': 0.3714, 'learning_rate': 1.3600000000000002e-05, 'epoch': 0.96}

Checkpoint destination directory ./training/data\checkpoint-6250 already exists and is non-empty.Saving will proceed but saved results may be invalid.

{'eval_loss': 0.5142213702201843, 'eval_accuracy': 0.896, 'eval_f1': 0.8995856237093031, 'eval_precision': 0.9012337704290222, 'eval_recall': 0.8979434941387991, 'eval_runtime': 3.6837, 'eval_samples_per_second': 67.867, 'eval_steps_per_second': 33.933, 'epoch': 1.0}

{'loss': 0.4467, 'learning_rate': 1.3066666666666668e-05, 'epoch': 1.04}
{'loss': 0.3904, 'learning_rate': 1.2533333333333336e-05, 'epoch': 1.12}
{'loss': 0.3884, 'learning_rate': 1.2e-05, 'epoch': 1.2}
{'loss': 0.4354, 'learning_rate': 1.1466666666666668e-05, 'epoch': 1.28}
{'loss': 0.3874, 'learning_rate': 1.0933333333333334e-05, 'epoch': 1.36}
{'loss': 0.395, 'learning_rate': 1.04e-05, 'epoch': 1.44}
{'loss': 0.3901, 'learning_rate': 9.866666666666668e-06, 'epoch': 1.52}
{'loss': 0.3512, 'learning_rate': 9.333333333333334e-06, 'epoch': 1.6}
{'loss': 0.3643, 'learning_rate': 8.8e-06, 'epoch': 1.68}
{'loss': 0.4414, 'learning_rate': 8.266666666666667e-06, 'epoch': 1.76}
{'loss': 0.3772, 'learning_rate': 7.733333333333334e-06, 'epoch': 1.84}
{'loss': 0.3788, 'learning_rate': 7.2000000000000005e-06, 'epoch': 1.92}
{'loss': 0.3611, 'learning_rate': 6.666666666666667e-06, 'epoch': 2.0}

Checkpoint destination directory ./training/data\checkpoint-12500 already exists and is non-empty.Saving will proceed but saved results may be invalid.

{'eval_loss': 0.4192880392074585, 'eval_accuracy': 0.908, 'eval_f1': 0.9083501943229663, 'eval_precision': 0.9081541218645328, 'eval_recall': 0.908546351464627, 'eval_runtime': 3.5875, 'eval_samples_per_second': 69.687, 'eval_steps_per_second': 34.843, 'epoch': 2.0}

{'loss': 0.3586, 'learning_rate': 6.133333333333334e-06, 'epoch': 2.08}
{'loss': 0.3802, 'learning_rate': 5.600000000000001e-06, 'epoch': 2.16}
{'loss': 0.3954, 'learning_rate': 5.0666666666666676e-06, 'epoch': 2.24}
{'loss': 0.3475, 'learning_rate': 4.533333333333334e-06, 'epoch': 2.32}
{'loss': 0.3907, 'learning_rate': 4.000000000000001e-06, 'epoch': 2.4}
{'loss': 0.3561, 'learning_rate': 3.4666666666666672e-06, 'epoch': 2.48}
{'loss': 0.3567, 'learning_rate': 2.9333333333333338e-06, 'epoch': 2.56}
{'loss': 0.419, 'learning_rate': 2.4000000000000003e-06, 'epoch': 2.64}
{'loss': 0.3543, 'learning_rate': 1.8666666666666669e-06, 'epoch': 2.72}
{'loss': 0.3718, 'learning_rate': 1.3333333333333334e-06, 'epoch': 2.8}
{'loss': 0.4005, 'learning_rate': 8.000000000000001e-07, 'epoch': 2.88}
{'loss': 0.3499, 'learning_rate': 2.666666666666667e-07, 'epoch': 2.96}

Checkpoint destination directory ./training/data\checkpoint-18750 already exists and is non-empty.Saving will proceed but saved results may be invalid.
100%|██████████| 18750/18750 [49:56<00:00,  6.26it/s]

{'eval_loss': 0.41768962144851685, 'eval_accuracy': 0.912, 'eval_f1': 0.9122111113490372, 'eval_precision': 0.912000000000704, 'eval_recall': 0.9124223204568451, 'eval_runtime': 3.5688, 'eval_samples_per_second': 70.051, 'eval_steps_per_second': 35.025, 'epoch': 3.0}
{'train_runtime': 2996.9875, 'train_samples_per_second': 25.025, 'train_steps_per_second': 6.256, 'train_loss': 0.41210360310872396, 'epoch': 3.0}





TrainOutput(global_step=18750, training_loss=0.41210360310872396, metrics={'train_runtime': 2996.9875, 'train_samples_per_second': 25.025, 'train_steps_per_second': 6.256, 'train_loss': 0.41210360310872396, 'epoch': 3.0})
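The step count and the logged learning rates follow from the training arguments. The sketch below assumes a single device, no gradient accumulation, no warmup, and the Trainer's default linear decay schedule; the helper name `linear_lr` is hypothetical. Under those assumptions it reproduces the values in the training log.

```python
train_examples = 25_000   # size of the IMDb training split
batch_size = 4            # per_device_train_batch_size
epochs = 3                # num_train_epochs
base_lr = 2e-5            # learning_rate

steps_per_epoch = train_examples // batch_size
total_steps = steps_per_epoch * epochs


def linear_lr(step):
    """Learning rate after `step` optimizer steps, decaying linearly to zero."""
    return base_lr * (1 - step / total_steps)


print(steps_per_epoch, total_steps)      # -> 6250 18750
print(linear_lr(500), linear_lr(12500))  # compare with the logged values at epochs 0.08 and 2.0
```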

In [67]:
result = trainer.evaluate() # evaluate model on debug test set

100%|██████████| 125/125 [00:03<00:00, 34.43it/s]


In [68]:
result # print results

{'eval_loss': 0.41768962144851685,
 'eval_accuracy': 0.912,
 'eval_f1': 0.9122111113490372,
 'eval_precision': 0.912000000000704,
 'eval_recall': 0.9124223204568451,
 'eval_runtime': 3.8528,
 'eval_samples_per_second': 64.888,
 'eval_steps_per_second': 32.444,
 'epoch': 3.0}

In [69]:
# save the trained model
model.save_pretrained("./models/gpt2-lora")
trainer.save_model("./models/gpt2-lora-v2")

In [71]:
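# Fold the LoRA weights into the base weights and drop the adapter wrappers.
# Optional here: the inference section below reloads the saved adapter from disk.
# Note that saving a merged model stores the full weights, not just the small adapter.
model = model.merge_and_unload()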

## Performing Inference with a PEFT Model

In [72]:
# import needed modules for inference
from peft import PeftConfig, PeftModel

In [73]:
# load pretrained model with peft
peft_model_path = "./models/gpt2-lora-v2"

config = PeftConfig.from_pretrained(peft_model_path) # load config

# load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
    id2label={0: "NEGATIVE", 1: "POSITIVE"}
)

# attach the saved LoRA adapter to the base model (the weights are not merged)
inf_model = PeftModel.from_pretrained(
    base_model,
    peft_model_path
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [77]:
inf_model.config.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token) # set pad token id for model
inf_model.eval() # set model to evaluation mode


## use trainer class for evaluation of full test set
training_args_eval = TrainingArguments(
    output_dir="./outputs",
    do_train = False,
    do_eval = True,
    per_device_eval_batch_size=16,
    label_names=["labels"]
)

eval_trainer = Trainer(
    model=inf_model,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    args=training_args_eval,
    eval_dataset=tokenized_ds["test"],
    compute_metrics=compute_metrics
)

In [78]:
results = eval_trainer.evaluate() # evaluate model on full test set

100%|██████████| 1563/1563 [06:48<00:00,  3.83it/s]


In [79]:
results # print results

{'eval_loss': 0.37846362590789795,
 'eval_accuracy': 0.9214,
 'eval_f1': 0.9216021099276379,
 'eval_precision': 0.9218043085407156,
 'eval_recall': 0.9214000000000062,
 'eval_runtime': 408.832,
 'eval_samples_per_second': 61.15,
 'eval_steps_per_second': 3.823}

In [83]:
## example inference and label prediction
# set up input for positive and negative example
inputs_1 = tokenizer(
    "I love this movie!",
    truncation=True,
    padding="max_length",
    return_tensors="pt",
    max_length=1024
)

inputs_2 = tokenizer(
    "I hate this movie!",
    truncation=True,
    padding="max_length",
    return_tensors="pt",
    max_length=1024
)

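# move the model and both inputs to the same device
inf_model.to(device)
inputs_1 = inputs_1.to(device)
inputs_2 = inputs_2.to(device)

# inference; no gradients are needed, so skip tracking them
with torch.no_grad():
    outputs_1 = inf_model(**inputs_1)
    outputs_2 = inf_model(**inputs_2)
logits_1 = outputs_1.logits
logits_2 = outputs_2.logits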
probabilities_1 = F.softmax(logits_1, dim=-1)
probabilities_2 = F.softmax(logits_2, dim=-1)
predicted_class_idx_1 = torch.argmax(probabilities_1, dim=-1)
predicted_class_idx_2 = torch.argmax(probabilities_2, dim=-1)

predicted_class_label_1 = inf_model.config.id2label[predicted_class_idx_1.item()]
predicted_class_label_2 = inf_model.config.id2label[predicted_class_idx_2.item()]

# print results
print(f"predicted label 1: {predicted_class_label_1}")
print(f"predicted label 2: {predicted_class_label_2}")

predicted label 1: POSITIVE
predicted label 2: NEGATIVE
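The last step of the inference cell, turning logits into a label, can be illustrated in plain Python. The logits below are hypothetical and are not outputs of the fine-tuned model. Since softmax preserves the ordering of its inputs, taking the argmax of the probabilities yields the same label as taking the argmax of the raw logits.

```python
import math

id2label = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]  # hypothetical logits for one review
probs = softmax(logits)

label_from_probs = id2label[probs.index(max(probs))]
label_from_logits = id2label[logits.index(max(logits))]

print(label_from_probs, label_from_logits)
```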
