<a href="https://colab.research.google.com/github/desdesmond/GenAICourse/blob/main/Fine_tune_sentiment_into_emotions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Udacity GenAI Nanodegree**
# Project exercise: Classic fine-tuning of a pretrained model vs. fine-tuning with a LoRA adapter


Prepare HF Dataset for fine tuning

In [2]:
# Load HF Dataset

!pip install datasets
from datasets import load_dataset
dataset = load_dataset("dair-ai/emotion")
dataset = dataset["train"].train_test_split(test_size=0.2) # 20% for test, remaining for training
splits = ["train", "test"]


Generating train split:   0%|          | 0/16000 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/2000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/2000 [00:00<?, ? examples/s]

In [None]:
dataset["test"][0]

{'text': 'i dont i feel amazed', 'label': 5}

In [3]:
# Get number of labels from the dataset
# Attribution: https://github.com/achimoraites/machine-learning-playground/blob/main/NLP/Text%20classification/Lightweight_RoBERTa_PEFT_LORA_FineTuning.ipynb
num_labels = dataset['train'].features['label'].num_classes
class_names = dataset["train"].features["label"].names
print(f"number of labels: {num_labels}")
print(f"the labels: {class_names}")

number of labels: 6
the labels: ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
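
The `id2label` and `label2id` dictionaries passed to the model are derived from this list. A minimal sketch of that mapping, with the label names hard-coded from the output above (an assumption made only so the snippet runs without downloading the dataset):

```python
# Label names copied from the printed output above (hard-coded here so the
# snippet runs without downloading the dataset).
class_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Index -> name and name -> index, in the dataset's own order.
id2label = {i: label for i, label in enumerate(class_names)}
label2id = {label: i for i, label in enumerate(class_names)}

# The sample shown earlier, 'i dont i feel amazed', carries label 5.
print(id2label[5])       # surprise
print(label2id["love"])  # 2
```

Any hand-written mapping used later for inference has to follow this same order, otherwise predictions will be reported under the wrong names.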


In [10]:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "mnoukhov/gpt2-imdb-sentiment-classifier"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=num_labels,
    id2label={i: label for i, label in enumerate(class_names)},
    label2id={label: i for i, label in enumerate(class_names)},
    ignore_mismatched_sizes=True,  # the checkpoint has 2 output classes; the classification head is re-initialized with 6
    )



Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at mnoukhov/gpt2-imdb-sentiment-classifier and are newly initialized because the shapes did not match:
- score.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [11]:
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True
    )

Map:   0%|          | 0/12800 [00:00<?, ? examples/s]

Map:   0%|          | 0/3200 [00:00<?, ? examples/s]

In [None]:
# Let's see what the dataset looks like.

tokenized_dataset["train"]

Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 12800
})
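
The rows were tokenized with truncation but without padding, so `input_ids` differ in length from row to row; the `DataCollatorWithPadding` used in the trainers pads each batch to its longest sequence. The sketch below is a hand-written illustration of that idea, not the library's implementation. The helper `pad_batch` and the pad id `0` are hypothetical.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence in `batch` to the length of the longest one.

    Hypothetical helper for illustration only; `pad_id=0` is an assumed
    placeholder, not the tokenizer's real padding id.
    """
    max_len = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (max_len - len(ids)) for ids in batch]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


padded = pad_batch([[11, 22, 33], [44]])
print(padded["input_ids"])       # [[11, 22, 33], [44, 0, 0]]
print(padded["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```

Padding per batch rather than to a global maximum keeps short batches short, which is why no `padding=` argument is needed at tokenization time.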

In [None]:
# Unfreeze all parameters. Skipped: for some reason this checkpoint does not allow its
# parameters to be frozen, so there is nothing to unfreeze.

# for param in model.parameters():
#     param.requires_grad = True
# print(model)

In [12]:
# Set up evaluation metrics

import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# Accuracy metric. Attributed to the GenAI course from Udacity
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}
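
To make the metric concrete, here is the same function applied to a toy batch. The logits and labels are made-up values chosen so that three of four predictions are correct.

```python
import numpy as np


def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# Made-up logits for 4 examples and 3 classes (illustrative values only).
toy_logits = np.array([
    [2.0, 0.1, 0.3],   # argmax -> 0
    [0.2, 1.5, 0.1],   # argmax -> 1
    [0.3, 0.2, 0.9],   # argmax -> 2
    [1.2, 0.4, 0.1],   # argmax -> 0
])
toy_labels = np.array([0, 1, 2, 1])  # the last example is misclassified

result = compute_metrics((toy_logits, toy_labels))
print(result)
```

The accuracy here is 0.75: the argmax over each row gives the predicted class, and the mean of the element-wise comparison is the fraction of correct predictions.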


## Here, it's a classic training setup.

In [None]:
# Trainer setup. Attributed to GenAI course from Udacity
trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./data/trained_with_six_emotions",
        # Set the learning rate
        learning_rate=2e-5,
        # Set the per device train batch size and eval batch size
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        # Evaluate and save the model after each epoch
        evaluation_strategy="epoch",
        save_strategy="epoch",
        # Set the number of epochs and the weight decay
        num_train_epochs=5,
        weight_decay=0.01,
        report_to="none",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)



In [None]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.7913,0.225281,0.920312
2,0.1871,0.170745,0.93125
3,0.1476,0.165168,0.938125
4,0.1161,0.151894,0.932187
5,0.1053,0.154245,0.932187


TrainOutput(global_step=4000, training_loss=0.2333261079788208, metrics={'train_runtime': 853.7603, 'train_samples_per_second': 74.962, 'train_steps_per_second': 4.685, 'total_flos': 1452919564763136.0, 'train_loss': 0.2333261079788208, 'epoch': 5.0})
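
The `global_step=4000` figure follows directly from the dataset size and the training arguments. A quick check, assuming a single device and no gradient accumulation (neither is set in the arguments above):

```python
import math

train_rows = 12800   # num_rows of the tokenized training split
batch_size = 16      # per_device_train_batch_size
epochs = 5           # num_train_epochs

# Assumes a single device and no gradient accumulation.
steps_per_epoch = math.ceil(train_rows / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 800 4000

# Throughput implied by the reported runtime (seconds).
train_runtime = 853.7603
samples_per_second = train_rows * epochs / train_runtime
print(round(samples_per_second, 1))
```

The implied throughput agrees with the reported `train_samples_per_second` of roughly 75.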

In [24]:
# Save and share my checkpoint on HF
from huggingface_hub import notebook_login
notebook_login()


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [None]:
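# Note: the first positional argument of Trainer.push_to_hub is the commit message,
# not the repo name. The repo is named after output_dir, as the commit URL below shows.
trainer.push_to_hub("desdesmond/emotion-classifer")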

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

model.safetensors:   0%|          | 0.00/498M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/desdesmond/trained_with_six_emotions/commit/e9c1a478c9e605d029b7d4dbb1d50dda79a8d1db', commit_message='desdesmond/emotion-classifer', commit_description='', oid='e9c1a478c9e605d029b7d4dbb1d50dda79a8d1db', pr_url=None, pr_revision=None, pr_num=None)

In [None]:
# Inference via HF's pipeline method

from transformers import pipeline

pipe = pipeline("text-classification", model="desdesmond/trained_with_six_emotions")

pipe(" I've been waiting for a HuggingFace course my whole life.")

[{'label': 'joy', 'score': 0.541525661945343}]

## Here, I am training the same pretrained model as before, but with a LoRA adapter.

In [14]:
from peft import LoraConfig, get_peft_model

# Load the same base checkpoint again and see how it trains with LoRA
lora_checkpoint_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    # "desdesmond/trained_with_six_emotions",
    num_labels=num_labels,
    id2label={i: label for i, label in enumerate(class_names)},
    label2id={label: i for i, label in enumerate(class_names)},
    ignore_mismatched_sizes=True,
    )

print(lora_checkpoint_model)


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at mnoukhov/gpt2-imdb-sentiment-classifier and are newly initialized because the shapes did not match:
- score.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=6, bias=False)
)


In [20]:
lora_config = LoraConfig(
    r=4,
    # A bare name matches every module called "c_proj", i.e. the attention and MLP
    # projections in all 12 blocks, not only the last one.
    target_modules=["c_proj"],
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    # "TOKEN_CLS" is what this run used after some experimentation. The model is a
    # sequence classifier, for which PEFT's usual task type is "SEQ_CLS".
    task_type="TOKEN_CLS",
)

lora_model = get_peft_model(lora_checkpoint_model, lora_config)
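
For intuition about what the adapter adds, the sketch below writes out the LoRA forward pass for a single weight matrix in NumPy. It is an illustration under assumptions, not PEFT's implementation: the matrices are random stand-ins, the 768 hidden size is taken from the model printout above, and `B` starts at zero as in the usual LoRA initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 768, 768     # GPT-2 hidden size, as in the model printout
r, lora_alpha = 4, 8       # values from lora_config
scaling = lora_alpha / r   # 2.0

W = rng.normal(size=(d_out, d_in))   # stand-in for a frozen pretrained weight
A = rng.normal(size=(r, d_in))       # trainable, random init
B = np.zeros((d_out, r))             # trainable, zero init so the update starts at 0

x = rng.normal(size=(d_in,))

# Forward pass: frozen path plus the scaled low-rank path.
y = W @ x + scaling * (B @ (A @ x))

# With B = 0 the adapter leaves the output unchanged.
print(np.allclose(y, W @ x))     # True

# Trainable parameters for this one matrix vs. the full matrix.
print(A.size + B.size, W.size)   # 6144 589824
```

Only `A` and `B` receive gradients, so this matrix contributes 6,144 trainable values instead of 589,824.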

In [16]:
# Attr. pacman100 https://github.com/huggingface/peft/issues/41#issuecomment-1404611868

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [17]:
# Let's check how many params are trainable with this lora_config
print_trainable_parameters(lora_checkpoint_model)

trainable params: 258048 || all params: 124702464 || trainable%: 0.2069309552696569
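
The reported count can be reproduced by hand if one assumes that `target_modules=["c_proj"]` matched both the attention and the MLP `c_proj` in each of the 12 blocks. The MLP width of 3072 is an assumption based on the standard GPT-2 small configuration, since the printout shows `Conv1D()` without dimensions; `lora_params` is a hypothetical helper.

```python
r = 4
hidden = 768
mlp_inner = 4 * hidden   # 3072, assumed from the standard GPT-2 small config
num_blocks = 12


def lora_params(d_in, d_out, rank):
    # A is (rank x d_in), B is (d_out x rank).
    return rank * d_in + d_out * rank


attn_c_proj = lora_params(hidden, hidden, r)     # 6144
mlp_c_proj = lora_params(mlp_inner, hidden, r)   # 15360
total = num_blocks * (attn_c_proj + mlp_c_proj)
print(total)  # 258048

all_params = 124702464   # total reported above
print(round(100 * total / all_params, 4))
```

The total matches the 258,048 trainable parameters printed above, which suggests the adapter covered all 24 `c_proj` modules rather than a single layer.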


In [21]:
# Set up the LoRA trainer

lora_trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/lora_training",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=5,
        weight_decay=0.01,
        report_to="none",
        # load_best_model_at_end=True,  # To do: experiment with this again now that the task_type argument is settled.
    ),
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),

)



In [22]:
lora_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,1.9491,1.56831,0.340938
2,1.4934,1.421875,0.492188
3,1.4128,1.278417,0.524375
4,1.2545,1.201891,0.544063
5,1.1988,1.182313,0.548125


TrainOutput(global_step=4000, training_loss=1.4294532928466797, metrics={'train_runtime': 494.5681, 'train_samples_per_second': 129.406, 'train_steps_per_second': 8.088, 'total_flos': 1466465272676352.0, 'train_loss': 1.4294532928466797, 'epoch': 5.0})

In [26]:
lora_trainer.push_to_hub("desdesmond/lora_training")

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/1.06M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/desdesmond/lora_training/commit/775d8c15b0a62850c8bf1fb5e3b0c9fa9b72f890', commit_message='desdesmond/lora_training', commit_description='', oid='775d8c15b0a62850c8bf1fb5e3b0c9fa9b72f890', pr_url=None, pr_revision=None, pr_num=None)

In [27]:
lora_trainer.evaluate()

{'eval_loss': 1.1823127269744873,
 'eval_accuracy': 0.548125,
 'eval_runtime': 8.3818,
 'eval_samples_per_second': 381.78,
 'eval_steps_per_second': 23.861,
 'epoch': 5.0}

In [6]:
# Inference with my LoRA adapter on the original pretrained model

!pip install torch peft

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel, PeftConfig
import torch

text = "This is my worst nightmare"
base_model = "mnoukhov/gpt2-imdb-sentiment-classifier"
adapter_model = "desdesmond/lora_training"

# Comment out the next block if running the whole notebook
num_labels = 6
# Same order as the dataset's label names printed earlier
id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
label2id = {v: k for k, v in id2label.items()}
class_names = list(id2label.values())

model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=num_labels,
    id2label={i: label for i, label in enumerate(class_names)},
    label2id={label: i for i, label in enumerate(class_names)},
    ignore_mismatched_sizes=True,
    )
model.load_adapter(adapter_model)

# model = PeftModel.from_pretrained(
#     model,
#     adapter_model,
#     num_labels=num_labels,
#     id2label={i: label for i, label in enumerate(class_names)},
#     label2id={label: i for i, label in enumerate(class_names)},
#     ignore_mismatched_sizes=True,
#     )

tokenizer = AutoTokenizer.from_pretrained(base_model)
inputs = tokenizer(text, return_tensors="pt")


with torch.no_grad():
    logits = model(**inputs).logits





Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at mnoukhov/gpt2-imdb-sentiment-classifier and are newly initialized because the shapes did not match:
- score.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.




In [7]:
# Let's see how the adapter classifies the text.

predicted_class_id = logits.argmax().item()
model.config.id2label[predicted_class_id]

'sadness'

In [10]:
print(f'\n Label:{model.config.id2label[predicted_class_id]}, Text:{text}')


 Label:sadness, Text:This is my worst nightmare
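
`argmax` reports only the winning label. To see how confident the classifier is, the logits can be converted to probabilities with a softmax. The sketch below uses made-up logits, not the model's actual output, and plain Python so that it runs without loading the model.

```python
import math

# Made-up logits for the six classes (illustrative values, not the model's output).
logits = [1.8, 0.2, -0.5, 1.1, 0.9, -1.0]
class_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Numerically stable softmax: subtract the max before exponentiating.
m = max(logits)
exps = [math.exp(z - m) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# Print the classes from most to least likely.
for name, p in sorted(zip(class_names, probs), key=lambda t: -t[1]):
    print(f"{name:>8}: {p:.3f}")
```

With a weakly trained adapter, the top probability may be only slightly above the runners-up, which a bare `argmax` would hide.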


Conclusion:


**Results from LoRA adapter:**

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 1.949100 | 1.568310 | 0.340938 |
| 2 | 1.493400 | 1.421875 | 0.492188 |
| 3 | 1.412800 | 1.278417 | 0.524375 |
| 4 | 1.254500 | 1.201891 | 0.544063 |
| 5 | 1.198800 | 1.182313 | 0.548125 |

**Results from classic training:**

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.791300 | 0.225281 | 0.920312 |
| 2 | 0.187100 | 0.170745 | 0.931250 |
| 3 | 0.147600 | 0.165168 | 0.938125 |
| 4 | 0.116100 | 0.151894 | 0.932187 |
| 5 | 0.105300 | 0.154245 | 0.932187 |

After 5 epochs of each, it seems that the LoRA adapter has not reached the accuracy of classic fine-tuning: 0.548 versus 0.932 on the held-out split, with the adapter training only about 0.2% of the parameters. Perhaps further experimentation with the target layers and with the randomization of the dataset could help.