# Lightweight Fine-Tuning Project

* PEFT technique: LoRA (low-rank adaptation), configured through `LoraConfig` from the `peft` library.
* Model: GPT-2 (`gpt2`), a general-purpose pretrained language model that worked well for the text classification task in this project.
* Evaluation approach: Accuracy is the main metric, with weighted recall, precision, and F1 reported alongside it. I first evaluated the foundation model with its newly initialized classification head as a baseline, then fine-tuned with LoRA for 5 epochs and compared the metrics after each epoch against that baseline.
* Fine-tuning dataset: [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion). Each example pairs a short piece of text with the emotion that corresponds to it (sadness, joy, love, anger, fear, or surprise).

## Loading and Evaluating a Foundation Model


In [1]:
from datasets import load_dataset
import numpy as np

from transformers import (AutoTokenizer)


In [2]:
# Build an 80/20 train/test split from the dataset's original train split
dataset = load_dataset("dair-ai/emotion", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=42)

# Print a few sample entries from each split
for split in dataset.keys():
    for i in range(3):
        entry = dataset[split][i]
        text = entry["text"]
        label = entry["label"]
        print(f"split={split}: text={text}, label={label}")

# Show the features and number of rows in each split
print(dataset["train"])
print(dataset["test"])

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


split=train: text=i feel like god has been gracious in answering prayers, label=1
split=train: text=i should go to sleep but i m feeling reluctant to let go of the day, label=4
split=train: text=i feel all slutty for some reason oh wait i know ive had like guys talk to me about sex and stuff one guy dave was like, label=2
split=test: text=while cycling in the country, label=4
split=test: text=i had pocket qq and was feeling pretty confident lol, label=1
split=test: text=i am in no way complaining or whining or feeling ungrateful, label=0
Dataset({
    features: ['text', 'label'],
    num_rows: 12800
})
Dataset({
    features: ['text', 'label'],
    num_rows: 3200
})


In [3]:
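tokenizer = AutoTokenizer.from_pretrained("gpt2")
# GPT-2 has no padding token, so reuse the end-of-sequence token for padding
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    # Pad each batch to its longest example and truncate to the model's maximum length
    return tokenizer(examples["text"], padding=True, truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
tokenized_dataset["train"]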


Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 12800
})

In [4]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=6,
    id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},  # For converting predictions to strings
    label2id={"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}
)
model.config.pad_token_id = model.config.eos_token_id
print(model)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=6, bias=False)
)


In [5]:
! pip install -U scikit-learn scipy matplotlib
! pip install -U accelerate

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable




In [6]:
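from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

def compute_metrics(pred):
    """Compute accuracy and support-weighted recall, precision, and F1 for a Trainer evaluation."""
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)  # most likely class per example
    accuracy = accuracy_score(labels, preds)
    # zero_division=0 scores undefined per-class values as 0 without emitting a warning
    recall = recall_score(labels, preds, average="weighted", zero_division=0)
    precision = precision_score(labels, preds, average="weighted", zero_division=0)
    f1 = f1_score(labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}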
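
One detail worth knowing when reading the results: with `average="weighted"`, recall is weighted by each class's support, which makes it algebraically equal to accuracy. That is why the recall and accuracy values reported in this notebook always match. The sketch below checks this on a small, made-up set of labels (the label values are hypothetical, not taken from the dataset), using plain Python rather than scikit-learn.

```python
from collections import Counter

def accuracy(labels, preds):
    # Fraction of examples whose prediction matches the true label
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return correct / len(labels)

def weighted_recall(labels, preds):
    # Per-class recall, averaged with weights equal to each class's share of the data
    support = Counter(labels)
    total = len(labels)
    score = 0.0
    for cls, n_cls in support.items():
        true_pos = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        score += (n_cls / total) * (true_pos / n_cls)
    return score

# Hypothetical labels and predictions over the six emotion classes
labels = [0, 0, 0, 1, 1, 2, 3, 4, 4, 5]
preds = [0, 0, 1, 1, 1, 2, 0, 4, 1, 5]

acc = accuracy(labels, preds)
rec = weighted_recall(labels, preds)
print(acc, rec)
```

Because the two values coincide, precision and F1 are the columns that carry extra information beyond accuracy.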

In [7]:
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

foundation_model_trainer = Trainer(
    model=model,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    args=TrainingArguments("model_outputs", evaluation_strategy="epoch"),
)



In [8]:
foundation_model_trainer.evaluate()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




{'eval_loss': 3.330284833908081,
 'eval_accuracy': 0.28875,
 'eval_recall': 0.28875,
 'eval_precision': 0.09845155709342562,
 'eval_f1': 0.13097115872985257,
 'eval_runtime': 14.9214,
 'eval_samples_per_second': 214.457,
 'eval_steps_per_second': 26.807}
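
An accuracy of about 0.29 is easier to interpret next to a trivial baseline, such as always predicting the most frequent class. The sketch below computes that baseline from a list of labels. The `toy_labels` list is a hypothetical stand-in; in the notebook one would pass `dataset["test"]["label"]` instead.

```python
from collections import Counter

def majority_class_accuracy(labels):
    # Accuracy obtained by always predicting the most common label
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)

# Hypothetical labels, for illustration only
toy_labels = [1, 1, 1, 0, 0, 3, 4, 2, 5, 1]
label, baseline = majority_class_accuracy(toy_labels)
print(f"majority label={label}, baseline accuracy={baseline:.2f}")
```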

## Performing Parameter-Efficient Fine-Tuning


In [10]:
!pip install peft
from peft import LoraConfig
from peft import get_peft_model
config = LoraConfig(
    task_type="SEQ_CLS",
    inference_mode=False,
    lora_alpha=32,
    lora_dropout=0.05,
    r=8,
)

lora_model = get_peft_model(model, config)
lora_model.print_trainable_parameters()





trainable params: 304,128 || all params: 124,743,936 || trainable%: 0.2438018309763771
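
The trainable-parameter count can be sanity-checked by hand. The sketch below assumes LoRA is attached only to GPT-2's fused attention projection `c_attn` (768 inputs, 2304 outputs) in each of the 12 blocks. I believe that is the default target in `peft` for GPT-2, but it is not stated in the output above, so treat it as an assumption. Under that assumption the adapters account for most of the 304,128 trainable parameters; the remainder presumably comes from the classification head and is not broken down here.

```python
def lora_param_count(d_in, d_out, r):
    # LoRA adds A (r x d_in) and B (d_out x r) next to a frozen d_in x d_out weight
    return r * d_in + d_out * r

r, lora_alpha = 8, 32          # values from the LoraConfig above
n_blocks, d_model = 12, 768    # GPT-2 small, as shown in the printed model

# Assumption: one adapter per block, on c_attn (d_model -> 3 * d_model)
adapter_params = n_blocks * lora_param_count(d_model, 3 * d_model, r)
scaling = lora_alpha / r       # factor applied to the low-rank update

trainable_reported = 304_128
total_reported = 124_743_936
trainable_pct = 100 * trainable_reported / total_reported

print(f"LoRA adapter parameters: {adapter_params:,}")
print(f"remaining trainable parameters: {trainable_reported - adapter_params:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"trainable%: {trainable_pct:.4f}")
```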


In [11]:
lora_model_trainer = Trainer(
    model=lora_model,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
    args=TrainingArguments("model_outputs", evaluation_strategy="epoch", num_train_epochs=5),
)
lora_model_trainer.train()



| Epoch | Training Loss | Validation Loss | Accuracy | Recall | Precision | F1 |
|:---|:---|:---|:---|:---|:---|:---|
| 1 | 0.8158 | 0.594165 | 0.783438 | 0.783438 | 0.78419 | 0.772465 |
| 2 | 0.5379 | 0.426409 | 0.84625 | 0.84625 | 0.845367 | 0.843131 |
| 3 | 0.4386 | 0.356866 | 0.870313 | 0.870313 | 0.870489 | 0.868649 |
| 4 | 0.3818 | 0.324024 | 0.885625 | 0.885625 | 0.885497 | 0.885474 |
| 5 | 0.361 | 0.318324 | 0.885312 | 0.885312 | 0.885883 | 0.885479 |


Checkpoint destination directory model_outputs/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory model_outputs/checkpoint-1000 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory model_outputs/checkpoint-1500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory model_outputs/checkpoint-2000 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory model_outputs/checkpoint-2500 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory model_outputs/checkpoint-3000 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory model_outputs/checkpoint-3500 already exists and is non-empty.Saving will 

TrainOutput(global_step=8000, training_loss=0.6061933250427246, metrics={'train_runtime': 861.5111, 'train_samples_per_second': 74.288, 'train_steps_per_second': 9.286, 'total_flos': 2550929380392960.0, 'train_loss': 0.6061933250427246, 'epoch': 5.0})

In [12]:
lora_model_trainer.evaluate()

{'eval_loss': 0.3183244466781616,
 'eval_accuracy': 0.8853125,
 'eval_recall': 0.8853125,
 'eval_precision': 0.8858827176293483,
 'eval_f1': 0.8854786517583563,
 'eval_runtime': 14.5221,
 'eval_samples_per_second': 220.354,
 'eval_steps_per_second': 27.544,
 'epoch': 5.0}

In [13]:
lora_model.save_pretrained("model_outputs/lora_model")
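
To use the adapter in a fresh session, the saved directory can be loaded back on top of the base model. This is a sketch, assuming a `peft` version that provides `AutoPeftModelForSequenceClassification` and that the adapter directory written above exists. The helper name `load_emotion_classifier` is my own and is not part of either library.

```python
def load_emotion_classifier(adapter_dir="model_outputs/lora_model", num_labels=6):
    """Load the base GPT-2 classifier with the saved LoRA adapter applied."""
    # Imported lazily so the helper can be defined before the libraries are installed
    from peft import AutoPeftModelForSequenceClassification
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token

    # num_labels is forwarded to the base model so the head has six outputs
    model = AutoPeftModelForSequenceClassification.from_pretrained(
        adapter_dir, num_labels=num_labels
    )
    model.config.pad_token_id = tokenizer.eos_token_id
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

The id-to-label mapping is not passed here, so predictions come back as integer class ids unless `id2label` is supplied as well.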

## Performing Inference with a PEFT Model


Results from the foundation model (before fine-tuning):
{'eval_loss': 3.330284833908081,
 'eval_accuracy': 0.28875,
 'eval_recall': 0.28875,
 'eval_precision': 0.09845155709342562,
 'eval_f1': 0.13097115872985257,
 'eval_runtime': 14.9214,
 'eval_samples_per_second': 214.457,
 'eval_steps_per_second': 26.807}
 
Results from the PEFT model after each epoch:

| Epoch | Training Loss | Validation Loss | Accuracy | Recall | Precision | F1 |
|:---|:---|:---|:---|:---|:---|:---|
| 1 | 0.815800 | 0.594165 | 0.783438 | 0.783438 | 0.784190 | 0.772465 |
| 2 | 0.537900 | 0.426409 | 0.846250 | 0.846250 | 0.845367 | 0.843131 |
| 3 | 0.438600 | 0.356866 | 0.870313 | 0.870313 | 0.870489 | 0.868649 |
| 4 | 0.381800 | 0.324024 | 0.885625 | 0.885625 | 0.885497 | 0.885474 |
| 5 | 0.361000 | 0.318324 | 0.885312 | 0.885312 | 0.885883 | 0.885479 |


Even after just the first epoch, the fine-tuned model's accuracy is far higher than the foundation model's: 78.3% compared to 28.9%. By epoch 5 it reaches 88.5%.
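
The comparison can be made explicit by subtracting the baseline accuracy from each epoch's accuracy. The numbers below are copied from the results above; nothing is recomputed from the model.

```python
baseline_accuracy = 0.28875
epoch_accuracy = {1: 0.783438, 2: 0.846250, 3: 0.870313, 4: 0.885625, 5: 0.885312}

# Absolute accuracy gain of each epoch over the foundation-model baseline
gains = {epoch: acc - baseline_accuracy for epoch, acc in epoch_accuracy.items()}
best_epoch = max(epoch_accuracy, key=epoch_accuracy.get)

for epoch, gain in gains.items():
    print(f"epoch {epoch}: +{gain * 100:.1f} percentage points over baseline")
print(f"best epoch by accuracy: {best_epoch}")
```

Accuracy peaks at epoch 4 and is essentially flat at epoch 5, while the validation loss still decreases slightly.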

In [14]:
sample_string = "i also loved that you could really feel the desperation in these sequences and i especially liked the emotion between knight and squire as theyve been together in a similar fashion to batman and robin for a long time now"

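import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
lora_model.to(device)
lora_model.eval()  # disable dropout for inference
tokenized_sample = tokenizer(sample_string, return_tensors="pt").to(device)
with torch.no_grad():  # gradients are not needed for prediction
    predictions = lora_model(**tokenized_sample).logits
predicted_label = predictions.argmax(dim=-1).item()
predicted_label_str = model.config.id2label[predicted_label]
print(f"Predicted label: {predicted_label_str}")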

Predicted label: love
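
`argmax` only reports the winning class. Applying a softmax to the logits gives a probability for every emotion, which shows how confident the prediction is. The logits below are hypothetical values chosen for illustration; in the notebook they would be the six values in `predictions`.

```python
import math

id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}

def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [-1.2, 1.9, 2.4, -0.8, -1.5, -2.0]  # hypothetical, not model output
probs = softmax(example_logits)

# Rank the emotions from most to least probable
ranked = sorted(zip(id2label.values(), probs), key=lambda pair: pair[1], reverse=True)
for name, p in ranked:
    print(f"{name}: {p:.3f}")
```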


In [15]:
# Evaluate on the dataset's separate validation split, which was not used for training
validation_dataset = load_dataset("dair-ai/emotion", split="validation")
tokenized_validation_dataset = validation_dataset.map(tokenize_function, batched=True)
lora_model_trainer.evaluate(tokenized_validation_dataset)

You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.


{'eval_loss': 0.2887805700302124,
 'eval_accuracy': 0.8975,
 'eval_recall': 0.8975,
 'eval_precision': 0.8981405337356684,
 'eval_f1': 0.8977233140226177,
 'eval_runtime': 9.6657,
 'eval_samples_per_second': 206.917,
 'eval_steps_per_second': 25.865,
 'epoch': 5.0}