# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

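* PEFT technique: LoRA (low-rank adaptation), configured through `peft.LoraConfig`
* Model: GPT-2 (`gpt2` checkpoint) with a sequence-classification head
* Evaluation approach: `Trainer.evaluate()` on the validation split, reporting accuracy and weighted precision, recall, and F1
* Fine-tuning dataset: `emotion` dataset (16,000 training, 2,000 validation, and 2,000 test examples; six emotion labels)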

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [1]:
# Install the necessary libraries
!pip install datasets
!pip install accelerate
!pip install peft
!pip install scikit-learn


In [1]:
# Importing the necessary libraries

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments, EvalPrediction
import pandas as pd
from peft import get_peft_model, LoraConfig, TaskType, PeftModelForSequenceClassification, AutoPeftModelForSequenceClassification
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np

In [2]:
# Load the Dataset
emotions_dataset = load_dataset("emotion")

# Define the tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

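# Data preprocessing: tokenize the text and attach the labels.
# Every example is padded to 512 tokens here, so the DataCollatorWithPadding
# used later has no further padding to add.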
def preprocess_function(examples):
    result = tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)
    result['labels'] = examples['label']
    return result

training_dataset = emotions_dataset['train'].map(preprocess_function, batched=True)
validation_dataset = emotions_dataset['validation'].map(preprocess_function, batched=True)
testing_dataset = emotions_dataset['test'].map(preprocess_function, batched=True)





In [3]:
# Loading the model: GPT2
# Defining the task

id2label={0:'sadness',1:'joy',2:'love',3:'anger',4:'fear',5:'surprise'}
label2id={'sadness':0,'joy':1,'love':2,'anger':3,'fear':4,'surprise':5}

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=6,
    id2label=id2label,
    label2id=label2id,
)


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
# Ensuring consistency between the padding token used by our model and its corresponding tokenizer
model.config.pad_token_id = tokenizer.pad_token_id

# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)

    accuracy = accuracy_score(p.label_ids, preds)

    # Use precision_recall_fscore_support for more detailed insights
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')

    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall
    }
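A note on the metrics above: with `average='weighted'`, each class's recall is weighted by its share of the true labels, so the weighted recall always equals plain accuracy. This is why the `Accuracy` and `Recall` columns in the results are identical. The pure-Python sketch below illustrates this on a small set of hypothetical labels; the helper names `accuracy` and `weighted_recall` are made up for this illustration and are not part of scikit-learn.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_recall(y_true, y_pred):
    """Per-class recall, averaged with weights proportional to class support."""
    support = Counter(y_true)                                   # true examples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    total = len(y_true)
    return sum((n / total) * (hits[c] / n) for c, n in support.items())

# Hypothetical labels for the six emotion classes (0..5), for illustration only
y_true = [0, 0, 0, 1, 1, 2, 3, 3, 4, 5]
y_pred = [0, 0, 1, 1, 1, 2, 3, 0, 4, 4]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
print(f"accuracy={acc:.4f}, weighted recall={rec:.4f}")
```

Weighted precision and F1 do not share this property, which is why those columns differ slightly from accuracy.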

In [None]:
# Define data collator
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Define the training arguments
training_args = TrainingArguments(
        output_dir = "./data/emotion",
        learning_rate = 2e-3,
        per_device_train_batch_size = 8,
        per_device_eval_batch_size = 8,
        save_strategy = "epoch",
        evaluation_strategy = "epoch",
        num_train_epochs = 1,
        weight_decay = 0.01,
        load_best_model_at_end = True,
    )

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=training_dataset,
    eval_dataset=validation_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Train all model weights for one epoch (full fine-tuning)
trainer.train()

# Evaluate on the validation split
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.3565 | 0.298816 | 0.8935 | 0.893802 | 0.894556 | 0.8935 |


Evaluation Results: {'eval_loss': 0.29881560802459717, 'eval_accuracy': 0.8935, 'eval_f1': 0.8938024891860384, 'eval_precision': 0.8945556323423608, 'eval_recall': 0.8935, 'eval_runtime': 72.1012, 'eval_samples_per_second': 27.739, 'eval_steps_per_second': 3.467, 'epoch': 1.0}


<h4>After one epoch of training, the model reaches a validation accuracy of about 89%. Let us check whether we can improve this further!</h4>

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [6]:
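# Defining the PEFT model and the task.
# fan_in_fan_out=True is set because GPT-2 stores its projection weights in
# Conv1D layers with shape (in_features, out_features).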
config = LoraConfig(task_type=TaskType.SEQ_CLS, fan_in_fan_out = True)
lora_model = get_peft_model(model, config)

In [7]:
# Checking the number of trainable parameters
lora_model.print_trainable_parameters()

trainable params: 299,520 || all params: 124,743,936 || trainable%: 0.24010786384037136
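The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes values that the notebook does not state explicitly: PEFT's default LoRA rank of 8, the fused attention projection `c_attn` as the only adapted module for GPT-2, and the 12-layer, 768-dimensional `gpt2` checkpoint. It also assumes that the classification head `score` (a bias-free linear layer) stays trainable for the `SEQ_CLS` task. Under those assumptions the arithmetic matches the printed figure.

```python
# Assumed values (not stated in the notebook): PEFT default rank and GPT-2 small dimensions
n_layers = 12      # transformer blocks in the gpt2 checkpoint
hidden = 768       # embedding size
rank = 8           # assumed default LoRA rank r
num_labels = 6     # emotion classes

# c_attn is the fused query/key/value projection: hidden -> 3 * hidden
c_attn_in, c_attn_out = hidden, 3 * hidden

# Each adapted layer adds two low-rank matrices: A (rank x in) and B (out x rank)
lora_per_layer = rank * c_attn_in + c_attn_out * rank
lora_total = n_layers * lora_per_layer

# Classification head: Linear(hidden, num_labels) without bias
score_params = hidden * num_labels

trainable = lora_total + score_params
all_params = 124_743_936  # total reported by print_trainable_parameters()
trainable_pct = 100 * trainable / all_params

print(f"LoRA params: {lora_total:,}")
print(f"score head:  {score_params:,}")
print(f"trainable:   {trainable:,} ({trainable_pct:.4f}%)")
```

The LoRA matrices account for 294,912 parameters and the classification head for the remaining 4,608.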


In [9]:
# Defining the parameters for performing PEFT

training_args = TrainingArguments(
        output_dir = "./data/emotion_peft",
        learning_rate = 2e-3,
        per_device_train_batch_size = 8,
        per_device_eval_batch_size = 8,
        save_strategy = "epoch",
        evaluation_strategy = "epoch",
        num_train_epochs = 2,
        weight_decay = 0.01,
        load_best_model_at_end = True,
    )

trainer = Trainer(
    model = lora_model,
    args = training_args,
    train_dataset=training_dataset,
    eval_dataset=validation_dataset,
    tokenizer = tokenizer,
    data_collator = data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 0.2763 | 0.246494 | 0.9205 | 0.920491 | 0.92209 | 0.9205 |
| 2 | 0.1699 | 0.151039 | 0.9375 | 0.937516 | 0.938065 | 0.9375 |


TrainOutput(global_step=4000, training_loss=0.33503167724609373, metrics={'train_runtime': 2941.1778, 'train_samples_per_second': 10.88, 'train_steps_per_second': 1.36, 'total_flos': 8391242022912000.0, 'train_loss': 0.33503167724609373, 'epoch': 2.0})
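As a sanity check, the step count and throughput in the `TrainOutput` above follow directly from the dataset size and the training arguments. The sketch below assumes a single device and no gradient accumulation, neither of which the notebook states.

```python
import math

train_examples = 16_000       # size of the train split
batch_size = 8                # per_device_train_batch_size
epochs = 2                    # num_train_epochs
train_runtime = 2941.1778     # seconds, from the TrainOutput above

# Assumes one device and gradient_accumulation_steps = 1
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

samples_per_second = train_examples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(total_steps, round(samples_per_second, 2), round(steps_per_second, 2))
```

These values agree with `global_step`, `train_samples_per_second`, and `train_steps_per_second` in the logged output.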

In [10]:
# Saving the model
lora_model.save_pretrained('model/lora_model')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [11]:
# Load the base GPT-2 model together with the saved LoRA adapter weights
model_infer = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/lora_model",
    num_labels=6
)

# Use the EOS token as the padding token, matching the tokenizer setup
model_infer.config.pad_token_id = model_infer.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
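# Define the arguments for an evaluation-only Trainer.
# The training-specific settings (learning rate, epochs, weight decay) are
# not used here because only evaluate() is called.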
training_args1 = TrainingArguments(
        output_dir = "./data/emotion1",
        learning_rate = 2e-3,
        per_device_train_batch_size = 8,
        per_device_eval_batch_size = 8,
        save_strategy = "epoch",
        evaluation_strategy = "epoch",
        num_train_epochs = 1,
        weight_decay = 0.01,
        load_best_model_at_end = True,
    )

trainer = Trainer(
    model=model_infer,
    args=training_args1,
    eval_dataset=validation_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Evaluation Results: {'eval_loss': 0.15103906393051147, 'eval_accuracy': 0.9375, 'eval_f1': 0.9375162093336639, 'eval_precision': 0.9380649116670122, 'eval_recall': 0.9375, 'eval_runtime': 81.5753, 'eval_samples_per_second': 24.517, 'eval_steps_per_second': 3.065}


In [12]:
# Running inference

# Function to predict the emotion label of the input text
def predict(text: str) -> str:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model_infer.to(device)
    model_infer.eval()  # make sure dropout is disabled

    # Tokenize the input and move it to the model's device
    inputs = tokenizer(text, return_tensors="pt").to(device)

    # Forward pass without tracking gradients
    with torch.no_grad():
        outputs = model_infer(**inputs)
        logits = outputs.logits

    # Convert the logits to probabilities and pick the most likely class
    probabilities = torch.nn.functional.softmax(logits, dim=1)
    predicted_class_id = probabilities.argmax(dim=1).item()
    predicted_emotion = id2label[predicted_class_id]

    return predicted_emotion

# Example usage
comment = "I feel my life wasting away."
predicted_emotion = predict(comment)
print(f"Comment: '{comment}'\nPredicted label: {predicted_emotion}")

Comment: 'I feel my life wasting away.'
Predicted label: sadness
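The `predict` function converts logits to probabilities before taking the argmax. Since softmax preserves the ordering of its inputs, the predicted class is the same either way; the probabilities are only needed if a confidence score is wanted. The pure-Python sketch below shows this on hypothetical logits (the values are invented for illustration and are not model output).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

id2label = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

# Hypothetical logits for one input, one value per emotion class
logits = [2.1, -0.3, 0.4, 1.7, 0.2, -1.5]
probs = softmax(logits)

pred_from_probs = max(range(len(probs)), key=probs.__getitem__)
pred_from_logits = max(range(len(logits)), key=logits.__getitem__)

print(id2label[pred_from_probs], round(probs[pred_from_probs], 3))
```

Both indices point to the same class, so the softmax step can be skipped when only the label is required.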


In [17]:
# Running inference on a few records from the test set

records_indices = [1,3,10,18,31,43]

for i in records_indices:
    record = testing_dataset[i]
    predicted_emotion = predict(record['text'])

    print(f"Comment: '{record['text']}'\nPredicted label: {predicted_emotion}")
    label = id2label[record['label']]
    print(f'Actual Label: {label}')

Comment: 'im updating my blog because i feel shitty'
Predicted label: sadness
Actual Label: sadness
Comment: 'i left with my bouquet of red and yellow tulips under my arm feeling slightly more optimistic than when i arrived'
Predicted label: joy
Actual Label: joy
Comment: 'i don t feel particularly agitated'
Predicted label: anger
Actual Label: fear
Comment: 'i feel just bcoz a fight we get mad to each other n u wanna make a publicity n let the world knows about our fight'
Predicted label: anger
Actual Label: anger
Comment: 'i posted on my facebook page earlier this week ive been feeling a little grumpy and out of sorts the past few days'
Predicted label: anger
Actual Label: anger
Comment: 'i feel i have to agree with her even though i can imagine some rather unpleasant possible cases'
Predicted label: sadness
Actual Label: sadness



Validation accuracy improves from 89.35% to 93.75%, a gain of about 4.4 percentage points, after just 2 epochs of LoRA training. Fine-tuning helps the model learn and predict better.
---

