# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following:

* PEFT technique: LoRA (Low-Rank Adaptation). LoRA is widely used, easy to implement with the Hugging Face `peft` library, and compatible with sequence classification models.
* Model: GPT-2 (the 124M-parameter `gpt2` checkpoint, loaded with `GPT2ForSequenceClassification`). It is small enough to fine-tune on modest hardware, and Hugging Face provides all the utilities needed for sequence classification.
* Evaluation approach: the Hugging Face `Trainer.evaluate()` method, with validation accuracy as the metric, so that the baseline and fine-tuned models are evaluated in the same, replicable way.
* Fine-tuning dataset: https://huggingface.co/datasets/dair-ai/emotion
The dataset is small, clean, and widely used for benchmarking. It contains English tweets labeled with six emotions (sadness, joy, love, anger, fear, surprise), which makes it well suited to quick experiments on limited hardware.
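LoRA keeps a pretrained weight matrix W frozen and learns a low-rank update, so an adapted layer computes W x + (α/r)·B A x. The NumPy sketch below illustrates that idea only; it is not PEFT's implementation. The sizes are assumptions chosen to resemble GPT-2's fused attention projection (768 → 2304), and r = 8, α = 32 match the LoRA configuration used later in this notebook. It shows two properties: with B initialized to zero the adapted layer starts out identical to the frozen one, and a trained update can be merged back into a single weight matrix.

```python
import numpy as np

# Illustrative LoRA layer in NumPy (a sketch of the idea, not PEFT's implementation).
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 2304, 8, 32  # assumed sizes; r and alpha mirror this notebook's config

W = rng.normal(size=(d_out, d_in))          # pretrained weight: frozen during fine-tuning
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection, random init
B = np.zeros((d_out, r))                    # trainable up-projection, zero init
scaling = alpha / r                         # 32 / 8 = 4.0

def lora_forward(x, A, B):
    """Compute W x + (alpha / r) * B A x for each row of x (shape: batch x d_in)."""
    return x @ W.T + scaling * (x @ A.T) @ B.T

x = rng.normal(size=(2, d_in))

# With B = 0 the adapted layer reproduces the frozen layer exactly at the start of training.
print(np.allclose(lora_forward(x, A, B), x @ W.T))  # True

# After training changes B, the update can be folded into one merged weight for inference.
B_trained = rng.normal(scale=0.01, size=(d_out, r))  # stand-in for learned values
W_merged = W + scaling * B_trained @ A
print(np.allclose(lora_forward(x, A, B_trained), x @ W_merged.T))  # True
```

Only A and B (r × (d_in + d_out) values) would be trained, instead of the full d_in × d_out matrix.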

In [None]:
!pip install -U datasets

In [None]:
!pip install scikit-learn

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

1. Load the Pre-trained Model and Tokenizer

In [7]:
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

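# Load the tokenizer. GPT-2 has no padding token, so add one for batched classification.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# Load the model with a new 6-way classification head (one output per emotion)
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=6)
# Grow the embedding matrix to cover the newly added [PAD] token
model.resize_token_embeddings(len(tokenizer))

# Required: the classification head uses pad_token_id to locate the last
# non-padding token in each sequence, whose output is used for the prediction.
model.config.pad_token_id = tokenizer.pad_token_id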


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
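Setting `pad_token_id` matters because a GPT-2 classifier takes its prediction from the position of the last real token in each sequence. Below is a simplified NumPy sketch of that lookup, not the exact transformers code. It assumes right padding and an illustrative `[PAD]` id of 50257 (the first id after GPT-2's original 50,257-token vocabulary); the token ids are arbitrary and `last_non_pad_index` is a hypothetical helper.

```python
import numpy as np

def last_non_pad_index(input_ids, pad_token_id):
    """Position of the last non-padding token in each row (assumes right padding)."""
    non_pad = input_ids != pad_token_id
    seq_len = input_ids.shape[1]
    # Reverse each row so argmax finds the first non-pad token counted from the end
    return seq_len - 1 - np.argmax(non_pad[:, ::-1], axis=1)

PAD = 50257  # illustrative id of the added [PAD] token

# Two right-padded sequences of arbitrary token ids
input_ids = np.array([
    [40, 1254, 3772, PAD, PAD],  # 3 real tokens followed by padding
    [40, 588, 428, 1110, 922],   # 5 real tokens, no padding
])
idx = last_non_pad_index(input_ids, PAD)
print(idx)  # [2 4]

# Pool per-token logits (batch x seq_len x num_labels) at those positions,
# leaving one 6-way score vector per sequence
token_logits = np.random.default_rng(0).normal(size=(2, 5, 6))
pooled = token_logits[np.arange(len(idx)), idx]
print(pooled.shape)  # (2, 6)
```

Without a pad id the model has no way to tell padding from real tokens, which is why batched classification needs it.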


2. Load and Inspect the Dataset

In [2]:
from datasets import load_dataset
dataset = load_dataset("dair-ai/emotion")

train_dataset = dataset["train"]
val_dataset = dataset["validation"]
test_dataset = dataset["test"]

print(train_dataset[0])  # See one example
print(train_dataset.features)  # Column types, including the six label names


{'text': 'i didnt feel humiliated', 'label': 0}
{'text': Value('string'), 'label': ClassLabel(names=['sadness', 'joy', 'love', 'anger', 'fear', 'surprise'])}
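The integer labels follow the `ClassLabel` order printed above. A small sketch of the id/name mappings and a hypothetical `decode` helper for turning predicted ids back into emotion names:

```python
# Label names in ClassLabel order, as printed in the dataset features above
label_names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

id2label = {i: name for i, name in enumerate(label_names)}
label2id = {name: i for i, name in enumerate(label_names)}

def decode(pred_ids):
    """Map predicted class ids to emotion names."""
    return [id2label[int(i)] for i in pred_ids]

print(id2label[0])           # sadness (the label of the first training example above)
print(label2id['surprise'])  # 5
print(decode([1, 4, 2]))     # ['joy', 'fear', 'love']
```

As an optional step the notebook does not take, passing these dictionaries as `id2label=id2label, label2id=label2id` to `GPT2ForSequenceClassification.from_pretrained` should store them in the model config, so predictions can be reported by name.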


3. Tokenize the Text Data

In [3]:
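def preprocess(batch):
    # Truncate long tweets and pad short ones so every example is exactly 64 tokens
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64)

# batched=True passes many examples to preprocess at once, which is faster
encoded_dataset = dataset.map(preprocess, batched=True)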
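With `truncation=True`, `padding="max_length"`, and `max_length=64`, every example becomes exactly 64 token ids plus a matching attention mask. A plain-Python sketch of that behavior, assuming right padding (the GPT-2 tokenizer's default side) and the illustrative pad id 50257; `pad_or_truncate` is a hypothetical helper, not a tokenizer method, and a short `max_length` is used to keep the output readable.

```python
def pad_or_truncate(token_ids, max_length, pad_id):
    """Hypothetical helper mimicking truncation plus max_length padding on the right."""
    ids = list(token_ids[:max_length])  # keep at most max_length tokens
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,             # pads go after the real tokens
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding
    }

PAD_ID = 50257  # illustrative id for the added [PAD] token

short = pad_or_truncate([11, 22, 33], max_length=6, pad_id=PAD_ID)
print(short["input_ids"])       # [11, 22, 33, 50257, 50257, 50257]
print(short["attention_mask"])  # [1, 1, 1, 0, 0, 0]

truncated = pad_or_truncate(list(range(10)), max_length=6, pad_id=PAD_ID)
print(truncated["input_ids"])       # [0, 1, 2, 3, 4, 5]
print(truncated["attention_mask"])  # [1, 1, 1, 1, 1, 1]
```

Because padding goes on the right, the real tokens always come first, and the attention mask tells the model which positions to ignore.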


4. Evaluate the Foundation Model

In [8]:
# Create a compute_metrics function:
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, predictions)}


# Set up and run evaluation:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="/tmp/gpt2-emotion",
    per_device_eval_batch_size=8,
    no_cuda=True,  # run evaluation on CPU; newer transformers releases name this use_cpu
)

trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=encoded_dataset["validation"],
    compute_metrics=compute_metrics,
)

baseline_metrics = trainer.evaluate()
print(baseline_metrics)



This establishes a baseline for model performance prior to adaptation.

{'eval_loss': 4.240687847137451, 'eval_accuracy': 0.285, 'eval_runtime': 424.1897, 'eval_samples_per_second': 4.715, 'eval_steps_per_second': 0.589}
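To interpret the baseline accuracy, it helps to check the metric on a toy batch and to keep simple reference points in mind. The sketch below reimplements the `compute_metrics` logic with NumPy instead of scikit-learn (the same argmax-then-accuracy computation) under the hypothetical name `compute_metrics_np`; the logits and labels are made up. `majority_class_accuracy` is a hypothetical helper giving the accuracy of always predicting the most frequent class.

```python
import numpy as np

def compute_metrics_np(eval_pred):
    """Same argmax-then-accuracy logic as compute_metrics, using NumPy instead of scikit-learn."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float(np.mean(predictions == labels))}

def majority_class_accuracy(labels):
    """Hypothetical helper: accuracy of always predicting the most frequent label."""
    labels = np.asarray(labels)
    return float(np.bincount(labels).max() / len(labels))

# Toy batch: 4 examples x 6 classes (made-up values)
logits = np.array([
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],  # argmax 0 (sadness)
    [0.0, 3.0, 0.0, 0.0, 0.0, 0.0],  # argmax 1 (joy)
    [0.0, 0.0, 0.0, 0.0, 1.5, 0.0],  # argmax 4 (fear)
    [0.0, 2.0, 0.0, 0.0, 0.0, 0.1],  # argmax 1 (joy), but the true label is 5
])
labels = np.array([0, 1, 4, 5])

print(compute_metrics_np((logits, labels)))   # {'accuracy': 0.75}
print(majority_class_accuracy([1, 1, 0, 3]))  # 0.5
print(round(1 / 6, 3))  # 0.167, expected accuracy of uniform random guessing over 6 classes
```

Applied to `np.array(encoded_dataset["validation"]["label"])`, `majority_class_accuracy` would give the trivial "always predict the most common emotion" baseline to compare against the 0.285 above.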


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

1. Configure LoRA and Create a PEFT Model

In [9]:
from peft import LoraConfig, get_peft_model

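lora_config = LoraConfig(
    r=8,                  # rank of the low-rank update matrices
    lora_alpha=32,        # the update is scaled by lora_alpha / r = 4
    lora_dropout=0.1,
    bias="none",          # keep all bias terms frozen
    task_type="SEQ_CLS",  # also keeps the classification head (`score`) trainable
)
# target_modules is not set, so PEFT falls back to its default for GPT-2 (the attention projection c_attn)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # Only the LoRA matrices and the classification head are trainable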




trainable params: 304,128 || all params: 124,744,704 || trainable%: 0.24380032999236584
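The printed count can be checked by hand. A short sketch, assuming (inferred, not stated in the notebook) that LoRA is applied only to GPT-2's fused attention projection `c_attn` (768 inputs, 2,304 outputs) in each of the 12 transformer blocks with rank r = 8:

```python
# Reproduce the LoRA part of the trainable-parameter count printed above.
# Assumption: LoRA targets only c_attn (768 -> 2304) in each of GPT-2's 12 blocks, with r = 8.
n_layers, d_in, d_out, r = 12, 768, 2304, 8
lora_per_layer = r * d_in + d_out * r  # A is r x d_in, B is d_out x r
lora_total = n_layers * lora_per_layer
print(lora_per_layer, lora_total)      # 24576 294912

reported_trainable = 304_128           # from print_trainable_parameters() above
head_params = 768 * 6                  # score head: hidden size x num_labels (no bias)
print(reported_trainable - lora_total, head_params)  # 9216 4608
```

The LoRA matrices account for 294,912 of the 304,128 trainable parameters. The remaining 9,216 lie outside the LoRA matrices, which is consistent with the `SEQ_CLS` task type keeping the classification head trainable; how PEFT counts the head and its saved copy may vary between versions.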


2. Fine-Tune the Model

In [10]:
training_args = TrainingArguments(
    output_dir="/tmp/gpt2_peft",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    evaluation_strategy="epoch",  # evaluate after each epoch; newer transformers releases name this eval_strategy
    save_strategy="epoch",
)
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
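# Save only the adapter weights (LoRA matrices plus the trained classification head), not the full model
peft_model.save_pretrained("/tmp/gpt2_peft_lora")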


| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 1.1545 | 1.068019 | 0.6195 |


### ⚠️ IMPORTANT ⚠️

Due to workspace storage constraints, do not store model weights in the workspace directory. Always save them under `/tmp` instead, to avoid workspace crashes, which are irrecoverable.

In [13]:
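# Save the LoRA adapter and the tokenizer (which includes the added [PAD] token).
# `model` was modified in place by get_peft_model(), so it is not saved as a plain GPT-2 checkpoint.
peft_model.save_pretrained("/tmp/model")
tokenizer.save_pretrained("/tmp/model")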

('/tmp/model/tokenizer_config.json',
 '/tmp/model/special_tokens_map.json',
 '/tmp/model/vocab.json',
 '/tmp/model/merges.txt',
 '/tmp/model/added_tokens.json')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

1. Load the Fine-Tuned Model

In [14]:
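from peft import PeftModel
from transformers import GPT2ForSequenceClassification

# Start from a fresh base model: get_peft_model() modified `model` in place, so it already
# holds the trained LoRA layers. Reloading makes this evaluation test the weights saved to disk.
base_model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=6)
base_model.resize_token_embeddings(len(tokenizer))  # match the tokenizer with the added [PAD] token
base_model.config.pad_token_id = tokenizer.pad_token_id

# Attach the saved LoRA adapter (which also restores the trained classification head)
peft_model = PeftModel.from_pretrained(base_model, "/tmp/gpt2_peft_lora")
trainer = Trainer(
    model=peft_model,
    args=training_args,
    eval_dataset=encoded_dataset["validation"],
    compute_metrics=compute_metrics,
)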


2. Evaluate the Fine-Tuned Model and Compare

In [15]:
peft_metrics = trainer.evaluate()
print("Baseline:", baseline_metrics)
print("PEFT Fine-Tuned:", peft_metrics)


Baseline: {'eval_loss': 4.240687847137451, 'eval_accuracy': 0.285, 'eval_runtime': 424.1897, 'eval_samples_per_second': 4.715, 'eval_steps_per_second': 0.589}
PEFT Fine-Tuned: {'eval_loss': 1.0680186748504639, 'eval_accuracy': 0.6195, 'eval_runtime': 8.8185, 'eval_samples_per_second': 226.796, 'eval_steps_per_second': 28.35}


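The experiment shows that GPT-2 performs poorly at emotion classification before fine-tuning (validation accuracy 28.5%). This is expected, since its classification head (`score.weight`) is newly initialized, as the loading warning notes. After one epoch of LoRA fine-tuning, validation accuracy rose to 61.95% and validation loss fell from 4.24 to 1.07, while only about 0.24% of the parameters (304,128 of 124,744,704) were trainable. This shows that PEFT can adapt a pretrained language model to a custom text classification task while updating only a small fraction of its weights. The much shorter `eval_runtime` after fine-tuning most likely reflects that the baseline was evaluated on CPU (`no_cuda=True`), not a speedup from LoRA.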

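| Stage | Accuracy | Eval loss | What it means |
|---|---|---|---|
| Baseline | 0.285 | 4.24 | Pretrained GPT-2 with an untrained classification head; above uniform chance (1/6 ≈ 0.167) but far from useful |
| PEFT LoRA | 0.6195 | 1.07 | Adapted to emotion detection after one epoch; accuracy more than doubles |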
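The comparison can be condensed into a few summary numbers. A sketch using the metric values copied from the printed outputs; the uniform-guess loss, ln 6 ≈ 1.792, is the cross-entropy of a model that assigns probability 1/6 to every class and serves as a reference point for the eval loss column.

```python
import math

# Metric values copied from the printed evaluation outputs
baseline_scores = {"eval_loss": 4.240687847137451, "eval_accuracy": 0.285}
lora_scores = {"eval_loss": 1.0680186748504639, "eval_accuracy": 0.6195}

gain = lora_scores["eval_accuracy"] - baseline_scores["eval_accuracy"]
ratio = lora_scores["eval_accuracy"] / baseline_scores["eval_accuracy"]
print(f"accuracy gain: {gain * 100:.2f} points ({ratio:.2f}x)")  # accuracy gain: 33.45 points (2.17x)

# Cross-entropy of a predictor that assigns probability 1/6 to each of the six classes
uniform_loss = math.log(6)
print(f"uniform-guess loss: {uniform_loss:.3f}")  # uniform-guess loss: 1.792
print(baseline_scores["eval_loss"] > uniform_loss > lora_scores["eval_loss"])  # True
```

By this reference the baseline loss of 4.24 is worse than uniform guessing, which fits a randomly initialized head making confident but arbitrary predictions, while the fine-tuned loss of 1.07 is well below it.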
