# Lightweight Fine-Tuning Project

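This notebook applies parameter-efficient fine-tuning to a small pre-trained classifier and compares its performance before and after training.

* PEFT technique: LoRA (rank 8, alpha 32, dropout 0.1) applied to the `q_lin`, `k_lin`, and `v_lin` attention projections
* Model: `distilbert-base-uncased-finetuned-sst-2-english`, loaded with a newly initialized 6-class classification head
* Evaluation approach: loss and accuracy from `Trainer.evaluate()` on a held-out 20% split, measured before and after fine-tuning
* Fine-tuning dataset: `dair-ai/emotion` (train split, re-split 80/20 with seed 23)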

## Loading and Evaluating a Foundation Model

The cells below load the chosen pre-trained Hugging Face model, its tokenizer, and the dataset, then evaluate the model's performance prior to fine-tuning.

In [1]:
# ! pip install -r requirements.txt
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np
import datetime



In [2]:
# Load the emotion dataset and hold out 20% of its train split for evaluation
# See: https://huggingface.co/datasets/dair-ai/emotion
dataset = load_dataset("dair-ai/emotion", split="train").train_test_split(test_size=0.2, shuffle=True, seed=23)
# Inspect the dataset
print("Dataset sample:", dataset["train"][0])

Dataset sample: {'text': 'i am feeling hopeful excited and very much being made new', 'label': 1}


In [4]:
# Initialize tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_dataset = dataset.map(tokenize_function, batched=True)

# Verify tokenized dataset
print("Tokenized dataset sample:", tokenized_dataset["train"][0])

Tokenized dataset sample: {'text': 'i am feeling hopeful excited and very much being made new', 'label': 1, 'input_ids': [101, 1045, 2572, 3110, 17772, 7568, 1998, 2200, 2172, 2108, 2081, 2047, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


In [5]:
# Define label mappings
num_labels = 6
id2label = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}
label2id = {v: k for k, v in id2label.items()}

# Initialize the model
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([6]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
# Verify the model
print(model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [7]:
# Freeze the DistilBERT encoder so that only the classification head stays trainable
for param in model.base_model.parameters():
    param.requires_grad = False

# Print model parameters
total_params = sum(p.numel() for p in model.parameters())
total_trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{total_params:,} total parameters.")
print(f"{total_trainable_params:,} trainable parameters.")
print(f"{total_trainable_params/total_params:.2%} of parameters are trainable.")


66,958,086 total parameters.
595,206 trainable parameters.
0.89% of parameters are trainable.
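The trainable count above can be checked by hand. The sketch below assumes the classification head consists of a `pre_classifier` layer (768 to 768) and a `classifier` layer (768 to 6), each with a bias; the `pre_classifier` layer is not visible in the truncated printout, so treat it as an assumption about the DistilBERT architecture.

```python
# Hidden size and label count, taken from the printout and label mapping above.
hidden_size = 768
num_labels = 6

# Assumed head layout: pre_classifier Linear(768 -> 768) plus classifier Linear(768 -> 6).
pre_classifier_params = hidden_size * hidden_size + hidden_size  # weights + bias
classifier_params = hidden_size * num_labels + num_labels        # weights + bias

head_params = pre_classifier_params + classifier_params
total_params = 66_958_086  # total reported by the cell above

print(f"{head_params:,} trainable parameters in the head")
print(f"{head_params / total_params:.2%} of all parameters")
```

Under these assumptions the head accounts for every trainable parameter, which is consistent with the encoder being fully frozen.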


In [8]:
# Prepare for training
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

training_args = TrainingArguments(
    output_dir=f"./results/{model_name}/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"),
    num_train_epochs=5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    learning_rate=2e-5,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [9]:
trainer.evaluate()

{'eval_loss': 1.7782342433929443,
 'eval_accuracy': 0.168125,
 'eval_runtime': 3.2789,
 'eval_samples_per_second': 975.944,
 'eval_steps_per_second': 15.249}
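These baseline numbers are close to what an uninformed 6-class classifier would produce. As a rough sanity check (not part of the original run), the sketch below compares the reported metrics with the loss and expected accuracy of a uniform guess over six labels.

```python
import math

num_labels = 6

# A model that assigns probability 1/6 to every class has cross-entropy ln(6),
# and guessing uniformly at random is correct 1/6 of the time in expectation.
uniform_loss = math.log(num_labels)
uniform_accuracy = 1 / num_labels

# Values reported by trainer.evaluate() above.
reported_loss = 1.7782342433929443
reported_accuracy = 0.168125

print(f"uniform loss:     {uniform_loss:.4f}  (reported {reported_loss:.4f})")
print(f"uniform accuracy: {uniform_accuracy:.4f}  (reported {reported_accuracy:.4f})")
```

The closeness is expected, since the 6-class head was newly initialized and has not been trained yet.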

## Performing Parameter-Efficient Fine-Tuning

The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [10]:
from peft import LoraConfig, get_peft_model

# Define label mappings
num_labels = 6
id2label = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}
label2id = {v: k for k, v in id2label.items()}

# Load the tokenizer and the original model
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)

# Define the LoRA configuration
config = LoraConfig(
    task_type='SEQ_CLS',
    target_modules=["q_lin", "k_lin", "v_lin"],  # Query, key, and value projections in each attention block
    r=8,  # Rank of the low-rank update matrices
    lora_alpha=32,  # Scaling numerator; updates are scaled by lora_alpha / r = 4
    lora_dropout=0.1  # Dropout applied to the input of the LoRA branch
)

# Apply LoRA to the model
lora_model = get_peft_model(model, config)

# Print trainable parameters
lora_model.print_trainable_parameters()


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([6]) in the model instantiated
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([6, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 816,390 || all params: 67,774,476 || trainable%: 1.2046
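The reported counts can be reproduced with simple arithmetic. The sketch below assumes that each adapted layer gains two bias-free matrices (`r x 768` and `768 x r`), and that PEFT keeps a second, trainable copy of the classification head for the `SEQ_CLS` task. That second assumption is an inference from the numbers, not something the output states.

```python
hidden_size = 768
r = 8
num_layers = 6
targets_per_layer = 3  # q_lin, k_lin, v_lin

# Each adapted Linear gains A (r x in_features) and B (out_features x r), without biases.
lora_per_module = r * hidden_size + hidden_size * r
lora_params = lora_per_module * targets_per_layer * num_layers

head_params = 595_206    # classification head, from the earlier parameter count
base_total = 66_958_086  # total parameters of the plain model

trainable = lora_params + head_params
# Assumption: the head appears twice in the total (original copy plus trainable copy).
all_params = base_total + lora_params + head_params

print(f"LoRA parameters: {lora_params:,}")
print(f"trainable params: {trainable:,} || all params: {all_params:,} "
      f"|| trainable%: {100 * trainable / all_params:.4f}")
```

Only about 221 thousand of the trainable parameters belong to LoRA itself; the rest is the classification head.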


In [11]:
# Prepare for training
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

training_args = TrainingArguments(
    output_dir=f"./results/{model_name}-lora/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S"),
    num_train_epochs=5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    learning_rate=2e-5,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=1,
    load_best_model_at_end=True,
)

lora_trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=compute_metrics,
)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [12]:
lora_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,1.418761,0.501563
2,No log,1.232308,0.540625
3,1.446600,0.993635,0.639687
4,1.446600,0.867171,0.696562
5,0.972300,0.830087,0.705625


TrainOutput(global_step=1000, training_loss=1.2094049072265625, metrics={'train_runtime': 103.4553, 'train_samples_per_second': 618.625, 'train_steps_per_second': 9.666, 'total_flos': 954899335623168.0, 'train_loss': 1.2094049072265625, 'epoch': 5.0})
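The step count and the "No log" entries follow from the training arguments. The sketch below assumes a single device, 12,800 training rows (16,000 minus the 3,200 held out), and the `Trainer` defaults of a linear schedule with warmup and logging every 500 steps; `linear_schedule_lr` is a hypothetical helper written for illustration, not a library function.

```python
train_rows = 16_000 - 3_200  # assumption: 16,000-row train split with 20% held out
batch_size = 64
epochs = 5
warmup_steps = 500
peak_lr = 2e-5

steps_per_epoch = -(-train_rows // batch_size)  # ceiling division
total_steps = steps_per_epoch * epochs

def linear_schedule_lr(step):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
for step in (0, 200, 500, 1000):
    print(f"step {step:>4}: lr = {linear_schedule_lr(step):.2e}")
```

Under these assumptions the first training-loss log lands at step 500, during epoch 3, which would explain the "No log" entries for epochs 1 and 2. The learning rate also peaks only at the halfway point, so a shorter warmup may be worth trying.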

In [13]:
lora_trainer.evaluate()

{'eval_loss': 0.8300866484642029,
 'eval_accuracy': 0.705625,
 'eval_runtime': 2.1291,
 'eval_samples_per_second': 1503.016,
 'eval_steps_per_second': 23.485,
 'epoch': 5.0}
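To make the before/after comparison explicit, the sketch below puts the two reported evaluations side by side. The evaluation set size of 3,200 rows is an assumption derived from the 20% split; the metric values are copied from the outputs above.

```python
baseline = {"eval_loss": 1.7782342433929443, "eval_accuracy": 0.168125}
lora = {"eval_loss": 0.8300866484642029, "eval_accuracy": 0.705625}
eval_rows = 3_200  # assumption: 20% of a 16,000-row train split

# Change in each metric from the untrained head to the LoRA fine-tuned model.
deltas = {key: lora[key] - baseline[key] for key in baseline}
for key in baseline:
    print(f"{key}: {baseline[key]:.4f} -> {lora[key]:.4f} ({deltas[key]:+.4f})")

# Approximate number of correctly classified evaluation rows in each case.
correct_before = round(baseline["eval_accuracy"] * eval_rows)
correct_after = round(lora["eval_accuracy"] * eval_rows)
print(f"correct predictions: {correct_before} -> {correct_after} of {eval_rows}")
```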

In [14]:
# Save the LoRA adapter weights
lora_model.save_pretrained(f"./saved_peft/{model_name}-lora")

In [15]:
lora_model

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): DistilBertForSequenceClassification(
      (distilbert): DistilBertModel(
        (embeddings): Embeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (transformer): Transformer(
          (layer): ModuleList(
            (0-5): 6 x TransformerBlock(
              (attention): MultiHeadSelfAttention(
                (dropout): Dropout(p=0.1, inplace=False)
                (q_lin): lora.Linear(
                  (base_layer): Linear(in_features=768, out_features=768, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.1, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768

In [16]:
# Merge the LoRA weights into the base layers and keep the returned plain model
merged_model = lora_model.merge_and_unload()
merged_model

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
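Merging folds each low-rank update into its base weight, which is why the printout shows plain `Linear` layers again. The toy sketch below illustrates the idea with small integer matrices and pure Python; it assumes the standard LoRA formulation `W' = W + (lora_alpha / r) * B @ A`, and the shapes and values are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by a scalar."""
    return [[s * x for x in row] for row in a]

# Toy dimensions: 3 inputs, 2 outputs, rank 2. alpha is chosen so that the
# scaling factor matches the notebook's lora_alpha / r = 32 / 8 = 4.
r, lora_alpha = 2, 8
scaling = lora_alpha / r

W = [[1, 0, 2], [0, 1, -1]]  # base weight (out x in)
A = [[1, 2, 0], [0, -1, 1]]  # LoRA A (r x in)
B = [[1, 0], [2, 1]]         # LoRA B (out x r)
x = [[1], [2], [3]]          # input column vector

# Unmerged forward pass: base output plus the scaled low-rank branch.
adapter_out = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged forward pass: fold the update into the weight once, then multiply.
W_merged = add(W, scale(matmul(B, A), scaling))
merged_out = matmul(W_merged, x)

print(adapter_out == merged_out)
```

Because the two paths are algebraically identical, the merged model needs no extra matrices at inference time. This sketch ignores dropout, which is inactive in evaluation mode.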
 

In [17]:
# Take the first example from the test split and check its prediction
samples = tokenized_dataset["test"]
sample = samples[0]

# Predict the label, placing the inputs on the same device as the model
inputs = tokenizer(sample["text"], return_tensors="pt").to(next(lora_model.parameters()).device)
outputs = lora_model(**inputs)
predicted_label_id = np.argmax(outputs.logits[0].detach().cpu().numpy())
predicted_label = id2label[predicted_label_id]

print("Text:", sample["text"])
print("Predicted label:", predicted_label)

Text: i came to china feeling a little frightened of everything around me
Predicted label: fear


## Performing Inference with a PEFT Model

The cells below spot-check a custom sentence with the fine-tuned model, then load the saved PEFT model weights from disk and confirm that the reloaded model predicts as expected. For comparison with the results from prior to fine-tuning, evaluation accuracy rose from 0.168 to 0.706.

In [18]:
# Check the prediction for a custom sentence that is not in the dataset
sample = {'text':'I am feeling very happy that I finished this project.'}

# Predict the label
inputs = tokenizer(sample['text'], return_tensors="pt").to(next(lora_model.parameters()).device)
outputs = lora_model(**inputs)
predicted_label_id = np.argmax(outputs.logits[0].detach().cpu().numpy())
predicted_label = id2label[predicted_label_id]

print("Text:", sample["text"])
print("Predicted label:", predicted_label)

Text: I am feeling very happy that I finished this project.
Predicted label: joy
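The prediction cells take the argmax of the logits, which yields a label but no indication of confidence. A minimal sketch of turning logits into a label plus a softmax probability is shown below; `hypothetical_logits` is made-up data for illustration, not output from the model, and `top_label` is a hypothetical helper.

```python
import math

id2label = {0: 'sadness', 1: 'joy', 2: 'love', 3: 'anger', 4: 'fear', 5: 'surprise'}

def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits):
    """Return the most likely label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Made-up logits for illustration only; real values would come from outputs.logits[0].
hypothetical_logits = [-1.2, 3.1, 0.4, -0.8, -1.5, -0.3]
label, confidence = top_label(hypothetical_logits)
print(f"Predicted label: {label} (p = {confidence:.3f})")
```

With real model output, the list would be obtained from `outputs.logits[0].detach().cpu().tolist()`.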


In [19]:
f"{model_name}-lora"

'distilbert-base-uncased-finetuned-sst-2-english-lora'

In [20]:
from transformers.utils import logging
logging.set_verbosity_error()

from peft import AutoPeftModelForSequenceClassification

# Load the saved LoRA adapter together with its base model and check a prediction
loaded_lora_model = AutoPeftModelForSequenceClassification.from_pretrained(
    f"saved_peft/{model_name}-lora",
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
)
sample = {'text':'i am fearful of the dark'}

# Predict the label
inputs = tokenizer(sample['text'], return_tensors="pt")
outputs = loaded_lora_model(**inputs)
predicted_label_id = np.argmax(outputs.logits[0].detach().cpu().numpy())
predicted_label = id2label[predicted_label_id]

print("Text:", sample["text"])
print("Predicted label:", predicted_label)

Text: i am fearful of the dark
Predicted label: fear
