# Lightweight Fine-Tuning Project

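* PEFT technique: LoRA (Low-Rank Adaptation), via the Hugging Face `peft` library
* Model: `bert-base-uncased` with a 6-label sequence-classification head
* Evaluation approach: accuracy and weighted precision, recall, and F1 for the base model on the first 100 `test` examples; accuracy from `Trainer.evaluate()` on a 2,000-example held-out split for the LoRA model
* Fine-tuning dataset: `dair-ai/emotion` (first 10,000 `train` examples, split 80/20 into training and validation sets)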

## Fine-Tuning bert-base-uncased to Classify Emotions

A model that classifies a piece of text into one of six emotion categories (sadness, joy, love, anger, fear, surprise), trained on the "dair-ai/emotion" dataset.

https://huggingface.co/datasets/dair-ai/emotion

In [None]:
!pip install -q "datasets==2.15.0"

In [None]:
!pip install --upgrade "notebook"

In [None]:
!pip install --upgrade "ipywidgets"

In [None]:
!pip install --upgrade "scikit-learn"

### Load Dataset

In [1]:
from datasets import load_dataset

# First 100 examples of the test split, used to score the base model before fine-tuning
dataset = load_dataset("dair-ai/emotion", split="test[:100]")

No config specified, defaulting to: emotion/split
Found cached dataset emotion (C:/Users/ahmet.yaylalioglu/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)


### Load Foundation Model as Base Model

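The classification head is configured for this task: `num_labels=6`, plus `id2label`/`label2id` mappings for the six emotions. Since `bert-base-uncased` has no classification head, a new one is randomly initialized, as the warning below reports.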

In [64]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=6,
    id2label={0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"},
    label2id={"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5},
    ignore_mismatched_sizes=True,
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
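The six-label mapping above is written out by hand twice, once as `id2label` and once as `label2id`. A minimal sketch that derives both from a single list, assuming the order below matches the dataset's `ClassLabel` names (which `load_dataset("dair-ai/emotion", split="train").features["label"].names` can confirm):

```python
# Emotion names in label-index order (assumed to match the dataset's ClassLabel names)
EMOTIONS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# Index -> name and name -> index, built from one list so the two cannot drift apart
id2label = dict(enumerate(EMOTIONS))
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[5], label2id["anger"])  # surprise 3
```

These could then be passed to `from_pretrained` as `num_labels=len(EMOTIONS), id2label=id2label, label2id=label2id`.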


# 1) Evaluate Base Model Performance on the dataset

In [65]:
from transformers import pipeline

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

In [66]:
# Predict with the base model (its classifier head is still untrained)
predictions = []
for example in dataset:
    predictions.append(classifier(example["text"]))

In [67]:
true_labels = [example["label"] for example in dataset]  # gold label ids (0-5) from the dataset's 'label' column

In [68]:
# Each prediction looks like [{'label': 'surprise', 'score': 0.19729962944984436}]
predicted_labels = [pred[0]["label"] for pred in predictions]
# Convert label names back to ids using the mapping stored in the model config
predicted_labels_numerical = [model.config.label2id[label] for label in predicted_labels]

In [69]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Compute accuracy
accuracy = accuracy_score(true_labels, predicted_labels_numerical)
print(f"Accuracy: {accuracy:.4f}")

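# Compute support-weighted precision, recall, and F1 score;
# zero_division=0 scores classes the model never predicts as 0 precision instead of warning
precision, recall, f1, _ = precision_recall_fscore_support(
    true_labels, predicted_labels_numerical, average="weighted", zero_division=0)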
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

Accuracy: 0.1700
Precision: 0.1212
Recall: 0.1700
F1 Score: 0.1329


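With `average='weighted'`, each class's precision, recall, and F1 are averaged with weights proportional to that class's number of true examples. One consequence is that weighted recall always equals accuracy (the per-class supports cancel), which is why both read 0.1700 above. A small pure-Python sketch of that computation, written for illustration rather than taken from scikit-learn, treating classes with no predicted samples as precision 0 as `zero_division=0` does:

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 (like average='weighted', zero_division=0)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predicted examples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for c in labels:
        p_c = correct[c] / predicted[c] if predicted[c] else 0.0
        r_c = correct[c] / support[c] if support[c] else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n    # classes absent from y_true get weight 0
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 0, 0]
print(tuple(round(v, 4) for v in weighted_prf(y_true, y_pred)))  # (0.3333, 0.5, 0.4)
```

Here accuracy is 3/6 = 0.5, the same value as the weighted recall.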


# 2) Parameter-Efficient Fine-Tuning with LoRA

In [70]:
# Prepare the dataset for fine-tuning: take the first 10,000 training examples
# and hold out 20% of them (2,000) for validation
dataset_train = load_dataset("dair-ai/emotion", split="train[:10000]").train_test_split(
    test_size=0.2, shuffle=True, seed=23)

splits = ["train", "test"]

# View the dataset characteristics
dataset_train["train"]

No config specified, defaulting to: emotion/split
Found cached dataset emotion (C:/Users/ahmet.yaylalioglu/.cache/huggingface/datasets/dair-ai___emotion/split/1.0.0/cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd)
Loading cached split indices for dataset at C:\Users\ahmet.yaylalioglu\.cache\huggingface\datasets\dair-ai___emotion\split\1.0.0\cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd\cache-bedc4e92f51cb7c1.arrow and C:\Users\ahmet.yaylalioglu\.cache\huggingface\datasets\dair-ai___emotion\split\1.0.0\cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd\cache-d0686eef115903c0.arrow


Dataset({
    features: ['text', 'label'],
    num_rows: 8000
})

In [71]:
# Sample Data in Train Set
dataset_train["train"][3272]

{'text': 'i am happy to see that he is off with hopefully a good job but i can t help feel a little greedy',
 'label': 3}

In [72]:
# Tokenize all examples in the dataset
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset_train[split].map(
        lambda x: tokenizer(x["text"], truncation=True), batched=True)

# Columns in dataset for inspection purpose
print(tokenized_dataset["train"])
print(tokenized_dataset["test"])

Loading cached processed dataset at C:\Users\ahmet.yaylalioglu\.cache\huggingface\datasets\dair-ai___emotion\split\1.0.0\cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd\cache-3e6fd90d1dbe137c.arrow
Loading cached processed dataset at C:\Users\ahmet.yaylalioglu\.cache\huggingface\datasets\dair-ai___emotion\split\1.0.0\cca5efe2dfeb58c1d098e0f9eeb200e9927d889b5a03c67097275dfb5fe463bd\cache-d6f22233b84dee18.arrow


Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 8000
})
Dataset({
    features: ['text', 'label', 'input_ids', 'token_type_ids', 'attention_mask'],
    num_rows: 2000
})
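The tokenizer is called with `truncation=True` but no padding, so each example keeps its own length; padding happens per batch later, in `DataCollatorWithPadding`. A simplified sketch of that dynamic padding on plain Python lists (the real collator returns PyTorch tensors and handles more cases). The pad id 0 is BERT's `[PAD]` token id; the other token ids are illustrative:

```python
def pad_batch(features, pad_token_id=0):
    """Pad tokenized examples to the longest sequence in this batch (not to 512)."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "token_type_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["token_type_ids"].append(f["token_type_ids"] + [0] * n_pad)
        # Padded positions get attention_mask 0 so the model ignores them
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        batch["labels"].append(f["labels"])
    return batch


features = [
    {"input_ids": [101, 1045, 2572, 3407, 102], "token_type_ids": [0] * 5, "attention_mask": [1] * 5, "labels": 1},
    {"input_ids": [101, 3407, 102], "token_type_ids": [0] * 3, "attention_mask": [1] * 3, "labels": 1},
]
batch = pad_batch(features)
print(batch["input_ids"][1])       # [101, 3407, 102, 0, 0]
print(batch["attention_mask"][1])  # [1, 1, 1, 0, 0]
```

Padding to the longest sequence in each batch, rather than to a fixed length, avoids computing attention over many pad tokens when most texts are short.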


In [73]:
# Freeze all base-model parameters so that only the LoRA components added below are trained.
# (get_peft_model also freezes the base model, so this is a safeguard rather than a requirement.)
for param in model.parameters():
    param.requires_grad = False

In [74]:
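# Rename 'label' to 'labels', the argument name the model's forward() uses for targets.
# This resolved "ValueError: You should supply an encoding or a list of encodings", following
# https://stackoverflow.com/questions/78031519/how-to-resolve-valueerror-you-should-supply-an-encoding-or-a-list-of-encodings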
train_lora = tokenized_dataset['train'].rename_column('label', 'labels')
test_lora = tokenized_dataset['test'].rename_column('label', 'labels')

In [75]:
# Foundation Model Architecture
print(model)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12, element

In [76]:
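# LoRA configuration for the sequence-classification task
from peft import LoraConfig, get_peft_model, TaskType

config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=32,               # the update is scaled by lora_alpha / r = 4
    lora_dropout=0.1,            # dropout on the LoRA path during training
    bias="all",                  # also train every bias term in the model
    fan_in_fan_out=False,
    task_type=TaskType.SEQ_CLS,  # keeps the classifier head trainable and saved with the adapter
)
lora_model = get_peft_model(model, config)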
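LoRA keeps each targeted weight matrix W frozen and learns a low-rank update, so an adapted layer computes W x + b + (lora_alpha / r) · B A x, with B initialized to zero so training starts from the pretrained behavior. A NumPy sketch of one such layer using this config's `r=8` and `lora_alpha=32`; it is a simplification that omits the `lora_dropout` PEFT applies on the LoRA path and uses a small Gaussian initialization for A instead of PEFT's default:

```python
import numpy as np


class LoRALinearSketch:
    """Frozen linear layer W x + b plus a trainable low-rank update (alpha / r) * B A x."""

    def __init__(self, weight, bias, r=8, alpha=32, seed=0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                                    # frozen W, shape (out, in)
        self.bias = bias                                        # frozen b, shape (out,)
        self.A = rng.normal(0.0, 0.02, size=(r, in_features))   # trainable, shape (r, in)
        self.B = np.zeros((out_features, r))                    # trainable, zero-initialized
        self.scaling = alpha / r                                # 32 / 8 = 4.0

    def trainable_params(self):
        return self.A.size + self.B.size

    def __call__(self, x):
        base = x @ self.weight.T + self.bias
        update = (x @ self.A.T) @ self.B.T                      # (batch, r) -> (batch, out)
        return base + self.scaling * update


rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768))
b = np.zeros(768)
layer = LoRALinearSketch(W, b)
x = rng.normal(size=(2, 768))
print(np.allclose(layer(x), x @ W.T + b))  # True, because B starts at zero
print(layer.trainable_params(), W.size)    # 12288 589824
```

Only A and B (12,288 values here, against 589,824 in W) would receive gradients, which is what keeps LoRA's trainable-parameter count small.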

In [77]:
# LoRA model architecture
print(lora_model)

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): BertForSequenceClassification(
      (bert): BertModel(
        (embeddings): BertEmbeddings(
          (word_embeddings): Embedding(30522, 768, padding_idx=0)
          (position_embeddings): Embedding(512, 768)
          (token_type_embeddings): Embedding(2, 768)
          (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (encoder): BertEncoder(
          (layer): ModuleList(
            (0): BertLayer(
              (attention): BertAttention(
                (self): BertSelfAttention(
                  (query): Linear(
                    in_features=768, out_features=768, bias=True
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.1, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=768, out_features

In [78]:
lora_model.print_trainable_parameters()

trainable params: 402,438 || all params: 109,786,380 || trainable%: 0.36656459571761085
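The reported 402,438 trainable parameters can be reproduced by hand, assuming PEFT's default LoRA targets for BERT are the `query` and `value` projections (the printout above shows a LoRA adapter under `query`), that `TaskType.SEQ_CLS` keeps a trainable copy of the `classifier` head, and that `bias="all"` unfreezes every bias vector:

```python
hidden, rank, layers, intermediate, num_labels = 768, 8, 12, 3072, 6

# LoRA on the query and value projections of each layer: A (rank x 768) + B (768 x rank)
lora_params = layers * 2 * (rank * hidden + hidden * rank)  # 294,912

# bias="all": every bias vector in the BERT body is unfrozen.
# Per layer: query, key, value, attention-output dense, two LayerNorms and the output
# dense have 768-dim biases; the intermediate dense has a 3072-dim bias.
bias_per_layer = 7 * hidden + intermediate
bert_bias_params = hidden + layers * bias_per_layer + hidden  # embedding LayerNorm + layers + pooler = 102,912

# Trainable copy of the classifier head (weight + bias)
classifier_params = hidden * num_labels + num_labels  # 4,614

trainable = lora_params + bert_bias_params + classifier_params
all_params = 109_786_380  # total reported by print_trainable_parameters()
pct = 100 * trainable / all_params

print(trainable)     # 402438
print(f"{pct:.4f}")  # 0.3666
```

That the breakdown lands exactly on the reported figure supports the assumptions above; roughly a quarter of the trainable parameters are bias terms, so `bias="none"` would shrink the count noticeably.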


In [79]:
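# Disable Weights & Biases (wandb) logging on this local machine.
# Newer transformers versions prefer report_to="none" in TrainingArguments (see the warning below).
import os
os.environ["WANDB_DISABLED"] = "true"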

In [80]:
# Trainer setup
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments


def compute_metrics(eval_pred):
    # eval_pred holds the logits and the gold label ids for the evaluation set
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


# The Hugging Face Trainer class handles the PyTorch training and evaluation loop.
# Read more about it here: https://huggingface.co/docs/transformers/main_classes/trainer
trainer = Trainer(
    model=lora_model,
    args=TrainingArguments(
        output_dir="./data/emotion_classification_LoRa/model",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        # Evaluate and save a checkpoint after each epoch
        # (newer transformers releases rename evaluation_strategy to eval_strategy)
        evaluation_strategy="epoch",
        save_strategy="epoch",
        num_train_epochs=20,
        weight_decay=0.01,
        # Reload the checkpoint with the lowest validation loss when training ends
        load_best_model_at_end=True,
        remove_unused_columns=True,
    ),
    train_dataset=train_lora,
    eval_dataset=test_lora,
    tokenizer=tokenizer,
    # Pads each batch dynamically to its longest sequence
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer.train()

Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




TrainOutput(global_step=10000, training_loss=0.7724719161987305, metrics={'train_runtime': 1034.0852, 'train_samples_per_second': 154.726, 'train_steps_per_second': 9.67, 'total_flos': 3876050712812544.0, 'train_loss': 0.7724719161987305, 'epoch': 20.0})

In [81]:
# Evaluate the LoRA model on the held-out validation split
trainer.evaluate()

{'eval_loss': 0.48413366079330444,
 'eval_accuracy': 0.8245,
 'eval_runtime': 6.568,
 'eval_samples_per_second': 304.507,
 'eval_steps_per_second': 19.032,
 'epoch': 20.0}
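`trainer.evaluate()` reports only accuracy because `compute_metrics` returns nothing else, whereas the base model was also scored with weighted precision, recall, and F1. A sketch of an extended `compute_metrics` that reports the same four metrics using the scikit-learn functions already imported earlier. The two evaluations also use different data (100 `test` examples for the base model versus the 2,000-example held-out split here), so scoring both on the same tokenized dataset, for example via `trainer.evaluate(eval_dataset=...)`, would make the comparison direct:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(eval_pred):
    """Accuracy plus support-weighted precision/recall/F1, matching the base-model evaluation."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": float(accuracy_score(labels, preds)),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


# Toy logits for 4 examples over the 6 classes (illustrative values, not model output)
logits = np.array([
    [2.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.1, 3.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0, 0.0, 0.0],
])
labels = np.array([0, 1, 5, 3])
print(compute_metrics((logits, labels)))
```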

In [82]:
# Save the LoRA adapter: adapter weights, trained biases, and classifier head (not the frozen base model)
lora_model.save_pretrained("emotion_classifier-lora_v4")

# 3) Inference with the Saved Model

In [93]:
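# Load the saved adapter on top of a freshly loaded bert-base-uncased.
# The "newly initialized" warning below concerns the base model's own head; the trained
# classifier head saved with the adapter is then loaded on top of it.
from peft import AutoPeftModelForSequenceClassification
lora_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "emotion_classifier-lora_v4", num_labels=6, ignore_mismatched_sizes=True)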

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [96]:
# Take an example from the test set and tokenize it
text = dataset_train["test"][1]['text']
print(text)
inputs = tokenizer(text, return_tensors="pt")

i don t think that woman ever feels generous because she is too busy dying of love


In [97]:
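lora_model.eval()  # disable dropout for inference
with torch.no_grad():
    logits = lora_model(**inputs).logits  # shape (1, 6): one score per emotion

# Sequence classification yields one prediction per input text, not one per token
predicted_id = torch.argmax(logits, dim=-1).item()

In [98]:
# `model` is the base model from above, whose config holds the id2label mapping
print(model.config.id2label[predicted_id])

love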
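To also see how confident the classifier is, the logits can be turned into probabilities with a softmax. A small NumPy sketch; the helper name `decode_prediction` is hypothetical, and with the real model it would be called as `decode_prediction(logits.numpy(), model.config.id2label)`:

```python
import numpy as np


def decode_prediction(logits, id2label):
    """Map each row of (batch, num_labels) logits to (label, softmax probability)."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row max before exponentiating for numerical stability
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = exp / exp.sum(axis=-1, keepdims=True)
    ids = probs.argmax(axis=-1)
    return [(id2label[int(i)], float(probs[row, i])) for row, i in enumerate(ids)]


id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
# Illustrative logits, not actual model output
example_logits = [[-1.2, 0.3, 2.5, -0.8, -1.0, 0.1]]
label, prob = decode_prediction(example_logits, id2label)[0]
print(label, round(prob, 3))
```

Because softmax is monotonic, the top label is the same one `argmax` over the raw logits picks; the probability only adds a confidence score.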
