# Lightweight Fine-Tuning Project

## Dataset

- tweet: the text of a tweet that originally contained an emoji (the emoji itself has been removed).

- emoji: the name of the emoji that was removed from the corresponding tweet.

Note: There are 10 unique emojis in total, so the classifier uses 10 labels.
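Because the classifier needs integer class ids, each emoji name has to be mapped to a label index. The following is a minimal sketch of such a mapping. The list `names` is an illustrative sample built from emoji names that appear in the excerpts in this notebook; it is not the full dataset.

```python
# Emoji names as they might appear in the `emoji` column
# (illustrative sample, not the full dataset).
names = ["heart_eyes", "yum", "sob", "blush", "sob", "weary", "wink", "yum"]

# Keep first-appearance order while dropping duplicates.
unique_names = list(dict.fromkeys(names))

# Forward and reverse lookups between emoji names and integer labels.
emoji2id = {name: idx for idx, name in enumerate(unique_names)}
id2emoji = {idx: name for name, idx in emoji2id.items()}

print(emoji2id)
print(len(emoji2id))  # number of labels the classification head needs
```

On the real data the same mapping would contain 10 entries, one per emoji.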

# EDA

- Removed special characters such as @, %, and ^.
- Kept hashtags, since they may carry useful signal for classification.
- Removed tags, which did not seem important for text classification.
- Removed extra spaces in the text.
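The cleaning steps listed above could be sketched roughly as follows. The notebook does not show the original cleaning code, so the function name `clean_tweet`, the reading of "tags" as `@user` mentions, and the exact set of characters treated as "special" are all assumptions made for illustration.

```python
import re

MENTION_RE = re.compile(r"@\w+")           # assumption: "tags" means @user mentions
SPECIAL_RE = re.compile(r"[^\w\s#'!?.,]")  # assumption: keep words, '#', basic punctuation
SPACE_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Drop mentions and special characters, keep hashtags, collapse spaces."""
    text = MENTION_RE.sub(" ", text)   # remove tags first, before '@' is stripped
    text = SPECIAL_RE.sub(" ", text)   # remove characters such as %, ^
    return SPACE_RE.sub(" ", text).strip()

print(clean_tweet("@bob  so good %^ #starbucks"))
```

Mentions are removed before the special-character pass so that stripping `@` does not leave the user name behind as an ordinary word.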

<br>

No further text processing was applied to the tweet column. Stemming, lemmatization, or pretrained word embeddings such as GoogleNews-vectors-negative300 might have improved accuracy, but the text was kept as is to see what accuracy the transformer models reach on their own.


## Summary

* PEFT technique: **LoRA**
* Model: **GPT-2** (`gpt2` checkpoint)
* Evaluation approach: **Hugging Face `Trainer`** (accuracy and weighted precision, recall, and F1)
* Fine-tuning dataset: **Twitter emoji**

# Installing required libraries

In [1]:
!pip install transformers
!pip install peft
!pip install datasets
!pip install pandas
!pip install numpy
!pip install scikit-learn
!pip install tqdm


# Importing required libraries

In [7]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, EvalPrediction
from datasets import Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np
from transformers import DataCollatorWithPadding
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification
import torch
import tqdm

### Loading dataset

In [4]:
df = pd.read_csv("twitter.csv")
df.head()

| Unnamed: 0.1 | Unnamed: 0 | tweet | emoji |
|---|---|---|---|
| 0 | 0 | bet you'll get hungry | heart_eyes |
| 1 | 1 | starbucks employee confuses boyfriend by sayin... | yum |
| 2 | 2 | when your starbucks store makes you an iced mo... | sob |
| 3 | 3 | being told "girl your romper looks fierce!" at... | blush |
| 4 | 4 | i got a starbucks drink at school today, shit ... | sob |


## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [5]:
# Encode the emoji labels into numerical format
unique_emojis = df['emoji'].unique()
emoji2id = {emoji: id for id, emoji in enumerate(unique_emojis)}
id2emoji = {id: emoji for emoji, id in emoji2id.items()}

# Add a new column for the encoded labels
df['label'] = df['emoji'].map(emoji2id)

# Split the dataset into training and validation sets
# (random_state is fixed so the split is reproducible; 42 is an arbitrary choice)
train_df, val_df = train_test_split(df, test_size=0.1, stratify=df['label'], random_state=42)

# Convert the dataframes into Hugging Face datasets
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)

# Define the tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tokenize the tweets and attach the integer labels
# (note: padding="max_length" pads every tweet to 512 tokens)
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(examples['tweet'], padding="max_length", truncation=True, max_length=512)
    tokenized_inputs['labels'] = examples['label']
    return tokenized_inputs

train_dataset = train_dataset.map(tokenize_and_encode, batched=True)
val_dataset = val_dataset.map(tokenize_and_encode, batched=True)


Map: 100%|█████████████████████| 202797/202797 [00:52<00:00, 3894.56 examples/s]
Map: 100%|███████████████████████| 22534/22534 [00:05<00:00, 4230.26 examples/s]
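The tokenizer call above pads every tweet to 512 tokens, which leaves nothing for `DataCollatorWithPadding` to do later. The sketch below compares the number of token positions processed under fixed-length padding with padding to the longest example in each batch. The token counts in `token_lengths` are made-up values for short tweets, not measurements from this dataset.

```python
# Hypothetical token counts for one batch of short tweets (assumed values).
token_lengths = [7, 12, 9, 21, 15, 6, 18, 11]

MAX_LENGTH = 512  # value passed to the tokenizer in the cell above

# padding="max_length": every example is padded to MAX_LENGTH.
fixed_tokens = len(token_lengths) * MAX_LENGTH

# Dynamic padding: every example is padded to the longest one in the batch.
dynamic_tokens = len(token_lengths) * max(token_lengths)

print(fixed_tokens, dynamic_tokens)
print(f"fixed padding processes {fixed_tokens / dynamic_tokens:.1f}x more positions")
```

If real tweets are similarly short, dropping `padding="max_length"` and letting the collator pad per batch would be expected to reduce training time noticeably, although this was not measured here.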


In [8]:
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=len(unique_emojis))
model.config.pad_token_id = tokenizer.pad_token_id

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

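# Start training
# (this updates all GPT-2 weights for one epoch, so the evaluation below
# measures a fully fine-tuned model, not the untouched checkpoint)
trainer.train()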

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 1.5625 | 1.50156 | 0.473152 | 0.461392 | 0.512691 | 0.473152 |


Evaluation Results: {'eval_loss': 1.5015604496002197, 'eval_accuracy': 0.4731516819029023, 'eval_f1': 0.4613924705482957, 'eval_precision': 0.5126908954309577, 'eval_recall': 0.4731516819029023, 'eval_runtime': 208.1504, 'eval_samples_per_second': 108.258, 'eval_steps_per_second': 1.696, 'epoch': 1.0}
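In the results above, recall is identical to accuracy. This follows from `average='weighted'`: weighting each class's recall by its share of the samples sums the true positives over all samples, which is exactly accuracy. The sketch below re-implements the weighted metrics from scratch to show this. It is an illustration written for this note, not scikit-learn's code, and the toy labels are made up.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # class share of all samples
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return accuracy, precision, recall, f1

# Toy labels (made up for illustration, not taken from the dataset).
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(acc, prec, rec, f1)
```

Because weighted recall carries no information beyond accuracy, the precision and F1 columns are the more informative additions in these tables.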


In [9]:
model

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=10, bias=False)
)

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [10]:
# PEFT model configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=4,
    lora_alpha=16,
    lora_dropout=0.1
)

# Load the pre-trained GPT-2 model
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=len(unique_emojis))
model.config.pad_token_id = model.config.eos_token_id

peft_model = PeftModelForSequenceClassification(model, peft_config)

# Print the number of trainable parameters
peft_model.print_trainable_parameters()


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 155,136 || all params: 124,602,624 || trainable%: 0.12450460112300685
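The trainable parameter count reported above can be reproduced by hand. The sketch below assumes that LoRA is applied only to the `c_attn` projection in each of the 12 blocks (believed to be the PEFT default for GPT-2, since `LoraConfig` here sets no `target_modules`) and that the `score` classification head is trained in full. The arithmetic matching the printed number supports that assumption but does not prove it.

```python
# Assumptions: LoRA adapters sit only on each block's `c_attn` projection,
# and the `score` head (768 -> 10, no bias) is fully trainable.
hidden = 768          # GPT-2 hidden size, from the model printout
qkv_out = 3 * hidden  # c_attn produces query, key and value together
n_layers = 12
r = 4                 # LoRA rank from LoraConfig
n_labels = 10

# LoRA adds two small matrices per adapted layer: A (r x in) and B (out x r).
lora_per_layer = r * hidden + qkv_out * r
lora_params = n_layers * lora_per_layer
head_params = hidden * n_labels

trainable = lora_params + head_params
total_reported = 124_602_624  # "all params" from the output above

print(trainable)
print(round(100 * trainable / total_reported, 4))
```

Under these assumptions the adapters contribute 147,456 parameters and the classification head 7,680, so almost all of GPT-2 stays frozen during PEFT training.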




In [11]:
# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results/peft_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/peft_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | 1.9241 | 1.906503 | 0.338733 | 0.288626 | 0.40975 | 0.338733 |


Evaluation Results: {'eval_loss': 1.9065030813217163, 'eval_accuracy': 0.33873258187627586, 'eval_f1': 0.288626218178022, 'eval_precision': 0.40974959176114145, 'eval_recall': 0.33873258187627586, 'eval_runtime': 219.9013, 'eval_samples_per_second': 102.473, 'eval_steps_per_second': 1.605, 'epoch': 1.0}


In [12]:
peft_model.save_pretrained('model/peft_model')

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [14]:
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_model",
    num_labels=len(unique_emojis)
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [15]:
trainer = Trainer(
    model=inference_model,
    args=training_args,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Evaluation Results: {'eval_loss': 1.9065030813217163, 'eval_accuracy': 0.33873258187627586, 'eval_f1': 0.288626218178022, 'eval_precision': 0.40974959176114145, 'eval_recall': 0.33873258187627586, 'eval_runtime': 219.2801, 'eval_samples_per_second': 102.764, 'eval_steps_per_second': 1.61}


In [16]:
def predict(sentence: str) -> str:
    """Return the predicted emoji name for a single sentence."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    inference_model.to(device)
    inference_model.eval()  # make sure dropout is disabled

    # Prepare the input text
    inputs = tokenizer(sentence, return_tensors="pt").to(device)

    # Get predictions
    with torch.no_grad():
        logits = inference_model(**inputs).logits

    # argmax over the class dimension; softmax is not needed to pick the label
    predicted_class_id = logits.argmax(dim=1).item()
    return id2emoji[predicted_class_id]

# Example usage
sentence = "I'm sad and i wanna cry"
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")


Sentence: 'I'm sad and i wanna cry'
Predicted label: sob
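The `predict` function maps the model's logits to a label by picking the highest-scoring class. The sketch below shows that step in plain Python and checks that applying softmax does not change which class wins. Softmax is only needed when the probability itself is of interest. The logits and the three-entry mapping `toy_id2emoji` are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits and a three-label mapping, for illustration only.
logits = [0.2, 1.7, -0.4]
toy_id2emoji = {0: "heart_eyes", 1: "sob", 2: "yum"}

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

# Softmax is monotonic, so the argmax of the probabilities equals the
# argmax of the raw logits.
assert best == max(range(len(logits)), key=logits.__getitem__)
print(toy_id2emoji[best], round(probs[best], 3))
```

Returning the probability alongside the label would make it easier to spot low-confidence predictions such as the food-related examples that follow.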


In [17]:
sentence = "That was delicious!!"
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'That was delicious!!'
Predicted label: heart_eyes


In [18]:
sentence = "it was yummy"
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'it was yummy'
Predicted label: heart_eyes


In [19]:
sentence = "I love you!"
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'I love you!'
Predicted label: heart_eyes


In [20]:
indices_for_review = [0, 1, 2, 3, 4]

for idx in indices_for_review:
    item = val_dataset[idx]


    print(item['tweet'][:100])
    actual_label_id = item['label']
    actual_label = id2emoji[actual_label_id]
    print(f'label:  {actual_label}')

    # Tokenize the text
    inputs = tokenizer(item['tweet'], return_tensors="pt").to(inference_model.device)

    with torch.no_grad():
        logits = inference_model(**inputs).logits

    predictions = torch.argmax(logits, dim=1).item()
    predicted_label = id2emoji[predictions]
    print(f'prediction: {predicted_label}\n')


i miss you so much but i bet i don't even cross your mind
label:  weary
prediction: sob

missed walmart #aldubdatekay
label:  heart_eyes
prediction: heart_eyes

you bet
label:  yum
prediction: wink

starbucks pumpkin spice latte is back, yes
label:  heart_eyes
prediction: yum

so the media isn't going to promote this movie?! bet. y'all gone see me at the theatre on august 26t
label:  blush
prediction: sob
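A quick tally of the five examples above makes the errors easier to read. The label and prediction pairs below are copied from that output. Five samples are far too few to draw conclusions, but the same tally over the full validation set would amount to a confusion matrix.

```python
from collections import Counter

# (true label, predicted label) pairs copied from the five examples above.
pairs = [
    ("weary", "sob"),
    ("heart_eyes", "heart_eyes"),
    ("yum", "wink"),
    ("heart_eyes", "yum"),
    ("blush", "sob"),
]

correct = sum(true == pred for true, pred in pairs)
# Count each distinct (true, predicted) mistake.
confusions = Counter((true, pred) for true, pred in pairs if true != pred)

print(f"{correct}/{len(pairs)} correct")
for (true, pred), count in confusions.items():
    print(f"{true} -> {pred}: {count}")
```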



### **Overall Summary:**


| Metric                   | GPT-2 (full fine-tuning) | GPT-2 + LoRA (PEFT)    |
|--------------------------|--------------------------|------------------------|
| Training Loss            | 1.5625                   | 1.9241                 |
| Validation Loss          | 1.5016                   | 1.9065                 |
| Accuracy                 | 47.32%                   | 33.87%                 |
| F1 Score (weighted)      | 46.14%                   | 28.86%                 |
| Precision (weighted)     | 51.27%                   | 40.97%                 |
| Recall (weighted)        | 47.32%                   | 33.87%                 |
| Training Duration        | 1 hr 33 min              | 1 hr 25 min            |
| Eval Runtime             | 3 min 28 sec             | 3 min 39 sec           |
| Eval Samples per Second  | 108.258                  | 102.473                |
| Eval Steps per Second    | 1.696                    | 1.605                  |
| Epochs                   | 1.0                      | 1.0                    |


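Conclusions:

1. Full fine-tuning of GPT-2 was clearly more effective than LoRA on this task and dataset: 47.32% versus 33.87% accuracy after one epoch. Note that the GPT-2 column is not a pre-fine-tuning baseline, because that run also called `trainer.train()`. What the comparison shows is that a parameter-efficient method does not automatically match full fine-tuning.
2. The dataset has 10 labels. A simpler task, such as binary sentiment analysis (positive or negative), would likely have been easier, and the hyperparameters used here (`r=4`, learning rate `2e-5`) might have suited it better. Tuning them might also improve PEFT accuracy on this task.
3. Training for more than one epoch might improve accuracy, or at least give more insight into how the two approaches compare.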
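To put the question about more epochs in perspective, the sketch below estimates how many optimizer steps one epoch represents with the settings used in this notebook, and how the warmup phase scales with the epoch count. It assumes a single device and no gradient accumulation. The choice of 3 epochs is only an example.

```python
import math

train_examples = 202_797   # from the tokenization progress bar
train_batch_size = 32      # per_device_train_batch_size
warmup_ratio = 0.1
# Assumption: one device and no gradient accumulation.

steps_per_epoch = math.ceil(train_examples / train_batch_size)

for epochs in (1, 3):
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    print(epochs, total_steps, warmup_steps)
```

With one epoch, roughly a tenth of all updates happen during learning-rate warmup, so a longer run gives the model proportionally more steps at the full learning rate.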