# Lightweight Fine-Tuning Project

TODO: In this cell, describe your choices for each of the following

* PEFT technique: 
* Model: 
* Evaluation approach: 
* Fine-tuning dataset: 

## Load Libaries

In [2]:
%pip install scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.5/13.5 MB[0m [31m31.0 MB/s[0m eta [36m0:00:00[0m00:01[0m0:01[0m
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m301.8/301.8 kB[0m [31m35.0 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.6.0 threadpoolctl-3.5.0
Note: you may need to restart the kernel to use updated packages.


In [60]:
# Standard Libraries
import os

# Data Science Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Logger Libraries
from tqdm import tqdm
import bitsandbytes as bnb

# PyTorch Libraries
import torch
import torch.nn as nn

# Transformers and Datasets Libraries
import transformers
from transformers import RobertaTokenizer, RobertaForSequenceClassification, Trainer, TrainingArguments
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

from datasets import Dataset

# Fine-Tuning Libraries
from peft import LoraConfig, PeftConfig

from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          BitsAndBytesConfig, 
                          TrainingArguments, 
                          pipeline, 
                          logging)

## Load Data

Data is coming from Kaggle Challenge for NLP Tweets for Natural Disaster.

https://www.kaggle.com/c/nlp-getting-started

You are predicting whether a given tweet is about a real disaster or not. If so, predict a 1. If not, predict a 0.



In [4]:
# Load the dataset from a CSV file
df = pd.read_csv("./train.csv")
df.rename(columns={"text": "tweet"}, inplace=True)
df.head()

Unnamed: 0,id,keyword,location,tweet,target
0,1,,,Our Deeds are the Reason of this #earthquake M...,1
1,4,,,Forest fire near La Ronge Sask. Canada,1
2,5,,,All residents asked to 'shelter in place' are ...,1
3,6,,,"13,000 people receive #wildfires evacuation or...",1
4,7,,,Just got sent this photo from Ruby #Alaska as ...,1


In [61]:
df.shape

(5080, 5)

### Missing Values

In [5]:
missing_values = df.isnull().sum()

print("Number of Missing Values:", missing_values)

Number of Missing Values: id             0
keyword       61
location    2533
tweet          0
target         0
dtype: int64


In [10]:
df = df.dropna()
missing_values = df.isnull().sum()
print("Number of Missing Values:", missing_values)

Number of Missing Values: id          0
keyword     0
location    0
tweet       0
target      0
dtype: int64


### Train/Test/Validation Split

In [11]:
# Split the dataset into train and test sets
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)


In [13]:
df_train.target.value_counts()

target
0    2304
1    1760
Name: count, dtype: int64

## Loading and Evaluating a Foundation Model

TODO: In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [54]:
# Foundational Model
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

base_model_name = "distilbert-base-uncased"

#Tokenizer
tokenizer = DistilBertTokenizerFast.from_pretrained(base_model_name)

model = DistilBertForSequenceClassification.from_pretrained(
    base_model_name, 
    num_labels=2
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [19]:
# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples['tweet'], truncation=True, padding='max_length')

# Convert pandas DataFrame to Hugging Face Dataset
train_data = Dataset.from_pandas(df_train[["tweet", "target"]])
eval_data = Dataset.from_pandas(df_test[["tweet", "target"]])


In [20]:
# Apply the tokenizer to the training and evaluation sets
train_data = train_data.map(tokenize_function, batched=True)
eval_data = eval_data.map(tokenize_function, batched=True)


Map:   0%|          | 0/4064 [00:00<?, ? examples/s]

Map:   0%|          | 0/1016 [00:00<?, ? examples/s]

In [None]:
# Set the format for PyTorch
train_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'target'])
eval_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'target'])

# Rename the 'target' column to 'labels' as expected by the Trainer
train_data = train_data.rename_column("target", "labels")
eval_data = eval_data.rename_column("target", "labels")

In [None]:
from torch.utils.data import DataLoader

batch_size = 16

val_loader = DataLoader(eval_data, batch_size=batch_size, shuffle=False)


### Evaluate Foundational Model

In [29]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

all_preds = []
all_labels = []

with torch.no_grad():
    for batch in val_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)

        outputs = model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        preds = torch.argmax(logits, dim=1)

        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Compute Accuracy and F1
val_acc = accuracy_score(all_labels, all_preds)
val_f1 = f1_score(all_labels, all_preds, average="weighted")

print(f"Validation Accuracy (foundation model): {val_acc:.4f}")
print(f"Validation F1 (foundation model): {val_f1:.4f}")


Validation Accuracy (foundation model): 0.5768
Validation F1 (foundation model): 0.4315


## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [36]:
pip install --upgrade peft trl


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Defaulting to user installation because normal site-packages is not writeable
Collecting peft
  Downloading peft-0.14.0-py3-none-any.whl (374 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m374.8/374.8 kB[0m [31m3.6 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hCollecting trl
  Downloading trl-0.13.0-py3-none-any.whl (293 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m293.4/293.4 kB[0m [31m33.6 MB/s[0m eta [36m0:00:00[0m
Collecting huggingface-hub>=0.25.0
  Downloading huggingface_hub-0.27.0-py3-none-any.whl (450 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m450.5/450.5 kB[0m [31m13.4 MB/s[0m eta [36m0:00:00[0m
Collecting transformers
  Downloading transformers-4.47.1-py3-none-any.whl (10.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.1/10.1 MB[0m [31m76.4 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
[?25hCollecting rich
  Downloading rich-13.9.4-py3-none-any.whl (242 kB)
[2K 

      Successfully uninstalled tqdm-4.66.2
[0m  Attempting uninstall: requests
    Found existing installation: requests 2.31.0
    Uninstalling requests-2.31.0:
      Successfully uninstalled requests-2.31.0
[0m  Attempting uninstall: huggingface-hub
    Found existing installation: huggingface-hub 0.21.4
    Uninstalling huggingface-hub-0.21.4:
      Successfully uninstalled huggingface-hub-0.21.4
[0mSuccessfully installed accelerate-1.2.1 datasets-3.2.0 huggingface-hub-0.27.0 markdown-it-py-3.0.0 mdurl-0.1.2 peft-0.14.0 requests-2.32.3 rich-13.9.4 safetensors-0.5.0 tokenizers-0.21.0 tqdm-4.67.1 transformers-4.47.1 trl-0.13.0
Note: you may need to restart the kernel to use updated packages.


In [80]:
from peft import LoraConfig, get_peft_model
lora_config = LoraConfig(
    r=16,  # LoRA rank
    lora_alpha=32,  # Scaling factor
    target_modules = ["attention.q_lin", "attention.k_lin", "attention.v_lin"],
    lora_dropout=0.1,  # Dropout rate for LoRA
    bias="none",  # Bias mode
    task_type="SEQ_CLS"  # Task type
)

# Wrap model with PEFT
peft_model = get_peft_model(model, lora_config)


In [81]:
# Set up training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=2,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    evaluation_strategy="epoch"
)

In [82]:
# Initialize the Trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,
    compute_metrics=lambda p: {"accuracy": (p.predictions.argmax(-1) == p.label_ids).astype(float).mean().item()}
)

# Train the model
trainer.train()



Epoch,Training Loss,Validation Loss,Accuracy
1,0.2549,0.51032,0.820866
2,0.3596,0.533657,0.823819


Checkpoint destination directory ./results/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=508, training_loss=0.23244310317077035, metrics={'train_runtime': 354.3119, 'train_samples_per_second': 22.94, 'train_steps_per_second': 1.434, 'total_flos': 1102525620289536.0, 'train_loss': 0.23244310317077035, 'epoch': 2.0})

In [97]:
# Evaluate the model
eval_results = trainer.evaluate()

# Print evaluation results
print("Initial Evaluation Results:")
print(eval_results)

Initial Evaluation Results:
{'eval_loss': 0.5336573123931885, 'eval_accuracy': 0.8238188976377953, 'eval_runtime': 20.2849, 'eval_samples_per_second': 50.087, 'eval_steps_per_second': 3.155, 'epoch': 2.0}


### Save Model

In [98]:
output_dir = "./fine_tuned_model"
trainer.save_model(output_dir)

tokenizer.save_pretrained(output_dir)

print(f"Model and tokenizer saved to {output_dir}")

Model and tokenizer saved to ./fine_tuned_model


In [106]:
peft_model.save_pretrained("./fine_tuned_model")


## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [105]:
# Load the base model
m = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    torch_dtype=torch.bfloat16,  # Use bfloat16 for efficient memory usage
    device_map={"": 0}           # Map all layers to the first GPU (GPU 0)
)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [110]:
from peft import PeftModel
m = PeftModel.from_pretrained(m, "./fine_tuned_model")  # Load the adapter
tuned_model = m.merge_and_unload()                         # Merge adapter into base model and unload PEFT-specific layers


In [112]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tuned_model.to(device)
tuned_model.eval()

all_preds = []
all_labels = []

with torch.no_grad():
    for batch in val_loader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["labels"].to(device)

        outputs = model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        preds = torch.argmax(logits, dim=1)

        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Compute Accuracy and F1
val_acc = accuracy_score(all_labels, all_preds)
val_f1 = f1_score(all_labels, all_preds, average="weighted")

print(f"Validation Accuracy (Fine-Tune Model): {val_acc:.4f}")
print(f"Validation F1 (Fine-Tune Model): {val_f1:.4f}")


Validation Accuracy (Fine-Tune Model): 0.8238
Validation F1 (Fine-Tune Model): 0.8240


### Conclusion

Ou initial results were:

Base Model: distilbert-base-uncased  
Validation Accuracy (foundation model): 0.5768  
Validation F1 (foundation model): 0.4315

with the adapters and base model, we merged and evaluate and got the following:  
Validation Accuracy (Fine-Tune Model): 0.8238  
Validation F1 (Fine-Tune Model): 0.8240  

In conclusion, using PEFT we improved our model performance by almost double.