# Lightweight Fine-Tuning Project

### Project Summary
In this project, we demonstrate the following:

1. Load a pre-trained model and evaluate its performance
2. Perform parameter-efficient fine tuning using the pre-trained model
3. Perform inference using the fine-tuned model and compare its performance to the original model

## Lightweight Fine-Tuning Project - Summary



| Metric | Hugging Face GPT-2 (all weights trained) | PEFT (LoRA) GPT-2 |
| --- | --- | --- |
| Epochs | 1 | 1 |
| Training Loss | 0.248100 | 0.396800 |
| Validation Loss | 0.235470 | 0.481451 |
| Accuracy | 0.929000 | 0.776400 |
| F1 | 0.929000 | 0.768503 |
| Precision | 0.929000 | 0.817533 |
| Recall | 0.929000 | 0.776400 |

### Conclusions:

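1. Based on the results above, the GPT-2 model with all weights trained is more effective on the IMDB dataset than the PEFT (LoRA) model: 0.929 versus 0.776 accuracy after one epoch.
2. Fine-tuning GPT-2 with PEFT does not always increase accuracy. In this run it updated only about 0.12% of the parameters and scored lower on every metric.
3. Increasing the number of epochs might improve accuracy, but it incurs extra compute cost.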
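
To make the gap between the two models in the summary table concrete, the sketch below computes the per-metric difference. The names `full_ft`, `lora`, and `metric_deltas` are illustrative and do not appear in the notebook; the values are copied from the table.

```python
# Evaluation metrics after one epoch, copied from the summary table.
full_ft = {"accuracy": 0.929000, "f1": 0.929000, "precision": 0.929000, "recall": 0.929000}
lora = {"accuracy": 0.776400, "f1": 0.768503, "precision": 0.817533, "recall": 0.776400}


def metric_deltas(baseline: dict, candidate: dict) -> dict:
    """Return candidate - baseline for every metric present in both dicts."""
    return {
        name: round(candidate[name] - baseline[name], 6)
        for name in baseline
        if name in candidate
    }


deltas = metric_deltas(full_ft, lora)
for name, delta in deltas.items():
    # A negative value means the LoRA model scored lower than the baseline.
    print(f"{name:<10} {delta:+.4f}")
```

Precision drops the least, which suggests the LoRA model's errors are not evenly spread across the two classes.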

Choices made for this project:

- PEFT technique: LoRA
- Model: GPT-2
- Evaluation approach: Hugging Face `Trainer` with a PEFT model
- Fine-tuning dataset: Hugging Face IMDB

## Loading and Evaluating a Foundation Model

The cells below install the dependencies, load the chosen pre-trained Hugging Face model with its tokenizer and dataset, and establish the baseline used for comparison.

In [2]:
!pip install transformers
!pip install peft
!pip install datasets
!pip install pandas
!pip install numpy
!pip install scikit-learn
!pip install tqdm

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-learn
  Downloading scikit_learn-1.5.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.4 MB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.4.2 scikit-learn-1.5.1 threadpoolctl-3.5.0

In [3]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, EvalPrediction
from datasets import Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np
from transformers import DataCollatorWithPadding
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification
import torch
import tqdm
import os

## Loading the IMDB Datasets

Load the IMDB dataset and split it into train and test sets.

In [4]:
from datasets import load_dataset

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "caching_allocator"


# Load the 25,000-review IMDB train split and hold out 20% of it as a test set
dataset = load_dataset("imdb", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

splits = ["train", "test"]

# View the dataset characteristics
#dataset["train"]
#dataset["test"]
dataset

Using device: cuda


Downloading readme:   0%|          | 0.00/7.81k [00:00<?, ?B/s]

Downloading data: 100%|██████████| 21.0M/21.0M [00:00<00:00, 21.0MB/s]
Downloading data: 100%|██████████| 20.5M/20.5M [00:00<00:00, 24.7MB/s]
Downloading data: 100%|██████████| 42.0M/42.0M [00:01<00:00, 26.6MB/s]


Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 20000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 5000
    })
})

## Inspect the dataset

In [5]:
# Inspect the first example. Do you think this review is positive or negative?
dataset["train"][0]

{'text': 'The stories in this video are very entertaining, and it definately is worth a look! The first one concerns a young couple harrassed in the woods by two rednecks, with a great, but unexplained twist at the end.<br /><br />The seond is the best of the lot, and it alone, makes this worth watching - A man is attacked by a dog, which he fears to be rabid - He finds shelter in what appears to be a hospital, but he finds out the employees there are not exactly what they appear to be...... Great twist at the end, and this episode alone scores 10/10! If the others were up to par with this one, this would get 10/10!<br /><br />The third is the weakest of the bunch - A girl meets with some guys and has wild sex! There appears to be no point to the story until the end, with a good little twist, but it is spoiled by the awful first part!<br /><br />Never the less, this is a great movie that will not do you wrong at all! Well worth a rental!',
 'label': 1}

## Tokenize using the gpt2 tokenizer and Load the gpt2 model

In [11]:
#Pre-process Datasets
from transformers import AutoTokenizer

training_dataset=dataset["train"]
eval_dataset=dataset["test"]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tokenize and convert
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)
    tokenized_inputs['label'] = examples['label']
    return tokenized_inputs

train_dataset = training_dataset.map(tokenize_and_encode, batched=True)
val_dataset = eval_dataset.map(tokenize_and_encode, batched=True)

#Loading the foundation Model
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2).to(device)
model.config.pad_token_id = tokenizer.pad_token_id
for param in model.parameters():
    param.requires_grad = True

train_dataset


Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Dataset({
    features: ['text', 'label', 'input_ids', 'attention_mask'],
    num_rows: 20000
})
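
The tokenizer call above uses `padding="max_length"`, `truncation=True`, and `max_length=512`, so every review ends up as exactly 512 token ids plus an attention mask. The sketch below mimics that behaviour on toy data in plain Python. `pad_or_truncate` is a hypothetical helper and the short id lists are arbitrary; only the pad id 50256 (GPT-2's end-of-text token, reused here as the pad token) reflects the real setup.

```python
def pad_or_truncate(token_ids: list[int], max_length: int, pad_id: int) -> dict:
    """Mimic fixed-length encoding: cut long inputs, right-pad short ones."""
    ids = token_ids[:max_length]  # truncation
    n_pad = max_length - len(ids)
    return {
        "input_ids": ids + [pad_id] * n_pad,  # padding up to max_length
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 0 marks padding
    }


PAD_ID = 50256  # GPT-2 end-of-text id, used as pad token in this notebook

short = pad_or_truncate([10, 11, 12], max_length=6, pad_id=PAD_ID)
long = pad_or_truncate(list(range(10)), max_length=6, pad_id=PAD_ID)
print(short)
print(long)
```

`GPT2ForSequenceClassification` classifies from the last non-padding token, which is why the notebook also sets `model.config.pad_token_id`.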

## Print the GPT2 model details

In [6]:
print(model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)


## Evaluate the Foundation Model (GPT2)

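Train all GPT-2 weights for one epoch with the `Trainer`, save the model and tokenizer to `./results/model`, and then evaluate on the held-out test split. The metrics reported here therefore describe a GPT-2 model fine-tuned for one epoch, not the untouched pre-trained checkpoint.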

In [7]:
# Evaluation
# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    #data_collator=data_collator,
)

# Start training
trainer.train()

# save the model and tokenizer explicitly
model_output_dir = "./results/model"

model.save_pretrained(model_output_dir)
tokenizer.save_pretrained(model_output_dir)

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.2481,0.23547,0.929,0.929,0.929,0.929


Evaluation Results: {'eval_loss': 0.23546965420246124, 'eval_accuracy': 0.929, 'eval_f1': 0.9289998892332609, 'eval_precision': 0.9289998527905785, 'eval_recall': 0.929, 'eval_runtime': 173.8706, 'eval_samples_per_second': 28.757, 'eval_steps_per_second': 2.876, 'epoch': 1.0}
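
`compute_metrics` asks scikit-learn for `average='weighted'`, which weights each class by its share of the true labels. A side effect is that weighted recall always equals accuracy, which is why the Accuracy and Recall values in the results coincide. The pure-Python sketch below reproduces the calculation on made-up labels; `weighted_prf` is an illustrative helper, not part of the notebook.

```python
from collections import Counter


def weighted_prf(y_true: list[int], y_pred: list[int]) -> dict:
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / n
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = n / total  # class share of the true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = weighted_prf(y_true, y_pred)
print(metrics)
```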


## Training with PEFT
### Create the PEFT configuration

In [16]:
# PEFT model configuration
from peft import LoraConfig

# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "caching_allocator"

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=4,
    lora_alpha=16,
    lora_dropout=0.1
)

Using device: cuda
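
The `LoraConfig` above sets `r=4` and `lora_alpha=16`. As a rough illustration of what those two numbers control, the sketch below applies a LoRA-style update to a tiny weight matrix in plain Python: the frozen weight `w` is left alone, and a low-rank product scaled by `alpha / r` is added to its output. The helper names, matrix sizes, and values are made up for the example; this is not how PEFT implements it internally.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, lora_a, lora_b, r, alpha):
    """Compute x @ W + (alpha / r) * (x @ A @ B) for row-vector inputs."""
    base = matmul(x, w)
    update = matmul(matmul(x, lora_a), lora_b)
    scale = alpha / r
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]


# Toy sizes: 3 input features, 2 output features, rank 1 (the notebook uses r=4).
x = [[1.0, 2.0, 3.0]]
w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # frozen pre-trained weight
lora_a = [[0.01], [0.02], [0.03]]         # trainable down-projection (in -> r)
lora_b = [[0.0, 0.0]]                     # trainable up-projection, starts at zero

before = lora_forward(x, w, lora_a, lora_b, r=1, alpha=4)
lora_b = [[1.0, -1.0]]                    # pretend training moved B away from zero
after = lora_forward(x, w, lora_a, lora_b, r=1, alpha=4)
print(before, after)
```

Because the up-projection starts at zero, the adapted model initially produces the same outputs as the base model. With the notebook's settings the scale factor would be 16 / 4 = 4.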


## Converting a Transformers Model into a PEFT Model

In [18]:
# Load the pre-trained GPT-2 model

training_dataset=dataset["train"]
eval_dataset=dataset["test"]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tokenize and convert
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)
    tokenized_inputs['label'] = examples['label']
    return tokenized_inputs

train_dataset = training_dataset.map(tokenize_and_encode, batched=True)
val_dataset = eval_dataset.map(tokenize_and_encode, batched=True)

model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2).to(device)
model.config.pad_token_id = model.config.eos_token_id

lora_model = PeftModelForSequenceClassification(model, peft_config)

# Print
lora_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 150,528 || all params: 124,590,336 || trainable%: 0.1208183594592762
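
The trainable-parameter count can be roughly reconstructed by hand. The sketch below assumes that PEFT's default for GPT-2 adapts only the fused attention projection `c_attn` (768 inputs, 3 × 768 outputs) in each of the 12 blocks. That assumption is not visible in the printed output, so treat the result as a plausibility check rather than an exact accounting.

```python
HIDDEN = 768             # GPT-2 small hidden size (see the printed model)
N_LAYERS = 12            # number of GPT2Block modules
C_ATTN_OUT = 3 * HIDDEN  # assumed width of the fused query/key/value projection
R = 4                    # LoRA rank from peft_config

# Each adapted layer adds A (HIDDEN x R) and B (R x C_ATTN_OUT).
lora_per_layer = R * (HIDDEN + C_ATTN_OUT)
lora_total = N_LAYERS * lora_per_layer

score_head = HIDDEN * 2  # Linear(768 -> 2, bias=False)

reported_trainable = 150_528  # from print_trainable_parameters()
remainder = reported_trainable - lora_total
print(f"LoRA matrices: {lora_total:,}")
print(f"Remainder:     {remainder:,} ({remainder / score_head:.1f}x the score head)")
```

Under these assumptions the LoRA matrices account for 147,456 parameters. The remaining 3,072 is exactly twice the size of the `score` head, which would be consistent with the classification head also being trained, although the notebook output does not itemize this.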


## Training & Saving a Trained PEFT Model

In [19]:
# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results/lora_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/lora_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    #data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy,F1,Precision,Recall
1,0.3968,0.481451,0.7764,0.768503,0.817533,0.7764


Evaluation Results: {'eval_loss': 0.48145079612731934, 'eval_accuracy': 0.7764, 'eval_f1': 0.7685032853977475, 'eval_precision': 0.8175328322284016, 'eval_recall': 0.7764, 'eval_runtime': 181.1661, 'eval_samples_per_second': 27.599, 'eval_steps_per_second': 2.76, 'epoch': 1.0}
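
Both training runs in this notebook use the same `TrainingArguments`: 20,000 training examples, a batch size of 10, one epoch, `learning_rate=2e-5`, and `warmup_ratio=0.1`. The sketch below works out what that implies for the number of optimizer steps and the learning rate over time. It assumes a single device, no gradient accumulation, and the `Trainer`'s default linear schedule; the function names are illustrative.

```python
import math


def schedule_summary(n_examples: int, batch_size: int, epochs: int, warmup_ratio: float):
    """Return (total optimizer steps, warmup steps) for a simple training run."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


def linear_warmup_lr(step: int, peak_lr: float, total_steps: int, warmup_steps: int) -> float:
    """Linear ramp from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total_steps, warmup_steps = schedule_summary(20_000, 10, 1, 0.1)
print(total_steps, warmup_steps)
for step in (0, 100, 200, 1100, 2000):
    print(step, linear_warmup_lr(step, 2e-5, total_steps, warmup_steps))
```

With these settings the peak learning rate of 2e-5 is reached only at step 200 and then decays for the rest of the epoch, so the LoRA adapter sees fairly small updates throughout the run.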


In [20]:
lora_model.save_pretrained('model/peft_lora_model')

## Performing Inference with a PEFT Model

In [6]:
# Load the saved LoRA adapter on top of the base GPT-2 model for inference
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_lora_model",
    num_labels=2
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [7]:
# Re-create the tokenizer and the tokenized datasets for evaluating the reloaded PEFT model

training_dataset=dataset["train"]
eval_dataset=dataset["test"]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Tokenize and convert
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(examples['text'], padding="max_length", truncation=True, max_length=512)
    tokenized_inputs['label'] = examples['label']
    return tokenized_inputs

train_dataset = training_dataset.map(tokenize_and_encode, batched=True)
val_dataset = eval_dataset.map(tokenize_and_encode, batched=True)

# Compute metrics function
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(p.label_ids, preds, average='weighted')
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="./results/peft_model_inf",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/peft_model_inf',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)
trainer = Trainer(
    model=inference_model,
    args=training_args,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    #data_collator=data_collator,
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Map:   0%|          | 0/20000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Evaluation Results: {'eval_loss': 0.48145079612731934, 'eval_accuracy': 0.7764, 'eval_f1': 0.7685032853977475, 'eval_precision': 0.8175328322284016, 'eval_recall': 0.7764, 'eval_runtime': 183.0446, 'eval_samples_per_second': 27.316, 'eval_steps_per_second': 2.732}


In [11]:
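def predict(sentence: str) -> int:
    """Return the predicted class id (0 = negative, 1 = positive) for one review."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    inference_model.to(device)
    inference_model.eval()  # disable dropout so predictions are deterministic

    # Prepare the input text, truncating to the length used during training
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=512).to(device)

    # Get predictions
    with torch.no_grad():
        outputs = inference_model(**inputs)
        logits = outputs.logits

    # Softmax does not change the argmax; it is kept so probabilities are available
    probabilities = torch.nn.functional.softmax(logits, dim=1)
    predicted_class_id = probabilities.argmax().item()

    return predicted_class_id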

# Example usage
sentence = 'The stories in this video are very entertaining, and it definately is worth a look! The first one concerns a young couple harrassed in the woods by two rednecks, with a great, but unexplained twist at the end.<br /><br />The seond is the best of the lot, and it alone, makes this worth watching - A man is attacked by a dog, which he fears to be rabid - He finds shelter in what appears to be a hospital, but he finds out the employees there are not exactly what they appear to be...... Great twist at the end, and this episode alone scores 10/10! If the others were up to par with this one, this would get 10/10!<br /><br />The third is the weakest of the bunch - A girl meets with some guys and has wild sex! There appears to be no point to the story until the end, with a good little twist, but it is spoiled by the awful first part!<br /><br />Never the less, this is a great movie that will not do you wrong at all! Well worth a rental!'
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'The stories in this video are very entertaining, and it definately is worth a look! The first one concerns a young couple harrassed in the woods by two rednecks, with a great, but unexplained twist at the end.<br /><br />The seond is the best of the lot, and it alone, makes this worth watching - A man is attacked by a dog, which he fears to be rabid - He finds shelter in what appears to be a hospital, but he finds out the employees there are not exactly what they appear to be...... Great twist at the end, and this episode alone scores 10/10! If the others were up to par with this one, this would get 10/10!<br /><br />The third is the weakest of the bunch - A girl meets with some guys and has wild sex! There appears to be no point to the story until the end, with a good little twist, but it is spoiled by the awful first part!<br /><br />Never the less, this is a great movie that will not do you wrong at all! Well worth a rental!'
Predicted label: 1
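
The predicted label `1` corresponds to a positive review in the IMDB dataset (`0` is negative). The sketch below shows, in plain Python, how a pair of logits turns into a readable label and a confidence value. The logits are hypothetical, and `ID2LABEL` and `describe` are illustrative names that do not appear in the notebook.

```python
import math

ID2LABEL = {0: "negative", 1: "positive"}  # IMDB label convention


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits: list[float]) -> tuple[str, float]:
    """Map raw logits to (label name, probability of that label)."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[class_id], probs[class_id]


label, confidence = describe([-0.8, 1.3])  # hypothetical logits for one review
print(label, round(confidence, 3))
```

Returning the probability alongside the class id makes it easier to spot borderline predictions, which may matter given the LoRA model's lower accuracy.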
