<a href="https://colab.research.google.com/github/akash166d/PEFT_fine_tuning_IMDB/blob/main/PEFT_fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

Choices made for this project:

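* PEFT technique: LoRA
* Model: GPT-2 (`gpt2`) with a sequence-classification head
* Evaluation approach: Hugging Face `Trainer` evaluation on a stratified 10% validation split (accuracy, weighted F1, precision, recall)
* Fine-tuning dataset: Women's Clothing E-Commerce Reviews (first 1,000 usable reviews), predicting the 1-5 star rating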
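LoRA leaves each pretrained weight matrix W frozen and learns a low-rank update. A layer then computes W x + (lora_alpha / r) * B A x, where A is r x d_in and B is d_out x r. The pure-Python sketch below illustrates that forward pass with toy sizes. It is not PEFT's implementation. The Gaussian initialisation of A is an assumption; only the zero initialisation of B matters for the point shown. Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly. Only r * (d_in + d_out) numbers are trained instead of d_in * d_out.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """LoRA-adapted linear map: h = W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank path: d_in -> r -> d_out
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

d_in, d_out = 32, 24   # toy sizes (GPT-2's c_attn maps 768 -> 2304)
r, alpha = 4, 16       # same rank and alpha as the notebook's LoraConfig

rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable, random init (assumed)
B = [[0.0] * r for _ in range(d_out)]                               # trainable, zero init
x = [rng.gauss(0, 1) for _ in range(d_in)]

# Because B starts at zero, the adapted layer initially matches the frozen one.
h_init = lora_forward(W, A, B, x, alpha, r)

# Parameter counts: a full update of W versus the two LoRA factors.
full_update_params = d_in * d_out     # 768
lora_params = r * (d_in + d_out)      # 224
```

Consider GPT-2's `c_attn` (768 -> 2304) with r = 4. The same formula gives 4 * (768 + 2304) = 12,288 adapter parameters per layer, versus 1,769,472 weights in the full matrix.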

**Libraries**

In [1]:
!pip install transformers peft datasets pandas numpy scikit-learn tqdm



In [2]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer, EvalPrediction
from datasets import Dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np
from transformers import DataCollatorWithPadding
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification
import torch
import tqdm

## Dataset

In [3]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [4]:

# ! ln -s "/content/drive/My Drive" "/content/MyDrive"
FILE_PATH = "/content/drive/My Drive/PEFT_fine_tuning/Womens Clothing E-Commerce Reviews.csv"

# Load the CSV and keep only the title, review text and star rating
data = pd.read_csv(FILE_PATH)
data = data[["Title", "Review Text", "Rating"]]
data = data[data.Title.notnull()].copy()  # copy() avoids pandas' SettingWithCopyWarning

# Model input is "<title>: /n/n <review text>". Note that "/n/n" is literal text here,
# not a line break ("\n\n" would insert one).
data["review"] = data["Title"] + ": /n/n " + data["Review Text"]
data = data.drop(columns=["Title", "Review Text"]).rename(columns={"Rating": "label"})
data = data.dropna()  # drops rows whose review text was missing
data = data.iloc[:1000]  # keep 1,000 reviews for faster processing

In [5]:
print("# of reviews: " + str(len(data)))
print("first review length (characters): " + str(len(data["review"].values[0])))

# of reviews: 1000
first review length (characters): 530


In [6]:
unique_rat = data["label"].unique()
# Map each distinct rating (1-5) to a contiguous class id (0-4). The ids follow the order in
# which ratings first appear, so id 0 is not necessarily rating 1; id2rat maps ids back.
rat2id = {rat: idx for idx, rat in enumerate(unique_rat)}
id2rat = {idx: rat for rat, idx in rat2id.items()}

In [7]:
# The classification head expects labels 0..num_labels-1, not the raw 1-5 ratings
data["label"] = data["label"].map(rat2id)
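`unique()` returns ratings in order of first appearance, so the mapping built above is not simply rating - 1. The sketch below contrasts that mapping with a sorted one. It uses a hypothetical list of ratings, not the dataset. Either mapping works for training, as long as `id2rat` translates predictions back. The sorted version makes class ids easier to read.

```python
# Hypothetical ratings, listed in the order they first appear (not taken from the dataset)
ratings = [3, 5, 5, 4, 1, 2, 5, 3]

# What the notebook does: ids follow first appearance, like pandas' unique()
first_seen = list(dict.fromkeys(ratings))            # [3, 5, 4, 1, 2]
rat2id_first_seen = {rat: i for i, rat in enumerate(first_seen)}

# Alternative: sort first so that id == rating - 1
rat2id_sorted = {rat: i for i, rat in enumerate(sorted(set(ratings)))}

# Either mapping is invertible, which is what predict() relies on via id2rat
id2rat_first_seen = {i: rat for rat, i in rat2id_first_seen.items()}
encoded = [rat2id_first_seen[r] for r in ratings]
decoded = [id2rat_first_seen[i] for i in encoded]
```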

In [8]:
data.head(2)

| index | label | review |
|---|---|---|
| 2 | 0 | Some major design flaws: /n/n I had such high ... |
| 3 | 1 | My favorite buy!: /n/n I love, love, love this... |


# Loading and Evaluating a Foundation Model

The cells below split the data and load the pre-trained GPT-2 tokenizer and model from Hugging Face. As a baseline, they then fully fine-tune all of GPT-2's weights on the same data, so the PEFT run can be compared against it.

## Tokenizer

In [9]:
# Split the dataset into training and validation sets
train_df, val_df = train_test_split(data, test_size=0.1, stratify=data['label'], random_state=14)
# Convert the dataframes into Hugging Face datasets
train_dataset = Dataset.from_pandas(train_df)
val_dataset = Dataset.from_pandas(val_df)
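`train_test_split(..., stratify=data['label'])` keeps each rating's share roughly the same in the training and validation splits. This matters when some ratings are much rarer than others. A minimal sketch of the idea in plain Python follows. The class counts are hypothetical, and scikit-learn's actual handling of rounding remainders across classes differs from this simple per-class `round`.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.1, seed=14):
    """Return (train_idx, test_idx) with each class keeping roughly its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                       # random order within each class
        n_test = round(len(idxs) * test_size)   # per-class share of the test split
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical class counts for 1,000 rows
labels = [0] * 520 + [1] * 200 + [2] * 150 + [3] * 80 + [4] * 50
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```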

In [10]:
# Define the tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token, so reuse end-of-text
# Tokenize (pad/truncate to 256 tokens) and attach the class ids as labels
def tokenize_and_encode(examples):
    tokenized_inputs = tokenizer(examples['review'], padding="max_length", truncation=True, max_length=256)
    tokenized_inputs['labels'] = examples['label']
    return tokenized_inputs

train_dataset = train_dataset.map(tokenize_and_encode, batched=True)
val_dataset = val_dataset.map(tokenize_and_encode, batched=True)
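GPT-2 classifies a sequence from the hidden state of its last real token, and it locates that token using `pad_token_id`. Here padding reuses `<|endoftext|>` (id 50256). The sketch below shows two ways of locating that position; depending on the transformers version, the library uses logic like one or the other. This is a simplification, not the library code, and the token ids are hypothetical. The two agree for right-padded reviews that contain no end-of-text token, which should hold for plain-text reviews.

```python
EOS_ID = 50256  # id of GPT-2's <|endoftext|> token, also used as the pad token in this notebook

def last_non_pad_index(input_ids, pad_id=EOS_ID):
    """Position of the last token that is not padding (-1 if everything is padding)."""
    for pos in range(len(input_ids) - 1, -1, -1):
        if input_ids[pos] != pad_id:
            return pos
    return -1

def first_pad_minus_one(input_ids, pad_id=EOS_ID):
    """Position just before the first pad token (last position if there is no padding)."""
    for pos, tok in enumerate(input_ids):
        if tok == pad_id:
            return pos - 1
    return len(input_ids) - 1

padded = [464, 3290, 318, 1049, EOS_ID, EOS_ID, EOS_ID, EOS_ID]  # hypothetical ids, right-padded
with_inner_eos = [464, EOS_ID, 318, 1049, EOS_ID, EOS_ID]         # end-of-text inside the text
```

For `with_inner_eos`, the two rules disagree (positions 3 and 0). This is why reusing the end-of-text token as padding is only safe when the text itself never contains it.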



## Model load and full fine-tuning baseline

In [13]:
# import torch
# torch.cuda.empty_cache()

model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=len(unique_rat))
model.config.pad_token_id = tokenizer.pad_token_id

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


# Accuracy plus support-weighted precision/recall/F1. zero_division=0 scores a rating that is
# never predicted as 0 instead of emitting an UndefinedMetricWarning.
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        p.label_ids, preds, average="weighted", zero_division=0
    )
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="/content/drive/My Drive/PEFT_fine_tuning/results",
    eval_strategy="epoch",  # older transformers releases name this evaluation_strategy
    learning_rate=2e-5,
    per_device_train_batch_size=16,  # a smaller batch size reduced GPU memory use considerably
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_dir='/content/drive/My Drive/PEFT_fine_tuning/logs',
    save_strategy="epoch",
    load_best_model_at_end=True,  # restores the checkpoint with the lowest validation loss
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | No log | 1.471181 | 0.49 | 0.375632 | 0.367428 | 0.49 |
| 2 | 3.020500 | 1.135193 | 0.52 | 0.419709 | 0.36676 | 0.52 |
| 3 | 3.020500 | 0.872374 | 0.66 | 0.608329 | 0.570967 | 0.66 |
| 4 | 0.882000 | 0.88745 | 0.62 | 0.589674 | 0.567878 | 0.62 |
| 5 | 0.882000 | 0.887188 | 0.62 | 0.605314 | 0.599471 | 0.62 |
| 6 | 0.736500 | 0.836669 | 0.65 | 0.623079 | 0.618086 | 0.65 |
| 7 | 0.736500 | 0.894663 | 0.62 | 0.618159 | 0.632084 | 0.62 |
| 8 | 0.626400 | 0.880312 | 0.65 | 0.640577 | 0.631667 | 0.65 |
| 9 | 0.550600 | 0.882055 | 0.65 | 0.637772 | 0.634907 | 0.65 |
| 10 | 0.550600 | 0.893122 | 0.66 | 0.642197 | 0.633602 | 0.66 |
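The F1, Precision and Recall columns are support-weighted averages over the five ratings (`average='weighted'`). The plain-Python sketch below mirrors that computation with `zero_division=0` on hypothetical labels. It is an illustration, not scikit-learn's code. It shows two effects visible in this log. First, a rating that is never predicted has undefined precision, which is what triggers the repeated `_warn_prf` (UndefinedMetricWarning) lines. Second, weighted recall always equals accuracy, which is why the Recall and Accuracy columns match in every row.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision/recall/F1, treating undefined values as 0."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision_w = recall_w = f1_w = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        n_pred = sum(p == c for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0  # never predicted: sklearn warns here
        recall = tp / support[c] if support[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support[c] / n                     # weight = share of true labels
        precision_w += weight * precision
        recall_w += weight * recall
        f1_w += weight * f1
    return {"accuracy": accuracy, "precision": precision_w, "recall": recall_w, "f1": f1_w}

y_true = [0, 0, 0, 1, 1, 2]  # hypothetical class ids
y_pred = [0, 0, 1, 1, 0, 0]  # class 2 is never predicted
scores = weighted_scores(y_true, y_pred)
```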




Evaluation Results: {'eval_loss': 0.836669385433197, 'eval_accuracy': 0.65, 'eval_f1': 0.6230794930875576, 'eval_precision': 0.6180857427716849, 'eval_recall': 0.65, 'eval_runtime': 1.6648, 'eval_samples_per_second': 60.066, 'eval_steps_per_second': 2.403, 'epoch': 10.0}
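`load_best_model_at_end=True` is set without a `metric_for_best_model`. In that case the Trainer keeps the checkpoint with the lowest validation loss. The final `trainer.evaluate()` therefore reports epoch 6 rather than epoch 10. The snippet below reproduces that choice from the logged values in the table above.

```python
# Validation loss and accuracy per epoch, copied from the baseline training log above
val_loss = {1: 1.471181, 2: 1.135193, 3: 0.872374, 4: 0.88745, 5: 0.887188,
            6: 0.836669, 7: 0.894663, 8: 0.880312, 9: 0.882055, 10: 0.893122}
val_acc = {1: 0.49, 2: 0.52, 3: 0.66, 4: 0.62, 5: 0.62,
           6: 0.65, 7: 0.62, 8: 0.65, 9: 0.65, 10: 0.66}

# Default selection: lowest eval loss wins
best_epoch = min(val_loss, key=val_loss.get)

# Selecting on accuracy instead; max() returns the first epoch that reaches the maximum
best_by_acc = max(val_acc, key=val_acc.get)
```

Selecting on accuracy instead would have picked epoch 3, the first epoch to reach 0.66. In the Trainer, that corresponds to setting `metric_for_best_model="accuracy"`.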




In [14]:
model

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2SdpaAttention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=5, bias=False)
)

# Performing Parameter-Efficient Fine-Tuning

The cells below wrap a freshly loaded GPT-2 in LoRA adapters and train it with the same settings as the baseline. They then save the PEFT model weights.

In [15]:
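# PEFT model configuration: rank-4 LoRA adapters on GPT-2's fused attention projection
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # also keeps the new classification head ("score") trainable
    inference_mode=False,
    r=4,
    lora_alpha=16,               # the adapter update is scaled by lora_alpha / r = 4
    lora_dropout=0.1,
    target_modules=["c_attn"],   # PEFT's default for GPT-2, made explicit here
    fan_in_fan_out=True,         # GPT-2 stores these weights in a Conv1D (in x out) layout
)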

# Load the pre-trained GPT-2 model
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=len(unique_rat))
model.config.pad_token_id = model.config.eos_token_id

peft_model = PeftModelForSequenceClassification(model, peft_config)

# Print
peft_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 151,296 || all params: 124,594,944 || trainable%: 0.1214
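The printed parameter counts can be reproduced by hand under three assumptions. LoRA targets only GPT-2's fused `c_attn` projection (768 -> 2304) in each of the 12 blocks. The new `score` head is trained in full. For the total, PEFT keeps both the original head and a trainable copy of it. Under those assumptions, both printed numbers match.

```python
n_layer, d_model, r, num_labels = 12, 768, 4, 5
vocab, n_ctx = 50257, 1024

# Trainable: LoRA factors on c_attn plus the classification head
lora_per_layer = r * d_model + r * (3 * d_model)  # A: 4 x 768, B: 2304 x 4
lora_total = n_layer * lora_per_layer
score_head = d_model * num_labels                 # Linear(768, 5, bias=False)
trainable = lora_total + score_head

def gpt2_block_params(d):
    """Weights and biases of one GPT-2 block (two LayerNorms, attention, MLP)."""
    layer_norms = 2 * (2 * d)
    c_attn = d * 3 * d + 3 * d
    attn_proj = d * d + d
    c_fc = d * 4 * d + 4 * d
    mlp_proj = 4 * d * d + d
    return layer_norms + c_attn + attn_proj + c_fc + mlp_proj

# Frozen GPT-2 body: token + position embeddings, 12 blocks, final LayerNorm
gpt2_body = vocab * d_model + n_ctx * d_model + n_layer * gpt2_block_params(d_model) + 2 * d_model
# Total: body + adapters + original head + trainable copy of the head (assumed)
all_params = gpt2_body + lora_total + 2 * score_head
trainable_pct = 100 * trainable / all_params
```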




In [16]:
# Accuracy plus support-weighted precision/recall/F1. zero_division=0 scores a rating that is
# never predicted as 0 instead of emitting an UndefinedMetricWarning.
def compute_metrics(p: EvalPrediction):
    preds = np.argmax(p.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        p.label_ids, preds, average="weighted", zero_division=0
    )
    return {"accuracy": accuracy_score(p.label_ids, preds), "f1": f1, "precision": precision, "recall": recall}

# Define the training arguments
training_args = TrainingArguments(
    output_dir="/content/drive/My Drive/PEFT_fine_tuning/results/peft_model",
    eval_strategy="epoch",  # older transformers releases name this evaluation_strategy
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_dir='/content/drive/My Drive/PEFT_fine_tuning/logs/peft_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Start training
trainer.train()

# Evaluate
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)



| Epoch | Training Loss | Validation Loss | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| 1 | No log | 3.48932 | 0.52 | 0.362632 | 0.278384 | 0.52 |
| 2 | 3.383100 | 3.079492 | 0.52 | 0.362632 | 0.278384 | 0.52 |
| 3 | 3.383100 | 2.570876 | 0.53 | 0.383366 | 0.391224 | 0.53 |
| 4 | 2.630800 | 2.012727 | 0.53 | 0.404655 | 0.467312 | 0.53 |
| 5 | 2.630800 | 1.724267 | 0.53 | 0.414896 | 0.407778 | 0.53 |
| 6 | 1.850100 | 1.645918 | 0.54 | 0.419161 | 0.450471 | 0.54 |
| 7 | 1.850100 | 1.608755 | 0.53 | 0.387988 | 0.325105 | 0.53 |
| 8 | 1.605200 | 1.574732 | 0.53 | 0.389703 | 0.321191 | 0.53 |
| 9 | 1.492800 | 1.563204 | 0.53 | 0.389703 | 0.321191 | 0.53 |
| 10 | 1.492800 | 1.557958 | 0.53 | 0.389703 | 0.321191 | 0.53 |




Evaluation Results: {'eval_loss': 1.5579581260681152, 'eval_accuracy': 0.53, 'eval_f1': 0.38970282849982096, 'eval_precision': 0.3211914893617021, 'eval_recall': 0.53, 'eval_runtime': 1.7026, 'eval_samples_per_second': 58.732, 'eval_steps_per_second': 2.349, 'epoch': 10.0}
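One way to read the PEFT scores is to compare them with a classifier that predicts the most frequent rating for every review. Suppose that rating made up a share p of the validation split. Then accuracy and weighted recall would both equal p, weighted precision would be p * p, and weighted F1 would be p * 2p / (1 + p). The sketch below evaluates this for a hypothetical p = 0.53; the validation class distribution is not shown in this notebook. The resulting precision and F1 land in the same range as the LoRA model's. This is consistent with the adapter having learned little beyond the most common rating.

```python
def always_majority_scores(p):
    """Accuracy and weighted P/R/F1 when every prediction is the majority class (share p)."""
    precision_major = p   # every prediction is that class; a fraction p of them are correct
    recall_major = 1.0    # every majority-class example is predicted correctly
    f1_major = 2 * precision_major * recall_major / (precision_major + recall_major)
    # Other classes get no predictions and no hits, so they contribute 0 (zero_division=0)
    return {"accuracy": p, "precision": p * precision_major,
            "recall": p * recall_major, "f1": p * f1_major}

hypothetical = always_majority_scores(0.53)  # assumes the top rating is 53% of the split
observed_lora = {"accuracy": 0.53, "f1": 0.389703, "precision": 0.321191, "recall": 0.53}
```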




## Save model

In [17]:
peft_model.save_pretrained('/content/drive/My Drive/PEFT_fine_tuning/model/peft_model')

## Performing Inference with a PEFT Model

The cells below reload the saved adapter weights and evaluate the PEFT model on the validation set. The comparison with the fully fine-tuned baseline is in the conclusions.

## Load the PEFT model

In [18]:
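# Rebuild GPT-2 with a 5-way head and apply the saved adapter. The "newly initialized:
# ['score.weight']" warning comes from constructing the base model; the trained head saved
# alongside the adapter replaces it, which is why the evaluation matches the training run.
inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "/content/drive/My Drive/PEFT_fine_tuning/model/peft_model",
    num_labels=len(unique_rat)
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id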

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [19]:
trainer = Trainer(
    model=inference_model,
    args=training_args,
    eval_dataset=val_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Evaluation Results: {'eval_loss': 1.5579581260681152, 'eval_accuracy': 0.53, 'eval_f1': 0.38970282849982096, 'eval_precision': 0.3211914893617021, 'eval_recall': 0.53, 'eval_runtime': 1.6799, 'eval_samples_per_second': 59.529, 'eval_steps_per_second': 2.381}




In [20]:
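def predict(sentence: str) -> int:
    """Return the predicted star rating (1-5) for a single review."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    inference_model.to(device)
    inference_model.eval()  # disable dropout for deterministic predictions

    # Tokenize with the same truncation length used during training
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=256).to(device)

    # Get predictions
    with torch.no_grad():
        logits = inference_model(**inputs).logits

    # Softmax does not change the argmax; it is kept in case class probabilities are wanted
    probabilities = torch.nn.functional.softmax(logits, dim=-1)
    predicted_class_id = probabilities.argmax(dim=-1).item()
    return int(id2rat[predicted_class_id])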
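Softmax is monotonic, so taking the argmax of the probabilities in `predict()` gives the same class as taking the argmax of the raw logits. The softmax is only useful if a confidence score is wanted. The plain-Python sketch below checks this on hypothetical logits. It then maps the winning id back to a star rating through a hypothetical `id2rat` table.

```python
import math

def softmax(logits):
    """Convert scores to probabilities; subtracting the max keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

id2rat_example = {0: 3, 1: 5, 2: 4, 3: 1, 4: 2}  # hypothetical first-appearance mapping
logits = [0.2, 2.1, 1.3, -0.5, -1.0]             # hypothetical scores for one review
probs = softmax(logits)
predicted_rating = id2rat_example[argmax(probs)]
```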



In [21]:
# Example usage
sentence = """Are people actually paying attention?:

I can't understand why people are being so critical. Is all you care for blood & gore?
The last episode was amazing yes, a thrilling experience that left me at the edge of my seat. Though everyone condemning this episode as a "filler" aren't paying attention.
There are so many plots being put together, where things are starting to make sense and add up to something greater.
Had we not had these episode prior, The Red Dragon and the Gold would not have been so successful. We saw the build up of conflicts, power tripping and ignorance, lead to a devastating battle that changed the whole direction of this show.
Patience is a virtue. I am eager & waiting to see the chaos that's about to unfold."""
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'Are people actually paying attention?:

I can't understand why people are being so critical. Is all you care for blood & gore?
The last episode was amazing yes, a thrilling experience that left me at the edge of my seat. Though everyone condemning this episode as a "filler" aren't paying attention.
There are so many plots being put together, where things are starting to make sense and add up to something greater.
Had we not had these episode prior, The Red Dragon and the Gold would not have been so successful. We saw the build up of conflicts, power tripping and ignorance, lead to a devastating battle that changed the whole direction of this show.
Patience is a virtue. I am eager & waiting to see the chaos that's about to unfold.'
Predicted label: 5


In [22]:
sentence = """The Dark Knight and Empire Strikes Back of our generation:
christianreedbrown-6514526 February 2024
I just got out of an early access showing and it was absolutely incredible. See for yourself in IMAX. The characters, acting, screenplay, world building, storytelling, score, actions sequences, cinematography, and everything in between make for a cinematic masterpiece. Denis Villeneuve provides a masterclass of filmmaking. The casting continuation was perfect all the way through, with great new add-ons. Timothee Chalamet is believable, raw and real as Paul Atriedes. He was flawless as the lead. Zendaya, Rebecca Ferguson, Javier Bardem, and Josh Brolin are fantastic per usual. Stellan Skarsgard and Dave Bautista continue their evil. Austin Butler steals the show as Feyd-Rautha, and Florence Pugh and Christopher Walken are solid fresh casts.
Overall, Dune: Part Two is an inspiring, visually stunning sci-fi spectacle and an incredible collision of myth, adventure, and destiny on a galactic scale. It's a fantastic piece of filmmaking, rarely seen in modern day cinema."""
predicted_label = predict(sentence)
print(f"Sentence: '{sentence}'\nPredicted label: {predicted_label}")

Sentence: 'The Dark Knight and Empire Strikes Back of our generation:
christianreedbrown-6514526 February 2024
I just got out of an early access showing and it was absolutely incredible. See for yourself in IMAX. The characters, acting, screenplay, world building, storytelling, score, actions sequences, cinematography, and everything in between make for a cinematic masterpiece. Denis Villeneuve provides a masterclass of filmmaking. The casting continuation was perfect all the way through, with great new add-ons. Timothee Chalamet is believable, raw and real as Paul Atriedes. He was flawless as the lead. Zendaya, Rebecca Ferguson, Javier Bardem, and Josh Brolin are fantastic per usual. Stellan Skarsgard and Dave Bautista continue their evil. Austin Butler steals the show as Feyd-Rautha, and Florence Pugh and Christopher Walken are solid fresh casts.
Overall, Dune: Part Two is an inspiring, visually stunning sci-fi spectacle and an incredible collision of myth, adventure, and destiny on 

# Conclusions

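The fully fine-tuned GPT-2 baseline outperforms the LoRA model on the validation split. The baseline reaches 65% accuracy and 0.62 weighted F1; the LoRA model reaches 53% accuracy and 0.39 weighted F1.

This is not because GPT-2 was already tuned for the task. GPT-2 is a base language model, not an instruction-tuned one, and in both runs the classification head (`score`) starts from random weights. The difference lies in how much of the model is trained. The baseline updates all ~124M parameters. LoRA does not add a new layer; it adds rank-4 update matrices to the attention projection and, with the head, trains about 151k parameters (0.12%).

With the same learning rate (2e-5) and only 900 training reviews, the adapters move little. The LoRA model's weighted precision (0.32) sits well below its recall (0.53), which suggests it mostly predicts a single rating. More data, more epochs, or a higher learning rate may narrow the gap, at the cost of extra compute; LoRA adapters are commonly trained with larger learning rates than full fine-tuning.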