# Lightweight Fine-Tuning Project

The choices made for this project:

* PEFT technique: LoRA
* Model: gpt2
* Evaluation approach: Hugging Face `Trainer`, reporting accuracy on a held-out 20% test split
* Fine-tuning dataset: sms_spam

## Loading and Evaluating a Foundation Model

In the cells below, load your chosen pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [None]:
# Install the required version of datasets in case you have an older version
# You will need to choose "Kernel > Restart Kernel" from the menu after executing this cell
#!pip install -q "datasets==2.15.0"
#!pip install transformers
#!pip install peft
#!pip install datasets
#!pip install pandas
#!pip install numpy
#!pip install scikit-learn
#!pip install tqdm

In [1]:
# Load the sms_spam dataset
# See: https://huggingface.co/datasets/sms_spam

from datasets import load_dataset

# The sms_spam dataset only has a train split, so we use the train_test_split method to split it into train and test
dataset = load_dataset("sms_spam", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

splits = ["train", "test"]

# View the dataset characteristics
print("Training dataset", dataset["train"])
print("Testing dataset", dataset["test"])

# Inspect the first example. Do you think this is spam or not?
dataset["train"][0]

Training dataset Dataset({
    features: ['sms', 'label'],
    num_rows: 4459
})
Testing dataset Dataset({
    features: ['sms', 'label'],
    num_rows: 1115
})


{'sms': 'Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE. KEEP UR SAME NUMBER, Get extra free mins/texts. Text YES for a call\n',
 'label': 1}

In [2]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Let's use a lambda function to tokenize all the examples
tokenized_dataset = {}
for split in splits:
    tokenized_dataset[split] = dataset[split].map(
        lambda x: tokenizer(x["sms"], truncation=True), batched=True
    )

# Inspect the available columns in the dataset
tokenized_dataset["train"]

Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 4459
})
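
The tokenization step above truncates but does not pad, so padding is left to `DataCollatorWithPadding` at batch time. The following is a minimal pure-Python sketch of that idea, assuming right-side padding; it is not the library's implementation. The helper name `pad_batch` and the toy token ids are hypothetical; 50256 is GPT-2's end-of-text token id, which this notebook reuses as the pad token.

```python
PAD_TOKEN_ID = 50256  # GPT-2's <|endoftext|> id, reused here as the pad token


def pad_batch(batch, pad_token_id=PAD_TOKEN_ID):
    """Right-pad every sequence in `batch` to the length of the longest one."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(list(ids) + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


toy_batch = [[40, 716, 994], [17250]]  # hypothetical token ids
padded = pad_batch(toy_batch)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Padding per batch rather than to one global maximum keeps short SMS messages cheap to process. `GPT2ForSequenceClassification` classifies from the last non-padding token, which is why the notebook also sets `pad_token_id` on the model config.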

In [3]:
from transformers import AutoModelForSequenceClassification

foundation_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
foundation_model.config.pad_token_id = tokenizer.pad_token_id


Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
print(foundation_model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)


In [5]:
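import numpy as np


def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    predictions, labels = eval_pred
    # Pick the class with the highest logit for each example
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}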
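
Accuracy alone can be misleading here: in the SMS Spam Collection, spam is a minority class (roughly 13% of messages), so a model that always predicts "not spam" still scores well. As a hedged complement to `compute_metrics`, this sketch computes precision, recall, and F1 for the spam class in plain Python. The function name `precision_recall_f1` and the toy labels are illustrative and not part of the notebook.

```python
def precision_recall_f1(predictions, labels, positive=1):
    """Precision, recall, and F1 for the `positive` class (here: spam = 1)."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p == positive and y == positive)
    fp = sum(1 for p, y in pairs if p == positive and y != positive)
    fn = sum(1 for p, y in pairs if p != positive and y == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: 8 legitimate messages, 2 spam, and a model that never predicts spam
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
toy_predictions = [0] * 10

toy_accuracy = sum(p == y for p, y in zip(toy_predictions, toy_labels)) / len(toy_labels)
toy_scores = precision_recall_f1(toy_predictions, toy_labels)
print(toy_accuracy)  # 0.8
print(toy_scores)    # recall and F1 are 0.0 despite 80% accuracy
```

Reporting these values next to accuracy would show whether a model is actually catching spam or merely predicting the majority class.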

## Training and evaluating the full gpt2 model (all parameters updated)

In [6]:
import numpy as np
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer

training_args = TrainingArguments(
    output_dir='./results/foundational_model',  # Output directory for model predictions and checkpoints
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    logging_dir='./logs/foundational_model',
    logging_steps=10,
    evaluation_strategy="epoch"
)

# Initialize the Trainer with compute_metrics
trainer = Trainer(
    model=foundation_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

trainer.train()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.0482,0.046051,0.991031


TrainOutput(global_step=140, training_loss=0.10274728012404272, metrics={'train_runtime': 66.5003, 'train_samples_per_second': 67.052, 'train_steps_per_second': 2.105, 'total_flos': 164527132114944.0, 'train_loss': 0.10274728012404272, 'epoch': 1.0})

In [7]:
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Evaluation Results: {'eval_loss': 0.04605131968855858, 'eval_accuracy': 0.9910313901345291, 'eval_runtime': 5.2555, 'eval_samples_per_second': 212.16, 'eval_steps_per_second': 6.66, 'epoch': 1.0}


## Performing Parameter-Efficient Fine-Tuning

In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [8]:
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification

# PEFT model configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=4,
    lora_alpha=16,
    lora_dropout=0.1
)

# Load a fresh pre-trained GPT-2 model to wrap with LoRA
model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
model.config.pad_token_id = model.config.eos_token_id

peft_model = PeftModelForSequenceClassification(model, peft_config)

# Report how many parameters remain trainable under LoRA
peft_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 150,528 || all params: 124,590,336 || trainable%: 0.1208183594592762
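
The trainable-parameter figure can be roughly cross-checked by hand. The sketch below assumes PEFT's default LoRA target for GPT-2 is the `c_attn` projection in each of the 12 blocks (768 inputs, 2,304 outputs); each adapted matrix then gains `r * (d_in + d_out)` weights. Under that assumption LoRA accounts for 147,456 of the 150,528 reported parameters. The remaining 3,072 are presumably tied to the classification head, which this sketch does not model.

```python
def lora_param_count(d_in, d_out, r, n_layers):
    """Weights added by LoRA: an (r x d_in) matrix A and a (d_out x r) matrix B per layer."""
    return n_layers * r * (d_in + d_out)


# Values from the notebook: r=4, lora_alpha=16, GPT-2 small with 12 blocks
lora_params = lora_param_count(d_in=768, d_out=2304, r=4, n_layers=12)
scaling = 16 / 4  # LoRA scales the low-rank update by lora_alpha / r

reported_trainable = 150_528
reported_total = 124_590_336
remainder = reported_trainable - lora_params
trainable_pct = 100 * reported_trainable / reported_total

print(lora_params)              # 147456
print(remainder)                # 3072
print(round(trainable_pct, 4))  # 0.1208
```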


## PEFT Training and Evaluation

In [9]:
# The HuggingFace Trainer class handles the training and eval loop for PyTorch for us.
# Read more about it here https://huggingface.co/docs/transformers/main_classes/trainer

peft_training_args = TrainingArguments(
    output_dir="./results/peft_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/peft_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

peft_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,0.9833,0.945679,0.719283


TrainOutput(global_step=140, training_loss=0.9740984507969448, metrics={'train_runtime': 51.5809, 'train_samples_per_second': 86.447, 'train_steps_per_second': 2.714, 'total_flos': 164815327936512.0, 'train_loss': 0.9740984507969448, 'epoch': 1.0})
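
With `warmup_ratio=0.1` and 140 optimizer steps, the learning rate ramps up over the first 14 steps and then decays. The sketch below assumes the `Trainer`'s default linear schedule, a single device, and no gradient accumulation, and reproduces the shape of that schedule in plain Python. `lr_at_step` is a hypothetical helper, not a library function.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Values from the notebook: 4,459 training rows, batch size 32, one epoch
total_steps = math.ceil(4459 / 32)            # 140, matching global_step in the output
warmup_steps = math.ceil(total_steps * 0.1)   # 14
base_lr = 2e-5

for step in (0, 7, 14, 77, 140):
    print(step, lr_at_step(step, base_lr, warmup_steps, total_steps))
```

Because of the warmup and decay, the average learning rate over the epoch is well below the nominal 2e-5, which is worth keeping in mind when interpreting a single-epoch run.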

In [10]:
# Evaluate
evaluation_results_peft = peft_trainer.evaluate()
print("Evaluation Results:", evaluation_results_peft)

Evaluation Results: {'eval_loss': 0.9456785917282104, 'eval_accuracy': 0.7192825112107624, 'eval_runtime': 7.0263, 'eval_samples_per_second': 158.689, 'eval_steps_per_second': 2.562, 'epoch': 1.0}


## Save PEFT model

In [11]:
peft_model.save_pretrained('model/peft_model')

## Performing Inference with a PEFT Model

In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.

In [12]:
from peft import AutoPeftModelForSequenceClassification

inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_model",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
# Only evaluation is run in this cell, so training hyperparameters are omitted
inference_args = TrainingArguments(
    output_dir="./results/inference_model",
    per_device_eval_batch_size=64,
)

trainer = Trainer(
    model=inference_model,
    args=inference_args,
    eval_dataset=tokenized_dataset["test"],
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)

Evaluation Results: {'eval_loss': 0.9456785917282104, 'eval_accuracy': 0.7192825112107624, 'eval_runtime': 7.0144, 'eval_samples_per_second': 158.958, 'eval_steps_per_second': 2.566}


In [14]:
import torch

def predict(prompt: str) -> str:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")    
    inference_model.to(device)

    # Prepare the input text
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Get predictions
    with torch.no_grad():
        outputs = inference_model(**inputs)
        logits = outputs.logits        

    probabilities = torch.nn.functional.softmax(logits, dim=1)    
    predicted_class_id = probabilities.argmax().item()    
    # Must match the mapping used when the model was loaded (0: not spam, 1: spam)
    id2label = {0: "not spam", 1: "spam"}
    predicted_label = id2label[predicted_class_id]

    return predicted_label

In [15]:
# Example usage
prompt = "Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE."
predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted label: {predicted_label}")

Prompt: 'Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE.'
Predicted label: not spam


In [16]:
# Example usage
prompt = "I am Arun and want to say thanks"
predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted label: {predicted_label}")

Prompt: 'I am Arun and want to say thanks'
Predicted label: not spam
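
`predict` returns only a label. Softmax is monotonic, so the argmax of the probabilities equals the argmax of the logits; the probabilities are mainly useful as a confidence score. The plain-Python sketch below shows this on hypothetical logits. `classify_logits` and the example values are illustrative, and the label mapping follows the one used when the model was loaded (0: not spam, 1: spam).

```python
import math

ID2LABEL = {0: "not spam", 1: "spam"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_logits(logits):
    """Return the predicted label and the probability assigned to it."""
    probs = softmax(logits)
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[class_id], probs[class_id]


label, confidence = classify_logits([0.3, 1.7])  # hypothetical logits
print(label, round(confidence, 3))
```

Returning the confidence alongside the label makes borderline predictions visible, which is helpful when a model's accuracy is as low as the one measured here.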


# Conclusion
## Comparing PEFT performance to the original foundation model

* The baseline GPT-2 run reached 99.1% test accuracy, while the LoRA model reached 71.9%. The baseline cell called `trainer.train()`, so it updated all of the model's roughly 124M parameters for one epoch, whereas LoRA trained only 150,528 parameters (about 0.12%) for one epoch at a learning rate of 2e-5. On this dataset and with these settings, parameter-efficient fine-tuning did not match full fine-tuning.

* Training the LoRA model for more epochs might have narrowed the gap, but this was not tested here.