<a href="https://colab.research.google.com/github/ShutterStack/Lightweight-Fine-Tuning-to-a-Foundation-Model/blob/main/Lightweight_FineTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lightweight Fine-Tuning Project
PEFT Technique:
The selected Parameter-Efficient Fine-Tuning (PEFT) approach first trains the classification head for two epochs while the base model's parameters are frozen. All model parameters are then unfrozen and fine-tuning continues for four more epochs at a lower learning rate (2e-5 instead of 2e-3), allowing a more task-specific adaptation.

Model:
The distilbert-base-uncased model is used as the base for sequence classification. This same model is utilized both for initial training and throughout the PEFT process.

Evaluation Approach:
Model evaluation is conducted using the Trainer class from the Hugging Face transformers library. The evaluation strategy follows an "epoch" schedule, meaning assessments are carried out after each training epoch. Key evaluation metrics include loss, accuracy, runtime, samples processed per second, steps per second, and epoch count.

Fine-Tuning Dataset:
The fine-tuning dataset is derived from the rotten_tomatoes dataset, utilizing both train and test splits. To speed up the example, a subset of 500 samples from each split is used. The dataset is pre-processed using the distilbert-base-uncased tokenizer to ensure compatibility with the model.

### Loading and Evaluating a Foundation Model

In [27]:
!pip install -q "datasets==2.15.0"
!pip install -q transformers peft pandas numpy scikit-learn tqdm



In [28]:
from datasets import load_dataset

# The sms_spam dataset only has a train split, so we use the train_test_split method to split it into train and test
dataset = load_dataset("sms_spam", split="train").train_test_split(
    test_size=0.2, shuffle=True, seed=23
)

In [29]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["sms"], padding=True, truncation=True)

train_dataset = dataset["train"].map(tokenize, batched=True)
test_dataset = dataset["test"].map(tokenize, batched=True)

Map:   0%|          | 0/1115 [00:00<?, ? examples/s]
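GPT-2 has no padding token of its own, which is why the cell above reuses the end-of-sequence token. As a rough illustration of what `padding=True` produces, the sketch below right-pads a batch of token-id lists to the longest sequence and builds the matching attention mask. The helper name `pad_batch` and the toy ids are made up for this example; 50256 is GPT-2's end-of-sequence id.

```python
def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens keep mask value 1, padding positions get 0
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy batch: two "messages" of different length (ids are illustrative only)
batch = pad_batch([[10, 11, 12], [20]], pad_id=50256)
print(batch["input_ids"])
print(batch["attention_mask"])
```

The attention mask is what lets the model ignore the padded positions, even though they carry the same id as a genuine end-of-sequence token.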

In [30]:
train_dataset

Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 4459
})

In [31]:
test_dataset

Dataset({
    features: ['sms', 'label', 'input_ids', 'attention_mask'],
    num_rows: 1115
})
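The two row counts above follow from `test_size=0.2`. The sketch below reproduces them, assuming the split rounds the test share up and the train share down (the convention scikit-learn uses; that `datasets` does the same is an assumption here). The total of 5,574 rows is simply the sum of the two reported counts.

```python
import math


def split_sizes(n_rows, test_size):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(n_rows * test_size)          # test share rounded up
    n_train = math.floor(n_rows * (1 - test_size))  # train share rounded down
    return n_train, n_test


n_train, n_test = split_sizes(4459 + 1115, 0.2)
print(n_train, n_test)
```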

In [32]:
from transformers import AutoModelForSequenceClassification

foundation_model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
foundation_model.config.pad_token_id = tokenizer.pad_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [33]:
print(foundation_model)

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)



In [34]:
from transformers import Trainer, TrainingArguments
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import torch

# Pick the device once and put the model in evaluation mode (disables dropout)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
foundation_model.to(device)
foundation_model.eval()

predictions = []
labels = []
for example in test_dataset:
    # Prepare the input text
    inputs = tokenizer(example["sms"], return_tensors="pt").to(device)

    # Get predictions
    with torch.no_grad():
        outputs = foundation_model(**inputs)
        logits = outputs.logits

    # The class with the highest probability is the prediction
    probabilities = torch.nn.functional.softmax(logits, dim=1)
    predicted_class_id = probabilities.argmax().item()

    # Collect the predicted class and the ground-truth label
    predictions.append(predicted_class_id)
    labels.append(example["label"])
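To make the prediction step concrete, here is a minimal pure-Python sketch of softmax followed by argmax on a single pair of logits. The logit values are made up; the function names `softmax` and `predict_class` are illustrative and not part of any library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


logits = [1.2, -0.3]  # made-up scores for classes 0 and 1
probs = softmax(logits)
print(probs, predict_class(logits))
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw logits would select the same class; the probabilities are only needed when a confidence value is wanted.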

In [35]:
# Define function to compute metrics
def compute_metrics(labels, preds):
    acc = accuracy_score(labels, preds)
    #precision = precision_score(labels, preds)
    #recall = recall_score(labels, preds)
    #f1 = f1_score(labels, preds)
    return {"accuracy": acc}

# Compute evaluation metrics
evaluation_metrics = compute_metrics(labels, predictions)
print(evaluation_metrics)

{'accuracy': 0.12825112107623318}
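Accuracy on its own can be hard to interpret when one class is much rarer than the other, which is where the commented-out precision, recall, and F1 scores become useful. The sketch below computes them from scratch on made-up labels, treating label 1 (spam) as the positive class; `binary_metrics` is a hypothetical helper, not the scikit-learn API, and it returns 0.0 where a ratio would be undefined.

```python
def binary_metrics(labels, preds, positive=1):
    """Accuracy, precision, recall and F1 for a binary task."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: four "not spam" messages and one "spam" message
toy_labels = [0, 0, 0, 0, 1]
always_not_spam = [0, 0, 0, 0, 0]
metrics = binary_metrics(toy_labels, always_not_spam)
print(metrics)
```

In this toy case a classifier that never predicts spam still scores 0.8 accuracy, while recall and F1 for the spam class are zero.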


### Performing Parameter-Efficient Fine-Tuning
The cells below create a PEFT model from the loaded model, run a training loop, and save the PEFT model weights.

In [36]:
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification

# PEFT model configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=4,
    lora_alpha=16,
    lora_dropout=0.1
)

# Load the pre-trained GPT-2 model

model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
model.config.pad_token_id = model.config.eos_token_id

peft_model = PeftModelForSequenceClassification(model, peft_config)

# Report how many parameters remain trainable after applying LoRA
peft_model.print_trainable_parameters()

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 148,992 || all params: 124,590,336 || trainable%: 0.1196
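The trainable-parameter count printed above can be reproduced by hand. The sketch assumes that PEFT's default LoRA target for GPT-2 is the `c_attn` projection in each of the 12 blocks and that the `score` head stays trainable for sequence classification; both are assumptions about library defaults rather than settings made explicitly in this notebook. The layer shapes come from the printed model summary.

```python
def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


hidden = 768    # GPT-2 hidden size, from the model printout
n_layers = 12   # number of GPT2Block modules
rank = 4        # r in the LoraConfig above

# c_attn maps 768 -> 2304 (query, key and value stacked together)
per_layer = lora_params(hidden, 3 * hidden, rank)
lora_total = n_layers * per_layer

# score head: Linear(768 -> 2) without bias
score_head = hidden * 2

trainable = lora_total + score_head
percent = round(100 * trainable / 124_590_336, 4)
print(trainable, percent)
```

Under these assumptions the adapters account for 147,456 parameters and the classification head for the remaining 1,536.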




PEFT Evaluation

In [37]:
import numpy as np

def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's logits and the ground-truth labels
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

In [38]:
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments
import numpy as np

peft_training_args = TrainingArguments(
    output_dir="./results/peft_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/peft_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

# Initialize the Trainer with compute_metrics
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

peft_trainer.train()



| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 0.5228        | 0.34017         | 0.873543 |


TrainOutput(global_step=140, training_loss=0.5083827972412109, metrics={'train_runtime': 172.0593, 'train_samples_per_second': 25.915, 'train_steps_per_second': 0.814, 'total_flos': 588140786128896.0, 'train_loss': 0.5083827972412109, 'epoch': 1.0})
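The `global_step` value in the output above can be checked with a little arithmetic. This is a minimal sketch that assumes a single device, no gradient accumulation, and that the final partial batch is kept; `total_steps` is an illustrative helper, not a `Trainer` method.

```python
import math


def total_steps(n_samples, batch_size, epochs):
    """Optimizer steps when the last, smaller batch is not dropped."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch * epochs


# 4,459 training rows, batch size 32, one epoch (values from the cells above)
lora_steps = total_steps(4459, 32, 1)
print(lora_steps)
```

The same formula applied to the later four-epoch run on 500 samples with batch size 12 gives `total_steps(500, 12, 4)`, which is 168.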

In [39]:
# Evaluate
evaluation_results_peft = peft_trainer.evaluate()
print("Evaluation Results:", evaluation_results_peft)

Evaluation Results: {'eval_loss': 0.3401700556278229, 'eval_accuracy': 0.873542600896861, 'eval_runtime': 14.2747, 'eval_samples_per_second': 78.11, 'eval_steps_per_second': 2.452, 'epoch': 1.0}


In [40]:
peft_model.save_pretrained('model/peft_model')

In [41]:
from peft import LoraConfig, PeftModelForSequenceClassification, TaskType, AutoPeftModelForSequenceClassification

inference_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "model/peft_model",
    num_labels=2,
    id2label={0: "not spam", 1: "spam"},
    label2id={"not spam": 0, "spam": 1},
)
inference_model.config.pad_token_id = inference_model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [42]:
peft_training_args = TrainingArguments(
    output_dir="./results/inference_model",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir='./logs/inference_model',
    save_strategy="epoch",
    load_best_model_at_end=True,
    logging_steps=100,
    warmup_ratio=0.1,
)

trainer = Trainer(
    model=inference_model,
    args=peft_training_args,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
)

# Evaluate the model
evaluation_results = trainer.evaluate()
print("Evaluation Results:", evaluation_results)



Evaluation Results: {'eval_loss': 0.34017014503479004, 'eval_model_preparation_time': 0.0031, 'eval_accuracy': 0.873542600896861, 'eval_runtime': 14.42, 'eval_samples_per_second': 77.323, 'eval_steps_per_second': 1.248}


In [43]:
import torch

def predict(prompt: str) -> str:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    inference_model.to(device)

    # Prepare the input text
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Get predictions
    with torch.no_grad():
        outputs = inference_model(**inputs)
        logits = outputs.logits

    probabilities = torch.nn.functional.softmax(logits, dim=1)
    predicted_class_id = probabilities.argmax().item()
    # Use the same label mapping the model was configured with (0 = "not spam", 1 = "spam")
    id2label = {0: "not spam", 1: "spam"}
    predicted_label = id2label[predicted_class_id]

    return predicted_label

In [44]:
# Example usage
prompt = "Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE."
predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted label: {predicted_label}")

Prompt: 'Had your mobile 10 mths? Update to the latest Camera/Video phones for FREE.'
Predicted label: not spam


In [45]:
# Example usage
prompt = "I am Arun and want to say thanks"
predicted_label = predict(prompt)
print(f"Prompt: '{prompt}'\nPredicted label: {predicted_label}")

Prompt: 'I am Arun and want to say thanks'
Predicted label: not spam


In [46]:
import torch
from transformers import AutoTokenizer, DataCollatorWithPadding, TrainingArguments, Trainer, AutoModelForSequenceClassification
from datasets import load_dataset
import pandas as pd
import numpy as np

splits = ["train", "test"]
ds = {split: ds for split, ds in zip(splits, load_dataset("rotten_tomatoes", split=splits))}

for split in splits:
    ds[split] = ds[split].shuffle(seed=42).select(range(500))

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_ds = {}
for split in splits:
    tokenized_ds[split] = ds[split].map(preprocess_function, batched=True)

base_model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "NEGATIVE", 1: "POSITIVE"},
    label2id={"NEGATIVE": 0, "POSITIVE": 1},
)

# Freeze the DistilBERT backbone so that only the classification head is trained
for param in base_model.base_model.parameters():
    param.requires_grad = False

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [47]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}


trainer_base = Trainer(
    model=base_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_base",
        learning_rate=2e-3,
        per_device_train_batch_size=6,
        per_device_eval_batch_size=6,
        num_train_epochs=2,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_base.train()

base_model_evaluation = trainer_base.evaluate()



| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.643577        | 0.63     |
| 2     | No log        | 0.52322         | 0.762    |


In [48]:

# Unfreeze every parameter for the second, full fine-tuning phase
for param in base_model.parameters():
    param.requires_grad = True

trainer_peft = Trainer(
    model=base_model,
    args=TrainingArguments(
        output_dir="./data/sentiment_analysis_peft",
        learning_rate=2e-5,
        per_device_train_batch_size=12,
        per_device_eval_batch_size=12,
        num_train_epochs=4,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    ),
    train_dataset=tokenized_ds["train"],
    eval_dataset=tokenized_ds["test"],
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
)

trainer_peft.train()



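| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | No log        | 0.571426        | 0.73     |
| 2     | No log        | 0.80986         | 0.778    |
| 3     | No log        | 0.997572        | 0.782    |
| 4     | No log        | 1.081727        | 0.782    |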


TrainOutput(global_step=168, training_loss=0.17127627418154762, metrics={'train_runtime': 201.1397, 'train_samples_per_second': 9.943, 'train_steps_per_second': 0.835, 'total_flos': 264934797312000.0, 'train_loss': 0.17127627418154762, 'epoch': 4.0})

In [49]:
peft_model_evaluation = trainer_peft.evaluate()

print("Base Model Evaluation:")
print(base_model_evaluation)

print("\nPEFT Model Evaluation:")
print(peft_model_evaluation)

Base Model Evaluation:
{'eval_loss': 0.5232203602790833, 'eval_accuracy': 0.762, 'eval_runtime': 7.6925, 'eval_samples_per_second': 64.998, 'eval_steps_per_second': 10.92, 'epoch': 2.0}

PEFT Model Evaluation:
{'eval_loss': 0.5714263319969177, 'eval_accuracy': 0.73, 'eval_runtime': 7.5046, 'eval_samples_per_second': 66.626, 'eval_steps_per_second': 5.597, 'epoch': 4.0}
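The final evaluation reports 0.73 accuracy even though later epochs reached 0.782. A likely explanation is that `load_best_model_at_end=True`, with no `metric_for_best_model` set, restores the checkpoint with the lowest validation loss, which here is the one from epoch 1. The sketch below replays that selection on the per-epoch numbers logged above; the `history` list is copied from the log, and the selection logic is a simplified stand-in for what `Trainer` does.

```python
# Per-epoch results copied from the training log of the unfrozen run
history = [
    {"epoch": 1, "eval_loss": 0.571426, "eval_accuracy": 0.73},
    {"epoch": 2, "eval_loss": 0.80986, "eval_accuracy": 0.778},
    {"epoch": 3, "eval_loss": 0.997572, "eval_accuracy": 0.782},
    {"epoch": 4, "eval_loss": 1.081727, "eval_accuracy": 0.782},
]

# Selecting by validation loss: lower is better
best_by_loss = min(history, key=lambda row: row["eval_loss"])

# Selecting by accuracy instead: higher is better (first epoch wins a tie)
best_by_accuracy = max(history, key=lambda row: row["eval_accuracy"])

print(best_by_loss["epoch"], best_by_loss["eval_accuracy"])
print(best_by_accuracy["epoch"], best_by_accuracy["eval_accuracy"])
```

The rising validation loss alongside flat accuracy also suggests overfitting after the first epoch, so which checkpoint counts as "best" depends on the metric chosen.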
