# Lightweight Fine-Tuning Project

This project makes the following choices:

* PEFT technique: LoRA
* Model: gpt2
* Evaluation approach: `Trainer.evaluate()` from Hugging Face with an accuracy metric, run on the same test subset before and after fine-tuning to compare the original foundation model with the fine-tuned model.
* Fine-tuning dataset: imdb

## Loading and Evaluating a Foundation Model

The cells below install the dependencies, load the pre-trained GPT-2 model with its tokenizer and the IMDB dataset, and evaluate the model's accuracy before fine-tuning.

In [1]:
!pip install transformers datasets peft torch evaluate scikit-learn

Defaulting to user installation because normal site-packages is not writeable
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Collecting scikit-learn
  Downloading scikit_learn-1.5.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.3 MB)
Collecting joblib>=1.2.0
  Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Collecting threadpoolctl>=3.1.0
  Downloading threadpoolctl-3.5.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn, evaluate
Successfully installed evaluate-0.4.3 joblib-1.4.2 scikit-learn-1.5.2 t

In [2]:
import logging
logging.basicConfig(level=logging.INFO)

In [3]:
import torch
from transformers import (
    GPT2ForSequenceClassification, 
    GPT2Tokenizer, 
    Trainer, 
    TrainingArguments, 
)
from datasets import load_dataset
from peft import (
    LoraConfig,
    PeftModel,
    get_peft_model, 
    TaskType
)

INFO:datasets:PyTorch version 2.0.1 available.


In [4]:
# -----------------------------
# Step 1: Load the Model and Tokenizer
# -----------------------------

model_name = "gpt2"
model = GPT2ForSequenceClassification.from_pretrained(model_name, num_labels=2)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Fix the padding token issue (GPT-2 doesn't have a pad_token by default)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
# -----------------------------
# Step 2: Load the Dataset
# -----------------------------
dataset = load_dataset("imdb")

In [6]:
# -----------------------------
# Step 3: Tokenize the Datasets
# -----------------------------
def tokenize_data(example):
    return tokenizer(example['text'], padding='max_length', truncation=True, max_length=128)

tokenized_dataset = dataset.map(tokenize_data, batched=True)
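
To make the tokenizer arguments concrete, the sketch below mimics in plain Python what `padding='max_length'`, `truncation=True`, and `max_length=128` do to a single sequence of token ids. The helper `pad_or_truncate` and the example ids are hypothetical and not part of the notebook; the pad id 50256 is GPT-2's end-of-text id, which Step 1 reuses as the pad token, and right-side padding is assumed.

```python
PAD_ID = 50256  # GPT-2 end-of-text id, reused as the pad id in Step 1


def pad_or_truncate(token_ids, max_length=128, pad_id=PAD_ID):
    """Hypothetical helper: cut to max_length, then right-pad with pad_id."""
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    return {
        "input_ids": kept + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks padding the model should ignore
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


short_review = pad_or_truncate([101, 102, 103])   # made-up token ids
long_review = pad_or_truncate(list(range(300)))   # made-up token ids

print(len(short_review["input_ids"]), sum(short_review["attention_mask"]))
print(len(long_review["input_ids"]), sum(long_review["attention_mask"]))
```

Every sequence ends up exactly 128 ids long. Reviews longer than that lose everything after the cut, so the classifier only sees the opening of long reviews.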

In [7]:
# -----------------------------
# Step 4: Prepare training and testing datasets
# -----------------------------
train_dataset = tokenized_dataset['train'].shuffle(seed=42).select(range(2000))  # Reduce size for quick testing
test_dataset = tokenized_dataset['test'].shuffle(seed=42).select(range(500))

In [8]:
# -----------------------------
# Step 5: Define the evaluation metric
# -----------------------------
from evaluate import load as load_metric
accuracy_metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = torch.argmax(torch.tensor(logits), dim=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)
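
As a rough illustration of what `compute_metrics` reports, the plain-Python sketch below takes the highest-scoring class per row of logits and measures the fraction that match the labels. The function name `argmax_accuracy` and the example logits are made up for illustration; the real cell delegates this work to `torch.argmax` and the `accuracy` metric.

```python
def argmax_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class index equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(pred == label) for pred, label in zip(predictions, labels))
    return correct / len(labels)


# Made-up logits for four reviews, one column per class
example_logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 0.5], [1.2, 1.1]]
example_labels = [0, 1, 0, 0]

example_accuracy = argmax_accuracy(example_logits, example_labels)
print(example_accuracy)
```

Here the third review is misclassified, so three of four predictions are correct.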


In [9]:
# -----------------------------
# Step 6: Set up training arguments and initialize the Trainer for evaluation
# -----------------------------
training_args = TrainingArguments(
    output_dir="./results",
    per_device_eval_batch_size=8,
    do_train=False,
    do_eval=True,
    logging_steps=10,
)

# Initialize the Trainer for evaluation
trainer = Trainer(
    model=model,
    args=training_args,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

In [10]:
# -----------------------------
# Step 7: Evaluating the pre-trained model
# -----------------------------
print("Evaluating the pre-trained model:")
eval_results = trainer.evaluate()
print(f"Pre-trained model accuracy: {eval_results['eval_accuracy']:.4f}")

Evaluating the pre-trained model:


Pre-trained model accuracy: 0.4920


## Performing Parameter-Efficient Fine-Tuning

The cell below wraps the loaded model in a LoRA adapter to create the PEFT model. The training loop and the saving of the adapter weights follow in the cells of the next section.
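
For reference, LoRA as it is commonly formulated keeps each pretrained weight matrix $W_0$ frozen and learns a low-rank update next to it. The symbols below are not taken from this notebook; $r$ and $\alpha$ correspond to the `r` and `lora_alpha` arguments of `LoraConfig`.

$$
h = W_0 x + \frac{\alpha}{r}\, B A x, \qquad B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

With `r=8` and `lora_alpha=32`, as used in this section, the update is scaled by $\alpha / r = 4$, and each adapted matrix adds $r\,(d_{\text{in}} + d_{\text{out}})$ trainable parameters.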

In [11]:
# -----------------------------
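# Step 8: Configure LoRA (Low-Rank Adaptation)
# -----------------------------
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,   # Sequence classification task
    inference_mode=False,         # Keep the adapter weights trainable
    r=8,                          # Rank of the low-rank update matrices
    lora_alpha=32,                # Scaling numerator; the update is scaled by lora_alpha / r
    lora_dropout=0.1,             # Dropout applied on the LoRA branch during training
)
peft_model = get_peft_model(model, lora_config)  # Apply LoRA to the GPT-2 model
peft_model.print_trainable_parameters()          # Print trainable parameters to verify LoRA setup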



trainable params: 297,984 || all params: 124,737,792 || trainable%: 0.23888830740245906
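
One way to reproduce the reported counts by hand is sketched below. It rests on assumptions that the notebook does not state:

* PEFT's default LoRA target for GPT-2 is the fused attention projection `c_attn` (768 to 2304) in each of the 12 blocks.
* The GPT-2 small backbone has 124,439,808 parameters.
* The two-class `score` head (768 x 2, no bias) is counted twice, because PEFT keeps the original head next to a trainable copy.

The last assumption is inferred from the numbers themselves and may not hold for other PEFT versions.

```python
hidden_size = 768              # GPT-2 small embedding width
num_layers = 12                # transformer blocks in GPT-2 small
c_attn_out = 3 * hidden_size   # fused query/key/value projection width
rank = 8                       # r in LoraConfig
num_labels = 2
backbone_params = 124_439_808  # assumed GPT-2 small backbone size

# Each adapted layer adds A (rank x d_in) and B (d_out x rank)
lora_per_layer = rank * hidden_size + c_attn_out * rank
lora_total = num_layers * lora_per_layer

head_params = hidden_size * num_labels  # score.weight, no bias
head_copies = 2                         # assumption: original head + trainable copy

trainable = lora_total + head_copies * head_params
all_params = backbone_params + lora_total + head_copies * head_params
trainable_pct = 100 * trainable / all_params

print(f"trainable params: {trainable:,} || all params: {all_params:,} "
      f"|| trainable%: {trainable_pct:.4f}")
```

Under these assumptions the LoRA matrices account for 294,912 of the trainable parameters and the classification head for the remaining 3,072.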


## Performing Inference with a PEFT Model

The cells below first fine-tune the PEFT model and save its adapter weights, then load the saved weights and evaluate the fine-tuned model. The results are compared with those from before fine-tuning.

In [12]:
# -----------------------------
# Step 9: Define Training Arguments for fine-tuning
# -----------------------------
training_args = TrainingArguments(
    output_dir="./results",              # Directory to save model checkpoints
    evaluation_strategy="epoch",         # Evaluate at the end of each epoch
    per_device_train_batch_size=8,       # Batch size per device during training
    per_device_eval_batch_size=8,        # Batch size per device during evaluation
    num_train_epochs=1,                  # Number of training epochs
    logging_steps=10,                    # Log training info every 10 steps
    save_strategy="epoch",               # Save a checkpoint at the end of each epoch
    load_best_model_at_end=True,         # Reload the best checkpoint when training finishes
    metric_for_best_model="accuracy",    # Rank checkpoints by the accuracy from compute_metrics
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)


In [13]:
# -----------------------------
# Step 10: Fine-tune the model with LoRA
# -----------------------------
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1     | 3.18          | 2.929555        | 0.514    |


Checkpoint destination directory ./results/checkpoint-250 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=250, training_loss=4.526958145141601, metrics={'train_runtime': 40.3206, 'train_samples_per_second': 49.602, 'train_steps_per_second': 6.2, 'total_flos': 131103719424000.0, 'train_loss': 4.526958145141601, 'epoch': 1.0})
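
The figures in the training output can be cross-checked with a little arithmetic. The sketch below assumes a single device and no gradient accumulation, neither of which the notebook states, and adds the cross-entropy of a two-class model that always predicts 50/50 as a reference point for the losses.

```python
import math

num_train_examples = 2000   # size of train_dataset in Step 4
batch_size = 8              # per_device_train_batch_size
num_epochs = 1              # num_train_epochs
train_runtime = 40.3206     # seconds, from TrainOutput

steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
samples_per_second = num_train_examples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime

# Cross-entropy of a two-class model that always predicts 50/50
chance_loss = math.log(2)

print(total_steps, round(samples_per_second, 3), round(steps_per_second, 3))
print(round(chance_loss, 4))
```

The step count and throughput agree with `global_step=250`, `train_samples_per_second` and `train_steps_per_second` above. The reported losses (3.18 in the epoch table, about 4.53 on average, 2.93 on validation) all sit well above ln 2 ≈ 0.693. That is consistent with the near-chance accuracy and suggests this single short run had not yet converged.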

In [14]:
# -----------------------------
# Step 11: Save the fine-tuned LoRA model
# -----------------------------
peft_model.save_pretrained("./lora_gpt2_imdb")

In [15]:
# -----------------------------
# Step 12: Load the fine-tuned LoRA model
# -----------------------------

# Rebuild the base classifier, then attach the saved adapter weights to it
loaded_model = GPT2ForSequenceClassification.from_pretrained(model_name, num_labels=2)
loaded_peft_model = PeftModel.from_pretrained(loaded_model, "./lora_gpt2_imdb")

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [16]:
# -----------------------------
# Step 13: Ensure pad_token_id is set
# -----------------------------
loaded_peft_model.config.pad_token_id = tokenizer.pad_token_id

In [17]:
# -----------------------------
# Step 14: Initialize the Trainer for evaluation
# -----------------------------
trainer = Trainer(
    model=loaded_peft_model,
    args=training_args,
    eval_dataset=test_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
)

In [18]:
# -----------------------------
# Step 15: Evaluate the fine-tuned model
# -----------------------------
print("\nEvaluating the fine-tuned model:")
eval_results_peft = trainer.evaluate()
print(f"Fine-tuned model accuracy: {eval_results_peft['eval_accuracy']:.4f}")


Evaluating the fine-tuned model:


Fine-tuned model accuracy: 0.5140


In [19]:
# -----------------------------
# Step 16: Comparison
# -----------------------------
print("\nComparison of performance:")
print(f"Pre-trained model accuracy: {eval_results['eval_accuracy']:.4f}")
print(f"Fine-tuned model accuracy: {eval_results_peft['eval_accuracy']:.4f}")


Comparison of performance:
Pre-trained model accuracy: 0.4920
Fine-tuned model accuracy: 0.5140
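
To judge whether the gap between the two accuracies is meaningful, a rough significance check can help. The sketch below uses the binomial standard error for a 500-example test set and, as a simplification, treats the two evaluations as independent samples even though both ran on the same 500 reviews. A paired test would be the stricter choice.

```python
import math

n_test = 500          # size of test_dataset in Step 4
acc_before = 0.4920   # pre-trained model
acc_after = 0.5140    # LoRA fine-tuned model


def standard_error(accuracy, n):
    """Binomial standard error of an accuracy estimated from n examples."""
    return math.sqrt(accuracy * (1 - accuracy) / n)


correct_before = round(acc_before * n_test)
correct_after = round(acc_after * n_test)

# Simplification: treats the two evaluations as independent samples,
# although both were run on the same 500 reviews.
se_diff = math.sqrt(standard_error(acc_before, n_test) ** 2
                    + standard_error(acc_after, n_test) ** 2)
z_score = (acc_after - acc_before) / se_diff

print(correct_before, correct_after)
print(round(se_diff, 4), round(z_score, 2))
```

Under these assumptions the gain amounts to 11 more correct reviews out of 500 and a z-score of about 0.7, well short of the usual 1.96 threshold. The comparison therefore does not show a reliable improvement: if the test subset is roughly balanced, both accuracies are close to the 0.5 expected from guessing.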
