# Project 1: Applying Lightweight Fine-Tuning to a Foundation Model <a class="jp-toc-ignore"></a>
## Project Introduction <a class="jp-toc-ignore"></a>
In this project, you will explore the power of parameter-efficient fine-tuning (PEFT) for adapting large foundation models to your specific needs without requiring extensive computational resources. Using the Hugging Face `peft` library, you will implement a workflow that shows how modern generative AI models can be efficiently customized for downstream tasks.

The challenge is to bring together all the essential components of a PyTorch + Hugging Face training and inference pipeline. You will load a pre-trained transformer model, perform lightweight fine-tuning using the LoRA (Low-Rank Adaptation) technique, and compare the performance of the original and fine-tuned models on a sequence classification task. This project highlights the practical advantages of PEFT, including lower training cost and small saved checkpoints (only the adapter weights need to be stored), while maintaining strong performance.
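LoRA keeps a pre-trained weight matrix `W` frozen and learns a low-rank update `(lora_alpha / r) * B @ A`, where `A` has shape `r x d_in` and `B` has shape `d_out x r`. The NumPy sketch below is illustrative only: the `LoRALinear` class is hypothetical and is not how `peft` implements LoRA internally. It shows two properties of the technique: with `B` initialized to zero the layer starts out identical to the frozen one, and for a 768 → 2304 projection (the shape of GPT-2's fused attention projection `c_attn`) a rank-8 update adds only a small fraction of the layer's parameters.

```python
import numpy as np


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable update (alpha / r) * B @ A."""

    def __init__(self, W, r=8, alpha=32, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = W.shape
        self.W = W                                              # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.02, size=(r, in_features))  # trainable, random init
        self.B = np.zeros((out_features, r))                    # trainable, zero init
        self.scaling = alpha / r                                 # 32 / 8 = 4 here

    def __call__(self, x):
        # x: (batch, in_features) -> (batch, out_features)
        return x @ self.W.T + self.scaling * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size


# Random stand-in for a pre-trained 768 -> 2304 projection
W = np.random.default_rng(1).normal(size=(2304, 768))
layer = LoRALinear(W, r=8, alpha=32)
x = np.ones((2, 768))

print(np.allclose(layer(x), x @ W.T))   # True: B starts at zero, so nothing changes yet
print(layer.num_trainable(), W.size)    # 24576 trainable vs 1769472 frozen
```

Only `A` and `B` would receive gradients during fine-tuning; the zero initialization of `B` is what lets training start exactly from the pre-trained behavior.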

## Project Structure <a class="jp-toc-ignore"></a>
The current project is broken into the following parts:

1. **Loading Base Model and Dataset:** Select and load a compatible transformer model and a text classification dataset from Hugging Face. Tokenize and preprocess the data for training and evaluation.
2. **Evaluating the Pre-Trained Model:** Evaluate the pre-trained model's performance on the selected dataset to establish a reference point.
3. **Defining the LoRA Configuration and PEFT Model:** Create a LoRA configuration and convert the base model into a parameter-efficient trainable model.
4. **Fine-Tuning and Saving:** Fine-tune the PEFT model on the dataset, monitor training progress, and save the adapter weights.
5. **Evaluating with the PEFT Model:** Load the fine-tuned PEFT model, run inference, and compare its performance to the original model to assess the impact of PEFT.
6. **Results and Insights:** Summarize the findings and highlight practical considerations for deploying PEFT in real-world scenarios.


# Loading Base Model and Dataset 

## Base Model
We use GPT-2 (the 124M-parameter `gpt2` checkpoint) as the base model, loaded with a sequence-classification head for two labels.

In [1]:
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding
)
from peft import LoraConfig, get_peft_model, TaskType
import evaluate
import numpy as np

In [20]:
model_name = "gpt2"
num_labels = 2

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=num_labels,
    device_map="auto",
    pad_token_id=tokenizer.pad_token_id
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
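GPT-2 has no padding token, which is why the tokenizer reuses `eos_token` for padding and the model is given `pad_token_id`. `GPT2ForSequenceClassification` scores each sequence from the position of its last non-padding token, and the warning above says its `score` head is newly initialized, so the model must be trained before its predictions mean anything. The NumPy sketch below mimics that pooling step under the assumption of right padding; `last_token_logits` is a hypothetical helper, and the exact indexing rule inside transformers differs slightly between versions.

```python
import numpy as np


def last_token_logits(logits, input_ids, pad_token_id):
    """Pick, for each row, the logits at the right-most non-pad position.

    logits:    (batch, seq_len, num_labels) per-position scores from the head
    input_ids: (batch, seq_len) token ids, right-padded with pad_token_id
    """
    non_pad = input_ids != pad_token_id
    positions = np.arange(input_ids.shape[1])
    # Right-most non-pad index per row (falls back to 0 for an all-pad row)
    last_idx = np.maximum(np.where(non_pad, positions, -1).max(axis=1), 0)
    return logits[np.arange(input_ids.shape[0]), last_idx]


EOS = 50256  # GPT-2's <|endoftext|> id, reused as the pad token
ids = np.array([[464, 3807, 373, EOS, EOS],     # 3 real tokens + 2 pads
                [40, 2227, 340, 13, 198]])      # 5 real tokens, no padding
logits = np.arange(2 * 5 * 2, dtype=float).reshape(2, 5, 2)

print(last_token_logits(logits, ids, EOS))  # rows taken from positions 2 and 4
```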


## Dataset
We use the full IMDB movie-review dataset: its `train` and `test` splits each contain 25,000 labeled reviews (0 = negative, 1 = positive), and the `test` split serves as the evaluation set.

In [3]:
dataset = load_dataset("imdb", split=['train', 'test'])
train_dataset, eval_dataset = dataset[0], dataset[1]

In [5]:
dataset[0].to_pandas()

| | text | label |
|---|---|---|
| 0 | I rented I AM CURIOUS-YELLOW from my video sto... | 0 |
| 1 | "I Am Curious: Yellow" is a risible and preten... | 0 |
| 2 | If only to avoid making this type of film in t... | 0 |
| 3 | This film was probably inspired by Godard's Ma... | 0 |
| 4 | Oh, brother...after hearing about this ridicul... | 0 |
| ... | ... | ... |
| 24995 | A hit at the time but now better categorised a... | 1 |
| 24996 | I love this movie like no other. Another time ... | 1 |
| 24997 | This film and it's sequel Barry Mckenzie holds... | 1 |
| 24998 | 'The Adventures Of Barry McKenzie' started lif... | 1 |


## Tokenizing the text samples

In [6]:
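# Tokenization function: truncate each review to at most 512 tokens.
# Padding is deferred to DataCollatorWithPadding, which pads every batch
# to its longest sequence instead of storing pre-padded examples.
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        max_length=512,
    )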

# Tokenize datasets
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_eval = eval_dataset.map(tokenize_function, batched=True)

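`DataCollatorWithPadding`, with its default `padding=True`, pads each batch at collation time to the length of its longest sequence and builds the matching attention mask. By contrast, `padding=True` inside `Dataset.map` pads to the longest sequence within each map batch (1,000 examples by default), which stores many unnecessary pad tokens. The pure-Python sketch below is a simplified, hypothetical stand-in (`pad_batch`) for what the collator does with right padding; the real collator returns tensors and handles more options.

```python
def pad_batch(sequences, pad_token_id):
    """Right-pad token-id lists to the longest one and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[464, 3807], [40, 2227, 340, 13]], pad_token_id=50256)
print(batch["input_ids"])       # [[464, 3807, 50256, 50256], [40, 2227, 340, 13]]
print(batch["attention_mask"])  # [[1, 1, 0, 0], [1, 1, 1, 1]]
```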

# Evaluating the Pre-Trained Model

In [7]:
# Reuse the tokenized splits and draw reproducible random subsets
# (5,000 training and 250 evaluation reviews) to keep runtimes manageable.
tokenized_train_dataset = tokenized_train.shuffle(seed=666).select(range(5000))
tokenized_eval_dataset = tokenized_eval.shuffle(seed=666).select(range(250))

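IMDB's full `train` and `test` splits are balanced (12,500 positive and 12,500 negative reviews each), but a random 250-review subset need not be exactly 50/50. A quick check such as the hypothetical `majority_baseline` helper below gives the accuracy of always predicting the most frequent class, which is the floor any trained classifier should clear on that subset.

```python
from collections import Counter


def majority_baseline(labels):
    """Return per-class counts and the accuracy of always predicting the top class."""
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return counts, top_count / len(labels)


# Toy labels; in the notebook one could pass tokenized_eval_dataset["label"] instead.
counts, baseline = majority_baseline([0, 1, 1, 0, 1])
print(counts, baseline)
```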

In [12]:
from enum import Enum

In [13]:
class ReviewSentiment(Enum):
    NEGATIVE = 0
    POSITIVE = 1

In [14]:
id2label = {v.value: v.name for v in ReviewSentiment}
label2id = {v.name: v.value for v in ReviewSentiment}

In [15]:
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    device_map="auto"
)
model.config.pad_token_id = model.config.eos_token_id

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [16]:
def compute_metrics(eval_pred):
    # Taken from https://huggingface.co/docs/evaluate/transformers_integrations#trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
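`compute_metrics` turns the raw logits into class predictions with `argmax` and then compares them with the labels. The NumPy-only sketch below (hypothetical `accuracy_from_logits`, no `evaluate` dependency) shows the same computation on three toy examples, so the metric can be sanity-checked without running the Trainer.

```python
import numpy as np


def accuracy_from_logits(logits, labels):
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per row
    return float((predictions == np.asarray(labels)).mean())


logits = np.array([[2.0, 1.0],   # predicts 0
                   [0.0, 3.0],   # predicts 1
                   [1.0, 0.0]])  # predicts 0
labels = [0, 1, 1]
print(accuracy_from_logits(logits, labels))  # 2/3 correct, about 0.667
```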

In [17]:
metric_name = "accuracy"
metric = evaluate.load(metric_name)


In [18]:
# Trainer.evaluate() already disables gradient tracking during evaluation,
# so no explicit torch.no_grad() block is needed, and no train_dataset is
# required just to evaluate.
evaluate_results_pretrained = Trainer(
    model=model,
    eval_dataset=tokenized_eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
).evaluate()

You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


In [19]:
evaluate_results_pretrained

{'eval_loss': 1.8557566404342651,
 'eval_accuracy': 0.428,
 'eval_runtime': 22.5326,
 'eval_samples_per_second': 11.095,
 'eval_steps_per_second': 1.42}
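For a two-class problem, a model that assigns equal scores to both classes has a cross-entropy of ln 2 ≈ 0.693, and 50% accuracy is chance level on balanced data. The base model's `eval_loss` of 1.856 and its 42.8% accuracy are both worse than that, which is what one would expect from the randomly initialized `score` head. The sketch below (hypothetical helper `mean_cross_entropy`, plain NumPy) computes the mean cross-entropy of integer labels under softmax probabilities, the kind of loss the Trainer reports for single-label classification.

```python
import numpy as np


def mean_cross_entropy(logits, labels):
    """Mean negative log-likelihood of the true labels under softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


print(mean_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1]))  # ln 2, about 0.693
print(mean_cross_entropy([[3.0, -3.0]], [1]))                # confident and wrong: about 6.0
```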

# Defining the LoRA Configuration and PEFT Model

In [23]:
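# Create LoRA configuration
peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # also keeps the classification head ("score") trainable
    r=8,                         # rank of the low-rank update matrices
    # NOTE: inference_mode=True freezes the LoRA matrices, which is why only
    # 3,072 parameters are reported as trainable below. This run therefore
    # trains only the classification head; use inference_mode=False to train the adapters.
    inference_mode=True,
    lora_alpha=32,               # the update is scaled by lora_alpha / r = 4
    lora_dropout=0.1,
    # With target_modules unset, PEFT uses its default target for GPT-2 (c_attn).
    #target_modules="all-linear",
    #target_modules=["c_proj", "c_attn"],
    bias="none",
)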

In [24]:
# Create PEFT model
peft_model = get_peft_model(base_model, peft_config)
peft_model.print_trainable_parameters()




trainable params: 3,072 || all params: 124,737,792 || trainable%: 0.0024627660556954542
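The reported `all params` figure can be reproduced by hand. The arithmetic below assumes that PEFT applied LoRA to its GPT-2 default target `c_attn` (since `target_modules` is commented out) and that, for `SEQ_CLS`, it keeps a trainable copy of the `score` head next to the original. Under those assumptions the total matches exactly, and it shows that the 294,912 LoRA parameters are not among the 3,072 trainable ones: 3,072 equals two copies of the 1,536-parameter `score` head, consistent with `inference_mode=True` freezing the adapters.

```python
# GPT-2 small dimensions
d_model, n_layers, vocab, n_positions, d_ff = 768, 12, 50257, 1024, 3072
r, num_labels = 8, 2


def linear(n_in, n_out, bias=True):
    return n_in * n_out + (n_out if bias else 0)


per_layer = (
    2 * 2 * d_model                    # ln_1 and ln_2 (weight + bias each)
    + linear(d_model, 3 * d_model)     # attn.c_attn (fused q, k, v)
    + linear(d_model, d_model)         # attn.c_proj
    + linear(d_model, d_ff)            # mlp.c_fc
    + linear(d_ff, d_model)            # mlp.c_proj
)
# token + position embeddings, 12 blocks, final layer norm (ln_f)
gpt2_body = vocab * d_model + n_positions * d_model + n_layers * per_layer + 2 * d_model
score_head = linear(d_model, num_labels, bias=False)          # 768 x 2, no bias
lora_c_attn = n_layers * (r * d_model + 3 * d_model * r)      # A: r x 768, B: 2304 x r

total = gpt2_body + score_head + lora_c_attn + score_head     # + PEFT's copy of the head
print(gpt2_body, lora_c_attn, total)  # 124439808 294912 124737792
```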


# Fine-Tuning and Saving

In [25]:
# Evaluation metrics
metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)


In [42]:
# Training arguments
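training_args = TrainingArguments(
    output_dir="gpt2-imdb-peft",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    save_strategy="epoch",        # must match the evaluation strategy for load_best_model_at_end
    load_best_model_at_end=True,  # without metric_for_best_model, "best" means lowest eval loss
)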
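The number of optimizer steps follows directly from these arguments. Assuming a single device and no gradient accumulation (neither is configured here), 5,000 training examples at 8 per batch give 625 steps per epoch, which matches the `global_step=6250` and the `checkpoint-625`, `checkpoint-1250`, ... directories that appear in the training logs. The helper `schedule` is hypothetical.

```python
import math


def schedule(num_examples, per_device_batch, num_devices, epochs):
    """Steps per epoch and total steps, ignoring gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    return steps_per_epoch, steps_per_epoch * epochs


steps_per_epoch, total_steps = schedule(5000, 8, 1, 10)
checkpoints = [steps_per_epoch * epoch for epoch in range(1, 11)]  # save_strategy="epoch"
print(steps_per_epoch, total_steps, checkpoints[:3])  # 625 6250 [625, 1250, 1875]
```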

In [43]:
# Initialize trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_eval_dataset,
    compute_metrics=compute_metrics,
    data_collator=DataCollatorWithPadding(tokenizer),
)


In [44]:
# Train the model
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.5888 | 0.570954 | 0.844 |
| 2 | 0.5339 | 0.4953 | 0.876 |
| 3 | 0.4232 | 0.538966 | 0.864 |
| 4 | 0.4572 | 0.535041 | 0.872 |
| 5 | 0.4774 | 0.476366 | 0.888 |
| 6 | 0.4473 | 0.464938 | 0.876 |
| 7 | 0.4623 | 0.504814 | 0.864 |
| 8 | 0.4047 | 0.444909 | 0.86 |
| 9 | 0.4253 | 0.439666 | 0.884 |
| 10 | 0.4338 | 0.436142 | 0.884 |
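With `load_best_model_at_end=True` and no `metric_for_best_model`, the Trainer restores the checkpoint with the lowest validation loss. The snippet below replays that selection on the values from the training log: epoch 10 wins on loss (matching the later `eval_loss` of 0.4361), while epoch 5 had the highest accuracy. Setting `metric_for_best_model="accuracy"` in `TrainingArguments` would pick by accuracy instead.

```python
# (epoch, validation loss, accuracy) copied from the training log above
history = [
    (1, 0.570954, 0.844), (2, 0.4953, 0.876), (3, 0.538966, 0.864),
    (4, 0.535041, 0.872), (5, 0.476366, 0.888), (6, 0.464938, 0.876),
    (7, 0.504814, 0.864), (8, 0.444909, 0.86), (9, 0.439666, 0.884),
    (10, 0.436142, 0.884),
]
best_by_loss = min(history, key=lambda row: row[1])      # the Trainer's default criterion
best_by_accuracy = max(history, key=lambda row: row[2])  # metric_for_best_model="accuracy"
print(best_by_loss[0], best_by_accuracy[0])  # 10 5
```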


Checkpoint destination directory gpt2-imdb-peft/checkpoint-625 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory gpt2-imdb-peft/checkpoint-1250 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory gpt2-imdb-peft/checkpoint-1875 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=6250, training_loss=0.4646330676269531, metrics={'train_runtime': 2193.7489, 'train_samples_per_second': 22.792, 'train_steps_per_second': 2.849, 'total_flos': 1.31103719424e+16, 'train_loss': 0.4646330676269531, 'epoch': 10.0})

In [45]:
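# load_best_model_at_end=True restored the checkpoint with the lowest eval loss, so
# this saves that checkpoint's adapter: the LoRA matrices plus the trained score head.
peft_model.save_pretrained("gpt2-imdb-peft/best_model-2")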

# Evaluating with PEFT Model

In [46]:
from peft import AutoPeftModelForSequenceClassification

# Load the fine-tuned model
loaded_peft_model = AutoPeftModelForSequenceClassification.from_pretrained(
    "gpt2-imdb-peft/best_model-2",
    is_trainable=False, 
    device_map="auto"
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [47]:
loaded_peft_model.config.pad_token_id = loaded_peft_model.config.eos_token_id

In [48]:
evaluate_results_peft = Trainer(
    model=loaded_peft_model,
    eval_dataset=tokenized_eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
    compute_metrics=compute_metrics,
).evaluate()

In [49]:
evaluate_results_peft

{'eval_loss': 0.4361419677734375,
 'eval_accuracy': 0.884,
 'eval_runtime': 22.2802,
 'eval_samples_per_second': 11.221,
 'eval_steps_per_second': 1.436}

In [50]:
evaluate_results_pretrained

{'eval_loss': 1.8557566404342651,
 'eval_accuracy': 0.428,
 'eval_runtime': 22.5326,
 'eval_samples_per_second': 11.095,
 'eval_steps_per_second': 1.42}

# Results and Insights
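We have demonstrated the potential of PEFT for adapting a pre-trained language model to a downstream task without retraining it from scratch. On the 250-review evaluation subset, the fine-tuned PEFT model roughly doubles the accuracy of the original model (88.4% vs 42.8%) and cuts the evaluation loss from 1.856 to 0.436:

| Model | eval_loss | eval_accuracy |
|---|---|---|
| GPT-2 with newly initialized `score` head | 1.856 | 0.428 |
| GPT-2 + PEFT adapter (best checkpoint) | 0.436 | 0.884 |

A few caveats put these numbers in context. The original model's classification head is randomly initialized (see the `score.weight` warning), so its 42.8% sits at or below chance level rather than measuring what GPT-2 knows about sentiment. Because the LoRA configuration sets `inference_mode=True`, only 3,072 parameters were trainable, so this run effectively trained the classification head on top of frozen GPT-2 features; setting `inference_mode=False` would train the LoRA matrices as well. Finally, the same 250 reviews were used both to select the best checkpoint and to report the final score, so a separate held-out subset would give a less optimistic estimate.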
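With only 250 evaluation reviews, each accuracy carries sampling noise. The sketch below uses the normal approximation to the binomial (a hypothetical `accuracy_ci` helper, standard library only) to put approximate 95% intervals around both reported accuracies. The PEFT model's interval spans roughly ±4 percentage points and the pre-trained model's roughly ±6, and the two intervals do not overlap.

```python
import math


def accuracy_ci(accuracy, n, z=1.96):
    """Normal-approximation 95% confidence interval for an accuracy measured on n examples."""
    se = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * se, accuracy + z * se


for name, acc in [("pre-trained", 0.428), ("PEFT", 0.884)]:
    low, high = accuracy_ci(acc, n=250)
    print(f"{name}: {acc:.3f} (95% CI {low:.3f} to {high:.3f})")
```

For small samples or accuracies close to 0 or 1, a Wilson interval is usually the safer choice than this normal approximation.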