# Lightweight Fine-Tuning Project

* PEFT technique: LoRA, P-Tuning, Prompt-Tuning
* Model: GPT2
* Evaluation approach: Hugging Face `Trainer.evaluate`, reporting accuracy
* Fine-tuning dataset: imdb

# Table of Contents
1. [Data Preprocess and EDA](#data)
2. [Loading and Evaluating a Foundation Model](#fundation)
3. [Performing Parameter-Efficient Fine-Tuning](#peft)
   
   3.1 [Peft with LoRA](#LoRA)
   
   3.2 [Peft with Prompt Tuning](#pttuning)
   
   3.3 [Peft with P Tuning](#ptuning)
   
4. [Performing Inference with a PEFT Model](#evaluation)

## Load Packages

In [54]:
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification, Trainer, TrainingArguments
import torch
import numpy as np
import pandas as pd
from peft import LoraConfig, TaskType, get_peft_model, PromptTuningConfig

## Data Preprocess and EDA 

### Load data

In [55]:
dataset = load_dataset('imdb')

# Randomly select 2,000 examples each for the train and test sets
train_data = dataset['train'].shuffle(seed=42).select(range(2000))
test_data = dataset['test'].shuffle(seed=42).select(range(2000))

### EDA

In [27]:
train_df = pd.DataFrame(train_data)
test_df = pd.DataFrame(test_data)

print(train_df.head())
print(test_df.head())

label_counts = train_df['label'].value_counts(normalize=True) * 100
print("Train Data Label Distribution:")
print(f"Percentage of 0: {label_counts[0]:.2f}%")
print(f"Percentage of 1: {label_counts[1]:.2f}%")


                                                text  label
0  There is no relation at all between Fortier an...      1
1  This movie is a great. The plot is very true t...      1
2  George P. Cosmatos' "Rambo: First Blood Part I...      0
3  In the process of trying to establish the audi...      1
4  Yeh, I know -- you're quivering with excitemen...      0
                                                text  label
0  <br /><br />When I unsuspectedly rented A Thou...      1
1  This is the latest entry in the long series of...      1
2  This movie was so frustrating. Everything seem...      0
3  I was truly and wonderfully surprised at "O' B...      1
4  This movie spends most of its time preaching t...      0
Train Data Label Distribution:
Percentage of 0: 52.00%
Percentage of 1: 48.00%


The label distribution shows that the training sample is roughly balanced (52% label 0, 48% label 1), so neither class is strongly favored.

### Tokenize the data

In [56]:
# Load GPT2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# Tokenizer padding and truncation
def preprocess_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=512)

# Data preprocess
train_data = train_data.map(preprocess_function, batched=True)
test_data = test_data.map(preprocess_function, batched=True)

# Transform data to torch tensor format
train_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
test_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
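
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a minimal pure-Python sketch of what happens to a single sequence of token ids. The helper `pad_or_truncate` and the toy ids are hypothetical and only illustrate the idea; the real tokenizer does this internally. It assumes right-side padding and truncation, which is the GPT-2 tokenizer's default.

```python
EOS_ID = 50256  # GPT-2's end-of-text id, reused as the pad id in the cell above

def pad_or_truncate(ids, max_length, pad_id=EOS_ID):
    """Return (input_ids, attention_mask), each exactly max_length long."""
    ids = list(ids)[:max_length]                    # truncation: drop tokens past max_length
    n_pad = max_length - len(ids)
    attention_mask = [1] * len(ids) + [0] * n_pad   # 1 = real token, 0 = padding
    input_ids = ids + [pad_id] * n_pad              # right padding with the pad id
    return input_ids, attention_mask

short_ids, short_mask = pad_or_truncate([10, 11, 12], max_length=5)
long_ids, long_mask = pad_or_truncate(range(8), max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every review is padded to 512 tokens, the attention mask is what tells the model which positions carry real text.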

## Loading and Evaluating a Foundation Model

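Here we choose GPT-2 as our foundation model. We first define the evaluation metric and the training arguments used throughout the project. As the results below show, the foundation model's evaluation accuracy is about 0.5, which is close to random guessing. This is expected because the classification head (`score.weight`) is newly initialized.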

In [57]:
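# Load foundation model
model = GPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=2)
# GPT-2 has no pad token; reuse EOS (id 50256) so the model can locate the last real token
model.config.pad_token_id = tokenizer.pad_token_id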

# Evaluation function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    weight_decay=0.01,
)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
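
As a quick sanity check of the metric, the same `compute_metrics` logic can be run on a few hand-made logits. The toy arrays below are made up for illustration; in the notebook, `Trainer` supplies the `(logits, labels)` pair.

```python
import numpy as np

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)   # pick the higher-scoring class per row
    return {"accuracy": (predictions == labels).mean()}

# Four toy examples with two class logits each (hypothetical values)
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 0.5], [1.2, 1.1]])
toy_labels = np.array([0, 1, 0, 0])

result = compute_metrics((toy_logits, toy_labels))
print(result)
```

The predicted classes are `[0, 1, 1, 0]`, so three of the four toy examples match their labels.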


In [58]:
# Evaluate foundation model
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=test_data,
    compute_metrics=compute_metrics
)

eval_results = trainer.evaluate()

print(f"Evaluation results: {eval_results}")

Evaluation results: {'eval_loss': 0.7916995286941528, 'eval_accuracy': 0.515, 'eval_runtime': 3.7862, 'eval_samples_per_second': 52.824, 'eval_steps_per_second': 13.206}


In [42]:
model.save_pretrained("./saved_model/original")

## Performing Parameter-Efficient Fine-Tuning

### Peft with LoRA

LoRA (Low-Rank Adaptation) is a fine-tuning method that freezes the pretrained weight matrices and learns a low-rank update for each targeted matrix, expressed as the product of two much smaller matrices.
Only these small matrices are trained, which sharply reduces the number of trainable parameters.
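
The low-rank idea described above can be sketched in NumPy. This is an illustration, not PEFT's implementation: the dimensions are assumptions chosen to resemble GPT-2's fused attention projection (768 to 2304), the names `W`, `A`, `B` and `lora_forward` are hypothetical, and the `lora_alpha / r` scaling and zero initialization of `B` follow the usual LoRA formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, lora_alpha = 768, 2304, 8, 32   # r and lora_alpha match the config below

W = rng.normal(size=(d_out, d_in))              # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))      # trainable, small random init
B = np.zeros((d_out, r))                        # trainable, zero init so the update starts at 0

def lora_forward(x):
    # base path plus scaled low-rank path: W x + (alpha / r) * B (A x)
    return W @ x + (lora_alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
print(np.allclose(lora_forward(x), W @ x))      # the adapter is a no-op at initialization
print(W.size, A.size + B.size)                  # frozen vs. trainable parameter counts
```

For this one matrix, the two small factors hold 24,576 values against 1,769,472 in the frozen weight, which is where the parameter savings come from.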

In [59]:
LoRA_config = LoraConfig(task_type=TaskType.SEQ_CLS, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)

LoRA_model = get_peft_model(model, LoRA_config)
LoRA_model.print_trainable_parameters()

trainable params: 296,448 || all params: 124,737,792 || trainable%: 0.2377
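
The trainable-parameter count reported above can be reproduced by hand. The arithmetic below assumes that PEFT's default LoRA target for GPT-2 is the fused `c_attn` projection in each of the 12 transformer blocks, and that the `SEQ_CLS` task type also keeps the bias-free `score` head trainable. Under those assumptions the numbers line up with the printed output.

```python
hidden, n_layers, r, num_labels = 768, 12, 8, 2

c_attn_in, c_attn_out = hidden, 3 * hidden         # fused query/key/value projection
lora_per_layer = r * c_attn_in + c_attn_out * r    # A is (r x in), B is (out x r)
lora_total = n_layers * lora_per_layer
score_head = hidden * num_labels                   # classification head, no bias

trainable = lora_total + score_head
all_params = 124_737_792                           # total taken from the printed output
print(trainable)
print(round(100 * trainable / all_params, 4))
```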


In [60]:
# Create trainer instance
LoRA_trainer = Trainer(
    model=LoRA_model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=test_data,
    compute_metrics=compute_metrics
)
LoRA_trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.789687,0.49
2,No log,0.788505,0.49


TrainOutput(global_step=100, training_loss=0.8190373992919922, metrics={'train_runtime': 24.8348, 'train_samples_per_second': 16.106, 'train_steps_per_second': 4.027, 'total_flos': 104882975539200.0, 'train_loss': 0.8190373992919922, 'epoch': 2.0})
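
The `global_step=100` in the training output can be checked against the dataset size. The helper `total_steps` below is hypothetical and assumes a single device with no gradient accumulation. Under that assumption, 100 steps corresponds to 200 training examples rather than 2,000, which suggests the logged run used a 200-example subset (the `Map` progress bars later in the notebook also show 200). The `No log` entries in the training-loss column are consistent with `Trainer`'s default logging interval of 500 steps, which this run never reaches.

```python
import math

def total_steps(n_examples, batch_size, epochs):
    # one optimizer step per batch; single device, no gradient accumulation (assumed)
    return math.ceil(n_examples / batch_size) * epochs

steps_2000 = total_steps(2000, batch_size=4, epochs=2)
steps_200 = total_steps(200, batch_size=4, epochs=2)
print(steps_2000, steps_200)
```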

In [61]:
LoRA_model.save_pretrained("./saved_model/LoRA")

### Peft with Prompt Tuning

Prompt tuning prepends a small set of trainable embedding vectors (a "soft prompt") to the input embeddings while the model's own weights stay frozen. Only these vectors are tuned.
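
A minimal NumPy sketch of the mechanism follows. It is an illustration under assumed toy sizes, not PEFT's implementation; `soft_prompt` and the other names are hypothetical. In PEFT this roughly corresponds to `PromptTuningConfig` (imported at the top of the notebook), where `num_virtual_tokens` sets the prompt length.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, num_virtual_tokens, seq_len = 768, 20, 6   # toy sizes (assumed)

token_embeds = rng.normal(size=(seq_len, hidden))            # looked up from the frozen embedding table
soft_prompt = rng.normal(size=(num_virtual_tokens, hidden))  # the only trainable tensor
attention_mask = np.ones(seq_len, dtype=int)

# Prepend the virtual tokens and extend the mask so the model attends to them
inputs_embeds = np.concatenate([soft_prompt, token_embeds], axis=0)
full_mask = np.concatenate([np.ones(num_virtual_tokens, dtype=int), attention_mask])

print(inputs_embeds.shape, full_mask.shape)
print(soft_prompt.size)   # number of trainable values in the soft prompt
```

With these sizes the soft prompt holds 15,360 values, a tiny fraction of the frozen model.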

### Peft with P Tuning

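P-Tuning also learns continuous prompt embeddings, but passes them through a small trainable prompt encoder (an MLP or LSTM) before they are inserted into the input sequence. It is distinct from prefix tuning, which instead learns prefix vectors for the keys and values of every attention layer. Because only a small number of parameters are trained, it is helpful for adapting a large model on a device with limited compute resources.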
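
The reparameterization step in P-Tuning can be sketched as follows. This is a simplified illustration that assumes a two-layer ReLU MLP as the prompt encoder; all names and sizes are hypothetical, and PEFT's actual encoder differs in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, num_virtual_tokens, encoder_hidden = 768, 20, 128   # toy sizes (assumed)

# Trainable pieces: raw virtual-token embeddings plus a small two-layer MLP
raw_prompt = rng.normal(size=(num_virtual_tokens, hidden))
W1 = rng.normal(scale=0.02, size=(hidden, encoder_hidden))
b1 = np.zeros(encoder_hidden)
W2 = rng.normal(scale=0.02, size=(encoder_hidden, hidden))
b2 = np.zeros(hidden)

def prompt_encoder(p):
    # ReLU MLP mapping hidden -> encoder_hidden -> hidden
    return np.maximum(p @ W1 + b1, 0.0) @ W2 + b2

virtual_embeds = prompt_encoder(raw_prompt)   # these are combined with the token embeddings
print(virtual_embeds.shape)
```

The encoder output has the same shape as a plain soft prompt, so the rest of the pipeline is unchanged; only the way the prompt vectors are produced differs.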

## Performing Inference with a PEFT Model


In [69]:
from datasets import load_dataset
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification, Trainer, TrainingArguments
import torch
import numpy as np
import pandas as pd
from peft import LoraConfig, TaskType, get_peft_model, PromptTuningConfig

In [70]:
dataset = load_dataset('imdb')

# Randomly select 2,000 examples each for the train and test sets (different seed from training)
train_data = dataset['train'].shuffle(seed=41).select(range(2000))
test_data = dataset['test'].shuffle(seed=41).select(range(2000))

In [74]:
# Load GPT2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# Tokenizer padding and truncation
def preprocess_function(examples):
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=512)

# Data preprocess
train_data = train_data.map(preprocess_function, batched=True)
test_data = test_data.map(preprocess_function, batched=True)

# Transform data to torch tensor format
train_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
test_data.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

Map:   0%|          | 0/200 [00:00<?, ? examples/s]

Map:   0%|          | 0/200 [00:00<?, ? examples/s]

In [71]:
# Evaluation function
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return {"accuracy": (predictions == labels).mean()}

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    weight_decay=0.01,
)

In [75]:
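# LoRA: the saved directory holds only the adapter, so load the base model first
from peft import PeftModel

base_model = GPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=2)
base_model.config.pad_token_id = tokenizer.pad_token_id
LoRA_model = PeftModel.from_pretrained(base_model, "./saved_model/LoRA")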
LoRA_trainer = Trainer(
    model=LoRA_model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=test_data,
    compute_metrics=compute_metrics
)
LoRA_eval_results = LoRA_trainer.evaluate()

print(f"Evaluation results: {LoRA_eval_results}")

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Evaluation results: {'eval_loss': 0.765064537525177, 'eval_accuracy': 0.51, 'eval_runtime': 3.7282, 'eval_samples_per_second': 53.646, 'eval_steps_per_second': 13.411}
