# HW3 - PEFT

In this notebook, we will fine-tune GPT-2 Medium on the [WikiText](https://huggingface.co/datasets/Salesforce/wikitext#wikitext-2-v1) dataset (the `wikitext-2-v1` configuration) using several fine-tuning methods.

Parameter-Efficient Fine-Tuning (PEFT) is a technique that enables the adaptation of large pre-trained models to specific tasks while modifying only a small subset of their parameters, significantly reducing computational and memory costs. Instead of updating all model parameters, PEFT methods, such as LoRA (Low-Rank Adaptation), Adapter layers, and Prefix-Tuning, introduce lightweight trainable modules that are inserted into the model or modify activations in a structured way. This approach retains the general knowledge of the base model while efficiently adapting to new tasks, making it particularly useful for fine-tuning large-scale models like LLMs and vision-language models on resource-constrained hardware.
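To make the "small subset of parameters" claim concrete, here is a minimal arithmetic sketch comparing a full update of one weight matrix with a low-rank update of the same matrix. The 1024 × 3072 shape is an assumption chosen to match the `c_attn` projection of GPT-2 Medium (hidden size 1024), and both function names are hypothetical.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Number of trainable values when the whole d_in x d_out matrix is updated."""
    return d_in * d_out


def low_rank_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Number of trainable values for two factors of shape (d_in, rank) and (rank, d_out)."""
    return rank * (d_in + d_out)


# Assumed shape: the c_attn projection of GPT-2 Medium (1024 -> 3 * 1024).
d_in, d_out = 1024, 3072
full = full_update_params(d_in, d_out)

for rank in (4, 16, 64, 256):
    low = low_rank_update_params(d_in, d_out, rank)
    share = 100 * low / full
    print(f"rank={rank:>3}: {low:>9,} trainable vs {full:,} full ({share:.2f}%)")
```

The low-rank count grows linearly with the rank, so small ranks keep the trainable share of this matrix well below that of a full update.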

## Install required libraries

In [None]:
!pip install datasets transformers peft



## Import required libraries

In [None]:
import gc
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, PrefixTuningConfig, get_peft_model, PeftModel

## Setup

In [None]:
gpt_2_medium_model_name = "openai-community/gpt2-medium"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(gpt_2_medium_model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize a batch of examples (truncated and padded to 128 tokens)
# and reuse the input ids as language-modeling labels
def tokenizing_preprocess(examples):
    inputs = tokenizer(examples['text'], truncation=True, padding='max_length', max_length=128)
    inputs['labels'] = inputs['input_ids'].copy()
    return inputs


# Define training arguments
training_args = TrainingArguments(
    output_dir='./gpt2',
    eval_strategy='no',
    save_strategy="no",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
    report_to="none"
)
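With the 1000 training examples requested later in this notebook, these settings give fewer optimizer steps than `warmup_steps`. The sketch below does the arithmetic, assuming a single device, no gradient accumulation, and a linear warmup schedule. `optimizer_steps` and `linear_warmup_lr` are hypothetical helpers, and only the warmup phase is modeled.

```python
import math


def optimizer_steps(num_examples: int, batch_size: int, epochs: int = 1) -> int:
    """Optimizer steps for one device without gradient accumulation."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Learning rate during linear warmup, capped at base_lr (decay is not modeled)."""
    return base_lr * min(1.0, step / warmup_steps)


total_steps = optimizer_steps(num_examples=1000, batch_size=4, epochs=1)
final_lr = linear_warmup_lr(total_steps, base_lr=2e-5, warmup_steps=500)

print(f"total optimizer steps: {total_steps}")
print(f"learning rate at the last step: {final_lr:.2e}")
```

Under these assumptions the run ends while the schedule is still warming up, so the learning rate peaks at roughly half of `learning_rate`. Whether to lower `warmup_steps` is a judgment call for the experiment.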


## Load dataset (5 pt)

In [None]:
# TODO: Load the wikitext-2-v1 version of wikitext
dataset = ...

In [None]:
# TODO: Select 1000 examples for training and 500 examples for validation
train_data = ...
eval_data = ...

# Apply tokenization preprocess on datasets
train_dataset = train_data.map(tokenizing_preprocess, batched=True)
eval_dataset = eval_data.map(tokenizing_preprocess, batched=True)
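Because `pad_token` is set to `eos_token` and `labels` is a plain copy of `input_ids`, padded positions also contribute to the loss. One common alternative, sketched here as an assumption and not as part of the assignment, is to set the labels at padded positions to -100, the index that PyTorch's cross-entropy loss ignores by default. The helper `mask_padding_labels` is hypothetical and works on plain Python lists. It relies on the attention mask instead of the token id so that genuine end-of-text tokens are kept.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]


# Toy batch: 50256 is GPT-2's end-of-text id, reused here as padding.
batch = {
    "input_ids": [[10, 11, 50256, 50256], [7, 50256, 50256, 50256]],
    "attention_mask": [[1, 1, 0, 0], [1, 0, 0, 0]],
}
labels = mask_padding_labels(batch["input_ids"], batch["attention_mask"])
print(labels)
```

Losses computed with masked labels are not directly comparable to losses computed with the unmasked labels, so one convention should be used for every method in the comparison.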


## Full Fine-Tuning (5 pt)

In [None]:
# Load the model
ff_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=ff_model,
    args=training_args,
    train_dataset=train_dataset,
)

In [None]:
# Zero-Shot evaluation of model

# TODO: Evaluate model on eval_dataset
eval_output = ...

print(f"eval_loss = {eval_output['eval_loss']:.4f}")
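The evaluation loss is a mean per-token cross-entropy in nats, so it can be turned into a perplexity by exponentiation. This is a minimal sketch: `loss_to_perplexity` is a hypothetical helper, and the input value 2.0 is purely illustrative, not a result from this notebook.

```python
import math


def loss_to_perplexity(eval_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    return math.exp(eval_loss)


# Illustrative value only; substitute eval_output['eval_loss'] in the notebook.
example_loss = 2.0
print(f"perplexity = {loss_to_perplexity(example_loss):.2f}")
```

A perplexity derived this way is only as meaningful as the loss itself, which in this notebook also averages over padded positions.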

In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = ...
# TODO: Train the model using trainer
train_output = ...
# TODO: Get reserved memory from cuda
gpu_memory_after = ...

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")
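The print above reports bytes, while the results table at the end of the notebook asks for gigabytes. Here is a small conversion sketch. `bytes_to_gb` is a hypothetical helper, counting 1024³ bytes per gigabyte is an assumption, and the two byte counts are made-up illustrative values.

```python
def bytes_to_gb(num_bytes: int) -> float:
    """Convert a byte count to gigabytes, assuming 1 GB = 1024**3 bytes."""
    return num_bytes / 1024 ** 3


# Made-up readings standing in for the reserved memory before and after training.
example_before = 1_610_612_736
example_after = 6_979_321_856

used_gb = bytes_to_gb(example_after - example_before)
print(f"GPU memory used: {used_gb:.2f} GB")
```

The difference between two reserved-memory readings may understate the peak reached during training. A peak counter such as `torch.cuda.max_memory_reserved()` is one possible alternative.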

In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = ...

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

In [None]:
# Delete the model
del ff_model
del trainer

In [None]:
# Empty the GPU memory (run this cell twice if the GPU RAM is not close to zero, ~0.2)
gc.collect()
torch.cuda.empty_cache()

## Prefix Tuning (20 pt)

TODO: Briefly explain Prefix Tuning

In [None]:
prefix_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)

In [None]:
# TODO: Define your prefix-tuning configuration using the PrefixTuningConfig class from the peft library
#       Set task_type to CAUSAL_LM


# TODO: Wrap the GPT2LMHeadModel with the prefix config above using the get_peft_model function
prefix_model = ...

# TODO: Print number of trainable parameters


In [None]:
prefix_model

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=prefix_model,
    args=training_args,
    train_dataset=train_dataset,
)

In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = ...
# TODO: Train the model
train_output = ...
# TODO: Get reserved memory from cuda
gpu_memory_after = ...

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = ...

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

In [None]:
# Delete the model
del prefix_model
del trainer

In [None]:
# Empty the GPU memory (run this cell twice if GPU RAM usage is not close to or below 1.5 GB)
gc.collect()
torch.cuda.empty_cache()

## Fine-Tuning by LoRA (Low-Rank Adaptation) (40 pt)

TODO: Briefly explain LoRA (Low-Rank Adaptation)

In [None]:
lora_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)

In [None]:
# Print the model architecture
print(lora_model)

In [None]:

# TODO: Define your LoRA configuration using LoraConfig class from peft library
#       Apply the LoRA on Conv1D modules (c_attn and c_proj) of GPT2Attention blocks (attn).
#       Set fan_in_fan_out to True
#       Set task_type to CAUSAL_LM
...

# TODO: Wrap the transformer module of GPT2LMHeadModel with the LoRA config above
#       using the get_peft_model function
lora_model = ...

# TODO: Print number of trainable parameters
...

In [None]:
# Print the model architecture and see the changes
print(lora_model)

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,
)

In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = ...
# TODO: Train the model using trainer
train_output = ...
# TODO: Get reserved memory from cuda
gpu_memory_after = ...

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = ...

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

eval_loss = 1.2751


In [None]:
# Delete the model
del lora_model
del trainer

In [None]:
# Empty the GPU memory (run this cell twice if GPU RAM usage is not close to or below 1.5 GB)
gc.collect()
torch.cuda.empty_cache()

#### Run LoRA for different rank values

Fine-tune the GPT-2 model with different rank values. (Be sure to change the alpha value according to the rank so that the results are fair.)
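In the `peft` implementation the low-rank update is multiplied by `lora_alpha / r` (when rank-stabilized scaling is not enabled), so changing the rank while keeping alpha fixed also changes the effective size of the update. One way to read "fair", assumed here, is to keep that ratio constant across ranks. The ratio of 2 and the helper `alpha_for_rank` are illustrative choices, not values prescribed by the assignment.

```python
def alpha_for_rank(rank: int, scaling: float = 2.0) -> int:
    """Pick lora_alpha so that lora_alpha / rank equals the chosen scaling."""
    return int(rank * scaling)


for rank in (4, 16, 64, 256):
    alpha = alpha_for_rank(rank)
    print(f"r={rank:>3}  lora_alpha={alpha:>3}  scaling={alpha / rank}")
```

With the ratio held fixed, differences between runs can be attributed more directly to the rank itself and less to a change in update magnitude.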

Enter the requested items in the table.

Compare the values obtained and explain their differences.

TODO

| Method | Training Time (s) | Training Memory (GB) | Validation Loss | # Trainable Params (M) |
|:-:|:-:|:-:|:-:|:-:|
| Zero-Shot         | ...  | ...  | ... | ... |
| Full Fine-Tuning  | ...  | ...  | ... | ... |
| Prefix Tuning     | ...  | ...  | ... | ... |
| LoRA rank=4       | ...  | ...  | ... | ... |
| LoRA rank=16      | ...  | ...  | ... | ... |
| LoRA rank=64      | ...  | ...  | ... | ... |
| LoRA rank=256     | ...  | ...  | ... | ... |




TODO:

Your detailed and complete explanation

## Implement LoRA from scratch (30 pt)

In [None]:
custom_lora_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)
print(custom_lora_model)

In [None]:
class LoRALayer(nn.Module):
    def __init__(self, base_layer, rank=8, alpha=16):
        super().__init__()
        self.rank = rank
        self.alpha = alpha

        # TODO: set the base_layer and extract the input and output shape of it
        ...

        # TODO: Define the A and B matrices
        #       Note that B must be initialized to zero (both weight and bias)
        ...

    def forward(self, x):
        # TODO: Complete the forward layer
        ...
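The idea behind the layer can be illustrated without PyTorch. The sketch below is a toy, list-based version under stated assumptions: it is not the `nn.Module` the assignment asks for, all names are hypothetical, and the bias is omitted. The base weight is stored as (in_features, out_features), mirroring how GPT-2's `Conv1D` lays out its weight. The output is `x W + (alpha / rank) * (x A) B`, so with `B` initialized to zero the wrapped layer reproduces the base layer exactly before training.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ToyLoRALinear:
    """Toy LoRA wrapper around a frozen weight of shape (d_in, d_out)."""

    def __init__(self, weight, rank=2, alpha=4, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight              # frozen base weight, never updated
        self.scaling = alpha / rank       # same scaling convention as alpha / r
        # A: small random values, shape (d_in, rank)
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(rank)] for _ in range(d_in)]
        # B: zeros, shape (rank, d_out), so the low-rank update starts at zero
        self.B = [[0.0] * d_out for _ in range(rank)]

    def forward(self, x):
        base = matmul(x, self.weight)                 # x W
        update = matmul(matmul(x, self.A), self.B)    # (x A) B
        return [
            [b + self.scaling * u for b, u in zip(base_row, update_row)]
            for base_row, update_row in zip(base, update)
        ]


weight = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # d_in=2, d_out=3
layer = ToyLoRALinear(weight)
x = [[1.0, -1.0]]

# While B is all zeros, the output matches the base layer's x W.
print(layer.forward(x))
print(matmul(x, weight))
```

In a real implementation only `A` and `B` would receive gradients, which is what keeps the number of trainable parameters proportional to the rank.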

In [None]:
# TODO: Freeze the model
...

# TODO: Loop over list of GPT2Blocks of model and replace the Conv1D
#       modules (c_attn, c_proj) of them with your LoRALayer
...

In [None]:
print(custom_lora_model)

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=custom_lora_model,
    args=training_args,
    train_dataset=train_dataset,
)

In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = ...
# TODO: Train the model using trainer
train_output = ...
# TODO: Get reserved memory from cuda
gpu_memory_after = ...

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = ...

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

In [None]:
# Delete the model
del custom_lora_model
del trainer

In [None]:
# Empty the GPU memory (run this cell twice if GPU RAM usage is not close to or below 1.5 GB)
gc.collect()
torch.cuda.empty_cache()