# HW3 - PEFT

In this notebook, we fine-tune GPT-2 Medium (`openai-community/gpt2-medium`) on the [WikiText](https://huggingface.co/datasets/Salesforce/wikitext#wikitext-2-v1) dataset using several fine-tuning methods: full fine-tuning, Prefix Tuning, and LoRA.

Parameter-Efficient Fine-Tuning (PEFT) is a family of techniques that adapt large pre-trained models to specific tasks while training only a small subset of parameters, which significantly reduces computational and memory costs. Instead of updating all model parameters, PEFT methods such as LoRA (Low-Rank Adaptation), adapter layers, and Prefix-Tuning introduce lightweight trainable modules that are inserted into the model or that modify its activations in a structured way. This retains the general knowledge of the base model while adapting it efficiently to new tasks, which makes PEFT particularly useful for fine-tuning large models such as LLMs and vision-language models on resource-constrained hardware.

## Install required libraries

In [None]:
!pip install -U datasets

Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
Downloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
Installing collected packages: fsspec, datasets
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.2
    Uninstalling fsspec-2025.3.2:
      Successfully uninstalled fsspec-2025.3.2
  Attempting uninstall: datasets
    Found existing installation: datasets 2.14.4
    Uninstalling datasets-2.14.4:
      Successfully uninstalled datasets-2.14.4
ERROR: pip's dependency r

## Import required libraries

In [None]:
import gc
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments
from datasets import load_dataset
from peft import LoraConfig, PrefixTuningConfig, get_peft_model, PeftModel

## Setup

In [None]:
gpt_2_medium_model_name = "openai-community/gpt2-medium"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(gpt_2_medium_model_name)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize a batch of examples; for causal LM training the labels are a copy of the input ids
def tokenizing_preprocess(examples):
    inputs = tokenizer(examples['text'], truncation=True, padding='max_length', max_length=128)
    inputs['labels'] = inputs['input_ids'].copy()
    return inputs


# Define training arguments
training_args = TrainingArguments(
    output_dir='./gpt2',
    eval_strategy='no',
    save_strategy="no",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    learning_rate=2e-5,
    warmup_steps=500,
    weight_decay=0.01,
    report_to="none"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
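
A note on the schedule configured above: with the 1,000 training examples selected later in this notebook, a batch size of 4, and one epoch, training lasts about 250 optimizer steps, which is fewer than `warmup_steps=500`. The sketch below assumes the Trainer's default linear warmup and decay schedule, a single GPU, and no gradient accumulation; the helper name `linear_schedule_lr` is illustrative and not part of any library. Under those assumptions the learning rate stays below half of the configured `2e-5` for the whole run.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_steps=500):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Values taken from this notebook; single device and no accumulation are assumed.
num_examples, batch_size, epochs = 1000, 4, 1
total_steps = math.ceil(num_examples / batch_size) * epochs

# Highest learning rate actually used during the run.
peak_lr = max(linear_schedule_lr(s, total_steps) for s in range(total_steps))
print(f"steps={total_steps}, highest lr={peak_lr:.2e}")
```

This may be worth keeping in mind when comparing methods, since PEFT methods are often trained with larger learning rates than full fine-tuning.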



## Load dataset (5 pt)

In [None]:
# TODO: Load the wikitext-2-v1 version of wikitext
dataset = load_dataset(path="Salesforce/wikitext", name="wikitext-2-v1", split="train")


Generating test split:   0%|          | 0/4358 [00:00<?, ? examples/s]

Generating train split:   0%|          | 0/36718 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/3760 [00:00<?, ? examples/s]

In [None]:
dataset[0]

{'text': ' <unk> , short @-@ arc , high pressure xenon arc lamps have a color temperature closely <unk> noon sunlight and are used in solar simulators . That is , the <unk> of these lamps closely <unk> a heated black body <unk> that has a temperature close to that observed from the Sun . After they were first introduced during the 1940s , these lamps began replacing the shorter @-@ lived carbon arc lamps in movie <unk> . They are employed in typical 35mm , <unk> and the new digital <unk> film projection systems , automotive <unk> <unk> , high @-@ end " tactical " <unk> and other specialized uses . These arc lamps are an excellent source of short wavelength ultraviolet radiation and they have intense emissions in the near infrared , which is used in some night vision systems . \n'}

In [None]:
# TODO: Select 1,000 examples for training and 500 examples for validation
dataset = dataset.shuffle(seed=42)
train_data = dataset.select(range(1000))
eval_data = dataset.select(range(1000, 1500))

# Apply the tokenization preprocessing to both datasets
train_dataset = train_data.map(tokenizing_preprocess, batched=True)
eval_dataset = eval_data.map(tokenizing_preprocess, batched=True)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]
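
Because `tokenizer.pad_token` is set to the end-of-text token and `labels` is a plain copy of `input_ids`, padded positions also count toward the loss, which probably affects the absolute loss values reported in the sections below. One common alternative, which this notebook does not use, is to set the labels of padded positions to `-100` so that the loss ignores them. A minimal sketch on plain Python lists follows; the helper name `mask_padding_labels` and the toy token ids are illustrative.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips

def mask_padding_labels(input_ids, attention_mask):
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]

# Toy batch: 50256 is GPT-2's end-of-text id, reused here as padding.
# The other ids are arbitrary placeholders.
batch_ids = [[464, 3290, 50256, 50256], [40, 588, 8887, 13]]
batch_mask = [[1, 1, 0, 0], [1, 1, 1, 1]]

labels = mask_padding_labels(batch_ids, batch_mask)
print(labels)  # [[464, 3290, -100, -100], [40, 588, 8887, 13]]
```

Applying this inside `tokenizing_preprocess` would change every loss value recorded in this notebook, so the numbers below should be read with the original (unmasked) labels in mind.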

## Full Fine-Tuning (5 pt)

In [None]:
# Load the model
ff_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)


In [None]:
# Initialize Trainer
trainer = Trainer(
    model=ff_model,
    args=training_args,
    train_dataset=train_dataset
)

In [None]:
# Zero-shot evaluation of the model (before any fine-tuning)

# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


eval_loss = 6.8448


In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = torch.cuda.max_memory_reserved()
# TODO: Train the model using trainer
train_output = trainer.train()
# TODO: Get reserved memory from cuda
gpu_memory_after = torch.cuda.max_memory_reserved()

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

Step,Training Loss


Training time: 122.9619 seconds
GPU memory used: 5542772736.0000 bytes
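
The results table later in the notebook asks for memory in gigabytes, while the cell above prints bytes. A small conversion sketch follows; it uses binary gigabytes (GiB, 1024³ bytes), which is an assumption about the intended unit, and the helper name `bytes_to_gib` is illustrative.

```python
def bytes_to_gib(num_bytes):
    """Convert a byte count to binary gigabytes (GiB)."""
    return num_bytes / 1024**3

full_ft_bytes = 5542772736  # "GPU memory used" printed by the cell above
print(f"{bytes_to_gib(full_ft_bytes):.2f} GiB")  # 5.16 GiB
```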


In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

eval_loss = 1.2028
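
`eval_loss` is a mean cross-entropy per token (in nats), so it can be turned into a perplexity by exponentiating it. The sketch below does that for the two values printed so far; the helper name `perplexity` is illustrative. Since padded positions are included in the labels in this notebook, the resulting numbers are probably not comparable to published WikiText perplexities.

```python
import math

def perplexity(mean_token_loss):
    """Perplexity corresponding to a mean per-token cross-entropy in nats."""
    return math.exp(mean_token_loss)

# eval_loss values printed by the cells above.
for name, loss in [("zero-shot", 6.8448), ("full fine-tuning", 1.2028)]:
    print(f"{name}: loss={loss:.4f}, perplexity={perplexity(loss):.1f}")
```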


In [None]:
# Delete the model
del ff_model
del trainer

In [None]:
# Free GPU memory (run this cell twice if GPU RAM usage does not drop close to zero, ~0.2)
gc.collect()
torch.cuda.empty_cache()

## Prefix Tuning (20 pt)

TODO: Briefly explain Prefix Tuning.

In [None]:
# Load a fresh copy of the base model for prefix tuning
prefix_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)

In [None]:
# TODO: Define your prefix-tuning configuration using the PrefixTuningConfig class from the peft library
#       Set task_type to CAUSAL_LM

prefix_config = PrefixTuningConfig(task_type='CAUSAL_LM', num_virtual_tokens=20, encoder_hidden_size=768)

# TODO: Wrap the GPT2LMHeadModel with the prefix config above using the get_peft_model function
prefix_model = get_peft_model(prefix_model, prefix_config)

# TODO: Print the number of trainable parameters
num_params = sum(p.numel() for p in prefix_model.parameters() if p.requires_grad)
all_params = sum(p.numel() for p in prefix_model.parameters())
print(f'Number of trainable parameters: {num_params} out of {all_params}')

Number of trainable parameters:  983040
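
The count above can be reproduced by hand. The sketch below assumes that no reparameterization MLP is used (`prefix_projection` left at its default), so each virtual token gets one key vector and one value vector per transformer layer; in that case `encoder_hidden_size` does not appear to contribute any parameters. The helper name `prefix_tuning_params` is illustrative.

```python
def prefix_tuning_params(num_virtual_tokens, num_layers, hidden_size):
    """Trainable parameters of a prefix without a reparameterization MLP."""
    # One key vector and one value vector per layer for every virtual token.
    return num_virtual_tokens * num_layers * 2 * hidden_size

# GPT-2 Medium has 24 blocks with hidden size 1024 (see the printed architecture).
n = prefix_tuning_params(num_virtual_tokens=20, num_layers=24, hidden_size=1024)
print(n)  # 983040
```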


In [None]:
prefix_model

PeftModelForCausalLM(
  (base_model): GPT2LMHeadModel(
    (transformer): GPT2Model(
      (wte): Embedding(50257, 1024)
      (wpe): Embedding(1024, 1024)
      (drop): Dropout(p=0.1, inplace=False)
      (h): ModuleList(
        (0-23): 24 x GPT2Block(
          (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (attn): GPT2Attention(
            (c_attn): Conv1D(nf=3072, nx=1024)
            (c_proj): Conv1D(nf=1024, nx=1024)
            (attn_dropout): Dropout(p=0.1, inplace=False)
            (resid_dropout): Dropout(p=0.1, inplace=False)
          )
          (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): GPT2MLP(
            (c_fc): Conv1D(nf=4096, nx=1024)
            (c_proj): Conv1D(nf=1024, nx=4096)
            (act): NewGELUActivation()
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
      )
      (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    )
    (lm_head): Linear(in_fea

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=prefix_model,
    args=training_args,
    train_dataset=train_dataset,
)

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

eval_loss = 8.5969


In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = torch.cuda.max_memory_reserved()
# TODO: Train the model
train_output = trainer.train()
# TODO: Get reserved memory from cuda
gpu_memory_after = torch.cuda.max_memory_reserved()

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

Step,Training Loss


Training time: 66.8062 seconds
GPU memory used: 0.0000 bytes
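
The reported 0 bytes most likely does not mean that training used no memory. `torch.cuda.max_memory_reserved()` returns the peak since the start of the process (or the last reset), and the earlier full fine-tuning run already set a higher peak, so the values before and after training are identical. A minimal sketch of one way to measure the peak per run follows. The helper `peak_reserved_during` and the `FakeCuda` stand-in are illustrative; `cuda` is meant to be `torch.cuda`, and it is passed in as an argument only so that the sketch also runs without a GPU.

```python
def peak_reserved_during(run, cuda):
    """Call run() and return (result, peak reserved bytes during the call).

    `cuda` is expected to be `torch.cuda`.
    """
    cuda.reset_peak_memory_stats()  # forget peaks recorded by earlier cells
    result = run()
    return result, cuda.max_memory_reserved()


class FakeCuda:
    """Stand-in mimicking the two torch.cuda calls used above (illustration only)."""

    def __init__(self):
        self.peak = 5_000  # pretend an earlier run left a high peak behind

    def reset_peak_memory_stats(self):
        self.peak = 0  # the fake holds no live memory, so the peak drops to 0

    def max_memory_reserved(self):
        return self.peak

    def allocate(self, n):
        self.peak = max(self.peak, n)


fake = FakeCuda()
_, peak = peak_reserved_during(lambda: fake.allocate(1_200), fake)
print(peak)  # 1200
```

In the notebook this would be called as `train_output, peak = peak_reserved_during(trainer.train, torch.cuda)`. Note that with the real `torch.cuda`, the reset sets the peak to the currently reserved amount rather than to zero, so the result includes memory already held by the loaded model.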


In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

eval_loss = 7.7825


In [None]:
# Delete the model
del prefix_model
del trainer

In [None]:
# Free GPU memory (run this cell twice if GPU RAM usage is not at or below about 1.5 GB)
gc.collect()
torch.cuda.empty_cache()

## Fine-Tuning by LoRA (Low-Rank Adaptation) (40 pt)

TODO: Briefly explain LoRA (Low-Rank Adaptation).

In [None]:
lora_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)

In [None]:
# Print the model architecture
print(lora_model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1024)
    (wpe): Embedding(1024, 1024)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPT2Block(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=3072, nx=1024)
          (c_proj): Conv1D(nf=1024, nx=1024)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=4096, nx=1024)
          (c_proj): Conv1D(nf=1024, nx=4096)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=50257, bias=False)
)


In [None]:

# TODO: Define your LoRA configuration using LoraConfig class from peft library
#       Apply the LoRA on Conv1D modules (c_attn and c_proj) of GPT2Attention blocks (attn).
#       Set fan_in_fan_out to True
#       Set task_type to CAUSAL_LM
lora_config = LoraConfig(
    r = 8,
    lora_alpha = 16,
    task_type = 'CAUSAL_LM',
    target_modules=["attn.c_attn", "attn.c_proj"],
    fan_in_fan_out = True
)

# TODO: Wrap the GPT2LMHeadModel with the LoRA config above
#       using the get_peft_model function
lora_model = get_peft_model(lora_model, lora_config)

# TODO: Print the number of trainable parameters
num_params = sum(p.numel() for p in lora_model.parameters() if p.requires_grad)
all_params = sum(p.numel() for p in lora_model.parameters())
print(f'Number of trainable parameters: {num_params} out of {all_params}')

Number of trainable parameters: 1179648 out of 356002816
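
The trainable-parameter count can be checked by hand. For a layer with input size `d_in` and output size `d_out`, LoRA adds a matrix A of shape `r x d_in` and a matrix B of shape `d_out x r`, with no biases. The sketch below applies this to the two attention projections in each of the 24 blocks; the function names are illustrative.

```python
def lora_params(in_dim, out_dim, rank):
    """Parameters added by one LoRA pair: A is (rank x in_dim), B is (out_dim x rank)."""
    return rank * (in_dim + out_dim)

def gpt2_medium_attn_lora_params(rank, num_layers=24, hidden=1024):
    c_attn = lora_params(hidden, 3 * hidden, rank)  # Conv1D(nf=3072, nx=1024)
    c_proj = lora_params(hidden, hidden, rank)      # Conv1D(nf=1024, nx=1024)
    return num_layers * (c_attn + c_proj)

print(gpt2_medium_attn_lora_params(rank=8))  # 1179648
```

The count grows linearly with the rank, which is useful when filling in the parameter column of the comparison table for other rank values.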


In [None]:
# Print the model architecture and see the changes
print(lora_model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): GPT2LMHeadModel(
      (transformer): GPT2Model(
        (wte): Embedding(50257, 1024)
        (wpe): Embedding(1024, 1024)
        (drop): Dropout(p=0.1, inplace=False)
        (h): ModuleList(
          (0-23): 24 x GPT2Block(
            (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
            (attn): GPT2Attention(
              (c_attn): lora.Linear(
                (base_layer): Conv1D(nf=3072, nx=1024)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=1024, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=3072, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_mag

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_dataset,
)

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = torch.cuda.max_memory_reserved()
# TODO: Train the model using trainer
train_output = trainer.train()
# TODO: Get reserved memory from cuda
gpu_memory_after = torch.cuda.max_memory_reserved()

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

Step,Training Loss


Training time: 78.3969 seconds
GPU memory used: 0.0000 bytes


In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

eval_loss = 4.3911


In [None]:
# Delete the model
del lora_model
del trainer

In [None]:
# Free GPU memory (run this cell twice if GPU RAM usage is not at or below about 1.5 GB)
gc.collect()
torch.cuda.empty_cache()

#### Run LoRA for different rank values

Fine-tune the GPT-2 model with different rank values. (Be sure to change the alpha value according to the rank so that the results are fair.)
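
One way to read "fair" here is to keep the ratio `lora_alpha / r` constant, so that the low-rank update is scaled identically for every rank. This is an interpretation rather than the only valid choice. The sketch below derives `lora_alpha` from the rank using the ratio of the earlier LoRA cell (`r=8`, `lora_alpha=16`); the helper name `alpha_for_rank` is illustrative.

```python
BASE_RANK, BASE_ALPHA = 8, 16  # values used in the LoRA cell above

def alpha_for_rank(rank):
    """Scale lora_alpha with the rank so that lora_alpha / rank stays fixed."""
    return BASE_ALPHA * rank // BASE_RANK

for rank in (4, 16, 64, 256):
    alpha = alpha_for_rank(rank)
    print(f"r={rank:<3} lora_alpha={alpha:<3} scaling={alpha / rank}")
```

Each pair can then be passed as `r` and `lora_alpha` to `LoraConfig`, with the remaining arguments unchanged.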

Enter the requested items in the table.

Compare the values obtained and explain their differences.

TODO

| Method | Training Time (s) | Training Memory (GB) | Validation Loss | # Trainable Params (M) |
|:-:|:-:|:-:|:-:|:-:|
| Zero-Shot         | ...  | ...  | ... | ... |
| Full Fine-Tuning  | ...  | ...  | ... | ... |
| Prefix Tuning     | ...  | ...  | ... | ... |
| LoRA rank=4       | ...  | ...  | ... | ... |
| LoRA rank=16      | ...  | ...  | ... | ... |
| LoRA rank=64      | ...  | ...  | ... | ... |
| LoRA rank=256     | ...  | ...  | ... | ... |




TODO:

Your detailed and complete explanation

## Implement LoRA from scratch (30 pt)

In [None]:
custom_lora_model = AutoModelForCausalLM.from_pretrained(gpt_2_medium_model_name)
print(custom_lora_model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1024)
    (wpe): Embedding(1024, 1024)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPT2Block(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=3072, nx=1024)
          (c_proj): Conv1D(nf=1024, nx=1024)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=4096, nx=1024)
          (c_proj): Conv1D(nf=1024, nx=4096)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=1024, out_features=50257, bias=False)
)


In [None]:
import math

class LoRALayer(nn.Module):
    """Wraps a frozen GPT-2 Conv1D layer and adds a trainable low-rank update."""
    def __init__(self, base_layer, rank=8, alpha=16):
        super().__init__()
        self.rank = rank
        self.alpha = alpha
        self.scaling = alpha / rank

        # TODO: set the base_layer and extract the input and output shape of it
        self.base_layer = base_layer
        self.base_layer.weight.requires_grad = False
        self.base_layer.bias.requires_grad = False

        self.in_dim = base_layer.nx
        self.out_dim = base_layer.nf

        # TODO: Define the A and B matrices
        #       Note that the B matrices must be initialized with zero (both weight and bias)
        self.B = nn.Linear(self.rank, self.out_dim, bias=False)
        nn.init.zeros_(self.B.weight)

        self.A = nn.Linear(self.in_dim, self.rank, bias=False)
        nn.init.kaiming_uniform_(self.A.weight, a=math.sqrt(5))


    def forward(self, x):
        # TODO: Complete the forward layer
        base_out = self.base_layer(x)

        lora_out = self.A(x)
        lora_out = self.B(lora_out)
        lora_out = lora_out * self.scaling

        return base_out + lora_out
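
Written out, the forward pass above computes the following, where $x$ is a row vector, $W$ and $b$ are the frozen weight and bias of the wrapped `Conv1D` layer (which stores its weight with shape $d_{\text{in}} \times d_{\text{out}}$), and $A$, $B$ are the weight matrices of `self.A` and `self.B`:

$$
h = xW + b + \frac{\alpha}{r}\, x A^{\top} B^{\top},
\qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad B \in \mathbb{R}^{d_{\text{out}} \times r}
$$

Since $B$ is initialized to zero, the second term vanishes before training and $h = xW + b$, so the wrapped model initially behaves like the base model. This is consistent with the evaluation before training further down, which reports the same `eval_loss = 6.8448` as the zero-shot evaluation of the original model.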

In [None]:
# TODO: Freeze the model
for p in custom_lora_model.parameters():
    p.requires_grad = False

# TODO: Loop over the list of GPT2Blocks in the model and replace their Conv1D
#       modules (c_attn, c_proj) with your LoRALayer
for block in custom_lora_model.transformer.h:
    block.attn.c_attn = LoRALayer(block.attn.c_attn)
    block.attn.c_proj = LoRALayer(block.attn.c_proj)

In [None]:
print(custom_lora_model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 1024)
    (wpe): Embedding(1024, 1024)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-23): 24 x GPT2Block(
        (ln_1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): LoRALayer(
            (base_layer): Conv1D(nf=3072, nx=1024)
            (B): Linear(in_features=8, out_features=3072, bias=False)
            (A): Linear(in_features=1024, out_features=8, bias=False)
          )
          (c_proj): LoRALayer(
            (base_layer): Conv1D(nf=1024, nx=1024)
            (B): Linear(in_features=8, out_features=1024, bias=False)
            (A): Linear(in_features=1024, out_features=8, bias=False)
          )
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
       

In [None]:
# num_params = sum(p.numel() for p in custom_lora_model.parameters() if p.requires_grad)
# all_params = sum(p.numel() for p in custom_lora_model.parameters())
# print(f'Number of trainable parameters: {num_params} out of {all_params}')

In [None]:
# Initialize Trainer
trainer = Trainer(
    model=custom_lora_model,
    args=training_args,
    train_dataset=train_dataset,
)

In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


eval_loss = 6.8448


In [None]:
# TODO: Get reserved memory from cuda
gpu_memory_before = torch.cuda.max_memory_reserved()
# TODO: Train the model using trainer
train_output = trainer.train()
# TODO: Get reserved memory from cuda
gpu_memory_after = torch.cuda.max_memory_reserved()

# Report the training time and gpu memory consumption
print(f"Training time: {train_output.metrics['train_runtime']:.4f} seconds")
print(f"GPU memory used: {gpu_memory_after - gpu_memory_before:.4f} bytes")

Step,Training Loss


Training time: 75.7350 seconds
GPU memory used: 0.0000 bytes


In [None]:
# TODO: Evaluate model on eval_dataset
eval_output = trainer.evaluate(eval_dataset)

print(f"eval_loss = {eval_output['eval_loss']:.4f}")

eval_loss = 4.3801


In [None]:
# Delete the model
del custom_lora_model
del trainer

In [None]:
# Free GPU memory (run this cell twice if GPU RAM usage is not at or below about 1.5 GB)
gc.collect()
torch.cuda.empty_cache()