# **CIS 6200 Spring 2024 Homework 4**


**Coding: Experiment with LoRA and Last-Layer Fine-Tuning Using PEFT**
1. With the provided notebook, load a pre-trained Vision Transformer (ViT) model and preprocess the dataset for the fine-tuning task. Report:
  * What metrics are we using for the evaluation? Can you think of any other options (no need to implement)?
  * What is the baseline accuracy on our task?
2. Implement LoRA fine-tuning with the PEFT framework. Report:
  * Which hyperparameters did you use for the LoRA configuration, and what does each one mean?
  * How many parameters are there in the LoRA model?
  * How long does the fine-tuning take? What is the resulting accuracy?
  * [Note] Recommended LoRA rank = 8 (optional: feel free to try other values!)
3. Instead of LoRA, directly fine-tune the last layer (with all previous layers fixed). Report:
  * How many parameters are involved in the fine-tuning?
  * How long does the fine-tuning take? What is the resulting accuracy?

**Discussion questions:**
4. Theoretically calculate the number of trainable parameters for the fine-tuning tasks in 2 and 3. Does the result match your implementation?
  * [Note] Information about model architecture is available at HuggingFace
5. Compare the two fine-tuning methods on their accuracy, efficiency, and reliability. What are the advantages and disadvantages of each?



**Note: Submit your answers to these questions in the corresponding PDF submission, along with this coding submission, on Gradescope.**

## Install Dependencies

In [1]:
! pip install -U accelerate
! pip install -U transformers
! pip install torch



In [2]:
!pip install evaluate datasets git+https://github.com/huggingface/peft -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


## Data Pre-processing
***No required code changes. Feel free to edit if needed.***

In [3]:
# HuggingFace authentication
from huggingface_hub import notebook_login
notebook_login()

In [4]:
import transformers
import accelerate
import peft
import evaluate
import torch
import numpy as np
from datasets import load_dataset
from peft import (
    LoraConfig,
    get_peft_model
)
from transformers import (
    AutoImageProcessor,
    AutoModelForImageClassification,
    TrainingArguments,
    Trainer
)
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)
import time

print(f"Transformers version: {transformers.__version__}")
print(f"Accelerate version: {accelerate.__version__}")
print(f"PEFT version: {peft.__version__}")

Transformers version: 4.37.2
Accelerate version: 0.27.2
PEFT version: 0.8.2


In [5]:
model_checkpoint = "google/vit-base-patch16-224-in21k"

We use the first 5000 instances from the training set of the [Food-101 dataset](https://huggingface.co/datasets/food101).

In [6]:
dataset = load_dataset("food101", split="train[:5000]")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Generating train split:   0%|          | 0/75750 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/25250 [00:00<?, ? examples/s]

In [7]:
# Prepare label reference
labels = dataset.features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = i
    id2label[i] = label

In [8]:
image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)

normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize,
    ]
)

val_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize,
    ]
)

def preprocess_train(example_batch):
    """Apply train_transforms across a batch."""
    example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

def preprocess_val(example_batch):
    """Apply val_transforms across a batch."""
    example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
    return example_batch

In [9]:
# split up training into training + validation
splits = dataset.train_test_split(test_size=0.1)
train_ds = splits["train"]
val_ds = splits["test"]

In [10]:
train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)

In [11]:
def collate_fn(examples):
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

## Set-up

In [12]:
# Helper to count the trainable parameters in a model
def trainable_parameters(model):
    """Print the total parameter count and the trainable percentage; return the trainable count."""
    num_parameters = sum(p.numel() for p in model.parameters())
    print(f"Total Parameters:{num_parameters}")
    num_trainable_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Percentage of trainable parameters={(num_trainable_parameters/num_parameters)*100}%\n\n")

    return num_trainable_parameters

In [13]:
# Import model from HuggingFace
model = AutoModelForImageClassification.from_pretrained(
    model_checkpoint,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [14]:
print(model)

ViTForImageClassification(
  (vit): ViTModel(
    (embeddings): ViTEmbeddings(
      (patch_embeddings): ViTPatchEmbeddings(
        (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      )
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (encoder): ViTEncoder(
      (layer): ModuleList(
        (0-11): 12 x ViTLayer(
          (attention): ViTAttention(
            (attention): ViTSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
            (output): ViTSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
          )
          (intermediate): ViTIntermediate(
            (dense): Linear(in_features=7

In [15]:
trainable_parameters(model)

Total Parameters:85876325
Percentage of trainable parameters=100.0%




85876325

In [13]:
# Evaluation metrics - what are we evaluating?
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)
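
The metric above is plain (top-1) accuracy. For a many-class task such as Food-101, one alternative is top-k accuracy, which counts a prediction as correct when the true label is among the k highest-scoring classes. The following is a minimal sketch in plain Python; `top_k_accuracy` is a hypothetical helper (it is not part of `evaluate` or of the provided notebook), and the logits are made-up toy values.

```python
def top_k_accuracy(logits, labels, k=5):
    """Fraction of rows whose true label is among the k largest logits."""
    if not logits:
        raise ValueError("logits must be non-empty")
    hits = 0
    for row, label in zip(logits, labels):
        # Indices of the k highest-scoring classes for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: 3 samples, 4 classes (illustrative values only).
toy_logits = [
    [0.1, 2.0, 0.3, 0.5],  # true class 1 is ranked first
    [1.5, 0.2, 0.9, 0.1],  # true class 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked last
]
toy_labels = [1, 2, 3]

top1 = top_k_accuracy(toy_logits, toy_labels, k=1)
top2 = top_k_accuracy(toy_logits, toy_labels, k=2)
print(f"top-1 accuracy: {top1:.3f}")
print(f"top-2 accuracy: {top2:.3f}")
```

Per-class precision, recall, or F1 would be other reasonable choices when the class distribution in the subset is uneven.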

## LoRA Fine-tuning
***TODO: Fine-tune with LoRA and answer the questions***

In [17]:
# LoRA Configuration and model

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=["classifier"],
)

lora_model = get_peft_model(model, config)
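
To make the roles of `r` and `lora_alpha` concrete, here is a toy sketch of the LoRA update from the original formulation, h = W x + (alpha / r) · B A x, where W is frozen and only the low-rank factors A (r × d) and B (d × r) are trained. It uses plain Python lists and tiny dimensions; `matvec` and `lora_forward` are hypothetical helpers for illustration and do not reproduce PEFT's internals. B is initialized to zero, so the adapted layer starts out identical to the frozen one.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, lora_alpha):
    """Frozen projection plus the scaled low-rank update B(Ax)."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


random.seed(0)
d, r, lora_alpha = 6, 2, 4  # toy sizes; the notebook uses d=768, r=8, alpha=16

W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen weight
A = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # trainable, r x d
B = [[0.0] * r for _ in range(d)]                                  # trainable, d x r, zero init
x = [1.0] * d

frozen_output = matvec(W, x)
initial_output = lora_forward(W, A, B, x, r, lora_alpha)
adapter_params = r * d + d * r

print("adapter changes nothing at initialization:", initial_output == frozen_output)
print(f"{adapter_params} adapter parameters vs {d * d} in W")
```

With the notebook's values the scaling factor is 16 / 8 = 2. `target_modules` selects which linear layers receive such an adapter, and `lora_dropout` is applied to the adapter's input during training.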

In [18]:
trainable_parameters(lora_model)

Total Parameters:86248906
Percentage of trainable parameters=0.4319834503176191%




372581
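
The trainable count reported above can be checked by hand. The sketch below assumes what the printed architecture and the `LoraConfig` show: 12 encoder layers, `query` and `value` projections of shape 768 × 768, rank 8, no trainable biases, and a 101-class classifier head kept trainable through `modules_to_save`. The variable names are illustrative.

```python
hidden_size = 768
num_layers = 12
num_labels = 101        # Food-101 classes
rank = 8
targets_per_layer = 2   # "query" and "value"

# Each adapted Linear(768, 768) gains A (rank x in) and B (out x rank), without bias.
lora_per_module = rank * hidden_size + hidden_size * rank
lora_total = num_layers * targets_per_layer * lora_per_module

# The classifier head (weight + bias) is trained through a separate copy.
classifier_params = hidden_size * num_labels + num_labels

trainable = lora_total + classifier_params

base_total = 85_876_325  # total printed for the base model
total_with_adapters = base_total + trainable

print(f"LoRA matrices:     {lora_total}")
print(f"Classifier head:   {classifier_params}")
print(f"Trainable total:   {trainable}")
print(f"Total with adapters: {total_with_adapters}")
print(f"Trainable share:   {100 * trainable / total_with_adapters:.4f}%")
```

The total grows by the full trainable count because the original classifier stays in the wrapped model next to its trainable copy.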

In [20]:
# HuggingFace trainer to run PEFT

model_name = model_checkpoint.split("/")[-1]
batch_size = 128


args = TrainingArguments(
    f"{model_name}-finetuned-lora-food101",
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    num_train_epochs=5,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=True,
    label_names=["labels"],
)


trainer = Trainer(
    lora_model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)
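
The training arguments above imply an effective batch size and a number of optimizer steps per epoch. The sketch below works them out under two assumptions: training runs on a single device, and the number of update steps per epoch is the number of batches integer-divided by `gradient_accumulation_steps`. The result is consistent with the `checkpoint-9` through `checkpoint-45` directories named in the last-layer run's log, which uses the same settings.

```python
import math

# Values taken from the notebook cells.
train_examples = 5000 - 500        # 5000 images with test_size=0.1 held out
per_device_batch_size = 128
gradient_accumulation_steps = 4
num_train_epochs = 5
logging_steps = 10
num_devices = 1                    # assumption: a single GPU

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
batches_per_epoch = math.ceil(train_examples / (per_device_batch_size * num_devices))
updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
total_updates = updates_per_epoch * num_train_epochs

print(f"Effective batch size: {effective_batch_size}")
print(f"Batches per epoch:    {batches_per_epoch}")
print(f"Updates per epoch:    {updates_per_epoch}")
print(f"Total updates:        {total_updates}")
print(f"First loss logged before epoch 1 ends: {logging_steps <= updates_per_epoch}")
```

Because `logging_steps=10` exceeds the update steps in one epoch, no training loss has been logged by the end of epoch 1, which is why the results table shows "No log" in that row.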

In [21]:
start=time.time()
train_results=trainer.train()
end=time.time()
print(f"Time taken:{end-start}")

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.603083,0.9
2,2.184000,0.210725,0.962
3,0.371800,0.149772,0.962
4,0.212000,0.137715,0.958
5,0.179500,0.126671,0.964


Time taken:330.8643636703491


## Last layer tuning
***TODO: Use the same model, freeze all parameters except those of the last layer, and implement fine-tuning to answer the required questions***

In [87]:
last_layer_model = AutoModelForImageClassification.from_pretrained(
    model_checkpoint,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

print(last_layer_model)

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


ViTForImageClassification(
  (vit): ViTModel(
    (embeddings): ViTEmbeddings(
      (patch_embeddings): ViTPatchEmbeddings(
        (projection): Conv2d(3, 768, kernel_size=(16, 16), stride=(16, 16))
      )
      (dropout): Dropout(p=0.0, inplace=False)
    )
    (encoder): ViTEncoder(
      (layer): ModuleList(
        (0-11): 12 x ViTLayer(
          (attention): ViTAttention(
            (attention): ViTSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
            (output): ViTSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.0, inplace=False)
            )
          )
          (intermediate): ViTIntermediate(
            (dense): Linear(in_features=7

In [91]:
# Freeze every parameter, then unfreeze the final encoder block (layer 11)
# and the classification head so that only those are trained.
for param in last_layer_model.parameters():
    param.requires_grad = False
for param in last_layer_model.vit.encoder.layer[11].parameters():
    param.requires_grad = True
for param in last_layer_model.classifier.parameters():
    param.requires_grad = True




In [95]:
trainable_parameters(last_layer_model)

Total Parameters:85876325
Percentage of trainable parameters=8.344023803999532%




7165541
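
The count above can also be derived by hand for the configuration used here, in which the final encoder block and the classifier head are trainable. The sketch assumes the standard ViT-Base sizes: hidden size 768 (visible in the printed model) and an MLP width of 3072 (taken from the published model configuration, since the printed architecture is truncated before that layer). `linear_params` is an illustrative helper.

```python
def linear_params(in_features, out_features):
    """Weights plus bias of a fully connected layer."""
    return in_features * out_features + out_features


hidden_size = 768
intermediate_size = 3072   # assumption: ViT-Base MLP width from the model config
num_labels = 101

attention = 4 * linear_params(hidden_size, hidden_size)   # query, key, value, output
mlp = (linear_params(hidden_size, intermediate_size)
       + linear_params(intermediate_size, hidden_size))
layer_norms = 2 * (2 * hidden_size)                       # two LayerNorms, weight + bias each
encoder_block = attention + mlp + layer_norms

classifier = linear_params(hidden_size, num_labels)
trainable = encoder_block + classifier

print(f"One encoder block: {encoder_block}")
print(f"Classifier head:   {classifier}")
print(f"Trainable total:   {trainable}")
print(f"Share of 85876325: {100 * trainable / 85_876_325:.4f}%")
```

If only the classifier head were unfrozen, the trainable count would be the 77,669 head parameters alone.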

In [96]:
# Can still use trainer, or implement separate training loops

args = TrainingArguments(
    f"{model_name}-last-layer-finetuned-food101",
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-3,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    fp16=True,
    num_train_epochs=5,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=True,
    label_names=["labels"],
)

trainer = Trainer(
    last_layer_model,
    args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)

In [98]:
start=time.time()
last_layer_train_results=trainer.train()
end=time.time()
print(f"Time taken:{end-start}")

Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.129803,0.952
2,0.239600,0.13031,0.954
3,0.170300,0.135166,0.958
4,0.157600,0.13436,0.954
5,0.136700,0.132819,0.95


Checkpoint destination directory vit-base-patch16-224-in21k-last-layer-finetuned-food101/checkpoint-9 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory vit-base-patch16-224-in21k-last-layer-finetuned-food101/checkpoint-18 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory vit-base-patch16-224-in21k-last-layer-finetuned-food101/checkpoint-27 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory vit-base-patch16-224-in21k-last-layer-finetuned-food101/checkpoint-36 already exists and is non-empty.Saving will proceed but saved results may be invalid.
Checkpoint destination directory vit-base-patch16-224-in21k-last-layer-finetuned-food101/checkpoint-45 already exists and is non-empty.Saving will proceed but saved results may be invalid.


Time taken:256.609943151474
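
For the comparison in question 5, the numbers reported in this notebook can be put side by side. The sketch below only restates values printed above (trainable parameter counts, wall-clock times, per-epoch validation accuracies); the dictionary layout and run labels are illustrative.

```python
runs = {
    "LoRA (r=8) + head": {
        "trainable": 372_581,
        "seconds": 330.8643636703491,
        "accuracy": [0.9, 0.962, 0.962, 0.958, 0.964],
    },
    "Last block + head": {
        "trainable": 7_165_541,
        "seconds": 256.609943151474,
        "accuracy": [0.952, 0.954, 0.958, 0.954, 0.95],
    },
}

for name, run in runs.items():
    best = max(run["accuracy"])
    print(f"{name:<20} trainable={run['trainable']:>9,}  "
          f"time={run['seconds']:6.1f}s  best_acc={best:.3f}")

lora, last = runs["LoRA (r=8) + head"], runs["Last block + head"]
param_ratio = last["trainable"] / lora["trainable"]
time_ratio = lora["seconds"] / last["seconds"]
accuracy_gap = max(lora["accuracy"]) - max(last["accuracy"])

print(f"Last-block tuning trains {param_ratio:.1f}x as many parameters")
print(f"LoRA took {time_ratio:.2f}x as long in this run")
print(f"Best-accuracy gap: {accuracy_gap:.3f}")
```

These figures come from a single run on one random split with 500 validation images, so an accuracy gap of 0.006 corresponds to three images and should not be read as a reliable difference.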
