
# Instructions

You are working at an art gallery and have been tasked with creating a search function that classifies artworks by subject. You have settled on using a vision transformer for the classification and want to compare three approaches:

1. Fine-tuning a pre-trained model using LoRA.
2. Fine-tuning the entire pre-trained model.
3. Training a randomly initialized model from scratch on the dataset.

You will use three popular libraries: transformers for the models, datasets for loading the data, and peft for parameter-efficient fine-tuning.

This guide might be helpful when doing the homework: https://huggingface.co/docs/peft/main/en/task_guides/image_classification_lora

**Make sure to use a GPU (T4) runtime.**

# Tasks
1.   Load the dataset: https://huggingface.co/datasets/flwrlabs/pacs and partition it into train and test splits so that 20% of the dataset is reserved for evaluation. What are the classes/subjects in the dataset? What are the different art styles/domains present in the dataset? We are only interested in classifying the subjects and will not distinguish between art styles/domains.
2.   We will be using Google's ViT-B (https://huggingface.co/google/vit-base-patch16-224). Initialize the image processor and two instances of the pre-trained model (one for LoRA and one for full fine-tuning). What dataset was the pre-trained model trained on? Make sure when loading the model to modify the classifier layer of the model so that it will work with the PACS dataset (this layer will be fine-tuned later).
3. Initialize a randomly initialized model with the same architecture as the pre-trained model that will be trained from scratch (be careful with the classifier layer).
4. Preprocess the dataset so that images are passed through the image processor prior to being fed to the model. No other augmentation or transformation is needed.
5. Using the PEFT library, initialize a LoRA model that adds low-rank adapters to the query and value weight matrices in each transformer block. Set the adapters to rank 4, a scaling (`lora_alpha`) of 32, a dropout of 0.1, and no bias. How many parameters will be trained in the LoRA model (make sure to include the classifier layer)?
6. Using the Trainer from the transformers library, create a Trainer that uses AdamW to fine-tune the model for 5 epochs with a learning rate of 0.0002, a batch size of 64, a warmup ratio of 0.1, and an L2 weight decay of 0.01. Train the LoRA model.
7. Using the same optimizer, perform full fine-tuning on the pre-trained model as well as train the randomly initialized model from scratch.
8. Evaluate the models on the test dataset (code is already available) and output the model accuracies.
9. Which method performs best on the test dataset? What do you notice about the time needed to train each model? Why is there such a difference? In 2-4 sentences, explain why large foundation models have become so prominent and why you would pick one fine-tuning approach over the other.

NOTE: It should not be necessary to delete or change any of the existing code present in the notebook.

In [None]:
#@title Install Packages
!pip install datasets
!pip install lm_eval


In [None]:

#@title Imports
from transformers import ViTImageProcessor, ViTForImageClassification, AutoConfig, AutoModelForImageClassification, Trainer, TrainingArguments
from datasets import load_dataset
import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset
from peft import LoraConfig, get_peft_model
import numpy as np
import evaluate

In [9]:
#@title Load Dataset and Create Train and Test Partition
ds = load_dataset("flwrlabs/pacs")

# Might need to do something here to set up the classifier layer properly later...
train_test_split = ds['train'].train_test_split(test_size=0.2, stratify_by_column="label")
# print(train_test_split['train'][0])

# # Access the train and test sets
train_ds = train_test_split['train']
test_ds = train_test_split['test']
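
The comment in the cell above hints that the classifier layer will later need to know about the PACS labels. A minimal sketch of building the label mappings follows. It assumes the seven PACS subjects are stored in alphabetical order; in the notebook the dataset's own ordering is better read from `ds['train'].features['label'].names` than typed by hand. The names `class_names`, `id2label`, and `label2id` are illustrative and not part of the original notebook.

```python
# Assumption: the seven PACS subjects, listed alphabetically. In the notebook,
# prefer ds['train'].features['label'].names to get the dataset's own ordering.
class_names = ["dog", "elephant", "giraffe", "guitar", "horse", "house", "person"]

# Integer id -> readable name, and the inverse mapping.
id2label = {i: name for i, name in enumerate(class_names)}
label2id = {name: i for i, name in id2label.items()}
num_labels = len(class_names)

print(num_labels)
print(id2label[3], label2id["person"])
```

Mappings like these can be handed to the model when it is loaded, so that predictions are reported as subject names rather than bare integers.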

In [10]:
#@title Load Model and Image Processor
processor          = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
pretrained_model   = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')
pretrained_model_2 = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')

# Might need to do something here to load randomly initialized model with same architecture...

untrained_model = ViTForImageClassification(pretrained_model.config)
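
The task asks for a classifier layer that matches PACS, whereas the checkpoint ships with a 1000-way ImageNet head. The sketch below shows, on a deliberately tiny and hypothetical `ViTConfig` that needs no download, how `num_labels` determines the size of the head. For the pre-trained checkpoint the analogous step is usually to pass `num_labels=7` together with `ignore_mismatched_sizes=True` to `from_pretrained`, so that the old head is replaced by a freshly initialized one; that call is not run here because it requires fetching the weights.

```python
from transformers import ViTConfig, ViTForImageClassification

# Hypothetical miniature configuration, chosen only so the sketch runs quickly.
tiny_config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=16,
    num_labels=7,  # one output per PACS subject
)

# Building the model from a config gives randomly initialized weights.
tiny_model = ViTForImageClassification(tiny_config)

head = tiny_model.classifier
print(head.in_features, head.out_features)
```

A model built from a config whose `num_labels` is 7 gets a 7-way head automatically, which is one way to keep the randomly initialized model consistent with the fine-tuned ones.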

In [11]:
#@title Preprocess the Dataset with the Image Processor

def preprocess(examples):
    # Applied lazily by set_transform to each accessed batch: the processor
    # resizes and normalizes the PIL images and returns one stacked tensor.
    images = [image.convert('RGB') for image in examples['image']]
    inputs = processor(images, return_tensors='pt')
    # The model's forward pass expects the targets under the key 'labels'.
    inputs['labels'] = examples['label']
    return inputs

test_ds.set_transform(preprocess)
train_ds.set_transform(preprocess)

# Shuffle only the training batches; evaluation order does not affect accuracy.
test_loader = DataLoader(test_ds, batch_size=64, shuffle=False)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
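
As a rough picture of what the image processor does to each pixel, the sketch below applies the rescaling and normalization by hand. It assumes a rescale factor of 1/255 and a per-channel mean and standard deviation of 0.5, which is what this checkpoint's preprocessing configuration is understood to use; the authoritative values are `processor.image_mean` and `processor.image_std`. Resizing to 224x224 is omitted, and `normalize_pixel` is a hypothetical helper.

```python
# Assumed preprocessing constants (check processor.image_mean / processor.image_std).
RESCALE_FACTOR = 1 / 255
MEAN = 0.5
STD = 0.5

def normalize_pixel(value):
    """Map an 8-bit pixel value (0-255) to the normalized range fed to the model."""
    return (value * RESCALE_FACTOR - MEAN) / STD

for raw in (0, 128, 255):
    print(raw, "->", round(normalize_pixel(raw), 4))
```

Under these assumptions the darkest pixel maps to about -1 and the brightest to about 1, which is why no further augmentation or scaling is needed before the images reach the model.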

In [12]:
#@title LoRA using PEFT
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["query", "value"],
    bias="none"
)

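# get_peft_model injects the adapters into the model it is given and freezes its
# base weights, so wrap the first instance and keep pretrained_model_2 untouched
# for full fine-tuning.
lora_model = get_peft_model(pretrained_model, lora_config)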
lora_model.print_trainable_parameters()

trainable params: 147,456 || all params: 86,715,112 || trainable%: 0.1700
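
The printed count can be checked by hand. The sketch below assumes the standard ViT-B dimensions (12 transformer blocks, hidden size 768) and a 7-class head. It suggests that the 147,456 figure covers the adapters only; a trainable 7-class classifier, for example one listed under `modules_to_save` in `LoraConfig`, would add its own weights and biases on top. The constant names are illustrative.

```python
HIDDEN_SIZE = 768   # ViT-B hidden dimension
NUM_LAYERS = 12     # transformer blocks in ViT-B
RANK = 4
LORA_ALPHA = 32
NUM_CLASSES = 7     # PACS subjects

# Each adapted weight matrix gains A (rank x hidden) and B (hidden x rank).
params_per_adapter = RANK * HIDDEN_SIZE + HIDDEN_SIZE * RANK
adapters = NUM_LAYERS * 2  # query and value in every block
lora_params = adapters * params_per_adapter

# A linear classifier head: one weight per (feature, class) plus one bias per class.
classifier_params = HIDDEN_SIZE * NUM_CLASSES + NUM_CLASSES

print(lora_params)                      # 147456
print(lora_params + classifier_params)  # 152839
print(LORA_ALPHA / RANK)                # 8.0
```

The last line is the factor by which PEFT scales the low-rank update by default (`lora_alpha / r`), so a "scaling of 32" at rank 4 multiplies the adapter output by 8.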


In [None]:
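batch_size = 64

# Trainer optimizes with AdamW by default; weight_decay sets its decay coefficient.
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=5,
    per_device_train_batch_size=batch_size,
    learning_rate=0.0002,
    weight_decay=0.01,
    warmup_ratio=0.1,
    fp16=True,                     # mixed precision; requires the GPU runtime
    remove_unused_columns=False    # keep the 'image' column that preprocess() reads
)

# LoRA: only the low-rank adapters receive gradient updates
lora_trainer = Trainer(
    model=lora_model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds
)

# Full fine-tuning of the second pre-trained instance
finetune_trainer = Trainer(
    model=pretrained_model_2,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds
)

# Training from scratch, starting from random weights
untrained_trainer = Trainer(
    model=untrained_model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds
)

# Train the models
lora_trainer.train()
finetune_trainer.train()
untrained_trainer.train()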

Step,Training Loss
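
To make the warmup ratio concrete, the sketch below reproduces a linear warmup followed by a linear decay, which is the schedule Trainer is assumed to apply here because `lr_scheduler_type` is left at its default. The training-set size of 8,000 images is a hypothetical round number, and `linear_warmup_lr` is an illustrative helper, not a library function.

```python
import math

def linear_warmup_lr(step, total_steps, base_lr=2e-4, warmup_ratio=0.1):
    """Learning rate at a given optimizer step: linear ramp up, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

# Hypothetical run: 8,000 training images, batch size 64, 5 epochs.
steps_per_epoch = math.ceil(8000 / 64)
total_steps = steps_per_epoch * 5

for step in (0, 30, 63, 344, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

With these numbers the learning rate climbs from zero to its peak of 0.0002 over roughly the first tenth of training and then falls back to zero by the final step.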


In [None]:
#@title Evaluate Vision Transformers
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

lora_model.to(device)
pretrained_model_2.to(device)
untrained_model.to(device)

def eval_model(model, dataloader, model_type):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch in dataloader:
            inputs = {key: val.to(device) for key, val in batch.items()}
            outputs = model(**inputs)
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=-1)

            labels = batch['labels'].to(device)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
        accuracy = correct / total
    print(f'Accuracy on test split for model trained {model_type}: {accuracy*100:.4f}%')

eval_model(lora_model, test_loader, "with LoRA")

eval_model(pretrained_model_2, test_loader, "with full fine-tuning")

eval_model(untrained_model, test_loader, "from random initialization")