## Hyper Kvasir Labeled Images: Image Classification

Dataset Source: https://huggingface.co/datasets/sahilur/hyper-kvasir-labeled-images

#### Install Necessary Libraries

In [1]:
%pip install torch datasets
%pip install transformers evaluate
%pip install accelerate -U
%pip install peft


#### Import Necessary Libraries

In [2]:
import os, sys, random, shutil
os.environ['TOKENIZERS_PARALLELISM']='false'

from PIL import ImageDraw, ImageFont
import PIL.Image  # used as PIL.Image; the bare name Image is taken by datasets below

from tqdm import tqdm

import numpy as np
import pandas as pd

import datasets
from datasets import load_dataset, Image, DatasetDict, ClassLabel

import transformers
from transformers import Trainer, TrainingArguments
from transformers import ViTForImageClassification, ViTImageProcessor

import torch
from torchvision.transforms import (
    CenterCrop,
    Compose,
    Normalize,
    RandomHorizontalFlip,
    RandomResizedCrop,
    Resize,
    ToTensor,
)

import peft
from peft import LoraConfig, get_peft_model

import evaluate

!git lfs install

Error: Failed to call git rev-parse --git-dir: exit status 128 
Git LFS initialized.


#### Enter HuggingFace Access Token

In [3]:
!huggingface-cli login


    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


#### Display Versions of Relevant Libraries

In [4]:
print("Python :".rjust(15), sys.version[0:6])
print("NumPy :".rjust(15), np.__version__)
print("Pandas :".rjust(15), pd.__version__)
print("Datasets :".rjust(15), datasets.__version__)
print("Evaluate :".rjust(15), evaluate.__version__)
print("Transformers :".rjust(15), transformers.__version__)
print("Torch :".rjust(15), torch.__version__)
print("PEFT :".rjust(15), peft.__version__)

       Python : 3.10.1
        NumPy : 1.22.4
       Pandas : 1.5.3
     Datasets : 2.13.1
     Evaluate : 0.4.0
 Transformers : 4.30.2
        Torch : 2.0.1+cu118


#### Ingest Dataset

In [5]:
ds = load_dataset("sahilur/hyper-kvasir-labeled-images")

ds = ds.rename_column("label", "labels")

ds


DatasetDict({
    train: Dataset({
        features: ['image', 'labels'],
        num_rows: 8528
    })
    validation: Dataset({
        features: ['image', 'labels'],
        num_rows: 1069
    })
    test: Dataset({
        features: ['image', 'labels'],
        num_rows: 1065
    })
})

#### Display Grid of Examples From Each Class to Gain Better Picture of Data

In [12]:
def show_grid_of_examples(ds,
                          seed: int = 42,
                          examples_per_class: int = 3,
                          size=(350, 350)):
    '''
    This function displays a few pictures
    from each class in the dataset.
    '''
    w, h = size
    labels = ds['train'].features['labels'].names
    grid = PIL.Image.new(mode='RGB', size=(examples_per_class * w, len(labels) * h))
    draw = ImageDraw.Draw(grid)

    for label_id, label in enumerate(labels):
        # filter the dataset by a single label, shuffle it, then grab a few samples
        ds_slice = ds['train'] \
                    .filter(lambda ex: ex['labels'] == label_id) \
                    .shuffle(seed) \
                    .select(range(examples_per_class))

        # plot this label's examples in a row
        for i, example in enumerate(ds_slice):
            image = example['image']
            idx = examples_per_class * label_id + i
            box = (idx % examples_per_class * w, idx // examples_per_class * h)
            grid.paste(image.resize(size), box=box)
            draw.text(box, label, fill=(255, 255, 255))

    return grid
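
The per-class `filter` call above scans the whole training split once per label, which is the likely reason the grid is slow to build. A minimal sketch of an alternative, assuming the label column can be read on its own as a plain list of integers: group row indices by label in a single pass, then sample from each group. The helper name `sample_indices_per_class` and the toy labels are hypothetical and only illustrate the idea.

```python
import random
from collections import defaultdict


def sample_indices_per_class(label_ids, examples_per_class=3, seed=42):
    """Return {label_id: [row indices]} using a single pass over the labels."""
    indices_by_label = defaultdict(list)
    for row_index, label_id in enumerate(label_ids):
        indices_by_label[label_id].append(row_index)

    rng = random.Random(seed)
    return {
        # Never request more rows than the class actually has
        label_id: rng.sample(rows, min(examples_per_class, len(rows)))
        for label_id, rows in sorted(indices_by_label.items())
    }


# Toy label column standing in for the real one (hypothetical values)
toy_labels = [0, 1, 0, 2, 1, 0, 2, 2, 1, 0]
picked = sample_indices_per_class(toy_labels, examples_per_class=2)
for label_id, rows in picked.items():
    print(label_id, rows)
```

With the real dataset, the sampled indices could then be passed to `ds['train'].select(rows)` so that only the selected images are decoded.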

In [32]:
#show_grid_of_examples(ds, seed=42, examples_per_class=3)

# There are 23 classes, and the function filters the full training
# split once per class, so it would take over 30 minutes to complete
# on Google Colab. I want to save the GPU access time for what will
# save me the most time overall. Thus, I will run this locally later.

#### Basic Values/Constants

In [14]:
MODEL_CKPT = 'google/vit-large-patch32-384'
MODEL_NAME = MODEL_CKPT.split('/')[-1] + "-Hyper_Kvasir_Labeled_Images"

NUM_OF_EPOCHS = 8
LEARNING_RATE = 5e-3

STEPS = 150
BATCH_SIZE = 64

GRAD_ACC_STEPS = 4

DEVICE = torch.device("cuda")
REPORTS_TO = 'tensorboard'

#### Load ViT Image Processor

In [15]:
image_processor = ViTImageProcessor.from_pretrained(MODEL_CKPT)


#### Define Transformations For Both Training & Evaluation Datasets

In [16]:
normalize_image = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)

train_transforms = Compose(
    [
        RandomResizedCrop(image_processor.size["height"]),
        RandomHorizontalFlip(),
        ToTensor(),
        normalize_image,
    ]
)

eval_transforms = Compose(
    [
        Resize(image_processor.size["height"]),
        CenterCrop(image_processor.size["height"]),
        ToTensor(),
        normalize_image,
    ]
)
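
To make the effect of `Normalize` concrete, here is a small sketch of the per-channel arithmetic it applies after `ToTensor` has scaled 8-bit pixel values to the range [0, 1]. The mean and standard deviation of 0.5 are assumptions for illustration; the actual values come from `image_processor.image_mean` and `image_processor.image_std` and should be checked there.

```python
def normalize(value, mean, std):
    """Per-channel normalization: (x - mean) / std."""
    return (value - mean) / std


# Assumed statistics (verify against image_processor.image_mean / image_std)
ASSUMED_MEAN = 0.5
ASSUMED_STD = 0.5

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel, ASSUMED_MEAN, ASSUMED_STD))
```

Under these assumed statistics, inputs in [0, 1] are mapped to [-1, 1].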

#### Define Functions to Apply Transformations to Datasets

In [17]:
def preprocess_train_dataset(sample_batch):
    """
    This function applies the train_transforms
    across a batch of training samples.
    """
    sample_batch["pixel_values"] = [train_transforms(train_image.convert("RGB"))
                                    for train_image in sample_batch["image"]]
    return sample_batch


def preprocess_eval_dataset(sample_batch):
    """
    This function applies the eval_transforms
    across a batch of evaluation samples.
    """
    sample_batch["pixel_values"] = [eval_transforms(eval_image.convert("RGB"))
                                    for eval_image in sample_batch["image"]]
    return sample_batch

#### Apply Transform Functions to Dataset

In [18]:
prepped_train_ds = ds['train'].with_transform(preprocess_train_dataset)
prepped_eval_ds = ds['validation'].with_transform(preprocess_eval_dataset)

#### Define Function to Display Parameter Information

In [19]:
# Define Helper Function to Check Total Number of Model Parameters
# Also, return Number of Trainable Parameters

def print_parameters_information(model):
    trainable_parameters = 0
    all_parameters = 0

    for _, parameters in model.named_parameters():
        all_parameters += parameters.numel()
        if parameters.requires_grad:
            trainable_parameters += parameters.numel()

    print(f'Trainable Parameters: {trainable_parameters} ' + \
          f'|| All Parameters: {all_parameters} ' + \
          f'|| Trainable %: {round(trainable_parameters / all_parameters * 100, 2)}')

#### Define Data Collator

In [20]:
def data_collator(batch):
    return {
        'pixel_values' : torch.stack([x['pixel_values'] for x in batch]),
        'labels' : torch.tensor([x['labels'] for x in batch])
    }

#### Create List of Label Values & Dictionaries to Convert Between String & Integer Data Types

In [21]:
unique_label_values = ds['train'].features['labels'].names

NUM_OF_LABELS = len(unique_label_values)

id2label = {str(i): c for i, c in enumerate(unique_label_values)}
label2id = {c: str(i) for i, c in enumerate(unique_label_values)}

print(f"List of Unique Label Values: \n{unique_label_values}\n")
print(f"Number of Unique Label Values: \n{NUM_OF_LABELS}\n")
print(f"id2label: \n{id2label}\n")
print(f"label2id: \n{label2id}")

List of Unique Label Values: 
['barretts', 'barretts-short-segment', 'bbps-0-1', 'bbps-2-3', 'cecum', 'dyed-lifted-polyps', 'dyed-resection-margins', 'esophagitis-a', 'esophagitis-b-d', 'hemorrhoids', 'ileum', 'impacted-stool', 'polyps', 'pylorus', 'retroflex-rectum', 'retroflex-stomach', 'ulcerative-colitis-grade-0-1', 'ulcerative-colitis-grade-1', 'ulcerative-colitis-grade-1-2', 'ulcerative-colitis-grade-2', 'ulcerative-colitis-grade-2-3', 'ulcerative-colitis-grade-3', 'z-line']

Number of Unique Label Values: 
23

id2label: 
{'0': 'barretts', '1': 'barretts-short-segment', '2': 'bbps-0-1', '3': 'bbps-2-3', '4': 'cecum', '5': 'dyed-lifted-polyps', '6': 'dyed-resection-margins', '7': 'esophagitis-a', '8': 'esophagitis-b-d', '9': 'hemorrhoids', '10': 'ileum', '11': 'impacted-stool', '12': 'polyps', '13': 'pylorus', '14': 'retroflex-rectum', '15': 'retroflex-stomach', '16': 'ulcerative-colitis-grade-0-1', '17': 'ulcerative-colitis-grade-1', '18': 'ulcerative-colitis-grade-1-2', '19': 'u

#### Load Pretrained Model

In [22]:
model = ViTForImageClassification.from_pretrained(
    MODEL_CKPT,
    num_labels=NUM_OF_LABELS,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True
).to(DEVICE)


Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-large-patch32-384 and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([1000, 1024]) in the checkpoint and torch.Size([23, 1024]) in the model instantiated
- classifier.bias: found shape torch.Size([1000]) in the checkpoint and torch.Size([23]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


#### Print Original Parameters Information

In [23]:
print_parameters_information(model)

Trainable Parameters: 305631255 || All Parameters: 305631255 || Trainable %: 100.0


#### Define Function to Compute Metric

In [24]:
def compute_metrics(p):
    """
    Compute accuracy plus the weighted, micro, and macro
    averages of F1, recall, and precision.
    """
    # Convert logits to class ids once and reuse them for every metric
    predictions = np.argmax(p.predictions, axis=1)
    references = p.label_ids

    accuracy_metric = evaluate.load("accuracy")
    results = {
        "accuracy": accuracy_metric.compute(
            predictions=predictions, references=references)["accuracy"]
    }

    # Each pair is (display name, evaluate metric id)
    for display_name, metric_id in (("F1", "f1"),
                                    ("Recall", "recall"),
                                    ("Precision", "precision")):
        metric = evaluate.load(metric_id)
        for average in ("weighted", "micro", "macro"):
            results[f"{average.capitalize()} {display_name}"] = metric.compute(
                predictions=predictions,
                references=references,
                average=average)[metric_id]

    return results
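
The three averaging modes can give very different numbers on imbalanced data. The sketch below is a plain-Python illustration (not the `evaluate` implementation) on hypothetical toy labels: micro F1 pools all decisions and, with one label per sample, equals accuracy, while macro F1 gives every class equal weight, so a class that is never predicted pulls it down sharply.

```python
from collections import Counter


def f1_averages(references, predictions):
    """Return (micro, macro, weighted) F1 for single-label multiclass data."""
    pairs = list(zip(references, predictions))
    labels = sorted(set(references) | set(predictions))
    support = Counter(references)

    per_class_f1 = {}
    for label in labels:
        tp = sum(r == label and p == label for r, p in pairs)
        fp = sum(r != label and p == label for r, p in pairs)
        fn = sum(r == label and p != label for r, p in pairs)
        denominator = 2 * tp + fp + fn
        per_class_f1[label] = 2 * tp / denominator if denominator else 0.0

    # Pooled over classes, every error is one FP and one FN, so micro F1 = accuracy
    micro = sum(r == p for r, p in pairs) / len(pairs)
    macro = sum(per_class_f1.values()) / len(labels)
    weighted = sum(per_class_f1[c] * support[c] for c in labels) / len(pairs)
    return micro, macro, weighted


# Hypothetical toy data: class 1 is rare and never predicted
toy_references = [0] * 8 + [1] * 2
toy_predictions = [0] * 10
micro, macro, weighted = f1_averages(toy_references, toy_predictions)
print(round(micro, 3), round(macro, 3), round(weighted, 3))  # 0.8 0.444 0.711
```

This is the same pattern that appears in the training results, where the micro scores coincide with accuracy and the macro scores are much lower.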

#### Define PEFT Configuration

In [25]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],
    lora_dropout=0.1,
    bias="none",
    modules_to_save=['classifier']
)
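
As a rough illustration of what this configuration does to each targeted projection, the sketch below computes a LoRA-adapted linear layer in plain Python: the frozen weight `W` is left untouched and a low-rank update `B(Ax)` scaled by `lora_alpha / r` is added to its output. With `r=16` and `lora_alpha=16` the scale is 1.0. The tiny matrices are hypothetical, dropout and bias are omitted, and this is not the PEFT implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, base_weight, lora_a, lora_b, lora_alpha, r):
    """Frozen base projection plus the scaled low-rank update B(Ax)."""
    scale = lora_alpha / r
    base_out = matvec(base_weight, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * u for b, u in zip(base_out, update)]


# Hypothetical toy sizes: hidden size 3, rank 2
base_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]           # shape (r, hidden)
lora_b_zero = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # shape (hidden, r), zero-initialized
lora_b_trained = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
x = [1.0, 2.0, 3.0]

# With B at zero, the adapted layer reproduces the base output
print(lora_forward(x, base_weight, lora_a, lora_b_zero, lora_alpha=16, r=16))  # [1.0, 2.0, 3.0]
print(lora_forward(x, base_weight, lora_a, lora_b_trained, lora_alpha=16, r=16))
```

Because `B` starts at zero, training begins from the pretrained model's behavior and only the small `A` and `B` matrices (plus the modules listed in `modules_to_save`) receive gradient updates.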

#### Instantiate PEFT/LoRA Model

In [26]:
lora_model = get_peft_model(model, peft_config)

print_parameters_information(lora_model)

Trainable Parameters: 1620014 || All Parameters: 307227694 || Trainable %: 0.53
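
The reported counts can be reproduced with a little arithmetic. The sketch below assumes 24 encoder layers for ViT-Large and takes the hidden size of 1024 from the classifier shape in the load log. It also assumes that the classifier head is counted twice among the trainable parameters (the original head plus the copy created by `modules_to_save`); that reading is inferred from the numbers and is not stated anywhere in the notebook.

```python
HIDDEN_SIZE = 1024       # from the classifier shape [23, 1024] in the load log
NUM_LAYERS = 24          # assumption: ViT-Large encoder depth
NUM_LABELS = 23
LORA_RANK = 16
TARGETS_PER_LAYER = 2    # "query" and "value"
BASE_PARAMS = 305_631_255  # count printed before applying LoRA

# Each adapted projection gains A (r x hidden) and B (hidden x r)
lora_params_per_module = LORA_RANK * HIDDEN_SIZE + HIDDEN_SIZE * LORA_RANK
lora_params = lora_params_per_module * TARGETS_PER_LAYER * NUM_LAYERS

# Classifier head: weight matrix plus bias
classifier_params = HIDDEN_SIZE * NUM_LABELS + NUM_LABELS

all_params = BASE_PARAMS + lora_params + classifier_params
trainable_params = lora_params + 2 * classifier_params  # assumption: head counted twice

print(lora_params, classifier_params)                  # 1572864 23575
print(all_params, trainable_params)                    # 307227694 1620014
print(round(trainable_params / all_params * 100, 2))   # 0.53
```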


#### Define Training Arguments

In [27]:
args = TrainingArguments(
    output_dir=MODEL_NAME,
    remove_unused_columns=False,
    num_train_epochs=NUM_OF_EPOCHS,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    #per_device_train_batch_size=BATCH_SIZE,
    auto_find_batch_size=True,
    learning_rate=LEARNING_RATE,
    report_to=REPORTS_TO,
    disable_tqdm=False,
    logging_first_step=True,
    label_names=['labels'],
    gradient_accumulation_steps=GRAD_ACC_STEPS,
    hub_private_repo=True,
    fp16=True,
    push_to_hub=True
)

#### Instantiate Trainer

In [28]:
trainer = Trainer(
    model = lora_model,
    args = args,
    data_collator = data_collator,
    compute_metrics = compute_metrics,
    train_dataset = prepped_train_ds,
    eval_dataset = prepped_eval_ds,
    tokenizer = image_processor,
)

Cloning https://huggingface.co/DunnBC22/vit-large-patch32-384-Hyper_Kvasir_Labeled_Images into local empty directory.


#### Train Model

In [29]:
train_results = trainer.train()



| Epoch | Training Loss | Validation Loss | Accuracy | Weighted F1 | Micro F1 | Macro F1 | Weighted Recall | Micro Recall | Macro Recall | Weighted Precision | Micro Precision | Macro Precision |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3.3339 | 0.613803 | 0.796071 | 0.768104 | 0.796071 | 0.481403 | 0.796071 | 0.796071 | 0.499431 | 0.806754 | 0.796071 | 0.523521 |
| 2 | 0.8462 | 0.493526 | 0.836296 | 0.821976 | 0.836296 | 0.54951 | 0.836296 | 0.836296 | 0.557229 | 0.820376 | 0.836296 | 0.558859 |
| 2 | 0.8462 | 0.460053 | 0.854069 | 0.83612 | 0.854069 | 0.55002 | 0.854069 | 0.854069 | 0.547678 | 0.844973 | 0.854069 | 0.594605 |
| 4 | 0.6255 | 0.402943 | 0.862488 | 0.846666 | 0.862488 | 0.549936 | 0.862488 | 0.862488 | 0.560962 | 0.84004 | 0.862488 | 0.561316 |
| 4 | 0.6255 | 0.380568 | 0.881197 | 0.869304 | 0.881197 | 0.586758 | 0.881197 | 0.881197 | 0.605131 | 0.867091 | 0.881197 | 0.583923 |
| 6 | 0.5022 | 0.36542 | 0.869972 | 0.86423 | 0.869972 | 0.593222 | 0.869972 | 0.869972 | 0.583695 | 0.871213 | 0.869972 | 0.640053 |
| 6 | 0.5022 | 0.331258 | 0.877456 | 0.86503 | 0.877456 | 0.575 | 0.877456 | 0.877456 | 0.576908 | 0.86104 | 0.877456 | 0.580746 |
| 7 | 0.3822 | 0.331823 | 0.875585 | 0.868136 | 0.875585 | 0.577786 | 0.875585 | 0.875585 | 0.58231 | 0.861945 | 0.875585 | 0.574631 |




#### Save & Log Model

In [30]:
trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)
trainer.save_state()



***** train metrics *****
  epoch                    =          7.98
  total_flos               = 51715061679GF
  train_loss               =        0.5749
  train_runtime            =    1:04:07.16
  train_samples_per_second =        17.734
  train_steps_per_second   =         0.553
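
The throughput figures above also reveal the batch size that was actually used, since `per_device_train_batch_size` was commented out and `BATCH_SIZE = 64` was never passed to the `Trainer`. A quick sketch of the arithmetic, assuming a single GPU:

```python
samples_per_second = 17.734   # from the train metrics
steps_per_second = 0.553      # from the train metrics
grad_acc_steps = 4            # GRAD_ACC_STEPS

# Samples consumed per optimizer step = effective batch size
samples_per_step = samples_per_second / steps_per_second
effective_batch = round(samples_per_step)
per_device_batch = effective_batch // grad_acc_steps

print(f"samples per optimizer step: {samples_per_step:.2f}")  # 32.07
print("implied per-device batch size:", per_device_batch)     # 8
```

An implied per-device batch of 8 matches the default value of `per_device_train_batch_size` in `TrainingArguments`, which is also where `auto_find_batch_size` starts.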


#### Push Model to Hub (My Profile!)

In [31]:
kwargs = {
    "finetuned_from" : model.config._name_or_path,
    "tasks" : "image-classification",
    "tags" : ["image-classification"],
}

# Pass the model card metadata in both branches
if args.push_to_hub:
    trainer.push_to_hub("All Dunn!!!", **kwargs)
else:
    trainer.create_model_card(**kwargs)



### Notes & Other Takeaways
****
- This project fine-tunes a larger Vision Transformer checkpoint (`google/vit-large-patch32-384`) and uses PEFT/LoRA to speed up fine-tuning; only about 0.53% of the parameters are trainable.

- I am not happy with the end result: after the final epoch, validation accuracy is about 0.876 and macro F1 is only about 0.578. I think the results would improve with longer training, so I may return later (after I finish my current backlog of projects) and train for 15-20 epochs instead of 8. I will also make sure to account for imbalanced classes/labels.

****
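
One possible way to account for the class imbalance mentioned above is to weight the loss by inverse class frequency. This was not done in this notebook; the sketch below only shows the common "balanced" heuristic, `weight_c = N / (K * count_c)`, on hypothetical toy labels. The helper name `balanced_class_weights` is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(label_ids, num_classes):
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    counts = Counter(label_ids)
    total = len(label_ids)
    return [
        # Classes absent from the data get weight 0.0 to avoid division by zero
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


# Hypothetical toy label column: class 2 is rare
toy_labels = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(toy_labels, num_classes=3)
print([round(w, 3) for w in weights])  # [0.556, 1.111, 3.333]
```

Weights like these could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss` inside a `Trainer` subclass that overrides `compute_loss`, so that mistakes on rare classes cost more.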

### Citations

- Model Checkpoint

  > @misc{wu2020visual, title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision}, author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda}, year={2020}, eprint={2006.03677}, archivePrefix={arXiv}, primaryClass={cs.CV}}

  > @inproceedings{deng2009imagenet, title={Imagenet: A large-scale hierarchical image database}, author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li}, booktitle={2009 IEEE conference on computer vision and pattern recognition}, pages={248--255}, year={2009}, organization={Ieee}}

- Dataset
  > Unfortunately, the best information about the source of this dataset was the link that I retrieved it from: https://huggingface.co/datasets/sahilur/hyper-kvasir-labeled-images.