<a href="https://www.kaggle.com/code/aisuko/image-classification?scriptVersionId=164960899" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Overview

Image classification assigns a label or class to an image. Unlike text or audio classification, the inputs are the pixel values that comprise an image. There are many applications for image classification, such as:

* Detecting damage after a natural disaster
* Monitoring crop health
* Helping screen medical images for signs of disease

In [1]:
%%capture
!pip install transformers==4.35.2
!pip install accelerate==0.25.0
!pip install datasets==2.15.0
!pip install evaluate==0.4.1

In [2]:
import os
from huggingface_hub import login
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()

login(token=user_secrets.get_secret("HUGGINGFACE_TOKEN"))

os.environ["WANDB_API_KEY"]=user_secrets.get_secret("WANDB_API_KEY")
os.environ["WANDB_PROJECT"] = "Fine-tune-models"
os.environ["WANDB_NOTES"] = "Fine tune model google/vit-base-patch16-224-in21k on food101"
os.environ["WANDB_NAME"] = "ft-vit-with-food-101"
os.environ["MODEL_NAME"] = "google/vit-base-patch16-224-in21k"

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [3]:
!accelerate estimate-memory ${MODEL_NAME} --library_name transformers


# Loading Dataset

Each example in the dataset has two fields:
* image: a PIL image of the food item
* label: the label class of the food item

In [4]:
from datasets import load_dataset

food=load_dataset("food101", split="train[:500]")

Generating train split:   0%|          | 0/75750 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/25250 [00:00<?, ? examples/s]

In [5]:
food=food.train_test_split(test_size=0.2)
food["train"][0]

{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x384>,
 'label': 6}

In [6]:
labels=food["train"].features["label"].names
label2id, id2label=dict(), dict()
for i, label in enumerate(labels):
    label2id[label]=str(i)
    id2label[str(i)]=label
    
id2label[str(79)]

'prime_rib'

# Preprocess data

The next step is to load a ViT image processor to process the image into a tensor:

In [7]:
from transformers import AutoImageProcessor

image_processor=AutoImageProcessor.from_pretrained(os.getenv('MODEL_NAME'))

preprocessor_config.json:   0%|          | 0.00/160 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/502 [00:00<?, ?B/s]

Apply some image transformations to make the model more robust against overfitting. Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation. Then create a preprocessing function that applies the transforms and returns the `pixel_values`.

In [8]:
from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

normalize=Normalize(mean=image_processor.image_mean, std=image_processor.image_std)

size=(
    image_processor.size["shortest_edge"] if "shortest_edge" in image_processor.size else (image_processor.size["height"], image_processor.size["width"])
)

_transforms=Compose([RandomResizedCrop(size), ToTensor(), normalize])


def transforms(examples):
    examples["pixel_values"]=[_transforms(img.convert("RGB")) for img in examples["image"]]
    del examples["image"]
    return examples

In [9]:
food=food.with_transform(transforms)
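
To make the normalization step concrete, here is a minimal pure-Python sketch of what `ToTensor` followed by `Normalize` does to a single 8-bit channel value. The helper `normalize_pixel` is hypothetical, and the mean and standard deviation of 0.5 are assumed for illustration only; in the notebook the real values come from `image_processor.image_mean` and `image_processor.image_std`.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Scale an 8-bit channel value to [0, 1], then standardize it.

    mean and std are illustrative assumptions, not values read from the checkpoint.
    """
    scaled = value / 255.0        # ToTensor maps 8-bit pixel values into [0, 1]
    return (scaled - mean) / std  # Normalize subtracts the mean and divides by the std


# Darkest and brightest possible channel values
for v in (0, 255):
    print(v, normalize_pixel(v))
```

With these assumed statistics the darkest and brightest pixels land at the two ends of a symmetric range around zero.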

## Loading Default Batch Strategy

In [10]:
from transformers import DefaultDataCollator

data_collator=DefaultDataCollator()

# Evaluate

In [11]:
import evaluate
import numpy as np

accuracy=evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels=eval_pred
    predictions=np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]
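
The metric function above reduces each row of logits to its highest-scoring class and compares it with the reference label. The following sketch reproduces that logic with plain Python lists on made-up toy data; `argmax` and `accuracy_from_logits` are hypothetical helpers, not part of `evaluate` or `numpy`.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    preds = [argmax(row) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


# Toy data: 4 examples, 3 classes (values invented for illustration)
toy_logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 0.9, 0.4],
    [1.2, 1.1, 0.0],
]
toy_labels = [0, 2, 2, 0]

print(accuracy_from_logits(toy_logits, toy_labels))
```

Three of the four toy predictions match their labels, so the sketch reports 0.75.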

# Training

In [12]:
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

model=AutoModelForImageClassification.from_pretrained(
    os.getenv('MODEL_NAME'),
    num_labels=len(labels),
    id2label=id2label,
    label2id=label2id,
)

model.safetensors:   0%|          | 0.00/346M [00:00<?, ?B/s]

Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [13]:
training_args=TrainingArguments(
    output_dir=os.getenv("WANDB_NAME"),
    remove_unused_columns=False,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    fp16=True,
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=32,
    num_train_epochs=2, # kept small so the test run finishes quickly
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=False,
    report_to="wandb",
    run_name=os.getenv("WANDB_NAME"),
)

trainer=Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=food["train"],
    eval_dataset=food["test"],
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
)

trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33murakiny[0m ([33mcausal_language_trainer[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: wandb version 0.16.3 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade
[34m[1mwandb[0m: Tracking run with wandb version 0.16.1
[34m[1mwandb[0m: Run data is saved locally in [35m[1m/kaggle/working/wandb/run-20240301_062914-rt0hfs0h[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.
[34m[1mwandb[0m: Syncing run [33mft-vit-with-food-101[0m
[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/causal_language_trainer/Fine-tune-models[0m
[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/causal_language_trainer/Fine-tune-models/runs/rt0hfs0h[0m


| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 0     | No log        | 4.594155        | 0.0      |
| 1     | No log        | 4.409156        | 0.5      |




TrainOutput(global_step=2, training_loss=4.599879264831543, metrics={'train_runtime': 50.1656, 'train_samples_per_second': 15.947, 'train_steps_per_second': 0.04, 'total_flos': 3.59881873956864e+16, 'train_loss': 4.599879264831543, 'epoch': 1.14})
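
The output reports `global_step=2` and "No log" for the training loss. The arithmetic below is a rough sketch of how the batch settings could lead to that. It assumes data parallelism over two GPUs, which the notebook does not state, and it follows my reading of how `Trainer` counts update steps rather than its actual code. `planned_optimizer_steps` is a hypothetical helper.

```python
import math


def planned_optimizer_steps(num_samples, per_device_batch, num_devices, grad_accum, epochs):
    """Rough count of batches per epoch and optimizer updates for the whole run."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
    # Gradients are applied once every grad_accum batches, with at least one update per epoch
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return batches_per_epoch, math.ceil(epochs * updates_per_epoch)


# 500 images with test_size=0.2 leave 400 images for training.
# num_devices=2 is an assumption; it is not stated anywhere in the notebook.
batches, total_steps = planned_optimizer_steps(
    num_samples=400, per_device_batch=32, num_devices=2, grad_accum=4, epochs=2
)
print(batches, total_steps)

# logging_steps=10 is never reached when the run has fewer than 10 updates
print(total_steps < 10)
```

Under these assumptions the run performs only two optimizer updates, fewer than `logging_steps=10`, which would explain why no training loss is logged.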

In [14]:
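import math

eval_results=trainer.evaluate()
# exp(eval_loss) is the exponentiated cross-entropy. "Perplexity" is a language-modeling term,
# so eval_results["eval_accuracy"] is the more meaningful number for this classifier.
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")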



Perplexity: 82.35
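
For context on this number: a classifier that spreads its probability uniformly over the 101 classes of Food-101 has a cross-entropy of ln(101), so its exponentiated loss would be 101. The sketch below uses only the `math` module and the validation losses copied from the training table above to compare the run against that chance level.

```python
import math

num_classes = 101  # Food-101 has 101 food categories
chance_loss = math.log(num_classes)  # cross-entropy of a uniform guess
print(f"chance-level loss: {chance_loss:.3f}")

# Validation losses copied from the training table
for epoch, val_loss in [(0, 4.594155), (1, 4.409156)]:
    print(f"epoch {epoch}: exp(loss) = {math.exp(val_loss):.1f}")
```

Both validation losses sit only slightly below the chance-level loss, which suggests the classification head has barely started to learn.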


In [15]:
kwargs={
    'model_name': os.getenv('WANDB_NAME'),
    'finetuned_from': os.getenv('MODEL_NAME'),
    'tasks': 'image-classification',
#     'dataset_tags':'',
    'dataset':'food101'
}

image_processor.push_to_hub(os.getenv("WANDB_NAME"))
trainer.push_to_hub(**kwargs)

model.safetensors:   0%|          | 0.00/344M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.16k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

'https://huggingface.co/aisuko/ft-vit-with-food-101/tree/main/'

# Inference

In [16]:
ds=load_dataset("food101", split="validation[:10]")
image=ds["image"][0]

In [17]:
from transformers import pipeline

classifier=pipeline("image-classification", model=os.getenv("WANDB_NAME"), device="cuda")
print(classifier.device)
classifier(image)

cuda


[{'score': 0.012230444699525833, 'label': 'omelette'},
 {'score': 0.012047692202031612, 'label': 'fish_and_chips'},
 {'score': 0.01182125136256218, 'label': 'macaroni_and_cheese'},
 {'score': 0.01176254078745842, 'label': 'deviled_eggs'},
 {'score': 0.011584739200770855, 'label': 'chocolate_mousse'}]
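
As far as I understand, the scores returned by the pipeline are softmax probabilities over all classes, with the top five kept. The sketch below shows that computation on invented toy logits using only the standard library; `softmax` and `top_k` are hypothetical helpers, not the pipeline's internals.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=5):
    """Return the k most probable labels with their scores, highest first."""
    probs = softmax(logits)
    ranked = sorted(zip(probs, labels), reverse=True)[:k]
    return [{"score": p, "label": name} for p, name in ranked]


# Toy logits for three classes (values invented for illustration)
toy_labels = ["omelette", "fish_and_chips", "prime_rib"]
toy_logits = [0.2, 0.1, 0.0]

for row in top_k(toy_logits, toy_labels, k=2):
    print(row)

# Score every class would receive under a perfectly uniform guess over 101 classes
print(1 / 101)
```

A uniform guess over 101 classes gives each class a score just under 0.01, so top scores around 0.012 indicate a model that is still close to guessing, which is consistent with a run of only two optimizer steps.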

# With PyTorch

In [18]:
from transformers import AutoImageProcessor
import torch

image_processor=AutoImageProcessor.from_pretrained(os.getenv("WANDB_NAME"))
inputs=image_processor(image, return_tensors="pt")

In [19]:
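model=AutoModelForImageClassification.from_pretrained(os.getenv("WANDB_NAME"), device_map="auto")
# Move the inputs to the model's device so the forward pass does not mix CPU and GPU tensors
inputs=inputs.to(model.device)
with torch.no_grad():
    logits=model(**inputs).logits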

In [20]:
predicted_label=logits.detach().argmax(-1).item()
model.config.id2label[predicted_label]

'omelette'