<a href="https://colab.research.google.com/github/OrsolaMBorrini/rcm-thesis/blob/main/finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Finetuning the ViT model

In this notebook, I fine-tune a Vision Transformer (ViT) model for image classification using the `transformers` library from 🤗 HF.

## Dataset
The temporary dataset I am using is the [**rcm dataset**](https://huggingface.co/datasets/ombrr/rcm), which I created locally and uploaded to the 🤗 HF Hub.
The dataset has the following structure:
```
rcm-1
│
├── train
│   ├── sensitive
│   │   ├── Q_017042.jpg
│   │   └── ...
│   ├── not-sensitive
│   │   ├── Q_017043.jpg
│   │   └── ...
│   └── dubious
│       ├── Q_017044.jpg
│       └── ...
│
├── validation
│   ├── sensitive
│   │   ├── Q_017045.jpg
│   │   └── ...
│   ├── not-sensitive
│   │   ├── Q_017046.jpg
│   │   └── ...
│   └── dubious
│       ├── Q_017047.jpg
│       └── ...
│
├── test
│   ├── sensitive
│   │   ├── Q_017042.jpg
│   │   └── ...
│   ├── not-sensitive
│   │   ├── Q_017043.jpg
│   │   └── ...
│   └── dubious
│       ├── Q_017044.jpg
│       └── ...
│
└── dataset.csv
```
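
As a rough illustration of how this layout encodes the labels, the sketch below walks a local copy of the tree and counts the images per split and class. The function name `count_images` and its `root` argument are hypothetical and not part of the original notebook; the sketch assumes that the class of each image is the name of its parent folder.

```python
from collections import Counter
from pathlib import Path


def count_images(root):
    """Count .jpg files per (split, label) in a split/label/file layout."""
    counts = Counter()
    # Pattern: <split>/<label>/<file>.jpg, e.g. train/sensitive/Q_017042.jpg
    for image_path in Path(root).glob("*/*/*.jpg"):
        label = image_path.parent.name
        split = image_path.parent.parent.name
        counts[(split, label)] += 1
    return dict(counts)
```

Calling `count_images("rcm-1")` on a local copy would return a dictionary keyed by `(split, label)` pairs, which is a quick way to check class balance before training.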


## Inspecting the dataset

In [1]:
# Install necessary libraries
!pip install -U datasets
!pip install transformers evaluate
!pip install accelerate -U

Collecting datasets
  Downloading datasets-2.14.6-py3-none-any.whl (493 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)
Collecting huggingface-hub<1.0.0,>=0.14.0 (from datasets)
  Downloading huggingface_hub-0.19.0-py3-none-any.whl (311 kB)
Installing collected packages: dill, multiprocess, huggingface-hub, datasets
Successfully installed datasets-2.1

In [2]:
# --- The dataset on the HF Hub is currently public to simplify access during this testing phase
from huggingface_hub import login
login()  # prompts for the access token; never hardcode a token in a shared notebook

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
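
One way to keep the access token out of the notebook is to read it from an environment variable at run time. The sketch below is only an illustration: the helper `get_hf_token` and the variable name `HF_TOKEN` are assumptions, not something the original notebook defines.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable (hypothetical helper)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable before logging in.")
    return token


# Intended use in the notebook (not executed in this sketch):
# from huggingface_hub import login
# login(token=get_hf_token())
```

With this approach the token never appears in the saved notebook or in its version history.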


In [None]:
from datasets import load_dataset

train_dataset = load_dataset("ombrr/rcm", split="train", streaming=True)
train_dataset

#validation_dataset = load_dataset("ombrr/rcm", split="validation")

In [None]:
train_dataset.features

In [None]:
# Take the first example from the stream and display a 200x200 thumbnail
example = next(iter(train_dataset))
example["image"].resize((200, 200))



In [None]:
labels = train_dataset.features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
    label2id[label] = str(i)
    id2label[str(i)] = label
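
To make the two mappings concrete, here is a small standalone sketch of the same loop written as dictionary comprehensions. The `labels` list is a stand-in for `train_dataset.features["label"].names`; its alphabetical order is an assumption based on the folder names shown earlier, not something verified against the Hub.

```python
# Hypothetical class names, assumed to follow the alphabetical folder order
labels = ["dubious", "not-sensitive", "sensitive"]

# Same mappings as in the cell above, with ids stored as strings
label2id = {label: str(i) for i, label in enumerate(labels)}
id2label = {str(i): label for i, label in enumerate(labels)}

print(label2id)  # {'dubious': '0', 'not-sensitive': '1', 'sensitive': '2'}
print(id2label)  # {'0': 'dubious', '1': 'not-sensitive', '2': 'sensitive'}
```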

In [None]:
id2label[str(0)]

In [None]:
id2label[str(1)]

In [None]:
id2label[str(2)]

In [None]:
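from transformers import AutoImageProcessor
import evaluate

# ViT-Base (16x16 patches, 224x224 input) pre-trained on ImageNet-21k
model_checkpoint = "google/vit-base-patch16-224-in21k"
batch_size = 32

# The image processor applies the resizing and normalization expected by the checkpoint
image_processor = AutoImageProcessor.from_pretrained(model_checkpoint)
image_processor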

In [None]:
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

# Load the pre-trained backbone with a classification head sized to our labels
model = AutoModelForImageClassification.from_pretrained(
    model_checkpoint,
    label2id=label2id,
    id2label=id2label,
    ignore_mismatched_sizes=True,
)

In [None]:
model_name = model_checkpoint.split("/")[-1]

args = TrainingArguments(
    f"{model_name}-finetuned-rcm",  # output directory
    remove_unused_columns=False,    # keep the "image" column for preprocessing
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-5,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=3,
    max_steps=47,  # needed because the streamed dataset has no length; overrides num_train_epochs
    warmup_ratio=0.1,
    logging_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    push_to_hub=False,
)
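
The arithmetic implied by these arguments can be checked with a short sketch. It assumes a single device and that the warmup step count is rounded up from `max_steps * warmup_ratio`; both are assumptions for illustration, not statements taken from the notebook.

```python
import math

batch_size = 32                  # per_device_train_batch_size
gradient_accumulation_steps = 4
max_steps = 47
warmup_ratio = 0.1
num_devices = 1                  # assumption: one GPU (or CPU)

# Examples contributing to each optimizer update
effective_batch_size = batch_size * gradient_accumulation_steps * num_devices

# Total examples consumed from the stream over the whole run
examples_seen = effective_batch_size * max_steps

# Optimizer steps spent ramping up the learning rate (assumed to round up)
warmup_steps = math.ceil(max_steps * warmup_ratio)

print(effective_batch_size, examples_seen, warmup_steps)  # 128 6016 5
```

Under these assumptions, 47 optimizer steps consume 6016 training examples, so `max_steps` would need to grow if the dataset is much larger and a full pass is desired.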

In [None]:
import numpy as np
import evaluate

# Load the accuracy metric used by compute_metrics below
metric = evaluate.load("accuracy")

# compute_metrics takes an EvalPrediction named tuple as input:
# predictions, which are the logits of the model as NumPy arrays,
# and label_ids, which are the ground-truth labels as NumPy arrays.
def compute_metrics(eval_pred):
    """Computes accuracy on a batch of predictions"""
    predictions = np.argmax(eval_pred.predictions, axis=1)
    return metric.compute(predictions=predictions, references=eval_pred.label_ids)
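
For intuition, the same computation can be written without any libraries: take the index of the largest logit in each row, then measure the fraction of rows where that index matches the ground-truth label. The helper names and the logits below are made up for illustration only.

```python
def argmax(row):
    """Index of the largest value in a list of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(logits, label_ids):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, label_ids))
    return {"accuracy": correct / len(label_ids)}


# Made-up logits for four images and three classes
logits = [
    [2.0, 0.1, -1.0],  # predicted class 0
    [0.3, 0.2, 1.5],   # predicted class 2
    [0.0, 3.0, 0.5],   # predicted class 1
    [1.0, 0.9, 0.8],   # predicted class 0
]
label_ids = [0, 2, 1, 2]

result = accuracy(logits, label_ids)
print(result)  # {'accuracy': 0.75}
```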

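In [None]:
import torch

def collate_fn(examples):
    # The streamed examples hold raw PIL images, so preprocess them per batch
    images = [example["image"].convert("RGB") for example in examples]
    pixel_values = image_processor(images, return_tensors="pt")["pixel_values"]
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

In [None]:
# evaluation_strategy="epoch" requires an evaluation set
val_ds = load_dataset("ombrr/rcm", split="validation", streaming=True)

trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=val_ds,
    tokenizer=image_processor,
    compute_metrics=compute_metrics,
    data_collator=collate_fn,
)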

In [None]:
train_results = trainer.train()
# The rest is optional but nice to have: save the model, metrics, and trainer state
trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)
trainer.save_state()