This notebook is heavily based on HuggingFace's Transformers tutorial. 
https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_the_%F0%9F%A4%97_Trainer.ipynb


The notebook has been adapted to work with the phytolith images and to carry out an evaluation by 10x10 cross-validation, compatible with the rest of the experiments. The rest of the notebook remains unchanged from the original tutorial.

The images are loaded from a zip file; the rest of the notebook is self-explanatory.

It runs on Google Colab.

Let's start by installing the relevant libraries.

## Installing the relevant libraries




In [None]:
!pip install pickle5

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pickle5
  Downloading pickle5-0.0.12-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (256 kB)
Installing collected packages: pickle5
Successfully installed pickle5-0.0.12


In [None]:
!pip install -q transformers datasets

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible

## Loading Phytoliths data

A file named "images.zip" must be uploaded to your Google Drive.
This images.zip file can be found in our GitHub repository.



The zip contains all the images used in the experiment. Images are renamed in a way that indicates to the cross-validation procedure in which test fold each image is used in each repetition. A 10x10 cross validation is used.
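
As an illustration of this naming scheme, the sketch below parses a single file name. It assumes, as the utility functions later in this notebook do, that the fields are separated by `#`: one test-fold index per repetition, then the class, then the original image name. The file name and the helper `parse_image_name` are made up for this example.

```python
# Hypothetical file name: 10 test-fold indices, the class and the image name
filename = "3#0#7#1#9#4#2#8#5#6#Saddle#img_001.jpg"

def parse_image_name(filename, n_repetitions=10):
    """Split a fold-encoded file name into (test_folds, class_name, image_name)."""
    fields = filename.split("#")
    # the first n_repetitions fields are the test fold of each repetition
    test_folds = [int(f) for f in fields[:n_repetitions]]
    class_name = fields[n_repetitions]
    image_name = fields[n_repetitions + 1]
    return test_folds, class_name, image_name

test_folds, class_name, image_name = parse_image_name(filename)
# In repetition 2 this image belongs to test fold 7
print(class_name, image_name, test_folds[2])
```

In every repetition each image is tested exactly once, in the fold given by its file name, and used for training in the other nine folds.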



In [None]:
# using Google Drive as a drive unit in Colab
from google.colab import drive
drive.mount('/content/drive',force_remount=True)

Mounted at /content/drive


In [None]:
# The file must be uploaded to Google Drive beforehand,
# for example to MyDrive/fitolitos/images.zip
import zipfile
import os

zip_path = './drive/MyDrive/fitolitos/images.zip'
if not os.path.isfile(zip_path):
  print("Upload to Google Drive the file images.zip")
else:
  # extract all the images into the current working directory
  with zipfile.ZipFile(zip_path, mode="r") as archive:
    archive.extractall("./")

## Utilities

In [None]:
path_images = "raw_images"
path_images_folds ="raw_images_fold"

model_id_base = "google/vit-base-patch16-224-in21k"
model_id_large = 'google/vit-large-patch16-224-in21k'


id2label = {0: 'Globular',
            1: 'Cross',
            2: 'Bilobate',
            3: 'Trichome',
            4: 'Elongate',
            5: 'Rondel-Trapeziform',
            6: 'Saddle',
            7: 'Bulliform'}

label2id = {'Bilobate': 2,
            'Bulliform': 7,
            'Cross': 1,
            'Elongate': 4,
            'Globular': 0,
            'Rondel-Trapeziform': 5,
            'Saddle': 6,
            'Trichome': 3}

# switch between the base and the large model by (un)commenting one of these lines
#model_id = model_id_base
model_id = model_id_large
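
Since `label2id` is simply the inverse of `id2label`, it can also be derived rather than typed by hand, which keeps the two mappings in sync. A minimal sketch of that alternative (the check at the end is only for illustration):

```python
id2label = {0: 'Globular',
            1: 'Cross',
            2: 'Bilobate',
            3: 'Trichome',
            4: 'Elongate',
            5: 'Rondel-Trapeziform',
            6: 'Saddle',
            7: 'Bulliform'}

# invert the mapping: class name -> integer id
label2id = {label: idx for idx, label in id2label.items()}

# every id must map to a distinct name, otherwise the inversion loses entries
assert len(label2id) == len(id2label)
print(label2id['Bilobate'], label2id['Bulliform'])
```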

In [None]:
from transformers import ViTFeatureExtractor

feature_extractor = ViTFeatureExtractor.from_pretrained(model_id)


In [None]:
import datasets 
import os
import shutil
import numpy as np
import pandas as pd

'''
These functions create a file structure with the images divided into train and 
test folders in order to create HuggingFace datasets.
The images uncompressed from the "images.zip" file are used.
'''

def get_info_from_images(root_fold,n_repetitions):
  img_files = os.listdir( root_fold )
  all_info = []
  # Each file name holds '#'-separated fields: one test fold per repetition,
  # then the class and the original image name
  for img in img_files:
     all_info.append(img.split("#"))

  df = pd.DataFrame(all_info)
  df.columns = [f"Test_Fold{i}" for i in range(n_repetitions)]+["Class","Name"]
  return df

def create_folder_images(df, root_fold, new_fold, rep, fold):
  classes = np.sort(df["Class"].unique())
  # images whose fold in this repetition equals `fold` form the test set
  test_df = df[df[f"Test_Fold{rep}"]==str(fold)]
  # all the remaining images form the training set
  train_df = df[~(df[f"Test_Fold{rep}"]==str(fold))]

  if os.path.exists(new_fold) and os.path.isdir(new_fold):
    shutil.rmtree(new_fold)
  os.mkdir(f"{new_fold}")
  os.mkdir(f"{new_fold}{os.sep}train")
  os.mkdir(f"{new_fold}{os.sep}test")
  for c in classes:
      os.mkdir(f"{new_fold}{os.sep}train{os.sep}{c}")
      os.mkdir(f"{new_fold}{os.sep}test{os.sep}{c}")

  dfs = [("train",train_df),("test",test_df)]
  for name_df,df in dfs:
    for i in range(df.shape[0]):
      img_info = df.iloc[i]
      class_i = img_info.Class
      name = img_info.Name
      src = [f"{f}#" for f in list(img_info.values)]
      src = "".join(src)
      src = f"{root_fold}{os.sep}{src[:-1]}"
      dst =f"{new_fold}{os.sep}{name_df}{os.sep}{class_i}{os.sep}{name}"
      #print(src)
      #print(dst)
      shutil.copyfile(src, dst)

def create_image_folder_dataset(root_path):
  """creates `Dataset` from image folder structure"""
  
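  # Class names in label-id order, so that the dataset labels match id2label
  # (assumes the class folder names are exactly the values of id2label);
  # os.listdir alone returns the folders in arbitrary order
  _CLASS_NAMES = [id2label[i] for i in sorted(id2label)]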
  # defines `datasets` features`
  features=datasets.Features({
                      "img": datasets.Image(),
                      "label": datasets.features.ClassLabel(names=_CLASS_NAMES),
                  })
  # temp list holding datapoints for creation
  img_data_files=[]
  label_data_files=[]
  # load images into list for creation
  for img_class in os.listdir(root_path):
    for img in os.listdir(os.path.join(root_path,img_class)):
      path_=os.path.join(root_path,img_class,img)
      img_data_files.append(path_)
      label_data_files.append(img_class)
  # create dataset
  ds = datasets.Dataset.from_dict({"img":img_data_files,"label":label_data_files},features=features)
  return ds
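
The train/test selection performed by `create_folder_images` can be sketched without pandas or the file system. The records below are hypothetical and use only two repetitions; as in the notebook, fold indices are strings because they come from file names, and `split_records` is a name introduced here for illustration.

```python
# Hypothetical records, one per image
records = [
    {"Test_Fold0": "0", "Test_Fold1": "1", "Class": "Cross", "Name": "a.jpg"},
    {"Test_Fold0": "1", "Test_Fold1": "0", "Class": "Saddle", "Name": "b.jpg"},
    {"Test_Fold0": "0", "Test_Fold1": "0", "Class": "Cross", "Name": "c.jpg"},
]

def split_records(records, rep, fold):
    """Return (train, test) for one repetition and one fold."""
    key = f"Test_Fold{rep}"
    # images assigned to `fold` in this repetition are held out for testing
    test = [r for r in records if r[key] == str(fold)]
    # everything else is used for training
    train = [r for r in records if r[key] != str(fold)]
    return train, test

train, test = split_records(records, rep=0, fold=0)
print([r["Name"] for r in train], [r["Name"] for r in test])
```

Every record lands in exactly one of the two lists, so the train and test sets of a given fold never overlap.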

In [None]:
import pandas as pd
import numpy as np

 
# basic processing (only resizing)
def process(examples):
    examples.update(feature_extractor(examples['img']))
    return examples

def get_datasets(root_path,folds_path,rep,fold):
  # each image file name encodes its test fold for 10 repetitions
  df = get_info_from_images(root_path,10)
  create_folder_images(df,root_path,folds_path,rep,fold)
  train_ds = create_image_folder_dataset(f"{folds_path}/train")
  test_ds = create_image_folder_dataset(f"{folds_path}/test")

  return train_ds, test_ds




In [None]:
from torchvision.transforms import (CenterCrop, 
                                    Compose, 
                                    Normalize, 
                                    RandomHorizontalFlip,
                                    RandomResizedCrop, 
                                    Resize, 
                                    ToTensor)

normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)

_train_transforms = Compose(
        [
            RandomResizedCrop(feature_extractor.size),
            RandomHorizontalFlip(),
            ToTensor(),
            normalize,
        ]
    )

_val_transforms = Compose(
        [
            Resize(feature_extractor.size),
            CenterCrop(feature_extractor.size),
            ToTensor(),
            normalize,
        ]
    )

def train_transforms(examples):
    examples['pixel_values'] = [_train_transforms(image.convert("RGB")) for image in examples['img']]
    return examples

def val_transforms(examples):
    examples['pixel_values'] = [_val_transforms(image.convert("RGB")) for image in examples['img']]
    return examples

In [None]:
import torch

# stacks the per-image tensors and labels of a batch into single tensors
def collate_fn(examples):
    pixel_values = torch.stack([example["pixel_values"] for example in examples])
    labels = torch.tensor([example["label"] for example in examples])
    return {"pixel_values": pixel_values, "labels": labels}

In [None]:
from datasets import load_metric
import numpy as np

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)
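
`compute_metrics` receives raw logits, takes the arg-max of each row as the predicted class and compares it with the true label. The same computation can be sketched in plain Python, without the `datasets` metric; the function name and the logits and labels below are made up for illustration.

```python
def accuracy_from_logits(logits, labels):
    """Arg-max each row of logits and compare with the true labels."""
    # index of the largest logit in each row (first index wins on ties)
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}

# Hypothetical batch: 4 images, 3 classes
logits = [[2.0, 0.1, -1.0],
          [0.3, 0.2, 1.5],
          [0.0, 0.9, 0.4],
          [1.2, 1.1, 0.0]]
labels = [0, 2, 2, 0]

result = accuracy_from_logits(logits, labels)
print(result)
```

Here the predicted classes are 0, 2, 1 and 0, so three of the four images are classified correctly.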


In [None]:
from transformers import ViTFeatureExtractor

'''
Gets the datasets for training, validation, and test, corresponding to a given
iteration and partition of the cross-validation.

Necessary to obtain results compatible with those of the rest of the experiments.
'''
def get_and_process_data(rep,fold):
  train_ds, test_ds = get_datasets(path_images,path_images_folds,rep,fold)


  # hold out 10% of the training images for validation
  splits = train_ds.train_test_split(test_size=0.1)
  train_ds = splits['train']
  val_ds = splits['test']


  # Preprocessing the data
  train_ds.set_transform(train_transforms)
  val_ds.set_transform(val_transforms)
  test_ds.set_transform(val_transforms)


  return train_ds, val_ds, test_ds



In [None]:
import matplotlib.pyplot as plt

# for debugging
def show_images(dataset,name,rep,fold):

  # group the images by class name
  dict_clases = {}
  for data in dataset:
    label = id2label[data["label"]]
    dict_clases.setdefault(label, []).append(data["img"])

  # the class with the most images determines the number of columns
  max_label = max(len(imgs) for imgs in dict_clases.values())

  # squeeze=False keeps axarr two-dimensional even with a single column
  f, axarr = plt.subplots(8,max_label,figsize=(3*max_label,20),squeeze=False)
  for i in range(8):
    print(id2label[i])
    # a class may be missing from a small split
    img_list = dict_clases.get(id2label[i], [])
    for j,img in enumerate(img_list):
      axarr[i,j].imshow(img)

  f.savefig(f'{name}{rep}{fold}.png', dpi=200) 

'''
t,v,te = get_and_process_data(0,0)
show_images(t,"train",0,0)
show_images(v,"val",0,0)
show_images(te,"test",0,0)

'''

In [None]:
from transformers import ViTForImageClassification
from transformers import TrainingArguments, Trainer
import torch

'''
Creates and configures the Vision Transformer model
'''
def create_configure_model(model_id,epocs):
  model = ViTForImageClassification.from_pretrained(model_id,
                                                  num_labels=8,
                                                  id2label=id2label,
                                                  label2id=label2id)
  
  metric_name = "accuracy"

  args = TrainingArguments(
      f"test-fitos",
      save_strategy="epoch",
      evaluation_strategy="epoch",
      learning_rate=2e-5,
      per_device_train_batch_size=10,
      per_device_eval_batch_size=4,
      num_train_epochs=epocs,
      weight_decay=0.001, 
      load_best_model_at_end=True,
      metric_for_best_model=metric_name,
      logging_dir='logs',
      remove_unused_columns=False,
  )

  return model, args
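
With `save_strategy="epoch"` a checkpoint is written at the end of every epoch, and the checkpoint folders are named after the number of optimizer steps completed so far. The sketch below works out those step counts. It assumes a single device, no gradient accumulation and a hypothetical training set of 347 images; the last, smaller batch still counts as one step.

```python
import math

def steps_per_epoch(n_train_examples, batch_size):
    """Number of optimizer steps in one epoch when the last partial batch is kept."""
    return math.ceil(n_train_examples / batch_size)

n_train = 347      # hypothetical number of training images
batch_size = 10    # per_device_train_batch_size used above

steps = steps_per_epoch(n_train, batch_size)
# cumulative step count at the end of the first three epochs
checkpoints = [steps * epoch for epoch in range(1, 4)]
print(steps, checkpoints)
```

Under these assumptions the first three checkpoints fall at steps 35, 70 and 105, which is the pattern of the `checkpoint-35`, `checkpoint-70` and `checkpoint-105` folders in the training log.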




In [None]:
import pickle5 as pickle
import os

'''
Loads the results of previous executions. This allows the experiments to be run
in several sessions, working around the maximum execution time of Google Colab.
'''
results_file_path = f'./drive/MyDrive/fitolitos/{model_id.replace("/","_")}.obj'


def load_results():
  # no previous results: start from scratch
  if not os.path.isfile(results_file_path):
    return {}

  with open(results_file_path, 'rb') as handle:
    results_dict = pickle.load(handle)

  # print the repetitions already stored
  for key in results_dict:
    print(key)
  return results_dict

In [None]:
'''
Performs the experiments and updates the results on disk after each fold.

params:
  - dict_model_results: dictionary containing all predictions and real classes
  - reps:
    - "all": performs the full 10x10 cross-validation
    - list of repetitions: performs only those repetitions, a part of the total number of experiments (10x10)
'''
def perform_experiments(dict_model_results, reps="all"):

  if dict_model_results is None:
    dict_model_results = {}

  epocs = 10
  folds = 10

  


  if reps == "all":
    reps = range(10)

  for repetition in reps:
    for partition in range(folds):

      if not repetition in dict_model_results or not partition in dict_model_results[repetition]:
        
        model, args = create_configure_model(model_id,epocs)
        train_ds, val_ds, test_ds = get_and_process_data(repetition,partition)


        trainer = Trainer(
          model,
          args,
          train_dataset=train_ds,
          eval_dataset=val_ds,
          data_collator=collate_fn,
          compute_metrics=compute_metrics,
          tokenizer=feature_extractor,
        )
        
        print(f"Training {repetition}-{partition}")
        trainer.train()
        
        # retrieve results
        outputs = trainer.predict(test_ds)
        y_true = outputs.label_ids
        y_pred = outputs.predictions.argmax(1)

        y_test = [id2label[i] for i in y_true]
        y_preds = [id2label[i] for i in y_pred] 

        ## delete logs and checkpoints (ignore folders that were never created)
        shutil.rmtree('./logs', ignore_errors=True)
        shutil.rmtree('./test-fitos', ignore_errors=True)

        if dict_model_results.get(repetition) is None:
            dict_model_results[repetition] = {}
        dict_model_results[repetition][partition]=(y_test,y_preds)

        with open(results_file_path, 'wb') as handle:
              pickle.dump(dict_model_results, handle, protocol=pickle.HIGHEST_PROTOCOL)
              print(f"Sav R{repetition}F{partition}", end=' - ')

      else:
        print(f"Rec R{repetition}F{partition}", end=' - ')
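
The resume logic of `perform_experiments` can be isolated from the training code. The following sketch shows the same pattern with the standard `pickle` module: load whatever results exist, skip the (repetition, fold) pairs already present, and save after every fold. The names `run_pending` and `fake_run`, and the temporary results file, are hypothetical stand-ins for the real training run.

```python
import os
import pickle
import tempfile

def run_pending(results_path, reps, folds, run_fold):
    """Run only the (repetition, fold) pairs missing from the results file."""
    results = {}
    if os.path.isfile(results_path):
        with open(results_path, "rb") as handle:
            results = pickle.load(handle)

    for rep in reps:
        for fold in range(folds):
            if fold in results.get(rep, {}):
                continue  # already computed in a previous session
            results.setdefault(rep, {})[fold] = run_fold(rep, fold)
            # save after every fold so an interrupted session loses little work
            with open(results_path, "wb") as handle:
                pickle.dump(results, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return results

# Stand-in for training and prediction: records which folds were actually run
calls = []
def fake_run(rep, fold):
    calls.append((rep, fold))
    return (["Cross"], ["Cross"])

path = os.path.join(tempfile.mkdtemp(), "results.obj")
run_pending(path, reps=[0], folds=2, run_fold=fake_run)             # first session
results = run_pending(path, reps=[0, 1], folds=2, run_fold=fake_run)  # later session
print(calls)
```

The second call finds repetition 0 in the saved file and only runs the two folds of repetition 1.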
      



   


In [None]:
# obtains the results of the experiments already carried out
res_dict = load_results()
# continue or start from scratch
perform_experiments(res_dict, reps = 'all')

5
6
7
8
9



Some weights of the model checkpoint at google/vit-large-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.weight', 'pooler.dense.bias']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-large-patch16-224-in21k and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
***** Running training *****
 

Training 0-0


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.863856,0.717949
2,No log,0.446009,0.871795
3,No log,0.230164,0.948718
4,No log,0.181832,0.923077
5,No log,0.182719,0.974359
6,No log,0.206408,0.923077
7,No log,0.135341,0.974359
8,No log,0.213462,0.923077
9,No log,0.170597,0.974359
10,No log,0.163096,0.974359


***** Running Evaluation *****
  Num examples = 39
  Batch size = 4
Saving model checkpoint to test-fitos/checkpoint-35
Configuration saved in test-fitos/checkpoint-35/config.json
Model weights saved in test-fitos/checkpoint-35/pytorch_model.bin
Feature extractor saved in test-fitos/checkpoint-35/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 39
  Batch size = 4
Saving model checkpoint to test-fitos/checkpoint-70
Configuration saved in test-fitos/checkpoint-70/config.json
Model weights saved in test-fitos/checkpoint-70/pytorch_model.bin
Feature extractor saved in test-fitos/checkpoint-70/preprocessor_config.json
***** Running Evaluation *****
  Num examples = 39
  Batch size = 4
Saving model checkpoint to test-fitos/checkpoint-105
Configuration saved in test-fitos/checkpoint-105/config.json
Model weights saved in test-fitos/checkpoint-105/pytorch_model.bin
Feature extractor saved in test-fitos/checkpoint-105/preprocessor_config.json
***** Running Evaluation **

loading configuration file https://huggingface.co/google/vit-large-patch16-224-in21k/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/46eeea9e9ba1ab0f72b082ff9a1df2cc3eac17e7ef99558e77d2849c1ec52ff6.21151db1dd2ecfb957206d9314221bc97d606d41ef47ba909c66f5d9a2231a6d
Model config ViTConfig {
  "_name_or_path": "google/vit-large-patch16-224-in21k",
  "architectures": [
    "ViTModel"
  ],
  "attention_probs_dropout_prob": 0.0,
  "encoder_stride": 16,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.0,
  "hidden_size": 1024,
  "id2label": {
    "0": "Globular",
    "1": "Cross",
    "2": "Bilobate",
    "3": "Trichome",
    "4": "Elongate",
    "5": "Rondel-Trapeziform",
    "6": "Saddle",
    "7": "Bulliform"
  },
  "image_size": 224,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "Bilobate": 2,
    "Bulliform": 7,
    "Cross": 1,
    "Elongate": 4,
    "Globular": 0,
    "Rondel-Trapeziform": 5,
    "Saddle": 6,
    "Trichome": 3

Sav R0F0 - 


Training 0-1


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,0.717109,0.820513
2,No log,0.413601,0.871795
3,No log,0.269253,0.923077




In [None]:
from sklearn.metrics import accuracy_score     
import numpy as np       

'''
Compute mean accuracy from 10x10 cross validation raw results
'''

def get_repetitions(res_dict): 
  return list(res_dict.keys())

def get_accuracy(dict_results):
  accs = []
  #print(type(dict_results),dict_results)
  folds = dict_results.keys()
  for fold in folds:
      y_test,preds = dict_results[fold]
      print("Fold",fold,accuracy_score(y_test,preds))
      accs.append(accuracy_score(y_test,preds))
  accs = np.array(accs)
  return accs.mean()

def get_mean_acc(dict_results):

  repetitions = get_repetitions(dict_results)
  accs = []
  for repetition in repetitions:
      # compute once, so that the fold accuracies are printed only once
      rep_acc = get_accuracy(dict_results[repetition])
      print("Total Rep",repetition, rep_acc)
      accs.append(rep_acc)
  acc = np.array(accs).mean()
  return acc
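
The aggregation above is a mean of means: fold accuracies are averaged within each repetition, and the repetition means are then averaged. A small self-contained sketch using only the standard library, with made-up results for 2 repetitions of 2 folds (the function names are introduced here for illustration):

```python
from statistics import mean

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true classes."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_cv_accuracy(results):
    """Average the fold accuracies per repetition, then average the repetitions."""
    rep_means = [
        mean(accuracy(y_true, y_pred) for y_true, y_pred in folds.values())
        for folds in results.values()
    ]
    return mean(rep_means)

# Hypothetical results: {repetition: {fold: (true classes, predicted classes)}}
results = {
    0: {0: (["Cross", "Saddle"], ["Cross", "Saddle"]),
        1: (["Cross", "Saddle"], ["Cross", "Cross"])},
    1: {0: (["Cross", "Saddle"], ["Saddle", "Cross"]),
        1: (["Cross", "Saddle"], ["Cross", "Saddle"])},
}

overall = mean_cv_accuracy(results)
print(overall)
```

The repetition means here are 0.75 and 0.5, giving an overall accuracy of 0.625. Note that when folds differ in size, this mean of means is not identical to the accuracy computed over all predictions pooled together.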

In [None]:
res_dict = load_results() 
get_mean_acc(res_dict)

0
1
2
3
4
5
6
7
8
Fold 0 0.7906976744186046
Fold 1 0.7906976744186046
Fold 2 0.8604651162790697
Fold 3 0.7674418604651163
Fold 4 0.8372093023255814
Fold 5 0.7906976744186046
Fold 6 0.8604651162790697
Fold 7 0.8837209302325582
Fold 8 0.8604651162790697
Fold 9 0.8809523809523809
Total Rep 0 0.8322812846068659
Fold 0 0.8372093023255814
Fold 1 0.8372093023255814
Fold 2 0.8837209302325582
Fold 3 0.6976744186046512
Fold 4 0.7906976744186046
Fold 5 0.8837209302325582
Fold 6 0.813953488372093
Fold 7 0.813953488372093
Fold 8 0.8604651162790697
Fold 9 0.8809523809523809
Total Rep 1 0.8299557032115171

0.8466900455272548

In [None]:
res_dict.keys()

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8])