## Visual prediction of bean diseases

First we import the libraries, load the dataset from the Hugging Face Hub, and display a few examples.
Then we build and fine-tune the model; the procedure is straightforward. We use the transformers library together with PyTorch
(the transformers classes used here require PyTorch):
we pass the data and the required parameters to the Trainer class, which then fine-tunes the pretrained model on our data.

Trainer: https://huggingface.co/docs/transformers/main_classes/trainer

Dataset: https://huggingface.co/datasets/beans

Inspiration: https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb

In [1]:
# Import the libraries. All of them must be installed first, for example:
# pip3 install torch torchvision torchaudio
# pip install datasets transformers
# conda install -c conda-forge tensorboard
# conda install -c conda-forge protobuf
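# Note: load_metric is deprecated in later datasets releases (and removed in datasets 3.0);
# the separate evaluate library (evaluate.load) is its replacement.
from datasets import load_dataset, load_metric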
from transformers import Trainer, ViTFeatureExtractor, ViTForImageClassification, TrainingArguments
import numpy as np
import torch
from PIL import ImageDraw, ImageFont, Image
import random

### Loading the dataset and model

In [2]:
# Load the dataset with the Hugging Face datasets library, see https://huggingface.co/docs/datasets/index
ds = load_dataset('beans')
ds

Found cached dataset beans (C:/Users/Gigon/.cache/huggingface/datasets/beans/default/0.0.0/90c755fb6db1c0ccdad02e897a37969dbf070bed3755d4391e269ff70642d791)



DatasetDict({
    train: Dataset({
        features: ['image_file_path', 'image', 'labels'],
        num_rows: 1034
    })
    validation: Dataset({
        features: ['image_file_path', 'image', 'labels'],
        num_rows: 133
    })
    test: Dataset({
        features: ['image_file_path', 'image', 'labels'],
        num_rows: 128
    })
})

In [3]:
# Show one training example (a dict with the file path, the PIL image, and the label)
image = ds['train'][42]
image

{'image_file_path': 'C:\\Users\\Gigon\\.cache\\huggingface\\datasets\\downloads\\extracted\\ab87c331001da2f7769bdbbf7c3596bbfb41d2845c97674c7b502aab7f668023\\train\\angular_leaf_spot\\angular_leaf_spot_train.136.jpg',
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=500x500>,
 'labels': 0}
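
The integer in `labels` is a class index; in the record above, label 0 matches the `angular_leaf_spot` folder in the file path. A minimal sketch of the index-to-name mapping, assuming the dataset has the three classes hard-coded below (in the notebook they would come from `ds['train'].features['labels'].names` instead):

```python
# Assumed class names, in index order; hard-coded so the sketch runs offline.
# In the notebook they would be read from ds['train'].features['labels'].names.
class_names = ['angular_leaf_spot', 'bean_rust', 'healthy']

# Index -> name and name -> index lookups, analogous to id2label / label2id.
id2label = {i: name for i, name in enumerate(class_names)}
label2id = {name: i for i, name in id2label.items()}

example_label = 0  # the 'labels' value of the record shown above
print(id2label[example_label])
```

The same two dictionaries are what a classification model needs in order to report class names rather than bare indices.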

In [4]:
# Name of the pretrained model on Hugging Face, and creation of the feature extractor (it converts an image into a tensor of pixel values), following https://huggingface.co/docs/transformers/main_classes/extractor
# Note: at this point I hit an error saying that my PC does not support symlinks, and I had to enable Windows Developer Mode,
# see https://learn.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
huggingface_model = 'google/vit-base-patch16-224-in21k'
extractor = ViTFeatureExtractor.from_pretrained(huggingface_model)

In [5]:
extractor(image["image"], return_tensors='pt')

{'pixel_values': tensor([[[[-0.0510, -0.0510, -0.1059,  ..., -0.0196,  0.0510,  0.0039],
          [-0.0510, -0.0902, -0.0902,  ...,  0.1137, -0.0275, -0.1529],
          [-0.0980, -0.2000, -0.0588,  ...,  0.1765, -0.0745, -0.1608],
          ...,
          [ 0.0980,  0.0196,  0.0588,  ..., -0.3490, -0.5216, -0.6314],
          [-0.0902, -0.0980, -0.0431,  ..., -0.4039, -0.5294, -0.4275],
          [-0.2000, -0.1686, -0.0980,  ..., -0.4980, -0.5216, -0.3647]],

         [[-0.3255, -0.3098, -0.3490,  ..., -0.1608, -0.0745, -0.1216],
          [-0.3255, -0.3490, -0.3333,  ..., -0.0431, -0.1765, -0.3020],
          [-0.3647, -0.4588, -0.3176,  ...,  0.0039, -0.2549, -0.3333],
          ...,
          [ 0.4431,  0.3804,  0.4353,  ..., -0.3176, -0.5294, -0.6706],
          [ 0.3412,  0.3333,  0.3882,  ..., -0.3725, -0.5294, -0.4275],
          [ 0.2784,  0.3020,  0.3569,  ..., -0.4667, -0.5059, -0.3569]],

         [[-0.5608, -0.5294, -0.5686,  ..., -0.3725, -0.3569, -0.4196],
          [-0

In [6]:
image['labels']

0
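
The values in `pixel_values` are not raw 0-255 intensities. A minimal sketch of the rescale-and-normalize step, assuming the extractor for this checkpoint scales pixels by 1/255 and then normalizes each channel with mean 0.5 and standard deviation 0.5 (resizing to 224×224 is left out). Under that assumption, the first printed value, -0.0510, corresponds to a raw intensity of 121:

```python
import numpy as np

# Assumed preprocessing constants (not read from the checkpoint's config here).
RESCALE = 1 / 255
MEAN = 0.5
STD = 0.5

def normalize(pixels):
    """Map raw 0-255 intensities to approximately the [-1, 1] range."""
    scaled = np.asarray(pixels, dtype=np.float32) * RESCALE
    return (scaled - MEAN) / STD

raw = np.array([0, 121, 255], dtype=np.uint8)
print(np.round(normalize(raw), 4))
```

Black maps to about -1, white to about 1, and mid-gray lands near 0, which is why the printed tensor contains small positive and negative values.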

### Preparing the required functions

In [7]:
# Convert a batch of images into a batch of pixel values
def get_pixels(images):
    inputs = extractor([x for x in images['image']], return_tensors='pt')

    # keep the labels alongside the pixel values
    inputs['labels'] = images['labels']
    return inputs

# Metric function passed to Trainer as compute_metrics; it reports accuracy at each evaluation
def get_metrics(p):
    metric = load_metric("accuracy")

    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)

arguments = TrainingArguments( # hyperparameters and settings for fine-tuning the model
  output_dir="./vit-base-beans",
  per_device_train_batch_size=16,
  evaluation_strategy="steps",
  num_train_epochs=4,
  save_steps=100,
  eval_steps=100,
  logging_steps=10,
  learning_rate=2e-4,
  save_total_limit=2,
  remove_unused_columns=False,
  push_to_hub=False,
  report_to='tensorboard',
  load_best_model_at_end=True,
)

# Collate function for Trainer: merges a list of individual examples into one batch, see https://huggingface.co/docs/transformers/main_classes/data_collator
def collator(batch): 
    return {
        'pixel_values': torch.stack([x['pixel_values'] for x in batch]),
        'labels': torch.tensor([x['labels'] for x in batch])
    }
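
To make the metric concrete: `get_metrics` takes the arg-max over the class logits and compares it with the reference labels. A minimal NumPy-only sketch of the same computation, using made-up logits (the helper name `accuracy_from_logits` is hypothetical, not part of any library):

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class equals the reference label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == np.asarray(labels)))

# Made-up logits for 4 images and 3 classes (illustrative values only).
logits = np.array([
    [2.1, 0.3, -1.0],   # predicted class 0
    [0.2, 1.5, 0.1],    # predicted class 1
    [0.0, 0.4, 3.2],    # predicted class 2
    [1.9, 2.0, 0.5],    # predicted class 1
])
labels = [0, 1, 2, 0]

print(accuracy_from_logits(logits, labels))  # 3 of 4 correct
```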

### Training and evaluation

In [8]:
ds_pixels = ds.with_transform(get_pixels) # get_pixels is applied on the fly whenever examples are accessed

final_model = ViTForImageClassification.from_pretrained(
    huggingface_model,
    num_labels=len(ds['train'].features['labels'].names),
    id2label={str(i): c for i, c in enumerate(ds['train'].features['labels'].names)},
    label2id={c: str(i) for i, c in enumerate(ds['train'].features['labels'].names)}
)

trainer = Trainer(
    model=final_model,
    args=arguments,
    data_collator=collator,
    compute_metrics=get_metrics,
    train_dataset=ds_pixels["train"],
    eval_dataset=ds_pixels["validation"],
    tokenizer=extractor,
)

train_results = trainer.train()
trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)

Some weights of the model checkpoint at google/vit-base-patch16-224-in21k were not used when initializing ViTForImageClassification: ['pooler.dense.bias', 'pooler.dense.weight']
- This IS expected if you are initializing ViTForImageClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTForImageClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
***** Running training *****
  N

| Step | Training Loss | Validation Loss | Accuracy |
|------|---------------|-----------------|----------|
| 100  | 0.0947        | 0.03264         | 0.992481 |
| 200  | 0.0126        | 0.033406        | 0.992481 |


***** Running Evaluation *****
  Num examples = 133
  Batch size = 8
Saving model checkpoint to ./vit-base-beans\checkpoint-100
Configuration saved in ./vit-base-beans\checkpoint-100\config.json
Model weights saved in ./vit-base-beans\checkpoint-100\pytorch_model.bin
Image processor saved in ./vit-base-beans\checkpoint-100\preprocessor_config.json
***** Running Evaluation *****
  Num examples = 133
  Batch size = 8
Saving model checkpoint to ./vit-base-beans\checkpoint-200
Configuration saved in ./vit-base-beans\checkpoint-200\config.json
Model weights saved in ./vit-base-beans\checkpoint-200\pytorch_model.bin
Image processor saved in ./vit-base-beans\checkpoint-200\preprocessor_config.json


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ./vit-base-beans\checkpoint-100 (score: 0.03264022246003151).
Saving model checkpoint to ./vit-base-beans
Configuration saved in ./v

***** train metrics *****
  epoch                    =         4.0
  total_flos               = 298497957GF
  train_loss               =      0.1154
  train_runtime            =  0:50:16.01
  train_samples_per_second =       1.371
  train_steps_per_second   =       0.086
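
The step counts in the log follow from the dataset size and the batch size. A short worked check, assuming training ran on a single device with no gradient accumulation (so one optimizer step per batch of 16):

```python
import math

train_examples = 1034   # rows in the train split
batch_size = 16         # per_device_train_batch_size
epochs = 4              # num_train_epochs
eval_every = 100        # eval_steps

# The last, smaller batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

# Steps at which an evaluation (and checkpoint save) is triggered.
eval_points = list(range(eval_every, total_steps + 1, eval_every))

print(steps_per_epoch, total_steps, eval_points)
```

With 65 steps per epoch and 260 steps in total, evaluation happens only at steps 100 and 200, which is consistent with the two rows in the results table. The reported 0.086 steps per second over roughly 50 minutes also comes to about 260 steps.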


In [12]:
metrics = trainer.evaluate(ds_pixels['validation'])
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)

metrics = trainer.evaluate(ds_pixels['test'])
trainer.log_metrics("test", metrics)
trainer.save_metrics("test", metrics)

***** Running Evaluation *****
  Num examples = 133
  Batch size = 8
***** Running Evaluation *****
  Num examples = 128
  Batch size = 8


***** eval metrics *****
  epoch                   =        4.0
  eval_accuracy           =     0.9925
  eval_loss               =     0.0326
  eval_runtime            = 0:00:32.89
  eval_samples_per_second =      4.043
  eval_steps_per_second   =      0.517
***** test metrics *****
  epoch                   =        4.0
  eval_accuracy           =     0.9453
  eval_loss               =     0.1801
  eval_runtime            = 0:00:36.92
  eval_samples_per_second =      3.467
  eval_steps_per_second   =      0.433
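
Since both splits are small, the accuracies translate into a handful of misclassified images. A quick check of the counts implied by the reported figures (rounding to the nearest whole image is an assumption of this sketch):

```python
def correct_count(accuracy, num_examples):
    """Number of correctly classified examples implied by an accuracy value."""
    return round(accuracy * num_examples)

validation_correct = correct_count(0.9925, 133)
test_correct = correct_count(0.9453, 128)

print(133 - validation_correct, "validation error(s)")
print(128 - test_correct, "test error(s)")
```

This gives 1 misclassified validation image and 7 misclassified test images. The test results also appear under `eval_` keys because `trainer.evaluate` uses the prefix `eval` unless a different `metric_key_prefix` is passed.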
