# Notebook for knowledge-distillation training on the CIFAR10 dataset
This notebook trains MobileNetV2 on the CIFAR10 dataset. A ViT fine-tuned on the same dataset serves as the teacher model.

MobileNetV2 is trained in three settings: from random initialization, with only the classification head trained on top of a MobileNetV2 pretrained on ImageNet, and with the whole pretrained model trained. Each of these three tasks is trained both in the standard way and with knowledge distillation from the teacher model described above.

Distillation uses the precomputed teacher logits from the precompute_logits notebook.
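How the precomputed logits reach the trainer is not shown here; it presumably happens inside the `base.CustomCIFAR10L` dataset class used below. A minimal sketch of the idea, assuming the teacher logits are stored as one list of per-class scores per image, in the same order as the images: each item carries the teacher's output next to the image and label, so the ViT never has to run during training. The class name `LogitsDataset` and the keys `pixel_values` and `teacher_logits` are hypothetical; only the `labels` key is confirmed by the notebook (`train[0]["labels"]`).

```python
from typing import Any, Callable, Dict, Optional, Sequence


class LogitsDataset:
    """Hypothetical wrapper pairing each sample with precomputed teacher logits.

    Implements __len__/__getitem__, so it behaves like any map-style dataset
    (usable with a torch DataLoader or a Hugging Face Trainer).
    """

    def __init__(
        self,
        images: Sequence[Any],
        labels: Sequence[int],
        teacher_logits: Sequence[Sequence[float]],
        transform: Optional[Callable[[Any], Any]] = None,
    ):
        if not (len(images) == len(labels) == len(teacher_logits)):
            raise ValueError("images, labels and teacher_logits must align")
        self.images = images
        self.labels = labels
        self.teacher_logits = teacher_logits
        self.transform = transform

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        image = self.images[idx]
        if self.transform is not None:
            image = self.transform(image)
        return {
            "pixel_values": image,
            "labels": self.labels[idx],
            # Teacher output computed once, offline (assumed key name).
            "teacher_logits": list(self.teacher_logits[idx]),
        }
```

An extra key like this is presumably also why the distillation runs pass `remove_unused_columns=False`: by default the Hugging Face Trainer drops inputs that the model's `forward()` does not accept.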

## Importing libraries and defining methods

In [None]:
from transformers import Trainer, EarlyStoppingCallback
from torch.utils.data import ConcatDataset, DataLoader
import pandas as pd
import torch
import base
import os

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /home/jovyan/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!


In [2]:
dataset_part = base.get_dataset_part()

Resetting the random seed so that results are reproducible.
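The body of `base.reset_seed()` is not shown. A minimal sketch of what such a helper typically does, assuming a fixed default seed (the value 42 is an assumption): it reseeds Python's `random`, NumPy and PyTorch (CPU and all CUDA devices), and skips any library that is not installed.

```python
import random


def reset_seed(seed: int = 42) -> None:
    """Hypothetical stand-in for base.reset_seed(): reseed every RNG in use."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:  # NumPy not installed: nothing to seed
        pass
    try:
        import torch
        torch.manual_seed(seed)
        # Safe without a GPU: the call is silently ignored if CUDA is unavailable.
        torch.cuda.manual_seed_all(seed)
    except ImportError:  # PyTorch not installed
        pass
```

Seeding alone does not make GPU runs bit-for-bit identical: some cuDNN kernels stay nondeterministic unless `torch.backends.cudnn.deterministic` is enabled. `transformers.set_seed(seed)` covers the same libraries as this sketch.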

In [3]:
base.reset_seed()

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Applying the transformations to the dataset.

In [5]:
DATASET = "cifar10"

In [None]:
transform = base.base_transforms()

# The last training batch is used as the eval split...
test = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TEST, transform=transform)
train = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.EVAL, transform=transform)
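The comment above says the eval split is taken from the last training batch. A minimal sketch of that split, assuming the standard layout of the CIFAR-10 Python release (five training files `data_batch_1` to `data_batch_5` with 10,000 images each, plus `test_batch`). The enum only mirrors `dataset_part.TRAIN/EVAL/TEST` and is hypothetical, not the actual `base` implementation.

```python
from enum import Enum
from typing import List


class DatasetPart(Enum):
    """Hypothetical mirror of the splits returned by base.get_dataset_part()."""
    TRAIN = "train"
    EVAL = "eval"
    TEST = "test"


# The five CIFAR-10 training files, 10,000 images each.
TRAIN_FILES = [f"data_batch_{i}" for i in range(1, 6)]


def batch_files(part: DatasetPart) -> List[str]:
    """Return the CIFAR-10 batch files that make up a split."""
    if part is DatasetPart.TRAIN:
        return TRAIN_FILES[:-1]  # batches 1-4: 40,000 training images
    if part is DatasetPart.EVAL:
        return TRAIN_FILES[-1:]  # batch 5: 10,000 validation images
    return ["test_batch"]        # official 10,000-image test set
```

This is consistent with the 10,000 eval samples counted in the class-distribution check further down.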

In [7]:
train[0]["labels"]

6

In [8]:
augment_transform = base.aug_transforms()
train_aug = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=augment_transform)

In [9]:
train_aug = base.remove_diff_pred_class(train, train_aug, pytorch_dataset=True)

Removing entries from augmented dataset that are different from the base one - based on saved logits:   0%|   …
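The progress message above describes what `base.remove_diff_pred_class` does: it drops augmented entries that differ from the base ones, judged by the saved logits. A minimal sketch of one plausible criterion, assuming logits are available for each original image and for its augmented counterpart: an augmented sample is kept only if the predicted class (argmax of the logits) is unchanged. The function names and this exact rule are assumptions, not the `base` implementation.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def keep_consistent_augmentations(
    base_logits: Sequence[Sequence[float]],
    aug_logits: Sequence[Sequence[float]],
) -> List[int]:
    """Indices of augmented samples whose predicted class matches the original.

    Hypothetical illustration: base_logits[i] and aug_logits[i] are assumed
    to describe the same source image.
    """
    if len(base_logits) != len(aug_logits):
        raise ValueError("base and augmented logits must be aligned")
    return [
        i
        for i, (b, a) in enumerate(zip(base_logits, aug_logits))
        if argmax(b) == argmax(a)
    ]
```

The kept indices could then select the surviving samples, e.g. with `torch.utils.data.Subset(train_aug, kept)`.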

In [10]:
train_combo = ConcatDataset([train, train_aug])

In [11]:
# Check the class distribution --> good enough
df = pd.DataFrame(eval.labels)
print(df.value_counts())

0
5    1025
9    1022
3    1016
0    1014
1    1014
8    1003
4     997
6     980
7     977
2     952
Name: count, dtype: int64
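The "good enough" verdict can be made concrete. The counts printed above cover 10,000 eval images in 10 classes, so a perfectly balanced split would have 1,000 per class. The sketch below computes each class's share and the largest relative deviation from that uniform count; the counts are copied from the output above.

```python
from typing import Dict

# Class counts copied from the value_counts() output above.
EVAL_COUNTS: Dict[int, int] = {
    5: 1025, 9: 1022, 3: 1016, 0: 1014, 1: 1014,
    8: 1003, 4: 997, 6: 980, 7: 977, 2: 952,
}


def balance_report(counts: Dict[int, int]) -> Dict[str, float]:
    """Summarise how far a label distribution is from uniform."""
    total = sum(counts.values())
    expected = total / len(counts)  # per-class count if perfectly uniform
    max_dev = max(abs(c - expected) for c in counts.values()) / expected
    return {
        "total": total,
        "min_share": min(counts.values()) / total,
        "max_share": max(counts.values()) / total,
        "max_relative_deviation": max_dev,
        "max_to_min_ratio": max(counts.values()) / min(counts.values()),
    }


print(balance_report(EVAL_COUNTS))
```

The rarest class (2) is 4.8% below the uniform count and the most common (5) is 2.5% above it, so the split is close to uniform and plain accuracy remains a reasonable headline metric.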


In [None]:
train_part_cpu = base.CustomCIFAR10(root=f"{os.path.expanduser('~')}/data/10", train=True, batch=1, transform=transform, device="cpu")
cpu_data_loader = DataLoader(train_part_cpu, batch_size=1, shuffle=False)
train_part_gpu = base.CustomCIFAR10(root=f"{os.path.expanduser('~')}/data/10", train=True, batch=1, transform=transform, device="cuda")
gpu_data_loader = DataLoader(train_part_gpu, batch_size=1, shuffle=False)

### Standard training of the randomly initialized model

In [12]:
base.reset_seed()

In [13]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-base_aug", logging_dir=f"~/logs/{DATASET}/random-base_aug", lr=0.0005, weight_decay=0.008, adam_beta1=.95, epochs=30)
model = base.get_random_init_mobilenet(10)
model.to(device)

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [14]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 4)]
)

In [15]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.4865,1.093143,0.606,0.610664,0.605263,0.59678
2,1.0132,0.849768,0.7003,0.704617,0.699392,0.69783
3,0.7933,0.664946,0.7656,0.765777,0.766001,0.763336
4,0.6547,0.56076,0.807,0.804383,0.807221,0.803649
5,0.5522,0.538597,0.8143,0.822196,0.814565,0.81339
6,0.471,0.497567,0.8334,0.838328,0.833386,0.834422
7,0.4057,0.467834,0.8406,0.844173,0.841005,0.840653
8,0.3439,0.447875,0.852,0.852826,0.851984,0.851704
9,0.2889,0.448228,0.8564,0.860999,0.85623,0.857671
10,0.2393,0.457234,0.8582,0.859826,0.858611,0.857956


TrainOutput(global_step=10127, training_loss=0.3783520771718583, metrics={'train_runtime': 5544.3066, 'train_samples_per_second': 368.897, 'train_steps_per_second': 2.884, 'total_flos': 2.61672390737127e+18, 'train_loss': 0.3783520771718583, 'epoch': 19.0})

In [16]:
model.eval()

In [17]:
trainer.evaluate(test)

{'eval_loss': 0.579505980014801,
 'eval_accuracy': 0.8586,
 'eval_precision': 0.8594507577193161,
 'eval_recall': 0.8585999999999998,
 'eval_f1': 0.858603121391775,
 'eval_runtime': 32.3133,
 'eval_samples_per_second': 309.47,
 'eval_steps_per_second': 2.445,
 'epoch': 19.0}

In [18]:
torch.save(model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-base_aug.pth")

In [None]:
base.count_parameters(model)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Defining the distillation training

A class that adapts the Hugging Face Trainer for knowledge distillation (used below as `base.DistilTrainer`). It now works with the logits stored in the dataset.
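The source of `base.DistilTrainer` is not shown, so the loss below is an assumption: the widely used Hinton-style formulation suggested by the `temp` and `lambda_param` arguments passed to `base.get_training_args`. Teacher and student logits are both softened with temperature T, their KL divergence is scaled by T² (which keeps gradient magnitudes comparable across temperatures), and the result is mixed with ordinary cross-entropy on the hard label. It is written in plain Python for a single sample so the arithmetic is easy to follow.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy of the student on the hard label."""
    return -math.log(softmax(logits)[label])


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temp: float = 6.0,
    lambda_param: float = 1.0,
) -> float:
    """(1 - lambda) * CE(student, label) + lambda * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, temp)
    p_student = softmax(student_logits, temp)
    kd = kl_divergence(p_teacher, p_student) * temp ** 2
    ce = cross_entropy(student_logits, label)
    return (1 - lambda_param) * ce + lambda_param * kd
```

With `lambda_param=1`, as in every distillation run in this notebook, the hard-label term gets zero weight and the student learns only from the teacher's softened distribution. On batches in PyTorch the KD term is usually written as `F.kl_div(F.log_softmax(s / T, dim=-1), F.softmax(t / T, dim=-1), reduction="batchmean") * T**2`.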

### Training the randomly initialized model with knowledge distillation

In [19]:
base.reset_seed()

In [20]:
student_model = base.get_random_init_mobilenet(10)

In [21]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-distill_aug", logging_dir=f"~/logs/{DATASET}/random-distill_aug", remove_unused_columns=False, epochs=30, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [22]:
trainer = base.DistilTrainer(
    student_model=student_model,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [23]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4537,0.359441,0.6249,0.637672,0.624846,0.616519
2,0.2844,0.230584,0.7476,0.773817,0.746909,0.750111
3,0.2227,0.197249,0.7887,0.793021,0.7895,0.784358
4,0.1881,0.167945,0.8128,0.818846,0.812352,0.811216
5,0.1659,0.153055,0.8267,0.836503,0.826999,0.825565
6,0.1448,0.145682,0.8333,0.840048,0.833844,0.834097
7,0.129,0.135474,0.8395,0.854478,0.840126,0.840186
8,0.1155,0.122012,0.8555,0.857653,0.855685,0.855327
9,0.1044,0.133694,0.8479,0.864233,0.847686,0.851566
10,0.0925,0.121965,0.8597,0.863439,0.860022,0.859765


TrainOutput(global_step=15990, training_loss=0.09810652133447816, metrics={'train_runtime': 9488.3879, 'train_samples_per_second': 215.556, 'train_steps_per_second': 1.685, 'total_flos': 4.1316693274283213e+18, 'train_loss': 0.09810652133447816, 'epoch': 30.0})

In [24]:
student_model.eval()

In [25]:
trainer.evaluate(test)

{'eval_loss': 0.09991142898797989,
 'eval_accuracy': 0.8717,
 'eval_precision': 0.8764743996558894,
 'eval_recall': 0.8717,
 'eval_f1': 0.8726752677693439,
 'eval_runtime': 19.9477,
 'eval_samples_per_second': 501.311,
 'eval_steps_per_second': 3.96,
 'epoch': 30.0}

In [None]:
torch.save(student_model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-distill_aug.pth")

In [None]:
base.count_parameters(student_model)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Obtaining the pretrained MobileNetV2 model

In [27]:
base.reset_seed()

In [28]:
model_pretrained = base.get_mobilenet(10)

In [29]:
print(model_pretrained)

In [30]:
model_pretrained = base.freeze_model(model_pretrained)
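`base.freeze_model` is not shown either. Given the goal stated at the top (train only the classification head of the ImageNet-pretrained MobileNetV2), a minimal sketch under that assumption switches off gradients for every parameter except the head's. In the Hugging Face `MobileNetV2ForImageClassification` the head is the `classifier` linear layer, so its parameter names start with `classifier.`; the helper name and that prefix default are assumptions. The function only relies on `named_parameters()` and the `requires_grad` flag, so it works on any `torch.nn.Module`.

```python
from typing import List


def freeze_all_but_head(model, head_prefix: str = "classifier.") -> List[str]:
    """Hypothetical helper: keep only parameters named head_prefix* trainable.

    Works with any object exposing named_parameters() whose parameters have a
    writable requires_grad flag (e.g. a torch.nn.Module).
    Returns the names of the parameters that remain trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    if not trainable:
        raise ValueError(f"no parameter name starts with {head_prefix!r}")
    return trainable
```

Frozen BatchNorm layers still update their running statistics while the model is in `train()` mode; only their affine weights stop changing. Whether `base.freeze_model` also handles that is not visible here.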

In [31]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-base_aug", logging_dir=f"~/logs/{DATASET}/head-base_aug", epochs=15, lr=0.0005, weight_decay=0.008, adam_beta1=.95)

In [32]:
trainer = Trainer(
    model=model_pretrained,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 2)]
)

In [33]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.1341,0.756586,0.747,0.74668,0.746417,0.745291
2,0.9065,0.710192,0.7621,0.764527,0.761861,0.761175
3,0.8713,0.697629,0.7615,0.764266,0.76133,0.759349
4,0.853,0.686825,0.7659,0.777288,0.765425,0.765711
5,0.843,0.665531,0.7724,0.772252,0.771982,0.770255
6,0.8367,0.686097,0.7642,0.7737,0.76401,0.765735
7,0.8318,0.673814,0.7723,0.775269,0.772182,0.77209
8,0.8291,0.666093,0.7731,0.774326,0.772868,0.772515
9,0.8234,0.672318,0.7692,0.772313,0.768632,0.769854
10,0.8225,0.669086,0.7686,0.774021,0.768702,0.768146


TrainOutput(global_step=5330, training_loss=0.8751568534808132, metrics={'train_runtime': 1480.273, 'train_samples_per_second': 690.846, 'train_steps_per_second': 5.401, 'total_flos': 1.3772231091427738e+18, 'train_loss': 0.8751568534808132, 'epoch': 10.0})

In [34]:
model_pretrained.eval()

In [35]:
trainer.evaluate(test)

{'eval_loss': 0.6728852391242981,
 'eval_accuracy': 0.7722,
 'eval_precision': 0.7738367676370215,
 'eval_recall': 0.7722,
 'eval_f1': 0.771958535620026,
 'eval_runtime': 21.0445,
 'eval_samples_per_second': 475.183,
 'eval_steps_per_second': 3.754,
 'epoch': 10.0}

In [None]:
torch.save(model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-base_aug.pth")

In [None]:
base.count_parameters(model_pretrained)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model_pretrained, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model_pretrained, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

### Training the whole pretrained MobileNetV2

In [37]:
base.reset_seed()

In [38]:
model_pretrained_whole = base.get_mobilenet(10)

In [39]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-base_aug", logging_dir=f"~/logs/{DATASET}/pretrained-base_aug", epochs=10, lr=0.0005, weight_decay=0.008, adam_beta1=.95)

In [40]:
trainer = Trainer(
    model=model_pretrained_whole,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [41]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.4356,0.268988,0.9105,0.912651,0.910846,0.910433
2,0.198,0.230399,0.927,0.927953,0.927265,0.926597
3,0.117,0.20938,0.9321,0.932249,0.93258,0.9318
4,0.0739,0.231659,0.9341,0.935116,0.934336,0.934391
5,0.0436,0.23168,0.9421,0.943105,0.942301,0.94226
6,0.0282,0.252098,0.9403,0.94122,0.940507,0.940521
7,0.0159,0.284193,0.94,0.941329,0.940259,0.940084
8,0.007,0.255501,0.9455,0.946165,0.945673,0.945657
9,0.0023,0.252063,0.9504,0.951156,0.950529,0.950708
10,0.0011,0.250247,0.9479,0.948566,0.948185,0.94776


TrainOutput(global_step=5330, training_loss=0.0922575179564349, metrics={'train_runtime': 1188.5801, 'train_samples_per_second': 573.592, 'train_steps_per_second': 4.484, 'total_flos': 1.3772231091427738e+18, 'train_loss': 0.0922575179564349, 'epoch': 10.0})

In [42]:
model_pretrained_whole.eval()

In [43]:
trainer.evaluate(test)

{'eval_loss': 0.27267807722091675,
 'eval_accuracy': 0.948,
 'eval_precision': 0.9484112579568773,
 'eval_recall': 0.9480000000000001,
 'eval_f1': 0.9481439375826664,
 'eval_runtime': 13.6157,
 'eval_samples_per_second': 734.445,
 'eval_steps_per_second': 5.802,
 'epoch': 10.0}

In [None]:
torch.save(model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-base_aug.pth")

In [None]:
base.count_parameters(model_pretrained_whole)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Training the pretrained MobileNetV2 with knowledge distillation

### Training only the classification head of the pretrained model with distillation

In [45]:
base.reset_seed()

In [46]:
student_model_pretrained = base.get_mobilenet(10)

In [47]:
student_model_pretrained = base.freeze_model(student_model_pretrained)

In [51]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-distill_aug", logging_dir=f"~/logs/{DATASET}/head-distill_aug", remove_unused_columns=False, epochs=15, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [52]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 2)]
)

In [53]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3838,0.32741,0.7393,0.753303,0.738899,0.74023
2,0.3408,0.314586,0.7539,0.75894,0.753601,0.753921
3,0.3348,0.314628,0.7469,0.766677,0.746497,0.744972
4,0.3328,0.305853,0.7564,0.768946,0.755548,0.756601
5,0.3317,0.304901,0.7602,0.765186,0.759883,0.758956
6,0.3304,0.309137,0.7483,0.766057,0.747963,0.750746
7,0.3306,0.311593,0.7501,0.768915,0.75,0.750614


TrainOutput(global_step=3731, training_loss=0.34068631244006087, metrics={'train_runtime': 1731.7264, 'train_samples_per_second': 590.532, 'train_steps_per_second': 4.617, 'total_flos': 9.640561763999416e+17, 'train_loss': 0.34068631244006087, 'epoch': 7.0})

In [54]:
student_model_pretrained.eval()

In [55]:
trainer.evaluate(test)

{'eval_loss': 0.30593815445899963,
 'eval_accuracy': 0.7565,
 'eval_precision': 0.7610543317863664,
 'eval_recall': 0.7565000000000001,
 'eval_f1': 0.7557109052491887,
 'eval_runtime': 29.2331,
 'eval_samples_per_second': 342.078,
 'eval_steps_per_second': 2.702,
 'epoch': 7.0}

In [None]:
torch.save(student_model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-distill_aug.pth")

In [None]:
base.count_parameters(student_model_pretrained)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

### Training the whole pretrained model with distillation

In [65]:
base.reset_seed()

In [66]:
student_model_pretrained_whole = base.get_mobilenet(10)

In [67]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-distill_aug", logging_dir=f"~/logs/{DATASET}/pretrained-distill_aug", remove_unused_columns=False, epochs=10, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [68]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained_whole.to(device),
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [69]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1309,0.076614,0.9169,0.919946,0.917055,0.917099
2,0.0679,0.061118,0.9236,0.927461,0.923914,0.923382
3,0.0494,0.062874,0.9286,0.929813,0.92913,0.928294
4,0.0391,0.054711,0.9352,0.937762,0.935217,0.935265
5,0.0325,0.049933,0.937,0.938965,0.937317,0.937149
6,0.028,0.046688,0.9392,0.941039,0.939483,0.939613
7,0.0244,0.048451,0.9331,0.936251,0.933595,0.93324
8,0.0215,0.042235,0.943,0.944416,0.943314,0.943311
9,0.0187,0.044114,0.9406,0.944571,0.940873,0.94131
10,0.0166,0.044702,0.9443,0.945271,0.94467,0.944438


TrainOutput(global_step=5330, training_loss=0.0429091950965867, metrics={'train_runtime': 3003.9231, 'train_samples_per_second': 226.957, 'train_steps_per_second': 1.774, 'total_flos': 1.3772231091427738e+18, 'train_loss': 0.0429091950965867, 'epoch': 10.0})

In [70]:
student_model_pretrained_whole.eval()

In [71]:
trainer.evaluate(test)

{'eval_loss': 0.046246908605098724,
 'eval_accuracy': 0.9419,
 'eval_precision': 0.9424457486592726,
 'eval_recall': 0.9419000000000001,
 'eval_f1': 0.9417359918811032,
 'eval_runtime': 20.01,
 'eval_samples_per_second': 499.75,
 'eval_steps_per_second': 3.948,
 'epoch': 10.0}

In [None]:
torch.save(student_model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-distill_aug.pth")

In [None]:
base.count_parameters(student_model_pretrained_whole)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())