# Knowledge distillation training of MobileNetV2 on the augmented CIFAR10 dataset
In this notebook, MobileNetV2 is trained on the augmented CIFAR10 dataset. The teacher model is a ViT fine-tuned on the same dataset.

MobileNetV2 is used in three variants: randomly initialized, pretrained with only the classification head trained, and pretrained with the whole model fine-tuned. For each variant, we run standard training and training with knowledge distillation, using the hyperparameters found in the hp_search notebook. Unlike the hyperparameter search, these runs use early stopping to prevent overfitting. We also collect results on the test split and the other metrics used in the thesis (model size and inference speed).

Distillation uses the logits precomputed in the precompute_logits notebook. The configuration of each run is based on the five best runs of the hyperparameter search for that configuration. Training is capped at 20 epochs, and early stopping uses a patience of three epochs.

## Importing libraries and basic setup

In [1]:
from transformers import Trainer, EarlyStoppingCallback
from torch.utils.data import ConcatDataset, DataLoader
import torch
import base
import os



In [None]:
dataset_part = base.get_dataset_part()

Resetting the random seed so that the results are reproducible.

In [None]:
base.reset_seed()
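`base.reset_seed` is not shown in this notebook. The following is a minimal sketch of what such a helper typically does, assuming it seeds Python's `random`, NumPy and PyTorch; the name `reset_seed_sketch` and the default seed `42` are hypothetical. `transformers.set_seed` seeds the same libraries in a single call.

```python
import random


def reset_seed_sketch(seed: int = 42) -> None:
    """Hypothetical stand-in for base.reset_seed: seed every RNG that is installed."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global RNG
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)  # seeds PyTorch's RNG on all devices
    except ImportError:
        pass


# Re-seeding reproduces the same random sequence.
reset_seed_sketch(0)
first = [random.random() for _ in range(3)]
reset_seed_sketch(0)
second = [random.random() for _ in range(3)]
print(first == second)
```

Seeding alone does not make every GPU kernel deterministic, so small run-to-run differences on CUDA are still possible.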

Checking GPU availability.

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Loading the dataset and applying the base and augmentation transforms.

In [5]:
DATASET = "cifar10"

In [6]:
transform = base.base_transforms()

test = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TEST, transform=transform)
train = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.EVAL, transform=transform)

In [8]:
augment_transform = base.aug_transforms()
train_aug = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=augment_transform)

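Filtering the augmented dataset according to the described mechanism: based on the saved logits, augmented entries that differ from the original (base) entries are removed.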

In [9]:
train_aug = base.remove_diff_pred_class(train, train_aug, pytorch_dataset=True)

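A minimal sketch of the filtering idea. It assumes that `base.remove_diff_pred_class` keeps an augmented sample only when the class predicted from its saved teacher logits matches the class predicted for the corresponding original sample. The helper names are hypothetical, and plain lists stand in for logit tensors.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=values.__getitem__)


def keep_consistent_augmentations(base_logits: Sequence[Sequence[float]],
                                  aug_logits: Sequence[Sequence[float]]) -> List[int]:
    """Indices of augmented samples whose predicted class matches the original sample's."""
    return [i for i, (base, aug) in enumerate(zip(base_logits, aug_logits))
            if argmax(base) == argmax(aug)]


# Hypothetical logits for three images and their augmented copies.
base = [[2.0, 0.1], [0.2, 1.5], [3.0, 1.0]]
augmented = [[1.1, 0.9], [1.4, 0.3], [2.5, 0.5]]
print(keep_consistent_augmentations(base, augmented))
```

The second augmented image changes the predicted class, so it is dropped.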

Concatenating the augmented and original samples.

In [10]:
train_combo = ConcatDataset([train, train_aug])
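`torch.utils.data.ConcatDataset` stores the cumulative lengths of its parts and maps a global index to a (dataset, local index) pair with a binary search. The stand-alone sketch below illustrates that idea; the class name `MiniConcatDataset` is hypothetical, and plain lists stand in for `train` and `train_aug`.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


class MiniConcatDataset:
    """Hypothetical minimal version of index mapping in a concatenated dataset."""

    def __init__(self, datasets: List[Sequence]):
        self.datasets = datasets
        # e.g. lengths [3, 2] -> cumulative sizes [3, 5]
        self.cumulative_sizes = list(accumulate(len(d) for d in datasets))

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, idx: int):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("index out of range")
        # First part whose cumulative size exceeds idx.
        part = bisect_right(self.cumulative_sizes, idx)
        offset = self.cumulative_sizes[part - 1] if part > 0 else 0
        return self.datasets[part][idx - offset]


combo = MiniConcatDataset([["orig0", "orig1", "orig2"], ["aug0", "aug1"]])
print(len(combo), combo[2], combo[3])
```

Indices below `len(train)` return original images and the remaining ones return the filtered augmented images. During training, the Trainer's default random sampler mixes both parts.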

Preparing the dataloaders for the final inference-speed measurement. Each measurement uses a single sample from the training split.

In [12]:
train_part_cpu = base.CustomCIFAR10(root=f"{os.path.expanduser('~')}/data/10", train=True, batch=1, transform=transform, device="cpu")
cpu_data_loader = DataLoader(train_part_cpu, batch_size=1, shuffle=False)
train_part_gpu = base.CustomCIFAR10(root=f"{os.path.expanduser('~')}/data/10", train=True, batch=1, transform=transform, device="cuda")
gpu_data_loader = DataLoader(train_part_gpu, batch_size=1, shuffle=False)

## Standard training of the randomly initialized model

Training configuration. The chosen hyperparameters are based on the five best runs of the hyperparameter search.
Creating the randomly initialized model.

In [14]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-base_aug", logging_dir=f"~/logs/{DATASET}/random-base_aug", lr=0.00045, weight_decay=0.007, warmup_steps=10, epochs=20)
model = base.get_random_init_mobilenet(10)
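`warmup_steps` controls the learning-rate warmup. The sketch below assumes that `base.get_training_args` passes it to `TrainingArguments` and keeps the Trainer's default linear schedule. Under that schedule, the learning rate rises linearly from 0 to `lr` during warmup and then decays linearly to 0 at the end of the planned training. The planned length, 20 epochs × 533 steps = 10,660 steps, is derived from the `TrainOutput` values in this notebook (for example, 5,863 steps after 11 epochs).

```python
def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)  # ramp up from 0
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))  # decay to 0


TOTAL_STEPS = 20 * 533  # planned epochs x steps per epoch observed in this notebook
for step in (0, 5, 10, 5335, TOTAL_STEPS):
    print(step, linear_warmup_lr(step, 0.00045, 10, TOTAL_STEPS))
```

Because early stopping usually ends training before the planned 20 epochs, the learning rate typically has not decayed all the way to zero when training stops.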

Trainer configuration with a patience of 3 epochs.

In [15]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)
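A minimal sketch of the patience logic. It assumes the monitored metric is the validation loss (lower is better) and that any improvement resets the counter, as with the default `early_stopping_threshold=0.0` of `EarlyStoppingCallback`. The real callback also requires `load_best_model_at_end=True` and `metric_for_best_model` in the training arguments, so `base.get_training_args` presumably sets both. The function name is hypothetical.

```python
from typing import Optional, Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3,
                         threshold: float = 0.0) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best is None or best - loss > threshold:
            best = loss  # new best value resets the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Validation losses of the first 10 epochs of the run below: only two
# non-improving epochs follow the best value (epoch 8), so no stop yet.
logged = [1.125158, 0.771221, 0.631122, 0.540581, 0.495491,
          0.49646, 0.459039, 0.43194, 0.476235, 0.501984]
print(early_stopping_epoch(logged))
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71]))
```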


Running the training. The metrics are reported on the validation split.

In [16]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.5071 | 1.125158 | 0.6033 | 0.625951 | 0.603297 | 0.602108 |
| 2 | 0.9853 | 0.771221 | 0.7319 | 0.739313 | 0.731118 | 0.732748 |
| 3 | 0.7619 | 0.631122 | 0.7828 | 0.782569 | 0.783053 | 0.780887 |
| 4 | 0.6234 | 0.540581 | 0.8136 | 0.812128 | 0.813629 | 0.811525 |
| 5 | 0.5186 | 0.495491 | 0.8312 | 0.831375 | 0.831221 | 0.829975 |
| 6 | 0.4353 | 0.49646 | 0.83 | 0.832869 | 0.830364 | 0.829905 |
| 7 | 0.3618 | 0.459039 | 0.8474 | 0.853717 | 0.848007 | 0.847274 |
| 8 | 0.2906 | 0.43194 | 0.8582 | 0.859296 | 0.858279 | 0.858516 |
| 9 | 0.2296 | 0.476235 | 0.8542 | 0.862087 | 0.853726 | 0.856046 |
| 10 | 0.1799 | 0.501984 | 0.8516 | 0.854171 | 0.851892 | 0.851608 |


TrainOutput(global_step=5863, training_loss=0.5483821587169737, metrics={'train_runtime': 1308.0706, 'train_samples_per_second': 1042.39, 'train_steps_per_second': 8.149, 'total_flos': 1.5149454200570511e+18, 'train_loss': 0.5483821587169737, 'epoch': 11.0})

Switching the model to evaluation mode.


In [17]:
model.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe


Evaluating the model on the test split.

In [18]:
trainer.evaluate(test)

{'eval_loss': 0.4583030343055725,
 'eval_accuracy': 0.8496,
 'eval_precision': 0.8511097284520334,
 'eval_recall': 0.8496,
 'eval_f1': 0.8498490193143002,
 'eval_runtime': 13.2839,
 'eval_samples_per_second': 752.791,
 'eval_steps_per_second': 5.947,
 'epoch': 11.0}

Saving the model.


In [None]:
torch.save(model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-base_aug.pth")

In [19]:
base.reset_seed()

## Knowledge distillation into the randomly initialized model
Creating the randomly initialized student model.

In [20]:
student_model = base.get_random_init_mobilenet(10)


Distillation training configuration. The chosen hyperparameters are based on the five best runs of the hyperparameter search.

In [21]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-distill_aug", logging_dir=f"~/logs/{DATASET}/random-distill_aug", remove_unused_columns=False, epochs=20, lr=0.0006, weight_decay=0.001, warmup_steps=30, lambda_param=0.3, temp=6.5)
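The loss computed inside `base.DistilTrainer` is not shown here. The sketch below follows the common formulation used, for example, in the Hugging Face knowledge-distillation tutorial for image classification. In that formulation, `lambda_param` weights the soft (teacher) term, `temp` is the softmax temperature, and the KL term is scaled by T². Treat this as an assumption about the helper. Plain lists stand in for the student's logits and the precomputed teacher logits of one image.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, lambda_param: float, temperature: float) -> float:
    """(1 - lambda) * hard-label CE + lambda * T^2 * KL(teacher_T || student_T)."""
    # Cross-entropy of the student against the ground-truth label (T = 1).
    hard_loss = -math.log(softmax(student_logits)[label])
    # KL divergence between the temperature-softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = sum(pt * (math.log(pt) - math.log(ps))
                    for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # T^2 keeps the gradient scale of the soft term comparable across temperatures.
    return (1.0 - lambda_param) * hard_loss + lambda_param * temperature ** 2 * soft_loss


teacher = [4.0, 1.0, 0.5]  # hypothetical precomputed teacher logits
student = [2.0, 1.5, 0.0]  # hypothetical student logits
print(distillation_loss(student, teacher, label=0, lambda_param=0.3, temperature=6.5))
```

With `lambda_param=0.3`, as in the configuration above, 70 % of the weight stays on the ground-truth labels.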

Distillation trainer configuration with a patience of 3 epochs.

In [22]:
trainer = base.DistilTrainer(
    student_model=student_model,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

Running the distillation training. The metrics are reported on the validation split.

In [23]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.2531 | 0.952375 | 0.6196 | 0.618549 | 0.618654 | 0.611474 |
| 2 | 0.8607 | 0.6844 | 0.7386 | 0.741987 | 0.738267 | 0.738333 |
| 3 | 0.6977 | 0.603544 | 0.7708 | 0.771868 | 0.77134 | 0.767541 |
| 4 | 0.6002 | 0.527311 | 0.8099 | 0.809324 | 0.809594 | 0.806545 |
| 5 | 0.5292 | 0.495412 | 0.8239 | 0.827982 | 0.824019 | 0.822892 |
| 6 | 0.4678 | 0.479551 | 0.8317 | 0.834586 | 0.832018 | 0.831662 |
| 7 | 0.4151 | 0.433759 | 0.8521 | 0.856168 | 0.852743 | 0.851919 |
| 8 | 0.3687 | 0.419601 | 0.8593 | 0.858899 | 0.859476 | 0.858457 |
| 9 | 0.328 | 0.435989 | 0.8507 | 0.863998 | 0.850199 | 0.853787 |
| 10 | 0.2912 | 0.421444 | 0.8571 | 0.859412 | 0.857396 | 0.856426 |


TrainOutput(global_step=10660, training_loss=0.3840334614938017, metrics={'train_runtime': 2358.5439, 'train_samples_per_second': 578.119, 'train_steps_per_second': 4.52, 'total_flos': 2.7544462182855475e+18, 'train_loss': 0.3840334614938017, 'epoch': 20.0})

Switching the student to evaluation mode.

In [24]:
student_model.eval()


Evaluating the student on the test split.

In [25]:
trainer.evaluate(test)

{'eval_loss': 0.3856470584869385,
 'eval_accuracy': 0.8727,
 'eval_precision': 0.8758898290314864,
 'eval_recall': 0.8726999999999998,
 'eval_f1': 0.8733697508431805,
 'eval_runtime': 13.0045,
 'eval_samples_per_second': 768.967,
 'eval_steps_per_second': 6.075,
 'epoch': 20.0}

Saving the student model.

In [None]:
torch.save(student_model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-distill_aug.pth")

In [None]:
base.reset_seed()

## Standard training of the pretrained model, training only the classification head

Loading the pretrained model.

In [20]:
model_pretrained = base.get_mobilenet(10)

In [21]:
print(model_pretrained)



Freezing all parameters except the classification head.

In [22]:
model_pretrained = base.freeze_model(model_pretrained)
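`base.freeze_model` is not shown. Its effect can be checked against the parameter count at the end of the notebook, which reports 12,810 trainable parameters, all of them in `classifier`. The sketch below assumes that the head parameters are exactly those whose names start with `classifier.`. It accepts any iterable of `(name, parameter)` pairs, such as `model.named_parameters()` of a PyTorch module. The function name is hypothetical, and simple namespace objects stand in for real parameters.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def freeze_all_but_head(named_parameters: Iterable[Tuple[str, object]],
                        head_prefix: str = "classifier.") -> List[str]:
    """Disable gradients for every parameter outside the head; return the trainable names."""
    trainable = []
    for name, param in named_parameters:
        param.requires_grad = name.startswith(head_prefix)
        if param.requires_grad:
            trainable.append(name)
    return trainable


params = [
    ("mobilenet_v2.conv_stem.first_conv.convolution.weight", SimpleNamespace(requires_grad=True)),
    ("classifier.weight", SimpleNamespace(requires_grad=True)),
    ("classifier.bias", SimpleNamespace(requires_grad=True)),
]
print(freeze_all_but_head(params))
```

Setting `requires_grad=False` does not stop BatchNorm layers from updating their running statistics while the model is in training mode.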

Training configuration. The chosen hyperparameters are based on the five best runs of the hyperparameter search.

In [23]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-base_aug", logging_dir=f"~/logs/{DATASET}/head-base_aug", epochs=20, lr=0.0018, weight_decay=0.002, warmup_steps=25)


Trainer configuration with a patience of 3 epochs.

In [24]:
trainer = Trainer(
    model=model_pretrained,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

Running the training. The metrics are reported on the validation split.

In [25]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.0319 | 0.716718 | 0.7516 | 0.755801 | 0.751491 | 0.750032 |
| 2 | 0.9003 | 0.695257 | 0.7628 | 0.768514 | 0.762726 | 0.763396 |
| 3 | 0.883 | 0.678211 | 0.7681 | 0.771087 | 0.767787 | 0.76569 |
| 4 | 0.8683 | 0.665547 | 0.7722 | 0.774148 | 0.771622 | 0.77016 |
| 5 | 0.8639 | 0.671913 | 0.7715 | 0.774187 | 0.771214 | 0.769318 |
| 6 | 0.8625 | 0.695257 | 0.7623 | 0.776779 | 0.761948 | 0.764137 |
| 7 | 0.859 | 0.664645 | 0.7727 | 0.776843 | 0.772592 | 0.772641 |
| 8 | 0.858 | 0.658363 | 0.7715 | 0.773936 | 0.771328 | 0.770761 |
| 9 | 0.8493 | 0.674281 | 0.7701 | 0.778435 | 0.769294 | 0.771682 |
| 10 | 0.8483 | 0.691852 | 0.7638 | 0.774654 | 0.763713 | 0.763567 |


TrainOutput(global_step=5330, training_loss=0.8824500737002374, metrics={'train_runtime': 810.6294, 'train_samples_per_second': 1682.051, 'train_steps_per_second': 13.15, 'total_flos': 1.3772231091427738e+18, 'train_loss': 0.8824500737002374, 'epoch': 10.0})


Switching the model to evaluation mode.

In [26]:
model_pretrained.eval()


Evaluating the model on the test split.

In [27]:
trainer.evaluate(test)

{'eval_loss': 0.6696250438690186,
 'eval_accuracy': 0.7706,
 'eval_precision': 0.7741226069943992,
 'eval_recall': 0.7706000000000001,
 'eval_f1': 0.770693767312176,
 'eval_runtime': 13.6136,
 'eval_samples_per_second': 734.557,
 'eval_steps_per_second': 5.803,
 'epoch': 10.0}

Saving the model.

In [28]:
torch.save(model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-base_aug.pth")

In [None]:
base.reset_seed()

## Standard training of the pretrained model, fine-tuning the whole model

Loading the pretrained model.

In [30]:
model_pretrained_whole = base.get_mobilenet(10)

Training configuration. The chosen hyperparameters are based on the five best runs of the hyperparameter search.


In [31]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-base_aug", logging_dir=f"~/logs/{DATASET}/pretrained-base_aug", epochs=20, lr=0.00035, weight_decay=0.008, warmup_steps=10)

Trainer configuration with a patience of 3 epochs.

In [32]:
trainer = Trainer(
    model=model_pretrained_whole,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

Running the training. The metrics are reported on the validation split.

In [33]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 0.435 | 0.277288 | 0.9045 | 0.909978 | 0.904694 | 0.904387 |
| 2 | 0.178 | 0.214747 | 0.9294 | 0.930866 | 0.929688 | 0.929366 |
| 3 | 0.1065 | 0.225928 | 0.9278 | 0.927853 | 0.928156 | 0.927398 |
| 4 | 0.0742 | 0.237948 | 0.9327 | 0.934085 | 0.932782 | 0.932739 |
| 5 | 0.0522 | 0.263896 | 0.9326 | 0.933032 | 0.932793 | 0.93259 |
| 6 | 0.0427 | 0.250471 | 0.9361 | 0.937331 | 0.936413 | 0.936365 |
| 7 | 0.0322 | 0.245255 | 0.9425 | 0.942723 | 0.942688 | 0.942482 |
| 8 | 0.025 | 0.219642 | 0.9455 | 0.945886 | 0.945526 | 0.945607 |
| 9 | 0.0198 | 0.276751 | 0.9419 | 0.9439 | 0.941903 | 0.942556 |
| 10 | 0.0157 | 0.250731 | 0.9454 | 0.946081 | 0.945724 | 0.945397 |


TrainOutput(global_step=7462, training_loss=0.07224945052639468, metrics={'train_runtime': 1653.5066, 'train_samples_per_second': 824.623, 'train_steps_per_second': 6.447, 'total_flos': 1.9281123527998833e+18, 'train_loss': 0.07224945052639468, 'epoch': 14.0})

Switching the model to evaluation mode.

In [34]:
model_pretrained_whole.eval()



Evaluating the model on the test split.

In [35]:
trainer.evaluate(test)

{'eval_loss': 0.340167760848999,
 'eval_accuracy': 0.9371,
 'eval_precision': 0.9381908271475531,
 'eval_recall': 0.9370999999999998,
 'eval_f1': 0.937193628248514,
 'eval_runtime': 12.8016,
 'eval_samples_per_second': 781.153,
 'eval_steps_per_second': 6.171,
 'epoch': 14.0}

Saving the model.

In [36]:
torch.save(model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-base_aug.pth")

In [None]:
base.reset_seed()

## Knowledge distillation into the pretrained model, training only the classification head

Loading the pretrained student model.


In [38]:
student_model_pretrained = base.get_mobilenet(10)

Freezing all parameters except the classification head.

In [39]:
student_model_pretrained = base.freeze_model(student_model_pretrained)


Distillation training configuration. The chosen hyperparameters are based on the five best runs of the hyperparameter search.

In [40]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-distill_aug", logging_dir=f"~/logs/{DATASET}/head-distill_aug", remove_unused_columns=False, epochs=20, lr=0.00085, weight_decay=.008, warmup_steps=5, lambda_param=.3, temp=5)

Distillation trainer configuration with a patience of 3 epochs.

In [41]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

Running the distillation training. The metrics are reported on the validation split.

In [42]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 0.9624 | 0.740998 | 0.7534 | 0.757778 | 0.753066 | 0.752863 |
| 2 | 0.8689 | 0.725078 | 0.7674 | 0.77091 | 0.767066 | 0.767063 |
| 3 | 0.8572 | 0.714603 | 0.7656 | 0.76971 | 0.76531 | 0.763915 |
| 4 | 0.8516 | 0.701048 | 0.7741 | 0.77693 | 0.773384 | 0.772942 |
| 5 | 0.8489 | 0.700044 | 0.773 | 0.774222 | 0.772692 | 0.771049 |
| 6 | 0.8468 | 0.711893 | 0.7619 | 0.772429 | 0.761514 | 0.763534 |
| 7 | 0.8467 | 0.706788 | 0.7702 | 0.774271 | 0.770192 | 0.769936 |


TrainOutput(global_step=3731, training_loss=0.8689309199161167, metrics={'train_runtime': 577.8759, 'train_samples_per_second': 2359.538, 'train_steps_per_second': 18.447, 'total_flos': 9.640561763999416e+17, 'train_loss': 0.8689309199161167, 'epoch': 7.0})

Switching the student to evaluation mode.

In [43]:
student_model_pretrained.eval()


Evaluating the student on the test split.


In [44]:
trainer.evaluate(test)

{'eval_loss': 0.7104712128639221,
 'eval_accuracy': 0.7707,
 'eval_precision': 0.7744924667578328,
 'eval_recall': 0.7707,
 'eval_f1': 0.7698503725478776,
 'eval_runtime': 13.146,
 'eval_samples_per_second': 760.689,
 'eval_steps_per_second': 6.009,
 'epoch': 7.0}

Saving the student model.

In [45]:
torch.save(student_model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-distill_aug.pth")

In [None]:
base.reset_seed()


## Knowledge distillation into the pretrained model, fine-tuning the whole model

Loading the pretrained student model.

In [47]:
student_model_pretrained_whole = base.get_mobilenet(10)


Distillation training configuration. The chosen hyperparameters are based on the five best runs of the hyperparameter search.

In [48]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-distill_aug", logging_dir=f"~/logs/{DATASET}/pretrained-distill_aug", remove_unused_columns=False, epochs=20, lr=0.0004, weight_decay=.004, warmup_steps=25, lambda_param=.8, temp=5)

Distillation trainer configuration with a patience of 3 epochs.

In [49]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained_whole,
    args=training_args,
    train_dataset=train_combo,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

Running the distillation training. The metrics are reported on the validation split.

In [50]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 0.2606 | 0.166664 | 0.9133 | 0.916675 | 0.913401 | 0.913569 |
| 2 | 0.1519 | 0.134888 | 0.9336 | 0.936088 | 0.933972 | 0.933257 |
| 3 | 0.1246 | 0.133528 | 0.9391 | 0.940057 | 0.939499 | 0.938933 |
| 4 | 0.1125 | 0.129485 | 0.9384 | 0.940049 | 0.938368 | 0.938384 |
| 5 | 0.1048 | 0.116963 | 0.9466 | 0.947576 | 0.946724 | 0.946678 |
| 6 | 0.1 | 0.118011 | 0.9454 | 0.945752 | 0.945782 | 0.945359 |
| 7 | 0.0976 | 0.119436 | 0.9443 | 0.945343 | 0.944673 | 0.94425 |
| 8 | 0.0952 | 0.111626 | 0.951 | 0.951681 | 0.951243 | 0.951057 |
| 9 | 0.0924 | 0.113866 | 0.9467 | 0.951429 | 0.946899 | 0.947536 |
| 10 | 0.0905 | 0.112662 | 0.9526 | 0.952881 | 0.952924 | 0.952528 |


TrainOutput(global_step=9594, training_loss=0.10569548870291043, metrics={'train_runtime': 2136.5532, 'train_samples_per_second': 638.187, 'train_steps_per_second': 4.989, 'total_flos': 2.479001596456993e+18, 'train_loss': 0.10569548870291043, 'epoch': 18.0})

Switching the student to evaluation mode.

In [51]:
student_model_pretrained_whole.eval()


Evaluating the student on the test split.

In [52]:
trainer.evaluate(test)

{'eval_loss': 0.10659950971603394,
 'eval_accuracy': 0.9531,
 'eval_precision': 0.953135376135972,
 'eval_recall': 0.9531000000000001,
 'eval_f1': 0.9530021715293957,
 'eval_runtime': 12.8615,
 'eval_samples_per_second': 777.513,
 'eval_steps_per_second': 6.142,
 'epoch': 18.0}
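To compare the six runs, the test-split accuracies reported in this notebook can be put side by side, and the gain from distillation can be computed for each MobileNetV2 variant. The dictionary keys and the helper name are illustrative.

```python
from typing import Dict, Tuple

# Test-split accuracy reported in this notebook: (standard training, distillation).
test_accuracy: Dict[str, Tuple[float, float]] = {
    "random init": (0.8496, 0.8727),
    "pretrained, head only": (0.7706, 0.7707),
    "pretrained, whole model": (0.9371, 0.9531),
}


def distillation_gain(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Accuracy difference (distilled minus standard) per variant, rounded to 4 decimals."""
    return {name: round(distilled - standard, 4)
            for name, (standard, distilled) in results.items()}


for name, gain in distillation_gain(test_accuracy).items():
    print(f"{name}: {gain * 100:+.2f} percentage points")
```

The gains are about +2.3, +0.01 and +1.6 percentage points. Each setting was trained only once, so the head-only difference in particular is too small to be distinguished from run-to-run variation.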

Saving the student model.

In [53]:
torch.save(student_model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-distill_aug.pth")

Counting the trainable parameters of the model.

In [54]:
base.count_parameters(student_model_pretrained_whole)

model size: 8.663MB.
Total Trainable Params: 2236682.


| # | Modules | Parameters |
|---|---|---|
| 0 | mobilenet_v2.conv_stem.first_conv.convolution.... | 864 |
| 1 | mobilenet_v2.conv_stem.first_conv.normalizatio... | 32 |
| 2 | mobilenet_v2.conv_stem.first_conv.normalizatio... | 32 |
| 3 | mobilenet_v2.conv_stem.conv_3x3.convolution.we... | 288 |
| 4 | mobilenet_v2.conv_stem.conv_3x3.normalization.... | 32 |
| ... | ... | ... |
| 153 | mobilenet_v2.conv_1x1.convolution.weight | 409600 |
| 154 | mobilenet_v2.conv_1x1.normalization.weight | 1280 |
| 155 | mobilenet_v2.conv_1x1.normalization.bias | 1280 |
| 156 | classifier.weight | 12800 |
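`base.count_parameters` is not shown. Assuming its "model size" is given in MiB and that weights are stored as float32 (4 bytes each), the 2,236,682 parameters alone account for about 8.53 MiB. The remaining ~0.13 MiB, roughly 34 thousand float32 values, would be consistent with non-trainable buffers such as BatchNorm running statistics. That attribution is an assumption; the output does not state it. The variant with a frozen backbone reports the same 8.663MB, so the figure covers all weights, not only the trainable ones.

```python
def fp32_size_mib(num_values: int) -> float:
    """Size in MiB of num_values float32 numbers (4 bytes each)."""
    return num_values * 4 / 1024 ** 2


trainable_params = 2_236_682   # "Total Trainable Params" above
reported_size_mib = 8.663      # "model size" above (assumed to be MiB)
params_only = fp32_size_mib(trainable_params)
gap_values = (reported_size_mib - params_only) * 1024 ** 2 / 4
print(f"parameters alone: {params_only:.3f} MiB, unexplained: ~{gap_values:,.0f} float32 values")
```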


Counting the trainable parameters when only the classification head is trained.


In [55]:
base.count_parameters(student_model_pretrained)

model size: 8.663MB.
Total Trainable Params: 12810.


| # | Modules | Parameters |
|---|---|---|
| 0 | classifier.weight | 12800 |
| 1 | classifier.bias | 10 |
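The 12,810 trainable parameters are exactly those of a fully connected layer that maps the 1,280 features of the final `conv_1x1` block to the 10 CIFAR10 classes (1,280 × 10 weights + 10 biases). The short calculation below shows that this is well under 1 % of the parameters trained in the whole-model variant. The helper name is illustrative.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


head_params = linear_param_count(1280, 10)  # 1280 features from conv_1x1, 10 classes
total_params = 2_236_682                    # whole model, from the previous cell
print(head_params, f"{100 * head_params / total_params:.3f} % of all parameters")
```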


Measuring inference speed on the CPU: 1,000 runs on a single sample.

In [56]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

<torch.utils.benchmark.utils.common.Measurement object at 0x797dd81f9750>
self.infer_speed_comp()
  25.12 ms
  1 measurement, 1000 runs , 4 threads


Measuring inference speed on the GPU: 1,000 runs on a single sample.

In [57]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

<torch.utils.benchmark.utils.common.Measurement object at 0x797dd3c58670>
self.infer_speed_comp()
  8.94 ms
  1 measurement, 1000 runs , 4 threads
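`base.BenchMarkRunner` appears to wrap `torch.utils.benchmark`, since its output is a `Measurement` object. The dependency-free sketch below illustrates the same measurement idea: untimed warm-up calls, then many timed single-sample calls with `time.perf_counter`. The function name and the stand-in workload are hypothetical. CUDA kernels run asynchronously, so when timing GPU work by hand, the timed function must synchronize the device before it returns.

```python
import time
from statistics import mean, median
from typing import Callable, Tuple


def time_per_call(fn: Callable[[], object], runs: int = 1000, warmup: int = 10) -> Tuple[float, float]:
    """Return (median, mean) wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not timed
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # for CUDA work, fn should call torch.cuda.synchronize() before returning
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples), mean(samples)


# Stand-in workload; in the notebook this would be one forward pass on one image.
print(time_per_call(lambda: sum(range(10_000)), runs=100))

cpu_ms, gpu_ms = 25.12, 8.94  # measurements reported above
print(f"GPU speed-up: {cpu_ms / gpu_ms:.2f}x")
```

Based on the two measurements above, single-sample inference on the GPU is roughly 2.8 times faster than on the CPU.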
