# Notebook for training with distillation on the CIFAR10 dataset
This notebook trains MobileNetV2 on the CIFAR10 dataset. The teacher model is a ViT fine-tuned on the same dataset.

MobileNetV2 is used in three setups: with random initialization, with ImageNet-pretrained weights where only the classification head is trained, and with ImageNet-pretrained weights where the whole model is trained. Each of the three setups is trained both in the standard way and with distillation from the teacher model mentioned above.

Distillation uses the precomputed logits from the precompute_logits notebook.

## Library imports and method definitions

In [None]:
from transformers import Trainer, EarlyStoppingCallback
from torch.utils.data import DataLoader
import pandas as pd
import torch
import base
import os

In [2]:
dataset_part = base.get_dataset_part()

Reset the random seed so that the results are reproducible.

In [3]:
base.reset_seed()
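
`base.reset_seed` is defined outside this notebook, so its body is not shown here. As a rough sketch of what a helper of this kind typically does, the hypothetical function below seeds Python's `random` module and, if they happen to be installed, NumPy and PyTorch. The function name, the default seed of 42 and the set of seeded libraries are assumptions, not the actual implementation.

```python
import random


def reset_seed(seed: int = 42) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed, nothing to seed
    try:
        import torch  # optional dependency

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is not installed, nothing to seed


# Re-seeding restarts the random stream from the same point.
reset_seed()
first = random.random()
reset_seed()
second = random.random()
print(first == second)  # True
```

Calling the helper before each experiment, as this notebook does, means every run starts from the same random state regardless of what ran before it.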

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Apply the transformations to the dataset.

In [5]:
DATASET = "cifar10"

In [6]:
transform = base.base_transforms()

# The last train batch is used as the eval split...
test = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TEST, transform=transform)
train = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR10L(root=f"{os.path.expanduser('~')}/data/10-logits", dataset_part=dataset_part.EVAL, transform=transform)

In [7]:
train[0]["labels"]

6
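
The value `6` is a class index. Assuming `base.CustomCIFAR10L` keeps the standard CIFAR-10 label order (an assumption, since the class is not shown here), the index can be decoded with a small lookup. `CIFAR10_CLASSES` and `label_name` are illustrative names.

```python
# Standard CIFAR-10 class order; index i corresponds to label i.
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def label_name(index: int) -> str:
    """Translate a CIFAR-10 label index into its class name."""
    return CIFAR10_CLASSES[index]


print(label_name(6))  # frog
```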

In [8]:
# Check the class distribution --> good enough
df = pd.DataFrame(eval.labels)
print(df.value_counts())

0
5    1025
9    1022
3    1016
0    1014
1    1014
8    1003
4     997
6     980
7     977
2     952
Name: count, dtype: int64
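
To put a number on "good enough", the counts above can be compared with a perfectly uniform split. The snippet below copies the counts by hand from the output, and the variable names are illustrative.

```python
# Counts copied from the value_counts() output above (label -> samples).
eval_counts = {5: 1025, 9: 1022, 3: 1016, 0: 1014, 1: 1014,
               8: 1003, 4: 997, 6: 980, 7: 977, 2: 952}

total = sum(eval_counts.values())
expected = total / len(eval_counts)  # samples per class if perfectly balanced


def relative_deviation(count: int) -> float:
    """Deviation of a class count from the uniform expectation."""
    return (count - expected) / expected


# Class whose count is furthest from the uniform expectation.
worst_label = max(eval_counts, key=lambda label: abs(relative_deviation(eval_counts[label])))

print(total, expected)  # 10000 1000.0
print(worst_label, f"{relative_deviation(eval_counts[worst_label]):+.1%}")  # 2 -4.8%
```

The eval split holds 10,000 samples, and the least represented class (label 2) is 4.8 % below the uniform share of 1,000.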


In [None]:
train_part_cpu = base.CustomCIFAR10(root=f"{os.path.expanduser('~')}/data/10", train=True, batch=1, transform=transform, device="cpu")
cpu_data_loader = DataLoader(train_part_cpu, batch_size=1, shuffle=False)
train_part_gpu = base.CustomCIFAR10(root=f"{os.path.expanduser('~')}/data/10", train=True, batch=1, transform=transform, device="cuda")
gpu_data_loader = DataLoader(train_part_gpu, batch_size=1, shuffle=False)

### Standard training of the randomly initialized model

In [9]:
base.reset_seed()

In [10]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/cifar10-random", logging_dir=f"~/logs/{DATASET}/cifar10-random", lr=0.0005, weight_decay=0.008, adam_beta1=.95, epochs=30)
model = base.get_random_init_mobilenet(10)
model.to(device)

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [11]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 4)]
)

In [12]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.5476,1.073661,0.613,0.61874,0.613018,0.604274
2,1.0039,0.861785,0.6975,0.707029,0.697189,0.69641
3,0.7897,0.727871,0.7493,0.750574,0.748952,0.746941
4,0.6499,0.604314,0.7888,0.795469,0.788709,0.78835
5,0.5445,0.55997,0.8093,0.816708,0.809599,0.809887
6,0.4644,0.543778,0.8126,0.818717,0.812414,0.81287
7,0.3944,0.55524,0.8176,0.821053,0.818053,0.815094
8,0.3265,0.517368,0.8321,0.836846,0.832271,0.832274
9,0.2784,0.581174,0.8247,0.832869,0.824669,0.824372
10,0.2222,0.500503,0.844,0.846665,0.844521,0.842609


TrainOutput(global_step=9390, training_loss=0.23816965082233832, metrics={'train_runtime': 5237.755, 'train_samples_per_second': 229.106, 'train_steps_per_second': 1.793, 'total_flos': 2.4241195302912e+18, 'train_loss': 0.23816965082233832, 'epoch': 30.0})
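
The `global_step` value can be cross-checked by hand. The sketch below assumes 40,000 training images (the five CIFAR-10 training batches minus the one held out for evaluation) and an effective batch size of 128. Neither number is stated in this notebook. The batch size is inferred from the ratio of `train_samples_per_second` to `train_steps_per_second`, which is roughly 128.

```python
import math

train_samples = 40_000  # assumption: 50,000 CIFAR-10 training images minus the eval batch
batch_size = 128        # assumption: inferred from samples/s divided by steps/s
epochs = 30             # from the get_training_args call above

# The last batch of an epoch is partial, hence the ceiling.
steps_per_epoch = math.ceil(train_samples / batch_size)

print(steps_per_epoch)           # 313
print(steps_per_epoch * epochs)  # 9390
```

Under these assumptions, 9,390 optimizer steps correspond to 30 full epochs of 313 steps each.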

In [13]:
model.eval()

In [14]:
trainer.evaluate(test)

{'eval_loss': 0.7925772666931152,
 'eval_accuracy': 0.862,
 'eval_precision': 0.8657647562960742,
 'eval_recall': 0.8619999999999999,
 'eval_f1': 0.8617050752529256,
 'eval_runtime': 32.7672,
 'eval_samples_per_second': 305.183,
 'eval_steps_per_second': 2.411,
 'epoch': 30.0}

In [15]:
torch.save(model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-base.pth")

In [None]:
base.count_parameters(model)
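
`base.count_parameters` is not shown in this notebook. A common way to count parameters, sketched here as an assumption about what such a helper does, is to sum `numel()` over the model's parameters and optionally skip the frozen ones. The function relies only on the `parameters()`, `numel()` and `requires_grad` interface that PyTorch modules and tensors expose.

```python
def count_parameters(model, trainable_only: bool = True) -> int:
    """Sum the element counts of a model's parameters (hypothetical stand-in)."""
    return sum(
        param.numel()
        for param in model.parameters()
        if param.requires_grad or not trainable_only
    )
```

With `trainable_only=True` the result changes once layers are frozen, which makes the difference between head-only and whole-model training visible.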

In [None]:
cpu_benchmark = base.BenchMarkRunner(model, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())
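
The internals of `base.BenchMarkRunner` are not part of this notebook. The following is a minimal sketch of a latency benchmark in the same spirit, assuming the last constructor argument (1000) is the number of timed runs. `run_latency_benchmark`, its parameters and the returned keys are illustrative names, and a plain Python callable stands in for the model.

```python
import statistics
import time


def run_latency_benchmark(predict, inputs, n_runs=1000, n_warmup=10):
    """Time `predict(x)` over `n_runs` calls and return latency statistics in milliseconds."""
    inputs = list(inputs)
    # Warm-up calls are excluded from the measurement.
    for i in range(n_warmup):
        predict(inputs[i % len(inputs)])

    latencies_ms = []
    for i in range(n_runs):
        sample = inputs[i % len(inputs)]  # cycle through the available inputs
        start = time.perf_counter()
        predict(sample)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": n_runs,
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Toy stand-in for a single-sample forward pass.
stats = run_latency_benchmark(lambda x: sum(v * v for v in x), [[0.1] * 1000], n_runs=100)
print(stats)
```

When timing a real model on a GPU, the clock should only be read after `torch.cuda.synchronize()`, because CUDA kernels are launched asynchronously and the call would otherwise return before the work is done.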

## Definition of distillation training

The `base.DistilTrainer` class adapts the Hugging Face `Trainer` for knowledge distillation. It now works with the teacher logits stored in the dataset.
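
The class itself lives in `base` and is not shown in this notebook. As a rough sketch of what a loss of this kind usually looks like, the snippet below implements the conventional knowledge-distillation objective for a single sample in plain Python: a temperature-softened KL divergence between teacher and student, scaled by the squared temperature and mixed with the hard-label cross-entropy. The function names and the exact weighting are assumptions. Only the parameter names `lambda_param` and `temp` come from the `get_training_args` calls in this notebook.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of `logits / temperature`."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, lambda_param=1.0, temp=6.0):
    """(1 - lambda) * CE(student, label) + lambda * temp^2 * KL(teacher || student)."""
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: both distributions are softened with the same temperature.
    p_teacher = softmax(teacher_logits, temp)
    p_student = softmax(student_logits, temp)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return (1.0 - lambda_param) * ce + lambda_param * (temp ** 2) * kl


teacher = [8.0, 1.0, 0.5]  # made-up precomputed teacher logits
student = [2.0, 1.0, 0.1]  # made-up student logits
print(distillation_loss(student, teacher, label=0))
```

Under this formulation, `lambda_param=1` drops the cross-entropy term entirely, so the reported loss would be the scaled KL divergence alone and would not be directly comparable with the cross-entropy loss of standard training.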

### Training the randomly initialized model with knowledge distillation

In [16]:
base.reset_seed()

In [17]:
student_model = base.get_random_init_mobilenet(10)

In [18]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/cifar10-random-KD", logging_dir=f"~/logs/{DATASET}/cifar10-random-KD", remove_unused_columns=False, epochs=30, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [19]:
trainer = base.DistilTrainer(
    student_model=student_model,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [20]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.5474,0.388791,0.5694,0.633921,0.569466,0.542884
2,0.3333,0.269868,0.715,0.723726,0.714784,0.715332
3,0.2566,0.224506,0.7636,0.769499,0.763049,0.763081
4,0.2107,0.190474,0.7903,0.798593,0.790343,0.789729
5,0.1785,0.16493,0.8148,0.82153,0.814791,0.815378
6,0.1548,0.168505,0.8054,0.828019,0.805212,0.807153
7,0.1351,0.171534,0.8131,0.826438,0.813839,0.81028
8,0.1183,0.153647,0.832,0.841779,0.832542,0.8318
9,0.1053,0.165944,0.8262,0.841171,0.825619,0.827467
10,0.093,0.137925,0.8428,0.854168,0.843346,0.840958


TrainOutput(global_step=9390, training_loss=0.10265842697744806, metrics={'train_runtime': 6377.4023, 'train_samples_per_second': 188.164, 'train_steps_per_second': 1.472, 'total_flos': 2.4241195302912e+18, 'train_loss': 0.10265842697744806, 'epoch': 30.0})

In [21]:
student_model.eval()

In [22]:
trainer.evaluate(test)

{'eval_loss': 0.10525421053171158,
 'eval_accuracy': 0.8714,
 'eval_precision': 0.8734816558916215,
 'eval_recall': 0.8714000000000001,
 'eval_f1': 0.8715830170061967,
 'eval_runtime': 32.8753,
 'eval_samples_per_second': 304.18,
 'eval_steps_per_second': 2.403,
 'epoch': 30.0}

In [None]:
torch.save(student_model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-distill.pth")

In [None]:
base.count_parameters(student_model)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Obtaining the pretrained MobileNetV2 model

In [24]:
base.reset_seed()

In [25]:
model_pretrained = base.get_mobilenet(10)

In [26]:
print(model_pretrained)

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [27]:
model_pretrained = base.freeze_model(model_pretrained)
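
`base.freeze_model` is not shown here. A minimal sketch of head-only training, assuming the classification layer's parameter names start with `classifier` (an assumption about the model's attribute naming), is to switch off gradients for every other parameter. The function name and the `head_prefix` argument are illustrative.

```python
def freeze_all_but_head(model, head_prefix: str = "classifier"):
    """Disable gradients for every parameter whose name does not start with `head_prefix`."""
    for name, param in model.named_parameters():
        # Only the classification head stays trainable.
        param.requires_grad = name.startswith(head_prefix)
    return model
```

The optimizer then updates only the head, which is why each epoch of head-only training runs noticeably faster than training the whole network.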

In [28]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/cifar10-pretrained-head", logging_dir=f"~/logs/{DATASET}/cifar10-pretrained-head", epochs=15, lr=0.0005, weight_decay=0.008, adam_beta1=.95)

In [29]:
trainer = Trainer(
    model=model_pretrained,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 2)]
)

In [30]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.0722,0.805487,0.7257,0.734511,0.725454,0.725479
2,0.7568,0.74196,0.7462,0.757626,0.745023,0.747063
3,0.7093,0.719927,0.7539,0.755313,0.753178,0.752071
4,0.6893,0.684819,0.7641,0.764704,0.764243,0.762436
5,0.6721,0.716556,0.7489,0.75685,0.748897,0.74991
6,0.6663,0.67467,0.7691,0.770169,0.768903,0.768635
7,0.6548,0.684226,0.7668,0.766977,0.766616,0.763534
8,0.6485,0.679772,0.7651,0.771319,0.764866,0.764723


TrainOutput(global_step=2504, training_loss=0.7336705424153386, metrics={'train_runtime': 1079.6832, 'train_samples_per_second': 555.719, 'train_steps_per_second': 4.348, 'total_flos': 6.4643187474432e+17, 'train_loss': 0.7336705424153386, 'epoch': 8.0})
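
Training stopped after epoch 8 although 15 epochs were requested. The sketch below replays the patience rule on the validation losses from the table above. It assumes that the monitored metric is the validation loss and that any decrease counts as an improvement. The metric actually monitored is configured in `base.get_training_args`, which is not shown here.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never triggers."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0  # improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


# Validation losses of epochs 1-8, copied from the table above.
val_losses = [0.805487, 0.74196, 0.719927, 0.684819,
              0.716556, 0.67467, 0.684226, 0.679772]
print(stopping_epoch(val_losses, patience=2))  # 8
```

The best loss is reached in epoch 6. Epochs 7 and 8 do not improve on it, so a patience of 2 ends the run at epoch 8, which matches the `'epoch': 8.0` in the output.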

In [31]:
model_pretrained.eval()

In [32]:
trainer.evaluate(test)

{'eval_loss': 0.6756030321121216,
 'eval_accuracy': 0.7719,
 'eval_precision': 0.7725421423514587,
 'eval_recall': 0.7719,
 'eval_f1': 0.7716213917921004,
 'eval_runtime': 30.4119,
 'eval_samples_per_second': 328.818,
 'eval_steps_per_second': 2.598,
 'epoch': 8.0}

In [None]:
torch.save(model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-base.pth")

In [None]:
base.count_parameters(model_pretrained)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model_pretrained, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model_pretrained, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

### Training the whole pretrained MobileNetV2

In [34]:
base.reset_seed()

In [35]:
model_pretrained_whole = base.get_mobilenet(10)

In [36]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/cifar10-pretrained", logging_dir=f"~/logs/{DATASET}/cifar10-pretrained", epochs=10, lr=0.0005, weight_decay=0.008, adam_beta1=.95)

In [37]:
trainer = Trainer(
    model=model_pretrained_whole,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [38]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.3987,0.277764,0.9045,0.909067,0.904586,0.904557
2,0.1729,0.251689,0.917,0.921033,0.916892,0.91795
3,0.1017,0.22691,0.929,0.930525,0.929145,0.928956
4,0.0621,0.231554,0.9338,0.934803,0.934095,0.933751
5,0.0377,0.240219,0.9391,0.939718,0.939365,0.939145
6,0.0206,0.263013,0.9372,0.939679,0.93712,0.93754
7,0.0119,0.256022,0.9411,0.9414,0.941476,0.940896
8,0.0059,0.24486,0.9485,0.94993,0.948719,0.948749
9,0.0013,0.384629,0.9281,0.935518,0.928183,0.928713
10,0.0007,0.218316,0.9497,0.949817,0.949927,0.949623


TrainOutput(global_step=3130, training_loss=0.08135286880948674, metrics={'train_runtime': 1991.6452, 'train_samples_per_second': 200.839, 'train_steps_per_second': 1.572, 'total_flos': 8.080398434304e+17, 'train_loss': 0.08135286880948674, 'epoch': 10.0})

In [39]:
model_pretrained_whole.eval()

In [40]:
trainer.evaluate(test)

{'eval_loss': 0.26257821917533875,
 'eval_accuracy': 0.9467,
 'eval_precision': 0.9468578238505667,
 'eval_recall': 0.9467000000000001,
 'eval_f1': 0.9465257834003598,
 'eval_runtime': 30.1268,
 'eval_samples_per_second': 331.931,
 'eval_steps_per_second': 2.622,
 'epoch': 10.0}

In [None]:
torch.save(model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-base.pth")

In [None]:
base.count_parameters(model_pretrained_whole)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Training the pretrained MobileNetV2 with knowledge distillation

### Training only the classification head of the pretrained model with distillation

In [42]:
base.reset_seed()

In [43]:
student_model_pretrained = base.get_mobilenet(10)

In [44]:
student_model_pretrained = base.freeze_model(student_model_pretrained)

In [48]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/cifar10-pretrained-head-KD", logging_dir=f"~/logs/{DATASET}/cifar10-pretrained-head-KD", remove_unused_columns=False, epochs=15, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [49]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 2)]
)

In [50]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.407,0.345192,0.7152,0.741203,0.715037,0.715472
2,0.3406,0.327989,0.7375,0.742867,0.736843,0.736335
3,0.3308,0.324567,0.7433,0.746582,0.742526,0.742951
4,0.3271,0.314279,0.7444,0.75737,0.744623,0.744168
5,0.3248,0.322354,0.7369,0.751009,0.736714,0.739923
6,0.3231,0.312415,0.7476,0.754019,0.747367,0.747677
7,0.3219,0.320375,0.742,0.74894,0.742009,0.737723
8,0.3198,0.315109,0.7477,0.758157,0.747553,0.746934


TrainOutput(global_step=2504, training_loss=0.33687836217423217, metrics={'train_runtime': 1184.9913, 'train_samples_per_second': 506.333, 'train_steps_per_second': 3.962, 'total_flos': 6.4643187474432e+17, 'train_loss': 0.33687836217423217, 'epoch': 8.0})

In [51]:
student_model_pretrained.eval()

In [52]:
trainer.evaluate(test)

{'eval_loss': 0.3118944466114044,
 'eval_accuracy': 0.7496,
 'eval_precision': 0.7551763301271757,
 'eval_recall': 0.7495999999999999,
 'eval_f1': 0.7497581114499822,
 'eval_runtime': 35.5617,
 'eval_samples_per_second': 281.201,
 'eval_steps_per_second': 2.221,
 'epoch': 8.0}

In [None]:
torch.save(student_model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-distill.pth")

In [None]:
base.count_parameters(student_model_pretrained)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

### Training the whole pretrained model with distillation

In [62]:
base.reset_seed()

In [63]:
student_model_pretrained_whole = base.get_mobilenet(10)

In [64]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/cifar10-pretrained-KD", logging_dir=f"~/logs/{DATASET}/cifar10-pretrained-KD", remove_unused_columns=False, epochs=10, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [65]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained_whole.to(device),
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [66]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,0.1314,0.098951,0.8866,0.900823,0.886535,0.887629
2,0.0572,0.068374,0.9306,0.932019,0.930707,0.931056
3,0.0382,0.060784,0.9283,0.931205,0.92866,0.928632
4,0.029,0.054761,0.9322,0.935587,0.932601,0.932449
5,0.0231,0.048829,0.9348,0.938034,0.935061,0.935196
6,0.0191,0.047473,0.9362,0.940337,0.936431,0.936803
7,0.0167,0.050216,0.936,0.937911,0.936542,0.935874
8,0.0147,0.044237,0.9377,0.943267,0.938126,0.937977
9,0.0131,0.050507,0.9371,0.9392,0.937318,0.937537
10,0.0116,0.044561,0.9397,0.943684,0.940108,0.939874


TrainOutput(global_step=3130, training_loss=0.035421487317679405, metrics={'train_runtime': 2124.0087, 'train_samples_per_second': 188.323, 'train_steps_per_second': 1.474, 'total_flos': 8.080398434304e+17, 'train_loss': 0.035421487317679405, 'epoch': 10.0})

In [None]:
student_model_pretrained_whole.eval()

In [68]:
trainer.evaluate(test)

{'eval_loss': 0.04614365100860596,
 'eval_accuracy': 0.9343,
 'eval_precision': 0.937537682703262,
 'eval_recall': 0.9343,
 'eval_f1': 0.9340889931506855,
 'eval_runtime': 34.0975,
 'eval_samples_per_second': 293.276,
 'eval_steps_per_second': 2.317,
 'epoch': 10.0}

In [None]:
torch.save(student_model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-distill.pth")

In [None]:
base.count_parameters(student_model_pretrained_whole)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())
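
The test accuracies reported by the six `trainer.evaluate(test)` calls in this notebook can be put side by side. The snippet below copies them by hand and prints the difference between distilled and standard training for each setup. The dictionary layout is illustrative.

```python
# Test accuracies copied from the trainer.evaluate(test) outputs in this notebook.
results = {
    "random init": {"standard": 0.8620, "distilled": 0.8714},
    "pretrained, head only": {"standard": 0.7719, "distilled": 0.7496},
    "pretrained, whole model": {"standard": 0.9467, "distilled": 0.9343},
}

deltas = {}
for setup, acc in results.items():
    deltas[setup] = acc["distilled"] - acc["standard"]
    print(
        f"{setup:<24} standard={acc['standard']:.4f} "
        f"distilled={acc['distilled']:.4f} delta={deltas[setup]:+.4f}"
    )
```

In these single runs, distillation improves the randomly initialized model by 0.94 percentage points and trails standard training by 2.23 and 1.24 points in the head-only and whole-model pretrained setups.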