# Notebook for training with distillation on the CIFAR100 dataset
This notebook trains MobileNetV2 on the CIFAR100 dataset; a ViT fine-tuned on the same dataset serves as the teacher model.

MobileNetV2 is used in three configurations: with random initialization, with an initialized model (pretrained on ImageNet) where only the classification head is trained, and with the same initialized model trained as a whole. Each of these three tasks is trained in the standard way and also with distillation from the teacher model mentioned above.

Distillation uses precomputed logits from the precompute_logits notebook.
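
The idea of storing teacher logits next to each sample can be sketched as follows. This is a minimal, framework-free illustration: the class name `LogitDataset` and the dictionary keys are hypothetical, and the actual `base.CustomCIFAR100L` used in this notebook is not shown here and may be organized differently.

```python
from typing import Any, Dict, List, Sequence


class LogitDataset:
    """Map-style dataset pairing each sample with precomputed teacher logits.

    Hypothetical illustration, not the notebook's base.CustomCIFAR100L.
    """

    def __init__(self, images: Sequence[Any], labels: Sequence[int],
                 teacher_logits: Sequence[List[float]]):
        if not (len(images) == len(labels) == len(teacher_logits)):
            raise ValueError("images, labels and teacher_logits must have equal length")
        self.images = images
        self.labels = labels
        self.teacher_logits = teacher_logits

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # The teacher logits travel with the sample, so the training loop
        # can read them from the batch instead of running the teacher model.
        return {
            "pixel_values": self.images[idx],
            "labels": self.labels[idx],
            "teacher_logits": self.teacher_logits[idx],
        }


# Toy data: two placeholder "images", two labels, 3-class teacher logits.
toy = LogitDataset(
    images=["img0", "img1"],
    labels=[2, 0],
    teacher_logits=[[0.1, 0.2, 3.0], [2.5, 0.3, 0.1]],
)
sample = toy[0]
```

A class that defines `__len__` and `__getitem__` follows the map-style dataset protocol accepted by `torch.utils.data.DataLoader`, so the teacher only has to be run once, ahead of training.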

## Library imports and method definitions

In [1]:
from transformers import Trainer, EarlyStoppingCallback
from torch.utils.data import DataLoader
import torch
import base
import os

In [2]:
dataset_part = base.get_dataset_part()
DATASET = "cifar100"

Initialized MobileNetV2.

In [3]:
base.reset_seed()

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


Applying transformations to the dataset.

In [5]:
transform = base.base_transforms()

train = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.EVAL, transform=transform)
test = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TEST, transform=transform)


In [None]:
train_part_cpu = base.CustomCIFAR100(root=f"{os.path.expanduser('~')}/data/100", train=True, transform=transform, device="cpu")
cpu_data_loader = DataLoader(train_part_cpu, batch_size=1, shuffle=False)
train_part_gpu = base.CustomCIFAR100(root=f"{os.path.expanduser('~')}/data/100", train=True, transform=transform, device="cuda")
gpu_data_loader = DataLoader(train_part_gpu, batch_size=1, shuffle=False)

### Standard training of the randomly initialized model

In [6]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-basetrain", logging_dir=f"~/logs/{DATASET}/random-basetrain", lr=0.0005, weight_decay=0.008, adam_beta1=.95, epochs=30)
model = base.get_random_init_mobilenet(100)

In [7]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [8]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,4.1641,3.671114,0.1142,0.08698,0.1142,0.076501
2,3.5356,3.15366,0.2068,0.206148,0.2068,0.179577
3,3.0818,2.802437,0.2791,0.273674,0.2791,0.251028
4,2.7302,2.581651,0.3208,0.354018,0.3208,0.300799
5,2.4497,2.251105,0.3975,0.405736,0.3975,0.383424
6,2.1999,2.120491,0.4236,0.439504,0.4236,0.414884
7,2.0062,2.038861,0.4471,0.457363,0.4471,0.438329
8,1.8164,2.002307,0.4577,0.482751,0.4577,0.452542
9,1.6438,1.983347,0.4695,0.472717,0.4695,0.457576
10,1.4695,1.8224,0.5082,0.520621,0.5082,0.505719


TrainOutput(global_step=5947, training_loss=1.691373352646046, metrics={'train_runtime': 4251.6575, 'train_samples_per_second': 282.243, 'train_steps_per_second': 2.209, 'total_flos': 1.61441164394496e+18, 'train_loss': 1.691373352646046, 'epoch': 19.0})
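
The run above reports `'epoch': 19.0` although `epochs=30` was requested, which is consistent with the `EarlyStoppingCallback(early_stopping_patience=5)` ending training early. The patience rule can be sketched as below. This is a simplified illustration, not the library's implementation: it assumes the monitored metric is the validation loss (the actual metric is configured inside `base.get_training_args`, which is not shown), and the function name `stopping_epoch` and the loss values are made up.

```python
from typing import List, Optional


def stopping_epoch(val_losses: List[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training would stop, or None.

    Simplified patience rule: stop once `patience` consecutive evaluations
    fail to improve on the best validation loss seen so far.
    """
    best = float("inf")
    bad_evals = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss       # new best value resets the counter
            bad_evals = 0
        else:
            bad_evals += 1    # no improvement in this evaluation
            if bad_evals >= patience:
                return epoch
    return None


# Made-up validation losses, not values from this notebook.
losses = [3.0, 2.5, 2.2, 2.3, 2.4, 2.25, 2.21, 2.3]
stop = stopping_epoch(losses, patience=5)
```

In this toy sequence the best loss is reached in epoch 3, and the five following evaluations do not beat it, so training would stop after epoch 8.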

In [9]:
model.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [10]:
trainer.evaluate(test)

{'eval_loss': 1.8386343717575073,
 'eval_accuracy': 0.522,
 'eval_precision': 0.5407068801158281,
 'eval_recall': 0.522,
 'eval_f1': 0.5205869502384851,
 'eval_runtime': 32.2135,
 'eval_samples_per_second': 310.429,
 'eval_steps_per_second': 2.452,
 'epoch': 19.0}
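
In the metrics above, recall equals accuracy. That is what support-weighted recall always yields, so a plausible reading is that `base.compute_metrics` averages per-class metrics with support weights; this is an assumption, since the helper's source is not shown. The short sketch below (hypothetical function names, toy labels) shows why the two quantities coincide.

```python
from collections import Counter
from typing import List


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of samples whose prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true: List[int], y_pred: List[int]) -> float:
    """Per-class recall averaged with weights equal to each class's support."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    # sum_c (support_c / n) * (correct_c / support_c) == total_correct / n
    return sum((support[c] / n) * (correct[c] / support[c]) for c in support)


# Toy labels, not data from this notebook.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
```

The support terms cancel inside the sum, so the weighted recall reduces to the number of correct predictions divided by the number of samples.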

In [11]:
torch.save(model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-basetrain.pth")

In [None]:
base.count_parameters(model)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Definition of distillation training

A class that adapts the Hugging Face Trainer for knowledge distillation. It now works with the logits stored in the dataset.
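
The source of `base.DistilTrainer` is not included in this notebook, so the following is only a sketch of a commonly used distillation objective, written without any framework so that the arithmetic is visible. It assumes that `lambda_param` weights the soft-target term against the hard-label cross-entropy, that `temp` softens both distributions, and that the soft term is scaled by the squared temperature; all three are assumptions about the trainer, and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      label: int, lambda_param: float = 1.0,
                      temp: float = 6.0) -> float:
    """Assumed form: (1 - lambda) * CE + lambda * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the unscaled student distribution.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-target term: KL divergence between temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temp)
    p_student = softmax(student_logits, temp)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return (1.0 - lambda_param) * ce + lambda_param * temp ** 2 * kl


# Toy 3-class logits, not values from this notebook.
teacher = [4.0, 1.0, 0.5]
same = distillation_loss(teacher, teacher, label=0)            # student matches teacher
worse = distillation_loss([0.5, 1.0, 4.0], teacher, label=0)   # student disagrees
```

Under this assumed form, `lambda_param=1` (the value used in the distillation runs of this notebook) removes the hard-label term entirely, so the student would be trained only to match the teacher's softened distribution.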

### Training the randomly initialized model with knowledge distillation

In [12]:
base.reset_seed()

In [13]:
student_model = base.get_random_init_mobilenet(100)

In [14]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-distilltrain", logging_dir=f"~/logs/{DATASET}/random-distilltrain", remove_unused_columns=False, epochs=30, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [15]:
trainer = base.DistilTrainer(
    student_model=student_model,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [16]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.8512,2.647862,0.0891,0.049531,0.0891,0.044646
2,2.5561,2.371657,0.1671,0.150112,0.1671,0.108691
3,2.3443,2.184198,0.2121,0.218822,0.2121,0.157513
4,2.1482,2.031065,0.2844,0.298528,0.2844,0.241592
5,1.9769,1.831024,0.3356,0.356112,0.3356,0.289144
6,1.8344,1.706426,0.3763,0.416747,0.3763,0.336914
7,1.7169,1.634481,0.4178,0.478688,0.4178,0.383779
8,1.604,1.599871,0.4272,0.464101,0.4272,0.399841
9,1.5078,1.575131,0.4511,0.475925,0.4511,0.425779
10,1.4138,1.451572,0.4727,0.52569,0.4727,0.454491


TrainOutput(global_step=9390, training_loss=1.2322501461853743, metrics={'train_runtime': 6384.5322, 'train_samples_per_second': 187.954, 'train_steps_per_second': 1.471, 'total_flos': 2.5490710167552e+18, 'train_loss': 1.2322501461853743, 'epoch': 30.0})

In [17]:
student_model.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [18]:
trainer.evaluate(test)

{'eval_loss': 1.09884512424469,
 'eval_accuracy': 0.5504,
 'eval_precision': 0.5808980316978757,
 'eval_recall': 0.5504,
 'eval_f1': 0.5546805225416259,
 'eval_runtime': 37.502,
 'eval_samples_per_second': 266.653,
 'eval_steps_per_second': 2.107,
 'epoch': 30.0}

In [19]:
torch.save(student_model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-distilltrain.pth")

In [None]:
base.count_parameters(student_model)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Obtaining the initialized MobileNetV2 model

In [20]:
base.reset_seed()

In [21]:
model_pretrained = base.get_mobilenet(100)

In [22]:
print(model_pretrained)

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [23]:
model_pretrained = base.freeze_model(model_pretrained)

In [24]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-basetrain", logging_dir=f"~/logs/{DATASET}/head-basetrain", epochs=30, lr=0.0005, weight_decay=0.008, adam_beta1=.95)

In [25]:
trainer = Trainer(
    model=model_pretrained,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 4)]
)

In [26]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.9836,2.269643,0.4622,0.500246,0.4622,0.452878
2,1.9461,1.93454,0.5006,0.527785,0.5006,0.500196
3,1.7179,1.819396,0.5246,0.531786,0.5246,0.517029
4,1.6008,1.842921,0.5191,0.543362,0.5191,0.515463
5,1.5256,1.753413,0.5344,0.547202,0.5344,0.528293
6,1.4659,1.708714,0.5487,0.559242,0.5487,0.544608
7,1.4208,1.691448,0.5462,0.55876,0.5462,0.544824
8,1.3867,1.725464,0.5397,0.547705,0.5397,0.534985
9,1.3538,1.742697,0.5364,0.553864,0.5364,0.532641
10,1.332,1.715718,0.5451,0.556272,0.5451,0.543123


TrainOutput(global_step=3443, training_loss=1.6397477104814115, metrics={'train_runtime': 1384.4762, 'train_samples_per_second': 866.754, 'train_steps_per_second': 6.782, 'total_flos': 9.3465937281024e+17, 'train_loss': 1.6397477104814115, 'epoch': 11.0})

In [27]:
model_pretrained.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [28]:
trainer.evaluate(test)

{'eval_loss': 1.6807736158370972,
 'eval_accuracy': 0.5448,
 'eval_precision': 0.5554512716827431,
 'eval_recall': 0.5448,
 'eval_f1': 0.5425965178184046,
 'eval_runtime': 28.3343,
 'eval_samples_per_second': 352.929,
 'eval_steps_per_second': 2.788,
 'epoch': 11.0}

In [29]:
torch.save(model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-basetrain.pth")

In [None]:
base.count_parameters(model_pretrained)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model_pretrained, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model_pretrained, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

### Training the initialized MobileNetV2

In [30]:
base.reset_seed()

In [31]:
model_pretrained_whole = base.get_mobilenet(100)

In [32]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-basetrain", logging_dir=f"~/logs/{DATASET}/pretrained-basetrain", epochs=20, lr=0.0005, weight_decay=0.008, adam_beta1=.95)

In [33]:
trainer = Trainer(
    model=model_pretrained_whole,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [34]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.6266,1.122238,0.6742,0.701454,0.6742,0.671801
2,0.7777,1.028637,0.701,0.726411,0.701,0.700547
3,0.494,0.974911,0.7327,0.750675,0.7327,0.730586
4,0.3234,1.037076,0.727,0.753051,0.727,0.728092
5,0.2229,1.043154,0.7364,0.752796,0.7364,0.735159
6,0.1575,1.016334,0.7492,0.76403,0.7492,0.750541
7,0.1173,1.069007,0.7559,0.767965,0.7559,0.755248
8,0.0857,1.114457,0.7449,0.757896,0.7449,0.74609
9,0.0694,1.209292,0.7414,0.756156,0.7414,0.741349
10,0.0504,1.162453,0.758,0.76861,0.758,0.75792


TrainOutput(global_step=6260, training_loss=0.20208128212025753, metrics={'train_runtime': 2833.2, 'train_samples_per_second': 282.366, 'train_steps_per_second': 2.21, 'total_flos': 1.6993806778368e+18, 'train_loss': 0.20208128212025753, 'epoch': 20.0})

In [35]:
model_pretrained_whole.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [36]:
trainer.evaluate(test)

{'eval_loss': 1.1938177347183228,
 'eval_accuracy': 0.7781,
 'eval_precision': 0.7872945120038796,
 'eval_recall': 0.7780999999999999,
 'eval_f1': 0.7790445312198271,
 'eval_runtime': 22.1271,
 'eval_samples_per_second': 451.935,
 'eval_steps_per_second': 3.57,
 'epoch': 20.0}

In [37]:
torch.save(model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-basetrain.pth")

In [None]:
base.count_parameters(model_pretrained_whole)

In [None]:
cpu_benchmark = base.BenchMarkRunner(model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

## Knowledge-distillation training of the initialized MobileNetV2

### Training the initialized model with distillation: classification head only

In [38]:
base.reset_seed()

In [39]:
student_model_pretrained = base.get_mobilenet(100)

In [40]:
student_model_pretrained = base.freeze_model(student_model_pretrained)

In [44]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-distilltrain", logging_dir=f"~/logs/{DATASET}/head-distilltrain", remove_unused_columns=False, epochs=30, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [45]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 2)]
)

In [46]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,2.3972,2.131692,0.4195,0.520629,0.4195,0.401676
2,2.0407,2.002115,0.4697,0.523559,0.4697,0.463499
3,1.9609,1.947987,0.4792,0.516946,0.4792,0.461773
4,1.922,1.956555,0.4767,0.518447,0.4767,0.46475
5,1.9023,1.91502,0.4876,0.525532,0.4876,0.474504
6,1.8847,1.886172,0.5027,0.526543,0.5027,0.489904
7,1.8748,1.870293,0.5059,0.53159,0.5059,0.492473
8,1.8661,1.891301,0.4958,0.523303,0.4958,0.482693
9,1.8597,1.895612,0.5021,0.531784,0.5021,0.492652
10,1.8559,1.888642,0.4934,0.53109,0.4934,0.4846


TrainOutput(global_step=3443, training_loss=1.9468287352451985, metrics={'train_runtime': 1635.7003, 'train_samples_per_second': 733.631, 'train_steps_per_second': 5.741, 'total_flos': 9.3465937281024e+17, 'train_loss': 1.9468287352451985, 'epoch': 11.0})

In [47]:
student_model_pretrained.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [48]:
trainer.evaluate(test)

{'eval_loss': 1.7363566160202026,
 'eval_accuracy': 0.5033,
 'eval_precision': 0.5358871798854771,
 'eval_recall': 0.5032999999999999,
 'eval_f1': 0.4935532209138579,
 'eval_runtime': 33.5165,
 'eval_samples_per_second': 298.36,
 'eval_steps_per_second': 2.357,
 'epoch': 11.0}

In [49]:
torch.save(student_model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-distilltrain.pth")

In [None]:
base.count_parameters(student_model_pretrained)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

### Training the initialized model with distillation

In [6]:
base.reset_seed()

In [7]:
student_model_pretrained_whole = base.get_mobilenet(100)

In [60]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-distilltrain", logging_dir=f"~/logs/{DATASET}/pretrained-distilltrain", remove_unused_columns=False, epochs=20, lr=0.00047, weight_decay=0, adam_beta1=.9, lambda_param=1, temp=6)

In [61]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained_whole.to(device),
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [62]:
trainer.train()

Epoch,Training Loss,Validation Loss,Accuracy,Precision,Recall,F1
1,1.3079,0.872937,0.6924,0.716092,0.6924,0.688613
2,0.7046,0.798874,0.7137,0.738727,0.7137,0.713146
3,0.5144,0.728881,0.7369,0.759275,0.7369,0.73523
4,0.3995,0.73866,0.731,0.755904,0.731,0.731919
5,0.3272,0.679789,0.7501,0.769699,0.7501,0.74942
6,0.2742,0.64768,0.7578,0.777823,0.7578,0.758964
7,0.239,0.648104,0.7537,0.769453,0.7537,0.753817
8,0.2117,0.658245,0.7452,0.771381,0.7452,0.749592
9,0.1906,0.669629,0.7468,0.768676,0.7468,0.748455
10,0.1732,0.628112,0.7566,0.775906,0.7566,0.757682


TrainOutput(global_step=6260, training_loss=0.2772444731130387, metrics={'train_runtime': 3276.5856, 'train_samples_per_second': 244.157, 'train_steps_per_second': 1.911, 'total_flos': 1.6993806778368e+18, 'train_loss': 0.2772444731130387, 'epoch': 20.0})

In [63]:
student_model_pretrained_whole.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [64]:
trainer.evaluate(test)

{'eval_loss': 0.46802881360054016,
 'eval_accuracy': 0.7715,
 'eval_precision': 0.786703734949406,
 'eval_recall': 0.7715000000000002,
 'eval_f1': 0.7733199635641056,
 'eval_runtime': 12.3807,
 'eval_samples_per_second': 807.707,
 'eval_steps_per_second': 6.381,
 'epoch': 20.0}

In [65]:
torch.save(student_model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-distilltrain.pth")

In [None]:
base.count_parameters(student_model_pretrained_whole)

In [None]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

In [None]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())