# Notebook for training with knowledge distillation on the CIFAR100 dataset
This notebook trains MobileNetV2 on the CIFAR100 dataset. A ViT fine-tuned on the same dataset serves as the teacher model.

MobileNetV2 is trained in three configurations: from a random initialization, with only the classification head trained on top of an initialized (ImageNet-pretrained) MobileNetV2, and with the whole model trained, again starting from the pretrained weights. Each of these three setups is trained both in the standard way and with distillation from the teacher model mentioned above.

Distillation uses the teacher logits precomputed in the precompute_logits notebook.

## Importing libraries and defining methods

In [1]:
from transformers import Trainer, EarlyStoppingCallback
from torch.utils.data import DataLoader
import torch
import base
import os

In [None]:
dataset_part = base.get_dataset_part()
DATASET = "cifar100"

Initialized MobileNetV2.

In [3]:
base.reset_seed()
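
`base.reset_seed()` is called before every experiment so that each run starts from the same random state. Its implementation is not shown in this notebook; the following is only a minimal sketch of what such a helper might do, assuming it seeds Python's `random` module plus NumPy and PyTorch when they are installed. The name `reset_seed_sketch` and the default seed 42 are hypothetical. Note that fully deterministic GPU runs additionally need deterministic cuDNN settings, which this sketch does not touch.

```python
import random


def reset_seed_sketch(seed: int = 42) -> None:
    """Hypothetical stand-in for base.reset_seed(): seed every RNG that is available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU generator and all CUDA generators
    except ImportError:
        pass  # PyTorch not installed, nothing to seed


reset_seed_sketch()
first = [random.random() for _ in range(3)]
reset_seed_sketch()
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed reproduces the same sequence
```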

In [4]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU is available and will be used:", torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")

GPU is available and will be used: NVIDIA A100 80GB PCIe MIG 2g.20gb


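Applying transformations to the dataset and loading the training, validation, and test splits from the directory with the precomputed logits.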

In [5]:
transform = base.base_transforms()

train = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TRAIN, transform=transform)
eval = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.EVAL, transform=transform)
test = base.CustomCIFAR100L(root=f"{os.path.expanduser('~')}/data/100-logits", dataset_part=dataset_part.TEST, transform=transform)


In [6]:
# Unshuffled batch-size-1 loaders over the training split, used later for the CPU and GPU latency benchmarks.
train_part_cpu = base.CustomCIFAR100(root=f"{os.path.expanduser('~')}/data/100", train=True, transform=transform, device="cpu")
cpu_data_loader = DataLoader(train_part_cpu, batch_size=1, shuffle=False)
train_part_gpu = base.CustomCIFAR100(root=f"{os.path.expanduser('~')}/data/100", train=True, transform=transform, device="cuda")
gpu_data_loader = DataLoader(train_part_gpu, batch_size=1, shuffle=False)

### Standard training of a randomly initialized model

In [7]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-basetrain", logging_dir=f"~/logs/{DATASET}/random-basetrain", lr=0.0004, weight_decay=0.01, warmup_steps=10, epochs=20)
model = base.get_random_init_mobilenet(100)

In [8]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [9]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 4.1766 | 3.61169 | 0.1255 | 0.133008 | 0.1255 | 0.094799 |
| 2 | 3.4796 | 3.049455 | 0.2376 | 0.237549 | 0.2376 | 0.212951 |
| 3 | 2.9878 | 2.619388 | 0.3086 | 0.307653 | 0.3086 | 0.282478 |
| 4 | 2.5994 | 2.461323 | 0.3501 | 0.378392 | 0.3501 | 0.33167 |
| 5 | 2.3006 | 2.125087 | 0.4202 | 0.428918 | 0.4202 | 0.408275 |
| 6 | 2.0321 | 1.985527 | 0.4591 | 0.468514 | 0.4591 | 0.451511 |
| 7 | 1.8221 | 1.882811 | 0.4831 | 0.491239 | 0.4831 | 0.474532 |
| 8 | 1.6099 | 1.874212 | 0.486 | 0.501931 | 0.486 | 0.478379 |
| 9 | 1.4157 | 1.887007 | 0.4939 | 0.504681 | 0.4939 | 0.48641 |
| 10 | 1.2296 | 1.763537 | 0.5192 | 0.529264 | 0.5192 | 0.516479 |


TrainOutput(global_step=4695, training_loss=1.82076535940932, metrics={'train_runtime': 1889.6508, 'train_samples_per_second': 423.359, 'train_steps_per_second': 3.313, 'total_flos': 1.2745355083776e+18, 'train_loss': 1.82076535940932, 'epoch': 15.0})

In [10]:
model.eval()

MobileNetV2ForImageClassification(
  (mobilenet_v2): MobileNetV2Model(
    (conv_stem): MobileNetV2Stem(
      (first_conv): MobileNetV2ConvLayer(
        (convolution): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (conv_3x3): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32, bias=False)
        (normalization): BatchNorm2d(32, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
        (activation): ReLU6()
      )
      (reduce_1x1): MobileNetV2ConvLayer(
        (convolution): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (normalization): BatchNorm2d(16, eps=0.001, momentum=0.997, affine=True, track_running_stats=True)
      )
    )
    (layer): ModuleList(
      (0): MobileNetV2InvertedResidual(
        (expand_1x1): MobileNe

In [11]:
trainer.evaluate(test)

{'eval_loss': 1.7861734628677368,
 'eval_accuracy': 0.5262,
 'eval_precision': 0.5353124237729427,
 'eval_recall': 0.5262,
 'eval_f1': 0.522760067269561,
 'eval_runtime': 13.3476,
 'eval_samples_per_second': 749.196,
 'eval_steps_per_second': 5.919,
 'epoch': 15.0}

In [None]:
torch.save(model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-basetrain.pth")

## Defining distillation training

`base.DistilTrainer` adapts the Hugging Face `Trainer` for knowledge distillation. In this version it reads the teacher logits stored in the dataset.
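
The implementation of the distillation trainer is not reproduced in this notebook. Its training arguments pass `lambda_param` and `temp`, which fits the usual Hinton-style distillation loss. The framework-free sketch below computes that loss for a single sample, assuming the convention of the Hugging Face knowledge-distillation tutorial: `lambda_param` weights the soft-target term, `1 - lambda_param` weights the hard-label cross-entropy, and the KL term is scaled by the squared temperature. The function names are hypothetical and the real trainer may differ.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax, shifted by the maximum for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],  # precomputed teacher (ViT) logits for this sample
    label: int,
    lambda_param: float,
    temperature: float,
) -> float:
    """Hinton-style KD loss for one sample (assumed convention, see the text above)."""
    # Hard-label cross-entropy of the student at temperature 1.
    hard_loss = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) between the temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Scaling by T^2 keeps the soft-target gradients comparable in size as T changes.
    soft_loss = kl * temperature ** 2
    return (1.0 - lambda_param) * hard_loss + lambda_param * soft_loss


student = [2.0, 0.5, -1.0]
teacher = [1.5, 1.0, -0.5]
# lambda_param=.3 and temp=3 are the values used for the randomly initialized student below.
print(distillation_loss(student, teacher, label=0, lambda_param=0.3, temperature=3))
```

A higher temperature flattens the teacher distribution, so the student also learns from the relative scores the teacher assigns to the wrong classes.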

### Training a randomly initialized model with knowledge distillation

In [12]:
base.reset_seed()

In [13]:
student_model = base.get_random_init_mobilenet(100)

In [14]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/random-distilltrain", logging_dir=f"~/logs/{DATASET}/random-distilltrain", remove_unused_columns=False, epochs=20, lr=0.0005, weight_decay=0.005, lambda_param=.3, temp=3)
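
The distillation runs set `remove_unused_columns=False`. With the default `True`, the Hugging Face `Trainer` drops every input key that the model's `forward` method does not accept (apart from a few label names it always keeps), which would discard the stored teacher logits before the loss is computed. A simplified sketch of that filtering follows; the key name `teacher_logits` and the stand-in `student_forward` signature are hypothetical, and the real `Trainer` logic handles more cases.

```python
import inspect


def student_forward(pixel_values=None, labels=None, output_hidden_states=None, return_dict=None):
    """Hypothetical stand-in for the student model's forward() signature."""


def keep_signature_columns(sample: dict, forward) -> dict:
    """Simplified version of the column filtering applied when remove_unused_columns=True."""
    allowed = set(inspect.signature(forward).parameters) | {"label", "label_ids"}
    return {key: value for key, value in sample.items() if key in allowed}


sample = {"pixel_values": [[0.1, 0.2]], "labels": 3, "teacher_logits": [1.2, -0.4, 0.7]}
print(keep_signature_columns(sample, student_forward))
# {'pixel_values': [[0.1, 0.2]], 'labels': 3}: the teacher logits were dropped
```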

In [15]:
trainer = base.DistilTrainer(
    student_model=student_model,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 5)]
)

In [16]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 5.1768 | 4.555348 | 0.1279 | 0.097177 | 0.1279 | 0.089569 |
| 2 | 4.4056 | 3.959158 | 0.241 | 0.249622 | 0.241 | 0.214913 |
| 3 | 3.8467 | 3.4338 | 0.3179 | 0.329225 | 0.3179 | 0.288892 |
| 4 | 3.4312 | 3.240775 | 0.3579 | 0.384904 | 0.3579 | 0.338049 |
| 5 | 3.0941 | 2.860819 | 0.4308 | 0.437902 | 0.4308 | 0.413779 |
| 6 | 2.8008 | 2.67527 | 0.4616 | 0.46804 | 0.4616 | 0.449108 |
| 7 | 2.5405 | 2.526206 | 0.4935 | 0.498147 | 0.4935 | 0.481563 |
| 8 | 2.3101 | 2.473935 | 0.4974 | 0.512868 | 0.4974 | 0.488989 |
| 9 | 2.108 | 2.519288 | 0.4975 | 0.517307 | 0.4975 | 0.489329 |
| 10 | 1.895 | 2.286786 | 0.5345 | 0.54483 | 0.5345 | 0.530388 |


TrainOutput(global_step=6260, training_loss=2.1294196326892596, metrics={'train_runtime': 2515.4951, 'train_samples_per_second': 318.029, 'train_steps_per_second': 2.489, 'total_flos': 1.6993806778368e+18, 'train_loss': 2.1294196326892596, 'epoch': 20.0})

In [17]:
student_model.eval()

In [18]:
trainer.evaluate(test)

{'eval_loss': 2.1033380031585693,
 'eval_accuracy': 0.5579,
 'eval_precision': 0.5816133681514294,
 'eval_recall': 0.5579,
 'eval_f1': 0.5601790504918533,
 'eval_runtime': 13.3924,
 'eval_samples_per_second': 746.694,
 'eval_steps_per_second': 5.899,
 'epoch': 20.0}

In [19]:
torch.save(student_model.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/random-distilltrain.pth")

## Obtaining the initialized MobileNetV2 model

In [7]:
base.reset_seed()

In [8]:
model_pretrained = base.get_mobilenet(100)

In [9]:
print(model_pretrained)

In [10]:
model_pretrained = base.freeze_model(model_pretrained)

In [11]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-basetrain", logging_dir=f"~/logs/{DATASET}/head-basetrain", epochs=20, lr=0.0007, weight_decay=0.002, warmup_steps=10)

In [12]:
trainer = Trainer(
    model=model_pretrained,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [13]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 2.804 | 2.126095 | 0.4793 | 0.517572 | 0.4793 | 0.468088 |
| 2 | 1.8344 | 1.871585 | 0.5088 | 0.535445 | 0.5088 | 0.508149 |
| 3 | 1.6364 | 1.777097 | 0.5298 | 0.540643 | 0.5298 | 0.52288 |
| 4 | 1.5309 | 1.817416 | 0.5209 | 0.546608 | 0.5209 | 0.517269 |
| 5 | 1.4635 | 1.736494 | 0.5338 | 0.550315 | 0.5338 | 0.528645 |
| 6 | 1.4067 | 1.683275 | 0.5491 | 0.554656 | 0.5491 | 0.544002 |
| 7 | 1.364 | 1.675491 | 0.5472 | 0.557886 | 0.5472 | 0.545631 |
| 8 | 1.3318 | 1.72439 | 0.5405 | 0.551298 | 0.5405 | 0.535631 |
| 9 | 1.3004 | 1.746469 | 0.5352 | 0.55646 | 0.5352 | 0.532149 |
| 10 | 1.2808 | 1.715653 | 0.5443 | 0.554998 | 0.5443 | 0.542364 |


TrainOutput(global_step=3130, training_loss=1.5952828794241714, metrics={'train_runtime': 521.3059, 'train_samples_per_second': 1534.607, 'train_steps_per_second': 12.008, 'total_flos': 8.496903389184e+17, 'train_loss': 1.5952828794241714, 'epoch': 10.0})
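
Although `epochs=20`, this run ended after epoch 10 because of `EarlyStoppingCallback(early_stopping_patience = 3)`: training stops once the monitored metric has failed to improve for three consecutive evaluations. Which metric is monitored is configured inside `base.get_training_args` and is not visible here. The sketch below implements the patience rule, assuming a strict-improvement check with no threshold; the function name is hypothetical. Applied to the F1 column of the log above it reproduces the stop at epoch 10, which is a consistency check only, not proof of which metric `base` uses.

```python
from typing import Sequence


def stop_epoch(values: Sequence[float], patience: int, greater_is_better: bool = True) -> int:
    """Return the 1-based epoch after which patience-based early stopping halts training.

    Returns len(values) if the patience is never exhausted.
    """
    best = None
    bad_epochs = 0
    for epoch, value in enumerate(values, start=1):
        improved = best is None or (value > best if greater_is_better else value < best)
        if improved:
            best = value
            bad_epochs = 0  # any improvement resets the counter
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(values)


# Validation F1 per epoch from the head-only training log above.
head_f1 = [0.468088, 0.508149, 0.52288, 0.517269, 0.528645,
           0.544002, 0.545631, 0.535631, 0.532149, 0.542364]
print(stop_epoch(head_f1, patience=3))  # 10
```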

In [14]:
model_pretrained.eval()

In [15]:
trainer.evaluate(test)

{'eval_loss': 1.667069911956787,
 'eval_accuracy': 0.5489,
 'eval_precision': 0.5616031799329835,
 'eval_recall': 0.5488999999999999,
 'eval_f1': 0.5472081090392713,
 'eval_runtime': 13.107,
 'eval_samples_per_second': 762.953,
 'eval_steps_per_second': 6.027,
 'epoch': 10.0}

In [16]:
torch.save(model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-basetrain.pth")

### Training the whole initialized MobileNetV2

In [17]:
base.reset_seed()

In [18]:
model_pretrained_whole = base.get_mobilenet(100)

In [19]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-basetrain", logging_dir=f"~/logs/{DATASET}/pretrained-basetrain", epochs=20, lr=0.00045, weight_decay=0.008, warmup_steps=10)

In [20]:
trainer = Trainer(
    model=model_pretrained_whole,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [21]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 1.6459 | 1.094082 | 0.6828 | 0.711404 | 0.6828 | 0.679868 |
| 2 | 0.7523 | 0.999931 | 0.7055 | 0.731369 | 0.7055 | 0.70361 |
| 3 | 0.4659 | 0.951674 | 0.7335 | 0.754114 | 0.7335 | 0.732675 |
| 4 | 0.2988 | 1.081038 | 0.7175 | 0.753623 | 0.7175 | 0.721016 |
| 5 | 0.1994 | 1.037649 | 0.7323 | 0.755816 | 0.7323 | 0.730741 |
| 6 | 0.1404 | 0.995097 | 0.7535 | 0.767464 | 0.7535 | 0.753435 |
| 7 | 0.1007 | 1.067018 | 0.7465 | 0.760286 | 0.7465 | 0.746174 |
| 8 | 0.0798 | 1.130778 | 0.7389 | 0.758184 | 0.7389 | 0.740796 |
| 9 | 0.059 | 1.201012 | 0.7461 | 0.764508 | 0.7461 | 0.747616 |


TrainOutput(global_step=2817, training_loss=0.4157917948957449, metrics={'train_runtime': 684.1623, 'train_samples_per_second': 1169.313, 'train_steps_per_second': 9.15, 'total_flos': 7.6472130502656e+17, 'train_loss': 0.4157917948957449, 'epoch': 9.0})

In [22]:
model_pretrained_whole.eval()

In [23]:
trainer.evaluate(test)

{'eval_loss': 1.0088976621627808,
 'eval_accuracy': 0.7509,
 'eval_precision': 0.7678942172262897,
 'eval_recall': 0.7509,
 'eval_f1': 0.7517015832277407,
 'eval_runtime': 15.0075,
 'eval_samples_per_second': 666.335,
 'eval_steps_per_second': 5.264,
 'epoch': 9.0}

In [24]:
torch.save(model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-basetrain.pth")

## Training the initialized MobileNetV2 with knowledge distillation

### Training only the classification head of the initialized model with distillation

In [25]:
base.reset_seed()

In [26]:
student_model_pretrained = base.get_mobilenet(100)

In [27]:
student_model_pretrained = base.freeze_model(student_model_pretrained)

In [28]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/head-distilltrain", logging_dir=f"~/logs/{DATASET}/head-distilltrain", remove_unused_columns=False, epochs=20, lr=0.0018, weight_decay=.008, warmup_steps=15, lambda_param=.5, temp=6.5)

In [29]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained,
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [30]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 2.2858 | 1.978952 | 0.5023 | 0.54607 | 0.5023 | 0.496312 |
| 2 | 1.8895 | 1.934427 | 0.5105 | 0.548982 | 0.5105 | 0.510723 |
| 3 | 1.8298 | 1.897401 | 0.5306 | 0.551934 | 0.5306 | 0.522761 |
| 4 | 1.7957 | 1.927918 | 0.5199 | 0.542789 | 0.5199 | 0.514661 |
| 5 | 1.7792 | 1.885013 | 0.5303 | 0.554722 | 0.5303 | 0.524689 |
| 6 | 1.7578 | 1.845162 | 0.5403 | 0.550906 | 0.5403 | 0.533324 |
| 7 | 1.745 | 1.841347 | 0.5414 | 0.553636 | 0.5414 | 0.537634 |
| 8 | 1.734 | 1.863085 | 0.5385 | 0.550581 | 0.5385 | 0.53159 |
| 9 | 1.7198 | 1.888343 | 0.5325 | 0.557288 | 0.5325 | 0.529045 |
| 10 | 1.7139 | 1.865158 | 0.5312 | 0.550953 | 0.5312 | 0.528733 |


TrainOutput(global_step=3130, training_loss=1.8250659686688797, metrics={'train_runtime': 567.1775, 'train_samples_per_second': 1410.493, 'train_steps_per_second': 11.037, 'total_flos': 8.496903389184e+17, 'train_loss': 1.8250659686688797, 'epoch': 10.0})

In [31]:
student_model_pretrained.eval()

In [32]:
trainer.evaluate(test)

{'eval_loss': 1.7722300291061401,
 'eval_accuracy': 0.5421,
 'eval_precision': 0.5575726894510638,
 'eval_recall': 0.5421,
 'eval_f1': 0.5380948023141979,
 'eval_runtime': 13.2452,
 'eval_samples_per_second': 754.989,
 'eval_steps_per_second': 5.964,
 'epoch': 10.0}

In [33]:
torch.save(student_model_pretrained.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/head-distilltrain.pth")

### Training the whole initialized model with distillation

In [34]:
base.reset_seed()

In [35]:
student_model_pretrained_whole = base.get_mobilenet(100)

In [36]:
training_args = base.get_training_args(output_dir=f"~/results/{DATASET}/pretrained-distilltrain", logging_dir=f"~/logs/{DATASET}/pretrained-distilltrain", remove_unused_columns=False, epochs=20, lr=0.0006, weight_decay=.008, warmup_steps=30, lambda_param=.6, temp=4)

In [37]:
trainer = base.DistilTrainer(
    student_model=student_model_pretrained_whole.to(device),
    args=training_args,
    train_dataset=train,
    eval_dataset=eval,
    compute_metrics=base.compute_metrics,
    callbacks = [EarlyStoppingCallback(early_stopping_patience = 3)]
)

In [38]:
trainer.train()

| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| 1 | 2.0931 | 1.387267 | 0.6834 | 0.708803 | 0.6834 | 0.677074 |
| 2 | 1.081 | 1.237874 | 0.7162 | 0.74284 | 0.7162 | 0.713802 |
| 3 | 0.7819 | 1.130511 | 0.7419 | 0.762151 | 0.7419 | 0.739457 |
| 4 | 0.5969 | 1.147801 | 0.7383 | 0.765169 | 0.7383 | 0.740055 |
| 5 | 0.4811 | 1.077415 | 0.7559 | 0.770175 | 0.7559 | 0.754137 |
| 6 | 0.3966 | 0.984049 | 0.7733 | 0.784696 | 0.7733 | 0.773403 |
| 7 | 0.3397 | 1.028793 | 0.7658 | 0.782588 | 0.7658 | 0.766467 |
| 8 | 0.2957 | 1.028331 | 0.7661 | 0.785748 | 0.7661 | 0.769125 |
| 9 | 0.2623 | 1.049195 | 0.7594 | 0.779363 | 0.7594 | 0.761152 |


TrainOutput(global_step=2817, training_loss=0.7031379919082593, metrics={'train_runtime': 677.1355, 'train_samples_per_second': 1181.447, 'train_steps_per_second': 9.245, 'total_flos': 7.6472130502656e+17, 'train_loss': 0.7031379919082593, 'epoch': 9.0})

In [39]:
student_model_pretrained_whole.eval()

In [40]:
trainer.evaluate(test)

{'eval_loss': 0.9045265913009644,
 'eval_accuracy': 0.7633,
 'eval_precision': 0.7749853333712801,
 'eval_recall': 0.7633,
 'eval_f1': 0.7629852925213931,
 'eval_runtime': 12.6789,
 'eval_samples_per_second': 788.71,
 'eval_steps_per_second': 6.231,
 'epoch': 9.0}

In [41]:
torch.save(student_model_pretrained_whole.state_dict(), f"{os.path.expanduser('~')}/models/{DATASET}/pretrained-distilltrain.pth")

In [42]:
base.count_parameters(student_model_pretrained_whole)

model size: 9.103MB.
Total Trainable Params: 2351972.


| # | Modules | Parameters |
|---|---|---|
| 0 | mobilenet_v2.conv_stem.first_conv.convolution.... | 864 |
| 1 | mobilenet_v2.conv_stem.first_conv.normalizatio... | 32 |
| 2 | mobilenet_v2.conv_stem.first_conv.normalizatio... | 32 |
| 3 | mobilenet_v2.conv_stem.conv_3x3.convolution.we... | 288 |
| 4 | mobilenet_v2.conv_stem.conv_3x3.normalization.... | 32 |
| ... | ... | ... |
| 153 | mobilenet_v2.conv_1x1.convolution.weight | 409600 |
| 154 | mobilenet_v2.conv_1x1.normalization.weight | 1280 |
| 155 | mobilenet_v2.conv_1x1.normalization.bias | 1280 |
| 156 | classifier.weight | 128000 |


In [43]:
base.count_parameters(student_model_pretrained)

model size: 9.103MB.
Total Trainable Params: 128100.


| # | Modules | Parameters |
|---|---|---|
| 0 | classifier.weight | 128000 |
| 1 | classifier.bias | 100 |
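
After `base.freeze_model`, only `classifier.weight` and `classifier.bias` remain trainable, which gives exactly 128,100 parameters. That number follows from the shape of a fully connected layer mapping the 1280-dimensional MobileNetV2 feature vector (the width of the final `conv_1x1` layer) to the 100 CIFAR100 classes. A quick check, with a hypothetical helper name:

```python
def linear_layer_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of parameters of a fully connected layer: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


print(linear_layer_params(1280, 100, bias=False))  # 128000, matches classifier.weight
print(linear_layer_params(1280, 100))  # 128100, matches the trainable params after freezing
# Share of the full model's 2,351,972 parameters that the head-only runs train, in percent.
print(round(100 * linear_layer_params(1280, 100) / 2351972, 1))  # 5.4
```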


In [44]:
cpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, cpu_data_loader, "cpu", 1000)
print(cpu_benchmark.run_benchmark())

<torch.utils.benchmark.utils.common.Measurement object at 0x766fd9616950>
self.infer_speed_comp()
  31.66 ms
  1 measurement, 1000 runs , 1 thread


In [45]:
gpu_benchmark = base.BenchMarkRunner(student_model_pretrained_whole, gpu_data_loader, "cuda", 1000)
print(gpu_benchmark.run_benchmark())

<torch.utils.benchmark.utils.common.Measurement object at 0x766e6d7b7dc0>
self.infer_speed_comp()
  8.68 ms
  1 measurement, 1000 runs , 1 thread
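
The `Measurement` objects in the output indicate that `base.BenchMarkRunner` is built on `torch.utils.benchmark`. With batch size 1, the GPU run (8.68 ms) is about 3.6× faster than the CPU run (31.66 ms), and the CPU figure was measured with a single thread. For readers without the `base` module, here is a minimal standard-library sketch of the same measurement idea: a few warm-up calls, then wall-clock time averaged over many runs. The name `mean_latency_ms` and its parameters are hypothetical. When timing CUDA code, the callable must call `torch.cuda.synchronize()` before returning, because GPU kernels launch asynchronously.

```python
import statistics
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], runs: int = 1000, warmup: int = 10) -> float:
    """Average wall-clock latency of fn() in milliseconds, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up: caches, lazy initialization, memory allocator
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings)


# Toy workload standing in for a single-image forward pass.
print(f"{mean_latency_ms(lambda: sum(range(10_000)), runs=100):.3f} ms")
```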
