# 1. Data & Model Loading

This notebook prepares the data and models used by the subsequent optimisation pipeline. It emulates training and evaluating an uncompressed model: the model is adapted to a specific dataset and then exported for later compression and embedded deployment.

The process is as follows:
* A Torch dataset (already split into train and val) and a model are loaded. Both must target classification tasks, but are agnostic of the modality.
* The model's classification head is adapted to the number of classes in the dataset, trained on the training set while the backbone is frozen, and evaluated on the validation set.
* The whole model (backbone + classification head) is then fine-tuned on the training set with all layers unfrozen.
* The adapted model is then exported as a Torch model for later use in the optimisation pipeline.
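
The two training phases (head training with the backbone frozen, then fine-tuning of the full model) can be sketched in plain PyTorch. This is a minimal illustration on a toy model with random data, not the implementation of `adapt_model_head_to_dataset`; `TinyNet`, `set_backbone_trainable`, and `train_steps` are hypothetical names introduced for this example.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Toy stand-in for a backbone + classification head (hypothetical)."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
        self.head = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


def set_backbone_trainable(model: TinyNet, trainable: bool) -> None:
    """Freeze or unfreeze every backbone parameter."""
    for param in model.backbone.parameters():
        param.requires_grad = trainable


def train_steps(model: nn.Module, lr: float, steps: int = 3) -> None:
    """Run a few optimisation steps on random data, updating trainable params only."""
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        inputs = torch.randn(8, 32)
        targets = torch.randint(0, 10, (8,))
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()


torch.manual_seed(0)
model = TinyNet()
backbone_before = model.backbone[0].weight.detach().clone()

# Phase 1: train the head only, with the backbone frozen.
set_backbone_trainable(model, False)
train_steps(model, lr=1e-3)
backbone_after_head = model.backbone[0].weight.detach().clone()

# Phase 2: unfreeze everything and fine-tune with a lower learning rate.
set_backbone_trainable(model, True)
train_steps(model, lr=1e-4)
backbone_after_finetune = model.backbone[0].weight.detach().clone()
```

Comparing the saved backbone weights shows that phase 1 leaves them untouched, while phase 2 updates them.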

An image MobileNetV2 model with a classification head adapted to the CIFAR-10 dataset is used as an example in this notebook.

## Setup

In [1]:
import torch

from nnopt.model.train import adapt_model_head_to_dataset
from nnopt.model.eval import eval_model
from nnopt.model.const import DEVICE, DTYPE, AMP_ENABLE
from nnopt.recipes.mobilenetv2_cifar10 import init_mobilenetv2_cifar10_model, get_cifar10_datasets, save_mobilenetv2_cifar10_model, load_mobilenetv2_cifar10_model

2025-06-13 18:07:44,607 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Using device: cuda, dtype: torch.bfloat16


# MobileNetV2 and CIFAR-10 adaptation

In [2]:
mobilenetv2 = init_mobilenetv2_cifar10_model()
cifar10_train_dataset, cifar10_val_dataset, cifar10_test_dataset = get_cifar10_datasets()

# Adapt the MobileNetV2 model to CIFAR-10 dataset
mobilenetv2_cifar10_baseline = adapt_model_head_to_dataset(
    model=mobilenetv2,
    train_dataset=cifar10_train_dataset,
    val_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    head_train_epochs=5,  # Train head for fewer epochs
    fine_tune_epochs=5,  # Fine-tune for fewer epochs
    optimizer_cls=torch.optim.Adam,  # Use Adam optimizer
    head_train_lr=0.001,  # Learning rate for head training
    fine_tune_lr=0.0001,  # Learning rate for fine-tuning
    use_amp=AMP_ENABLE,  # Use mixed precision training for efficiency
    device=DEVICE,  # CUDA if available, otherwise CPU
    dtype=DTYPE  # torch.float32, torch.float16, or torch.bfloat16 (bfloat16 in this run)
)

2025-06-13 18:07:44,613 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading MobileNetV2 model with weights: MobileNet_V2_Weights.IMAGENET1K_V1, to_quantize: False, is_quantized: False
2025-06-13 18:07:44,701 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Replacing head of the model to match 10 classes
2025-06-13 18:07:44,703 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing training and validation datasets...
2025-06-13 18:07:46,455 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing test dataset...
2025-06-13 18:07:46,780 - nnopt.model.train - INFO - Training head of the model with backbone frozen...
Epoch 1/5 [Training]: 100%|██████████| 704/704 [00:38<00:00, 18.34it/s, acc=0.4881, cpu=2.4%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=40.0%, loss=1.4172, ram=9.8/30.9GB (41.4%), samples/s=345.1]  
Epoch 1/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 39.74it/s, acc=0.6194, cpu=3.7%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=39.0%, loss=1.0402, ram=9.8/30.9GB (41.4%),

Epoch 1/5, Train Loss: 1.4505, Train Acc: 0.4881, Train Throughput: 4092.43 samples/s | Val Loss: 1.0513, Val Acc: 0.6194, Val Throughput: 7841.12 samples/s | CPU Usage: 12.10% | RAM Usage: 9.6/30.9GB (40.6%) | GPU 0 Util: 34.00% | GPU 0 Mem: 15.4/24.0GB (64.2%)


Epoch 2/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.12it/s, acc=0.5297, cpu=5.6%, gpu_mem=15.4/24.0GB (64.3%), gpu_util=37.0%, loss=1.4813, ram=9.8/30.9GB (41.3%), samples/s=961.2]  
Epoch 2/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 40.03it/s, acc=0.6704, cpu=3.7%, gpu_mem=15.4/24.0GB (64.3%), gpu_util=40.0%, loss=0.9317, ram=9.7/30.9GB (41.2%), samples/s=1303.1] 


Epoch 2/5, Train Loss: 1.3348, Train Acc: 0.5297, Train Throughput: 4176.73 samples/s | Val Loss: 0.9389, Val Acc: 0.6704, Val Throughput: 8026.04 samples/s | CPU Usage: 12.00% | RAM Usage: 9.6/30.9GB (40.6%) | GPU 0 Util: 39.00% | GPU 0 Mem: 15.4/24.0GB (64.3%)


Epoch 3/5 [Training]: 100%|██████████| 704/704 [00:37<00:00, 18.68it/s, acc=0.5333, cpu=5.6%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=33.0%, loss=1.7383, ram=9.7/30.9GB (41.1%), samples/s=940.3]  
Epoch 3/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 40.23it/s, acc=0.6770, cpu=3.6%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=39.0%, loss=0.9623, ram=9.7/30.9GB (41.1%), samples/s=1302.2] 


Epoch 3/5, Train Loss: 1.3243, Train Acc: 0.5333, Train Throughput: 4035.04 samples/s | Val Loss: 0.9315, Val Acc: 0.6770, Val Throughput: 7997.55 samples/s | CPU Usage: 12.20% | RAM Usage: 9.4/30.9GB (40.2%) | GPU 0 Util: 39.00% | GPU 0 Mem: 15.4/24.0GB (64.2%)


Epoch 4/5 [Training]: 100%|██████████| 704/704 [00:37<00:00, 18.80it/s, acc=0.5334, cpu=5.4%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=39.0%, loss=1.2960, ram=9.7/30.9GB (41.0%), samples/s=930.2]  
Epoch 4/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 39.82it/s, acc=0.6398, cpu=3.4%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=43.0%, loss=0.9943, ram=9.7/30.9GB (41.1%), samples/s=1299.4] 


Epoch 4/5, Train Loss: 1.3266, Train Acc: 0.5334, Train Throughput: 4306.88 samples/s | Val Loss: 0.9941, Val Acc: 0.6398, Val Throughput: 7889.46 samples/s | CPU Usage: 10.40% | RAM Usage: 9.5/30.9GB (40.3%) | GPU 0 Util: 43.00% | GPU 0 Mem: 15.4/24.0GB (64.2%)


Epoch 5/5 [Training]: 100%|██████████| 704/704 [00:37<00:00, 18.80it/s, acc=0.5359, cpu=0.0%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=37.0%, loss=1.1995, ram=9.7/30.9GB (41.0%), samples/s=985.1]  
Epoch 5/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 39.53it/s, acc=0.6818, cpu=0.0%, gpu_mem=15.4/24.0GB (64.2%), gpu_util=39.0%, loss=1.0843, ram=9.7/30.9GB (40.9%), samples/s=1311.2]  
2025-06-13 18:11:04,517 - nnopt.model.train - INFO - Fine-tuning full model...


Epoch 5/5, Train Loss: 1.3203, Train Acc: 0.5359, Train Throughput: 4359.81 samples/s | Val Loss: 0.9001, Val Acc: 0.6818, Val Throughput: 8361.36 samples/s | CPU Usage: 11.70% | RAM Usage: 9.5/30.9GB (40.3%) | GPU 0 Util: 39.00% | GPU 0 Mem: 15.4/24.0GB (64.2%)


Epoch 1/5 [Training]: 100%|██████████| 704/704 [00:38<00:00, 18.50it/s, acc=0.6839, cpu=2.9%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=58.0%, loss=0.3977, ram=9.8/30.9GB (41.2%), samples/s=156.5]  
Epoch 1/5 [Validation]: 100%|██████████| 79/79 [00:02<00:00, 38.89it/s, acc=0.8712, cpu=3.6%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=38.0%, loss=1.2380, ram=9.8/30.9GB (41.3%), samples/s=1283.5] 


Epoch 1/5, Train Loss: 0.9063, Train Acc: 0.6839, Train Throughput: 1830.57 samples/s | Val Loss: 0.3747, Val Acc: 0.8712, Val Throughput: 8494.87 samples/s | CPU Usage: 11.50% | RAM Usage: 9.5/30.9GB (40.5%) | GPU 0 Util: 38.00% | GPU 0 Mem: 17.9/24.0GB (74.5%)


Epoch 2/5 [Training]: 100%|██████████| 704/704 [00:38<00:00, 18.11it/s, acc=0.7584, cpu=4.5%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=67.0%, loss=0.5516, ram=9.7/30.9GB (41.2%), samples/s=455.5]  
Epoch 2/5 [Validation]: 100%|██████████| 79/79 [00:02<00:00, 38.70it/s, acc=0.9080, cpu=6.5%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=39.0%, loss=0.7174, ram=9.8/30.9GB (41.4%), samples/s=1222.6]  


Epoch 2/5, Train Loss: 0.6968, Train Acc: 0.7584, Train Throughput: 1902.84 samples/s | Val Loss: 0.2624, Val Acc: 0.9080, Val Throughput: 8351.48 samples/s | CPU Usage: 11.10% | RAM Usage: 9.5/30.9GB (40.5%) | GPU 0 Util: 39.00% | GPU 0 Mem: 17.9/24.0GB (74.5%)


Epoch 3/5 [Training]: 100%|██████████| 704/704 [00:37<00:00, 18.62it/s, acc=0.7807, cpu=3.2%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=71.0%, loss=0.8692, ram=9.8/30.9GB (41.2%), samples/s=466.6]  
Epoch 3/5 [Validation]: 100%|██████████| 79/79 [00:02<00:00, 39.24it/s, acc=0.9122, cpu=3.3%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=39.0%, loss=1.3442, ram=9.7/30.9GB (41.2%), samples/s=1278.9] 


Epoch 3/5, Train Loss: 0.6282, Train Acc: 0.7807, Train Throughput: 1900.33 samples/s | Val Loss: 0.2616, Val Acc: 0.9122, Val Throughput: 8474.21 samples/s | CPU Usage: 10.00% | RAM Usage: 9.5/30.9GB (40.5%) | GPU 0 Util: 39.00% | GPU 0 Mem: 17.9/24.0GB (74.5%)


Epoch 4/5 [Training]: 100%|██████████| 704/704 [00:38<00:00, 18.48it/s, acc=0.7961, cpu=3.0%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=71.0%, loss=0.4151, ram=9.8/30.9GB (41.5%), samples/s=447.2]  
Epoch 4/5 [Validation]: 100%|██████████| 79/79 [00:02<00:00, 38.74it/s, acc=0.9220, cpu=3.1%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=42.0%, loss=0.5584, ram=9.8/30.9GB (41.5%), samples/s=1114.1] 


Epoch 4/5, Train Loss: 0.5837, Train Acc: 0.7961, Train Throughput: 1869.21 samples/s | Val Loss: 0.2263, Val Acc: 0.9220, Val Throughput: 8209.90 samples/s | CPU Usage: 11.90% | RAM Usage: 9.5/30.9GB (40.5%) | GPU 0 Util: 39.00% | GPU 0 Mem: 17.9/24.0GB (74.5%)


Epoch 5/5 [Training]: 100%|██████████| 704/704 [00:38<00:00, 18.42it/s, acc=0.8116, cpu=3.2%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=65.0%, loss=0.1891, ram=9.8/30.9GB (41.5%), samples/s=459.1]  
Epoch 5/5 [Validation]: 100%|██████████| 79/79 [00:02<00:00, 38.76it/s, acc=0.9264, cpu=3.4%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=41.0%, loss=0.4238, ram=9.8/30.9GB (41.4%), samples/s=1220.0] 

Epoch 5/5, Train Loss: 0.5419, Train Acc: 0.8116, Train Throughput: 1841.09 samples/s | Val Loss: 0.2117, Val Acc: 0.9264, Val Throughput: 8313.05 samples/s | CPU Usage: 13.50% | RAM Usage: 9.6/30.9GB (40.6%) | GPU 0 Util: 41.00% | GPU 0 Mem: 17.9/24.0GB (74.5%)
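
The progress bars report 704 training batches and 79 validation batches at batch size 64. A quick check shows that these counts are consistent with a 45,000/5,000 train/validation split and a 10,000-image test set. The split sizes below are an assumption inferred from the batch counts; the notebook does not print them.

```python
import math


def num_batches(num_samples: int, batch_size: int) -> int:
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


# Assumed split sizes (inferred from the batch counts, not printed by the notebook).
split_sizes = {"train": 45_000, "val": 5_000, "test": 10_000}

batches_at_64 = {name: num_batches(n, 64) for name, n in split_sizes.items()}
batches_at_32 = {name: num_batches(n, 32) for name, n in split_sizes.items()}
print(batches_at_64)
print(batches_at_32)
```

The same sizes also reproduce the 157 and 313 batches reported by the evaluation runs at batch size 32.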





In [3]:
# Evaluate the adapted model on the validation and test sets
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=AMP_ENABLE,
    dtype=DTYPE
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=AMP_ENABLE,
    dtype=DTYPE
)
print(f"Validation accuracy of the adapted MobileNetV2 on CIFAR-10: {val_metrics['accuracy']:.2f}")
print(f"Test accuracy of the adapted MobileNetV2 on CIFAR-10: {test_metrics['accuracy']:.2f}")

2025-06-13 18:14:25,774 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.bfloat16, batch size: 64
2025-06-13 18:14:25,778 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 11.74it/s]
2025-06-13 18:14:26,286 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 79/79 [00:02<00:00, 39.02it/s, acc=0.9264, cpu=3.7%, gpu_mem=17.9/24.0GB (74.6%), gpu_util=43.0%, loss=0.4238, ram=9.8/30.9GB (41.4%), samples/s=1340.9] 
2025-06-13 18:14:28,317 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.bfloat16, batch size: 64
2025-06-13 18:14:28,320 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2117, Accuracy: 0.9264
Throughput: 7964.50 samples/sec | Avg Batch Time: 7946.66 ms | Avg Sample Time: 125.56 ms
System Stats: CPU Usage: 12.40% | RAM Usage: 9.6/30.9GB (40.6%) | GPU 0 Util: 43.00% | GPU 0 Mem: 17.9/24.0GB (74.6%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 12.61it/s]
2025-06-13 18:14:28,800 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 40.61it/s, acc=0.9254, cpu=3.3%, gpu_mem=17.9/24.0GB (74.5%), gpu_util=42.0%, loss=0.1026, ram=9.8/30.9GB (41.5%), samples/s=617.5]  

Evaluation Complete: Avg Loss: 0.2157, Accuracy: 0.9254
Throughput: 8007.81 samples/sec | Avg Batch Time: 7954.02 ms | Avg Sample Time: 124.88 ms
System Stats: CPU Usage: 10.40% | RAM Usage: 9.6/30.9GB (40.6%) | GPU 0 Util: 34.00% | GPU 0 Mem: 17.9/24.0GB (74.5%)
Validation accuracy of the adapted MobileNetV2 on CIFAR-10: 0.93
Test accuracy of the adapted MobileNetV2 on CIFAR-10: 0.93
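
The evaluation log shows a 5-batch warmup before the timed pass. The sketch below illustrates that general pattern with a plain Python callable. It is not the code of `eval_model`; `measure_throughput` and its parameters are hypothetical. On a GPU, one would typically also call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    step: Callable[[Sequence[float]], object],
    batches: Sequence[Sequence[float]],
    warmup_batches: int = 5,
) -> dict:
    """Run an untimed warmup pass, then time `step` over all batches."""
    # Warmup: run a few batches so caches and lazy initialisation do not skew timings.
    for batch in batches[:warmup_batches]:
        step(batch)

    # Timed pass over every batch.
    elapsed = 0.0
    timed_samples = 0
    for batch in batches:
        start = time.perf_counter()
        step(batch)
        elapsed += time.perf_counter() - start
        timed_samples += len(batch)

    timed_batches = len(batches)
    return {
        "timed_batches": timed_batches,
        "timed_samples": timed_samples,
        "samples_per_second": timed_samples / elapsed if elapsed > 0 else float("inf"),
        "avg_time_per_batch_ms": 1000.0 * elapsed / timed_batches if timed_batches else 0.0,
    }


# Toy workload: 20 batches of 64 numbers; the "model" is a sum of squares.
toy_batches = [[float(i)] * 64 for i in range(20)]
stats = measure_throughput(lambda batch: sum(x * x for x in batch), toy_batches)
print(stats)
```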





In [4]:
# Export the adapted model
save_mobilenetv2_cifar10_model(
    model=mobilenetv2_cifar10_baseline,
    metrics_values={
        "val_metrics": val_metrics,
        "test_metrics": test_metrics,
    },
    version="mobilenetv2_cifar10/fp32/baseline",
)

2025-06-13 18:14:32,729 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Model saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/model.pt
2025-06-13 18:14:32,768 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Model state_dict saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/state_dict.pt
2025-06-13 18:14:32,770 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Metadata saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/metadata.json
2025-06-13 18:14:32,771 - nnopt.model.prune - INFO - Making pruning permanent by removing reparameterization...
2025-06-13 18:14:32,772 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Saving model in JIT script format...
2025-06-13 18:14:33,108 - nnopt.recipes.mobilenetv2_cifar10 - INFO - JIT script model saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/jit_script.pt
2025-06-13 18:14:33,109 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Saving model in JIT trace format.
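
The log above lists several export formats: a pickled model, a state dict, a JIT-scripted model, and a JIT-traced model. The sketch below shows how the last three can be produced with standard PyTorch calls on a toy model. It is not the code of `save_mobilenetv2_cifar10_model`, and the toy model and file names are hypothetical.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

torch.manual_seed(0)
# Toy stand-in for the adapted model (hypothetical), put in eval mode for export.
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()
example_input = torch.randn(1, 3, 32, 32)  # CIFAR-10-sized input

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = Path(tmp_dir)

    # 1. Weights only: compact, but the model class is needed to reload them.
    torch.save(toy_model.state_dict(), out_dir / "state_dict_example.pt")

    # 2. TorchScript via scripting: compiles the module's Python code.
    torch.jit.script(toy_model).save(str(out_dir / "script_example.pt"))

    # 3. TorchScript via tracing: records the ops executed on an example input.
    torch.jit.trace(toy_model, example_input).save(str(out_dir / "trace_example.pt"))

    # Reload the scripted and traced models and compare against the original.
    with torch.no_grad():
        reference = toy_model(example_input)
        scripted_out = torch.jit.load(str(out_dir / "script_example.pt"))(example_input)
        traced_out = torch.jit.load(str(out_dir / "trace_example.pt"))(example_input)

    saved_files = sorted(path.name for path in out_dir.iterdir())
```

Scripted and traced models can be reloaded without the original Python class, which is convenient when a later pipeline stage or a deployment target only has access to the saved file.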

# Analysis

## GPU FP32

In [5]:
# Evaluate the adapted model on the validation and test sets on GPU
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    device="cuda",
    use_amp=False,
    dtype=torch.float32
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device="cuda",
    use_amp=False,
    dtype=torch.float32
)

2025-06-13 18:14:33,785 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.float32, batch size: 64
2025-06-13 18:14:33,811 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00,  8.81it/s]
2025-06-13 18:14:34,462 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 79/79 [00:01<00:00, 39.54it/s, acc=0.9260, cpu=2.9%, gpu_mem=18.5/24.0GB (77.0%), gpu_util=62.0%, loss=0.3825, ram=10.0/30.9GB (41.9%), samples/s=430.4]  
2025-06-13 18:14:36,466 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.float32, batch size: 64
2025-06-13 18:14:36,469 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2145, Accuracy: 0.9260
Throughput: 5387.35 samples/sec | Avg Batch Time: 11748.11 ms | Avg Sample Time: 185.62 ms
System Stats: CPU Usage: 13.50% | RAM Usage: 9.8/30.9GB (41.3%) | GPU 0 Util: 52.00% | GPU 0 Mem: 18.5/24.0GB (77.0%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 12.74it/s]
2025-06-13 18:14:36,952 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 41.54it/s, acc=0.9238, cpu=10.6%, gpu_mem=18.5/24.0GB (77.1%), gpu_util=55.0%, loss=0.0849, ram=10.0/30.9GB (41.9%), samples/s=951.1] 

Evaluation Complete: Avg Loss: 0.2178, Accuracy: 0.9238
Throughput: 5507.21 samples/sec | Avg Batch Time: 11565.61 ms | Avg Sample Time: 181.58 ms
System Stats: CPU Usage: 14.10% | RAM Usage: 9.8/30.9GB (41.3%) | GPU 0 Util: 55.00% | GPU 0 Mem: 18.5/24.0GB (77.1%)





In [6]:
# Print the val metrics
import yaml
print("- Validation Metrics:")
yaml_str = yaml.dump(val_metrics, default_flow_style=False)
print(yaml_str)

# Print the test metrics
print("- Test Metrics:")
yaml_str = yaml.dump(test_metrics, default_flow_style=False)
print(yaml_str)

- Validation Metrics:
accuracy: 0.926
avg_loss: 0.21445406112670898
avg_time_per_batch: 11.748108253057213
avg_time_per_sample: 0.18562011039830395
params_stats:
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 5387.347296875313

- Test Metrics:
accuracy: 0.9238
avg_loss: 0.2178233437180519
avg_time_per_batch: 11.56561239467282
avg_time_per_sample: 0.18158011459636328
params_stats:
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 5507.210975292711
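
The timing fields above can be cross-checked against each other. If `avg_time_per_sample` is expressed in milliseconds, its inverse should match `samples_per_second`, and the ratio of per-batch to per-sample time should give the mean batch size. The check below uses the validation values copied from the output above. Under that reading, the `Avg Batch Time: 11748.11 ms` log line appears to be scaled by a factor of 1,000 relative to the YAML value; this is an inference from the numbers, not something the source states.

```python
# Values copied from the GPU FP32 validation metrics printed above.
avg_time_per_batch = 11.748108253057213
avg_time_per_sample = 0.18562011039830395
samples_per_second = 5387.347296875313

# If avg_time_per_sample is in milliseconds, its inverse gives samples per second.
derived_samples_per_second = 1000.0 / avg_time_per_sample

# The ratio of per-batch to per-sample time gives the mean number of samples per batch.
mean_batch_size = avg_time_per_batch / avg_time_per_sample

print(f"derived throughput: {derived_samples_per_second:.2f} samples/s")
print(f"mean batch size: {mean_batch_size:.2f}")
```

The derived throughput agrees with the reported one, and the mean batch size comes out slightly below 64, as expected when the last batch is only partially filled.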



## CPU FP32

In [7]:
# Evaluate the adapted model on the validation and test sets on CPU
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=32,  # Adjust batch size as needed
    device="cpu",
    use_amp=False,
    dtype=torch.float32
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=32,  # Adjust batch size as needed
    device="cpu",
    use_amp=False,
    dtype=torch.float32
)

2025-06-13 18:14:40,781 - nnopt.model.eval - INFO - Starting evaluation on device: cpu, dtype: torch.float32, batch size: 32
2025-06-13 18:14:40,820 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:02<00:00,  2.08it/s]
2025-06-13 18:14:43,315 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [01:06<00:00,  2.35it/s, acc=0.9258, cpu=45.3%, loss=0.3829, ram=10.1/30.9GB (43.5%), samples/s=146.2]
2025-06-13 18:15:50,260 - nnopt.model.eval - INFO - Starting evaluation on device: cpu, dtype: torch.float32, batch size: 32
2025-06-13 18:15:50,264 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2145, Accuracy: 0.9258
Throughput: 75.36 samples/sec | Avg Batch Time: 422592.31 ms | Avg Sample Time: 13269.40 ms
System Stats: CPU Usage: 11.40% | RAM Usage: 9.9/30.9GB (42.7%)


[Warmup]: 100%|██████████| 5/5 [00:02<00:00,  1.94it/s]
2025-06-13 18:15:52,930 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 313/313 [02:28<00:00,  2.10it/s, acc=0.9240, cpu=31.2%, loss=0.0850, ram=10.1/30.9GB (43.3%), samples/s=80.0]

Evaluation Complete: Avg Loss: 0.2178, Accuracy: 0.9240
Throughput: 67.57 samples/sec | Avg Batch Time: 472835.00 ms | Avg Sample Time: 14799.74 ms
System Stats: CPU Usage: 15.50% | RAM Usage: 9.8/30.9GB (42.6%)





In [8]:
# Print the val metrics
import yaml
print("- Validation Metrics:")
yaml_str = yaml.dump(val_metrics, default_flow_style=False)
print(yaml_str)

# Print the test metrics
print("- Test Metrics:")
yaml_str = yaml.dump(test_metrics, default_flow_style=False)
print(yaml_str)

- Validation Metrics:
accuracy: 0.9258
avg_loss: 0.21445582203865052
avg_time_per_batch: 422.59230760506273
avg_time_per_sample: 13.26939845879897
params_stats:
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 75.36136646321731

- Test Metrics:
accuracy: 0.924
avg_loss: 0.21783703691661357
avg_time_per_batch: 472.8349969327639
avg_time_per_sample: 14.79973540399551
params_stats:
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 67.56877556946243
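
The `params_stats` block gives a rough idea of the storage needed for the weights, which is useful context for the compression steps that follow. The estimate below is a sketch under simple assumptions: it counts parameters only (no buffers such as batch-norm running statistics, no file-format overhead) and assumes every parameter is stored at the same precision.

```python
# Parameter counts copied from params_stats above.
params_stats = {
    "float_weight_params": 2_202_560,
    "bn_param_params": 34_112,
    "float_bias_params": 10,
}
total_params = sum(params_stats.values())

# Bytes per parameter for a few storage precisions (parameters only).
bytes_per_param = {"fp32": 4, "fp16": 2, "int8": 1}

sizes_mb = {
    precision: total_params * nbytes / 1e6
    for precision, nbytes in bytes_per_param.items()
}
for precision, size_mb in sizes_mb.items():
    print(f"{precision}: {size_mb:.2f} MB")
```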



## Conclusions

* Accuracy is ~92.5% on CIFAR-10 with MobileNetV2 (92.6% validation, 92.4% test in FP32), reached after only 5 head-training epochs and 5 fine-tuning epochs.
* The GPU is ~75 times faster than the CPU for FP32 evaluation (about 5,400 to 5,500 samples/s versus 68 to 75 samples/s), which is to be expected given the architectural differences. Note that the GPU runs used a batch size of 64 and the CPU runs a batch size of 32.
* Thus, to run the model on a CPU for embedded use cases with high inference throughput and little-to-no accuracy loss, the model should be optimised for the CPU. This can be done with pruning, quantization, or knowledge distillation.
* Pruning and quantization are good candidates and are explored in the next notebooks; knowledge distillation is not, because the MobileNetV2 architecture is already efficient.
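
The GPU-to-CPU speedup can be recomputed directly from the throughputs reported in the two FP32 metric dumps. The values below are copied from those outputs; keep in mind that the two runs used different batch sizes, so the ratio is indicative rather than a controlled comparison.

```python
# Throughputs (samples/s) copied from the GPU FP32 and CPU FP32 metrics above.
gpu = {"val": 5387.347296875313, "test": 5507.210975292711}
cpu = {"val": 75.36136646321731, "test": 67.56877556946243}

speedups = {split: gpu[split] / cpu[split] for split in gpu}
for split, ratio in speedups.items():
    print(f"{split}: GPU is {ratio:.1f}x faster than CPU")
```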