# 1. Data & Model Loading

This notebook prepares the data and models used by the subsequent optimisation pipeline. It emulates a standard, uncompressed training and evaluation workflow: a pretrained model is adapted to a specific dataset and then exported, so that it can later be compressed for embedded deployment.

The process is as follows:
* A PyTorch dataset (already split into training, validation and test sets) and a model are loaded. Both must be set up for classification, but the pipeline is agnostic to the input modality.
* The model's classification head is replaced to match the number of classes in the dataset, then trained on the training set with the backbone frozen, and evaluated on the validation set.
* The whole model (backbone + classification head) is then fine-tuned on the training set with all layers unfrozen and a lower learning rate.
* The adapted model is evaluated on the validation and test sets and exported as a PyTorch model for later use in the optimisation pipeline.

An ImageNet-pretrained MobileNetV2 image classifier, with its classification head adapted to CIFAR-10, is used as the example in this notebook.

## Setup

In [None]:
import torch

from nnopt.model.train import adapt_model_head_to_dataset
from nnopt.model.eval import eval_model
from nnopt.model.const import DEVICE, DTYPE, AMP_ENABLE
from nnopt.recipes.mobilenetv2_cifar10 import (
    init_mobilenetv2_cifar10_model,
    get_cifar10_datasets,
    save_mobilenetv2_cifar10_model,
    load_mobilenetv2_cifar10_model,
)

2025-06-12 10:37:56,802 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Using device: cuda, dtype: torch.bfloat16
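The log line above shows that CUDA with `torch.bfloat16` was selected. How `nnopt.model.const` actually derives `DEVICE`, `DTYPE` and `AMP_ENABLE` is not shown in this notebook; the following is a hypothetical sketch of one common way to pick them, with `pick_runtime` as an illustrative name.

```python
import torch


def pick_runtime() -> tuple[str, torch.dtype, bool]:
    # Hypothetical stand-in for nnopt.model.const; the real module may differ.
    if torch.cuda.is_available():
        device = "cuda"
        # Prefer bfloat16 when the GPU reports support for it, else float16.
        dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
        amp_enable = True
    else:
        # Plain FP32 on CPU, without mixed precision.
        device = "cpu"
        dtype = torch.float32
        amp_enable = False
    return device, dtype, amp_enable


DEVICE, DTYPE, AMP_ENABLE = pick_runtime()
print(f"Using device: {DEVICE}, dtype: {DTYPE}")
```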


# MobileNetV2 and CIFAR-10 adaptation

In [None]:
mobilenetv2 = init_mobilenetv2_cifar10_model()
cifar10_train_dataset, cifar10_val_dataset, cifar10_test_dataset = get_cifar10_datasets()

# Adapt the MobileNetV2 model to CIFAR-10 dataset
mobilenetv2_cifar10_baseline = adapt_model_head_to_dataset(
    model=mobilenetv2,
    train_dataset=cifar10_train_dataset,
    val_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    head_train_epochs=5,  # Epochs of head-only training (backbone frozen)
    fine_tune_epochs=5,  # Epochs of full-model fine-tuning
    optimizer_cls=torch.optim.Adam,  # Use Adam optimizer
    head_train_lr=0.001,  # Learning rate for head training
    fine_tune_lr=0.0001,  # Lower learning rate for full-model fine-tuning
    use_amp=AMP_ENABLE,  # Use mixed precision training for efficiency
    device=DEVICE,  # CUDA if available, otherwise CPU
    dtype=DTYPE  # e.g. torch.float32, torch.float16 or torch.bfloat16 (bfloat16 in this run)
)

2025-06-12 10:41:39,106 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading MobileNetV2 model with weights: MobileNet_V2_Weights.IMAGENET1K_V1, to_quantize: False, is_quantized: False, num_classes: 10
2025-06-12 10:41:39,152 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Replacing head of the model to match 10 classes
2025-06-12 10:41:39,154 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing training and validation datasets...
2025-06-12 10:41:40,621 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing test dataset...
2025-06-12 10:41:40,773 - nnopt.model.train - INFO - Training head of the model with backbone frozen...
Epoch 1/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.54it/s, acc=0.4885, cpu=3.6%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=35.0%, loss=2.4434, ram=9.2/30.9GB (39.4%), samples/s=1202.4] 
Epoch 1/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 43.14it/s, acc=0.6566, cpu=0.0%, gpu_mem=15.5/24.0GB (64.6%), gpu_util=38.0%, loss=1.1313, ram=9.

Epoch 1/5, Train Loss: 1.4494, Train Acc: 0.4885, Train Throughput: 4263.55 samples/s | Val Loss: 0.9658, Val Acc: 0.6566, Val Throughput: 8564.91 samples/s | CPU Usage: 12.80% | RAM Usage: 9.0/30.9GB (38.4%) | GPU 0 Util: 38.00% | GPU 0 Mem: 15.5/24.0GB (64.6%)


Epoch 2/5 [Training]: 100%|██████████| 704/704 [00:35<00:00, 19.69it/s, acc=0.5300, cpu=3.1%, gpu_mem=15.5/24.0GB (64.6%), gpu_util=40.0%, loss=1.0195, ram=9.3/30.9GB (39.6%), samples/s=1091.5] 
Epoch 2/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 43.29it/s, acc=0.6864, cpu=3.6%, gpu_mem=15.5/24.0GB (64.6%), gpu_util=38.0%, loss=1.2681, ram=9.3/30.9GB (39.6%), samples/s=1467.4]  


Epoch 2/5, Train Loss: 1.3353, Train Acc: 0.5300, Train Throughput: 4276.14 samples/s | Val Loss: 0.9010, Val Acc: 0.6864, Val Throughput: 9078.04 samples/s | CPU Usage: 9.80% | RAM Usage: 9.0/30.9GB (38.7%) | GPU 0 Util: 38.00% | GPU 0 Mem: 15.5/24.0GB (64.6%)


Epoch 3/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.05it/s, acc=0.5332, cpu=3.1%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=37.0%, loss=1.4594, ram=9.3/30.9GB (39.5%), samples/s=1067.9] 
Epoch 3/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 43.38it/s, acc=0.6662, cpu=4.0%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=39.0%, loss=0.8524, ram=9.3/30.9GB (39.5%), samples/s=1478.6]  


Epoch 3/5, Train Loss: 1.3242, Train Acc: 0.5332, Train Throughput: 4206.87 samples/s | Val Loss: 0.9348, Val Acc: 0.6662, Val Throughput: 8850.24 samples/s | CPU Usage: 11.10% | RAM Usage: 9.0/30.9GB (38.7%) | GPU 0 Util: 39.00% | GPU 0 Mem: 15.5/24.0GB (64.5%)


Epoch 4/5 [Training]: 100%|██████████| 704/704 [00:34<00:00, 20.22it/s, acc=0.5392, cpu=0.0%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=40.0%, loss=0.9433, ram=9.3/30.9GB (39.6%), samples/s=1081.9] 
Epoch 4/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 43.75it/s, acc=0.6518, cpu=3.4%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=38.0%, loss=1.0523, ram=9.3/30.9GB (39.5%), samples/s=1377.0]  


Epoch 4/5, Train Loss: 1.3208, Train Acc: 0.5392, Train Throughput: 4202.78 samples/s | Val Loss: 0.9768, Val Acc: 0.6518, Val Throughput: 8561.41 samples/s | CPU Usage: 14.30% | RAM Usage: 9.0/30.9GB (38.7%) | GPU 0 Util: 36.00% | GPU 0 Mem: 15.5/24.0GB (64.5%)


Epoch 5/5 [Training]: 100%|██████████| 704/704 [00:37<00:00, 18.77it/s, acc=0.5388, cpu=6.5%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=39.0%, loss=1.6631, ram=9.3/30.9GB (39.5%), samples/s=1098.6] 
Epoch 5/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.92it/s, acc=0.6610, cpu=3.6%, gpu_mem=15.5/24.0GB (64.5%), gpu_util=40.0%, loss=1.0176, ram=9.3/30.9GB (39.5%), samples/s=1445.9]  
2025-06-12 10:44:50,999 - nnopt.model.train - INFO - Fine-tuning full model...


Epoch 5/5, Train Loss: 1.3209, Train Acc: 0.5388, Train Throughput: 4012.76 samples/s | Val Loss: 0.9607, Val Acc: 0.6610, Val Throughput: 8514.76 samples/s | CPU Usage: 10.50% | RAM Usage: 9.0/30.9GB (38.6%) | GPU 0 Util: 40.00% | GPU 0 Mem: 15.5/24.0GB (64.5%)


Epoch 1/5 [Training]: 100%|██████████| 704/704 [00:35<00:00, 20.07it/s, acc=0.6821, cpu=3.4%, gpu_mem=18.0/24.0GB (75.0%), gpu_util=62.0%, loss=2.2703, ram=9.4/30.9GB (39.8%), samples/s=182.3]  
Epoch 1/5 [Validation]: 100%|██████████| 79/79 [00:05<00:00, 14.37it/s, acc=0.8854, cpu=0.0%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=38.0%, loss=0.4311, ram=9.4/30.9GB (39.8%), samples/s=1344.1]  


Epoch 1/5, Train Loss: 0.9125, Train Acc: 0.6821, Train Throughput: 2183.96 samples/s | Val Loss: 0.3333, Val Acc: 0.8854, Val Throughput: 8838.10 samples/s | CPU Usage: 12.90% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 31.00% | GPU 0 Mem: 18.0/24.0GB (75.1%)


Epoch 2/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.49it/s, acc=0.7559, cpu=3.6%, gpu_mem=18.0/24.0GB (75.0%), gpu_util=63.0%, loss=0.7060, ram=9.4/30.9GB (39.8%), samples/s=529.9]  
Epoch 2/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 43.34it/s, acc=0.8990, cpu=4.0%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=38.0%, loss=0.6878, ram=9.4/30.9GB (39.8%), samples/s=1478.6]  


Epoch 2/5, Train Loss: 0.7015, Train Acc: 0.7559, Train Throughput: 2141.56 samples/s | Val Loss: 0.2846, Val Acc: 0.8990, Val Throughput: 9535.81 samples/s | CPU Usage: 10.50% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 34.00% | GPU 0 Mem: 18.0/24.0GB (75.1%)


Epoch 3/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.20it/s, acc=0.7775, cpu=3.6%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=64.0%, loss=0.6134, ram=9.4/30.9GB (39.8%), samples/s=526.6]  
Epoch 3/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.63it/s, acc=0.9054, cpu=0.0%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=39.0%, loss=0.7339, ram=9.4/30.9GB (39.8%), samples/s=1445.9]  


Epoch 3/5, Train Loss: 0.6329, Train Acc: 0.7775, Train Throughput: 2161.97 samples/s | Val Loss: 0.2739, Val Acc: 0.9054, Val Throughput: 8860.93 samples/s | CPU Usage: 11.50% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 39.00% | GPU 0 Mem: 18.0/24.0GB (75.1%)


Epoch 4/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.47it/s, acc=0.7975, cpu=7.0%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=66.0%, loss=0.4283, ram=9.4/30.9GB (39.8%), samples/s=524.3]  
Epoch 4/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 43.25it/s, acc=0.9216, cpu=3.8%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=38.0%, loss=0.7340, ram=9.4/30.9GB (39.8%), samples/s=1454.5]  


Epoch 4/5, Train Loss: 0.5791, Train Acc: 0.7975, Train Throughput: 2178.57 samples/s | Val Loss: 0.2252, Val Acc: 0.9216, Val Throughput: 9675.00 samples/s | CPU Usage: 10.60% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 38.00% | GPU 0 Mem: 18.0/24.0GB (75.1%)


Epoch 5/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.21it/s, acc=0.8078, cpu=3.4%, gpu_mem=18.0/24.0GB (75.0%), gpu_util=64.0%, loss=0.8391, ram=9.4/30.9GB (39.8%), samples/s=515.0]  
Epoch 5/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.44it/s, acc=0.9254, cpu=7.4%, gpu_mem=18.0/24.0GB (75.1%), gpu_util=38.0%, loss=0.5486, ram=9.4/30.9GB (39.8%), samples/s=1404.4]  


Epoch 5/5, Train Loss: 0.5474, Train Acc: 0.8078, Train Throughput: 2165.49 samples/s | Val Loss: 0.2146, Val Acc: 0.9254, Val Throughput: 9099.89 samples/s | CPU Usage: 10.20% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 38.00% | GPU 0 Mem: 18.0/24.0GB (75.1%)
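The `use_amp` and `dtype` arguments above imply a mixed-precision training loop. The sketch below shows one common way to write a single mixed-precision training step with `torch.autocast`; it illustrates the general technique and is not `nnopt`'s actual loop (`train_step` is a hypothetical name). With `torch.bfloat16`, as in this run, gradient scaling is usually unnecessary, whereas `torch.float16` training typically adds a `GradScaler` to avoid gradient underflow.

```python
import torch
from torch import nn


def train_step(model: nn.Module, inputs: torch.Tensor, targets: torch.Tensor,
               optimizer: torch.optim.Optimizer, device: str,
               dtype: torch.dtype, use_amp: bool) -> float:
    """Run one optimisation step, under autocast when use_amp is True."""
    model.train()
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad(set_to_none=True)
    # Inside autocast, eligible ops (matmuls, convolutions) run in `dtype`,
    # while the parameters themselves stay in float32.
    with torch.autocast(device_type=device, dtype=dtype, enabled=use_amp):
        logits = model(inputs)
        loss = nn.functional.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage on CPU: a linear classifier over 10 classes.
torch.manual_seed(0)
toy_model = nn.Linear(32, 10)
toy_optimizer = torch.optim.Adam(toy_model.parameters(), lr=1e-3)
x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
loss_value = train_step(toy_model, x, y, toy_optimizer, "cpu", torch.bfloat16, True)
print(f"loss: {loss_value:.4f}")
```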


In [None]:
# Evaluate the adapted model on the validation and test sets
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=AMP_ENABLE,
    dtype=DTYPE
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=AMP_ENABLE,
    dtype=DTYPE
)
print(f"Validation accuracy of the adapted MobileNetV2 on CIFAR-10: {val_metrics['accuracy']:.2f}")
print(f"Test accuracy of the adapted MobileNetV2 on CIFAR-10: {test_metrics['accuracy']:.2f}")

2025-06-12 10:48:13,673 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.bfloat16, batch size: 64
2025-06-12 10:48:13,676 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 11.21it/s]
2025-06-12 10:48:14,211 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 79/79 [00:01<00:00, 43.85it/s, acc=0.9254, cpu=3.8%, gpu_mem=18.1/24.0GB (75.2%), gpu_util=39.0%, loss=0.5486, ram=9.4/30.9GB (39.8%), samples/s=1420.2]  
2025-06-12 10:48:16,017 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.bfloat16, batch size: 64
2025-06-12 10:48:16,020 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2146, Accuracy: 0.9254
Throughput: 9594.24 samples/sec | Avg Batch Time: 6.60 ms | Avg Sample Time: 0.10 ms
System Stats: CPU Usage: 13.40% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 39.00% | GPU 0 Mem: 18.1/24.0GB (75.2%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 14.10it/s]
2025-06-12 10:48:16,472 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 45.88it/s, acc=0.9288, cpu=2.7%, gpu_mem=18.0/24.0GB (75.2%), gpu_util=38.0%, loss=0.0395, ram=9.4/30.9GB (39.8%), samples/s=767.2]   

Evaluation Complete: Avg Loss: 0.2064, Accuracy: 0.9288
Throughput: 9116.58 samples/sec | Avg Batch Time: 6.99 ms | Avg Sample Time: 0.11 ms
System Stats: CPU Usage: 12.90% | RAM Usage: 9.1/30.9GB (38.9%) | GPU 0 Util: 38.00% | GPU 0 Mem: 18.0/24.0GB (75.2%)
Validation accuracy of the adapted MobileNetV2 on CIFAR-10: 0.93
Test accuracy of the adapted MobileNetV2 on CIFAR-10: 0.93
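The evaluation log reports a 5-batch warm-up followed by accuracy, average batch and sample times, and throughput. A minimal sketch of such a measurement follows, assuming the warm-up batches are excluded from timing and that pending CUDA work is synchronised before the clock is read; `measure_throughput` is a hypothetical name, and nnopt's exact timing method may differ.

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


@torch.inference_mode()
def measure_throughput(model: nn.Module, loader: DataLoader,
                       device: str = "cpu", warmup_batches: int = 5) -> dict:
    model.eval().to(device)

    def sync() -> None:
        # CUDA kernels run asynchronously; wait for them before reading the clock.
        if device.startswith("cuda"):
            torch.cuda.synchronize()

    # Warm-up absorbs one-off costs (lazy initialisation, kernel selection)
    # and is not timed.
    for i, (inputs, _) in enumerate(loader):
        if i >= warmup_batches:
            break
        model(inputs.to(device))
    sync()

    correct, seen, n_batches = 0, 0, 0
    start = time.perf_counter()
    for inputs, targets in loader:
        preds = model(inputs.to(device)).argmax(dim=1).cpu()
        correct += (preds == targets).sum().item()
        seen += targets.numel()
        n_batches += 1
    sync()
    elapsed = time.perf_counter() - start
    return {
        "accuracy": correct / seen,
        "avg_time_per_batch": elapsed / n_batches,
        "avg_time_per_sample": elapsed / seen,
        "samples_per_second": seen / elapsed,
    }


# Toy usage: 100 random 3x32x32 "images" with 10 classes, batch size 32.
data = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
toy = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
metrics = measure_throughput(toy, DataLoader(data, batch_size=32))
print(metrics)
```

Because the last batch is usually partial, `avg_time_per_sample` is derived from the sample count rather than from `avg_time_per_batch / batch_size`.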





In [5]:
# Export the adapted model
save_mobilenetv2_cifar10_model(
    model=mobilenetv2_cifar10_baseline,
    metrics_values={
        "val_metrics": val_metrics,
        "test_metrics": test_metrics,
    },
    version="mobilenetv2_cifar10/fp32/baseline",
)

2025-06-12 10:48:22,315 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Model saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/model.pt
2025-06-12 10:48:22,351 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Model state_dict saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/state_dict.pt
2025-06-12 10:48:22,352 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Metadata saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/metadata.json
2025-06-12 10:48:22,352 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Saving model in JIT script format...
2025-06-12 10:48:22,623 - nnopt.recipes.mobilenetv2_cifar10 - INFO - JIT script model saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/fp32/baseline/jit_script.pt
2025-06-12 10:48:22,623 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Saving model in JIT trace format...
2025-06-12 10:48:23,210 - nnopt.recipes.mobilenetv2_cifar10 - INFO - JIT model saved to /home/pbeuran/repos/n
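The export above writes the model, its `state_dict`, and TorchScript versions produced by both scripting and tracing. The self-contained sketch below shows the corresponding standard PyTorch calls on a small stand-in network and checks that every reloaded artefact reproduces the original outputs. The network, the `build_net` helper and the file names are hypothetical; the internals of `save_mobilenetv2_cifar10_model` may differ.

```python
import os
import tempfile

import torch
from torch import nn


def build_net() -> nn.Module:
    # Small stand-in for the adapted MobileNetV2 (hypothetical architecture).
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
    )


torch.manual_seed(0)
model = build_net().eval()
example = torch.randn(2, 3, 32, 32)  # CIFAR-10-sized dummy batch
out_dir = tempfile.mkdtemp()

# 1. state_dict: weights only; reloading needs the Python builder/class.
torch.save(model.state_dict(), os.path.join(out_dir, "state_dict.pt"))
# 2. TorchScript via scripting: compiles the module's code, keeps control flow.
torch.jit.script(model).save(os.path.join(out_dir, "jit_script.pt"))
# 3. TorchScript via tracing: records the ops executed for `example`.
torch.jit.trace(model, example).save(os.path.join(out_dir, "jit_trace.pt"))

restored = build_net().eval()
restored.load_state_dict(
    torch.load(os.path.join(out_dir, "state_dict.pt"), weights_only=True))
scripted = torch.jit.load(os.path.join(out_dir, "jit_script.pt"))
traced = torch.jit.load(os.path.join(out_dir, "jit_trace.pt"))

with torch.no_grad():
    reference = model(example)
    for name, m in [("state_dict", restored), ("script", scripted), ("trace", traced)]:
        print(name, torch.allclose(m(example), reference))
```

Tracing only records the path taken for the example input, so data-dependent control flow is frozen, while scripting preserves it; for a feed-forward network such as MobileNetV2 both should behave the same.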

# Analysis

## GPU FP32

In [6]:
# Evaluate the adapted model on the validation and test sets on GPU, in full FP32 (no AMP)
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    device="cuda",
    use_amp=False,
    dtype=torch.float32
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device="cuda",
    use_amp=False,
    dtype=torch.float32
)

2025-06-12 10:48:27,638 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.float32, batch size: 64
2025-06-12 10:48:27,669 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00,  7.88it/s]
2025-06-12 10:48:28,401 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 79/79 [00:01<00:00, 43.78it/s, acc=0.9248, cpu=5.3%, gpu_mem=18.6/24.0GB (77.6%), gpu_util=60.0%, loss=0.5485, ram=9.5/30.9GB (40.2%), samples/s=531.4]  
2025-06-12 10:48:30,210 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.float32, batch size: 64
2025-06-12 10:48:30,213 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2172, Accuracy: 0.9248
Throughput: 6371.58 samples/sec | Avg Batch Time: 9.93 ms | Avg Sample Time: 0.16 ms
System Stats: CPU Usage: 11.60% | RAM Usage: 9.3/30.9GB (39.6%) | GPU 0 Util: 60.00% | GPU 0 Mem: 18.6/24.0GB (77.6%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 15.10it/s]
2025-06-12 10:48:30,644 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 46.19it/s, acc=0.9280, cpu=3.6%, gpu_mem=18.6/24.0GB (77.5%), gpu_util=56.0%, loss=0.0346, ram=9.5/30.9GB (40.2%), samples/s=1074.0] 

Evaluation Complete: Avg Loss: 0.2091, Accuracy: 0.9280
Throughput: 6477.11 samples/sec | Avg Batch Time: 9.83 ms | Avg Sample Time: 0.15 ms
System Stats: CPU Usage: 11.10% | RAM Usage: 9.3/30.9GB (39.6%) | GPU 0 Util: 56.00% | GPU 0 Mem: 18.6/24.0GB (77.5%)





In [7]:
# Print the val metrics
import yaml
print("- Validation Metrics:")
yaml_str = yaml.dump(val_metrics, default_flow_style=False)
print(yaml_str)

# Print the test metrics
print("- Test Metrics:")
yaml_str = yaml.dump(test_metrics, default_flow_style=False)
print(yaml_str)

- Validation Metrics:
accuracy: 0.9248
avg_loss: 0.21724928364753723
avg_time_per_batch: 0.009933343151853386
avg_time_per_sample: 0.0001569468217992835
params_stats:
  approx_memory_mb_for_params: 8.532264709472656
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 6371.584900769015

- Test Metrics:
accuracy: 0.928
avg_loss: 0.20905147356987
avg_time_per_batch: 0.009833745019248824
avg_time_per_sample: 0.00015438979680220655
params_stats:
  approx_memory_mb_for_params: 8.532264709472656
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 6477.111964083549
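The metric fields above are related by simple arithmetic. The plain-Python check below assumes that `approx_memory_mb_for_params` counts 4 bytes per float32 parameter in MiB, and that the validation split has 5,000 samples (consistent with its 79 batches of 64). Under those assumptions it reproduces the reported values from the raw counts and timings.

```python
import math

# Values copied from the GPU FP32 validation metrics above.
total_params = 2_236_682
bn_params, bias_params, weight_params = 34_112, 10, 2_202_560
avg_time_per_batch = 0.009933343151853386  # seconds
val_samples, batch_size = 5_000, 64  # 5,000 samples is an assumption

# The parameter breakdown adds up to the total.
print(bn_params + bias_params + weight_params == total_params)  # True

# float32 = 4 bytes per parameter, expressed in MiB (1024**2 bytes).
approx_memory_mb = total_params * 4 / 1024**2
print(f"{approx_memory_mb:.4f} MiB")  # 8.5323 MiB

# The last batch is partial, so throughput divides the sample count by the
# total time (average batch time x number of batches).
n_batches = math.ceil(val_samples / batch_size)
total_time = avg_time_per_batch * n_batches
samples_per_second = val_samples / total_time
print(n_batches, f"{samples_per_second:.2f} samples/s")  # 79 6371.58 samples/s

# Assumed 45,000 training / 10,000 test samples match the logged batch counts.
print(math.ceil(45_000 / 64), math.ceil(10_000 / 64))  # 704 157
```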



## CPU FP32

In [8]:
# Evaluate the adapted model on the validation and test sets on CPU, in FP32
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=32,  # Adjust batch size as needed
    device="cpu",
    use_amp=False,
    dtype=torch.float32
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=32,  # Adjust batch size as needed
    device="cpu",
    use_amp=False,
    dtype=torch.float32
)

2025-06-12 10:48:42,621 - nnopt.model.eval - INFO - Starting evaluation on device: cpu, dtype: torch.float32, batch size: 32
2025-06-12 10:48:42,662 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:02<00:00,  2.50it/s]
2025-06-12 10:48:44,761 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [01:05<00:00,  2.40it/s, acc=0.9248, cpu=46.6%, loss=0.5482, ram=9.6/30.9GB (41.7%), samples/s=163.4]
2025-06-12 10:49:50,152 - nnopt.model.eval - INFO - Starting evaluation on device: cpu, dtype: torch.float32, batch size: 32
2025-06-12 10:49:50,155 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2173, Accuracy: 0.9248
Throughput: 85.90 samples/sec | Avg Batch Time: 370.74 ms | Avg Sample Time: 11.64 ms
System Stats: CPU Usage: 15.10% | RAM Usage: 9.4/30.9GB (41.0%)


[Warmup]: 100%|██████████| 5/5 [00:02<00:00,  2.21it/s]
2025-06-12 10:49:52,525 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 313/313 [02:17<00:00,  2.27it/s, acc=0.9279, cpu=32.7%, loss=0.0346, ram=9.6/30.9GB (41.4%), samples/s=91.2]

Evaluation Complete: Avg Loss: 0.2091, Accuracy: 0.9279
Throughput: 80.13 samples/sec | Avg Batch Time: 398.73 ms | Avg Sample Time: 12.48 ms
System Stats: CPU Usage: 16.00% | RAM Usage: 9.3/30.9GB (40.7%)





In [10]:
# Print the val metrics
import yaml
print("- Validation Metrics:")
yaml_str = yaml.dump(val_metrics, default_flow_style=False)
print(yaml_str)

# Print the test metrics
print("- Test Metrics:")
yaml_str = yaml.dump(test_metrics, default_flow_style=False)
print(yaml_str)

- Validation Metrics:
accuracy: 0.9248
avg_loss: 0.21728209077119828
avg_time_per_batch: 0.37074374931211085
avg_time_per_sample: 0.011641353728400281
params_stats:
  approx_memory_mb_for_params: 8.532264709472656
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 85.90066270045527

- Test Metrics:
accuracy: 0.9279
avg_loss: 0.20908061562776564
avg_time_per_batch: 0.3987278120160723
avg_time_per_sample: 0.012480180516103064
params_stats:
  approx_memory_mb_for_params: 8.532264709472656
  bn_param_params: 34112
  float_bias_params: 10
  float_weight_params: 2202560
  int_weight_params: 0
  other_float_params: 0
  total_params: 2236682
samples_per_second: 80.1270461360482
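The GPU/CPU speed-up cited in the conclusions can be recomputed from the `samples_per_second` values above. The two runs used different batch sizes (64 on GPU, 32 on CPU), so the ratio is indicative rather than a strict like-for-like comparison.

```python
# samples_per_second values copied from the GPU FP32 and CPU FP32 runs above.
gpu_fp32 = {"val": 6371.584900769015, "test": 6477.111964083549}  # batch size 64
cpu_fp32 = {"val": 85.90066270045527, "test": 80.1270461360482}    # batch size 32

speedups = {split: gpu_fp32[split] / cpu_fp32[split] for split in gpu_fp32}
for split, ratio in speedups.items():
    print(f"{split}: GPU is {ratio:.1f}x faster than CPU")
```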



## Conclusions

* Test accuracy reaches ~92.8% on CIFAR-10 (~92.5% on validation) after only 5 head-training and 5 fine-tuning epochs, showing fast convergence from the pretrained weights.
* In FP32, GPU inference is roughly 74-81 times faster than CPU inference (~6,400 vs ~80-86 samples/s, with batch size 64 on GPU and 32 on CPU), as expected given the GPU's massively parallel architecture. Training was only run on GPU, so no CPU training comparison is available.
* Thus, to run the model on a CPU in embedded settings with high inference throughput and little to no accuracy loss, the model should be optimised for CPU execution, e.g. with pruning, quantization or knowledge distillation.
* Pruning and quantization are good candidates and are explored in the next notebooks; knowledge distillation is not, since MobileNetV2 is already a compact, efficient architecture.
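As a flavour of the pruning option mentioned above, the sketch below applies PyTorch's built-in magnitude (L1) unstructured pruning to a linear layer shaped like the adapted 1280→10 classification head and measures the resulting sparsity. It only illustrates the `torch.nn.utils.prune` API and is not the method used in the following notebooks; unstructured zeros only save memory or compute when the storage format or kernels exploit sparsity.

```python
import torch
from torch import nn
from torch.nn.utils import prune

torch.manual_seed(0)
head = nn.Linear(1280, 10)  # same shape as the adapted MobileNetV2 head

# Zero out the 50% of weights with the smallest absolute value.
prune.l1_unstructured(head, name="weight", amount=0.5)
# Pruning is applied through a mask (weight = weight_orig * weight_mask);
# prune.remove folds the mask into the weight tensor permanently.
prune.remove(head, "weight")

sparsity = (head.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2%}")
```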