# 1. Data & Model Loading

This notebook prepares the datasets and models used by the subsequent optimisation pipeline. It emulates the training and evaluation of an uncompressed model: the model is adapted to a specific dataset, then exported for later compression and embedded deployment.

The process is as follows:
* A Torch dataset (already split into training and validation sets) and a model are loaded. Both must target classification tasks, but they are agnostic of the modality.
* The model's classification head is adapted to the number of classes in the dataset, trained on the training set with the backbone frozen, and evaluated on the validation set.
* The whole model (backbone + classification head) is then fine-tuned on the training set with all layers unfrozen, using a lower learning rate.
* The adapted model is then exported as a Torch model for later use in the optimisation pipeline.
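
The two training stages above can be sketched in plain PyTorch. This is a minimal illustration, not the implementation of `adapt_model_head_to_dataset`: `TinyNet`, `replace_head`, `set_backbone_trainable` and `count_trainable` are hypothetical names, and the tiny network only mimics the `features`/`classifier` layout of torchvision's MobileNetV2 so that the example runs without downloading weights.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained net with a backbone and a head."""

    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Sequential(nn.Dropout(0.2), nn.Linear(8, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def replace_head(model: TinyNet, num_classes: int) -> None:
    """Swap the final linear layer for one sized to the target dataset."""
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, num_classes)


def set_backbone_trainable(model: TinyNet, trainable: bool) -> None:
    """Freeze (False) or unfreeze (True) every backbone parameter."""
    for param in model.features.parameters():
        param.requires_grad = trainable


def count_trainable(model: nn.Module) -> int:
    """Number of scalar parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = TinyNet()
replace_head(model, num_classes=10)

# Stage 1: backbone frozen, only the new head is handed to the optimiser.
set_backbone_trainable(model, False)
head_optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
head_only_params = count_trainable(model)

# Stage 2: everything unfrozen, fine-tuned with a lower learning rate.
set_backbone_trainable(model, True)
full_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
all_params = count_trainable(model)

print(f"trainable in stage 1: {head_only_params}, in stage 2: {all_params}")
```

Building a fresh optimiser for the second stage keeps the frozen parameters out of stage 1 entirely and lets each stage use its own learning rate.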

Two models are exported:
* An image MobileNetV2 model with a classification head adapted to the CIFAR-10 dataset.
* An audio YAMNet model with a classification head adapted to the ESC-50 dataset.

## Setup

In [1]:
import torch
import torchvision

from nnopt.model.train import adapt_model_head_to_dataset
from nnopt.model.eval import eval_model
from nnopt.model.const import DEVICE, DTYPE
from nnopt.recipes.mobilenetv2_cifar10 import get_cifar10_datasets, save_mobilenetv2_cifar10_model

## MobileNetV2 and CIFAR-10 adaptation

In [2]:
mobilenetv2 = torchvision.models.mobilenet_v2(
    weights=torchvision.models.MobileNet_V2_Weights.DEFAULT
)
cifar10_train_dataset, cifar10_val_dataset, cifar10_test_dataset = get_cifar10_datasets()

# Adapt the MobileNetV2 model to CIFAR-10 dataset
mobilenetv2_cifar10_baseline = adapt_model_head_to_dataset(
    model=mobilenetv2,
    num_classes=10,  # CIFAR-10 has 10 classes
    train_dataset=cifar10_train_dataset,
    val_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    head_train_epochs=5,  # Epochs of head-only training (backbone frozen)
    fine_tune_epochs=3,  # Epochs of full-model fine-tuning
    optimizer_cls=torch.optim.Adam,  # Use Adam optimizer
    head_train_lr=0.001,  # Learning rate for head training
    fine_tune_lr=0.0001,  # Learning rate for fine-tuning
    use_amp=True,  # Use mixed precision training
    device=DEVICE,
    dtype=DTYPE
)

2025-06-11 05:08:30,320 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Training and/or validation dataset does not exist, creating, splitting and saving...
100%|██████████| 170M/170M [00:34<00:00, 4.89MB/s] 
2025-06-11 05:09:11,699 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Test dataset does not exist, creating and saving...
100%|██████████| 170M/170M [00:25<00:00, 6.81MB/s] 
2025-06-11 05:09:39,667 - nnopt.model.train - INFO - Training head of the model with backbone frozen...
Epoch 1/5 [Training]: 100%|██████████| 704/704 [00:17<00:00, 40.39it/s, acc=0.6562, cpu=2.4%, gpu_mem=2.6/24.0GB (11.0%), gpu_util=49.0%, loss=0.8481, ram=6.1/30.9GB (23.0%), samples/s=328.6]  
Epoch 1/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 41.97it/s, acc=0.7198, cpu=7.4%, gpu_mem=2.7/24.0GB (11.1%), gpu_util=42.0%, loss=1.1183, ram=6.1/30.9GB (23.0%), samples/s=1455.2]  


Epoch 1/5, Train Loss: 1.0699, Train Acc: 0.6562, Train Throughput: 5415.28 samples/s | Val Loss: 0.8165, Val Acc: 0.7198, Val Throughput: 9107.78 samples/s | CPU Usage: 10.60% | RAM Usage: 5.8/30.9GB (22.1%) | GPU 0 Util: 42.00% | GPU 0 Mem: 2.7/24.0GB (11.1%)


Epoch 2/5 [Training]: 100%|██████████| 704/704 [00:15<00:00, 44.71it/s, acc=0.7112, cpu=0.0%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=49.0%, loss=1.4019, ram=6.2/30.9GB (23.4%), samples/s=1137.3] 
Epoch 2/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 41.95it/s, acc=0.7378, cpu=3.7%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=38.0%, loss=1.0328, ram=6.1/30.9GB (23.2%), samples/s=1472.4]  


Epoch 2/5, Train Loss: 0.8397, Train Acc: 0.7112, Train Throughput: 6689.78 samples/s | Val Loss: 0.7575, Val Acc: 0.7378, Val Throughput: 9478.90 samples/s | CPU Usage: 10.20% | RAM Usage: 5.9/30.9GB (22.5%) | GPU 0 Util: 39.00% | GPU 0 Mem: 2.6/24.0GB (10.9%)


Epoch 3/5 [Training]: 100%|██████████| 704/704 [00:15<00:00, 45.21it/s, acc=0.7219, cpu=3.6%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=45.0%, loss=0.8321, ram=6.1/30.9GB (23.1%), samples/s=1134.4] 
Epoch 3/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.22it/s, acc=0.7502, cpu=7.7%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=33.0%, loss=1.0484, ram=6.0/30.9GB (22.8%), samples/s=1510.1]  


Epoch 3/5, Train Loss: 0.8010, Train Acc: 0.7219, Train Throughput: 6876.91 samples/s | Val Loss: 0.7229, Val Acc: 0.7502, Val Throughput: 9652.09 samples/s | CPU Usage: 11.80% | RAM Usage: 5.8/30.9GB (22.2%) | GPU 0 Util: 33.00% | GPU 0 Mem: 2.6/24.0GB (10.9%)


Epoch 4/5 [Training]: 100%|██████████| 704/704 [00:15<00:00, 44.55it/s, acc=0.7314, cpu=3.1%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=50.0%, loss=0.4964, ram=6.2/30.9GB (23.5%), samples/s=1051.7] 
Epoch 4/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.32it/s, acc=0.7502, cpu=4.0%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=33.0%, loss=1.0616, ram=6.2/30.9GB (23.5%), samples/s=1477.7]  


Epoch 4/5, Train Loss: 0.7742, Train Acc: 0.7314, Train Throughput: 6561.94 samples/s | Val Loss: 0.7201, Val Acc: 0.7502, Val Throughput: 9538.85 samples/s | CPU Usage: 10.00% | RAM Usage: 6.0/30.9GB (22.8%) | GPU 0 Util: 33.00% | GPU 0 Mem: 2.6/24.0GB (10.9%)


Epoch 5/5 [Training]: 100%|██████████| 704/704 [00:15<00:00, 45.16it/s, acc=0.7322, cpu=3.6%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=44.0%, loss=0.6791, ram=6.2/30.9GB (23.4%), samples/s=1191.6] 
Epoch 5/5 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.83it/s, acc=0.7446, cpu=7.4%, gpu_mem=2.6/24.0GB (10.9%), gpu_util=35.0%, loss=1.1355, ram=6.2/30.9GB (23.3%), samples/s=1487.6]  
2025-06-11 05:11:09,187 - nnopt.model.train - INFO - Fine-tuning full model...


Epoch 5/5, Train Loss: 0.7683, Train Acc: 0.7322, Train Throughput: 6853.77 samples/s | Val Loss: 0.7251, Val Acc: 0.7446, Val Throughput: 9733.20 samples/s | CPU Usage: 11.50% | RAM Usage: 6.0/30.9GB (22.7%) | GPU 0 Util: 35.00% | GPU 0 Mem: 2.6/24.0GB (10.9%)


Epoch 1/3 [Training]: 100%|██████████| 704/704 [00:25<00:00, 27.90it/s, acc=0.8413, cpu=2.5%, gpu_mem=5.1/24.0GB (21.2%), gpu_util=87.0%, loss=1.1297, ram=6.2/30.9GB (23.4%), samples/s=162.4]  
Epoch 1/3 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.17it/s, acc=0.8972, cpu=0.0%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=37.0%, loss=0.3177, ram=6.3/30.9GB (23.7%), samples/s=1513.6]  


Epoch 1/3, Train Loss: 0.4587, Train Acc: 0.8413, Train Throughput: 2048.16 samples/s | Val Loss: 0.3037, Val Acc: 0.8972, Val Throughput: 9146.27 samples/s | CPU Usage: 11.20% | RAM Usage: 6.0/30.9GB (22.8%) | GPU 0 Util: 37.00% | GPU 0 Mem: 5.1/24.0GB (21.3%)


Epoch 2/3 [Training]: 100%|██████████| 704/704 [00:25<00:00, 27.94it/s, acc=0.9237, cpu=3.6%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=86.0%, loss=0.3928, ram=6.2/30.9GB (23.5%), samples/s=534.7]  
Epoch 2/3 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 42.14it/s, acc=0.9166, cpu=2.9%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=40.0%, loss=0.2739, ram=6.3/30.9GB (23.7%), samples/s=1394.2] 


Epoch 2/3, Train Loss: 0.2214, Train Acc: 0.9237, Train Throughput: 2059.00 samples/s | Val Loss: 0.2463, Val Acc: 0.9166, Val Throughput: 8912.47 samples/s | CPU Usage: 10.10% | RAM Usage: 6.0/30.9GB (22.8%) | GPU 0 Util: 40.00% | GPU 0 Mem: 5.1/24.0GB (21.3%)


Epoch 3/3 [Training]: 100%|██████████| 704/704 [00:25<00:00, 27.27it/s, acc=0.9589, cpu=5.0%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=88.0%, loss=2.3772, ram=6.4/30.9GB (23.9%), samples/s=496.6]  
Epoch 3/3 [Validation]: 100%|██████████| 79/79 [00:01<00:00, 41.99it/s, acc=0.9144, cpu=4.0%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=39.0%, loss=0.1405, ram=6.4/30.9GB (23.9%), samples/s=1371.5]  

Epoch 3/3, Train Loss: 0.1221, Train Acc: 0.9589, Train Throughput: 2011.36 samples/s | Val Loss: 0.2498, Val Acc: 0.9144, Val Throughput: 8856.77 samples/s | CPU Usage: 10.30% | RAM Usage: 6.2/30.9GB (23.3%) | GPU 0 Util: 39.00% | GPU 0 Mem: 5.1/24.0GB (21.3%)
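
As a sanity check on the progress bars above, the batch counts per epoch can be reproduced from the dataset sizes. CIFAR-10 ships 50,000 training images. The 45,000/5,000 train/validation split used below is an assumption inferred from the logged batch counts, not something the notebook states, and the sketch also assumes the final partial batch is kept.

```python
import math

BATCH_SIZE = 64  # same value as passed to adapt_model_head_to_dataset

# Assumed split of the 50,000 CIFAR-10 training images (inferred, not documented).
split_sizes = {"train": 45_000, "val": 5_000}

# Batches per epoch as shown by the progress bars in the training log.
logged_batches = {"train": 704, "val": 79}

# With the last partial batch kept, an epoch has ceil(n_samples / batch_size) batches.
batches = {name: math.ceil(n / BATCH_SIZE) for name, n in split_sizes.items()}

for name, n_batches in batches.items():
    print(f"{name}: {split_sizes[name]} samples -> {n_batches} batches "
          f"(logged: {logged_batches[name]})")
```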





In [3]:
# Evaluate the adapted model on the validation and test sets
val_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_val_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=True,
    dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
)

test_metrics = eval_model(
    model=mobilenetv2_cifar10_baseline,
    test_dataset=cifar10_test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=True,
    dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
)
print(f"Validation accuracy of the adapted MobileNetV2 on CIFAR-10: {val_metrics['accuracy']:.2f}")
print(f"Test accuracy of the adapted MobileNetV2 on CIFAR-10: {test_metrics['accuracy']:.2f}")

2025-06-11 05:12:31,094 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.bfloat16, batch size: 64
2025-06-11 05:12:31,098 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 13.05it/s]
2025-06-11 05:12:31,550 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 79/79 [00:01<00:00, 41.62it/s, acc=0.9144, cpu=8.0%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=41.0%, loss=0.1405, ram=6.4/30.9GB (24.1%), samples/s=1455.3]  
2025-06-11 05:12:33,505 - nnopt.model.eval - INFO - Starting evaluation on device: cuda, dtype: torch.bfloat16, batch size: 64
2025-06-11 05:12:33,508 - nnopt.model.eval - INFO - Starting warmup for 5 batches...


Evaluation Complete: Avg Loss: 0.2498, Accuracy: 0.9144
Throughput: 8953.20 samples/sec | Avg Batch Time: 7.07 ms | Avg Sample Time: 0.11 ms
System Stats: CPU Usage: 11.90% | RAM Usage: 6.2/30.9GB (23.3%) | GPU 0 Util: 41.00% | GPU 0 Mem: 5.1/24.0GB (21.3%)


[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 14.61it/s]
2025-06-11 05:12:33,912 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 43.42it/s, acc=0.9151, cpu=3.5%, gpu_mem=5.1/24.0GB (21.3%), gpu_util=38.0%, loss=0.0830, ram=6.4/30.9GB (23.9%), samples/s=660.7]   


Evaluation Complete: Avg Loss: 0.2575, Accuracy: 0.9151
Throughput: 9123.23 samples/sec | Avg Batch Time: 6.98 ms | Avg Sample Time: 0.11 ms
System Stats: CPU Usage: 13.50% | RAM Usage: 6.1/30.9GB (23.2%) | GPU 0 Util: 32.00% | GPU 0 Mem: 5.1/24.0GB (21.3%)
Validation accuracy of the adapted MobileNetV2 on CIFAR-10: 0.91
Test accuracy of the adapted MobileNetV2 on CIFAR-10: 0.92
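
The throughput and timing figures in the evaluation summaries are consistent with each other once the smaller final batch is taken into account. The sketch below recomputes the average batch and sample times from the logged throughput. The 10,000-image test set is the standard CIFAR-10 test split, while the 5,000-image validation set is an assumption inferred from the 79 logged batches. `avg_batch_time_ms` and `avg_sample_time_ms` are illustrative helpers, not part of `nnopt`.

```python
def avg_batch_time_ms(n_samples: int, n_batches: int, throughput: float) -> float:
    """Average time per batch (ms) implied by a throughput in samples/s."""
    avg_batch_size = n_samples / n_batches  # below 64 because the last batch is partial
    return 1000.0 * avg_batch_size / throughput


def avg_sample_time_ms(throughput: float) -> float:
    """Average time per sample (ms) implied by a throughput in samples/s."""
    return 1000.0 / throughput


# Values taken from the two "Evaluation Complete" summaries.
val_ms = avg_batch_time_ms(n_samples=5_000, n_batches=79, throughput=8953.20)
test_ms = avg_batch_time_ms(n_samples=10_000, n_batches=157, throughput=9123.23)

print(f"validation: {val_ms:.2f} ms/batch, {avg_sample_time_ms(8953.20):.2f} ms/sample")
print(f"test:       {test_ms:.2f} ms/batch, {avg_sample_time_ms(9123.23):.2f} ms/sample")
```

Under these assumptions the recomputed values match the logged 7.07 ms and 6.98 ms per batch and 0.11 ms per sample.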


In [4]:
# Export the adapted model
save_mobilenetv2_cifar10_model(
    model=mobilenetv2_cifar10_baseline,
    metrics_values={
        "val_metrics": val_metrics,
        "test_metrics": test_metrics,
    },
    version="mobilenetv2_cifar10/baseline",
)

2025-06-11 05:12:37,596 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Metadata saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/baseline/metadata.json
2025-06-11 05:12:37,597 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Model saved to /home/pbeuran/repos/nnopt/models/mobilenetv2_cifar10/baseline/model.pt
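
The log lines above show that the export writes a `model.pt` file and a `metadata.json` file into a versioned directory. A minimal sketch of that layout follows. It is not the implementation of `save_mobilenetv2_cifar10_model`: the helper name `save_model_with_metadata`, the choice to store a `state_dict`, and the JSON structure are all assumptions made for illustration, and a small linear layer stands in for the real model.

```python
import json
import tempfile
from pathlib import Path

import torch
from torch import nn


def save_model_with_metadata(model: nn.Module, metrics: dict, out_dir: Path) -> None:
    """Hypothetical helper: write weights and metrics side by side."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Saving the state_dict is an assumption; the real recipe may pickle the whole model.
    torch.save(model.state_dict(), out_dir / "model.pt")
    with open(out_dir / "metadata.json", "w") as f:
        json.dump({"metrics_values": metrics}, f, indent=2)


demo_model = nn.Linear(4, 2)  # stand-in for the adapted MobileNetV2
demo_metrics = {
    "val_metrics": {"accuracy": 0.9144},   # accuracy values from the logs above
    "test_metrics": {"accuracy": 0.9151},
}

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "mobilenetv2_cifar10" / "baseline"
    save_model_with_metadata(demo_model, demo_metrics, out_dir)

    # Reload both artefacts to confirm the round trip.
    reloaded = nn.Linear(4, 2)
    reloaded.load_state_dict(torch.load(out_dir / "model.pt", map_location="cpu"))
    restored_metadata = json.loads((out_dir / "metadata.json").read_text())
    saved_files = sorted(p.name for p in out_dir.iterdir())

print(saved_files)
```

Keeping the metrics next to the weights lets later pipeline stages compare a compressed model against this baseline without re-running the evaluation.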
