# 1. Data & Model Loading

This notebook prepares the data and models used by the subsequent optimisation pipeline. It emulates the training and evaluation of an uncompressed model: the model is adapted to a specific dataset and then exported so that it can later be compressed for embedded deployment.

The process is as follows:
* A Torch dataset (already split into train and val) and a model are loaded. Both must be suited to classification tasks, but are otherwise agnostic of the modality.
* The model's classification head is adapted to the number of classes in the dataset, trained on the training set with the backbone frozen, and evaluated on the validation set.
* The whole model (backbone + classification head) is then fine-tuned on the training set with all layers unfrozen, using a lower learning rate.
* The adapted model is then exported as a Torch model for later use in the optimisation pipeline.
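
The two training phases (head only, then the full model) can be sketched in plain PyTorch as below. This is a minimal illustration, not the implementation of `adapt_model_head_to_dataset`: the `TinyClassifier` class and the `trainable_names` helper are hypothetical stand-ins so that the example runs without downloading pretrained weights, and the learning rates simply mirror the values used later in this notebook.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical toy stand-in for a pretrained network (backbone + head)."""

    def __init__(self, in_features: int = 16, hidden: int = 32, num_classes: int = 1000):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.backbone(x))


def trainable_names(model: nn.Module) -> list[str]:
    """Names of the parameters that will receive gradients."""
    return [name for name, p in model.named_parameters() if p.requires_grad]


model = TinyClassifier()

# Step 1: replace the head so its output size matches the dataset (10 classes).
model.classifier = nn.Linear(model.classifier.in_features, 10)

# Step 2: freeze the backbone and optimise only the new head.
for param in model.backbone.parameters():
    param.requires_grad = False
head_optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
head_phase_trainable = trainable_names(model)

# Step 3: unfreeze everything and fine-tune the whole model at a lower learning rate.
for param in model.parameters():
    param.requires_grad = True
fine_tune_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
full_phase_trainable = trainable_names(model)

print(head_phase_trainable)       # only the head's weight and bias
print(len(full_phase_trainable))  # backbone and head parameters together
```

Building a separate optimizer for each phase keeps the frozen parameters out of the first optimizer's state and lets the second phase use its own learning rate.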

Two models are exported:
* An image MobileNetV2 model with a classification head adapted to the CIFAR-10 dataset.
* An audio YAMNet model with a classification head adapted to the ESC-50 dataset.

## Setup

In [1]:
import torch
import torchvision

from nnopt.model.train import adapt_model_head_to_dataset
from nnopt.model.eval import eval_model
from nnopt.recipes.mobilenetv2_cifar10 import get_cifar10_datasets, save_mobilenetv2_cifar10_model, DEVICE, DTYPE

2025-06-10 13:40:00,420 - nnopt.model.utils - DEBUG - pynvml available: True


Using device: cuda


## MobileNetV2 and CIFAR-10 adaptation

In [2]:
mobilenetv2 = torchvision.models.mobilenet_v2(
    weights=torchvision.models.MobileNet_V2_Weights.DEFAULT
)
train_dataset, val_dataset, test_dataset = get_cifar10_datasets()

# Adapt the MobileNetV2 model to CIFAR-10 dataset
adapted_model = adapt_model_head_to_dataset(
    model=mobilenetv2,
    num_classes=10,  # CIFAR-10 has 10 classes
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    batch_size=64,  # Adjust batch size as needed
    head_train_epochs=5,  # Train head for fewer epochs
    fine_tune_epochs=3,  # Fine-tune for fewer epochs
    optimizer_cls=torch.optim.Adam,  # Use Adam optimizer
    head_train_lr=0.001,  # Learning rate for head training
    fine_tune_lr=0.0001,  # Learning rate for fine-tuning
    use_amp=True,  # Use mixed precision training
    device=DEVICE,
    dtype=DTYPE
)

2025-06-10 13:40:00,518 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing training and validation datasets...
2025-06-10 13:40:02,038 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Loading existing test dataset...
2025-06-10 13:40:02,348 - nnopt.model.train - INFO - Training head of the model with backbone frozen...
Epoch 1/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.31it/s, acc=0.4721, cpu=2.6%, gpu_mem=3.6/24.0GB (14.9%), gpu_util=33.0%, loss=1.7012, ram=7.8/30.9GB (28.9%), samples/s=359.4]  
Epoch 1/5 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.71it/s, acc=0.5356, cpu=0.0%, gpu_mem=3.6/24.0GB (14.9%), gpu_util=32.0%, loss=1.5813, ram=7.8/30.9GB (28.7%), samples/s=1344.8] 


Epoch 1/5, Train Loss: 1.5410, Train Acc: 0.4721, Train Throughput: 3821.88 samples/s | Val Loss: 1.3513, Val Acc: 0.5356, Val Throughput: 4664.13 samples/s | CPU Usage: 10.70% | RAM Usage: 7.6/30.9GB (28.1%) | GPU 0 Util: 32.00% | GPU 0 Mem: 3.6/24.0GB (14.9%)


Epoch 2/5 [Training]: 100%|██████████| 704/704 [00:35<00:00, 19.67it/s, acc=0.5216, cpu=2.9%, gpu_mem=3.6/24.0GB (14.8%), gpu_util=36.0%, loss=1.4824, ram=7.8/30.9GB (28.8%), samples/s=1069.4] 
Epoch 2/5 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.54it/s, acc=0.5452, cpu=3.8%, gpu_mem=3.5/24.0GB (14.7%), gpu_util=34.0%, loss=1.3460, ram=7.8/30.9GB (28.9%), samples/s=1402.5] 


Epoch 2/5, Train Loss: 1.3719, Train Acc: 0.5216, Train Throughput: 3756.08 samples/s | Val Loss: 1.3025, Val Acc: 0.5452, Val Throughput: 4264.46 samples/s | CPU Usage: 10.90% | RAM Usage: 7.6/30.9GB (28.2%) | GPU 0 Util: 23.00% | GPU 0 Mem: 3.5/24.0GB (14.7%)


Epoch 3/5 [Training]: 100%|██████████| 704/704 [00:35<00:00, 19.58it/s, acc=0.5294, cpu=3.0%, gpu_mem=3.7/24.0GB (15.2%), gpu_util=41.0%, loss=1.3764, ram=7.8/30.9GB (28.9%), samples/s=1005.6] 
Epoch 3/5 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.38it/s, acc=0.5464, cpu=3.7%, gpu_mem=3.6/24.0GB (15.2%), gpu_util=35.0%, loss=1.0885, ram=7.9/30.9GB (29.0%), samples/s=1399.5] 


Epoch 3/5, Train Loss: 1.3448, Train Acc: 0.5294, Train Throughput: 3800.47 samples/s | Val Loss: 1.2866, Val Acc: 0.5464, Val Throughput: 4990.68 samples/s | CPU Usage: 9.80% | RAM Usage: 7.7/30.9GB (28.3%) | GPU 0 Util: 35.00% | GPU 0 Mem: 3.6/24.0GB (15.2%)


Epoch 4/5 [Training]: 100%|██████████| 704/704 [00:35<00:00, 19.68it/s, acc=0.5327, cpu=0.0%, gpu_mem=3.7/24.0GB (15.3%), gpu_util=35.0%, loss=1.9961, ram=7.7/30.9GB (28.5%), samples/s=1093.6] 
Epoch 4/5 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.11it/s, acc=0.5494, cpu=10.0%, gpu_mem=3.6/24.0GB (15.2%), gpu_util=44.0%, loss=1.1079, ram=7.7/30.9GB (28.6%), samples/s=1290.6]


Epoch 4/5, Train Loss: 1.3329, Train Acc: 0.5327, Train Throughput: 3806.81 samples/s | Val Loss: 1.2799, Val Acc: 0.5494, Val Throughput: 4420.74 samples/s | CPU Usage: 20.10% | RAM Usage: 7.5/30.9GB (28.0%) | GPU 0 Util: 44.00% | GPU 0 Mem: 3.6/24.0GB (15.2%)


Epoch 5/5 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.51it/s, acc=0.5338, cpu=3.0%, gpu_mem=3.6/24.0GB (15.1%), gpu_util=39.0%, loss=2.2023, ram=7.7/30.9GB (28.6%), samples/s=1059.3] 
Epoch 5/5 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.32it/s, acc=0.5492, cpu=6.7%, gpu_mem=3.6/24.0GB (15.1%), gpu_util=40.0%, loss=0.9344, ram=7.7/30.9GB (28.6%), samples/s=1386.2] 
2025-06-10 13:43:23,896 - nnopt.model.train - INFO - Fine-tuning full model...


Epoch 5/5, Train Loss: 1.3258, Train Acc: 0.5338, Train Throughput: 3694.76 samples/s | Val Loss: 1.2649, Val Acc: 0.5492, Val Throughput: 4531.25 samples/s | CPU Usage: 9.90% | RAM Usage: 7.6/30.9GB (28.0%) | GPU 0 Util: 40.00% | GPU 0 Mem: 3.6/24.0GB (15.1%)


Epoch 1/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.54it/s, acc=0.6462, cpu=3.1%, gpu_mem=6.1/24.0GB (25.4%), gpu_util=65.0%, loss=2.4696, ram=7.8/30.9GB (28.9%), samples/s=165.5]  
Epoch 1/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 18.28it/s, acc=0.7222, cpu=6.7%, gpu_mem=6.1/24.0GB (25.6%), gpu_util=35.0%, loss=0.7444, ram=7.9/30.9GB (29.1%), samples/s=1192.1]  


Epoch 1/3, Train Loss: 1.0083, Train Acc: 0.6462, Train Throughput: 1915.63 samples/s | Val Loss: 0.8022, Val Acc: 0.7222, Val Throughput: 6033.46 samples/s | CPU Usage: 10.50% | RAM Usage: 7.6/30.9GB (28.2%) | GPU 0 Util: 35.00% | GPU 0 Mem: 6.1/24.0GB (25.6%)


Epoch 2/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.33it/s, acc=0.7217, cpu=5.0%, gpu_mem=6.1/24.0GB (25.4%), gpu_util=62.0%, loss=0.9187, ram=7.9/30.9GB (29.1%), samples/s=489.4]  
Epoch 2/3 [Validation]: 100%|██████████| 79/79 [00:03<00:00, 20.33it/s, acc=0.7482, cpu=0.0%, gpu_mem=6.1/24.0GB (25.4%), gpu_util=33.0%, loss=1.7175, ram=7.9/30.9GB (29.0%), samples/s=1331.9]  


Epoch 2/3, Train Loss: 0.7934, Train Acc: 0.7217, Train Throughput: 1886.41 samples/s | Val Loss: 0.7057, Val Acc: 0.7482, Val Throughput: 6457.33 samples/s | CPU Usage: 9.80% | RAM Usage: 7.7/30.9GB (28.4%) | GPU 0 Util: 33.00% | GPU 0 Mem: 6.1/24.0GB (25.4%)


Epoch 3/3 [Training]: 100%|██████████| 704/704 [00:36<00:00, 19.45it/s, acc=0.7572, cpu=3.1%, gpu_mem=6.1/24.0GB (25.3%), gpu_util=65.0%, loss=0.2487, ram=7.8/30.9GB (28.7%), samples/s=473.5]  
Epoch 3/3 [Validation]: 100%|██████████| 79/79 [00:04<00:00, 17.86it/s, acc=0.7710, cpu=3.8%, gpu_mem=6.1/24.0GB (25.4%), gpu_util=30.0%, loss=0.9391, ram=7.8/30.9GB (28.9%), samples/s=1419.8]  

Epoch 3/3, Train Loss: 0.6957, Train Acc: 0.7572, Train Throughput: 1902.18 samples/s | Val Loss: 0.6446, Val Acc: 0.7710, Val Throughput: 6689.27 samples/s | CPU Usage: 11.70% | RAM Usage: 7.5/30.9GB (28.0%) | GPU 0 Util: 30.00% | GPU 0 Mem: 6.1/24.0GB (25.4%)
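
As a sanity check on the progress bars above, the batch counts can be reproduced from the batch size. The sketch below assumes a 45,000 / 5,000 train/validation split of the 50,000 CIFAR-10 training images and that the last, partial batch is kept; neither is stated in this notebook, but both are consistent with the 704 and 79 batches logged per epoch. The 10,000-image test set likewise matches the 157 batches shown in the evaluation output further down.

```python
import math

BATCH_SIZE = 64  # batch_size used in the cells of this notebook


def num_batches(num_samples: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of batches per epoch when the final partial batch is kept."""
    return math.ceil(num_samples / batch_size)


# Assumed split sizes: the train/val figures are inferred from the logs,
# the test figure is the standard CIFAR-10 test set size.
assumed_split_sizes = {"train": 45_000, "val": 5_000, "test": 10_000}

batches = {name: num_batches(size) for name, size in assumed_split_sizes.items()}
print(batches)  # {'train': 704, 'val': 79, 'test': 157}
```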





In [3]:
# Evaluate the adapted model on the held-out test set
test_accuracy = eval_model(
    model=adapted_model,
    test_dataset=test_dataset,
    batch_size=64,  # Adjust batch size as needed
    device=DEVICE,
    use_amp=True,
    dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
)
print(f"Test accuracy of the adapted MobileNetV2 on CIFAR-10: {test_accuracy:.2f}")

2025-06-10 13:45:25,213 - nnopt.model.eval - INFO - Starting warmup for 5 batches...
[Warmup]: 100%|██████████| 5/5 [00:00<00:00, 13.61it/s]
2025-06-10 13:45:25,671 - nnopt.model.eval - INFO - Warmup complete.
[Evaluation]: 100%|██████████| 157/157 [00:03<00:00, 40.25it/s, acc=0.9013, cpu=3.6%, gpu_mem=6.1/24.0GB (25.4%), gpu_util=36.0%, loss=0.1680, ram=7.8/30.9GB (28.7%), samples/s=691.0]  

Evaluation Complete: Avg Loss: 0.2823, Accuracy: 0.9013
Throughput: 7871.93 samples/sec | Avg Batch Time: 8.09 ms | Avg Sample Time: 0.13 ms
System Stats: CPU Usage: 12.00% | RAM Usage: 7.5/30.9GB (28.0%) | GPU 0 Util: 36.00% | GPU 0 Mem: 6.1/24.0GB (25.4%)
Test accuracy of the adapted MobileNetV2 on CIFAR-10: 0.90
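
The throughput figures reported above can be cross-checked against each other. The sketch below assumes that throughput is the number of samples divided by the summed batch time, which is a guess about how `eval_model` computes it rather than something this notebook states; the input numbers are copied from the log output.

```python
# Values copied from the evaluation log above.
num_samples = 10_000            # CIFAR-10 test set size
num_batches = 157               # batches shown in the progress bar
avg_batch_time_s = 8.09e-3      # "Avg Batch Time: 8.09 ms"
reported_throughput = 7871.93   # "Throughput: 7871.93 samples/sec"

# The last batch is partial, so the average batch holds slightly fewer than 64 samples.
avg_samples_per_batch = num_samples / num_batches

# Throughput implied by the average batch time.
throughput_from_batch_time = avg_samples_per_batch / avg_batch_time_s

# Per-sample latency implied by the reported throughput.
avg_sample_time_ms = 1000 / reported_throughput

print(f"{throughput_from_batch_time:.0f} samples/s")  # 7873 samples/s
print(f"{avg_sample_time_ms:.2f} ms")                 # 0.13 ms
```

Both derived values agree with the logged ones to within the rounding of the 8.09 ms batch time.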





In [4]:
# Export the adapted model
save_mobilenetv2_cifar10_model(
    model=adapted_model,
    version="baseline"
)

2025-06-10 13:45:29,623 - nnopt.recipes.mobilenetv2_cifar10 - INFO - Model saved to ../models/baseline/mobilenetv2_cifar10.pt
