## Searching for architectures with latency strictly lower than a specified value

This notebook shows how to restrict the search to architectures whose latency is strictly lower than a specified value.

### Main chapters of this notebook:
1. Set up the environment
1. Prepare dataset and create dataloaders
1. Build search space
1. Test constrained search procedure

## Set up the environment

First, let's set up the environment and make some common imports.

In [None]:
import os

os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'
# If needed, uncomment this line and set it to the index of a free GPU
# os.environ['CUDA_VISIBLE_DEVICES'] = '0'

In [None]:
import sys

sys.path.append('../')

from pathlib import Path

import torch
import torch.nn as nn
import numpy as np
from torch_optimizer import RAdam
from torchvision.models.mobilenet import mobilenet_v2

from enot.autogeneration import TransformationParameters
from enot.autogeneration import generate_pruned_search_variants_model
from enot.latency import current_latency
from enot.latency import initialize_latency
from enot.latency import min_latency
from enot.latency import max_latency
from enot.latency import mean_latency
from enot.models import SearchSpaceModel
from enot.optimize import FixedLatencySearchOptimizer

from tutorial_utils.checkpoints import download_autogen_pretrain_checkpoint
from tutorial_utils.dataset import create_imagenette_dataloaders
from tutorial_utils.phases import tutorial_train_loop
from tutorial_utils.train import accuracy

## Prepare dataset and create dataloaders

In [None]:
HOME_DIR = Path.home() / '.optimization_experiments'
DATASETS_DIR = HOME_DIR / 'datasets'
PROJECT_DIR = HOME_DIR / 'search_with_the_specified_latency'

HOME_DIR.mkdir(exist_ok=True)
DATASETS_DIR.mkdir(exist_ok=True)
PROJECT_DIR.mkdir(exist_ok=True)

In [None]:
dataloaders = create_imagenette_dataloaders(
    dataset_root_dir=DATASETS_DIR,
    project_dir=PROJECT_DIR,
    input_size=(224, 224),
    batch_size=32,
)

## Build search space

In [None]:
my_model = mobilenet_v2(pretrained=True)

classifier = my_model.classifier[1]
my_model.classifier = nn.Linear(
    in_features=classifier.in_features,
    out_features=10,
    bias=True,
)
my_model.eval()

first_block = my_model.features[0]  # First MobileNet block in model.
generated_model = generate_pruned_search_variants_model(
    my_model,
    search_variant_descriptors=(
        TransformationParameters(width_mult=1.0),
        TransformationParameters(width_mult=0.75),
        TransformationParameters(width_mult=0.5),
        TransformationParameters(width_mult=0.25),
        TransformationParameters(width_mult=0.0),
    ),
    excluded_modules=[first_block],  # Leave first MobileNet block unchanged.
)
# Wrap the generated model in a search space and move it to the GPU
search_space = SearchSpaceModel(generated_model).cuda()

#### In this example, we use a <span style="color:red">pre-trained</span> search space from <span style="color:green;white-space:nowrap">***2. Tutorial - search space autogeneration***</span>. A detailed description of the pretraining procedure can be found in <span style="color:green;white-space:nowrap">***1. Tutorial - getting started***</span>.

In [None]:
checkpoint_path = PROJECT_DIR / 'autogen_pretrain_checkpoint.pth'
download_autogen_pretrain_checkpoint(checkpoint_path)
search_space.load_state_dict(
    torch.load(checkpoint_path)['model'],
)

## Test constrained search procedure

In this tutorial, we use the same search loop as in <span style="color:green;white-space:nowrap">***1. Tutorial - getting started***</span>.

To use latency optimization:

1. Initialize the search space latency with the `initialize_latency` function.
1. Use `FixedLatencySearchOptimizer` instead of `SearchOptimizer`.
1. Pass the `max_latency_value` parameter to the ENOT optimizer constructor. This value limits the maximal latency of the models generated during the search.
1. **Apply the `modify_loss` method** of `FixedLatencySearchOptimizer` to your target loss. An example is given in the search loop below.
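The bound from step 3 can be made concrete with a small, self-contained sketch. It assumes, as the `constant_latency` attribute used below suggests, that an architecture's latency is a constant part plus one latency value per searchable block (one entry per `width_mult` variant), which fits an additive metric such as MMACs. Under that assumption, it checks whether an architecture satisfies the bound and draws architectures by rejection sampling, so only candidates that satisfy the bound are returned. All names and numbers are hypothetical; this is not how ENOT implements `FixedLatencySearchOptimizer`.

```python
import random
from typing import List, Sequence

# Hypothetical latency table: block_latencies[b][i] is the latency of variant i
# (e.g. one width_mult option) in searchable block b. Assumes latency is additive.
BlockLatencies = Sequence[Sequence[float]]


def architecture_latency(constant: float, block_latencies: BlockLatencies, arch: Sequence[int]) -> float:
    """Constant part plus the latency of the chosen variant in every block."""
    return constant + sum(latencies[i] for latencies, i in zip(block_latencies, arch))


def satisfies_constraint(latency: float, max_latency_value: float, strict: bool = True) -> bool:
    """strict=True means 'strictly lower than'; strict=False also allows equality."""
    return latency < max_latency_value if strict else latency <= max_latency_value


def sample_constrained_architecture(
    constant: float,
    block_latencies: BlockLatencies,
    probabilities: Sequence[Sequence[float]],
    max_latency_value: float,
    rng: random.Random,
    max_tries: int = 1000,
) -> List[int]:
    """Rejection sampling: draw one variant per block from its probabilities and
    return the first architecture whose latency satisfies the bound."""
    for _ in range(max_tries):
        arch = [rng.choices(range(len(p)), weights=p)[0] for p in probabilities]
        latency = architecture_latency(constant, block_latencies, arch)
        if satisfies_constraint(latency, max_latency_value):
            return arch
    raise RuntimeError('No architecture satisfied the bound; it may be below the minimal latency.')


if __name__ == '__main__':
    # Illustrative numbers only: two blocks with three variants each.
    table = [[40.0, 30.0, 20.0], [60.0, 45.0, 30.0]]
    probs = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
    arch = sample_constrained_architecture(10.0, table, probs, 90.0, random.Random(0))
    print(arch, architecture_latency(10.0, table, arch))
```

Rejection sampling is used here only because it is the simplest way to guarantee that every returned candidate respects the bound; it becomes slow when the bound is close to the minimal latency.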

#### Check the minimal and maximal latency of the search space to select a reasonable latency constraint

In [None]:
latency_type = 'mmac.thop'

sample_inputs = torch.zeros((1, 3, 224, 224)).cuda()
latency_container = initialize_latency(latency_type, search_space, (sample_inputs,))

print(f'Constant latency = {latency_container.constant_latency:.1f}')
print(
    f'Min, mean and max latencies of search space: '
    f'{min_latency(latency_container):.1f}, '
    f'{mean_latency(latency_container):.1f}, '
    f'{max_latency(latency_container):.1f}'
)
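Before picking `max_latency_value`, it helps to check where it falls relative to the range printed above. The sketch below keeps an additive-latency assumption (the hypothetical helper `latency_bounds` adds the constant part to the per-block minima or maxima) and classifies a strictly-lower-than bound: at or below the minimal latency no architecture qualifies, and above the maximal latency the constraint never binds. In the notebook itself, you would pass `min_latency(latency_container)` and `max_latency(latency_container)` to the hypothetical `check_latency_constraint` instead of computing the bounds yourself.

```python
from typing import Sequence, Tuple


def latency_bounds(constant: float, block_latencies: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Min/max architecture latency, assuming latency is additive over blocks."""
    lowest = constant + sum(min(latencies) for latencies in block_latencies)
    highest = constant + sum(max(latencies) for latencies in block_latencies)
    return lowest, highest


def check_latency_constraint(max_latency_value: float, min_lat: float, max_lat: float) -> str:
    """Classify a 'strictly lower than max_latency_value' bound against [min_lat, max_lat]."""
    if max_latency_value <= min_lat:
        return 'infeasible'  # even the smallest architecture is not strictly below the bound
    if max_latency_value > max_lat:
        return 'inactive'  # even the largest architecture already satisfies the bound
    return 'ok'  # the bound excludes some architectures but not all


if __name__ == '__main__':
    # Illustrative numbers only, not the values of the MobileNetV2 search space.
    low, high = latency_bounds(10.0, [[40.0, 30.0, 20.0], [60.0, 45.0, 30.0]])
    for bound in (50.0, 90.0, 140.0):
        print(bound, check_latency_constraint(bound, low, high))
```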

#### Run the constrained search process

In [None]:
N_EPOCHS = 30

max_latency_value = 140.0  # 140.0 lies within the latency range of the search space (see the min/max values above)

# Optimizing `search_space.architecture_parameters()`.
optimizer = RAdam(search_space.architecture_parameters(), lr=0.02)

# Use `FixedLatencySearchOptimizer` instead of the default `SearchOptimizer`.
search_optimizer = FixedLatencySearchOptimizer(
    search_space,
    optimizer,
    max_latency_value=max_latency_value,
)

metric_function = accuracy
loss_function = nn.CrossEntropyLoss().cuda()

train_loader = dataloaders['search_train_dataloader']
validation_loader = dataloaders['search_validation_dataloader']

for epoch in range(N_EPOCHS):
    print(f'EPOCH #{epoch}')

    search_space.train()
    train_metrics_acc = {
        'loss': 0.0,
        'accuracy': 0.0,
        'n': 0,
    }
    for inputs, labels in train_loader:
        search_optimizer.zero_grad()

        # Wrap the forward pass, loss computation and backward pass in a closure.
        def closure():
            pred_labels = search_space(inputs)
            batch_loss = loss_function(pred_labels, labels)

            # Apply loss modification function.
            modified_loss = search_optimizer.modify_loss(batch_loss)

            modified_loss.backward()
            batch_metric = metric_function(pred_labels, labels)

            train_metrics_acc['loss'] += batch_loss.item()
            train_metrics_acc['accuracy'] += batch_metric.item()
            train_metrics_acc['n'] += 1

        search_optimizer.step(closure)

    train_loss = train_metrics_acc['loss'] / train_metrics_acc['n']
    train_accuracy = train_metrics_acc['accuracy'] / train_metrics_acc['n']
    arch_probabilities = np.array(search_space.architecture_probabilities)

    print('train metrics:')
    print('  loss:', train_loss)
    print('  accuracy:', train_accuracy)
    print('  arch_probabilities:')
    print(arch_probabilities)

    search_space.eval()

    # Selecting the best architecture for validation.
    search_optimizer.prepare_validation_model()

    validation_loss = 0
    validation_accuracy = 0
    with torch.no_grad():
        for inputs, labels in validation_loader:
            pred_labels = search_space(inputs)
            batch_loss = loss_function(pred_labels, labels)
            batch_metric = metric_function(pred_labels, labels)

            validation_loss += batch_loss.item()
            validation_accuracy += batch_metric.item()

    n = len(validation_loader)
    validation_loss /= n
    validation_accuracy /= n

    print('validation metrics:')
    print('  loss:', validation_loss)
    print('  accuracy:', validation_accuracy)
    if search_space.latency_type is not None:
        # Getting latency of the best architecture.
        latency = current_latency(search_space)
        print('  latency:', latency)

    print()
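The loop above divides summed per-batch values by the number of batches, so every batch gets equal weight; if the dataset size is not a multiple of the batch size and the last partial batch is kept, that smaller batch counts as much as a full one. A sample-weighted running mean avoids this bias. A minimal sketch, assuming `batch_loss` and the value returned by `accuracy` are per-sample means over the batch; `RunningMean` is a hypothetical helper, not part of `tutorial_utils`:

```python
class RunningMean:
    """Sample-weighted running mean of per-batch averages (hypothetical helper)."""

    def __init__(self) -> None:
        self.total = 0.0  # sum of per-sample values seen so far
        self.count = 0    # number of samples seen so far

    def update(self, batch_mean: float, batch_size: int) -> None:
        # batch_mean is assumed to be the mean over the batch, so scale it back by the batch size.
        self.total += batch_mean * batch_size
        self.count += batch_size

    @property
    def value(self) -> float:
        return self.total / self.count if self.count else 0.0


if __name__ == '__main__':
    loss_meter = RunningMean()
    loss_meter.update(1.0, 32)  # a full batch
    loss_meter.update(4.0, 8)   # a smaller final batch
    print(loss_meter.value)     # 1.6, whereas the plain per-batch average would be 2.5
```

Inside the loop, this would replace the dictionary accumulator, e.g. `loss_meter.update(batch_loss.item(), inputs.size(0))`.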

In [None]:
best_arch = search_space.forward_architecture

print(f'Found architecture: {best_arch}')
print(
    f'Its latency is {current_latency(search_space):.1f}, '
    f'which satisfies the specified latency constraint (<={max_latency_value:.1f})'
)

In [None]:
# Build a regular (standalone) model with the best architecture.
best_model = search_space.get_network_by_indexes(best_arch).cuda()

### Tune the found architecture

In [None]:
optimizer = RAdam(best_model.parameters(), lr=1e-3, weight_decay=1e-4)

tutorial_train_loop(
    epochs=5,
    model=best_model,
    optimizer=optimizer,
    metric_function=accuracy,
    loss_function=loss_function,
    train_loader=dataloaders['tune_train_dataloader'],
    validation_loader=dataloaders['tune_validation_dataloader'],
    scheduler=None,
)