# Neural Architecture Search
Neural Architecture Search (NAS) is a special kind of Hyperparameter Optimization (HO) in which we tune the model architecture, i.e. the structural properties of the model, rather than hyperparameters such as the learning rate. Architectures are hyperparameters too, but their search space is combinatorial in size, so the runtime of classical HO algorithms grows very quickly. Specialized algorithms have been developed to search architectures more efficiently; two of them, ENAS and DARTS, are covered here. Both are *one-shot* approaches to NAS: they do not train each architecture sampled from the search space independently. Instead, both exploit *weight sharing*, which means that all architectures in the search space share the same model parameters. As a result, the architecture and the model parameters are optimized within a single training run rather than one run per architecture. This makes the search much faster, and ENAS and DARTS have been reported to yield architectures competitive with the state of the art.

## The Problem
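Usually, NAS is formulated as a bi-level optimization problem:
\begin{align}
    & \min_{\mathbf{a} \in \mathcal{A}} \mathcal{L}(\mathbf{X}_{val}, \mathbf{y}_{val}; \mathbf{w}^*(\mathbf{a}), \mathbf{a}) \\
    \text{s.t. } & \mathbf{w}^*(\mathbf{a}) = \arg \min_{\mathbf{w} \in \mathbb{R}^n} \mathcal{L}(\mathbf{X}_{train}, \mathbf{y}_{train}; \mathbf{w}, \mathbf{a})
\end{align}
Here $\mathbf{a}$ is an architecture from the search space $\mathcal{A}$, $\mathbf{w}$ are the model parameters, and $\mathcal{L}$ is the loss. The outer problem selects the architecture by its validation loss, while the inner problem trains that architecture's parameters on the training data.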
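To make the two levels concrete, here is a minimal sketch that solves a toy instance by brute force. Everything in it is an assumption made for illustration: the "architecture" is just the choice of one basis function, the model is `y ≈ w * phi(x)`, and the names `ARCHITECTURES`, `fit_weight`, and `brute_force_nas` are hypothetical. It solves the inner problem once per architecture, which is exactly the cost that one-shot methods try to avoid.

```python
# Toy bi-level search (illustrative only, not ENAS or DARTS).
# Architecture a: which basis function phi to use. Parameters w: one scalar.
ARCHITECTURES = {
    "linear": lambda x: x,
    "square": lambda x: x * x,
    "cube": lambda x: x ** 3,
}


def fit_weight(phi, xs, ys):
    """Inner problem: closed-form least squares for y ~ w * phi(x)."""
    numerator = sum(phi(x) * y for x, y in zip(xs, ys))
    denominator = sum(phi(x) ** 2 for x in xs)
    return numerator / denominator


def mse(phi, w, xs, ys):
    """Mean squared error of the model w * phi(x)."""
    return sum((w * phi(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def brute_force_nas(x_train, y_train, x_val, y_val):
    """Outer problem: train every architecture, keep the best on validation."""
    results = {}
    for name, phi in ARCHITECTURES.items():
        w_star = fit_weight(phi, x_train, y_train)              # inner level
        results[name] = (mse(phi, w_star, x_val, y_val), w_star)
    best = min(results, key=lambda name: results[name][0])      # outer level
    return best, results


# Synthetic data generated from y = 2 * x^2.
x_train = [-2.0, -1.0, 0.5, 1.0, 2.0]
y_train = [2 * x * x for x in x_train]
x_val = [-1.5, 0.25, 1.5]
y_val = [2 * x * x for x in x_val]

best_arch, all_results = brute_force_nas(x_train, y_train, x_val, y_val)
print(best_arch, all_results[best_arch])
```

With three architectures this loop is cheap, but the number of inner problems grows with the size of $\mathcal{A}$, which is combinatorial for realistic search spaces.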
We will now consider two different approaches to solve this optimization problem.

## ENAS
ENAS defines the search space of neural architectures as a single Directed Acyclic Graph (DAG). This DAG (the supernet) contains all architectures that can be sampled as subgraphs. Sampling is done by a controller, a recurrent neural network (RNN) that is trained so that it samples well-suited architectures. The RNN is trained with REINFORCE, so ENAS essentially uses Reinforcement Learning (RL) to solve Equation 1 from above. To speed up training, ENAS employs weight sharing: each module in the supernet has a fixed set of model parameters that is shared by all architectures containing that module.
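Weight sharing can be illustrated with a small sketch. This is not the ENAS implementation: the `SuperNet` class, its three scalar operations, and the chain-shaped search space are all assumptions chosen to keep the example short. The point it shows is that each (layer, operation) pair owns one parameter, so an update made while training one architecture changes every other architecture that uses the same module.

```python
import random


class SuperNet:
    """Chain-structured toy supernet with one shared weight per (layer, op)."""

    # Hypothetical candidate operations acting on a scalar input.
    OPS = {
        "scale": lambda w, x: w * x,
        "shift": lambda w, x: x + w,
        "relu_scale": lambda w, x: max(0.0, w * x),
    }

    def __init__(self, num_layers, seed=0):
        rng = random.Random(seed)
        self.num_layers = num_layers
        # The only parameters in the whole search space live here.
        self.weights = {
            (layer, op): rng.uniform(0.5, 1.5)
            for layer in range(num_layers)
            for op in self.OPS
        }

    def num_architectures(self):
        """Each layer picks one op, so the space grows exponentially in depth."""
        return len(self.OPS) ** self.num_layers

    def forward(self, arch, x):
        """Run the sub-network selected by `arch` (one op name per layer)."""
        for layer, op in enumerate(arch):
            x = self.OPS[op](self.weights[(layer, op)], x)
        return x


supernet = SuperNet(num_layers=2)
arch_a = ("scale", "shift")
arch_b = ("scale", "relu_scale")  # shares the layer-0 "scale" module with arch_a

output_before = supernet.forward(arch_b, 1.0)
# Pretend a training step on arch_a changed the shared layer-0 weight.
supernet.weights[(0, "scale")] += 0.1
output_after = supernet.forward(arch_b, 1.0)

print(supernet.num_architectures(), output_before, output_after)
```

Although `arch_b` was never trained here, its output changes because it reuses a module that `arch_a` updated. This coupling is what makes one-shot search cheap, and it is also why the shared weights are only a proxy for weights trained from scratch.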

ENAS alternates between two phases. It starts by initializing the RNN controller, which is then held fixed and used to sample architectures. Each sampled architecture is trained for a small number of epochs (usually just one), thereby updating the shared model parameters. After a certain number of iterations, the model parameters are held fixed and the RNN controller is updated with REINFORCE: the RNN samples a set of architectures, and the gradient of the expected reward (for example the validation accuracy, or any other function of the sampled architecture) with respect to the RNN parameters is approximated from these samples. The RNN parameters are then updated using this gradient. This is repeated for a pre-defined number of iterations, after which the procedure returns to the first phase.
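The alternation described above can be sketched on a toy problem. The following is a simplified illustration under several assumptions, not the ENAS algorithm itself: the controller is a plain vector of softmax logits instead of an RNN, an "architecture" is a single choice among three basis functions, the reward is the negative validation loss, and a moving-average baseline is added to reduce variance. All function names are hypothetical.

```python
import math
import random

OPS = {"linear": lambda x: x, "square": lambda x: x * x, "cube": lambda x: x ** 3}
NAMES = list(OPS)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse(w, phi, xs, ys):
    return sum((w * phi(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad_w(w, phi, xs, ys):
    """Gradient of the MSE with respect to the scalar weight w."""
    return sum(2 * (w * phi(x) - y) * phi(x) for x, y in zip(xs, ys)) / len(xs)


def alternating_search(train, val, rounds=5, steps=20, lr_w=0.5, lr_ctrl=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(NAMES)               # controller parameters
    weights = {name: 0.0 for name in NAMES}   # shared model parameters
    baseline = 0.0
    for _ in range(rounds):
        # Phase 1: controller fixed, train the weights of sampled architectures.
        probs = softmax(logits)
        for _ in range(steps):
            name = NAMES[rng.choices(range(len(NAMES)), weights=probs)[0]]
            weights[name] -= lr_w * grad_w(weights[name], OPS[name], *train)
        # Phase 2: weights fixed, update the controller with REINFORCE.
        for _ in range(steps):
            probs = softmax(logits)
            i = rng.choices(range(len(NAMES)), weights=probs)[0]
            reward = -mse(weights[NAMES[i]], OPS[NAMES[i]], *val)
            advantage = reward - baseline
            baseline = 0.9 * baseline + 0.1 * reward
            # d log p(i) / d logit_j = 1[j == i] - p_j for a softmax policy.
            for j in range(len(logits)):
                indicator = 1.0 if j == i else 0.0
                logits[j] += lr_ctrl * advantage * (indicator - probs[j])
    return softmax(logits), weights


# Synthetic data generated from y = 2 * x^2.
train = ([-1.0, -0.5, 0.5, 1.0], [2.0, 0.5, 0.5, 2.0])
val = ([-0.75, 0.75], [1.125, 1.125])
probs, weights = alternating_search(train, val)
print(dict(zip(NAMES, probs)), weights)
```

The controller never sees gradients of the loss with respect to the architecture; it only sees sampled rewards, which is what distinguishes this RL-style update from a gradient-based method such as DARTS.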

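The following cells show how to run ENAS with the *Neural Network Intelligence (NNI)* library. Note that `datasets`, `macro`, `micro`, and `utils` are imported as local helper modules rather than installed packages; they presumably correspond to the files shipped with NNI's ENAS example.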

In [1]:
import logging
import time
from argparse import ArgumentParser

import torch
import torch.nn as nn

import datasets
from macro import GeneralNetwork
from micro import MicroNetwork
from nni.retiarii.oneshot.pytorch.enas import EnasTrainer
from utils import accuracy, reward_accuracy


In [2]:
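# Load the CIFAR-10 train/validation splits via the local `datasets` helper.
dataset_train, dataset_valid = datasets.get_dataset("cifar10")
ctrl_kwargs = {}  # extra keyword arguments forwarded to the controller
search_for = "macro"  # "macro": search the whole network, "micro": search a cell
if search_for == "macro":
    model = GeneralNetwork()
    num_epochs = 310
elif search_for == "micro":
    model = MicroNetwork(num_layers=6, out_channels=20, num_nodes=5, dropout_rate=0.1, use_aux_heads=False)
    num_epochs = 150
    ctrl_kwargs = {"tanh_constant": 1.1}
else:
    raise ValueError(f"Unknown search space: {search_for!r}")
criterion = nn.CrossEntropyLoss()
# Optimizer for the shared model parameters of the supernet.
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)
# Note: the scheduler is created here but is not passed to the trainer below.
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=0.001)
trainer = EnasTrainer(model,
                      loss=criterion,
                      metrics=accuracy,
                      reward_function=reward_accuracy,
                      optimizer=optimizer,
                      batch_size=64,
                      num_epochs=num_epochs,
                      dataset=dataset_train,
                      log_frequency=10,
                      ctrl_kwargs=ctrl_kwargs)
# Start the search; with num_epochs in the hundreds this runs for a long time.
trainer.fit()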

Files already downloaded and verified
Files already downloaded and verified

[2023-03-02 14:24:20] Model Epoch [1/310] Step [1/391]  acc1 0.046875 (0.046875)  loss 2.383232 (2.383232)
[2023-03-02 14:24:26] Model Epoch [1/310] Step [11/391]  acc1 0.125000 (0.127841)  loss 2.245122 (2.277718)

KeyboardInterrupt: 