# Intelligent Architectures (5LIL0) Assignment 2 (version 0.1)

#### **Authors:** [Alexios Balatsoukas-Stimming](mailto:a.k.balatsoukas.stimming@tue.nl) (TU/e), [Hiram Rayo Torres Rodriguez](mailto:hiram.rayotorresrodriguez@nxp.com) (NXP), [Willem Sanberg](mailto:willem.sanberg@nxp.com) (NXP)

#### **License:** [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)

## Part B: Neural Architecture Search

In this part of the assignment, you will perform neural architecture search (NAS). As a NAS tool, you will use [Optuna](https://optuna.org/), which is a general framework for automated hyperparameter tuning and optimization. Contrary to previous assignments, the focus will be more on interpreting results rather than writing your own code. Also, we will use TensorFlow with the Keras front-end instead of PyTorch for the neural network operations. Finally, we will rely more on the confidence and independence you have developed through previous assignments (e.g., we may point you to online documentation for certain points rather than explaining all details).

## 1. Setting up the NAS Experiment

The class ``NASNet``, which is defined in the separate file ``NASNet.py``, contains various pre-implemented functions that are used to evaluate the performance of each NAS round, which is called a ``trial`` in Optuna terminology. The table below lists the most important functions and their functionality:

| Name | Functionality |
| ------------- | ------------- |
| ``__call__`` | Called when an instance of the class is invoked like a function (Optuna does this once per trial), performs training and calculates the optimization metrics |
| ``_get_mnist_dataset`` | Loads and normalizes the MNIST dataset |
| ``_quantize_model`` | Quantizes the model with [LiteRT](https://ai.google.dev/edge/litert) using 8-bit full-integer quantization |
| ``_evaluate_quantized_model`` | Calculates the accuracy of the quantized model |
| ``_profile_model_latency`` | Calculates quantized model latency using the [Vela compiler](https://pypi.org/project/ethos-u-vela/) for an [ARM Ethos U55 NPU](https://armkeil.blob.core.windows.net/developer/Files/pdf/product-brief/arm-ethos-u55-product-brief.pdf) target |


Let us now import the class and other packages that are required for this assignment (you can ignore any warnings/errors you see).

In [5]:
from NASNet import NASNet
import tensorflow as tf
import logging
import optuna
from optuna.storages import JournalStorage
from optuna.storages.journal import JournalFileBackend
from optuna.samplers import TPESampler
import os
# Uncomment the next line to hide the GPU and train on the CPU only
# os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

Certain functionality was omitted from the NASNet class and we will implement it and add it to the class definition. First, a function is required to define the structure of the neural network to be optimized. We will implement a neural network for the MNIST dataset in Keras that looks as follows:

Input (28x28 image, 1 channel) &rarr; Convolutional Layer (``num_filters`` filters, ``k_size``x``k_size`` filter size) &rarr; Batch Normalization Layer &rarr; ReLU activation &rarr; Flatten &rarr; Dense Layer (``n_units`` neurons, ReLU activation) &rarr; Dense Layer (10 neurons, no activation)

Note that ``num_filters``, ``k_size``, and ``n_units`` are parameters that will be optimized by Optuna. They can be accessed, for example for ``num_filters``, as ``self.trial_hp["num_filters"]`` in the function below. Detailed documentation for Keras layers can be found [here](https://keras.io/api/layers/) and an explanation of the functional API to help you with the syntax can be found [here](https://keras.io/guides/functional_api/).

In [9]:
def _get_CNN(self, batch_size=256, training=True):
    inputs = tf.keras.Input(shape=(28, 28, 1), batch_size=batch_size)

    x = tf.keras.layers.Conv2D(filters=self.trial_hp["num_filters"], 
                               kernel_size=self.trial_hp["k_size"], 
                               padding="same")(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(units=self.trial_hp["n_units"], activation="relu")(x)

    outputs = tf.keras.layers.Dense(10)(x)
        
    return tf.keras.Model(inputs=inputs, outputs=outputs)

# Add function to class
NASNet._get_CNN = _get_CNN
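Since the number of parameters is one of the quantities this assignment tracks, it can help to work out by hand how the three hyperparameters feed into it. The sketch below is plain arithmetic for the architecture in the cell above; the helper name ``count_cnn_params`` is hypothetical. It counts the four per-channel values that batch normalization stores (two trainable, two non-trainable), which is how Keras's ``count_params()`` counts them; whether the ``num_params`` metric computed inside ``NASNet`` is defined the same way is an assumption.

```python
def count_cnn_params(num_filters: int, k_size: int, n_units: int) -> dict:
    """Per-layer parameter counts for the CNN defined in _get_CNN.

    Assumes a 28x28x1 input and padding="same", so the convolution
    keeps the 28x28 spatial size.
    """
    # Convolution: one k_size x k_size kernel per filter (1 input channel) + biases
    conv = k_size * k_size * 1 * num_filters + num_filters
    # Batch normalization: gamma and beta (trainable) plus
    # moving mean and moving variance (non-trainable), per channel
    batch_norm = 4 * num_filters
    # Flatten has no parameters; it only fixes the size of the dense input
    flat = 28 * 28 * num_filters
    dense = flat * n_units + n_units
    output = n_units * 10 + 10
    counts = {"conv": conv, "batch_norm": batch_norm, "dense": dense, "output": output}
    counts["total"] = sum(counts.values())
    return counts


print(count_cnn_params(num_filters=6, k_size=7, n_units=14))
```

For ``num_filters=6``, ``k_size=7``, ``n_units=14`` this gives 66,344 parameters in total, of which 65,870 sit in the first dense layer. Under these assumptions the product ``num_filters * n_units`` largely determines the model size, while ``k_size`` contributes comparatively little.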

Next, we need to define the search space for our three hyperparameters to guide the NAS procedure. This is done by simply calling the ``trial.suggest_int`` function (its return value is not needed) for each of the three named hyperparameters we used in the neural network definition above (i.e., ``"num_filters"``, ``"n_units"`` and ``"k_size"``). You will find documentation and examples for this function [here](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_int). The ranges for ``"num_filters"``, ``"n_units"`` and ``"k_size"`` should be [2,8], [4,16], and [3,9], respectively, with a step size of 2 for all hyperparameters.

In [7]:
def _search_space_by_func(self, trial):
    # Search over the number of conv filters, dense units, and kernel size
    self.num_filters = trial.suggest_int("num_filters", 2, 8, step=2)
    self.n_units = trial.suggest_int("n_units", 4, 16, step=2)
    self.k_size = trial.suggest_int("k_size", 3, 9, step=2)
    return trial.params

# Add function to class
NASNet._search_space_by_func = _search_space_by_func
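To get a feeling for the size of this search space, the candidate values can be enumerated without Optuna. The following is a minimal sketch using only the standard library; ``SEARCH_SPACE`` and ``candidate_values`` are hypothetical names, and the only assumption is that the upper bound of each range is inclusive, as it is for ``suggest_int``.

```python
from itertools import product

# (low, high, step) for each hyperparameter, as specified in the text
SEARCH_SPACE = {
    "num_filters": (2, 8, 2),
    "n_units": (4, 16, 2),
    "k_size": (3, 9, 2),
}


def candidate_values(low: int, high: int, step: int) -> list:
    """Return low, low + step, ..., high (upper bound inclusive)."""
    return list(range(low, high + 1, step))


grid = {name: candidate_values(*bounds) for name, bounds in SEARCH_SPACE.items()}
all_configs = list(product(*grid.values()))

for name, values in grid.items():
    print(name, values)
print("distinct architectures:", len(all_configs))
```

There are 4 x 7 x 4 = 112 distinct architectures, so a study with 50 trials can evaluate fewer than half of them. This is one reason a sampler that focuses on promising regions is used instead of an exhaustive sweep.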

## 2. Running the NAS Experiment

Before running our NAS experiment, we need to define a name for it that will be used as a folder name for the results. More importantly, we need to define the objectives of the optimization, which in our case are the floating-point accuracy (``fp32_accuracy``) and the number of parameters in the neural network (``num_params``), which we want to maximize and minimize, respectively. These metrics are calculated in the ``__call__`` function of the ``NASNet`` class. We also define the number of training epochs (``epochs = 1`` to keep the runtime of the experiment reasonable) and the learning rate (``lr = 0.001``).
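As a rough illustration of why the metrics are computed in ``__call__``: ``study.optimize`` accepts any callable that takes a trial, so an object that defines ``__call__`` can serve as the objective, and in a multi-objective study it returns one value per entry in ``directions``. The sketch below uses a hypothetical ``ToyObjective`` class and a plain dictionary standing in for the trial; the metric values are made up, and the actual ``NASNet`` implementation may differ.

```python
class ToyObjective:
    """Hypothetical stand-in for NASNet: an object that can be called like a function."""

    def __init__(self, objectives):
        # Names of the metrics to return, in the same order as `directions`
        self.objectives = objectives

    def __call__(self, trial):
        # A real objective would build, train, and evaluate a model here.
        # These numbers are placeholders, not measured results.
        metrics = {"fp32_accuracy": 0.9, "num_params": 1000 * trial["n_units"]}
        return tuple(metrics[name] for name in self.objectives)


objective = ToyObjective(objectives=["fp32_accuracy", "num_params"])
fake_trial = {"n_units": 4}      # hypothetical placeholder for an Optuna trial
values = objective(fake_trial)   # calling the instance runs ToyObjective.__call__
print(values)
```

Keeping the configuration (epochs, learning rate, objective names) in the constructor and the per-trial work in ``__call__`` lets one configured object be reused for every trial.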

Running this cell with the default values will take approximately 10 minutes. If you need to experiment to verify your code, you can set ``n_trials = 1`` temporarily to run the code in a few seconds. Don't forget to set it back to ``n_trials = 50`` before running the final experiment.

If you are interested, you can read the outputs that are printed, but this is not necessary. We will visualize and interpret the results in the following section. If the output of the cell becomes so long that it is difficult to work with, you can right-click on it and select "Clear Cell Output" after the cell has finished running.

In [8]:
# Set TensorFlow/Keras seed for reproducibility
tf.keras.utils.set_random_seed(0)

# Configure experiment 
exp_name = "mnist_nas"
objectives = ['fp32_accuracy', 'num_params']
directions = ['maximize', 'minimize']
n_trials = 1  # quick check only; set back to 50 for the final experiment
epochs = 1
lr = 0.001

# create experiment directory
exp_dir = os.path.join(os.getcwd(), exp_name)
os.makedirs(exp_dir, exist_ok=True)

# define search strategy and set seed for reproducibility
sampler = TPESampler(seed=0)

# Create optuna study for optimization (delete first if it already exists)
try:
    optuna.delete_study(study_name=exp_name,storage=JournalStorage(JournalFileBackend(os.path.join(exp_dir, "journal.log"))))
except Exception:    
    pass    
study = optuna.create_study(
    sampler=sampler,
    study_name=exp_name,
    storage=JournalStorage(JournalFileBackend(os.path.join(exp_dir, "journal.log"))),
    # storage = "sqlite:///mnist_nas.db",
    load_if_exists=True,
    directions=directions,
)

# perform NAS
study.optimize(NASNet(epochs=epochs, lr=lr, exp_dir=exp_dir, objectives=objectives), n_trials=n_trials)

[I 2025-03-24 23:46:24,914] A new study created in Journal with name: mnist_nas




[W 2025-03-24 23:46:27,838] Trial 0 failed with parameters: {'num_filters': 6, 'n_units': 14, 'k_size': 7} because of the following error: InvalidArgumentError().
Traceback (most recent call last):
  File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\optuna\study\_optimize.py", line 197, in _run_trial
    value_or_values = func(trial)
  File "c:\Users\Siyu Chen\git\5LIL0-Intelligent-architectures\assignments\assignment2_partB\NASNet.py", line 45, in __call__
    model.fit(ds_train, epochs=self.epochs, validation_data=ds_test)
  File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\tensorflow\python\eager\execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution e

InvalidArgumentError: Graph execution error:

Detected at node 'gradient_tape/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/mul' defined at (most recent call last):
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel_launcher.py", line 18, in <module>
      app.launch_new_instance()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\traitlets\config\application.py", line 1075, in launch_instance
      app.start()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\kernelapp.py", line 739, in start
      self.io_loop.start()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\tornado\platform\asyncio.py", line 205, in start
      self.asyncio_loop.run_forever()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\asyncio\base_events.py", line 603, in run_forever
      self._run_once()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\asyncio\base_events.py", line 1909, in _run_once
      handle._run()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\asyncio\events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\kernelbase.py", line 545, in dispatch_queue
      await self.process_one()
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\kernelbase.py", line 534, in process_one
      await dispatch(*args)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\kernelbase.py", line 437, in dispatch_shell
      await result
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\ipkernel.py", line 362, in execute_request
      await super().execute_request(stream, ident, parent)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\kernelbase.py", line 778, in execute_request
      reply_content = await reply_content
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\ipkernel.py", line 449, in do_execute
      res = shell.run_cell(
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\ipykernel\zmqshell.py", line 549, in run_cell
      return super().run_cell(*args, **kwargs)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\IPython\core\interactiveshell.py", line 3075, in run_cell
      result = self._run_cell(
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\IPython\core\interactiveshell.py", line 3130, in _run_cell
      result = runner(coro)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\IPython\core\async_helpers.py", line 128, in _pseudo_sync_runner
      coro.send(None)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\IPython\core\interactiveshell.py", line 3334, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\IPython\core\interactiveshell.py", line 3517, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\IPython\core\interactiveshell.py", line 3577, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "C:\Users\Siyu Chen\AppData\Local\Temp\ipykernel_13788\3529270118.py", line 34, in <module>
      study.optimize(NASNet(epochs=epochs, lr=lr, exp_dir=exp_dir, objectives=objectives), n_trials=n_trials)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\optuna\study\study.py", line 475, in optimize
      _optimize(
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\optuna\study\_optimize.py", line 63, in _optimize
      _optimize_sequential(
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\optuna\study\_optimize.py", line 160, in _optimize_sequential
      frozen_trial = _run_trial(study, func, catch)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\optuna\study\_optimize.py", line 197, in _run_trial
      value_or_values = func(trial)
    File "c:\Users\Siyu Chen\git\5LIL0-Intelligent-architectures\assignments\assignment2_partB\NASNet.py", line 45, in __call__
      model.fit(ds_train, epochs=self.epochs, validation_data=ds_test)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\engine\training.py", line 1564, in fit
      tmp_logs = self.train_function(iterator)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\engine\training.py", line 1160, in train_function
      return step_function(self, iterator)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\engine\training.py", line 1146, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\engine\training.py", line 1135, in run_step
      outputs = model.train_step(data)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\engine\training.py", line 997, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 576, in minimize
      grads_and_vars = self._compute_gradients(
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 634, in _compute_gradients
      grads_and_vars = self._get_gradients(
    File "c:\Users\Siyu Chen\.conda\envs\program\lib\site-packages\keras\optimizers\optimizer_v2\optimizer_v2.py", line 510, in _get_gradients
      grads = tape.gradient(loss, var_list, grad_loss)
Node: 'gradient_tape/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/mul'
required broadcastable shapes
	 [[{{node gradient_tape/sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/mul}}]] [Op:__inference_train_function_2909]

## 3. Visualizing the NAS Results

Optuna has several built-in functions to visualize the results of an experiment, which we will explore below.

### 3.1 Pareto Front

Our optimization procedure provides various trade-offs between the two objectives of floating-point accuracy and number of parameters. As such, there is no single optimal solution and we instead use the notion of **Pareto optimality**. In a two-objective situation, a solution is said to be **Pareto optimal** if the only way to improve one of the objectives is to deteriorate the other objective. A solution that is not Pareto optimal is said to be **Pareto dominated** by some other solution. The set of Pareto optimal solutions forms the [**Pareto front**](https://en.wikipedia.org/wiki/Pareto_front) of a problem.
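The definition can be made concrete with a few lines of Python. This is a minimal sketch for our two objectives (accuracy is maximized, number of parameters is minimized); the function names and the example points are made up for illustration and are not results of the experiment.

```python
def dominates(a, b):
    """True if point a Pareto-dominates point b; points are (accuracy, num_params)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (accuracy, num_params) pairs, for illustration only
trials = [(0.95, 9000), (0.97, 20000), (0.94, 12000), (0.90, 4000), (0.97, 25000)]
front = pareto_front(trials)
print(front)
```

Here ``(0.94, 12000)`` is dominated by ``(0.95, 9000)``, and ``(0.97, 25000)`` is dominated by ``(0.97, 20000)``, so the remaining three points form the Pareto front.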

Below, we use the ``plot_pareto_front`` function and we exclude all Pareto dominated solutions to only visualize the Pareto front. The plot is interactive: if you hover over any point, you will see details about the trial that produced this solution. ``values`` contains the values of our two objectives, ``params`` contains the hyperparameters corresponding to the solution, and ``user_attrs`` contains the additional information that we calculated for each trial (note that this also includes the two objectives for convenience).

In [None]:
optuna.visualization.plot_pareto_front(study,
                                       targets=lambda t: (t.values[0]*100, t.values[1]), 
                                       target_names=['Accuracy (%)', 'Number of Parameters'], 
                                       include_dominated_trials=False)

From this point on we want to include all trials in our plots, which we can do by omitting the ``include_dominated_trials`` parameter (its default value is ``True``). The Pareto optimal solutions are plotted with hues of red, while the Pareto dominated solutions are plotted with hues of blue. You can verify that, for any Pareto dominated solution, there exists a Pareto optimal solution that is at least as good in both optimization metrics and strictly better in at least one of them, i.e., it has higher accuracy, or a smaller number of parameters, or both.

In [None]:
optuna.visualization.plot_pareto_front(study, 
                                       targets=lambda t: (t.values[0]*100, t.values[1]), 
                                       target_names=['Accuracy (%)', 'Number of Parameters'])

We can also plot results stored in the ``user_attrs`` field. For example, below we plot the floating-point accuracy versus the model size of the quantized model in kB.

In [None]:
optuna.visualization.plot_pareto_front(study,
                                       targets=lambda t: (t.values[0]*100, t.user_attrs['int8_model_size']), 
                                       target_names=['Accuracy (%)', 'Quantized Model Size (kB)'])

Finally, below we plot the latency of the model of each trial (``int8_latency``) when deployed on the Ethos U55 NPU (calculated by the ``_profile_model_latency`` function in the ``NASNet`` class using the Vela compiler) versus the quantized model size in kB. Note that, contrary to the title of the plot that is added automatically, in general this is no longer a Pareto front, since we are not plotting the two objectives (or a monotonic function of the objectives) against each other.

In [None]:
optuna.visualization.plot_pareto_front(study,
                                       targets=...), 
                                       target_names=['Latency (ms)', 'Quantized Model Size (kB)'])

### 3.2 Per-Layer Performance Details

You can find details about the performance and resource utilization of each layer of the neural network for trial number ``x`` in the file ``mnist_nas/trial_x/vela_output/mnist_nas_full_int8_per-layer.csv``. For example, the column ``SRAM Usage`` shows how many bytes of the SRAM are used by each layer. Details for each column can be found [here](https://github.com/nxp-imx/ethos-u-vela/blob/lf-6.6.3_1.0.0/PERFORMANCE.md#vela-performance-estimation-per-layer).
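If you prefer to inspect such a file programmatically, the standard ``csv`` module is sufficient. The sketch below finds the layer with the largest value in a given column. The helper ``layer_with_max`` is hypothetical, and the ``Name`` column, the comma delimiter, and the sample rows are assumptions made for illustration; check the header of your own file before relying on them.

```python
import csv
import io


def layer_with_max(csv_file, column, name_column="Name"):
    """Return (layer name, value) for the row with the largest value in `column`."""
    rows = list(csv.DictReader(csv_file))
    top = max(rows, key=lambda row: float(row[column]))
    return top[name_column], float(top[column])


# Tiny in-memory stand-in with made-up numbers; the real file has more columns
sample = io.StringIO("Name,SRAM Usage\nconv,1024\ndense,512\n")
result = layer_with_max(sample, "SRAM Usage")
print(result)
```

For a real trial, pass a file object opened with ``open(path, newline="")`` instead of the in-memory sample.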

### 3.3 Hyperparameter Importance

Optuna can tell us to what extent each hyperparameter influences the optimization of each objective (i.e., maximization or minimization, depending on the direction defined for each objective) and each user-defined attribute stored in ``user_attrs``. This largely depends on the range that we have defined: if the range for some hyperparameter is very restrictive, it will become very important to increase/decrease it as much as possible. Nevertheless, this visualization gives an indication of the importance of each hyperparameter and, more importantly, can reveal the inherent conflicts and synergies between the optimization objectives.

In the following three cells, we plot the importance of each hyperparameter for the accuracy, the number of parameters, and the quantized model latency, respectively.

In [None]:
optuna.visualization.plot_param_importances(study, target=lambda t: t.values[0]*100, target_name="Accuracy (%)")

In [None]:
optuna.visualization.plot_param_importances(study, target=lambda t: t.values[1], target_name="Number of Parameters")

In [None]:
optuna.visualization.plot_param_importances(study, target=lambda t: t.user_attrs['int8_latency'], target_name="Latency (ms)")