<a href="https://colab.research.google.com/github/toufiqmusah/IndabaX25/blob/main/Neural%20Architecture%20Search%20%26%20Deployment%20Optimization%20-%20Part%204" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# **Efficient Deep Learning: Neural Architecture Search & Optimized Model Deployment** - Part 4

#### Program: `Deep Learning IndabaX, Ghana, 2025`
#### 🏫 Institution: Kumasi Centre for Collaborative Research in Tropical Medicine (KCCR)
#### 📅 Date: `16th June, 2025`

---

##### 👨‍🏫 Facilitator: Toufiq Musah       

**Research Engineering**  

✉️ Email: [toufiqmusah32@gmail.com](mailto:toufiqmusah32@gmail.com)  
🔗 LinkedIn: [toufiq](https://www.linkedin.com/in/toufiqmusah/)

---
### 🛠️ Tools and Frameworks Used

- Python 3.x
- TensorFlow 2.x / Keras
- KerasTuner for architecture and hyper-parameter search
- Matplotlib / Seaborn for Visualization

---

# **Table of Contents**

1.   [Introduction](#Introduction)
2.   [Prerequisites](#Prerequisites)
3.   [Step-by-Step-Guide](#Step-by-Step-Guide)
4.   [Code Examples](#Code-Examples)
5.   [Troubleshooting](#Troubleshooting)
6.   [Conclusion](#Conclusion)
7.   [References](#References)

# 1. **Neural Architecture Search**


 *Why automate the design of the network itself?*

**Definition.** Neural Architecture Search (NAS) is an automated procedure that explores a *search space* of possible layer topologies and operations, trains many candidate networks, and selects the one that best optimises a user-defined objective (accuracy, latency, memory, …).

**How it differs from hyper-parameter optimisation.**  
Hyper-parameter optimisation tunes numeric settings (learning rate, dropout rate, batch size) of a fixed architecture. NAS can instead change the structure of the network itself, for example how many stages it has or which kernel size each block uses.
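
To make the distinction concrete, here is a minimal sketch of the two kinds of search space. The dictionaries and function names are illustrative assumptions, not part of any library: the hyper-parameter space always yields the same set of keys, while the architecture space yields a variable-length list of blocks.

```python
import random

# Hyper-parameter search: the architecture is fixed; only numeric settings vary.
HPO_SPACE = {
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}

# Architecture search: the number of blocks and each block's operation vary too.
# (Hypothetical toy space, chosen only for illustration.)
NUM_BLOCKS = [2, 3, 4]
BLOCK_SPACE = {
    "filters": [24, 32, 40, 48, 64],
    "kernel": [2, 3, 5],
    "separable": [True, False],
}


def sample_hyperparameters(rng):
    """Pick one value per numeric setting; the network layout never changes."""
    return {name: rng.choice(choices) for name, choices in HPO_SPACE.items()}


def sample_architecture(rng):
    """Pick a depth, then pick an operation for every block independently."""
    num_blocks = rng.choice(NUM_BLOCKS)
    return [
        {name: rng.choice(choices) for name, choices in BLOCK_SPACE.items()}
        for _ in range(num_blocks)
    ]


rng = random.Random(0)
print(sample_hyperparameters(rng))
print(sample_architecture(rng))
```

Each sampled architecture describes a different network, which is why NAS has to build and train a new model per candidate rather than re-fit a single one.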

---

#### **Example 1 – MnasNet: Reinforcement-Learning NAS**
Google’s **MnasNet** used a policy-gradient RL agent to sample entire mobile CNNs, training 8000 candidates and rewarding those with high ImageNet accuracy *and* low real-device latency.
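
The MnasNet paper combines the two goals into a single reward, `ACC(m) × (LAT(m) / T)^w`, where `T` is the target latency and `w` is a negative exponent (the paper reports using `w = -0.07`). The sketch below evaluates that formula; the function name and the example numbers are illustrative assumptions.

```python
def mnasnet_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Multi-objective reward: accuracy scaled by a soft latency penalty.

    With w < 0, a model slower than the target is penalised and a faster
    one receives a small bonus.
    """
    return accuracy * (latency_ms / target_ms) ** w


# Illustrative candidates: (top-1 accuracy, measured latency in ms).
candidates = {"fast": (0.73, 60.0), "on_target": (0.75, 75.0), "slow": (0.76, 150.0)}
for name, (acc, lat) in candidates.items():
    print(name, round(mnasnet_reward(acc, lat, target_ms=75.0), 4))
```

With this exponent, doubling the latency costs roughly 5% of the reward, so a slower candidate only wins if its accuracy gain is larger than that.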

#### **Example 2 – EfficientNet: NAS Baseline + Compound Scaling**
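EfficientNet starts from **EfficientNet-B0**, a baseline found by a multi-objective NAS over the same search space as MnasNet but optimising accuracy and FLOPs rather than on-device latency, so B0 resembles MnasNet without being identical to it. The authors then introduced a *compound scaling* rule that scales depth, width, and input resolution together with a single coefficient, generating B1–B7 models that set new ImageNet efficiency records.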
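
Compound scaling can be sketched in a few lines. The EfficientNet paper scales depth by `α^φ`, width by `β^φ`, and resolution by `γ^φ`, with `α = 1.2`, `β = 1.1`, `γ = 1.15` chosen so that `α · β² · γ² ≈ 2`. The function below is an illustrative assumption and returns raw multipliers only; the published B1–B7 configurations use rounded values.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper


def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Because the three bases are tied together, each unit increase in `φ` roughly doubles the compute budget while keeping the network balanced.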

---
> **Take-away:** NAS automates decisions at the structural level, creating novel cells and macro layouts, whereas classical hyper-parameter search fine-tunes numeric settings inside a hand-crafted design.


In [1]:
!pip -q install keras-tuner


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
import keras_tuner as kt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.metrics import SparseCategoricalAccuracy
from tensorflow.keras import layers, models, optimizers, callbacks, regularizers

from sklearn.model_selection import train_test_split


In [3]:
# The Cancer Genome Atlas (TCGA), Breast Cancer Dataset
!pip -q install gdown
!gdown -q '1odEM4UcDSnc_Yo8I6YeIa128d271k9q-'

### The Cancer Genome Atlas (TCGA) - Breast Cancer Omics Dataset
This is a **high-dimensional** dataset: 1,936 feature columns for only 705 samples, which makes overfitting likely and calls for rigorous optimisation to get good results.

In [4]:
data = pd.read_csv('TCGA-BRCA-dataset.csv')
data.head()

Unnamed: 0,rs_CLEC3A,rs_CPB1,rs_SCGB2A2,rs_SCGB1D2,rs_TFF1,rs_MUCL1,rs_GSTM1,rs_PIP,rs_ADIPOQ,rs_ADH1B,...,pp_p27.pT198,pp_p38.MAPK,pp_p38.pT180.Y182,pp_p53,pp_p62.LCK.ligand,pp_p70S6K,pp_p70S6K.pT389,pp_p90RSK,pp_p90RSK.pT359.S363,vital.status
0,0.892818,6.580103,14.123672,10.606501,13.189237,6.649466,10.520335,10.33849,10.248379,10.22997,...,-0.04333,-0.002598,0.449228,-0.37523,-0.691766,-0.337863,-0.178503,0.011638,-0.207257,0
1,0.0,3.691311,17.11609,15.517231,9.867616,9.691667,8.179522,7.911723,1.289598,1.818891,...,-0.220764,0.220809,1.035115,-0.074136,0.279067,0.292925,-0.155242,-0.089365,0.26753,0
2,3.74815,4.375255,9.658123,5.326983,12.109539,11.644307,10.51733,5.114925,11.975349,11.911437,...,0.010615,-0.133214,0.344969,-0.351936,0.21991,0.30811,-0.190794,-0.22215,-0.198518,0
3,0.0,18.235519,18.53548,14.533584,14.078992,8.91376,10.557465,13.304434,8.205059,9.211476,...,0.06407,-0.384008,0.678042,0.096329,-0.266554,-0.079871,-0.463237,0.522998,-0.046902,0
4,0.0,4.583724,15.711865,12.804521,8.881669,8.430028,12.964607,6.806517,4.294341,5.385714,...,-0.065488,0.209858,0.920408,0.04221,-0.441542,-0.152317,0.511386,-0.096482,0.037473,0


In [5]:
(x_train, x_test, y_train, y_test) = train_test_split(data.drop('vital.status', axis=1), data['vital.status'], test_size=0.2, random_state=42)
x_train.shape, y_train.shape, x_test.shape, y_test.shape

((564, 1936), (564,), (141, 1936), (141,))

In [10]:
# Optional: add an explicit channel axis so the arrays match the model's (features, 1) input shape.
# x_train = np.expand_dims(x_train, axis=2)
# x_test = np.expand_dims(x_test, axis=2)

### **NAS Model Building Function**

### Defining the Search Space

A compact **MobileNet‑style** search space:

| Hyper‑parameter | Choices |
|-----------------|---------|
| Number of Conv Blocks | 2 – 4 |
| Filters per Block     | 24, 32, 40, 48, or 64 |
| Kernel Size           | 2, 3, or 5 |
| Depthwise Separable?  | {True, False} |
| Dropout Rate          | 0.0 – 0.3 (step 0.05) |
| Learning Rate         | 1e-3, 5e-4, or 1e-4 |
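
To see why an automated search strategy is needed even for a compact space, the sketch below counts the distinct configurations. It assumes the choice lists used in the model-building code (five filter widths, a dropout step of 0.05, three learning rates) and that every block picks its options independently.

```python
filters = [24, 32, 40, 48, 64]
kernels = [2, 3, 5]
separable = [True, False]
dropout = [round(0.05 * i, 2) for i in range(7)]  # 0.0, 0.05, ..., 0.3
learning_rates = [1e-3, 5e-4, 1e-4]
block_counts = [2, 3, 4]

# Every block independently picks a filter width, a kernel size, and a conv type.
options_per_block = len(filters) * len(kernels) * len(separable)

# Sum over the allowed depths, then multiply by the global settings.
total = sum(options_per_block ** n for n in block_counts) * len(dropout) * len(learning_rates)

print(f"{options_per_block} options per block, {total:,} configurations in total")
```

Exhaustively training every configuration is out of reach, so the tuner has to sample candidates and stop unpromising ones early.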

In [6]:
# model-building function that KerasTuner can use

def model_builder(hp):
    inputs = tf.keras.Input(shape=(x_train.shape[1], 1))
    x = inputs
    num_blocks = hp.Int('num_blocks', min_value=2, max_value=4, step=1)  # search over 2, 3, or 4 blocks
    for i in range(num_blocks):
        filters = hp.Choice(f'filters_{i}', [24,32,40,48,64])
        kernel  = hp.Choice(f'kernel_{i}', [2,3,5])
        if hp.Boolean(f'ds_sep_{i}'):
            x = tf.keras.layers.SeparableConv1D(filters, kernel,
                                                padding='same', activation='relu')(x)
        else:
            x = tf.keras.layers.Conv1D(filters, kernel,
                                       padding='same', activation='relu')(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.MaxPool1D()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(rate=hp.Float('dropout',0.0,0.3,0.05))(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)  # vital.status is a binary label (0/1)
    model = tf.keras.Model(inputs, outputs)

    lr = hp.Choice('lr',[1e-3, 5e-4, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model



# instantiating tuner

tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=2,
                     factor=3,
                     directory='nas_dir',
                     overwrite=True,
                     project_name='intro_to_NAS')
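
Hyperband is built on successive halving: train many candidates briefly, keep the best `1/factor` of them, and give the survivors `factor` times more epochs. The sketch below shows that schedule for a single bracket. It is a simplified illustration with a hypothetical helper name; KerasTuner's actual bracket sizes and bookkeeping differ.

```python
def successive_halving_schedule(n_configs, min_epochs, max_epochs, factor=3):
    """Return a list of (candidates_alive, epochs_each) rounds for one bracket."""
    schedule = []
    n, epochs = n_configs, min_epochs
    while n >= 1 and epochs <= max_epochs:
        schedule.append((n, epochs))
        n //= factor        # keep only the top 1/factor candidates
        epochs *= factor    # give the survivors a larger training budget
    return schedule


# A generous budget produces several elimination rounds ...
print(successive_halving_schedule(27, min_epochs=1, max_epochs=27))
# ... while a cap of 2 epochs leaves room for a single round only.
print(successive_halving_schedule(27, min_epochs=1, max_epochs=2))
```

With `max_epochs=2` and `factor=3` there is almost no room for elimination rounds, which keeps this demo fast; a larger `max_epochs` lets Hyperband prune more aggressively.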



In [7]:
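# adding early stopping
# (patience=5 cannot trigger while trials run for at most 2 epochs; it matters for larger budgets)
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)

# run NAS
# Hyperband sets the epochs of each trial itself, so the `epochs` argument below is overridden.
# The held-out split doubles as validation data here, so the reported scores are optimistic.
print("Starting NAS search...")
tuner.search(x_train, y_train, epochs=4, validation_data=(x_test, y_test), callbacks=[stop_early])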
print("NAS search complete.")

# optimal hyperparameters and the best model
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
nas_model = tuner.hypermodel.build(best_hps)
print(f"Best hyperparameters: {best_hps.values}")

Trial 2 Complete [00h 00m 06s]
val_accuracy: 0.8865247964859009

Best val_accuracy So Far: 0.8865247964859009
Total elapsed time: 00h 00m 12s
NAS search complete.
Best hyperparameters: {'num_blocks': 2, 'filters_0': 64, 'kernel_0': 5, 'ds_sep_0': False, 'filters_1': 64, 'kernel_1': 2, 'ds_sep_1': False, 'dropout': 0.05, 'lr': 0.0001, 'tuner/epochs': 2, 'tuner/initial_epoch': 0, 'tuner/bracket': 0, 'tuner/round': 0}


In [8]:
# !rm -rf nas_dir

In [8]:
# train best model found by NAS

print("Training the best model found by NAS...")
history = nas_model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print(f'Best epoch: {best_epoch}')

Training the best model found by NAS...
Epoch 1/5
18/18 ━━━━━━━━━━━━━━━━━━━━ 3s 102ms/step - accuracy: 0.6773 - loss: 0.9069 - val_accuracy: 0.3333 - val_loss: 0.7549
Epoch 2/5
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 101ms/step - accuracy: 0.8597 - loss: 0.4338 - val_accuracy: 0.8227 - val_loss: 0.5722
Epoch 3/5
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 90ms/step - accuracy: 0.8981 - loss: 0.2699 - val_accuracy: 0.2199 - val_loss: 0.8328
Epoch 4/5
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 91ms/step - accuracy: 0.9043 - loss: 0.2719 - val_accuracy: 0.8794 - val_loss: 0.4596
Epoch 5/5
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 87ms/step - accuracy: 0.9275 - loss: 0.1996 - val_accuracy: 0.8794 - val_loss: 0.4736
Best epoch: 4
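
The best-epoch rule used above relies on `list.index`, which returns the first position of the maximum. The sketch below replays that rule on the validation numbers logged in this run (copied from the output above) and adds a lowest-`val_loss` rule as an alternative; the alternative is a suggestion, not something the notebook does.

```python
# Validation metrics per epoch, copied from the training log of this run.
val_accuracy = [0.3333, 0.8227, 0.2199, 0.8794, 0.8794]
val_loss = [0.7549, 0.5722, 0.8328, 0.4596, 0.4736]

# Rule used in the notebook: first epoch that reaches the maximum accuracy.
best_by_accuracy = val_accuracy.index(max(val_accuracy)) + 1

# Alternative rule: epoch with the lowest validation loss, which is less
# prone to ties because the loss is a continuous quantity.
best_by_loss = val_loss.index(min(val_loss)) + 1

print(best_by_accuracy, best_by_loss)
```

Epochs 4 and 5 tie on accuracy here, and both rules settle on epoch 4. With a validation set of only 141 samples the curve swings widely between epochs, so the selected epoch should be treated as a rough guide.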


In [9]:
# re-initialize the model and train up to the best epoch

hypermodel = tuner.hypermodel.build(best_hps)
print("Retraining the best model until the best epoch...")
hypermodel.fit(x_train, y_train, epochs=best_epoch, validation_data=(x_test, y_test))
nas_best_model = hypermodel
print("Best model from NAS is trained and ready.")

# save model
nas_best_model.save("nas_optimal_model.keras")

Retraining the best model until the best epoch...
Epoch 1/4
18/18 ━━━━━━━━━━━━━━━━━━━━ 3s 106ms/step - accuracy: 0.7012 - loss: 0.9615 - val_accuracy: 0.8865 - val_loss: 0.4324
Epoch 2/4
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 90ms/step - accuracy: 0.8878 - loss: 0.3402 - val_accuracy: 0.8723 - val_loss: 0.4582
Epoch 3/4
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 100ms/step - accuracy: 0.9070 - loss: 0.2610 - val_accuracy: 0.8794 - val_loss: 0.4159
Epoch 4/4
18/18 ━━━━━━━━━━━━━━━━━━━━ 2s 87ms/step - accuracy: 0.9221 - loss: 0.2198 - val_accuracy: 0.8794 - val_loss: 0.3947
Best model from NAS is trained and ready.


In [11]:
nas_best_model.summary()

# 2. **Post-training Quantization**

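We will convert the trained Keras model to **TFLite INT8** using post-training quantization calibrated on a *representative dataset*. Without a representative dataset, the same `Optimize.DEFAULT` flag would give **dynamic-range** quantization (INT8 weights, float activations). Because `TFLITE_BUILTINS` stays in the supported ops and the input/output types are left at their defaults, any op without an INT8 kernel falls back to float and the model keeps float32 inputs and outputs. A **full-integer** model would instead restrict the ops to `TFLITE_BUILTINS_INT8` and set `inference_input_type` and `inference_output_type` to `tf.int8`: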

In [12]:
# using a portion of the dataset

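def representative_data_generator():
    # Yield up to 100 single-sample batches shaped (1, features, 1) for calibration.
    samples = np.asarray(x_train, dtype=np.float32)
    for input_value in tf.data.Dataset.from_tensor_slices(samples).batch(1).take(100):
        yield [tf.expand_dims(input_value, axis=-1)]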

converter_int8 = tf.lite.TFLiteConverter.from_keras_model(nas_best_model)
converter_int8.optimizations = [tf.lite.Optimize.DEFAULT]

converter_int8.representative_dataset = representative_data_generator

converter_int8.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.TFLITE_BUILTINS_INT8
]
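
The representative dataset exists so the converter can observe the range of values flowing through the network and pick a scale and zero point for each tensor. The sketch below shows that idea with a simplified per-tensor affine scheme in pure Python. The helper names are hypothetical, and the real converter is more elaborate (for example, it can quantize weights per channel).

```python
def calibrate(samples, qmin=-128, qmax=127):
    """Derive (scale, zero_point) from the value range seen during calibration."""
    lo = min(0.0, min(min(s) for s in samples))  # the range must include 0.0
    hi = max(0.0, max(max(s) for s in samples))
    scale = (hi - lo) / (qmax - qmin) or 1.0     # avoid a zero scale for all-zero data
    zero_point = round(qmin - lo / scale)        # the integer that represents 0.0
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to INT8: divide by the scale, shift, then clamp to the range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an INT8 value back to an approximate float."""
    return (q - zero_point) * scale


# Two illustrative calibration samples (made-up activations).
samples = [[-1.0, 0.25, 3.0], [0.5, 1.5, -0.75]]
scale, zero_point = calibrate(samples)

for x in (-1.0, 0.25, 1.5, 3.0):
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

Values inside the calibrated range are recovered to within half a quantization step, while values outside it are clamped. This is why the calibration samples should resemble real inference inputs.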

In [13]:
tflite_int8_quant_model = converter_int8.convert()
with open('nas_model_int8_quant.tflite', 'wb') as f:
    f.write(tflite_int8_quant_model)
print("Full INT8 quantized model saved as nas_model_int8_quant.tflite")

Full INT8 quantized model saved as nas_model_int8_quant.tflite


fully_quantize: 0, inference_type: 6, input_inference_type: FLOAT32, output_inference_type: FLOAT32
