# Hyperparameter Tuning using Keras Tuner

**Note on the importance of hyperparameters** Hyperparameters are the variables that govern the training process and structure of a model, such as the learning rate, batch size, and number of layers in the neural network. Keras Tuner is a library that helps automate hyperparameter tuning: you define a model with tunable hyperparameters, configure the search, run it, analyze the results, and then train a model with the best values found.

- This notebook demonstrates how to use the Keras Tuner library to choose hyperparameters (e.g. learning rate, batch size, number of layers, or units per layer) before training begins

In [25]:
#Set up the environment: install keras-tuner if it is missing, then import the required modules

import subprocess
import sys
import warnings

# Suppress urllib3 warnings
warnings.filterwarnings('ignore', category=UserWarning, module='urllib3')

try:
   import keras_tuner
   print("keras tuner is already installed")
except ImportError:
   print("Installing keras-tuner...")
   subprocess.check_call([sys.executable, "-m", "pip", "install", "keras-tuner"])
   print("Done")

#Imports Keras modules (keras tuner, tensorflow and keras components for building model & loading dataset)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.optimizers import Adam
import keras_tuner as kt
import tensorflow as tf

keras tuner is already installed


In [26]:
warnings.filterwarnings('ignore', message='Do not pass an `input_shape`')

#Defines a model building function that specifies the model parameters that you want to tune
def build_model(hp):
   model = Sequential([
       Flatten(input_shape=(28, 28)), 
       Dense(units=hp.Int('units', min_value=32, max_value=512, step=32), activation='relu'),
       Dense(10, activation='softmax')
   ])
   model.compile(
       optimizer=Adam(learning_rate=hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='LOG')), 
       loss='sparse_categorical_crossentropy', 
       metrics=['accuracy'])
   return model

#configure the search process by creating a random search tuner
tuner = kt.RandomSearch(
   build_model, 
   objective='val_accuracy',
   max_trials=10,
   executions_per_trial=2,
   directory='files_generated_from_notebooks',
   project_name='intro_to_kt'
)

Reloading Tuner from files_generated_from_notebooks/intro_to_kt/tuner0.json
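The search space defined in `build_model` can be illustrated without Keras at all. The sketch below is a plain-Python approximation, not Keras Tuner's own code: it lists the values that an integer range of 32 to 512 in steps of 32 can take, and draws learning rates so that their logarithm is uniform between 1e-4 and 1e-2. The helper names `int_choices` and `sample_log_uniform` are invented for this illustration.

```python
import math
import random

def int_choices(min_value, max_value, step):
    """All values an integer hyperparameter can take (bounds inclusive)."""
    return list(range(min_value, max_value + 1, step))

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

unit_choices = int_choices(32, 512, 32)
learning_rates = [sample_log_uniform(rng, 1e-4, 1e-2) for _ in range(1000)]

# With log sampling, roughly half of the draws fall below 1e-3, the
# geometric midpoint of the range. Linear sampling would put only about
# 9% of the draws there.
share_below_1e3 = sum(lr < 1e-3 for lr in learning_rates) / len(learning_rates)

print(f"{len(unit_choices)} possible unit counts: {unit_choices}")
print(f"Share of sampled learning rates below 1e-3: {share_below_1e3:.2f}")
```

Log sampling is a common choice for learning rates because useful values span several orders of magnitude.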


**Once the tuner is configured, you can run the hyperparameter search**
- The tuner will evaluate different hyperparameter combinations and find the best based on validation accuracy
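To make the idea of a random search concrete, here is a minimal sketch in plain Python. It is not how Keras Tuner is implemented. The function `toy_val_accuracy` is a made-up stand-in for training a model and reading its validation accuracy, and its formula is invented purely for illustration. The sketch assumes that repeated executions of a trial are averaged to reduce noise before trials are ranked.

```python
import math
import random

def toy_val_accuracy(units, learning_rate, rng):
    """Hypothetical stand-in for 'train the model and read val_accuracy'.

    Invented formula: the score peaks near 256 units and a learning
    rate of 1e-3, with a little random noise on top.
    """
    penalty = 0.01 * abs(math.log10(learning_rate) + 3) + 0.00002 * abs(units - 256)
    return 0.98 - penalty + rng.gauss(0, 0.001)

def random_search(max_trials, executions_per_trial, seed=0):
    """Sample hyperparameters at random, score each trial, rank best first."""
    rng = random.Random(seed)
    trials = []
    for trial_id in range(max_trials):
        hps = {
            "units": rng.choice(range(32, 513, 32)),
            "learning_rate": 10 ** rng.uniform(-4, -2),  # log-uniform draw
        }
        # Score the same hyperparameters several times and average the results.
        scores = [
            toy_val_accuracy(hps["units"], hps["learning_rate"], rng)
            for _ in range(executions_per_trial)
        ]
        trials.append({"trial_id": trial_id, "hps": hps,
                       "score": sum(scores) / len(scores)})
    return sorted(trials, key=lambda t: t["score"], reverse=True)

ranked = random_search(max_trials=10, executions_per_trial=2)
best = ranked[0]
print(f"Best trial: {best['trial_id']} with {best['hps']} (score {best['score']:.4f})")
```

Because the samples are independent, adding more trials only widens the coverage of the search space; it does not refine around earlier good results.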

In [18]:
#Hyperparameter search

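#Load and pre-process the mnist dataset
#Note: the second split returned by mnist.load_data() is the MNIST test set; it serves as validation data here
(x_train, y_train), (x_val, y_val) = mnist.load_data()
#Scale pixel values from [0, 255] to [0, 1]
x_train, x_val = x_train / 255.0, x_val / 255.0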

#Run the hyperparameter search using the search method
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))

Trial 10 Complete [00h 00m 28s]
val_accuracy: 0.9769000113010406

Best val_accuracy So Far: 0.9796499907970428
Total elapsed time: 00h 04m 07s
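The search above validates on the second split returned by `mnist.load_data()`, which is the MNIST test set. One alternative is to carve a validation set out of the 60,000 training images so that the test set stays untouched until the final evaluation. The sketch below shows one way to compute such a split with the standard library only. The function name `holdout_split` and the 20% fraction are assumptions made for this example.

```python
import random

def holdout_split(n_samples, val_fraction=0.2, seed=0):
    """Return (train_indices, val_indices) from a shuffled index range."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for repeatability
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]

# MNIST's training split contains 60,000 images.
train_idx, val_idx = holdout_split(60000)
print(f"{len(train_idx)} training indices, {len(val_idx)} validation indices")
```

With NumPy arrays, the index lists could be applied as `x_train[train_idx]` and `x_train[val_idx]`, and the resulting validation arrays passed to `tuner.search` through `validation_data`.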


**Last step:** After the search is complete and the best values have been found, we build a model with these optimized values. The cells below demonstrate this.

In [22]:
"""
Retrieve the best hyperparameter values with the get_best_hyperparameters
function, print them, and build a fresh model from them
"""
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"The optimal number of units in the first dense layer is {best_hps.get('units')}. The optimal learning rate for the optimizer is {best_hps.get('learning_rate')}.")

model = tuner.hypermodel.build(best_hps)

The optimal number of units in the first dense layer is 256. The optimal learning rate for the optimizer is 0.0009328604664075379.
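The `units` hyperparameter directly controls the size of the model. As a rough worked example, the sketch below counts the trainable parameters of the two dense layers, assuming the 28x28 input is flattened to 784 values as in `build_model`. The helper names `dense_params` and `mlp_params` are invented for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

def mlp_params(units, n_pixels=28 * 28, n_classes=10):
    """Parameter count of: Flatten -> Dense(units) -> Dense(n_classes)."""
    return dense_params(n_pixels, units) + dense_params(units, n_classes)

for units in (32, 256, 512):
    print(f"units={units}: {mlp_params(units):,} trainable parameters")
```

With the 256 units selected above, the network has 203,530 trainable parameters, compared with 25,450 at the low end of the range (32 units) and 407,050 at the high end (512 units).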


In [27]:
# Load and preprocess the data (if not already done)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

#Train the model with the optimized hyperparameters

model.fit(x_train, y_train, epochs=10, validation_split=0.2)

#Evaluate the model's performance on the test set 
test_loss, test_acc = model.evaluate(x_test, y_test)

print(f'Test accuracy: {test_acc}')

Epoch 1/10
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - accuracy: 0.9971 - loss: 0.0089 - val_accuracy: 0.9777 - val_loss: 0.0956
Epoch 2/10
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - accuracy: 0.9974 - loss: 0.0086 - val_accuracy: 0.9795 - val_loss: 0.0946
Epoch 3/10
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - accuracy: 0.9984 - loss: 0.0059 - val_accuracy: 0.9783 - val_loss: 0.1049
Epoch 4/10
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - accuracy: 0.9982 - loss: 0.0057 - val_accuracy: 0.9774 - val_loss: 0.1062
Epoch 5/10
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - accuracy: 0.9991 - loss: 0.0042 - val_accuracy: 0.9783 - val_loss: 0.1033
Epoch 6/10
[1m1500/1500[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 1ms/step - accuracy: 0.9982 - loss: 0.0059 - val_accuracy: 0.9785 - val_loss: 0.1091
Epoch 7/10
[1m1

# Summary and results

## Summary

This notebook demonstrates automated hyperparameter optimization using Keras Tuner's RandomSearch to find a good hidden-layer width and learning rate. The tuner randomly sampled 10 combinations of hidden layer units (32-512, in steps of 32) and learning rates (0.0001-0.01, sampled on a log scale) to maximize validation accuracy on MNIST digit classification.

## Results

The automated tuning found hyperparameters (256 units, learning rate of about 0.00093) that achieved nearly 98% test accuracy on MNIST. There are signs of mild overfitting: training accuracy (99.84%) is about 2 percentage points higher than validation accuracy (97.88%), suggesting the model memorized some training patterns.
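The validation loss printed during training gives a second view of the overfitting. The sketch below takes the `val_loss` values from the first six epochs of the training output and applies a simple patience rule, similar in spirit to an early stopping callback. It is a plain-Python illustration, not the Keras implementation, and the patience of 3 is an arbitrary choice for this example.

```python
# val_loss for epochs 1-6, copied from the training output above.
val_loss = [0.0956, 0.0946, 0.1049, 0.1062, 0.1033, 0.1091]

def best_epoch(losses):
    """1-based epoch with the lowest validation loss."""
    return min(range(len(losses)), key=losses.__getitem__) + 1

def stopping_epoch(losses, patience):
    """1-based epoch at which training would stop, or None if it never does.

    Training stops once the loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None

print(f"Lowest val_loss at epoch {best_epoch(val_loss)}")
print(f"With patience=3, training would stop at epoch {stopping_epoch(val_loss, patience=3)}")
```

On these values the validation loss is lowest at epoch 2 and does not improve afterwards, while the training loss keeps falling, which is the usual signature of overfitting.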

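## Results interpretation

97.99% test accuracy is excellent for a simple neural network on MNIST, although the figure is likely slightly optimistic because the same test images served as validation data during the search. The ~2% gap between training (99.84%) and validation (97.88%) accuracy is normal and manageable. MNIST is a comparatively easy benchmark, so high accuracy is expected.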