Let's explore hyperparameter search (tuning) with the [Optuna](https://optuna.org/) package.

As before, we will start with the familiar [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset.

But first, let's install the Optuna library:

In [1]:
%pip install optuna

Collecting optuna
  Downloading optuna-4.4.0-py3-none-any.whl.metadata (17 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.16.2-py3-none-any.whl.metadata (7.3 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Downloading optuna-4.4.0-py3-none-any.whl (395 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 395.9/395.9 kB 7.7 MB/s eta 0:00:00
Downloading alembic-1.16.2-py3-none-any.whl (242 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 242.7/242.7 kB 11.4 MB/s eta 0:00:00
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Installing collected packages: colorlog, alembic, optuna
Successfully installed alembic-1.16.2 colorlog-6.9.0 optuna-4.4.0


# Importing needed libraries

In [2]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import pandas as pd

import numpy as np
import seaborn as sns
import warnings
import optuna

warnings.filterwarnings('ignore')
pd.options.display.float_format = '{:,.2f}'.format
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 200)

import tensorflow as tf

from datetime import datetime
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_classification, make_moons, make_circles
from sklearn.metrics import confusion_matrix, classification_report, mean_squared_error, mean_absolute_error, r2_score
from sklearn.linear_model import LogisticRegression
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Activation
from keras.optimizers import *
from keras.callbacks import EarlyStopping
from keras.utils import to_categorical
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold, KFold
import keras.backend as K

# Utility functions

In [None]:
def plot_loss_accuracy(history):
    historydf = pd.DataFrame(history.history, index=history.epoch)
    # DataFrame.plot creates its own figure, so pass the size here instead of
    # calling plt.figure() first (which would leave an empty figure behind).
    historydf.plot(figsize=(8, 6), ylim=(0, max(1, historydf.values.max())))
    loss = history.history['loss'][-1]
    acc = history.history['accuracy'][-1]
    plt.title('Loss: %.3f, Accuracy: %.3f' % (loss, acc))

# Main section

Let's import and prepare the MNIST dataset first.

In [4]:
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [5]:
print("Training data shape: ", x_train.shape)  # (60000, 28, 28) -- 60000 images, each 28x28 pixels
print("Test data shape: ", x_test.shape)  # (10000, 28, 28) -- 10000 images, each 28x28 pixels
print("Training response shape: ", y_train.shape)
print("Testing response shape: ", y_test.shape)

image_size = (x_train.shape[1], x_train.shape[2])

Training data shape:  (60000, 28, 28)
Test data shape:  (10000, 28, 28)
Training response shape:  (60000,)
Testing response shape:  (10000,)


In [6]:
# Flatten the images
image_vector_size = image_size[0] * image_size[1] # 28 * 28
x_train = x_train.reshape(x_train.shape[0], image_vector_size) /255.
x_test = x_test.reshape(x_test.shape[0], image_vector_size) /255.
print(x_train.shape)

(60000, 784)


In [7]:
print("First 5 training labels: ", y_train[:5]) # [5, 0, 4, 1, 9]

# Convert to "one-hot" vectors using the to_categorical function
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print("First 5 training labels as one-hot encoded vectors:\n", y_train[:5])
print(y_train.shape)

First 5 training labels:  [5 0 4 1 9]
First 5 training labels as one-hot encoded vectors:
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
(60000, 10)
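For integer labels like these, to_categorical produces the same 0/1 matrix as picking rows of an identity matrix (the dtype may differ). A minimal NumPy sketch of that idea, using the five labels printed above; the name labels is introduced only for illustration:

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # the first five MNIST training labels shown above
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i, so indexing
# the identity matrix with the label array builds the whole one-hot matrix at once.
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```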



Again, we will start by defining the model with Sequential() and adding some Dense (fully connected) layers.
We start with a simple network that has a single hidden layer of 32 neurons.
Make sure you set the size of the output layer to the number of classes we are trying to predict!
We can inspect our model using the model.summary() function.

In [8]:
from keras import Input

# image_vector_size (784 = 28*28) and num_classes (10 unique digits) are defined above.
def build_basic_model():
    model = Sequential()

    # The Input object declares the shape of a single sample, which must match
    # the shape of our flattened training data.
    model.add(Input(shape=(image_vector_size,)))
    model.add(Dense(units=32, activation='sigmoid'))
    model.add(Dense(units=num_classes, activation='softmax'))
    return model

model = build_basic_model()
model.summary()
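model.summary() reports the number of trainable parameters per layer. Since a Dense layer has one weight per (input, unit) pair plus one bias per unit, those counts can be checked by hand; hidden_units is a name introduced only for this sketch:

```python
image_vector_size = 784  # 28 * 28 input pixels
hidden_units = 32        # size of the single hidden layer
num_classes = 10         # one output unit per digit

# Dense layer parameters = inputs * units (weights) + units (biases).
hidden_params = image_vector_size * hidden_units + hidden_units
output_params = hidden_units * num_classes + num_classes
total_params = hidden_params + output_params
print(hidden_params, output_params, total_params)  # 25120 330 25450
```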

Let's compile our model, train it for 5 epochs, and examine its performance.

In [9]:
model.compile(optimizer="sgd", loss='categorical_crossentropy', metrics=['accuracy'])

In [10]:
history = model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=True, validation_split=.1)
loss, accuracy  = model.evaluate(x_test, y_test, verbose=False)

Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.2357 - loss: 2.2653 - val_accuracy: 0.6275 - val_loss: 1.9135
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6282 - loss: 1.8497 - val_accuracy: 0.7052 - val_loss: 1.5798
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.6918 - loss: 1.5416 - val_accuracy: 0.7823 - val_loss: 1.3177
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7482 - loss: 1.3089 - val_accuracy: 0.8203 - val_loss: 1.1218
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.7798 - loss: 1.1408 - val_accuracy: 0.8445 - val_loss: 0.9758


In [11]:
plot_loss_accuracy(history)
print(f'Test loss: {loss:.3}')
print(f'Test accuracy: {accuracy:.3}')

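NameError: name 'plot_loss_accuracy' is not defined

This error only means that the Utility functions cell above was never executed (its marker is still In [None]). Run that cell first, then re-run this one to see the plot together with the test loss and accuracy.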

As we can see above, the performance of our model is not too bad: it reaches about 84% validation accuracy after 5 epochs.

So far we have used the default learning rate of our SGD optimizer. Can you tell what that default value is? Hint: see the documentation for the Keras SGD optimizer.

The learning rate is a **hyperparameter**: we set it before training starts rather than learning it from the data. What if we could tune the learning rate instead, choosing the value that gives the best result on the validation set?

We could try different values by hand (for example, 0.1, 0.2, 0.3), but that takes time and only covers a few values. Why not use a more systematic search strategy? Let's use a package called **Optuna**.

Below, we define an **objective function** that Optuna calls once per **trial**. In each trial, the function builds the model, uses the trial object to sample the hyperparameter we want to examine (the learning rate), trains for 5 epochs, and returns the value to optimize: the accuracy on the validation set.
We will sample learning rates between 1e-5 and 1e-1, uniformly distributed on a logarithmic scale.
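With log=True, suggest_float draws the exponent uniformly, so every decade between 1e-5 and 1e-1 is equally likely. The NumPy sketch below only illustrates that distribution (it is not Optuna's implementation) and contrasts it with a plain uniform draw over the same interval. With default settings, Optuna's TPE sampler picks its first 10 trials (n_startup_trials) at random from this distribution before it starts favouring promising regions, so a 10-trial study is effectively a random search.

```python
import numpy as np

rng = np.random.default_rng(0)
low, high = 1e-5, 1e-1

# Log-uniform sampling: draw log10(lr) uniformly, then exponentiate.
samples = 10 ** rng.uniform(np.log10(low), np.log10(high), size=100_000)

# Each of the four decades [1e-5, 1e-4), ..., [1e-2, 1e-1) gets about 25% of the draws.
decade = np.floor(np.log10(samples)).astype(int)
for d in range(-5, -1):
    print(f"[1e{d}, 1e{d + 1}): {np.mean(decade == d):.3f}")

# A plain uniform draw on the same interval puts about 90% of samples above 1e-2.
uniform = rng.uniform(low, high, size=100_000)
print(f"uniform draws >= 1e-2: {np.mean(uniform >= 1e-2):.3f}")
```

This is why a log scale suits learning rates: a linear grid such as 0.1, 0.2, 0.3 never looks below 1e-1 at all.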



In [12]:
def objective(trial):
    model = build_basic_model()
    # Sample a learning rate between 1e-5 and 1e-1 on a logarithmic scale.
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    # Compile the model with the sampled learning rate.
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=True, validation_split=.1)
    # The validation accuracy after the last epoch is what Optuna compares across trials.
    acc = history.history['val_accuracy'][-1]
    return acc

Next, we sample 10 different learning rates and evaluate the performance of each. This can take a while, which is why we only train for 5 epochs per trial.

In [13]:
study = optuna.create_study(direction='maximize') # we would like to maximise the validation accuracy
study.optimize(objective, n_trials=10) #use objective function defined above and sample 10 different learning rates

[I 2025-07-02 04:28:14,367] A new study created in memory with name: no-name-badc9817-81dc-45cc-a88f-29e28437904e


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5585 - loss: 1.7066 - val_accuracy: 0.8728 - val_loss: 0.6957
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.8527 - loss: 0.6792 - val_accuracy: 0.9033 - val_loss: 0.4391
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8819 - loss: 0.4856 - val_accuracy: 0.9162 - val_loss: 0.3550
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8921 - loss: 0.4111 - val_accuracy: 0.9200 - val_loss: 0.3154
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9009 - loss: 0.3697 - val_accuracy: 0.9233 - val_loss: 0.2904


[I 2025-07-02 04:28:24,097] Trial 0 finished with value: 0.9233333468437195 and parameters: {'lr': 0.08319278161618736}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5509 - loss: 1.7953 - val_accuracy: 0.8512 - val_loss: 0.8435
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8309 - loss: 0.8010 - val_accuracy: 0.8910 - val_loss: 0.5304
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8661 - loss: 0.5644 - val_accuracy: 0.9045 - val_loss: 0.4144
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8856 - loss: 0.4613 - val_accuracy: 0.9155 - val_loss: 0.3537
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8961 - loss: 0.4014 - val_accuracy: 0.9198 - val_loss: 0.3196


[I 2025-07-02 04:28:33,805] Trial 1 finished with value: 0.9198333621025085 and parameters: {'lr': 0.06395000821636873}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.0972 - loss: 2.4854 - val_accuracy: 0.0960 - val_loss: 2.4658
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.0943 - loss: 2.4710 - val_accuracy: 0.0965 - val_loss: 2.4535
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.0971 - loss: 2.4595 - val_accuracy: 0.0965 - val_loss: 2.4418
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0951 - loss: 2.4449 - val_accuracy: 0.0957 - val_loss: 2.4307
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.0966 - loss: 2.4346 - val_accuracy: 0.0953 - val_loss: 2.4200


[I 2025-07-02 04:28:43,752] Trial 2 finished with value: 0.09533333033323288 and parameters: {'lr': 5.614578230533841e-05}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0837 - loss: 2.3885 - val_accuracy: 0.1687 - val_loss: 2.2735
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.2125 - loss: 2.2513 - val_accuracy: 0.3255 - val_loss: 2.1864
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.3436 - loss: 2.1741 - val_accuracy: 0.4580 - val_loss: 2.1210
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.4569 - loss: 2.1125 - val_accuracy: 0.5502 - val_loss: 2.0617
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.5375 - loss: 2.0574 - val_accuracy: 0.6117 - val_loss: 2.0047


[I 2025-07-02 04:28:54,972] Trial 3 finished with value: 0.6116666793823242 and parameters: {'lr': 0.0016911348393130238}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.0638 - loss: 2.5123 - val_accuracy: 0.0713 - val_loss: 2.5007
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0670 - loss: 2.4891 - val_accuracy: 0.0770 - val_loss: 2.4816
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.0737 - loss: 2.4722 - val_accuracy: 0.0828 - val_loss: 2.4640
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0788 - loss: 2.4550 - val_accuracy: 0.0873 - val_loss: 2.4475
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.0819 - loss: 2.4341 - val_accuracy: 0.0940 - val_loss: 2.4321


[I 2025-07-02 04:29:08,656] Trial 4 finished with value: 0.09399999678134918 and parameters: {'lr': 9.739242899559474e-05}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.0972 - loss: 2.4999 - val_accuracy: 0.0998 - val_loss: 2.3472
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.1034 - loss: 2.3187 - val_accuracy: 0.1527 - val_loss: 2.2614
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1757 - loss: 2.2449 - val_accuracy: 0.2812 - val_loss: 2.2042
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.3123 - loss: 2.1940 - val_accuracy: 0.3985 - val_loss: 2.1563
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 4s 5ms/step - accuracy: 0.4179 - loss: 2.1478 - val_accuracy: 0.5095 - val_loss: 2.1117


[I 2025-07-02 04:29:20,919] Trial 5 finished with value: 0.5095000267028809 and parameters: {'lr': 0.0012784132567807845}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.1148 - loss: 2.4748 - val_accuracy: 0.1237 - val_loss: 2.4510
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1144 - loss: 2.4598 - val_accuracy: 0.1257 - val_loss: 2.4386
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1179 - loss: 2.4457 - val_accuracy: 0.1277 - val_loss: 2.4268
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1188 - loss: 2.4396 - val_accuracy: 0.1292 - val_loss: 2.4157
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.1197 - loss: 2.4287 - val_accuracy: 0.1310 - val_loss: 2.4052


[I 2025-07-02 04:29:28,203] Trial 6 finished with value: 0.13099999725818634 and parameters: {'lr': 6.43888712317673e-05}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.4176 - loss: 2.0461 - val_accuracy: 0.7720 - val_loss: 1.3192
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7592 - loss: 1.2247 - val_accuracy: 0.8548 - val_loss: 0.8660
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.8243 - loss: 0.8651 - val_accuracy: 0.8787 - val_loss: 0.6542
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.8490 - loss: 0.6939 - val_accuracy: 0.8915 - val_loss: 0.5382
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8660 - loss: 0.5884 - val_accuracy: 0.9005 - val_loss: 0.4666


[I 2025-07-02 04:29:37,718] Trial 7 finished with value: 0.9004999995231628 and parameters: {'lr': 0.030857841255894063}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0791 - loss: 2.4542 - val_accuracy: 0.0752 - val_loss: 2.4542
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0803 - loss: 2.4539 - val_accuracy: 0.0757 - val_loss: 2.4516
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.0797 - loss: 2.4556 - val_accuracy: 0.0762 - val_loss: 2.4491
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.0787 - loss: 2.4487 - val_accuracy: 0.0763 - val_loss: 2.4465
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.0811 - loss: 2.4470 - val_accuracy: 0.0762 - val_loss: 2.4440


[I 2025-07-02 04:29:48,715] Trial 8 finished with value: 0.07616666704416275 and parameters: {'lr': 1.546736980001369e-05}. Best is trial 0 with value: 0.9233333468437195.


Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.2780 - loss: 2.2599 - val_accuracy: 0.6893 - val_loss: 1.9128
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.6710 - loss: 1.8358 - val_accuracy: 0.7567 - val_loss: 1.5341
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.7343 - loss: 1.4891 - val_accuracy: 0.8045 - val_loss: 1.2164
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7807 - loss: 1.2024 - val_accuracy: 0.8450 - val_loss: 0.9883
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8121 - loss: 1.0062 - val_accuracy: 0.8613 - val_loss: 0.8329


[I 2025-07-02 04:29:59,817] Trial 9 finished with value: 0.8613333106040955 and parameters: {'lr': 0.012959844198546795}. Best is trial 0 with value: 0.9233333468437195.


In [14]:
from optuna.visualization import plot_contour
from optuna.visualization import plot_edf
from optuna.visualization import plot_intermediate_values
from optuna.visualization import plot_optimization_history
from optuna.visualization import plot_parallel_coordinate
from optuna.visualization import plot_param_importances
from optuna.visualization import plot_slice

Let's see how the validation accuracy (y-axis) changed over the 10 trials.

In [15]:
plot_optimization_history(study)
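plot_optimization_history draws each trial's objective value together with the best value found so far. As a small pandas sketch, working from the numbers printed in the Optuna log above rather than from the study object, we can rebuild that data and also sort the trials by learning rate (with the live study, study.trials_dataframe() returns a similar table):

```python
import numpy as np
import pandas as pd

# Learning rates and final validation accuracies copied from the Optuna log above.
trials = pd.DataFrame({
    'lr': [0.08319278161618736, 0.06395000821636873, 5.614578230533841e-05,
           0.0016911348393130238, 9.739242899559474e-05, 0.0012784132567807845,
           6.43888712317673e-05, 0.030857841255894063, 1.546736980001369e-05,
           0.012959844198546795],
    'val_accuracy': [0.9233333468437195, 0.9198333621025085, 0.09533333033323288,
                     0.6116666793823242, 0.09399999678134918, 0.5095000267028809,
                     0.13099999725818634, 0.9004999995231628, 0.07616666704416275,
                     0.8613333106040955],
})
trials.index.name = 'trial'

# The running maximum is the "best value" line of the optimization history plot.
trials['best_so_far'] = trials['val_accuracy'].cummax()
# log10(lr) is easier to read than the raw rate, which spans four decades.
trials['log10_lr'] = np.log10(trials['lr'])

print(trials[['val_accuracy', 'best_so_far']])
print(trials.sort_values('lr')[['log10_lr', 'val_accuracy']])
```

Sorted this way, the four rates below 1e-4 all stay near chance level (roughly 8% to 13% accuracy), while the best trial (lr ≈ 0.083) sits close to the upper bound of 0.1, so it may be worth widening the search range upward.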

Let's see which learning rate gave the best validation accuracy. Is it different from the default value of the Keras SGD optimizer?

In [16]:
print("Number of finished trials: {}".format(len(study.trials)))
print("Best trial:")
trial = study.best_trial
print("  Value: {}".format(trial.value))
print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))

Number of finished trials: 10
Best trial:
  Value: 0.9233333468437195
  Params: 
    lr: 0.08319278161618736


In [17]:
print(study.best_params['lr'])

0.08319278161618736


Finally, let's retrain the model with this learning rate and evaluate its performance on the test set. Did we get an improved result?

In [18]:
model = build_basic_model()
# Compile the model with the best learning rate found by Optuna.
lr = study.best_params['lr']
model.compile(tf.keras.optimizers.SGD(learning_rate=lr), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=True, validation_split=.1)

loss, accuracy = model.evaluate(x_test, y_test, verbose=False)
print(f'Test loss: {loss:.3}')
print(f'Test accuracy: {accuracy:.3}')

Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.5617 - loss: 1.7690 - val_accuracy: 0.8820 - val_loss: 0.6896
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8580 - loss: 0.6678 - val_accuracy: 0.9097 - val_loss: 0.4298
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8822 - loss: 0.4791 - val_accuracy: 0.9165 - val_loss: 0.3480
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8952 - loss: 0.3990 - val_accuracy: 0.9233 - val_loss: 0.3082
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.8990 - loss: 0.3677 - val_accuracy: 0.9277 - val_loss: 0.2833
Test loss: 0.324
Test accuracy: 0.911
