# Hyperparameter optimization with `Hyperopt` and `Hyperas`

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

Many seemingly arbitrary decisions must be made when developing a deep-learning model: How many layers should you use? How many units or filters should each layer contain? Should you use `ReLU` as the activation function, or a different one? How much `dropout` should you apply? These architecture-level parameters are called hyperparameters, to distinguish them from the parameters (weights) of a model, which are learned during training.

In practice, experienced engineers and researchers build intuition over time about which of these choices work and which do not. However, there are no set guidelines.

As a result, you must explore the space of possible decisions automatically, systematically, and in a principled way. You must search the architecture space empirically to identify the best options. This is the focus of the field of automatic hyperparameter optimization.

![hyper_optimization](https://miro.medium.com/max/1142/1*5mStLTnIxsANpOHSwAFJhg.png)

For models built with Keras, we can use `Hyperopt` and `Hyperas` to automate this work.

> _[Hyperas](https://github.com/maxpumperla/hyperas) brings fast experimentation with Keras and hyperparameter optimization with Hyperopt together. It lets you use the power of hyperopt without having to learn the syntax of it. Instead, just define your keras model as you are used to, but use a simple template notation to define hyper-parameter ranges to tune._

In this notebook, we show how to use `Hyperas` with the [Fashion MNIST dataset](https://keras.io/api/datasets/fashion_mnist/). This is a dataset of 60,000 28x28 grayscale training images from 10 fashion categories, along with a test set of 10,000 images.

![fashion_mnist](https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/3682850e-dc4d-4c07-a2c8-4e58a721b65b/f50369fd-32ea-477d-b74c-1b3f6e014122/images/screenshot.gif)

To make `Hyperas` work, you have to wrap your data and model into functions, as shown below, and then pass them as parameters to the minimizer.

Discrete hyper-parameters (_like units in a layer_) can be optimized by using the `choice` function; just wrap the parameters you want to optimize into double curly brackets and choose a distribution over which to run the algorithm (e.g., `{{choice([32, 64, 128])}}`). To hyper-tune continuous values (_like the learning rate_), we use the `uniform` function (e.g., `{{uniform(0, 1)}}`).
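
To make the two distributions concrete, here is a minimal standard-library sketch of what drawing from such a search space looks like. The helpers `choice`, `uniform`, and `sample_configuration` below are hypothetical stand-ins written for this illustration, not the `Hyperas` functions themselves, and they sample independently at random, whereas the TPE algorithm used later adapts its proposals based on earlier trials.

```python
import random

# Hypothetical stand-ins for the two distributions described above.
def choice(options):
    """Pick one option at random (discrete hyperparameter)."""
    return random.choice(options)

def uniform(low, high):
    """Draw a float between low and high (continuous hyperparameter)."""
    return random.uniform(low, high)

def sample_configuration():
    """Draw one candidate configuration from an illustrative search space."""
    return {
        "units": choice([64, 128, 256]),
        "dropout": uniform(0, 1),
        "optimizer": choice(["rmsprop", "adam", "sgd"]),
    }

random.seed(0)  # fixed seed so the sketch is repeatable
samples = [sample_configuration() for _ in range(3)]
for config in samples:
    print(config)
```

Each printed dictionary corresponds to one set of values that would be substituted into the double-curly-bracket placeholders for a single trial.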

In [1]:
import tensorflow as tf
import numpy as np

from hyperas import optim
from hyperopt import STATUS_OK
from hyperas.distributions import choice, uniform

def data():

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
                               
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')

    x_train = x_train / 255.
    x_test = x_test / 255.

    y_train = tf.keras.utils.to_categorical(y_train, 10)
    y_test = tf.keras.utils.to_categorical(y_test, 10)
    return x_train, y_train, x_test, y_test

def create_model(x_train, y_train, x_test, y_test):

    model = tf.keras.models.Sequential()

    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Dense({{choice([64, 128, 256])}}, activation='relu'))
    model.add(tf.keras.layers.Dropout({{uniform(0, 1)}}))

    model.add(tf.keras.layers.Dense({{choice([64, 128, 256])}}, activation='relu'))
    model.add(tf.keras.layers.Dropout({{uniform(0, 1)}}))

    model.add(tf.keras.layers.Dense({{choice([64, 128, 256])}}, activation='relu'))
    model.add(tf.keras.layers.Dropout({{uniform(0, 1)}}))

    model.add(tf.keras.layers.Dense(10, activation='softmax'))

    model.compile(optimizer={{choice(['rmsprop', 'adam', 'sgd'])}},
                loss='categorical_crossentropy',
                metrics=['categorical_accuracy'])

    result = model.fit(x_train, y_train,
                    batch_size={{choice([32, 64, 128, 256])}}, epochs=50,
                    validation_split=0.2, verbose=2)

    validation_acc = np.amax(result.history['val_categorical_accuracy']) 
    print('Best validation acc of epoch:', validation_acc)
    return {'loss': -validation_acc, 'status': STATUS_OK, 'model': model}

x_train, y_train, x_test, y_test = data()


We call the `optim.minimize` function to run the optimization, passing our `create_model` and `data` functions. You can also set the number of evaluations with the `max_evals` argument. If you are working in a notebook, you also need to pass `notebook_name` with the name of the notebook you are running.
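
The idea of a minimizer with an evaluation budget can be sketched without any libraries. The example below is a simplified illustration that swaps in plain random search for TPE; `toy_objective`, `toy_minimize`, and the made-up accuracy curve are assumptions introduced here. It shows why the model function returns the negative validation accuracy as its `loss`: minimizing that value is the same as maximizing accuracy.

```python
import random

STATUS_OK = "ok"  # placeholder mirroring the constant imported from hyperopt

def toy_objective(dropout):
    """Pretend validation accuracy peaks at dropout = 0.3 (made-up curve)."""
    validation_acc = 1.0 - (dropout - 0.3) ** 2
    # Negate the accuracy so that a lower loss means a better model.
    return {"loss": -validation_acc, "status": STATUS_OK, "dropout": dropout}

def toy_minimize(objective, max_evals, seed=0):
    """Evaluate max_evals random candidates and keep the lowest loss."""
    rng = random.Random(seed)
    trials = [objective(rng.uniform(0, 1)) for _ in range(max_evals)]
    best = min(trials, key=lambda trial: trial["loss"])
    return best, trials

best, trials = toy_minimize(toy_objective, max_evals=5)
print("Best trial:", best)
```

Raising `max_evals` gives the search more candidates to compare, at the cost of training one model per evaluation in the real setting.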

In [2]:
from hyperopt import Trials, tpe

best_run, best_model = optim.minimize(model=create_model,
                                        data=data,
                                        algo=tpe.suggest,
                                        max_evals=5,
                                        trials=Trials(),
                                        notebook_name='10_hyperparameter_tuning')

print("Evaluation of best performing model:")
print(best_model.evaluate(x_test, y_test))
print("Best performing model chosen hyper-parameters:")
print(best_run)
best_model.summary()

>>> Imports:
#coding=utf-8

try:
    import tensorflow as tf
except:
    pass

try:
    import numpy as np
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperopt import STATUS_OK
except:
    pass

try:
    from hyperas.distributions import choice, uniform
except:
    pass

try:
    from hyperopt import Trials, tpe
except:
    pass

try:
    import tensorflow as tf
except:
    pass

try:
    import plotly.graph_objects as go
except:
    pass

>>> Hyperas search space:

def get_space():
    return {
        'Dense': hp.choice('Dense', [64, 128, 256]),
        'Dropout': hp.uniform('Dropout', 0, 1),
        'Dense_1': hp.choice('Dense_1', [64, 128, 256]),
        'Dropout_1': hp.uniform('Dropout_1', 0, 1),
        'Dense_2': hp.choice('Dense_2', [64, 128, 256]),
        'Dropout_2': hp.uniform('Dropout_2', 0, 1),
        'optimizer': hp.choice('optimizer', ['rmsprop', 'adam', 'sgd']),
        'batch_size': hp.choice('batch_size', [32, 64, 128, 256]),

Now that the hyperparameters have been optimized, we can train a new model from scratch for a greater number of epochs. Additionally, since we "_know what works_," we can try our own experiments in a more constrained research space.
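
Before copying values into a hand-written model, it helps to check how `best_run` reports them. Assuming the default settings, where (to my understanding) parameters defined with `choice` are reported as positions in the candidate list rather than as the values themselves, a small helper can map them back. `decode_best_run` and `example_best_run` are hypothetical names with made-up numbers used only for illustration; Hyperopt also provides `space_eval` for this kind of conversion.

```python
# Candidate lists copied from the search space used in this notebook.
CHOICES = {
    "Dense": [64, 128, 256],
    "Dense_1": [64, 128, 256],
    "Dense_2": [64, 128, 256],
    "optimizer": ["rmsprop", "adam", "sgd"],
    "batch_size": [32, 64, 128, 256],
}

def decode_best_run(best_run, choices=CHOICES):
    """Replace list positions with the values they point to."""
    decoded = {}
    for name, value in best_run.items():
        if name in choices:
            decoded[name] = choices[name][int(value)]
        else:
            # Continuous parameters (e.g. dropout rates) are already values.
            decoded[name] = value
    return decoded

# Made-up result for illustration only; not taken from an actual run.
example_best_run = {"Dense": 0, "Dropout": 0.01, "Dense_1": 2,
                    "optimizer": 0, "batch_size": 2}
decoded = decode_best_run(example_best_run)
print(decoded)
```

If the printed `best_run` already contains values such as `64` or `'rmsprop'`, no decoding is needed.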

In [4]:
import tensorflow as tf

model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dropout(0.01))

model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(0))

model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.Dropout(0.25))

model.add(tf.keras.layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
            loss='categorical_crossentropy',
            metrics=['categorical_accuracy'])

my_callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', 
                                    patience=8),  
    tf.keras.callbacks.ModelCheckpoint(filepath='models/model.{epoch:02d}-{val_loss:.2f}.h5', 
                                        monitor='val_loss', 
                                        save_best_only=True,),  
]

print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

history = model.fit(x_train, y_train,
                batch_size=128, epochs=16,
                validation_split=0.2, verbose=2,
                callbacks=my_callbacks)

history_dict = history.history

acc = history_dict['categorical_accuracy']
val_acc = history_dict['val_categorical_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

import plotly.graph_objects as go

fig = go.Figure(layout={'template': 'plotly_dark'})

fig.add_trace(go.Scatter(x=list(epochs), y=acc,
                         line_color='rgba(0, 102, 255, 0.5)', line=dict(width=3, dash='dash'), name='Accuracy (Training)', mode='lines',
                         hoverlabel=dict(namelength=-1),
                         hovertemplate='Accuracy (Training): %{y:.5f} acc <extra></extra>',
                         showlegend=True))
fig.add_trace(go.Scatter(x=list(epochs), y=val_acc,
                         line_color='rgba(255, 0, 0, 0.5)', line=dict(width=3, dash='dash'), name='Accuracy (Validation)', mode='lines',
                         hoverlabel=dict(namelength=-1),
                         hovertemplate='Accuracy (Validation): %{y:.2f} acc <extra></extra>',
                         showlegend=True))


fig.update_layout(
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)'

)

fig.show()


fig2 = go.Figure(layout={'template': 'plotly_dark'})

fig2.add_trace(go.Scatter(x=list(epochs), y=loss,
                          line_color='rgba(0, 102, 255, 0.5)', line=dict(width=3, dash='dash'), name='Loss (Training)', mode='lines',
                          hoverlabel=dict(namelength=-1),
                          hovertemplate='Loss (Training): %{y:.5f} loss <extra></extra>',
                          showlegend=True))
fig2.add_trace(go.Scatter(x=list(epochs), y=val_loss,
                          line_color='rgba(255, 0, 0, 0.5)', line=dict(width=3, dash='dash'), name='Loss (Validation)', mode='lines',
                          hoverlabel=dict(namelength=-1),
                          hovertemplate='Loss (Validation): %{y:.2f} loss <extra></extra>',
                          showlegend=True))

fig2.update_layout(
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)'

)
fig2.show()

test_loss_score, test_acc_score = model.evaluate(x_test, y_test)

print(f'Final Loss: {round(test_loss_score, 2)}.')
print(f'Final Performance: {round(test_acc_score * 100, 2)} %.')

Version:  2.10.1
Eager mode:  True
GPU is available
Epoch 1/16
375/375 - 2s - loss: 0.6147 - categorical_accuracy: 0.7744 - val_loss: 0.4280 - val_categorical_accuracy: 0.8407 - 2s/epoch - 6ms/step
Epoch 2/16
375/375 - 1s - loss: 0.4147 - categorical_accuracy: 0.8459 - val_loss: 0.4122 - val_categorical_accuracy: 0.8516 - 1s/epoch - 4ms/step
Epoch 3/16
375/375 - 2s - loss: 0.3709 - categorical_accuracy: 0.8647 - val_loss: 0.3909 - val_categorical_accuracy: 0.8550 - 2s/epoch - 4ms/step
Epoch 4/16
375/375 - 2s - loss: 0.3437 - categorical_accuracy: 0.8733 - val_loss: 0.3474 - val_categorical_accuracy: 0.8750 - 2s/epoch - 5ms/step
Epoch 5/16
375/375 - 2s - loss: 0.3267 - categorical_accuracy: 0.8794 - val_loss: 0.3353 - val_categorical_accuracy: 0.8772 - 2s/epoch - 4ms/step
Epoch 6/16
375/375 - 2s - loss: 0.3100 - categorical_accuracy: 0.8854 - val_loss: 0.4000 - val_categorical_accuracy: 0.8599 - 2s/epoch - 4ms/step
Epoch 7/16
375/375 - 2s - loss: 0.2972 - categorical_accuracy: 0.8892 - 

Final Loss: 0.4.
Final Performance: 87.82 %.
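
The training log can be read the same way `create_model` scores a trial, by taking the best validation value across epochs. The sketch below uses only the six complete epochs printed above (the log is cut off during the seventh), so it is an illustration of the bookkeeping rather than a summary of the full run.

```python
# Validation metrics copied from the six complete epochs in the log above.
val_acc = [0.8407, 0.8516, 0.8550, 0.8750, 0.8772, 0.8599]
val_loss = [0.4280, 0.4122, 0.3909, 0.3474, 0.3353, 0.4000]

# Epochs are 1-indexed in the Keras log, hence the + 1.
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(f"Best validation accuracy: {val_acc[best_acc_epoch - 1]:.4f} "
      f"(epoch {best_acc_epoch})")
print(f"Lowest validation loss: {val_loss[best_loss_epoch - 1]:.4f} "
      f"(epoch {best_loss_epoch})")
```

The epoch with the lowest `val_loss` is the one that `ModelCheckpoint` with `save_best_only=True` keeps on disk, and `EarlyStopping` counts its `patience` from that same quantity.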


Overall, hyperparameter optimization is a powerful technique and an important step in developing well-performing models. 🙃
