# Hyperparameter optimization with `Hyperopt` and `Hyperas`

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**Many seemingly arbitrary decisions must be made when developing a deep-learning model: How many layers should you use? How many units or filters should each layer contain? Should you use `ReLU` as the activation function, or a different one? How much `dropout` should you apply? These architecture-level parameters are called hyperparameters, to distinguish them from the parameters (weights) that the model learns during training.**

**In practice, experienced engineers and researchers develop an intuition over time about which of these choices work and which don't. However, there are no formal rules.**

**As a result, you need to explore the space of possible decisions automatically, systematically, and in a principled way. You search the architecture space empirically to find the best-performing options. This is the subject of the field of automatic hyperparameter optimization.**

![hyper_optimization](https://miro.medium.com/max/1142/1*5mStLTnIxsANpOHSwAFJhg.png)

**For models built with Keras, we can use `Hyperopt` and `Hyperas` to automate this work.**

> **_[Hyperas](https://github.com/maxpumperla/hyperas) brings fast experimentation with Keras and hyperparameter optimization with Hyperopt together. It lets you use the power of hyperopt without having to learn the syntax of it. Instead, just define your keras model as you are used to, but use a simple template notation to define hyper-parameter ranges to tune._**

**In this notebook, we show how to use `Hyperas` with the [Fashion MNIST dataset](https://keras.io/api/datasets/fashion_mnist/), which consists of a training set of 60,000 28x28 grayscale images in 10 fashion categories, along with a test set of 10,000 images.**

![fashion_mnist](https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/3682850e-dc4d-4c07-a2c8-4e58a721b65b/f50369fd-32ea-477d-b74c-1b3f6e014122/images/screenshot.gif)

**To make Hyperas work, you have to wrap your data loading and your model in functions, as shown below, and then pass them as parameters to the minimizer.**

**Discrete hyperparameters (_like the number of units in a layer_) can be optimized with the `choice` function. Wrap the parameter you want to optimize in double curly brackets and choose a distribution to sample from (e.g., `{{choice([32, 64, 128])}}`). To tune continuous values (_like the learning rate or the dropout rate_), we use the `uniform` function (e.g., `{{uniform(0, 1)}}`).**
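
To make the two distributions concrete, here is a minimal sketch that uses only Python's standard `random` module as a stand-in. It is not the Hyperas or Hyperopt API; the `search_space` dictionary and the `sample_configuration` helper are hypothetical names chosen for illustration.

```python
import random

# Hypothetical search space mirroring the template notation.
# Each entry is a function that draws one value from its distribution.
search_space = {
    # Discrete: pick one option from a list, like {{choice([32, 64, 128])}}
    "units": lambda rng: rng.choice([32, 64, 128]),
    # Continuous: pick a float from a range, like {{uniform(0, 1)}}
    "dropout": lambda rng: rng.uniform(0.0, 1.0),
}


def sample_configuration(space, rng):
    """Draw one value per hyperparameter to form a single trial configuration."""
    return {name: draw(rng) for name, draw in space.items()}


rng = random.Random(0)  # fixed seed so the draw is repeatable
config = sample_configuration(search_space, rng)
print(config)
```

Each trial of the optimizer corresponds to one such configuration; the real library additionally uses the results of earlier trials to decide where to sample next.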

In [1]:

import numpy as np
from keras.utils import np_utils
from keras.models import Sequential
from keras.datasets import fashion_mnist
from keras import datasets, layers, models
from hyperopt import Trials, STATUS_OK, tpe

from hyperas import optim
from hyperas.distributions import choice, uniform

def data():

    (x_train, y_train), (x_test, y_test) = datasets.fashion_mnist.load_data()

    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')

    x_train = x_train / 255.
    x_test = x_test / 255.

    y_train = np_utils.to_categorical(y_train, 10)
    y_test = np_utils.to_categorical(y_test, 10)
    return x_train, y_train, x_test, y_test

def create_model(x_train, y_train, x_test, y_test):

    model = Sequential()

    model.add(layers.Flatten(input_shape=(28, 28)))
    model.add(layers.Dense({{choice([64, 128, 256])}}, activation='relu'))
    model.add(layers.Dropout({{uniform(0, 1)}}))

    model.add(layers.Dense({{choice([64, 128, 256])}}, activation='relu'))
    model.add(layers.Dropout({{uniform(0, 1)}}))

    model.add(layers.Dense({{choice([64, 128, 256])}}, activation='relu'))
    model.add(layers.Dropout({{uniform(0, 1)}}))

    model.add(layers.Dense(10, activation='softmax'))

    model.compile(optimizer={{choice(['rmsprop', 'adam', 'sgd'])}},
                loss='categorical_crossentropy',
                metrics=['categorical_accuracy'])

    result = model.fit(x_train, y_train,
                    batch_size={{choice([32, 64, 128, 256])}}, epochs=50,
                    validation_split=0.2, verbose=2)

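    # Keep the best validation accuracy reached in any epoch
    validation_acc = np.amax(result.history['val_categorical_accuracy'])
    print('Best validation accuracy across epochs:', validation_acc)
    # The minimizer looks for the lowest loss, so return the negated accuracy
    return {'loss': -validation_acc, 'status': STATUS_OK, 'model': model}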


**To run the optimization, we call the `optim.minimize` function, passing it our `create_model` and `data` functions. The `max_evals` argument sets the number of evaluations (trials) to run. If you are working in a notebook, you also need to pass `notebook_name` with the name of the notebook you are running.**
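
The contract between the objective function and the minimizer can be sketched without Keras. The example below is a simplified stand-in, not Hyperas itself: it uses plain random search instead of TPE, and `toy_objective`, `random_search`, and the made-up accuracy formula are all hypothetical. It illustrates why the model function returns the negated validation accuracy: the minimizer looks for the smallest `loss`, so maximizing accuracy means minimizing its negation.

```python
import random

STATUS_OK = "ok"  # local stand-in for the status flag imported from hyperopt


def toy_objective(params):
    # Made-up "validation accuracy" that peaks at dropout = 0.2 (illustration only)
    accuracy = 0.9 - (params["dropout"] - 0.2) ** 2
    # Same return shape as the model function: loss is the negated accuracy
    return {"loss": -accuracy, "status": STATUS_OK}


def random_search(objective, max_evals, seed=0):
    """Evaluate max_evals random configurations and keep the lowest loss."""
    rng = random.Random(seed)
    best_params, best_result = None, None
    for _ in range(max_evals):
        params = {"dropout": rng.uniform(0.0, 1.0)}
        result = objective(params)
        if result["status"] != STATUS_OK:
            continue  # skip failed trials
        if best_result is None or result["loss"] < best_result["loss"]:
            best_params, best_result = params, result
    return best_params, best_result


best_params, best_result = random_search(toy_objective, max_evals=5)
print("Best parameters:", best_params)
print("Best accuracy:", -best_result["loss"])
```

Raising `max_evals` gives the search more chances to land near the optimum, at the cost of one full training run per evaluation in the real setting.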

In [3]:
best_run, best_model = optim.minimize(model=create_model,
                                        data=data,
                                        algo=tpe.suggest,
                                        max_evals=5,
                                        trials=Trials(),
                                        notebook_name='hype')

x_train, y_train, x_test, y_test = data()

print("Evaluation of best performing model:")
print(best_model.evaluate(x_test, y_test))
print("Best performing model chosen hyper-parameters:")
print(best_run)
best_model.summary()

Evaluation of best performing model:
[0.5747981667518616, 0.8673999905586243]
Best performing model chosen hyper-parameters:
{'Dense': 0, 'Dense_1': 2, 'Dense_2': 2, 'Dropout': 0.03323327852409652, 'Dropout_1': 0.0886198698550964, 'Dropout_2': 0.2330896882313117, 'batch_size': 3, 'optimizer': 0}
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_4 (Flatten)         (None, 784)               0         
                                                                 
 dense_16 (Dense)            (None, 64)                50240     
                                                                 
 dropout_12 (Dropout)        (None, 64)                0         
                                                                 
 dense_17 (Dense)            (None, 256)               16640     
                                                                 
 dropout_13 (Dropout

**Now that the hyperparameters have been optimized, we can train a new model from scratch for a larger number of epochs. Additionally, since we "_know what works_," we can run our own experiments in a more constrained search space.**
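
Note that for `choice` parameters, the dictionary printed above holds the index of the selected option rather than the option itself. For example, `'Dense': 0` corresponds to 64 units, which matches the `(None, 64)` output shape in the model summary. The sketch below decodes the printed result by hand; the `choices` dictionary and the `decode` helper are hypothetical names, with the option lists copied from the template placeholders in `create_model`.

```python
# Option lists copied from the {{choice(...)}} placeholders in create_model
choices = {
    "Dense": [64, 128, 256],
    "Dense_1": [64, 128, 256],
    "Dense_2": [64, 128, 256],
    "batch_size": [32, 64, 128, 256],
    "optimizer": ["rmsprop", "adam", "sgd"],
}

# Result dictionary as printed in the output above
best_run = {
    "Dense": 0, "Dense_1": 2, "Dense_2": 2,
    "Dropout": 0.03323327852409652,
    "Dropout_1": 0.0886198698550964,
    "Dropout_2": 0.2330896882313117,
    "batch_size": 3, "optimizer": 0,
}


def decode(run, options):
    """Map choice indices to their values; leave continuous values untouched."""
    return {
        name: options[name][value] if name in options else value
        for name, value in run.items()
    }


decoded = decode(best_run, choices)
print(decoded)
```

Decoded this way, the run selected 64, 256, and 256 units, the `rmsprop` optimizer, and a batch size of 256, with dropout rates of roughly 0.03, 0.09, and 0.23.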

In [5]:
model = Sequential()

model.add(layers.Flatten(input_shape=(28, 28)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dropout(0.01))

model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0))

model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.25))

model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
            loss='categorical_crossentropy',
            metrics=['categorical_accuracy'])

history = model.fit(x_train, y_train,
                batch_size=128, epochs=16,
                validation_split=0.2, verbose=2)

history_dict = history.history

acc = history_dict['categorical_accuracy']
val_acc = history_dict['val_categorical_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']

epochs = range(1, len(acc) + 1)

import plotly.graph_objects as go

fig = go.Figure(layout={'template': 'plotly_dark'})

fig.add_trace(go.Scatter(x=list(epochs), y=acc,
                         line_color='rgba(0, 102, 255, 0.5)', line=dict(width=3, dash='dash'), name='Accuracy (Training)', mode='lines',
                         hoverlabel=dict(namelength=-1),
                         hovertemplate='Accuracy (Training): %{y:.5f} acc <extra></extra>',
                         showlegend=True))
fig.add_trace(go.Scatter(x=list(epochs), y=val_acc,
                         line_color='rgba(255, 0, 0, 0.5)', line=dict(width=3, dash='dash'), name='Accuracy (Validation)', mode='lines',
                         hoverlabel=dict(namelength=-1),
                         hovertemplate='Accuracy (Validation): %{y:.2f} acc <extra></extra>',
                         showlegend=True))


fig.update_layout(
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)'

)

fig.show()


fig2 = go.Figure(layout={'template': 'plotly_dark'})

fig2.add_trace(go.Scatter(x=list(epochs), y=loss,
                          line_color='rgba(0, 102, 255, 0.5)', line=dict(width=3, dash='dash'), name='Loss (Training)', mode='lines',
                          hoverlabel=dict(namelength=-1),
                          hovertemplate='Loss (Training): %{y:.5f} loss <extra></extra>',
                          showlegend=True))
fig2.add_trace(go.Scatter(x=list(epochs), y=val_loss,
                          line_color='rgba(255, 0, 0, 0.5)', line=dict(width=3, dash='dash'), name='Loss (Validation)', mode='lines',
                          hoverlabel=dict(namelength=-1),
                          hovertemplate='Loss (Validation): %{y:.2f} loss <extra></extra>',
                          showlegend=True))

fig2.update_layout(
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)'

)
fig2.show()

test_loss_score, test_acc_score = model.evaluate(x_test, y_test)

print(f'Final Loss: {round(test_loss_score, 2)}.')
print(f'Final Performance: {round(test_acc_score * 100, 2)} %.')

Epoch 1/16
375/375 - 2s - loss: 0.6145 - categorical_accuracy: 0.7765 - val_loss: 0.4419 - val_categorical_accuracy: 0.8338 - 2s/epoch - 4ms/step
Epoch 2/16
375/375 - 1s - loss: 0.4166 - categorical_accuracy: 0.8478 - val_loss: 0.3855 - val_categorical_accuracy: 0.8589 - 1s/epoch - 3ms/step
Epoch 3/16
375/375 - 1s - loss: 0.3749 - categorical_accuracy: 0.8615 - val_loss: 0.3498 - val_categorical_accuracy: 0.8712 - 1s/epoch - 3ms/step
Epoch 4/16
375/375 - 1s - loss: 0.3451 - categorical_accuracy: 0.8739 - val_loss: 0.3399 - val_categorical_accuracy: 0.8747 - 1s/epoch - 3ms/step
Epoch 5/16
375/375 - 1s - loss: 0.3217 - categorical_accuracy: 0.8802 - val_loss: 0.3484 - val_categorical_accuracy: 0.8732 - 1s/epoch - 3ms/step
Epoch 6/16
375/375 - 1s - loss: 0.3068 - categorical_accuracy: 0.8866 - val_loss: 0.3339 - val_categorical_accuracy: 0.8794 - 1s/epoch - 3ms/step
Epoch 7/16
375/375 - 1s - loss: 0.2941 - categorical_accuracy: 0.8905 - val_loss: 0.3491 - val_categorical_accuracy: 0.8783 

Final Loss: 0.4.
Final Performance: 88.26 %.
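
A quick way to read a training log like the one above is to look for the epoch with the lowest validation loss. The sketch below hard-codes the seven `val_loss` values visible in the output (the log is cut off after epoch 7); with a live `history` object, the same list would come from `history.history['val_loss']`. The `best_epoch` helper is a hypothetical name.

```python
# val_loss values copied from epochs 1-7 of the log above
val_loss = [0.4419, 0.3855, 0.3498, 0.3399, 0.3484, 0.3339, 0.3491]


def best_epoch(losses):
    """Return the 1-based epoch number and the value of the lowest loss."""
    index = min(range(len(losses)), key=losses.__getitem__)
    return index + 1, losses[index]


epoch, loss = best_epoch(val_loss)
print(f"Lowest validation loss {loss} at epoch {epoch}")
```

Among the epochs shown, the validation loss is lowest at epoch 6 and fluctuates around that level, while the training loss keeps falling. This is the kind of signal one would use to decide how long to train.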


**Overall, hyperparameter optimization is a powerful technique and an important step in getting the most out of a model.** 🙃
