# Hyperparameter Search
In this notebook, we look at using hyperparameter search to automate the process of selecting and fine-tuning the hyperparameters of a neural network. The parameters of a neural network are the weights and biases that are learned and continually updated during training. Networks often have thousands or even millions of parameters, which are never adjusted by hand but through an automated process such as gradient descent with backpropagation. In contrast, hyperparameters control the architecture of the network and the training procedure itself (they can be thought of as meta-parameters), and are usually chosen by humans. The number of layers, the number of neurons per layer, and the learning rate or momentum of the optimizer are all hyperparameters.
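To make the "thousands or even millions of parameters" claim concrete, here is a small sketch that counts the weights and biases of a fully connected network. The layer sizes (784, 512, 10) are an assumption chosen for illustration and are not the exact network trained later in this notebook.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units


# Hypothetical layer sizes for illustration: 784 inputs -> 512 hidden -> 10 outputs.
layer_sizes = [784, 512, 10]

total_params = sum(
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)
print(total_params)  # 407050
```

Even this small two-layer network has over 400,000 learned parameters, while its hyperparameters (layer count, layer widths) fit in a three-element list.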

Training neural networks can be very difficult and it is often the role of the programmer to select the hyperparameters that will yield the best results. It is not uncommon for a programmer to run 10 to 20 (or more) experiments, each with different hyperparameters, before arriving at an optimal solution (or even one that is simply "good enough").

Hyperparameter search (often called hyperparameter optimization) is a method for automating the discovery of effective hyperparameters for a network. Rather than relying on intuition and experience to fine-tune your hyperparameters by hand, you can use hyperparameter search to discover good hyperparameters automatically, given enough compute time and resources.

In this example we are going to use hyperparameter search to discover a good network structure and appropriate training settings, such as the optimizer and batch size, for a classification task using the popular [MNIST dataset](https://en.wikipedia.org/wiki/MNIST_database). We will be using the Keras library to construct our network, and [hyperas](https://github.com/maxpumperla/hyperas), a Keras wrapper around the hyperopt library, to automate the search.



In [1]:
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from hyperopt import Trials, STATUS_OK, tpe
from hyperas import optim
from hyperas.distributions import choice, uniform, conditional

Using TensorFlow backend.


Hyperas syntax is similar to that of a templating engine. It uses double curly braces (`{{...}}`) to mark hyperparameter content that hyperas should replace at runtime. More on that soon, but for now it is important to realize that when you use hyperas, your code is pre-processed and run many times, with the contents of each `{{...}}` replaced by a different set of hyperparameter values each time.
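The templating idea can be illustrated with a toy sketch in plain Python. This is not how hyperas is implemented (it appears to build a hyperopt search space from the template, as the `get_space()` output later in this notebook suggests); the `choice`, `uniform`, and `render` functions here are hypothetical stand-ins written only to show what "replace each `{{...}}` with a sampled value" means.

```python
import random
import re


# Hypothetical stand-ins for the hyperas distribution functions.
def choice(options):
    return random.choice(options)


def uniform(low, high):
    return random.uniform(low, high)


TEMPLATE = (
    "model.add(Dense({{choice([256, 512, 1024])}}))\n"
    "model.add(Dropout({{uniform(0, 1)}}))"
)


def render(template, seed):
    """Replace every {{...}} expression with one sampled value (toy version)."""
    random.seed(seed)

    def substitute(match):
        # eval is acceptable only because this is a toy with a fixed template.
        value = eval(match.group(1), {"choice": choice, "uniform": uniform})
        return repr(value)

    return re.sub(r"\{\{(.*?)\}\}", substitute, template)


# Each seed yields a different concrete version of the same source code.
for seed in range(3):
    print(render(TEMPLATE, seed))
    print("---")
```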

## `data()` and `model()`

In order for the hyperas optimizer (`hyperas.optim`) to generate and evaluate different variants of your source code, it must be given two special functions: `data()` and `model(X_train, Y_train, X_test, Y_test)`.

`data()` is used to load and preprocess your data and must return `X_train, Y_train, X_test, Y_test`. This function is run only once by `hyperas.optim` to avoid reloading your data. To the best of my knowledge, templated `{{...}}` statements are not permitted (or are at least not useful) inside of `data()`.

`model(X_train, Y_train, X_test, Y_test)` is run once for each hyperparameter configuration that is evaluated. This function is responsible for building a model, training it (in our case using `model.fit(...)`), and returning a special dictionary that hyperas uses to judge how well the hyperparameters in that configuration performed.
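The returned dictionary stores the negative accuracy under the `'loss'` key because the search minimizes loss, so a higher accuracy must map to a lower value. The sketch below shows that convention in isolation. The `as_result` helper is hypothetical, the `STATUS_OK` constant is a local copy of the string value hyperopt uses, and the three accuracies are the first three `val_acc` values from the training log in this notebook.

```python
STATUS_OK = "ok"  # local stand-in with the same string value as hyperopt.STATUS_OK


def as_result(accuracy, model=None):
    """Package an accuracy the way model() does: lower loss means better."""
    return {"loss": -accuracy, "status": STATUS_OK, "model": model}


# Validation accuracies taken from the first three runs in the log output.
results = [as_result(acc) for acc in (0.9614, 0.9637, 0.9056)]

# Minimizing the loss selects the run with the highest accuracy.
best = min(results, key=lambda r: r["loss"])
print(-best["loss"])  # 0.9637
```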

In [2]:
# Note: For whatever reason, I've experienced a bug with hyperas that
# prevents me from using any kind of comment in either the data() or
# model() function. For this reason I will attempt to describe the 
# code in both of these functions through comments and explanations
# outside of the functions themselves.
def data():
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    X_train = X_train.reshape(60000, 784)
    X_test = X_test.reshape(10000, 784)
    X_train = X_train.astype('float32')
    X_test = X_test.astype('float32')
    X_train /= 255
    X_test /= 255
    nb_classes = 10
    Y_train = to_categorical(y_train, nb_classes)
    Y_test = to_categorical(y_test, nb_classes)
    return X_train, Y_train, X_test, Y_test

In [3]:
def model(X_train, Y_train, X_test, Y_test):
    model = Sequential()
    model.add(Dense(512, input_dim=784))
    model.add(Activation('relu'))
    model.add(Dense({{choice([256, 512, 1024])}}))
    model.add(Dropout({{uniform(0, 1)}}))
    model.add(Activation({{choice(['relu', 'sigmoid'])}}))
    model.add(Dense({{choice([256, 512, 1024])}}))
    model.add(Dropout({{uniform(0, 1)}}))
    
    if conditional({{choice(['three', 'four']) == 'four'}}):
        model.add(Dense({{choice([64, 128, 256])}}))
        model.add(Dropout(0.5))
        
    model.add(Dense(10))
    model.add(Activation('softmax'))
    
    model.compile(loss='categorical_crossentropy', metrics=['accuracy'],
                  optimizer={{choice(['rmsprop', 'adam', 'sgd'])}})
    
    model.fit(X_train, Y_train,
              batch_size={{choice([64, 128])}},
              nb_epoch=1,
              show_accuracy=True,
              verbose=2,
              validation_data=(X_test, Y_test))
    
    score, acc = model.evaluate(X_test, Y_test, verbose=0)
    return {'loss': -acc, 'status': STATUS_OK, 'model': model}
    

Here we've used three of the most common `hyperas` distribution functions: `choice`, `uniform`, and `conditional`. `choice([a, b, c, ...])` takes a list of possible values from which to select during hyperparameter search. `uniform(min, max)` samples from a uniform distribution between the floats `min` and `max`. `conditional()` returns `True` or `False` based on the boolean contents inside of `{{}}` and can be used with `choice()` to easily include or exclude entire code blocks. For a complete list of hyperas (really hyperopt) distributions, see the [hyperas `distributions.py` source](https://github.com/maxpumperla/hyperas/blob/master/hyperas/distributions.py).

Note that `choice()` and `uniform()` are written inside the `{{}}`, whereas with `conditional()` the braces go inside the function's parentheses.

## Running the search 

Once we've created our `data()` and `model(...)` functions, we are ready to run the hyperparameter search with `hyperas.optim.minimize(...)` (imported above as `optim`). This function evaluates `model(...)` up to `max_evals` times, each time with a different combination of hyperparameter values, and tries to minimize the `'loss'` value in the dictionary returned by each call. Because our `model(...)` returns `-acc` as its loss, minimizing the loss maximizes test accuracy. The function returns a dictionary describing the hyperparameters selected in the best run (`best_run`) together with the Keras model from that run (`best_model`).
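The evaluate-and-keep-the-best loop can be sketched in plain Python. Note the substitutions: this sketch uses plain random sampling rather than the TPE algorithm (`tpe.suggest`) used in this notebook, and `toy_objective` is a made-up scoring function that trains nothing. The names `sample_config`, `toy_objective`, and `random_search` are hypothetical and are not part of hyperas or hyperopt.

```python
import random


def sample_config(rng):
    """Draw one hyperparameter configuration (a reduced, illustrative space)."""
    return {
        "units": rng.choice([256, 512, 1024]),
        "dropout": rng.uniform(0, 1),
        "optimizer": rng.choice(["rmsprop", "adam", "sgd"]),
    }


def toy_objective(config):
    """Made-up score for illustration only: best at dropout 0.3, no training."""
    pseudo_accuracy = 1.0 - abs(config["dropout"] - 0.3)
    return {"loss": -pseudo_accuracy, "status": "ok"}


def random_search(objective, max_evals, seed=0):
    """Evaluate max_evals sampled configurations and keep the lowest loss."""
    rng = random.Random(seed)
    best_config, best_result = None, None
    for _ in range(max_evals):
        config = sample_config(rng)
        result = objective(config)
        if best_result is None or result["loss"] < best_result["loss"]:
            best_config, best_result = config, result
    return best_config, best_result


best_config, best_result = random_search(toy_objective, max_evals=10)
print(best_config, -best_result["loss"])
```

TPE differs from this sketch in that it uses the results of earlier evaluations to decide which configurations to try next, rather than sampling each one independently.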

__WARNING__: Performing a hyperparameter search in this way can take a __very long time__. The search space grows exponentially with the number of `hyperas` distribution functions you add. The `max_evals` named parameter to `hyperas.optim.minimize(...)` limits the number of hyperparameter configurations that will be evaluated. Also note that keeping the number of epochs low for each model fit (e.g. `model.fit(X, y, nb_epoch=1)`) can drastically reduce the time the search takes. The danger with this strategy is that configurations that need more epochs to reach good results (a low learning rate, for example) are severely disadvantaged by the search, so an optimal solution may be overlooked.
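As a rough worked example of how quickly the space grows, the sketch below multiplies the number of options for each discrete `choice(...)` in the `model(...)` function above. The dictionary keys for the first five entries follow the names in the hyperas search space output; `optimizer` and `batch_size` are labels assumed here for the last two choices.

```python
from math import prod

# Number of options per discrete choice(...) in model(); the two uniform(0, 1)
# dropout rates are continuous and are not counted here.
discrete_options = {
    "Dense": 3,        # [256, 512, 1024]
    "Activation": 2,   # ['relu', 'sigmoid']
    "Dense_1": 3,      # [256, 512, 1024]
    "conditional": 2,  # ['three', 'four']
    "Dense_2": 3,      # [64, 128, 256]
    "optimizer": 3,    # ['rmsprop', 'adam', 'sgd'] (assumed label)
    "batch_size": 2,   # [64, 128] (assumed label)
}

n_combinations = prod(discrete_options.values())
print(n_combinations)  # 648
```

With `max_evals=10`, the search visits at most 10 of these 648 discrete combinations, before even considering the continuous dropout rates.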

In [9]:
trials = Trials()
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      algo=tpe.suggest,
                                      max_evals=10,
                                      notebook_name='hyperparameter_search',
                                      trials=trials)
X_train, Y_train, X_test, Y_test = data()
print("Evaluation of best performing model:")
print(best_model.evaluate(X_test, Y_test))
print(best_run, best_model)

>>> Imports:
try:
    from keras.datasets import mnist
except:
    pass

try:
    from keras.utils.np_utils import to_categorical
except:
    pass

try:
    from keras.models import Sequential
except:
    pass

try:
    from keras.layers.core import Dense, Dropout, Activation
except:
    pass

try:
    from hyperopt import Trials, STATUS_OK, tpe
except:
    pass

try:
    from hyperas import optim
except:
    pass

try:
    from hyperas.distributions import choice, uniform, conditional
except:
    pass

>>> Hyperas search space:

def get_space():
    return {
        'Dense': hp.choice('Dense', [256, 512, 1024]),
        'Dropout': hp.uniform('Dropout', 0, 1),
        'Activation': hp.choice('Activation', ['relu', 'sigmoid']),
        'Dense_1': hp.choice('Dense_1', [256, 512, 1024]),
        'Dropout_1': hp.uniform('Dropout_1', 0, 1),
        'conditional': hp.choice('conditional', ['three', 'four']) == 'four',
        'Dense_2': hp.choice('Dense_2', [64, 128, 256]),
        'optimize

`model.compile(optimizer, loss, metrics=["accuracy"])`


Train on 60000 samples, validate on 10000 samples
Epoch 1/1
3s - loss: 0.3570 - acc: 0.8931 - val_loss: 0.1370 - val_acc: 0.9614
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
2s - loss: 0.3131 - acc: 0.9032 - val_loss: 0.1155 - val_acc: 0.9637
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
3s - loss: 0.8328 - acc: 0.7466 - val_loss: 0.3436 - val_acc: 0.9056
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
4s - loss: 0.3271 - acc: 0.9098 - val_loss: 0.2653 - val_acc: 0.9438
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
2s - loss: 1.0171 - acc: 0.7238 - val_loss: 0.4397 - val_acc: 0.8874
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
1s - loss: 1.9475 - acc: 0.4085 - val_loss: 1.3285 - val_acc: 0.7453
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
2s - loss: 0.3250 - acc: 0.8994 - val_loss: 0.1334 - val_acc: 0.9583
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
3s - loss: 1.6659 - acc: 0.4631 - val