Using the good old MNIST digits, we'll explore hyperparameters. First up -- let's load and normalize the digits.

In [8]:
import numpy as np
import seaborn as sns
import math

import keras
from keras.datasets import mnist
from keras.layers import Input, Dense, Dropout, Flatten, MaxPooling2D, Conv2D, BatchNormalization, ZeroPadding2D, Reshape
from keras.models import Model, Sequential
from sklearn.model_selection import GridSearchCV, PredefinedSplit
import keras.wrappers.scikit_learn

In [2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [3]:
train_images = np.expand_dims(x_train / np.max(x_train), -1)
test_images = np.expand_dims(x_test / np.max(x_test), -1)
train_labels = keras.utils.to_categorical(y_train, 10)
test_labels = keras.utils.to_categorical(y_test, 10)
train_images.shape, train_labels.shape

((60000, 28, 28, 1), (60000, 10))
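To make the two preprocessing steps concrete, here is a minimal sketch on a tiny made-up array (the pixel values and labels are illustrative, not MNIST data). The one-hot step is written with plain NumPy indexing, on the assumption that it matches what `keras.utils.to_categorical` produces for integer labels.

```python
import numpy as np

# Hypothetical 2x2 "images" with 8-bit pixel values (not real MNIST data).
raw = np.array([[[0, 51], [102, 255]],
                [[255, 0], [0, 204]]], dtype=np.uint8)

# Scale to [0, 1] by the largest pixel value, then add a trailing channel axis.
scaled = np.expand_dims(raw / np.max(raw), -1)
print(scaled.shape)

# One-hot encode integer labels by picking rows of an identity matrix.
labels = np.array([3, 0])
one_hot = np.eye(10)[labels]
print(one_hot.shape)
```

The trailing axis is what turns a `(28, 28)` digit into the `(28, 28, 1)` single-channel image that the convolutional layers expect.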

Scikit-learn has a great 'standardized' interface that we can use with Keras. This requires a 'build function' -- which creates a model.

This build function has a set of parameters -- hyperparameters -- that we will vary in order to find a 'best' model.

In [4]:
input_shape = train_images[0].shape
num_classes = 10

def builder(activation='relu', depth=1, pool=3,
            strides=2, filters=32, hidden=64, dropout=0.0,
            loss='categorical_crossentropy', optimizer='adam'):
    model = Sequential()
    # initial reshape to have consistent layering
    model.add(Reshape(input_shape, input_shape=input_shape))

    # convolutional stack: `pool` doubles as the kernel size and the pooling window
    for i in range(depth):
        model.add(Conv2D(filters, pool, activation=activation))
        model.add(ZeroPadding2D(pool//2))
        model.add(MaxPooling2D(pool, strides=strides))
        if dropout > 0:
            model.add(Dropout(dropout))

    # multilayer perceptron: `depth` also sets the number of hidden layers
    model.add(Flatten())
    for i in range(depth):
        model.add(Dense(hidden, activation=activation))
        if dropout > 0:
            model.add(Dropout(dropout))
    # final class activation
    model.add(Dense(num_classes, activation='softmax'))

    model.compile(loss=loss, optimizer=optimizer, metrics=['accuracy'])
    return model

classifier = keras.wrappers.scikit_learn.KerasClassifier(builder)
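It can help to see how `depth`, `pool`, and `strides` change the size of the feature map that reaches `Flatten`. The sketch below traces only the shape arithmetic; it is not part of the notebook, `spatial_size` is a hypothetical helper, and it assumes the Keras defaults of `'valid'` padding (and stride 1 for `Conv2D`) that the builder relies on by not overriding them.

```python
def spatial_size(size=28, depth=1, pool=3, strides=2):
    """Trace the height/width of the feature map through the conv stack."""
    for _ in range(depth):
        size = size - pool + 1               # Conv2D, 'valid' padding, stride 1
        size = size + 2 * (pool // 2)        # ZeroPadding2D pads both sides
        size = (size - pool) // strides + 1  # MaxPooling2D, 'valid' padding
    return size

for depth in (1, 2):
    side = spatial_size(depth=depth)
    # Flatten length = side * side * filters (32 filters assumed here)
    print(depth, side, side * side * 32)
```

With the default arguments the 28-pixel side shrinks to 13 after one block and to 6 after two, so a deeper stack hands far fewer inputs to the dense layers.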

Now -- a simple model run like we are used to -- but notice we are just training.

In [5]:
classifier.fit(train_images, train_labels)

Epoch 1/1


<keras.callbacks.History at 0x7fac56f87f28>

Interesting, that just ran one epoch. Epochs -- that itself is a hyperparameter. Let's set up a grid search to look through some epoch values.

Here is where we introduce *grid search*. It's not the most efficient mechanism, as it is exhaustive, but it gets the job done if you have a lot of compute power. Essentially, think of a big checkerboard -- which can have more than two dimensions -- where every square is one combination of parameter values. Every combination is tested, saving you from doing so by hand.
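The checkerboard idea can be sketched in a few lines of plain Python. This is only an illustration of how the squares are enumerated, not scikit-learn's implementation, and the grid below is a hypothetical one chosen to keep the output short.

```python
from itertools import product

# Hypothetical grid, for illustration only.
param_grid = {
    'epochs': [1, 2],
    'dropout': [0.0, 0.25],
}

# One candidate per element of the Cartesian product of the value lists.
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in product(*(param_grid[n] for n in names))]

for candidate in candidates:
    print(candidate)
```

Two parameters with two values each give four candidates; every additional parameter multiplies the count by its number of values.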

Since we already have separated testing data, we'll use a custom `cv` to split at the index point where the training images end, and then just concatenate the training and testing data end to end. 

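The `PredefinedSplit` lets us declare, sample by sample, which fold each one belongs to: a value of `-1` marks a sample as training only, and any other value is the index of the testing fold it belongs to. We'll use only one testing fold, with a value of `0`.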
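To show what those fold values mean, here is a small NumPy sketch that mimics the partition for a single testing fold. It does not call `PredefinedSplit` itself, and the sample counts are hypothetical.

```python
import numpy as np

# Hypothetical sizes: 5 "training" samples followed by 3 "testing" samples.
n_train, n_test = 5, 3
test_fold = np.concatenate((np.full(n_train, -1), np.zeros(n_test, dtype=int)))

# -1 means "never in a test fold"; 0 means "belongs to test fold 0".
train_idx = np.flatnonzero(test_fold == -1)
test_idx = np.flatnonzero(test_fold == 0)

print(test_fold)
print(train_idx)
print(test_idx)
```

Because the training and testing arrays are concatenated end to end, the first block of indices is always trained on and the last block is always scored.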

We're setting `refit` to false. There is no real need for us to retrain with all our data when we are looking for just the parameters. However, in a real application setting, where the data you want to predict will arrive *in the future*, using all your data to train the final model can help.

In [6]:
param_grid = {
    'epochs': [1, 2]
}

all_images = np.concatenate((train_images, test_images), axis=0)
all_labels = np.concatenate((train_labels, test_labels), axis=0)
train_test_bitmap = np.concatenate(
    (np.full(len(train_images), -1), np.zeros(len(test_images))),
    axis=0)

grid = GridSearchCV(estimator=classifier, 
                    param_grid=param_grid, 
                    refit=False,
                    verbose=1,
                    cv=PredefinedSplit(train_test_bitmap))
grid_result = grid.fit(all_images, all_labels)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Fitting 1 folds for each of 2 candidates, totalling 2 fits
Epoch 1/1
Epoch 1/2
Epoch 2/2
Best: 0.981300 using {'epochs': 2}


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   50.9s finished


While this runs, the Keras progress bar shows the training accuracy but not the validation accuracy. That's the role of the split: the validation data -- the part bitmapped to `0` -- is held out of training and used to score each candidate.
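As a reminder of what that score measures, here is a minimal sketch of classification accuracy on one-hot labels. It is not the wrapper's internal code; `accuracy` is a hypothetical helper, and the labels and softmax outputs are made up for illustration.

```python
import numpy as np

def accuracy(one_hot_labels, probabilities):
    """Fraction of rows where the predicted class equals the labeled class."""
    predicted = np.argmax(probabilities, axis=1)
    expected = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == expected))

# Hypothetical labels and softmax outputs for 4 samples and 3 classes.
labels = np.eye(3)[[0, 1, 2, 1]]
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.4, 0.3],   # wrong: predicts class 1, label is 2
                  [0.1, 0.6, 0.3]])

print(accuracy(labels, probs))
```

Three of the four rows match, so the sketch reports 0.75.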

From there, our final output is the best set of parameters.

OK. That was a very simple 'grid', just two squares to search. Now, let's explore a much larger space. Just a warning, this is going to take a **long while to run**. On my system, I did this with a GPU, but if you don't have a GPU, you can set `n_jobs` to have your CPUs go at this work in parallel.

Ultimately -- this is what folks mean when they say *caviar*. We're going to try a lot of things in parallel and see what works, just like a fish laying eggs!
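Before launching a search like this, it is worth counting how many fits it implies. The sketch below uses the same grid as the next cell and the single testing fold set up earlier; the variable names are just for illustration.

```python
import math

# Same values as the grid searched in the next cell.
param_grid = {
    'epochs': [4, 8],
    'activation': ['relu', 'elu'],
    'depth': [1, 2],
    'filters': [32, 64],
    'hidden': [64, 128],
    'dropout': [0.0, 0.25],
}

# Candidates = product of the number of values per parameter.
n_candidates = math.prod(len(values) for values in param_grid.values())
n_folds = 1  # one predefined testing fold
print(n_candidates * n_folds)

# Adding a third value to a single parameter grows the search by half.
print(3 * n_candidates // 2)
```

Six parameters with two values each already mean 64 full training runs, which is why the grid grows expensive so quickly.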


In [7]:
param_grid = {
    'epochs': [4, 8],
    'activation': ['relu', 'elu'],
    'depth': [1, 2],
    'filters': [32, 64],
    'hidden': [64, 128],
    'dropout': [0.0, 0.25]
}
grid = GridSearchCV(estimator=classifier, 
                    param_grid=param_grid, 
                    refit=False,
                    verbose=1,
                    cv=PredefinedSplit(train_test_bitmap))
grid_result = grid.fit(all_images, all_labels)
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Fitting 1 folds for each of 64 candidates, totalling 64 fits
Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4
[epoch progress lines for the remaining 63 fits omitted]
Best: 0.994200 using {'activation': 'relu', 'depth': 2, 'dropout': 0.25, 'epochs': 8, 'filters': 64, 'hidden': 128}


[Parallel(n_jobs=1)]: Done  64 out of  64 | elapsed: 119.7min finished


Machine learning is often a lot of human waiting, but it beats sitting and manually supervising experiments!