# Hyperparameter tuning exercise
Hyperparameter tuning is essential to deep learning. Neural networks are hard to configure because many hyperparameters have to be set, and each candidate configuration can be slow to train.

In this exercise, you will use grid search to tune the hyperparameters of deep learning models, including *learning rates*, *regularization strengths*, *activations*, and *the dimensions of the hidden layers*. You may use any deep learning toolkit, such as Keras or TensorFlow.
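
To make the cost of grid search concrete, the sketch below enumerates a small grid using only the standard library. The candidate values are illustrative assumptions, not the settings used later in this exercise; the point is that the number of models to train is the product of the number of values per hyperparameter.

```python
from itertools import product

# Hypothetical candidate values, chosen only for illustration.
grid = {
    'learning_rate': [1e-2, 1e-3, 1e-4],
    'activation': ['relu', 'tanh'],
    'batch_size': [32, 64, 128],
}

# A grid search evaluates every combination of the candidate values.
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))   # 3 * 2 * 3 combinations
print(combinations[0])
```

With cross-validation, every combination is trained once per fold, which is one reason to vary a single hyperparameter at a time when compute is limited.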

# Problem description

This time we are working with Fashion-MNIST, which has 10 classes. Let's look at some statistics of the dataset. More information is available at [fashion-mnist](https://github.com/zalandoresearch/fashion-mnist).

In [1]:
import time

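import tensorflow as tf
from keras import backend as K
# allocate GPU memory on demand and make Keras use this session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config = config)
K.set_session(sess)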

from keras.datasets import fashion_mnist
from sklearn.model_selection import StratifiedShuffleSplit
import matplotlib.pyplot as plt
import numpy as np
# fix random seed for reproducibility
seed = 123
np.random.seed(seed)

from keras.wrappers.scikit_learn import KerasClassifier
from keras.models import Sequential
from keras.layers import Dense

from sklearn.model_selection import GridSearchCV

Using TensorFlow backend.


In [8]:
nb_classes = 10
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

X_train = X_train.reshape((X_train.shape[0], -1))
X_test = X_test.reshape((X_test.shape[0], -1))

# normalization
X_train = X_train / 255.
X_test = X_test / 255.

X = np.array(X_train)
y = np.array(y_train)
    
# split train and validation (a single stratified 50/50 split)
sss = StratifiedShuffleSplit(n_splits = 1, test_size = 0.5, random_state = 0)
train_idx, val_idx = next(sss.split(X, y))
X_train, X_val = X[train_idx], X[val_idx]
y_train, y_val = y[train_idx], y[val_idx]

print("X_train original shape {}".format(X_train.shape))
print("y_train original shape {}".format(y_train.shape))
print("X_val original shape {}".format(X_val.shape))
print("y_val original shape {}".format(y_val.shape))
print("X_test original shape {}".format(X_test.shape))
print("y_test original shape {}".format(y_test.shape))

for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_train[i].reshape((28, 28)), cmap = 'gray', interpolation = 'none')
    plt.title("Class {}".format(class_names[y_train[i]]))
plt.tight_layout()

unique, count = np.unique(y_train, return_counts = True)
cls_count = np.concatenate((unique.reshape(nb_classes, 1), count.reshape(nb_classes, 1)), axis = 1)
print('class\tcount')
print('\n'.join(['{}\t{}'.format(item[0], item[1]) for item in cls_count]))

X_train original shape (30000, 784)
y_train original shape (30000,)
X_val original shape (30000, 784)
y_val original shape (30000,)
X_test original shape (10000, 784)
y_test original shape (10000,)
class	count
0	3000
1	3000
2	3000
3	3000
4	3000
5	3000
6	3000
7	3000
8	3000
9	3000


# Write a generic neural network

In [9]:
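from keras import optimizers

def create_model(hidden_layers = [128, 64, 32], 
                 activations = ['relu', 'relu', 'relu', 'softmax'], 
                 weight_initialization = 'he_normal', 
                 learning_rate = None,
                 loss = 'categorical_crossentropy',
                 optimizer = 'adam', 
                 metrics = ['accuracy']):
    # learning_rate = None keeps the optimizer's default step size; otherwise
    # the value is passed to the optimizer ('lr' is the Keras 2.x argument name)
    if learning_rate is not None:
        optimizer = optimizers.deserialize({'class_name': optimizer, 'config': {'lr': learning_rate}})
    model = Sequential()
    # first hidden layer, fed by the flattened 28 x 28 image
    model.add(Dense(hidden_layers[0], input_shape = (784,), activation = activations[0], kernel_initializer = weight_initialization))
    # one Dense layer for each remaining hidden size
    for i in range(1, len(hidden_layers)):
        model.add(Dense(hidden_layers[i], activation = activations[i], kernel_initializer = weight_initialization))
    # output layer: one unit per class
    model.add(Dense(10, activation = activations[-1]))
    model.compile(loss = loss, optimizer = optimizer, metrics = metrics)
    return model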
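
The sizes of the hidden layers determine how many weights the network has to learn. As a rough check, the helper below (a hypothetical function, not part of Keras) counts the parameters of a fully connected stack, assuming all three default sizes `[128, 64, 32]` are used and followed by a 10-unit output layer.

```python
def count_dense_parameters(input_dim, layer_sizes):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    fan_in = input_dim
    for units in layer_sizes:
        total += fan_in * units + units   # weight matrix plus bias vector
        fan_in = units
    return total

# 784 input pixels, hidden layers of 128, 64 and 32 units, 10 output classes
n_params = count_dense_parameters(784, [128, 64, 32, 10])
print(n_params)
```

For a model that has already been built, `model.summary()` reports the same kind of count per layer.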

# How to tune learning rates

In [4]:
learning_rates = [1e-2, 1e-3, 1e-4, 1e-5, 1e-6]

model = KerasClassifier(build_fn = create_model, batch_size = None, epochs = 10, verbose = 0)

param_grid = dict(learning_rate = learning_rates)

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 1, verbose = 50)
grid_result = grid.fit(X_train, y_train)

# result summary
print('Best result: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
for mean, stdev, param in zip(grid_result.cv_results_['mean_test_score'], 
                              grid_result.cv_results_['std_test_score'], 
                              grid_result.cv_results_['params']):
    print('%f (%f) with: %r' % (mean, stdev, param))

Fitting 3 folds for each of 5 candidates, totalling 15 fits
[CV] learning_rate=0.01 ..............................................
[CV] ................. learning_rate=0.01, score=0.8676, total=  32.0s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   34.1s remaining:    0.0s
[CV] learning_rate=0.01 ..............................................
[CV] ................. learning_rate=0.01, score=0.8646, total=  28.5s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:  1.1min remaining:    0.0s
[CV] learning_rate=0.01 ..............................................
[CV] ................. learning_rate=0.01, score=0.8571, total=  26.7s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  1.5min remaining:    0.0s
[CV] learning_rate=0.001 .............................................
[CV] ................ learning_rate=0.001, score=0.8659, total=  26.4s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  2.0min remaining:    0.0s
[CV] learning_rate=0.001 ...........................
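
Learning rates are usually searched on a logarithmic scale, as in the `learning_rates` list above, because their useful range spans several orders of magnitude. Below is a minimal sketch of building such a grid programmatically; the helper name `log_grid` is an assumption made for this example.

```python
def log_grid(high_exp, low_exp):
    """Return powers of ten from 10**high_exp down to 10**low_exp."""
    return [10.0 ** e for e in range(high_exp, low_exp - 1, -1)]

# five candidates, each ten times smaller than the previous one
learning_rates = log_grid(-2, -6)
print(learning_rates)
```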

# How to tune activations

In [5]:
activations = [
    ['elu', 'elu', 'elu', 'softmax'],
    ['selu', 'selu', 'selu', 'softmax'],
    ['relu', 'relu', 'relu', 'softmax'],
    ['tanh', 'tanh', 'tanh', 'softmax'],
    ['sigmoid', 'sigmoid', 'sigmoid', 'softmax'],
    ['linear', 'linear', 'linear', 'softmax']
]

model = KerasClassifier(build_fn = create_model, batch_size = None, epochs = 10, verbose = 0)

param_grid = dict(activations = activations)

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 1, verbose = 50)
grid_result = grid.fit(X_train, y_train)

# result summary
print('Best result: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
for mean, stdev, param in zip(grid_result.cv_results_['mean_test_score'], 
                              grid_result.cv_results_['std_test_score'], 
                              grid_result.cv_results_['params']):
    print('%f (%f) with: %r' % (mean, stdev, param))

Fitting 3 folds for each of 6 candidates, totalling 18 fits
[CV] activations=['elu', 'elu', 'elu', 'softmax'] ....................
[CV]  activations=['elu', 'elu', 'elu', 'softmax'], score=0.8722, total=  19.3s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   20.0s remaining:    0.0s
[CV] activations=['elu', 'elu', 'elu', 'softmax'] ....................
[CV]  activations=['elu', 'elu', 'elu', 'softmax'], score=0.8737, total=  19.6s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   40.3s remaining:    0.0s
[CV] activations=['elu', 'elu', 'elu', 'softmax'] ....................
[CV]  activations=['elu', 'elu', 'elu', 'softmax'], score=0.8538, total=  20.2s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  1.0min remaining:    0.0s
[CV] activations=['selu', 'selu', 'selu', 'softmax'] .................
[CV]  activations=['selu', 'selu', 'selu', 'softmax'], score=0.8495, total=  21.1s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  1.4min remaining:    0.0s
[CV] activati

# How to tune batch sizes

In [6]:
batch_sizes = [32, 64, 128]

model = KerasClassifier(build_fn = create_model, batch_size = None, epochs = 10, verbose = 0)

param_grid = dict(batch_size = batch_sizes)

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 1, verbose = 50)
grid_result = grid.fit(X_train, y_train)

# result summary
print('Best result: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
for mean, stdev, param in zip(grid_result.cv_results_['mean_test_score'], 
                              grid_result.cv_results_['std_test_score'], 
                              grid_result.cv_results_['params']):
    print('%f (%f) with: %r' % (mean, stdev, param))

Fitting 3 folds for each of 3 candidates, totalling 9 fits
[CV] batch_size=32 ...................................................
[CV] ...................... batch_size=32, score=0.8743, total=  22.1s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   23.0s remaining:    0.0s
[CV] batch_size=32 ...................................................
[CV] ...................... batch_size=32, score=0.8643, total=  22.6s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   46.5s remaining:    0.0s
[CV] batch_size=32 ...................................................
[CV] ...................... batch_size=32, score=0.8668, total=  23.2s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  1.2min remaining:    0.0s
[CV] batch_size=64 ...................................................
[CV] ...................... batch_size=64, score=0.8608, total=  12.8s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  1.4min remaining:    0.0s
[CV] batch_size=64 ..................................
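
Batch size also changes how many weight updates happen per epoch, which is one plausible reason the `batch_size=64` fit in the log above finished faster than the `batch_size=32` fits. A small sketch, assuming 30,000 training samples and the 3-fold split reported by the log, so that each fit trains on 20,000 samples:

```python
import math

n_train = 30000                              # samples passed to grid.fit
n_folds = 3                                  # "Fitting 3 folds" in the log
n_fit = n_train * (n_folds - 1) // n_folds   # samples used in each fit

# number of gradient updates per epoch for each candidate batch size
steps_per_epoch = {b: math.ceil(n_fit / b) for b in [32, 64, 128]}
print(steps_per_epoch)
```

Fewer updates per epoch usually means a shorter epoch, but also fewer chances to adjust the weights within a fixed number of epochs.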

# How to tune the number of epochs

In [7]:
number_of_epochs = [10, 20, 30]

model = KerasClassifier(build_fn = create_model, batch_size = None, epochs = 10, verbose = 0)

param_grid = dict(epochs = number_of_epochs)

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 1, verbose = 50)
grid_result = grid.fit(X_train, y_train)

# result summary
print('Best result: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
for mean, stdev, param in zip(grid_result.cv_results_['mean_test_score'], 
                              grid_result.cv_results_['std_test_score'], 
                              grid_result.cv_results_['params']):
    print('%f (%f) with: %r' % (mean, stdev, param))

Fitting 3 folds for each of 3 candidates, totalling 9 fits
[CV] epochs=10 .......................................................
[CV] .......................... epochs=10, score=0.8671, total=  23.7s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   24.7s remaining:    0.0s
[CV] epochs=10 .......................................................
[CV] .......................... epochs=10, score=0.8745, total=  23.9s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   49.7s remaining:    0.0s
[CV] epochs=10 .......................................................
[CV] .......................... epochs=10, score=0.8628, total=  24.7s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  1.3min remaining:    0.0s
[CV] epochs=20 .......................................................
[CV] .......................... epochs=20, score=0.8731, total=  45.4s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  2.0min remaining:    0.0s
[CV] epochs=20 ......................................

# How to tune the training optimization algorithm

In [8]:
training_optimization_algorithms = ['sgd', 'rmsprop', 'adagrad', 'adadelta', 'adam', 'adamax', 'nadam']

model = KerasClassifier(build_fn = create_model, batch_size = None, epochs = 10, verbose = 0)

param_grid = dict(optimizer = training_optimization_algorithms)

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 1, verbose = 50)
grid_result = grid.fit(X_train, y_train)

# result summary
print('Best result: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
for mean, stdev, param in zip(grid_result.cv_results_['mean_test_score'], 
                              grid_result.cv_results_['std_test_score'], 
                              grid_result.cv_results_['params']):
    print('%f (%f) with: %r' % (mean, stdev, param))

Fitting 3 folds for each of 7 candidates, totalling 21 fits
[CV] optimizer=sgd ...................................................
[CV] ...................... optimizer=sgd, score=0.8441, total=  21.9s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   22.9s remaining:    0.0s
[CV] optimizer=sgd ...................................................
[CV] ...................... optimizer=sgd, score=0.8464, total=  22.2s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   46.2s remaining:    0.0s
[CV] optimizer=sgd ...................................................
[CV] ...................... optimizer=sgd, score=0.8169, total=  22.2s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  1.2min remaining:    0.0s
[CV] optimizer=rmsprop ...............................................
[CV] .................. optimizer=rmsprop, score=0.8669, total=  23.4s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  1.6min remaining:    0.0s
[CV] optimizer=rmsprop .............................

# How to tune weight initialization

In [9]:
weight_initializations = ['uniform', 'lecun_uniform', 'he_uniform', 'glorot_uniform', 'lecun_normal', 'he_normal', 'glorot_normal']

model = KerasClassifier(build_fn = create_model, batch_size = None, epochs = 10, verbose = 0)

param_grid = dict(weight_initialization = weight_initializations)

grid = GridSearchCV(estimator = model, param_grid = param_grid, n_jobs = 1, verbose = 50)
grid_result = grid.fit(X_train, y_train)

# result summary
print('Best result: %f using %s' % (grid_result.best_score_, grid_result.best_params_))
for mean, stdev, param in zip(grid_result.cv_results_['mean_test_score'], 
                              grid_result.cv_results_['std_test_score'], 
                              grid_result.cv_results_['params']):
    print('%f (%f) with: %r' % (mean, stdev, param))

Fitting 3 folds for each of 7 candidates, totalling 21 fits
[CV] weight_initialization=uniform ...................................
[CV] ...... weight_initialization=uniform, score=0.8622, total=  28.5s
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:   29.6s remaining:    0.0s
[CV] weight_initialization=uniform ...................................
[CV] ...... weight_initialization=uniform, score=0.8739, total=  27.8s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:   58.6s remaining:    0.0s
[CV] weight_initialization=uniform ...................................
[CV] ...... weight_initialization=uniform, score=0.8663, total=  28.3s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:  1.5min remaining:    0.0s
[CV] weight_initialization=lecun_uniform .............................
[CV]  weight_initialization=lecun_uniform, score=0.8769, total=  27.9s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:  2.0min remaining:    0.0s
[CV] weight_initialization=lecun_uniform ...........

# What's your conclusion?
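
When drawing conclusions, it may help to compare the gap between candidates with the spread across folds. The sketch below recomputes the mean and standard deviation from per-fold scores copied from the logs above (only the first candidate of each search is fully visible there); the helper name `summarize` is an assumption made for this example.

```python
from statistics import mean, pstdev

# per-fold scores copied from the logs above
fold_scores = {
    'learning_rate=0.01': [0.8676, 0.8646, 0.8571],
    'batch_size=32': [0.8743, 0.8643, 0.8668],
    'epochs=10': [0.8671, 0.8745, 0.8628],
    'optimizer=sgd': [0.8441, 0.8464, 0.8169],
}

def summarize(scores):
    """Return (mean, population standard deviation) of the fold scores."""
    return mean(scores), pstdev(scores)

for name, scores in fold_scores.items():
    m, s = summarize(scores)
    print('%s: %f (%f)' % (name, m, s))
```

A difference between two settings that is smaller than the fold-to-fold spread is probably not meaningful on its own.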