# Hyperparameter Tuning with Keras and scikit-learn's GridSearchCV

### What is a Neural Network?

### Why Keras?

### What is Parameter Tuning?

### What are some different parameter tuning techniques?

#### In this walkthrough we will use Keras along with scikit-learn's GridSearchCV class to provide an example of parameter tuning.

#### First we will use Keras and numpy to create a very simple neural network that tells two digits apart in the MNIST handwritten digits dataset. The code below selects the labels 2 and 7, so those are the two digits we classify. Limiting the scope of the classification task to two digits keeps the dataset small and allows us to train our model quickly on an average CPU. This will enable us to test a greater number of parameters in a shorter period of time.

In [88]:
# We'll start off by importing keras as well as the MNIST dataset which conveniently comes included in the keras library.
# Keras comes with six different out of the box datasets which can be viewed here: https://keras.io/datasets/
from __future__ import print_function
import keras
from keras.datasets import mnist

# Next we'll load the MNIST data as our test and training datasets 
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Now that we have the MNIST data we want to keep only two of the digits 0-9 so that our training time stays manageable.
# An easy way to do this is to use numpy's logical_or function, which returns a boolean array that is True
# at each position where the value meets at least one of the conditions passed to the function.
# MNIST labels are the digit values themselves (not indices), so y_train==2 and y_train==7 select the 2s and the 7s.

import numpy

train_picks = numpy.logical_or(y_train==2,y_train==7)
test_picks = numpy.logical_or(y_test==2,y_test==7)

# Let's take a look at the first 20 elements of one of the arrays that logical_or returns to us:
print(train_picks[:20])


[False False False False False  True False False False False False False
 False False False  True  True False False False]
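The mask-and-select step can be tried on a tiny array first. This is a minimal sketch with made-up data: `labels` and `images` are toy stand-ins for `y_train` and `x_train`, not the real MNIST arrays.

```python
import numpy

# Toy labels standing in for y_train; the values are the digits themselves.
labels = numpy.array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8])
# Toy "images": one number per sample so the selection is easy to follow.
images = numpy.arange(len(labels)) * 10

# True wherever the label is a 2 or a 7.
picks = numpy.logical_or(labels == 2, labels == 7)

# Indexing with the mask keeps only the entries where the mask is True.
picked_images = images[picks]
picked_labels = labels[picks]

# Recode for binary classification: 7 becomes 1, 2 becomes 0.
binary_labels = numpy.array(picked_labels == 7, dtype=int)

print(picks.sum(), picked_images, picked_labels, binary_labels)
```

The same mask is applied to both arrays, which is what keeps each image paired with its label after filtering.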


In [89]:
# We can then use the boolean masks to keep only the matching elements of our x_train and y_train datasets.
# The labels are also recoded for binary classification: a 7 becomes 1 and a 2 becomes 0.

from keras import backend as K

num_classes=2

####### DATA PREP!! #######

x_train = x_train[train_picks]
x_test = x_test[test_picks]

y_train = numpy.array(y_train[train_picks]==7,dtype=int)
y_test = numpy.array(y_test[test_picks]==7,dtype=int)

# print("x_train: ", x_train[:10])
# print("x_test: ", x_test[:10])

# print("y_train: ", y_train[:10])
# print("y_test: ", y_test[:10])

# input image dimensions
img_rows, img_cols = 28, 28

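# Conv2D needs an explicit channel axis, and where it goes depends on the backend's image_data_format:
# 'channels_first' puts it before the rows and columns, 'channels_last' puts it after them.
# MNIST is grayscale, so there is a single channel. input_shape is reused when we set up the Conv2D layer later on.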
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Check Shape
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (12223, 28, 28, 1)
12223 train samples
2060 test samples
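The preparation steps in the cell above can be sketched as one small function. This is an illustrative stand-in that uses only numpy: the helper name `prepare` and the fake data are assumptions, and indexing `numpy.eye` is used here in place of `keras.utils.to_categorical` to show what one-hot encoding produces.

```python
import numpy

def prepare(images, labels, num_classes, channels_first=False):
    """Reshape, scale and one-hot encode a batch of grayscale images."""
    n, rows, cols = images.shape
    # Add the single grayscale channel axis in the position the backend expects.
    if channels_first:
        x = images.reshape(n, 1, rows, cols)
    else:
        x = images.reshape(n, rows, cols, 1)
    # Scale pixel values from the 0..255 range to 0..1.
    x = x.astype('float32') / 255
    # Row i of the identity matrix is the one-hot vector for class i.
    y = numpy.eye(num_classes, dtype='float32')[labels]
    return x, y

# Four fake all-white 28x28 images with binary labels.
fake_images = numpy.full((4, 28, 28), 255, dtype='uint8')
fake_labels = numpy.array([0, 1, 1, 0])

x, y = prepare(fake_images, fake_labels, num_classes=2)
print(x.shape, x.max(), y.tolist())
```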


### Here's a pretty basic neural network for the task, yet after 12 epochs it reaches 98.11% test accuracy. Let's see how accurate we can make it through hyperparameter tuning.

In [90]:
from keras.models import Sequential
from keras.layers import Activation, Flatten, Dense, Conv2D

# variables for a bunch of params
num_classes = 2
filters = 3
kernel_size = 3
batch_size = 128
epochs = 12

model = Sequential()
# This is how the layer was written with an older version of the Keras API:
# model.add(Conv2D(filters, kernel_size, kernel_size, border_mode='same', input_shape=(1,28,28)))
# With the current API it looks like this; input_shape was created above in the if-else statement.
model.add(Conv2D(filters, kernel_size, padding='valid', input_shape=input_shape))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(num_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer="adadelta", metrics=['accuracy'])

# model.compile(loss=keras.losses.categorical_crossentropy,
#               optimizer=keras.optimizers.Adadelta(),
#               metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])


Train on 12223 samples, validate on 2060 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Test loss: 0.0573836410224
Test accuracy: 0.981067961165
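To get a feel for how small this first network is, its layer shapes and parameter counts can be worked out by hand. The helper functions below are illustrative names, not part of Keras; they apply the standard formulas for a stride-1 'valid' convolution and a dense layer to the values used above (28x28 input, 3 filters, kernel size 3, 2 output classes).

```python
def conv_output_size(size, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def conv_params(filters, kernel, in_channels):
    """Kernel weights plus one bias per filter."""
    return filters * (kernel * kernel * in_channels + 1)

def dense_params(inputs, units):
    """One weight per input/unit pair plus one bias per unit."""
    return inputs * units + units

side = conv_output_size(28, 3)   # each feature map is side x side
flat = side * side * 3           # length of the vector produced by Flatten
total = conv_params(3, 3, 1) + dense_params(flat, 2)

print(side, flat, total)
```

Most of the parameters sit in the final dense layer, which is one reason a pooling layer (which shrinks the flattened vector) is a natural next step.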


In [91]:
# Adding Dropout and MaxPooling2D - I'm going to repeat a lot of code on purpose

# This creates a model comprised of 2 convolutional layers followed by dense layers.
# dense_layer_sizes: List of layer sizes, with one number for each dense layer
# filters: Number of convolutional filters in each convolutional layer
# kernel_size: Convolutional kernel size
# pool_size: Size of pooling area for max pooling

# This doesn't necessarily improve our accuracy a great amount, but it gives us access to more parameters to tune

from keras.layers import Dropout, MaxPooling2D

num_classes = 2
filters = 3
kernel_size = 3
batch_size = 128
epochs = 12
pool_size = 2
dense_layer_sizes = [32, 64]

model = Sequential()

model.add(Conv2D(filters, kernel_size,
                     padding='valid',
                     input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(filters, kernel_size))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
for layer_size in dense_layer_sizes:
    model.add(Dense(layer_size))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))

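# The loss and optimizer are passed as objects here instead of as strings. With default settings
# the two forms are equivalent; the object form lets you set optimizer arguments explicitly.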
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 12223 samples, validate on 2060 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Test loss: 0.0455011042168
Test accuracy: 0.984466019417


#### We need to do three things in order to begin tuning our parameters. First we need to wrap our Keras model in a KerasClassifier and give it a build function that will return our model to us. We will then create a param_grid, which is essentially a dictionary of all of the different parameters that we want to test. Then when we fit our model using GridSearchCV we will pass it this dictionary of candidate parameter values, and it will train and score the model using every combination of the parameters that we give it to find the best one. We can then use the best_score_ and best_params_ attributes to have GridSearchCV report back to us which combination of parameters gave the best results.
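What "all of the different combinations" means can be sketched with the standard library alone. This is a simplified illustration, not scikit-learn's implementation: `expand_grid` and `placeholder_score` are hypothetical names, and the score is a made-up formula standing in for training and cross-validating a model.

```python
from itertools import product

def expand_grid(param_grid):
    """Yield one dict per combination of candidate values."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))

def placeholder_score(params):
    """Made-up score; a real search would train and validate a model here."""
    return -1.0 / (params['epochs'] * params['dense_layer_sizes'][0])

param_grid = {'dense_layer_sizes': [[32], [64]],
              'epochs': [6, 12],
              'batch_size': [128]}

combos = list(expand_grid(param_grid))
best_params = max(combos, key=placeholder_score)

print(len(combos), 'combinations')
print('best:', best_params)
```

The number of combinations is the product of the list lengths, so every extra candidate value multiplies the amount of training the search has to do.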

In [92]:
# First we'll create the build function that will return our model to us.
# The wrapper passes each model parameter from param_grid to this function by name,
# so the function must accept them as arguments. A build function without matching
# arguments fails with "ValueError: dense_layer_sizes is not a legal parameter".

from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

# num_classes is fixed for this task, so it stays a plain integer instead of a grid entry.
num_classes = 2

def create_model(dense_layer_sizes, filters, kernel_size, pool_size):
    
    model = Sequential()
    
    # Create the convolutional layers
    model.add(Conv2D(filters, kernel_size,
                     padding='valid',
                     input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Conv2D(filters, kernel_size))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=pool_size))
    model.add(Dropout(0.25))
    
    # Create the dense layers
    model.add(Flatten())
    for layer_size in dense_layer_sizes:
        model.add(Dense(layer_size))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    # Compile the model
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adadelta(),
                  metrics=['accuracy'])
    
    # Return the model
    return model


# Next we will create our param_grid dictionary full of all of the different parameters that we want to use.

# Here's the same list of parameters that we've been using. However, now you'll notice that the values
# are contained inside lists so that GridSearchCV can try every candidate value for each parameter.
filters = [3]
kernel_size = [(2,3)]
batch_size = [128]
epochs = [12]
pool_size = [2]
dense_layer_sizes = [[32], [64]]

# We will make a dictionary out of all of our parameters to serve as our param_grid.
# epochs and batch_size are available for tuning even though they are not
# arguments to the model building function.
param_grid = {'dense_layer_sizes': dense_layer_sizes,
              'epochs': epochs,
              'batch_size': batch_size,
              'filters': filters,
              'kernel_size': kernel_size,
              'pool_size': pool_size}

# Create KerasClassifier using our create_model build function
my_classifier = KerasClassifier(build_fn=create_model)

grid = GridSearchCV(my_classifier, param_grid, scoring='neg_log_loss', n_jobs=1)

grid.fit(x_train, y_train)

print('The parameters of the best model are: ')
print(grid.best_params_)
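A "is not a legal parameter" error from the wrapper means that a key in param_grid is neither an argument of the build function nor a training argument such as epochs or batch_size. The check can be approximated as below. This is a simplified stand-in, not the Keras source: `illegal_params` is a hypothetical helper, and the real wrapper accepts more fit and predict arguments than the two assumed here.

```python
import inspect

def illegal_params(build_fn, param_grid, fit_params=('epochs', 'batch_size')):
    """Return the grid keys that build_fn could not receive."""
    accepted = set(inspect.signature(build_fn).parameters)
    accepted.update(fit_params)
    return sorted(key for key in param_grid if key not in accepted)

def build_without_args():
    """Build function that reads its settings from globals."""

def build_with_args(dense_layer_sizes, filters, kernel_size, pool_size):
    """Build function that takes its settings as arguments."""

example_grid = {'dense_layer_sizes': [[32], [64]],
                'epochs': [12],
                'batch_size': [128],
                'filters': [3],
                'kernel_size': [3],
                'pool_size': [2]}

print(illegal_params(build_without_args, example_grid))
print(illegal_params(build_with_args, example_grid))
```

The first call lists every model parameter as unusable, while the second returns an empty list because each key has a matching argument.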

In [93]:
# Prepare the MNIST dataset using only two digits (the labels 2 and 7)
# in order to keep the training task manageable for a CPU

# **Different here**: batch_size and epochs are commented out
#batch_size = 128
num_classes = 2
#epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# load training data and do basic data normalization
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

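# Only look at 2s and 7s (MNIST labels are the digit values themselves).
# numpy was imported under its full name above, so we use numpy rather than np here.
train_picks = numpy.logical_or(y_train==2,y_train==7)
test_picks = numpy.logical_or(y_test==2,y_test==7)

x_train = x_train[train_picks]
x_test = x_test[test_picks]

y_train = numpy.array(y_train[train_picks]==7,dtype=int)
y_test = numpy.array(y_test[test_picks]==7,dtype=int)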

# Add the single grayscale channel axis where the backend's image_data_format expects it
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Check Shape
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (12223, 28, 28, 1)
12223 train samples
2060 test samples


In [95]:
#Keras models can be used in scikit-learn by wrapping them with the 
#KerasClassifier or KerasRegressor class.

#To use these wrappers you must define a function that creates and returns 
#your Keras sequential model, then pass this function to the build_fn argument when 
#constructing the KerasClassifier class.
    
    

# # validator.best_estimator_ returns sklearn-wrapped version of best model.
# # validator.best_estimator_.model returns the (unwrapped) keras model
# # best_model = validator.best_estimator_.model
# # metric_names = best_model.metrics_names
# # metric_values = best_model.evaluate(x_test, y_test)
# # for metric, value in zip(metric_names, metric_values):
# #     print(metric, ': ', value)







%reset

from __future__ import print_function

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.wrappers.scikit_learn import KerasClassifier
from keras import backend as K
from sklearn.model_selection import GridSearchCV

num_classes = 10

# input image dimensions
img_rows, img_cols = 28, 28

# load training data and do basic data normalization
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

def make_model(dense_layer_sizes, filters, kernel_size, pool_size):
    '''Creates model comprised of 2 convolutional layers followed by dense layers
    dense_layer_sizes: List of layer sizes.
        This list has one number for each layer
    filters: Number of convolutional filters in each convolutional layer
    kernel_size: Convolutional kernel size
    pool_size: Size of pooling area for max pooling
    '''

    model = Sequential()
    model.add(Conv2D(filters, kernel_size,
                     padding='valid',
                     input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(Conv2D(filters, kernel_size))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=pool_size))
    model.add(Dropout(0.25))

    model.add(Flatten())
    for layer_size in dense_layer_sizes:
        model.add(Dense(layer_size))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
                  optimizer='adadelta',
                  metrics=['accuracy'])

    return model

dense_size_candidates = [[32], [64], [32, 32], [64, 64]]
my_classifier = KerasClassifier(make_model, batch_size=32)
validator = GridSearchCV(my_classifier,
                         param_grid={'dense_layer_sizes': dense_size_candidates,
                                     # epochs is avail for tuning even when not
                                     # an argument to model building function
                                     'epochs': [3, 6],
                                     'filters': [8],
                                     'kernel_size': [3],
                                     'pool_size': [2]},
                         scoring='neg_log_loss',
                         n_jobs=1)
validator.fit(x_train, y_train)

print('The parameters of the best model are: ')
print(validator.best_params_)

Once deleted, variables cannot be recovered. Proceed (y/[n])? y




Epoch 1/3
Epoch 2/3
Epoch 3/3
Epoch 2/3
Epoch 3/3
Epoch 2/3
Epoch 3/3
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6

KeyboardInterrupt: 
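The run above was interrupted before it finished, and a rough count of the work involved shows why a search over the full ten-digit dataset is slow on a CPU. The sketch below only does arithmetic on the grid used in that cell. The value `cv_folds = 3` is an assumption: the default number of folds depends on the scikit-learn version, and none is set explicitly in the notebook.

```python
from math import prod

grid = {'dense_layer_sizes': [[32], [64], [32, 32], [64, 64]],
        'epochs': [3, 6],
        'filters': [8],
        'kernel_size': [3],
        'pool_size': [2]}

cv_folds = 3  # assumed; check the default of your scikit-learn version

# Number of parameter combinations is the product of the list lengths.
n_combos = prod(len(values) for values in grid.values())

# Each epochs value appears in an equal share of the combinations.
epochs_per_fold = sum(grid['epochs']) * n_combos // len(grid['epochs'])

cv_fits = n_combos * cv_folds           # models trained during cross-validation
cv_epochs = epochs_per_fold * cv_folds  # total training epochs during cross-validation

print(n_combos, 'combinations,', cv_fits, 'fits,', cv_epochs, 'epochs')
```

On top of this, the search refits the best combination once on the whole training set, so trimming the grid or the dataset is usually the quickest way to shorten the run.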