# Grid Search with Keras

Grid search is a hyperparameter optimization technique that evaluates every combination of a set of candidate values. scikit-learn provides it as the GridSearchCV class.

When constructing this class you must provide a dictionary of hyperparameters to evaluate in the param_grid argument. This dictionary maps each model parameter name to an array of values to try.
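To make the shape of param_grid concrete, here is a minimal sketch of how a dictionary of value lists expands into individual configurations. The grid and the `expand_grid` helper are hypothetical and only illustrate the idea; they are not scikit-learn's own code.

```python
from itertools import product

# Hypothetical, deliberately small grid: each key is a hyperparameter name,
# each value is the list of settings to try for it.
param_grid = {
    "batch_size": [10, 40],
    "epochs": [10, 50, 100],
}

def expand_grid(grid):
    """Return every combination of values in the grid as a list of dicts."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combinations = expand_grid(param_grid)
for combo in combinations:
    print(combo)
```

Two batch sizes and three epoch counts give six configurations, and each one is trained and scored separately.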

We will be using a grid search with the KerasClassifier wrapper to evaluate different configurations for our neural network model and use the combination that provides the best estimated performance.

GridSearchCV will then score each combination using stratified k-fold cross-validation.
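"Stratified" means that every fold keeps roughly the same class proportions as the full dataset. The sketch below shows the idea in plain Python with hypothetical labels; the `stratified_folds` helper is an illustration only, and in practice scikit-learn's StratifiedKFold class does this job.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits, seed=0):
    """Assign each sample index to one of n_splits folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds

# Hypothetical labels: 6 positive and 12 negative samples.
labels = [1] * 6 + [0] * 12
folds = stratified_folds(labels, n_splits=3)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(len(fold), "samples,", positives, "positive")
```

Each fold serves once as the validation set while the model is trained on the remaining folds.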

An exhaustive search like this can take a long time to compute, especially on a large dataset, because a model is trained for every combination and every fold. With a big dataset, it can be wise to design small experiments on a subset of the data so that the search completes in a reasonable amount of time. Once the grid search gives us the best parameters, we can train on all of the data using those parameters.
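One simple way to set up such a small experiment is to draw a random subset of the rows. This is a minimal sketch; the `sample_subset` helper, the dataset size, and the 10% fraction are all assumptions chosen for illustration.

```python
import random

def sample_subset(rows, fraction, seed=0):
    """Return a random subset containing roughly `fraction` of the rows."""
    rng = random.Random(seed)
    subset_size = max(1, int(len(rows) * fraction))
    return rng.sample(rows, subset_size)

# Hypothetical dataset of 1,000 rows, represented here by their indices.
all_rows = list(range(1000))
search_rows = sample_subset(all_rows, fraction=0.1)
print(len(search_rows))
```

The grid search would run on `search_rows` only, and the final model would be retrained on `all_rows` with the best parameters found.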

Note: to run the search in parallel on all available CPU cores, pass n_jobs=-1 to GridSearchCV.

---

Let's use the Pima Indians Diabetes dataset:

In [5]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from keras.optimizers import SGD
from keras.constraints import maxnorm
import numpy as np
import pandas as pd

# load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data"
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(url, names=names)
array = df.values
df

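|   | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|------|------|------|------|------|------|-------|-----|-------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |
| 5 | 5 | 116 | 74 | 0 | 0 | 25.6 | 0.201 | 30 | 0 |
| 6 | 3 | 78 | 50 | 32 | 88 | 31.0 | 0.248 | 26 | 1 |
| 7 | 10 | 115 | 0 | 0 | 0 | 35.3 | 0.134 | 29 | 0 |
| 8 | 2 | 197 | 70 | 45 | 543 | 30.5 | 0.158 | 53 | 1 |
| 9 | 8 | 125 | 96 | 0 | 0 | 0.0 | 0.232 | 54 | 1 |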


## Description of the hyperparameters we can tune

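Since we are using KerasClassifier, we need a create_model() function that builds and returns the compiled model. Every hyperparameter that the grid search should set on the model (for example init_mode or learn_rate) must be an argument of this function, and each argument must have a default value. batch_size and epochs are not arguments of create_model(); the wrapper passes them on when fitting the model.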
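The sketch below illustrates why the argument names matter: a candidate configuration can be split by inspecting the signature of the build function. This is a rough illustration of the idea, not the wrapper's actual code; the `split_params` helper and the stand-in `create_model` (which returns a plain dict instead of a Keras model) are hypothetical.

```python
import inspect

def create_model(neurons_h1=1, init_mode="uniform", learn_rate=0.1):
    """Stand-in for a model-building function; returns its settings as a dict."""
    return {"neurons_h1": neurons_h1, "init_mode": init_mode, "learn_rate": learn_rate}

def split_params(build_fn, params):
    """Separate the params that build_fn accepts from the rest (e.g. fit settings)."""
    accepted = set(inspect.signature(build_fn).parameters)
    build_args = {k: v for k, v in params.items() if k in accepted}
    other_args = {k: v for k, v in params.items() if k not in accepted}
    return build_args, other_args

# One hypothetical candidate configuration from a grid.
candidate = {"neurons_h1": 10, "learn_rate": 0.01, "batch_size": 40, "epochs": 50}
build_args, fit_args = split_params(create_model, candidate)
model_settings = create_model(**build_args)
print(model_settings)  # init_mode falls back to its default value
print(fit_args)
```

Because init_mode is missing from the candidate, its default is used, which is why each argument needs one.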

We're going to define the following arrays for the parameters we wish to search (and place them in a dictionary that is passed to GridSearchCV):

### Batch Size

Batch size in iterative gradient descent is the number of samples we show to the neural network before the weights are updated (batch learning). It is often described in contrast to online learning, which updates the weights after every single training example (essentially a batch size of 1).
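As a small illustration of the difference, the sketch below splits a hypothetical training set of 25 samples into batches and counts the resulting weight updates. The `iterate_batches` helper is an assumption for illustration; Keras handles batching internally.

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Hypothetical training set of 25 samples.
samples = list(range(25))

# Batch learning: one weight update per batch; the last batch may be smaller.
batch_lengths = [len(batch) for batch in iterate_batches(samples, batch_size=10)]
print(batch_lengths)

# Online learning: a batch size of 1 gives one update per sample.
online_updates = sum(1 for _ in iterate_batches(samples, batch_size=1))
print(online_updates)
```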

Batch size is also an optimization in the training of the network, defining how many patterns are read at a time and kept in memory. ***We can also use batch learning on datasets that depend on the time domain, in order to learn patterns tied to certain times. For example, when using an LSTM RNN to find out which periods of the day crimes occur most often, we might divide the day into periods of 8 hours, yet there could be a much more pronounced pattern if we batch every 6 hours.***

### Number of Epochs

The number of epochs is the number of times the model is exposed to the whole training dataset. 5 epochs means the neural network will go through the dataset 5 times. Grid search can help find a suitable number of epochs, but it is important to cross-validate; otherwise the grid search will most likely pick the largest number of epochs and we will overfit our dataset.
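Epochs and batch size together determine how many weight updates a training run performs, which is also a rough guide to its cost. A minimal sketch, assuming a hypothetical training set of 500 samples and a hypothetical `weight_updates` helper:

```python
import math

def weight_updates(n_samples, batch_size, epochs):
    """Total number of weight updates over a full training run."""
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs

# Hypothetical training set of 500 samples, batch size of 40.
for epochs in [10, 50, 100]:
    updates = weight_updates(500, batch_size=40, epochs=epochs)
    print(epochs, "epochs ->", updates, "updates")
```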

### Optimization Algorithm (optimizer)

***Optimizers search for different weight values.***

### Learning Rate (learn_rate)



### Momentum



### Weight Initialization

***Initializers prepare the network weights using different schemes.***

### Activation Function



### Dropout Rate



### Number of Neurons



## General tips 



In [7]:
# split into input (X) and output (Y) variables
X = array[:,0:8]
y = array[:,8]
# seed NumPy's random number generator for reproducibility
seed = 5
np.random.seed(seed)
# Function to create model for TensorFlow, required for Keras classifier
def create_model(neurons_h1=1, init_mode='uniform', learn_rate=0.1, momentum=0.2, dropout_rate=0.0, weight_constraint=0):
    # Create model
    model = Sequential()
    model.add(Dense(neurons_h1, input_dim=8, kernel_initializer=init_mode, activation='relu', kernel_constraint=maxnorm(weight_constraint)))
    model.add(Dropout(dropout_rate))
    model.add(Dense(8, kernel_initializer=init_mode, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1, kernel_initializer=init_mode, activation='sigmoid'))
    # compile model
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=0)

# Define the grid search parameters.
# Usually when we do a grid search, we won't try every possible combination, since that would take
# an enormous amount of time to compute when we are using neural networks.

# optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
# activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
init_mode = ['glorot_uniform', 'glorot_normal', 'normal', 'uniform']  #, 'lecun_uniform', 'zero', 'he_normal', 'he_uniform']
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
neurons_h1 = [1, 5, 10, 12, 15, 20, 25, 30]
weight_constraint = [1, 2, 3, 4, 5]
dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
batch_size = [10, 40, 60, 100]
epochs = [10, 50, 100]
param_grid = dict(neurons_h1=neurons_h1, init_mode=init_mode, batch_size=batch_size, epochs=epochs, learn_rate=learn_rate, momentum=momentum, dropout_rate=dropout_rate, weight_constraint=weight_constraint)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1)
grid_result = grid.fit(X, y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
# cv_results_ holds the per-combination cross-validation scores
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

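The grid defined above is very large, which is worth checking before starting a search. The sketch below counts its combinations from the lengths of the lists in the cell above, treating init_mode as its four intended entries. The fold count is an assumption, since the cell does not set cv explicitly.

```python
from math import prod

# Number of candidate values per hyperparameter in the grid above
# (init_mode counted as its four intended entries).
grid_sizes = {
    "neurons_h1": 8,
    "init_mode": 4,
    "learn_rate": 5,
    "momentum": 6,
    "weight_constraint": 5,
    "dropout_rate": 10,
    "batch_size": 4,
    "epochs": 3,
}
n_combinations = prod(grid_sizes.values())
n_folds = 5  # assumed number of cross-validation folds
print(n_combinations, "combinations,", n_combinations * n_folds, "model fits")
```

With this many fits, a full search is unlikely to finish in a reasonable time. Tuning only a few hyperparameters at a time, or using shorter value lists, keeps the search tractable.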