## Grid Search Hyperparameters for Deep Learning Models

Reference: [How to Grid Search Hyperparameters for Deep Learning Models in Python with Keras](https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/)

Goal:

* `Wrap` Keras models for use in scikit-learn and tune them with grid search
* Grid search for common neural network parameters, such as `learning rate`, `dropout rate`, `epochs`, and `number of neurons`

### Hyperparameter Optimization Tips

* **k-fold Cross Validation**. The examples here use `cv=3`, but `k=5` or `k=10` would likely give more stable estimates.

* **Parallelize**. Use all your cores if you can.

* **Sample your Dataset**. Because networks are slow to train, try training them on a smaller sample of your training dataset.
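
As a minimal sketch of the sampling tip, the helper below picks a reproducible random subset of row indices using only the standard library. The function name `sample_indices`, the 1000-row size, and the 25% fraction are illustrative assumptions, not part of the original notebook.

```python
import random

def sample_indices(n_rows, fraction, seed=7):
    """Return a sorted, reproducible random subset of row indices."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    n_keep = max(1, int(n_rows * fraction))
    return sorted(rng.sample(range(n_rows), n_keep))

# Hypothetical usage: keep 25% of a 1000-row training set for a quick search
subset = sample_indices(n_rows=1000, fraction=0.25)
print(len(subset))
```

With NumPy arrays, `X[subset]` and `Y[subset]` would then select the matching rows for a faster, rougher grid search.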

### Using Keras models in Scikit-learn

Keras models can be used in scikit-learn by wrapping them with the `KerasClassifier` or `KerasRegressor` class from the module [SciKeras](https://adriangb.com/scikeras/stable/).

In [None]:
! pip install scikeras[tensorflow]
# ! pip install --no-deps scikeras  # You can also install SciKeras without any dependencies

To use these wrappers, you must define a function that creates and returns your `Keras sequential model`, then pass this function to the `model` argument when constructing the `KerasClassifier` class.

    def create_model():
      ...
      return model

    model = KerasClassifier(model=create_model)

The constructor for the KerasClassifier class can take default arguments that are `passed` on to the calls to `model.fit()`, such as the `number of epochs` and the `batch size`.

    def create_model():
      ...
      return model

    model = KerasClassifier(model=create_model, epochs=10)

And new arguments can be defined in the signature of your `create_model()` function with default parameters.

    def create_model(dropout_rate=0.0):
      ...
      return model

    model = KerasClassifier(model=create_model, dropout_rate=0.2)

### Using Grid Search Algorithm

In scikit-learn, this technique is provided in the `GridSearchCV` class.

By default, the grid search will only use one thread. By setting the `n_jobs` argument in the GridSearchCV constructor to `-1`, the process will use `all cores` on your machine.

The GridSearchCV process will then construct and evaluate one model for each combination of parameters. `Cross validation` is used to evaluate each individual model. Recent scikit-learn versions default to `5-fold cross validation` (earlier versions used 3-fold), and you can `override` this by specifying the `cv` argument to the GridSearchCV constructor. The examples here pass `cv=3`.

    param_grid = {'epochs': [10,20,30]}
    grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
    grid_result = grid.fit(X, Y)

The `best_score_` attribute holds the best mean cross-validated score observed during the search, and `best_params_` holds the `combination of parameters` that achieved the `best results`.
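
To make the cost of a search concrete, the sketch below enumerates every combination in a parameter grid with `itertools.product` and counts the resulting model fits. The helper `expand_grid` and the small grid are illustrative assumptions; scikit-learn performs an equivalent expansion internally.

```python
from itertools import product

def expand_grid(param_grid):
    """Yield one dict per combination of values in param_grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))

# Hypothetical grid: 3 batch sizes x 2 epoch settings
param_grid = {"batch_size": [10, 20, 40], "epochs": [10, 50]}
cv = 3

candidates = list(expand_grid(param_grid))
print(len(candidates))       # 6 candidates
print(len(candidates) * cv)  # 18 cross-validation fits
```

Because the number of fits is the product of all list lengths times the number of folds, adding one more hyperparameter to the grid multiplies the training time.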

### Tuning Batch Size and Number of Epochs

Load and prepare the dataset: place your `kaggle.json` file in the workspace, then run the commands below.

In [2]:
! pip install -q kaggle

#from google.colab import files
# files.upload()

! mkdir -p ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
! kaggle datasets download -d kumargh/pimaindiansdiabetescsv

Downloading pimaindiansdiabetescsv.zip to /content
  0% 0.00/8.89k [00:00<?, ?B/s]
100% 8.89k/8.89k [00:00<00:00, 11.1MB/s]


In [3]:
import zipfile

# Unzip the downloaded file into the current working directory
def unzip_data(zip_file_name):
  # The context manager closes the archive even if extraction fails
  with zipfile.ZipFile(zip_file_name, "r") as zip_ref:
    zip_ref.extractall()

In [4]:
unzip_data('/content/pimaindiansdiabetescsv.zip')

In [5]:
import numpy as np

DATASET_PATH = '/content/pima-indians-diabetes.csv'

# Load dataset
dataset = np.loadtxt(DATASET_PATH, delimiter=",")

In [6]:
import pandas as pd

pd.DataFrame(dataset)

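|     | 0    | 1     | 2    | 3    | 4     | 5    | 6     | 7    | 8   |
|-----|------|-------|------|------|-------|------|-------|------|-----|
| 0   | 6.0  | 148.0 | 72.0 | 35.0 | 0.0   | 33.6 | 0.627 | 50.0 | 1.0 |
| 1   | 1.0  | 85.0  | 66.0 | 29.0 | 0.0   | 26.6 | 0.351 | 31.0 | 0.0 |
| 2   | 8.0  | 183.0 | 64.0 | 0.0  | 0.0   | 23.3 | 0.672 | 32.0 | 1.0 |
| 3   | 1.0  | 89.0  | 66.0 | 23.0 | 94.0  | 28.1 | 0.167 | 21.0 | 0.0 |
| 4   | 0.0  | 137.0 | 40.0 | 35.0 | 168.0 | 43.1 | 2.288 | 33.0 | 1.0 |
| ... | ...  | ...   | ...  | ...  | ...   | ...  | ...   | ...  | ... |
| 763 | 10.0 | 101.0 | 76.0 | 48.0 | 180.0 | 32.9 | 0.171 | 63.0 | 0.0 |
| 764 | 2.0  | 122.0 | 70.0 | 27.0 | 0.0   | 36.8 | 0.340 | 27.0 | 0.0 |
| 765 | 5.0  | 121.0 | 72.0 | 23.0 | 112.0 | 26.2 | 0.245 | 30.0 | 0.0 |
| 766 | 1.0  | 126.0 | 60.0 | 0.0  | 0.0   | 30.1 | 0.349 | 47.0 | 1.0 |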


In [7]:
# split dataset into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]

In [8]:
import tensorflow as tf
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Function to create model, required for KerasClassifier
def create_model():
  model = Sequential([
      Dense(12, input_shape=(8,), activation='relu'),
      Dense(1, activation='sigmoid')
  ])

  # Compile model
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

In [9]:
SEED = 7
BATCH_SIZE_TABLE = [10, 20, 40, 60, 80, 100]
EPOCHS_TABLE = [10, 50, 100]

# fix random seed for reproducibility
tf.random.set_seed(SEED)

# Set Grid search parameter dictionary
# param_grid = dict(batch_size=BATCH_SIZE_TABLE, epochs=EPOCHS_TABLE)
param_grid = {
    'batch_size': BATCH_SIZE_TABLE,
    'epochs': EPOCHS_TABLE
    }

In [10]:
def grid_search_report(grid_result):
  # Print the best result, then the mean and standard deviation of the score for every candidate
  print("Best score: %f using %s parameters" % (grid_result.best_score_, grid_result.best_params_))
  means = grid_result.cv_results_['mean_test_score']
  stds = grid_result.cv_results_['std_test_score']
  params = grid_result.cv_results_['params']
  for mean, stdev, param in zip(means, stds, params):
    print("Mean: %f,  Std_dev.: (%f) with: %r" % (mean, stdev, param))

Create model using KerasClassifier

In [11]:
# create model
model = KerasClassifier(model=create_model, verbose=0)

Run Grid Search Algorithm on Keras model

In [12]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)

In [13]:
# Report Grid Search Algorithm results for given model
grid_search_report(grid_result)

Best score: 0.691406 using {'batch_size': 10, 'epochs': 50} parameters
Mean: 0.572917,  Std_dev.: (0.060878) with: {'batch_size': 10, 'epochs': 10}
Mean: 0.691406,  Std_dev.: (0.041463) with: {'batch_size': 10, 'epochs': 50}
Mean: 0.678385,  Std_dev.: (0.041626) with: {'batch_size': 10, 'epochs': 100}
Mean: 0.605469,  Std_dev.: (0.038670) with: {'batch_size': 20, 'epochs': 10}
Mean: 0.674479,  Std_dev.: (0.025582) with: {'batch_size': 20, 'epochs': 50}
Mean: 0.671875,  Std_dev.: (0.016573) with: {'batch_size': 20, 'epochs': 100}
Mean: 0.552083,  Std_dev.: (0.094309) with: {'batch_size': 40, 'epochs': 10}
Mean: 0.662760,  Std_dev.: (0.031948) with: {'batch_size': 40, 'epochs': 50}
Mean: 0.660156,  Std_dev.: (0.016573) with: {'batch_size': 40, 'epochs': 100}
Mean: 0.566406,  Std_dev.: (0.078579) with: {'batch_size': 60, 'epochs': 10}
Mean: 0.638021,  Std_dev.: (0.021236) with: {'batch_size': 60, 'epochs': 50}
Mean: 0.639323,  Std_dev.: (0.012075) with: {'batch_size': 60, 'epochs': 100}
M

### Tuning Optimization Algorithms

In [17]:
OPTIMIZER_TABLE = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']

# Set Grid search parameter dictionary: try each optimizer in turn
param_grid = {
    'optimizer': OPTIMIZER_TABLE
    }

In [18]:
# create model
model = KerasClassifier(model=create_model, loss="binary_crossentropy", epochs=100, batch_size=10, verbose=0)

In [19]:
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# Report Grid Search Algorithm results for given model
grid_search_report(grid_result)

Best score: 0.720052 using {'optimizer': 'SGD'} parameters
Mean: 0.720052,  Std_dev.: (0.023510) with: {'optimizer': 'SGD'}
Mean: 0.696615,  Std_dev.: (0.023073) with: {'optimizer': 'RMSprop'}
Mean: 0.691406,  Std_dev.: (0.027621) with: {'optimizer': 'Adagrad'}
Mean: 0.669271,  Std_dev.: (0.038318) with: {'optimizer': 'Adadelta'}
Mean: 0.714844,  Std_dev.: (0.030758) with: {'optimizer': 'Adam'}
Mean: 0.664062,  Std_dev.: (0.008438) with: {'optimizer': 'Adamax'}
Mean: 0.688802,  Std_dev.: (0.067382) with: {'optimizer': 'Nadam'}


The `KerasClassifier` wrapper `will not compile your model again` if the model returned by `create_model()` is `already compiled`, so wrapper-level settings such as `optimizer` apply only when the function returns an uncompiled model. Hence the other way to run GridSearchCV is to set the optimizer as an argument to the `create_model()` function, which returns an appropriately compiled model like the following:

In [20]:
# Function to create model, required for KerasClassifier
def create_model(optimizer='Adam'):
  model = Sequential([
      Dense(12, input_shape=(8,), activation='relu'),
      Dense(1, activation='sigmoid')
  ])

  # Compile model
  model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
  return model

# create model
model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

In [21]:
# Set Grid search parameter dictionary: route each optimizer name to create_model()
param_grid = {
    'model__optimizer': OPTIMIZER_TABLE
    }

Note that the parameter name in `param_grid` above carries the prefix `model__`. SciKeras requires this prefix to make clear that the parameter must be routed into the `create_model()` function as an argument, rather than treated as a parameter for `compile()` or `fit()`. See also the routed parameters section of the SciKeras documentation.
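
The routing idea can be illustrated with a small standard-library sketch that groups parameter names by their prefix. This is only a conceptual illustration under assumed names (`split_routed_params` is hypothetical) and is not how SciKeras is actually implemented.

```python
def split_routed_params(params, prefixes=("model", "optimizer")):
    """Group parameters by routing prefix; unprefixed names go under ''."""
    routed = {prefix: {} for prefix in prefixes}
    routed[""] = {}
    for name, value in params.items():
        prefix, sep, rest = name.partition("__")
        if sep and prefix in routed:
            routed[prefix][rest] = value  # e.g. model__optimizer -> optimizer
        else:
            routed[""][name] = value      # plain wrapper argument, e.g. epochs
    return routed

params = {"model__optimizer": "Adam", "optimizer__learning_rate": 0.01, "epochs": 100}
routed = split_routed_params(params)
print(routed["model"])      # {'optimizer': 'Adam'}
print(routed["optimizer"])  # {'learning_rate': 0.01}
print(routed[""])           # {'epochs': 100}
```

Read this way, `model__optimizer` becomes the `optimizer` argument of `create_model()`, while `optimizer__learning_rate` is a setting of the optimizer itself.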

In [22]:
# Run Grid Search Algorithm and Report the results for given model
grid_search_report(GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3).fit(X, Y))

Best score: 0.712240 using {'model__optimizer': 'Adam'} parameters
Mean: 0.649740,  Std_dev.: (0.003683) with: {'model__optimizer': 'SGD'}
Mean: 0.679688,  Std_dev.: (0.042910) with: {'model__optimizer': 'RMSprop'}
Mean: 0.597656,  Std_dev.: (0.033299) with: {'model__optimizer': 'Adagrad'}
Mean: 0.526042,  Std_dev.: (0.062201) with: {'model__optimizer': 'Adadelta'}
Mean: 0.712240,  Std_dev.: (0.015733) with: {'model__optimizer': 'Adam'}
Mean: 0.703125,  Std_dev.: (0.019137) with: {'model__optimizer': 'Adamax'}
Mean: 0.695312,  Std_dev.: (0.044993) with: {'model__optimizer': 'Nadam'}


### Tuning Learning rate and Momentum

It is common to `pre-select` an `optimization algorithm` to train your network and then tune its parameters, such as the `learning rate` and `momentum`.



In [11]:
# Function to create model, required for KerasClassifier
def create_model():
  # Return the model uncompiled so that KerasClassifier compiles it with the
  # optimizer settings (learning rate, momentum) supplied by the grid search
  model = Sequential([
      Dense(12, input_shape=(8,), activation='relu'),
      Dense(1, activation='sigmoid')
  ])
  return model

# Momentum is a parameter of SGD (Adam does not accept it), so SGD is used here
model = KerasClassifier(model=create_model, loss="binary_crossentropy", optimizer="SGD", epochs=100, batch_size=10, verbose=0)

In [13]:
LEARNING_RATE_TABLE = [0.001, 0.01, 0.1, 0.2, 0.3]
MOMENTUM_TABLE = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]

# Set Grid search parameter dictionary: the optimizer__ prefix routes values to the optimizer
param_grid = {
    'optimizer__learning_rate': LEARNING_RATE_TABLE,
    'optimizer__momentum':      MOMENTUM_TABLE
    }

In [None]:
# Run Grid Search Algorithm and Report the results for given model
grid_search_report(GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=3).fit(X, Y))

### Tuning network weight initialization

Initialize the model's weights with `small random values`.

We will use the `same weight initialization method` on each layer. Ideally, it may be better to use `different weight initialization` schemes `according to the activation function` used on each layer.

We will use a `rectifier` for the `hidden layer`. Use `sigmoid` for the `output layer` because the `predictions are binary`.

We need to use the `model__` prefix (here, `model__init_mode`) to ask the KerasClassifier to route the parameter to the model creation function.
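
To give a feel for how the schemes differ, the sketch below computes the sampling range of three of the uniform initializers from the layer's fan-in and fan-out, using their standard published formulas. The helper name `uniform_limit` is an assumption for illustration; the layer sizes (8 inputs, 12 hidden units) match the model in this notebook.

```python
import math

def uniform_limit(scheme, fan_in, fan_out):
    """Return L such that weights are drawn uniformly from [-L, L]."""
    if scheme == "glorot_uniform":
        return math.sqrt(6.0 / (fan_in + fan_out))
    if scheme == "he_uniform":
        return math.sqrt(6.0 / fan_in)
    if scheme == "lecun_uniform":
        return math.sqrt(3.0 / fan_in)
    raise ValueError(f"unknown scheme: {scheme}")

# Hidden layer of this notebook's model: 8 inputs feeding 12 units
for scheme in ("glorot_uniform", "he_uniform", "lecun_uniform"):
    print(scheme, round(uniform_limit(scheme, fan_in=8, fan_out=12), 4))
```

The schemes differ only in how widely the initial weights are spread, which is why the best choice tends to depend on the activation function that follows.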

In [15]:
# Function to create model, required for KerasClassifier
def create_model(init_mode='uniform'):
  model = Sequential([
      Dense(12, input_shape=(8,), kernel_initializer=init_mode, activation='relu'),
      Dense(1, kernel_initializer=init_mode, activation='sigmoid')
  ])

  # Compile model
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

In [16]:
WEIGHTS_INIT_MODE_TABLE = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']

# Set Grid search parameter dictionary: route each initializer name to create_model()
param_grid = {
    'model__init_mode': WEIGHTS_INIT_MODE_TABLE
    }

In [19]:
# Run Grid Search Algorithm and Report the results for given model
grid_search_report(GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3, verbose=3).fit(X, Y))

Fitting 3 folds for each of 8 candidates, totalling 24 fits
Best score: 0.722656 using {'model__init_mode': 'uniform'} parameters
Mean: 0.722656,  Std_dev.: (0.011500) with: {'model__init_mode': 'uniform'}
Mean: 0.701823,  Std_dev.: (0.038450) with: {'model__init_mode': 'lecun_uniform'}
Mean: 0.722656,  Std_dev.: (0.022326) with: {'model__init_mode': 'normal'}
Mean: 0.651042,  Std_dev.: (0.001841) with: {'model__init_mode': 'zero'}
Mean: 0.680990,  Std_dev.: (0.008027) with: {'model__init_mode': 'glorot_normal'}
Mean: 0.704427,  Std_dev.: (0.022402) with: {'model__init_mode': 'glorot_uniform'}
Mean: 0.686198,  Std_dev.: (0.012890) with: {'model__init_mode': 'he_normal'}
Mean: 0.670573,  Std_dev.: (0.008027) with: {'model__init_mode': 'he_uniform'}


### Tuning Activation Functions

The activation function controls the `non-linearity of individual neurons` and when they fire.

We will only vary the activation function in the hidden layer, because a `sigmoid` activation function is required in the `output` layer for the `binary classification` problem.

Similar to the previous example, the activation is an argument to the `create_model()` function, so we use the `model__` prefix (here, `model__activation`) in the GridSearchCV parameter grid.
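
For reference, several of the candidate activations can be written as plain scalar functions. This is a rough standard-library sketch of their textbook definitions rather than the Keras implementations, and the dictionary name `ACTIVATIONS` is an assumption.

```python
import math

# Textbook scalar definitions of some of the activations compared below
ACTIVATIONS = {
    "linear": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "softplus": lambda x: math.log1p(math.exp(x)),
    "softsign": lambda x: x / (1.0 + abs(x)),
}

# Evaluate each function at a negative, zero, and positive input
for name, fn in ACTIVATIONS.items():
    print(name, [round(fn(x), 3) for x in (-2.0, 0.0, 2.0)])
```

Comparing the printed values shows which functions saturate, which clip negative inputs, and which pass the input through unchanged.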

In [20]:
# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
  model = Sequential([
      Dense(12, input_shape=(8,), kernel_initializer='uniform', activation=activation),
      Dense(1, kernel_initializer='uniform', activation='sigmoid')
  ])

  # Compile model
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

In [21]:
ACTIVATION_FUNC_TABLE = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']

# Set Grid search parameter dictionary: route each activation name to create_model()
param_grid = {
    'model__activation': ACTIVATION_FUNC_TABLE
    }

In [22]:
# Run Grid Search Algorithm and Report the results for given model
grid_search_report(GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3).fit(X, Y))

Best score: 0.734375 using {'model__activation': 'softplus'} parameters
Mean: 0.665365,  Std_dev.: (0.024150) with: {'model__activation': 'softmax'}
Mean: 0.734375,  Std_dev.: (0.020915) with: {'model__activation': 'softplus'}
Mean: 0.664062,  Std_dev.: (0.012758) with: {'model__activation': 'softsign'}
Mean: 0.704427,  Std_dev.: (0.040637) with: {'model__activation': 'relu'}
Mean: 0.674479,  Std_dev.: (0.028587) with: {'model__activation': 'tanh'}
Mean: 0.674479,  Std_dev.: (0.001841) with: {'model__activation': 'sigmoid'}
Mean: 0.688802,  Std_dev.: (0.035564) with: {'model__activation': 'hard_sigmoid'}
Mean: 0.718750,  Std_dev.: (0.009568) with: {'model__activation': 'linear'}


### Tuning dropout regularization

We will look at tuning the `dropout rate` for `regularization` in an effort to `limit overfitting` and `improve` the model’s ability to `generalize`.

For the best results, dropout should be `combined` with a `weight constraint` such as the `max norm constraint`.

[Dropout Regularization in Deep Learning Models with Keras](https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/)

We will try dropout rates between `0.0` and `0.9` (1.0 does not make sense) and `maxnorm weight constraint` values between `1` and `5`.
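
As a rough illustration of what a max norm constraint does, the sketch below rescales a weight vector whenever its L2 norm exceeds the chosen limit. The function `apply_max_norm` is a hypothetical helper written for this explanation and only approximates the behaviour of the Keras constraint for the weights feeding a single unit.

```python
import math

def apply_max_norm(weights, max_norm):
    """Rescale weights so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [w * scale for w in weights]

# A vector of norm 5 is shrunk to norm 4; a small vector is left alone
print(apply_max_norm([3.0, 4.0], max_norm=4.0))
print(apply_max_norm([1.0, 1.0], max_norm=4.0))
```

A smaller limit constrains the weights more tightly, which is why the constraint value is tuned together with the dropout rate.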

In [23]:
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.layers import Dropout

# Function to create model, required for KerasClassifier
def create_model(dropout_rate, weight_constraint):
  model = Sequential([
      Dense(12, input_shape=(8,), kernel_initializer='uniform',
            activation='linear', kernel_constraint=MaxNorm(weight_constraint)),
      Dropout(dropout_rate),
      Dense(1, kernel_initializer='uniform', activation='sigmoid')
  ])

  # Compile model
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

In [24]:
WEIGHT_CONSTRAINT_TABLE = [1.0, 2.0, 3.0, 4.0, 5.0]
DROPOUT_RATE_TABLE = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# Set Grid search parameter dictionary: route both values to create_model()
param_grid = {
    'model__dropout_rate': DROPOUT_RATE_TABLE,
    'model__weight_constraint': WEIGHT_CONSTRAINT_TABLE
    }

In [25]:
# Run Grid Search Algorithm and Report the results for given model
grid_search_report(GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3).fit(X, Y))

Best score: 0.723958 using {'model__dropout_rate': 0.4, 'model__weight_constraint': 4.0} parameters
Mean: 0.703125,  Std_dev.: (0.014616) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 1.0}
Mean: 0.701823,  Std_dev.: (0.004872) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 2.0}
Mean: 0.718750,  Std_dev.: (0.014616) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 3.0}
Mean: 0.713542,  Std_dev.: (0.009744) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 4.0}
Mean: 0.704427,  Std_dev.: (0.006639) with: {'model__dropout_rate': 0.0, 'model__weight_constraint': 5.0}
Mean: 0.707031,  Std_dev.: (0.003189) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 1.0}
Mean: 0.707031,  Std_dev.: (0.012758) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 2.0}
Mean: 0.713542,  Std_dev.: (0.011201) with: {'model__dropout_rate': 0.1, 'model__weight_constraint': 3.0}
Mean: 0.709635,  Std_dev.: (0.016053) with: {'model_

### Tuning the number of neurons in the hidden layer

The number of neurons in a layer is an important parameter to tune. Generally the number of neurons in a layer controls the representational capacity of the network.

We will tune the number of neurons in a `single hidden layer`. We will try values from `1` to `30` in steps of `5`.
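
One way to see how capacity grows with the hidden layer is to count trainable parameters. The sketch below does this for a network with 8 inputs, one hidden layer, and 1 output, as used in this notebook. The helper name `count_parameters` is an assumption, and dropout is ignored because it has no trainable parameters.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Count weights and biases of a network with one dense hidden layer."""
    hidden = (n_inputs + 1) * n_hidden   # weights plus one bias per hidden unit
    output = (n_hidden + 1) * n_outputs  # weights plus one bias per output unit
    return hidden + output

# Parameter counts for the hidden layer sizes tried in the grid search
for neurons in [1, 5, 10, 15, 20, 25, 30]:
    print(neurons, count_parameters(8, neurons))
```

For this architecture the count is `10 * neurons + 1`, so the capacity grows linearly with the number of hidden neurons.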

In [26]:
# Function to create model, required for KerasClassifier
def create_model(neurons):
  model = Sequential([
      Dense(neurons, input_shape=(8,), kernel_initializer='uniform',
            activation='linear', kernel_constraint=MaxNorm(4)),
      Dropout(0.2),
      Dense(1, kernel_initializer='uniform', activation='sigmoid')
  ])

  # Compile model
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  return model

model = KerasClassifier(model=create_model, epochs=100, batch_size=10, verbose=0)

In [27]:
LAYER_1_NEURON_TABLE = [1, 5, 10, 15, 20, 25, 30]

# Set Grid search parameter dictionary: route each neuron count to create_model()
param_grid = {
    'model__neurons': LAYER_1_NEURON_TABLE
    }

In [28]:
# Run Grid Search Algorithm and Report the results for given model
grid_search_report(GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3).fit(X, Y))

Best score: 0.714844 using {'model__neurons': 20} parameters
Mean: 0.695312,  Std_dev.: (0.019918) with: {'model__neurons': 1}
Mean: 0.696615,  Std_dev.: (0.008027) with: {'model__neurons': 5}
Mean: 0.709635,  Std_dev.: (0.008027) with: {'model__neurons': 10}
Mean: 0.712240,  Std_dev.: (0.019225) with: {'model__neurons': 15}
Mean: 0.714844,  Std_dev.: (0.014616) with: {'model__neurons': 20}
Mean: 0.703125,  Std_dev.: (0.011500) with: {'model__neurons': 25}
Mean: 0.695312,  Std_dev.: (0.014616) with: {'model__neurons': 30}


### Reference

https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/