### Grid Search Hyperparameters for Deep Learning Models
https://machinelearningmastery.com/grid-search-hyperparameters-deep-learning-models-python-keras/

In [3]:
# Use scikit-learn to grid search the batch size and epochs
import numpy
from sklearn.model_selection import GridSearchCV
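from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# NOTE: tensorflow.keras.wrappers.scikit_learn was removed in TensorFlow 2.12;
# on newer versions, SciKeras (scikeras.wrappers.KerasClassifier) is the suggested replacement
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier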
# Function to create model, required for KerasClassifier
def create_model():
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("datasets/pima-indians-diabetes.data_06apr2021.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, verbose=0)
# define the grid search parameters
batch_size = [10, 20, 40, 60, 80, 100]
epochs = [10, 50, 100]
param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.709635 using {'batch_size': 10, 'epochs': 100}
0.622396 (0.016367) with: {'batch_size': 10, 'epochs': 10}
0.654948 (0.025582) with: {'batch_size': 10, 'epochs': 50}
0.709635 (0.015073) with: {'batch_size': 10, 'epochs': 100}
0.596354 (0.052440) with: {'batch_size': 20, 'epochs': 10}
0.649740 (0.020256) with: {'batch_size': 20, 'epochs': 50}
0.675781 (0.013902) with: {'batch_size': 20, 'epochs': 100}
0.574219 (0.019918) with: {'batch_size': 40, 'epochs': 10}
0.636719 (0.028348) with: {'batch_size': 40, 'epochs': 50}
0.652344 (0.027805) with: {'batch_size': 40, 'epochs': 100}
0.561198 (0.053496) with: {'batch_size': 60, 'epochs': 10}
0.613281 (0.015947) with: {'batch_size': 60, 'epochs': 50}
0.657552 (0.001841) with: {'batch_size': 60, 'epochs': 100}
0.562500 (0.063709) with: {'batch_size': 80, 'epochs': 10}
0.557292 (0.012075) with: {'batch_size': 80, 'epochs': 50}
0.652344 (0.022999) with: {'batch_size': 80, 'epochs': 100}
0.458333 (0.050865) with: {'batch_size': 100, 'epochs':
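To see how much work this search involves, here is a stdlib-only sketch that expands `param_grid` roughly the way scikit-learn's `ParameterGrid` does (keys sorted, last key varying fastest, which matches the order of the printed results) and counts the model fits. It also estimates the gradient updates per fit, assuming each 3-fold training split holds about 512 of the dataset's 768 rows. The helpers `expand_grid` and `updates_per_fit` are hypothetical names used only for illustration.

```python
from itertools import product
from math import ceil

def expand_grid(param_grid):
    """Yield one dict per combination, sorting keys and varying the last key fastest."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))

def updates_per_fit(n_train, batch_size, epochs):
    """Gradient updates in one fit: one per mini-batch (the last batch may be partial)."""
    return ceil(n_train / batch_size) * epochs

param_grid = dict(batch_size=[10, 20, 40, 60, 80, 100], epochs=[10, 50, 100])
candidates = list(expand_grid(param_grid))
cv = 3
n_train = 512  # assumption: about two-thirds of the 768 rows per 3-fold training split

print(len(candidates), "candidates,", len(candidates) * cv, "fits")
for params in candidates[:3]:
    print(params, updates_per_fit(n_train, **params), "updates per fit")
print(candidates[-1], updates_per_fit(n_train, **candidates[-1]), "updates per fit")
```

Under this assumption, `batch_size=10` with `epochs=100` performs 5,200 updates per fit, while `batch_size=100` with `epochs=10` performs only 60. That difference is one plausible reason the small-batch, long-training corner of the grid scores best here.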

In [4]:
# Use scikit-learn to grid search the optimization algorithm
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(optimizer='adam'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("datasets/pima-indians-diabetes.data_06apr2021.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.709635 using {'optimizer': 'Adam'}
0.652344 (0.028705) with: {'optimizer': 'SGD'}
0.688802 (0.024150) with: {'optimizer': 'RMSprop'}
0.519531 (0.012758) with: {'optimizer': 'Adagrad'}
0.460938 (0.120399) with: {'optimizer': 'Adadelta'}
0.709635 (0.004872) with: {'optimizer': 'Adam'}
0.661458 (0.024360) with: {'optimizer': 'Adamax'}
0.653646 (0.043420) with: {'optimizer': 'Nadam'}
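Why does `optimizer='Adam'` reach `create_model` while `epochs` and `batch_size` do not? Roughly, the legacy wrapper inspects the signature of `build_fn` and passes along only the parameters it accepts, forwarding training arguments such as `epochs` and `batch_size` to `fit()`. The sketch below imitates that filtering with `inspect.signature`; `filter_params` and the stand-in `create_model` and `fit` functions are hypothetical, not the wrapper's actual API.

```python
import inspect

def create_model(optimizer='adam'):
    # stand-in for the Keras build function above; returns a description instead of a model
    return {'optimizer': optimizer}

def fit(x=None, y=None, batch_size=None, epochs=1, verbose=1):
    # stand-in with a subset of the arguments Keras' Model.fit accepts
    return {'batch_size': batch_size, 'epochs': epochs}

def filter_params(fn, params):
    """Keep only the entries of params that fn's signature accepts."""
    accepted = inspect.signature(fn).parameters
    return {k: v for k, v in params.items() if k in accepted}

# what the wrapper holds for one grid candidate: constructor args plus the grid value
sk_params = dict(epochs=100, batch_size=10, verbose=0, optimizer='Adam')
build_args = filter_params(create_model, sk_params)
fit_args = filter_params(fit, sk_params)
print(build_args)  # {'optimizer': 'Adam'}
print(fit_args)    # {'epochs': 100, 'batch_size': 10, 'verbose': 0}
```

This is why `create_model` gains an `optimizer` parameter in this cell: a name in `param_grid` only has an effect if the build function or the training call accepts it.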


In [5]:
# Use scikit-learn to grid search the learning rate and momentum
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras.optimizers import SGD
# Function to create model, required for KerasClassifier
def create_model(learn_rate=0.01, momentum=0):
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))
	# Compile model
	# 'lr' is a deprecated alias that newer Keras optimizers reject; use 'learning_rate'
	optimizer = SGD(learning_rate=learn_rate, momentum=momentum)
	model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("datasets/pima-indians-diabetes.data_06apr2021.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
learn_rate = [0.001, 0.01, 0.1, 0.2, 0.3]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.674479 using {'learn_rate': 0.001, 'momentum': 0.4}
0.670573 (0.016053) with: {'learn_rate': 0.001, 'momentum': 0.0}
0.669271 (0.039365) with: {'learn_rate': 0.001, 'momentum': 0.2}
0.674479 (0.019225) with: {'learn_rate': 0.001, 'momentum': 0.4}
0.671875 (0.033754) with: {'learn_rate': 0.001, 'momentum': 0.6}
0.665365 (0.044690) with: {'learn_rate': 0.001, 'momentum': 0.8}
0.656250 (0.027805) with: {'learn_rate': 0.001, 'momentum': 0.9}
0.666667 (0.024150) with: {'learn_rate': 0.01, 'momentum': 0.0}
0.627604 (0.060375) with: {'learn_rate': 0.01, 'momentum': 0.2}
0.647135 (0.027126) with: {'learn_rate': 0.01, 'momentum': 0.4}
0.648438 (0.028348) with: {'learn_rate': 0.01, 'momentum': 0.6}
0.649740 (0.026557) with: {'learn_rate': 0.01, 'momentum': 0.8}
0.651042 (0.024774) with: {'learn_rate': 0.01, 'momentum': 0.9}
0.648438 (0.026107) with: {'learn_rate': 0.1, 'momentum': 0.0}
0.651042 (0.024774) with: {'learn_rate': 0.1, 'momentum': 0.2}
0.651042 (0.024774) with: {'learn_rate':
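For reference, Keras documents plain (non-Nesterov) SGD with momentum as `velocity = momentum * velocity - learning_rate * g`, followed by `w = w + velocity`. The scalar sketch below applies that rule to the toy function f(w) = w², an assumption chosen only to make the effect of the two grid parameters visible; it says nothing about which values suit the Pima network.

```python
def sgd_momentum(grad, w0, learning_rate, momentum, steps):
    """Plain (non-Nesterov) SGD with momentum on a single scalar parameter."""
    w, velocity = w0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
    return w

def grad(w):
    # gradient of the toy loss f(w) = w**2, whose minimum is at w = 0
    return 2.0 * w

for lr, m in [(0.001, 0.0), (0.001, 0.9), (0.3, 0.0)]:
    print(lr, m, sgd_momentum(grad, w0=1.0, learning_rate=lr, momentum=m, steps=100))
```

With `learning_rate=0.001`, adding `momentum=0.9` moves `w` much closer to the minimum after the same 100 steps, because momentum accumulates past gradients into a larger effective step.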

In [6]:
# Use scikit-learn to grid search the activation function
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation=activation))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("datasets/pima-indians-diabetes.data_06apr2021.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.743490 using {'activation': 'softplus'}
0.651042 (0.024774) with: {'activation': 'softmax'}
0.743490 (0.032734) with: {'activation': 'softplus'}
0.645833 (0.021236) with: {'activation': 'softsign'}
0.703125 (0.016877) with: {'activation': 'relu'}
0.667969 (0.009568) with: {'activation': 'tanh'}
0.701823 (0.023073) with: {'activation': 'sigmoid'}
0.683594 (0.006379) with: {'activation': 'hard_sigmoid'}
0.704427 (0.010253) with: {'activation': 'linear'}
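The activations that score best and worst here differ in shape. The stdlib sketch below writes out a few of them from their standard definitions (these are not Keras' own implementations). Softmax normalizes across all units of a layer, so a 12-unit hidden layer using it is forced to produce outputs that sum to 1; that constraint is one plausible reason it trails the element-wise activations in these runs.

```python
import math

def softplus(x):
    # log(1 + e**x), a smooth approximation of relu (not overflow-safe for very large x)
    return math.log1p(math.exp(x))

def softsign(x):
    return x / (1.0 + abs(x))

def relu(x):
    return max(0.0, x)

def softmax(xs):
    # subtract the max for numerical stability; outputs are positive and sum to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(softplus(x), 4), round(softsign(x), 4))

hidden = softmax([0.5, -1.0, 2.0, 0.0])  # e.g. pre-activations of 4 hidden units
print([round(h, 3) for h in hidden], sum(hidden))
```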


In [7]:
# Use scikit-learn to grid search the activation function (same code as the previous cell, re-run; scores differ between runs)
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation=activation))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("datasets/pima-indians-diabetes.data_06apr2021.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.730469 using {'activation': 'softplus'}
0.657552 (0.003683) with: {'activation': 'softmax'}
0.730469 (0.025315) with: {'activation': 'softplus'}
0.677083 (0.017566) with: {'activation': 'softsign'}
0.716146 (0.015073) with: {'activation': 'relu'}
0.690104 (0.018414) with: {'activation': 'tanh'}
0.684896 (0.050865) with: {'activation': 'sigmoid'}
0.695312 (0.012758) with: {'activation': 'hard_sigmoid'}
0.730469 (0.014616) with: {'activation': 'linear'}


In [8]:
# Use scikit-learn to grid search the activation function (same code as the previous cells, re-run; scores differ between runs)
import numpy
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
# Function to create model, required for KerasClassifier
def create_model(activation='relu'):
	# create model
	model = Sequential()
	model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation=activation))
	model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
	# Compile model
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataset = numpy.loadtxt("datasets/pima-indians-diabetes.data_06apr2021.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = KerasClassifier(build_fn=create_model, epochs=100, batch_size=10, verbose=0)
# define the grid search parameters
activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
param_grid = dict(activation=activation)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X, Y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.733073 using {'activation': 'softplus'}
0.636719 (0.017758) with: {'activation': 'softmax'}
0.733073 (0.018414) with: {'activation': 'softplus'}
0.666667 (0.004872) with: {'activation': 'softsign'}
0.723958 (0.017566) with: {'activation': 'relu'}
0.658854 (0.023510) with: {'activation': 'tanh'}
0.695312 (0.016877) with: {'activation': 'sigmoid'}
0.700521 (0.001841) with: {'activation': 'hard_sigmoid'}
0.697917 (0.015073) with: {'activation': 'linear'}
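Cells 6 to 8 run identical code, yet the scores move by a few points between runs. The sketch below copies the mean test scores printed by those three runs and summarizes each activation by its average and its range, which is a sturdier basis for a decision than any single run's `best_params_`.

```python
from statistics import mean

# mean_test_score per activation, copied from the three runs printed above
runs = {
    'softmax':      [0.651042, 0.657552, 0.636719],
    'softplus':     [0.743490, 0.730469, 0.733073],
    'softsign':     [0.645833, 0.677083, 0.666667],
    'relu':         [0.703125, 0.716146, 0.723958],
    'tanh':         [0.667969, 0.690104, 0.658854],
    'sigmoid':      [0.701823, 0.684896, 0.695312],
    'hard_sigmoid': [0.683594, 0.695312, 0.700521],
    'linear':       [0.704427, 0.730469, 0.697917],
}

# (name, mean over runs, max - min over runs), best mean first
summary = sorted(
    ((name, mean(s), max(s) - min(s)) for name, s in runs.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, avg, spread in summary:
    print(f"{name:<12} mean={avg:.4f} range={spread:.4f}")
```

Softplus has the highest mean in all three runs, while linear, which ties softplus in the second run, falls back in the other two. Several activations vary by about 0.03 between runs, so gaps of a point or two are within run-to-run noise.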


## Tips for Hyperparameter Optimization
This section lists some handy tips to consider when tuning hyperparameters of your neural network.

k-fold Cross Validation. The results from the examples in this post show some variance. The examples use 3-fold cross-validation (cv=3), but k=5 or k=10 may give more stable estimates. Choose your cross-validation configuration carefully to ensure your results are stable.
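Each extra fold is another full training run per candidate, so moving from cv=3 to cv=10 multiplies the number of fits by 10/3. When scikit-learn recognizes the estimator as a classifier and the target is binary, an integer `cv` produces stratified folds that keep the class balance in each fold. The sketch below is a simplified stand-in for that splitting (its fold assignment differs from scikit-learn's `StratifiedKFold`); `stratified_folds` is a hypothetical helper, and the labels assume the Pima class balance of 500 negatives and 268 positives.

```python
import random

def stratified_folds(labels, k, seed=7):
    """Assign each index to one of k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    slot = 0  # keeps counting across classes so total fold sizes stay balanced
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for i in idx:
            folds[slot % k].append(i)
            slot += 1
    return folds

# hypothetical labels with the Pima class balance: 500 negatives, 268 positives
labels = [0] * 500 + [1] * 268
for k in (3, 5, 10):
    folds = stratified_folds(labels, k)
    sizes = [len(f) for f in folds]
    pos_rates = [round(sum(labels[i] for i in f) / len(f), 3) for f in folds]
    print(k, sizes, pos_rates)
```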

Review the Whole Grid. Do not focus only on the best result; review the whole grid of results and look for trends that support your configuration decisions.
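As an example of reading the whole grid, the sketch below lays out the mean test scores from the batch size and epochs search as a table. The numbers are copied from that cell's output; the `batch_size=100` row is omitted because the printed output is cut off there.

```python
# mean_test_score from the batch size / epochs search above (batch_size=100 row omitted)
scores = {
    (10, 10): 0.622396, (10, 50): 0.654948, (10, 100): 0.709635,
    (20, 10): 0.596354, (20, 50): 0.649740, (20, 100): 0.675781,
    (40, 10): 0.574219, (40, 50): 0.636719, (40, 100): 0.652344,
    (60, 10): 0.561198, (60, 50): 0.613281, (60, 100): 0.657552,
    (80, 10): 0.562500, (80, 50): 0.557292, (80, 100): 0.652344,
}
batch_sizes = sorted({b for b, _ in scores})
epochs = sorted({e for _, e in scores})

# print a batch_size x epochs table so trends along each axis are visible at a glance
print("batch\\epochs " + "".join(f"{e:>9}" for e in epochs))
for b in batch_sizes:
    print(f"{b:>12} " + "".join(f"{scores[(b, e)]:>9.4f}" for e in epochs))
```

Reading along the rows, every batch size scores best at 100 epochs; reading down the columns, smaller batches tend to do better. A consistent trend like that is stronger evidence than the single best cell.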

Parallelize. Use all your cores if you can (the examples above pass n_jobs=-1 to GridSearchCV): neural networks are slow to train, and we often want to try many different parameter values. Consider spinning up a lot of AWS instances.
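`n_jobs=-1` tells `GridSearchCV` to spread the candidate fits over all available cores (scikit-learn uses joblib for this). The sketch below shows the same fan-out pattern with the standard library. `evaluate` is a hypothetical placeholder for training and scoring one configuration, and a thread pool is used only to keep the example self-contained; CPU-bound pure-Python work needs processes rather than threads to use several cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    # hypothetical placeholder: train a model with params and return its CV score
    return 1.0 / (params['batch_size'] * params['epochs'])

grid = [dict(batch_size=b, epochs=e) for b, e in product([10, 20, 40], [10, 50, 100])]

# one worker per CPU core, mirroring n_jobs=-1
with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
    scores = list(pool.map(evaluate, grid))  # results come back in grid order

best_score, best_params = max(zip(scores, grid), key=lambda pair: pair[0])
print(best_params, best_score)
```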

Use a Sample of Your Dataset. Because networks are slow to train, try training them on a smaller sample of your training dataset to get an idea of the general direction of parameters rather than optimal configurations.
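A minimal sketch of drawing such a sample while keeping the class balance, so the quick runs still see the same mix of outcomes as the full data. `stratified_sample` is a hypothetical helper, and the row ids and labels are stand-ins that assume the Pima split of 500 negatives and 268 positives.

```python
import random

def stratified_sample(rows, labels, fraction, seed=7):
    """Draw roughly `fraction` of the rows from each class, keeping the class balance."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    sample = []
    for label in sorted(by_class):
        group = by_class[label]
        k = max(1, round(len(group) * fraction))
        sample.extend((row, label) for row in rng.sample(group, k))
    rng.shuffle(sample)
    return sample

# hypothetical stand-in for the 768 Pima rows: row ids 0..767, 500 negatives then 268 positives
rows = list(range(768))
labels = [0] * 500 + [1] * 268
subset = stratified_sample(rows, labels, fraction=0.25)
print(len(subset), sum(label for _, label in subset))
```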

Start with Coarse Grids. Start with coarse-grained grids and zoom into finer-grained grids once you can narrow the scope.
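One way to zoom in, assuming a learning-rate-like parameter that is best searched on a log scale, is to take the best coarse value and build a finer geometric grid around it. `refine_log_grid` is a hypothetical helper. In the SGD example above, the best `learn_rate`, 0.001, was also the smallest value tried, a hint that the next grid should extend below it.

```python
def refine_log_grid(best, factor=2.0, n=5):
    """n values spaced evenly on a log scale from best/factor to best*factor."""
    lo = best / factor
    ratio = factor ** (2.0 / (n - 1))
    return [lo * ratio ** i for i in range(n)]

# coarse pass: the learning-rate grid from the SGD example above
coarse = [0.001, 0.01, 0.1, 0.2, 0.3]
best = 0.001  # best learn_rate reported by that coarse search
at_edge = best in (min(coarse), max(coarse))  # a best value on the edge suggests widening
fine = refine_log_grid(best, factor=2.0, n=5)
print(at_edge, [f"{v:.6f}" for v in fine])
```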

Do not Transfer Results. Results are generally problem-specific. Avoid reusing favorite configurations on each new problem you see: optimal results you discover on one problem are unlikely to transfer to your next project. Instead, look for broader trends, such as the number of layers or the relationships between parameters.

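Reproducibility is a Problem. Although we set the seed for NumPy's random number generator, the results are not 100% reproducible: TensorFlow uses its own random number generator (for example, to initialize weights), which numpy.random.seed() does not seed. There is more to reproducibility when grid searching wrapped Keras models than is presented in this post.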
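The gap can be illustrated with two independent generators. In the sketch below, a separate `random.Random` instance stands in for TensorFlow's generator and `random.seed()` stands in for `numpy.random.seed()`; both stand-ins, and the `init_weights` and `run` helpers, are hypothetical so the example runs without TensorFlow. For real Keras code, TensorFlow 2.7 and later provide `tf.keras.utils.set_random_seed()`, which seeds Python, NumPy, and TensorFlow together; with `n_jobs=-1`, the worker processes may add another source of variation.

```python
import random

# stand-in for TensorFlow's generator: separate from the global `random` state
framework_rng = random.Random()

def init_weights(n=3):
    # hypothetical weight initializer that draws from the framework's own generator
    return [framework_rng.uniform(-0.05, 0.05) for _ in range(n)]

def run(seed_global, seed_framework=None):
    random.seed(seed_global)  # analogous to numpy.random.seed(seed)
    if seed_framework is not None:
        framework_rng.seed(seed_framework)  # analogous to also seeding TensorFlow
    return init_weights()

a, b = run(7), run(7)
print("global seed only, same weights:", a == b)
c, d = run(7, 7), run(7, 7)
print("both seeded, same weights:", c == d)
```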
