# Optimization

As mentioned in the model, this script was written to find the best parameters for the handwritten digit recognition model. The two parameters (also called hyperparameters) optimized were `batch_size` and the number of `epochs`. The same approach can be extended to other hyperparameters to improve the accuracy of the model further. However, since the accuracy is already 98%, the optimization was stopped here.

The first lines of the code are similar to the ones in the model. The difference here is the use of `GridSearchCV` from `scikit-learn`, which tries every combination of the parameter values we supply and reports the combination that scores best. This method does not bias the final evaluation, even though it uses the training set, because it relies on cross-validation: the training data is split into folds, and each candidate is trained on some folds and scored on the held-out one. In other words, the hyperparameter optimization never uses the test set.
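To make the folding idea concrete, here is a minimal pure-Python sketch of how a training set can be split into folds. The helper `k_fold_indices` is hypothetical and written only for illustration; scikit-learn's own splitters may additionally shuffle or stratify the data.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Toy example: 9 training samples split into 3 folds (the grid search below uses cv=3).
folds = list(k_fold_indices(n_samples=9, k=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

Every sample is used for validation exactly once and for training in the remaining rounds, and the separate test set never enters the loop.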

In [2]:
import tensorflow as tf
import numpy as np
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt

In [3]:
mnist = tf.keras.datasets.mnist
(x_train,y_train), (x_test,y_test) = mnist.load_data()

In [4]:
x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)

In [5]:
def create_model():
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

In [6]:
# Fix the NumPy random seed for reproducibility (TensorFlow's own seed is not set here)
seed = 7
np.random.seed(seed)

In [7]:
model = tf.keras.wrappers.scikit_learn.KerasClassifier(build_fn=create_model, verbose=0)

Here we define the values we want to test for each parameter. More values could be added, at the cost of a longer computing time, so the trick is to balance cost against benefit. Every combination is run and the results are saved in `grid_result`.

In [None]:
# Define the grid search parameters
batch_size = [10, 50, 100, 200]
epochs = [3, 10, 20, 40]
param_grid = dict(batch_size=batch_size, epochs=epochs)
# n_jobs=-1 uses all available CPU cores; cv=3 runs 3-fold cross-validation
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(x_train, y_train)
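To get a feel for the computing cost before launching the search, the grid size can be counted by hand. The sketch below only uses the standard library and the values from the cell above; the variable names `combinations`, `n_fits` and `total_epochs` are introduced here for illustration.

```python
from itertools import product

batch_size = [10, 50, 100, 200]
epochs = [3, 10, 20, 40]
cv = 3  # number of cross-validation folds passed to GridSearchCV

# Every (batch_size, epochs) pair is one candidate configuration.
combinations = list(product(batch_size, epochs))

# Each candidate is trained once per fold. With the default refit=True,
# GridSearchCV also trains the best candidate one more time on the full training set.
n_fits = len(combinations) * cv

# Total number of training epochs across all cross-validation fits.
total_epochs = sum(e for _, e in combinations) * cv

print(len(combinations), "combinations,", n_fits, "fits,", total_epochs, "epochs")
```

For this grid that is 16 combinations and 48 fits, which is why adding even one more value per parameter noticeably lengthens the run.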

Now that all the combinations have been run, we can check which one works best for our model.

In [9]:
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

Best: 0.972283 using {'batch_size': 10, 'epochs': 40}
0.965633 (0.002690) with: {'batch_size': 10, 'epochs': 3}
0.970267 (0.002123) with: {'batch_size': 10, 'epochs': 10}
0.969450 (0.001150) with: {'batch_size': 10, 'epochs': 20}
0.972283 (0.003406) with: {'batch_size': 10, 'epochs': 40}
0.960300 (0.002526) with: {'batch_size': 50, 'epochs': 3}
0.968667 (0.002490) with: {'batch_size': 50, 'epochs': 10}
0.970433 (0.001190) with: {'batch_size': 50, 'epochs': 20}
0.970100 (0.004351) with: {'batch_size': 50, 'epochs': 40}
0.957933 (0.001782) with: {'batch_size': 100, 'epochs': 3}
0.968483 (0.001164) with: {'batch_size': 100, 'epochs': 10}
0.968450 (0.002266) with: {'batch_size': 100, 'epochs': 20}
0.971050 (0.002541) with: {'batch_size': 100, 'epochs': 40}
0.950133 (0.001185) with: {'batch_size': 200, 'epochs': 3}
0.967417 (0.002374) with: {'batch_size': 200, 'epochs': 10}
0.969650 (0.001101) with: {'batch_size': 200, 'epochs': 20}
0.970750 (0.001564) with: {'batch_size': 200, 'epochs': 40}

Or, we can simply do:

In [10]:
grid_result.best_params_

{'batch_size': 10, 'epochs': 40}

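Notice one thing here. The best parameters are a `batch_size` of 10 and 40 `epochs`. However, using the same `batch_size` with only 10 `epochs` gives almost the same accuracy (0.9703 versus 0.9723, a gap smaller than the standard deviation of the best score) while taking roughly a quarter of the training time.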
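This trade-off can be checked with a simple heuristic: keep the best `batch_size` and pick the smallest number of `epochs` whose mean score lies within one standard deviation of the best score. The sketch below copies the `batch_size` 10 rows from the output above; the one-standard-deviation threshold and the assumption that training time grows roughly linearly with the number of epochs are illustrative choices, not part of the original analysis.

```python
# epochs -> (mean accuracy, standard deviation) for batch_size=10, copied from the output above
results_bs10 = {
    3: (0.965633, 0.002690),
    10: (0.970267, 0.002123),
    20: (0.969450, 0.001150),
    40: (0.972283, 0.003406),
}

# Configuration with the highest mean cross-validated accuracy.
best_epochs = max(results_bs10, key=lambda e: results_bs10[e][0])
best_mean, best_std = results_bs10[best_epochs]

# Accept any configuration scoring within one standard deviation of the best.
threshold = best_mean - best_std
cheapest = min(e for e, (mean, _) in results_bs10.items() if mean >= threshold)

# Rough relative cost, assuming time scales linearly with the number of epochs.
epoch_ratio = cheapest / best_epochs
print(best_epochs, cheapest, epoch_ratio)
```

With these numbers the heuristic selects 10 epochs, at about a quarter of the epochs used by the top-scoring configuration.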