## **Hyperparameter Tuning with TensorFlow**

Keras Tuner is a library that helps you pick an optimal set of hyperparameters for a neural network. The process of selecting the right set of hyperparameters for an ML application is called *hyperparameter tuning* or *hypertuning*. Hyperparameters are the variables that govern the training process and the topology of an ML model. These variables remain constant during training and directly affect the performance of your ML program. There are two types of hyperparameters:
* **Model Hyperparameters**, which influence model selection, such as the number and width of hidden layers.
* **Algorithm Hyperparameters**, which influence the speed and quality of the learning algorithm, such as the learning rate for Stochastic Gradient Descent and the number of nearest neighbours for a KNN classifier.

Hyperparameters are a main reason that neural networks are notoriously difficult to configure: there are many of them to set, and individual models can be very slow to train.

In [1]:
!pip install -q -U keras-tuner


In [2]:
%tensorflow_version 2.x
import tensorflow as tf
import kerastuner as kt

In [3]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train.shape, y_train.shape, x_test.shape, y_test.shape


((60000, 28, 28), (60000,), (10000, 28, 28), (10000,))

In [4]:
x_train = x_train.astype("float32") / 255.
x_test = x_test.astype("float32") / 255.

### Define the Model
When you build a model for hypertuning, you define the hyperparameter search space in addition to the model architecture. The model you set up for hypertuning is called a *hypermodel*.
You can define a hypermodel through two approaches:
* By using a model builder function
* By subclassing the ```kerastuner.engine.hypermodel.HyperModel``` class

You can also use two pre-defined HyperModel classes, HyperXception and HyperResNet, for computer vision applications. Here, we use a model builder function to define the image classification model. The model builder function returns a compiled model and uses hyperparameters defined inline to hypertune the model.

In [5]:
def model_builder(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

    # Tune the number of units in the first Dense Layer
    # Choose an optimal value between 32-512
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(tf.keras.layers.Dense(units=hp_units, activation="relu"))
    model.add(tf.keras.layers.Dense(10))

    # Tune the learning rate for the optimizer
    # Choose an optimal value from 0.01, 0.001, or 0.0001
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
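
To get a feel for how large the search space defined by `model_builder` is, the sketch below enumerates it in plain Python. It assumes that `hp.Int` treats both `min_value` and `max_value` as inclusive; the names `unit_choices` and `search_space` are illustrative and not part of Keras Tuner.

```python
from itertools import product

# Values that hp.Int('units', min_value=32, max_value=512, step=32) can take,
# assuming both bounds are inclusive.
unit_choices = list(range(32, 512 + 1, 32))

# Values offered by hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4]).
learning_rate_choices = [1e-2, 1e-3, 1e-4]

# Every (units, learning_rate) pair a tuner could try.
search_space = list(product(unit_choices, learning_rate_choices))

print(len(unit_choices))  # 16
print(len(search_space))  # 48
```

Even this small example has 48 candidate configurations, which is why a tuner that avoids fully training every candidate is useful.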

### Instantiate the tuner and perform hypertuning
Instantiate the tuner to perform the hypertuning. Keras Tuner has four tuners available: ```RandomSearch```, ```Hyperband```, ```BayesianOptimization```, and ```Sklearn```. Here, we use the Hyperband tuner. To instantiate it, we must specify the hypermodel, the ```objective``` to optimize, and the maximum number of epochs to train (```max_epochs```).

The Hyperband tuning algorithm uses adaptive resource allocation and early stopping to converge quickly on a high-performing model. It does this with a sports-championship-style bracket: the algorithm trains a large number of models for a few epochs and carries forward only the top-performing half of the models to the next round. Hyperband determines the number of models to train in a bracket by computing 1 + log<sub>```factor```</sub>(```max_epochs```) and rounding it up to the nearest integer.

Create a callback that stops training early once the validation loss has stopped improving.
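
The bracket formula and the championship-style elimination described above can be sketched in a few lines of plain Python. This only mirrors the description in the text and is not Keras Tuner's internal implementation; `bracket_size` and `championship_rounds` are hypothetical helper names, and the scores are made up.

```python
import math

def bracket_size(max_epochs, factor):
    """Return 1 + log_factor(max_epochs), rounded up to the nearest integer."""
    return math.ceil(1 + math.log(max_epochs) / math.log(factor))

def championship_rounds(scores):
    """Keep the top-performing half of the models each round until one is left.

    `scores` maps a model name to a made-up validation score. In a real
    search the survivors would be trained further and re-scored between rounds.
    """
    survivors = sorted(scores, key=scores.get, reverse=True)
    rounds = [list(survivors)]
    while len(survivors) > 1:
        survivors = survivors[: max(1, len(survivors) // 2)]
        rounds.append(list(survivors))
    return rounds

print(bracket_size(max_epochs=10, factor=3))  # 4

for survivors in championship_rounds({"a": 0.80, "b": 0.85, "c": 0.70, "d": 0.88}):
    print(survivors)
```

With `max_epochs=10` and `factor=3`, the values used for the tuner in this notebook, the formula gives 4.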

Run the hyperparameter search. The arguments for the search method are the same as those used for ```tf.keras.Model.fit```, in addition to the callback above.
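
The early-stopping callback is usually configured with a `patience` value, the number of epochs to wait for an improvement before giving up. The following is a simplified sketch of that rule in plain Python, assuming the monitored quantity is the validation loss and ignoring other callback options; `stopping_epoch` is a hypothetical helper and the loss values are invented.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; if that never happens, all epochs run.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Invented validation losses: the best value so far is reached at epoch 3.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.50]
print(stopping_epoch(losses, patience=5))  # 8
```

Note that in this made-up run the loss would have improved again at epoch 9, which illustrates the trade-off: a small `patience` saves time but can stop a trial too early.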

In [6]:
tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='./my_dir',
                     project_name="intro_to_kt")

In [7]:
tuner.search(x_train, y_train,
             epochs=50,
             validation_split=0.2,
             callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)])

# Get the optimal hyperparameters
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]

print(f"""
The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is {best_hps.get('units')} and the optimal learning rate for the optimizer
is {best_hps.get('learning_rate')}.
""")

Trial 30 Complete [00h 00m 30s]
val_accuracy: 0.8805000185966492

Best val_accuracy So Far: 0.8896666765213013
Total elapsed time: 00h 06m 10s
INFO:tensorflow:Oracle triggered exit

The hyperparameter search is complete. The optimal number of units in the first densely-connected
layer is 320 and the optimal learning rate for the optimizer
is 0.001.



In [10]:
model = tuner.hypermodel.build(best_hps)
history = model.fit(x_train, y_train, epochs = 20, validation_split=0.2)

val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print('Best epoch: %d' % (best_epoch,))

Best epoch: 18


### **Grid Search CV and Randomized Search CV using Keras models in scikit-learn**

Keras models can be used in scikit-learn by wrapping them with the ```KerasClassifier``` or ```KerasRegressor``` class.
To use these wrappers, you must define a function that creates and returns your Keras Sequential model, then pass this function to the ```build_fn``` argument when constructing the ```KerasClassifier``` class.

The constructor for the ```KerasClassifier``` class can take default arguments that are passed on to the calls to ```model.fit()```, such as the number of epochs and the batch size.

**Grid Search**
Grid search is a hyperparameter optimization technique that trains a model on every combination of the provided hyperparameter values and returns the best-performing set. In scikit-learn, this technique is provided in the ```GridSearchCV``` class.
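
As a minimal illustration of what "all combinations" means, the sketch below expands a small grid with the standard library only. The parameter values are illustrative, and `combinations` is just a local name, not a scikit-learn attribute.

```python
from itertools import product

# Candidate values for two hyperparameters (illustrative values).
param_grid = {"batch_size": [32, 64, 128], "epochs": [15, 20]}

# Build one dictionary per combination of values.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

for combo in combinations:
    print(combo)

print(len(combinations))  # 6
```

The number of combinations is the product of the list lengths, so each extra hyperparameter multiplies the cost of the search.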

**Random Search**
Random search is also a hyperparameter optimization technique, but it trains the model on only a random sample of the provided hyperparameter combinations, up to a fixed maximum number of iterations. In scikit-learn, this technique is provided in the ```RandomizedSearchCV``` class.
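
The difference from an exhaustive grid can be sketched with the standard library: instead of visiting every combination, only a fixed number are drawn at random. This is a simplified illustration that samples without replacement from lists of values; `sample_combinations`, `n_iter`, and `seed` are names chosen for this example.

```python
import random
from itertools import product

# Candidate values for two hyperparameters (illustrative values).
param_grid = {"batch_size": [32, 64, 128], "epochs": [15, 20]}
names = list(param_grid)
all_combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

def sample_combinations(combinations, n_iter, seed=0):
    """Draw at most n_iter distinct combinations at random."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(combinations, k=min(n_iter, len(combinations)))

sampled = sample_combinations(all_combinations, n_iter=3)
print(len(all_combinations))  # 6
print(len(sampled))           # 3
```

Here only half of the six combinations are evaluated, which halves the training cost at the risk of missing the best one.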

When constructing the ```GridSearchCV``` class, you must provide a dictionary of hyperparameters to evaluate in the ```param_grid``` argument. This maps each model parameter name to an array of values to try. By default, accuracy is the score that is optimized, but other scores can be specified in the ```scoring``` argument of the ```GridSearchCV``` constructor. By default, the grid search runs a single job. Setting the ```n_jobs``` argument in the ```GridSearchCV``` constructor to -1 makes the process use all cores on the machine. Depending on your Keras backend, this may interfere with the main neural network training process.
The ```GridSearchCV``` process then constructs and evaluates one model for each combination of parameters. Cross-validation is used to evaluate each individual model. The default number of folds depends on the scikit-learn version (3-fold in older releases, 5-fold since version 0.22), and it can be overridden by specifying the ```cv``` argument to the ```GridSearchCV``` constructor.
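
To make the cost of cross-validated search concrete, the sketch below splits sample indices into folds and counts the model fits. It is a simplified, unstratified, contiguous split written for illustration, not scikit-learn's own splitter; `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train, validation) index lists for contiguous, unshuffled folds."""
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size

# Each sample is used for validation exactly once across the folds.
for train, validation in k_fold_indices(n_samples=9, n_folds=3):
    print(train, validation)

# Three batch sizes times two epoch settings, each evaluated on three folds.
n_combinations = 3 * 2
n_folds = 3
print(n_combinations * n_folds)  # 18 model fits during the search
```

Because every combination is trained once per fold, the number of fits grows quickly, which is worth keeping in mind when each fit is a full neural network training run.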

Once completed, we can access the outcome of the grid search in the result object returned from ```grid.fit()```. The ```best_score_``` member provides the best score observed during the optimization procedure, and ```best_params_``` describes the combination of parameters that achieved it.
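
Conceptually, the best score and parameters are the entry with the highest mean cross-validation score. The sketch below shows that selection on a hand-written dictionary shaped like part of ```cv_results_```; the scores are invented for illustration and are not results from this notebook.

```python
# Hypothetical cross-validation results, shaped like the relevant parts of
# GridSearchCV's cv_results_ dictionary. The scores are made up.
cv_results = {
    "params": [{"epochs": 15}, {"epochs": 20}, {"epochs": 25}],
    "mean_test_score": [0.871, 0.884, 0.879],
}

# Index of the combination with the highest mean score.
best_index = max(
    range(len(cv_results["params"])),
    key=lambda i: cv_results["mean_test_score"][i],
)
best_score = cv_results["mean_test_score"][best_index]
best_params = cv_results["params"][best_index]

print("Best: %f using %s" % (best_score, best_params))
```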

In [14]:
import numpy as np
from sklearn.model_selection import GridSearchCV
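# Note: this wrapper module is deprecated in newer TensorFlow releases,
# where the SciKeras package is the suggested replacement.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier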

In [17]:
def create_model():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Dense(units=320, activation="relu"))
    model.add(tf.keras.layers.Dense(10))
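    # The final Dense layer outputs raw logits, so the loss needs from_logits=True
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])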
    return model

In [19]:
model = KerasClassifier(build_fn=create_model, verbose=1)
batch_size = [32, 64, 128]
epochs = [15, 20]

param_grid = dict(batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator = model, param_grid=param_grid, cv=3)
grid_result = grid.fit(x_train, y_train)

print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))


In [23]:
def create_model_opt(optimizer):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Dense(units=320, activation="relu"))
    model.add(tf.keras.layers.Dense(10))
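    # The final Dense layer outputs raw logits, so the loss needs from_logits=True
    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])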
    return model

In [26]:
model_b = KerasClassifier(build_fn=create_model_opt, verbose=1)
optimizer=["SGD", "RMSprop", "Adagrad", "Adadelta", "Adam", "Adamax", "Nadam"]

param_grid = dict(optimizer=optimizer)
grid = GridSearchCV(estimator = model_b, param_grid=param_grid, cv=3)
grid_result_2 = grid.fit(x_train, y_train, batch_size=128, epochs=20)

print("Best: %f using %s" % (grid_result_2.best_score_, grid_result_2.best_params_))
means = grid_result_2.cv_results_['mean_test_score']
stds = grid_result_2.cv_results_['std_test_score']
params = grid_result_2.cv_results_['params']
print()
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))


In [29]:
def create_model_distrib(init_mode):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    model.add(tf.keras.layers.Dense(units=320, activation="relu", kernel_initializer=init_mode))
    model.add(tf.keras.layers.Dense(10, kernel_initializer=init_mode))
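    # The final Dense layer outputs raw logits, so the loss needs from_logits=True
    model.compile(optimizer="Adamax",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])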
    return model

In [30]:
model_c = KerasClassifier(build_fn=create_model_distrib, verbose=1)
init_mode=["uniform", "lecun_uniform", "normal", "zero", "glorot_normal", "glorot_uniform", "he_uniform"]

param_grid = dict(init_mode=init_mode)
grid = GridSearchCV(estimator = model_c, param_grid=param_grid, cv=3)
grid_result_3 = grid.fit(x_train, y_train, batch_size=128, epochs=20)

print("Best: %f using %s" % (grid_result_3.best_score_, grid_result_3.best_params_))
means = grid_result_3.cv_results_['mean_test_score']
stds = grid_result_3.cv_results_['std_test_score']
params = grid_result_3.cv_results_['params']
print()
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
