# Exercise 3: Optimizing training

**Degree** Master Inter-Universitario de Data Science 

**Course** Machine Learning I

**Lecturer** Ignacio Heredia

---

**Objective**

Try to find the best training routines (optimizers, regularizations, ...) and the best hyperparameters for each method.

**Duration**

60 min (30 + 30)



In [None]:
import tensorflow
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras import backend as K

import numpy as np

**Load dataset**

In [None]:
batch_size = 128
num_classes = 10
epochs = 1

# input image dimensions
img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# Take subsample for fast training (demo)
x_train, y_train = x_train[:5000], y_train[:5000]
x_test, y_test = x_test[:1000], y_test[:1000]
print('Taking subsample:')
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

**Define model**

In [None]:
def model_definition(reg_dict, init_dict):
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3),
                   activation='relu',
                   input_shape=input_shape, **reg_dict))
    model.add(Conv2D(64, (3, 3), activation='relu', **reg_dict))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu', **reg_dict, **init_dict))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax', **init_dict))
    return model

**Compile and train**

In [None]:
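def train_and_score(optimizer=None, regularization_args=None, initialization_args=None):
    # Build the defaults on every call: an SGD() or {} written in the signature
    # is created only once and would be shared by every model trained afterwards
    optimizer = SGD() if optimizer is None else optimizer
    regularization_args = regularization_args or {}
    initialization_args = initialization_args or {}
    model = model_definition(reg_dict=regularization_args, init_dict=initialization_args)
    model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=optimizer,
              metrics=['accuracy'])
    model.fit(x_train, y_train,
            batch_size=batch_size,
            epochs=epochs,
            verbose=0)
    # score[0] is the test loss, score[1] the test accuracy
    score = model.evaluate(x_test, y_test, verbose=0)
    return (score[0], score[1])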

## Exercise 3.1

Explore modifying the options of the ```model_definition``` and ```train_and_score``` functions to find an optimal training routine (while keeping the same model architecture).

**Hints**:
  * Try to add [regularization](https://keras.io/regularizers/) to ```model_definition```
  * Try to add [initializers](https://keras.io/initializers/) to ```model_definition```
  * Try different [optimizers](https://keras.io/optimizers/) in ```train_and_score```
  
Use the default hyperparameters for each option.
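One way to organize these comparisons is a small loop that scores each named option and sorts the results by accuracy. The sketch below is only an illustration, not the reference solution: `compare_options` and `placeholder_score` are hypothetical names, and the string identifiers (`'l2'`, `'he_uniform'`, `'adam'`) assume that Keras resolves them to the corresponding objects with their default hyperparameters. In the notebook, `train_and_score` would be passed in place of `placeholder_score`.

```python
def compare_options(options, score_fn):
    """Score every named option; return (name, loss, acc) rows, best accuracy first."""
    rows = []
    for name, kwargs in options.items():
        loss, acc = score_fn(**kwargs)
        rows.append((name, loss, acc))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# One entry per option, each changing a single thing relative to the baseline.
# The keyword names match the arguments of train_and_score.
options = {
    'baseline': {},
    'l2 regularizer': {'regularization_args': {'kernel_regularizer': 'l2'}},
    'he_uniform initializer': {'initialization_args': {'kernel_initializer': 'he_uniform'}},
    'adam optimizer': {'optimizer': 'adam'},
}


def placeholder_score(optimizer='sgd', regularization_args=None, initialization_args=None):
    """Hypothetical stand-in for train_and_score so the sketch runs without training.

    The returned numbers are arbitrary placeholders, not measurements.
    """
    n_changes = len(regularization_args or {}) + len(initialization_args or {})
    n_changes += int(optimizer != 'sgd')
    return 1.0 / (1 + n_changes), 0.5


for name, loss, acc in compare_options(options, placeholder_score):
    print('{:<24} loss={:.2f} acc={:.2f}'.format(name, loss, acc))
```

Keeping each option as a dictionary of keyword arguments means the same loop can be reused for regularizers, initializers and optimizers.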

**Try different regularizations**

We explore the options one at a time, **without** trying every combination, which would take 4³ = 64 training runs.
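To make the difference in cost concrete, the sketch below counts the runs needed by both strategies. It rests on an assumption not stated in the exercise: that the 4³ comes from four choices (none, `l1`, `l2`, `l1_l2`) for each of the three regularizer arguments that Keras layers accept (`kernel_regularizer`, `bias_regularizer`, `activity_regularizer`). The variable names are illustrative.

```python
from itertools import product

# Assumed option set: four choices for each of three regularizer arguments
choices = [None, 'l1', 'l2', 'l1_l2']
slots = ['kernel_regularizer', 'bias_regularizer', 'activity_regularizer']

# One at a time: the unregularized baseline, plus each slot varied on its own
one_at_a_time = [{}]
for slot in slots:
    for reg in choices[1:]:
        one_at_a_time.append({slot: reg})

# Full combinatorial exploration: every choice for every slot
full_grid = []
for combo in product(choices, repeat=len(slots)):
    full_grid.append({slot: reg for slot, reg in zip(slots, combo) if reg is not None})

print(len(one_at_a_time))  # 10 runs
print(len(full_grid))      # 64 runs
```

Each dictionary has the shape expected by the `regularization_args` argument of `train_and_score`.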

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################


**Try different initializers**

We explore the initializers one at a time, **without** trying every combination.

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

**Try different optimizers**

We explore the optimizers one at a time, **without** trying every combination.

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

## Exercise 3.2


Now we fix the training options:


* **Regularizer:** No regularization
* **Initializer:** He Uniform
* **Optimizer:** SGD with decay 0

For the sake of simplicity, we will tune only some of the optimizer's hyperparameters while keeping the hyperparameters of the initializer and the regularizer fixed. We will therefore vary:

* learning rate
* momentum value

Implement *random search* and *grid search* on these hyperparameters to find the best set of hyperparameters and compare the results of both search methods.

In [None]:
import numpy as np
import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 300
import matplotlib.pyplot as plt
# matplotlib.mlab.griddata was removed in Matplotlib 3.1; SciPy's griddata is used instead
from scipy.interpolate import griddata

from keras.optimizers import SGD
from keras.initializers import he_uniform
from keras.regularizers import l1_l2

# Define the ranges
lr_r = [-2, 0]  # exponents: the learning rate is sampled in log scale over [1e-2, 1e0]
mom_r = [0.8, 1.]

def loss_acc(option_list, verbose=True):
    "Train once per (learning rate, momentum) pair and collect test loss and accuracy"
    loss_list, acc_list = [], []
    for (lr, mom) in option_list:
        if verbose:
            print('Testing with: lr:{}, momentum:{}'.format(lr, mom))

        init_args = {'kernel_initializer': he_uniform(), 'bias_initializer': he_uniform()}
        reg_args = {}
        # newer Keras releases call the first argument `learning_rate` instead of `lr`
        opt_args = {'lr': lr, 'momentum': mom, 'decay': 0.}

        loss, acc = train_and_score(optimizer=SGD(**opt_args), initialization_args=init_args, regularization_args=reg_args)
        loss_list.append(loss)
        acc_list.append(acc)

    return loss_list, acc_list


def print_winner(loss_list, acc_list):
    "Print the options sorted by accuracy, best first"
    # Note: reads the global option_list, which must be an indexable list
    loss_list, acc_list = np.array(loss_list), np.array(acc_list)
    args = np.argsort(acc_list)[::-1]
    print('Winner table:')
    for i, ind in enumerate(args):
        print('{}) {}'.format(i, option_list[ind]))
        print('    Loss: {:0.2}, Acc: {}'.format(loss_list[ind], acc_list[ind]))


def grid(x, y, z, resX=100, resY=100):
    "Convert 3 column data to matplotlib grid"
    xi = np.linspace(min(x), max(x), resX)
    yi = np.linspace(min(y), max(y), resY)
    X, Y = np.meshgrid(xi, yi)
    # Linear interpolation of the scattered points; NaN outside their convex hull
    Z = griddata((np.asarray(x), np.asarray(y)), np.asarray(z), (X, Y), method='linear')
    return X, Y, Z

**Random search**
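A minimal sketch of how the random samples could be drawn, offered as one possible approach rather than the reference solution. It samples the learning-rate *exponent* uniformly, so that the learning rate itself is uniform in log scale, and the momentum uniformly in its range. `random_search_samples`, `n_samples` and `seed` are hypothetical names; `lr_r`, `mom_r`, `lr_samples`, `mom_samples` and `option_list` follow the names used by the surrounding cells, and the two ranges are repeated here only so that the block runs on its own.

```python
import random

lr_r = [-2, 0]      # exponent range, repeated from the cell above
mom_r = [0.8, 1.0]  # momentum range, repeated from the cell above


def random_search_samples(n_samples, lr_range, mom_range, seed=0):
    """Draw n_samples random (learning-rate exponent, momentum) pairs."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    lr_exponents = [rng.uniform(*lr_range) for _ in range(n_samples)]
    momenta = [rng.uniform(*mom_range) for _ in range(n_samples)]
    return lr_exponents, momenta


# lr_samples holds exponents (used for plotting on the log axis);
# option_list holds the actual learning rates passed to the optimizer
lr_samples, mom_samples = random_search_samples(10, lr_r, mom_r)
option_list = [(10 ** e, m) for e, m in zip(lr_samples, mom_samples)]
```

Building `option_list` as a list, rather than a bare `zip` iterator, lets it be both iterated by `loss_acc` and indexed by `print_winner`.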

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
# Materialize the options first: an iterator such as zip(...) would be exhausted by loss_acc
option_list = list(option_list)
loss_list, acc_list = loss_acc(option_list)

In [None]:
print_winner(loss_list, acc_list)

In [None]:
X, Y, Z = grid(lr_samples, mom_samples, np.array(acc_list))

plt.contourf(X, Y, Z)
plt.colorbar()
plt.scatter(lr_samples, mom_samples, c='r')

plt.xlim(*lr_r)
plt.ylim(*mom_r)
plt.xlabel('Learning rate (log)')
plt.ylabel('Momentum')
plt.show()

**Grid search**
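A minimal sketch of how the grid could be built, again as one possible approach under stated assumptions. It places evenly spaced points on the learning-rate exponent and on the momentum, then takes every combination of the two axes. `linspace` is a small hypothetical helper (it mimics the endpoint behavior of `np.linspace`), and the 3 × 3 resolution is an arbitrary choice; the ranges are repeated so that the block runs on its own.

```python
from itertools import product

lr_r = [-2, 0]      # exponent range, repeated from the cell above
mom_r = [0.8, 1.0]  # momentum range, repeated from the cell above


def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Axis values: exponents for the learning rate, raw values for the momentum
lr_axis = linspace(lr_r[0], lr_r[1], 3)
mom_axis = linspace(mom_r[0], mom_r[1], 3)

# Every (exponent, momentum) combination, flattened for plotting
pairs = list(product(lr_axis, mom_axis))
lr_samples = [e for e, _ in pairs]
mom_samples = [m for _, m in pairs]

# Actual learning rates passed to the optimizer
option_list = [(10 ** e, m) for e, m in pairs]
```

With the same budget of runs, the grid tests only three distinct values per hyperparameter, whereas random search tests a new value of each hyperparameter on every run.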

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
loss_list, acc_list = loss_acc(option_list)

In [None]:
print_winner(loss_list, acc_list)

In [None]:
X, Y, Z = grid(lr_samples, mom_samples, np.array(acc_list))

plt.contourf(X, Y, Z)
plt.colorbar()
plt.scatter(lr_samples, mom_samples, c='r')

plt.xlim(*lr_r)
plt.ylim(*mom_r)
plt.xlabel('Learning rate (log)')
plt.ylabel('Momentum')
plt.show()

**Extensive random search**

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
X, Y, Z = grid(lr_samples, mom_samples, np.array(acc_list))

ax = plt.contourf(X, Y, Z)
plt.scatter(lr_samples, mom_samples, c='r')

plt.colorbar(ax)
plt.xlim(*lr_r)
plt.ylim(*mom_r)
plt.xlabel('Learning rate (log)')
plt.ylabel('Momentum')
plt.show()

**Extensive grid search**

In [None]:
########################################
#                                      #
#       FILL THIS WITH CODE!           #
#                                      #
########################################

In [None]:
X, Y, Z = grid(lr_samples, mom_samples, np.array(acc_list))

plt.contourf(X, Y, Z)
plt.colorbar()
plt.scatter(lr_samples, mom_samples, c='r')

plt.xlim(*lr_r)
plt.ylim(*mom_r)
plt.xlabel('Learning rate (log)')
plt.ylabel('Momentum')
plt.show()