<a href="https://colab.research.google.com/github/camilodlt/rtidy-python/blob/main/Computer%20vision/CIFAR/Hyperparameter_Tuning_for_Convolutional_Neural_Network.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Hyperparameter Tuning for Convolutional Neural Network

## What we'll be doing? 

- We want to implement two hyperparameters search methods (=> Grid search and random search) on our convolutional model trained on [CIFAR 10](https://github.com/camilodlt/rtidy-python/blob/main/Computer%20vision/CIFAR/1_Deep_Learning_Models_for_Image_Classification.ipynb). Our last model overfitted rapidly because of the model's capacity. Some parameter tunning might alleviate this issue. 

- We'll try to improve our previous results (65%-67% accuracy on testing set). 

In [1]:
# SAME AS BEFORE ------
# LOAD LIBS ------
import tensorflow as tf 
import pandas as pd 
import numpy as np
import sys
import matplotlib.pyplot as plt
# TF VERSION ------
print("TF VERSION: ",tf.version.VERSION)
# PYTHON VERSION ------
print("Python Version:",sys.version)
# LOAD DATA ------ 
# we can simplely use the tf.keras.datasets.cifar10.load_data API to load dataset
(x_train, y_train), (x_test, y_test) =  tf.keras.datasets.cifar10.load_data()

# Scale --- 
x_train= x_train.astype('float32')/255
x_test= x_test.astype('float32')/255


TF VERSION:  2.6.0
Python Version: 3.7.11 (default, Jul  3 2021, 18:01:19) 
[GCC 7.5.0]
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


## Grid search hyperparameter for Convolutional neural network

This is going to be a greedy operation. I'll use our previous network (architecture held constant) but the number of CONV filters will vary. 

Only CONV filter will be variable. 


In [2]:
# WRAP MODEL IN A FUNCTION ------ 
"""
The only changing parameter is the convolutional layers filters 
Everything else is held constant
"""
def grid_search(settings:list):

  # constant
  inputs = tf.keras.Input(shape=(32,32,3,),name="Image_flatten")

  # variable
  x= tf.keras.layers.Conv2D(settings[0],kernel_size=(3,3), name="First_CONV")(inputs) # 32 FILTERS, 3*3 kernel boxes
  # constant
  x= tf.keras.layers.MaxPooling2D()(x) # default 2*2

  # variable
  x= tf.keras.layers.Conv2D(settings[1],kernel_size=(3,3), name="Second_CONV")(x) # 64 FILTERS, 3*3 kernel boxes
  # constant
  x= tf.keras.layers.MaxPooling2D()(x) # default 2*2

  # variable
  x= tf.keras.layers.Conv2D(settings[2],kernel_size=(3,3), name="Third_CONV")(x) # 64 FILTERS, 3*3 kernel boxes

  # constant
  x= tf.keras.layers.Flatten()(x)
  # constant
  x= tf.keras.layers.Dense(64, activation="relu", name= "Dense_1")(x)
  outputs = tf.keras.layers.Dense(10, activation="softmax",name="output_Dense_2")(x)
    #* Model --- 
  model = tf.keras.Model(inputs=inputs, outputs=outputs, name="Cifar_10_CONV_model")
  # COMPILE MODEL --- 
  model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", #Since we did not one hot encode y_train
              metrics= ["sparse_categorical_accuracy"])
  history_conv = model.fit(x_train, y_train, batch_size=64, epochs=20, validation_split=0.2,verbose=0)

  return((history_conv.history['val_sparse_categorical_accuracy'],history_conv.history['val_loss'])) # return val accuracy at every epoch

We know define our grid search. We use 16, 32, 48 and 64 filters for every conv layer. 

We take the cartesian product to try all possible combinations. This yields 64 combinations to try => 64 models to train. **Very greedy process**. 


In [3]:
# GRID SEARCH ------
n_layers=3 # How many CONVS
params= [] 
for layer in range(n_layers):
  params.append(np.linspace(start=16, stop=64, num=4)) # 16, 32, 48, 64 

from itertools import product
params=list(product(*params)) # Cartesian product => all combinations 4*4*4
print("Number of combinations to try :", len(params))

Number of combinations to try : 64


We fit every model and save every validation accuracy history. 

For each model we print the maximum accuracy obtained, no matter the epoch. 

Results did not change much from the parameters we had already specified. Validation accuracy stayed rather similar for every model (between 65 and 68)

In [None]:
# RUN MODELS ------
score=[]
loss= []
for i,param in enumerate(params):
  print(f"[TRYING SETTING N:{i}]")
  val_result,loss_result= grid_search(param)
  print(f"Max validation accuracy: {max(val_result)}")
  score.append(val_result)
  loss.append(loss_result)
print("END")

[TRYING SETTING N:0]
Max validation accuracy: 0.6574000120162964
[TRYING SETTING N:1]
Max validation accuracy: 0.6660000085830688
[TRYING SETTING N:2]
Max validation accuracy: 0.6658999919891357
[TRYING SETTING N:3]
Max validation accuracy: 0.6603000164031982
[TRYING SETTING N:4]
Max validation accuracy: 0.6747999787330627
[TRYING SETTING N:5]
Max validation accuracy: 0.6608999967575073
[TRYING SETTING N:6]
Max validation accuracy: 0.6577000021934509
[TRYING SETTING N:7]
Max validation accuracy: 0.675000011920929
[TRYING SETTING N:8]
Max validation accuracy: 0.6758999824523926
[TRYING SETTING N:9]
Max validation accuracy: 0.6692000031471252
[TRYING SETTING N:10]
Max validation accuracy: 0.671999990940094
[TRYING SETTING N:11]
Max validation accuracy: 0.6801999807357788
[TRYING SETTING N:12]
Max validation accuracy: 0.6783000230789185
[TRYING SETTING N:13]
Max validation accuracy: 0.6796000003814697
[TRYING SETTING N:14]
Max validation accuracy: 0.680899977684021
[TRYING SETTING N:15]
M

The parameters of the fixed model we tried in the previous notebooks is somewhere in our params list. 

In [None]:
# LET'S FIND OUR FIXED MODEL ------
def find_fixed():
  for x,y in enumerate(params):
    if (y==(32,32,64)):
      return(x) 
fixed=find_fixed()

Let's find the best results we obtained, on the validation set, with the grid search. 

In [None]:
# RESULTS ------

# BEST RESULTS GRID ---
  # Which model 
index_model= np.argmax(np.max(score, 1))
print("Combination that provided the highest accuracy:", params[index_model])
  # Which epoch 
index_epoch= np.argmax(score[index_model])
  # retrieve
best_accuracy = score[index_model][index_epoch]
best_loss= loss[index_model][index_epoch]

# BEST RESULTS FIXED ---
# In the previous model we implemented (32, 64, 64 filters) => in our grid search it corresponds to index_fixed = 23
index_fixed = fixed
index= np.argmax(score[index_fixed])
fixed_conv_best_accuracy= score[index_fixed][index]
fixed_conv_best_loss= loss[index_fixed][index]

# BEST RESULTS MLP (PREV NOTEBOOK) --- 
MLP_best_accuracy= 0.4865
MLP_best_loss= 1.5456 

results=pd.DataFrame(
    {"Row_names":["Accuracy", "Loss"], 
     "MLP":[MLP_best_accuracy,MLP_best_loss],
     "CONV": [fixed_conv_best_accuracy,fixed_conv_best_loss],
     "GRID":   [best_accuracy, best_loss]
    }
)
results=results.set_index( "Row_names" )
results

Unnamed: 0_level_0,MLP,CONV,GRID
Row_names,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Accuracy,0.4865,0.6721,0.6869
Loss,1.5456,0.6721,0.966773


We obtained a 1.4 % increase in accuracy. Clearly changing only the number of filters didn't help much. 

## Random search hyperparameter for Convolutional neural network

In [4]:
# RANDOM SEARCH ------
n_layers=3 # How many CONVS
params= [] 
for layer in range(n_layers):
  params.append(np.linspace(start=16, stop=128, num=10)) # 16, 32, 48, 64, 128 => We try a wider space 

# Create all possible combinations  --- 
params=list(product(*params)) # Cartesian product => all combinations 4*4*4
print("All possible combinations :", len(params))

# Sample from them --- 
indexes= np.random.choice(range(len(params)),20)
random_params= [params[i] for i in indexes]

print("RANDOMLY SAMPLED:", len(random_params))

All possible combinations : 1000
RANDOMLY SAMPLED: 20


There are a lot of possible combinations since we widen the space search, nonetheless, we restrained this space by sampling only 20 combinations. Hopefully one of them would yield good results. 

We fit every model again: 

In [5]:
# RUN MODELS ------
score=[]
loss= []
for i,param in enumerate(random_params):
  print(f"[TRYING SETTING N:{i}]")
  val_result,loss_result= grid_search(param)
  print(f"Max validation accuracy: {max(val_result)}")
  score.append(val_result)
  loss.append(loss_result)
print("END")

[TRYING SETTING N:0]
Max validation accuracy: 0.6590999960899353
[TRYING SETTING N:1]
Max validation accuracy: 0.6743999719619751
[TRYING SETTING N:2]
Max validation accuracy: 0.6692000031471252
[TRYING SETTING N:3]
Max validation accuracy: 0.6762999892234802
[TRYING SETTING N:4]
Max validation accuracy: 0.673799991607666
[TRYING SETTING N:5]
Max validation accuracy: 0.6744999885559082
[TRYING SETTING N:6]
Max validation accuracy: 0.6777999997138977
[TRYING SETTING N:7]
Max validation accuracy: 0.6708999872207642
[TRYING SETTING N:8]
Max validation accuracy: 0.6836000084877014
[TRYING SETTING N:9]
Max validation accuracy: 0.6733999848365784
[TRYING SETTING N:10]
Max validation accuracy: 0.6726999878883362
[TRYING SETTING N:11]
Max validation accuracy: 0.675599992275238
[TRYING SETTING N:12]
Max validation accuracy: 0.666700005531311
[TRYING SETTING N:13]
Max validation accuracy: 0.669700026512146
[TRYING SETTING N:14]
Max validation accuracy: 0.6802999973297119
[TRYING SETTING N:15]
Ma

We bind the new best results on the validation set in our table to compare this method. 

In [9]:
# BEST RESULTS RANDOM ------ 
# Which model --- 
index_model=np.argmax(np.max(score,1))
print("Combination that provided the highest accuracy:", params[index_model])
# Which epoch --- 
index_epoch=np.argmax(score[index_model])
# Retrieve --- 
best_accuracy = score[index_model][index_epoch]
best_loss= loss[index_model][index_epoch]

to_append= [best_accuracy,best_loss ]
# COLBIND 
results["RANDOM"]= to_append
results

Combination that provided the highest accuracy: (16.0, 28.444444444444443, 128.0)


Unnamed: 0,Row_names,MLP,CONV,GRID,RANDOM
0,Accuracy,0.4865,0.6721,0.6869,0.6849
1,Loss,1.5456,1.451947,0.966773,0.962091


Grid search and a random search of a few hyperparameters improve our results but not significantly (at our current results the change was small, since we are still underfitting on new data). **We achieved a 1.6 percentage point increase in accuracy** but we were definitively hoping for more. This is specially the case because of the time it took to test all hyperparameters. 