
# Application of Gaussian Process to design a CNN


In [None]:
import numpy as np
import pandas as pd
# splitting tool for the validation set
from sklearn.model_selection import train_test_split


In [None]:
# for visualization if needed
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
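# for the CNN model (import everything from tensorflow.keras so the two packages are not mixed)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from tensorflow.keras.optimizers import Adam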

Upload the file containing the Bayesian optimization function (`gp.py`), then the files containing the dataset.



In [None]:
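from google.colab import files
# upload gp.py, save it locally, then run it so that its functions
# (such as bayesian_optimisation) are defined in the notebook namespace
src = list(files.upload().values())[0]
with open('gp.py', 'wb') as f:
    f.write(src)
%run gp.py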

Saving gp.py to gp.py


In [None]:
from google.colab import files
uploaded=files.upload()


Saving X.npy to X.npy
Saving Y.npy to Y.npy


In [None]:
X = np.load("X.npy")
Y = np.load("Y.npy")

In [None]:
print(" Shape of X: ",X.shape)
print(" Shape of Y: ",Y.shape)

 Shape of X:  (2062, 64, 64)
 Shape of Y:  (2062, 10)


Each input is a 64x64 single-channel image.
There are 10 class labels in total.


**CNN Architecture**

* The network starts with alternating convolution and max-pooling layers. The number of such convolution/pooling pairs is a hyperparameter.
* Each convolution layer consists of a number of filters; the number of filters in a layer is a hyperparameter.
* Each filter has a square kernel of size n x n, where "n" is a hyperparameter.
* Each max-pooling layer has a pool size of 2, which halves the height and width of its input.
* The output of the last pooling layer is flattened and then fed to a fully connected neural network.
* The total number of hidden layers is a hyperparameter.
* The number of neurons per hidden layer is another hyperparameter.
* The final layer has 10 neurons (1 per class) with a softmax activation function. All other neurons use the ReLU activation function.
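The effect of the pooling layers on the size of the flattened vector can be checked by hand. The following is a minimal sketch, assuming "same" padding for both convolution and pooling as in the model code of this notebook; the helper name `flattened_size` is made up for illustration.

```python
def flattened_size(n_blocks, filters_last, image_side=64):
    """Length of the vector produced by Flatten after n_blocks conv + pool blocks."""
    side = image_side
    for _ in range(n_blocks):
        # A "same"-padded convolution keeps the side length unchanged.
        # A 2x2 max-pooling with "same" padding halves it, rounding up.
        side = -(-side // 2)  # ceiling division by 2
    # Flatten concatenates side * side values for each filter of the last conv layer.
    return side * side * filters_last


print(flattened_size(3, 12))  # three blocks, 12 filters in the last conv layer
print(flattened_size(2, 16))  # two blocks, 16 filters in the last conv layer
```

The two calls give 768 and 4096, which match the `Flatten` output shapes in the two model summaries printed in this notebook.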









In [None]:
def model(param):
  # param[0]: number of conv + pooling blocks
  # param[1]: (list) filters per conv layer
  # param[2]: (list) kernel size per conv layer
  # param[3]: number of hidden layers in the dense network
  # param[4]: (list) neurons per hidden layer
  CNN_model = Sequential()
  for i in range(param[0]):
    kernel = (param[2][i], param[2][i])
    if i == 0:
      # only the first layer needs an input shape; later shapes are inferred
      CNN_model.add(Conv2D(filters=param[1][i], kernel_size=kernel, activation="relu", padding="same", input_shape=(64, 64, 1)))
    else:
      CNN_model.add(Conv2D(filters=param[1][i], kernel_size=kernel, activation="relu", padding="same"))
    CNN_model.add(MaxPooling2D(pool_size=(2, 2), padding="same"))
  CNN_model.add(Flatten())
  for i in range(param[3]):
    CNN_model.add(Dense(param[4][i], activation="relu"))
  CNN_model.add(Dense(10, activation="softmax"))
  return CNN_model




In [None]:
x=model([3,[16,24,12],[3,6,5],3,[32,32,16]])
x.summary()


Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_15 (Conv2D)           (None, 64, 64, 16)        160       
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 32, 32, 16)        0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 32, 32, 24)        13848     
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 16, 16, 24)        0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 16, 16, 12)        7212      
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 8, 8, 12)          0         
_________________________________________________________________
flatten_6 (Flatten)          (None, 768)             

`x` is an example of a network generated by this function. One problem with this approach is that the neurons-per-layer hyperparameter is a vector whose length equals the number of layers, so the number of variables to tune depends on the value of another hyperparameter. We cannot optimize over such a set, because the optimizer needs a fixed number of variables. We therefore have to make the structure of the network more rigid: either all layers share the same number of neurons (a single hyperparameter), or the total number of layers is fixed. The same goes for the filter counts and kernel sizes. To simplify things, we assume that the dense network attached at the end has 3 hidden layers, and that every convolution layer has the same number of filters and the same kernel size. Each variable may still take different values, but the number of variables is fixed. With these compromises we define a new model-generating function.

In [None]:
def model_gen(param):
  # param[0]: number of conv + pooling blocks
  # param[1]: filters per conv layer (same for every layer)
  # param[2]: kernel size (same for every layer)
  # param[3]: (list of 3) neurons per hidden layer of the dense network at the end
  CNN_model = Sequential()
  for i in range(param[0]):
    kernel = (param[2], param[2])
    if i == 0:
      # only the first layer needs an input shape; later shapes are inferred
      CNN_model.add(Conv2D(filters=param[1], kernel_size=kernel, activation="relu", padding="same", input_shape=(64, 64, 1)))
    else:
      CNN_model.add(Conv2D(filters=param[1], kernel_size=kernel, activation="relu", padding="same"))
    CNN_model.add(MaxPooling2D(pool_size=(2, 2), padding="same"))
  CNN_model.add(Flatten())
  for i in range(3):
    CNN_model.add(Dense(param[3][i], activation="relu"))
  CNN_model.add(Dense(10, activation="softmax"))
  return CNN_model
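`model_gen` takes a nested list, while an optimizer that searches inside a box of bounds works with a flat vector of real numbers. The following is a minimal sketch of the conversion between the two, assuming a six-entry vector in the order used later in this notebook; the helper name `decode_params` is made up for illustration.

```python
def decode_params(vec):
    """Turn a flat vector of 6 numbers into the nested list expected by model_gen."""
    # int() truncates towards zero, so 36.7 becomes 36.
    n_blocks, filters, kernel, d1, d2, d3 = (int(v) for v in vec)
    return [n_blocks, filters, kernel, [d1, d2, d3]]


print(decode_params([4.0, 37.0, 5.0, 36.0, 29.0, 31.0]))
# [4, 37, 5, [36, 29, 31]]
```

Keeping the vector length fixed at six is what makes the search space well defined, whatever values the individual entries take.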


In [None]:
ad_hoc=model_gen([2,16,3,[16,16,16]])

In [None]:
ad_hoc.summary()


Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_30 (Conv2D)           (None, 64, 64, 16)        160       
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 32, 32, 16)        0         
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 32, 32, 16)        2320      
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 16, 16, 16)        0         
_________________________________________________________________
flatten_11 (Flatten)         (None, 4096)              0         
_________________________________________________________________
dense_24 (Dense)             (None, 16)                65552     
_________________________________________________________________
dense_25 (Dense)             (None, 16)              

We will use this network as a baseline: its accuracy will be compared with that of the model obtained after tuning the hyper-parameters.
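The parameter counts in the summaries can be reproduced by hand, which is a useful sanity check on how the hyperparameters drive model size. This is a small sketch using the standard counting formulas (weights plus one bias per output unit); the helper names are made up for illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Trainable parameters of a Conv2D layer with a square kernel and a bias."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(n_in, n_out):
    """Trainable parameters of a Dense layer with a bias."""
    return (n_in + 1) * n_out


# First example network: kernels 3, 6, 5 and filters 16, 24, 12
print(conv2d_params(3, 1, 16))    # 160
print(conv2d_params(6, 16, 24))   # 13848
print(conv2d_params(5, 24, 12))   # 7212

# Baseline network: two 3x3 conv layers with 16 filters, then Dense(16)
print(conv2d_params(3, 16, 16))   # 2320
print(dense_params(4096, 16))     # 65552
```

Note that the first dense layer dominates the baseline's parameter count, because it is connected to all 4096 flattened values.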


In [None]:
# train-test split
x_train, x_test, y_train, y_test = train_test_split(X,Y,test_size=0.2,random_state=42)
x_train = x_train.reshape(-1,64,64,1)
x_test = x_test.reshape(-1,64,64,1)


In [None]:
# learning_rate replaces the deprecated lr argument of Adam
ad_hoc.compile(optimizer=Adam(learning_rate=0.002), loss=keras.losses.categorical_crossentropy, metrics=["accuracy"])
results = ad_hoc.fit(x_train, y_train, epochs=15, validation_split=0.3)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


In [None]:
ad_hoc.evaluate(x_test,y_test)[1] #accuracy over test set



0.2857142984867096

After 15 epochs the test accuracy is only about 0.29, so there is clear room for improvement.

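Before we apply Bayesian optimization, let's further split the training data into training and validation sets. That way, the objective evaluated while tuning the hyperparameters is the accuracy over the validation set, and the test set is only used for the final comparison. Note that the function below is called `sample_loss`, but it returns an accuracy, so higher values are better.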
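The sizes of the resulting sets follow from the two split fractions (0.2 for the test set, then 0.3 of the remainder for validation). The sketch below assumes that `train_test_split` rounds the test size up and gives the rest to the training side, which is how scikit-learn behaves for a fractional `test_size` to the best of my knowledge; the helper name `split_sizes` is made up for illustration.

```python
import math


def split_sizes(n_samples, test_fraction):
    """Return (n_train, n_test) for a fractional split, rounding the test size up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_fit, n_test = split_sizes(2062, 0.2)    # first split: everything vs. test
n_train, n_val = split_sizes(n_fit, 0.3)  # second split: training vs. validation

print(n_train, n_val, n_test)  # 1154 495 413
```

Under this assumption roughly 56% of the images are used for fitting during the search, 24% for validation and 20% for the final test.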

In [None]:

x_train, x_val, y_train, y_val = train_test_split(x_train,y_train,test_size=0.3,random_state=42)
x_train = x_train.reshape(-1,64,64,1)
x_val = x_val.reshape(-1,64,64,1)

**Bayesian Optimization**

In [None]:
# define the objective (validation accuracy, higher is better)

def sample_loss(params):  # flat list of 6 hyperparameters
  # int() truncates the real values proposed by the optimizer
  model = model_gen([int(params[0]), int(params[1]), int(params[2]), [int(params[3]), int(params[4]), int(params[5])]])
  model.compile(optimizer=Adam(learning_rate=0.002), loss=keras.losses.categorical_crossentropy, metrics=["accuracy"])
  model.fit(x_train, y_train, epochs=15)
  return model.evaluate(x_val, y_val)[1]  # accuracy over the validation set






  


  


**Define the boundary**


In [None]:
bounds=np.array([[2,5],[15,64],[2,5],[10,60],[10,60],[10,60]])
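Each row of `bounds` is a `[low, high]` interval for one hyperparameter, in the order: conv blocks, filters, kernel size, and the three hidden-layer sizes. How `gp.py` draws its initial samples is not shown in this notebook; the sketch below only illustrates the idea, assuming uniform sampling inside each interval. The names `BOUNDS` and `sample_within_bounds` are made up for illustration.

```python
import random

# Same intervals as the bounds array, written as plain tuples.
BOUNDS = [(2, 5), (15, 64), (2, 5), (10, 60), (10, 60), (10, 60)]


def sample_within_bounds(bounds, rng):
    """Draw one real-valued point with each coordinate uniform in its interval."""
    return [rng.uniform(low, high) for low, high in bounds]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
point = sample_within_bounds(BOUNDS, rng)
print(point)
```

The sampled coordinates are real numbers, which is why the objective function converts them to integers before building a model.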

In [None]:
xp,yp=bayesian_optimisation(n_iters=20,sample_loss=sample_loss,bounds=bounds,n_pre_samples=10)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15

In [None]:
xphat=np.round(xp[np.array(yp).argmax(),:])
print(xphat)


[ 4. 37.  5. 36. 29. 31.]


These are the hyper-parameters returned by the optimization algorithm. We should now check whether a model trained with them performs significantly better than the ad hoc model constructed earlier.
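One detail is worth keeping in mind when reading `xphat`: the best point is selected with `argmax` and converted with `np.round`, whereas `sample_loss` converts with `int()`, which truncates. The two conversions can differ by one for the same real-valued point. The sketch below illustrates this with made-up numbers; the points and scores are hypothetical, not values from the run above.

```python
def best_point(points, scores):
    """Return the evaluated point with the highest score (an argmax over scores)."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return points[best_index]


# Hypothetical evaluated points (two coordinates only) and their accuracies.
points = [[2.2, 20.9], [4.6, 36.7], [3.1, 50.0]]
scores = [0.41, 0.87, 0.63]

best = best_point(points, scores)
rounded = [round(v) for v in best]    # round to nearest, as when reporting xphat
truncated = [int(v) for v in best]    # truncate, as inside sample_loss

print(rounded)    # [5, 37]
print(truncated)  # [4, 36]
```

Using the same conversion in both places would guarantee that the reported configuration is exactly the one that was evaluated.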



In [None]:
final_model = model_gen([4, 37, 5, [36, 29, 31]])
# learning_rate replaces the deprecated lr argument of Adam
final_model.compile(optimizer=Adam(learning_rate=0.002), loss=keras.losses.categorical_crossentropy, metrics=["accuracy"])
results2 = final_model.fit(x_train, y_train, epochs=15)
final_model.evaluate(x_test, y_test)[1]  # accuracy over the test set


Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


0.9249394536018372

After hyper-parameter tuning using Bayesian optimization for 20 iterations (following 10 initial samples), the accuracy over the test dataset is about 0.92, which is far better than the 0.29 obtained with the ad hoc choice of hyper-parameters.