# Section 2: MLP

This notebook shows the steps to train an MLP model. It is divided into 3 main sections:

## Contents

- <a href='#MLP_1'>1. Import libraries and data</a>
- <a href='#MLP_2'>2. Train MLP using pixel intensity as an input</a>
 - <a href='#MLP_2.1'>2.1 Find appropriate architecture by tuning hidden layer, hidden dimension and activation function</a> 
 - <a href='#MLP_2.2'>2.2 Compare the performance of the networks</a>
- <a href='#MLP_3'>3. Train MLP using HOG descriptor as an input</a>
 - <a href='#MLP_3.1'>3.1 Find appropriate architecture by tuning hidden layer, hidden dimension and activation function</a> 
 - <a href='#MLP_3.2'>3.2 Compare the performance of the networks</a>

## 1. Import libraries and data <a id='MLP_1'></a>

In [30]:
import os
import matplotlib.pyplot as plt
from skimage.feature import hog
import numpy as np
import pandas as pd
import pickle
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
import torch.nn.functional as F
from skorch.callbacks import EarlyStopping
import torch
from torch import nn
from sklearn.base import BaseEstimator, TransformerMixin
from skorch import NeuralNetClassifier
import copy
import time
pd.set_option('expand_frame_repr', False)
pd.set_option('display.max_columns', None)
from sklearn.model_selection import GridSearchCV

device = 'cpu'

In [4]:
## Helper Function to show the first 5 images
# Credit from INM427 Neural Computing Exercise
def plot_example(X, y):
    """Plot the first 5 images and their labels in a row."""
    # Use a separate loop variable so the `y` argument is not shadowed
    for i, (img, label) in enumerate(zip(X[:5].reshape(5, 28, 28), y[:5])):
        plt.subplot(1, 5, i + 1)
        plt.imshow(img, cmap='gray')
        plt.xticks([])
        plt.yticks([])
        plt.title(label)

In [31]:
# %% Get Current Path
# Get data location path
cwd = os.getcwd()
script_path = cwd + '/'
data_path = script_path + 'Data2'
train_path = data_path +'/' + 'mnist_background_random_train.amat'
test_path = data_path +'/' + 'mnist_background_random_test.amat'
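The same paths can also be built with `pathlib`, which avoids handling separators by hand. This is a minimal sketch, assuming the `Data2` folder sits in the current working directory as in the cell above; the names ending in `_p` are illustrative and are not used elsewhere in the notebook.

```python
from pathlib import Path

# Assumption: the data lives in a "Data2" folder under the working directory,
# mirroring the string concatenation used in the cell above.
data_dir = Path.cwd() / "Data2"
train_path_p = data_dir / "mnist_background_random_train.amat"
test_path_p = data_dir / "mnist_background_random_test.amat"

# np.loadtxt accepts Path objects as well as strings.
print(train_path_p.name, test_path_p.name)
```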

In [32]:
# Import Data
df_train = np.loadtxt(train_path)
df_test = np.loadtxt(test_path)

X_train = df_train[:,0:-1]
y_train = df_train[:,-1]
X_test = df_test[:,0:-1]
y_test = df_test[:,-1]
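The slicing above assumes that each row stores the pixel values first and the class label in the last column. The sketch below shows the same split on a hypothetical miniature file held in memory, with 4 "pixel" columns instead of 784:

```python
import io
import numpy as np

# Hypothetical miniature file: 2 rows, 4 "pixel" columns plus a label column.
# The real files have 784 pixel columns per row.
toy_file = io.StringIO("0.1 0.2 0.3 0.4 7\n0.5 0.6 0.7 0.8 2\n")
toy = np.loadtxt(toy_file)

X_toy = toy[:, :-1]             # every column except the last: features
y_toy = toy[:, -1].astype(int)  # last column: class label

print(X_toy.shape, y_toy)
```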

In [33]:
# %% Change Datatype to tensor

X_train  = torch.from_numpy(X_train).float()
# converting the target into torch format
y_train = torch.from_numpy(np.array(y_train))
y_train = y_train.type(torch.LongTensor)

#Formatting on testing set
X_test  = torch.from_numpy(X_test).float()
y_test = torch.from_numpy(np.array(y_test))
y_test = y_test.type(torch.LongTensor)

In [34]:
# Helper dictionary to convert the value to display
func_dict = {1:'Relu',2:'LeakyReLu',3:'tanh',4:'sigmoid'}

## 2. Train MLP using pixel intensity as an input <a id='MLP_2'></a>

Training the MLP is divided into 2 steps:

- 2.1 Find appropriate architecture by tuning hidden layer, hidden dimension and activation function
  - 2.1.1 Create a network with 1 hidden layer
  - 2.1.2 Tune the learning rate to the best 1 hidden layer architecture 
  - 2.1.3 Create a network with 2 hidden layers
  - 2.1.4 Tune the learning rate to the best 2 hidden layer architecture

- 2.2 Compare the performance of the networks


### 2.1 Find appropriate architecture by tuning hidden layer, hidden dimension and activation function <a id='MLP_2.1'></a> 

#### 2.1.1 Create a network with 1 hidden layer

The general architecture of the MLP has 784 inputs (28*28 pixels) and 10 outputs.
A softmax activation function is applied at the output nodes, and cross entropy is used as the loss function.
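As a reference point for the loss used below: `nn.CrossEntropyLoss` in PyTorch applies a log-softmax to whatever the module returns before taking the negative log-likelihood. The pure-Python sketch below (helper names are illustrative, not part of the notebook) reproduces that computation for one sample and shows what happens when the module output has already been passed through softmax, as in the `forward` methods in this notebook: for a 10-class problem the loss then cannot drop below roughly 1.46, even for a perfect prediction.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Per-sample cross entropy on raw scores: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])

# Logits for a confident, correct prediction of class 3 (illustrative values).
logits = [0.0] * 10
logits[3] = 20.0

loss_on_logits = cross_entropy(logits, 3)          # close to 0
loss_on_probs = cross_entropy(softmax(logits), 3)  # softmax applied twice

# Lower bound when the input is already a probability vector: log(1 + 9/e)
floor = math.log(1 + 9 / math.e)
print(loss_on_logits, loss_on_probs, floor)
```

Returning raw logits from `forward`, or keeping the softmax and using skorch's default `NLLLoss` criterion, are the two usual ways to avoid the double softmax. Neither is applied in this notebook, so the results below are reported as originally obtained.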


In [35]:
# %% Set Class
# MLP_1: 1 hidden layer Network

class MLP_1(nn.Module):
    def __init__(self,hidden_dim,function):
        super(MLP_1,self).__init__()
        self.fc1 = nn.Linear(784,hidden_dim)

        if function ==1:
            self.func = nn.ReLU()
        elif function == 2:
            self.func = nn.LeakyReLU()
        elif function == 3:
            self.func = nn.Tanh()
        elif function ==4:
            self.func = nn.Sigmoid()

        self.output = nn.Linear(hidden_dim,10)

    def forward(self,x):

        hidden = self.fc1(x)
        hidden = self.func(hidden)

        out = F.softmax(self.output(hidden), dim = -1)

        return out

In [315]:
#Set Default parameter
net = NeuralNetClassifier(module = MLP_1,
                          module__hidden_dim = 50,
                          module__function = 1,
                          max_epochs = 500,
                          criterion = nn.CrossEntropyLoss,
                          lr = 0.1,
                          batch_size = 100,
                          callbacks= [EarlyStopping()],
                          device= device,
                          verbose = 0)

In [20]:
# Construct a grid search to find the best architecture
hidden_list = np.arange(50, 550, 50).tolist()
func_list = [1,2,3,4]

param_grid = {'module__hidden_dim':hidden_list,
              'module__function':func_list}
grid_1 = GridSearchCV(net, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)

grid_1.fit(X_train, y_train)


Fitting 5 folds for each of 40 candidates, totalling 200 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  2.9min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 20.4min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 81.3min finished


GridSearchCV(cv=5,
             estimator=<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_1'>,
  module__function=1,
  module__hidden_dim=50,
),
             n_jobs=-1,
             param_grid={'module__function': [1, 2, 3, 4],
                         'module__hidden_dim': [50, 100, 150, 200, 250, 300,
                                                350, 400, 450, 500]},
             return_train_score=True, scoring='accuracy', verbose=4)
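The "40 candidates, totalling 200 fits" reported above follows directly from the grid: 10 hidden sizes times 4 activation codes, each evaluated on 5 folds. A small sketch that reproduces the count with the standard library (scikit-learn's own enumeration is not used here):

```python
from itertools import product

hidden_sizes = list(range(50, 550, 50))   # 50, 100, ..., 500
activation_codes = [1, 2, 3, 4]           # ReLU, LeakyReLU, tanh, sigmoid
n_folds = 5

# Every (activation, hidden size) pair is one candidate model.
candidates = list(product(activation_codes, hidden_sizes))
n_fits = len(candidates) * n_folds
print(len(candidates), n_fits)
```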

In [311]:
# %% Display the results
df_temp1 = pd.DataFrame(grid_1.cv_results_)
col = ['param_module__function','param_module__hidden_dim',
       'mean_test_score','std_test_score']


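df_temp1 = df_temp1[col].copy()
# Map the numeric activation codes to readable names (assignment avoids chained in-place modification)
df_temp1["param_module__function"] = df_temp1["param_module__function"].replace(func_dict)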
print(df_temp1)

   param_module__function param_module__hidden_dim  mean_test_score  std_test_score
0                    Relu                       50         0.727583        0.010895
1                    Relu                      100         0.721250        0.037902
2                    Relu                      150         0.723917        0.034604
3                    Relu                      200         0.707417        0.030508
4                    Relu                      250         0.727417        0.023099
5                    Relu                      300         0.735000        0.011879
6                    Relu                      350         0.719667        0.019478
7                    Relu                      400         0.739833        0.036971
8                    Relu                      450         0.718083        0.023014
9                    Relu                      500         0.716583        0.032232
10              LeakyReLu                       50         0.724500        0

In [334]:
# Sort the result by accuracy score
print(df_temp1.sort_values('mean_test_score', ascending=False).head(20))

   param_module__function param_module__hidden_dim  mean_test_score  std_test_score
20                   tanh                       50         0.790833        0.008878
28                   tanh                      450         0.790500        0.007779
26                   tanh                      350         0.788583        0.012158
25                   tanh                      300         0.788500        0.011728
27                   tanh                      400         0.788250        0.014440
23                   tanh                      200         0.785917        0.016615
29                   tanh                      500         0.785667        0.011207
21                   tanh                      100         0.785667        0.014925
24                   tanh                      250         0.784250        0.018317
22                   tanh                      150         0.783083        0.017084
13              LeakyReLu                      200         0.755583        0

The architecture that provides the best mean test score is a small network with just 50 hidden nodes and a tanh activation function.
The next step is to find an appropriate learning rate for this network.

#### 2.1.2 Tune the learning rate to the best 1 hidden layer architecture 

In [316]:
# Construct a grid search to identify the best learning rate for the
# previous network architecture (50 hidden nodes with a tanh activation function)
hidden_list = [50]
func_list = [3]
lr_list = [1,0.5,0.1,0.05,0.01,0.005,0.001]

param_grid = {'module__hidden_dim':hidden_list,
              'module__function':func_list,
             'lr':lr_list}


grid_1_2 = GridSearchCV(net, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)

grid_1_2.fit(X_train, y_train)


Fitting 5 folds for each of 7 candidates, totalling 35 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  2.3min
[Parallel(n_jobs=-1)]: Done  35 out of  35 | elapsed: 23.9min finished


GridSearchCV(cv=5,
             estimator=<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_1'>,
  module__function=1,
  module__hidden_dim=50,
),
             n_jobs=-1,
             param_grid={'lr': [1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001],
                         'module__function': [3], 'module__hidden_dim': [50]},
             return_train_score=True, scoring='accuracy', verbose=4)

In [317]:
#Display the result
df_temp1_2 = pd.DataFrame(grid_1_2.cv_results_)
col = ['param_lr',
       'mean_test_score','std_test_score']


df_temp1_2 =  df_temp1_2[col]
print(df_temp1_2)
 

  param_lr  mean_test_score  std_test_score
0        1         0.723917        0.040675
1      0.5         0.731833        0.103863
2      0.1         0.790167        0.008707
3     0.05         0.794583        0.004986
4     0.01         0.793083        0.024391
5    0.005         0.754667        0.018605
6    0.001         0.300333        0.031259


The learning rate of 0.05 performs best, so it will be used in the candidate model for the 1-hidden-layer network.<br>
<b>Hence the final architecture for an MLP with 1 hidden layer is:</b>
* Hidden layer: 50
* Activation Function: tanh
* Learning Rate: 0.05
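The choice can also be read off the table programmatically. This sketch uses the printed values copied by hand; with a fitted search object, `grid_1_2.best_params_` would give the same answer without retyping.

```python
# (learning rate, mean CV accuracy, std) copied from the table above
lr_results = [
    (1,     0.723917, 0.040675),
    (0.5,   0.731833, 0.103863),
    (0.1,   0.790167, 0.008707),
    (0.05,  0.794583, 0.004986),
    (0.01,  0.793083, 0.024391),
    (0.005, 0.754667, 0.018605),
    (0.001, 0.300333, 0.031259),
]

# Pick the row with the highest mean cross-validation accuracy.
best_lr, best_mean, best_std = max(lr_results, key=lambda row: row[1])
print(best_lr, best_mean, best_std)
```

In this table the best learning rate also has the smallest standard deviation across folds, which supports the choice.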

#### 2.1.3 Create a network with 2 hidden layers

In [327]:
# %% Set Class
# MLP_2: 2 hidden layer

class MLP_2(nn.Module):
    def __init__(self,hidden_dim,function):
        super(MLP_2,self).__init__()
        
        hid1, hid2 = hidden_dim
        
        self.fc1 = nn.Linear(784,hid1)
        self.fc2 = nn.Linear(hid1,hid2)

        if function ==1:
            self.func = nn.ReLU()
        elif function == 2:
            self.func = nn.LeakyReLU()
        elif function == 3:
            self.func = nn.Tanh()
        elif function ==4:
            self.func = nn.Sigmoid()

        self.output = nn.Linear(hid2,10)

    def forward(self,x):

        hidden1 = self.fc1(x)
        hidden1 = self.func(hidden1)

        hidden2 = self.fc2(hidden1)
        hidden2 = self.func(hidden2)
        
        out = F.softmax(self.output(hidden2), dim = -1)

        return out

In [328]:
#Set Default parameter
net_2 = NeuralNetClassifier(module = MLP_2,
                          module__hidden_dim = (50,50),
                          module__function = 1,
                          max_epochs = 500,
                          criterion = nn.CrossEntropyLoss,
                          lr = 0.1,
                          batch_size = 100,
                          callbacks= [EarlyStopping()],
                          device= device,
                          verbose = 0)

The number of nodes in the second hidden layer is set to half the number in the first layer.

In [98]:
# Set up the combinations of node counts for the 2-hidden-layer network
hidden_1_list = np.arange(50, 550, 50).tolist()
hidden_2_list = (np.array(hidden_1_list)/2).astype(int).tolist()


hidden_2MLP_list = []
for i in hidden_1_list:
    for j in hidden_2_list:
        if i/j ==2:
            hidden_tuple = (i,j)
            hidden_2MLP_list.append(hidden_tuple)
hidden_2MLP_list

[(50, 25),
 (100, 50),
 (150, 75),
 (200, 100),
 (250, 125),
 (300, 150),
 (350, 175),
 (400, 200),
 (450, 225),
 (500, 250)]
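Because every second-layer size is exactly half of the first, the same list can be produced with a single comprehension. A short equivalent sketch (the name `hidden_pairs` is illustrative):

```python
# Second hidden layer has half as many nodes as the first.
hidden_pairs = [(h, h // 2) for h in range(50, 550, 50)]
print(hidden_pairs)
```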

In [102]:
# Construct a grid search to find the best architecture for the 2-hidden-layer network

param_grid = {'module__hidden_dim':hidden_2MLP_list,
              'module__function':func_list}

grid_2 = GridSearchCV(net_2, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)

grid_2.fit(X_train, y_train)

Fitting 5 folds for each of 40 candidates, totalling 200 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  1.8min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 13.0min
[Parallel(n_jobs=-1)]: Done 200 out of 200 | elapsed: 26.1min finished


GridSearchCV(cv=5,
             estimator=<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_2'>,
  module__function=1,
  module__hidden_dim=(50, 50),
),
             n_jobs=-1,
             param_grid={'module__function': [1, 2, 3, 4],
                         'module__hidden_dim': [(50, 25), (100, 50), (150, 75),
                                                (200, 100), (250, 125),
                                                (300, 150), (350, 175),
                                                (400, 200), (450, 225),
                                                (500, 250)]},
             return_train_score=True, scoring='accuracy', verbose=4)

In [321]:
# %%Display the result
df_temp2 = pd.DataFrame(grid_2.cv_results_)
col = ['param_module__function','param_module__hidden_dim',
       'mean_test_score','std_test_score']


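df_temp2 = df_temp2[col].copy()
# Map the numeric activation codes to readable names (assignment avoids chained in-place modification)
df_temp2["param_module__function"] = df_temp2["param_module__function"].replace(func_dict)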
print(df_temp2)

   param_module__function param_module__hidden_dim  mean_test_score  std_test_score
0                    Relu                 (50, 25)         0.624000        0.048539
1                    Relu                (100, 50)         0.688167        0.052436
2                    Relu                (150, 75)         0.655500        0.070039
3                    Relu               (200, 100)         0.715250        0.041364
4                    Relu               (250, 125)         0.661750        0.062902
5                    Relu               (300, 150)         0.668500        0.044279
6                    Relu               (350, 175)         0.656083        0.046216
7                    Relu               (400, 200)         0.621583        0.028057
8                    Relu               (450, 225)         0.647667        0.040942
9                    Relu               (500, 250)         0.656583        0.057608
10              LeakyReLu                 (50, 25)         0.641333        0

In [323]:
# Sort the result by accuracy score
print(df_temp2.sort_values('mean_test_score', ascending=False).head(10))

   param_module__function param_module__hidden_dim  mean_test_score  std_test_score
25                   tanh               (300, 150)         0.772250        0.024246
29                   tanh               (500, 250)         0.762500        0.006651
24                   tanh               (250, 125)         0.758250        0.015121
21                   tanh                (100, 50)         0.756417        0.024465
28                   tanh               (450, 225)         0.751750        0.032881
22                   tanh                (150, 75)         0.747167        0.026965
27                   tanh               (400, 200)         0.745667        0.039122
23                   tanh               (200, 100)         0.742417        0.033416
26                   tanh               (350, 175)         0.739917        0.045696
17              LeakyReLu               (400, 200)         0.718167        0.025576


The architecture that provides the best mean test score is a network with hidden layers of (300, 150) nodes and a tanh activation function.
The next step is to find an appropriate learning rate for this network.

#### 2.1.4 Tune the learning rate to the best 2 hidden layer architecture

In [329]:
# Construct a grid search to identify the best learning rate for the
# previous network architecture ((300,150) hidden nodes with a tanh activation function)
lr_list =[1,0.5,0.1,0.05,0.01,0.005,0.001]

param_grid = {'module__hidden_dim': [(300,150)],
              'module__function':[3],
              'lr': lr_list}

grid_2_1 = GridSearchCV(net_2, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)

grid_2_1.fit(X_train, y_train)

Fitting 5 folds for each of 7 candidates, totalling 35 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:  2.5min
[Parallel(n_jobs=-1)]: Done  35 out of  35 | elapsed: 22.5min finished


GridSearchCV(cv=5,
             estimator=<class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_2'>,
  module__function=1,
  module__hidden_dim=(50, 50),
),
             n_jobs=-1,
             param_grid={'lr': [1, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001],
                         'module__function': [3],
                         'module__hidden_dim': [(300, 150)]},
             return_train_score=True, scoring='accuracy', verbose=4)

In [330]:
# %%Display the result
df_temp2_1 = pd.DataFrame(grid_2_1.cv_results_)
col = ['param_lr',
       'mean_test_score','std_test_score']


df_temp2_1 =  df_temp2_1[col]
print(df_temp2_1)

  param_lr  mean_test_score  std_test_score
0        1         0.744833        0.058387
1      0.5         0.767583        0.055823
2      0.1         0.764250        0.020312
3     0.05         0.784333        0.011157
4     0.01         0.783500        0.008782
5    0.005         0.776417        0.025626
6    0.001         0.098333        0.010541


### 2.2 Compare the performance of the networks <a id='MLP_2.2'></a>

According to the tuning results for the MLP using pixel intensity as input, the best score for each MLP is:

<b>1. MLP with 1 hidden layer</b>
* Mean score = 0.795
* Size of hidden layer = 50
* Activation function = tanh
* Learning rate = 0.05

<b>2. MLP with 2 hidden layers</b>
* Mean score = 0.784
* Size of hidden layers = (300, 150)
* Activation function = tanh
* Learning rate = 0.05

The best architecture is the one with only 1 hidden layer. This architecture is trained again on the whole training set and exported so that it can be evaluated on the test set.


In [36]:
#Set parameter
net = NeuralNetClassifier(module = MLP_1,
                          module__hidden_dim = 50,
                          module__function = 3,
                          max_epochs = 500,
                          criterion = nn.CrossEntropyLoss,
                          lr = 0.05,
                          batch_size = 100,
                          callbacks= [EarlyStopping()],
                          device= device,
                          verbose = 1)
#Export Model
t0 = time.time()
net.fit(X_train,y_train)
t1 = time.time()
pkl_filename = "MLP_pixel.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(net, file)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        2.3011       0.1800        2.2987  0.4615
      2        2.2968       0.2258        2.2943  0.2633
      3        2.2917       0.2233        2.2883  0.2144
      4        2.2843       0.2025        2.2794  0.2107
      5        2.2726       0.2037        2.2649  0.2213
      6        2.2540       0.2050        2.2434  0.2284
      7        2.2297       0.2175        2.2185  0.2133
      8        2.2042       0.2992        2.1939  0.2275
      9        2.1791       0.3725        2.1698  0.2927
     10        2.1538       0.4129        2.1450  0.2401
     11        2.1269       0.4508        2.1184  0.2096
     12

In [37]:
print('Fitting time',t1-t0)

Fitting time 37.87144494056702
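To evaluate the exported model on the test set later, the pickle file has to be read back. The sketch below shows the round trip with a stand-in dictionary in place of the fitted `net` and a temporary directory in place of the working folder (both are assumptions for illustration). With the real file, the loaded object would expose `predict` like the original skorch classifier, provided the `MLP_1` class is defined or importable when unpickling.

```python
import os
import pickle
import tempfile

# Stand-in for the fitted classifier; the notebook pickles `net` itself.
model_stub = {"hidden_dim": 50, "function": 3, "lr": 0.05}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "MLP_pixel.pkl")

    # Write the object to disk, as done for the trained network above.
    with open(pkl_path, "wb") as f:
        pickle.dump(model_stub, f)

    # Read it back before the temporary directory is removed.
    with open(pkl_path, "rb") as f:
        restored = pickle.load(f)

print(restored == model_stub)
```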


## 3. Train MLP using HOG descriptor as an input <a id='MLP_3'></a>

Training the MLP is divided into 2 steps:

- 3.1 Find appropriate architecture by tuning hidden layer, hidden dimension and activation function
  - 3.1.1 Create a network with 1 hidden layer
  - 3.1.2 Tune the learning rate to the best 1 hidden layer architecture 
  - 3.1.3 Create a network with 2 hidden layers
  - 3.1.4 Tune the learning rate to the best 2 hidden layer architecture

- 3.2 Compare the performance of the networks

Before creating a model and searching for optimized parameters, we need to define some helper classes and transform the data into HOG features.

In [38]:
class HogTransformer(BaseEstimator, TransformerMixin):
    """
    Expects an array of 2d arrays (1 channel images)
    Calculates hog features for each img
    """

    def __init__(self, y=None, orientations=8,
                 pixels_per_cell=(2, 2),
                 cells_per_block=(2, 2)):
        self.y = y
        self.orientations = orientations
        self.pixels_per_cell = pixels_per_cell
        self.cells_per_block = cells_per_block

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):

        def local_hog(X):
            return hog(X,
                       orientations=self.orientations,
                       pixels_per_cell=self.pixels_per_cell,
                       cells_per_block=self.cells_per_block)

        # Compute the HOG descriptor for every image in the batch
        return np.array([local_hog(img) for img in X])

In [39]:
class dataTransform(BaseEstimator, TransformerMixin):
    """
    Convert a NumPy array to a float32 torch tensor.
    """

    def __init__(self, y=None):
        self.y = y

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return torch.from_numpy(X).float()


In [40]:
# Reload the data from disk
df_train = np.loadtxt(train_path)
df_test = np.loadtxt(test_path)

X_train = df_train[:,0:-1]
y_train = df_train[:,-1]
X_test = df_test[:,0:-1]
y_test = df_test[:,-1]

X_train = X_train.reshape(X_train.shape[0],28,28)
X_test = X_test.reshape(X_test.shape[0],28,28)

y_train = torch.from_numpy(np.array(y_train))
y_train = y_train.type(torch.LongTensor)

y_test = torch.from_numpy(np.array(y_test))
y_test = y_test.type(torch.LongTensor)

### 3.1 Find appropriate architecture by tuning hidden layer, hidden dimension and activation function <a id='MLP_3.1'></a> 

#### 3.1.1 Create a network with 1 hidden layer

The general architecture of the MLP in this section has 5408 inputs (the HOG features) and 10 outputs.
A softmax activation function is applied at the output nodes, and cross entropy is used as the loss function.
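The input size of 5408 follows from the `HogTransformer` settings above (8 orientations, 2x2 pixels per cell, 2x2 cells per block) applied to a 28x28 image. There are 14x14 cells, blocks advance one cell at a time so there are 13x13 block positions, and each block contributes 2*2*8 = 32 values. A short sketch of that arithmetic (the helper name is illustrative):

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block, orientations):
    """Length of the flattened HOG descriptor for a single-channel image."""
    cells_r = image_shape[0] // pixels_per_cell[0]
    cells_c = image_shape[1] // pixels_per_cell[1]
    # Blocks overlap and advance one cell at a time.
    blocks_r = cells_r - cells_per_block[0] + 1
    blocks_c = cells_c - cells_per_block[1] + 1
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_r * blocks_c * per_block

n_features = hog_feature_length((28, 28), (2, 2), (2, 2), 8)
print(n_features)
```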


In [41]:
# %% Set Class
# MLP_3_HOG: 1 hidden layer, HOG features as input, softmax at the output layer

class MLP_3_HOG(nn.Module):
    def __init__(self,hidden_dim,function):
        super(MLP_3_HOG,self).__init__()
        self.fc1 = nn.Linear(5408,hidden_dim)

        if function ==1:
            self.func = nn.ReLU()
        elif function == 2:
            self.func = nn.LeakyReLU()
        elif function == 3:
            self.func = nn.Tanh()
        elif function ==4:
            self.func = nn.Sigmoid()

        self.output = nn.Linear(hidden_dim,10)

    def forward(self,x):
        # Hidden layer followed by the chosen activation
        hidden = self.fc1(x)
        hidden = self.func(hidden)

        out = F.softmax(self.output(hidden), dim = -1)

        return out

In [25]:
#Set Default parameter
net_HOG_1 = NeuralNetClassifier(module = MLP_3_HOG,
                          module__hidden_dim = 50,
                          module__function = 1,
                          max_epochs = 500,
                          criterion = nn.CrossEntropyLoss,
                          lr = 0.1,
                          batch_size = 100,
                          callbacks= [EarlyStopping()],
                          device= device,
                          verbose = 0)

In [32]:
# Construct a grid search to find the best architecture

hidden_list = np.arange(100, 2200, 200).tolist()
func_list = [1,2,3,4]

# Set up a pipeline to convert the images to HOG features, standardize them and train
pipeline_1 = Pipeline(
    [
    ('hogify', HogTransformer(
        pixels_per_cell=(2, 2),
        cells_per_block=(2, 2),
        orientations=8)
     ),
    ('scalify', StandardScaler()),
        ('dataloader',dataTransform()),
    ('classify',net_HOG_1)
    ]
)

param_grid = {'classify__module__hidden_dim':hidden_list,
              'classify__module__function':func_list}
grid_1_HOG = GridSearchCV(pipeline_1, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)


grid_1_HOG.fit(X_train, y_train)

Fitting 5 folds for each of 44 candidates, totalling 220 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed: 26.4min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 215.1min
[Parallel(n_jobs=-1)]: Done 213 tasks      | elapsed: 491.4min
[Parallel(n_jobs=-1)]: Done 220 out of 220 | elapsed: 525.1min finished


GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('hogify', HogTransformer()),
                                       ('scalify', StandardScaler()),
                                       ('dataloader', dataTransform()),
                                       ('classify',
                                        <class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_3_HOG'>,
  module__function=1,
  module__hidden_dim=50,
))]),
             n_jobs=-1,
             param_grid={'classify__module__function': [1, 2, 3, 4],
                         'classify__module__hidden_dim': [100, 300, 500, 700,
                                                          900, 1100, 1300, 1500,
                                                          1700, 1900, 2100]},
             return_train_score=True, scoring='accuracy', verbose=4)

In [33]:
#Display the result
df_temp5 = pd.DataFrame(grid_1_HOG.cv_results_)
col = ['param_classify__module__hidden_dim','param_classify__module__function',
       'mean_test_score','std_test_score']


df_temp5 =  df_temp5[col]
print(df_temp5)
df_temp5.to_csv('MLP_raw.csv')

   param_classify__module__hidden_dim param_classify__module__function  mean_test_score  std_test_score
0                                 100                                1         0.776500        0.008116
1                                 300                                1         0.780750        0.005213
2                                 500                                1         0.780583        0.005506
3                                 700                                1         0.782250        0.006742
4                                 900                                1         0.781167        0.006616
5                                1100                                1         0.780000        0.006806
6                                1300                                1         0.786500        0.003914
7                                1500                                1         0.782000        0.004522
8                                1700                           

In [34]:
# Sort the result by accuracy score
print(df_temp5.sort_values('mean_test_score', ascending=False).head(20))

   param_classify__module__hidden_dim param_classify__module__function  mean_test_score  std_test_score
9                                1900                                1         0.786583        0.008393
6                                1300                                1         0.786500        0.003914
19                               1700                                2         0.784917        0.009027
10                               2100                                1         0.783333        0.008441
16                               1100                                2         0.782833        0.006310
15                                900                                2         0.782667        0.005606
18                               1500                                2         0.782500        0.007250
3                                 700                                1         0.782250        0.006742
7                                1500                           

The mean test scores of the top 20 candidates are very close, differing only in the third decimal place. The network with 1900 hidden nodes (ReLU activation) performs best, with a mean test score of 0.787. However, the network with 1300 hidden nodes (ReLU activation) reaches essentially the same validation score (0.7865 versus 0.7866), so we choose the smaller network (1300 hidden nodes) as the candidate for the next step.

The next step is to find an appropriate learning rate for the 1-hidden-layer MLP with a hidden dimension of 1300 and a ReLU activation function.
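The selection rule used here, preferring the smallest network whose score is practically tied with the best, can be written down explicitly. The sketch below uses only the rows visible in the sorted table above, and the tolerance of 0.001 is an assumed threshold chosen for illustration, not a value from the original analysis.

```python
# (hidden nodes, activation code, mean CV accuracy) from the visible rows above
rows = [
    (1900, 1, 0.786583),
    (1300, 1, 0.786500),
    (1700, 2, 0.784917),
    (2100, 1, 0.783333),
    (1100, 2, 0.782833),
    (900,  2, 0.782667),
    (1500, 2, 0.782500),
    (700,  1, 0.782250),
]

tolerance = 0.001  # assumed: scores this close to the best count as tied
best_score = max(score for _, _, score in rows)
tied = [row for row in rows if best_score - row[2] <= tolerance]
chosen = min(tied, key=lambda row: row[0])  # smallest hidden layer among the ties
print(chosen)
```

With this tolerance only the 1900-node and 1300-node ReLU networks count as tied, and the smaller of the two is selected.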

#### 3.1.2 Tune the learning rate to the best 1 hidden layer architecture 

In [15]:
# Construct a grid search to identify the best learning rate for the
# previous network architecture (1300 hidden nodes with a ReLU activation function)

hidden_list =[1300]
func_list = [1]

# Create pipeline
pipeline_1 = Pipeline(
    [
    ('hogify', HogTransformer(
        pixels_per_cell=(2, 2),
        cells_per_block=(2, 2),
        orientations=8)
     ),
    ('scalify', StandardScaler()),
        ('dataloader',dataTransform()),
    ('classify',net_HOG_1)
    ]
)

# In this section the lower limit of the learning rate is set to 0.005 because, in a preliminary test,
# a learning rate of 0.001 took too long to train
lr_list = [1,0.5,0.1,0.05,0.01,0.005]

param_grid = {'classify__module__hidden_dim':hidden_list,
              'classify__module__function':func_list,
             'classify__lr':lr_list}
grid_2_HOG = GridSearchCV(pipeline_1, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)


grid_2_HOG.fit(X_train, y_train)

Fitting 5 folds for each of 6 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed: 39.4min
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed: 146.7min finished


GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('hogify', HogTransformer()),
                                       ('scalify', StandardScaler()),
                                       ('dataloader', dataTransform()),
                                       ('classify',
                                        <class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_3_HOG'>,
  module__function=1,
  module__hidden_dim=50,
))]),
             n_jobs=-1,
             param_grid={'classify__lr': [1, 0.5, 0.1, 0.05, 0.01, 0.005],
                         'classify__module__function': [1],
                         'classify__module__hidden_dim': [1300]},
             return_train_score=True, scoring='accuracy', verbose=4)

In [16]:
#Display the result
df_temp6 = pd.DataFrame(grid_2_HOG.cv_results_)
col = ['param_classify__lr',
       'mean_test_score','std_test_score']


df_temp6 =  df_temp6[col]
print(df_temp6)
df_temp6.to_csv('MLP_HOG_1_LR.csv')

  param_classify__lr  mean_test_score  std_test_score
0                  1         0.748750        0.007737
1                0.5         0.775417        0.006967
2                0.1         0.785000        0.007509
3               0.05         0.784500        0.007511
4               0.01         0.784750        0.005340
5              0.005         0.783583        0.006426


Changing the learning rate brings no major improvement for rates of 0.1 and below, while rates of 0.5 and 1 score lower. We will use a learning rate of 0.1 for this network.


<b>Hence the final architecture for the MLP with 1 hidden layer is</b>
* Hidden layer: 1300
* Activation Function: ReLU
* Learning Rate: 0.1
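
The claim that the learning rate makes little difference can be checked against the fold-to-fold spread. The following is a minimal sketch, not part of the original notebook: it copies the values from the learning-rate table above and lists the learning rates whose mean score lies within one standard deviation of the best one. The helper name `within_one_std_of_best` and the one-standard-deviation threshold are assumptions chosen for illustration.

```python
# Cross-validation results copied from the learning-rate table above:
# (learning rate, mean_test_score, std_test_score)
lr_results = [
    (1,     0.748750, 0.007737),
    (0.5,   0.775417, 0.006967),
    (0.1,   0.785000, 0.007509),
    (0.05,  0.784500, 0.007511),
    (0.01,  0.784750, 0.005340),
    (0.005, 0.783583, 0.006426),
]


def within_one_std_of_best(results):
    """Hypothetical helper: return the best row and the learning rates whose
    mean score is at most one std (of the best row) below the best mean."""
    best = max(results, key=lambda row: row[1])
    threshold = best[1] - best[2]
    close = [lr for lr, mean, _ in results if mean >= threshold]
    return best, close


best_row, comparable_lrs = within_one_std_of_best(lr_results)
print('Best learning rate:', best_row[0])
print('Learning rates within one std of the best:', comparable_lrs)
```

Under this criterion the rates 0.1, 0.05, 0.01 and 0.005 are indistinguishable, while 0.5 and 1 fall outside the band.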

#### 3.1.3 Create a network with 2 hidden layers

In [7]:
# %% Set Class
# MLP_4: 2 hidden layers

class MLP_4(nn.Module):
    def __init__(self,hidden_dim,function):
        super(MLP_4,self).__init__()
        
        hid1, hid2 = hidden_dim
        
        self.fc1 = nn.Linear(5408,hid1)
        self.fc2 = nn.Linear(hid1,hid2)

        if function ==1:
            self.func = nn.ReLU()
        elif function == 2:
            self.func = nn.LeakyReLU()
        elif function == 3:
            self.func = nn.Tanh()
        elif function ==4:
            self.func = nn.Sigmoid()

        self.output = nn.Linear(hid2,10)

    def forward(self,x):

        hidden1 = self.fc1(x)
        hidden1 = self.func(hidden1)

        hidden2 = self.fc2(hidden1)
        hidden2 = self.func(hidden2)
        
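        # Softmax output; note that nn.CrossEntropyLoss (the criterion used below)
        # applies log-softmax itself and normally expects raw logits
        out = F.softmax(self.output(hidden2), dim = -1)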

        return out
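
The input size of 5408 in `fc1` follows from the HOG settings used in the pipeline. The sketch below reproduces that arithmetic, assuming 28×28 input images and that `HogTransformer` returns the flattened block-normalised descriptor of `skimage.feature.hog`; the function name `hog_feature_length` is hypothetical.

```python
def hog_feature_length(image_shape, pixels_per_cell, cells_per_block, orientations):
    """Length of a flattened HOG descriptor (assumes block-normalised output
    with blocks sliding one cell at a time)."""
    # Number of cells along each image axis
    cells = [size // ppc for size, ppc in zip(image_shape, pixels_per_cell)]
    # Number of block positions along each axis
    blocks = [c - cpb + 1 for c, cpb in zip(cells, cells_per_block)]
    # Each block holds one histogram per cell
    features_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks[0] * blocks[1] * features_per_block


n_features = hog_feature_length(
    image_shape=(28, 28),      # assumption: MNIST-sized images
    pixels_per_cell=(2, 2),
    cells_per_block=(2, 2),
    orientations=8,
)
print(n_features)
```

With 14×14 cells, 13×13 block positions and 32 values per block, this gives 13 × 13 × 32 = 5408, matching the first linear layer.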

In [11]:
# Set up the combinations of node counts for the 2-hidden-layer network
hidden_1_list = np.arange(100, 2200, 200).tolist()
hidden_2_list = (np.array(hidden_1_list)/2).astype(int).tolist()
func_list = [1,2,3,4]


hidden_2MLP_list = []
for i in hidden_1_list:
    for j in hidden_2_list:
        if i/j ==2:
            hidden_tuple = (i,j)
            hidden_2MLP_list.append(hidden_tuple)
hidden_2MLP_list

[(100, 50),
 (300, 150),
 (500, 250),
 (700, 350),
 (900, 450),
 (1100, 550),
 (1300, 650),
 (1500, 750),
 (1700, 850),
 (1900, 950),
 (2100, 1050)]
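
Because every second layer is exactly half the size of the first, the same list can be built without the nested loop. This is an equivalent sketch using only built-in Python, offered as an alternative rather than a change to the cell above.

```python
# First hidden layer sizes: 100, 300, ..., 2100
hidden_1_list = list(range(100, 2200, 200))

# Pair each first-layer size with a second layer of half that size
hidden_2MLP_list = [(h1, h1 // 2) for h1 in hidden_1_list]

print(hidden_2MLP_list)
```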

In [12]:
#Set Default parameter
net_HOG_3 = NeuralNetClassifier(module = MLP_4,
                          module__hidden_dim = 50,
                          module__function = 3,
                          max_epochs = 500,
                          criterion = nn.CrossEntropyLoss,
                          lr = 0.1,
                          batch_size = 100,
                          callbacks= [EarlyStopping()],
                          device= device,
                          verbose = 0)

In [14]:
# Construct a grid search to find the best architecture for the 2-hidden-layer network

param_grid = {'classify__module__hidden_dim':hidden_2MLP_list,
              'classify__module__function':func_list}

pipeline_2 = Pipeline(
    [
    ('hogify', HogTransformer(
        pixels_per_cell=(2, 2),
        cells_per_block=(2, 2),
        orientations=8)
     ),
    ('scalify', StandardScaler()),
        ('dataloader',dataTransform()),
    ('classify',net_HOG_3)
    ]
)

grid_3_HOG = GridSearchCV(pipeline_2, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)

grid_3_HOG.fit(X_train, y_train)

Fitting 5 folds for each of 44 candidates, totalling 220 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed: 35.2min
[Parallel(n_jobs=-1)]: Done  90 tasks      | elapsed: 297.7min
[Parallel(n_jobs=-1)]: Done 213 tasks      | elapsed: 836.9min
[Parallel(n_jobs=-1)]: Done 220 out of 220 | elapsed: 920.9min finished


GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('hogify', HogTransformer()),
                                       ('scalify', StandardScaler()),
                                       ('dataloader', dataTransform()),
                                       ('classify',
                                        <class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_4'>,
  module__function=3,
  module__hidden_dim=50,
))]),
             n_jobs=-1,
             param_grid={'classify__module__function': [1, 2, 3, 4],
                         'classify__module__hidden_dim': [(100, 50), (300, 150),
                                                          (500, 250),
                                                          (700, 350),
                                                          (900, 450),
                                                          (1100, 550),
                                                          (1300, 650),
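
The number of fits reported in the log can be predicted before launching a long search. The sketch below counts the candidates of a parameter grid with `itertools.product` and multiplies by the number of folds; the helper `count_fits` is hypothetical, and the count does not include the final refit on the full data that `GridSearchCV` performs by default.

```python
from itertools import product


def count_fits(param_grid, n_folds):
    """Return (number of candidates, number of cross-validation fits)."""
    n_candidates = len(list(product(*param_grid.values())))
    return n_candidates, n_candidates * n_folds


# Architecture grid for the 2-hidden-layer network
two_layer_grid = {
    'classify__module__hidden_dim': [(h, h // 2) for h in range(100, 2200, 200)],
    'classify__module__function': [1, 2, 3, 4],
}

# Learning-rate grid used for the 1-hidden-layer network
lr_grid = {
    'classify__module__hidden_dim': [1300],
    'classify__module__function': [1],
    'classify__lr': [1, 0.5, 0.1, 0.05, 0.01, 0.005],
}

print(count_fits(two_layer_grid, n_folds=5))
print(count_fits(lr_grid, n_folds=5))
```

The two results correspond to the "44 candidates, totalling 220 fits" and "6 candidates, totalling 30 fits" messages in the search logs.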


In [15]:
#Display the result
df_temp7 = pd.DataFrame(grid_3_HOG.cv_results_)
col = ['param_classify__module__hidden_dim','param_classify__module__function',
       'mean_test_score','std_test_score']


df_temp7 =  df_temp7[col]
print(df_temp7)
df_temp7.to_csv('MLP_HOG.csv')

   param_classify__module__hidden_dim param_classify__module__function  mean_test_score  std_test_score
0                           (100, 50)                                1         0.770583        0.006533
1                          (300, 150)                                1         0.778333        0.009923
2                          (500, 250)                                1         0.779417        0.003967
3                          (700, 350)                                1         0.782333        0.003797
4                          (900, 450)                                1         0.782917        0.005633
5                         (1100, 550)                                1         0.781083        0.002988
6                         (1300, 650)                                1         0.781417        0.004997
7                         (1500, 750)                                1         0.781833        0.006399
8                         (1700, 850)                           

In [16]:
# Sort the result by accuracy score
print(df_temp7.sort_values('mean_test_score', ascending=False).head(15))

   param_classify__module__hidden_dim param_classify__module__function  mean_test_score  std_test_score
4                          (900, 450)                                1         0.782917        0.005633
17                        (1300, 650)                                2         0.782333        0.007645
3                          (700, 350)                                1         0.782333        0.003797
8                         (1700, 850)                                1         0.782083        0.005815
10                       (2100, 1050)                                1         0.782083        0.005761
7                         (1500, 750)                                1         0.781833        0.006399
6                         (1300, 650)                                1         0.781417        0.004997
18                        (1500, 750)                                2         0.781333        0.005586
14                         (700, 350)                           

The mean_test_score values of the top 15 configurations are very close, differing only in the third decimal place. The best network has 900 and 450 hidden nodes with a ReLU activation function.

The next step is to find a good learning rate for the 2-hidden-layer MLP with a hidden dimension of (900, 450) and a ReLU activation function.

#### 3.1.4 Tune the learning rate to the best 2 hidden layer architecture

In [19]:
# Set up the grid search over learning rates for the 2-hidden-layer network
hidden_list =[(900,450)]
func_list = [1]
lr_list = [1,0.5,0.1,0.05,0.01,0.005]




param_grid = {'classify__module__hidden_dim':hidden_list,
              'classify__module__function':func_list,
              'classify__lr':lr_list}

grid_4_HOG = GridSearchCV(pipeline_2, param_grid,
                    scoring = 'accuracy',
                    n_jobs = -1,
                    cv= 5,
                    verbose = 4,
                    return_train_score = True)

grid_4_HOG.fit(X_train, y_train)

Fitting 5 folds for each of 6 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed: 33.6min
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed: 129.5min finished


GridSearchCV(cv=5,
             estimator=Pipeline(steps=[('hogify', HogTransformer()),
                                       ('scalify', StandardScaler()),
                                       ('dataloader', dataTransform()),
                                       ('classify',
                                        <class 'skorch.classifier.NeuralNetClassifier'>[uninitialized](
  module=<class '__main__.MLP_4'>,
  module__function=3,
  module__hidden_dim=50,
))]),
             n_jobs=-1,
             param_grid={'classify__lr': [1, 0.5, 0.1, 0.05, 0.01, 0.005],
                         'classify__module__function': [1],
                         'classify__module__hidden_dim': [(900, 450)]},
             return_train_score=True, scoring='accuracy', verbose=4)

In [22]:
#Display the result
df_temp8 = pd.DataFrame(grid_4_HOG.cv_results_)
col = ['param_classify__lr',
       'mean_test_score','std_test_score']


df_temp8 =  df_temp8[col]
print(df_temp8)

  param_classify__lr  mean_test_score  std_test_score
0                  1         0.718833        0.008260
1                0.5         0.758083        0.007784
2                0.1         0.780667        0.004071
3               0.05         0.783250        0.007295
4               0.01         0.780250        0.003255
5              0.005         0.782167        0.005207


The best performance is obtained with a learning rate of 0.05, which gives a mean_test_score of 0.783.


### 3.2 Compared the performance of the networks <a id='MLP_3.2'></a> 

According to the tuning results for the MLP using the HOG descriptor as input, the best scores for each MLP are:

<b>1. MLP with 1 hidden layer</b>
* Mean_Score = 0.785
* Size of hidden layer: 1300
* Activation function = ReLU
* Learning Rate = 0.1

<b>2. MLP with 2 hidden layers</b>
* Mean_Score = 0.783
* Size of hidden layers: (900, 450)
* Activation function = ReLU
* Learning Rate = 0.05

The best architecture is the one with only 1 hidden layer. This architecture will be retrained on the whole training set and exported for evaluation on the test set in the next section.
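
Since the two mean scores differ by only about 0.002, model size is another way to compare the candidates. The sketch below counts weights and biases for fully connected layers, assuming the 1-hidden-layer network has the layout 5408 → 1300 → 10 (the definition of `MLP_3_HOG` is not repeated in this section, so that layout is an assumption) and the 2-hidden-layer network has the layout 5408 → 900 → 450 → 10 as in `MLP_4`. The helper `mlp_param_count` is hypothetical.

```python
def mlp_param_count(layer_sizes):
    """Total weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input and output included.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


one_hidden = mlp_param_count([5408, 1300, 10])        # assumed MLP_3_HOG layout
two_hidden = mlp_param_count([5408, 900, 450, 10])    # MLP_4 with (900, 450)

print('1 hidden layer :', one_hidden)
print('2 hidden layers:', two_hidden)
```

Under these assumptions the 1-hidden-layer network has more parameters than the 2-hidden-layer one, so the choice here rests on the slightly higher mean score and the simpler structure rather than on parameter count.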


In [43]:
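# Final 1-hidden-layer model, fitted on the full training set
# NOTE: module__hidden_dim is 700 here, whereas the architecture selected in 3.1.2 uses 1300 hidden nodes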
net_HOG_1 = NeuralNetClassifier(module = MLP_3_HOG,
                          module__hidden_dim = 700,
                          module__function = 1,
                          max_epochs = 500,
                          criterion = nn.CrossEntropyLoss,
                          lr = 0.1,
                          batch_size = 100,
                          callbacks= [EarlyStopping()],
                          device= device,
                          verbose = 1)


pipeline_1 = Pipeline(
    [
    ('hogify', HogTransformer(
        pixels_per_cell=(2, 2),
        cells_per_block=(2, 2),
        orientations=8)
     ),
    ('scalify', StandardScaler()),
        ('dataloader',dataTransform()),
    ('classify',net_HOG_1)
    ]
)

t0 = time.time()
pipeline_1.fit(X_train,y_train)
t1 =time.time()
pkl_filename = "MLP_hog.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(pipeline_1, file)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        2.2275       0.4458        2.0967  2.2530
      2        1.9743       0.6250        1.9105  2.1615
      3        1.7800       0.8037        1.7620  2.1785
      4        1.6398       0.8137        1.7133  2.2684
      5        1.5747       0.8150        1.6983  2.1817
      6        1.5400       0.8096        1.6919  2.1577
      7        1.5204       0.8063        1.6884  2.1714
      8        1.5086       0.8046        1.6862  2.1698
      9        1.5007       0.8054        1.6844  2.1809
     10        1.4954       0.8042        1.6828  2.1814
     11        1.4915       0.8029        1.6815  2.1751
     12        

In [44]:
print('Fitting time',t1-t0)

Fitting time 151.33455419540405
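
The train_loss column in the training log levels off near 1.49 instead of approaching 0. One possible explanation, assuming the module output is already a softmax (as in `MLP_4`) and is passed unchanged to `nn.CrossEntropyLoss`, is that the loss applies a second log-softmax on top of the probabilities. The sketch below computes, in plain Python, the smallest loss reachable in that situation for 10 classes; the function `cross_entropy_from_logits` is a hypothetical stand-in for the loss on a single sample.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Cross-entropy for one sample, applying log-softmax to `logits`
    (the operation nn.CrossEntropyLoss performs on its input)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


# Best case: the network outputs probability 1 for the correct class.
perfect_probs = [1.0] + [0.0] * 9

# Treating these probabilities as logits gives the lowest achievable loss.
floor = cross_entropy_from_logits(perfect_probs, target=0)
print(round(floor, 4))
```

The resulting floor equals ln(1 + 9/e), roughly 1.46, which is consistent with the plateau seen in the log. If this explanation holds, returning raw logits from `forward` would remove the floor, but that would also change the results reported in this section.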
