Applying a convolutional neural network (CNN) to the MNIST dataset. In this notebook, CNNs with different numbers of layers and different architectures are tried out, all using 5x5 kernels.

In [0]:
#loading all the necessary packages for data processing and plotting
import numpy as np
import pandas as pd
import plotly.express as px
import plotly as py
import plotly.graph_objects as go


In [0]:
#importing all the deep_learning packages
import tensorflow.keras 
from tensorflow.keras.layers import Dense,Activation,Conv2D,MaxPooling2D,Dropout,Flatten
from tensorflow.keras.models import Sequential
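# take the utils from tensorflow.keras (not standalone keras) so both packages are not mixed;
# the alias keeps the np_utils.to_categorical calls further down working
from tensorflow.keras import utils as np_utils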
from tensorflow.keras.datasets import mnist
from tensorflow.keras.initializers import RandomNormal,RandomUniform,he_normal,he_uniform
from tensorflow.keras import backend as k

Using TensorFlow backend.


In [0]:
#load the data and make the data ready to pass on to the architecture
(X_train,y_train),(X_test,y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [0]:
print(f"train data shape {X_train.shape}")
print(f"test data shape {X_test.shape}")
print(f"train data labels shape {y_train.shape}")
print(f"test data labels shape {y_test.shape}")

train data shape (60000, 28, 28)
test data shape (10000, 28, 28)
train data labels shape (60000,)
test data labels shape (10000,)


In [0]:
img_rows,img_cols = 28,28

if tensorflow.keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)


In [0]:
#normalizing the data
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train = (X_train-X_train.min())/(X_train.max()-X_train.min())
X_test = (X_test-X_test.min())/(X_test.max()-X_test.min())

print(f"train samples -- {X_train.shape[0]}")
print(f"test samples -- {X_test.shape[0]}")

print(f"train samples shape -- {X_train.shape}")
print(f"test samples shape -- {X_test.shape}")

train samples -- 60000
test samples -- 10000
train samples shape -- (60000, 28, 28, 1)
test samples shape -- (10000, 28, 28, 1)
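
The min-max scaling above maps the pixel values to the range [0, 1]. As a quick sanity check, the sketch below applies the same formula to a tiny synthetic array (a stand-in for MNIST, not the real data) and shows that, for 8-bit images whose minimum is 0 and maximum is 255, it gives the same result as dividing by 255. The helper name `min_max_scale` and the array `fake_images` are hypothetical and only serve the illustration.

```python
import numpy as np

def min_max_scale(x):
    """Scale an array linearly so that its values span [0, 1]."""
    x = x.astype("float32")
    return (x - x.min()) / (x.max() - x.min())

# Synthetic stand-in for 8-bit grayscale pixels (assumption: min is 0, max is 255).
fake_images = np.array([[0, 64], [128, 255]], dtype="uint8")

scaled = min_max_scale(fake_images)
same_as_div_255 = np.allclose(scaled, fake_images.astype("float32") / 255.0)
print(scaled.min(), scaled.max(), same_as_div_255)
```

Scaling the test set with its own minimum and maximum is harmless here because both splits span 0 to 255. For data where the ranges differ, the usual choice would be to reuse the statistics of the training set.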


In [0]:
#other important parameters
batch_size = 128
epochs = 20

In [0]:
# each image has an integer class label (0-9)
# convert each label into a 10-dimensional one-hot vector
# e.g. the label 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
# this conversion is needed for a softmax output trained with categorical_crossentropy
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
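
To make the one-hot conversion concrete, here is a minimal NumPy sketch of the same idea. It is not how `to_categorical` is implemented internally; the function `one_hot` and the labels in `sample_labels` are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype="int64")
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Row i gets a 1 in the column given by labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

sample_labels = [5, 0, 9]  # made-up labels for illustration
encoded = one_hot(sample_labels, 10)
print(encoded[0])
```

Taking `argmax` along the last axis recovers the original integer labels, which is also how the predicted class is usually read from a softmax output.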

In [0]:
#plot functions to generate loss and accuracy plots
no_of_epochs = list(range(1, epochs + 1))  # x-axis values 1..epochs (20 here)


def loss_plots(n_epochs,train_loss,validation_loss):

  fig = go.Figure()
  fig.add_trace(go.Scatter(x=n_epochs, y=train_loss,
                  mode='lines+markers',name='train_loss'))

  fig.add_trace(go.Scatter(x=n_epochs, y=validation_loss,
                  mode='lines+markers',name='validation_loss'))

  fig.update_layout(title='Train_loss and Validation_loss',xaxis_title='number of epochs',yaxis_title='Categorical Cross Entropy Loss')
  fig.show()

def accuracy_plots(n_epochs,train_accuracy,validation_accuracy):

  fig = go.Figure()
  fig.add_trace(go.Scatter(x=n_epochs, y=train_accuracy,
                  mode='lines+markers',name='train_accuracy'))

  fig.add_trace(go.Scatter(x=n_epochs, y=validation_accuracy,
                  mode='lines+markers',name='validation_accuracy'))

  fig.update_layout(title='Train_accuracy and Validation_accuracy',xaxis_title='number of epochs',yaxis_title='Accuracy')
  fig.show()

## Model - 1
Basic ConvNet architecture with only **3 convolutional layers** (no BatchNormalization or other complex layers), **without** **padding or MaxPooling layers, and with the default stride**.

In [0]:
model_1 = Sequential()
model_1.add(Conv2D(128,kernel_size=(5,5),
                   activation='relu',
                   input_shape=input_shape))
model_1.add(Conv2D(64,kernel_size=(5,5),
                   activation='relu'))
model_1.add(Conv2D(32,kernel_size=(5,5),
                   activation='relu'))


model_1.add(Dropout(0.25))
model_1.add(Flatten())
model_1.add(Dense(128,activation='relu'))
model_1.add(Dropout(0.25))
model_1.add(Dense(10,activation='softmax'))




In [0]:
model_1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 24, 24, 128)       3328      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 20, 20, 64)        204864    
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 16, 16, 32)        51232     
_________________________________________________________________
dropout (Dropout)            (None, 16, 16, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0         
_________________________________________________________________
dense (Dense)                (None, 128)               1048704   
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0
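
The parameter counts in the summary can be checked by hand. A Conv2D layer has one weight per kernel position and input channel, plus one bias, for every filter; a Dense layer has one weight per input plus one bias for every unit. The sketch below recomputes the numbers for this model; the helper names `conv2d_params` and `dense_params` are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias, for each filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """One weight per input plus one bias, for each output unit."""
    return (in_units + 1) * out_units

# The three 5x5 convolutions of model_1: 1 -> 128 -> 64 -> 32 channels.
conv_counts = [
    conv2d_params(5, 5, 1, 128),
    conv2d_params(5, 5, 128, 64),
    conv2d_params(5, 5, 64, 32),
]

# The last feature map is 16 x 16 x 32, which Flatten turns into 8192 values.
flat_units = 16 * 16 * 32
dense_count = dense_params(flat_units, 128)
print(conv_counts, flat_units, dense_count)
```

Most of the parameters sit in the first Dense layer, because without pooling the flattened feature map stays large.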

In [0]:
model_1.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [0]:
history = model_1.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [0]:
score = model_1.evaluate(X_test,Y_test,verbose=0)

In [0]:
print(f"Test score is {score[0]}")
print(f"Test accuracy is {score[1]}")

Test score is 0.03569596311414578
Test accuracy is 0.9927999973297119


In [0]:
validation_loss = history.history['val_loss']
train_loss = history.history['loss']
train_accuracy = history.history['acc']
validation_accuracy = history.history['val_acc']
loss_plots(no_of_epochs,train_loss,validation_loss)
accuracy_plots(no_of_epochs,train_accuracy,validation_accuracy)

**Observation**
*   The model overfits slightly, as both plots show.
*   The ideal number of epochs is probably around 6 or 7 for this architecture, because only at that stage are the train loss and the test loss close to each other.
*   This is a very basic architecture without batch normalization, max pooling, or padding. Let's look at further cases.
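
One simple way to turn "the ideal number of epochs" into a number is to pick the epoch with the lowest validation loss. The sketch below does this on made-up loss values, not the values from this run; in the notebook the list would be `history.history['val_loss']`. The function name `best_epoch` is hypothetical.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    lowest = min(val_losses)
    return val_losses.index(lowest) + 1

# Made-up validation losses for illustration only (not taken from this run).
example_val_loss = [0.060, 0.045, 0.038, 0.034, 0.031, 0.029, 0.030, 0.033, 0.036, 0.040]
print(best_epoch(example_val_loss))
```

Keras also offers an `EarlyStopping` callback that monitors `val_loss` and stops training once it no longer improves, which may be a more convenient option than choosing the epoch after the fact.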




## Model - 2 
A slightly more complex ConvNet architecture with **4 convolutional layers** (with BatchNormalization, MaxPooling, and slightly more complex layers), **without** **padding and with the default stride**. Only the 4 Conv2D layers are counted as ConvNet layers; the MaxPooling layers are counted separately because they work quite differently from a convolutional layer.

In [0]:
from tensorflow.keras.layers import BatchNormalization

In [0]:
model_2 = Sequential()
model_2.add(Conv2D(256,kernel_size=(5,5),
                   activation='relu',
                   input_shape=input_shape))
model_2.add(Conv2D(128,kernel_size=(5,5),
                   activation='relu'))
model_2.add(MaxPooling2D(pool_size=(2,2)))
model_2.add(Conv2D(64,kernel_size=(5,5),
                   activation='relu'))
model_2.add(Conv2D(32,kernel_size=(5,5),
                   activation='relu'))
model_2.add(MaxPooling2D(pool_size=(2,2)))


model_2.add(Flatten())
model_2.add(Dense(128,activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.25))
model_2.add(Dense(64,activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.25))
model_2.add(Dense(10,activation='softmax'))


In [0]:
model_2.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 24, 24, 256)       6656      
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 20, 20, 128)       819328    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 10, 10, 128)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 6, 6, 64)          204864    
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 2, 2, 32)          51232     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 32)               
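
The spatial sizes in the summary follow from two rules: an unpadded 5x5 convolution with stride 1 shrinks each side by 4, and a 2x2 max pooling halves it. The sketch below traces the side length through model_2, assuming the Keras defaults used here (`padding='valid'`, pooling stride equal to the pool size). The helper names are hypothetical.

```python
def valid_conv(size, kernel, stride=1):
    """Output size of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

def max_pool(size, pool):
    """Output size of unpadded max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1

size = 28
trace = [size]
for layer in ["conv", "conv", "pool", "conv", "conv", "pool"]:
    size = valid_conv(size, 5) if layer == "conv" else max_pool(size, 2)
    trace.append(size)

flatten_units = size * size * 32  # the last Conv2D in model_2 has 32 filters
print(trace, flatten_units)
```

By the end of the convolutional stack the feature map is a single pixel wide, so only 32 values reach the Dense layers.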

In [0]:
model_2.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [0]:
history = model_2.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [0]:
score = model_2.evaluate(X_test,Y_test,verbose=0)

In [0]:
print(f"Test score is {score[0]}")
print(f"Test accuracy is {score[1]}")

Test score is 0.03344524789099032
Test accuracy is 0.9925000071525574


In [0]:
validation_loss = history.history['val_loss']
train_loss = history.history['loss']
train_accuracy = history.history['acc']
validation_accuracy = history.history['val_acc']
loss_plots(no_of_epochs,train_loss,validation_loss)
accuracy_plots(no_of_epochs,train_accuracy,validation_accuracy)

**Observation**
*   This model also overfits, as both plots show.
*   As in the ResNet example, where performance got worse as the number of layers increased, the extra layers did not help here (test accuracy 0.9925 versus 0.9928 for Model 1). A better approach would be hyperparameter tuning to find a suitable number of epochs; the architecture and the kernel size, which are also hyperparameters, might be part of the issue as well.
*   The gap between the train and test losses fluctuates noticeably.


## Model - 3
A more complex ConvNet architecture with **6 convolutional layers** (with BatchNormalization and MaxPooling, and **with** **padding and the default stride**).

In [0]:
model_3 = Sequential()
model_3.add(Conv2D(512,kernel_size=(5,5),
                   activation='relu',
                   padding='same',
                   input_shape=input_shape))
model_3.add(Conv2D(128,kernel_size=(5,5),
                   activation='relu',
                   padding='same'))
model_3.add(MaxPooling2D(pool_size=(2,2)))
model_3.add(Conv2D(64,kernel_size=(5,5),
                   activation='relu',
                   padding='same'))
model_3.add(Conv2D(32,kernel_size=(5,5),
                   activation='relu',
                   padding='same'))
model_3.add(MaxPooling2D(pool_size=(2,2)))
model_3.add(Conv2D(16,kernel_size=(5,5),
                   activation='relu',
                   padding='same'))
model_3.add(Conv2D(32,kernel_size=(5,5),
                   activation='relu',
                   padding='same'))
model_3.add(MaxPooling2D(pool_size=(2,2)))

model_3.add(Flatten())
model_3.add(Dense(256,activation='relu'))
model_3.add(BatchNormalization())
model_3.add(Dropout(0.20))
model_3.add(Dense(128,activation='relu'))
model_3.add(BatchNormalization())
model_3.add(Dropout(0.20))
model_3.add(Dense(64,activation='relu'))
model_3.add(BatchNormalization())
model_3.add(Dropout(0.20))
model_3.add(Dense(10,activation='softmax'))


In [0]:
model_3.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_12 (Conv2D)           (None, 28, 28, 512)       13312     
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 28, 28, 128)       1638528   
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 14, 14, 64)        204864    
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 14, 14, 32)        51232     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 7, 7, 16)         
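
With `padding='same'` and stride 1, a convolution keeps the spatial size unchanged, so only the pooling layers shrink the feature map. The sketch below traces the side length through model_3 under the Keras defaults (pooling is unpadded, with stride equal to the pool size); the helper names are hypothetical.

```python
import math

def same_conv(size, stride=1):
    """Output size of a 'same'-padded convolution along one axis."""
    return math.ceil(size / stride)

def max_pool(size, pool):
    """Output size of unpadded max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1

size = 28
trace = [size]
# model_3 repeats the pattern conv, conv, pool three times.
for layer in ["conv", "conv", "pool"] * 3:
    size = same_conv(size) if layer == "conv" else max_pool(size, 2)
    trace.append(size)

flatten_units = size * size * 32  # the last Conv2D in model_3 has 32 filters
print(trace, flatten_units)
```

Because padding preserves the size, six 5x5 convolutions fit into a 28x28 input, which would not be possible without padding.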

In [0]:
model_3.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [0]:
history = model_3.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [0]:
score = model_3.evaluate(X_test,Y_test,verbose=0)

In [0]:
print(f"Test score is {score[0]}")
print(f"Test accuracy is {score[1]}")

Test score is 0.02247127885554364
Test accuracy is 0.9937999844551086


In [0]:
validation_loss = history.history['val_loss']
train_loss = history.history['loss']
train_accuracy = history.history['acc']
validation_accuracy = history.history['val_acc']
loss_plots(no_of_epochs,train_loss,validation_loss)
accuracy_plots(no_of_epochs,train_accuracy,validation_accuracy)

**Observation**:


1.   In the initial epochs there is a clear case of **overfitting**, visible as a large, unsteady gap between the train and validation curves in the loss and accuracy plots.

2.   Surprisingly, as the number of epochs increases, the loss drops very quickly, which also improves the accuracy. So this model is better than the others, because it keeps improving with more epochs (which is the ideal case).



**Final Observation**

Among all the architectures, the most complex one, **model_3, performed better than the others**, with the lowest test loss (0.0225) and the highest test accuracy (0.9938).
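
The comparison can be summarised in a few lines of Python. The numbers below are copied from the three evaluation outputs in this notebook; the dictionary layout and variable names are only one possible way to organise them.

```python
# Test loss and accuracy as printed by the three evaluation cells above.
results = {
    "model_1": {"loss": 0.03569596311414578, "accuracy": 0.9927999973297119},
    "model_2": {"loss": 0.03344524789099032, "accuracy": 0.9925000071525574},
    "model_3": {"loss": 0.02247127885554364, "accuracy": 0.9937999844551086},
}

best_by_accuracy = max(results, key=lambda name: results[name]["accuracy"])
best_by_loss = min(results, key=lambda name: results[name]["loss"])

for name, score in results.items():
    # Approximate number of misclassified images out of the 10,000 test images.
    misclassified = round((1 - score["accuracy"]) * 10000)
    print(f"{name}: loss={score['loss']:.4f} acc={score['accuracy']:.4f} errors~{misclassified}")
print(best_by_accuracy, best_by_loss)
```

The differences are small (a few dozen test images), so a single training run per architecture is weak evidence; repeating each run with different random seeds would make the ranking more reliable.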