Applying a convolutional neural network to the MNIST dataset (this notebook tries out CNNs with different numbers of layers and different architectures, all using 3x3 kernels)

In [0]:
#loading all the necessary packages for data processing and plotting
import numpy as np
import pandas as pd
import plotly.express as px
import plotly as py
import plotly.graph_objects as go


In [0]:
#importing all the deep_learning packages
import tensorflow.keras 
from tensorflow.keras.layers import Dense,Activation,Conv2D,MaxPooling2D,Dropout,Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras import utils as np_utils  # same to_categorical, without mixing standalone keras and tf.keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.initializers import RandomNormal,RandomUniform,he_normal,he_uniform
from tensorflow.keras import backend as k

Using TensorFlow backend.


In [0]:
#load the data and make the data ready to pass on to the architecture
(X_train,y_train),(X_test,y_test) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [0]:
print(f"train data shape {X_train.shape}")
print(f"test data shape {X_test.shape}")
print(f"train data labels shape {y_train.shape}")
print(f"test data labels shape {y_test.shape}")

train data shape (60000, 28, 28)
test data shape (10000, 28, 28)
train data labels shape (60000,)
test data labels shape (10000,)


In [0]:
img_rows,img_cols = 28,28

if tensorflow.keras.backend.image_data_format() == 'channels_first':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
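
The reshape above only adds a channel axis of length 1, and its position depends on the backend's image data format. A minimal NumPy sketch of the two layouts, using a small dummy batch (the array `batch` is a stand-in for illustration, not the MNIST data):

```python
import numpy as np

# Dummy stand-in for a batch of 5 grayscale 28x28 images (not real MNIST data).
batch = np.zeros((5, 28, 28), dtype="uint8")

# 'channels_last': the single grayscale channel is the final axis.
channels_last = batch.reshape(batch.shape[0], 28, 28, 1)

# 'channels_first': the channel axis comes right after the batch axis.
channels_first = batch.reshape(batch.shape[0], 1, 28, 28)

# np.expand_dims gives the same result as the reshape for the last-axis case.
also_last = np.expand_dims(batch, axis=-1)

print(channels_last.shape, channels_first.shape, also_last.shape)
```

Because the added axis has length 1, no pixel values move; only the shape that the first `Conv2D` layer sees changes.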


In [0]:
#normalizing the data
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

X_train = (X_train-X_train.min())/(X_train.max()-X_train.min())
X_test = (X_test-X_test.min())/(X_test.max()-X_test.min())

print(f"train samples -- {X_train.shape[0]}")
print(f"test samples -- {X_test.shape[0]}")

print(f"train samples shape -- {X_train.shape}")
print(f"test samples shape -- {X_test.shape}")

train samples -- 60000
test samples -- 10000
train samples shape -- (60000, 28, 28, 1)
test samples shape -- (10000, 28, 28, 1)
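
The min-max scaling above can be checked on a toy array. This is a sketch that assumes 8-bit grayscale pixels whose minimum is 0 and maximum is 255, in which case min-max scaling coincides with the common "divide by 255" shortcut; `min_max_scale` is a hypothetical helper name:

```python
import numpy as np

def min_max_scale(x):
    """Rescale an array to the [0, 1] range using its own min and max."""
    x = x.astype("float32")
    return (x - x.min()) / (x.max() - x.min())

# Toy stand-in for pixel data; assumed to span the full 0..255 range.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

scaled = min_max_scale(pixels)

# With min == 0 and max == 255 the result equals pixels / 255.
same_as_div = np.allclose(scaled, pixels.astype("float32") / 255.0)
print(scaled.min(), scaled.max(), same_as_div)
```

Note that the train and test sets are each scaled with their own min and max; the two agree only when both sets span the same range.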


In [0]:
#other important parameters
batch_size = 128
epochs = 20
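
As a rough guide to what these two values imply, the number of weight updates can be worked out by hand. The sketch assumes the final, smaller batch of each epoch is kept rather than dropped:

```python
import math

num_train_samples = 60000  # training-set size printed earlier in the notebook
batch_size = 128
epochs = 20

# Assumption: the last partial batch in each epoch is still used.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
last_batch_size = num_train_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)
```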

In [0]:
# each image has an integer class label (0-9)
# convert each label into a 10-dimensional one-hot vector
# ex: the label 5 becomes [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
# the categorical_crossentropy loss used below expects targets in this form
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
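
For intuition, the one-hot conversion can be sketched in plain NumPy. The helper `one_hot` is hypothetical and only meant to mirror what `to_categorical` does for integer labels:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer class labels to rows of an identity matrix."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot([5, 0, 9], 10)
print(encoded[0])

# argmax along the class axis recovers the original integer labels.
decoded = encoded.argmax(axis=1)
print(decoded)
```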

In [0]:
#plot functions to generate loss and accuracy plots
#x-axis values follow the `epochs` setting instead of a hard-coded 20
no_of_epochs = list(range(1, epochs + 1))


def loss_plots(n_epochs,train_loss,validation_loss):

  fig = go.Figure()
  fig.add_trace(go.Scatter(x=n_epochs, y=train_loss,
                  mode='lines+markers',name='train_loss'))

  fig.add_trace(go.Scatter(x=n_epochs, y=validation_loss,
                  mode='lines+markers',name='validation_loss'))

  fig.update_layout(title='Train_loss and Validation_loss',xaxis_title='number of epochs',yaxis_title='Categorical Cross Entropy Loss')
  fig.show()

def accuracy_plots(n_epochs,train_accuracy,validation_accuracy):

  fig = go.Figure()
  fig.add_trace(go.Scatter(x=n_epochs, y=train_accuracy,
                  mode='lines+markers',name='train_accuracy'))

  fig.add_trace(go.Scatter(x=n_epochs, y=validation_accuracy,
                  mode='lines+markers',name='validation_accuracy'))

  fig.update_layout(title='Train_accuracy and Validation_accuracy',xaxis_title='number of epochs',yaxis_title='Accuracy')
  fig.show()

## Model - 1
A basic ConvNet with only **3 convolutional layers**, no BatchNormalization or other complex layers, **without padding or MaxPooling**, and with the default stride.

In [0]:
model_1 = Sequential()
model_1.add(Conv2D(128,kernel_size=(3,3),
                   activation='relu',
                   input_shape=input_shape))
model_1.add(Conv2D(64,kernel_size=(3,3),
                   activation='relu'))
model_1.add(Conv2D(32,kernel_size=(3,3),
                   activation='relu'))


model_1.add(Dropout(0.25))
model_1.add(Flatten())
model_1.add(Dense(128,activation='relu'))
model_1.add(Dropout(0.25))
model_1.add(Dense(10,activation='softmax'))


Instructions for updating:
If using Keras pass *_constraint arguments to layers.


In [0]:
model_1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 128)       1280      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 64)        73792     
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 22, 22, 32)        18464     
_________________________________________________________________
dropout (Dropout)            (None, 22, 22, 32)        0         
_________________________________________________________________
flatten (Flatten)            (None, 15488)             0         
_________________________________________________________________
dense (Dense)                (None, 128)               1982592   
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0
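
The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes 3x3 kernels, stride 1, no padding, and one bias per filter or unit; the helper names are hypothetical:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter (kernel_h * kernel_w * in_channels) plus one bias, times filters."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """One weight per input unit plus one bias, for each output unit."""
    return (in_units + 1) * out_units

def valid_conv_size(size, kernel=3):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1

# Spatial size shrinks by 2 per 3x3 convolution: 28 -> 26 -> 24 -> 22.
sizes = [28]
for _ in range(3):
    sizes.append(valid_conv_size(sizes[-1]))

conv_counts = [
    conv2d_params(3, 3, 1, 128),   # conv2d
    conv2d_params(3, 3, 128, 64),  # conv2d_1
    conv2d_params(3, 3, 64, 32),   # conv2d_2
]
flat_units = sizes[-1] * sizes[-1] * 32      # size of the Flatten output
dense_count = dense_params(flat_units, 128)  # first Dense layer

print(sizes, conv_counts, flat_units, dense_count)
```

Almost all of the parameters sit in the first `Dense` layer, because no pooling reduces the 22x22x32 feature map before it is flattened.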

In [0]:
model_1.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [0]:
history = model_1.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [0]:
score = model_1.evaluate(X_test,Y_test,verbose=0)

In [0]:
print(f"Test score is {score[0]}")
print(f"Test accuracy is {score[1]}")

Test score is 0.03980802101993977
Test accuracy is 0.9914000034332275


In [0]:
validation_loss = history.history['val_loss']
train_loss = history.history['loss']
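# the accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
train_accuracy = history.history.get('accuracy', history.history.get('acc'))
validation_accuracy = history.history.get('val_accuracy', history.history.get('val_acc'))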
loss_plots(no_of_epochs,train_loss,validation_loss)
accuracy_plots(no_of_epochs,train_accuracy,validation_accuracy)

**Observation**
*   The model overfits slightly, as both plots show.
*   Two or three epochs are probably ideal for this architecture, because only at that stage are the train loss and the test loss close to each other.
*   This is a very basic architecture, without batch normalization, max pooling, or padding. Let's look at further cases.
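
One simple way to pick the number of epochs is to take the epoch with the lowest validation loss. The sketch below uses a hypothetical helper, `best_epoch`, and a made-up loss curve for illustration only (these are not the notebook's values). Keras also ships an `EarlyStopping` callback that can monitor `val_loss` during training, which is likely the more practical route.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch at which validation loss is lowest."""
    best_idx = 0
    for i, loss in enumerate(val_loss):
        if loss < val_loss[best_idx]:
            best_idx = i
    return best_idx + 1

# Hypothetical validation-loss curve, NOT taken from the training run above.
toy_val_loss = [0.060, 0.041, 0.038, 0.040, 0.045, 0.052]

chosen = best_epoch(toy_val_loss)
print(chosen)
```

With a real run, `history.history['val_loss']` could be passed in place of the toy list.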




## Model - 2 
A slightly more complex ConvNet with **5 convolutional layers**, with BatchNormalization and MaxPooling, **without padding** and with the default stride.

In [0]:
from tensorflow.keras.layers import BatchNormalization

In [0]:
model_2 = Sequential()
model_2.add(Conv2D(256,kernel_size=(3,3),
                   activation='relu',
                   input_shape=input_shape))
model_2.add(Conv2D(128,kernel_size=(3,3),
                   activation='relu'))
model_2.add(MaxPooling2D(pool_size=(2,2)))
model_2.add(Conv2D(64,kernel_size=(3,3),
                   activation='relu'))
model_2.add(Conv2D(32,kernel_size=(3,3),
                   activation='relu'))
model_2.add(MaxPooling2D(pool_size=(2,2)))
model_2.add(Conv2D(16,kernel_size=(3,3),
                   activation='relu'))

model_2.add(Flatten())
model_2.add(Dense(128,activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.25))
model_2.add(Dense(64,activation='relu'))
model_2.add(BatchNormalization())
model_2.add(Dropout(0.25))
model_2.add(Dense(10,activation='softmax'))




In [0]:
model_2.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 256)       2560      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 128)       295040    
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 128)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 10, 10, 64)        73792     
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 8, 8, 32)          18464     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 4, 32)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 2, 2, 16)          4

In [0]:
model_2.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [0]:
history = model_2.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [0]:
score = model_2.evaluate(X_test,Y_test,verbose=0)

In [0]:
print(f"Test score is {score[0]}")
print(f"Test accuracy is {score[1]}")

Test score is 0.030873713541904362
Test accuracy is 0.9925000071525574


In [0]:
validation_loss = history.history['val_loss']
train_loss = history.history['loss']
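# the accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
train_accuracy = history.history.get('accuracy', history.history.get('acc'))
validation_accuracy = history.history.get('val_accuracy', history.history.get('val_acc'))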
loss_plots(no_of_epochs,train_loss,validation_loss)
accuracy_plots(no_of_epochs,train_accuracy,validation_accuracy)

**Observation**
*   This model also overfits, as both plots show.
*   As the ResNet example showed, adding more layers to a plain network does not by itself improve performance and can even make it worse. A better approach is hyperparameter tuning to find a suitable number of epochs; the architecture and the kernel size, which are also hyperparameters, might be part of the issue as well.
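
A hyperparameter search of the kind mentioned above starts by enumerating candidate configurations. The sketch below only builds the grid; the parameter names and values are assumptions chosen for illustration, and no training is run:

```python
from itertools import product

# Hypothetical search space; none of these values come from the notebook's results.
grid = {
    "num_conv_layers": [3, 5, 6],
    "kernel_size": [3, 5],
    "epochs": [5, 10, 20],
}

# One dict per combination of values, in the order the grid keys were defined.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

print(len(configs))
print(configs[0])
```

Each configuration would then be trained and scored on held-out data, keeping the one with the best validation metric.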


## Model - 3
A more complex ConvNet with **6 convolutional layers**, with BatchNormalization and MaxPooling, **with padding** and the default stride.

In [0]:
model_3 = Sequential()
model_3.add(Conv2D(512,kernel_size=(3,3),
                   activation='relu',
                   padding='same',
                   input_shape=input_shape))
model_3.add(Conv2D(128,kernel_size=(3,3),
                   activation='relu',
                   padding='same'))
model_3.add(MaxPooling2D(pool_size=(2,2)))
model_3.add(Conv2D(64,kernel_size=(3,3),
                   activation='relu',
                   padding='same'))
model_3.add(Conv2D(32,kernel_size=(3,3),
                   activation='relu',
                   padding='same'))
model_3.add(MaxPooling2D(pool_size=(2,2)))
model_3.add(Conv2D(16,kernel_size=(3,3),
                   activation='relu',
                   padding='same'))
model_3.add(Conv2D(32,kernel_size=(3,3),
                   activation='relu',
                   padding='same'))
model_3.add(MaxPooling2D(pool_size=(2,2)))

model_3.add(Flatten())
model_3.add(Dense(256,activation='relu'))
model_3.add(BatchNormalization())
model_3.add(Dropout(0.20))
model_3.add(Dense(128,activation='relu'))
model_3.add(BatchNormalization())
model_3.add(Dropout(0.20))
model_3.add(Dense(64,activation='relu'))
model_3.add(BatchNormalization())
model_3.add(Dropout(0.20))
model_3.add(Dense(10,activation='softmax'))


In [0]:
model_3.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_5 (Conv2D)            (None, 28, 28, 512)       5120      
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 28, 28, 128)       589952    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 14, 14, 64)        73792     
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 14, 14, 32)        18464     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 7, 7, 16)         
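
The effect of padding on the feature-map size can be traced by hand for model_2 (no padding) and model_3 (`padding='same'`). This is a sketch assuming 3x3 kernels, stride 1, and 2x2 pooling that rounds down; the helper names are hypothetical:

```python
def conv_out(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution."""
    if padding == "same":
        return size           # zero padding keeps the size unchanged
    return size - kernel + 1  # 'valid': no padding, the map shrinks

def pool_out(size, pool=2):
    """Spatial size after non-overlapping pooling (rounds down)."""
    return size // pool

def trace(size, layers, padding):
    """Return the spatial size after each layer, starting with the input size."""
    sizes = [size]
    for kind in layers:
        size = conv_out(size, padding=padding) if kind == "conv" else pool_out(size)
        sizes.append(size)
    return sizes

model_2_layers = ["conv", "conv", "pool", "conv", "conv", "pool", "conv"]
model_3_layers = ["conv", "conv", "pool", "conv", "conv", "pool", "conv", "conv", "pool"]

model_2_sizes = trace(28, model_2_layers, padding="valid")
model_3_sizes = trace(28, model_3_layers, padding="same")

print(model_2_sizes)
print(model_3_sizes)
```

Without padding, model_2 is down to a 2x2 map after its fifth convolution, so it could not take many more 3x3 layers; with `'same'` padding only the pooling layers reduce the size.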

In [0]:
model_3.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [0]:
history = model_3.fit(X_train,Y_train,batch_size=batch_size,epochs=epochs,validation_data=(X_test,Y_test))

Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [0]:
score = model_3.evaluate(X_test,Y_test,verbose=0)

In [0]:
print(f"Test score is {score[0]}")
print(f"Test accuracy is {score[1]}")

Test score is 0.02541501125908253
Test accuracy is 0.9937000274658203


In [0]:
validation_loss = history.history['val_loss']
train_loss = history.history['loss']
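# the accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones
train_accuracy = history.history.get('accuracy', history.history.get('acc'))
validation_accuracy = history.history.get('val_accuracy', history.history.get('val_acc'))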
loss_plots(no_of_epochs,train_loss,validation_loss)
accuracy_plots(no_of_epochs,train_accuracy,validation_accuracy)

**Final Observation**

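By the reported test metrics, the more complex architectures did slightly better than the basic one: **model_3 reached the highest test accuracy (0.9937) and the lowest test loss (0.0254)**, followed by model_2 (0.9925, 0.0309) and model_1 (0.9914, 0.0398). The gaps are small, about 0.2 percentage points between the best and the worst model, and they may depend on hyperparameters such as the number of layers, kernel size, padding, stride, and many more.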
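
The reported test metrics of the three models can be put side by side with a few lines of Python. The numbers are copied from the evaluation outputs above; the dictionary layout is just one convenient way to arrange them:

```python
# Test loss and accuracy as printed by the three evaluation cells above.
reported = {
    "model_1": {"loss": 0.03980802101993977, "accuracy": 0.9914000034332275},
    "model_2": {"loss": 0.030873713541904362, "accuracy": 0.9925000071525574},
    "model_3": {"loss": 0.02541501125908253, "accuracy": 0.9937000274658203},
}

# Rank models from highest to lowest test accuracy.
ranked = sorted(reported, key=lambda name: reported[name]["accuracy"], reverse=True)

# Error rate in percent, which makes the small gaps easier to read.
error_rates = {name: (1 - m["accuracy"]) * 100 for name, m in reported.items()}

for name in ranked:
    print(f"{name}: error={error_rates[name]:.2f}%  loss={reported[name]['loss']:.4f}")
```

Since the test set was also used as validation data during training, these figures are best read as a rough comparison rather than an unbiased estimate.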

---

**IMPORTANT NOTES**


1.   **Question:** If I scramble an image (reorder its pixels, for example to secure the image) and then train a CNN on the scrambled version, will the CNN learn the same things as if it were trained on the original image?

     **Answer by Applied AI Course:**
     Very good question. With some forms of scrambling, such as moving parts of the image around, CNNs can still learn the classes of the original images well. But if you completely randomize the pixels, it would be significantly harder. Geoff Hinton designed a new type of model called capsule networks to work around this limitation of CNNs: they do not respect the relative positions and geometry of objects in an image.



2. **Question:** Why is max pooling necessary in convolutional neural networks? Most common convolutional neural networks contain pooling layers to reduce the dimensions of the output features. Why couldn't I achieve the same thing by simply increasing the stride of the convolutional layer? What makes the pooling layer necessary?

   **Answer:**
   You can indeed do that; see [Striving for Simplicity: The All Convolutional Net](https://arxiv.org/abs/1412.6806). Pooling gives you some amount of translation invariance, which may or may not be helpful. Also, pooling is faster to compute than convolutions. Still, you can always try replacing pooling with a strided convolution and see what works better.
   Some current works use average pooling ([Wide Residual Networks](https://arxiv.org/abs/1605.07146), [DenseNets](https://arxiv.org/abs/1608.06993)); others use convolution with stride ([DelugeNets](https://arxiv.org/abs/1611.05552)).

   Source: https://stats.stackexchange.com/questions/288261/why-is-max-pooling-necessary-in-convolutional-neural-networks
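
The difference between pooling and striding discussed in the second answer can be seen on a tiny example. The sketch below compares 2x2 max pooling with plain stride-2 subsampling, which is what a stride-2 convolution reduces to when its kernel is a 1x1 identity; a learned strided convolution would of course do more than this. The helper names are hypothetical:

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2-D array (odd edges are cropped)."""
    h, w = x.shape
    cropped = x[: h // 2 * 2, : w // 2 * 2]
    return cropped.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def stride_2_subsample(x):
    """Keep every second row and column, discarding the rest."""
    return x[::2, ::2]

# Two 4x4 images with a single bright pixel, shifted by one pixel diagonally.
a = np.zeros((4, 4), dtype="float32")
a[0, 0] = 1.0
b = np.zeros((4, 4), dtype="float32")
b[1, 1] = 1.0

# Both operations halve each spatial dimension.
pooled_shape = max_pool_2x2(a).shape
strided_shape = stride_2_subsample(a).shape

# Max pooling gives the same output for both images; subsampling does not.
pool_invariant = np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
stride_invariant = np.array_equal(stride_2_subsample(a), stride_2_subsample(b))

print(pooled_shape, strided_shape, pool_invariant, stride_invariant)
```

The small shift stays inside one pooling window, so max pooling absorbs it, while subsampling drops the shifted pixel entirely. This is the "some amount of translation invariance" the answer refers to.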




