# MNIST digit classification with a Convolutional Neural Network (CNN)

***

**Goal:** In this notebook, we will train a CNN to classify handwritten digits from the MNIST dataset, using the Keras API with the TensorFlow backend.

**Dataset**: We will work with the MNIST dataset, which contains 70'000 greyscale images of handwritten digits, each 28x28 pixels (60'000 training and 10'000 test images). The task is to assign each image its correct label (0-9).

***

## Preparation and Imports

A prerequisite for this notebook is an installation of TensorFlow 2.x.

In [None]:
import tensorflow as tf
# check that TensorFlow 2.x is installed
if not tf.__version__.startswith('2'):
    print('Please install TensorFlow 2.x to run this notebook')
print('TensorFlow version:', tf.__version__)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('default')
from sklearn.metrics import confusion_matrix

import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten, Activation
from tensorflow.keras.utils import to_categorical 
from tensorflow.keras import optimizers

In [None]:
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# split the 60'000 training images into 50'000 for training (X_train) and 10'000 for validation (X_val); same for the labels
X_train=x_train[0:50000] / 255 #divide by 255 so that they are in range 0 to 1
Y_train=keras.utils.to_categorical(y_train[0:50000],10) # one-hot encoding

X_val=x_train[50000:60000] / 255
Y_val=keras.utils.to_categorical(y_train[50000:60000],10)

X_test=x_test / 255
Y_test=keras.utils.to_categorical(y_test,10)

del x_train, y_train, x_test, y_test

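# add a trailing channel axis: (n, 28, 28) -> (n, 28, 28, 1), the input format Convolution2D expects
X_train=np.reshape(X_train, (X_train.shape[0],28,28,1))
X_val=np.reshape(X_val, (X_val.shape[0],28,28,1))
X_test=np.reshape(X_test, (X_test.shape[0],28,28,1))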

print(X_train.shape)
print(X_val.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_val.shape)
print(Y_test.shape)
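The preprocessing above relies on two steps that are easy to check in isolation: one-hot encoding the labels (what `keras.utils.to_categorical` does here) and adding a trailing channel axis of size 1 for `Convolution2D`. The following is a minimal NumPy sketch on made-up data; the helper `one_hot` is hypothetical and only mirrors the behaviour of `to_categorical` for integer labels.

```python
import numpy as np

NUM_CLASSES = 10  # digits 0-9

def one_hot(labels, num_classes=NUM_CLASSES):
    """Hypothetical helper: return a (n, num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    encoded[np.arange(labels.shape[0]), labels] = 1.0  # set column `label` in each row
    return encoded

# three example labels instead of the real MNIST labels
y_example = np.array([5, 0, 4])
Y_example = one_hot(y_example)
print(Y_example)
print(np.argmax(Y_example, axis=1))  # recovers [5 0 4]

# a fake batch of three 28x28 greyscale images with pixel values 0-255
x_example = np.random.randint(0, 256, size=(3, 28, 28), dtype=np.uint8)
x_scaled = x_example.astype(np.float32) / 255.0   # values now in [0, 1]
x_with_channel = x_scaled[..., np.newaxis]         # (3, 28, 28) -> (3, 28, 28, 1)
print(x_with_channel.shape)
```

`np.argmax(..., axis=1)` is the inverse of the one-hot encoding, which is why the visualization and evaluation cells use it to recover the digit labels.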

To visualize the data, the code below shows the first 10 images of the training set together with their labels.

In [None]:
# visualize the first 10 MNIST training images with their labels
labels = np.argmax(Y_train, axis=1)  # undo the one-hot encoding
plt.figure(figsize=(12, 5))
for i in range(0, 2):
    for j in range(0, 5):
        idx = i * 5 + j  # image index 0..9
        plt.subplot(2, 5, idx + 1)
        plt.imshow(X_train[idx, :, :, 0], cmap="gray")
        plt.title('true label: ' + str(labels[idx]))  # np.str was removed in NumPy 1.24

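Running the following code creates a sequential model with the layers described in the hw5 task. Note that the final `Dense(10)` layer has no activation, so it outputs unbounded real values. Trained as is, this model gives extremely poor results (~12% accuracy, close to the 10% expected from random guessing).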

In [None]:
model = Sequential()

model.add(Convolution2D(32, (3, 3), padding = 'valid', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, (3, 3), padding = 'valid'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(10))

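Adding a `softmax` activation to the last layer leads to a much better result (~98% accuracy). The reason lies in the loss: `categorical_crossentropy` (with its default `from_logits=False`) treats the network outputs as class probabilities. The raw outputs of a linear `Dense` layer can be negative and do not sum to 1, so the resulting loss and gradients are not meaningful. Softmax, `softmax(z)_k = exp(z_k) / sum_j exp(z_j)`, maps the 10 outputs to a valid probability distribution (non-negative, summing to 1) without changing which class has the largest score. An alternative is to keep the linear output and compile with `keras.losses.CategoricalCrossentropy(from_logits=True)`.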

In [None]:
model = Sequential()

model.add(Convolution2D(32, (3, 3), padding = 'valid', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, (3, 3), padding = 'valid'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(10))
model.add(Activation('softmax'))
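The effect of the softmax layer can be seen without training anything. In this NumPy sketch, made-up numbers stand in for the raw outputs of the final `Dense(10)` layer for two images. The raw values are unbounded, while softmax turns each row into non-negative values summing to 1, which is what `categorical_crossentropy` expects, and it leaves the arg-max (the predicted digit) unchanged.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# hypothetical raw outputs of Dense(10) for two images (no activation)
logits = np.array([
    [2.0, -1.0, 0.5, 0.0, -3.0, 1.0, 0.0, -0.5, 3.0, 0.2],
    [-2.0, 4.0, 0.0, 1.0, 0.0, -1.0, 0.5, 0.0, -0.3, 0.1],
])

print(logits.min() < 0)      # raw outputs can be negative
print(logits.sum(axis=1))    # and their rows do not sum to 1

probs = softmax(logits)
print(probs.min() >= 0)      # softmax outputs are non-negative
print(probs.sum(axis=1))     # each row sums to 1
print(np.argmax(logits, axis=1), np.argmax(probs, axis=1))  # same predicted class
```

Subtracting the maximum does not change the result, because softmax is invariant to adding the same constant to every output of a row; it only keeps `np.exp` from overflowing for large values.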

## Compile and Train Model

In [None]:
# compile the model: choose the loss function, optimizer and metrics (the weights were already initialized when the layers were built)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
# summarize model along with number of model weights
model.summary()
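The output shapes and parameter counts reported by `model.summary()` can be checked by hand. A convolution has `kernel_h * kernel_w * in_channels * filters` weights plus one bias per filter, a `'valid'` 3x3 convolution shrinks each spatial side by 2, and a 2x2 max pooling halves it (dropping a remainder). Pooling, `Flatten` and `Activation` layers have no parameters. The sketch below does this arithmetic for the softmax model; the helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Spatial size after non-overlapping max pooling (remainder dropped)."""
    return size // pool

# parameters = kernel_h * kernel_w * in_channels * filters + filters (biases)
conv1_params = 3 * 3 * 1 * 32 + 32
conv2_params = 3 * 3 * 32 * 64 + 64

size = 28
size = pool_out(conv_out(size, 3), 2)   # 28 -> 26 -> 13
size = pool_out(conv_out(size, 3), 2)   # 13 -> 11 -> 5
flat = size * size * 64                 # length of the Flatten() output

dense_params = flat * 10 + 10           # weights + biases of Dense(10)
total = conv1_params + conv2_params + dense_params

print("conv1:", conv1_params)    # 320
print("conv2:", conv2_params)    # 18496
print("flatten:", flat)          # 1600
print("dense:", dense_params)    # 16010
print("total:", total)           # 34826
```

The total should match the "Total params" line of the summary for the model with the softmax activation.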

In [None]:
# train the model
history=model.fit(X_train, Y_train, 
                  batch_size=128, 
                  epochs=10,
                  verbose=2, 
                  validation_data=(X_val, Y_val)
                 )
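With `batch_size=128`, `model.fit` updates the weights once per mini-batch of 128 images, and by default (`shuffle=True`) it shuffles the training data before each epoch. The sketch below reproduces only the batching arithmetic for one epoch in NumPy; it is an illustration, not Keras' internal data pipeline.

```python
import math
import numpy as np

n_train = 50_000     # size of X_train above
batch_size = 128     # batch_size passed to model.fit

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)  # 391 80

# split one shuffled epoch into mini-batches of indices
rng = np.random.default_rng(0)
order = rng.permutation(n_train)
batches = [order[i:i + batch_size] for i in range(0, n_train, batch_size)]
print(len(batches), len(batches[-1]))  # 391 80
```

Each epoch therefore performs 391 weight updates (the last one on a smaller batch of 80 images), so 10 epochs amount to 3'910 updates.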

## Evaluation of the Model

### Evaluation: Accuracy and Loss Diagram

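To evaluate the training, we plot the accuracy and the loss over the epochs, for both the training and the validation set. The accuracy is the fraction of correctly classified images; the loss is the categorical cross-entropy averaged over the images. During training, the loss should decrease and the accuracy should increase. If the training curves keep improving while the validation curves stall or get worse, the network is starting to overfit.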
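Both curves are averages over images, and the loss also penalizes correct but unconfident predictions, so it can keep decreasing while the accuracy stays flat. This NumPy sketch computes both quantities for made-up predictions (3 images, 3 classes for brevity). The clipping constant `eps` is an assumption to avoid `log(0)`; Keras' own implementation may differ in such details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_prob[k])."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples whose arg-max prediction matches the label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

# made-up one-hot labels and predicted probabilities
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.8, 0.1, 0.1],    # correct, confident
                   [0.3, 0.6, 0.1],    # correct, less confident
                   [0.5, 0.3, 0.2]])   # wrong: predicts class 0

acc = accuracy(y_true, y_prob)                  # 2 of 3 correct
loss = categorical_crossentropy(y_true, y_prob) # -(ln 0.8 + ln 0.6 + ln 0.2) / 3
print(acc, loss)
```

Only the probability assigned to the true class enters the loss, which is why the one-hot labels act as a selector in the sum.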

In [None]:
# plot the development of the accuracy and loss during training
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.plot(history.history['accuracy'],linestyle='-.')
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='lower right')
plt.subplot(1,2,2)
plt.plot(history.history['loss'],linestyle='-.')
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'valid'], loc='upper right')

### Evaluation: Confusion Matrix

The confusion matrix gives a more detailed view of the network's performance than the accuracy alone. It is a square matrix with one row and one column per class (10x10 for MNIST). Entry (i, j) counts the test images whose true label is i and whose predicted label is j, which is the convention of `sklearn.metrics.confusion_matrix`. The diagonal therefore holds the correctly classified images, and each off-diagonal entry shows how often one digit was mistaken for another.
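As a sketch of that definition, the matrix can be built with plain NumPy; the labels below are made up (8 images, 3 classes) and the helper `confusion` is hypothetical. The accuracy is the trace divided by the total count, and dividing the diagonal by the row sums gives the fraction of each true class that was recognized (the per-class recall).

```python
import numpy as np

def confusion(true_labels, pred_labels, num_classes):
    """Entry [i, j] counts samples with true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (true_labels, pred_labels), 1)  # unbuffered, so repeated pairs accumulate
    return cm

# made-up labels for 8 images and 3 classes
true_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2])
pred_labels = np.array([0, 1, 1, 1, 2, 2, 2, 0])

cm = confusion(true_labels, pred_labels, num_classes=3)
print(cm)

accuracy = np.trace(cm) / cm.sum()                # correct predictions / all predictions
per_class_recall = np.diag(cm) / cm.sum(axis=1)   # correct / images of that true class
print(accuracy, per_class_recall)
```

`np.add.at` is used instead of `cm[true_labels, pred_labels] += 1`, because the latter would count a repeated (true, predicted) pair only once.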

In [None]:
pred = model.predict(X_test)          # class probabilities, shape (10000, 10)
y_true = np.argmax(Y_test, axis=1)    # undo the one-hot encoding
y_pred = np.argmax(pred, axis=1)      # predicted digit = most probable class
print(confusion_matrix(y_true, y_pred))
acc_fc_orig = np.sum(y_true == y_pred) / len(pred)
print("Test accuracy:", acc_fc_orig)