# MNIST digit recognition

In this notebook I use Keras to build two networks for digit recognition. The first is a simple feed-forward network. The second adds convolutions (filters and max-pooling layers) and some regularization. The idea is to play around with Keras and see how much better a CNN does.

* Download and flatten the MNIST data set and prepare training and test subsets
* Create a simple feed-forward network with two hidden layers
* Create a CNN network according to the following architecture:
    * A convolutional layer with 32 filters of size 3 × 3, with a ReLU activation
    * A max pooling layer with size 2 × 2
    * A convolutional layer with 64 filters of size 3 × 3, with ReLU activation
    * A max pooling layer with size 2 × 2
    * A flatten layer
    * A fully connected layer with 128 neurons, with ReLU activation
    * A dropout layer with drop probability 0.5
    * A fully-connected layer with 10 neurons with softmax
* Compare accuracies on test data

## Imports and data loading

In [232]:
# Imports
import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.optimizers import SGD, Adam
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D, Input
from keras.utils import to_categorical
from keras.callbacks import Callback
from keras.datasets import mnist
from keras import backend as K

import numpy as np
tf.config.run_functions_eagerly(True) # needed for cnn network. Found on stackoverflow

In [233]:
# Getting the MNIST data set:
def get_mnist(flatten=True):
    '''
        load MNIST data using keras datasets, scale the pixels and one-hot encode the labels
    '''
    (X_train, y_train), (X_test, y_test) = mnist.load_data()

    # scale pixel values to the range [0, 1]:
    X_train = X_train/255
    X_test = X_test/255
    
    if flatten:
        X_train = X_train.reshape((X_train.shape[0], X_train.shape[1]*X_train.shape[2]))
        X_test = X_test.reshape((X_test.shape[0], X_test.shape[1]*X_test.shape[2]))
    
    y_train = to_categorical(y_train, 10)
    y_test  = to_categorical(y_test, 10)
    return X_train, y_train, X_test, y_test 


def shifted(X, shift):
    '''
        pad each image to (size + shift) x (size + shift) and place the original at a random offset
    '''
    n = X.shape[0]
    m = X.shape[1]
    size = m + shift
    X_sh = np.zeros((n, size, size))
    for i in range(n):
        sh1 = np.random.randint(shift)
        sh2 = np.random.randint(shift)
        X_sh[i, sh1:sh1+m, sh2:sh2+m] = X[i, :, :]
    return X_sh

data = get_mnist()
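
To make the preprocessing concrete, here is a minimal NumPy sketch of the same three steps (scaling, flattening, one-hot encoding) on toy arrays. The `one_hot` helper and the toy inputs are my own illustrations, not part of Keras; `to_categorical` is only assumed to produce the same kind of encoding.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Toy stand-in for two 28 x 28 uint8 images (hypothetical, not real MNIST data).
toy_images = np.full((2, 28, 28), 255, dtype=np.uint8)
toy_labels = [3, 0]

scaled = toy_images / 255                   # pixel values now lie in [0, 1]
flat = scaled.reshape(scaled.shape[0], -1)  # (2, 784), like the flatten=True branch
targets = one_hot(toy_labels, 10)           # (2, 10), one row per label

print(flat.shape, targets.shape)
```

Each row of `targets` has its 1 at the index of the digit, which is the form that the `categorical_crossentropy` loss expects.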

## Make a two-layer fully connected feed-forward network
The network architecture will be as follows:

[flat input]--->[512 units with ReLU activation]--->[256 units with ReLU activation]--->[10 units with softmax]

In [234]:
# Defining the model
layers = [
    Dense(input_dim=28**2, units=512, activation="relu"),
    Dense(units=256, activation="relu"),
    Dense(units = 10, activation='softmax')
]

model = Sequential(layers)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=["accuracy"])
model.summary()

Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_26 (Dense)            (None, 512)               401920    
                                                                 
 dense_27 (Dense)            (None, 256)               131328    
                                                                 
 dense_28 (Dense)            (None, 10)                2570      
                                                                 
Total params: 535,818
Trainable params: 535,818
Non-trainable params: 0
_________________________________________________________________
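
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch (the helper name `dense_params` is mine):

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# input size, two hidden layers, output layer
layer_sizes = [28 * 28, 512, 256, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)  # [401920, 131328, 2570] 535818
```

Almost all of the parameters sit in the first layer, because every one of the 784 input pixels is connected to each of the 512 hidden units.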


In [235]:
# function that will run the model a few times and return the average test accuracy
def run_model(model, data, batch_size, epochs, split, verbose, num_trials):
    
    X_train, y_train, X_test, y_test = data

    global_test_acc = 0
    
    for i in range(num_trials):
        print('trial number: {} --------------------------'.format(i+1))
        model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=split, verbose=verbose)
        test_loss, test_acc = model.evaluate(X_test, y_test, batch_size=batch_size, verbose=0)
        
        global_test_acc += test_acc
        
        # reset the weights after each trial except the last, as I want to keep a trained model
        # (the optimizer state is not reset here)
        if i < num_trials - 1:
            for layer in model.layers:
                if hasattr(layer, 'kernel_initializer') and hasattr(layer, 'bias_initializer'):
                    old_weights, old_biases = layer.get_weights()
                    # draw fresh values from the layer's own initializers
                    layer.set_weights([
                        layer.kernel_initializer(shape=old_weights.shape),
                        layer.bias_initializer(shape=old_biases.shape)])
    print('\nThe test accuracy in {} trials is {:.3f}'.format(num_trials, global_test_acc/num_trials))
    return global_test_acc/num_trials

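I will now train the network for one epoch per trial, over three trials, with a batch size of 32, and look at its accuracy. I also use split=0.1, so that in each epoch the network trains on 90% of the training data and is validated on the remaining 10%. Comparing the training and validation accuracy shows whether the network is overfitting.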

In [236]:
ff_network_acc = run_model(
                    model,
                    data, 
                    batch_size=32,
                    epochs=1,
                    split=0.1,
                    verbose=2,
                    num_trials=3
                )


trial number: 1 --------------------------
1688/1688 - 73s - loss: 0.1985 - accuracy: 0.9395 - val_loss: 0.0889 - val_accuracy: 0.9735 - 73s/epoch - 43ms/step
trial number: 2 --------------------------
1688/1688 - 69s - loss: 0.1901 - accuracy: 0.9412 - val_loss: 0.0949 - val_accuracy: 0.9707 - 69s/epoch - 41ms/step
trial number: 3 --------------------------
1688/1688 - 54s - loss: 0.1865 - accuracy: 0.9431 - val_loss: 0.0868 - val_accuracy: 0.9733 - 54s/epoch - 32ms/step

The test accuracy in 3 trials is 0.969
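
The `1688/1688` in the log follows from the validation split and the batch size. Here is a small sketch of that arithmetic, assuming the 60,000 MNIST training images and assuming that the training part is the floor of 90% of the samples (the Keras documentation states that the validation samples are taken from the end of the arrays, before shuffling; the exact rounding is my assumption):

```python
import math

n_samples = 60000        # size of the MNIST training set
validation_split = 0.1
batch_size = 32

n_train = math.floor(n_samples * (1.0 - validation_split))  # samples used for fitting
n_val = n_samples - n_train                                 # held-out samples
steps = math.ceil(n_train / batch_size)                     # batches per epoch

print(n_train, n_val, steps)  # 54000 6000 1688
```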


This network seems to do a pretty good job, with an accuracy of 97%. Just to convince myself that this is really the case, let me select 5 test digits at random and see if the network gets them all right. Assuming the predictions are independent, it should do so with the following probability:

In [237]:
def sanity_check(model,data, num,ff_network_acc):
    prob = ff_network_acc**num
    print('The probability of making no mistakes in predicting {} random digits is {}'.format(num, prob))
    print()

    x,y = (data[2], data[3])
    ind = np.random.choice(range(x.shape[0]), size=num, replace=False, p=None)
    mistakes = 0
    for i in ind:
        true_val = np.argmax(y[i])
        network_pred = np.argmax(model.predict(x[[i],:]))
        print('The digit under {} is {}'.format(i, true_val))
        print('The network prediction is  {}'.format(network_pred))
        print()
        if int(true_val) != int(network_pred):
            mistakes += 1

    print('The network made {} mistakes in {} randomly chosen digits'.format(mistakes, num))
    

sanity_check(model,data,5 ,ff_network_acc)


The probability of making no mistakes in predicting 5 random digits is 0.8560814684790844

The digit under 9538 is 4
The network prediction is  4

The digit under 7963 is 8
The network prediction is  8

The digit under 1914 is 8
The network prediction is  8

The digit under 8686 is 5
The network prediction is  5

The digit under 1428 is 9
The network prediction is  9

The network made 0 mistakes in 5 randomly chosen digits
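
The probability printed by `sanity_check` is the `k = 0` case of a binomial distribution. As a rough sketch, and again assuming that the predictions are independent and that each is correct with probability equal to the test accuracy, the chance of exactly `k` mistakes can be tabulated like this (`prob_k_mistakes` is a hypothetical helper, and 0.969 is the rounded accuracy from the run above):

```python
from math import comb

def prob_k_mistakes(accuracy, num, k):
    """Binomial probability of exactly k wrong predictions out of num."""
    return comb(num, k) * (1 - accuracy) ** k * accuracy ** (num - k)

accuracy = 0.969  # rounded average test accuracy
num = 5

for k in range(num + 1):
    print('{} mistakes: {:.4f}'.format(k, prob_k_mistakes(accuracy, num, k)))
```

With these numbers, a run with no mistakes happens roughly 85% of the time, so a single clean run of 5 digits is consistent with the measured accuracy but is only weak evidence for it.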


## Convolutional Neural Network

The fully connected feed-forward network does a pretty good job, but I want to see if a CNN can do a little better. I am going to use the following architecture, which is not the result of my own trial and error but was suggested in a homework assignment (MIT course 6.036):
* A convolutional layer with 32 filters of size 3 × 3, with a ReLU activation
* A max pooling layer with size 2 × 2
* A convolutional layer with 64 filters of size 3 × 3, with ReLU activation
* A max pooling layer with size 2 × 2
* A flatten layer
* A fully connected layer with 128 neurons, with ReLU activation
* A dropout layer with drop probability 0.5
* A fully-connected layer with 10 neurons with softmax




In [240]:
layers_cnn = [
            Conv2D(input_shape = (28,28,1), filters=32 , kernel_size=(3, 3), activation='relu'),
            MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
            Conv2D(filters=64 , kernel_size=(3, 3), activation='relu'),
            MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
            Flatten(),
            Dense(units = 128, activation='relu'),
            Dropout(0.5),
            Dense(units = 10, activation='softmax')
]

model_cnn = Sequential(layers_cnn)
model_cnn.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=["accuracy"])
model_cnn.summary()

Model: "sequential_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_19 (Conv2D)          (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_16 (MaxPoolin  (None, 25, 25, 32)       0         
 g2D)                                                            
                                                                 
 conv2d_20 (Conv2D)          (None, 23, 23, 64)        18496     
                                                                 
 max_pooling2d_17 (MaxPoolin  (None, 22, 22, 64)       0         
 g2D)                                                            
                                                                 
 flatten_8 (Flatten)         (None, 30976)             0         
                                                                 
 dense_31 (Dense)            (None, 128)             
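
The output shapes and parameter counts in the summary can be reproduced with a little arithmetic. The sketch below uses helper names of my own and assumes `'valid'` padding throughout. It also shows the effect of the `strides=(1, 1)` argument used above: the Keras default for `MaxPooling2D` is a stride equal to the pool size, which would shrink the feature maps much faster.

```python
def valid_out(size, window, stride):
    """Output length along one axis for a 'valid' convolution or pooling."""
    return (size - window) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def flattened_size(input_size, pool_stride):
    size = valid_out(input_size, 3, 1)      # Conv2D, 3 x 3 kernel
    size = valid_out(size, 2, pool_stride)  # MaxPooling2D, 2 x 2 window
    size = valid_out(size, 3, 1)            # Conv2D, 3 x 3 kernel
    size = valid_out(size, 2, pool_stride)  # MaxPooling2D, 2 x 2 window
    return size, size * size * 64           # 64 filters in the last conv layer

print(conv_params(3, 1, 32), conv_params(3, 32, 64))  # 320 18496
print(flattened_size(28, pool_stride=1))              # (22, 30976), as in the summary
print(flattened_size(28, pool_stride=2))              # (5, 1600), with the default stride
```

With stride 1 the dense layer after `Flatten` receives 30976 inputs instead of 1600, which is likely a large part of why this network is slow to train.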

Now I will train the CNN and see if it does any better. I will use only one trial and one epoch, because otherwise the training takes a long time.


In [241]:
data_cnn = get_mnist(flatten=False)

cnn_network_acc = run_model(
                    model_cnn,
                    data_cnn, 
                    batch_size=32,
                    epochs=1,
                    split=0.1,
                    verbose=2,
                    num_trials=1
                )


trial number: 1 --------------------------
1688/1688 - 222s - loss: 0.1702 - accuracy: 0.9487 - val_loss: 0.0393 - val_accuracy: 0.9897 - 222s/epoch - 132ms/step

The test accuracy in 1 trials is 0.985


## Comparing the networks

Looking at sheer accuracy, the CNN performs slightly better, as expected. I also want to see whether the CNN is more robust when the digits are somewhat off-center. To do this I will make another data set in which the digits are randomly shifted, and I will compare the CNN and the fully connected feed-forward network on this shifted data.

In [242]:
# pad the images by 20 pixels (28x28 -> 48x48) and place each digit at a random offset
data_sh_cnn = (shifted(data_cnn[0],20), data_cnn[1], shifted(data_cnn[2],20), data_cnn[3])

data_sh_fc = ( 
            data_sh_cnn[0].reshape((data_sh_cnn[0].shape[0], data_sh_cnn[0].shape[1]*data_sh_cnn[0].shape[2])),
            data_sh_cnn[1],
            data_sh_cnn[2].reshape((data_sh_cnn[2].shape[0], data_sh_cnn[2].shape[1]*data_sh_cnn[2].shape[2])),
            data_sh_cnn[3]
            )
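
To see what the shifting does, here is a small self-contained variation on `shifted`, run on toy data. It is a sketch rather than the function used above: it takes a seeded NumPy `Generator` so the result is reproducible, and it draws offsets from 0 to `pad` inclusive, whereas `np.random.randint(shift)` draws from 0 to `shift - 1`. The name `place_randomly` and the toy array are hypothetical.

```python
import numpy as np

def place_randomly(images, pad, rng):
    """Embed each (m, m) image at a random offset in an (m + pad, m + pad) canvas."""
    n, m = images.shape[0], images.shape[1]
    canvas = np.zeros((n, m + pad, m + pad), dtype=images.dtype)
    for i in range(n):
        # two offsets, each drawn uniformly from 0..pad inclusive
        row, col = rng.integers(0, pad + 1, size=2)
        canvas[i, row:row + m, col:col + m] = images[i]
    return canvas

rng = np.random.default_rng(0)
toy = np.ones((3, 28, 28))          # three all-ones "images" (not real digits)
padded = place_randomly(toy, 20, rng)

print(padded.shape)                 # (3, 48, 48)
```

The pixel content of each image is unchanged; only its position inside the larger canvas varies from sample to sample.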



In [243]:
# change the architecture, as the input layer now takes 48x48 images
layers_sh = [
            Dense(input_dim=48**2, units=512, activation="relu"),
            Dense(units=256, activation="relu"),
            Dense(units = 10, activation='softmax')
]

layers_cnn_sh = [
            Conv2D(input_shape = (48,48,1), filters=32 , kernel_size=(3, 3), activation='relu'),
            MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
            Conv2D(filters=64 , kernel_size=(3, 3), activation='relu'),
            MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='valid'),
            Flatten(),
            Dense(units = 128, activation='relu'),
            Dropout(0.5),
            Dense(units = 10, activation='softmax')
]



model_sh = Sequential(layers_sh)
model_cnn_sh = Sequential(layers_cnn_sh)

model_sh.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=["accuracy"])
model_cnn_sh.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=["accuracy"])


In [244]:
ff_network_acc_sh = run_model(
                    model_sh,
                    data_sh_fc, 
                    batch_size=32,
                    epochs=1,
                    split=0.1,
                    verbose=2,
                    num_trials=1
                )


trial number: 1 --------------------------
1688/1688 - 55s - loss: 0.7869 - accuracy: 0.7385 - val_loss: 0.3701 - val_accuracy: 0.8828 - 55s/epoch - 33ms/step

The test accuracy in 1 trials is 0.875


In [None]:
cnn_network_acc_sh = run_model(
                    model_cnn_sh,
                    data_sh_cnn, 
                    batch_size=32,
                    epochs=1,
                    split=0.1,
                    verbose=2,
                    num_trials=1
                )


trial number: 1 --------------------------


In [None]:
print('The accuracy of the fully connected feed forward NN dropped from {:.3f} to {:.3f}\n'
      .format(ff_network_acc, ff_network_acc_sh))

      
print('The accuracy of the CNN network dropped from {:.3f} to {:.3f}'
      .format(cnn_network_acc, cnn_network_acc_sh))
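
A drop in accuracy can look small while the error rate changes a lot, so it can help to look at both. Below is a minimal sketch using only the two feed-forward accuracies that appear in the logs above (0.969 and 0.875); `compare_accuracy` is a hypothetical helper, and the CNN numbers are left out because the shifted CNN run has no recorded result.

```python
def compare_accuracy(before, after):
    """Return the absolute accuracy drop and the ratio of error rates."""
    drop = before - after
    error_ratio = (1 - after) / (1 - before)
    return drop, error_ratio

drop, error_ratio = compare_accuracy(0.969, 0.875)
print('accuracy drop: {:.3f}, error rate grew {:.1f}x'.format(drop, error_ratio))
```

For the feed-forward network the accuracy falls by about 0.094, which corresponds to roughly four times as many misclassified digits.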

