# Handwritten digits classification with a Feed-Forward Neural network

The notebook uses the MNIST database, which contains 70,000 grayscale images of handwritten digits at a resolution of 28 by 28 pixels.

<a href="https://en.wikipedia.org/wiki/MNIST_database"><img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png"/></a>

   
The task is to take one of these images as input and predict the most likely digit contained in the image (along with a relative confidence in this prediction):
<a href="https://colab.research.google.com/github/lexfridman/mit-deep-learning/blob/master/tutorial_deep_learning_basics/deep_learning_basics.ipynb"><img src="https://i.imgur.com/ITrm9x4.png" width="500px"></a>
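
As a rough illustration of what "most likely digit, along with a relative confidence" means, the sketch below turns ten raw scores into probabilities with a softmax and reads off the prediction with `argmax`. The scores are made up for illustration, and the sketch assumes the classifier ends in a softmax layer, which is a common choice for this kind of task.

```python
import numpy as np

# Hypothetical raw scores for the digits 0..9 (made-up values, not model output)
scores = np.array([1.0, 0.5, 0.2, 3.0, 0.1, 0.3, 0.0, 6.0, 0.4, 1.5])

# Softmax: subtract the maximum for numerical stability, exponentiate, normalize
exp_scores = np.exp(scores - scores.max())
probabilities = exp_scores / exp_scores.sum()

# The predicted digit is the index with the highest probability
predicted_digit = int(np.argmax(probabilities))
confidence = float(probabilities[predicted_digit])
print('Predicted digit: {} (confidence {:.2f})'.format(predicted_digit, confidence))
```

The probabilities sum to 1, so the largest one can be read as the relative confidence in the predicted digit.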
    
    


## Enabling and testing the GPU in Colab

- Navigate to Edit → Notebook Settings.
- Select GPU from the Hardware Accelerator drop-down.
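
To test that the GPU is actually visible after changing the setting, one option is to ask TensorFlow which devices it can see. This is a minimal sketch that assumes TensorFlow 2.x; an empty list means the notebook is running on the CPU only.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow; an empty list means no GPU runtime
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print('GPU available: {}'.format([gpu.name for gpu in gpus]))
else:
    print('No GPU found, running on CPU')
```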



In [None]:
#import the needed modules
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import os

## The dataset is available through TensorFlow 
Create the training and test datasets:

In [None]:
(x_train,y_train), (x_test,y_test) = tf.keras.datasets.mnist.load_data()

## Data exploration and visualization

In [None]:
#how many samples are available?
print('Shape of x_train: {}'.format(x_train.shape))
print('Shape of y_train: {}'.format(y_train.shape))
print('Shape of x_test: {}'.format(x_test.shape))
print('Shape of y_test: {}'.format(y_test.shape))

In [None]:
#plot 4 examples of handwritten digits from the training dataset
idxes = np.random.randint(len(x_train), size=4)
plt.figure(figsize=(15,8))
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.imshow(x_train[idxes[i]],cmap='Greys')
    plt.title('Label = {}'.format(y_train[idxes[i]]))


## Is the dataset balanced?

In [None]:
from collections import Counter

train_class_counter = Counter(y_train)
plt.figure()
plt.bar(train_class_counter.keys(),train_class_counter.values())
plt.title('Class distribution in training dataset')


test_class_counter = Counter(y_test)
plt.figure()
plt.bar(test_class_counter.keys(),test_class_counter.values())
plt.title('Class distribution in test dataset')

## Data normalization
The input values have to be scaled to the range 0 to 1 before they are fed to the neural network model.
To do this, we divide the values by 255: every pixel is stored as an 8-bit grayscale intensity, so its value lies between 0 and 255 (2^8 = 256 possible values).
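
The arithmetic can be checked on a tiny example. The sketch below uses a hypothetical 2x2 "image" (the pixel values are made up) stored as 8-bit integers, like the MNIST arrays, and shows that dividing by 255 maps the smallest possible value to 0 and the largest to 1.

```python
import numpy as np

# Hypothetical 2x2 image with 8-bit pixel values (made up for illustration)
pixels = np.array([[0, 128], [64, 255]], dtype=np.uint8)

# Dividing by a float converts the integers to floating point and rescales them
scaled = pixels / 255.0

print('dtype before: {}, after: {}'.format(pixels.dtype, scaled.dtype))
print('min: {}, max: {}'.format(scaled.min(), scaled.max()))
```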

In [None]:
#we scale data to ease training
x_train, x_test = x_train / 255.0, x_test / 255.0

## Model definition for FFNN 
We use tf.keras, a simple, high-level API, to define the model.
Each layer is defined by some common parameters:
- **activation**: the activation function for the layer. This parameter is specified by the name of a built-in function or as a callable object. By default, no activation is applied.
- **kernel_initializer** and **bias_initializer**: the initialization schemes that create the layer's weights (kernel and bias). Each is given as a name or a callable object. The kernel defaults to the "Glorot uniform" initializer and the bias to zeros.
- **kernel_regularizer** and **bias_regularizer**: the regularization schemes that apply to the layer's weights (kernel and bias), such as L1 or L2 regularization. By default, no regularization is applied.
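
The parameters above can be sketched on a single `Dense` layer. The layer size, the `he_normal` initializer, and the L2 factor below are arbitrary choices made for illustration; they are not the settings used by the model in this notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# A Dense layer with every parameter from the list set explicitly
layer = layers.Dense(
    64,
    activation='relu',                         # name of a built-in activation
    kernel_initializer='he_normal',            # overrides the Glorot uniform default
    bias_initializer='zeros',                  # same as the default for the bias
    kernel_regularizer=regularizers.l2(1e-4),  # L2 penalty on the kernel only
)

# Build the layer for flattened 28x28 inputs so that its weights are created
layer.build((None, 784))
kernel, bias = layer.get_weights()
print('kernel shape: {}, bias shape: {}'.format(kernel.shape, bias.shape))
```

Building the layer by hand is only needed here to inspect the weights; inside a `Sequential` model the weights are created automatically.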

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Flatten,Dropout

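def create_model():
    model = Sequential(
    [
        Flatten(input_shape=(28,28)),      # 28x28 image -> vector of 784 values
        Dense(512,activation='relu'),      # hidden layer
        Dropout(0.5),                      # randomly drops half of the hidden units during training
        Dense(10,activation='softmax'),    # one output probability per digit
    ])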

    # The compile step specifies the training configuration.
    model.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])
    return model

model = create_model()

In [None]:
model.summary()

In [None]:
tf.keras.utils.plot_model(model)

## Training

In [None]:
history = model.fit(x_train,y_train,batch_size=32,epochs=10,validation_split=0.1)

In [None]:
plt.figure()
plt.plot(history.history['accuracy'],label='training accuracy')
plt.plot(history.history['val_accuracy'],label='validation accuracy')
plt.grid()
plt.title('Training vs validation accuracy')
plt.legend()

## Evaluation

In [None]:
[test_loss, test_accuracy] = model.evaluate(x_test,y_test)
print('Test accuracy: {}'.format(test_accuracy))

y_pred = np.argmax(model.predict(x_test), axis=-1)

#check against the ground truth
mismatch = np.where(y_pred!=y_test)[0]

In [None]:
mismatch_class_counter = Counter(y_test[mismatch])
mismatch_percentage = dict()
for digit in test_class_counter.keys():
    mismatch_percentage[digit]=mismatch_class_counter[digit]/test_class_counter[digit]*100
plt.figure()
plt.bar(mismatch_percentage.keys(),mismatch_percentage.values())
plt.title('Class distribution for mismatch in label prediction')
plt.show()

In [None]:
np.random.shuffle(mismatch)
idxes =  mismatch[:4]
plt.figure(figsize=(15,8))
for i in range(4):
    plt.subplot(1,4,i+1)
    plt.imshow(x_test[idxes[i]],cmap='Greys')
    plt.title('True label = {}\nPredicted label: {}'.format(y_test[idxes[i]],y_pred[idxes[i]]))

In [None]:
from sklearn.metrics import confusion_matrix

#log-scale the counts so that the rare off-diagonal errors remain visible
cm = np.log10(1+confusion_matrix(y_test,y_pred))
fig,ax = plt.subplots(figsize=[12,10])
im = ax.imshow(cm, interpolation='nearest',cmap='Blues')
ax.figure.colorbar(im, ax=ax)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')
plt.show()


## Exercises
- What happens if dropout is removed from the model?
- What happens if we increase the number of epochs?
- Does it help to add more dense layers?
- Does it help to change the activation function? The available options are listed <a href="https://keras.io/api/layers/activations/#available-activations">here</a>.
- Have a look at the digits that are wrongly recognized by the FFNN. Can you correctly classify them?

Report the accuracy for every task in the following table and comment on the results.


<table>
    <tr>
        <th> Task </th><th> Training Accuracy </th><th> Validation Accuracy </th><th> Test Accuracy </th>
    </tr> 
    <tr>
        <td> Increase N. Epochs </td><td> ... </td><td> ... </td><td> ... </td>
    </tr>
    <tr>
        <td> Remove Dropout </td><td> ... </td><td> ... </td><td> ... </td>
    </tr> 
    <tr>
        <td> Add 1 Dense layer </td><td> ... </td><td> ... </td><td> ... </td>
    </tr> 
    <tr>
        <td> Change activation function of first Dense layer </td><td> ... </td><td> ... </td><td> ... </td>
    </tr> 
</table>    


In [None]:
#change the model in accordance with the task

from tensorflow import keras
from tensorflow.keras.layers import Dropout, Flatten, Dense


newmodel = keras.Sequential()
newmodel.add(Flatten(input_shape=(28,28)))
newmodel.add(Dense(512,activation='relu'))
newmodel.add(Dropout(0.5))
newmodel.add(Dense(10,activation='softmax'))

newmodel.compile(loss='sparse_categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

n_epochs = 10



In [None]:
#check the results of your modified model by running this cell


newhistory = newmodel.fit(x_train,y_train,batch_size=32,epochs=n_epochs,validation_split=0.1)
plt.figure()
plt.plot(newhistory.history['accuracy'],label='training accuracy')
plt.plot(newhistory.history['val_accuracy'],label='validation accuracy')
plt.grid()
plt.title('Training vs validation accuracy')
plt.legend()
[test_loss, test_accuracy] = newmodel.evaluate(x_test,y_test)
print('Test accuracy: {}'.format(test_accuracy))

y_pred = np.argmax(newmodel.predict(x_test), axis=-1)

#check against the ground truth
mismatch = np.where(y_pred!=y_test)[0]

mismatch_class_counter = Counter(y_test[mismatch])
mismatch_percentage = dict()
for digit in test_class_counter.keys():
    mismatch_percentage[digit]=mismatch_class_counter[digit]/test_class_counter[digit]*100
plt.figure()
plt.bar(mismatch_percentage.keys(),mismatch_percentage.values())
plt.title('Class distribution for mismatch in label prediction')
plt.show()

#log-scale the counts so that the rare off-diagonal errors remain visible
cm = np.log(1+confusion_matrix(y_test,y_pred))

fig,ax = plt.subplots()
im = ax.imshow(cm, interpolation='nearest',cmap='Reds')
ax.figure.colorbar(im, ax=ax)
plt.xlabel('Predicted labels')
plt.ylabel('True labels')