# Deep Neural Network for MNIST Classification

The objective of this project is to write a deep neural network that correctly classifies handwritten digits. This is a classification problem with 10 classes, one for each digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).

The MNIST problem is often called the "Hello World!" of deep learning.

The dataset, MNIST, is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau.  
Each image shows a single digit and has 784 features. This is because each image is 28 × 28 pixels, and each feature simply represents one pixel’s intensity, from 0 (white) to 255 (black).
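
To make the 784 figure concrete, here is a minimal sketch in plain Python. The image contents are made up for illustration; only the 28 × 28 shape comes from the dataset description.

```python
# A hypothetical 28 x 28 image stored as nested lists of pixel intensities (0-255).
HEIGHT, WIDTH = 28, 28
image = [[(row * WIDTH + col) % 256 for col in range(WIDTH)] for row in range(HEIGHT)]

# Flattening concatenates the rows into a single feature vector.
features = [pixel for row in image for pixel in row]

print(len(features))  # 784
```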

The dataset's authors are Yann LeCun, Corinna Cortes, and CJ Burges.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

## Loading and preprocessing the data:
Begin by downloading and loading the data with `tfds.load`:

In [2]:
# with_info=True will also provide us with a tuple containing information about the version, features, number of samples
# as_supervised=True will load the dataset in a 2-tuple structure (input, target)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

Now, extract the training and test datasets using the built-in split names:

In [3]:
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

Set aside a validation set from the train set.

In [5]:
# define the number of validation samples as a % of the train samples
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
# cast this number to an integer, as a float may cause an error down the line.
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# store the number of test samples in a dedicated variable and cast to an integer.
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
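
As a quick sanity check on the split sizes, the arithmetic can be sketched in plain Python. The figure of 60,000 training images is the standard size of the MNIST train split; in the notebook it is read from `mnist_info` instead of being hard-coded.

```python
# Standard size of the MNIST 'train' split (hard-coded here for illustration).
num_train_examples = 60_000
validation_fraction = 0.1

# round() guards against floating-point noise before converting to an integer.
num_validation = round(validation_fraction * num_train_examples)
num_training = num_train_examples - num_validation

print(num_validation, num_training)  # 6000 54000
```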

Next, the inputs will be scaled from the range [0, 255] to [0, 1]. This makes training more numerically stable and is likely to improve the performance of the model.  
A function will be defined to do this:

In [6]:
# the function will take an MNIST image and its label
def scale(image, label):
    # cast the values to float
    image = tf.cast(image, tf.float32)
    image /= 255.

    return image, label

# apply this custom transformation to the dataset using `.map()`
scaled_train_and_validation_data = mnist_train.map(scale)

# finally, we scale test data so it has the same magnitude as the train and validation
test_data = mnist_test.map(scale)
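
The effect of the scaling can be checked on a few values without TensorFlow. This is only an illustration of the arithmetic; `scale_pixels` is a hypothetical helper, not part of the notebook.

```python
def scale_pixels(pixels):
    """Map integer intensities in [0, 255] to floats in [0.0, 1.0]."""
    return [pixel / 255.0 for pixel in pixels]

print(scale_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
```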

The next preprocessing step is to set a buffer size and shuffle the train and validation data.  
Shuffling matters because it prevents the algorithm from assigning importance to the order of the training set, and it helps ensure that the instances in each batch are representative of the entire dataset, covering as many of the target classes as possible rather than only one or a few of them.  
The test data does not need to be shuffled because it takes no part in the training process.

The buffer size is a hyperparameter that accounts for cases where the dataset is too large to fit into memory, so it cannot be shuffled in one go.  
Instead, only `BUFFER_SIZE` samples are held in memory at a time, and the shuffled output is drawn from that buffer. This keeps memory usage bounded. A buffer at least as large as the dataset gives a uniform shuffle; a smaller one shuffles only approximately.
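
The idea behind a shuffle buffer can be sketched in plain Python. This is a simplified illustration under the assumption that the buffer is filled first and then one random slot is emitted and refilled per incoming item; it is not TensorFlow's actual implementation, and `buffered_shuffle` is a hypothetical name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random slot and refill it with the new item.
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # Input exhausted: emit what is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(100), buffer_size=10, seed=0))
```

With `buffer_size=10`, the first emitted element can only come from the first ten inputs, which is why a small buffer gives a weaker shuffle than one that covers the whole dataset.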

The train and validation data will be shuffled using `.shuffle()`, with the buffer size passed as a parameter.

In [7]:
BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

# Extract the validation data, which is 10% of the training set
# we use the .take() method to take that many samples
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

# the train_data is everything else, so we skip as many samples as there are in the validation dataset.
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
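
One caveat worth checking, stated here as an assumption about `tf.data` defaults rather than something this notebook verifies: `Dataset.shuffle` reshuffles on every iteration unless `reshuffle_each_iteration=False` is passed. If so, `.take()` and `.skip()` select different samples on each pass, and validation samples can end up in the training data. The plain-Python sketch below illustrates the effect with made-up sizes; `shuffled_pass` is a hypothetical helper.

```python
import random

def shuffled_pass(num_items, seed):
    """One pass over the dataset in a freshly shuffled order."""
    order = list(range(num_items))
    random.Random(seed).shuffle(order)
    return order

NUM_ITEMS, NUM_VALIDATION = 100, 10

# Validation comes from one pass; training comes from a later pass.
validation = set(shuffled_pass(NUM_ITEMS, seed=0)[:NUM_VALIDATION])        # like .take()
train_reshuffled = set(shuffled_pass(NUM_ITEMS, seed=1)[NUM_VALIDATION:])  # like .skip(), new order
train_same_order = set(shuffled_pass(NUM_ITEMS, seed=0)[NUM_VALIDATION:])  # like .skip(), same order

print(len(validation & train_reshuffled))  # very likely greater than 0: samples leak
print(len(validation & train_same_order))  # 0: the split is disjoint
```

If this applies, passing `reshuffle_each_iteration=False` to `.shuffle()` before splitting, and shuffling `train_data` again afterwards, is one way to keep the two sets disjoint.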

Now a batch size will be set. This is another hyperparameter that may need to be tuned during the training process.
Batching splits the train data into batches of the chosen size; the model is trained on one batch at a time, and the weights are updated right after each batch.

We must set a batch size for the train data in order to use Mini-batch Gradient Descent rather than Batch Gradient Descent.  
The weights are updated once per batch. Updating them after each small batch, rather than once after a pass over the entire dataset, usually lets the model make progress much faster for the same amount of computation.
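
The update-once-per-batch idea can be sketched on a toy problem in plain Python. This is not the network trained here: it fits a single weight `w` in a made-up rule `y = 3x`, and the function name, learning rate, and data are all illustrative assumptions.

```python
import random

def minibatch_gradient_descent(xs, ys, batch_size, learning_rate=0.1, epochs=50, seed=0):
    """Fit y ~ w * x, updating w once per mini-batch."""
    rng = random.Random(seed)
    w = 0.0
    updates = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error on this batch only.
            gradient = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= learning_rate * gradient
            updates += 1
    return w, updates

xs = [i / 10 for i in range(10)]  # hypothetical inputs 0.0 .. 0.9
ys = [3 * x for x in xs]          # targets from the made-up rule y = 3x
w, updates = minibatch_gradient_descent(xs, ys, batch_size=2)

print(round(w, 3), updates)  # 3.0 250
```

With 10 samples and a batch size of 2, each epoch performs 5 weight updates instead of the single update that full-batch gradient descent would make.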

Strictly speaking, the validation data does not need to be split into batches. Validation involves only forward propagation, with no backpropagation, since we are interested in the errors and not in updating the weights. Batching yields the average loss and accuracy per batch, whereas for validation and testing we want the exact values over all the data at once.  
However, the model expects the validation data in batch form too, so we still batch it.  
Here, the batch size is set to the number of validation samples, which makes it a single batch. The same logic is applied to the test data.

In [8]:
# determine the batch size
BATCH_SIZE = 100
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)  # will have only one batch.

# batch the test data
test_data = test_data.batch(num_test_samples)

# split 2-tuple structure of the validation data to separate the input features from the targets.
validation_inputs, validation_targets = next(iter(validation_data))
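
The number of batches that results from these choices can be worked out by hand. The sketch below assumes the standard split sizes (54,000 training samples after setting aside 6,000 for validation, and 10,000 test samples) and that the last partial batch, if any, is kept; `num_batches` is a hypothetical helper.

```python
import math

def num_batches(num_samples, batch_size):
    """Number of batches when a final partial batch is kept."""
    return math.ceil(num_samples / batch_size)

print(num_batches(54_000, 100))     # 540 training batches per epoch
print(num_batches(6_000, 6_000))    # 1 validation batch
print(num_batches(10_000, 10_000))  # 1 test batch
```

The 540 training batches correspond to the `540/540` step counter printed during training.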

## Model

### Outline the model

In [12]:
# define the input and output sizes
input_size = 784
output_size = 10

# define hidden layer size.
hidden_layer_size = 200
    
# outline the model
model = tf.keras.Sequential([
    # Flatten the array to a (784,) vector.
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # model the hidden layers of the network
    # set the hidden layer sizes (width of the network), and activation function to be used.
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 4th hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 5th hidden layer
    
    
    # define the output layer and define its activation function. 
    # Softmax is the most suitable because we are dealing with a classification problem.
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
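
To get a feel for the size of this network, the number of trainable parameters can be counted by hand: each dense layer has one weight per input-output pair plus one bias per unit. A minimal sketch in plain Python, where `dense_params` is a hypothetical helper:

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus biases (units) of one dense layer."""
    return inputs * units + units

# Flattened input, five hidden layers of width 200, and the 10-unit output layer.
layer_sizes = [784, 200, 200, 200, 200, 200, 10]
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

print(total)  # 319810
```

The `Flatten` layer only reshapes its input and has no parameters of its own.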

### Choose the optimizer and the loss function
Define the optimizer (here, Adaptive Moment Estimation, or Adam), the loss function (sparse categorical cross-entropy, which takes the integer class labels directly, so the targets do not need to be one-hot encoded), and the evaluation metric (accuracy, the fraction of predictions that match the targets) that we want reported at each epoch.

In [14]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
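
What the softmax output and the sparse loss compute for a single sample can be sketched in plain Python. The functions below are simplified illustrations with made-up inputs, not the Keras implementations; they show that indexing by an integer label gives the same value as the one-hot form of cross-entropy.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probabilities, label):
    """Loss for one sample when the target is an integer class index."""
    return -math.log(probabilities[label])

def categorical_crossentropy(probabilities, one_hot):
    """Loss for one sample when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))

probabilities = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
sparse_loss = sparse_categorical_crossentropy(probabilities, label=0)
one_hot_loss = categorical_crossentropy(probabilities, one_hot=[1, 0, 0])

print(abs(sparse_loss - one_hot_loss) < 1e-12)  # True
```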

### Training
This is where we train the model we have built.

In [15]:
# determine the maximum number of epochs
NUM_EPOCHS = 5

# Fit the model to the data.
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 5s - loss: 0.2731 - accuracy: 0.9158 - val_loss: 0.1410 - val_accuracy: 0.9568
Epoch 2/5
540/540 - 7s - loss: 0.1080 - accuracy: 0.9674 - val_loss: 0.0840 - val_accuracy: 0.9758
Epoch 3/5
540/540 - 7s - loss: 0.0763 - accuracy: 0.9769 - val_loss: 0.0700 - val_accuracy: 0.9793
Epoch 4/5
540/540 - 6s - loss: 0.0622 - accuracy: 0.9808 - val_loss: 0.0728 - val_accuracy: 0.9780
Epoch 5/5
540/540 - 7s - loss: 0.0515 - accuracy: 0.9835 - val_loss: 0.0611 - val_accuracy: 0.9818


<tensorflow.python.keras.callbacks.History at 0x1c076de28e0>

The model reaches an accuracy of about 98% on the validation data after the fifth epoch.

## Test the model

The final step is to test the trained model on data that it has not 'seen' before. Fiddling with the hyperparameters introduces the risk of overfitting the validation dataset. Therefore, it is important to evaluate the model on unseen data: this gives a fairer measure of its performance and helps to expose whether the model has overfit the training or validation data.  

The performance on the test data can be viewed as the best available indicator of the model's performance in deployment.

In [16]:
test_loss, test_accuracy = model.evaluate(test_data)



In [17]:
# Presenting the result in a nicely formatted way.
print(f'Test loss: {test_loss:.2f}. Test accuracy: {(test_accuracy*100.):.2f}%')

Test loss: 0.10. Test accuracy: 97.21%
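
For reference, accuracy here means the fraction of samples whose highest-probability class equals the label. A minimal plain-Python sketch with made-up predictions follows; `accuracy` is a hypothetical helper, not the Keras metric itself.

```python
def accuracy(predicted_probabilities, labels):
    """Fraction of samples whose highest-probability class equals the label."""
    correct = 0
    for probabilities, label in zip(predicted_probabilities, labels):
        predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
        correct += predicted_class == label
    return correct / len(labels)

# Three made-up predictions over three classes; the last one is wrong.
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]
labels = [0, 1, 2]
print(accuracy(predictions, labels))  # 0.6666666666666666

# On the 10,000 MNIST test images, 97.21% corresponds to this many correct predictions.
print(round(0.9721 * 10_000))  # 9721
```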


## Conclusion:
We have successfully trained a deep neural network that classifies handwritten digits with 97.21% accuracy on the test data.  

This was achieved by outlining a deep learning model with an input size of (784,), five (5) hidden layers, each of width 200 with a 'Rectified Linear Unit (ReLU)' activation function, and an output layer of size 10, one unit per digit class, with a 'Softmax' activation function, which is the usual choice for the output layer in multi-class classification problems.  
The model was compiled using the 'Adaptive Moment Estimation (Adam)' optimizer, a 'sparse categorical cross-entropy' loss function, and an evaluation metric of 'accuracy'.

The model was fit to the training data for 5 epochs and validated on the validation data. The validation accuracy was 98.18%, and the accuracy on the test data was 97.21%. These numbers are close, which suggests that the model did not badly overfit, and it can be expected to perform similarly on comparable data in deployment.