# Deep Neural Network with TensorFlow for MNIST Classification

## Author/Data-Scientist: Leon Hamnett

### [LinkedIn](https://www.linkedin.com/in/leon-hamnett/)

#### Contents:

1. [Introduction](#introduction)
2. [Preprocessing](#prepro)
3. [Building the model](#model)
4. [Training the model](#train)
5. [Testing and Conclusion](#test)

### Introduction: <a name="introduction"></a>

In this notebook, we use TensorFlow to build a deep neural network that classifies (recognises) handwritten digits.

The dataset, MNIST, is a well-known handwritten digit recognition benchmark in the machine learning community. It provides 70,000 grayscale images (28 x 28 pixels) of handwritten digits, one digit per image, split into 60,000 training and 10,000 test images. The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

We will build a neural network with 3 hidden layers to achieve this goal.

## Import the relevant packages

```python
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
```

## Preprocess Data <a name="prepro"></a>

In this section we load the MNIST data into an appropriate format and preprocess it so that it can be fed easily into the neural network.
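For readers who want to see what the `tf.data` steps in the next cell do to the data, here is a minimal NumPy sketch of the same pipeline. It runs on a small synthetic stand-in for MNIST (random `uint8` images, not the real dataset). The sketch scales pixels to [0, 1], shuffles once with a fixed permutation, carves off 10% for validation and batches the rest. The function name `preprocess` and the seeds are illustrative assumptions, not part of TensorFlow.

```python
import numpy as np

def preprocess(images, labels, val_fraction=0.1, batch_size=100, seed=0):
    """Hypothetical NumPy analogue of the tf.data pipeline in this notebook."""
    # Scale uint8 pixels (0..255) to float32 values in [0, 1], as scale() does.
    images = images.astype(np.float32) / 255.0
    # Shuffle once with a fixed permutation so the split is stable.
    perm = np.random.default_rng(seed).permutation(len(images))
    images, labels = images[perm], labels[perm]
    # The first 10% become validation data (like .take), the rest training data (like .skip).
    n_val = int(val_fraction * len(images))
    val = (images[:n_val], labels[:n_val])
    train_x, train_y = images[n_val:], labels[n_val:]
    # Split the training data into consecutive mini-batches (like .batch(batch_size)).
    batches = [(train_x[i:i + batch_size], train_y[i:i + batch_size])
               for i in range(0, len(train_x), batch_size)]
    return val, batches

# Synthetic stand-in: 1,000 random 28x28x1 "images" with labels 0-9.
rng = np.random.default_rng(1)
fake_images = rng.integers(0, 256, size=(1000, 28, 28, 1), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=1000)

(val_x, val_y), batches = preprocess(fake_images, fake_labels)
print(val_x.shape, len(batches), batches[0][0].shape)  # (100, 28, 28, 1) 9 (100, 28, 28, 1)
```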

```python
# load the dataset together with its metadata (split sizes etc.)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

# extract the training and test splits
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# TFDS provides MNIST with only 'train' and 'test' splits, so we carve out a validation set ourselves

# define the number of validation samples as 10% of the training samples
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
# cast this number to an integer, since take()/skip()/batch() expect integer counts
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# store the number of test samples in a dedicated variable as well
num_test_samples = mnist_info.splits['test'].num_examples
# cast to an integer
num_test_samples = tf.cast(num_test_samples, tf.int64)


# scaling the inputs to a common range tends to make training more numerically stable
# here we simply map the pixel values to the range [0, 1]

# scale function: takes an image and its label (target), and returns both with the image rescaled
def scale(image, label):
    image = tf.cast(image, tf.float32)
    # pixel values range from 0 to 255 (256 shades of grey),
    # so dividing by 255 puts every element between 0 and 1
    image /= 255.

    return image, label


# the validation data will come from mnist_train, so scale the whole training split first
scaled_train_and_validation_data = mnist_train.map(scale)

# scale the test data too, so that it has the same magnitude as the training and validation data
# there is no need to shuffle it, because we never train on the test data
test_data = mnist_test.map(scale)


# shuffle buffer size: a buffer smaller than the dataset (60,000 examples)
# gives only an approximate shuffle, but keeps memory use bounded
BUFFER_SIZE = 10000

# shuffle the training and validation data
# reshuffle_each_iteration=False keeps the same order on every pass, so the take()/skip()
# split below stays fixed and validation samples do not leak into later training epochs
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

# take the first num_validation_samples examples (10% of the training split) as validation data
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)

# the training data is everything else, so skip as many samples as there are in the validation set
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# reshuffle the training portion on every epoch (this does not change which samples it contains)
train_data = train_data.shuffle(BUFFER_SIZE)

# set the batch size
BATCH_SIZE = 100

# batch the training data so that the model can iterate over mini-batches during training
train_data = train_data.batch(BATCH_SIZE)

# put all validation samples into a single batch
validation_data = validation_data.batch(num_validation_samples)

# likewise, put all test samples into a single batch
test_data = test_data.batch(num_test_samples)


# take the next (and only) validation batch
# because as_supervised=True, each element is an (inputs, targets) 2-tuple
validation_inputs, validation_targets = next(iter(validation_data))
```
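By default, `tf.data.Dataset.shuffle` reshuffles the data every time the dataset is iterated, and `take()`/`skip()` are re-evaluated on every pass. With that default, the validation batch materialised once and the training samples drawn in later epochs come from different orderings, so some validation images can end up in the training data. Passing `reshuffle_each_iteration=False` to `shuffle` keeps a single ordering. The pure-Python simulation below illustrates the effect with integer indices standing in for images. It is not TensorFlow code, and `split` is a hypothetical helper.

```python
import random

def split(order, n_val):
    """Return (validation, training) index sets from one ordering, like take()/skip()."""
    return set(order[:n_val]), set(order[n_val:])

items = list(range(1000))   # stand-in indices for training examples
n_val = 100
rng = random.Random(0)

# Case 1: every pass reshuffles (the Dataset.shuffle default).
first_pass = rng.sample(items, len(items))
later_pass = rng.sample(items, len(items))
val_set, _ = split(first_pass, n_val)     # validation materialised once
_, train_set = split(later_pass, n_val)   # training re-drawn on a later epoch
leaked = len(val_set & train_set)

# Case 2: one fixed order reused on every pass (reshuffle_each_iteration=False).
fixed = rng.sample(items, len(items))
val_fixed, train_fixed = split(fixed, n_val)
overlap_fixed = len(val_fixed & train_fixed)

print("leaked with reshuffling:", leaked, "| leaked with a fixed order:", overlap_fixed)
```

In the first case almost all validation indices reappear in the training set. In the second case the two sets are disjoint by construction.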



## Model <a name="model"></a>

### Outline the model

In this section we design and code the deep neural network. The input size equals the number of pixels in each image (28 x 28 = 784), and the output size equals the number of possible classes in this dataset (the 10 digits 0-9).
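The `Flatten` layer in the model below only reshapes each 28 x 28 x 1 image into a 784-long vector while keeping the batch dimension. As an illustration, a NumPy `reshape` produces the same shapes. The batch of 32 images below is a hypothetical example, and the element ordering shown is NumPy's default row-major order.

```python
import numpy as np

# A hypothetical batch of 32 images shaped like the MNIST tensors (28x28x1).
batch = np.arange(32 * 28 * 28, dtype=np.float32).reshape(32, 28, 28, 1)

# Keep the batch axis and merge the remaining axes into one.
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)  # (32, 784)

# Row-major order: the pixel at row r, column c ends up at position r * 28 + c.
r, c = 3, 5
print(flat[0, r * 28 + c] == batch[0, r, c, 0])  # True
```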

```python
# input size = 784, as there are 28 x 28 pixels in each image
# (kept for reference; the Flatten layer below derives this from input_shape)
input_size = 784
# output size = 10, one unit per possible digit; the targets stay integer labels (0-9),
# which is why the loss below is the "sparse" categorical cross-entropy
output_size = 10
# use the same size for all three hidden layers, for simplicity
hidden_layer_size = 100

# define the architecture of the model
model = tf.keras.Sequential([

    # the first layer (the input layer)
    # each observation is a 28x28x1 tensor (rank 3), which must be flattened
    # Flatten reshapes each 28x28x1 image into a (784,) vector (batch shape (None, 784)),
    # which lets us build a simple feed-forward network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer

    # tf.keras.layers.Dense implements: output = activation(dot(input, weight) + bias)
    # the arguments that matter most here are the number of units (hidden_layer_size)
    # and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer

    # the output layer uses softmax so that the 10 outputs form a probability distribution
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
```
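Each `Dense` layer computes `activation(dot(input, weight) + bias)`, as the comment in the cell says. The NumPy sketch below runs the same 784-100-100-100-10 forward pass and counts the trainable parameters. For this architecture, `model.summary()` should report the same total: 78,500 + 10,100 + 10,100 + 1,010 = 99,710. The weights are randomly initialised, so the predictions themselves are meaningless. The helper names (`relu`, `softmax`, `forward`) are illustrative, not Keras APIs.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 100, 100, 100, 10]  # input, three hidden layers, output
# One (weight, bias) pair per Dense layer; random init for illustration only.
params = [(rng.normal(0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    for i, (w, b) in enumerate(params):
        z = x @ w + b
        # Hidden layers use ReLU; the output layer uses softmax.
        x = softmax(z) if i == len(params) - 1 else relu(z)
    return x

probs = forward(rng.random((5, 784)))   # 5 fake flattened images
n_params = sum(w.size + b.size for w, b in params)
print(probs.shape, n_params)  # (5, 10) 99710
```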

### Choose the optimizer and the loss function

```python
# define the optimizer, the loss function,
# and the metrics we want to track during training
# Adam is a widely used adaptive optimizer that usually works well with its default settings
# sparse_categorical_crossentropy suits integer class labels (0-9) like ours;
# one-hot targets would call for categorical_crossentropy instead
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
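`sparse_categorical_crossentropy` expects integer class labels such as `3`, which is what `as_supervised=True` yields for MNIST. `categorical_crossentropy` expects one-hot vectors instead. Both compute the same per-sample quantity: the negative log of the probability assigned to the true class. The NumPy check below uses made-up probabilities to show this. Keras then averages this value over the batch.

```python
import numpy as np

# Made-up softmax outputs for two samples over 10 classes (each row sums to 1).
probs = np.full((2, 10), 0.05)
probs[0, 3] = 0.55   # sample 0: most mass on digit 3
probs[1, 7] = 0.55   # sample 1: most mass on digit 7
labels = np.array([3, 1])  # integer targets ("sparse" format)

# Sparse form: pick the probability of the true class directly.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels])

# One-hot form: the same labels encoded as 10-element indicator vectors.
one_hot = np.eye(10)[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=1)

print(sparse_loss, dense_loss, sparse_loss.mean())
```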

## Training <a name="train"></a>

In this section we will train the machine learning model on the training data.
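Training repeatedly applies the Adam optimizer selected in `model.compile` to the mini-batches. Adam keeps running averages of each parameter's gradient and squared gradient, and it scales every step by their ratio. The sketch below writes the update by hand for a single scalar parameter on the toy loss `f(x) = x**2`. The default hyperparameters match Keras's documented Adam defaults (`learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). The toy run uses a larger learning rate of 0.1 so that the movement is visible. `adam_step` is a hypothetical helper, not the Keras implementation.

```python
import math

def adam_step(x, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = b1 * m + (1 - b1) * grad          # running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)             # bias corrections for the zero initialisation
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v

# Toy problem: minimise f(x) = x**2, whose gradient is 2x.
x, m, v = 5.0, 0.0, 0.0
history = [x]
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
    history.append(x)
print(history[1], abs(history[-1]))
```

After bias correction, the very first step has magnitude almost exactly `lr`, whatever the gradient's scale. This per-parameter normalisation is one reason Adam is a convenient default.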

```python
# set the number of epochs
NUM_EPOCHS = 10

# fit the model, specifying the
# training data,
# the number of epochs,
# and the validation data we created ourselves, in the format (inputs, targets)
# verbose=2 prints one summary line per epoch
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)
```

```
Epoch 1/10
540/540 - 20s - loss: 0.3176 - accuracy: 0.9064 - val_loss: 0.1530 - val_accuracy: 0.9560
Epoch 2/10
540/540 - 29s - loss: 0.1223 - accuracy: 0.9632 - val_loss: 0.1133 - val_accuracy: 0.9670
Epoch 3/10
540/540 - 28s - loss: 0.0876 - accuracy: 0.9731 - val_loss: 0.0841 - val_accuracy: 0.9752
Epoch 4/10
540/540 - 33s - loss: 0.0679 - accuracy: 0.9785 - val_loss: 0.0728 - val_accuracy: 0.9772
Epoch 5/10
540/540 - 38s - loss: 0.0555 - accuracy: 0.9826 - val_loss: 0.0526 - val_accuracy: 0.9840
Epoch 6/10
540/540 - 26s - loss: 0.0440 - accuracy: 0.9858 - val_loss: 0.0501 - val_accuracy: 0.9845
Epoch 7/10
540/540 - 26s - loss: 0.0358 - accuracy: 0.9891 - val_loss: 0.0452 - val_accuracy: 0.9858
Epoch 8/10
540/540 - 27s - loss: 0.0324 - accuracy: 0.9894 - val_loss: 0.0395 - val_accuracy: 0.9878
Epoch 9/10
540/540 - 24s - loss: 0.0306 - accuracy: 0.9900 - val_loss: 0.0322 - val_accuracy: 0.9887
Epoch 10/10
540/540 - 24s - loss: 0.0237 - accuracy: 0.9920 - val_loss: 0.0353 - val_accura
```

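The `540/540` counter in the log is the number of training batches per epoch. It follows from the split made during preprocessing, given the 60,000 examples in the TFDS MNIST `train` split (the value `mnist_info.splits['train'].num_examples` reports):

```python
import math

train_examples = 60_000                     # size of the MNIST 'train' split in TFDS
num_validation = int(0.1 * train_examples)  # 10% held out, truncated like tf.cast
num_train = train_examples - num_validation
BATCH_SIZE = 100
steps_per_epoch = math.ceil(num_train / BATCH_SIZE)  # a final partial batch would still count
print(num_validation, num_train, steps_per_epoch)  # 6000 54000 540
```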

## Test the model <a name="test"></a>

Having trained the model on the training data and validated it on the validation data, we now measure its final predictive power on the test dataset, which the model has never seen.

It is important to realize that repeatedly tuning the hyperparameters against the validation set gradually overfits the model to that set.

The test is therefore the very last step: we should not evaluate on the test set until we have completely finished adjusting the model.

If we adjust the model after testing, we start to overfit the test dataset as well, which defeats its purpose.

```python
test_loss, test_accuracy = model.evaluate(test_data)
```



```python
# print the results with some formatting
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))
```

```
Test loss: 0.09. Test accuracy: 97.65%
```
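The accuracy that `evaluate` reports is the fraction of test images whose highest-probability class matches the label. The NumPy sketch below reproduces that calculation. The predictions are made up and stand in for the model's softmax outputs, and only four classes are used instead of ten for brevity.

```python
import numpy as np

# Made-up class probabilities for four test images (rows sum to 1).
probs = np.array([
    [0.05, 0.80, 0.05, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.60],
    [0.25, 0.25, 0.40, 0.10],
])
labels = np.array([1, 0, 3, 0])

predicted = probs.argmax(axis=1)          # most likely class per image
accuracy = (predicted == labels).mean()   # fraction of correct predictions
print(predicted, accuracy)  # [1 0 3 2] 0.75
```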


## Conclusion:

The test accuracy shows that the model correctly classifies about 97.7% of handwritten digits presented as centered 28 x 28 grayscale images. Depending on what the model is deployed for, this may be an acceptable level of accuracy. If higher accuracy is required, several options could be explored: adding more hidden layers, trying different activation functions, or training on more data if it can be obtained. Each of these could improve accuracy further.