## Deep Neural Network with the beautiful MNIST dataset

Game plan:

        ○ Import libraries, acquire data, and pre-process it
        ○ Outline the model and choose the activation functions we want to employ
        ○ Describe placeholders, variables, and related operations
        ○ Choose the appropriate advanced optimisers
        ○ Split data set into batches for faster learning
        ○ Initialise the variables
        ○ Make the model learn
        ○ Test accuracy of the model!

### Description of the exercise

MNIST contains 70,000 images of handwritten digits, 60,000 for training and 10,000 for testing. The goal is to build a model that classifies each image correctly as one of the digits 0-9. Each image is 28x28 pixels, so we can think of it as a 28x28 matrix whose values, after scaling, range from 0 to 1. A fully connected network takes a vector as input, so the approach is to "flatten" each 28x28 image into a 784x1 vector: each pixel becomes one input to the neural network, and its value is the intensity of colour (0: completely white to 1: completely black).

For this example, there will be 784 inputs, 2 hidden layers (enough for good accuracy on this task), and 10 outputs (one per digit 0-9). The targets are, conceptually, one-hot encoded: the target for a digit is a vector of ten values with a 1 at that digit's position and 0 elsewhere. Since we want the outputs to be probabilities of an image being each digit, we will use the softmax activation function for the output layer.
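
To make the three ideas above concrete (flattening, one-hot targets, softmax outputs), here is a minimal sketch in plain Python. It is only an illustration, not the code used later in the notebook: the tiny 2x2 "image", the raw scores, and the helper names `one_hot` and `softmax` are assumptions made for the example.

```python
import math

# A tiny 2x2 "image" stands in for a 28x28 MNIST image (illustrative values).
image = [[0.0, 0.5],
         [1.0, 0.25]]

# Flatten row by row: a 28x28 image would give 784 values, this one gives 4.
flat = [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

target = one_hot(3)
# Hypothetical raw outputs of the last layer for one image, one score per digit.
probabilities = softmax([0.1, 0.2, 0.1, 2.5, 0.0, 0.3, 0.1, 0.2, 0.4, 0.1])

print(flat)
print(target)
print(probabilities.index(max(probabilities)))  # position of the most likely digit
```

The position of the largest probability is read as the predicted digit, and it can be compared directly with the position of the 1 in the one-hot target.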

### Import libraries

In [1]:
import numpy as np
import tensorflow as tf

# tensorflow_datasets is the package MNIST will be loaded from
import tensorflow_datasets as tfds

### Collect and pre-process data

In [2]:
# load the MNIST dataset from tensorflow_datasets
# as_supervised = True: loads the dataset in a 2-tuple structure (input, target)
# with_info = True: also returns mnist_info, which holds info about the version, features, and number of samples in the dataset
mnist_dataset, mnist_info = tfds.load("mnist", with_info = True, as_supervised = True)

In [3]:
# extract the train and test datasets from MNIST
# note that MNIST does not provide a validation dataset, so the train dataset needs to be split further
mnist_train, mnist_test = mnist_dataset["train"], mnist_dataset["test"]

# let's take 10% of the train dataset to serve as validation: 0.1 times the number of training samples
num_validation_samples = 0.1 * mnist_info.splits["train"].num_examples

# however, the result is a float rather than an integer
# can use tf.cast to convert the validation variable into an integer, preventing any issues
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# also need an easy-access variable for the number of test samples, using splits as above and casting it to an integer
num_test_samples = mnist_info.splits["test"].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

# also need to scale the inputs to make the model more numerically stable; the function takes an MNIST image and its label
# currently the values range from 0 to 255 (the greyscale range), the aim is to scale them to the range 0 to 1
def scale(image, label):
    image = tf.cast(image, tf.float32) # convert to float just in case
    image /= 255. # divide all values by 255 to get the scale from 0 to 1 as a float
    return image, label

# create new scaled variable with the function applied
# .map(function) applies a custom transformation to the dataset
scaled_train_and_validation_data = mnist_train.map(scale)

# similarly, need to scale the test data set as well
test_data = mnist_test.map(scale)


# next step is to shuffle the data so that any ordering in the dataset cannot bias the model
# shuffling the whole dataset at once would take too much memory, so I'll set the buffer size to 10,000 samples at a time

buffer_size = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(buffer_size)

# then need to extract the train and validation data into their own variables with .take and .skip
validation_data = shuffled_train_and_validation_data.take(num_validation_samples) #takes num validation samples
train_data = shuffled_train_and_validation_data.skip(num_validation_samples) #takes everything BUT num validation samples

# I will be using mini-batch gradient descent to train the model, so need to specify the batch size and prepare the data
batch_size = 150
# use the .batch method to combine the elements of a dataset into batches 
train_data = train_data.batch(batch_size)

# as for the validation data, don't need to separate it into batches because there will be no backpropagation on it
# but the model will expect the validation data in batch form too, so will batch it into a single batch holding all validation samples
validation_data = validation_data.batch(num_validation_samples)

# same with the test data, don't need to batch it but the model will expect a batch form
test_data = test_data.batch(num_test_samples)

# extract the single validation batch as (inputs, targets) so it can be passed to the model
# iter creates an object that can be iterated one element at a time
# next loads the next element of the iter object
validation_inputs, validation_targets = next(iter(validation_data))
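
The pre-processing steps in the cell above (scale, take, skip, batch) can be mimicked on a toy list to see what each one does. This is only a sketch in plain Python under stated assumptions: the helpers `take`, `skip` and `batch` are hypothetical stand-ins written for this example, not the `tf.data` methods themselves, and the 20-sample dataset and batch size of 5 are made up.

```python
# Plain-Python stand-ins for the dataset operations (illustrative, not TensorFlow).
def scale(pixels):
    """Map 0-255 pixel values to floats in the range 0-1."""
    return [p / 255.0 for p in pixels]

def take(samples, n):
    """Keep only the first n samples, like Dataset.take(n)."""
    return samples[:n]

def skip(samples, n):
    """Drop the first n samples and keep the rest, like Dataset.skip(n)."""
    return samples[n:]

def batch(samples, size):
    """Group consecutive samples into lists of at most `size` elements."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]

scaled = scale([0, 51, 255])

samples = list(range(20))            # hypothetical toy dataset of 20 samples
num_validation = len(samples) // 10  # 10% of the samples
validation = take(samples, num_validation)
train = skip(samples, num_validation)
train_batches = batch(train, 5)

print(scaled)
print(validation, len(train), len(train_batches))
```

Because `take` and `skip` use the same count, the two parts never overlap within one pass over the data, and the last batch is allowed to be smaller than the others.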

### Outlining the model

In [4]:
# declare the input, output and hidden layer sizes for the neural network model
input_size = 784
output_size = 10
hidden_layer_size = 1000 #arbitrary, need to do testing to find best width

In [5]:
# define the model, flattening the tensors into vectors, softmax will produce probability outputs
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape = (28,28,1)), #flatten to vector, input layer
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"), #first hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation="relu"), #second hidden layer
    tf.keras.layers.Dense(output_size, activation="softmax") # output layer
])
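
To get a feel for how large this model is, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes declared above (784, 1000, 1000, 10); the helper `dense_parameters` is a hypothetical name introduced for the example. The Flatten layer only reshapes its input, so it contributes no parameters.

```python
def dense_parameters(num_inputs, num_units):
    """Weights (num_inputs * num_units) plus one bias per unit."""
    return num_inputs * num_units + num_units

input_size = 784
hidden_layer_size = 1000
output_size = 10

# Each Dense layer connects one size in this list to the next.
layer_sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]
per_layer = [dense_parameters(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(per_layer)
print(sum(per_layer))
```

Calling `model.summary()` on the Keras model is the usual way to check figures like these against what the framework actually built.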

### Optimiser and loss function

In [6]:
# must specify the optimiser and loss using the .compile method. Adam is a widely used adaptive optimiser and a solid default choice
# for loss, need to use crossentropy; sparse categorical crossentropy takes the integer labels (0-9) directly, so the targets do not have to be one-hot encoded by hand
# add accuracy as a metric to monitor during training and testing
model.compile(optimizer = "adam", loss = "sparse_categorical_crossentropy", metrics = ["accuracy"])
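
The difference between the sparse and the one-hot form of the loss is easiest to see on a single sample. The following is a minimal sketch in plain Python, not the Keras implementation: the two function names mirror the Keras loss names for readability, and the four-class probabilities are made-up values.

```python
import math

def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample when the target is an integer class index."""
    return -math.log(probabilities[label])

def categorical_crossentropy(one_hot_target, probabilities):
    """Loss for one sample when the target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

probabilities = [0.05, 0.05, 0.7, 0.2]  # hypothetical softmax output over 4 classes
label = 2                               # integer target
one_hot_target = [0.0, 0.0, 1.0, 0.0]   # the same target, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(label, probabilities)
dense_loss = categorical_crossentropy(one_hot_target, probabilities)
print(round(sparse_loss, 4), round(dense_loss, 4))
```

Both forms give the same value, because every term of the one-hot sum is zero except the one at the true class. The sparse form simply skips building the one-hot vector.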

### Training the model

In [7]:
# define the number of epochs (full passes over the training data), chosen arbitrarily for now
num_epochs = 5

# fit the model and add validation data, verbose = 2 to receive only the most important info for each epoch
model.fit(train_data, epochs = num_epochs, validation_data = (validation_inputs, validation_targets), validation_steps=10, verbose = 2)

Epoch 1/5
360/360 - 36s - loss: 0.2045 - accuracy: 0.9377 - val_loss: 0.0834 - val_accuracy: 0.9748
Epoch 2/5
360/360 - 37s - loss: 0.0774 - accuracy: 0.9760 - val_loss: 0.0689 - val_accuracy: 0.9810
Epoch 3/5
360/360 - 36s - loss: 0.0490 - accuracy: 0.9844 - val_loss: 0.0507 - val_accuracy: 0.9835
Epoch 4/5
360/360 - 37s - loss: 0.0378 - accuracy: 0.9874 - val_loss: 0.0350 - val_accuracy: 0.9893
Epoch 5/5
360/360 - 36s - loss: 0.0290 - accuracy: 0.9903 - val_loss: 0.0347 - val_accuracy: 0.9915


<tensorflow.python.keras.callbacks.History at 0x1728ffb7cc8>
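
The `360/360` at the start of each log line is the number of mini-batches processed per epoch. A quick check of that figure, assuming the standard 60,000-image MNIST train split and the 10% validation share and batch size of 150 chosen earlier:

```python
import math

num_train_examples = 60000  # size of the MNIST train split
num_validation_samples = num_train_examples // 10  # 10% held out for validation
num_training_samples = num_train_examples - num_validation_samples
batch_size = 150

# One step processes one batch; round up in case the last batch is smaller.
steps_per_epoch = math.ceil(num_training_samples / batch_size)
print(num_validation_samples, num_training_samples, steps_per_epoch)
```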

### Testing the model

In [8]:
# can use .evaluate to return the loss value and metrics values for the model in testing mode
test_loss, test_accuracy = model.evaluate(test_data)

      1/Unknown - 7s 7s/step - loss: 0.0759 - accuracy: 0.9803

In [9]:
# apply formatting
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.08. Test accuracy: 98.03%
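
For reference, the accuracy metric reported above is simply the share of samples whose most probable class matches the true label. Here is a minimal sketch of that calculation in plain Python; the helper names and the four made-up predictions are assumptions for illustration, not part of the notebook's model.

```python
def predicted_class(probabilities):
    """Index of the largest probability, i.e. the predicted class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

def accuracy(all_probabilities, labels):
    """Fraction of samples whose predicted class equals the true label."""
    correct = sum(predicted_class(p) == y for p, y in zip(all_probabilities, labels))
    return correct / len(labels)

# Hypothetical softmax outputs for four samples over three classes.
outputs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]

test_accuracy = accuracy(outputs, labels)
print('Test accuracy: {0:.2f}%'.format(test_accuracy * 100.))
```

Read the same way, a test accuracy of 98.03% on the 10,000 MNIST test images corresponds to roughly 9,803 correctly classified digits.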
