# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 3 hidden layers.

## Import the relevant packages

In [None]:
import numpy as np
import tensorflow as tf
import pandas as pd

# tensorflow_datasets will be used to load the ready-to-use MNIST dataset
import tensorflow_datasets as tfds

## Data preprocessing

In [None]:
# Using 'tfds.load' method to import MNIST dataset
# Using 'with_info=True' to obtain relevant information regarding number of samples
# Using 'as_supervised=True' to split the dataset in a 2-tuple structure (input, target) 
mnist_data, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

# The 'mnist_data' variable already contains a train and a test split, which are extracted here
mnist_train, mnist_test = mnist_data['train'], mnist_data['test']

# Obtaining the number of training samples from the 'mnist_info' metadata
# Defining the number of the validation set samples as a % of the train set
# Defining the number of the test set samples
num_valid = 0.1 * mnist_info.splits['train'].num_examples
num_test = mnist_info.splits['test'].num_examples

# The above step produces a float for 'num_valid', so both counts are cast to integers to avoid issues
num_valid = tf.cast(num_valid, tf.int64)
num_test = tf.cast(num_test, tf.int64)

# Preparing a data scaler to obtain inputs between 0 and 1
# The inputs are shades of gray numbered from 0 to 255
def scale(image, label):
    image = tf.cast(image, tf.float64)
    image /= 255.
    return image, label

# Scaling the data set applying the scaler with '.map()'
# The 'mnist_train' contains both train and validation datasets, which later will be split using 'num_valid'
scaled_train_valid_data = mnist_train.map(scale)
test_data = mnist_test.map(scale)

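# Shuffling the data once so that the train/validation split is random
# 'reshuffle_each_iteration=False' keeps the same order on every pass; with the default
# reshuffling, 'take' and 'skip' would select different samples each epoch and
# validation samples would leak into the training set
# There's no need to shuffle the test data
buffer_size = 10000
shuffled_train_valid_data = scaled_train_valid_data.shuffle(buffer_size, reshuffle_each_iteration=False)

# Finally splitting train and validation data sets, after ensuring they're shuffled
# Validation takes only the first 'num_valid' samples. Train skips the first 'num_valid' samples.
validation_data = shuffled_train_valid_data.take(num_valid)
train_data = shuffled_train_valid_data.skip(num_valid)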

# Batch size
batch_size = 100

# Combining data into batches, as required by the model
# Validation and test sets are each a single batch, since no backpropagation (weight adjustment) happens on them
train_data = train_data.batch(batch_size)
validation_data = validation_data.batch(num_valid)
test_data = test_data.batch(num_test)

# Taking the next batch (it is the only batch)
# Because 'as_supervised=True', we get a 2-tuple structure (inputs, targets)
validation_inputs, validation_targets = next(iter(validation_data))
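
To make the `take`/`skip` split above concrete, here is a minimal pure-Python sketch of the same idea on a toy list of 20 sample indices. The helpers `take` and `skip` below are hypothetical stand-ins built on `itertools.islice`; they only mimic the semantics of the `tf.data` methods and are not the library's implementation.

```python
import random
from itertools import islice

def take(samples, n):
    """Return the first n items, mimicking the semantics of Dataset.take(n)."""
    return list(islice(samples, n))

def skip(samples, n):
    """Return everything after the first n items, mimicking Dataset.skip(n)."""
    return list(islice(samples, n, None))

# Toy stand-in for the training images: 20 sample indices (assumption for illustration)
samples = list(range(20))

# Shuffle once with a fixed seed so both calls below see the same order
random.Random(42).shuffle(samples)

num_valid_toy = len(samples) // 10  # 10% of the samples go to validation

validation_toy = take(samples, num_valid_toy)
train_toy = skip(samples, num_valid_toy)

# The two subsets are disjoint and together cover every sample
print(len(validation_toy), len(train_toy))
print(set(validation_toy).isdisjoint(train_toy))
```

The split is disjoint only because both calls see the same order. If the order changed between the two calls, the same sample could end up in both subsets.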

## Model

### Outline the model

In [None]:
# inputs are pixels in a 28 x 28 x 1 image
# outputs are digits ranging from 0 to 9
input_size = 28 * 28 * 1
output_size = 10

# Defining hidden layer size
hidden_layer_size = 600

# The model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)), #input layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), #1 Hidden Layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), #2 Hidden Layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), #3 Hidden Layer
    tf.keras.layers.Dense(output_size, activation='softmax') #output layer
])
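
As a rough sanity check on the size of this architecture, the number of trainable parameters can be counted by hand. The sketch below assumes plain fully connected layers with one bias per unit and uses the layer widths from the cell above; the helper `dense_params` is a hypothetical name introduced only for illustration.

```python
# Layer widths taken from the model above: 784 inputs, three hidden layers of 600, 10 outputs
layer_sizes = [28 * 28 * 1, 600, 600, 600, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# One entry per Dense layer; the Flatten layer has no parameters
per_layer = [dense_params(n_in, n_out) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [471000, 360600, 360600, 6010]
print(total_params)  # 1198210
```

The total should agree with what `model.summary()` reports for the model defined above.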

### Optimizer and loss function

In [None]:
# Using the 'adam' optimizer (adaptive moment estimation, a variant of stochastic gradient descent) for weight adjustment
# Since it's a classification problem and the labels are not in one-hot format, the loss function is sparse categorical cross-entropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
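
To illustrate what the sparse cross-entropy loss measures for a single image, here is a minimal pure-Python sketch. It is not Keras's implementation (which also averages over the batch); the functions `softmax` and `sparse_cross_entropy` and the example scores are assumptions made up for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(probs, label):
    """Loss for one sample: negative log-probability of the true class index."""
    return -math.log(probs[label])

# Hypothetical raw scores for one image, one score per digit 0-9
logits = [0.0] * 10
logits[7] = 5.0  # the network strongly favours digit 7

probs = softmax(logits)
loss_if_7 = sparse_cross_entropy(probs, 7)  # true label agrees with the prediction
loss_if_3 = sparse_cross_entropy(probs, 3)  # true label disagrees with the prediction

print(round(sum(probs), 6))
print(loss_if_7 < loss_if_3)
```

The label is passed as an integer class index rather than a one-hot vector, which is what "sparse" refers to here.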

### Training

In [None]:
num_epochs = 8

model.fit(train_data, epochs=num_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/8
540/540 - 13s - loss: 0.2132 - accuracy: 0.9342 - val_loss: 0.1036 - val_accuracy: 0.9677
Epoch 2/8
540/540 - 10s - loss: 0.0841 - accuracy: 0.9740 - val_loss: 0.0979 - val_accuracy: 0.9710
Epoch 3/8
540/540 - 10s - loss: 0.0607 - accuracy: 0.9804 - val_loss: 0.0569 - val_accuracy: 0.9812
Epoch 4/8
540/540 - 9s - loss: 0.0443 - accuracy: 0.9861 - val_loss: 0.0676 - val_accuracy: 0.9800
Epoch 5/8
540/540 - 10s - loss: 0.0377 - accuracy: 0.9878 - val_loss: 0.0560 - val_accuracy: 0.9827
Epoch 6/8
540/540 - 10s - loss: 0.0343 - accuracy: 0.9892 - val_loss: 0.0339 - val_accuracy: 0.9887
Epoch 7/8
540/540 - 10s - loss: 0.0260 - accuracy: 0.9921 - val_loss: 0.0310 - val_accuracy: 0.9902
Epoch 8/8
540/540 - 9s - loss: 0.0242 - accuracy: 0.9925 - val_loss: 0.0268 - val_accuracy: 0.9908


<tensorflow.python.keras.callbacks.History at 0x7f83fee54278>
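
The `540/540` shown in each epoch of the log is the number of training batches. A short arithmetic sketch, assuming the standard MNIST train split of 60,000 images and the 10% validation share and batch size used above:

```python
import math

num_train_total = 60_000                 # size of the MNIST 'train' split
num_valid = num_train_total // 10        # 10% held out for validation
num_train = num_train_total - num_valid  # samples left for training
batch_size = 100

# Each epoch processes every training sample once, one batch per step
steps_per_epoch = math.ceil(num_train / batch_size)
print(steps_per_epoch)  # 540
```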

## Testing the model

In [None]:
test_loss, test_accuracy = model.evaluate(test_data)
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.08. Test accuracy: 98.15%
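
The accuracy figure is the share of images whose most probable class matches the true label. Below is a minimal pure-Python sketch of that calculation on made-up numbers; the helpers `argmax` and `accuracy` and the toy predictions are hypothetical and are not output from the model above.

```python
def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(prob_rows, labels):
    """Fraction of samples whose most probable class equals the true label."""
    correct = sum(argmax(row) == label for row, label in zip(prob_rows, labels))
    return correct / len(labels)

# Hypothetical 3-class predictions for four samples (made-up numbers)
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
toy_labels = [0, 1, 2, 1]  # the last sample is misclassified

toy_accuracy = accuracy(toy_probs, toy_labels)
print('Accuracy: {0:.2f}%'.format(toy_accuracy * 100.))  # Accuracy: 75.00%
```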


## Takeaway

The model achieved an accuracy of roughly 98% on the test set when assigning a digit to each image. It's a good result, considering the simplicity of the methodology and that training took only a couple of minutes. Improving the accuracy further, even by as little as 0.5%, could take a large amount of processing time, maybe even a few hours.