# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and contains images of handwritten digits. You can find more about it on the website of Yann LeCun (Director of AI Research, Facebook). He is one of the pioneers of the methods we've been discussing and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

# TensorFlow includes a data provider for MNIST that we'll use.
# It comes with the tensorflow-datasets package; if you haven't installed it yet, do so with
# pip install tensorflow-datasets
# or
# conda install tensorflow-datasets

import tensorflow_datasets as tfds

# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder 
# on every later call, the copy on your computer is loaded automatically

## Data

This is where we load and preprocess our data.

In [2]:
# remember the comment from above
# these datasets will be stored in C:\Users\*USERNAME*\tensorflow_datasets\...
# the first time you download a dataset, it is stored in the respective folder 
# on every later call, the copy on your computer is loaded automatically

# tfds.load loads a dataset (or downloads and then loads it if this is the first time you use it)
# in our case, we are interested in MNIST; the name of the dataset is the only mandatory argument
# there are other arguments we can specify, which we may find useful
# mnist_dataset = tfds.load(name='mnist', as_supervised=True)
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# with_info=True makes tfds.load also return a DatasetInfo object describing the version, features, and number of samples
# we store it in mnist_info and use it a bit further below

# as_supervised=True loads the dataset in a 2-tuple structure (input, target)
# alternatively, as_supervised=False would return a dictionary
# we prefer to have our inputs and targets separated

## Data Preprocessing

In [3]:
# Extract test and train data
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

#### Split Training dataset into Training and Validation datasets

In [4]:
# Determine the number of validation samples (10% of the training split)
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# Store number of test samples
num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)

# scale/transform inputs so that values are between 0 and 1
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)
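To make the effect of `scale` concrete, here is a minimal NumPy sketch of the same cast-and-divide step. The helper name `scale_pixels` and the 2x2 toy image are made up for illustration; the notebook itself performs this step with `tf.cast` inside `Dataset.map`.

```python
import numpy as np

def scale_pixels(image):
    """Cast a uint8 image to float32 and rescale from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0

# Hypothetical 2x2 "image" covering the smallest and largest pixel values
toy_image = np.array([[0, 51], [204, 255]], dtype=np.uint8)
scaled = scale_pixels(toy_image)
print(scaled.dtype, scaled.min(), scaled.max())
```

Casting before dividing matters: the raw pixels are 8-bit integers, and the network expects floating-point inputs.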

#### Shuffle Data

In [5]:
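# shuffle data (best practice when batching)
# note: by default shuffle() reshuffles on every pass over the data (reshuffle_each_iteration=True),
# so a take/skip split made afterwards is not guaranteed to select the same samples on every pass;
# passing reshuffle_each_iteration=False keeps the split fixed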
BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)
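The role of `BUFFER_SIZE` is easier to see with a simplified model of buffer-based shuffling. The sketch below is plain Python and is only an approximation of the idea (fill a fixed-size buffer, then emit a random element each time a new one arrives); it is not TensorFlow's implementation, and `buffered_shuffle` is a hypothetical name.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a partially random order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        # Buffer is full: emit a random element and store the newcomer in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left, in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

In this model an element can only be emitted after it has entered the buffer, so a buffer smaller than the dataset gives an approximate shuffle; a buffer at least as large as the dataset gives a full one.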

#### Validation and Train data

In [6]:
validation_data = shuffled_train_and_validation_data.take(num_validation_samples) # takes the first n elements

train_data = shuffled_train_and_validation_data.skip(num_validation_samples) # skips the first n elements
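The `take`/`skip` pair behaves like slicing a sequence at position n. A small plain-Python analogy, using `itertools.islice` on a made-up list of ten sample IDs (the value 3 stands in for `num_validation_samples`):

```python
from itertools import islice

samples = list(range(10))   # stand-in for an already shuffled dataset
n_validation = 3            # hypothetical; the notebook uses num_validation_samples

validation_part = list(islice(samples, n_validation))    # like .take(n)
train_part = list(islice(samples, n_validation, None))   # like .skip(n)
print(validation_part, train_part)
```

The two parts are disjoint only because both are read from the same ordering of the data.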

#### Mini-Batch

In [7]:
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)

# In theory, we neither need to nor should split our validation data into batches
# However, the model expects the validation set in batch form too
# Hence, we create a single batch that contains all validation samples
validation_data = validation_data.batch(num_validation_samples)

test_data = test_data.batch(num_test_samples)


# MNIST data is iterable and in 2-tuple format (as_supervised = True)
# remember that validation_data only has one batch
validation_inputs, validation_targets = next(iter(validation_data))
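The sizes produced by the split and batching steps can be checked with simple arithmetic. This sketch assumes the MNIST `train` split holds 60,000 of the 70,000 images (the remaining 10,000 form the `test` split); the notebook reads that number from `mnist_info` instead of hard-coding it.

```python
import math

num_train_examples = 60_000     # assumed size of the MNIST 'train' split
validation_fraction = 0.1       # same 10% used for num_validation_samples
batch_size = 100                # same value as BATCH_SIZE

num_validation = round(validation_fraction * num_train_examples)
num_train = num_train_examples - num_validation
steps_per_epoch = math.ceil(num_train / batch_size)
print(num_validation, num_train, steps_per_epoch)
```

The resulting number of steps per epoch corresponds to the `540/540` shown in the training log.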

## Model

#### Outline the model

The input layer consists of 784 inputs (coming from a 28x28 matrix). We have 10 output nodes, one for each digit from 0 to 9. In addition, we will work with two hidden layers (depth) of 100 nodes each (width), as set by `hidden_layer_size` in the code. These depth and width hyperparameters are suboptimal, and we will fine-tune them later.
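As a rough sense of scale, the number of trainable parameters implied by this outline can be counted by hand. The sketch takes the hidden width from `hidden_layer_size = 100` in the code cell and assumes every layer is fully connected with one bias per unit; `dense_params` is a helper invented for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 100, 100, 10]   # input, two hidden layers, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)
```

Most of the parameters sit in the first hidden layer, because it connects to all 784 input pixels.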

In [8]:
input_size = 784
output_size = 10
hidden_layer_size = 100

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # flatten inputs
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # first hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # second hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer, softmax outputs probabilities
])
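To show what the stacked layers compute, here is a NumPy sketch of one forward pass with the same layer sizes. The weights are random and untrained, the four input images are random noise, and the helper names are assumptions made for this example, so only the shapes and the softmax property (each row sums to 1) are meaningful.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 100, 100, 10]
# Random, untrained parameters: stand-ins for what Keras would learn
weights = [rng.normal(0, 0.05, size=(a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

batch = rng.random((4, 28, 28, 1))       # 4 fake images with values in [0, 1)
x = batch.reshape(len(batch), -1)        # Flatten: (4, 28, 28, 1) -> (4, 784)
h1 = relu(x @ weights[0] + biases[0])    # first hidden layer
h2 = relu(h1 @ weights[1] + biases[1])   # second hidden layer
probs = softmax(h2 @ weights[2] + biases[2])
print(probs.shape, probs.sum(axis=1))
```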

#### Choose the optimizer and the loss function

In [9]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
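The loss and metric named in `compile` can be written out in a few lines of NumPy. This is a simplified sketch: "sparse" means the targets are integer class labels rather than one-hot vectors, and details such as the clipping Keras applies to avoid `log(0)` are left out. The two toy prediction matrices are invented for illustration.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, targets):
    """Mean of -log(probability assigned to the true class)."""
    true_class_probs = probs[np.arange(len(targets)), targets]
    return float(-np.log(true_class_probs).mean())

def accuracy(probs, targets):
    """Share of samples whose most probable class equals the target."""
    return float((probs.argmax(axis=1) == targets).mean())

targets = np.array([3, 7])              # integer labels, not one-hot

uniform = np.full((2, 10), 0.1)         # a model that knows nothing
loss_uniform = sparse_categorical_crossentropy(uniform, targets)

confident = np.full((2, 10), 0.01)      # a model that is mostly right
confident[0, 3] = 0.91
confident[1, 7] = 0.91
loss_confident = sparse_categorical_crossentropy(confident, targets)
acc_confident = accuracy(confident, targets)
print(loss_uniform, loss_confident, acc_confident)
```

A uniform guess over 10 classes gives a loss of ln(10), which is a useful baseline when reading the first epoch of a training log.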

## Training

In [10]:
NUM_EPOCHS = 5 # arbitrary
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 4s - loss: 0.3304 - accuracy: 0.9057 - val_loss: 0.1851 - val_accuracy: 0.9440 - 4s/epoch - 7ms/step
Epoch 2/5
540/540 - 2s - loss: 0.1358 - accuracy: 0.9598 - val_loss: 0.1125 - val_accuracy: 0.9647 - 2s/epoch - 4ms/step
Epoch 3/5
540/540 - 2s - loss: 0.0963 - accuracy: 0.9711 - val_loss: 0.0919 - val_accuracy: 0.9737 - 2s/epoch - 4ms/step
Epoch 4/5
540/540 - 2s - loss: 0.0761 - accuracy: 0.9773 - val_loss: 0.0749 - val_accuracy: 0.9767 - 2s/epoch - 4ms/step
Epoch 5/5
540/540 - 2s - loss: 0.0605 - accuracy: 0.9821 - val_loss: 0.0648 - val_accuracy: 0.9790 - 2s/epoch - 4ms/step


<keras.callbacks.History at 0x1ee619e9580>
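The `History` object returned by `model.fit` keeps the per-epoch metrics in its `history` attribute, a dictionary of lists. The sketch below rebuilds such a dictionary by hand from the numbers printed in the log above, so it runs without training anything, and then reads two things from it.

```python
# Values copied from the training log above; a real run would use the
# .history attribute of the object returned by model.fit
history = {
    "loss": [0.3304, 0.1358, 0.0963, 0.0761, 0.0605],
    "accuracy": [0.9057, 0.9598, 0.9711, 0.9773, 0.9821],
    "val_loss": [0.1851, 0.1125, 0.0919, 0.0749, 0.0648],
    "val_accuracy": [0.9440, 0.9647, 0.9737, 0.9767, 0.9790],
}

val_loss = history["val_loss"]
best_epoch = val_loss.index(min(val_loss)) + 1   # epochs are numbered from 1

# Gap between training and validation accuracy for each epoch
gaps = [round(t - v, 4) for t, v in zip(history["accuracy"], history["val_accuracy"])]
print(best_epoch, gaps)
```

In this run the validation loss is still falling at the last epoch, and training accuracy has only just overtaken validation accuracy, which suggests (without proving) that a few more epochs could still help.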