# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset is called MNIST and consists of images of handwritten digits. You can find more about it on Yann LeCun's website (Director of AI Research, Facebook). He is one of the pioneers of what we've been talking about and of more complex approaches that are widely used today, such as convolutional neural networks (CNNs).

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image). 

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes. 

Our goal is to build a neural network with 2 hidden layers.

## Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

## Data

This is where we load and preprocess our data.

In [2]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)
# with_info=True also returns a DatasetInfo object describing the version, features, and number of samples
# we store it in mnist_info and use it a bit further below

# as_supervised=True loads the dataset as 2-tuples (input, target)
# with as_supervised=False, each example would instead be a dictionary
# we prefer to have our inputs and targets separated



In [3]:
# dataset has built-in references to train and test datasets
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# there is no built-in validation set, so we set aside 10% of the training set for validation
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
# make sure this number is an integer
num_validation_samples = tf.cast(num_validation_samples, tf.int64)
print(f'validations: {num_validation_samples}')

# the test set size must come from the 'test' split, not the 'train' split
num_test_samples = mnist_info.splits['test'].num_examples
# make sure this number is an integer
num_test_samples = tf.cast(num_test_samples, tf.int64)
print(f'tests: {num_test_samples}')

validations: 6000
tests: 10000


In [4]:
# we want the pixel values scaled to the range [0, 1] and stored as floats
def scale(image, label):
    # cast to float first so that the division below is a float division
    image = tf.cast(image, tf.float32)
    image /= 255. # raw pixel values lie in [0, 255]
    return image, label

scaled_train_and_validation_data = mnist_train.map(scale)
scaled_test_data = mnist_test.map(scale)
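
To see what `scale` does to a single image, the same arithmetic can be sketched in plain NumPy. The 2x2 array below is a made-up stand-in for a 28x28x1 MNIST image, used only to keep the example small.

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 pixel intensities (real MNIST images are 28x28x1)
image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Cast to float first, then divide by the maximum possible intensity
scaled = image.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

The darkest pixel maps to 0 and the brightest to 1, which keeps all inputs in the same small range before they reach the network.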

In [5]:
# shuffle the data so that any ordering in the original dataset does not bias the batches
BUFFER_SIZE = 10000
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

In [6]:
# the first num_validation_samples examples form the validation set; the rest are used for training
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
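
The `take`/`skip` pair above behaves much like list slicing. A minimal sketch with a plain Python list, where integers stand in for the (image, label) pairs:

```python
# Stand-in for the 60,000 training examples: integers instead of (image, label) pairs
examples = list(range(60000))
num_validation = len(examples) // 10      # 10% held out, as in the notebook

validation = examples[:num_validation]    # analogous to dataset.take(n)
train = examples[num_validation:]         # analogous to dataset.skip(n)

print(len(validation), len(train))
```

One caveat worth checking against your TensorFlow version: `shuffle` has a `reshuffle_each_iteration` argument that, as far as we know, defaults to reshuffling on every pass over the data. In that case the examples skipped by `skip` can differ from epoch to epoch, so passing `reshuffle_each_iteration=False` may be needed to keep the two sets strictly disjoint.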

In [7]:
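# batch_size = 1 => stochastic gradient descent (SGD)
# batch_size = number of samples => (single batch) gradient descent (GD)
# anything in between => mini-batch gradient descent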
BATCH_SIZE = 100

# dataset.batch combines the consecutive elements of a dataset into batches
train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = scaled_test_data.batch(num_test_samples)
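
The batch size also determines how many weight updates happen per epoch. A quick check of the arithmetic, using the sample counts printed earlier:

```python
import math

num_train_samples = 60000 - 6000   # training examples left after the validation split
batch_size = 100

# Number of batches (weight updates) per epoch; ceil covers a final partial batch
steps_per_epoch = math.ceil(num_train_samples / batch_size)
print(steps_per_epoch)  # 540
```

This is the `540/540` counter that appears in the training log. The validation and test sets are each placed in a single batch because no weights are updated on them; we only need one forward pass.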

In [8]:
validation_inputs, validation_targets = next(iter(validation_data))

## Model

### Outline the model

In [9]:
input_size = 784 # 28*28 pixels
output_size = 10 # one output per digit class, 0 through 9
hidden_layer_size = 255 # according to the lecture, suboptimal


model = tf.keras.Sequential([
    # Input layer
    
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # since we don't know CNNs yet, we don't know how to feed such input into our net, so we must flatten the images
    # there is a convenient layer, 'Flatten', that takes our 28x28x1 tensor and reorders it into a
    # (28*28*1,) = (784,) vector
    # this allows us to actually create a feed forward neural network
        tf.keras.layers.Flatten(input_shape=(28, 28, 1)), 
    # Hidden layers
        tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # ReLU works well for this problem, according to the lecture
        tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    # Output layer
        tf.keras.layers.Dense(output_size, activation='softmax') # softmax turns the outputs into a probability distribution over the 10 classes
])
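
To get a feel for the size of this network, we can count its trainable parameters by hand. The sketch below assumes the layer sizes defined above; each dense layer has one weight per input-output pair plus one bias per output unit, and `Flatten` has no parameters.

```python
layer_sizes = [784, 255, 255, 10]  # input, two hidden layers, output

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out   # weight matrix plus one bias per unit
    print(f'{n_in} -> {n_out}: {params} parameters')
    total += params

print(f'total: {total}')
```

The result can be compared against the total reported by `model.summary()`.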

### Optimizer and the loss function

In [10]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
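
The "sparse" variant of the loss expects integer class labels rather than one-hot vectors. A minimal NumPy sketch of the idea follows; the logits and labels are made up, and the function is only an illustration of the formula, not Keras's actual implementation.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(probs, labels):
    # Pick the predicted probability of the true class for each sample
    true_class_probs = probs[np.arange(len(labels)), labels]
    # Average the negative log-probabilities over the batch
    return -np.log(true_class_probs).mean()

# Hypothetical logits for 2 samples and 3 classes (MNIST would have 10 classes)
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = np.array([0, 1])   # integer class indices, not one-hot vectors

probs = softmax(logits)
loss = sparse_categorical_crossentropy(probs, labels)
print(probs.sum(axis=1), loss)
```

The loss shrinks toward 0 as the probability assigned to the correct class approaches 1, which is what the optimizer works toward during training.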

### Training

In [11]:
NUM_EPOCHS = 5

model.fit(
    train_data,
    epochs=NUM_EPOCHS,
    validation_data=(validation_inputs, validation_targets),
    verbose=2,
    validation_steps=10
)

Epoch 1/5
540/540 - 15s - loss: 0.2524 - accuracy: 0.9261 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/5
540/540 - 16s - loss: 0.0942 - accuracy: 0.9711 - val_loss: 0.0841 - val_accuracy: 0.9773
Epoch 3/5
540/540 - 15s - loss: 0.0653 - accuracy: 0.9795 - val_loss: 0.0596 - val_accuracy: 0.9835
Epoch 4/5
540/540 - 16s - loss: 0.0473 - accuracy: 0.9848 - val_loss: 0.0510 - val_accuracy: 0.9853
Epoch 5/5
540/540 - 17s - loss: 0.0367 - accuracy: 0.9886 - val_loss: 0.0500 - val_accuracy: 0.9838


<tensorflow.python.keras.callbacks.History at 0x7f6b385b6ed0>

### Testing

In [12]:
test_loss, test_accuracy = model.evaluate(test_data)



In [13]:
print(f'Test loss: {test_loss:.2f}. Test accuracy: {test_accuracy*100.:.2f}%')

Test loss: 0.09. Test accuracy: 97.36%
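
For reference, the accuracy metric is simply the fraction of samples whose most probable predicted class matches the target. A small NumPy sketch with made-up softmax outputs (4 samples, 3 classes):

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and 3 classes
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
labels = np.array([0, 1, 2, 1])

predictions = probs.argmax(axis=1)          # most probable class per sample
accuracy = (predictions == labels).mean()   # fraction of correct predictions
print(accuracy)  # 0.75
```

Here three of the four predictions are correct. The last sample is assigned class 0 while its label is 1.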
