# How to approach the dataset
- Each image is 28 x 28 pixels and greyscale, so we can think of each image as a 28 x 28 matrix whose entries are integers from 0 to 255
- 0 = black, 255 = white
- The approach for deep feedforward networks is to 'flatten' each image into a 784 x 1 vector (28 x 28 = 784)
- Each pixel is an input in the input layer
- There are 10 digits, so 10 classes, so 10 output units in the output layer
- The outputs will then be compared to the targets. Conceptually both are one-hot encoded; in practice the targets stay as integer labels and the loss function handles the encoding
- Since we want the output to be a probability for each of the 10 digits, we will use a softmax activation function for the output layer
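
The ideas in the list above can be tried out on a single made-up image. This is a minimal NumPy sketch, not part of the original notebook: the random `image`, the example `label` and the `softmax` helper are all assumptions introduced for illustration.

```python
import numpy as np

# A made-up 28 x 28 greyscale "image" with integer intensities in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten the 28 x 28 matrix into a vector of 28 * 28 = 784 inputs.
flat = image.reshape(-1)

# One-hot encode a target digit: a length-10 vector with a single 1.
label = 3
one_hot = np.zeros(10)
one_hot[label] = 1.0

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Arbitrary example scores, one per digit class.
probs = softmax(np.arange(10, dtype=np.float64))

print(flat.shape, one_hot, probs.sum())
```

The class with the largest score also gets the largest probability, which is why the predicted digit is read off as the position of the maximum.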

## Action Plan
1. Prepare our data and preprocess it. Create training, validation, and test datasets
2. Outline the model and choose the activation functions
3. Set the appropriate advanced optimizers and the loss function
4. Make it learn
5. Test the accuracy of the model

### Import the relevant packages

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

# Data Preprocessing

In [2]:
mnist_data, mnist_info = tfds.load(name = 'mnist', with_info = True, as_supervised = True)

# tfds.load(name, with_info, as_supervised) loads a dataset from
# TensorFlow Datasets
# as_supervised=True loads the data as 2-tuples of
# (input, target)
# with_info=True makes tfds.load return a tuple (dataset, info),
# where info holds the version, features and number of samples

### Splitting the Dataset

In [3]:
mnist_train, mnist_test = mnist_data['train'], mnist_data['test']

### Creating the validation set

In [4]:
num_val = 0.1*mnist_info.splits['train'].num_examples
num_val = tf.cast(num_val, tf.int64)

# We want the validation set to be 10% of the training set,
# so the number of validation samples is 0.1 times the number
# of training samples. The product is a float, so we cast it
# to an integer

num_test = mnist_info.splits['test'].num_examples
num_test = tf.cast(num_test, tf.int64)
# Want the number of samples in the test set to be easily
# accessible
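
The split arithmetic can be checked by hand. The sketch below assumes the usual MNIST split sizes (60,000 training and 10,000 test images) rather than reading them from `mnist_info`, and it assumes the batch size of 100 that is chosen later in this notebook.

```python
# Assumed split sizes for MNIST; in the notebook they come from mnist_info.
num_train_total = 60_000
num_test = 10_000

num_val = int(0.1 * num_train_total)       # 10% held out for validation
num_train = num_train_total - num_val      # what is left for training

batch_size = 100                           # assumed, set later in the notebook
steps_per_epoch = num_train // batch_size  # weight updates per epoch

print(num_val, num_train, steps_per_epoch)
```

Under these assumptions there are 540 batches per epoch, which is the step count shown in the training log further down.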

In [5]:
# Normally we'd like to scale our data in some way to make the
# result more numerically stable (e.g. inputs between 0 & 1)

def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

# We want all values to be floats so we cast the image to float
# Next we scale it, since the images contain values from 0 to
# 255, we can divide the values by 255 to get values between 0
# and 1

# image /= 255. the dot at the end signifies we want the result
# to be a float

# dataset.map(function) applies a custom transformation to a
# given dataset. It takes as input a function which determines
# the transformation
scaled_train_val_data = mnist_train.map(scale)

scaled_test_data = mnist_test.map(scale)
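
To see what `scale` does to the pixel values, here is a small NumPy analogue. It is only an illustration with four hand-picked intensities, not the TensorFlow code path used above.

```python
import numpy as np

# Four example pixel intensities stored as unsigned 8-bit integers.
pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Cast to float first, then divide by 255 so every value lands in [0, 1].
scaled = pixels.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```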

### Shuffling the data

In [6]:
buffer_size = 10000

# We shuffle the data so that each batch contains a random mix
# of samples
# With enormous datasets we can't shuffle everything in memory
# at once, so TensorFlow keeps a buffer of 10000 samples, picks
# elements from it at random and refills it from the dataset

# If buffer_size >= num_samples, the shuffle is uniform over
# the whole dataset

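shuffle_train_val_data = scaled_train_val_data.shuffle(
    buffer_size, reshuffle_each_iteration=False)
# By default the dataset is reshuffled on every pass, so the
# take()/skip() split below would pick different samples each
# epoch and validation samples could leak into training
# reshuffle_each_iteration=False keeps the split fixed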

val_data = shuffle_train_val_data.take(num_val)
# val_data = 10% of train data which we stored in num_val
# We can use .take() to extract that many samples

train_data = shuffle_train_val_data.skip(num_val)
# We can get the train_data by skipping the first num_val
# samples and keeping everything after them
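
The buffered shuffle and the take/skip split can be imitated in plain Python. This is a simplified model of the idea, not TensorFlow's actual implementation; `buffered_shuffle` is a hypothetical helper and the tiny dataset of 20 integers is made up.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)        # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]              # emit a random buffered element
        buffer[idx] = item             # replace it with the incoming one
    rng.shuffle(buffer)                # input exhausted: drain what is left
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))

val = shuffled[:4]    # analogous to .take(4)
train = shuffled[4:]  # analogous to .skip(4)

print(val, train)
```

With a buffer smaller than the dataset, the first few outputs can only come from the first few inputs, so the shuffle is local rather than uniform. The two slices never overlap because they are cut from the same shuffled order.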

### Batching

In [7]:
batch_size = 100

# dataset.batch(batch_size) is a method that combines the 
# consecutive elements of a dataset into batches

train_data = train_data.batch(batch_size)

# We don't backpropagate on the val_data, so it does not need
# to be split into small batches. However, the model expects
# the val_data in batch form too

val_data = val_data.batch(num_val)
# This indicates that the model should take the whole val_data
# at once

test_data = scaled_test_data.batch(num_test)

val_inputs, val_targets = next(iter(val_data))
# iter() creates an iterator, an object which can be stepped
# through one element at a time (e.g. in a for loop)
# next() loads the next element from that iterator. val_data
# holds a single batch, so this gives the whole validation set
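
Batching and the `next(iter(...))` pattern can be shown without TensorFlow. The `batch` generator below is a hypothetical stand-in for `dataset.batch`, applied to a made-up list of ten samples.

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current  # final, smaller batch

samples = list(range(10))

# Small batches, as used for the training data.
batches = list(batch(samples, 4))
first = next(iter(batches))

# One batch as large as the dataset, as used for validation and test data.
whole = next(iter(batch(samples, len(samples))))

print(batches, first, whole)
```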

# Model

### Outline Model

In [8]:
input_size = 784
output_size = 10
hidden_size = 100

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape = (28, 28, 1)),
    tf.keras.layers.Dense(hidden_size, activation='relu'),
    tf.keras.layers.Dense(hidden_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

# tf.keras.layers.Flatten(input_shape) flattens each 28 x 28 x 1
# image tensor into a vector of 784 values
# tf.keras.layers.Dense(units, activation) takes the inputs
# passed to the layer, calculates the dot product of the inputs
# and the weights and adds the bias. This is also where we can
# apply an activation function
# relu is a common default for hidden layers and works well on
# the MNIST set

# The list above is how we stack layers: we have two hidden
# layers and the output layer, and we have chosen to make the
# hidden layers the same size
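
What a stack of dense layers computes can be written out in NumPy. This sketch uses random weights and a made-up batch of five flattened images; the `dense` and `relu` helpers are illustrative and are not how Keras implements its layers.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    """Dot product of inputs and weights, plus bias, then optional activation."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z

rng = np.random.default_rng(0)
sizes = [784, 100, 100, 10]  # input, two hidden layers, output

# One (weights, bias) pair per layer, randomly initialised for illustration.
params = [(rng.normal(scale=0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.random((5, 784))  # a batch of 5 flattened images
h = x
for w, b in params[:-1]:
    h = dense(h, w, b, relu)      # hidden layers use relu
logits = dense(h, *params[-1])    # output scores, before softmax

num_params = sum(w.size + b.size for w, b in params)
print(logits.shape, num_params)
```

Counting weights and biases this way gives 784 x 100 + 100, then 100 x 100 + 100, then 100 x 10 + 10 trainable parameters for the three dense layers.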

### Choose the optimizer and the loss function

In [9]:
model.compile(optimizer = 'adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# These strings are not case sensitive
# We want to use cross-entropy as the loss function here, but
# TensorFlow has 3 cross-entropy functions:
# binary_crossentropy is for binary (two-class) targets
# categorical_crossentropy expects that you've one-hot encoded
# the targets
# sparse_categorical_crossentropy expects integer class labels
# (as MNIST provides) and handles the encoding for us

# We can include metrics that we wish to calculate throughout
# the training and testing processes, typically that's the 
# accuracy
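
The difference between the categorical and sparse categorical variants is only in how the targets are supplied. The NumPy sketch below computes both by hand on two made-up predictions over three classes; it illustrates the formulas and does not call the Keras loss functions.

```python
import numpy as np

# Predicted probabilities for 2 samples over 3 classes (made-up numbers).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])

int_labels = np.array([0, 1])           # integer targets ("sparse" form)
one_hot_labels = np.eye(3)[int_labels]  # the same targets, one-hot encoded

# Sparse form: pick out the probability of the true class directly.
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels]).mean()

# Categorical form: sum over classes, weighted by the one-hot targets.
categorical_loss = -(one_hot_labels * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, categorical_loss)
```

Both expressions give the same number, so the choice between them depends on how the targets are stored.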

### Training
#### What happens inside each epoch
1. At the beginning of each epoch the training loss will be set to 0
2. The algorithm will iterate over a preset number of batches, all from train_data
3. The weights and biases will be updated as many times as there are batches
4. We will get a value for the loss function, indicating how the training is going
5. We will also see a training accuracy
6. At the end of each epoch, the algorithm will forward propagate the whole validation set

**We keep an eye on the validation loss to make sure no overfitting occurs. The validation accuracy is a better estimate of the model's accuracy on unseen data than the training accuracy**
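
Watching the validation loss can be automated with a patience rule, which is the idea behind early stopping. The function below is a simplified sketch of such a rule; `stopping_epoch` is a hypothetical helper and the loss values are made up, not taken from a real run.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # new best validation loss
            waited = 0       # reset the patience counter
        else:
            waited += 1      # no improvement this epoch
            if waited >= patience:
                return epoch
    return len(val_losses)   # never triggered: ran through every epoch

made_up_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.30]
print(stopping_epoch(made_up_losses, patience=2))
```

With a patience of 2, training stops after two consecutive epochs without a new best validation loss, even if a later epoch would have improved again.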

In [10]:
num_epochs = 100

early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

model.fit(train_data,
          epochs = num_epochs,
          callbacks = [early_stopping],
          validation_data = (val_inputs, val_targets),
          verbose=2
         )

Epoch 1/100
540/540 - 3s - loss: 0.3309 - accuracy: 0.9050 - val_loss: 0.1703 - val_accuracy: 0.9500 - 3s/epoch - 6ms/step
Epoch 2/100
540/540 - 2s - loss: 0.1404 - accuracy: 0.9592 - val_loss: 0.1237 - val_accuracy: 0.9625 - 2s/epoch - 4ms/step
Epoch 3/100
540/540 - 2s - loss: 0.0997 - accuracy: 0.9699 - val_loss: 0.0920 - val_accuracy: 0.9720 - 2s/epoch - 4ms/step
Epoch 4/100
540/540 - 2s - loss: 0.0762 - accuracy: 0.9769 - val_loss: 0.0811 - val_accuracy: 0.9740 - 2s/epoch - 4ms/step
Epoch 5/100
540/540 - 2s - loss: 0.0613 - accuracy: 0.9816 - val_loss: 0.0692 - val_accuracy: 0.9783 - 2s/epoch - 4ms/step
Epoch 6/100
540/540 - 2s - loss: 0.0508 - accuracy: 0.9840 - val_loss: 0.0510 - val_accuracy: 0.9845 - 2s/epoch - 4ms/step
Epoch 7/100
540/540 - 2s - loss: 0.0414 - accuracy: 0.9875 - val_loss: 0.0461 - val_accuracy: 0.9860 - 2s/epoch - 4ms/step
Epoch 8/100
540/540 - 2s - loss: 0.0356 - accuracy: 0.9888 - val_loss: 0.0467 - val_accuracy: 0.9842 - 2s/epoch - 4ms/step
Epoch 9/100
540/

<keras.callbacks.History at 0x20c35500640>

# Test the model

In [11]:
test_loss, test_accuracy = model.evaluate(test_data)



In [12]:
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.11. Test accuracy: 97.73%


- After we test the model, conceptually, we are no longer allowed to change it
- Getting a test accuracy close to the validation accuracy suggests we did not overfit
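
The accuracy being compared here is the share of samples whose most probable class matches the target. A minimal NumPy sketch with four made-up predictions over two classes:

```python
import numpy as np

# Predicted probabilities for 4 samples over 2 classes (made-up numbers).
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
targets = np.array([1, 0, 0, 0])

predictions = probs.argmax(axis=1)          # most probable class per sample
accuracy = (predictions == targets).mean()  # fraction predicted correctly

print(predictions, accuracy)
```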