## MNIST classification, Deep neural network
### Handwritten digit recognition.

The dataset provides 70k images (28x28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written.


In [77]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

## Import the dataset

with_info=True makes tfds.load return a tuple whose second element holds info about the version, features, and number of samples of the dataset.

as_supervised=True loads each example as a 2-tuple structure (input, target).

In [78]:
mnist_data, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

Here we can see a dictionary holding the train and test sets, so let's separate them.

In [79]:
mnist_data

{'train': <PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>,
 'test': <PrefetchDataset element_spec=(TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.int64, name=None))>}

In [80]:
mnist_train, mnist_test = mnist_data['train'], mnist_data['test']

mnist_info tells us we have a train set of 60k samples and a test set of 10k samples, but no validation set, so we need to make one ourselves. Following a common choice, we are going to use 10% of the train set as the validation set.

In [81]:
mnist_info.splits

{'train': <SplitInfo num_examples=60000, num_shards=1>,
 'test': <SplitInfo num_examples=10000, num_shards=1>}

### Saving the dimensions of our sets

In [82]:
number_of_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
#here we don't know if the resulting number is an integer so let's convert it
number_of_validation_samples = tf.cast(number_of_validation_samples, tf.int64)
#let's also get the number of samples in the test set
number_of_test_samples = mnist_info.splits['test'].num_examples
number_of_test_samples = tf.cast(number_of_test_samples, tf.int64)

#### Scale the data

Now we want to scale the data to make it more numerically stable, for example to values between 0 and 1. We write a function that does that; it takes an MNIST image and its label.

In [83]:
def scale_function(image, label):
    #let's make sure the values are of type float
    image = tf.cast(image, tf.float32)
    #the pixel values lie between 0 and 255, so dividing by 255 maps them to [0, 1]
    image /= 255.
    return image, label
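
As a quick sanity check of what the scaling does, here is a minimal NumPy sketch on a made-up 2x2 "image". The array values are illustrative only and are not taken from MNIST.

```python
import numpy as np

# Hypothetical 2x2 "image" with uint8 pixel intensities, the same dtype MNIST uses.
fake_image = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Same two steps as scale_function: cast to float32, then divide by 255.
scaled = fake_image.astype(np.float32) / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```

The smallest possible pixel maps to 0 and the largest to 1, so every scaled value ends up in the [0, 1] range.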

Now we apply the function

In [84]:
scaled_train_and_validation_data = mnist_train.map(scale_function)
test_data = mnist_test.map(scale_function)

#### Batching

Before we begin thinking about batches we need to make sure the data is not ordered, so we shuffle it. Otherwise a batch could contain mostly one digit and push the gradient descent updates in a misleading direction.

The buffer size sets how many samples are held in memory and shuffled at a time. This dataset is small, so it matters little here, but it's good to pick a buffer size that takes the system's memory limits into account.

In [85]:
BUFFER_SIZE = 10000

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

Now we can slice off our validation data.

In [86]:
validation_data = shuffled_train_and_validation_data.take(number_of_validation_samples)

And now we reduce the train data by that same amount.

In [87]:
train_data = shuffled_train_and_validation_data.skip(number_of_validation_samples)
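
One thing worth checking with this take/skip pattern: shuffle reshuffles on every pass over the data by default (reshuffle_each_iteration defaults to True), so the samples skipped for training can differ from epoch to epoch and may overlap with the validation samples. The sketch below is an assumption-laden illustration on a toy dataset of ten integers, not a change to the notebook. It shows that fixing the order with reshuffle_each_iteration=False keeps the two splits disjoint.

```python
import tensorflow as tf

# Toy stand-in for the 60k training set: the integers 0..9.
toy = tf.data.Dataset.range(10)

# With reshuffle_each_iteration=False the shuffled order is fixed,
# so take() and skip() select the same elements on every pass.
shuffled = toy.shuffle(10, seed=0, reshuffle_each_iteration=False)
toy_validation = shuffled.take(2)
toy_train = shuffled.skip(2)

val_ids = {int(x) for x in toy_validation}
for epoch in range(3):
    train_ids = {int(x) for x in toy_train}
    # The two splits never share an element, whatever the epoch.
    print(epoch, val_ids.isdisjoint(train_ids))
```

The trade-off is that the training order no longer changes between epochs; a second shuffle applied to the train split alone would restore that.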

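Now we can start thinking about the batch size for mini-batch gradient descent. It should satisfy 1 < batch size < number of samples: a batch size of 1 is plain stochastic gradient descent, and a batch size equal to the number of samples is full-batch gradient descent.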

In [88]:
BATCH_SIZE = 75

And we apply that number to the train set

In [89]:
train_data = train_data.batch(BATCH_SIZE)
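
The batch size also fixes how many weight updates happen per epoch. A small arithmetic sketch, using the split sizes reported by mnist_info and the 10% validation share chosen earlier (the variable names here are illustrative):

```python
import math

num_train_total = 60000                      # size of the original train split
num_validation = num_train_total // 10       # 10% held out for validation
num_train = num_train_total - num_validation
batch_size = 75

# One weight update per batch; the last batch may be smaller than the rest.
steps_per_epoch = math.ceil(num_train / batch_size)
print(num_validation, num_train, steps_per_epoch)
```

This gives 720 steps per epoch, which is the 720/720 counter that appears in the training log further down.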

The model expects the validation set to be batched as well. No weights are updated during validation, so we can put the whole set into a single batch.

In [90]:
validation_data = validation_data.batch(number_of_validation_samples)

Same with the test data.

In [91]:
test_data = test_data.batch(number_of_test_samples)

iter is Python's built-in for creating an iterator over the validation set; creating it does not load any data yet. next then loads the next batch. Since there's only one batch, it loads all the inputs and targets.

In [92]:
validation_inputs, validation_targets = next(iter(validation_data))
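
The same iter/next pattern can be tried on a plain Python list. This is only a sketch: the list below is a made-up stand-in for a dataset holding a single batch.

```python
# A plain Python list standing in for a dataset that holds a single batch.
# Each element is an (inputs, targets) pair; the values are made up.
single_batch_dataset = [([0.1, 0.5, 0.9], [7, 2, 1])]

# iter() builds an iterator without consuming anything yet;
# next() pulls the first (and here only) element out of it.
iterator = iter(single_batch_dataset)
inputs, targets = next(iterator)

# A second next() would raise StopIteration; passing a default avoids that.
leftover = next(iterator, None)
print(inputs, targets, leftover)
```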

### Model Building

In [144]:
#each image has 28 * 28 = 784 pixels
input_size = 784
#and there are 10 different digits in total
output_size = 10
#we can refine the layers later
hidden_layer_size = 256

model = tf.keras.Sequential([
    #we know the image dimensions already, 28x28, and flatten them into a vector
    tf.keras.layers.Flatten(input_shape=(28,28,1)),
    #the number of layers is up to us
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    #the output layer is created the same as the rest but we use the output size
    #the activation function of the output layer must transform the value into a probability
    #so we use softmax
    tf.keras.layers.Dense(output_size, activation='softmax')
])
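
To get a feel for the size of this network, the number of trainable parameters can be counted by hand. The sketch below assumes the layer widths used above (784 inputs, five hidden layers of 256 units, 10 outputs); the result should agree with what model.summary() reports.

```python
# Layer widths of the network above: 784 inputs, five hidden layers of 256, 10 outputs.
layer_sizes = [28 * 28 * 1, 256, 256, 256, 256, 256, 10]

# A Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)
print(params_per_layer, total_params)
```

The Flatten layer contributes nothing because it only reshapes the input. Most of the parameters sit in the first Dense layer and in the connections between the hidden layers.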

#### Choosing the optimizer and the loss function

Adam is a safe option for optimization.

As for the loss function, I'm going to use sparse_categorical_crossentropy because I haven't applied one-hot encoding to the targets. This loss accepts integer class labels directly, so no separate encoding step is needed.
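
To see why integer labels are enough, here is a NumPy sketch (not the Keras implementation) that computes the cross-entropy both ways on made-up scores: once by indexing with integer labels, and once with explicit one-hot targets. The softmax helper and the random scores are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up scores for 3 samples over 10 classes, and their integer labels.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 10))
labels = np.array([7, 2, 1])

probs = softmax(logits)

# Sparse form: pick the predicted probability of the true class by index.
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Dense form: one-hot encode the labels first, then sum over the classes.
one_hot = np.eye(10)[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_loss, dense_loss))
```

Both forms give the same number; the sparse version simply skips building the one-hot matrix.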

In [149]:
# adam_optimizer = tf.keras.optimizers.Adam(learning_rate=*.**)
# to change the learning rate of the optimizer, uncomment the line above with a value of your choice
# and replace 'adam' with adam_optimizer in the compile call below
# metrics is passed as a list, the form accepted across Keras versions
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### Training the model

1. At the beginning of each epoch the training loss will be set to 0
2. The algorithm will iterate over a preset number of batches
3. The weights and biases will be updated after each batch
4. At the end of each epoch we will get a value for the loss function, indicating how the training is going
5. We will also know the training accuracy
6. At the end of each epoch the algorithm will forward propagate the whole validation set
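
The steps above can be sketched as a hand-written loop. This is not what model.fit does internally; it is a minimal NumPy illustration on a synthetic regression problem, with plain gradient descent swapped in for Adam. Because it is a regression task, step 5 (accuracy) is left out, and all data and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical): y = 3x - 1 plus a little noise.
x = rng.uniform(-1, 1, size=(200, 1))
y = 3 * x - 1 + rng.normal(scale=0.05, size=(200, 1))
x_train, y_train = x[:160], y[:160]
x_val, y_val = x[160:], y[160:]

w, b = np.zeros((1, 1)), np.zeros(1)
learning_rate, batch_size = 0.1, 16
history = []

for epoch in range(5):
    epoch_loss = 0.0                                  # step 1: reset the loss
    n_batches = 0
    for start in range(0, len(x_train), batch_size):  # step 2: loop over the batches
        xb = x_train[start:start + batch_size]
        yb = y_train[start:start + batch_size]
        error = xb @ w + b - yb
        # step 3: update the weight and the bias after every batch
        w -= learning_rate * 2 * xb.T @ error / len(xb)
        b -= learning_rate * 2 * error.mean(axis=0)
        epoch_loss += float((error ** 2).mean())
        n_batches += 1
    train_loss = epoch_loss / n_batches               # step 4: loss for the epoch
    # step 6: forward propagate the whole validation set, with no updates
    val_loss = float(((x_val @ w + b - y_val) ** 2).mean())
    history.append((train_loss, val_loss))
    print(f"Epoch {epoch + 1}/5 - loss: {train_loss:.4f} - val_loss: {val_loss:.4f}")
```

The printed lines mirror the shape of the Keras log: one training loss and one validation loss per epoch.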

In [150]:
number_of_epochs = 5

model.fit(train_data, epochs = number_of_epochs, validation_data=(validation_inputs, validation_targets), verbose=2)


Epoch 1/5
720/720 - 5s - loss: 0.0521 - accuracy: 0.9866 - val_loss: 0.0547 - val_accuracy: 0.9873 - 5s/epoch - 8ms/step
Epoch 2/5
720/720 - 4s - loss: 0.0326 - accuracy: 0.9904 - val_loss: 0.0469 - val_accuracy: 0.9877 - 4s/epoch - 6ms/step
Epoch 3/5
720/720 - 4s - loss: 0.0244 - accuracy: 0.9926 - val_loss: 0.0336 - val_accuracy: 0.9905 - 4s/epoch - 6ms/step
Epoch 4/5
720/720 - 5s - loss: 0.0214 - accuracy: 0.9939 - val_loss: 0.0377 - val_accuracy: 0.9930 - 5s/epoch - 7ms/step
Epoch 5/5
720/720 - 5s - loss: 0.0222 - accuracy: 0.9941 - val_loss: 0.0243 - val_accuracy: 0.9935 - 5s/epoch - 8ms/step


<keras.callbacks.History at 0x2398f10d150>

I tried several times with different learning rates, layer sizes, and so on, and I think this result is decent for a practice project.

### Testing the model

In [151]:
test_loss, test_accuracy = model.evaluate(test_data)



Here we can see that after testing on data the model never saw during training we got an accuracy of 98%. We might get better results by changing the batch size, learning rate, layer sizes, number of epochs, etc.
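
For reference, the accuracy metric itself is simple to compute by hand. The sketch below uses made-up softmax outputs for four samples and three classes; it only illustrates the definition and is not taken from the model above.

```python
import numpy as np

# Made-up softmax outputs for 4 samples over 3 classes (each row sums to 1).
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
])
targets = np.array([0, 1, 2, 1])

# The predicted class is the index of the largest probability in each row.
predictions = probs.argmax(axis=1)

# Accuracy is the fraction of predictions that match the targets.
accuracy = (predictions == targets).mean()
print(predictions, accuracy)
```

Here three of the four predictions match their targets, so the accuracy is 0.75.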