# Neural Network for MNIST Classification

The MNIST dataset provides 70,000 labelled 28x28-pixel greyscale images of handwritten digits, with one digit per image.

The goal is to write an algorithm that detects which digit is present in each image. Since there are only 10 digits (0-9), this is a classification problem with 10 classes. 

In this notebook I use TensorFlow to make a simple neural network consisting of fully-connected layers to solve this classification problem.

## Import the relevant packages

In [20]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

In [21]:
tf.__version__

'2.0.0'

## Data

The data is loaded through `tensorflow_datasets`, which already provides separate train and test splits (60,000 and 10,000 images). 10% of the training split is then set aside as a validation set.

In [29]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64)
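As a quick check on the sizes involved, the plain-Python sketch below (no TensorFlow needed) reproduces the split arithmetic. It assumes the standard MNIST split that `tfds` reports, 60,000 training and 10,000 test images. The list slices stand in for `take()` and `skip()`, which are used further down. Together they partition the training split without overlap, but this only holds if both passes over the shuffled dataset see the same order.

```python
# Assumed standard MNIST split sizes (as reported by mnist_info.splits).
NUM_TRAIN = 60_000
NUM_TEST = 10_000

num_validation_samples = int(0.1 * NUM_TRAIN)           # 6,000 images held out
num_train_samples = NUM_TRAIN - num_validation_samples  # 54,000 left for training

# take(n) keeps the first n elements and skip(n) drops them, so on the
# same ordering they split the data into two disjoint parts.
indices = list(range(NUM_TRAIN))
validation_idx = indices[:num_validation_samples]  # like .take(n)
train_idx = indices[num_validation_samples:]       # like .skip(n)

print(num_validation_samples, num_train_samples, NUM_TEST)  # 6000 54000 10000
```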


Each image consists of 28x28 greyscale values from 0 to 255. The pixels are therefore converted to float32 and scaled to the range [0, 1] by dividing by 255.

In [30]:
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label
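To make the effect of `scale()` concrete, here is a plain-Python sketch that applies the same conversion to a tiny hypothetical 2x2 "image". It mirrors the cast to float followed by division by 255:

```python
# Hypothetical 2x2 image of 8-bit greyscale values (0 = black, 255 = white).
image = [[0, 64], [128, 255]]

# Same idea as scale(): convert to float and divide by 255 -> values in [0, 1].
scaled = [[pixel / 255.0 for pixel in row] for row in image]

for row in scaled:
    print([round(v, 3) for v in row])
# [0.0, 0.251]
# [0.502, 1.0]
```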

In [31]:
scaled_train_and_validation_data = mnist_train.map(scale)

test_data = mnist_test.map(scale)

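The data are then shuffled. Rather than shuffling the whole dataset in memory at once, `shuffle` keeps a buffer of `BUFFER_SIZE` examples. It fills the buffer, then repeatedly outputs a randomly chosen example from it and replaces that example with the next one from the input. Since the buffer (10,000) is smaller than the training split (60,000), this gives only an approximate shuffle.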

In [32]:
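BUFFER_SIZE = 10000

# Fix the shuffled order so that take() and skip() below always see the same sequence;
# by default every pass reshuffles, so validation images would leak into training.
shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
# Reshuffle only the training portion at every epoch.
train_data = train_data.shuffle(BUFFER_SIZE)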
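The buffer mechanism can be sketched in plain Python. The generator below is a hypothetical helper, `buffered_shuffle`, that approximates how `tf.data.Dataset.shuffle` fills a buffer and then samples from it. It is not TensorFlow's implementation. The usage lines show the main limitation: the first output must come from the first `buffer_size` inputs, so a buffer smaller than the dataset cannot move elements arbitrarily far.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Approximate buffer-based shuffling (hypothetical helper, not the TF code).

    Fill a buffer with the first `buffer_size` items, then repeatedly emit a
    random buffered item and replace it with the next incoming item.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    it = iter(items)
    buffer = []
    for item in it:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    for item in it:
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer

out = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(sorted(out) == list(range(20)))  # True: a permutation, nothing lost
print(out[0] < 5)                      # True: first output comes from the first 5 items
```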

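Next, the training data is split into mini-batches of 100 examples. The weights are then updated after every batch rather than once per full pass over the data, which usually makes training converge much faster. The validation and test sets are each kept in a single batch.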

In [33]:
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

validation_inputs, validation_targets = next(iter(validation_data))
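The batch counts follow directly from the split sizes. The plain-Python sketch below uses a hypothetical `batch` helper that mimics `Dataset.batch` with its default `drop_remainder=False`: consecutive chunks, with the last chunk possibly smaller. With 54,000 training images and `BATCH_SIZE = 100`, this gives the 540 steps per epoch seen as `540/540` in the training log.

```python
def batch(items, batch_size):
    """Split a sequence into consecutive batches; the last may be smaller."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

train_batches = batch(list(range(54_000)), 100)
print(len(train_batches))       # 540 steps per epoch

validation_batches = batch(list(range(6_000)), 6_000)
print(len(validation_batches))  # 1: the whole validation set in one batch

print(batch(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```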

## Model

### Outline the model

The model consists of a `Flatten` layer that turns each 28x28x1 image into a vector of 784 values, two hidden layers of 50 units each with ReLU activations, and an output layer with one unit per class (10 in this case). The output layer uses a softmax activation, so its outputs can be read as class probabilities.

In [36]:
input_size = 784
output_size = 10
hidden_layer_size = 50

model = tf.keras.Sequential([
                            tf.keras.layers.Flatten(input_shape=(28,28,1)),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
                            tf.keras.layers.Dense(output_size, activation='softmax')   
                            ])
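The sketch below is a plain-Python illustration of what this `Sequential` model computes for one image. It is not the Keras code. The weights are hypothetical small random values, whereas Keras uses its own initializers. The first part counts parameters. Each `Dense` layer has an `n_in x n_out` weight matrix plus `n_out` biases, which gives 39,250 + 2,550 + 510 = 42,310 trainable parameters, the total `model.summary()` should report. The second part flattens a fake 28x28 image, applies the ReLU hidden layers, and ends with a softmax whose outputs sum to 1.

```python
import math
import random

layer_sizes = [784, 50, 50, 10]  # flattened input, two hidden layers, output

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 42310

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

def dense(x, weights, biases):
    # weights[j][i] connects input i to output unit j
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

rng = random.Random(0)

def random_layer(n_in, n_out):
    # Hypothetical initialisation for illustration only.
    W = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

# Fake scaled image without the channel axis; flattening gives 784 values either way.
image = [[rng.random() for _ in range(28)] for _ in range(28)]
x = [p for row in image for p in row]  # Flatten: 28x28 -> 784

layers = [random_layer(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
for W, b in layers[:-1]:
    x = relu(dense(x, W, b))            # hidden layers with ReLU
probs = softmax(dense(x, *layers[-1]))  # output layer with softmax
print(len(probs), round(sum(probs), 6))  # 10 1.0
```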

### Choose the optimizer and the loss function

In [37]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
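The labels here are plain integers (0 to 9) rather than one-hot vectors, which is why the "sparse" variant of categorical cross-entropy is used. For a single example, the loss is the negative log of the probability the model assigns to the true class. The `accuracy` metric counts a hit when the most probable class equals the label. The plain-Python sketch below uses a hypothetical softmax output for an image of the digit 3 and is not the Keras implementation. By default, Keras averages the per-example losses over each batch.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """-log(probability of the true class); `label` is an integer class index."""
    return -math.log(probs[label])

def is_correct(label, probs):
    """Prediction = index of the largest probability (argmax)."""
    return max(range(len(probs)), key=probs.__getitem__) == label

# Hypothetical softmax output for one image whose true digit is 3.
probs = [0.01, 0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
print(round(sparse_categorical_crossentropy(3, probs), 4))  # 0.1054
print(is_correct(3, probs))                                 # True
```

A confident correct prediction gives a loss near 0, while a small probability on the true class makes the loss grow without bound.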

### Training

In [40]:
NUM_EPOCHS = 10

# validation_steps counts batches, not samples, so it is dropped; the validation tensors are evaluated in full.
model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/10
540/540 - 8s - loss: 0.0487 - accuracy: 0.9848 - val_loss: 0.0000e+00 - val_accuracy: 0.0000e+00
Epoch 2/10
540/540 - 8s - loss: 0.0463 - accuracy: 0.9859 - val_loss: 0.0593 - val_accuracy: 0.9838
Epoch 3/10
540/540 - 8s - loss: 0.0407 - accuracy: 0.9883 - val_loss: 0.0602 - val_accuracy: 0.9835
Epoch 4/10
540/540 - 8s - loss: 0.0368 - accuracy: 0.9891 - val_loss: 0.0577 - val_accuracy: 0.9833
Epoch 5/10
540/540 - 8s - loss: 0.0355 - accuracy: 0.9894 - val_loss: 0.0492 - val_accuracy: 0.9872
Epoch 6/10
540/540 - 9s - loss: 0.0307 - accuracy: 0.9912 - val_loss: 0.0461 - val_accuracy: 0.9872
Epoch 7/10
540/540 - 9s - loss: 0.0276 - accuracy: 0.9916 - val_loss: 0.0477 - val_accuracy: 0.9865
Epoch 8/10
540/540 - 9s - loss: 0.0265 - accuracy: 0.9915 - val_loss: 0.0400 - val_accuracy: 0.9892
Epoch 9/10
540/540 - 9s - loss: 0.0244 - accuracy: 0.9926 - val_loss: 0.0394 - val_accuracy: 0.9887
Epoch 10/10
540/540 - 9s - loss: 0.0219 - accuracy: 0.9934 - val_loss: 0.0414 - val_accuracy

<tensorflow.python.keras.callbacks.History at 0x2c0922a3b88>

## Test the model

In [39]:
test_loss, test_accuracy = model.evaluate(test_data)



In [35]:
print('Test loss: {0:.4f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.0776. Test accuracy: 97.90%


Even this simple model, trained in under 2 minutes, is over 97% accurate (97.90%) on unseen test data.