# Deep Learning with Python
## Example 2.1 - MNIST Digit Recognition

In [32]:
import numpy as np
from tensorflow import keras
from tensorflow.keras.datasets import mnist

Loading the `mnist` dataset directly from Keras

In [33]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Examining the shape of training and test data and labels

In [34]:
train_images.shape

(60000, 28, 28)

In [35]:
test_images.shape

(10000, 28, 28)

In [36]:
train_labels.shape

(60000,)

In [37]:
test_labels.shape

(10000,)

So the training and test sets consist of 60k and 10k images respectively, each of which is stored as a `numpy` array of shape (28, 28).

For each sample in the training and test set, there is a corresponding label.

In [38]:
# Can also access the number of samples (the size of the first axis) using `len`
print("Length of training data tensor: " + str(len(train_images)))
print("Length of test data tensor: " + str(len(test_images)))

Length of training data tensor: 60000
Length of test data tensor: 10000
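
Beyond `shape` and `len`, a `numpy` array also exposes its number of axes and its element type. The sketch below uses a small stand-in array (`sample_images`, a hypothetical name filled with zeros) rather than the real MNIST tensors so that it runs without a download; the `uint8` dtype is an assumption based on how the Keras loader documents its image arrays.

```python
import numpy as np

# Stand-in for the image tensor: 6 blank 28 x 28 grayscale images (assumed uint8)
sample_images = np.zeros((6, 28, 28), dtype=np.uint8)

print(sample_images.ndim)   # number of axes
print(sample_images.shape)  # size along each axis
print(sample_images.dtype)  # type of each element
print(len(sample_images))   # size of the first axis, i.e. the number of samples
```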


## Creating a model

In [39]:
from tensorflow.keras import models, layers
# Sequential model - a linear stack of layers, each feeding its output to the next
network = models.Sequential()

# First (hidden) layer - a Dense layer with 512 units, each of which accepts a
# flattened input vector of 28 * 28 = 784 floating point values (for a batch of
# arbitrary size) and uses the `relu` activation function to compute an output
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28, )))

# Output layer - has 10 units with a `softmax` activation, so it outputs 10
# probability scores that sum to 1, one for each of the 10 digit classes
network.add(layers.Dense(10, activation='softmax'))
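
To make the two layers concrete, here is a minimal `numpy` sketch of the forward pass they compute. It is an illustration only, not how Keras implements it: the weights `W1`, `b1`, `W2`, `b2` are hypothetical random values with the same shapes as the layers above, and `batch` is fake input data.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability, then normalise each row
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical parameters with the same shapes as the two Dense layers
W1 = rng.normal(scale=0.01, size=(784, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

batch = rng.random((4, 784))        # 4 fake flattened images
hidden = relu(batch @ W1 + b1)      # shape (4, 512)
probs = softmax(hidden @ W2 + b2)   # shape (4, 10), each row sums to 1

n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, probs.sum(axis=1), n_params)
```

Each row of `probs` is a probability distribution over the 10 classes, which is what the `softmax` output layer produces for each input image.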

## Compiling Model
This means specifying the following:
- `optimizer` - the specific variant of gradient descent that the neural network will use to update its weights and minimise its loss
- `loss function` - a measure of the difference between the network's predicted label and the actual label for a given sample; this is the quantity that training minimises
- `metrics` - how will we be assessing the performance of our network? Metrics such as accuracy are reported during training and testing alongside the loss, but are not themselves optimised.

In [40]:
network.compile(optimizer='rmsprop', 
                loss='categorical_crossentropy',  # multi-class classification with one-hot encoded labels
                metrics=['accuracy'])
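
As a rough illustration of the loss named in the compile call, the following `numpy` sketch computes categorical cross-entropy for one-hot labels. The function name, the `eps` clipping constant and the sample values are assumptions made for this example; the Keras implementation may differ in its details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 and 1 to avoid log(0) (eps is an assumed value)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Per-sample loss: minus the log-probability assigned to the true class
    return -np.sum(y_true * np.log(y_pred), axis=1)

# Made-up example with 3 classes and 2 samples
y_true = np.array([[0., 0., 1.],
                   [1., 0., 0.]])
y_pred = np.array([[0.1, 0.1, 0.8],   # confident and correct -> small loss
                   [0.3, 0.4, 0.3]])  # true class gets only 0.3 -> larger loss

losses = categorical_crossentropy(y_true, y_pred)
print(losses)
print(losses.mean())  # average loss over the batch
```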

## Data Preprocessing
Transforming the input data into a tensor with dimensions that are compatible with the input layer of the neural network.

The training images are reshaped from a 3D tensor of shape (60000, 28, 28) to a 2D tensor of shape (60000, 28 * 28), where each row is the 784-dimensional vector of pixels for one image. The test images are reshaped in the same way.

The pixel values are then converted to floating point values and scaled to the interval [0, 1] by dividing by 255, the maximum pixel value - all original pixel values are integers between 0 and 255.

In [41]:
train_images = train_images.reshape(60000, 28 * 28)
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape(10000, 28 * 28)
test_images = test_images.astype('float32') / 255
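
The same two steps can be checked on a small stand-in array. In this sketch `raw` is hypothetical random data, not MNIST, and `len(raw)` is used instead of a hard-coded sample count so that the reshape works for any number of images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for raw image data: 5 random 28 x 28 images with integer pixels in [0, 255]
raw = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image into a 784-dimensional row vector
flat = raw.reshape(len(raw), 28 * 28)

# Convert to float32 and scale pixel values into [0, 1]
scaled = flat.astype('float32') / 255

print(flat.shape)
print(scaled.dtype, scaled.min(), scaled.max())
```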

Converting labels to categorical data

In [42]:
# Prior to categorical conversion
print(train_labels[:10])

[5 0 4 1 9 2 1 3 1 4]


Each label is a 0D tensor i.e. a single scalar value between 0 and 9 that corresponds to a single kind of digit.

When we convert this to a categorical variable, we will be carrying out one-hot encoding, where each label will become a 10D vector of all zeroes except for the index corresponding to that label - which will be 1.

In [43]:
from tensorflow.keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# After one-hot encoding
print(train_labels[:10])

[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
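
For reference, one-hot encoding can also be written by hand in `numpy`. The helper `one_hot` below is a hypothetical name used for illustration; it is a sketch of the idea, not the `to_categorical` source.

```python
import numpy as np

def one_hot(labels, num_classes):
    # Start from all zeros, then set a single 1 per row at the label's index
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

labels = np.array([5, 0, 4, 1, 9])
encoded = one_hot(labels, num_classes=10)

print(encoded)
print(encoded.argmax(axis=1))  # recovers the original integer labels
```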


## Training the Network
Training uses mini-batch gradient descent to adjust the weights (kernels and biases) of each layer in the neural network so that the training images are mapped onto the correct training labels.

The input data is the `train_images` tensor together with the one-hot encoded tensor of true labels, `train_labels`. The model is trained by making 5 passes (epochs) over the entire data set of 60k images in batches of 128 images at a time.
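
The number of weight updates implied by these settings follows from simple arithmetic. The sketch below assumes that the final, smaller batch of each epoch is still used for an update rather than dropped.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# Batches per epoch, counting the final partial batch (assumed to be kept)
steps_per_epoch = math.ceil(num_samples / batch_size)

# Number of images left over for the last batch of each epoch
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size

# Total number of gradient updates over all epochs
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 469 96 2345
```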

In [44]:
network.fit(x=train_images, y=train_labels, epochs=5, batch_size=128)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x2624330a3c8>

## Evaluating Model
Using the weights of the model that were learned during training by gradient descent over the training data, the model will attempt to predict the correct label for testing data - data that it has never seen before.

In [45]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test accuracy: ', test_acc)

test accuracy:  0.9765
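
The accuracy metric is the fraction of samples whose most probable predicted class matches the true class. A minimal `numpy` sketch, using made-up probabilities and labels for 3 classes rather than real model output:

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])

# Hypothetical one-hot encoded true labels for the same samples
one_hot_labels = np.array([[0., 1., 0.],
                           [1., 0., 0.],
                           [0., 1., 0.],
                           [0., 1., 0.]])

predicted = probs.argmax(axis=1)        # most probable class per sample
actual = one_hot_labels.argmax(axis=1)  # index of the 1 in each one-hot row

accuracy = np.mean(predicted == actual)
print(accuracy)  # 0.75 (3 of the 4 predictions are correct)
```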


The test set accuracy is typically lower than the training set accuracy. This gap is a symptom of overfitting - during training, the model may learn mappings that are specific to the training data and do not generalize well to data it has never seen before. As a result, machine learning models tend to perform worse on new data than on their training data.