# Exercise 1 - Handwritten Digit Recognition using MNIST

In this notebook, we will build a classifier model from scratch that is able to recognise handwritten digits. We will follow these steps:

1. Explore the MNIST dataset
2. Build a small convnet from scratch to solve our classification problem
3. Evaluate training and validation accuracy.

Let's get started!

## Download and Explore the Dataset

Let's start by downloading our MNIST dataset by calling the function `tf.keras.datasets.mnist.load_data()`.

In [None]:
# Import the tensorflow module and give it an alias
import tensorflow as tf

# Unpack the data into training and test sets, each with its labels
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()

Once loaded, let's check how many samples we have in the training and test sets.

In [None]:
# How many samples do we have?
print("Number of training data:", len(x_train))
print("Number of testing data:", len(x_test))

### Viewing the first sample

Looking at the first sample, you will find that each pixel is represented by a number ranging from 0 to 255, where 0 represents black and 255 represents white. Pixels with higher values therefore appear brighter.

Collectively, they will show a digit.

In [None]:
# Let's look at the first sample
print(x_train[0])

In [None]:
# Shape of the first training sample
x_train[0].shape

In [None]:
# Let's look at the first label
print(y_train[0])

### Inspecting the training set and its labels

To visualise the images and their labels in the training set, we randomly sample 14 images and display them using the code below.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

np.random.seed()

rand_14 = np.random.randint(0, x_train.shape[0],14)
sample_digits = x_train[rand_14]
sample_labels = y_train[rand_14]
# code to view the images
num_rows, num_cols = 2, 7
f, ax = plt.subplots(num_rows, num_cols, figsize=(12,5),
                     gridspec_kw={'wspace':0.03, 'hspace':0.01}, 
                     squeeze=True)

for r in range(num_rows):
    for c in range(num_cols):
        image_index = r * num_cols + c
        ax[r,c].axis("off")
        ax[r,c].imshow(sample_digits[image_index], cmap='gray')
        ax[r,c].set_title('No. %d' % sample_labels[image_index])
plt.show()
plt.close()
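
Note that `np.random.randint` draws indices with replacement, so the same image can occasionally appear twice in the grid. The following is a minimal sketch of drawing 14 distinct indices instead, assuming NumPy's `Generator` API. The value of `num_samples` and the seed are stand-ins chosen for illustration; in the notebook you would use `x_train.shape[0]`.

```python
import numpy as np

# Hypothetical stand-in for x_train.shape[0]
num_samples = 60000

# A seeded generator makes the selection reproducible
rng = np.random.default_rng(seed=0)

# replace=False guarantees that no index is drawn twice
unique_indices = rng.choice(num_samples, size=14, replace=False)
print(unique_indices)
```

The resulting array can be used to index `x_train` and `y_train` in the same way as `rand_14`.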

## Data Preprocessing

### Feature scaling

Convert the samples from integers in the range 0 to 255 to floating-point numbers in the range 0 to 1, which makes them easier for the model to ingest.

In [None]:
# Normalise the data by dividing by 255 so that every pixel value lies between 0 and 1
x_train, x_test = x_train / 255.0, x_test / 255.0

Let's take a look at the impact of scaling.

In [None]:
# Let's look again at the first sample
print(x_train[0])
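
To see what the division does without printing a full image, here is a small sketch on a made-up row of three pixel values (the array `pixels` is illustrative, not part of the dataset). It shows that dividing an 8-bit integer array by `255.0` yields floating-point values between 0 and 1.

```python
import numpy as np

# A tiny stand-in for one row of 8-bit pixel values
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Dividing by a Python float promotes the result to floating point
scaled = pixels / 255.0

print(pixels.dtype, "->", scaled.dtype)
print("min:", scaled.min(), "max:", scaled.max())
```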

### Reshaping input data

Before the convolution layer can use the data, we need to reshape each image from 28 x 28 into 28 x 28 x 1, adding an explicit channel dimension for the single grayscale channel.

In [None]:
# Converting from (28,28) to (28,28,1)

print("Before reshaping:", x_train.shape)

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

print("After reshaping:", x_train.shape)
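
As an aside, the same channel dimension can be added without hard-coding the image size. The sketch below uses a hypothetical batch of five blank images (`batch` is made up for illustration) and compares `reshape` with `np.expand_dims`, which appends a new axis of length 1.

```python
import numpy as np

# Hypothetical batch of five 28x28 images
batch = np.zeros((5, 28, 28))

# Explicit reshape, as used in the cell above
reshaped = batch.reshape(batch.shape[0], 28, 28, 1)

# Equivalent: append a trailing axis without naming the other sizes
expanded = np.expand_dims(batch, axis=-1)

print(reshaped.shape, expanded.shape)
```

Both approaches leave the pixel values untouched; only the array shape changes.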

## Building a Small Convnet Model

The images that will go into our convnet are 28 x 28 grayscale images with a single channel.

We first pass the input to a convolution layer with 16 filters of size 3x3, using `relu` as the activation function. Next, a max-pooling layer operates on 2x2 windows, after which we flatten the resulting 3-dimensional feature map into a 1-dimensional vector. Lastly, we pass the data to a fully connected layer (`Dense`) with 10 output units, one per class.

In [None]:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Flatten, Dense

# Here we specify the input shape of our model
# This should match the size of each digit along with the number of channels (1)
input_shape = (28,28,1)

# Define the number of classes
num_classes = 10 

# Initialising the model
model = Sequential()

# Convolution layer applies 16 filters of size 3x3
model.add(Conv2D(16, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))

# Convolution is followed by max-pooling layer with a 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))

# Flatten the 3-D feature map into a 1-D tensor so we can add fully connected layers (dense layers)
model.add(Flatten())

# Create output layer with the number of classes we are classifying and activate using softmax
model.add(Dense(num_classes, activation='softmax'))

Let's summarise the model architecture:

In [None]:
# The parameter count is the number of weights and biases adjusted while the model learns
model.summary()

The "Output Shape" column shows how the size of the feature map evolves in each layer. The convolution layer shrinks the feature map slightly (from 28 x 28 to 26 x 26) because it uses no padding, and the pooling layer halves each spatial dimension (from 26 x 26 to 13 x 13).
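
The shapes and parameter counts reported by `model.summary()` can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming the Keras defaults that the model above relies on: no padding (`'valid'`) and a stride of 1 for `Conv2D`, and a pooling stride equal to the pool size for `MaxPooling2D`. The variable names are chosen for illustration.

```python
# Layer settings taken from the model above
input_size, channels = 28, 1
kernel, filters, pool, num_classes = 3, 16, 2, 10

# Without padding, a 3x3 kernel trims (kernel - 1) pixels per dimension
conv_size = input_size - kernel + 1            # 26

# A 2x2 pooling window with stride 2 halves each spatial dimension
pooled_size = conv_size // pool                # 13

# Flatten turns the 13 x 13 x 16 feature map into one long vector
flat_size = pooled_size * pooled_size * filters

# Each filter has kernel*kernel*channels weights plus one bias
conv_params = kernel * kernel * channels * filters + filters

# Each output unit has one weight per flattened value plus one bias
dense_params = flat_size * num_classes + num_classes

print("Conv output:", (conv_size, conv_size, filters))
print("Pool output:", (pooled_size, pooled_size, filters))
print("Flatten output:", flat_size)
print("Total parameters:", conv_params + dense_params)
```

Pooling and flattening have no trainable parameters, so the total comes entirely from the convolution and dense layers.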

Next, we will configure the specifications for model training.

We train our model with the `sparse_categorical_crossentropy` loss because this is a multi-class problem and our targets are integer encodings (e.g. 0, 1, 2). During training, we want to monitor classification `accuracy`.

In [None]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
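
To illustrate why integer labels call for the "sparse" variant, here is a rough NumPy sketch of the cross-entropy calculation itself. It is not the Keras implementation; the probabilities and labels are made up. It shows that picking the predicted probability of the true class by integer index gives the same loss as the one-hot formulation used by `categorical_crossentropy`.

```python
import numpy as np

# Made-up softmax outputs for two samples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])

# Integer-encoded targets, as in y_train
labels = np.array([0, 1])

# Sparse form: look up the probability of the true class directly
sparse_loss = -np.log(probs[np.arange(len(labels)), labels]).mean()

# Dense form: one-hot encode the labels first, then sum over classes
one_hot = np.eye(probs.shape[1])[labels]
dense_loss = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(sparse_loss, dense_loss)
```

With integer labels, the sparse form avoids building the one-hot matrix altogether.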


### Optimizer
An optimizer ties together the loss function and the model parameters by updating the model in response to the output of the loss function. There are multiple optimizers to choose from. See https://keras.io/optimizers/

`adam`, `SGD`, `RMSprop`, `Adagrad`, `Adadelta`, `Adamax`, `Nadam`
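
To make the idea of "updating the model in response to the loss" concrete, here is a toy sketch of plain gradient descent on a single parameter. This is deliberately simpler than `adam`, and the loss function, learning rate and step count are all made up for illustration.

```python
# Toy loss with its minimum at w = 3
def loss(w):
    return (w - 3.0) ** 2

# Derivative of the loss with respect to w
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1  # step size (hypothetical choice)

# Each step moves w a little in the direction that lowers the loss
for step in range(50):
    w -= learning_rate * grad(w)

print("w:", w, "loss:", loss(w))
```

The optimizers listed above follow the same loop but differ in how they scale and accumulate the gradient at each step.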

### Loss function
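A loss function (also called an objective function or optimization score function) measures how far the model's predictions are from the true labels; training tries to minimise it. There are several loss functions to choose from. See https://keras.io/losses/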

`mean_squared_error`, `mean_absolute_error`, `categorical_crossentropy`, `binary_crossentropy`, `....`

### Metric
A metric is a function that is used to judge the performance of your model. See https://keras.io/metrics/

`accuracy`, `mae`, `binary_accuracy`, `categorical_accuracy`, `sparse_categorical_accuracy`, `top_k_categorical_accuracy`, `...`
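
As a rough illustration of what the `accuracy` metric measures for a classifier with integer labels, the sketch below computes it by hand with NumPy. The probabilities and labels are made up, and this is not the Keras implementation.

```python
import numpy as np

# Made-up softmax outputs for four samples and three classes
probs = np.array([[0.1, 0.9, 0.0],
                  [0.8, 0.1, 0.1],
                  [0.2, 0.3, 0.5],
                  [0.6, 0.3, 0.1]])

# Integer-encoded true labels
labels = np.array([1, 0, 2, 1])

# The predicted class is the one with the highest probability
predicted = probs.argmax(axis=1)

# Accuracy is the fraction of predictions that match the labels
accuracy = (predicted == labels).mean()

print("Predicted:", predicted)
print("Accuracy:", accuracy)
```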
    

## Model Training
Let's train the model for 3 epochs and monitor its training accuracy.

In [None]:
model.fit(x_train, y_train, epochs=3)

## Evaluating Accuracy and Loss of the Model

With a trained model, we evaluate its performance against the true labels of the test set, which the model has not seen during training.

In [None]:
# Evaluating against the test set
scores = model.evaluate(x_test, y_test)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Let's take a look at a random test image and its prediction.

In [None]:
import random

# Randomly choose an image from x_test
index = random.choice(range(len(x_test)))

plt.imshow(x_test[index].reshape(28, 28), cmap='gray')
pred = model.predict(x_test[index].reshape(1, 28, 28, 1))
print(pred.argmax())

## Save and load the trained model

Once you are done with model training, you can use the model's `.save()` method and the `load_model()` function to save the trained model as a single `.h5` file and load it back for future use. The `.h5` file contains the model architecture, weights and optimizer state, which allows us to resume training from exactly where we left off.

Reference: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model

In [None]:
# creates an HDF5 file 'my_mnist_model.h5'
# NOTE: it will overwrite the existing file, if any
model.save('my_mnist_model.h5')

Let's delete the model and try to load it back.

In [None]:
# deleting the model
del model 

In [None]:
model.evaluate(x_test, y_test) # this will raise a NameError because `model` no longer exists

In [None]:
from keras.models import load_model

# returns a compiled model, identical to the previous one
model = load_model('my_mnist_model.h5')

In [None]:
model.evaluate(x_test, y_test) # this will work