# Testing Notebook

After training the model in [train.ipynb](https://github.com/ammar-elsabe/handwritten-digit-recognition-DCNN/blob/master/src/train.ipynb), we need to test its accuracy.

## Imports

We begin by importing our dependencies. As in training, we need tensorflow and tensorflow_datasets. For testing we also need numpy, seaborn, and matplotlib to compute and plot the results, plus pprint for readable output.

In [None]:
import tensorflow as tf
import numpy as np
import seaborn as sns
from pprint import pprint
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds

## The dataset
We use the [load](https://www.tensorflow.org/datasets/api_docs/python/tfds/load) method to load the MNIST dataset, but this time we only load the test split, which contains 10,000 images.

In [None]:
# load the mnist dataset
dstest, dsinfo = tfds.load(
    'mnist',
    split=['test'],  # only need the test set
    data_dir='../dataset/',
    shuffle_files=False,  # keep a fixed order so predictions line up with labels
    as_supervised=True,
    with_info=True,
)

dstest = dstest[0]  # tfds.load returns a list because split was passed as a list


## Preprocessing

We apply the same preprocessing as during training, which in practice means that no preprocessing is necessary. For performance, however, we batch the test set, cache it, and prefetch batches with `tf.data.AUTOTUNE`.

In [None]:
batch_size = 128

# Evaluation pipeline: batch, cache, and prefetch
dstest = dstest.batch(batch_size)
dstest = dstest.cache()
dstest = dstest.prefetch(tf.data.AUTOTUNE)
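As a quick sanity check on the batch size, the arithmetic below (a sketch, assuming the 10,000-image test split and the `batch_size = 128` used above) counts how many batches the pipeline yields. Because `Dataset.batch` keeps the final, smaller batch unless `drop_remainder=True` is passed, every test image is still evaluated.

```python
import math

num_test_images = 10000  # size of the MNIST test split
batch_size = 128         # same value as in the pipeline above

# Dataset.batch keeps a smaller final batch by default (drop_remainder=False)
num_batches = math.ceil(num_test_images / batch_size)
last_batch_size = num_test_images - (num_batches - 1) * batch_size

print(num_batches, last_batch_size)
```

This gives 79 batches: 78 full batches of 128 images and a final batch of 16.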

## The model

We load the model saved by the training notebook.

In [None]:
model = tf.keras.models.load_model('./model.h5')

Printing the class names

In [None]:
class_names = dsinfo.features['label'].names
print(class_names)

Generating the model predictions

In [None]:
model_probabilities = model.predict(dstest)
pprint(model_probabilities)

`model.predict` returns a 2-D NumPy array with one row per test image, where each row holds the probabilities that the image belongs to each class. To get the predicted labels, we choose the class with the highest probability in each row.

In [None]:
predictions = np.argmax(model_probabilities, axis=1)  # index of the highest probability in each row
pprint(predictions)
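To illustrate the argmax step, the snippet below applies it to a small hypothetical probability array (3 images and 4 classes, values made up for illustration) instead of the real output with one row per test image and 10 columns. `np.argmax(..., axis=1)` returns, for each row, the column index of the largest value; for MNIST that index is also the predicted digit.

```python
import numpy as np

# Hypothetical model output: 3 images (rows) x 4 classes (columns)
model_probabilities = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.30, 0.60],
])

# Column index of the largest value in each row = predicted class
predictions = np.argmax(model_probabilities, axis=1)
print(predictions)  # [1 0 3]
```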

## Evaluation of the model
We get the true labels by iterating over the batched test set and concatenating each batch's labels into a single array:

In [None]:
labels = np.concatenate([y for x, y in dstest], axis=0)
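The one-liner keeps only the label part `y` of each `(image, label)` batch and joins the batches end to end. The sketch below mimics this with plain NumPy arrays standing in for the batched dataset (a hypothetical stand-in: three small batches with images omitted) to show that the result is one flat array in dataset order, which is what lets it line up element by element with `predictions`.

```python
import numpy as np

# Hypothetical stand-in for the batched test set: (images, labels) pairs.
# Images are replaced by None because only the labels matter here.
batches = [
    (None, np.array([7, 2, 1, 0])),
    (None, np.array([4, 1, 4, 9])),
    (None, np.array([5, 9])),  # the last batch can be smaller
]

# Same pattern as the notebook: drop x, keep y, join along the batch axis
labels = np.concatenate([y for x, y in batches], axis=0)
print(labels)  # [7 2 1 0 4 1 4 9 5 9]
```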

We then generate a confusion matrix and plot it

In [None]:
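# Rows are true labels, columns are predicted labels
confusion_matrix = tf.math.confusion_matrix(
    labels=labels,
    predictions=predictions,
).numpy()  # convert the tf.Tensor to a NumPy array for printing and plotting
pprint(confusion_matrix)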
sns.heatmap(confusion_matrix,
            annot=True,
            xticklabels=class_names,
            yticklabels=class_names,
            fmt='g')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.savefig('../paper/figs/confusion_matrix.svg', format='svg')
plt.show()
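`tf.math.confusion_matrix` puts true labels on the rows and predicted labels on the columns, so entry $(i, j)$ counts images of class $i$ that the model predicted as class $j$. This is why the heatmap's y-axis is labelled True and its x-axis Predicted. A minimal NumPy sketch of the same counting follows; the function name `confusion_matrix_np` and the toy labels are hypothetical, for illustration only.

```python
import numpy as np

def confusion_matrix_np(labels, predictions, num_classes):
    """Count (true, predicted) pairs: rows are true labels, columns are predictions."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (i, j) pair repeats
    np.add.at(cm, (labels, predictions), 1)
    return cm

# Toy example with 3 classes and 6 images
labels = np.array([0, 1, 2, 2, 1, 0])
predictions = np.array([0, 2, 2, 2, 1, 1])
cm = confusion_matrix_np(labels, predictions, num_classes=3)
print(cm)
```

Every image lands in exactly one cell, so the entries sum to the number of images, and correct predictions fall on the diagonal.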

With the confusion matrix, we can calculate the accuracy (as a percentage) as
$$
\frac{\sum_i k_{ii}}{\sum_i \sum_j k_{ij}} \times 100
$$

where $k_{ij}$ is the element in the $i$-th row and $j$-th column of the confusion matrix. In other words, the accuracy is the sum of the diagonal elements (the correctly classified images) divided by the sum of all elements (all test images), multiplied by 100.
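As a worked example of the formula, take a hypothetical 3-class confusion matrix (numbers made up for illustration). The two sums below mirror $\sum_i k_{ii}$ and $\sum_i \sum_j k_{ij}$ term by term: $5 + 3 + 4 = 12$ correct predictions out of 15 images gives an accuracy of 80%.

```python
# Hypothetical 3-class confusion matrix k (rows: true class i, columns: predicted class j)
k = [
    [5, 1, 0],
    [0, 3, 1],
    [1, 0, 4],
]
n = len(k)

diagonal_sum = sum(k[i][i] for i in range(n))                 # sum_i k_ii
total_sum = sum(k[i][j] for i in range(n) for j in range(n))  # sum_i sum_j k_ij
accuracy = 100 * diagonal_sum / total_sum

print(diagonal_sum, total_sum, accuracy)  # 12 15 80.0
```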

In [None]:
# Correct predictions lie on the diagonal; the total counts every test image
diagonal_sum = np.trace(confusion_matrix)
total_sum = np.sum(confusion_matrix)

print("Diagonal sum: {}, Total sum: {}".format(diagonal_sum, total_sum))  # total sum should be 10000, the size of the test split
accuracy = 100 * diagonal_sum / total_sum
print("Accuracy: {:.2f}%".format(accuracy))