<img src="images/kiksmeisedwengougent.png" alt="Banner" width="1100"/>

<div>
    <font color=#690027 markdown="1">   
<h1>CLASSIFICATION OF THE MNIST-DATASET WITH A NEURAL NETWORK</h1>    </font>
</div>

<div class="alert alert-box alert-success">
This notebook contains a <em>concrete example of a neural network</em> that is built using the functionalities of the Python module <em>Keras</em>. <br>The example concerns a <em>classification problem</em>, namely the classification of grayscale images of handwritten digits.</div>

The images must be presented to the AI system in a format of 28x28 pixels; there are 10 classes with labels 0 through 9.
To train the network, the MNIST dataset is used. This dataset consists of 60,000 images to train the network and 10,000 images to test it. <br>These images were collected by the National Institute of Standards and Technology (the NIST in the acronym MNIST; the M stands for "Modified") in the 1980s.

### Import necessary modules

In this notebook, you build a *sequential model* with Keras.<br>That is a model that consists of *connected layers*. You will work with an input layer, an output layer, and one hidden layer in between.<br>So you need to be able to create that model and its layers.
You import the 'NumPy' module for calculations and the 'Matplotlib' module to create graphs; the 'Keras' module provides the building blocks you need to realize your neural network.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical             # to be able to represent classes in a different way
from tensorflow.keras.datasets import mnist

<div>
    <font color=#690027 markdown="1">   
<h2>1. Reading in the data</h2>    </font>
</div>

The MNIST data consist of a pair, and each of the two elements of that pair is itself a pair.
The data are structured as follows:

- (training data, corresponding categories) in the first element;
- (test data, corresponding categories) in the second element.

You load the dataset with the command `mnist.load_data()`; `load_data()` is a function of `mnist`. <br>At the same time, you assign the data to four variables, each referring to a particular part of the dataset.

In [None]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()      # elements of tuples get the correct name

### Assignment 1.1
- How many elements does the object referred to by `train_images` contain?
- How many elements does the object referred to by `test_labels` contain?

Answer:

Check your answer using the following code cell.

In [None]:
print(len(train_images))         # number of points in training set
print(len(train_labels))         # number of labels in training set
print(len(test_images))          # number of points in test set
print(len(test_labels))          # number of labels in test set

### Assignment 1.2
What does the output of the following code cells mean?

In [None]:
train_images.shape

In [None]:
train_labels.shape

Answer:

<div>
    <font color=#690027 markdown="1">   
<h2>2. Viewing the data</h2>    </font>
</div>

In [None]:
image1 = train_images[4]
image2 = train_images[100]
label1 = train_labels[4]
label2 = train_labels[100]

In [None]:
# labels
print(label1, label2)
# images
plt.figure()
plt.subplot(1, 2, 1)
plt.imshow(image1, cmap="gray")
plt.subplot(1, 2, 2)
plt.imshow(image2, cmap="gray")
plt.show()

In [None]:
print(image1.shape)
print(image1)

### Assignment 2.1
Look up the largest and smallest number in this matrix, and the data type of the numbers.

In [None]:
image1.dtype

In [None]:
print(np.min(image1), np.max(image1))

Answer:

<div>
    <font color=#690027 markdown="1">   
<h2>3. Building the Neural Network</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">   
<h3>3.1 Architecture of the neural network</h3>    </font>
</div>

Your network model is a *Sequential model* consisting of chained layers: an *input layer*, an *output layer*, and in between them one *hidden layer*. <br>You use *dense layers*, i.e. *fully connected* layers: each neuron in a layer is connected to all the neurons in the previous layer.<br>For each layer, you have to choose the number of neurons, which is also the number of outputs of that layer. <br>For the output layer, that number is fixed: since there are ten classes, the model must indicate for each class how certain it is that the image belongs to that class. So you need 10 neurons there. <br>For the hidden layer, you can experiment with the number of neurons and compare the performance of the resulting networks. <br>
The model must receive the *input* as a tensor consisting of *vectors*. In addition, the model needs to know how many elements each data point (so each vector) in that tensor contains. You pass this number to the first `Dense` layer through the `input_dim` parameter; the input layer itself is not a separate `Dense` layer in the code.<br>You do not pass it to the following layers, because their input size is then fixed automatically by the number of neurons in the preceding layer.
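
As a worked example of what "fully connected" means for this architecture, the sketch below counts the trainable parameters: each neuron has one weight per input plus one bias. The function name `dense_param_count` is hypothetical; if the counting is right, `network.summary()` should report the same numbers for the two `Dense` layers.

```python
# Number of trainable parameters (weights + biases) in a fully connected (Dense) layer.
def dense_param_count(n_inputs: int, n_neurons: int) -> int:
    # every neuron has one weight per input, plus one bias
    return n_inputs * n_neurons + n_neurons

input_dim = 28 * 28                          # 784 pixel values per image
hidden = dense_param_count(input_dim, 15)    # hidden layer with 15 neurons
output = dense_param_count(15, 10)           # output layer with 10 neurons

print(input_dim, hidden, output, hidden + output)   # 784 11775 160 11935
```

Almost all parameters sit in the hidden layer, because every one of its neurons looks at all 784 pixels.
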
In the hidden layer and in the output layer, an *activation function* is applied after the linear transformation, which is determined by the *weights* that the network learns. You have to choose which activation function that is. For hidden layers, most modern networks use 'ReLU'. The activation function in the output layer is determined by the type of problem: since this is a classification problem with more than two classes, the output layer uses the 'softmax' activation function.
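
To make the two activation functions concrete, here is a minimal NumPy sketch of ReLU and softmax applied to one vector. The functions `relu` and `softmax` are written for illustration only; Keras uses its own internal implementations.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and replaces negative values by 0
    return np.maximum(0, z)

def softmax(z):
    # subtracting the maximum does not change the result but avoids overflow in exp
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0. 0. 3.]
p = softmax(z)
print(p, p.sum())     # three values between 0 and 1 that add up to 1
```

Because the softmax values are positive and add up to 1, they can be read as the model's certainty for each class.
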
To complete the model, you still need to choose a *loss function* and an *optimizer*. The loss function measures how much the model's predictions deviate from the labels; the optimizer (here 'sgd', stochastic gradient descent) adjusts the weights so that the total loss becomes as small as possible.<br>Finally, you choose a *metric* to measure the *performance* of the model. Here you choose 'accuracy': the fraction of data points that are assigned to the correct category.
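
The sketch below computes categorical cross-entropy (the loss used in this notebook) and accuracy by hand for two made-up data points with three classes. It is an illustration, not the Keras implementation; the small clipping constant `eps` that prevents `log(0)` is an assumption.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true: one-hot labels, shape (n, classes); y_pred: predicted probabilities, same shape
    y_pred = np.clip(y_pred, eps, 1.0)                      # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # fraction of data points whose most probable class equals the true class
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# two illustrative data points with 3 classes (made-up numbers)
y_true = np.array([[0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.1, 0.2, 0.7], [0.3, 0.6, 0.1]])
print(categorical_crossentropy(y_true, y_pred))   # mean of -ln(0.7) and -ln(0.3)
print(accuracy(y_true, y_pred))                   # 0.5: only the first point is correct
```

The loss only looks at the probability given to the correct class, so it keeps decreasing as the model becomes more certain, even when the accuracy no longer changes.
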

In [None]:
# network architecture
network = models.Sequential()                                          # 'Sequential model' consists of linked layers
network.add(layers.Dense(15, activation="relu", input_dim=28 * 28))    # hidden layer: 15 neurons, ReLU activation function
network.add(layers.Dense(10, activation="softmax"))                    # output layer: 10 output neurons, softmax activation function
network.compile(optimizer="sgd",
                loss="categorical_crossentropy",
                metrics=["accuracy"])                                  # choose optimizer, loss and metrics

<div>
    <font color=#690027 markdown="1">   
<h3>3.2 Training the neural network</h3>    </font>
</div>

Earlier, you looked up the shape of the training images via `train_images.shape`: the data points are 28x28 matrices. However, the network expects them as vectors.

The length of each vector is fixed in the architecture of the network by `input_dim=28 * 28`. So each data point must be transformed into a vector of length 784.
The 60,000 28x28 matrices must therefore be flattened into vectors; in other words, you transform the training set from a stack of 60,000 28x28 matrices into a stack of 60,000 vectors. The same holds for the 10,000 images of the test set.
Moreover, it is better to *normalize* the pixel values: dividing them by 255 rescales them from the interval [0, 255] to [0, 1].

<div class="alert alert-block alert-warning"> 
More explanation about normalization can be found in this learning path under 'Standardizing'.</div>
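
The following sketch applies the same two preparation steps (reshaping and dividing by 255) to three randomly generated 28x28 "images", so you can check the shapes and value range without loading MNIST. The random data are only a stand-in for the real images.

```python
import numpy as np

# a tiny stand-in for the image data: 3 "images" of 28x28 pixels with values 0..255 (uint8, like MNIST)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28)).astype(np.uint8)

vectors = images.reshape((3, 28 * 28))          # each 28x28 matrix becomes a vector of 784 values
vectors = vectors.astype("float32") / 255       # rescale from [0, 255] to [0, 1]

print(images.shape, vectors.shape)              # (3, 28, 28) (3, 784)
print(vectors.min() >= 0.0, vectors.max() <= 1.0)   # True True
```

`reshape` reads each matrix row by row, so the first 28 values of a vector are the top row of the image.
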

In [None]:
# preparing the dataset
# training set from stack of 60,000 28x28 matrices to stack of 60,000 vectors
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255            # rescale elements to interval [0,1] instead of [0,255]
# test set from stack of 10,000 28x28 matrices to stack of 10,000 vectors
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255
# store labels in another form (one-hot encoding): 0 becomes 1000000000, 1 becomes 0100000000, ...
# 7 becomes 0000000100, so a 1 in the position with index 7 (you start counting from index 0)
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
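
One-hot encoding can also be written in plain NumPy: row *i* of the 10x10 identity matrix is exactly the encoding of label *i*. This sketch should give the same values as `to_categorical` for integer labels 0 to 9; `np.argmax` converts the encoding back to the original labels.

```python
import numpy as np

labels = np.array([0, 1, 7])                      # example labels
one_hot = np.eye(10, dtype="float32")[labels]     # row i of the 10x10 identity matrix encodes class i
print(one_hot)
# [[1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]
print(np.argmax(one_hot, axis=1))                 # back to the original labels: [0 1 7]
```
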

In [None]:
# training with the `fit` method of the network, i.e. fitting the network to the images and their labels
# 5 epochs, so go through the training set 5 times
# use 128 images at a time for each optimizer step, i.e. the gradient is averaged over 128 data points
history = network.fit(train_images, train_labels, epochs=5, batch_size=128)
loss = history.history["loss"]      # value of the loss function for each epoch, stored in a list
acc = history.history["accuracy"]   # accuracy for each epoch, stored in a list
epochs = range(1, len(loss) + 1)    # number the epochs from 1 to the number of epochs
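
With `batch_size=128`, one epoch over the 60,000 training images consists of many optimizer steps. The arithmetic below is a small sketch of how many; the progress bar of `fit` should show the same number of steps per epoch. The variable names are chosen so that they do not overwrite `epochs` from the cell above.

```python
import math

n_train = 60000      # number of training images
batch_size = 128     # as in network.fit(...)
n_epochs = 5

updates_per_epoch = math.ceil(n_train / batch_size)    # the last batch is smaller (96 images)
print(updates_per_epoch, updates_per_epoch * n_epochs)  # 469 2345
```
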

In [None]:
print("loss =", loss)
print("acc =", acc)
print("epochs =", epochs)

Do you see that the loss decreases and the accuracy increases?

In [None]:
font = {"family": "serif", "color": "black", "weight": "normal", "size": 16}   # font settings for the axis label
plt.figure()
plt.plot(epochs, acc, "o", color="blue", label="accuracy")
plt.plot(epochs, loss, "o", color="green", label="loss")
plt.xlabel("epochs", fontdict=font)
plt.legend(loc="lower left")
plt.show()

Assess the accuracy of the network after training. Is it decent on the training set? Do you find the error large or not?

<div>
    <font color=#690027 markdown="1">   
<h3>3.3 Performance of the model</h3>    </font>
</div>

To know how good the model is, you need to know how well it performs on the test data.<br> Just because the model performs well on the training data doesn't mean it also performs well on unseen data. So you'll check what the loss and accuracy are on the test data.

In [None]:
test_loss, test_acc = network.evaluate(test_images, test_labels)

In [None]:
print("test_acc:", test_acc)

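If the accuracy on the test set is about as high as on the training set (or even higher), the model generalizes well. Keep in mind that the training accuracy reported by `fit` is an average over the batches of the epoch, computed while the weights were still being updated.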

By executing the following code cell, you test the model on a data point from the training data (`image1` from the beginning of the notebook). The cell first reshapes that data point into the correct format.

In [None]:
# preparing datapoint
example = train_images[4].reshape((1, 28 * 28))
# testing
network.predict(example)

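What you get here is an array that indicates how certain the model is that the given data point is a 0, a 1, a 2, etc., in that order. These certainties are numbers between 0 and 1 that add up to 1 (the output of the softmax function); multiply them by 100 to express them as percentages.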
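
To read off the prediction automatically, you can take the index of the largest value with `np.argmax`. The helper `most_likely_digit` below is hypothetical, and the array it is tried on is made up for illustration; it is not an actual output of the network.

```python
import numpy as np

def most_likely_digit(prediction):
    # prediction: output of network.predict for one image, shape (1, 10)
    probs = np.asarray(prediction)[0]
    digit = int(np.argmax(probs))                          # index of the largest value = predicted digit
    return digit, round(100 * float(probs[digit]), 1)      # certainty as a percentage

# made-up output for illustration only, not a real prediction of the network
example_output = np.array([[0.01, 0.02, 0.05, 0.02, 0.8, 0.02, 0.03, 0.02, 0.02, 0.01]])
print(most_likely_digit(example_output))   # (4, 80.0)
```

In the notebook you could call it as `most_likely_digit(network.predict(example))`.
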

Fill in and remove what does not fit:
The model is most certain that it is a .... <br>That certainty is .... <br>Correctly/Incorrectly classified!

<div class="alert alert-box alert-info">
The difference between the <em>training accuracy</em> and the <em>test accuracy</em> is important. If the training accuracy is clearly higher than the test accuracy, this is called <em>overfitting</em>: the model has adapted so closely to the training data that it performs worse on new data.</div>

<div>
    <font color=#690027 markdown="1">   
<h3>3.4 Testing model on unseen data</h3>    </font>
</div>

Can the model also recognize handwritten digits that do not come from the MNIST dataset? Try it out.

In [None]:
seven = np.loadtxt("data/seven.dat")
four = np.loadtxt("data/four.dat")
two = np.loadtxt("data/two.dat")

In [None]:
plt.figure()
plt.subplot(1, 3, 1)
plt.imshow(seven, cmap="gray")
plt.subplot(1, 3, 2)
plt.imshow(four, cmap="gray")
plt.subplot(1, 3, 3)
plt.imshow(two, cmap="gray")
plt.show()

Do these figures sufficiently resemble those of the MNIST dataset? Why is that important?

Answer:

In [None]:
print(seven.shape, two.shape, four.shape)

In [None]:
print(seven)
print(seven.dtype)

The data points take the form of matrices.

See how the model performs on these numbers.

In [None]:
# prepare data
seven = seven.reshape((1, 28 * 28))              # transform into tensor that contains 1 vector
four = four.reshape((1, 28 * 28))
two = two.reshape((1, 28 * 28))
# print the new shapes
print(seven.shape, two.shape, four.shape)

In [None]:
network.predict(seven)

In [None]:
network.predict(four)

In [None]:
network.predict(two)

How does the model perform on these handwritten digits?

Answer:

### Assignment 3.1
Write down some digits yourself and test whether the model reads your handwriting correctly!<br>

<div class="alert alert-block alert-warning">
In the section 'From jpg to npy' in the learning path 'Digital images', you can read more about how to convert your images to the desired format.</div>
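
Once you have your own image as a 28x28 matrix, it needs the same preparation as the training data. The helper `prepare_digit` below is a hypothetical sketch: it reshapes the matrix to shape (1, 784), rescales it to [0, 1] under the assumption that values above 1 mean the 0 to 255 scale is used, and can optionally invert the image, since MNIST digits are light strokes on a dark background.

```python
import numpy as np

def prepare_digit(matrix, invert=False):
    """Hypothetical helper: turn one 28x28 image into the (1, 784) format the network expects."""
    matrix = np.asarray(matrix, dtype="float32")
    if matrix.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got shape {matrix.shape}")
    if matrix.max() > 1.0:        # assumption: values above 1 mean the image uses the 0..255 scale
        matrix = matrix / 255
    if invert:                    # MNIST digits are light on a dark background
        matrix = 1.0 - matrix
    return matrix.reshape((1, 28 * 28))

# illustration with a blank image containing one bright vertical stroke (not real handwriting)
img = np.zeros((28, 28))
img[5:23, 14] = 255
x = prepare_digit(img)
print(x.shape, x.max())   # (1, 784) 1.0
```

For a 28x28 matrix `my_digit` of your own (a hypothetical name), you would then call `network.predict(prepare_digit(my_digit))`.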

Conclusion:

<div>
    <font color=#690027 markdown="1">   
<h2>4. Looking for a better model</h2>    </font>
</div>

### Assignment 4.1
Adjust the number of neurons in the hidden layer and the number of epochs to improve the performance of the network. <br>*You need to run the notebook again from the beginning, so that the data and the network are prepared anew.*
Who in the class achieves the best accuracy?

The best accuracy that your model achieves is ........ for the training set and ........ for the test set.

### Assignment 4.2
Test your model on your own digits.<br>

Tip: avoid *overfitting*.


<div>
<h2>With support from</h2></div>

<img src="images/kikssteun2.png" alt="Banner" width="1100"/>

<img src="images/cclic.png" alt="Banner" align="left" width="100"/><br><br>
Notebook KIKS, see <a href="http://www.aiopschool.be">AI at School</a>, by F. Wyffels & N. Gesquière is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.