# Perceptron - MNIST

For the practical part of this session, we will use so-called Jupyter notebooks. Each notebook is a user-friendly way to write and manage code in the programming language Python. If you are unfamiliar with Jupyter notebooks, this is a concise explanation:

- Each "cell" of the notebook (a rectangular area) contains either explanatory text ("Markdown," like this cell) or commands ("Code"). By clicking on this cell, you select it (and the contours are highlighted). 


- If you press "__Run__" in the menu, Jupyter processes the contents of this cell and moves on to the next. Scroll to the next cell, read the command and press __Run__ again. The result of the command (if any) will become visible. 

Just proceed through the notebook in this fashion, and return to previous cells whenever necessary (either to re-read an explanation or command, or to change pieces of code when requested). Please note that if you want to restart the entire notebook, you have to run the cells again from the top.

Let's start by importing the required libraries.

In [None]:
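import os
# Silence TensorFlow's log messages; this has to be set before Keras/TensorFlow is imported.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import cv2
import keras
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from keras import models
from keras import layers
from keras.utils import to_categorical
# Note: this only creates an initializer object; it is not attached to any layer.
keras.initializers.RandomNormal(seed=42)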

This notebook introduces the training of 10 perceptrons with Keras, a publicly available deep-learning toolkit. Briefly explained, a perceptron is a simple neural network with $I$ inputs, all of which are connected to a single output. The connections between the inputs and the output, called the weights, are trained so that the output is activated whenever the inputs contain certain patterns.

In our example, we train 10 perceptrons simultaneously on the same input to recognize handwritten digits '0' to '9'. Each perceptron takes care of detecting one of the digits; its output should become activated whenever that digit appears in the input. Each output generates a probability, and the output with the largest probability for a given input determines the classification of that input.
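
To make this more concrete, the following is a minimal NumPy sketch of how 10 perceptrons could turn one input into a classification. The weights, biases and input are made up (random, untrained), and the names `weights`, `biases` and `softmax` are illustrative; this is not the code Keras runs internally.

```python
import numpy as np

rng = np.random.default_rng(0)

num_inputs = 784    # I: one input per pixel
num_outputs = 10    # one perceptron per digit

# Hypothetical, untrained parameters (small random weights, zero biases).
weights = rng.normal(scale=0.01, size=(num_inputs, num_outputs))
biases = np.zeros(num_outputs)

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = z - np.max(z)          # subtract the maximum for numerical stability
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

# A made-up input: 784 pixel intensities between 0 and 1.
x = rng.random(num_inputs)

scores = x @ weights + biases        # one weighted sum per perceptron
probabilities = softmax(scores)      # ten probabilities
predicted_digit = int(np.argmax(probabilities))

print(probabilities.round(3))
print("Predicted digit:", predicted_digit)
```

Because the weights are random here, the predicted digit is meaningless; training is what makes the largest probability line up with the correct digit.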

<img src="images/Perceptron.png" alt="Perceptron visualization" style="width: 400px;"/>

>_Illustration of the 10 combined perceptrons._

The perceptrons are trained on the widely used Modified National Institute of Standards and Technology (MNIST) dataset of handwritten digits. Each instance of the dataset is a 28 x 28 pixel grayscale image, assigned to one of the 10 available targets/labels (i.e., '0', '1', ..., '9').

The 28 x 28 pixel values are 'flattened', resulting in one row of 28*28 = 784 inputs. Hence, in our neural network, the number of inputs $I$ is equal to 784.
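
As a small illustration of what 'flattening' means, the sketch below reshapes a made-up 28 x 28 array (not an actual MNIST image) into a single row of 784 values.

```python
import numpy as np

# A made-up 28 x 28 "image" whose pixel values are simply 0, 1, 2, ...
image = np.arange(28 * 28).reshape(28, 28)

flat = image.reshape(28 * 28)   # rows are placed one after another

print(image.shape)   # (28, 28)
print(flat.shape)    # (784,)

# The pixel at (row, column) ends up at position row * 28 + column.
row, column = 3, 5
print(flat[row * 28 + column] == image[row, column])   # True
```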

The MNIST dataset, containing the instances and their associated labels, is provided in both a training dataset and a test dataset. A general guideline during experiments with neural networks is to separate part of your available data for validation purposes. This way, we are able to evaluate the performance of the neural network on unseen data in a relatively objective manner.
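
MNIST already comes with a separate test set, so this notebook does not need to split the data itself. For datasets without such a split, holding out part of the data could look like the sketch below. It uses random stand-in arrays, and the 20% fraction is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in data with the same layout as MNIST, but far fewer instances.
images = rng.integers(0, 256, size=(1000, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=1000)

# Shuffle the indices, then reserve the last 20% for validation.
indices = rng.permutation(len(images))
num_validation = int(0.2 * len(images))   # hypothetical split fraction

validation_idx = indices[-num_validation:]
training_idx = indices[:-num_validation]

x_train, y_train = images[training_idx], labels[training_idx]
x_val, y_val = images[validation_idx], labels[validation_idx]

print(x_train.shape, x_val.shape)   # (800, 28, 28) (200, 28, 28)
```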

In [None]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

print('MNIST imported.')

The perceptrons are trained on the training instances by presenting them together with their labels to the network. Below, we examine the shape of the training instances.

In [None]:
train_images.shape

We can see that there are 60000 training instances, each with a resolution of 28 by 28 pixels. Let's visualise the first 8 instances.

In [None]:
# Show the first 8 training images in a 2 x 4 grid.
for i in range(8):
    plt.subplot(2, 4, i + 1)
    plt.imshow(train_images[i].reshape(28, 28), cmap=plt.get_cmap('gray'))
# show the plot
plt.show()

We now turn to the training labels. Their number should equal the number of images (60000).

In [None]:
len(train_labels)

What are the unique labels? These should be '0' to '9'.

In [None]:
print(np.unique(train_labels))

Are the labels (more-or-less) balanced? Are there about as many instances of each class?
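
Besides the histogram, the class balance can also be checked numerically. The sketch below counts a handful of made-up labels with `np.bincount`; in the notebook, the same call could be applied to `train_labels` while they are still integers.

```python
import numpy as np

# Made-up labels standing in for `train_labels` (integers 0 to 9).
example_labels = np.array([0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 8, 9, 9])

# counts[k] is the number of instances with label k.
counts = np.bincount(example_labels, minlength=10)
for digit, count in enumerate(counts):
    print(f"digit {digit}: {count} instances ({count / len(example_labels):.1%})")
```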

In [None]:
plt.hist(train_labels, bins=np.arange(11) - 0.5, rwidth=0.9)  # one bin centred on each digit

plt.title("Train Label Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.xticks(np.arange(10))
fig = plt.gcf()

Let's have a look at the same aspects for the test instances:

In [None]:
test_images.shape

In [None]:
print(np.unique(test_labels))

In [None]:
plt.hist(test_labels, bins=np.arange(11) - 0.5, rwidth=0.9)  # one bin centred on each digit

plt.title("Test Label Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.xticks(np.arange(10))
fig = plt.gcf()

__=====================================================================================================================__
<b><center>Pause Here</center></b>
__=====================================================================================================================__

The core building block of neural networks is the "layer", a data-processing module which you can conceive as a "filter" for data. Some 
data comes in, is processed, and comes out in a more useful form. Precisely, layers extract _representations_ out of the data fed into them -- hopefully 
representations that are more meaningful for the problem at hand. Most of deep learning really consists of chaining together layers 
which will implement a form of progressive "data distillation". A deep learning model is like a sieve for data processing, made of a 
succession of increasingly refined data filters -- the "layers".

Let's build our neural network. It consists of:

- An input layer of $28$ x $28$ elements (each pixel value is an input).


- An output layer with $10$ outputs, one for each class. It will return ten probability scores; each score is the probability that the current digit image belongs to the corresponding digit class.
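
The size of this network follows directly from these two numbers. As a quick worked calculation (assuming each output also has a bias term):

```python
num_inputs = 28 * 28     # 784 pixel values per image
num_outputs = 10         # one perceptron per digit class

num_weights = num_inputs * num_outputs   # one weight per input-output pair
num_biases = num_outputs                 # one bias per perceptron (assumes biases are enabled)

print(num_weights)                # 7840
print(num_weights + num_biases)   # 7850
```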

In [None]:
network = models.Sequential()
network.add(layers.Dense(10, use_bias=True, activation='softmax', input_shape=(28 * 28,)))

print('Layers added.')

Our workflow will be as follows: first we will present the training dataset, consisting of `train_images` and `train_labels`, to the neural network. The perceptrons will then learn to associate images and labels. Then, we will ask the network to produce predictions for `test_images`, and we will verify whether these predictions match the labels from `test_labels`.

To make our network ready for training, we need to pick three more things, as part of the "compilation" step:

* An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.


* A loss function: this is how the network will be able to measure how good a job it is doing on its training data, and thus how it will be able to steer itself in the right direction.


* Metrics to monitor during training and testing. In this case we will only care about accuracy (the fraction of the images that were correctly classified).
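
To give a feel for the loss function and the metric, the sketch below computes a categorical cross-entropy and an accuracy by hand for three made-up predictions over three classes. It illustrates the definitions and is not how Keras computes them internally.

```python
import numpy as np

# Made-up predictions for three images and three classes (each row sums to 1).
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
# One-hot targets: the true classes are 0, 1 and 2.
targets = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
])

# Cross-entropy: minus the log of the probability given to the true class,
# averaged over the instances.
per_instance_loss = -np.sum(targets * np.log(predictions), axis=1)
loss = per_instance_loss.mean()

# Accuracy: fraction of instances whose most probable class is the true class.
accuracy = np.mean(np.argmax(predictions, axis=1) == np.argmax(targets, axis=1))

print(round(float(loss), 4))   # 0.5946
print(accuracy)                # two out of three correct
```

The third image is misclassified (class 1 gets the largest probability, while the true class is 2), which both lowers the accuracy and contributes the largest term to the loss.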

In [None]:
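# Create the optimizer and pass it to compile(); older Keras versions call this argument `lr`.
sgd = keras.optimizers.SGD(learning_rate=0.01)

network.compile(optimizer=sgd,
                loss='categorical_crossentropy',
                metrics=['accuracy'])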

print('Network compiled.')

Before starting the actual training, we will preprocess our data by reshaping it into the format that the network expects (the 'flattened' representation containing 784 inputs). Furthermore, we convert the labels to a categorical format.
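
The 'categorical format' is a one-hot encoding: each label becomes a row of ten values with a single 1 at the position of the digit. A minimal NumPy sketch of the idea, using made-up labels, is:

```python
import numpy as np

example_labels = np.array([5, 0, 4])   # made-up integer labels

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(10, dtype="float32")[example_labels]

print(one_hot[0])   # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```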

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

print('Preprocessing completed.')

We are now ready to train our network, which is done via a call to the `fit` method of the network: 
we "fit" the model to its training data. The number of `epochs` represents the number of times the network iterates over all the training data.
The `batch_size` specifies the number of instances that are processed before each update of the weights. A larger batch size means fewer weight updates per epoch, so each epoch usually runs faster, but with the same number of epochs the network may end up less accurate.
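
Since the weights are updated once per batch, the number of updates in one epoch follows from the batch size. The batch sizes below are example values only:

```python
import math

num_instances = 60000   # size of the MNIST training set

for batch_size in (20, 64, 512):   # example values to compare
    updates_per_epoch = math.ceil(num_instances / batch_size)
    print(f"batch_size={batch_size}: {updates_per_epoch} weight updates per epoch")
```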

__Bonus: Experiment with several batch sizes and epoch values to determine how it affects the duration and accuracy of training.__

In [None]:
network.fit(train_images, train_labels, epochs=20, batch_size=20)

Two quantities are displayed during training: the `loss` of the network over the training data, and the accuracy of the network over the training data (shown as `acc` or `accuracy`, depending on the Keras version):

- The loss should be as small as possible (it is a measure of the error the network makes on classifying instances). 


- The accuracy is the fraction of correctly classified instances in the __training__ data. This number should be sufficiently high, but note that it says nothing about how well the network predicts unseen data; it only indicates how well the network reproduces the labels of the instances it was trained on.

In order to know how well the trained perceptrons generalize, we have to determine their performance on the test data.

In [None]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

__=====================================================================================================================__
<b><center>Pause Here</center></b>
__=====================================================================================================================__

An interesting feature of a perceptron (and other types of neural networks) trained on images is that we can visualise its internal weights after training.

The weights of each perceptron can be reshaped back into a 28 x 28 matrix, which allows us to visually interpret the internal patterns for each of the output perceptrons.
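
The sketch below shows why this reshaping works, using a random stand-in for the trained weight matrix (784 rows for the inputs, 10 columns for the outputs). The name `example_weights` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the trained weight matrix: 784 inputs by 10 outputs.
example_weights = rng.normal(size=(784, 10))

as_images = example_weights.reshape(28, 28, 10)

# The weights feeding output 3, arranged as a 28 x 28 picture.
pattern_3 = as_images[:, :, 3]

# Same values as reshaping column 3 of the weight matrix on its own.
print(np.array_equal(pattern_3, example_weights[:, 3].reshape(28, 28)))   # True
```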

In [None]:
W1 = network.layers[0].get_weights()[0]
WW = W1.reshape(28,28,10)

# One subplot per output perceptron, in a 2 x 5 grid.
for k in range(10):
    plt.subplot(2, 5, k + 1)
    plt.imshow(WW[:, :, k], cmap=plt.get_cmap('gray'))
    plt.title('output ' + str(k))
# show the plot
plt.show()

__Bonus: What do you observe when inspecting the 10 visualizations? Can you explain why each of the perceptrons shows its specific pattern?__

Now that we have trained our neural network, we are able to make predictions:

- Write down a single number on a sheet of paper, and take a picture of it with your smartphone.


- Using your smartphone, generate a square cut-out of the picture containing the number.


- Upload your image to the folder in which this notebook is located.

Once you've uploaded your photo, please change the following variable `file` so that it contains the same filename as your photo. If you are not able to take a picture yourself, you can use an example photo by typing `images/img.jpg`:

In [None]:
file = 'images/img.jpg'
print('Filename updated.')

If everything went correctly, we can visualize our uploaded photo:

In [None]:
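img = cv2.imread(file, 0)            # 0: read the image as grayscale
img = cv2.resize(img, (28, 28))
img = (255 - img)                    # invert: MNIST digits are light on a dark background
img = img.astype('float32') / 255    # same scaling as the training images
img = np.reshape(img, [1, 28 * 28])
plt.imshow(img[0].reshape(28,28), cmap=plt.get_cmap('gray'))
plt.show()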

This should show significant resemblance to the visualizations of the MNIST training dataset, shown earlier in this notebook. Lastly, we provide the network with our input image, and let it generate a prediction:
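
To see why the photo is inverted before it is given to the network, consider the made-up 2 x 2 example below: a photo of dark ink on bright paper becomes a bright digit on a dark background, like the MNIST images. The scaling step assumes that the network expects values between 0 and 1, as in the preprocessing of the training images.

```python
import numpy as np

# Made-up 2 x 2 grayscale photo: bright paper (high values), dark ink (low values).
photo = np.array([[250, 240],
                  [ 10, 245]], dtype=np.uint8)

inverted = 255 - photo                        # dark background, bright ink
scaled = inverted.astype("float32") / 255     # values between 0 and 1

print(inverted)
print(scaled.min(), scaled.max())
```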

In [None]:
print("The perceptron-based neural network predicts:", (np.argmax(network.predict(img))))

Did the network correctly predict your number?

__=====================================================================================================================__
<b><center>Pause Here</center></b>
__=====================================================================================================================__

Don't be surprised if the network is not able to correctly predict your image. The perceptrons we used so far are among the most basic neural network models available.

Let's try a more advanced neural network architecture. We will have to import and process the data, build a new network, and train this new network again in order to make a new prediction. Note that this neural network takes more time to train compared to our previous example, due to its increased complexity.

The theoretical background of the type of neural network used below (a convolutional neural network) goes beyond the scope of this session.
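
Without going into the theory, it may still help to see how the image size changes as it passes through the layers in the next cell. The sketch below only does the size arithmetic, assuming 3 x 3 convolutions without padding and 2 x 2 max pooling; the helper names `conv_output` and `pool_output` are made up for this illustration.

```python
def conv_output(size, kernel=3):
    """Side length after a convolution without padding and with stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Side length after non-overlapping max pooling."""
    return size // pool

size = 28
size = conv_output(size)   # 26
size = pool_output(size)   # 13
size = conv_output(size)   # 11
size = pool_output(size)   # 5
size = conv_output(size)   # 3

flattened = size * size * 64   # 64 feature maps in the last convolutional layer
print(size, flattened)         # 3 576
```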

In [None]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print('test_acc:', test_acc)

While our perceptron-based neural network had an accuracy of approximately 92%, our advanced neural network architecture achieves an accuracy score of nearly 99%!

Now we are able to generate a new prediction.

In [None]:
img2 = cv2.imread(file,0)
img2 = cv2.resize(img2, (28, 28))
img2 = (255-img2)
img2 = img2.astype('float32') / 255   # same scaling as the training images
img2 = np.reshape(img2, [1, 28, 28, 1])

print("The advanced neural network architecture predicts:", (np.argmax(model.predict(img2))))

This interactive Python Notebook is based on https://github.com/fchollet/deep-learning-with-python-notebooks.