**If you run this notebook in GOOGLE COLAB, you have to uncomment and run the following commands once!**

In [None]:
# from google.colab import drive
# import sys
# drive.mount('/content/gdrive')
# !git clone https://github.com/BenBol/MLE-School.git
# sys.path.append('MLE-School')

![MNIST example](03_Images/SummerSchool22.png)

If this image (and the following) is not displayed, adjust the path with `'MLE-School/...'`.

# Classification of handwritten numbers

"Moin" and welcome to this interactive session in which we will use neural networks to identify handwritten letters and numbers.

**Introduction:**
- Modify the marked parts of the program by adding your code between `### start ###` and `### end ###`. Each code cell can then be executed directly.
- To execute a cell, select it and press `CTRL` + `ENTER`.
- Python 3 is used for this course.

**Motivation for neural networks:**

The more data is available, the greater the advantage of neural networks over traditional machine learning approaches.

- Rapid increase in
    - the number of networked devices,
    - the amount of generated data (several zettabytes per year), and
    - the size of the data (e.g. accelerators / huge amounts of data that have to be pre-filtered).

## Introduction 

There are many ways to train neural networks with open source libraries. In Python, Scikit-Learn is an entry-level option.<br> 
More flexibility in the design of neural networks is offered by [TensorFlow](https://www.tensorflow.org), which we will use in the following.
<br><br>
[TensorFlow](https://www.tensorflow.org) is an easy-to-use but powerful deep learning library for Python. In this course, we will create a feed-forward neural network and train it to recognize handwritten characters.  <br>

The dataset we will use for this task is also publicly available and is called [MNIST](http://yann.lecun.com/exdb/mnist/).<br><br>
![MNIST Example](03_Images/Beispiel1.png)


## Import of relevant Libraries

At the beginning of each project, the packages needed later on are imported. These include [Matplotlib](https://matplotlib.org) to generate graphics, [NumPy](https://numpy.org) to perform vector calculations and of course the [Keras](https://keras.io) package from [TensorFlow](https://www.tensorflow.org) to create the neural networks.

**Task:** Run the cell.

In [None]:
# import for vector calculation
import numpy as np

# import for graphics
import matplotlib.pyplot as plt

# tensorflow for neural network creation
from tensorflow import keras

Additionally we provide small helper functions, which are located in the Python file: `Additional_Functions.py`.

**Task:** import all functions by executing the cell

In [None]:
from Additional_Functions import *

## Import the dataset

The [MNIST](http://yann.lecun.com/exdb/mnist/) dataset is derived from the much larger [NIST](https://www.nist.gov/srd/nist-special-database-19) dataset from 1995. For this purpose, the original black-and-white images of the digits 0-9 were normalized to 20x20 pixels (hence, slight gray levels are visible).
The images were then centered in a field of 28x28 pixels by calculating the center of mass of the pixels and moving the image so that this point was in the center of the 28x28 field.
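The centering step described above can be sketched with NumPy. This is only an illustration under stated assumptions, not the original NIST/MNIST preprocessing code: the helper name `center_by_mass` and the toy "digit" are hypothetical, and the shift is rounded to whole pixels.

```python
import numpy as np

def center_by_mass(digit, size=28):
    """Place a small image on a size x size canvas so that its
    intensity-weighted center of mass lands near the canvas center."""
    canvas = np.zeros((size, size), dtype=float)
    h, w = digit.shape
    # paste the digit into the middle of the canvas first
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = digit
    # intensity-weighted mean of the row and column indices
    rows, cols = np.indices(canvas.shape)
    total = canvas.sum()
    com_row = (rows * canvas).sum() / total
    com_col = (cols * canvas).sum() / total
    # shift by whole pixels; np.roll wraps around, which is harmless here
    # because the 20x20 content stays inside the 28x28 canvas
    shift_row = int(round((size - 1) / 2 - com_row))
    shift_col = int(round((size - 1) / 2 - com_col))
    return np.roll(canvas, (shift_row, shift_col), axis=(0, 1))

digit = np.zeros((20, 20))
digit[2:10, 3:8] = 255  # toy "digit": a block in the upper-left corner
centered = center_by_mass(digit)
print(centered.shape)
```

After the shift, the bright block sits around the middle of the 28x28 field instead of in its upper-left part.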

The dataset is stored in the folder `./01_Dataset/mnist.npz` and can be loaded with the function ``mnist_data, mnist_label = load_data(path)``.

**Task**: Load the dataset as described above.<br>
**Reminder**: Fill in the blanks between `### start ###` and `### end ###` and change `./01_Dataset/mnist.npz` to `'MLE-School/01_Datensatz/mnist.npz'` when using **COLAB**.

In [None]:
### start ####
path = ...
mnist_data, mnist_label = ...
### end ###

When loading the dataset, two NumPy arrays are returned.<br>
* The variable `mnist_data` contains the individual images.
* The variable `mnist_label` contains the corresponding digits.

## Explore the data

Let us first take a closer look at the data set together.
* `type(variable)` displays the type of a variable.
* `len(variable)` gives the number of entries.

**Task**: Test both commands to determine the type and length of both variables and check if the number of labels corresponds to the number of data points. (Reminder: with the `print(...)` command multiple outputs of a cell can be printed)

In [None]:
### start ####
...
...
### end ###

NumPy arrays provide the `variable.shape` attribute, which gives more precise information about the dimensions of the arrays.<br>
**Task:** Use the `shape` attribute for both variables.

In [None]:
### start ####
dim_data = ...
dim_label = ...
print('Dimensions dataset: {}\nDimensions label: {}'.format(dim_data, dim_label))
### end ###

As expected, the data set consists of 70000 digits, each with 28x28 pixel and the corresponding number in the label vector.
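How `type`, `len` and `shape` relate to each other can be checked on a small stand-in array. The names `toy_data` and `toy_label` are hypothetical and only mimic the layout of the real arrays.

```python
import numpy as np

# Stand-in for the real data: 5 blank "images" of 28x28 pixels and 5 labels
toy_data = np.zeros((5, 28, 28), dtype=np.uint8)
toy_label = np.arange(5)

print(type(toy_data))                        # the container type
print(len(toy_data))                         # size of the first axis only
print(toy_data.shape)                        # size of every axis
print(toy_data.shape[0] == len(toy_label))   # one label per image?
```

Note that `len` only reports the first axis, so it counts images but says nothing about their pixel dimensions.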

## Illustrate the individual numbers

To plot an individual example, try the Matplotlib function `plt.imshow(image)`. A single image can be selected from the data set by indexing with `dataset[index]`.<br>
**Task**: Plot some numbers for the `mnist_data`.

In [None]:
### start ####
...
### end ####

We provide here a function to plot multiple random images.<br>
**Task**: use the function `plot_numbers(n_rows, n_columns, data, label)` to plot a grid of 4x5 numbers.<br>
**Note**:  Since the numbers are shown randomly, you can run this function more than once.

In [None]:
### start ###
...
### end ###

In the following, the color values of the numbers are examined more closely. For this purpose, a random number is displayed together with a histogram showing the distribution of its color values. <br><br>This cell can be executed repeatedly to view different numbers.

In [None]:
index = plot_numbers(1,1,mnist_data, mnist_label)
plt.hist(mnist_data[index].reshape(784), bins=20)
plt.title("Distribution of pixel values")
plt.show()

As expected, most pixels have an intensity of zero (the background); only the pixels of the digit itself have high values.
In MNIST, the ink of the digit is stored as 255 and the background as 0. A few pixels have an intermediate gray value, which results from the anti-aliasing procedure used for scaling.


## Number of individual numbers 

Furthermore, it is interesting to see how many images of each number are present in the data set. For this purpose, the frequency of the respective classes is shown below:

In [None]:
Number, frequency_in_dataset = np.unique(mnist_label, return_counts=True)
plt.bar(Number, frequency_in_dataset)
plt.xlabel('Number in the image')
plt.ylabel('Quantity in the data set')

As can be seen, not all numbers are present with the same frequency. In case of strong deviations, this could be considered further. For this exercise, this is not necessary.

## Preparation of the data

### Dataset

For the training of neural networks with *dense layers*, it is advantageous if each image is not given as a 28x28 matrix but as a vector with 784 entries.

With the function `Vector_new = Vector.reshape((c, d))`, an array with the dimensions `Vector.shape == (a, b)` can be transformed into one with `Vector_new.shape == (c, d)`.
Of course, the dimensions must fit: the total number of entries must stay the same (`a * b == c * d`).
<img src="03_Images/Reshape.svg" width=350 /><br>
**Task:** Reshape the dataset `mnist_data` with dimensions `70000 x 28 x 28` to dimensions `70000 x 784`.
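A minimal sketch of the reshape mechanics on a tiny stand-in array (three "images" of 2x2 pixels; the variable names are hypothetical). It also shows `-1`, which asks NumPy to infer one dimension from the total number of entries.

```python
import numpy as np

images = np.arange(3 * 2 * 2).reshape((3, 2, 2))  # three tiny 2x2 "images"

flat = images.reshape((3, 4))                  # explicit target shape
flat_auto = images.reshape((len(images), -1))  # let NumPy infer the 4

print(images[1])   # second image as a 2x2 matrix
print(flat[1])     # the same pixels as one row
print(flat.shape, flat_auto.shape)
```

Each row of the result contains the pixels of one image, read row by row.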

In [None]:
### start ####
mnist_reshape = ...
### end ###

print('New dimensions: {}'.format(mnist_reshape.shape))

### Normalisation

Typically, the weights between the neurons of a neural network are initialized with small values (magnitude well below one).  
If values between 0 and 255 (as in our dataset so far) are given to the ANN as input, it must first adjust to this range during the first steps of training. While this is possible, it unnecessarily increases training time and may make the neural network less reliable.

**Task** Normalize the color values from the dataset.
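The effect of rescaling can be illustrated on a few hand-picked pixel values. This is a sketch on a hypothetical stand-in array, assuming 8-bit images whose largest possible value is 255.

```python
import numpy as np

pixels = np.array([[0, 64, 128, 255]], dtype=np.uint8)  # 8-bit gray values

scaled = pixels / 255.0   # true division converts to floating point

print(pixels.dtype, pixels.min(), pixels.max())
print(scaled.dtype, scaled.min(), scaled.max())
```

The division changes the data type from integer to floating point, which is also what the network expects as input.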

In [None]:
# Normalization of the color values
### start ###
mnist_normalised = mnist_reshape / ...
### end ###

### Label

The output of our neural network distinguishes between the ten different classes.
In principle, this could be trained using a single neuron with a linear activation function.
However, it has been shown that classification works much better with 10 output neurons with logistic activation functions,
or even with a **softmax** layer, where the output is normalized to a probability distribution over the predicted output classes, based on [Luce's choice axiom](https://en.wikipedia.org/wiki/Luce%27s_choice_axiom).<br><br>
In this example, the network is trained to generate the highest output at the neuron of the associated class.
The labels must therefore be adapted with the so-called [One-Hot-Encoding](https://en.wikipedia.org/wiki/One-hot), in which a vector of integers becomes a matrix of zeros and ones.
An example is shown in the following picture.<br>

<img src="03_Images/To_Categorical.svg" width=500 />
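The encoding in the picture can be reproduced with plain NumPy. This is a sketch on hypothetical toy labels, not the implementation used inside Keras: each label selects one row of an identity matrix.

```python
import numpy as np

labels = np.array([3, 0, 2])   # toy integer labels
n_classes = 4

# Row i of the identity matrix is the one-hot vector of class i
one_hot = np.eye(n_classes, dtype=bool)[labels]

# argmax along the class axis reverses the encoding
recovered = one_hot.argmax(axis=1)

print(one_hot.astype(int))
print(recovered)
```

The Boolean data type is sufficient because each entry only needs to store "is this the class or not", which keeps the label matrix small.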


In [None]:
# Transform to a one-hot matrix
label_one_hot = keras.utils.to_categorical(mnist_label, dtype ="bool") # Data type: Boolean. What is it? Why are we using it?

#Display of the procedure
[print(mnist_label[i], label_one_hot[i]) for i in range(10)]
print('\nDimensions of the new vector:',label_one_hot.shape)

## Splitting into test and training data

Finally, it is still necessary to divide the dataset into a training and an evaluation dataset. Our algorithm is trained on the training dataset. Then, the trained neural network predicts the evaluation dataset in order to evaluate the algorithm.<br><br>

For this, the existing function `train_test_split` of `Scikit-Learn` is first imported and then executed.<br>
The data set and the labels are passed as arguments of the function. The argument `test_size` then defines which fraction of the data is selected pseudo-randomly for the evaluation data set. Furthermore, specifying a `random_state` ensures that the same samples are selected each time this function is executed.
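What such a split does conceptually can be sketched in a few lines of NumPy. This is an illustration only and not Scikit-Learn's implementation: the helper `simple_split` is hypothetical, and the selected indices will differ from those of `train_test_split`.

```python
import numpy as np

def simple_split(X, y, test_size=0.1, seed=42):
    """Shuffle the sample indices with a seeded generator and cut them
    into an evaluation part and a training part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))           # pseudo-random order of samples
    n_test = int(np.ceil(test_size * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # the same indices are applied to data and labels, so pairs stay together
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X = np.arange(20).reshape((10, 2))   # 10 toy samples with 2 features each
y = np.arange(10)                    # toy labels: sample i has label i
X_tr, X_te, y_tr, y_te = simple_split(X, y, test_size=0.2)

print(X_tr.shape, X_te.shape)
print(y_te)
```

Because the generator is seeded, calling the function again with the same `seed` returns the same split, which is the role of `random_state`.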

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_evaluation, y_train, y_evaluation = train_test_split(mnist_normalised, 
                                                    label_one_hot, 
                                                    test_size=0.1, 
                                                    random_state=42)

print('Dimensions of training data  :', X_train.shape)
print('Dimensions of test data      :', X_evaluation.shape)
print('Dimensions of training labels:', y_train.shape)
print('Dimensions of test labels    :', y_evaluation.shape)

## Definition of the neural network

### Network Architecture

The data is now prepared and the training of the neural network can begin. In the first experiment we will use a neural network with 64 neurons in the first layer and 32 neurons in the second layer, similar to the network shown. As the last layer we will use a layer with 10 neurons, so that each neuron indicates the class membership to one of the 10 classes. To evaluate this affiliation, we use a `softmax` activation function. 

<img src="03_Images/ANN_MNIST.svg" width=500 />

The *Keras* library of Tensorflow was already imported in the first cell and can now be used. 
With `keras.Sequential(...)` a sequential structure of the network is defined, and any number of layers can be listed between the brackets. To create a feed-forward network, which we have already described mathematically in the slides, we use a sequence of **Dense** layers. In a **Dense** layer, each neuron of the previous layer is connected to each neuron of this layer via weights. These layers are defined with `keras.layers.Dense(n_neurons, activation='activation')`.<br>

**Task** Define a sequential network with three layers, with `[64, 32, 10]` neurons and the activation functions `['relu', 'relu', 'softmax']`.
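Since every neuron of a dense layer is connected to every input and has one bias, the number of trainable parameters follows directly from the layer sizes. The short calculation below is a sketch of that arithmetic for the sizes named in the task; it can be compared with the output of `model.summary()`.

```python
# Input size followed by the number of neurons in each dense layer
layer_sizes = [784, 64, 32, 10]

# weights (n_in * n_out) plus one bias per neuron (n_out)
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [50240, 2080, 330]
print(total_params)      # 52650
```

Almost all parameters sit in the first layer, because it is connected to all 784 input pixels.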

In [None]:
# Definition of the model
model = keras.Sequential(
    [
        ### start ###
        keras.layers.Dense(..., activation='...',input_shape=(784,)),
        keras.layers.Dense(..., activation='...'),
        keras.layers.Dense(..., activation='...')
        ### end ###
    ]
)
model.summary()

### Compilation of the model

Before we can start training, we need to configure the training process. This is done by defining the **three key factors** for the compilation step:<br>

**The optimizer**. We stick to a pretty good standard: the gradient-based optimizer called [`adam`](https://arxiv.org/abs/1412.6980). 
Many other [optimization algorithms](https://keras.io/api/optimizers/) are already implemented in Keras, which you can also take a look at.<br>

**The metric**. Keras also implements [many metrics](https://keras.io/api/metrics/) to evaluate the algorithm. Since this is a classification problem, we use the output value of Keras to determine the accuracy (`accuracy`).<br>

**The loss function**. Since we are using a softmax output layer, we will use the cross-entropy loss. Keras distinguishes between `binary_crossentropy` (2 classes) and `categorical_crossentropy` (>2 classes). Thus, we choose the latter.
Further loss functions are listed at [Keras loss functions](https://keras.io/losses/). <br><br>
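To make the combination of softmax and cross-entropy more concrete, both can be written out in NumPy. This is a minimal sketch on hypothetical toy values, not the Keras implementation. The small constant `eps` is an assumption to avoid `log(0)`.

```python
import numpy as np

def softmax(z):
    """Turn each row of raw outputs into a probability distribution."""
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return -(y_true * np.log(y_prob)).sum(axis=1).mean()

logits = np.array([[2.0, 1.0, 0.1],    # raw outputs for two samples
                   [0.5, 2.5, 0.3]])
y_true = np.array([[1, 0, 0],          # one-hot labels: class 0 and class 2
                   [0, 0, 1]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)

print(probs.sum(axis=1))   # each row sums to one
print(loss)
```

The first sample is classified correctly and contributes little to the loss. The second sample puts most of its probability on the wrong class and therefore dominates the loss.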

**Task:** Define the compilation step of the neural network.


In [None]:
# Compilation of the sequential model
### start ###
model.compile(loss='...', metrics=['...'], optimizer='...')
### end ###

### Training the model

Training a model in **Keras** consists only of running `model.fit()`. There are still some possible parameters here, but we will only specify four manually:

**The training data** (images and label), commonly known as `X_train`,`y_train`.<br>
**The number of epochs** (iterations over the entire data set) for which to train. We start with `20` epochs.<br>
**The batch size** (number of images per gradient update) to be used during training. `1024` or `2048` are valid values, but almost any other value is also possible.<br>
**The test data split** (`validation_split`): We have already divided our data into training and evaluation data. With this split, we define that 20% of the training data is set aside for monitoring the training (Keras takes the last 20% of the samples passed to `fit`). Thus, we keep the evaluation data set for an objective evaluation of the algorithm.
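How these settings interact can be worked out with a little arithmetic. The sketch below assumes the numbers used so far in this notebook (70000 images, 10% split off for evaluation, 20% of the remainder held out during training) and a batch size of 1024.

```python
import math

n_total = 70000
n_train_total = n_total - n_total * 10 // 100   # 63000 images left after the evaluation split
n_held_out = n_train_total * 20 // 100          # 12600 images monitored during training
n_fit = n_train_total - n_held_out              # 50400 images used for gradient updates

batch_size = 1024
steps_per_epoch = math.ceil(n_fit / batch_size) # the last batch is smaller

print(n_train_total, n_held_out, n_fit)
print(steps_per_epoch)
```

With these values, one epoch consists of 50 gradient updates. A smaller batch size gives more updates per epoch but also more overhead per epoch.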
<br><br>

**Task:** Complete the information and start the training.


In [None]:
# Training of the model
### start ###
history = model.fit(X..., 
                    y..., 
                    epochs=...,
                    batch_size=..., 
                    validation_split=0.2)
### end ###


# Saving the network
path_to_model = 'keras_mnist.h5'
model.save(path_to_model)
print('The model was saved as: "%s"\n' % path_to_model)

When the training starts, the time per step and the current values of the *loss* and the *accuracy* are displayed for the training and the test set respectively. This makes it easy to follow the progress while the training is still running and to notice possible errors early. 

The algorithm is now trained and the model has been saved for further use with the command:`model.save('path')`.

### Evaluation of the training

Now we will display the training history. For this purpose we have written the training progress in the variable `history` in the previous step.

In [None]:
# load of the network
path_to_model = 'keras_mnist.h5'
model = keras.models.load_model(path_to_model)
print('The model: "%s" was loaded\n' % path_to_model)

# Display of the training progress - accuracy
plt.plot(history.history['accuracy'])  # depending on the Keras version also 'acc'
plt.plot(history.history['val_accuracy'])  # depending on the Keras version also 'val_acc'
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['Accuracy training set', 'Accuracy test set'], loc='lower right')

# Display of the training progress - Loss
fig = plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Loss training set', 'Loss test set'], loc='upper right')

After a sharp increase in the first two epochs, the accuracy of the training set converges to 100% and the loss converges to zero. Thus, further training will likely continue to optimize both values. The values of the test data set, on the other hand, appear to have already reached near optimum. This could be due to the small size of the network.

### Evaluation on the test data

Now we can use the network to predict the part of the data that was split off for evaluation. This is done with the method `prediction = model.predict(X)` of the `model` object. As a result, we get the output of the softmax layer. An example is shown in the code of the cell below the task. <br>
**Task:** Check the trained network with the `X_evaluation` dataset and change the index to test different results. 

In [None]:
### start ###
prediction = ...
### end ###

index=15
plt.imshow(X_evaluation[index].reshape(28,28))
print(np.vstack((range(10),np.round(prediction[index],1))))

It is not necessary to check all values manually. With the function: `model.evaluate(X,y)` all predictions can be analyzed at once.


In [None]:
loss_and_metrics = model.evaluate(X_evaluation, y_evaluation)

print("Test Loss:     %.3f"% loss_and_metrics[0])
print("Test Accuracy: %.2f"% (loss_and_metrics[1]*100)+'%')

The result is already relatively good. More exciting than looking at **all** predictions is to display only the wrongly predicted ones.<br>
**Task:** Execute the following code several times to plot the wrong predictions. Note that you have to decide in which line `==` (equal) and `!=` (unequal) should be written.

In [None]:
# Reversal of one-hot
prediction_numeric = np.argmax(prediction, axis=1)
y_eval_numeric = np.argmax(y_evaluation, axis=1)

# Comparison of vectors
### start ###
incorrect_indices = np.where(prediction_numeric ... y_eval_numeric)[0]
correct_indices = np.where(prediction_numeric ... y_eval_numeric)[0]
### end ###

print(len(correct_indices)," correctly classified")
print(len(incorrect_indices)," misclassified")

def plot_incorrect_samples():
    # adapt figure size to accommodate 12 subplots (3 rows x 4 columns)

    figure_evaluation = plt.figure(figsize=(10,7.5))

    # plot 12 randomly chosen incorrect predictions
    for i, incorrect in enumerate(np.random.choice(incorrect_indices,12)):
        plt.subplot(3,4,i+1)
        plt.imshow(X_evaluation[incorrect].reshape(28,28))
        plt.title("Predicted {}, Truth: {}".format(prediction_numeric[incorrect], y_eval_numeric[incorrect]))
        plt.xticks([])
        plt.yticks([])
plot_incorrect_samples()

With some of these wrong predictions, it is understandable that they were not recognized. But most of the images were recognized correctly.

### Confusion matrix

Scikit-Learn offers the possibility to generate a confusion matrix. In this matrix, the predictions are compared with the true values. On the diagonal is the fraction of images of the respective number that are predicted correctly. Off the diagonal is the fraction of incorrectly predicted values.
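How such a matrix is built can be sketched in NumPy on a handful of toy labels. The helper below is hypothetical and is neither Scikit-Learn's function nor the provided `plot_confusion_matrix`. It assumes that normalization is done per true class (per row).

```python
import numpy as np

def confusion_matrix_normalized(y_true, y_pred, n_classes):
    """Rows: true class, columns: predicted class, each row sums to one."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (y_true, y_pred), 1)        # count every (true, predicted) pair
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)       # avoid division by zero for empty classes

y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

cm = confusion_matrix_normalized(y_true, y_pred, n_classes=3)
print(np.round(cm, 2))
```

In this toy example, class 2 is always recognized, while half of the samples of class 0 are mistaken for class 1.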

**Task:** Plot the confusion matrix with the function: `plot = plot_confusion_matrix(y_true, y_pred, normalize=True)`. In doing so, set the correct variables for `y_true` and `y_pred`. What can you deduce from this graph?

In [None]:
### start ###
...
### end ###

## Evaluation of the networks with a picture of numbers

We now have a model that can predict with a relatively high accuracy the numbers between 0-9.<br>
With a bit of image recognition, you can recognize patterns on a solid color background and process this picture as inputs to a neural network. <br>
For this purpose, we have written and provided a script for pattern recognition and subsequent classification. This script can be executed via the function `classify_image(path_to_image, path_to_model)`.<br> Here, `path_to_image` is the path to an image with text and `path_to_model` is the path to the model we just trained.

**Task** Evaluate your network with the pictures `'02_Test-images/n1.jpg'`, `'02_Test-images/n2.jpg'` and `'02_Test-images/n3.jpg'`<br>
**Reminder** add the MLE-folder like: `'MLE-School/02_Test-images/n1.jpg'` when using **COLAB**

In [None]:
### start ###
...
...
...
### end ###

# Optimization of the network architecture

**Congratulations**, <br>
The foundation stone has been laid and you have trained an artificial neural network that is capable of recognizing written digits. Thus, you have laid the foundation for a program for text recognition, such as provided by most smartphones.<br><br>
**But** the system is not perfect yet and you can start now to optimize the parameters to further improve the prediction.

**Task:** Vary the parameters already introduced above and evaluate the impact on the accuracy of the result. The following headings will guide you through the process.

## Definition of network V2

For the network architecture, you should introduce two important changes: increase the number of layers and neurons, and implement a regularization technique.

**Customize the architecture**<br>
Create a sequential network analogous to the first experiment and increase the number of layers and the number of neurons per layer. <br>**Attention:** This significantly increases the training time, so compare the number of trainable parameters before training.<br><br>
**Regularization** <br>
In order to train large networks, many methods have been developed to prevent overfitting and to regularize the network.
When training neural networks, the weights are initialized randomly and then optimized by gradient descent. It can happen that the prediction of the network comes to rely on a few weights that happened to be in a good range at the beginning of the training. During the training steps, these weights become more and more "important", while many other weights and neurons are effectively "deactivated".<br>
To prevent this, among other things, the *Dropout* method was developed, in which a random subset of neurons is deactivated in each training step. The network is thereby forced to develop redundancies and to generalize better, which results in more robust networks.<br>
This method is easy to integrate by adding a line after a *Dense* layer with the command `keras.layers.Dropout(p)`, where *p* is the proportion (0-1) of neurons that are randomly disabled per step. A common value is between 5-10%.<br>
***Caution***: Do not insert dropout after the output layer. 
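What happens to the activations during a training step with dropout can be sketched in NumPy. The function below is a hypothetical illustration of the idea, not the Keras layer itself. It rescales the surviving activations by `1 / (1 - p)` so that their expected sum stays unchanged, which is the behaviour documented for the Keras `Dropout` layer during training.

```python
import numpy as np

def dropout(activations, p, rng):
    """Disable a random fraction p of the activations and rescale the rest."""
    keep = rng.random(activations.shape) >= p   # True for neurons that stay active
    return activations * keep / (1.0 - p)

rng = np.random.default_rng(0)
activations = np.ones((1000, 64))    # toy activations of a layer with 64 neurons

dropped = dropout(activations, p=0.1, rng=rng)

print((dropped == 0).mean())   # fraction of disabled neurons, close to 0.1
print(dropped.mean())          # average activation stays close to 1
```

At prediction time no neurons are disabled, which is why dropout only affects the training.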

In [None]:
# Definition of the model
model = keras.Sequential(
    [
        ### start ###
        keras.layers.Dense(..., activation='...',input_shape=(784,)),
        ...
        keras.layers.Dense(10, activation='softmax')
        ### end ###
    ]
)
model.summary()

## Compilation V2

The initial *learning rate* should also be adjusted for the new network. This also influences the achievable result. The calculation of an optimum is relatively complex. In the following code a *learning rate* of $\eta$ = 0.005 was given, which fit well for our larger network. Depending on the architecture you have defined, this value may vary. 
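Why the learning rate matters can be seen on a deliberately simple example. The sketch below uses plain gradient descent on the toy function $f(w) = w^2$, not the Adam optimizer and not a neural network, so the concrete numbers do not transfer to the network above. Only the qualitative behaviour does.

```python
def gradient_descent(learning_rate, steps=100, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent (gradient: 2*w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

w_small = gradient_descent(0.005)   # slow: still far from the minimum at 0
w_medium = gradient_descent(0.1)    # converges quickly towards 0
w_large = gradient_descent(1.1)     # overshoots on every step and diverges

print(w_small, w_medium, w_large)
```

A learning rate that is too small wastes training time, while one that is too large prevents convergence. The suitable range in between depends on the problem and the optimizer.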

In [None]:
### start ###
opt = keras.optimizers.Adam(learning_rate=...)
### end ###
model.compile(loss='categorical_crossentropy',metrics=['accuracy'], optimizer=opt)

## Training process

Now follows the training process. Here, too, the number of training epochs can be adjusted/optimized.

In [None]:
### start ###
history = model.fit(X..., 
                    y..., 
                    epochs=...,
                    batch_size=..., 
                    validation_split=0.2)
### end ###

# Saving the network
path_to_model = 'keras_mnist_V2.h5'
model.save(path_to_model)
print('The model was saved as: "%s"\n' % path_to_model)

## Evaluation

Analogous to the first network, the following code loads the new network saved as `'keras_mnist_V2.h5'` and again shows the progression of accuracy and loss over the number of trained epochs. At the end, the evaluation is added to have all metrics at a glance. Execute the cell. 

In [None]:
# load of the network
path_to_model = 'keras_mnist_V2.h5'
model = keras.models.load_model(path_to_model)
print('The model: "%s" was loaded\n' % path_to_model)

# Display of the training progress - accuracy
plt.plot(history.history['accuracy'])  # depending on the Keras version also 'acc'
plt.plot(history.history['val_accuracy'])  # depending on the Keras version also 'val_acc'
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['Accuracy training set', 'Accuracy test set'], loc='lower right')

# Display of the training progress - Loss
fig = plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Loss training set', 'Loss test set'], loc='upper right')

# Evaluation
loss_and_metrics = model.evaluate(X_evaluation, y_evaluation)

print("Test Loss:     %.3f"% loss_and_metrics[0])
print("Test Accuracy: %.2f"% (loss_and_metrics[1]*100)+'%')

Hopefully, the accuracy has increased compared to the first result. If not, you should change the previous values.

## Evaluation with the self-written images

Our own images can also be evaluated with this newly trained network. <br>
**Task:** To do this, copy the command from **Section 1.10** and change the name of the saved model to the newly saved network `'keras_mnist_V2.h5'`.

In [None]:
### start ###
...
### end ###

# Classification of letters from A-Z (addition)

Based on the NIST dataset, a dataset was created by [Kaggle](https://www.kaggle.com), which consists of images with 28x28 pixels like the MNIST dataset. So the workflow from preparation to training can be adopted here. If you are already done with optimizing the MNIST network, use the function `az_data, az_label = load_data(az_path)` again. The normalization of the data and the creation of the network can be done in the same way as in the previous task.
**Reminder** add the MLE-folder like: `'MLE-School/...` when using **COLAB**

In [None]:
### start ###
az_path = './01_Dataset/az_Data.npz'
...

## Explore the dataset

In contrast to the MNIST dataset, the distribution of the classes here is far from uniform. One way to compensate for this is data augmentation: the images are, for example, slightly rotated, shifted or scaled, so that each modified image is a new, unseen data point for the algorithm.
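One of the simplest augmentations, shifting an image by a few pixels, can be sketched in NumPy. The helper `shift_image` is hypothetical and only meant as an illustration. Rotation and scaling would need an image library and are left out here.

```python
import numpy as np

def shift_image(image, dy, dx):
    """Shift a 2D image by dy rows and dx columns, filling the gap with zeros."""
    h, w = image.shape
    shifted = np.zeros_like(image)
    # part of the original image that is still visible after the shift
    src = image[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

image = np.zeros((28, 28))
image[5, 5] = 1.0                    # toy image with a single bright pixel

shifted = shift_image(image, dy=2, dx=-1)
print(np.argwhere(shifted == 1.0))   # new position of the bright pixel
```

The shifted copy keeps the label of the original image, so rare classes can be enlarged by adding several shifted versions of their images.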

In [None]:
Number, frequency_in_dataset = np.unique(az_label.flatten(), return_counts=True)
plt.bar(Number, frequency_in_dataset)
plt.xticks(Number, list('ABCDEFGHIJKLMNOPQRSTUVWXYZ'))  # one tick label per letter class
plt.xlabel('Letter in the image')
plt.ylabel('Quantity in the data set')

## Normalization of the data and training of the networks

Analogous to the previous examples, you can train an ANN on this dataset. For the evaluation we have also provided images:`'./02_Test-images/b1.jpg'`-`'./02_Test-images/b3.jpg'`.

**Reminder** add the MLE-folder like: `'MLE-School/02_Test-images/n1.jpg'` when using **COLAB**

In [None]:
### end ###

# What's next?

## Write a number line on the paper yourself

Here the scaling of the images is crucial, because the function `classify_image` uses the dimensions of the images to decide whether a detected pattern is too small or whether it could be a character.

## Use Convolutional Neural Networks

In recent years, Convolutional Neural Networks have emerged as state of the art, especially for image recognition. Without going into details, we have built a small CNN here. As you can see, the methodology of network definition is the same. Only now other types of layers are used. 
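How the image size changes on its way through the small CNN defined in this section can be traced with simple arithmetic. The sketch assumes the Keras defaults for the layers used there (3x3 kernels without padding and with stride 1, 2x2 pooling with stride 2) and 64 filters in the last convolutional layer.

```python
def conv_out(size, kernel=3):
    """Output width of a convolution without padding and with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width of max pooling, incomplete windows are dropped."""
    return size // pool

size = 28
size = conv_out(size)   # 26 after the first Conv2D
size = pool_out(size)   # 13 after the first MaxPooling2D
size = conv_out(size)   # 11 after the second Conv2D
size = conv_out(size)   # 9 after the third Conv2D
size = pool_out(size)   # 4 after the second MaxPooling2D

flattened = size * size * 64   # 64 feature maps of 4x4 pixels
print(size, flattened)
```

The `Flatten` layer therefore passes 1024 values to the dense layers, slightly more than the 784 pixels of the input image.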

### CNN_mnist

In [None]:
# Square layout images
X_train_Q = X_train.reshape(-1,28,28,1)
X_eval_Q = X_evaluation.reshape(-1,28,28,1)

#  Definition of the Convolutional neural Network
model = keras.Sequential(
    [
    keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28,1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'),
    keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu', kernel_initializer='he_uniform'),
    keras.layers.Dense(10, activation='softmax')
    ])
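# compile model
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Training (keep the history and a validation split so that the plots below show this run)
history = model.fit(X_train_Q, y_train, epochs=10, batch_size=1024, validation_split=0.2, verbose=1)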


# save the network
path_to_model = 'keras_mnist_CNN.h5'
model.save(path_to_model)
print('The model was saved as: "%s"\n' % path_to_model)

# Display of the training progress - precision
plt.plot(history.history['accuracy'])# depending on Keras version also 'acc'
plt.plot(history.history['val_accuracy']) # depending on Keras version also 'val_acc'.
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['Accuracy training set', 'Accuracy test set'], loc='lower right')

# Display of the training progress - Loss
fig = plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Loss training set', 'Loss test_set'], loc='upper right')

# Evaluation
loss_and_metrics = model.evaluate(X_eval_Q, y_evaluation)

print("Test Loss:     %.3f"% loss_and_metrics[0])
print("Test Accuracy: %.2f"% (loss_and_metrics[1]*100)+'%')

In [None]:
path_to_model = 'keras_mnist_CNN.h5'

classify_image('02_Test-images/n1.jpg', path_to_model, CNN=True)
classify_image('02_Test-images/n2.jpg', path_to_model, CNN=True)
classify_image('02_Test-images/n3.jpg', path_to_model, CNN=True)

### CNN_az

In [None]:
# Square layout images
X_train_Q_az = X_train_az.reshape(-1,28,28,1)
X_eval_Q_az = X_evaluation_az.reshape(-1,28,28,1)

#  Definition of the Convolutional neural Network
model = keras.Sequential(
    [
    keras.layers.Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28,1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'),
    keras.layers.Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu', kernel_initializer='he_uniform'),
    keras.layers.Dense(36, activation='softmax')
    ])
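# compile model
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Training
# NOTE: y_train_az is an assumed name for the one-hot labels created analogously to X_train_az
history = model.fit(X_train_Q_az, y_train_az, epochs=10, batch_size=1024, validation_split=0.2, verbose=1)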

# save the network
path_to_model = 'keras_az_CNN.h5'
model.save(path_to_model)
print('The model was saved as: "%s"\n' % path_to_model)

# Display of the training progress - precision
plt.plot(history.history['accuracy'])# depending on Keras version also 'acc'
plt.plot(history.history['val_accuracy']) # depending on Keras version also 'val_acc'.
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend(['Accuracy training set', 'Accuracy test set'], loc='lower right')

# Display of the training progress - Loss
fig = plt.figure()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.legend(['Loss training set', 'Loss test_set'], loc='upper right')

# Evaluation (y_evaluation_az is an assumed name for the labels belonging to X_evaluation_az)
loss_and_metrics = model.evaluate(X_eval_Q_az, y_evaluation_az)

print("Test Loss:     %.3f"% loss_and_metrics[0])
print("Test Accuracy: %.2f"% (loss_and_metrics[1]*100)+'%')
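
The optimizer in the cell above is SGD with a learning rate of 0.01 and a momentum of 0.9. As a rough sketch of what these two numbers do, the following applies the update rule that Keras documents for SGD with momentum (without Nesterov) to a one-dimensional toy loss `f(w) = w**2`. The function `sgd_momentum_step` and the toy loss are illustrative assumptions and not part of the notebook.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One SGD step with momentum: the velocity accumulates past gradients."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity


w, velocity = 5.0, 0.0            # start far away from the minimum at w = 0
for step in range(300):
    grad = 2.0 * w                # gradient of the toy loss f(w) = w**2
    w, velocity = sgd_momentum_step(w, velocity, grad)

print("w after 300 steps:", w)
```

With `momentum=0.0` the same function reduces to plain gradient descent, which makes it easy to compare how quickly both variants approach the minimum.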

In [None]:
classify_image('02_Test-images/b1.jpg', 'keras_az_CNN.h5', CNN=True)
classify_image('02_Test-images/b2.jpg', 'keras_az_CNN.h5', CNN=True)
classify_image('02_Test-images/b3.jpg', 'keras_az_CNN.h5', CNN=True)
classify_image('02_Test-images/b4.jpg', 'keras_az_CNN.h5', CNN=True)

## Explainable AI

### Deep_explainer

In [None]:
import shap
import numpy as np
import tensorflow as tf

# select a set of background examples to take an expectation over
background = X_train_Q[np.random.choice(X_train_Q.shape[0], 100, replace=False)]


path_to_model = 'keras_mnist_CNN.h5'
model = keras.models.load_model(path_to_model)

# explain predictions of the model on the first five evaluation images
e = shap.DeepExplainer(model, background)
# ...or pass tensors directly
# e = shap.DeepExplainer((model.layers[0].input, model.layers[-1].output), background)
shap_values = e.shap_values(X_eval_Q[:5])
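
`shap.DeepExplainer` approximates SHAP values, which attribute a prediction to the individual input features (here: pixels) relative to the expected model output over the background set. The following minimal sketch is not what `shap` does internally. It computes exact Shapley values by brute force for a hypothetical three-feature toy model and illustrates the additivity property: base value plus all attributions equals the model output. The names `toy_model` and `shapley_values` as well as the data are made up for this illustration.

```python
from itertools import combinations
from math import factorial


def toy_model(z):
    """Hypothetical model: one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


def shapley_values(f, x, background):
    """Exact Shapley values of f at x; 'missing' features are filled in
    from the background rows and the model output is averaged over them."""
    n = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            z = [x[i] if i in subset else b[i] for i in range(n)]
            total += f(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                contribution += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(contribution)
    return phi, value(set())


x = [1.0, 2.0, 3.0]
background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
phi, base = shapley_values(toy_model, x, background)

print("attributions:", phi)
print("base value + attributions:", base + sum(phi))
print("model output:", toy_model(x))
```

The brute-force loop visits every feature subset, which is only feasible for a handful of features. This is the reason approximations such as the one in `DeepExplainer` are used for images with 784 pixels.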

# Appendix

## Cover Picture

In [None]:
classify_image('02_Test-images/MLE.jpg', 'keras_az_CNN.h5', True)
classify_image('02_Test-images/MLEz.jpg', 'keras_mnist_CNN.h5', True)

## Time measurements of other algorithms

Measuring memory usage requires the additional package `memory_profiler`, which provides the `%memit` magic loaded below. If it is not installed yet, install it once, for example by running `!pip install memory_profiler` in a code cell.

##### Module Loading and Train/Test Split

In [None]:
import time
from sklearn.model_selection import train_test_split
from sklearn import metrics
%load_ext memory_profiler

from sklearn.svm import SVC, NuSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.neural_network import MLPClassifier

# Train/test split (10 % of the data is held out for testing)
X_train, X_test, y_train, y_test = train_test_split(mnist_normalised, mnist_label.ravel(), test_size=0.1, random_state=42)

##### Support Vector Machine

In [None]:
#SVC
model = SVC(kernel='linear')

#training
%time %memit model.fit(X_train, y_train)

# predict
print('prediction')
%time %memit y_pred = model.predict(X_test)
# accuracy
print("accuracy:", metrics.accuracy_score(y_true=y_test, y_pred=y_pred), "\n")

del model

In [None]:
#NuSVC
model = NuSVC()

#training
%time %memit model.fit(X_train, y_train)

# predict
print('prediction')
%time %memit y_pred = model.predict(X_test)
# accuracy
print("accuracy:", metrics.accuracy_score(y_true=y_test, y_pred=y_pred), "\n")

del model

##### Decision Tree

In [None]:
# Decision Tree
model = DecisionTreeClassifier(random_state=0)

#training
%time %memit model.fit(X_train, y_train)

# predict
print('prediction')
%time %memit y_pred = model.predict(X_test)
# accuracy
print("\naccuracy:", metrics.accuracy_score(y_true=y_test, y_pred=y_pred), "\n")

del model

##### Extra Tree

In [None]:
# Extra Tree
model = ExtraTreeClassifier(random_state=0)

#training
%time %memit model.fit(X_train, y_train)

# predict
print('prediction')
%time %memit y_pred = model.predict(X_test)
# accuracy
print("\naccuracy:", metrics.accuracy_score(y_true=y_test, y_pred=y_pred), "\n")

del model

##### Random Forest

In [None]:
# Random Forest
model = RandomForestClassifier()

#training
%time %memit model.fit(X_train, y_train)

# predict
print('prediction')
%time %memit y_pred = model.predict(X_test)
# accuracy
print("\naccuracy:", metrics.accuracy_score(y_true=y_test, y_pred=y_pred), "\n")

del model

##### scikit-learn MLP

In [None]:
# Neural Network
model = MLPClassifier(hidden_layer_sizes=(64,32,), max_iter=10, alpha=1e-4,
                    solver='sgd', random_state=1,
                    learning_rate_init=.1)

#training
start = time.time()
%time %memit model.fit(X_train, y_train)
print('actual time:', time.time()-start)
# predict
print('prediction')
%time %memit y_pred = model.predict(X_test)
# accuracy
print("\naccuracy:", metrics.accuracy_score(y_true=y_test, y_pred=y_pred), "\n")

del model
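
The `%time` and `%memit` magics used above only exist inside IPython/Jupyter. As a hedged alternative for plain Python scripts, a similar measurement can be sketched with the standard library. The helper `measure` is a made-up name, and `tracemalloc` only sees allocations that go through Python's memory allocator, so memory reserved by native code may be missed and the numbers are not directly comparable to `%memit`.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once and return (result, elapsed seconds, peak traced bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload; in the notebook this could be e.g. model.fit or model.predict
result, seconds, peak_bytes = measure(lambda n: [i * i for i in range(n)], 100_000)
print(f"elapsed: {seconds:.4f} s, peak: {peak_bytes / 1024:.1f} KiB")
```

A call such as `measure(model.fit, X_train, y_train)` would follow the same pattern for the classifiers in this section.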

## MNIST_Images

In [None]:
fig, ax = plt.subplots(figsize=(18,9))
im = np.zeros((1,28*20+1))
for i in range(10):
    itemindex = np.where(mnist_label.reshape(70000,)==i)[0]
    row = np.zeros((28,1))
    for k in range(20):
        row = np.hstack((row, mnist_data[itemindex[k]]))
    im = np.vstack((im, row))
plt.imshow(im, cmap='binary')
fig.patch.set_visible(False)
ax.axis('off')
# 'frameon' is no longer accepted by savefig in recent Matplotlib versions; the patch is already hidden above
plt.savefig('Beispiel1.jpg', dpi=300, bbox_inches='tight', pad_inches=0.1)

In [None]:
fig, ax = plt.subplots(figsize=(18,9))
im = np.zeros((28*11+1,1))
for i in range(10):
    itemindex = np.where(mnist_label.reshape(70000,)==i)[0]
    row = np.zeros((1,28))
    for k in range(11):
        row = np.vstack((row, mnist_data[itemindex[k]]))
    im = np.hstack((im, row))
plt.imshow(im, cmap='binary')
fig.patch.set_visible(False)
ax.axis('off')
# 'frameon' is no longer accepted by savefig in recent Matplotlib versions; the patch is already hidden above
plt.savefig('Beispiel2.jpg', dpi=300, bbox_inches='tight', pad_inches=0.1)

In [None]:
plt.figure(figsize=(2,2))
plt.imshow(mnist_data[0], cmap='binary', interpolation='none')
plt.title("Number: {}".format(mnist_label[0][0]))
plt.xticks([])
plt.yticks([])
plt.savefig('Beispiel3.png', dpi=600, bbox_inches='tight', pad_inches=0.1)

In [None]:
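# Enlarge the first 15 images: every pixel becomes an n x n block, separated by grid lines
for number in range(15):
    new_im = []
    n = 10  # edge length of one enlarged pixel
    for row in mnist_data[number]:
        new_row = []
        for pix in row:
            new_row.append(255)  # vertical grid line before each pixel block
            for i in range(n):
                new_row.append(pix)
        new_row.append(255)  # closing vertical grid line
        for i in range(n):
            new_im.append(new_row)
        new_im.append(np.ones((len(new_row)))*255)  # horizontal grid line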

    plt.figure(figsize=(6,6))
    plt.imshow(new_im, cmap='binary', interpolation='none')
    plt.xticks([])
    plt.yticks([])
    plt.savefig('Nummer_%i.png'%number, dpi=600, bbox_inches='tight', pad_inches=0.1)