<img src="images/kiksmeisedwengougent.png" alt="Banner" width="1100"/>

<div>
    <font color=#690027 markdown="1">
<h1>THE FOUNDATIONS OF A DEEP NEURAL NETWORK FOR IMAGE RECOGNITION</h1>    </font>
</div>

<div class="alert alert-box alert-success">
In this notebook, you will gradually get acquainted with the structure of a deep neural network that is capable of distinguishing between an image with a stoma and an image without a stoma. <br>The network provides a solution for a <em>classification problem</em>: an image is classified as 'Stoma' or 'No stoma'.</div>

In the development of a neural network, multiple choices must be made and values of parameters must be set. These choices determine the architecture of the network and how the training of the network proceeds.

Normally it takes a while to train a network. However, the networks that can be created in this notebook have already been trained and the results are stored in a database. In this way, you can immediately look at the performance of the chosen network.

<div class="alert alert-box alert-warning">
If you want to train a network yourself, take a look afterwards at the notebook 'From leaf to label' from the learning path 'Advanced Deep Learning'.</div>

<div class="alert alert-block alert-warning"> 
Take a look at a specific application of classification in the notebook 'Stomata Sun Shadow' from the learning path 'ML Classification'.</div>

### Install and import necessary modules

Execute the code cells below to be able to use the functions in this notebook.

In [None]:
import sys
!{sys.executable} -m pip install pymongo

In [None]:
import importlib.util

# Load the helper module for this notebook directly from its file path.
spec = importlib.util.spec_from_file_location(
    name="diep_neuraal_netwerk",
    location=".scripts/diep_neuraal_netwerk.py",
)
deep_neural_network = importlib.util.module_from_spec(spec)
spec.loader.exec_module(deep_neural_network)

<div>
    <font color=#690027 markdown="1">
<h2>1. The data</h2>    </font>
</div>

A deep neural network learns to map an input to an output by processing **labeled data**: a collection of inputs, each paired with the output the model is expected to produce. <br><br>For the stomata problem, the data are micro photos of parts of leaves from different types of plants from the tropical rainforest; each micro photo comes with a label indicating whether or not it shows a stoma. The photos are therefore divided into **two classes** that represent the output of the network.<br>All these micro photos are color photos with a size of **120 x 120 pixels**. As seen in Figure 1, the photos with a stoma in the middle belong to the 'Stoma' class; photos without a stoma, with a partial stoma or with a stoma that is not in the middle of the image belong to the 'No stoma' class. In photos with a stoma, the stoma fills most of the photo.
<img src="images/trainingdata.jpg"/>
<center>Figure 1: Example of the labeled data: the input and the corresponding output.</center>
<b>It has been experimentally determined that in order to achieve the best possible result, 6 times more photos without a stoma are needed than photos with a stoma. </b>This is because there is more variety in the images without a stoma, so more examples are needed.

The available data is divided into 3 groups:<ul>
<li><b>Training set</b>: These are the data used to train the model.<br>76 740 photos with label 'No stoma' + 12 790 photos with label 'Stoma' = 89 530 training images</li><li><b>Validation set</b>: This data is used to determine how well the network performs on data it has not yet seen. Based on this data, the network is adjusted to achieve better results.<br>28 866 photos with label 'No stoma' + 4 811 photos with label 'Stoma' = 33 677 validation images</li><li><b>Test set</b>: After training with the training set and refining the network with the validation set, the network is evaluated one more time using the test set. <br>55 182 photos with label 'No stoma' + 9 197 photos with label 'Stoma' = 64 379 test images</li></ul>
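
The totals above can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary `splits` is a name chosen here, and the counts are copied from the list above.

```python
# Image counts per set, copied from the list above: (no_stoma, stoma).
splits = {
    "training": (76_740, 12_790),
    "validation": (28_866, 4_811),
    "test": (55_182, 9_197),
}

for name, (no_stoma, stoma) in splits.items():
    total = no_stoma + stoma
    ratio = no_stoma / stoma
    print(f"{name:<10} total = {total:>6}   'No stoma' : 'Stoma' = {ratio:.1f} : 1")

# Total number of labeled photos over the three sets.
grand_total = sum(no_stoma + stoma for no_stoma, stoma in splits.values())
print("all sets together:", grand_total)
```

Each set keeps the same 6 : 1 ratio between 'No stoma' and 'Stoma' photos.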

It may seem redundant to have a test set to evaluate the network one last time, since the performance of the network is already checked using the validation set. However, it is necessary. <br>The results on the validation set are used to adjust the network so that it performs better on that validation set. In this way, you search for the best network for one specific validation set. How the network eventually performs on the validation set is therefore unsuitable for assessing how it performs on new data: the photos in the validation set are no longer new data. **To assess the final network, you need its performance on a test set that the network has never seen.**

<div>
    <font color=#690027 markdown="1">
<h2>2. The Network Architecture</h2>    </font>
</div>

A deep neural network consists of several consecutive layers that convert the input, layer by layer, into the output. The more layers the network has, the deeper the network is.
Figure 2 shows an example of a deep neural network.<img src="images/vbnetwerk.jpg"/>
<center>Figure 2: The structure of a neural network.</center>

The network is a feedforward network. The input is sent through it from left to right. The first (blue) layers are convolutional layers and the other (purple) layers are *dense layers*.

**The following subsections each describe a part of the network. The number 1 in the image therefore refers to section 2.1, the number 2 to section 2.2, etc.**

<div>
    <font color=#690027 markdown="1">
<h3>2.1 Input</h3>    </font>
</div>

As previously mentioned, the input of the network is a micro photo of a part of a leaf on which a stoma may or may not be visible; this color photo measures 120 x 120 pixels. The photo is (digitally) represented as a 3D tensor with dimension 3x120x120, so that it can be processed by the network.
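
To make this representation concrete, the sketch below builds a tensor of the same shape with NumPy. The pixel values are random placeholders rather than a real micro photo, and the variable names are chosen for illustration only.

```python
import numpy as np

# A stand-in for one micro photo: random pixel values instead of a real image.
# The shape is (channels, height, width) = (3, 120, 120), as in the text.
rng = np.random.default_rng(seed=0)
photo = rng.integers(low=0, high=256, size=(3, 120, 120), dtype=np.uint8)

print(photo.ndim)    # number of axes of the tensor: 3
print(photo.shape)   # (3, 120, 120)
print(photo.size)    # total number of values: 3 * 120 * 120 = 43200

red_channel = photo[0]  # one 120 x 120 matrix per color channel
```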

<div class="alert alert-block alert-warning"> 
You can read how a computer looks at photos in the notebooks 'Matrices and grayscale images' and 'Tensors and RGB'. For an explanation of tensors, you can refer to the 'Tensors' notebook. You can find these STEM notebooks in the learning path "Digital Images".
</div>

<div>
    <font color=#690027 markdown="1">
<h3>2.2 Convolutional layers</h3>    </font>
</div>

The convolutional layers serve to discover relevant patterns in the input image (represented by a tensor).

A convolutional layer places a second 3D tensor, a <b>filter</b> with dimension 3xaxb, at the top left of the image and performs a mathematical calculation, a convolution, which returns a single number. The filter then shifts a certain number of pixels and the next number is calculated. The result is a matrix containing information about where the pattern of the filter occurs in the input. The person designing the network chooses the values of a and b and the number of pixels by which the filter shifts.

The network for the classification of stomata uses filters of 3 by 3, with depth 3, and these filters will always shift 1 position at a time.

The following image shows this operation on an image of 8 x 8 pixels with a depth of 3 (RGB, dimension 3x8x8) and with a filter as a 3D tensor with dimension 3x3x3 that shifts one position each time. The resulting matrix has dimension 6x6.
<img src="images/convoperation.jpg" width="500"/>
<center>Figure 3: Convolution.</center>
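
The sliding operation described above can be sketched in NumPy as follows. This is a minimal illustration, not the code used by the KIKS network: the function name `convolve` and the random values are assumptions, and the filter is applied as it is, without flipping it first.

```python
import numpy as np

def convolve(image, kernel, stride=1):
    """Slide `kernel` over `image` (both shaped depth x height x width)."""
    depth, height, width = image.shape
    k_depth, k_height, k_width = kernel.shape
    assert depth == k_depth, "filter depth must match image depth"

    out_height = (height - k_height) // stride + 1
    out_width = (width - k_width) // stride + 1
    result = np.zeros((out_height, out_width))

    for row in range(out_height):
        for col in range(out_width):
            top, left = row * stride, col * stride
            window = image[:, top:top + k_height, left:left + k_width]
            # Multiply element by element and add everything up: one number.
            result[row, col] = np.sum(window * kernel)
    return result

rng = np.random.default_rng(seed=0)
image = rng.random((3, 8, 8))     # stand-in for an 8 x 8 RGB image
kernel = rng.random((3, 3, 3))    # stand-in for a filter; real values are learned

feature = convolve(image, kernel)
print(feature.shape)  # (6, 6)
```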

A convolutional layer will usually not use one, but multiple filters to recognize different patterns.
- Every filter that slides over the image results in a matrix. All these matrices have the same dimension.
- By combining these matrices, a new 3D tensor is created in which one of the dimensions equals the number of filters. This tensor is called a **feature map**. The elements of the filters are adjusted during the training of the model, with the aim of recognizing relevant patterns.

In Figure 2, the number below a convolutional layer (32 in the case of the first one) represents the number of filters. For filters with dimension 3x3x3, the output tensor of the first layer, i.e., the feature map of the first layer, will have a dimension of 32x118x118.
<img src="images/convlayer2.jpg"/>
<center>Figure 4: Convolutional layer.</center>
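
The dimension of a feature map follows from the size of the input, the size of the filters and the shift. The helper below is a sketch that assumes square filters and no padding at the borders of the image; `conv_output_shape` is a name made up for this example.

```python
def conv_output_shape(input_shape, n_filters, filter_size=3, stride=1):
    """Shape (depth, height, width) of a feature map, without padding."""
    _, height, width = input_shape
    out_height = (height - filter_size) // stride + 1
    out_width = (width - filter_size) // stride + 1
    return (n_filters, out_height, out_width)

print(conv_output_shape((3, 120, 120), n_filters=32))  # (32, 118, 118)
print(conv_output_shape((3, 8, 8), n_filters=1))       # (1, 6, 6)
```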

<div class="alert alert-block alert-warning"> 
You will find the surprising effect of convolutions in the 'Convolution' notebook of the 'Deep learning basics' learning path. To know what operation is being performed, check out the notebook 'Convolution: the operation' of the 'Advanced Deep Learning' learning path.</div>

<div>
    <font color=#690027 markdown="1">
<h3>2.3 Max pooling</h3>    </font>
</div>

The convolutional layers are alternated with max pooling operations. Such an operation retains only the largest value from each window, whose dimensions you choose yourself. This largest value indicates where the pattern of the filter was most present in the input.

In the KIKS network, max pooling is used in windows of 2 pixels by 2 pixels. As a result, each max pooling operation makes the feature maps four times smaller.

The following image provides an example of such a max pooling operation.
<img src="images/maxpooling.jpg"/><br>
<center>Figure 5: Max pooling.</center>
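
A 2 by 2 max pooling operation can be sketched as follows. The values in `feature_map` are made up and are not those of Figure 5; the sketch also assumes that the windows do not overlap.

```python
import numpy as np

def max_pool(feature_map, window=2):
    """Keep the largest value of every non-overlapping window x window block."""
    height, width = feature_map.shape
    out_height, out_width = height // window, width // window
    # Drop rows and columns that do not fill a complete window.
    trimmed = feature_map[:out_height * window, :out_width * window]
    blocks = trimmed.reshape(out_height, window, out_width, window)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
])
pooled = max_pool(feature_map)
print(pooled)
# [[4 5]
#  [6 7]]
```

The 16 values of the input are reduced to 4 values, so the result is four times smaller.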

The purpose of max pooling operations is twofold.
- On the one hand, the size of the output of the convolutional layers has to be reduced; why this is necessary is described in section 2.4.
- On the other hand, each number in the feature map should contain information about a larger part of the input image.

Imagine for a moment that max pooling operations were not there. After two consecutive convolutional layers, a number in the feature map would contain information about a window of just 5 by 5. This is often not enough to recognize important features of the image. Would you recognize something in an image with only about 25 pixels?

The following image shows what happens with two convolutional layers (always with just 1 filter) without max pooling operations. The red field in the last feature map only contains information from the red field in the input image.
<img src="images/convnomaxpooling.jpg"/>
<center>Figure 6: Convolution without max pooling.</center>
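
The window of 5 by 5 can be verified with a small calculation. The function below is a sketch with a made-up name, `receptive_field`; it assumes that every layer is described by a window size and a shift, and it keeps track of how far apart neighboring values lie in the input image.

```python
def receptive_field(layers):
    """Side length of the input window seen by one output value.

    `layers` is a list of (window_size, stride) pairs, in network order.
    """
    size = 1   # one value initially "sees" only itself
    jump = 1   # distance in input pixels between neighboring values
    for window_size, stride in layers:
        size += (window_size - 1) * jump
        jump *= stride
    return size

conv = (3, 1)   # 3 x 3 filter that shifts 1 position at a time
print(receptive_field([conv]))        # 3
print(receptive_field([conv, conv]))  # 5
```

A max pooling operation with windows of 2 by 2 that do not overlap would be described by the pair `(2, 2)`, which lets you check your answer to the assignment below.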

### Assignment
What is the size of the window to which a number in the feature map refers, if max pooling is performed after both convolutions?

<div class="alert alert-block alert-warning"> 
Read more about max pooling in the notebook 'ReLU and max pooling' from the learning path 'Basic Deep Learning'.</div>

<div>
    <font color=#690027 markdown="1">
<h3>2.4 Dense layers</h3>    </font>
</div>

The dense layers serve to effectively classify the image.

A dense layer consists of neurons, each of which is connected to every neuron of the next layer. Each connection between neurons of consecutive layers is assigned a number, the **weight** of the connection.<br> You can compare the whole to a linear function that converts the input of a layer into the output of that layer: the neurons are the variables of the function and the weights of the connections are the coefficients. In other words, the output is a linear combination of the neurons.
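
The linear combination can be written out in NumPy. This sketch uses three neurons connected to two neurons in the next layer, with random placeholder weights; a real dense layer typically also adds a bias and applies an activation function, which are left out here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

inputs = np.array([0.5, -1.0, 2.0])   # values of 3 neurons in one layer
weights = rng.normal(size=(3, 2))     # one weight per connection: 3 x 2
outputs = inputs @ weights            # values arriving at the 2 next neurons

# The same calculation written out for the first neuron of the next layer.
first = sum(inputs[i] * weights[i, 0] for i in range(3))

print(outputs.shape)  # (2,)
```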

A network <em>learns</em> by adjusting these weights based on the training data. The more neurons, the more information the network can store. However, too many neurons are not always good. The network may then start to store irrelevant information about the training data and overfit (see further).

The elements of the feature map from the last convolutional layer form the input of the first dense layer.

<div class="alert alert-block alert-warning"> 
The notebook 'Overfitting' in the learning path 'Basic Deep Learning' explains in more detail why too many neurons are not always good.</div>

In Figure 7, a feedforward network is represented as a graph: the nodes (the circles) represent the neurons and the arcs represent the connections between the neurons.
<img src="images/ffn.jpg"/><br>
<center>Figure 7: Feedforward layers with neurons and connections.</center>
So far, the convolutional layers and the max pooling operations have converted the input image into a 3D tensor that contains information about various relevant patterns in that image.

Before this data can serve as input to the dense layers, the 3D tensor must be converted into a row matrix. This is done by a <b>flatten</b> operation. Figure 8 shows how the flatten operation converts a feature map with dimensions 2x3x3 into a row matrix to serve as input for the dense layers.
<img src="images/flatten.jpg"/>
<center>Figure 8: Flatten.</center>
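
In NumPy, a flatten operation on a tensor with the dimensions of Figure 8 could look like this. The values 0 to 17 are placeholders, not the values shown in the figure.

```python
import numpy as np

feature_map = np.arange(18).reshape(2, 3, 3)   # placeholder values 0..17
row = feature_map.reshape(1, -1)               # row matrix: 1 row, 18 columns

print(feature_map.shape)  # (2, 3, 3)
print(row.shape)          # (1, 18)
```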

Now it is also immediately clear why the output of the convolutional layers must not be too large. For example, a flatten operation on a feature map with dimension 64x100x100 gives 640 000 inputs for the first dense layer. If the first dense layer itself contains, for example, 64 neurons, there are 640 000 x 64 = 40 960 000 weights, almost 41 million, between these two layers alone that the network has to learn.

In the presentation of the deep neural network, a dense layer is represented as a purple bar. The number under the bar represents the number of neurons in this layer.

<div>
    <font color=#690027 markdown="1">
<h3>2.5 Output</h3>    </font>
</div>

The last layer consists of only one neuron; it is the output or the 'prediction' of the network. The network returns a number between 0 and 1, where 0 stands for 'No stoma' and 1 for 'Stoma'. The closer the output is to 1, the more certain the network is that a stoma is visible in the input image. The person building the network chooses a threshold value. Once this threshold is exceeded, a photo is classified in the 'Stoma' class.

<div>
    <font color=#690027 markdown="1">
<h2>3. Choose your network architecture</h2>    </font>
</div>

Now that you understand the different components of a deep neural network, you can put together a network yourself to perform stomata classification. Run the code cell below to choose some parameters of the network.

In [None]:
deep_neural_network.kies_netwerk_parameters()

To visualize the selected network, execute the following instruction. If you are not satisfied or want to try other things, feel free to change the parameters above and execute the instruction again.

In [None]:
deep_neural_network.toon_netwerk()

<div>
    <font color=#690027 markdown="1">
<h2>4. Training the Network</h2>    </font>
</div>

Once the network architecture is chosen, this network can be trained. But before you can start with that, you still have to make some choices.

<div>
    <font color=#690027 markdown="1">
<h3>4.1 Epochs</h3>    </font>
</div>

You have to choose the number of <b>epochs</b>: how many times the complete training data are processed. Often you have to experiment with the number of epochs to achieve a good result. The networks in this notebook are trained with 50 epochs.

*As previously mentioned, the networks in this notebook are not trained here. They have already been trained in advance and the results are stored in a database, enabling you to immediately view the characteristics of the selected network.*

<div>
    <font color=#690027 markdown="1">
<h3>4.2 Loss function</h3>    </font>
</div>

You also have to choose a loss function. The loss function is a function of the weights. The value of the loss function describes how well a network performs. The lower the value of the loss function, the closer the output of the network is to the desired output. Every time the weights are adjusted during training, the value of the loss function will also change. So in training, you are looking for the weights with the smallest loss.

In the networks in this notebook, the loss function ***binary crossentropy*** has been chosen. This loss function is suitable for classification problems with 2 classes (hence the 'binary').
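
For one image with label y (1 for 'Stoma', 0 for 'No stoma') and network output p, binary crossentropy is commonly defined as -(y·log(p) + (1 - y)·log(1 - p)), averaged over all images. The sketch below implements that definition in plain Python; the example outputs are made up, and the small value `eps` is an assumption added only to avoid taking the logarithm of 0.

```python
import math

def binary_crossentropy(labels, predictions, eps=1e-7):
    """Average binary crossentropy; labels are 0 or 1, predictions lie in [0, 1]."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1 - eps)   # keep p away from 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 0, 1]          # 1 = 'Stoma', 0 = 'No stoma'
good = [0.9, 0.1, 0.2, 0.8]    # outputs close to the labels
bad = [0.3, 0.7, 0.6, 0.4]     # outputs leaning the wrong way

print(binary_crossentropy(labels, good))
print(binary_crossentropy(labels, bad))
```

The outputs that lie closer to the labels give the lower loss.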

<div class="alert alert-block alert-warning"> 
Do you want to know more about how this loss function works? Then consult the 'KIKS' manual.</div>

<div>
    <font color=#690027 markdown="1">
<h3>4.3 Optimizer</h3>    </font>
</div>

Another important choice when training a deep neural network is the optimizer. This determines the way the network learns. The optimizer determines how the weights need to be adjusted to get closer to the minimum of the loss function.

In the networks in this notebook, the ***stochastic gradient descent (SGD)*** optimizer has been chosen. As the name suggests, SGD employs the 'gradient descent' technique: the derivative of the loss function is calculated to find its minimum. 'Stochastic' means that the derivative of the loss is not calculated with the entire training set, but with only a randomly selected part of it. This is much faster than when the derivative of the loss is calculated with the entire training set.
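
The idea behind SGD can be illustrated on a toy problem with a single weight. The sketch below is a strong simplification: it uses made-up data that follow y = 2x and a mean squared error instead of binary crossentropy, but it shows how each step uses only a randomly selected part of the data.

```python
import random

random.seed(0)

# Toy data following y = 2 * x; the "network" has a single weight w.
xs = [i / 10 for i in range(1, 101)]
ys = [2 * x for x in xs]

def gradient(w, batch):
    """Derivative of the mean squared error with respect to w on a batch."""
    return sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)

w = 0.0
learning_rate = 0.01
for step in range(200):
    batch = random.sample(range(len(xs)), k=10)   # random part of the data
    w -= learning_rate * gradient(w, batch)       # step against the slope

print(w)  # close to 2
```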

<div class="alert alert-block alert-warning"> 
Immerse yourself in the technique of 'gradient descent' in the 'Gradient Descent' notebook from the 'Advanced Deep Learning' learning path. There you also see the effect of the 'learning rate'.</div>

<div>
    <font color=#690027 markdown="1">
<h3>4.4 Learning rate</h3>    </font>
</div>

The learning rate determines how strongly the weights are adjusted each time, in other words, how big the steps are to reach the minimum.<br>
A small learning rate makes the network approach the minimum slowly, so it learns slowly; with a learning rate that is too large, the network will not find the minimum. The following image shows how gradient descent works in 2 dimensions (so with just one weight). In a deep neural network there is a huge number of weights and therefore also a huge number of dimensions, which is much harder to visualize in a figure.
<img src="images/gradientdescent.jpg" width="500"/>
<center>Figure 9: Gradient descent.</center>
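
The effect of the learning rate can be tried out on a simple loss function with one weight. The function (w - 3)² and the three learning rates below are chosen purely for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize loss(w) = (w - 3)**2, whose minimum lies at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)          # derivative of the loss at w
        w -= learning_rate * gradient   # step against the slope
    return w

for learning_rate in (0.01, 0.1, 1.1):
    print(learning_rate, gradient_descent(learning_rate))
```

With the smallest learning rate the weight is still far from 3 after 20 steps, while with the largest one every step overshoots the minimum and the weight moves further and further away.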

Execute the code cell below to choose the learning rate with which the network is trained.

In [None]:
deep_neural_network.kies_training_parameters()

<div>
    <font color=#690027 markdown="1">
<h2>5. Network Performance</h2>    </font>
</div>

<div>
    <font color=#690027 markdown="1">
<h3>5.1 Loss, accuracy and baseline</h3>    </font>
</div>

To evaluate a network, you can look at the value of the loss function, but you can also look at the <b>accuracy</b> of the network: the percentage of the data for which the network correctly predicts the label. <br>You calculate both values during training, per epoch, for both the training data and the validation data. <br>After training, you calculate these values one more time for the test data to finally assess the network.

It can also happen that the network does not learn. To determine this, you formulate a <b>baseline</b>. This is easy to explain with the KIKS example. The dataset contains 6 times as many images without a stoma as images with a stoma. So 6/7 (85.7%) of the images in the training set, the validation set and the test set are images without a stoma. A model that simply labels every image as 'No stoma' therefore already reaches an accuracy of 85.7%. This is the baseline that the model must exceed before you can say that it has learned something.
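
The baseline follows directly from the ratio between the two classes. The snippet below is a small sketch of that calculation; the variable names are chosen here.

```python
no_stoma, stoma = 6, 1          # ratio of the two classes in every set
baseline = no_stoma / (no_stoma + stoma)
print(f"{baseline:.1%}")        # 85.7%

# The same check with the training-set counts from section 1.
train_no_stoma, train_stoma = 76_740, 12_790
train_baseline = train_no_stoma / (train_no_stoma + train_stoma)
print(f"{train_baseline:.1%}")  # 85.7%
```
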
By executing the following code cell, you will see graphs of the accuracy and the loss of the network, for the training and validation set. The graphs represent the accuracy and the loss per epoch.

In [None]:
deep_neural_network.toon_grafiek()

It can happen that the training loss decreases while the validation loss increases. When this happens, the network is <b>overfitting</b>. It means that the network is learning too many details of the training data by heart and therefore no longer generalizes well to data it has never seen before. <br>Overfitting is one of the biggest problems with a deep neural network. Fortunately, there are techniques to counteract overfitting. The notebook 'Overfitting' explains a number of them.

The opposite of overfitting is <b>underfitting</b>: this means that the network has not learned enough and therefore can only poorly recognize the relevant patterns in the data. This is often the case with a network that is too simple or a network that has been trained for too short a time.

<div class="alert alert-block alert-warning"> 
View techniques to combat overfitting in the notebook 'Overfitting', also in the learning path 'Deep learning basics'.<br></div>

Figure 10 shows how you can recognize underfitting and overfitting on a graph that displays the values of the loss function for the different epochs. If the training stops before the leftmost dotted line, you have a network that underfits. If the training stops after the rightmost dotted line, you have a network that overfits.
<img src="images/underfittingoverfitting.jpg"/>
<center>Figure 10: Under- and overfitting.</center>
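
On the loss values themselves, the overfitting pattern can be detected with a simple comparison per epoch. The numbers below are invented for illustration and do not come from one of the networks in this notebook.

```python
# Made-up loss values per epoch, for illustration only.
train_loss = [0.70, 0.52, 0.41, 0.33, 0.27, 0.22, 0.18, 0.15]
val_loss = [0.72, 0.56, 0.46, 0.41, 0.40, 0.43, 0.48, 0.55]

# Epoch (counted from 0) at which the validation loss is lowest.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
print("lowest validation loss after epoch", best_epoch + 1)

overfitting_epochs = []
for epoch in range(best_epoch + 1, len(val_loss)):
    better_on_training = train_loss[epoch] < train_loss[epoch - 1]
    worse_on_validation = val_loss[epoch] > val_loss[epoch - 1]
    if better_on_training and worse_on_validation:
        overfitting_epochs.append(epoch + 1)

print("epochs showing the overfitting pattern:", overfitting_epochs)
```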

<div>
    <font color=#690027 markdown="1">
<h3>5.2 Exercise</h3>    </font>
</div>

Search for a model that underfits, a model that overfits, and a model that does not learn. Do this by adjusting the parameters in '3. Choose your network architecture' and '4.4 Learning rate', and viewing the graphs in '5.1. Loss, accuracy and baseline'.

<div>
    <font color=#690027 markdown="1">
<h3>5.3 Threshold value</h3>    </font>
</div>

As an extra, you will look at a number of predictions from the network.
The prediction or output is a number between 0 and 1 that indicates how certain the network is that a stoma was found. The threshold determines for which values of the output the network considers the input as a stoma. For example, if the threshold value is 0.5, an image with an output greater than 0.5 will be considered as "Stoma" and all images with an output less than 0.5 as "No stoma".
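
Applying a threshold comes down to a single comparison. In the sketch below, `classify` is a hypothetical helper and the outputs are made up; an output exactly equal to the threshold is treated as 'No stoma' here, which is a choice made for this example.

```python
def classify(output, threshold=0.5):
    """Turn the network output (a number between 0 and 1) into a class label."""
    return "Stoma" if output > threshold else "No stoma"

outputs = [0.08, 0.47, 0.62, 0.93]   # made-up network outputs
for threshold in (0.5, 0.9):
    print(threshold, [classify(output, threshold) for output in outputs])
```

Raising the threshold from 0.5 to 0.9 moves the image with output 0.62 from 'Stoma' to 'No stoma'.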

### Assignment
- Execute the following code cell to see the predictions.
- Play with the threshold value (thr, displayed as a percentage).
- Check if the network classifies the image correctly (a green border) or incorrectly (a red border).

In [None]:
deep_neural_network.toon_voorspellingen()

<div>
    <font color=#690027 markdown="1">
<h3>5.4 False positive and false negative</h3>    </font>
</div>

There are also images that the model struggles with. When the network judges that there is no stoma in a photo with a stoma, one speaks of a **false negative**. Conversely, when the network judges that there is a stoma in a photo without a stoma, one speaks of a **false positive**.
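
Counting false positives and false negatives can be sketched as follows. The labels and predictions are made up for illustration and are not taken from the KIKS data.

```python
# Made-up true labels and network predictions for five images.
labels = ["Stoma", "No stoma", "Stoma", "No stoma", "No stoma"]
predictions = ["Stoma", "Stoma", "No stoma", "No stoma", "No stoma"]

pairs = list(zip(labels, predictions))

# Predicted 'Stoma' although the photo shows no stoma.
false_positives = sum(t == "No stoma" and p == "Stoma" for t, p in pairs)
# Predicted 'No stoma' although the photo shows a stoma.
false_negatives = sum(t == "Stoma" and p == "No stoma" for t, p in pairs)
accuracy = sum(t == p for t, p in pairs) / len(pairs)

print(false_positives, false_negatives, accuracy)  # 1 1 0.6
```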

Execute the code cell below to see some images that many models struggle with.

In [None]:
deep_neural_network.toon_slechte_voorspellingen()

<div>
    <font color=#690027 markdown="1">
<h3>5.5 Performance on the test set</h3>    </font>
</div>

If you are satisfied with your network, you will evaluate it one last time using the test set. Execute the following code cell to see the final performance of your network.

In [None]:
deep_neural_network.toon_test_resultaten()

<div class="alert alert-block alert-warning"> 
In the notebook 'From leaf to label' of the learning path 'Advanced Deep Learning', you experiment with the parameters of the KIKS neural network. In other words, you train your own AI system that recognizes and counts stomata. You strive for the most accurate and efficient network possible.</div>

<div>
<h2>With support from</h2></div>

<img src="images/kikssteun.png" alt="Banner" width="1100"/>

<img src="images/cclic.png" alt="Banner" align="left" width="100"/><br><br>
Notebook KIKS, see <a href="http://www.aiopschool.be">AI At School</a>, by F. wyffels, A. Meheus, T. Neutens & N. Gesquière, is licensed under a <a href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.