# Beer Classifier 🍻 🍻 🍻

# Problem Statement
Imagine the scenario:

> You're standing in the supermarket looking at the shelf of beers. You're taken aback by the sheer number of options and you don't know which one to buy. You wish you could get some help identifying all of these beers.

This happens way too often and it needs to be fixed! Introducing: **THE BEER CLASSIFIER**

<img src="images/personal_care.jpg" style="height:350px;">
<caption><center> <u> Figure 1:</u> A shelf of beers in a Danish supermarket<br> </center></caption>

# Scope of this notebook
This notebook is meant as a self-paced walkthrough for constructing `The Beer Classifier`, a `Convolutional Neural Network` (CNN) that can detect whether an image contains a Hoegaarden (see Figure 2) or a Tuborg (see Figure 3). The goal is to construct and train this CNN from scratch.

This notebook gives a high-level introduction to `Convolutional Neural Networks` using the `Keras` API implemented by the `TensorFlow` framework. It discusses many common topics in Machine Learning and CNNs, but it does not cover them in great detail. The structure of this notebook has been optimized for learning rather than for programming or memory efficiency.

<img src="images/hoegaarden.jpg" style="height:200px;">
<caption><center> <u> Figure 2:</u> A Hoegaarden<br> </center></caption>

<img src="images/tuborg.jpg" style="height:200px;">
<caption><center> <u> Figure 3:</u> A Tuborg<br> </center></caption>

# 0) Version check
Before doing any coding, it's good practice to check which versions of the packages we are running. This notebook has been tested with TensorFlow v2.9.1.

In [None]:
import tensorflow as tf
print(tf.version.VERSION)

# 1) Importing data
The first step of any Machine Learning project is to examine the available data in order to get a better understanding of it. So let's do that!

A folder called `data` has been provided with 40 images: 20 images of a Hoegaarden and 20 images of a Tuborg.

We'll use `Matplotlib`, a common Python library for plotting data, to visualize the images. We will use its `pyplot` module, which provides an interface similar to the one found in MATLAB, and its `image` module, which can load a single image from disk so we can show it in the notebook. Finally, the `%matplotlib inline` magic command instructs Matplotlib to embed its output in the Jupyter Notebook.

We'll import all of these dependencies in the next cell.

In [None]:
from matplotlib import pyplot as plt
from matplotlib import image as mpimg
%matplotlib inline

Now that we have imported the plotting library, let's plot a single data point to see what it looks like.

In [None]:
img = mpimg.imread('data/hoegaarden/IMG_6603.JPG')
plt.imshow(img)
print("Image size: ", img.shape)

So the raw data available to us are images of 4032x3024 pixels (width x height) with 3 color channels. Note that `img.shape` lists the height first, as `(height, width, channels)`.

# 2) Preprocessing the data
As we saw in the previous section, the images are 4032x3024 pixels, which means they are the raw images taken by a smartphone. To prevent your computer from running out of memory (or melting down) when training the CNN, we should scale these images down before using them for training.
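To get a feel for why scaling down matters, here is a rough back-of-the-envelope calculation. It assumes each pixel value is stored as a 64-bit float, which is NumPy's default for `np.zeros` (used later in this notebook to hold the images), and it ignores the extra memory that training itself needs.

```python
# Rough memory estimate for holding all 40 images in a single NumPy array.
# Assumption: float64 storage (8 bytes per value), NumPy's default for np.zeros.
BYTES_PER_VALUE = 8
NUM_IMAGES = 40
COLOR_CHANNELS = 3


def dataset_bytes(width, height):
    """Bytes needed to store NUM_IMAGES images of the given size."""
    return NUM_IMAGES * width * height * COLOR_CHANNELS * BYTES_PER_VALUE


original = dataset_bytes(4032, 3024)  # raw smartphone photos
resized = dataset_bytes(200, 150)     # after scaling down

print(f"Original: {original / 1e9:.1f} GB")  # Original: 11.7 GB
print(f"Resized:  {resized / 1e6:.1f} MB")   # Resized:  28.8 MB
print(f"Reduction factor: {original / resized:.0f}x")
```

The resized dataset is roughly 400 times smaller, which comfortably fits in the memory of an ordinary laptop.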

---

**In the following cell, you should perform the following steps:**

Create a Python list called `images_paths_orig` which should contain all the paths to images found in the `data` folder. The resulting list should be similar to the following:
`['data/hoegaarden/IMG_6603.JPG', 'data/tuborg/IMG_6604.JPG', ...]`

A few hints to help you on the way:
- Use `listdir` to list all the files in a directory
- Python lists have a method called `append`
- First loop over all the file names in `data/tuborg/` and concatenate each file name with the relative folder path, like so: `"data/tuborg/" + file_name`. Afterwards do the same for `data/hoegaarden/`
- Write `print(images_paths_orig)` in order to have the contents written out for debugging

In [None]:
from os import listdir
images_paths_orig = []

### START CODE HERE ###

### END CODE HERE ###

Now that we have a list of paths to all the images, let's scale them down to a more appropriate size. You will implement this logic in the next cell.

---

**In the following cell, you should perform the following steps:**

Loop over every path in `images_paths_orig` and perform the following actions:

1. Load the image into memory 
2. Scale it down to 200x150 px (width x height)
3. Create a new path for the scaled image. It should have the same folder structure, but in a folder called `tmp` instead of `data`. An example of such a new path could be `'tmp/hoegaarden/IMG_6603.JPG'`
4. Save the resized image to this new path
5. Append this new path to the list `images_paths_resized`

Hints:
- Use `Image.open` from the imported `Image` module to load the image from disk into memory
- The `Image` object has a `resize` method
- Use the `replace` method on a string to replace the word `data` with the word `tmp` in the path string
- The `Image` object has a `save` method
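One detail that often trips people up when resizing: PIL describes image sizes as `(width, height)`, while the NumPy array of the same image has the shape `(height, width, channels)`. The short sketch below uses a blank image generated in memory (not one of the beer photos) to show both orderings side by side. This is also why the array `x` later in this notebook is allocated as `(num_samples, new_height, new_width, color_channels)`.

```python
import numpy as np
from PIL import Image

# A blank RGB image, created in memory for illustration only.
# PIL takes the size as (width, height).
img = Image.new("RGB", (200, 150))
print(img.size)      # (200, 150) -> (width, height)

# Converting to a NumPy array puts the height first.
arr = np.asarray(img)
print(arr.shape)     # (150, 200, 3) -> (height, width, channels)

# resize() also expects (width, height); halving both keeps the 4:3 aspect ratio.
smaller = img.resize((100, 75))
print(smaller.size)  # (100, 75)
```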

In [None]:
# PIL stands for `Python Imaging Library`, the de facto standard library for working with images in Python (today maintained as the Pillow fork).
from PIL import Image
from os import makedirs

new_width  = 200
new_height = 150
images_paths_resized = []

makedirs("tmp/tuborg", exist_ok=True)
makedirs("tmp/hoegaarden", exist_ok=True)

### START CODE HERE ###

### END CODE HERE ###

# 3) Loading the data
Now that all the images have been preprocessed, we are ready to train with them. However, before we continue, two important Machine Learning concepts are worth explaining first:

1. Splitting the data into a training and testing set
2. One-hot encoding

### Train/test data

Let's say that we have the dataset seen in Table 1. This is a list of 10 images with a corresponding label indicating whether there's a dog in the image or not.

You start by building a ML model and train it on all 10 images to detect whether there is a dog in the image. Afterwards you are curious to see how well it learned to detect dogs, so you give the model `img1.jpg` and it correctly outputs that the image has a dog in it. Just to confirm the results, you also ask the model to classify the other 9 images and it gets every one of them right.

<br/>
<br/>
<br/>
<center><b>🚫🚫🚫This is bad practice and is the biggest Machine Learning sin ever🚫🚫🚫</b></center>
<br/>
<br/>
<br/>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-yw4l">image</th>
    <th class="tg-yw4l">dog?</th>
  </tr>
  <tr>
    <td class="tg-yw4l">img1.jpg<br></td>
    <td class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img2.jpg</td>
    <td class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img3.jpg</td>
    <td class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img4.jpg</td>
    <td class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img5.jpg</td>
    <td class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img6.jpg</td>
    <td class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img7.jpg</td>
    <td class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img8.jpg</td>
    <td class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img9.jpg</td>
    <td class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img10.jpg</td>
    <td class="tg-yw4l">🐶</td>
  </tr>
</table>

<caption><center> <u> Table 1:</u> A fictional dataset of images and a label indicating whether the image contains a dog<br> </center></caption>

This is bad practice because the Machine Learning model may simply have memorized all the images it has seen, so a perfect score says nothing about how it handles new images. It's like studying for a math exam by memorizing the assignments you practiced at home, and then being given the exact same assignments at the exam itself.

Instead of asking the model to classify images that it has already seen, you need to ask it to classify images that it has never seen before. You can achieve this by training the model on only part of the original data and saving the rest for testing, as seen in Figure 4.

<img src="images/train_test.png" style="height:200px;">
<caption><center> <u> Figure 4:</u> Illustration of how the full dataset is split into 2 parts; one for training and one for testing<br> </center></caption>

Let's say that we split the data 70%/30%, like the blue/green split in Table 2. This means the Machine Learning model is allowed to learn from 70% of the data, but when you test the model, you test it against the 30% of the data it has never seen.


<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-yw4l">image</th>
    <th class="tg-yw4l">dog?</th>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img1.jpg<br></td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img2.jpg</td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img3.jpg</td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img4.jpg</td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img5.jpg</td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img6.jpg</td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td bgcolor="#c9daf8" class="tg-yw4l">img7.jpg</td>
    <td bgcolor="#c9daf8" class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td bgcolor="#d9ead3" class="tg-yw4l">img8.jpg</td>
    <td bgcolor="#d9ead3" class="tg-yw4l">🐶</td>
  </tr>
  <tr>
    <td bgcolor="#d9ead3" class="tg-yw4l">img9.jpg</td>
    <td bgcolor="#d9ead3" class="tg-yw4l">🚫</td>
  </tr>
  <tr>
    <td bgcolor="#d9ead3" class="tg-yw4l">img10.jpg</td>
    <td bgcolor="#d9ead3" class="tg-yw4l">🐶</td>
  </tr>
</table>

<caption><center> <u> Table 2:</u> The dataset has now been split into a training (blue) and test set (green)<br> </center></caption>
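The split in Table 2 can be reproduced in a few lines of Python. The first part takes the first 70% of the rows, exactly as in the table; the second part shuffles the rows before splitting, which is closer to what scikit-learn's `train_test_split` (used later in this notebook) does by default. The seed value 42 is an arbitrary choice for this sketch.

```python
import numpy as np

# The fictional dataset from Table 1 (True = dog, False = no dog).
images = [f"img{i}.jpg" for i in range(1, 11)]
labels = [True, True, True, False, False, True, False, True, False, True]

# 1) Split as in Table 2: first 70% for training, last 30% for testing.
n_train = round(len(images) * 0.7)
train_images, test_images = images[:n_train], images[n_train:]
train_labels, test_labels = labels[:n_train], labels[n_train:]
print("train:", train_images)
print("test: ", test_images)  # ['img8.jpg', 'img9.jpg', 'img10.jpg']

# 2) Shuffled split: randomize the order first, so the test set is not
#    simply whatever happened to be at the end of the list.
rng = np.random.default_rng(42)
order = rng.permutation(len(images))
train_idx, test_idx = order[:n_train], order[n_train:]
shuffled_test = [images[i] for i in test_idx]
print("shuffled test:", shuffled_test)
```

In both cases the model only ever learns from the training part; the test part is held back until evaluation.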

### One-hot encoding
One-hot encoding is a technique used to represent categorical data. So what is categorical data? It's data that can only take one of a finite set of values. It's much easier to explain through an example, so let's do that!

Imagine we are building an image classifier to detect which city is shown in the image. We've gathered the following imaginary dataset.

<br/>

<style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-yw4l">image</th>
    <th class="tg-yw4l">city</th>
  </tr>
  <tr>
    <td class="tg-yw4l">img1.jpg<br></td>
    <td class="tg-yw4l">Paris</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img2.jpg</td>
    <td class="tg-yw4l">Copenhagen</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img3.jpg</td>
    <td class="tg-yw4l">Copenhagen</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img4.jpg</td>
    <td class="tg-yw4l">Chicago</td>
  </tr>
  <tr>
    <td class="tg-yw4l">img5.jpg</td>
    <td class="tg-yw4l">Paris</td>
  </tr>
</table>

<caption><center> <u> Table 3:</u> A fictional dataset of images and a label indicating which city is shown in the corresponding image<br> </center></caption>

<br/>

There are many ways we could encode the labels seen in Table 3. One of the first ideas that comes to mind would be to store the labels as a list of strings.

```
labels = ["Paris", "Copenhagen", "Copenhagen", "Chicago", "Paris"]
```

Unfortunately, Machine Learning models only work with numbers, so we need to convert that list into numbers. Let's use the notation `Paris=1`, `Copenhagen=2` and `Chicago=3`. This would update our list of labels to be:

```
data = [1, 2, 2, 3, 1]
```

This representation of labels is technically usable for training a ML model, but unfortunately it tends to perform poorly. The numbers imply an order and a distance between the cities that don't exist: Chicago (3) is not "more" than Paris (1), and Copenhagen (2) is not halfway between them. The values also grow with the number of possible categories (e.g. with 200 cities, the largest label would be 200). Neural networks are trained using derivatives, and this works much better with small values.

A representation that has proven useful in this regard is called one-hot encoding. The idea is that instead of representing a data point as a single value, it is represented by a list of zeros in which a single element is 1.

For the data above, the encoding could for example be:

```
Paris      = [1, 0, 0]
Copenhagen = [0, 1, 0]
Chicago    = [0, 0, 1]
```

This means our data variable would be a list of lists, which can be viewed as a matrix with one row per image:

```
data = [[1, 0, 0], 
        [0, 1, 0], 
        [0, 1, 0], 
        [0, 0, 1], 
        [1, 0, 0]]
```
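The encoding above can be sketched with NumPy. Note that this sketch numbers the cities from 0 (`Paris=0`, `Copenhagen=1`, `Chicago=2`) rather than from 1, because the rows of an identity matrix are indexed from 0; Keras' `to_categorical`, used later in this notebook, also expects 0-based integer labels. The last lines compare distances between labels: the integer labels make some cities look closer to each other than others, whereas every pair of one-hot vectors is equally far apart.

```python
import numpy as np

cities = ["Paris", "Copenhagen", "Chicago"]
labels = ["Paris", "Copenhagen", "Copenhagen", "Chicago", "Paris"]

# Step 1: map each city to an integer index (0-based).
index_of = {city: i for i, city in enumerate(cities)}
int_labels = np.array([index_of[label] for label in labels])
print(int_labels)  # [0 1 1 2 0]

# Step 2: row i of the identity matrix is the one-hot vector for index i.
one_hot = np.eye(len(cities), dtype=int)[int_labels]
print(one_hot)
# [[1 0 0]
#  [0 1 0]
#  [0 1 0]
#  [0 0 1]
#  [1 0 0]]

# With integer labels, Paris looks "closer" to Copenhagen than to Chicago...
print(abs(index_of["Paris"] - index_of["Copenhagen"]))  # 1
print(abs(index_of["Paris"] - index_of["Chicago"]))     # 2

# ...while all one-hot vectors are the same distance apart.
paris, copenhagen, chicago = np.eye(len(cities))
print(np.linalg.norm(paris - copenhagen), np.linalg.norm(paris - chicago))
```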

Figures 5 and 6 show some other examples of representing categorical data as one-hot encoded data.

<img src="images/one_hot_1.png" style="height:200px;">
<caption><center> <u> Figure 5:</u> Illustration of one-hot encoding fruit data<br> </center></caption>

<img src="images/one_hot_2.png" style="height:200px;">
<caption><center> <u> Figure 6:</u> Illustration of one-hot encoding color data<br> </center></caption>

---

**In the following cell, you should perform the following steps:**

Loop over each path and its corresponding index in `images_paths_resized` and perform the following tasks:
1. Load the image into memory using the `load_img` method
2. Convert the image data into a matrix using the `img_to_array` method
3. Assign the matrix with the image data to the `i`'th index in the variable `x`
4. If the image is a Tuborg, then the `i`'th index in `y` should be set to 0. Otherwise 1.

Once the `x` and `y` variables have been filled, use the `to_categorical` function on `y` to one-hot encode it. Remember to store the output of this function back into the `y` variable.

Hints:
- Use the built-in `enumerate` function on a list to get `(index, element)` tuples
- Python has an operator `in` which can be used on strings

In [None]:
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.utils import to_categorical

num_samples = len(images_paths_resized)
color_channels = 3
x = np.zeros((num_samples, new_height, new_width, color_channels))
y = np.zeros((num_samples, 1))
num_classes = 2

### START CODE HERE ###

### END CODE HERE ###

Now that we have the image data and labels loaded into memory in a format we can work with, we need to split them into a training set and a test set. This can easily be done using `train_test_split` from the scikit-learn library. We pass it a fixed `random_state` so the split is the same every time this code is run. By default, 25% of the data goes into the test set.

We also need to apply another trick to the raw image data: normalization. The pixel values range from 0 to 255, so dividing by 255 scales them to values between 0 and 1. Just like with one-hot encoding, keeping the numbers small typically helps the model learn faster.

In [None]:
from sklearn.model_selection import train_test_split

# Splitting the data
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)

# Normalizing the image data
x_train /= 255
x_test /= 255

# 4) Building the model
Now comes the fun part of actually building the `Convolutional Neural Network` 🎉 

A CNN consists of many different working parts, the two most important parts being:
- Fully connected layer (also known as `Dense` layer in the Keras API)
- Convolutional layer

### Fully connected layer
Let's try to look at a fully connected layer from an outside-in perspective. A small neural network can be seen in the following Figure 7. Think of the red layer as the raw image data of either a Hoegaarden or Tuborg, the blue layer is a fully-connected layer and the green layer is the one-hot encoded output.

A fully connected layer consists of a bunch of individual neurons which are all connected to the input from the previous layer. As seen in Figure 7, all the neurons in the blue layer are connected to all the input nodes in the red layer.

<img src="images/neural_net.png" style="height:200px;">
<caption><center> <u> Figure 7:</u> A small neural network consisting of a single hidden layer with 4 units<br> </center></caption>

Let's dig deeper and see what happens in a single neuron in the fully connected layer. Take a look at Figure 8 and then we'll review it step by step:

1 - First the input from the previous layer is passed to the neuron; let's denote this input as `x`. The inputs are illustrated as orange circles in Figure 8. There are 3 inputs to the neuron in Figure 8, which means `x` is actually a list of 3 elements:

> $x = [x_1, x_2, x_3]$

2 - The neuron has a list of parameters known as weights, `w`, which it learns through training. When you train a neural network, it is these weights that you adjust in order to get the correct output. The neuron has one weight per input; in the case of Figure 8, the weights are:

> $w = [w_1, w_2, w_3] = [0.7, 0.6, 1.4]$

3 - The next step is for the neuron to multiply each input by its corresponding weight and sum the products. So the calculation of the neuron so far is:

> $\sum_{i=1}^{3} w_i x_i = w_1 x_1 + w_2 x_2 + w_3 x_3$

4 - Besides the weights, the neuron has another parameter called the bias, `b`, which is also learned through training. The bias represents the part of the output that is not explained by the inputs, which allows for better predictions. Its purpose is similar to the term `b` in the classic linear equation `y = ax + b`: without it, the line would be forced through the origin. The calculation of the neuron now looks like this:

> $w_1 x_1 + w_2 x_2 + w_3 x_3 + b$

5 - The last step for a neuron is to pass the result through an activation function. There are many different types, but in this notebook we will use one called `relu` (rectified linear unit), defined as $\text{relu}(z) = \max(0, z)$. This means that the output of the neuron in Figure 8 is the following:

> $\text{output} = \text{relu}(w_1 x_1 + w_2 x_2 + w_3 x_3 + b)$

<img src="images/neuron.gif" style="height:200px;">
<caption><center> <u> Figure 8:</u> Illustration of what goes on inside a single neuron<br> </center></caption>
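The five steps above can be written out in a few lines of NumPy. The weights are the ones from Figure 8; the input values and the bias are made-up numbers for illustration, not values taken from the figure. The second half shows that a whole fully-connected layer (like the blue layer in Figure 7) performs the same computation for every neuron at once, as a matrix-vector product; its 3 inputs and random weights are assumptions made for this sketch.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


# Step 1: the inputs (hypothetical values).
x = np.array([1.0, 2.0, 0.5])
# Step 2: the weights from Figure 8.
w = np.array([0.7, 0.6, 1.4])
# Step 4: the bias (hypothetical value).
b = -1.0

# Steps 3 and 4: weighted sum plus bias, w_1*x_1 + w_2*x_2 + w_3*x_3 + b
z = np.dot(w, x) + b
# Step 5: activation
output = relu(z)
print(output)  # approximately 1.6  (0.7 + 1.2 + 0.7 - 1.0)

# A fully-connected layer with 4 neurons: one row of weights and one bias per neuron.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # 4 neurons x 3 inputs (random placeholder weights)
bias = np.zeros(4)
layer_output = relu(W @ x + bias)
print(layer_output.shape)  # (4,)
```

During training, it is the values in `w`, `b` (or `W` and `bias` for a whole layer) that get adjusted; the structure of the computation stays the same.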

### Convolutional layer
The structure of a convolutional layer is quite different from a fully-connected layer. We won't go into much detail; instead we'll give a quick intuition of what it does.

Convolutional layers have become the standard choice when dealing with images, because they "scan" an image and reuse what they have learned across the whole image.

Let's say you have a bunch of training images, all with a dog in the right corner. If you were to use a neural network consisting only of fully-connected layers, it would likely fail to recognize the dog if it were placed in the top left corner of the image. If you use convolutional layers, however, the network can reuse what it learned and recognize the dog even though it's placed in the top left corner.
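To make the "reuse" idea concrete, here is a toy sketch in NumPy. A single 2x2 filter slides over a small 6x6 grayscale image that contains the same diagonal pattern in two opposite corners. Because the filter's weights are shared across all positions, it responds equally strongly at both locations. The filter values are hand-picked for illustration (a real `Conv2D` layer learns them), and, like Keras' `Conv2D`, the sketch computes a cross-correlation without flipping the kernel.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the responses."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The SAME kernel weights are used at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A tiny diagonal "feature" (think: a small piece of the dog).
pattern = np.array([[1.0, 0.0],
                    [0.0, 1.0]])

image = np.zeros((6, 6))
image[0:2, 0:2] = pattern  # top left corner
image[4:6, 4:6] = pattern  # bottom right corner

# Hand-picked filter that matches the pattern.
kernel = pattern.copy()
response = conv2d_valid(image, kernel)

# The strongest responses are at both corners, even though there is only one filter.
print(np.argwhere(response == response.max()))
# [[0 0]
#  [4 4]]
```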

Figure 9 shows an image (the red, green and blue layers at the top) being scanned by a convolutional layer.

<img src="images/conv_layer.gif" style="height:300px;">
<caption><center> <u> Figure 9:</u> A convolution over an image with a single color channel<br> </center></caption>

---

**In the following cell, you should perform the following steps in the order specified:**
- Add an `Activation` layer with a `relu` activation to the model
- Add a `MaxPooling2D` layer to the model with a pool size of `(2,2)` and strides of `(2,2)`
- Add a `Dropout` layer with a dropout rate of `0.5`
- Add a `Conv2D` layer with 64 filters and a kernel size of `(3,3)`
- Add an `Activation` layer with a `relu` activation to the model
- Add a `MaxPooling2D` layer to the model with a pool size of `(2,2)` and strides of `(2,2)`
- Add a `Dropout` layer with a dropout rate of `0.5`
- Add a `Flatten` layer
- Add a `Dense` layer with 256 hidden units
- Add an `Activation` layer with a `relu` activation to the model
- Add a `Dropout` layer with a dropout rate of `0.5`
- Add a `Dense` layer with 2 units (one per beer)
- Add an `Activation` layer with a `softmax` activation to the model

You can find the necessary documentation for the different layer types here:
- [Activation documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Activation)
- [MaxPooling2D documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D)
- [Dropout documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)
- [Conv2D documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)
- [Flatten documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten)
- [Dense documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)

In [None]:
from tensorflow.keras.models import Sequential
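# MaxPool2D and MaxPooling2D are two names for the same Keras layer; both are imported so either name works.
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, MaxPooling2D, Activation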

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), input_shape=x[0].shape))

### START CODE HERE ### 

### END CODE HERE ###

# 5) Training the model
Now that the model architecture has been defined, it's time for training. As described earlier for fully-connected layers, a neural network mostly consists of a lot of small floating-point parameters known as weights and biases. Training a neural network means tuning these weights and biases to values that produce the correct output. The training process does this using backpropagation, which computes how each parameter affects the error, and gradient descent, which uses that information to update the parameters.

### The loss function
Before moving on, we first need to discuss the loss function. This is a mathematical function, just like `y = ax + b`, which measures how far the output of a model is from the expected output:

> $\text{Loss} = \mathcal{L}(\hat{y}, y)$

> where **$\hat{y}$** is the value predicted by the model and **y** is the real value provided in the training data.

The loss function calculates how different **$\hat{y}$** and **y** are. If they are an exact match, the loss is $\mathcal{L}(\hat{y}, y) = 0$. If they are very different, the loss is a large number.

This means that the lower the loss, the better the model is performing. This is really important, because it gives us a single number that measures how well the model is performing. So if we adjust the parameters in the neural network, we can measure whether these changes made the model better or worse.

In our case of beer classification, we will use a loss function called binary crossentropy. There are many different loss functions to choose from, depending on the use case.
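As a rough illustration of how binary crossentropy turns "how different are $\hat{y}$ and y" into a number, here is the standard formula written out in NumPy for a single one-hot label:

$$\mathcal{L}(\hat{y}, y) = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

Keras' built-in `binary_crossentropy` operates on batches of tensors and applies its own numerical safeguards; the clipping constant `eps` below is an assumption made for this sketch to avoid taking `log(0)`.

```python
import numpy as np


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over the outputs of one sample."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log() never receives exactly 0.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return losses.mean()


y = [1, 0]  # one-hot label for a Tuborg (class 0 in this notebook)

print(binary_crossentropy(y, [1.0, 0.0]))  # perfect prediction -> almost 0
print(binary_crossentropy(y, [0.9, 0.1]))  # good prediction    -> about 0.105
print(binary_crossentropy(y, [0.2, 0.8]))  # bad prediction     -> about 1.609
```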

### Backpropagation and gradient descent
The essence of backpropagation is derivatives. It calculates the derivative of the loss function with respect to every weight and bias, which tells gradient descent in which direction each of them should be updated to reduce the loss.

Let's look at an example. Assume that the loss function we've chosen is a quadratic function which depends on the weights `w` and biases `b` in the neural network. The graphical representation of this quadratic function can be seen in Figure 10, where the variable `x` can be considered a combination of `w` and `b`.

Both `w` and `b` are initialized randomly when the training begins. Such a random starting point could, for example, be at the top left of the graph. By taking the derivative at this point, gradient descent can figure out that it should take a small step to the right in order to reduce the loss (thus making the model more accurate). It repeats this process over and over for many iterations, until it has converged at the bottom of the curve, where the loss is lowest. A simple quadratic like this has a single minimum; the loss of a real neural network has many valleys, so gradient descent usually ends up at a good set of values rather than a guaranteed optimal one.

Figure 11 shows how the parameters are updated in each iteration of gradient descent until the loss converges to a low value.

<img src="images/gradient_descent.jpg" style="height:300px;">
<caption><center> <u> Figure 10:</u> The quadratic loss as a function based on the neural network parameters `x`.<br> </center></caption>

<img src="images/backprop.gif" style="height:250px;">
<caption><center> <u> Figure 11:</u> An illustration of a network training. The loss (also known as error) can be seen in the right corner.<br> </center></caption>
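The process in Figures 10 and 11 can be imitated with a one-parameter toy example. The sketch below assumes a made-up quadratic loss $\mathcal{L}(x) = (x - 3)^2$, whose minimum is at $x = 3$, a starting point on the left of the curve, and a learning rate of 0.1. A real network has many thousands of parameters, and Keras computes the derivatives automatically through backpropagation.

```python
def loss(x):
    """Toy quadratic loss with its minimum at x = 3."""
    return (x - 3) ** 2


def gradient(x):
    """Derivative of the toy loss: dL/dx = 2 * (x - 3)."""
    return 2 * (x - 3)


x = -4.0             # "random" starting point on the left side of the curve
learning_rate = 0.1  # size of each step

for step in range(100):
    # Step against the slope: a negative gradient means "move right".
    x = x - learning_rate * gradient(x)
    if step % 20 == 0:
        print(f"step {step:3d}: x = {x:.4f}, loss = {loss(x):.6f}")

print(f"final x = {x:.6f}")  # final x = 3.000000
```

Each step shrinks the distance to the minimum by a factor of 0.8 here; a much larger learning rate (above 1.0 for this particular loss) would overshoot and diverge instead.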

In [None]:
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.optimizers import Adam

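# Configure the loss function, optimizer and metric, then train the model for 15 epochs.
# `learning_rate` is the current name of the deprecated `lr` argument.
model.compile(loss=binary_crossentropy, optimizer=Adam(learning_rate=0.00001), metrics=['accuracy'])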
model.fit(x_train, y_train, epochs=15)

# 6) Evaluating the model
As seen from the output above, the accuracy on the training data increased steadily during training. However, the real accuracy of a model needs to be measured against the test data, as discussed previously.

In total we have 40 data samples and 25% were set aside for the test set, which means we have 10 images to test on.

The final accuracy test is quite simple:
> The model will be given a bunch of images that it has never seen before and asked to classify each of them as either a Tuborg or a Hoegaarden. If it correctly classifies, for example, 9 out of 10 images, it gets an accuracy of 90%.

Implementing this evaluation is very simple thanks to the `evaluate` method in the Keras API.

In [None]:
# `evaluate` returns the loss followed by the metrics passed to `compile` (here: accuracy)
score = model.evaluate(x_test, y_test)
print('Test accuracy:', score[1])

As you can see from the output above, the model achieved an accuracy above 80%! 🎉 This result will vary from run to run: the train/test split is seeded with `random_state=42`, but the weight initialization and dropout are not.

If the model had learned nothing at all, it would be guessing randomly, giving an accuracy of approximately 50% since there are only 2 beers to choose between. If there had been 3 beers, random guessing would give an accuracy of approximately 33%.
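To make these accuracy numbers concrete, here is a sketch of how accuracy can be computed from softmax outputs and one-hot labels: pick the class with the highest probability and count how often it matches the true class. The predictions below are made-up values for 10 test images, not output from the model trained above; the last lines show the random-guessing baseline for 2 and 3 beers.

```python
import numpy as np

# Hypothetical softmax outputs for 10 test images: [P(Tuborg), P(Hoegaarden)].
predictions = np.array([
    [0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8],
    [0.1, 0.9], [0.7, 0.3], [0.4, 0.6], [0.55, 0.45], [0.35, 0.65],
])
# One-hot labels (Tuborg = class 0, Hoegaarden = class 1).
labels = np.array([
    [1, 0], [1, 0], [0, 1], [1, 0], [0, 1],
    [0, 1], [1, 0], [0, 1], [0, 1], [0, 1],
])

predicted_class = predictions.argmax(axis=1)
true_class = labels.argmax(axis=1)
accuracy = np.mean(predicted_class == true_class)
print(f"Accuracy: {accuracy:.0%}")  # Accuracy: 90%

# Random guessing between n beers is right 1/n of the time on average.
for n in (2, 3):
    print(f"{n} beers -> random-guess accuracy of about {1 / n:.0%}")
```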

# 7) Next steps - Deployment
Congratulations on making it to the end! You now have a fully functional CNN for classifying beers. The next step is to export the model and deploy it somewhere so it can be used in production!

- One option could be to create an iOS app and embed the model using `CoreML`: https://developer.apple.com/documentation/coreml
- Another option could be to set up a simple web server and expose the model through a web API: https://blog.keras.io/building-a-simple-keras-deep-learning-rest-api.html

If you're up to the challenge, then you can also try to collect images of a third beer and update the CNN architecture to handle a third possible output.

# 8) Bibliography

- Figure 1: https://www.reddit.com/r/Denmark/comments/626pl6/personlig_pleje_i_netto/
- Figure 2: https://www.belgianbeerz.com/products/hoegaarden-75-cl?variant=1069873317
- Figure 3: https://www.bestiloghent.dk/produkt?id=29914
- Figure 4: http://scott.fortmann-roe.com/docs/MeasuringError.html
- Figure 5: https://chrisalbon.com/machine_learning/preprocessing_structured_data/one-hot_encode_nominal_categorical_features/
- Figure 6: https://www.kaggle.com/dansbecker/using-categorical-data-with-one-hot-encoding
- Figure 7: https://medium.com/@curiousily/tensorflow-for-hackers-part-iv-neural-network-from-scratch-1a4f504dfa8
- Figure 8: https://harishnarayanan.org/writing/artistic-style-transfer/
- Figure 9: https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist/#0
- Figure 10: https://www.quora.com/Does-Gradient-Descent-Algo-always-converge-to-the-global-minimum
- Figure 11: https://www.codeproject.com/articles/175777/financial-predictor-via-neural-network
