# Object recognition with Keras

<img src="https://media.giphy.com/media/l0MYxW1PyZl1qEA1O/giphy.gif" align="right" width="250" />

Hi again! Let's do some object recognition with Keras! We'll make a CNN from the ground up, one that's specialized to **recognize hand-drawn doodles**. "Like Pictionary?" Yes, exactly like Pictionary!

We've seen the basics of how CNNs process visual data in [the previous section](./1.convolutional_neural_networks). Now we'll compose a net that accepts a picture you draw yourself and guesses what's on it! This may not have that much practical use, but it serves as a nice **learning experience** on how neural nets work step by step before you tackle the challenges ahead!

We'll use part of Google's generously documented ['Quick, Draw!' dataset](https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/simplified) to train our model.

We're using the simplified dataset, but there are still too many categories to download all of them! That's alright: **pick out two classes** and download them to your machine. You'll use those in this notebook for object recognition!


## Examining the data

Let's take a look at what we dragged out of Google's database. For this exercise, I'll be using the **cat** and **dog** classes, but as stated before, you can choose your own classes!

In [None]:
!wget -N https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/dog.npy -P ../assets/
!wget -N https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/cat.npy -P ../assets/

In [None]:
import numpy as np
from matplotlib import pyplot as plt
dogs = np.load("../assets/dog.npy")
cats = np.load("../assets/cat.npy")

plt.bar([0,1], [dogs.shape[0], cats.shape[0]])
plt.title('dataset sizes')
plt.xticks([0,1], ['dogs', 'cats'])
plt.ylabel('number of samples');

That's a lot of cats and dogs!

Google has been kind enough to deliver us a **cleaned dataset**. The cleaning process was as follows:

1. Align the drawing to the top-left corner, to have minimum values of 0.
2. Uniformly scale the drawing, to have a maximum value of 255.
3. Resample all strokes with a 1 pixel spacing.
4. Simplify all strokes using the Ramer–Douglas–Peucker algorithm with an epsilon value of 2.0.

This would have given them an image like the following.
![Cat Drawing (Image)](../assets/cat_drawing.png)
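The first two cleaning steps can be sketched with NumPy. This is only a minimal illustration and not Google's actual pipeline: the `align_and_scale` helper and the sample stroke are hypothetical, and a drawing is assumed to be a single array of (x, y) points.

```python
import numpy as np

def align_and_scale(points, max_value=255.0):
    """Shift a drawing to the top-left corner and scale it uniformly.

    `points` is an (n, 2) array of x/y coordinates (hypothetical format).
    """
    points = np.asarray(points, dtype=float)
    # Step 1: align to the top-left corner so the minimum x and y are 0
    shifted = points - points.min(axis=0)
    # Step 2: uniform scaling, so the largest coordinate becomes max_value
    largest = shifted.max()
    if largest == 0:
        return shifted  # a single point cannot be scaled
    return shifted / largest * max_value

stroke = np.array([[10, 20], [60, 20], [60, 120]])
cleaned = align_and_scale(stroke)
print(cleaned)
```

Both axes are divided by the same number, so the proportions of the drawing are preserved.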

To get to an image like we had before, they had to transform this datatype from **vector format** to **raster format**! You might wonder why this step is necessary. Vector format retains all of the original information and is easy to store; no need to store all pixels, only 2 points for every line. BUT...convolutional neural networks want all those intermediate pixels as well! We want a **grid of pixels** to feed to our network so we can **convolve and pool**.
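To make the vector-to-raster step concrete, here is a small sketch that turns a single line, stored as two end points, into pixels on a grid. The `rasterize_line` helper is hypothetical and much simpler than a real rasterizer (no anti-aliasing, no line width), so treat it as an illustration of the idea only.

```python
import numpy as np

def rasterize_line(grid, start, end):
    """Mark every pixel on the straight line between two (row, col) points."""
    (r0, c0), (r1, c1) = start, end
    # Sample enough points so that no pixel along the line is skipped
    n_steps = max(abs(r1 - r0), abs(c1 - c0)) + 1
    rows = np.round(np.linspace(r0, r1, n_steps)).astype(int)
    cols = np.round(np.linspace(c0, c1, n_steps)).astype(int)
    grid[rows, cols] = 255
    return grid

# Vector format: one line is fully described by its two end points
grid = np.zeros((8, 8), dtype=np.uint8)
rasterize_line(grid, (1, 1), (6, 6))
print(grid)
```

Two points were enough to store the line, but the grid now holds every intermediate pixel, which is what a convolution needs.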

Fortunately, the `.npy` file format contains the pixel values of the cats and dogs in binary, so we do not need to clean them more.

In [None]:
def plot_raster(image):
    plt.imshow(image, cmap="gray")
    plt.axis('off')

dog_sample = dogs[0].reshape(28,28)
plot_raster(dog_sample)

What a cute little doggy!

In [None]:
cat_sample = cats[1].reshape(28,28)
plot_raster(cat_sample)

There we go, the same kitty as above but in a raster format! When this is fed to a **CNN**, the input layer can accept it as an **image** consisting of a 28 by 28 grid of **pixels**, each holding a brightness value. This way, our neural network can also **convolve** the image and **pool** the pixels. This would be a bit hard to do with the original **vector format**.

We have more than enough samples to work with, so let's take a nicely **balanced subset** of our data so we don't get bias in the final model.

In [None]:
# 40000 samples is a nice balance to have enough data to have a nice
# accuracy, without training for too long
max_samples = 40000
preprocessed_cats = cats[:max_samples].reshape(-1,28,28)
preprocessed_dogs = dogs[:max_samples].reshape(-1,28,28)

# Normalizing
preprocessed_cats = preprocessed_cats/255
preprocessed_dogs = preprocessed_dogs/255

plt.bar([0,1], [preprocessed_dogs.shape[0], preprocessed_cats.shape[0]])
plt.title('dataset sizes')
plt.xticks([0,1], ['dogs', 'cats'])
plt.ylabel('number of samples');

Now, let's add the labels as well! Our neural network won't like **string values like 'cat' and 'dog'**, but will want numeric representations instead. Let's use **0 for cats, and 1 for dogs**. When you handle classifiers with more than two possible classes, don't forget the one-hot encoding format! Keras has the handy [to_categorical](https://keras.io/api/utils/python_utils/#to_categorical-function) function for this. For this example though, it is not needed.
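For reference, this is what a one-hot layout looks like. The sketch uses plain NumPy rather than Keras so it runs without TensorFlow, and the three-class labels are made up for illustration; `to_categorical` is expected to give the same layout.

```python
import numpy as np

# Hypothetical labels for a three-class problem: 0 = cat, 1 = dog, 2 = bird
labels = np.array([0, 2, 1, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)
```

Each row contains a single 1 in the column of its class and 0 everywhere else.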

In [None]:
cat_labels = np.zeros((max_samples, 1))
dog_labels = np.ones((max_samples, 1))

labels = np.concatenate([cat_labels, dog_labels])
drawings = np.concatenate([preprocessed_cats, preprocessed_dogs])

# tensorflow wants a 4D tensor with (n_images, height, width, colour_depth)
print("Drawings shape before : ", drawings.shape)
drawings = np.expand_dims(drawings, axis=3)
print("Drawings shape after : ", drawings.shape)
print("Label shape : ", labels.shape)

One final step: **separating our dataset into a train/val/test set**.
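Before splitting, it helps to know what sizes to expect. The arithmetic below is a sketch that assumes 40000 samples per class and two successive splits that each hold out 20%, as in the cell that follows. For sizes that do not divide evenly, scikit-learn's rounding may differ from this integer math.

```python
# Arithmetic sketch of two successive 80/20 splits (integer math only)
n_total = 2 * 40000              # 40000 cats + 40000 dogs

n_test = n_total * 20 // 100     # first split: 20% held out for testing
n_train_val = n_total - n_test

n_val = n_train_val * 20 // 100  # second split: 20% of the remainder
n_train = n_train_val - n_val

print(n_train, n_val, n_test)
print(n_train / n_total, n_val / n_total, n_test / n_total)
```

So the second split takes 20% of the remaining 80%, which means the validation set ends up smaller than the test set.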

In [None]:
from sklearn.model_selection import train_test_split

train_val_drawings, test_drawings, train_val_labels, test_labels = train_test_split(
    drawings, 
    labels,
    test_size=0.2, 
    random_state=42, 
    shuffle=True
)

train_drawings, val_drawings, train_labels, val_labels = train_test_split(
    train_val_drawings, 
    train_val_labels,
    test_size=0.2, 
    random_state=42, 
    shuffle=True
)

print("train_drawings shape : ", train_drawings.shape)
print("val_drawings shape : ", val_drawings.shape)
print("test_drawings shape : ", test_drawings.shape)

print("train_labels shape : ", train_labels.shape)
print("val_labels shape : ", val_labels.shape)
print("test_labels shape : ", test_labels.shape)

## Composing and evaluating the model

### Composing the most simple form

You've seen in [convolutional neural networks](./1.convolutional_neural_networks.ipynb) the different parts of a typical CNN and how they relate to each other. **keras** makes it easy for us to quickly compose a neural network.

Let's start with the **simplest of convolutional neural networks**: a **convolutional layer**, followed by **a flattening layer** and **a regular 1D dense layer**. Why is this the simplest form, you ask? Well, our neural network cannot consist of only a conv layer, since these **accept and output a 2D data structure**. If the conv layer were also the output layer, the output would be a very hard to interpret **2D layer**. How would that relate to our defined classes, the **cat class** and the **dog class**? It wouldn't. The output needs to be flattened and passed to a dense layer with one output per class, since we have two classes in our model.
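To see why the flattening layer is needed, it helps to follow the shapes by hand. The sketch below assumes a 28 by 28 input, a 5 by 5 kernel, 64 filters, a stride of 1 and no padding. The `conv_output_size` helper is hypothetical and only does the arithmetic; it is not part of Keras.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output width/height of a convolution without padding ('valid')."""
    return (input_size - kernel_size) // stride + 1

input_size, kernel_size, filters = 28, 5, 64

side = conv_output_size(input_size, kernel_size)
conv_shape = (side, side, filters)   # one 2D feature map per filter
flat_size = side * side * filters    # what a flattening layer produces
print(conv_shape, flat_size)
```

The conv layer still outputs a stack of 2D maps. Only after flattening do we get a plain vector that a dense layer can turn into one score per class.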

We will fix some basic parameters below. You can use these values directly; we have tested them and they are good enough for this task.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models

# pixel width and height of our images
input_size = 28

# number of filters in the convnet layer
filters = 64

# conv net parameters
strides = (2, 2)
pool_size = (2,2)
kernel_size = (5, 5)

You know how `tf.keras` works now. You'll implement the model described above using the `Sequential` API.
You also get a model summary below to show the output shapes.
In this first example, we'll only modify the default `kernel_size`, `activation` (use ReLU) and `input_shape` of the convolutional layer. We also specify the number of convolutional filters, given by the `filters` variable. The other parameters are left at their default values.


Store the model in a `model` variable.

In [None]:
# Create the first version of the model



model.summary()

The model summary should look similar to the following:
![Keras model 1 summary](../assets/first_summary.png)



In TensorFlow, we have a bit more control over what we can do with the loss function than in PyTorch.
Indeed, in this case, we have the choice between 4 different losses, which depend on:
- whether we one-hot encode our labels
- whether we use a Softmax activation in the last layer

The list below shows the 4 options.



- We can put a `Softmax()` on the last layer, **NOT** one-hot encode the labels and use [`tf.keras.losses.SparseCategoricalCrossentropy()`](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy)


- We can **NOT** put a `Softmax()` on the last layer, **NOT** one-hot encode the labels and use [`tf.nn.sparse_softmax_cross_entropy_with_logits`](https://www.tensorflow.org/api_docs/python/tf/nn/sparse_softmax_cross_entropy_with_logits). This is equivalent to the loss above with the argument `from_logits` set to `True`, i.e. `SparseCategoricalCrossentropy(from_logits=True)`; the default is `False`.


- We can put a `Softmax()` on the last layer, one-hot encode the labels and use [`tf.keras.losses.CategoricalCrossentropy(from_logits=False)`](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy)


- We can **NOT** put a `Softmax()` on the last layer, one-hot encode the labels and use `tf.keras.losses.CategoricalCrossentropy(from_logits=True)`


**Logits** is a confusing term and often abused in the deep learning community, as it can mean a lot of things. Here, logits are meant to describe the input to the Softmax layer, or generally any output from a neuron that has not been passed through an activation function.
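The four options compute the same quantity, just starting from different inputs. The NumPy sketch below re-implements the math by hand to show this. It is not the Keras code itself, and the logits and labels are made-up values.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical raw outputs (logits) for 3 drawings and 2 classes
logits = np.array([[2.0, -1.0], [0.5, 0.5], [-3.0, 1.0]])
sparse_labels = np.array([0, 1, 1])         # integer labels
one_hot_labels = np.eye(2)[sparse_labels]   # one-hot labels

probs = softmax(logits)
rows = np.arange(len(sparse_labels))

# Sparse labels + probabilities: pick the probability of the true class
sparse_loss = -np.log(probs[rows, sparse_labels]).mean()
# One-hot labels + probabilities: the zeros cancel every other class
categorical_loss = -(one_hot_labels * np.log(probs)).sum(axis=1).mean()
# Sparse labels + logits: log-softmax computed directly from the logits
log_sum_exp = np.log(np.exp(logits).sum(axis=1))
logits_loss = (log_sum_exp - logits[rows, sparse_labels]).mean()

print(sparse_loss, categorical_loss, logits_loss)
```

All three values agree up to floating point error. What matters is that the loss you pick matches both your label format and whether your last layer already applies a Softmax.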

**Now it is up to you to choose the correct loss function**

Use the Adam optimizer, one of the above mentioned loss functions and the accuracy as a metric.

Then fit the model for `6` epochs, while not forgetting to use the validation set.

Store the result of the fit in a `history` variable.

In [None]:
# Compile and fit the model




Now, let's evaluate our model by comparing our **validation accuracy vs the training accuracy** with the matplotlib library.

In [None]:
def plot_history(history : tf.keras.callbacks.History):
    """ This helper function takes the tf.keras.callbacks.History
    that is output from your `fit` method to plot the loss and accuracy of
    the training and validation set.
    """
    fig, axs = plt.subplots(1,2, figsize=(12,6))
    axs[0].plot(history.history['accuracy'], label='training set')
    axs[0].plot(history.history['val_accuracy'], label = 'validation set')
    axs[0].set(xlabel = 'Epoch', ylabel='Accuracy', ylim=[0, 1])

    axs[1].plot(history.history['loss'], label='training set')
    axs[1].plot(history.history['val_loss'], label = 'validation set')
    axs[1].set(xlabel = 'Epoch', ylabel='Loss', ylim=[0, 10])
    
    axs[0].legend(loc='lower right')
    axs[1].legend(loc='lower right')
    
plot_history(history)

In [None]:
# Evaluate the model on the test set




Around 85% accuracy...Alright, not bad! But we can do better!
Notice that the validation loss does not increase anymore.

**Having only a single convolutional layer is maybe not enough to perfectly tell the difference between cat and dog doodles**. Why? Because right now, the neural network can just about make out if an image contains lines or not, which they both obviously do! Look at the image below:

![convnet_progression](./../assets/convnet_progression.png)

Each of these stages (from bottom to top) represents a convnet layer and what it outputs. The bottom layer has 24 filters and seems to be able to make out the lines composing the image, but it's not connecting the dots so to speak. The features extracted by the first convnet layer are given to the next, and this layer connects these features into higher order features, and so on.

So we are still stuck in the first stage, we need to **extend our model**.

### Composing an extended model

Let's add another convnet layer! But in between these, we'll add a **pooling layer**. What's the purpose of this layer you ask? It has multiple purposes as seen in the last subchapter, but the main purpose here is **dimensionality reduction** to make it a bit easier on our model. Training the last model was already pretty challenging, and we don't need quantity, we need quality. Pooling (supposedly) redundant spatial information together is often done in CNNs for this reason.
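To get a feel for the dimensionality reduction, here is a tiny NumPy sketch. It assumes max pooling with a 2 by 2 window and a stride of 2; the `max_pool_2x2` helper and the example feature map are hypothetical and only work for even side lengths.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D array with even side lengths."""
    height, width = feature_map.shape
    # Group the pixels into non-overlapping 2x2 blocks, then keep each block's maximum
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

The 4 by 4 map shrinks to 2 by 2, so the next layer only has to look at a quarter of the values while the strongest response in each region is kept.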

Again, you know what to do. We will be modifying the same parameters as before, and you have the summary below again for the details. 

For the **pooling layer**, we will specify our own `strides` and `pool_size` parameters. Also specify `padding='same'`. 

Make sure to change those so that your summary matches the one provided.

In [None]:
# Create the second version of the model




The model summary should look similar to the following:
![Keras model 2 summary](../assets/second_summary.png)

This time, you can run the model for a few more epochs, to give it time to get better.
Use around `10` epochs.

In [None]:
# Compile and fit the model




In [None]:
plot_history(history)

In [None]:
# Evaluate the model on the test set




Around 88% ! YEAH, we added 3% to our accuracy! Right now, our model is still lacking something vital: **regular dense layers**! 

Adding these layers is a nice way to finally put the extracted features together and make nonlinear connections between them! Like "aha, sharp ear features + a triangle nose means a cat, probably >.>!".

We will also add some **Dropout** layers to reduce overfitting.


### Further extending the model with dense layers
For the last model, as said above, we're adding `Dense` layers at the end and `Dropout` layers between the convolutional layers and the pooling layers. Remember that we are still modifying the same parameters for the **convolutional layer** and for the **pooling layer** as in the two other models.

In [None]:
# Create the final version of the model




The model summary should look similar to the following:
![Keras model 3 summary](../assets/third_summary.png)

In [None]:
# Compile and fit the model (10 epochs too)




In [None]:
plot_history(history)

In [None]:
# Evaluate the model on the test set




Around 89% accuracy?! Now we're cookin'! This is starting to look good! As you already know, there are a few ways to combat overfitting in a model. One of them is to **decrease the degrees of freedom (DOF) of the model**. Another neat method often used in CNNs is **adding dropout layers**.

"Dropout layers?! I'm no quitter!". Don't worry, these are conceptually easy to understand: they **deactivate a fraction of the neurons of the preceding layers at random** to encourage robustness in the overall layer. Deactivating them at random avoids reliance on specific nodes and keeps the network from fitting the training set too closely.

This method isn't perfect, and it needs a decently sized network. Ours may be too small for that, so dropout might not be necessary here.
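What a dropout layer does during training can be sketched in a few lines of NumPy. This is an illustration of the idea (often called inverted dropout), not the Keras implementation; the `dropout` helper, the rate and the all-ones activations are made up for the example.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout as applied during training."""
    # Each neuron is kept with probability (1 - rate)
    keep_mask = rng.random(activations.shape) >= rate
    # Rescale the survivors so the expected activation stays unchanged
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones((1000, 100))
dropped = dropout(activations, rate=0.25, rng=rng)

print("fraction deactivated:", (dropped == 0).mean())
print("mean activation:", dropped.mean())
```

Roughly a quarter of the values are zeroed, while the average activation stays close to 1 thanks to the rescaling. At prediction time dropout is switched off and all neurons are used.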

Read more about dropout [here](https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/).

Congratulations! You created your first convolutional neural networks from scratch, and were able to get a pretty good accuracy with a small network.

![Well done (GIF)](https://media.giphy.com/media/d31w24psGYeekCZy/giphy.gif)

## Testing the model

The following is a piece of code designed to take inputs from the user to generate a drawing from. **This code is very case specific and there is no need for you to learn the tkinter library for computer vision**. So just enjoy the result and continue on to the [next notebook](./4.object_recognition_with_pytorch.ipynb) after you're done playing around.

In [None]:
class LineDrawer:
    
    def __init__(self, canvas, input_size, output_size = 64, brush_size = 3, line_colour = "#476042"):
        self.canvas = canvas
        self.drawing = np.zeros((output_size, output_size))
        self.mouse_is_clicked = False
        self.brush_size = brush_size
        self.line_colour = line_colour
        self.scale_factor = float(output_size)/float(input_size)

    def on_motion(self, event):
        if not self.mouse_is_clicked:
            return
        x1, y1 = (event.x - self.brush_size), (event.y - self.brush_size)
        x2, y2 = (event.x + self.brush_size), (event.y + self.brush_size)
        self.canvas.create_oval(x1, y1, x2, y2, fill=self.line_colour)
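        # Map canvas coordinates to raster indices, clamped so that strokes
        # near or beyond the canvas edge cannot index outside the array
        max_index = self.drawing.shape[0] - 1
        row = min(max(round(event.y * self.scale_factor), 0), max_index)
        col = min(max(round(event.x * self.scale_factor), 0), max_index)
        self.drawing[row, col] = 255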

    def on_down_press(self, event):
        self.mouse_is_clicked = True

    def on_release(self, event):
        self.mouse_is_clicked = False


In [None]:
import tkinter as tk

# run this cell to get a drawing interface! Just close it once you are done
gui = tk.Tk()
canvas_size = 512
canvas = tk.Canvas(gui, width=canvas_size, height=canvas_size)
line_drawer = LineDrawer(canvas, canvas_size, 28)
canvas.bind('<Motion>', line_drawer.on_motion)
canvas.bind('<Button-1>', line_drawer.on_down_press)
canvas.bind('<ButtonRelease-1>', line_drawer.on_release)
canvas.pack()
gui.mainloop()

In [None]:
plot_raster(line_drawer.drawing)

In [None]:
# Scale to [0, 1] like the training data, which was divided by 255
drawing_input = line_drawer.drawing.reshape(1, 28, 28, 1) / 255
prediction = model.predict(drawing_input)
print(f"Predictions : {prediction[0][0] * 100 :.2f}% cat, {prediction[0][1] * 100 :.2f}% dog ")

In [None]:
# run this cell to get a drawing interface!
gui = tk.Tk()
canvas_size = 512
canvas = tk.Canvas(gui, width=canvas_size, height=canvas_size)
second_line_drawer = LineDrawer(canvas, canvas_size, 28)
canvas.bind('<Motion>', second_line_drawer.on_motion)
canvas.bind('<Button-1>', second_line_drawer.on_down_press)
canvas.bind('<ButtonRelease-1>', second_line_drawer.on_release)
canvas.pack()
gui.mainloop()

In [None]:
plot_raster(second_line_drawer.drawing)

In [None]:
# Scale to [0, 1] like the training data, which was divided by 255
drawing_input = second_line_drawer.drawing.reshape(1, 28, 28, 1) / 255
prediction = model.predict(drawing_input)
print(f"Predictions : {prediction[0][0] * 100 :.2f}% cat, {prediction[0][1] * 100 :.2f}% dog ")