# The Donkey Convolutional Neural Network

The goal of this chapter is to get familiar with the neural network(s) used in the [Donkey](https://github.com/wroscoe/donkey) library.

## The Donkey library

The [Donkey library](https://github.com/wroscoe/donkey) has several components.

It is first and foremost a Python library, installed alongside your other Python libraries (e.g. in the system Python or a virtualenv). After installation, you can `import` it like any other Python library:

```python
import donkeycar as dk
```

It also has a CLI with tools mainly used to aid training (see [donkey-tools.ipynb](./donkey-tools.ipynb)):

```bash
donkey --help
```

Finally, it comes with a `Vehicle` application, installed to the `~/d2` directory by default. This is where you'll find the `manage.py` script, which is used for both **driving** and **training**.

```bash
~/d2/manage.py --help
```

### Install the Donkey library

If you have already installed the [Donkey](https://github.com/wroscoe/donkey) library, you can [skip](#inspecting-the-keras-network) this step.

Otherwise, go ahead:

In [None]:
# Make sure we're in SageMaker root
%cd ~/SageMaker

# Remove any old versions of the library
!rm -rf ~/SageMaker/donkey

# Clone the Donkey library git
!git clone https://github.com/wroscoe/donkey.git

In [None]:
# Update Donkey dependencies

# Keras is pinned to version 2.0.8 in the Donkey requirements. Change this to allow a newer version
!sed -i -e 's/keras==2.0.8/keras>=2.1.2/g' ~/SageMaker/donkey/setup.py
!sed -i -e 's/tensorflow>=1.1/tensorflow-gpu>=1.4/g' ~/SageMaker/donkey/setup.py

# Install
!pip uninstall --yes donkeycar
!pip install ~/SageMaker/donkey

## Inspecting the Keras network

First, take a few minutes to look through the `keras.py` file (it's a [Donkey](https://github.com/wroscoe/donkey) library *part*):

In [None]:
# Assuming the Donkey library was cloned to ~/SageMaker/donkey
%pycat ~/SageMaker/donkey/donkeycar/parts/keras.py

The default algorithm used is defined in `donkey/donkeycar/templates/donkey2.py`:

```python
def drive(cfg, model_path=None, use_joystick=False):
    ...
    kl = KerasCategorical()
    if model_path:
        kl.load(model_path)
    ...
```

The default is `KerasCategorical`, whose network is created in `default_categorical()`. Before looking closer at the code, let's see if we can visualize it using our pre-trained model (see the [Donkey library training](./donkey-train.ipynb) chapter):

In [None]:
# Download the model (if it's not already there)
# Replace bucket if you used some other bucket than SageMaker default
import sagemaker
bucket = sagemaker.Session().default_bucket()
!aws s3 cp s3://{bucket}/models/my-first-model ~/SageMaker/models/my-first-model

In [None]:
# Install some additional python libraries required by the Keras visualization lib
!conda install --yes pydot graphviz

In [None]:
# Plot the algorithm
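import os

from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
from keras.models import load_model

# Load the model from the location it was downloaded to above
model = load_model(os.path.expanduser('~/SageMaker/models/my-first-model'))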
SVG(model_to_dot(model).create(prog='dot', format='svg'))

Ah! A nice neural network with
- 1 input layer
- 5 convolutional layers
- 1 flatten layer
- 2 dense layers
- 2 dropout layers
- 2 output layers (angle and throttle)

The presence of convolutional layers makes this neural network a [convolutional network](https://en.wikipedia.org/wiki/Convolutional_neural_network) (CNN). This makes sense, since CNNs are particularly good with images.

So, what do the different layers actually do? Let's have a look at them, one at a time.

#### Quick readup

This excellent cheatsheet is handy to read through and use as a reference before continuing:
- http://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html

And two links about CNNs (read, or be confused later):
- https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/
- https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html

#### Input layer

The *Input layer* holds the input data.

In [None]:
from keras.layers import Input

# Input layer
#
# 120 x 160 x 3 image size (RGB)
#
img_in = Input(shape=(120, 160, 3), name='img_in')

# Rename it to x
x = img_in

We define the shape of the data as a 3 channel (RGB) image that is 120px high and 160px wide.

If you read through the [links](#quick-readup), you'll have a rough idea of what a channel is. In CNNs, layers operate on 3D volumes of data. The first two dimensions are the height and width of the image, and the third is the number of such 2D planes stacked on top of each other (3 in RGB images). The planes in this stack are called channels.

Why is this important? In the input layer, a channel is easily understood as RGB data (one channel per color). However, a [convolutional layer](#first-convolutional-layer) can have many more channels, which we'll get to later.

Formally, `x` can be viewed as an $H \times W \times C$ tensor, where $H$ and $W$ are the height and width of the image and $C$ is the number of channels of the given image volume.
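To make the height, width, channel layout concrete, here is a minimal sketch in plain Python using a tiny 2 x 3 image instead of the real 120 x 160 one. The nested-list representation and the helper name `channel` are assumptions made for illustration only; Keras itself works on tensors.

```python
# A tiny "image" stored height first, then width, then channel (H x W x C).
# Each innermost list is one pixel: [red, green, blue].
image = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],      # row 0
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],   # row 1
]

H = len(image)          # height: number of rows
W = len(image[0])       # width: pixels per row
C = len(image[0][0])    # channels: values per pixel


def channel(img, c):
    """Return the 2D plane (H x W) that holds only channel c."""
    return [[pixel[c] for pixel in row] for row in img]


red = channel(image, 0)

print((H, W, C))   # (2, 3, 3)
print(red)         # [[255, 0, 0], [10, 40, 70]]

# Number of values in one 120 x 160 RGB frame
values_per_frame = 120 * 160 * 3
print(values_per_frame)   # 57600
```

Each of the three channels is a full 2D plane of its own, which is the picture to keep in mind when the convolutional layers start producing 24, 32 or 64 channels.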

API documentation:
- https://keras.io/layers/core/#input

#### First convolutional layer

Before continuing, make sure you've read through and have a rough idea of the following concepts:
- NN concepts - http://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html
- Relu - http://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html#relu
- Convolution operator - https://cambridgespark.com/content/tutorials/convolutional-neural-networks-with-keras/index.html#convolutions

In [None]:
from keras.layers import Convolution2D

# First hidden layer
#
# 24 features      - Results in a 24 channel feature map. This means the first layer can detect 24 different low level features in the image.
# 5x5 pixel kernel - Width and height of the kernel. Will automatically have the same channel depth as the input (5x5x3)
# 2w x 2h stride   - Strides of the convolution along the width and height
# relu activation  - Use the 'relu' activation function
#
x = Convolution2D(24, (5,5), strides=(2,2), activation='relu')(x)

The first hidden layer is a 2D convolution layer with the described hyperparameters (see above) and a `relu` activation function.

> What is the reasoning behind this design? Why are the hyperparameters given these values?

This is a tricky question to answer, but an important one for later tweaking of the network. The sad news is that choosing good values often requires lots of experience and theoretical background. But hey, let's give it a shot!

They are all a tradeoff between performance and accuracy, but there are some general rules to follow. For example, the number of *features* is usually lower in the first convolutional layers, because the input is still large, resulting in large *feature maps*. Later layers receive smaller (but deeper) inputs (because of the convolutions in the previous layers), and can thus afford to have more features. Similar reasoning applies to the other hyperparameters. The following link has a more in-depth discussion of CNN hyperparameters:
- http://deeplearning.net/tutorial/lenet.html#tips-and-tricks

API documentation:
- https://keras.io/layers/convolutional/#conv2d
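To see what kernel size, stride and `relu` actually do, here is a rough sketch of a single-channel, unpadded convolution in plain Python. It is not how Keras implements `Convolution2D`; the function names, the 5x5 toy image and the hand-written edge kernel are all assumptions made for illustration (in the real network the kernel values are learned).

```python
def relu(v):
    """Rectified linear unit: negative values become 0."""
    return max(0.0, v)


def conv2d_valid(image, kernel, stride):
    """Slide a square kernel over a single-channel image without padding,
    moving `stride` pixels per step, and apply relu to every sum."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(relu(total))
        out.append(row)
    return out


# 5x5 image with a vertical edge: dark (0) on the left, bright (1) on the right
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# 3x3 kernel that responds to dark-to-bright transitions (left to right)
edge_kernel = [[-1, 0, 1]] * 3
# The mirrored kernel looks for bright-to-dark transitions instead
mirrored_kernel = [[1, 0, -1]] * 3

print(conv2d_valid(image, edge_kernel, stride=2))      # [[3.0, 0.0], [3.0, 0.0]]
print(conv2d_valid(image, mirrored_kernel, stride=2))  # [[0.0, 0.0], [0.0, 0.0]]
```

The first kernel lights up where the edge is, while the mirrored one produces negative sums that `relu` clips to zero. In the real first layer each of the 24 kernels spans 5 x 5 x 3 values plus a bias, which gives (5 * 5 * 3 + 1) * 24 = 1824 trainable parameters.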

#### Second convolutional layer

In [None]:
# Second hidden layer
#
# 32 features      - Results in a 32 channel feature map
#
x = Convolution2D(32, (5,5), strides=(2,2), activation='relu')(x)

Not much changed here. We increase the number of features as discussed earlier. This allows the network to pick up more features based on the feature maps from the first hidden layer.

#### Third convolutional layer

In [None]:
# Third hidden layer
#
# 64 features      - Results in a 64 channel feature map
#
x = Convolution2D(64, (5,5), strides=(2,2), activation='relu')(x)

We increase the number of features yet again. The input to the next layer will now be 64 channels deep.

#### Fourth convolutional layer

In [None]:
# Fourth hidden layer
#
# 3x3 pixel kernel   - Width and height of the kernel. Will automatically have the same channel depth as the input (3x3x64)
x = Convolution2D(64, (3,3), strides=(2,2), activation='relu')(x)

Here we decrease the kernel size to 3x3. With the relatively large kernel and the stride of 2 used in the 3 previous layers, we have already shrunk the image quite a bit (each unpadded, strided convolution shrinks the image/feature map and adds depth instead). By decreasing the kernel size, we make sure that the feature map doesn't shrink too much.

#### Fifth convolutional layer

In [None]:
# Fifth hidden layer
#
# 1w x 1h stride   - Move the kernel one pixel at a time along the width and height
x = Convolution2D(64, (3,3), strides=(1,1), activation='relu')(x)

In the last convolution layer, we reduce the stride from 2 to 1. The kernel now moves one pixel at a time, which results in a larger feature map than a stride of 2 would give (at the cost of performance).
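It can help to trace how the feature map shrinks through the five convolutional layers. The sketch below assumes Keras' default `padding='valid'` (no padding), since none of the layers above pass a `padding` argument; the helper name `conv_output_size` is made up for this example.

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


# (features, kernel size, stride) for the five layers described above
layers = [(24, 5, 2), (32, 5, 2), (64, 5, 2), (64, 3, 2), (64, 3, 1)]

h, w, c = 120, 160, 3   # input image: height, width, channels
shapes = []
for features, kernel, stride in layers:
    h = conv_output_size(h, kernel, stride)
    w = conv_output_size(w, kernel, stride)
    c = features
    shapes.append((h, w, c))

for shape in shapes:
    print(shape)
# (58, 78, 24)
# (27, 37, 32)
# (12, 17, 64)
# (5, 8, 64)
# (3, 6, 64)
```

Under these assumptions the last layer outputs a 3 x 6 x 64 volume. Had the fifth layer kept a stride of 2, the same formula would give only 2 x 3, which illustrates why the stride is reduced at this point.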

#### Flatten layer

In [None]:
from keras.layers import Flatten 

# Flatten layer
#
# Flatten to 1 dimension
#
x = Flatten(name='flattened')(x)

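A flatten layer does what it claims to do. It flattens the 3D output of the last convolutional layer (height x width x 64 channels) into a single 1D vector by putting all the values after each other. This is needed so that every value can be connected to every neuron in the next layer, a fully-connected (Dense) layer.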

API documentation:
- https://keras.io/layers/core/#flatten
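As a small illustration of flattening, the sketch below turns a tiny H x W x C volume into one long list in row-major order (row by row, pixel by pixel, channel by channel). The nested-list representation and the `flatten` helper are assumptions for this example, and the 3 x 6 x 64 size is what the layer settings above work out to if the convolutions are unpadded (the Keras default).

```python
def flatten(volume):
    """Flatten an H x W x C nested list into a single list in row-major order."""
    return [value for row in volume for pixel in row for value in pixel]


# Tiny 2 x 2 x 3 volume; the numbers make the original position easy to spot
volume = [
    [[0, 1, 2], [10, 11, 12]],
    [[100, 101, 102], [110, 111, 112]],
]

flat = flatten(volume)
print(flat)   # [0, 1, 2, 10, 11, 12, 100, 101, 102, 110, 111, 112]

# Length of the vector handed to the first dense layer for a 3 x 6 x 64 volume
flattened_size = 3 * 6 * 64
print(flattened_size)   # 1152
```

No values are changed or dropped; only the shape is different.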

#### First dense layer

In [None]:
from keras.layers import Dense

# First Dense layer
#
# 100 units       - Use 100 neurons in the layer
# relu activation - Use 'relu' activation
#
x = Dense(100, activation='relu')(x)

A dense layer is another name for a fully-connected layer, as in an [MLP](https://en.wikipedia.org/wiki/Multilayer_perceptron). It is very common for the upper layers of a CNN to be fully connected (in fact, the purpose of the conv layers is to extract important features from the image while downsampling it enough to be handled by an MLP).
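Here is a minimal sketch of what a fully-connected layer with `relu` computes: every unit takes a weighted sum of all inputs, adds a bias, and clips negative results to zero. The function names and the toy weights are made up for illustration, and the 1152 input size assumes the flatten layer receives a 3 x 6 x 64 volume (unpadded convolutions).

```python
def dense_relu(inputs, weights, biases):
    """One fully-connected layer: every output unit sees every input."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(max(0.0, z))   # relu
    return outputs


def dense_params(n_inputs, n_units):
    """Trainable parameters: one weight per input per unit, plus one bias per unit."""
    return n_inputs * n_units + n_units


inputs = [1.0, 2.0, 3.0]
weights = [[0.5, 1.0, 0.5],     # unit 0
           [1.0, -1.0, -1.0]]   # unit 1
biases = [0.5, 1.0]

print(dense_relu(inputs, weights, biases))   # [4.5, 0.0]

print(dense_params(1152, 100))   # 115300
print(dense_params(100, 50))     # 5050
```

Under that assumption, the first dense layer alone holds far more parameters than the first convolutional layer, which is one reason the image is downsampled before it reaches the dense layers.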

#### First dropout

In [None]:
from keras.layers import Dropout

# First dropout
#
# 0.1 - Randomly drop out 10% of the neurons to prevent overfitting.
#
x = Dropout(.1)(x)

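As described in the [links](#quick-readup), dropout is used to prevent overfitting by forcing redundancy in the neural network (which means it cannot rely on a particular neuron, because that neuron might be dropped). Dropout is only active during training; when the model is used for driving, all neurons are kept.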
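The idea can be sketched in a few lines of plain Python. This is the commonly used "inverted dropout" scheme, where the surviving values are scaled up by 1 / (1 - rate) so the expected sum stays the same; the `dropout` function and its arguments are illustrative assumptions, not the Keras implementation.

```python
import random


def dropout(values, rate, training, rng):
    """Zero each value with probability `rate` while training and scale the
    survivors by 1 / (1 - rate); pass values through unchanged otherwise."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)        # fixed seed so the run is repeatable
activations = [1.0] * 1000

train_out = dropout(activations, rate=0.1, training=True, rng=rng)
infer_out = dropout(activations, rate=0.1, training=False, rng=rng)

dropped = train_out.count(0.0)
print(dropped)                           # roughly 100 out of 1000
print(sum(train_out) / len(train_out))   # close to 1.0
print(infer_out == activations)          # True
```

Because a different random subset is dropped on every training step, the following layer cannot depend on any single neuron always being present.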

#### Second dense layer

In [None]:
# Second Dense layer
#
# 50 units        - Use 50 neurons in the layer
#
x = Dense(50, activation='relu')(x)

We gradually decrease the size of the layers from the original 100 to 50.

#### Second dropout

In [None]:
# Second dropout
#
# Not much to say here...
#
x = Dropout(.1)(x)

#### Output layer

In [None]:
# Outputs
#
# Angle
# Dense              - Fully-connected output layer
# 15 units           - Use a 15 neuron output.
# softmax activation - Use 'softmax' activation
#
# Throttle
# Dense           - Fully-connected output layer
# 1 unit          - Use a 1 neuron output
# relu activation - Use 'relu' activation
angle_out = Dense(15, activation='softmax', name='angle_out')(x)
throttle_out = Dense(1, activation='relu', name='throttle_out')(x)

The output layers are fully-connected, with different numbers of units and different activations.

Angle uses a 15 neuron output with a softmax activation. [Softmax](https://en.wikipedia.org/wiki/Softmax_function) is a common way of creating a probability distribution over K different possible outcomes (in this case, `K=15`).

Throttle uses a 1 neuron output with a relu activation. [Relu](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)) will result in the throttle never being negative (the car can only go forward or stand still).
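The following sketch shows how softmax turns 15 raw scores into a probability distribution, and one way the most likely bin could be mapped back to a steering value. The `bin_to_angle` helper is hypothetical: it assumes the 15 bins are spread evenly over the range -1 (full left) to 1 (full right), which is not stated in this chapter.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)   # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bin_to_angle(index, n_bins=15):
    """Hypothetical decoding: map bin 0..n_bins-1 evenly onto [-1, 1]."""
    return index * 2.0 / (n_bins - 1) - 1.0


scores = [0.0] * 15
scores[7] = 2.0   # the middle bin gets the highest score
scores[8] = 1.0

probs = softmax(scores)
best = probs.index(max(probs))

print(best, bin_to_angle(best))   # 7 0.0
```

Treating steering as a choice between 15 categories, rather than predicting a single number, is what makes `softmax` a natural fit for this output.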

#### Create the model

Finally, it's time to create the model and define the loss functions for training.

In [None]:
from keras.models import Model

# Create model
#
# Optimizer
# ---------
# adam
#
# Angle
# -----
# categorical cross entropy loss function
# 0.9 loss weight
#
# Throttle
# --------
# mean_absolute_error loss function
# 0.001 loss weight
# 
model = Model(inputs=[img_in], outputs=[angle_out, throttle_out])
model.compile(
    optimizer='adam',
    loss={'angle_out': 'categorical_crossentropy', 'throttle_out': 'mean_absolute_error'},
    loss_weights={'angle_out': 0.9, 'throttle_out': .001})

The model requires the following information:
- input (`img_in`) and output tensors (`angle_out`, `throttle_out`)
- optimizer ([`adam`](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Adam)). The optimizer is the algorithm used to minimize the loss function during training. Adam generally performs well and is computationally efficient.
- loss function(s), one for each output
  - `angle_out` : [`categorical_crossentropy`](https://en.wikipedia.org/wiki/Cross_entropy) - Suitable for classification, where the target is one of several categories.
  - `throttle_out` : [`mean_absolute_error`](https://en.wikipedia.org/wiki/Mean_absolute_error) - A more general loss function for continuous values.
- loss weights, which set how much each output's loss contributes to the total loss (0.9 for angle, 0.001 for throttle)
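To show how the two losses and their weights combine into the single number that the optimizer minimizes, here is a small sketch in plain Python. The sample values are made up, and the two loss functions are simplified single-sample versions written for illustration, not the Keras implementations.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Cross entropy between a one-hot target and predicted probabilities."""
    eps = 1e-7   # guard against log(0)
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# One made-up training sample
angle_true = [0.0] * 15
angle_true[7] = 1.0       # the correct steering bin is 7
angle_pred = [0.02] * 15
angle_pred[7] = 0.72      # the model puts 72% on the correct bin

throttle_true = [0.5]
throttle_pred = [0.3]

angle_loss = categorical_crossentropy(angle_true, angle_pred)    # about 0.33
throttle_loss = mean_absolute_error(throttle_true, throttle_pred)  # about 0.2

# Weighted sum using the loss weights passed to model.compile()
total_loss = 0.9 * angle_loss + 0.001 * throttle_loss

print(angle_loss, throttle_loss, total_loss)
```

With these weights the throttle error barely influences the total, so training is driven almost entirely by how well the model predicts the steering angle.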

## Next

[Donkey data](./donkey-data.ipynb)