Exercise 8 - Introduction to Neural Networks
===

Originally hypothesised in the 1940s, neural networks are now one of the main tools used in modern AI. Neural networks can be used for both regression and classification problems. Recent advances in storage, processing power, and open-source tools have enabled many successful applications of neural networks in medical diagnosis, filtering explicit content, speech recognition, and machine translation.

In this exercise we will compare three dog breeds using their age, weight, and height. We will make a neural network model to classify the breeds of dogs based on these features.

Note: It's extremely common for AI practitioners to use a template such as the one below to build neural networks quickly. After you are done, feel free to play around with the template to get a feel for how easily you can adapt a neural network to your own problems using the `keras` package.

Let's start by loading the libraries required for this session.

**Run the code below**

In [None]:
# Run this to load the required libraries, it might take a little while.

suppressMessages(install.packages("tidyverse"))
suppressMessages(library("tidyverse"))

suppressMessages(install.packages("keras"))
suppressMessages(library(keras))
suppressMessages(install_keras())

Step 1
---

Now let's load our data and inspect it.

#### Replace `<dataset>` with `dog_data` and run the code.

In [None]:
# Run this box to load our data

# Load the dataset `dog_data.csv`

###
# REPLACE <dataset> WITH dog_data
###
<dataset> <- read.csv("Data/dog_data.csv")
###

# Check the structure
str(dog_data)
head(dog_data)
summary(dog_data)

Based on the output of `str(dog_data)`, we have **200 observations** on dogs stored in **4 variables**:

* `age`: the first feature;
* `weight`: the second feature;
* `height`: the third feature;
* `breed`: the label, represented as numbers `0`, `1`, and `2`. 

Step 2
---

Before we make our model, let's get our training and test sets ready.

We've got 200 observations on dogs, so we'll use the first 160 observations for the training set, and the last 40 observations for our test set. For both the training and test sets, we will also separate `X` the features (`age`, `weight` and `height`) from `Y` the label (`breed`).
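The row-index split can be tried on its own. This minimal base-R sketch uses a placeholder matrix (an assumption, standing in for `dog_data`) so it runs without the CSV file:

```r
# Placeholder standing in for dog_data: 200 rows, 3 feature columns + 1 label column
placeholder <- matrix(seq_len(200 * 4), nrow = 200, ncol = 4)

train_rows <- 1:160    # first 160 observations -> training set
test_rows  <- 161:200  # last 40 observations  -> test set

train_X_demo <- placeholder[train_rows, 1:3]  # features
train_Y_demo <- placeholder[train_rows, 4]    # label
test_X_demo  <- placeholder[test_rows, 1:3]
test_Y_demo  <- placeholder[test_rows, 4]

dim(train_X_demo)  # 160 3
dim(test_X_demo)   # 40 3

# The two sets share no rows and together cover all 200 observations
length(intersect(train_rows, test_rows))  # 0
```

If the rows of a file were sorted by label, a random split such as `sample(200, 160)` would be safer than taking the first and last rows; whether that applies to `dog_data.csv` isn't stated in this exercise.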

### In the cell below replace:
#### 1. `<trainingSetLocation>` with `1:160`
#### 2. `<trainingSetLocation>` with `1:160`
#### 3. `<testSetLocation>` with `161:200`
#### 4. `<testSetLocation>` with `161:200`
#### then __run the code__.

In [None]:
# Run this box to split data into training and test sets

###
# REPLACE <trainingSetLocation> WITH 1:160
###
train_X <- as.matrix(dog_data[<trainingSetLocation>, 1:3]) # Rows 1 - 160, columns 1 - 3 (the features)
raw_train_Y <- as.matrix(dog_data[<trainingSetLocation>, 4]) # Rows 1 - 160, column 4 (the label)
###

###
# REPLACE <testSetLocation> WITH 161:200
###
test_X <- as.matrix(dog_data[<testSetLocation>, 1:3]) # Rows 161 - 200, columns 1 - 3 (the features)
raw_test_Y <- as.matrix(dog_data[<testSetLocation>, 4]) # Rows 161 - 200, column 4 (the label)
###

# Check first few lines of new variables to see if the output is what we expect
# Training data
head(train_X)
head(raw_train_Y)

# Test data
head(test_X)
head(raw_test_Y)

Step 3
---

For a neural network, representing `breed` as `0`, `1`, and `2` is misleading, because it implies an ordering: breed `0` would look closer to breed `1` than to breed `2`. The breeds have no such ordering here.

To allow the neural network to predict categories properly, we represent categories as 'one-hot vectors'. The labels (dog breeds) will go from being represented as `0`, `1`, and `2` to this:

| breed 0 | breed 1 | breed 2 |
|:------- |:------- |:------- |
| `1 0 0` | `0 1 0` | `0 0 1` |

So if the 1 is in the first position, the neural network knows that it's breed 0.

If the 1 is in the second position, the neural network knows that it's breed 1, and so on.
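One way to see why the one-hot form avoids the false ordering is to compare the Euclidean distances between labels. A small base-R sketch using `dist()`:

```r
# Integer labels: breed 0 is twice as far from breed 2 as from breed 1
integer_labels <- c(0, 1, 2)
as.vector(dist(integer_labels))  # 1 2 1  (pairs: 0-1, 0-2, 1-2)

# One-hot labels (rows = breed 0, 1, 2): every pair is equally far apart
one_hot_labels <- diag(3)
as.vector(dist(one_hot_labels))  # sqrt(2) for every pair, about 1.414
```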

The code below will turn our raw labels into one-hot vectors our neural networks will be able to use.

### In the cell below replace:
#### 1. `<trainingLabels>` with `raw_train_Y`
#### 2. `<testLabels>` with `raw_test_Y`
#### then __run the code__.

In [None]:
# This box uses the keras function to_categorical to convert breed from integer labels to one-hot vectors

###
# REPLACE <trainingLabels> WITH raw_train_Y
###
train_Y <- to_categorical(<trainingLabels>, num_classes = 3)
###

###
# REPLACE <testLabels> WITH raw_test_Y
###
test_Y <- to_categorical(<testLabels>, num_classes = 3)
###

# Print out some of our training and test data
head(train_Y)
head(test_Y)

There we go!
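To see roughly what `to_categorical` does, here's a base-R sketch. `one_hot_encode` is a hypothetical helper written for illustration, not a keras function, and it assumes the labels are whole numbers from `0` to `num_classes - 1`:

```r
# Hypothetical helper: base-R equivalent of one-hot encoding integer labels
one_hot_encode <- function(labels, num_classes) {
  labels <- as.integer(labels)  # also flattens a one-column matrix like raw_train_Y
  encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Put a 1 in column (label + 1) of each row, because R columns start at 1
  encoded[cbind(seq_along(labels), labels + 1L)] <- 1
  encoded
}

one_hot_encode(c(0, 2, 1, 0), num_classes = 3)
# rows: 1 0 0 / 0 0 1 / 0 1 0 / 1 0 0
```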

Step 4
---

That's our data ready. Now it's time to make your first neural network model!

This is a standard way to define a model with the `keras` package. You can always experiment later by adding extra hidden layers and changing their sizes and activation functions.

Our **input shape** in the first dense layer is the **number of features**, which is **3** in this case.

Our **final layer** has **3 units** (nodes), one for each of the dog breeds. So if we had 5 different breeds of dog in our dataset, the final layer would have 5 units.
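As a quick check on the layer sizes, each dense layer has one weight per (input, unit) pair plus one bias per unit. A base-R sketch of that arithmetic for the 3-10-10-3 network used in this step; the totals should match the `Param #` column printed by `model %>% summary`:

```r
# Trainable parameters of a dense layer: inputs * units weights + units biases
dense_params <- function(n_inputs, n_units) n_inputs * n_units + n_units

layer_sizes <- c(3, 10, 10, 3)  # input features -> hidden 1 -> hidden 2 -> output
params <- mapply(dense_params,
                 layer_sizes[-length(layer_sizes)],  # inputs to each layer
                 layer_sizes[-1])                    # units in each layer
params       # 40 110 33
sum(params)  # 183
```

Changing the number of output units to 5 (for 5 breeds) would change only the last entry, to `10 * 5 + 5 = 55`.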

### In the cell below replace:
#### 1. `<hiddenLayer1>` with `10`
#### 2. `<inputNumber>` with `3`
#### 3. `<hiddenLayer2>` with `10`
#### 4. `<outputNumber>` with `3`
#### then __run the code__.

In [None]:
# Run this!

use_session_with_seed(5)
set.seed(5)

model <- keras_model_sequential()

model %>%

# Add densely-connected neural network layers using `layer_dense` function
# Our first layer has an input shape of 3 to represent 3 input features (age, weight, height)

###
# REPLACE <hiddenLayer1> WITH 10 AND <inputNumber> WITH 3
###
layer_dense(units = <hiddenLayer1>, activation = "relu", input_shape = <inputNumber>) %>% 
###

# We now have a hidden layer with 10 nodes, with an input shape of 3 representing our 3 features.

# Next up we'll add another layer, with 10 nodes too.
###
# REPLACE <hiddenLayer2> WITH 10
###
layer_dense(units = <hiddenLayer2>, activation = "relu") %>% 
###

# Uncomment the next line if you want to add another layer
# layer_dense(units = 10, activation = "relu") %>% 

###
# REPLACE <outputNumber> WITH 3
###
layer_dense(units = <outputNumber>, activation = "softmax")
###

# Output layer has 3 nodes, one for each type of category we have

model %>% summary

Alright, that's our first model ready.

N.B. `"tanh"` is another common activation function you can try instead of `"relu"`, although it may not perform as well on this data.
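Both activation functions are simple element-wise formulas. A base-R sketch (the `relu` helper is our own; `tanh()` is built into R):

```r
# ReLU: negative inputs become 0, positive inputs pass through unchanged
relu <- function(x) pmax(x, 0)

z <- c(-2, -0.5, 0, 0.5, 2)
relu(z)            # 0.0 0.0 0.0 0.5 2.0
round(tanh(z), 3)  # -0.964 -0.462 0.000 0.462 0.964 (always between -1 and 1)
```

ReLU is unbounded above and zero for negative inputs, while tanh squashes everything into the range (-1, 1); which works better depends on the problem, so it's worth trying both.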

Step 5
---

Next, we'll compile the model for training and see how it runs.

The main choices are the loss function, the optimizer, and the metrics to report. They change how the model trains and, in turn, how well it performs.
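The loss used in the compile step, `"categorical_crossentropy"`, compares the predicted probabilities with the one-hot labels. For a single sample it is `-sum(y * log(p))`, and it is averaged over the samples in a batch. The base-R sketch below mirrors that formula; the clipping constant `eps` is our own assumption to avoid `log(0)`, and the keras implementation may differ in details:

```r
# Categorical cross-entropy: mean over rows of -sum(y * log(p))
# y: one-hot labels (rows = samples); p: predicted probabilities (rows sum to 1)
categorical_crossentropy <- function(y, p, eps = 1e-7) {
  p <- pmin(pmax(p, eps), 1 - eps)  # clip so log() never sees 0
  mean(-rowSums(y * log(p)))
}

y <- rbind(c(1, 0, 0),
           c(0, 0, 1))
confident_right <- rbind(c(0.9, 0.05, 0.05),
                         c(0.05, 0.05, 0.9))
uniform_guess <- matrix(1 / 3, nrow = 2, ncol = 3)

categorical_crossentropy(y, confident_right)  # -log(0.9), about 0.105
categorical_crossentropy(y, uniform_guess)    # log(3), about 1.099
```

Confident, correct predictions give a loss near 0, while guessing evenly across 3 breeds gives `log(3)`, so training pushes the loss down from roughly that level.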

We will use some standard parameters for now. 

Feel free to experiment with different parameters later on. If this step doesn't work, check that you entered the correct sizes for the input and output layers in Step 4 (3 for each).

#### Replace `<optimizer>` with `optimizer_adagrad()` and run the code.

In [None]:
###
# REPLACE <optimizer> WITH optimizer_adagrad()
###
model %>% compile(
    loss = "categorical_crossentropy",
    optimizer = <optimizer>,
    metrics = c("accuracy")
)
###

N.B. `optimizer_adam()` is another popular optimizer you can try instead of `optimizer_adagrad()`.
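Adagrad adapts the step size for each weight: it divides the learning rate by the square root of the running sum of that weight's squared gradients, so weights that have already received large gradients take smaller steps. A simplified base-R sketch on a single parameter, minimising `(w - 3)^2`; the learning rate of `1`, the step count and `eps` are illustrative choices, not the settings `optimizer_adagrad()` uses:

```r
# Simplified Adagrad on a toy problem: minimise f(w) = (w - 3)^2
adagrad_minimise <- function(w, learning_rate = 1, steps = 200, eps = 1e-7) {
  grad_sq_sum <- 0
  for (i in seq_len(steps)) {
    grad <- 2 * (w - 3)                  # derivative of (w - 3)^2
    grad_sq_sum <- grad_sq_sum + grad^2  # running sum of squared gradients
    w <- w - learning_rate * grad / (sqrt(grad_sq_sum) + eps)
  }
  w
}

adagrad_minimise(w = 0)  # ends very close to 3
```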

Let's train the neural network and plot its training history!

### In the cell below replace:
#### 1. `<xData>` with `train_X`
#### 2. `<yData>` with `train_Y`
#### 3. `<epochNumber>` with `25`
#### then __run the code__.

In [None]:
# Run this box to plot our fit and print out how it performed on the training set
history <- model %>% fit(
    ###
    # REPLACE <xData> WITH train_X and <yData> WITH train_Y
    ###
    x = <xData>,
    y = <yData>,
    ###
    shuffle = TRUE,
    ###
    # REPLACE <epochNumber> WITH 25
    ###
    epochs = <epochNumber>,
    ###
    batch_size = 2,
    validation_split = 0.2
)

plot(history)

# This prints the final loss and accuracy on the training and validation data
history

Note that the original training set (`train_X` and `train_Y`, 160 observations) is split again during training: with `validation_split = 0.2`, 128 of the 160 samples are used for training and 32 are held out for validation, as reported in the output of `history`. Keras takes these validation samples from the end of the data, before any shuffling.
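The split arithmetic can be checked by hand. A base-R sketch for 160 rows and `validation_split = 0.2` (keras may round slightly differently internally, but for these numbers the result is 128 and 32 either way):

```r
n_rows <- 160
validation_split <- 0.2

n_train <- round(n_rows * (1 - validation_split))  # 128 rows used for fitting
n_val   <- n_rows - n_train                         # 32 rows held out

# The validation rows come from the end of x and y, before shuffling
val_rows <- (n_train + 1):n_rows
range(val_rows)  # 129 160
```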

Step 6
---

Now that our model is trained and ready, let's see how it performs on our test data, `test_X` and `test_Y`!

It's important to test a model on data it has never seen before, to check that it hasn't overfit the training data. Let's evaluate it against the test set.

**Run the box below**

In [None]:
# Run this box
perf <- model %>% evaluate(test_X, test_Y)
print(perf)

With the random seed we set, it seems to be very accurate (around 95% accuracy)! Exact results can still vary slightly between versions of keras and TensorFlow.
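Accuracy is the fraction of samples whose highest-probability class matches the true label; with 40 test samples, 95% accuracy corresponds to 38 correct predictions. A base-R sketch of the calculation on made-up predictions for 4 samples (hypothetical numbers, not model output):

```r
# Hypothetical predicted probabilities: rows = samples, columns = breeds 0, 1, 2
predicted <- rbind(c(0.7, 0.2, 0.1),
                   c(0.1, 0.3, 0.6),
                   c(0.2, 0.5, 0.3),
                   c(0.4, 0.4, 0.2))
# Hypothetical one-hot true labels
actual <- rbind(c(1, 0, 0),
                c(0, 0, 1),
                c(1, 0, 0),
                c(0, 1, 0))

# Column of the largest value, minus 1 to match the 0-2 breed numbering;
# ties (row 4) go to the first column
predicted_class <- max.col(predicted, ties.method = "first") - 1  # 0 2 1 0
actual_class    <- max.col(actual, ties.method = "first") - 1     # 0 2 0 1
mean(predicted_class == actual_class)                             # 0.5
```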

Let's see how the model predicts something completely new and unclassified.

**Come up with a brand new sample of the format `[age, weight, height]` to test the model with, then run the two code blocks.**

In [None]:
###
# CHANGE age, weight, AND height TO NEW VALUES 
###
new_dog <- data.frame(age = 5, weight = 4, height = 8)
###

str(dog_data)

# Age vs weight
ggplot() +
geom_point(data = dog_data, aes(x = age, y = weight, colour = as.factor(breed))) +
geom_point(data = new_dog, aes(x = age, y = weight), shape = "+", size = 10) +
labs(x = "Age", y = "Weight", colour = "Breed")

In [None]:
# Run this code block to plot the relationship between age, height, and breed
# Age vs height
ggplot() +
geom_point(data = dog_data, aes(x = age, y = height, colour = as.factor(breed))) +
geom_point(data = new_dog, aes(x = age, y = height), shape = "+", size = 10) +
labs(x = "Age", y = "Height", colour = "Breed")

Now let's see what breed of dog the model says it is!

**Run the code below**

In [None]:
# Run this code to run the model

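probabilities <- model %>% predict(as.matrix(new_dog))

print("Probabilities of classes:")
print(probabilities)

# Columns correspond to breeds 0, 1 and 2, so the predicted class is
# the column with the highest probability, minus 1
print("Predicted class:")
print(apply(probabilities, 1, which.max) - 1)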

The predicted class is the breed (`0`, `1`, or `2`) with the highest probability.
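Under the hood, a prediction passes the sample through each dense layer in turn: multiply by the layer's weights, add its biases, then apply the activation. A base-R sketch of that computation for the 3-10-10-3 network from Step 4; the weights are random placeholders (an assumption), not the trained keras weights, so only the shape of the calculation is meaningful:

```r
set.seed(1)
relu    <- function(x) pmax(x, 0)
softmax <- function(x) exp(x - max(x)) / sum(exp(x - max(x)))

# Random placeholder weights and zero biases for layers 3 -> 10 -> 10 -> 3
sizes   <- c(3, 10, 10, 3)
weights <- lapply(1:3, function(i) matrix(rnorm(sizes[i] * sizes[i + 1], sd = 0.1),
                                          nrow = sizes[i], ncol = sizes[i + 1]))
biases  <- lapply(1:3, function(i) rep(0, sizes[i + 1]))

x  <- c(age = 5, weight = 4, height = 8)          # the example new_dog values
h1 <- relu(x %*% weights[[1]] + biases[[1]])      # hidden layer 1: 10 values
h2 <- relu(h1 %*% weights[[2]] + biases[[2]])     # hidden layer 2: 10 values
probs <- softmax(h2 %*% weights[[3]] + biases[[3]])  # output: 3 probabilities

probs
which.max(probs) - 1  # predicted class, numbered 0-2 like the breed labels
```

Subtracting `max(x)` inside `softmax` doesn't change the result, but it keeps `exp()` from overflowing when the inputs are large.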

Conclusion
---

We've built a simple neural network to help us predict dog breeds. In the next exercise, we'll look into neural networks in a bit more depth, and at the factors that influence how well they learn.

If you want to play around with this neural network on a new dataset, just remember to set the input shape to the number of features and the number of units in the final layer to the number of classes.