Exercise 9 - Advanced Neural Networks
===

There are many factors that influence how well a neural network might perform. AI practitioners tend to play around with the structure of the hidden layers, the activation function, the optimisation function, and the number of epochs (training cycles).

In this exercise, we will look at how changing these parameters impacts the accuracy and performance of our network.

Let's start by loading the libraries required for this session.

**Run the code below**

In [None]:
# Run this code box to load the packages we need
# It might take a few minutes...

suppressMessages(install.packages("tidyverse"))
suppressMessages(library("tidyverse"))

suppressMessages(install.packages("keras"))
suppressMessages(library("keras"))
suppressMessages(install_keras())

options(repr.plot.width = 7, repr.plot.height = 5)

Step 1
---

We will use the same dog data set as in Exercise 8, building on what we learnt before and trying different parameters for a network to try and improve performance.

Let's open up our data set and create training and test sets.

### In the cell below replace:
#### 1. both occurrences of `<featureColumns>` with `1:3`
#### 2. both occurrences of `<labelColumn>` with `4`
#### then __run the code__.

In [None]:
# Run this box to set up our training and test datasets

# Load the dog data
dog_data <- read.csv("Data/dog_data.csv")

# Check structure
str(dog_data)
head(dog_data)

# Take the first 160 observations, separate the features from the labels, and assign them to the training set
###
# REPLACE <featureColumns> WITH 1:3 AND <labelColumn> WITH 4
###
train_X <- as.matrix(dog_data[1:160, <featureColumns>])
raw_train_Y <- as.matrix(dog_data[1:160, <labelColumn>])
###

# Take the last 40 observations, separate the features from the labels, and assign them to the test set
###
# REPLACE <featureColumns> WITH 1:3 AND <labelColumn> WITH 4
###
test_X <- as.matrix(dog_data[161:200, <featureColumns>])
raw_test_Y <- as.matrix(dog_data[161:200, <labelColumn>])
###

Just like in the last exercise, we will transform the raw labels into one-hot vectors.

### In the cell below replace:
#### 1. both occurrences of `<one-hot-function>` with `to_categorical`
#### 2. both occurrences of `<numberOfClasses>` with `3`
#### then __run the code__.

In [None]:
# Set the testing and training labels as categories using one-hot vectors
###
# REPLACE <one-hot-function> WITH to_categorical AND <numberOfClasses> WITH 3
###
train_Y <- <one-hot-function>(raw_train_Y, num_classes = <numberOfClasses>)
test_Y <- <one-hot-function>(raw_test_Y, num_classes = <numberOfClasses>)
###

head(train_Y)

Done!
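
To see what a one-hot vector looks like without Keras, here is a minimal base-R sketch. It assumes that the class labels are integers starting at 0, which is what `to_categorical` expects. The helper name `one_hot` and the toy labels are made up for illustration and are not part of the exercise.

```r
# Hypothetical helper: one-hot encode integer labels 0, 1, ..., num_classes - 1
one_hot <- function(labels, num_classes) {
    labels <- as.integer(labels)
    encoded <- matrix(0, nrow = length(labels), ncol = num_classes)
    # Put a 1 in column (label + 1) of each row, because R indexes from 1
    encoded[cbind(seq_along(labels), labels + 1L)] <- 1
    encoded
}

toy_labels <- c(0, 2, 1, 2)   # made-up labels, not taken from dog_data
toy_one_hot <- one_hot(toy_labels, num_classes = 3)
print(toy_one_hot)
```

Each row contains a single 1, in the column that matches the label for that observation.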

Step 2
---

The code block below contains a custom function `train_network` to help us quickly change the training factors of our neural network. We will use this function throughout the remainder of this exercise.

The `train_network` function allows us to change:

* the size and/or number of layers;
* the activation function the layers use;
* the optimizer of the model;
* the number of training cycles for the model (`epochs`).

**Run the code below**

In [None]:
# Run this box to prepare functions for later

# Define our custom function `train_network` with four arguments
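train_network <- function(structure, activation, optimizer, epochs){
    # Fix the random seed so that results are repeatable
    suppressMessages(use_session_with_seed(1))

    # Build the network: two hidden layers and a softmax output layer
    model <- keras_model_sequential()
    model %>%
    layer_dense(units = structure[2], activation = activation, input_shape = structure[1]) %>%
    layer_dense(units = structure[3], activation = activation) %>%
    layer_dense(units = structure[4], activation = "softmax")

    # Set the loss function, the optimizer, and the metric to report
    model %>%
    compile(loss = "categorical_crossentropy", optimizer = optimizer, metrics = c("accuracy"))

    # Train the model, holding back 30% of the training data for validation
    history <- model %>%
    fit(x = train_X, y = train_Y, shuffle = TRUE, epochs = epochs, batch_size = 5,
        validation_split = 0.3)

    # Save the accuracy from the last row of the training history in the global variable `acc`
    history_df <- as.data.frame(history)
    acc <<- history_df[nrow(history_df), 2]
    print("Accuracy based on training set...")
    print(history_df[nrow(history_df), 2])

    # Evaluate the model on the test set and save the result in the global variable `testacc`
    perf <- model %>% evaluate(test_X, test_Y)
    print("Accuracy based on test set...")
    print(perf$acc)
    testacc <<- perf$acc

    # Plot the training history
    plot(history)
}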

Let's recreate the neural network from Exercise 8 to use as our benchmark, but we will change it to have two hidden layers.

### In the cell below replace:
#### 1. `<activationFunction>` with `"relu"`
#### 2. `<optimizer>` with `"adagrad"`
#### 3. `<epochNumber>` with `30`
#### then __run the code__.

In [None]:
# Run this code to train the network

# Create variables for each of the inputs to our custom function
sample_structure <- c(3, 10, 10, 3)
###
# REPLACE <activationFunction> WITH "relu" (INCLUDING THE QUOTATION MARKS!)
###
sample_activation <- <activationFunction>
###

###
# REPLACE <optimizer> WITH "adagrad" (INCLUDING THE QUOTATION MARKS!)
###
optimizer <- <optimizer>
###

###
# REPLACE <epochNumber> WITH 30
###
sample_epochs <- <epochNumber>
###

# Run our custom function specifying our arguments in correct order: structure, activation, optimizer, epochs
train_network(sample_structure, sample_activation, optimizer, sample_epochs)

Step 3
---

Now, let's start playing with the structure of our neural network, in particular the size of our hidden layers. We can easily do this by changing the input to the first argument of our `train_network` function, `structure`.

Here we will test the size of our two hidden layers, testing values 1 through to 10. For simplicity, we will make the size of the two hidden layers the same, e.g. when we test a layer size of 5, the structure of our neural network will be `[3, 5, 5, 3]`, and when we test a layer size of 9, our neural network structure will be `[3, 9, 9, 3]`. Note that the input and output layers of our network must both remain at size 3: the input layer because our data have 3 input features, and the output layer because there are 3 classes to predict.
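
To get a feel for how quickly the network grows as the hidden layers get wider, the sketch below counts the trainable parameters for each structure we are about to test. It assumes fully connected (dense) layers with one bias per unit, as created by `layer_dense`. The helper name `count_parameters` is hypothetical and is not used elsewhere in this exercise.

```r
# Hypothetical helper: count the trainable parameters of a fully connected
# network described by a vector of layer sizes, e.g. c(3, 5, 5, 3)
count_parameters <- function(structure) {
    n_in <- head(structure, -1)    # units feeding into each dense layer
    n_out <- tail(structure, -1)   # units in each dense layer
    # Each dense layer has n_in * n_out weights plus one bias per unit
    sum(n_in * n_out + n_out)
}

hidden_sizes <- 1:10
n_params <- sapply(hidden_sizes, function(i) count_parameters(c(3, i, i, 3)))
print(data.frame(hidden_size = hidden_sizes, parameters = n_params))
```

A wider network has more parameters to fit, which gives it more capacity but also more to learn from the same 160 training rows.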

**In the code below:**  
**1. Run the first box to alter the structure of the network**  
**2. Run the second box to plot the results**

In [None]:
# Run this code box to alter the structure of the network

# Initialise empty lists to store results
train_acc <- c()
test_acc <- c()

# Change the input to our first argument of our `train_network` function
for(i in 1:10){
    NN_structure <- c(3, i, i, 3)
    print("TESTING THE FOLLOWING HIDDEN LAYER SIZE...")
    print(i)
    train_network(NN_structure, sample_activation, optimizer, sample_epochs)
    train_acc[i] <- acc
    test_acc[i] <- testacc
}

In [None]:
# Run this box to plot the results

# Reshape the results for plotting
train_results <- data.frame(dataType = "Training", acc = train_acc, layerSize = 1:10, stringsAsFactors = FALSE)
test_results <- data.frame(dataType = "Test", acc = test_acc, layerSize = 1:10, stringsAsFactors = FALSE)

hiddenLayerDf <- bind_rows(train_results, test_results)

# Create line plot: hidden layer size vs. accuracy coloured by data type
ggplot(hiddenLayerDf, aes(x = layerSize, y = acc, colour = dataType)) +
  geom_line() +
  labs(title = "", x = "Size of hidden layers", y = "Accuracy", colour = "Data type") +
  scale_x_continuous(breaks = 1:10)

So, experimenting with different sizes of hidden layers can dramatically improve your results.

Step 4
---

Now we'll look at how different **activation functions** impact the performance of neural networks. To do this, we need to change the second argument to our custom function `train_network`, the `activation` argument.

There are many different activation functions to try, so let's store them all as a vector and try them all!
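
To make the names a little less abstract, here is a base-R sketch of several of these functions using their standard textbook definitions. It is for illustration only: Keras uses its own implementations, `elu` is shown with an assumed `alpha` of 1, and `selu` and `hard_sigmoid` are left out because their constants are implementation details.

```r
# Base-R versions of some activation functions (standard textbook definitions)
activations <- list(
    linear   = function(x) x,
    relu     = function(x) pmax(0, x),
    elu      = function(x) ifelse(x > 0, x, exp(x) - 1),   # assumes alpha = 1
    sigmoid  = function(x) 1 / (1 + exp(-x)),
    softplus = function(x) log1p(exp(x)),
    softsign = function(x) x / (1 + abs(x)),
    tanh     = function(x) tanh(x)
)

x <- c(-2, -0.5, 0, 0.5, 2)   # a few sample inputs
activation_table <- sapply(activations, function(f) f(x))
rownames(activation_table) <- x
print(round(activation_table, 3))
```

Each column shows how one function transforms the same inputs. Notice, for example, that `relu` outputs zero for every negative input, while `sigmoid` squeezes all inputs into the range between 0 and 1.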

#### Replace `<addActivation>` with `activation_functions[i]` and run the code.

In [None]:
# Run this box to run the network with different activation functions

# Initialise empty lists to store results
train_acc <- c()
test_acc <- c()

# Create a vector listing all the activation functions we wish to test
activation_functions <- c("elu", "hard_sigmoid", "linear", "relu", "selu", "sigmoid", 
                         "softplus", "softsign", "tanh")

# Uncomment the code below to play with the structure, optimizer, and epochs
# sample_structure <- c(3, ?, ?, 3) # e.g. c(3, 4, 4, 3)
# optimizer <- "?" # e.g. "adagrad"
# sample_epochs <- ? # e.g. 20

# Test all the different activation functions and save results
for(i in 1:length(activation_functions)){
    print("Evaluating model with hidden layer activation function... ")
    print(activation_functions[i])
###
# REPLACE <addActivation> WITH activation_functions[i]
###    
    train_network(sample_structure, <addActivation>, optimizer, sample_epochs)
###    
    train_acc[i] <- acc
    test_acc[i] <- testacc
}

print("Finished!")

#### Now run the code below to plot the results.

In [None]:
# Run this box to plot the result

# Reshape the results for plotting
train_results <- data.frame(dataType = "Train", actFuncName = activation_functions, funcAcc = train_acc,
                            stringsAsFactors = FALSE)
test_results <- data.frame(dataType = "Test", actFuncName = activation_functions, funcAcc = test_acc,
                           stringsAsFactors = FALSE)

results <- bind_rows(train_results, test_results) %>%
mutate(dataType = as.factor(dataType))

# Create line plot: activation function vs. accuracy coloured by data type
results %>%
ggplot(aes(actFuncName, funcAcc, group = dataType, colour = dataType)) +
geom_line() +
labs(title = "", x = "Activation function", y = "Function accuracy", colour = "Data type") +
theme(plot.title = element_text(hjust = 0.5))

There's quite a lot of variance there. It's always good to quickly test different activation functions first.

Step 5
---

The __optimisation function__ is the next major parameter of the network architecture. It changes how the network is trained, so it can have a __very large impact on training time and end performance__.
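
To illustrate how the update rule alone can change the outcome, the sketch below applies two rules to the same toy problem: plain gradient descent with a fixed step size, and an Adagrad-style rule that shrinks the step as squared gradients accumulate. This is a simplified, one-parameter illustration and not how Keras implements its optimizers. The loss function, learning rate, and function names are all assumptions chosen for the example.

```r
# Toy loss with its minimum at w = 3, and its gradient
loss <- function(w) (w - 3)^2
grad <- function(w) 2 * (w - 3)

# Plain gradient descent: fixed step size
run_sgd <- function(w, lr, steps) {
    for (s in seq_len(steps)) {
        w <- w - lr * grad(w)
    }
    w
}

# Adagrad-style rule: step size shrinks as squared gradients accumulate
run_adagrad <- function(w, lr, steps, eps = 1e-7) {
    accum <- 0
    for (s in seq_len(steps)) {
        g <- grad(w)
        accum <- accum + g^2
        w <- w - lr * g / (sqrt(accum) + eps)
    }
    w
}

w_start <- 0
w_sgd <- run_sgd(w_start, lr = 0.1, steps = 20)
w_adagrad <- run_adagrad(w_start, lr = 0.1, steps = 20)

print(c(start = loss(w_start), sgd = loss(w_sgd), adagrad = loss(w_adagrad)))
```

With the same starting point, learning rate, and number of steps, the two rules finish at different losses. Which rule does better depends on the problem and the settings, which is why it is worth testing several optimizers on our own data.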

#### Replace `<optimizerFunction>` with `optimizer_functions[i]` and run the cell.

In [None]:
# Run this box to try different optimization functions

# Initialise empty lists to store results
train_acc <- c()
test_acc <- c()

# Create a vector listing all the optimization functions we wish to test
optimizer_functions = c("adadelta", "adagrad", "adam", "adamax",
                        "nadam", "rmsprop", "sgd")
NN_structure <- c(3, 9, 9, 3)
sample_activation <- "relu"

# Uncomment the code below to play with the structure, activation, and epochs
# NN_structure <- c(3, ?, ?, 3) # e.g. c(3, 4, 4, 3)
# sample_activation <- ? # e.g. "tanh"
# sample_epochs <- ? # e.g. 20

# Test all the different optimization functions and save results
for(i in 1:length(optimizer_functions)){
    print("Evaluating model with optimization function... ")
    print(optimizer_functions[i])
###
# REPLACE <optimizerFunction> WITH optimizer_functions[i]
###    
    train_network(NN_structure, sample_activation, <optimizerFunction>, sample_epochs)
###    
    train_acc[i] <- acc
    test_acc[i] <- testacc
}

train_acc
test_acc

#### Now run the code below to plot the results.

In [None]:
# Run this box to plot the results

# Reshape the results to create plot
train_results <- data.frame(dataType = "Train", optFuncName = optimizer_functions, funcAcc = train_acc,
                            stringsAsFactors = FALSE)
test_results <- data.frame(dataType = "Test", optFuncName = optimizer_functions, funcAcc = test_acc,
                           stringsAsFactors = FALSE)

results <- bind_rows(train_results, test_results) %>%
mutate(dataType = as.factor(dataType))

# Create line plot: optimization function vs. accuracy coloured by data type
results %>%
ggplot(aes(optFuncName, funcAcc, group = dataType, colour = dataType)) +
geom_line() +
labs(title = "Performance of training and test sets using different optimizer functions",
     x = "Optimization function", y = "Function accuracy", colour = "Data type") +
theme(plot.title = element_text(hjust = 0.5))

Step 6
---

Now let's test the number of training cycles for the model, i.e. `epochs`, the final argument in our custom function.

**In the code below, change the number of epochs to any positive whole number and press Run. Try this with several different numbers.**

In [None]:
###
# CHANGE 15 TO ANY POSITIVE INTEGER
###
epochs <- 15

train_network(sample_structure, sample_activation, optimizer, epochs)

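You will likely notice a trend: up to a point, the more epochs (training cycles) we use, the higher the accuracy of the model. Beyond that point, extra epochs tend to give little further improvement and can lead to overfitting, where accuracy on the training data keeps rising while accuracy on the test data does not.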
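
One reason the number of epochs matters is that each epoch is another full pass of weight updates over the training data. The sketch below counts those updates for the settings used in `train_network` (`batch_size = 5`, `validation_split = 0.3`), assuming Keras holds back 30% of the 160 training rows for validation and fits on the rest. The variable names are illustrative only.

```r
n_rows <- 160             # rows passed to fit() as train_X
validation_split <- 0.3   # share held back by fit() for validation
batch_size <- 5

n_validation <- round(n_rows * validation_split)   # rows held back for validation
n_fit <- n_rows - n_validation                      # rows used for weight updates
updates_per_epoch <- ceiling(n_fit / batch_size)    # batches per pass over the data

epoch_options <- c(5, 15, 30, 100)
total_updates <- updates_per_epoch * epoch_options
print(data.frame(epochs = epoch_options, weight_updates = total_updates))
```

Under these assumptions, each extra epoch adds the same number of weight updates, so the amount of training grows in direct proportion to `epochs`.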

Step 7
---

Let's combine what we've seen above to create a neural network that performs better than the one we made in Exercise 8, where we used the structure `[3, 4, 2, 3]`, the activation function `relu`, and the optimiser `sgd` (stochastic gradient descent).

**Follow the instructions in the code below**

In [None]:
###
# Run this box to train once more with a good selection of options
# Then change the configurations as you like and run again to see how the network performs
###

sample_structure <- c(3, 9, 9, 3)
sample_activation <- "selu"
optimizer <- "adam"
sample_epochs <- 10

train_network(sample_structure, sample_activation, optimizer, sample_epochs)

How does it look? Were we able to beat the other network? Try out a number of different configurations to see how they perform!
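
If you would like to be more systematic than changing one value at a time, one option is to list every combination of candidate settings first. The sketch below does this with base R's `expand.grid`. The candidate values and the name `candidate_grid` are hypothetical choices for illustration, not recommendations.

```r
# Candidate values to try (hypothetical choices; adjust freely)
candidate_grid <- expand.grid(
    hidden_size = c(5, 9),
    activation  = c("relu", "selu", "tanh"),
    optimizer   = c("adam", "adagrad"),
    epochs      = c(10, 30),
    stringsAsFactors = FALSE
)

print(nrow(candidate_grid))   # number of configurations in the grid
print(head(candidate_grid))
```

Each row is one configuration that could be passed to `train_network`, using `c(3, hidden_size, hidden_size, 3)` as the structure. Keep in mind that every row means training a network from scratch, so large grids can take a while to run.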

Conclusion
---

We've compared how different neural network architecture parameters influence accuracy, and we've tried to combine them in a way that maximises performance.