Convolutional Neural Networks
======

Convolutional neural networks (CNNs) are a class of deep neural networks, most commonly used in computer vision applications.

The word "convolutional" refers to the network pre-processing the data for you. Traditionally, this pre-processing was performed by data scientists. The neural network can learn how to do the pre-processing *itself* by applying filters for things such as edge detection.
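
To make the idea of a filter more concrete, here is a minimal sketch in base R. The image, the filter values and the helper `apply_filter` are all made up for illustration; a real CNN learns its own filter values during training. The sketch slides a 3x3 filter over a tiny image and sums the element-wise products at each position, which is the operation a convolutional layer applies.

```r
# A tiny 6x6 greyscale "image": dark left half (0), bright right half (1)
img <- matrix(rep(c(0, 0, 0, 1, 1, 1), each = 6), nrow = 6)

# A hand-written 3x3 vertical-edge filter (hypothetical values).
# matrix() fills by column, so the columns are -1, 0 and 1.
edge_filter <- matrix(c(-1, -1, -1,
                         0,  0,  0,
                         1,  1,  1), nrow = 3)

# Slide the filter over every position where it fits entirely inside the image
apply_filter <- function(image, kernel) {
  k <- nrow(kernel)
  out_rows <- nrow(image) - k + 1
  out_cols <- ncol(image) - k + 1
  out <- matrix(0, out_rows, out_cols)
  for (i in seq_len(out_rows)) {
    for (j in seq_len(out_cols)) {
      patch <- image[i:(i + k - 1), j:(j + k - 1)]
      out[i, j] <- sum(patch * kernel)
    }
  }
  out
}

feature_map <- apply_filter(img, edge_filter)
print(feature_map)
```

Every row of `feature_map` is `0 3 3 0`: the filter responds only where the image changes from dark to bright, and gives zero over the flat regions.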

This exercise uses the `keras` and `stringr` libraries, which we need to load before we begin.

**Run the code below to start loading the required libraries for this exercise.**

In [None]:
# Run this box to load libraries

# Load libraries
suppressMessages(install.packages("keras"))
suppressMessages(install.packages("stringr"))
suppressMessages(library(keras))
suppressMessages(install_keras())
suppressMessages(library(stringr))

Step 1
-----

In this exercise we will train a CNN to recognise handwritten digits, using the MNIST digit dataset.

This is a very common exercise and data set to learn from.

Let's start by loading our dataset and setting up our training and test sets (keras will automatically assign a validation set from the training set for us).

### In the cell below replace:
#### 1. `<selectTrainingSetX>` with `1:1000,,`
#### 2. `<selectTrainingSetY>` with `1:1000`
#### 3. `<selectTestSetX>` with `1001:1500,,`
#### 4. `<selectTestSetY>` with `1001:1500`
#### then __run the code__.

In [None]:
# Run this box to load the dataset and split it into training and test sets

# Here we import the dataset.
mnist <- dataset_mnist()


# This stores our features and labels for both our training and test sets as local variables
###
# REPLACE <selectTrainingSetX> WITH 1:1000,, AND <selectTrainingSetY> WITH 1:1000
###
raw_x_train <- mnist$train$x[<selectTrainingSetX>]
raw_y_train <- mnist$train$y[<selectTrainingSetY>]
###

###
# REPLACE <selectTestSetX> WITH 1001:1500,, AND <selectTestSetY> WITH 1001:1500
###
raw_x_test <- mnist$test$x[<selectTestSetX>]
raw_y_test <- mnist$test$y[<selectTestSetY>]
###

# This tells us the dimensions of our training set's features
dim(raw_x_train)

Expected output:  
`1000  28  28`

So we have 1,000 training samples.


The two 28s after the 1,000 tell us each sample is 28 pixels wide and 28 pixels high.

Each pixel is really just a number from 0 to 255 - 0 being fully black, 255 being fully white - so the images are greyscale. When we graph the 28x28 numbers, we can see the image.

Step 2
-----

So, let's have a look at one of our samples.

**Run the code below**

In [None]:
# Run this box to look at one of our images
im <- raw_x_train[1,,]
im <- t(apply(im, 2, rev)) 
image(1:28, 1:28, im, col=gray((0:255)/255), xaxt='n', main=paste(raw_y_train[1]))

Our first training image is `5`.
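
The line `t(apply(im, 2, rev))` in the cell above may look mysterious. `image()` draws the rows of a matrix along the x axis, with y increasing upwards, so a matrix plotted directly appears turned on its side. The small sketch below (using a made-up matrix `m`) suggests what the line does: it reverses each column and then transposes, which amounts to rotating the matrix by 90 degrees clockwise.

```r
# A small 2x3 matrix so the effect is easy to follow
m <- matrix(1:6, nrow = 2)
print(m)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Reverse every column, then transpose: a 90 degree clockwise rotation
rotated <- t(apply(m, 2, rev))
print(rotated)
#      [,1] [,2]
# [1,]    2    1
# [2,]    4    3
# [3,]    6    5
```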

Next, let's check out our test set.

**Run the code below**

In [None]:
# Run this to see the dimensions of our test set.
dim(raw_x_test)

Expected output:  
`500  28  28`

And we have 500 test images!

Let's take a look at the first image in the test set.

**Run the code below**

In [None]:
# Run this to look at the first image in the test set
im <- raw_x_test[1,,]
im <- t(apply(im, 2, rev)) 
image(1:28, 1:28, im, col=gray((0:255)/255), xaxt='n', main=paste(raw_y_test[1]))

You should see a 9 above. Looking good - next we will prepare our data for another neural network.

Step 3
---

The neural network will use the 28x28 values of each image to predict what each image represents.

We need to reshape our data to get it working well with our neural network. 
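
As a rough illustration of the two preparation steps, the sketch below works in base R on a made-up array (`toy` is hypothetical and not part of the exercise). It adds a trailing channel dimension of size 1, which for this particular reshape should give the same layout as keras' `array_reshape`, and then scales the values to lie between 0 and 1.

```r
set.seed(1)

# Two fake 4x4 "images" with pixel values from 0 to 255
toy <- array(sample(0:255, 2 * 4 * 4, replace = TRUE), dim = c(2, 4, 4))

# Add a channel dimension: (samples, height, width) -> (samples, height, width, 1)
toy_4d <- toy
dim(toy_4d) <- c(dim(toy), 1)

# Feature scaling: divide by the largest possible pixel value
toy_scaled <- toy_4d / 255

print(dim(toy_scaled))    # 2 4 4 1
print(range(toy_scaled))  # both values lie between 0 and 1
```

The single channel reflects that the images are greyscale; a colour image would typically have three channels.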

**Run the code below**

In [None]:
# Read then run this code

# First off, let's reshape our X sets so that they fit the convolutional layers.
x_train <- array_reshape(raw_x_train, c(nrow(raw_x_train), 28, 28, 1))
x_test <- array_reshape(raw_x_test, c(nrow(raw_x_test), 28, 28, 1))

# Next up - feature scaling.
# We scale the values so they are between 0 and 1, instead of 0 and 255.
x_train <- x_train / 255
x_test <- x_test / 255

# Print the label associated with the first element in the training data set
print(raw_y_train[1])

Expected output:  
`5`

The label is a number - the number we see when we view the image.

We need to represent this number as a category by using a one-hot vector, rather than an integer (a number). This is the same as if we were still trying to predict the breed of a dog.

Keras can convert these numeric labels into one-hot vectors easily with the `to_categorical` function.

#### Replace `<addCategorical>` with `to_categorical` and run the code.

In [None]:
# The 10 means that there are 10 different categories - 0 to 9
###
# REPLACE THE <addCategorical> BELOW WITH to_categorical
###
y_train <- <addCategorical>(raw_y_train, 10)
y_test <- <addCategorical>(raw_y_test, 10)
###

# Print the label for the first element
print(y_train[1,])

Expected output:  
`[1] 0 0 0 0 0 1 0 0 0 0`
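
For intuition, one-hot encoding can be sketched by hand in base R. This is only an illustration with made-up labels (`labels`, `n_classes` and `one_hot` are hypothetical names), not how keras implements `to_categorical`. Each row gets a single 1 in the column for its digit; because the digits start at 0 and R indexes from 1, the column is the label plus one.

```r
labels <- c(5, 0, 4)   # three example digit labels
n_classes <- 10        # digits 0 to 9

# Start with all zeros, then set one entry per row to 1
one_hot <- matrix(0, nrow = length(labels), ncol = n_classes)
one_hot[cbind(seq_along(labels), labels + 1)] <- 1

print(one_hot[1, ])
# [1] 0 0 0 0 0 1 0 0 0 0
```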

Step 4
-----

All ready! Time to build another neural network.

We need to add in convolutional layers. We have 2D images, so we want 2D layers. We will also use a few additional techniques, which you can read about in the code comments.

### In the cell below replace:
#### 1. `<shape1>` with `28`
#### 2. `<shape2>` with `28`
#### 3. `<shape3>` with `1`
#### 4. `<numberOfClasses>` with `10`

#### and then __run the code__.

In [None]:
suppressMessages(use_session_with_seed(1))
###
# REPLACE THE <shape1> WITH 28 AND <shape2> WITH 28 AND <shape3> WITH 1
###
input_shape <- c(<shape1>, <shape2>, <shape3>)
###

###
# REPLACE THE <numberOfClasses> WITH 10
###
num_classes <- <numberOfClasses>
###

Time to set up our model.

### In the cell below replace:
#### 1. the first `<convolutionalLayer>` with `layer_conv_2d`
#### 2. the second `<convolutionalLayer>` with `layer_conv_2d`
#### 3. `<poolingLayer>` with `layer_max_pooling_2d`
#### 4. the first `<dropout>` with `layer_dropout`
#### 5. `<flatten>` with `layer_flatten()`
#### 6. the second `<dropout>` with `layer_dropout`

#### and then __run the code__.

In [None]:
# This box sets up a new convolutional neural network and prints a summary         

use_session_with_seed(1)
set.seed(1)

model <- keras_model_sequential() %>%
# Here we start with the convolutional layers
###
# REPLACE THE TWO <convolutionalLayer>'s BELOW WITH layer_conv_2d
###
  <convolutionalLayer>(filters = 28, kernel_size = c(3,3), activation = 'relu',
                input_shape = input_shape) %>% 
  <convolutionalLayer>(filters = 28, kernel_size = c(3,3), activation = 'relu') %>%
###

# Pooling layers help speed up training and make the features the network detects more robust.
# They act by downsampling the data - reducing the data size and complexity.
###
# REPLACE <poolingLayer> WITH layer_max_pooling_2d
###
  <poolingLayer>(pool_size = c(2, 2)) %>%
###

# Dropout is a technique to help prevent overfitting.
# It makes nodes 'drop out' - turning them off randomly during training.
###
# REPLACE <dropout> WITH layer_dropout
###
  <dropout>(rate = 0.125) %>% 
###

# Next the data is flattened to a vector
###
# REPLACE <flatten> WITH layer_flatten()
###
  <flatten> %>% 
###

# Dense layers perform classification - we have extracted the features with the convolutional pre-processing
  layer_dense(units = 64, activation = 'relu') %>% 
###
# REPLACE <dropout> WITH layer_dropout
###
  <dropout>(rate = 0.25) %>% 
###

# Next is our output layer
# Softmax outputs the probability for each category
  layer_dense(units = num_classes, activation = 'softmax')


# Let's print out the structure of our model
summary(model)
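
If you want to check the numbers in the model summary, the arithmetic can be sketched as below. This assumes the placeholders were filled in as instructed and that the layers use the keras defaults (no padding, a stride of 1 for the convolutions, and a pooling stride equal to the pool size). The helper functions are hypothetical names written for this illustration.

```r
# Output width/height of a convolution with no padding and stride 1
conv_out <- function(size, kernel) size - kernel + 1

# Weights plus one bias per filter / unit
conv_params  <- function(kernel, in_channels, filters) kernel * kernel * in_channels * filters + filters
dense_params <- function(inputs, units) inputs * units + units

s1   <- conv_out(28, 3)   # 26 after the first convolutional layer
s2   <- conv_out(s1, 3)   # 24 after the second convolutional layer
s3   <- s2 %/% 2          # 12 after 2x2 max pooling
flat <- s3 * s3 * 28      # 4032 values after flattening (28 filters)

total <- conv_params(3, 1, 28) +    # 280
         conv_params(3, 28, 28) +   # 7084
         dense_params(flat, 64) +   # 258112
         dense_params(64, 10)       # 650

cat("Flattened size:", flat, "\n")
cat("Total parameters:", total, "\n")
```

Under these assumptions the total comes to 266,126 parameters, most of them in the first dense layer. Pooling and dropout layers have no parameters of their own.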

In [None]:
# Run this cell!
# Time to compile the model, ready for training

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = 'Adamax',
  metrics = c('accuracy')
)
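
The softmax output and the `categorical_crossentropy` loss work together. As a minimal sketch with made-up numbers (three classes instead of ten; `softmax`, `logits` and `target` are names chosen for this illustration), softmax turns raw scores into probabilities that sum to 1, and the loss is the negative log of the probability given to the correct class.

```r
# Softmax: exponentiate and normalise (subtracting the max avoids overflow)
softmax <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

logits <- c(2, 1, 0.1)   # raw scores for three classes
probs  <- softmax(logits)

target <- c(1, 0, 0)     # one-hot label: the first class is correct

# Categorical cross-entropy for a single sample
loss <- -sum(target * log(probs))

print(round(probs, 3))
print(loss)
```

The more probability the network assigns to the correct class, the closer the loss gets to zero.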

Step 5
-----

Time to train our model!

If it's taking a while you can lower the number of epochs. If you want to leave it running in the background and see how accurate you can get, you can increase the number of epochs.
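
To get a feel for what one epoch involves, here is a small arithmetic sketch. It assumes the 1,000 training samples from Step 1, a batch size of 32, and the validation split of 0.2 and 25 epochs suggested for this exercise; all variable names are made up for the illustration. As far as we know, keras sets aside the last portion of the training data for validation.

```r
n_samples  <- 1000
val_split  <- 0.2
batch_size <- 32
epochs     <- 25

n_val   <- round(n_samples * val_split)   # samples held out for validation
n_train <- n_samples - n_val              # samples actually used for training

# Each batch produces one weight update; the last batch may be smaller
updates_per_epoch <- ceiling(n_train / batch_size)
total_updates     <- updates_per_epoch * epochs

cat("Training samples:", n_train, "\n")
cat("Validation samples:", n_val, "\n")
cat("Weight updates per epoch:", updates_per_epoch, "\n")
cat("Weight updates in total:", total_updates, "\n")
```

Halving the number of epochs roughly halves the training time, because it halves the number of weight updates.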

### In the cell below replace:
#### 1. `<numberOfEpochs>` with `25`
#### 2. `<validationPercentage>` with `0.2`

#### and then __run the code__.

In [None]:
# Run this code to train the convolutional neural network and print out its accuracy

history <- model %>% fit(
  x_train, y_train, 
###
# REPLACE <numberOfEpochs> WITH 25 AND <validationPercentage> WITH 0.2
###    
  epochs = <numberOfEpochs>, batch_size = 32, 
  validation_split = <validationPercentage>
###    
)

# Make a graph of loss and accuracy
plot(history)

# Let's take a look at the loss and accuracy on the test set
model %>% evaluate(x_test, y_test)

predictions <- model %>% predict_classes(x_test)
scores <- model %>% evaluate(
  x_test, y_test, verbose = 0
)

# Output metrics
cat('Test loss:', scores[[1]], '\n')
cat('Test accuracy:', scores[[2]], '\n')
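
The accuracy reported above is simply the fraction of images whose predicted class matches the true label. The sketch below shows the idea on made-up probabilities for three classes (`probs`, `true_labels` and `predicted` are hypothetical); it assumes class labels start at 0, as the digit labels do.

```r
# Each row holds the predicted probabilities for one sample
probs <- rbind(c(0.1, 0.7, 0.2),
               c(0.8, 0.1, 0.1),
               c(0.3, 0.3, 0.4),
               c(0.2, 0.5, 0.3))
true_labels <- c(1, 0, 2, 2)

# Pick the column with the highest probability; subtract 1 for zero-based labels
predicted <- max.col(probs, ties.method = "first") - 1

accuracy <- mean(predicted == true_labels)
print(predicted)   # 1 0 2 1
print(accuracy)    # 0.75
```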

Step 6
-----

Let's take a look at an actual prediction, and what the image in the test set looks like.

**Run the code below**

In [None]:
# Run this box to print how the convolutional neural network predicts the label for an image
print("prediction:")
print(predictions[1])
print("Test image:")
im <- x_test[1,,,]
im <- t(apply(im, 2, rev)) 
image(1:28, 1:28, im, col=gray((0:255)/255), xaxt='n')

How is the prediction? Does it look right?

Conclusion
------

Congratulations! We've built a convolutional neural network that is able to recognise handwritten digits with very high accuracy.

CNNs are very complex - you're not expected to understand everything (or even most things) we covered here. It takes a lot of time and practice to properly understand each aspect of them.

Here we used:  
* __Feature scaling__ - reducing the range of the values. This helps improve training time.
* __Convolutional layers__ - network layers that pre-process the data for us. These apply filters to extract features for the neural network to analyze.
* __Pooling layers__ - layers used alongside the convolutional layers. They downsample the data, which reduces its size and makes the detected features more robust.
* __Dropout__ - a regularization technique to help prevent overfitting.
* __Dense layers__ - neural network layers which perform classification on the features extracted by the convolutional layers and downsampled by the pooling layers.
* __Softmax__ - an activation function which outputs the probability for each category.
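
Two of the techniques in this list, pooling and dropout, are simple enough to sketch by hand. The base R code below is an illustration only, with made-up data and hypothetical helper names (`max_pool_2x2`, `dropout`); it is not how keras implements them. Max pooling keeps the largest value in each 2x2 block. Dropout switches off a random share of values during training and, in the "inverted" form sketched here, scales up the survivors so the expected total stays the same.

```r
# --- Max pooling with a 2x2 window ---
max_pool_2x2 <- function(m) {
  out <- matrix(0, nrow(m) %/% 2, ncol(m) %/% 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- max(m[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}

x <- matrix(1:16, nrow = 4)   # a 4x4 input
pooled <- max_pool_2x2(x)     # a 2x2 output
print(pooled)
#      [,1] [,2]
# [1,]    6   14
# [2,]    8   16

# --- Dropout ---
dropout <- function(a, rate) {
  keep <- runif(length(a)) >= rate   # TRUE for values that stay switched on
  a * keep / (1 - rate)              # scale survivors to keep the expected sum
}

set.seed(1)
activations <- rep(1, 1000)
dropped <- dropout(activations, rate = 0.25)
cat("Share switched off:", mean(dropped == 0), "\n")
```

With a rate of 0.25, roughly a quarter of the values should come out as zero, and the rest become 1 / 0.75.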