In [None]:
library(keras)
library(reticulate)
library(ggplot2)
library(gridExtra)
use_condaenv("r-tensorflow")
use_session_with_seed(7)
options(keras.view_metrics = TRUE)

# 2.1 - Introduction to convnets

This notebook contains the code samples found in Chapter 5, Section 1 of [Deep Learning with R](https://www.manning.com/books/deep-learning-with-r). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

We're about to dive into the theory of what convnets are and why they have been so successful at computer vision tasks. But first, let's take a practical look at a simple convnet example. It uses a convnet to classify MNIST digits, a task we performed in chapter 2 using a densely connected network (our test accuracy then was 97.8%). Even though the convnet will be basic, its accuracy will comfortably beat that of the densely connected model from notebook 1.1.

The following lines of code show you what a basic convnet looks like. It's a stack of `layer_conv_2d()` and `layer_max_pooling_2d()` layers. You'll see in a minute exactly what they do.

Importantly, a convnet takes as input tensors of shape `(image_height, image_width, image_channels)` (not including the batch dimension). In this case, we'll configure the convnet to process inputs of size `(28, 28, 1)`, which is the format of MNIST images. We do this by passing the argument `input_shape = c(28, 28, 1)` to the first layer.

In [None]:
model <- keras_model_sequential() %>% 
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>% 
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>% 
  layer_max_pooling_2d(pool_size = c(2, 2)) %>% 
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu")

<center><h3>Let's display the architecture of our convnet so far:</h3></center>

In [None]:
summary(model)

You can see above that the output of every `layer_conv_2d()` and `layer_max_pooling_2d()` layer (shown as `Conv2D` and `MaxPooling2D` in the summary) is a 3D tensor of shape `(height, width, channels)`. The width 
and height dimensions tend to shrink as we go deeper in the network. The number of channels is controlled by the `filters` argument passed to 
`layer_conv_2d()` (e.g. 32 or 64).
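
The shrinking spatial size can be traced by hand. The sketch below is plain R, not part of Keras, and assumes the defaults used in the model above: unpadded ("valid") 3x3 convolutions with stride 1, and 2x2 max pooling with a stride equal to the pool size. The helper names `conv_out` and `pool_out` are made up for this illustration.

```r
# Spatial size after an unpadded ("valid") convolution with stride 1
conv_out <- function(n, kernel = 3) n - kernel + 1

# Spatial size after non-overlapping max pooling
pool_out <- function(n, pool = 2) n %/% pool

size <- 28
sizes <- c(input = size)
for (layer in c("conv", "pool", "conv", "pool", "conv")) {
  size <- if (layer == "conv") conv_out(size) else pool_out(size)
  sizes <- c(sizes, setNames(size, layer))
}

# Height (and width) after each layer, ending at the 3 of `(3, 3, 64)`
print(sizes)
```

Under those assumptions the height and width go 28, 26, 13, 11, 5, 3, which is consistent with the final `(3, 3, 64)` output shape.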

The next step would be to feed our last output tensor (of shape `(3, 3, 64)`) into a densely-connected classifier network like those you are 
already familiar with: a stack of `layer_dense()` layers. These classifiers process vectors, which are 1D, whereas our current output is a 3D tensor. 
So first, we will have to flatten our 3D outputs to 1D, and then add a few dense layers on top:

In [None]:
model <- model %>% 
  layer_flatten() %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = 10, activation = "softmax")

We are going to do 10-way classification, so we use a final layer with 10 outputs and a softmax activation. 
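
As a rough illustration of what the softmax activation does to the 10 outputs, here is a minimal base-R sketch. The `softmax` helper and the `scores` vector are invented for this example; Keras computes the activation internally.

```r
# Softmax turns arbitrary scores into probabilities that sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtracting the max avoids overflow in exp()
  e / sum(e)
}

# Ten made-up scores, one per digit class 0..9
scores <- c(0.2, -1.0, 0.5, 1.1, 0.0, -0.3, 0.7, 3.0, 0.1, -0.5)
probs <- softmax(scores)

sum(probs)                   # total probability mass
predicted_digit <- which.max(probs) - 1  # R indexes from 1, digits start at 0
predicted_digit
```

The class with the largest score also gets the largest probability, so taking `which.max()` of the output picks the predicted digit.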

Now here's what our network looks like:

In [None]:
summary(model)

As you can see, our `(3, 3, 64)` outputs were flattened into vectors of shape `(576)`, before going through two `Dense` layers.
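
The flattened size and the parameter counts can be checked with a little arithmetic. This is a sketch in plain R that assumes every layer has a bias term (the Keras default); `conv_params` and `dense_params` are hypothetical helpers, not Keras functions.

```r
# Weights of a conv layer: one kernel per (input channel, filter) pair, plus one bias per filter
conv_params <- function(kernel_h, kernel_w, in_channels, filters) {
  (kernel_h * kernel_w * in_channels + 1) * filters
}

# Weights of a dense layer: one weight per (input, unit) pair, plus one bias per unit
dense_params <- function(inputs, units) (inputs + 1) * units

flat_size <- 3 * 3 * 64  # length of the vector produced by layer_flatten()

params <- c(
  conv_1  = conv_params(3, 3, 1, 32),
  conv_2  = conv_params(3, 3, 32, 64),
  conv_3  = conv_params(3, 3, 64, 64),
  dense_1 = dense_params(flat_size, 64),
  dense_2 = dense_params(64, 10)
)

flat_size
params
sum(params)
```

If the assumption about biases holds, these per-layer counts should line up with the `Param #` column printed by `summary(model)`.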

Now, let's train our convnet on the MNIST digits. We will reuse a lot of the code we have already covered in the previous MNIST example.

In [None]:
mnist <- dataset_mnist()
c(c(train_images, train_labels), c(test_images, test_labels)) %<-% mnist

train_images <- array_reshape(train_images, c(60000, 28, 28, 1))
train_images <- train_images / 255

test_images <- array_reshape(test_images, c(10000, 28, 28, 1))
test_images <- test_images / 255

train_labels <- to_categorical(train_labels)
test_labels <- to_categorical(test_labels)
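
The preprocessing steps above can be illustrated on a toy array in base R. This is only a sketch: it uses `dim<-` as a stand-in for `array_reshape()` (for appending a trailing dimension of size 1 the element layout should come out the same), and the `one_hot` helper is a hypothetical imitation of what `to_categorical()` produces.

```r
# Two fake 4x4 "images" with pixel values between 0 and 255
toy_images <- array(seq(0, 255, length.out = 32), dim = c(2, 4, 4))

# Add a trailing channel dimension, then scale pixel values to [0, 1]
dim(toy_images) <- c(dim(toy_images), 1)
toy_images <- toy_images / 255

# One-hot encode integer labels 0..(num_classes - 1)
one_hot <- function(labels, num_classes) {
  m <- matrix(0, nrow = length(labels), ncol = num_classes)
  m[cbind(seq_along(labels), labels + 1)] <- 1
  m
}

toy_labels <- c(3, 0)
encoded <- one_hot(toy_labels, 10)

dim(toy_images)
range(toy_images)
# The original labels are recovered from the position of the 1 in each row
recovered <- apply(encoded, 1, which.max) - 1
recovered
```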

In [None]:
model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)
              
history <- model %>% fit(
  train_images, train_labels,
  epochs = 1, batch_size = 64,
  validation_split = 0.2
)

<center><h2>As you can see, each epoch takes around 1 minute to run, and we do not have time to train for enough epochs, so we load a model that was trained earlier.</h2></center>

In [None]:
model <- load_model_hdf5("../data/models/2-1-GPU.h5")
history  <- py_load_object('../data/models/2-1-GPU-history.pk')
# Collect the saved training history into a data frame, one row per epoch
df <- data.frame(
  val_loss = unlist(history$val_loss),
  val_acc = unlist(history$val_acc),
  loss = unlist(history$loss),
  acc = unlist(history$acc),
  epochs = seq(length(history$val_loss))
)
summary(model)

In [None]:
# Print the metrics recorded at epoch 100 (the column is named `epochs`)
cat(paste('val_loss:',df$val_loss[df$epochs==100],'\n'))
cat(paste(' val_acc:',df$val_acc[df$epochs==100],'\n'))
cat(paste('    loss:',df$loss[df$epochs==100],'\n'))
cat(paste('     acc:',df$acc[df$epochs==100],'\n'))

In [None]:
options(repr.plot.width=8, repr.plot.height=6)

p1 <- ggplot(df, aes(x=epochs)) +
  geom_point(aes( y=loss, colour = "Training loss")) +
  geom_line(aes(y=val_loss,colour = "Validation loss")) +
  scale_colour_manual("",values=c("blue","orange"))
p2 <- ggplot(df, aes(x=epochs)) +
  geom_point(aes( y=acc, colour = "Training acc")) +
  geom_line(aes(y=val_acc,colour = "Validation acc")) +
  scale_colour_manual("",values=c("blue","orange"))

grid.arrange(p1,p2)

<h3><center>Let's evaluate the model on the test data:</center></h3>

In [None]:
results <- model %>% evaluate(test_images, test_labels)

In [None]:
results

While our densely-connected network had a test accuracy of 97.8%, our basic convnet has a test accuracy of 99.2%: the error rate fell from 2.2% to 0.8%, a relative decrease of about 64%. Not bad!
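
The relative improvement can be checked from the two accuracies quoted above. The variable names below are made up for this small calculation.

```r
dense_acc   <- 0.978  # test accuracy of the densely-connected network
convnet_acc <- 0.992  # test accuracy of the convnet

# Error rate is the complement of accuracy
dense_err   <- 1 - dense_acc
convnet_err <- 1 - convnet_acc

# Share of the original error that the convnet removes
relative_reduction <- (dense_err - convnet_err) / dense_err
round(100 * relative_reduction, 1)
```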

In [None]:
classes = model %>% predict_classes(test_images)

In [None]:
valid_class = 3   # <============== Change me to see different predictions.

# Recover the actual label: test_labels was one-hot encoded earlier, so the position
# of the 1 in the chosen row (minus 1, because digits start at 0) is the true digit
actual <- which.max(test_labels[valid_class, ]) - 1
# Reorient the 28x28 pixel matrix so that image() draws the digit upright
m <- t(apply(as.matrix(test_images[valid_class, , , ]), 2, rev))

# Plotting the test image
options(repr.plot.width=2, repr.plot.height=2)
par(oma=c(0,0,0,0), mar=c(0,0,2,0))
image(m, asp=1, axes=FALSE)
title(main = paste("Predicted:", classes[valid_class], "| Actual:", actual), cex.main=0.9)