Recurrent Neural Networks
===

A recurrent neural network (RNN) is a class of neural network that excels when your data can be treated as a sequence, such as text, music, speech, connected handwriting, or measurements taken over a period of time.

RNNs can analyse or predict a word based on the previous words in a sentence - they allow a connection between previous information and current information.
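
To make the idea of connecting previous and current information concrete, here is a minimal sketch of a plain recurrent step in base R. It is illustrative only: the sizes, the random weights, and the names `rnn_step`, `W`, `U`, `b` are assumptions made for this example and are not part of the exercise code.

```r
# Minimal sketch of a simple (non-LSTM) recurrent layer in base R.
# All names and sizes here are hypothetical, chosen for illustration.
set.seed(1)

input_size  <- 3   # features per time step
hidden_size <- 4   # size of the hidden state

# Randomly initialised weights: input-to-hidden, hidden-to-hidden, and bias
W <- matrix(rnorm(hidden_size * input_size, sd = 0.5), hidden_size, input_size)
U <- matrix(rnorm(hidden_size * hidden_size, sd = 0.5), hidden_size, hidden_size)
b <- rep(0, hidden_size)

# One step: the new state depends on the current input AND the previous state
rnn_step <- function(x_t, h_prev) {
  as.vector(tanh(W %*% x_t + U %*% h_prev + b))
}

# A toy sequence of 5 time steps (one row per step)
inputs <- matrix(rnorm(5 * input_size), nrow = 5)

h <- rep(0, hidden_size)   # initial state: no history yet
for (step in seq_len(nrow(inputs))) {
  h <- rnn_step(inputs[step, ], h)
}

print(h)   # the final state carries information from every earlier step
```

The hidden state `h` is fed back in at every step, which is how earlier items in the sequence influence later predictions.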

This exercise looks at implementing an LSTM RNN to generate new characters after learning from a large sample of text. LSTMs (long short-term memory networks) are a special type of RNN that dramatically improves the model's ability to connect previous data to current data when there is a long gap between them.

We will train an RNN model using a novel written by H. G. Wells - The Time Machine.

Step 1
------

Let's start by loading our libraries and looking at our text file. This might take a few minutes.

In [None]:
# Run this!

suppressMessages(install.packages("keras"))
suppressMessages(install.packages("tokenizers"))
suppressMessages(install.packages("stringr"))
suppressMessages(library(keras))
suppressMessages(library(readr))
suppressMessages(library(stringr))
suppressMessages(library(purrr))
suppressMessages(library(tokenizers))
suppressMessages(install_keras())

In [None]:
path <- file.path("Data/time-edit.txt")
# Let's have a look at the text
read_lines(path)

Expected output:  
```
The Time Traveller (for so it will be convenient to speak of him) was expounding a recondite matter to us. His pale grey eyes shone and twinkled, and his usually pale face was flushed and animated.
text length: 174201 characters
unique characters: 39
```

Step 2
-----

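Next we'll divide the text into overlapping sequences of 35 characters, starting a new sequence every 6 characters.

Each sequence becomes one training example: the character that immediately follows the sequence is the correct output (the label) the model should learn to predict.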
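
The windowing can be sketched on a toy string in base R. This is a simplified illustration, not the exercise code: `toy_text`, `toy_maxlen`, and `toy_step` are hypothetical names, and the window length and step are much smaller than the 35 and 6 used in this exercise.

```r
# Toy illustration of cutting text into (sequence, next character) pairs.
# Names and sizes are assumptions for this example only.
toy_text   <- strsplit("the time machine", "")[[1]]  # vector of single characters
toy_maxlen <- 5   # window length (the exercise uses 35)
toy_step   <- 3   # distance between window starts (the exercise uses 6)

# Every start position must leave room for the window plus one following character
starts <- seq(1, length(toy_text) - toy_maxlen, by = toy_step)

windows <- lapply(starts, function(i) {
  list(
    sentence  = toy_text[i:(i + toy_maxlen - 1)],  # the input sequence
    next_char = toy_text[i + toy_maxlen]           # the label to predict
  )
})

for (w in windows) {
  cat(sprintf("'%s' -> '%s'\n", paste(w$sentence, collapse = ""), w$next_char))
}
```

Because the step is shorter than the window, consecutive sequences overlap, which yields more training examples from the same text.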

### In the cell below replace:
#### 1. `<textSequenceLength>` with `35`
#### 2. `<pathToDataset>` with `path`
#### then __run the code__.

In [None]:
###
# REPLACE <textSequenceLength> WITH 35
###
maxlen <- <textSequenceLength>
###

# This makes all the characters lower case, and splits the text into individual characters.

###
# REPLACE <pathToDataset> WITH path
###
text <- read_lines(<pathToDataset>) %>%
###
  str_to_lower() %>%
  str_c(collapse = "\n") %>%
  tokenize_characters(strip_non_alphanum = FALSE, simplify = TRUE)

print(sprintf("Total length: %d", length(text)))

chars <- text %>%
  unique() %>%
  sort()

print(sprintf("Total chars: %d", length(chars)))

Expected output:  
`"Total length: 174666"`  
`"Total chars: 29"`

#### Replace the 3 `<maximumLength>`'s with `maxlen`

In [None]:
###
# REPLACE ALL THE <maximumLength>'s WITH maxlen
###
dataset <- map(
  seq(1, length(text) - <maximumLength> - 1, by = 6), 
  ~list(sentence = text[.x:(.x + <maximumLength> - 1)], next_char = text[.x + <maximumLength>])
  )
###

dataset <- transpose(dataset)

x <- array(0, dim = c(length(dataset$sentence), maxlen, length(chars)))
y <- array(0, dim = c(length(dataset$sentence), length(chars)))

for(i in 1:length(dataset$sentence)){
  
  x[i,,] <- sapply(chars, function(x){
    as.integer(x == dataset$sentence[[i]])
  })
  
  y[i,] <- as.integer(chars == dataset$next_char[[i]])
  
}
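
The loop above one-hot encodes every sequence and its following character. The same `sapply` pattern can be seen on a tiny, made-up alphabet; `toy_chars`, `toy_sentence`, and `toy_next` below are hypothetical values used only for illustration.

```r
# One-hot encoding on a toy alphabet, mirroring the sapply pattern used above.
toy_chars    <- c("a", "b", "c")        # the "vocabulary"
toy_sentence <- c("c", "a", "b", "a")   # a sequence of 4 characters

# One column per vocabulary character, one row per position in the sequence
one_hot <- sapply(toy_chars, function(ch) as.integer(ch == toy_sentence))
print(one_hot)
print(dim(one_hot))   # rows = sequence length, columns = vocabulary size

# The label is encoded the same way, as a single vector
toy_next <- "b"
target   <- as.integer(toy_chars == toy_next)
print(target)
```

Each row of `one_hot` contains exactly one `1`, marking which character occupies that position. In the exercise, `x[i,,]` holds one such matrix per sequence and `y[i,]` holds one such vector per label.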

Step 3
------

Let's build our model, using a single LSTM layer of 64 units. We'll keep the model simple for now, so that training does not take too long.

#### Replace the `<layerSize>` with 64, and run the cell.

In [None]:
model <- keras_model_sequential()
###
# REPLACE <layerSize> WITH 64
###
model %>%
  layer_lstm(<layerSize>, input_shape = c(maxlen, length(chars))) %>%
###
  layer_dense(length(chars)) %>%
  layer_activation("softmax")

model %>% compile(
  loss = "categorical_crossentropy", 
  optimizer = "Adam"
)
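
To get a feel for how small this model is, the number of trainable weights can be estimated by hand. The sketch below assumes a standard LSTM layer with bias terms (four gates, each with input weights, recurrent weights, and a bias) and the 29 unique characters reported earlier; the helper names `lstm_params` and `dense_params` are hypothetical.

```r
# Rough parameter count for the model above, assuming a standard LSTM with biases.
# Helper names are made up for this illustration.
lstm_params <- function(units, input_dim) {
  # 4 gates, each with input weights, recurrent weights, and a bias vector
  4 * (units * (input_dim + units) + units)
}

dense_params <- function(inputs, outputs) {
  inputs * outputs + outputs   # weight matrix plus bias vector
}

n_chars <- 29   # "Total chars" from the expected output above
units   <- 64   # LSTM layer size used in this exercise

n_lstm  <- lstm_params(units, n_chars)
n_dense <- dense_params(units, n_chars)

cat(sprintf("LSTM:  %d\nDense: %d\nTotal: %d\n", n_lstm, n_dense, n_lstm + n_dense))
```

If the model has been built, running `summary(model)` should let you compare these figures with what Keras reports.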

Next we need a few helper functions. Run the cell below to prepare them.

In [None]:
# Run this cell!

sample_mod <- function(preds, temperature = 1){
  preds <- log(preds)/temperature
  exp_preds <- exp(preds)
  preds <- exp_preds/sum(exp_preds)
  
  rmultinom(1, 1, preds) %>% 
    as.integer() %>%
    which.max()
}

on_epoch_end <- function(epoch, logs) {
  
  cat(sprintf("epoch: %02d ---------------\n\n", epoch))
    
  diversity <- 0.5
  generated <- ""
    
  cat(sprintf("diversity: %f ---------------\n\n", diversity))
    
  start_index <- sample(1:(length(text) - maxlen), size = 1)
  sentence <- text[start_index:(start_index + maxlen - 1)]
    
    for(i in 1:400){
      
      x <- sapply(chars, function(x){
        as.integer(x == sentence)
      })
      x <- array_reshape(x, c(1, dim(x)))
      
      preds <- predict(model, x)
      next_index <- sample_mod(preds, diversity)
      next_char <- chars[next_index]
      
      generated <- str_c(generated, next_char, collapse = "")
      sentence <- c(sentence[-1], next_char)
      
    }
    
    cat(generated)
    cat("\n\n")
    
  
}
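
The `temperature` argument of `sample_mod` (passed in as `diversity`) controls how adventurous the generated text is. The sketch below isolates just the rescaling step on a made-up three-character distribution; `apply_temperature` and `toy_preds` are hypothetical names for this illustration.

```r
# Effect of temperature on a probability distribution (rescaling step only).
# Names and values are assumptions for this example.
apply_temperature <- function(preds, temperature = 1) {
  scaled <- exp(log(preds) / temperature)
  scaled / sum(scaled)   # renormalise so the probabilities sum to 1
}

toy_preds <- c(a = 0.6, b = 0.3, c = 0.1)

sharp <- apply_temperature(toy_preds, 0.5)  # low temperature: favours the top choice
same  <- apply_temperature(toy_preds, 1)    # temperature 1: distribution unchanged
flat  <- apply_temperature(toy_preds, 2)    # high temperature: closer to uniform

print(round(rbind(sharp, same, flat), 3))
```

Lower temperatures make the most likely character even more likely, giving safer but more repetitive text; higher temperatures flatten the distribution, giving more varied but more error-prone text.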

Ready to go. The next cell will train the model.

Training RNNs with limited compute takes a long time, so we'll only build a small one for now. If you want to leave this model training for longer, change the number of epochs to a larger number.
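
As a rough guide to why training is slow, the number of weight updates per epoch can be estimated from the figures reported earlier (174666 characters, windows of 35, a step of 6). This is a back-of-the-envelope sketch; the variable names are hypothetical, and the batch size of 128 is only a comparison point, not a recommendation from this exercise.

```r
# Estimate the number of training sequences and weight updates per epoch.
# Names are assumptions; the figures come from the expected outputs above.
n_tokens       <- 174666   # "Total length" reported in Step 2
example_maxlen <- 35       # sequence length used in this exercise
example_step   <- 6        # distance between sequence starts

# Same start positions as the dataset-building cell
n_sequences <- length(seq(1, n_tokens - example_maxlen - 1, by = example_step))

updates_per_epoch <- function(n, batch_size) ceiling(n / batch_size)

cat(sprintf("Training sequences:           %d\n", n_sequences))
cat(sprintf("Updates with batch_size = 1:   %d\n", updates_per_epoch(n_sequences, 1)))
cat(sprintf("Updates with batch_size = 128: %d\n", updates_per_epoch(n_sequences, 128)))
```

A larger batch size means far fewer updates per epoch, which usually shortens each epoch, although it can change how quickly the model improves.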

#### Replace the `<epochNumber>` with 3 and run the cell.

In [None]:
# This will take a little while...
print_callback <- callback_lambda(on_epoch_end = on_epoch_end)

history <- model %>% fit(
  x, y,
  batch_size = 1,
###
# REPLACE <epochNumber> WITH 3
###
  epochs = <epochNumber>,
###
  callbacks = print_callback
)

The generated text probably won't look very good. That is expected: the dataset is small, the RNN is small, and we have only trained it for a short time. Feel free to increase the number of epochs and leave it training for a long time if you want to see better results.

We could improve our model by:
* Having a larger training set.
* Increasing the number of LSTM units.
* Training it for longer.
* Experimenting with different activation functions, optimization functions, etc.


Conclusion
--------

We have trained a lightweight RNN from scratch that learns to predict the next character based on a sequence of text.