In [0]:
import os
import sys
import math
import time
import itertools

import tensorflow as tf
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tensorflow import keras
from sklearn.preprocessing import OneHotEncoder

%matplotlib inline

# Recurrent Neural Networks

[Karpathy's blog about RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)



    'If training vanilla neural nets is optimization over functions, training recurrent nets is optimization over programs.'

Sequences. Depending on your background you might be wondering: What makes Recurrent Networks so special? A glaring limitation of Vanilla Neural Networks (and also Convolutional Networks) is that their API is too constrained: they accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes). Not only that: These models perform this mapping using a fixed amount of computational steps (e.g. the number of layers in the model). The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: Sequences in the input, the output, or in the most general case both.

![](http://karpathy.github.io/assets/rnn/diags.jpeg)

Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue and green vectors hold the RNN's state. From left to right: 

1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification). 

2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words). 

3. Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment). 

4. Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).

5. Synced sequence input and output (e.g. video classification where we wish to label each frame of the video).

Notice that in every case there are no pre-specified constraints on the sequence lengths, because the recurrent transformation (green) is fixed and can be applied as many times as we like.
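As a rough illustration of a fixed recurrent transformation being applied as many times as needed, here is a minimal NumPy sketch of a vanilla RNN step. The sizes, the weight names (`W_xh`, `W_hh`, `b_h`) and the helper functions are assumptions made for this example; they are not taken from the notebook's model.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3  # hypothetical sizes, chosen only for the example

# One set of parameters, shared by every time step
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

def rnn_step(h_prev, x_t):
    """Apply the recurrent transformation once: new state from old state and input."""
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

def run_rnn(inputs):
    """Apply rnn_step over a sequence of any length and return the final state."""
    h = np.zeros(hidden_dim)
    for x_t in inputs:
        h = rnn_step(h, x_t)
    return h

short_seq = rng.normal(size=(2, input_dim))   # 2 time steps
long_seq = rng.normal(size=(50, input_dim))   # 50 time steps

# The same weights handle both lengths; the state size never changes
print(run_rnn(short_seq).shape, run_rnn(long_seq).shape)
```

Because the weights do not depend on the time step, nothing in the code has to know the sequence length in advance.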

## RNN

![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)

![](https://image.slidesharecdn.com/rnn-lstm-161106132927/95/understanding-rnn-and-lstm-4-638.jpg?cb=1478439617)


## The Problem of Long-Term Dependencies

![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-shorttermdepdencies.png)

![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-longtermdependencies.png)

## LSTM

[Colah's blog about LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)

![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM2-notation.png)

## Language models

![](https://raw.githubusercontent.com/torch/torch.github.io/master/blog/_posts/images/rnnlm.png)

![](http://karpathy.github.io/assets/rnn/charseq.jpeg)

![](https://i.redd.it/cw04e9546gv11.png)

# Train character level RNN for language modelling

## Load corpora

Upload a text file from your local machine.

In [0]:
from google.colab import files

In [0]:
uploaded = files.upload()

Saving sheakspeare.txt to sheakspeare.txt


In [0]:
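# NOTE: str() on bytes returns its repr (b'...' with escaped \r\n); .decode('utf-8') would give the real text.
data = str(list(uploaded.values())[0])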

## Prepare dataset

Compute the vocabulary size and build the `char_to_ix` and `ix_to_char` dictionaries, which we use to vectorize the text.

In [0]:
chars = sorted(set(data))  # sorted so the character-to-index mapping is the same on every run
data_size, vocab_size = len(data), len(chars)
print('data has %d characters, %d unique.' % (data_size, vocab_size))

char_to_ix = { ch:i for i,ch in enumerate(chars) }
ix_to_char = { i:ch for i,ch in enumerate(chars) }

data has 1235397 characters, 66 unique.
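To make the two dictionaries concrete, here is a small sketch on a toy string. The `toy_` names are hypothetical and are used only so the example does not overwrite the notebook's variables.

```python
toy_text = "hello world"  # stands in for the uploaded corpus

toy_chars = sorted(set(toy_text))
toy_char_to_ix = {ch: i for i, ch in enumerate(toy_chars)}
toy_ix_to_char = {i: ch for i, ch in enumerate(toy_chars)}

# Encode the text as integer indices, then map the indices back to characters
toy_encoded = [toy_char_to_ix[ch] for ch in toy_text]
toy_decoded = "".join(toy_ix_to_char[i] for i in toy_encoded)

print(toy_chars)
print(toy_encoded)
print(toy_decoded)
```

The round trip returns the original string, which is a quick way to check that the two mappings are inverses of each other.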


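Prepare the x and y datasets: slice the text into overlapping windows of `max_len` characters (x), each paired with the character that follows it (y).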

In [0]:
max_len = 20
step = 3

sentences = []
next_chars = []
for i in range(0, len(data) - max_len, step):
    sentences.append(data[i: i + max_len])
    next_chars.append(data[i + max_len])
    
print("total # of sentences: ", len(sentences))    

total # of sentences:  411793
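The windowing loop is easier to follow on a short string. The sketch below reuses the same loop with hypothetical toy values (`toy_max_len = 4`, `toy_step = 3`) and also checks the window count reported above against the formula `ceil((len(data) - max_len) / step)`.

```python
import math

toy_data = "hello world"
toy_max_len, toy_step = 4, 3  # hypothetical values, smaller than the notebook's 20 and 3

toy_sentences, toy_next_chars = [], []
for i in range(0, len(toy_data) - toy_max_len, toy_step):
    toy_sentences.append(toy_data[i: i + toy_max_len])   # input window
    toy_next_chars.append(toy_data[i + toy_max_len])     # character to predict

for s, c in zip(toy_sentences, toy_next_chars):
    print(repr(s), "->", repr(c))

# Number of windows for the notebook's corpus size, max_len=20 and step=3
expected_windows = math.ceil((1235397 - 20) / 3)
print(expected_windows)
```

For the toy string this yields the pairs `'hell' -> 'o'`, `'lo w' -> 'o'` and `'worl' -> 'd'`.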


Translate string datasets into number vectors

In [0]:
# np.int was removed in NumPy 1.24; use an explicit integer dtype instead
x = np.zeros((len(sentences), max_len), dtype=np.int64)
y = np.zeros((len(sentences)), dtype=np.int64)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t] = char_to_ix[char]
        
    y[i] = char_to_ix[next_chars[i]]
    
x.shape, y.shape

((411793, 20), (411793,))

## Define model

Define the character level LSTM model for text generation that consists of:

1.   (Optional) Embedding layer [keras.layers.Embedding](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding) for training the character embeddings. You should pass the proper input dim (the vocabulary size) to the layer and specify the embedding dim. **Note** that because we use an RNN model, you don't need to specify the input sequence length.

2.   One or more LSTM layers [keras.layers.LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM) with a specified number of hidden units.  **Note** that intermediate LSTM layers should return full sequences (you can specify this with the parameter **return_sequences**).

3.   Together with the LSTM layers, you can also use dropout layers [keras.layers.Dropout](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout) to regularize the network.

4.   Final dense layer for making the classification [keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), with specified number of output units and activation function.
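Conceptually, the embedding layer in step 1 is a trainable lookup table with one row per character index. The NumPy sketch below illustrates only the lookup; the sizes and the random matrix are assumptions for the example, and no training is involved.

```python
import numpy as np

toy_vocab_size, toy_embedding_dim = 5, 3  # hypothetical sizes
rng = np.random.default_rng(0)

# One row per character index; in a real model these rows are learned
embedding_matrix = rng.normal(size=(toy_vocab_size, toy_embedding_dim))

# A batch of 2 sequences, each holding 4 character indices
toy_batch = np.array([[0, 1, 2, 3],
                      [4, 4, 0, 1]])

# Integer indexing performs the lookup for every position at once
embedded = embedding_matrix[toy_batch]
print(embedded.shape)  # (batch, time steps, embedding dim)
```

The lookup only depends on the vocabulary size and the embedding dimension, which is why the sequence length does not have to be fixed up front.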

**Define model hyperparameters**

In [0]:
dropout_rate = 0.3
hidden_dim = 100

dropout = keras.layers.Dropout(rate=dropout_rate)

input_dim = vocab_size
lstm_1_out_dim = 256
output_dim = vocab_size

**Define model as keras sequential**

In [0]:
model = keras.models.Sequential()

**Define the embedding layer**

You could define the **optional** embedding layer

In [0]:
embedding_layer = keras.layers.Embedding(input_dim,
                                         hidden_dim)

model.add(embedding_layer)

**Define LSTM layers**

You can define as many LSTM layers as you want. Remember that intermediate LSTM layers should return full sequences, not just the last output (you can specify this with the parameter *return_sequences*). You can also add dropout after the LSTM layers.

In [0]:
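# CuDNNLSTM is the GPU-only TF 1.x layer; in TF 2.x, keras.layers.LSTM picks the cuDNN kernel automatically
lstm1 = keras.layers.CuDNNLSTM(lstm_1_out_dim, return_sequences=True)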
lstm2 = keras.layers.CuDNNLSTM(output_dim, return_sequences=False)
model.add(lstm1)
model.add(dropout)
model.add(lstm2)
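The reason intermediate layers must return full sequences can be seen from the shapes alone. The sketch below stacks two recurrent layers in NumPy, using a simplified tanh cell instead of a real LSTM; `make_layer`, `run_layer` and all sizes are hypothetical helpers written for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(in_dim, units):
    """Random weights for a simplified tanh recurrent layer (not a full LSTM)."""
    return (rng.normal(scale=0.1, size=(in_dim, units)),
            rng.normal(scale=0.1, size=(units, units)))

def run_layer(weights, inputs, return_sequences):
    """Run the layer over inputs of shape (time_steps, in_dim)."""
    W_x, W_h = weights
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h)
        states.append(h)
    # Either every hidden state (one per time step) or only the last one
    return np.stack(states) if return_sequences else h

toy_inputs = rng.normal(size=(20, 8))  # 20 time steps, 8 features each
layer1, layer2 = make_layer(8, 16), make_layer(16, 4)

seq_out = run_layer(layer1, toy_inputs, return_sequences=True)
last_out = run_layer(layer2, seq_out, return_sequences=False)
print(seq_out.shape, last_out.shape)
```

The second layer iterates over time steps, so it needs the first layer's output for every step; only the last layer can collapse the sequence into a single vector for the classifier.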

**Define dense classification layer**

In [0]:
out = keras.layers.Dense(output_dim, activation='softmax')
model.add(out)

**Compile the model**

In [0]:
model.compile(loss='sparse_categorical_crossentropy', 
              optimizer='adam',
              metrics=['accuracy'])
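The `sparse_categorical_crossentropy` loss fits here because `y` holds integer character indices rather than one-hot vectors. The following NumPy sketch, with made-up probabilities and targets, shows that the sparse form gives the same value as the one-hot form.

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples over a 3-character vocabulary
toy_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.3, 0.6]])
toy_targets = np.array([0, 2])  # integer class indices, like y in this notebook

# Sparse form: pick the probability assigned to the true class directly
sparse_loss = -np.log(toy_probs[np.arange(len(toy_targets)), toy_targets])

# Dense form: same quantity, but it needs one-hot encoded targets
toy_one_hot = np.eye(toy_probs.shape[1])[toy_targets]
dense_loss = -(toy_one_hot * np.log(toy_probs)).sum(axis=1)

print(sparse_loss.mean(), dense_loss.mean())
```

With the sparse form there is no need to materialize a one-hot target matrix of shape (number of samples, vocabulary size).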

**Check the model summary**

In [0]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, None, 100)         6600      
_________________________________________________________________
cu_dnnlstm_10 (CuDNNLSTM)    (None, None, 256)         366592    
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 256)         0         
_________________________________________________________________
cu_dnnlstm_11 (CuDNNLSTM)    (None, 66)                85536     
_________________________________________________________________
dense_2 (Dense)              (None, 66)                4422      
Total params: 463,150
Trainable params: 463,150
Non-trainable params: 0
_________________________________________________________________
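The parameter counts in the summary can be reproduced by hand. The sketch below assumes the cuDNN convention of keeping two bias vectors per gate (one for the input weights and one for the recurrent weights), which is consistent with the numbers reported above; the helper functions are written for this example and are not part of Keras.

```python
def embedding_params(vocab, dim):
    # One embedding vector per vocabulary entry
    return vocab * dim

def cudnn_lstm_params(in_dim, units):
    # 4 gates, each with input weights, recurrent weights and two bias vectors
    return 4 * (in_dim * units + units * units + 2 * units)

def dense_params(in_dim, units):
    # Weight matrix plus one bias per output unit
    return in_dim * units + units

counts = [
    embedding_params(66, 100),    # embedding layer
    cudnn_lstm_params(100, 256),  # first LSTM layer
    cudnn_lstm_params(256, 66),   # second LSTM layer
    dense_params(66, 66),         # output layer
]
print(counts, sum(counts))
```

The four values come out as 6600, 366592, 85536 and 4422, which sum to the 463,150 total shown in the summary.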


## Define sampling function

Define a function that takes the model and generates a string by repeatedly sampling the next character from the model's predicted probabilities.

In [0]:
def sample_string(model, seq_len, input_sequence=None):
    # Seed with a random character from the training data unless a prefix is given
    if input_sequence is None:
        generated_sequence = ix_to_char[x[np.random.choice(len(x)), 0]]
    else:
        generated_sequence = input_sequence

    for _ in range(seq_len):
        # Encode everything generated so far as a batch containing one index sequence
        seq_vector = np.array([[char_to_ix[c] for c in generated_sequence]])
        # predict() returns the softmax probabilities; predict_proba() was removed in newer Keras
        char_proba = model.predict(seq_vector, verbose=0).flatten().astype(np.float64)
        char_proba /= char_proba.sum()  # guard against float32 rounding in np.random.choice
        predicted_char = ix_to_char[np.random.choice(vocab_size, p=char_proba)]
        generated_sequence += predicted_char

    return generated_sequence
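Sampling from the predicted probabilities, rather than always taking the most likely character, is what keeps the generated text varied. The sketch below contrasts the two on a made-up probability vector; `toy_proba` is an assumption for the example and does not come from the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_proba = np.array([0.6, 0.3, 0.1])  # hypothetical model output over 3 characters

# Greedy decoding: always the same index for the same probabilities
greedy = int(np.argmax(toy_proba))

# Sampling: each index is drawn in proportion to its probability
draws = rng.choice(len(toy_proba), size=10_000, p=toy_proba)
freqs = np.bincount(draws, minlength=len(toy_proba)) / len(draws)

print(greedy, freqs)
```

Over many draws the empirical frequencies approach `toy_proba`, so less likely characters still appear occasionally, whereas greedy decoding would never produce them.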

## Train model

In [0]:
sample_len = 100

generated_string = sample_string(model, sample_len)
print("Sample string before training: '%s'" % generated_string)

for epoch in range(100):
    model.fit(x, y, batch_size=256, epochs=1)
    generated_string = sample_string(model, sample_len)
    print("Sample string after epoch %d: '%s'" % (epoch, generated_string))

Sample string before training: ' cdk KbJsHLy bqXdV"IaLrUDFMLYoYWJ\"l?SPoAKWh:ImoTJ U&QjX'-uCYglZlaLaQ\U';.s$B$TZ,tLlpdZkZBu"BI'ZhHe, '
Sample string after epoch 0: 'evi; su fome kaam, hit aroralsl thas go sid sveor heunson'r\r\nWraiuide hif me tuechyret' rum e wo sr'
Sample string after epoch 1: ' D:ivind have hich aw elker.\r\n\r\nPhanord'clet mendred now the, he gourlawe my mens tremy in math d'
Sample string after epoch 2: ' nLoals syors lom;\r\nRoorse: and of the enll,\r\nO ctent tipe, and to bust ty shalkengy jiam onout, '
Sample string after epoch 3: '''VA OIN:\r\nOl a youss, ar-o shore aurst.\r\nThe tive yame unme Conturves,\r\nAnd hard favet prierd '
Sample string after epoch 4: 'e'lle bamas unqueen at then,\r\nAnd cition you his af that tufing, last:\r\nIn crest must death sheer'
Sample string after epoch 5: 'thaif-ank you, hear his hither the can preace have\r\nMy reneaves Korger.\r\nGuve men, no redues and '
Sample string after epoch 6: 'o'sers\r\nWitchen I'll and well?\r\n\

KeyboardInterrupt: ignored

# Image sources

Images and code fragments used in this notebook come from the following web pages and papers:

1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
2. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
3. https://pt.slideshare.net/ssuser6c624f/understanding-rnn-and-lstm
4. [BERT paper](https://arxiv.org/pdf/1810.04805.pdf)
5. https://github.com/JY-H/character-level-rnn/blob/master/src/character_level.py