

# Recurrent Neural Networks and Long Short-Term Memory (LSTM)

![Monkey at a typewriter](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/Chimpanzee_seated_at_typewriter.jpg/603px-Chimpanzee_seated_at_typewriter.jpg)

It is said that [infinite monkeys typing for an infinite amount of time](https://en.wikipedia.org/wiki/Infinite_monkey_theorem) will eventually type, among other things, the complete works of William Shakespeare. Let's see if we can get there a bit faster, with the power of Recurrent Neural Networks and LSTM.

This text file contains the complete works of Shakespeare: https://www.gutenberg.org/files/100/100-0.txt

Use it as training data for an RNN. You can keep it simple and train at the character level, which is the suggested initial approach.

Then, use that trained RNN to generate Shakespearean-ish text. Your goal: a function that takes, as an argument, the size of text to generate (e.g. number of characters or lines) and returns generated text of that size.

Note - Shakespeare wrote an awful lot. It's OK, especially initially, to sample/use smaller data and parameters, so you can have a tighter feedback loop when you're trying to get things running. Then, once you've got a proof of concept - start pushing it more!

In [None]:
from tensorflow.keras.callbacks import LambdaCallback
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import get_file

import numpy as np
import random
import sys
import io

In [None]:
#https://www.gnu.org/software/sed/manual/sed.html
#http://www.gutenberg.org/ebooks/100
#http://queirozf.com/entries/sed-examples-search-and-replace-on-linux
#https://www.quora.com/What-are-differences-between-update-rules-like-AdaDelta-RMSProp-AdaGrad-and-AdaM?share=1
#https://stackoverflow.com/questions/46437761/codecs-openutf-8-fails-to-read-plain-ascii-file/46438434#46438434

In [None]:
# Download the corpus from its URL and load it as lowercase text

path = get_file('100-0.txt', origin='https://www.gutenberg.org/files/100/100-0.txt')

with io.open(path, encoding='utf-8') as f:
  text = f.read().lower()
  print('corpus length', len(text))

Downloading data from https://www.gutenberg.org/files/100/100-0.txt
corpus length 5573152


In [None]:
#Unique Chars as list
chars = sorted(list(set(text)))
print('Total Unique Chars:', len(chars))

Total Unique Chars: 79


In [None]:
# Lookup tables: character -> index and index -> character
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# Create the sequence data
maxlen = 40  # length of each input sequence, in characters
step = 2     # stride between the starts of consecutive sequences

sentences = []   # each element is maxlen (40) characters long
next_chars = []  # the character that follows each sequence (the target)

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])

print('sequences:', len(sentences))

# Create X and y as one-hot arrays.
# np.bool was removed in NumPy 1.24; the builtin bool works on all versions.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print('Vectorization')
x.shape, y.shape

sequences: 2786556
Vectorization


((2786556, 40, 79), (2786556, 79))
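The one-hot arrays above are large: `x` alone holds 2,786,556 x 40 x 79 ≈ 8.8 billion booleans, roughly 8.8 GB at one byte each, which is one reason the introduction suggests starting with a smaller sample. One alternative, shown here as a minimal sketch, keeps only the integer-encoded text in memory and one-hot encodes each batch on the fly. `one_hot_batches` is a hypothetical helper (not part of Keras), and the sketch assumes Keras's `model.fit` is given this Python generator together with `steps_per_epoch`.

```python
import numpy as np

def one_hot_batches(text, char_indices, maxlen=40, step=2, batch_size=1024):
    """Hypothetical helper: yield (x, y) one-hot batches without building the full arrays.

    Uses the same windows as the notebook: sequences start every `step`
    characters, are `maxlen` long, and the target is the next character.
    """
    encoded = np.array([char_indices[c] for c in text], dtype=np.int64)
    n_chars = len(char_indices)
    starts = np.arange(0, len(encoded) - maxlen, step)
    eye = np.eye(n_chars, dtype=np.float32)  # row k is the one-hot vector for index k
    while True:  # infinite generator; Keras stops after steps_per_epoch batches
        np.random.shuffle(starts)  # new sequence order every pass
        for b in range(0, len(starts), batch_size):
            batch_starts = starts[b:b + batch_size]
            # windows[i, t] == encoded[batch_starts[i] + t]
            windows = encoded[batch_starts[:, None] + np.arange(maxlen)]
            targets = encoded[batch_starts + maxlen]
            yield eye[windows], eye[targets]  # shapes (B, maxlen, n_chars), (B, n_chars)
```

With the notebook's variables, usage would look like `model.fit(one_hot_batches(text, char_indices, maxlen, step, 1024), steps_per_epoch=int(np.ceil(len(range(0, len(text) - maxlen, step)) / 1024)), epochs=10, callbacks=[print_callback])`. Each batch of 1024 sequences costs about 13 MB as float32, instead of materializing every sequence up front.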

## RNN/LSTM Text Generation with Keras
RMSprop keeps a moving average of squared gradients and divides each gradient by the square root of that average, normalizing its scale. This balances the step size: the step shrinks for large gradients, which helps avoid exploding updates, and grows for small gradients, which helps counter vanishing ones.
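The normalization described above can be seen in a minimal NumPy sketch of one plain RMSprop step (no momentum, not the centered variant). `rmsprop_step` is a hypothetical helper; `rho=0.9` and `eps=1e-7` follow Keras's defaults for `RMSprop`, while `lr=0.01` matches the learning rate used in this notebook. Details such as exactly where epsilon enters may differ from Keras's implementation.

```python
import numpy as np

def rmsprop_step(param, grad, v, lr=0.01, rho=0.9, eps=1e-7):
    """Hypothetical helper: one plain RMSprop update.

    v is the running average of squared gradients, carried between steps.
    Returns the updated parameter and the updated running average.
    """
    v = rho * v + (1.0 - rho) * grad ** 2            # moving average of squared gradients
    param = param - lr * grad / (np.sqrt(v) + eps)   # gradient divided by its running RMS
    return param, v

# Starting from v = 0, compare the step taken for a large and a small gradient.
p_large, _ = rmsprop_step(np.array(0.0), np.array(100.0), np.array(0.0))
p_small, _ = rmsprop_step(np.array(0.0), np.array(0.01), np.array(0.0))
print(abs(p_large), abs(p_small))  # nearly equal despite a 10,000x gap in gradient size
```

Both steps come out close to `lr / sqrt(1 - rho)`, about 0.0316, so the optimizer's step size depends far less on the raw gradient magnitude than plain gradient descent would.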

In [None]:
# Build the model: a single LSTM layer followed by a softmax over the vocabulary

model = Sequential()

model.add(LSTM(128, input_shape=(maxlen, len(chars))))  # Long Short-Term Memory layer
model.add(Dense(len(chars), activation='softmax'))  # probability of each next character
optimizer = RMSprop(learning_rate=0.01)

# Compile with the RMSprop optimizer configured above
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128)               106496    
_________________________________________________________________
dense (Dense)                (None, 79)                10191     
Total params: 116,687
Trainable params: 116,687
Non-trainable params: 0
_________________________________________________________________
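The parameter counts in the summary follow from the layer shapes. A short check, assuming Keras's standard LSTM layout (four gates, each with an input kernel, a recurrent kernel and one bias vector) and a Dense layer with a bias; the helper names are hypothetical:

```python
def lstm_param_count(units, input_dim):
    # 4 gates (input, forget, cell candidate, output), each with an
    # input kernel (input_dim x units), a recurrent kernel (units x units) and a bias (units)
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # weight matrix plus one bias per output unit
    return input_dim * units + units

vocab_size = 79   # "Total Unique Chars" printed earlier
lstm_units = 128

lstm_params = lstm_param_count(lstm_units, vocab_size)
dense_params = dense_param_count(lstm_units, vocab_size)
print(lstm_params, dense_params, lstm_params + dense_params)  # 106496 10191 116687
```

Most of the LSTM's weights sit in the recurrent kernels (4 x 128 x 128 = 65,536), so widening the layer grows the parameter count roughly quadratically.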


# Helper Functions

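The model returns a probability distribution over the next character. Rather than always taking the maximum (the single most likely character), `sample` draws one index at random in proportion to those probabilities, which adds variety to the generated text.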

In [None]:
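# Helper function to sample a character index from a probability array.
# temperature < 1 sharpens the distribution, temperature > 1 flattens it,
# and 1.0 leaves it unchanged.
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)  # re-normalize to a probability distribution
    probas = np.random.multinomial(1, preds, 1)  # one draw, returned as a one-hot vector
    return np.argmax(probas)  # index of the drawn character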
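Dividing the log-probabilities by a constant and re-normalizing, as `sample` does, is the usual *temperature* trick for text generation. A small sketch of its effect on a toy three-character distribution (the numbers and the `reweight` helper are hypothetical, for illustration only):

```python
import numpy as np

def reweight(probs, temperature=1.0):
    """Hypothetical helper: re-normalize probs after dividing log-probabilities by temperature."""
    logits = np.log(np.asarray(probs, dtype=np.float64)) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

next_char_probs = [0.5, 0.3, 0.2]  # toy distribution over three characters
for t in (0.5, 1.0, 2.0):
    print(t, np.round(reweight(next_char_probs, t), 3))
```

At 1.0 the distribution is unchanged; at 0.5 the top character's share rises to about 0.66, and at 2.0 it falls to about 0.42. Low temperatures give safer but more repetitive text; high temperatures give more surprising text with more spelling errors.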

In [None]:
# Function invoked at the end of each epoch; prints generated text.
def on_epoch_end(epoch, _):
       
    print()
    print('----- Generating text after Epoch: %d' % epoch)
    
    start_index = random.randint(0, len(text) - maxlen - 1)
    
    generated = ''
    
    sentence = text[start_index: start_index + maxlen]
    generated += sentence
    
    print('----- Generating with seed: "' + sentence + '"')
    sys.stdout.write(generated)
    
    for i in range(400):
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1
            
        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds)
        next_char = indices_char[next_index]
        
        sentence = sentence[1:] + next_char
        
        sys.stdout.write(next_char)
        sys.stdout.flush()
    print()


print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

# Fit the Model

In [None]:
#Fit Model:

model.fit(x, y,
          batch_size=1024,
          epochs=10,
          callbacks=[print_callback])

Epoch 1/10
----- Generating text after Epoch: 0
----- Generating with seed: ",
while you perform your antic round;
th"
,
while you perform your antic round;
thouch me they, fors thous
  me oe to mim it your my ho. opt monce poretr’t but asarn,
    feer wear
  a juki! sherest in i my fore nkere it my thou in wather besteblont bup ow af erofrigh the se, and the berspenithing
     i hath soughind
      exseect dolturgs’s you to will mengen more ofrantige. u tit list the rokes.
  horpusco.
wall hol of ar thee;

canes, aighis. ay uthes matile;

forfes loath 
Epoch 2/10
----- Generating text after Epoch: 1
----- Generating with seed: "the blood.

enter a messenger.

messenge"
the blood.

enter a messenger.

messengeds.
what his tait head be gond vereme?
   rightles, alstet make arighte'd
the nay but in heal achingh ut to frimes.


a, sirord.
is sun, wherland! or is tome, i knem my fricn.

upsteagionts.
gook'd thy. af hasle the not nay, buk-monesset lisio]
   aplect oniess of path me in your 

<tensorflow.python.keras.callbacks.History at 0x7f0f863a48d0>
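The goal stated at the top is a function that takes the size of the text to generate and *returns* that text; `on_epoch_end` only prints it. A minimal sketch, assuming size is measured in characters (not lines), that the seed is exactly `maxlen` characters from the training vocabulary, and that the model behaves like the Keras model above. `generate_text` and its parameters are hypothetical names, and it uses NumPy's `Generator.choice` for sampling instead of the notebook's `sample` helper so it can stand alone.

```python
import numpy as np

def generate_text(model, seed, length, char_indices, indices_char,
                  temperature=1.0, rng=None):
    """Hypothetical helper: return `length` new characters generated after `seed`.

    Assumes model.predict(x, verbose=0) takes a one-hot array of shape
    (1, len(seed), len(char_indices)) and returns next-character
    probabilities of shape (1, len(char_indices)).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_chars = len(char_indices)
    window = seed
    generated = []
    for _ in range(length):
        # One-hot encode the current window of characters
        x_pred = np.zeros((1, len(window), n_chars), dtype=np.float32)
        for t, ch in enumerate(window):
            x_pred[0, t, char_indices[ch]] = 1.0
        preds = np.asarray(model.predict(x_pred, verbose=0)[0], dtype=np.float64)
        # Temperature sampling; the small constant avoids log(0)
        logits = np.log(preds + 1e-12) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        next_char = indices_char[int(rng.choice(n_chars, p=probs))]
        generated.append(next_char)
        window = window[1:] + next_char  # slide the window forward by one character
    return ''.join(generated)
```

With the trained model, a call might look like `generate_text(model, text[start:start + maxlen], 500, char_indices, indices_char, temperature=0.5)` for some valid `start` index. Generating a number of lines instead would mean looping until the desired count of `'\n'` characters has been produced.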

# Resources and Stretch Goals


- Refine the training and generation of text to be able to ask for different genres/styles of Shakespearean text (e.g. plays versus sonnets)
- Train a classification model that takes text and returns which work of Shakespeare it is most likely to be from
- Make it more performant! Many possible routes here - lean on Keras, optimize the code, and/or use more resources (AWS, etc.)
- Revisit the news example from class, and improve it - use categories or tags to refine the model/generation, or train a news classifier
- Run on bigger, better data

## Resources:
- [The Unreasonable Effectiveness of Recurrent Neural Networks](https://karpathy.github.io/2015/05/21/rnn-effectiveness/) - a seminal writeup demonstrating a simple but effective character-level NLP RNN
- [Simple NumPy implementation of RNN](https://github.com/JY-Yoon/RNN-Implementation-using-NumPy/blob/master/RNN%20Implementation%20using%20NumPy.ipynb) - Python 3 version of the code from "Unreasonable Effectiveness"
- [TensorFlow RNN Tutorial](https://github.com/tensorflow/models/tree/master/tutorials/rnn) - code for training a RNN on the Penn Tree Bank language dataset
- [4 part tutorial on RNN](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/) - relates RNN to the vanishing gradient problem, and provides example implementation
- [RNN training tips and tricks](https://github.com/karpathy/char-rnn#tips-and-tricks) - some rules of thumb for parameterizing and training your RNN