## 1. Frame the problem 
In this tutorial, we will learn to build a simple recurrent neural network (RNN) using Keras. We assume that you already installed Keras while working on the previous CNN project. For more details about the API, please refer to https://keras.io/
First, we import some Python packages.


In [1]:
from __future__ import print_function                  # Allows for python3 printing
import keras
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, LSTM, Activation
from keras.callbacks import ModelCheckpoint
from keras.layers import Embedding
from keras.preprocessing import sequence
from keras import optimizers
import numpy as np
import sys

Using TensorFlow backend.


Keras has two kinds of models. The Sequential model is a linear stack of layers and is the simpler of the two. The other is the Model class used with the functional API, which can build more complex models. In project 1 we learned how to use the Sequential model, and in project 2 we used the functional API. Here we go back to the Sequential model for demonstration purposes.

## 2. Get the data
Here we use the text file in this GitHub repo. The raw text was downloaded from https://archive.org/details/shakespearessonn01041gut

In [2]:
filename = "sonnet.txt"
# read the file and make all characters lowercase
with open(filename) as f:
    text = f.read().lower()

We convert all characters to lowercase so that the model does not have to distinguish between upper- and lowercase letters, which keeps the set of characters small.

## 3. Explore the data

In this section, we want to gather more information about our data. We are interested in, among other things, the total length of the text (how many characters are in the file) and the number of unique characters used (which will be the 26 letters and a few symbols).

In [3]:
# summarize the loaded data
print('text length:', len(text))    # this prints the length of the text 

chars = sorted(set(text))    # sorted() returns a new list of the unique characters in ascending order
print('total chars:', len(chars))
print(chars)
# The total printed above should be 38.

text length: 95690
total chars: 38
['\n', ' ', '!', "'", '(', ')', ',', '-', '.', ':', ';', '?', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
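Beyond the two summary numbers above, it can also be useful to see how often each character occurs. The following is a minimal sketch on a short sample string (the names toy_text and toy_counts are made up for this illustration); the same call could be applied to the full text.

```python
from collections import Counter

# A short sample string standing in for the full corpus (illustrative only).
toy_text = "from fairest creatures we desire increase"

# Counter maps each distinct character to the number of times it occurs.
toy_counts = Counter(toy_text)

# The three most frequent characters, as (character, count) pairs.
most_common = toy_counts.most_common(3)
print(most_common)
print('distinct characters:', len(toy_counts))
```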


## 4. Prepare the data
First, we need to encode all of the characters in our corpus as numbers. Here we use the Python dictionary class, or dict, to convert efficiently between numbers and characters.

Recall that a dictionary is a collection of key-value pairs, where a key can be used to efficiently look up its associated value. Looking up a value by key takes constant time on average, whereas finding an item in a list requires scanning the list. Dictionary keys are also not restricted to consecutive integer positions, as list indices are.

my_dict = {'name': 'John', 1: [2, 4, 3]}
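As a quick illustration of the difference, the sketch below looks up values in the dictionary above by key and, for comparison, searches a list by value. The list named letters is made up for this example.

```python
my_dict = {'name': 'John', 1: [2, 4, 3]}

# Keys can be of different types; each lookup goes straight to the value.
name = my_dict['name']
numbers = my_dict[1]

# .get() returns a default instead of raising KeyError for a missing key.
missing = my_dict.get('age', 'unknown')

# With a list, finding the position of an item means scanning the list.
letters = ['a', 'b', 'c']
position = letters.index('c')

print(name, numbers, missing, position)
```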

In the following code we construct two dictionaries. One allows us to efficiently map from characters in our corpus to an index, and the other allows us to go from an index to the character at that index. 

In [4]:
# 4.1 encoding and decoding dictionaries
char_indices = {c: i for i, c in enumerate(chars)}   # character -> index
indices_char = {i: c for i, c in enumerate(chars)}   # index -> character

##### 4.1 encoding and decoding dictionary

'char_indices' is a dictionary that returns the index for any input character.
'indices_char' does the opposite, returning the character that corresponds to an input index.

char_indices['a'] = (index of 'a'), indices_char[n] = (character indexed by n)

We'll convert characters to an RNN-readable format with char_indices, and we'll convert our RNN's outputs back to characters with indices_char.
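To check that the two dictionaries really are inverses of each other, we can encode a string and decode it again. This is a small sketch on a sample string; the toy_ names are used only here so that the real dictionaries are left untouched.

```python
toy_text = "to be, or not to be"

# Build the vocabulary and both lookup tables exactly as in cell 4.1.
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode: characters -> indices.
encoded = [toy_char_indices[c] for c in toy_text]

# Decode: indices -> characters.
decoded = ''.join(toy_indices_char[i] for i in encoded)

print(encoded)
print(decoded)
```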

In [5]:
# 4.2 cut corpus into equal length sequences
maxlen = 40
step = 3

sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])               
    next_chars.append(text[i + maxlen])

x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)   # zeros with shape (number of sentences, 40, 38); np.bool was removed in NumPy 1.24
y = np.zeros((len(sentences), len(chars)), dtype=bool)           # zeros with shape (number of sentences, 38)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

##### 4.2 cut the corpus into equal length sequences
We also want to cut the text into sequences of equal length. (There are multiple ways to do this, but for simplicity we just use a fixed number of characters per sequence.) Recall that training an RNN unrolls the network over every step of every sequence and computes the gradient through each of those copies. Thus the longer the sequences, or the more sequences we have, the longer it takes to train.

We choose 40 characters for each sequence and set the step size to 3, which reduces the total number of sequences by a factor of 3 compared with a step size of 1. If we chose a step size of 1, our dataset would be composed of every possible 40-character window of the corpus.
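The effect of the step size on the dataset size can be worked out directly from the loop bounds. The sketch below uses the text length printed in section 3 (95690); count_sequences is a helper name introduced only for this illustration.

```python
def count_sequences(text_length, maxlen, step):
    """Number of windows produced by range(0, text_length - maxlen, step)."""
    return len(range(0, text_length - maxlen, step))

text_length = 95690   # length of the corpus reported in section 3
window = 40           # characters per sequence (maxlen in the tutorial code)

n_step1 = count_sequences(text_length, window, 1)
n_step3 = count_sequences(text_length, window, 3)

print('step 1:', n_step1)   # 95650 sequences
print('step 3:', n_step3)   # 31884 sequences, roughly a third
```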

We want the input to our RNN to be a sequence, and the output to be the character that immediately follows that sequence. We can think of this as 'completing' a sequence.
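To make the windowing concrete, here is the same loop run on a short sample string with a window of 5 characters instead of 40. The toy_ names are used only for this illustration.

```python
toy_text = "shall i compare thee"
toy_maxlen = 5
toy_step = 3

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])   # the input window
    toy_next_chars.append(toy_text[i + toy_maxlen])     # the character that follows it

for s, c in zip(toy_sentences, toy_next_chars):
    print(repr(s), '->', repr(c))
```

Each printed pair is one training example: the window on the left is the input, and the single character on the right is the target.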

The last step is to one-hot encode both the input sequences and the target characters.
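The following sketch shows what one-hot encoding produces for a single example, using a made-up four-character vocabulary so that the arrays are small enough to read. All toy_ names are hypothetical.

```python
import numpy as np

toy_chars = ['a', 'b', 'c', 'd']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}

toy_sentence = "cab"   # input window of length 3
toy_next_char = "d"    # target character

# One row per time step, one column per character in the vocabulary.
x_toy = np.zeros((1, len(toy_sentence), len(toy_chars)), dtype=bool)
y_toy = np.zeros((1, len(toy_chars)), dtype=bool)

for t, char in enumerate(toy_sentence):
    x_toy[0, t, toy_char_indices[char]] = 1
y_toy[0, toy_char_indices[toy_next_char]] = 1

print(x_toy[0].astype(int))
print(y_toy[0].astype(int))

# argmax along the vocabulary axis recovers the original characters.
recovered = ''.join(toy_chars[i] for i in x_toy[0].argmax(axis=1))
print(recovered)
```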

## 5 Create the model
In this section, we will create a very basic recurrent neural network with a single recurrent layer (an LSTM layer in the code below). The input shape is the shape of one input sequence in our training set, given by the second and third dimensions of x: 40 by 38 (each sequence is 40 characters long, and after one-hot encoding every character is a vector of length 38). The number of units in the recurrent layer is user-defined; here we use 64.

We also need a fully connected layer for classification (predicting the next character). We feed the output of the recurrent layer into this fully connected layer, which is our output layer: a 38-unit softmax layer whose units correspond to the 38 available characters.
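To get a feel for the size of this model, we can count its trainable parameters by hand. The sketch below assumes the standard LSTM formulation with four gate blocks, each with input weights, recurrent weights and a bias, and a Dense layer with a bias term; the result can be compared against the output of model.summary() once the model is built.

```python
units = 64        # number of LSTM units chosen in this tutorial
input_dim = 38    # length of each one-hot character vector
n_classes = 38    # size of the softmax output layer

# Each of the 4 gate blocks has an input kernel, a recurrent kernel and a bias.
lstm_params = 4 * (units * (input_dim + units) + units)

# Fully connected layer: one weight per (unit, class) pair plus one bias per class.
dense_params = units * n_classes + n_classes

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)   # 26368 2470 28838
```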

Finally, we use **model.compile** to configure the model. We use the categorical cross entropy loss function, owing to the mutually exclusive nature of the classes we're predicting. For the optimizer we pick RMSprop, a somewhat arbitrary choice (you can try other optimizers to see whether the result improves).
For details of other optimizers check [Keras: optimizers](https://keras.io/optimizers/).
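To see what the softmax output and the categorical cross entropy loss compute for a single example, here is a minimal pure-Python sketch with three classes. The functions softmax and categorical_crossentropy below are written for this illustration only; the Keras implementation operates on batches and differs in numerical details.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t)

toy_scores = [2.0, 1.0, 0.1]       # made-up raw outputs for three classes
toy_probs = softmax(toy_scores)

# The loss is small when the true class received a high probability...
loss_correct = categorical_crossentropy([1, 0, 0], toy_probs)
# ...and large when the true class received a low probability.
loss_wrong = categorical_crossentropy([0, 0, 1], toy_probs)

print(toy_probs)
print(loss_correct, loss_wrong)
```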

In [11]:
model = Sequential()
model.add(LSTM(64, input_shape=(x.shape[1], x.shape[2])))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=optimizers.RMSprop())

## 6 Train the model and predict the output
Here we train our model by calling **.fit**.

We use a for-loop so that we can generate text at regular intervals during training (specifically, after every ten epochs).

To generate text, we randomly choose a 40-character seed sequence from the corpus and feed it into the model. The returned character is the model's prediction for the next character. We then drop the first character of the sequence, append the predicted character, and feed the result back into the model.
RNNs can be run in this way for arbitrarily many steps after an input sequence, and we do this to generate 400 characters after the seed.
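The sliding-window logic of the generation loop can be tried out without a trained network. In the sketch below, toy_predict is a hypothetical stand-in for model.predict that always favours the character following the last one in a three-letter alphabet; everything else mirrors the loop used in the training cell.

```python
toy_chars = ['a', 'b', 'c']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

def toy_predict(window):
    """Hypothetical stand-in for model.predict: returns one probability per character."""
    last = toy_char_indices[window[-1]]
    probs = [0.1] * len(toy_chars)
    probs[(last + 1) % len(toy_chars)] = 0.8   # favour the "next" letter, wrapping around
    return probs

window = "ab"      # seed sequence
generated = ""
for _ in range(5):
    probs = toy_predict(window)
    next_index = max(range(len(probs)), key=probs.__getitem__)   # index of the highest probability
    next_char = toy_indices_char[next_index]
    window = window[1:] + next_char    # drop the oldest character, append the prediction
    generated += next_char

print(generated)
```

Note that the window always keeps the same length, which is what lets us feed it back into a model with a fixed input shape.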

In [12]:
for iteration in range(1, 11):
    model.fit(x, y,
              batch_size=128,
              epochs=10)
    
    # generating text
    print('\nIteration', iteration)
    start_index = np.random.randint(0, len(text) - maxlen - 1)
    sentence = text[start_index: start_index + maxlen]


    for i in range(400):
        x_pred = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(sentence):
            x_pred[0, t, char_indices[char]] = 1.

        preds = model.predict(x_pred, verbose=0)[0]      # predicted softmax probabilities, one per character in chars
        next_index = np.argmax(preds)                    # we want the index with the highest probability
        next_char = indices_char[next_index]             # convert the index back to a character using the dictionary
        sentence = sentence[1:] + next_char              # drop the first character and append the predicted one, keeping the length at maxlen

        sys.stdout.write(next_char)                
        sys.stdout.flush()
    print('\n')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Iteration 1
 the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the seare,
and the hath the hath the hath the

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Iteration 2
the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with the with

Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Iteration 7
row the world deay,
  see the sterseres sube all the will shee strengess peart,
and their thee the pood do have the sor fars,
and all their their thee shell shall thee greaser's with,
and thoughts that thou as me thou art for whill sporters says
the love the pood that i me beauty shill,
o that i whit still that thou shel strent
or prids speaking the less to the selfal aupless seat
so beauty's sume

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Iteration 8
ve's be,
that i love to the self that thought
in the world desplenieng ele,
that enould not my be thee shee some in thee,
  and though i mand thee thou astand glows,
which i same grow the would do prosse.

cxxii

if that thou shal the with on that forth doth prease,
and thee their sid can on the eaven that sure steet
or prom my heart doth some in the remore,
and thou astand with that thou art my b


We see that the loss decreases over the course of training, and the generated text improves with each iteration. The result is very impressive given such a simple model. Note that the recurrent layer used above is already an LSTM, an improved variant of the basic RNN; generally speaking, such variants give much better results than a plain RNN layer. We leave the process of improving the model further up to the student.