# Chapter 16 Modelling Sequential Data Using Recurrent Neural Networks

### In this chapter, we will cover
- Introducing sequential data
- RNNs for modelling sequences
- Long Short-Term Memory (LSTM)
- Truncated Backpropagation Through Time (T-BPTT)
- Implementing a multilayer RNN for sequential modeling in TensorFlow
- Project 1: RNN sentiment analysis of the IMDb movie review dataset
- Project 2: RNN character-level language modeling with LSTM cells, using text data from Shakespeare's Hamlet
- Using gradient clipping to avoid exploding gradients

## Introducing sequential data
- Sequential data: elements in a sequence appear in a certain order
    - are not independent of each other
    - order matters
    - Unlike typical supervised learning algorithms, which assume the input data is IID (independent and identically distributed)
        - MLPs and CNNs cannot take the order of input samples into account
        - they have no memory of previously seen samples
            - weights are updated independently of the order in which the samples are processed
    - RNN:
        - designed for modeling sequences
        - capable of remembering past information and processing new events accordingly
        - Application: e.g. language translation, image captioning, text generation
    
    

## Categories of sequence modeling
- Many-to-one, e.g. sentiment analysis
     - input: a sequence, e.g. text
     - output: a fixed-size vector, not a sequence, e.g. a class label

- One-to-many, e.g. image captioning
    - input: a fixed-size vector, e.g. an image
    - output: a sequence, e.g. an English phrase

- Many-to-many, e.g. language translation (delayed) or video classification (synchronized)
    - input: a sequence, e.g. an English phrase
    - output: a sequence, e.g. a German phrase
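
At the level of array shapes, the difference between these categories can be sketched as follows. The random array stands in for the per-time-step outputs of an RNN, and all sizes are illustrative assumptions rather than values from the projects in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, T, hidden = 4, 10, 8          # illustrative sizes only

# Stand-in for per-time-step RNN outputs: (batch, time, hidden)
rnn_outputs = rng.normal(size=(batch_size, T, hidden))

# Many-to-one: keep only the last time step of every sequence
many_to_one = rnn_outputs[:, -1, :]

# Many-to-many (synchronized): keep one output per time step
many_to_many = rnn_outputs

print(many_to_one.shape)    # (4, 8)
print(many_to_many.shape)   # (4, 10, 8)
```

The many-to-one slice is the same idea as `lstm_outputs[:,-1]` in the sentiment model of Project 1.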












## RNN for modeling sequences

- layer 1:
    - the hidden layer: $h_1^{(t)}$
    - input from the data point: $x^{(t)}$
    - input from the hidden values in the same layer at the previous time step: $h_1^{(t-1)}$

- layer 2:
    - the hidden layer: $h_2^{(t)}$
    - input from $h_1^{(t)}$
    - input from its own hidden values at the previous time step: $h_2^{(t-1)}$
    


## Computing activations in an RNN

- $W_{xh}$: the weight matrix between the input $x^t$ and the hidden layer $h$
- $W_{hh}$: the weight matrix associated with the recurrent edge
- $W_{hy}$: the weight matrix between the hidden layer and the output layer
- $W_h = [W_{xh}; W_{hh}]$: the two weight matrices concatenated into one
- $b_h$: bias vector for the hidden units
- $z_h$: net input of the hidden layer
- $\phi_h(\cdot)$: the activation function of the hidden layer

Formula for computing hidden units:
$$ h^t = \phi_h(W_h[x^t; h^{t-1}]+b_h) $$

Activation of output units (with $\phi_y$ the activation function of the output layer):
$$ y^t = \phi_y(W_{hy}h^t+b_y) $$
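
The forward pass can be sketched in NumPy as below. This is a minimal illustration, not the chapter's TensorFlow implementation: the sizes, the random weights, the choice of `tanh` for the hidden activation, and the linear output are all assumptions. It also checks that applying $W_{xh}$ and $W_{hh}$ separately gives the same result as applying the concatenated matrix to the stacked vector of $x^t$ and $h^{t-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 3, 5, 2            # illustrative sizes

W_xh = rng.normal(size=(n_hidden, n_in))
W_hh = rng.normal(size=(n_hidden, n_hidden))
W_hy = rng.normal(size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)

def rnn_step(x_t, h_prev):
    """One time step: returns the new hidden state and the output."""
    z_h = W_xh @ x_t + W_hh @ h_prev + b_h   # net input of the hidden layer
    h_t = np.tanh(z_h)                        # hidden activation (assumed tanh)
    y_t = W_hy @ h_t + b_y                    # output (assumed linear)
    return h_t, y_t

# Concatenated form: the two matrices side by side, applied to [x_t; h_prev]
W_h = np.concatenate([W_xh, W_hh], axis=1)

x_seq = rng.normal(size=(4, n_in))            # a toy sequence of 4 time steps
h = np.zeros(n_hidden)
for x_t in x_seq:
    h_concat = np.tanh(W_h @ np.concatenate([x_t, h]) + b_h)
    h, y = rnn_step(x_t, h)
    assert np.allclose(h, h_concat)           # both forms agree

print(h.shape, y.shape)   # (5,) (2,)
```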

## The challenges of learning long-range interactions

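- Vanishing or exploding gradient problem: backpropagating through many time steps multiplies the gradient by the recurrent weight $w$ once per step (single-hidden-unit case)
    - Vanishing: $|w|<1$
    - Exploding: $|w|>1$
    - Ideal: $|w|=1$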
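
A quick numerical check makes the three cases concrete. The sketch assumes the simplified picture of a single hidden unit with a scalar recurrent weight `w`, so that the gradient picks up a factor of roughly `|w|` per time step; the number of steps is an arbitrary choice.

```python
steps = 100   # number of time steps the gradient flows back through (illustrative)

factors = {}
for w in (0.9, 1.0, 1.1):
    # A scalar recurrent weight contributes roughly |w|**steps to the gradient
    factors[w] = abs(w) ** steps
    print('w = %.1f -> |w|^%d = %.3e' % (w, steps, factors[w]))
```

For `w = 0.9` the factor shrinks towards zero, for `w = 1.1` it grows by several orders of magnitude, and only `w = 1.0` leaves the gradient unchanged.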

- Solutions:
    - Truncated backpropagation through time (TBPTT)
        - clip the gradients above a given threshold
        - limit the number of steps that gradient can effectively flow back and properly update the weights
    - Long short-term memory (LSTM)
        - Most popular
        - introduce a memory cell
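
Gradient clipping can be sketched in plain NumPy. The helper below is a hypothetical re-implementation of clipping by global norm, the variant that the Project 2 code applies through `tf.clip_by_global_norm`; the gradient values are made up for illustration.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most clip_norm.

    Hypothetical NumPy helper written for illustration only.
    """
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)   # equals 1 when no clipping is needed
    return [g * scale for g in grads], global_norm

toy_grads = [np.array([3.0, 4.0]), np.array([12.0])]   # joint norm = 13
clipped, norm_before = clip_by_global_norm(toy_grads, clip_norm=5.0)
norm_after = np.sqrt(sum(np.sum(g ** 2) for g in clipped))

print(norm_before, norm_after)
```

All gradients are scaled by the same factor, so the direction of the update is preserved and only its length is limited.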

## LSTM units

- Building block: a Memory Cell
- Cell state at the current time step $C^t$: the value associated with the recurrent edge
- 3 different types of gates:
    - Forget gate ($f_t$): allow the memory cell to reset the cell state without growing indefinitely
        - decide which information is allowed to go through and which information to suppress
        - $f_t = \sigma(W_{xf}x^t+W_{hf}h^{t-1}+b_f)$
    - Input gate ($i_t$) and input node ($g_t$): update the cell state
        - $i_t = \sigma(W_{xi}x^t+W_{hi}h^{t-1}+b_i)$
        - $g_t = \tanh(W_{xg}x^t+W_{hg}h^{t-1}+b_g)$
    - The cell state at time $t$:
        - $C^t = (C^{t-1} \bigodot f_t) \bigoplus (i_t \bigodot g_t)$
            - $\bigodot$: element wise product
            - $\bigoplus$: element wise summation
    - Output gate ($o_t$): decide how to update the values of hidden units:
        - $o_t = \sigma(W_{xo}x^t+W_{ho}h^{t-1}+b_o)$
    - Hidden units at the current time step:
        - $h^t = o_t \bigodot \tanh(C^t)$
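
One time step of an LSTM cell can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the sizes and random weights are made up, the helper names (`sigmoid`, `lstm_step`, `params`) are hypothetical, and the input node uses `tanh` as in the standard LSTM formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
n_in, n_hidden = 3, 4                       # illustrative sizes

# One (W_x, W_h, b) triple per gate/node: forget, input gate, input node, output
params = {name: (rng.normal(size=(n_hidden, n_in)),
                 rng.normal(size=(n_hidden, n_hidden)),
                 np.zeros(n_hidden))
          for name in ('f', 'i', 'g', 'o')}

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM time step: returns the new hidden state and cell state."""
    def net(name):
        W_x, W_h, b = params[name]
        return W_x @ x_t + W_h @ h_prev + b

    f_t = sigmoid(net('f'))             # forget gate
    i_t = sigmoid(net('i'))             # input gate
    g_t = np.tanh(net('g'))             # input node (candidate values)
    o_t = sigmoid(net('o'))             # output gate
    c_t = c_prev * f_t + i_t * g_t      # element-wise update of the cell state
    h_t = o_t * np.tanh(c_t)            # hidden units at the current time step
    return h_t, c_t

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(6, n_in)):      # a toy sequence of 6 time steps
    h, c = lstm_step(x_t, h, c)

print(h.shape, c.shape)   # (4,) (4,)
```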

## Implementing a multilayer RNN for sequence modeling in TensorFlow

### Project 1: Performing sentiment analysis of IMDb movie reviews using multilayer RNNs

In [1]:
## Import the necessary modules and read the data into a pandas DataFrame

import pyprind
import pandas as pd
from string import punctuation
import re
import numpy as np

df = pd.read_csv('movie_data.csv', encoding = 'utf-8')
print(df.head(3))

                                              review  sentiment
0  This tearful movie about a sister and her batt...          1
1  It's too kind to call this a "fictionalized" a...          0
2  Truly bad and easily the worst episode I have ...          0


In [2]:
## Preprocessing the data:
## Separate words and
## count each word's occurrence

from collections import Counter

counts = Counter()
pbar = pyprind.ProgBar(len(df['review']),
                       title = 'Counting words occurences')

for i, review in enumerate(df['review']):
    text = ''.join([c if c not in punctuation else ' ' +c+' '\
                    for c in review]).lower()
    df.loc[i, 'review'] = text
    pbar.update()
    counts.update(text.split())

Counting words occurences
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:05:26


In [3]:
## Create a mapping
## Map each unique word to an integer

word_counts = sorted(counts, key = counts.get, reverse = True)
print(word_counts[:5])
word_to_int = {word: ii for ii, word in enumerate(word_counts, 1)}

mapped_reviews = []
pbar = pyprind.ProgBar(len(df['review']),
                       title = 'Map reviews to ints')
for review in df['review']:
    mapped_reviews.append([word_to_int[word] for word in review.split()])
    pbar.update()

Map reviews to ints


['the', '.', ',', 'and', 'a']


0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02
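
To see what the two preprocessing cells do, the same steps can be run on a tiny made-up corpus. The `toy_` names and the two example reviews are assumptions for illustration; they are prefixed so that they do not overwrite the notebook's real variables.

```python
from collections import Counter
from string import punctuation

toy_reviews = ["Great movie, great cast!", "Bad movie."]   # made-up data

toy_counts = Counter()
toy_cleaned = []
for review in toy_reviews:
    # Surround punctuation with spaces so it becomes its own token, then lowercase
    text = ''.join(c if c not in punctuation else ' ' + c + ' '
                   for c in review).lower()
    toy_cleaned.append(text)
    toy_counts.update(text.split())

# Most frequent token gets index 1; index 0 stays free for padding
toy_word_counts = sorted(toy_counts, key=toy_counts.get, reverse=True)
toy_word_to_int = {word: ii for ii, word in enumerate(toy_word_counts, 1)}

toy_mapped = [[toy_word_to_int[w] for w in text.split()]
              for text in toy_cleaned]
print(toy_mapped)   # [[1, 2, 3, 1, 4, 5], [6, 2, 7]]
```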


In [6]:
## Define fixed-length sequences:
## Use the last 200 elements of each sequence
## if sequence length < 200: left-pad with zeros

sequence_length = 200 ## sequence length (or T in our formulas)

sequences = np.zeros((len(mapped_reviews),sequence_length),dtype=int)

for i, row in enumerate(mapped_reviews):
    review_arr = np.array(row)
    sequences[i,-len(row):] = review_arr[-sequence_length:]

X_train = sequences[:25000, :]
y_train = df.loc[:24999, 'sentiment'].values  # .loc slicing is inclusive: rows 0..24999 give 25000 labels
X_test = sequences[25000:, :]
y_test = df.loc[25000:, 'sentiment'].values

np.random.seed(123) # for reproducibility

## Function to generate minibatches:
def create_batch_generator(x, y=None, batch_size=64):
    n_batches = len(x)//batch_size
    x = x[: n_batches * batch_size]
    if y is not None:
        y = y[:n_batches*batch_size]
    for ii in range(0,len(x),batch_size):
        if y is not None:
            yield x[ii:ii+batch_size], y[ii:ii+batch_size]
        else: 
            yield x[ii:ii+batch_size]
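
The left-padding and the batching logic can be checked on toy data. The sketch below repeats the same slicing on two made-up sequences with a target length of 5, and uses a simplified, hypothetical `toy_batch_generator` (inputs only) to show that samples which do not fill a complete batch are dropped.

```python
import numpy as np

toy_mapped = [[5, 2, 9], [1, 2, 3, 4, 5, 6, 7]]   # made-up integer sequences
toy_len = 5                                        # illustrative target length

toy_sequences = np.zeros((len(toy_mapped), toy_len), dtype=int)
for i, row in enumerate(toy_mapped):
    row_arr = np.array(row)
    # Short rows are left-padded with zeros; long rows keep their last toy_len items
    toy_sequences[i, -len(row):] = row_arr[-toy_len:]

print(toy_sequences)
# [[0 0 5 2 9]
#  [3 4 5 6 7]]

def toy_batch_generator(x, batch_size):
    n_batches = len(x) // batch_size
    x = x[:n_batches * batch_size]        # drop the incomplete final batch
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii + batch_size]

toy_batches = list(toy_batch_generator(np.arange(10), batch_size=4))
print(len(toy_batches))   # 2
```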


### Embedding
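
An embedding layer maps each integer word index to a trainable real-valued vector, so the network receives a dense matrix instead of very large one-hot vectors. As a rough NumPy sketch of the lookup that `tf.nn.embedding_lookup` performs in the `build` method, with made-up sizes and indices:

```python
import numpy as np

rng = np.random.default_rng(3)
toy_n_words, toy_embed_size = 6, 4        # illustrative sizes

# One row of real-valued features per word index, initialised uniformly in [-1, 1)
toy_embedding = rng.uniform(-1, 1, size=(toy_n_words, toy_embed_size))

toy_batch = np.array([[0, 0, 3, 1],
                      [2, 5, 5, 4]])      # (batch, seq_len) of word indices

# Integer indexing selects the matching row for every index in the batch
toy_embedded = toy_embedding[toy_batch]
print(toy_embedded.shape)   # (2, 4, 4)
```

In the real model the embedding matrix is a `tf.Variable`, so its rows are learned together with the other weights.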

### Build the RNN model

In [23]:
import tensorflow as tf

class SentimentRNN(object):
    def __init__(self, n_words, seq_len=200,
                 lstm_size = 256, num_layers = 1, batch_size = 64,
                 learning_rate = 0.0001, embed_size = 200):
        # creating the embedding layer
        self.n_words = n_words # number of unique words, plus 1 because zero is used to pad sequences shorter than seq_len
        self.embed_size = embed_size

        self.seq_len = seq_len # length of each (padded) input sequence
        self.lstm_size = lstm_size ## number of hidden units in each RNN layer
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        
        
        self.g = tf.Graph()
        with self.g.as_default():
            tf.set_random_seed(123)
            self.build()
            self.saver = tf.train.Saver()
            self.init_op = tf.global_variables_initializer()
            
    def build(self):
        ## Define the placeholders
        tf_x = tf.placeholder(tf.int32,
                              shape = (self.batch_size, self.seq_len),
                              name = 'tf_x')
        tf_y = tf.placeholder(tf.float32,
                              shape = (self.batch_size,),
                              name = 'tf_y')
        tf_keepprob = tf.placeholder(tf.float32,
                                     name = 'tf_keepprob')
        
        ## Create the embedding layer
        embedding = tf.Variable(
                    tf.random_uniform(
                        (self.n_words, self.embed_size),
                        minval = -1, maxval = 1),
                    name = 'embedding')
        embed_x = tf.nn.embedding_lookup(
                    embedding, tf_x, 
                    name = 'embeded_x')
        ## Step 1: Define LSTM cell and stack them together
        cells = tf.contrib.rnn.MultiRNNCell( #3. Make a list of such cells 
                [tf.contrib.rnn.DropoutWrapper( #2. Apply the dropout to the RNN cells
                    tf.contrib.rnn.BasicLSTMCell(self.lstm_size),## 1. Create the RNN cells
                    output_keep_prob = tf_keepprob) #2. Apply the dropout to the RNN cells
                 for i in range(self.num_layers)])#3. According to the desired number of RNN layers
        
        ## Step 2: Define the initial state:
        self.initial_state = cells.zero_state(
                    self.batch_size, tf.float32)
        print('   << initial state >> ', self.initial_state)
        
        ## Step 3: Creating the RNN using the RNN cells and their states
        lstm_outputs, self.final_state = tf.nn.dynamic_rnn(
                            cells, embed_x,
                            initial_state = self.initial_state)
        
        ## Note: lstm_outputs shape:
        ##    [batch_size, max_time, cells.output_size]
        print('   << lstm_output >> ', lstm_outputs)
        print('   << final state >> ', self.final_state)
        
        ## Apply a fully connected layer on top of the last RNN output
        logits = tf.layers.dense(
                    inputs = lstm_outputs[:,-1],
                    units = 1, activation = None,
                    name = 'logits')
        
        logits = tf.squeeze(logits, name = 'logits_squeezed')
        print('   << logits >> ', logits)
        
        y_proba = tf.nn.sigmoid(logits, name = 'probabilities')
        
        predictions = {
            'probabilities' : y_proba,
            'labels' : tf.cast(tf.round(y_proba), tf.int32,
                               name = 'labels')
        }
        print('\n   << predictions >> ', predictions)
        
        ## Define the cost function
        cost = tf.reduce_mean(
                tf.nn.sigmoid_cross_entropy_with_logits(
                labels = tf_y, logits = logits),
                             name = 'cost')
        
        ## Define the optimizer
        optimizer = tf.train.AdamOptimizer(self.learning_rate)
        train_op = optimizer.minimize(cost, name = 'train_op')
    
    def train(self, X_train, y_train, num_epochs):
        with tf.Session(graph=self.g) as sess:
            sess.run(self.init_op)
            iteration = 1
            for epoch in range (num_epochs):
                state = sess.run(self.initial_state)
                
                for batch_x, batch_y in create_batch_generator(
                        X_train, y_train, self.batch_size):
                    feed = {'tf_x:0': batch_x,
                            'tf_y:0': batch_y,
                            'tf_keepprob:0': 0.5,
                            self.initial_state : state}
                    loss, _, state = sess.run(
                                ['cost:0', 'train_op',
                                 self.final_state],
                                 feed_dict = feed)
                    if iteration % 20 == 0:
                        print("Epoch: %d/%d Iteration: %d"
                              "| Train loss: %.5f" % (
                              epoch + 1, num_epochs,
                              iteration, loss))
                    iteration += 1
                if (epoch + 1) % 10 == 0:
                    self.saver.save(sess,
                                    "model/sentiment-%d.ckpt" % epoch)
                    
    def predict(self, X_data, return_proba = False):
        preds = []
        with tf.Session(graph = self.g) as sess:
            self.saver.restore(
                sess, tf.train.latest_checkpoint('model/'))
            test_state = sess.run(self.initial_state)
            for ii, batch_x in enumerate(create_batch_generator(
                                X_data, None, batch_size = self.batch_size),1):
                feed = {"tf_x:0" : batch_x,
                        "tf_keepprob:0" : 1.0,
                        self.initial_state: test_state}
                if return_proba:
                    pred, test_state =sess.run(
                                        ['probabilities:0', self.final_state],
                                        feed_dict =feed)
                else:
                    pred, test_state = sess.run(
                        ['labels:0', self.final_state],
                        feed_dict = feed)
                
                preds.append(pred)
            
        return np.concatenate(preds)
    
        

In [25]:
## Train:
n_words = max(list(word_to_int.values()))+1
rnn = SentimentRNN(n_words = n_words,
                   seq_len = sequence_length,
                   embed_size = 256,
                   lstm_size = 128,
                   num_layers = 1,
                   batch_size = 100,
                   learning_rate = 0.001)

   << initial state >>  (LSTMStateTuple(c=<tf.Tensor 'MultiRNNCellZeroState/DropoutWrapperZeroState/BasicLSTMCellZeroState/zeros:0' shape=(100, 128) dtype=float32>, h=<tf.Tensor 'MultiRNNCellZeroState/DropoutWrapperZeroState/BasicLSTMCellZeroState/zeros_1:0' shape=(100, 128) dtype=float32>),)
   << lstm_output >>  Tensor("rnn/transpose_1:0", shape=(100, 200, 128), dtype=float32)
   << final state >>  (LSTMStateTuple(c=<tf.Tensor 'rnn/while/Exit_3:0' shape=(100, 128) dtype=float32>, h=<tf.Tensor 'rnn/while/Exit_4:0' shape=(100, 128) dtype=float32>),)
   << logits >>  Tensor("logits_squeezed:0", shape=(100,), dtype=float32)

   << predictions >>  {'probabilities': <tf.Tensor 'probabilities:0' shape=(100,) dtype=float32>, 'labels': <tf.Tensor 'labels:0' shape=(100,) dtype=int32>}


In [18]:
rnn.train(X_train, y_train, num_epochs = 40)

Epoch: 1/40 Iteration: 20| Train loss: 0.66341
Epoch: 1/40 Iteration: 40| Train loss: 0.65519
Epoch: 1/40 Iteration: 60| Train loss: 0.61008
Epoch: 1/40 Iteration: 80| Train loss: 0.57383
Epoch: 1/40 Iteration: 100| Train loss: 0.50775
Epoch: 1/40 Iteration: 120| Train loss: 0.50454
Epoch: 1/40 Iteration: 140| Train loss: 0.53387
Epoch: 1/40 Iteration: 160| Train loss: 0.57484
Epoch: 1/40 Iteration: 180| Train loss: 0.56980
Epoch: 1/40 Iteration: 200| Train loss: 0.50458
Epoch: 1/40 Iteration: 220| Train loss: 0.41951
Epoch: 1/40 Iteration: 240| Train loss: 0.45895
Epoch: 2/40 Iteration: 260| Train loss: 0.33960
Epoch: 2/40 Iteration: 280| Train loss: 0.46478
Epoch: 2/40 Iteration: 300| Train loss: 0.44123
Epoch: 2/40 Iteration: 320| Train loss: 0.30060
Epoch: 2/40 Iteration: 340| Train loss: 0.36178
Epoch: 2/40 Iteration: 360| Train loss: 0.41756
Epoch: 2/40 Iteration: 380| Train loss: 0.36257
Epoch: 2/40 Iteration: 400| Train loss: 0.29643
Epoch: 2/40 Iteration: 420| Train loss: 0.30

Epoch: 14/40 Iteration: 3380| Train loss: 0.00343
Epoch: 14/40 Iteration: 3400| Train loss: 0.00084
Epoch: 14/40 Iteration: 3420| Train loss: 0.00080
Epoch: 14/40 Iteration: 3440| Train loss: 0.00358
Epoch: 14/40 Iteration: 3460| Train loss: 0.01215
Epoch: 14/40 Iteration: 3480| Train loss: 0.00451
Epoch: 14/40 Iteration: 3500| Train loss: 0.01979
Epoch: 15/40 Iteration: 3520| Train loss: 0.00519
Epoch: 15/40 Iteration: 3540| Train loss: 0.02459
Epoch: 15/40 Iteration: 3560| Train loss: 0.01432
Epoch: 15/40 Iteration: 3580| Train loss: 0.00702
Epoch: 15/40 Iteration: 3600| Train loss: 0.00224
Epoch: 15/40 Iteration: 3620| Train loss: 0.01107
Epoch: 15/40 Iteration: 3640| Train loss: 0.01135
Epoch: 15/40 Iteration: 3660| Train loss: 0.01072
Epoch: 15/40 Iteration: 3680| Train loss: 0.01060
Epoch: 15/40 Iteration: 3700| Train loss: 0.00217
Epoch: 15/40 Iteration: 3720| Train loss: 0.03234
Epoch: 15/40 Iteration: 3740| Train loss: 0.00333
Epoch: 16/40 Iteration: 3760| Train loss: 0.00654


Epoch: 27/40 Iteration: 6660| Train loss: 0.00018
Epoch: 27/40 Iteration: 6680| Train loss: 0.00041
Epoch: 27/40 Iteration: 6700| Train loss: 0.00004
Epoch: 27/40 Iteration: 6720| Train loss: 0.00006
Epoch: 27/40 Iteration: 6740| Train loss: 0.00008
Epoch: 28/40 Iteration: 6760| Train loss: 0.00009
Epoch: 28/40 Iteration: 6780| Train loss: 0.00010
Epoch: 28/40 Iteration: 6800| Train loss: 0.00005
Epoch: 28/40 Iteration: 6820| Train loss: 0.00030
Epoch: 28/40 Iteration: 6840| Train loss: 0.00011
Epoch: 28/40 Iteration: 6860| Train loss: 0.00024
Epoch: 28/40 Iteration: 6880| Train loss: 0.00006
Epoch: 28/40 Iteration: 6900| Train loss: 0.00018
Epoch: 28/40 Iteration: 6920| Train loss: 0.00005
Epoch: 28/40 Iteration: 6940| Train loss: 0.00005
Epoch: 28/40 Iteration: 6960| Train loss: 0.00008
Epoch: 28/40 Iteration: 6980| Train loss: 0.00002
Epoch: 28/40 Iteration: 7000| Train loss: 0.00007
Epoch: 29/40 Iteration: 7020| Train loss: 0.00007
Epoch: 29/40 Iteration: 7040| Train loss: 0.00003


Epoch: 40/40 Iteration: 9940| Train loss: 0.00001
Epoch: 40/40 Iteration: 9960| Train loss: 0.00001
Epoch: 40/40 Iteration: 9980| Train loss: 0.00000
Epoch: 40/40 Iteration: 10000| Train loss: 0.00002


In [26]:
## Test
preds = rnn.predict(X_test)
y_true = y_test[:len(preds)]
print('Test Acc.: %.3f' % ( np.sum(preds == y_true)/len(y_true)))


INFO:tensorflow:Restoring parameters from model/sentiment-39.ckpt
Test Acc.: 0.854


In [27]:
## Get probabilities
proba = rnn.predict(X_test, return_proba = True)

INFO:tensorflow:Restoring parameters from model/sentiment-39.ckpt


## Project 2: Implementing an RNN for character-level language modeling in TensorFlow

- Input: a text document
- Goal: develop a model that can generate new text similar to the input document

In character-level language modeling, the input is broken down into a sequence of characters that are fed into our network one character at a time.

The network processes each new character in conjunction with the memory of the previously seen characters to predict the next character.

Implementation:

1) Preparing the data

2) Building the RNN model

3) Performing next-character prediction and sampling to generate new text.

In [28]:
# import numpy as np

## reading and processing text
with open('pg2265.txt', 'r', encoding = 'utf-8') as f:
    text = f.read()

text = text[15858:] # remove the beginning portion of the text that contains some legal description of the Gutenberg project
chars = set(text)
char2int = {ch: i for i, ch in enumerate(chars)}
int2char = dict(enumerate(chars))
text_ints = np.array([char2int[ch] for ch in text],
                     dtype = np.int32)


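Reshape the data into batches of sequences.

The input (x) and the output (y) of the neural network are shifted by one character, so that each input character is paired with the character that follows it as its target.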

In [31]:
def reshape_data(sequence, batch_size, num_steps):
    tot_batch_length = batch_size * num_steps
    num_batches = int(len(sequence)/tot_batch_length)
    
    if num_batches * tot_batch_length + 1 > len(sequence):
        num_batches = num_batches - 1
    ## Truncate the sequence at the end to get rid of
    ## remaining characters that do not make a full batch
    x = sequence[0: num_batches * tot_batch_length]
    y = sequence[1: num_batches * tot_batch_length + 1]
    
    ## Split x & y into a list of batches of sequences:
    x_batch_splits = np.split(x, batch_size)
    y_batch_splits = np.split(y, batch_size)
    
    ## Stack the batches together
    ## batch_size x mini_batch_length
    x = np.stack(x_batch_splits)
    y = np.stack(y_batch_splits)
    
    return x,y
## Testing:
train_x, train_y = reshape_data(text_ints,64,10)
print(train_x[0,:10])
print(train_y[0,:10])
print(''.join(int2char[i] for i in train_x[0,:50]))

[15 31 63 25 15 50 57  8 63 53]
[31 63 25 15 50 57  8 63 53 37]
The Tragedie of Hamlet

Actus Primus. Scoena Prima
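
The one-character shift is easier to see on a sequence of consecutive integers. The sketch below uses a compact, hypothetical `toy_reshape` that follows the same steps as `reshape_data`; the sequence length, batch size and number of steps are arbitrary.

```python
import numpy as np

def toy_reshape(sequence, batch_size, num_steps):
    tot = batch_size * num_steps
    num_batches = len(sequence) // tot
    if num_batches * tot + 1 > len(sequence):
        num_batches -= 1                  # leave room for the shifted target
    x = sequence[:num_batches * tot]
    y = sequence[1:num_batches * tot + 1]
    # One row per batch entry, each row a contiguous chunk of the sequence
    return (np.stack(np.split(x, batch_size)),
            np.stack(np.split(y, batch_size)))

toy_x, toy_y = toy_reshape(np.arange(25), batch_size=2, num_steps=3)
print(toy_x.shape, toy_y.shape)   # (2, 12) (2, 12)
print(toy_x[0, :5], toy_y[0, :5])
```

Because the toy sequence is `0, 1, 2, ...`, every target in `toy_y` is exactly the corresponding input in `toy_x` plus one.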


In [34]:
np.random.seed(123)

def create_batch_generator(data_x, data_y, num_steps):
    batch_size, tot_batch_length = data_x.shape
    num_batches = int(tot_batch_length/num_steps)
    
    for b in range(num_batches):
        yield(data_x[:,b*num_steps:(b+1)*num_steps],
              data_y[:,b*num_steps:(b+1)*num_steps])
        
bgen = create_batch_generator(train_x[:,:100],train_y[:,:100],15)

for b in bgen:
    print(b[0].shape,b[1].shape,end=' ')
    print(''.join(int2char[i] for i in b[0][0,:]).replace('\n','*'),'    ',
          ''.join(int2char[i] for i in b[1][0,:]).replace('\n','*'))

(64, 15) (64, 15) The Tragedie of      he Tragedie of 
(64, 15) (64, 15)  Hamlet**Actus       Hamlet**Actus P
(64, 15) (64, 15) Primus. Scoena       rimus. Scoena P
(64, 15) (64, 15) Prima.**Enter B      rima.**Enter Ba
(64, 15) (64, 15) arnardo and Fra      rnardo and Fran
(64, 15) (64, 15) ncisco two Cent      cisco two Centi


### Building the character-level RNN model

- Constructor:
    - set up the learning parameters
    - create a computation graph
    - call the `build` method to construct the graph for either sampling mode or training mode

- `build` method:
    - define the placeholders for feeding the data
    - construct the RNN using LSTM cells
    - define the output of the network
    - define the cost function
    - define the optimizer
    
- `train` method:
    - iterate through the mini-batches
    - train the network for the specified number of epochs

- `sample` method:
    - calculate the probability of the next character
    - choose a character at random according to these probabilities
    - concatenate the chosen characters into a string

In [60]:
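def get_top_char(probas, char_size, top_n = 5):
    ## Sample one character index from the top_n most probable characters
    p = np.squeeze(probas)
    p[np.argsort(p)[:-top_n]] = 0.0   # zero out everything except the top_n
    p = p / np.sum(p)                 # renormalize to a valid distribution
    ch_id = np.random.choice(char_size,1,p = p)[0]
    return ch_id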
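
A small experiment shows the effect of restricting the sampling to the most probable characters. The probabilities below are made up, and `toy_top_char` is a hypothetical variant of `get_top_char` that copies its input so the original array is not modified.

```python
import numpy as np

def toy_top_char(probas, char_size, top_n=5):
    p = np.squeeze(probas).copy()          # copy so the caller's array is left untouched
    p[np.argsort(p)[:-top_n]] = 0.0        # keep only the top_n probabilities
    p = p / np.sum(p)                      # renormalize
    return np.random.choice(char_size, 1, p=p)[0]

np.random.seed(0)
toy_probas = np.array([[0.05, 0.40, 0.05, 0.30, 0.20]])   # made-up distribution

# With top_n=2 only the two most probable indices (1 and 3) can be drawn
toy_draws = {int(toy_top_char(toy_probas, 5, top_n=2)) for _ in range(200)}
print(toy_draws)
```

Index 4 has a probability of 0.20 in the original distribution but is never sampled, because it falls outside the top two.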

In [63]:
import tensorflow as tf
import os

class CharRNN(object):
    def __init__(self, num_classes, batch_size = 64,
                 num_steps = 100, lstm_size = 128,
                 num_layers = 1, learning_rate = 0.001,
                 keep_prob = 0.5, grad_clip = 5,
                 sampling = False):
        ## 1. set up learning parameters
        self.num_classes = num_classes
        self.batch_size = batch_size
        self.num_steps = num_steps
        self.lstm_size = lstm_size
        self.num_layers = num_layers
        self.learning_rate = learning_rate
        self.keep_prob = keep_prob
        self.grad_clip = grad_clip
        
        ## 2. create computation graph
        self.g = tf.Graph()
        with self.g.as_default():
            tf.set_random_seed(123)
            ## 3. call the `build` method to construct the graph
            self.build(sampling = sampling)
            self.saver = tf.train.Saver()
            self.init_op = tf.global_variables_initializer()
        
    def build(self, sampling):
        if sampling:
            batch_size, num_steps = 1,1
        else:
            batch_size = self.batch_size
            num_steps = self.num_steps
            
        ## 1. define the placeholders for feeding the data
        tf_x = tf.placeholder(tf.int32,
                              shape = [batch_size, num_steps],
                              name = 'tf_x')
        tf_y = tf.placeholder(tf.int32,
                              shape = [batch_size, num_steps],
                              name = 'tf_y')
        tf_keepprob = tf.placeholder(tf.float32,
                                     name = 'tf_keepprob')
        
        # One-hot encoding:
        x_onehot = tf.one_hot(tf_x, depth = self.num_classes)
        y_onehot = tf.one_hot(tf_y, depth = self.num_classes)
        
        ## 2. Build the multi-layer RNN cells
        cells = tf.contrib.rnn.MultiRNNCell(
                [tf.contrib.rnn.DropoutWrapper(
                    tf.contrib.rnn.BasicLSTMCell(self.lstm_size),
                    output_keep_prob = tf_keepprob)
                 for _ in range(self.num_layers)])
        
        # Define the initial state
        self.initial_state = cells.zero_state(
                batch_size, tf.float32)
        
        ## 3. Define the output of the network
        # Run each sequence step through the RNN
        lstm_outputs, self.final_state = tf.nn.dynamic_rnn(
                    cells, x_onehot,
                    initial_state = self.initial_state)
        
        print('  <<lstm_outputs   >>', lstm_outputs)
        
        seq_output_reshaped = tf.reshape(
                            lstm_outputs,
                            shape = [-1, self.lstm_size],
                            name = 'seq_output_reshaped')
        
        
        logits = tf.layers.dense(
                    inputs = seq_output_reshaped,
                    units =self.num_classes,
                    activation = None,
                    name = 'logits')
        
        proba = tf.nn.softmax(logits,
                              name = 'probabilities')
        
        print(proba)
        
        y_reshaped = tf.reshape(
                        y_onehot,
                        shape = [-1, self.num_classes],
                        name = 'y_reshaped')
        
        ## 4. Define the cost function
        cost = tf.reduce_mean(
                    tf.nn.softmax_cross_entropy_with_logits(
                        logits = logits,
                        labels = y_reshaped),
                    name = 'cost')
        
        # Gradient clipping to avoid "exploding gradient"
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(
                        tf.gradients(cost, tvars),
                        self.grad_clip)
        
        ## 5. Define the optimizer
        optimizer = tf.train.AdamOptimizer(self.learning_rate)
        train_op = optimizer.apply_gradients(
                    zip(grads, tvars),
                    name = 'train_op')
        
        
    def train(self, train_x, train_y,
              num_epochs, ckpt_dir = './model/'):
        ## Create the checkpoint directory
        ## if it does not exist
        
        if not os.path.exists(ckpt_dir):
            os.mkdir(ckpt_dir)
            
        with tf.Session(graph = self.g) as sess:
            sess.run(self.init_op)
            n_batches = int(train_x.shape[1]/self.num_steps)
            
            
            for epoch in range(num_epochs):
                # Train network
                new_state = sess.run(self.initial_state)
                loss = 0
                # Minibatch generator
                bgen = create_batch_generator(
                        train_x, train_y, self.num_steps)
                
                for b, (batch_x, batch_y) in enumerate(bgen,1):
                    
                    iteration = epoch * n_batches + b
                    
                    feed = {'tf_x:0': batch_x,
                            'tf_y:0': batch_y,
                            'tf_keepprob:0':self.keep_prob,
                            self.initial_state : new_state}
                    batch_cost, _, new_state = sess.run(
                            ['cost:0', 'train_op',self.final_state],
                            feed_dict = feed)
                    
                    if iteration % 10 == 0:
                        print('Epoch %d/%d Iteration %d'
                              '| Training loss: % .4f' %(
                              epoch + 1, num_epochs,
                              iteration, batch_cost))
                        
            ## save the trained model
            self.saver.save(
                    sess, os.path.join(
                        ckpt_dir, 'language_modeling.ckpt'))
            
            
            
            
    def sample(self, output_length,
               ckpt_dir, starter_seq = 'The '):
        observed_seq = [ch for ch in starter_seq]

        with tf.Session(graph = self.g) as sess:
            self.saver.restore(
                sess,
                tf.train.latest_checkpoint(ckpt_dir))

            ## 1. run the model using the starter sequence
            new_state = sess.run(self.initial_state)
            for ch in starter_seq:
                x = np.zeros((1,1))
                x[0,0] = char2int[ch]
                feed = {'tf_x:0': x,
                        'tf_keepprob:0': 1.0,
                        self.initial_state: new_state}
                proba, new_state = sess.run(
                        ['probabilities:0', self.final_state],
                        feed_dict = feed)

            ch_id = get_top_char(proba,len(chars))
            observed_seq.append(int2char[ch_id])


            ## 2. run the model using the updated observed_seq
            for i in range(output_length):
                x[0,0] = ch_id
                feed = {'tf_x:0':x,
                        'tf_keepprob:0' : 1.0,
                        self.initial_state: new_state}
                proba, new_state = sess.run(
                        ['probabilities:0', self.final_state],
                        feed_dict = feed)
                ch_id  = get_top_char(proba, len(chars))
                observed_seq.append(int2char[ch_id])

        return ''.join(observed_seq)



## Creating and training the CharRNN Model

In [50]:
batch_size = 64
num_steps = 100 
train_x, train_y = reshape_data(text_ints, 
                                batch_size, 
                                num_steps)

rnn = CharRNN(num_classes=len(chars), batch_size=batch_size)
rnn.train(train_x, train_y, 
          num_epochs=100,
          ckpt_dir='./model-100/')

  <<lstm_outputs   >> Tensor("rnn/transpose_1:0", shape=(64, 100, 128), dtype=float32)
Tensor("probabilities:0", shape=(6400, 65), dtype=float32)
Epoch 1/100 Iteration 10| Training loss:  3.7578
Epoch 1/100 Iteration 20| Training loss:  3.3869
Epoch 2/100 Iteration 30| Training loss:  3.3011
Epoch 2/100 Iteration 40| Training loss:  3.2389
Epoch 2/100 Iteration 50| Training loss:  3.2294
Epoch 3/100 Iteration 60| Training loss:  3.2202
Epoch 3/100 Iteration 70| Training loss:  3.1929
Epoch 4/100 Iteration 80| Training loss:  3.1841
Epoch 4/100 Iteration 90| Training loss:  3.1605
Epoch 4/100 Iteration 100| Training loss:  3.1537
Epoch 5/100 Iteration 110| Training loss:  3.1390
Epoch 5/100 Iteration 120| Training loss:  3.1000
Epoch 6/100 Iteration 130| Training loss:  3.0778
Epoch 6/100 Iteration 140| Training loss:  3.0434
Epoch 6/100 Iteration 150| Training loss:  3.0064
Epoch 7/100 Iteration 160| Training loss:  2.9725
Epoch 7/100 Iteration 170| Training loss:  2.9186
Epoch 8/100 I

Epoch 64/100 Iteration 1590| Training loss:  1.9991
Epoch 64/100 Iteration 1600| Training loss:  1.9355
Epoch 65/100 Iteration 1610| Training loss:  1.9746
Epoch 65/100 Iteration 1620| Training loss:  1.9674
Epoch 66/100 Iteration 1630| Training loss:  1.9594
Epoch 66/100 Iteration 1640| Training loss:  1.9922
Epoch 66/100 Iteration 1650| Training loss:  1.9276
Epoch 67/100 Iteration 1660| Training loss:  1.9815
Epoch 67/100 Iteration 1670| Training loss:  1.9496
Epoch 68/100 Iteration 1680| Training loss:  1.9727
Epoch 68/100 Iteration 1690| Training loss:  1.9804
Epoch 68/100 Iteration 1700| Training loss:  1.9223
Epoch 69/100 Iteration 1710| Training loss:  1.9642
Epoch 69/100 Iteration 1720| Training loss:  1.9473
Epoch 70/100 Iteration 1730| Training loss:  1.9635
Epoch 70/100 Iteration 1740| Training loss:  1.9821
Epoch 70/100 Iteration 1750| Training loss:  1.9173
Epoch 71/100 Iteration 1760| Training loss:  1.9541
Epoch 71/100 Iteration 1770| Training loss:  1.9404
Epoch 72/100

In [64]:
np.random.seed(123)
rnn = CharRNN(len(chars), sampling=True)

print(rnn.sample(ckpt_dir='./model-100/', 
                 output_length=500))

  <<lstm_outputs   >> Tensor("rnn/transpose_1:0", shape=(1, 1, 128), dtype=float32)
Tensor("probabilities:0", shape=(1, 65), dtype=float32)
INFO:tensorflow:Restoring parameters from ./model-100/language_modeling.ckpt
The shist at his wish these shat in the pelains andere and muscard, a dowht, a thinkent as of is it mant, wothere this thay, what, ard haul his whese: whene he heard, on the toull what of thoues
To may in ore inta chaust hemeete, that whel he hould
Te thit tos so toust of thay in may,
Thoued and a deene
To me thit thas simpald the sallent in as in the,
Whin words in that sine the preserind of and miner and
Andowne thinge tis allath that wat he well would,
To shaule so the ploshe wist my some,
Tare in
