# Data Science Summer School - Split '17

# 3. Character-wise language modeling with multi-layer LSTMs

This hands-on session is based on two tutorial notebooks [*Intro to Recurrent Networks (Character-wise RNN)*](https://github.com/udacity/deep-learning/tree/master/intro-to-rnns) and [*Tensorboard*](https://github.com/udacity/deep-learning/tree/master/tensorboard) from Udacity's [Deep Learning Nanodegree Foundation](https://www.udacity.com/course/deep-learning-nanodegree-foundation--nd101) program.

This notebook implements a multi-layer LSTM network for training and sampling from character-level language models. The model takes a text file as input and trains a network to predict the next character in a sequence. The trained network can then generate text, character by character, that resembles the original training data. The network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), which has become a standard example for explaining the peculiarities of RNN models.

In this session we will train our model on Donald Trump's tweets.

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

In [2]:
%load_ext autoreload
%autoreload 2
#from utils import show_graph

## 3.1 Data preparation

### Loading and encoding text
We have already prepared a text file of concatenated Donald Trump tweets obtained from the [Trump Twitter Archive](http://www.trumptwitterarchive.com/). The file is located at `PATH-TO-REPOSITORY/Day-3/assets/data/trump_tweets_ascii.txt`. First, we load the text file and encode its characters as integers.

In [3]:
with open('assets/data/trump_tweets_ascii.txt', 'r') as f:
    text=f.read()

#get set of characters contained in the loaded text file
vocab = sorted(set(text))

#encoding characters as integers
vocab_to_int = {c: i for i, c in enumerate(vocab)}
encoded_chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

#make dict for decoding integers to corresponding characters
int_to_vocab = dict(enumerate(vocab))


print('Text size: {}'.format(len(encoded_chars)))
print('Vocabulary size: {}'.format(len(vocab)))
print('*******************************')
print('Number of tweets: {}'.format(len(text.split('\n'))))
print('Median size of a tweet: {}'.format(np.percentile([len(t) for t in text.split('\n')], 50)))


Text size: 2167951
Vocabulary size: 92
*******************************
Number of tweets: 20629
Median size of a tweet: 117.0


The output above shows that `trump_tweets_ascii.txt` contains 2,167,951 characters in total, and that our character-wise model will have to 'choose' among 92 unique characters (the *vocabulary size*) when predicting the next character from the previously seen text.

Now let's look at the first 300 characters of the provided text:

In [4]:
text[:300]

'We are building our future with American hands American labor American iron aluminum and steel. Happy #LaborDay! https://t.co/lyvtNfQ5IO\nThe United States is considering in addition to other options stopping all trade with any country doing business with North Korea.\nI will be meeting General Kelly '

See how they are encoded as integers:

In [5]:
encoded_chars[:300]

array([53, 66,  1, 62, 79, 66,  1, 63, 82, 70, 73, 65, 70, 75, 68,  1, 76,
       82, 79,  1, 67, 82, 81, 82, 79, 66,  1, 84, 70, 81, 69,  1, 31, 74,
       66, 79, 70, 64, 62, 75,  1, 69, 62, 75, 65, 80,  1, 31, 74, 66, 79,
       70, 64, 62, 75,  1, 73, 62, 63, 76, 79,  1, 31, 74, 66, 79, 70, 64,
       62, 75,  1, 70, 79, 76, 75,  1, 62, 73, 82, 74, 70, 75, 82, 74,  1,
       62, 75, 65,  1, 80, 81, 66, 66, 73, 14,  1, 38, 62, 77, 77, 86,  1,
        4, 42, 62, 63, 76, 79, 34, 62, 86,  2,  1, 69, 81, 81, 77, 80, 26,
       15, 15, 81, 14, 64, 76, 15, 73, 86, 83, 81, 44, 67, 47, 21, 39, 45,
        0, 50, 69, 66,  1, 51, 75, 70, 81, 66, 65,  1, 49, 81, 62, 81, 66,
       80,  1, 70, 80,  1, 64, 76, 75, 80, 70, 65, 66, 79, 70, 75, 68,  1,
       70, 75,  1, 62, 65, 65, 70, 81, 70, 76, 75,  1, 81, 76,  1, 76, 81,
       69, 66, 79,  1, 76, 77, 81, 70, 76, 75, 80,  1, 80, 81, 76, 77, 77,
       70, 75, 68,  1, 62, 73, 73,  1, 81, 79, 62, 65, 66,  1, 84, 70, 81,
       69,  1, 62, 75, 86

Finally, we check that our decoding dict `int_to_vocab` (which we will use later when sampling new text from the trained model) correctly decodes the first 300 encoded characters:

In [6]:
''.join([int_to_vocab[ec] for ec in encoded_chars[:300]])

'We are building our future with American hands American labor American iron aluminum and steel. Happy #LaborDay! https://t.co/lyvtNfQ5IO\nThe United States is considering in addition to other options stopping all trade with any country doing business with North Korea.\nI will be meeting General Kelly '
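The encode/decode round trip can also be checked on a small scale. The sketch below applies the same recipe to a made-up string (an illustrative assumption, not part of the dataset). Because the vocabulary is sorted, the newline character gets id 0 and the space id 1, which is why those two ids show up in the encoded tweets above.

```python
import numpy as np

# A toy string standing in for the tweet corpus (illustrative only)
toy_text = "Make it great. Make it big.\n"

# Same recipe as above: sorted vocabulary, char -> int and int -> char maps
toy_vocab = sorted(set(toy_text))
toy_to_int = {c: i for i, c in enumerate(toy_vocab)}
toy_to_char = dict(enumerate(toy_vocab))

# Encode to an int32 array, then decode back to a string
toy_encoded = np.array([toy_to_int[c] for c in toy_text], dtype=np.int32)
toy_decoded = ''.join(toy_to_char[i] for i in toy_encoded)

print(toy_vocab)
print(toy_encoded)
print(toy_decoded == toy_text)
```

Separate `toy_*` names are used so that running this cell does not overwrite the notebook's real `text`, `vocab` and lookup dicts.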

### Making training and validation mini-batches

Neural networks are trained by approximating the gradient of the loss function with respect to the weights using only a small subset of the data at a time, known as a mini-batch. Here we split the data into mini-batches, and into training and validation sets.

For testing, we will observe how the network generates new text, so we will not use a separate test set. We feed a character into the network and sample the next one from the predicted distribution over characters. We then feed the sampled character back in to get the following character. Repeating this process character by character generates new text that will hopefully be indistinguishable from [Donald Trump's](https://twitter.com/realdonaldtrump/status/881281755017355264) Twitter [tweets](https://twitter.com/realdonaldtrump/status/869858333477523458).
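The generation loop described above can be sketched without a neural network. Below, a hypothetical fixed table of next-character probabilities (a bigram model, not the LSTM we build later) plays the role of the trained network; the point is only the feedback loop, where each sampled character becomes the next input.

```python
import numpy as np

# Hypothetical stand-in for a trained model: fixed next-character probabilities
# that depend only on the current character (a bigram table, not an LSTM).
letters = ['a', 'b', 'c']
next_char_probs = np.array([
    [0.1, 0.6, 0.3],   # distribution after 'a'
    [0.5, 0.1, 0.4],   # distribution after 'b'
    [0.3, 0.3, 0.4],   # distribution after 'c'
])

def generate(prime, n_chars, seed=0):
    """Sample n_chars characters, feeding each sample back in as the next input."""
    rng = np.random.default_rng(seed)
    out = list(prime)
    current = letters.index(prime[-1])
    for _ in range(n_chars):
        p = next_char_probs[current]             # distribution over the next character
        current = rng.choice(len(letters), p=p)  # sample one character id
        out.append(letters[current])             # it becomes the input for the next step
    return ''.join(out)

print(generate('a', 10))
```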


In [7]:
def split_data(arr, batch_size, num_steps, split_frac=0.9):
    """ 
    Split data into training and validation sets, inputs and targets for each set.
    
    Arguments
    ---------
    arr: Array of encoded characters as integers 
    batch_size: Number of sequences per batch
    num_steps: Number of sequence steps per batch to keep in the input and pass to the network, max_time
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    slice_size = batch_size * num_steps
    n_batches = int(len(arr) / slice_size)
    
    # Drop the last few characters to make only full batches
    x = arr[: n_batches*slice_size]
    
    # The targets are the same as the inputs, except shifted one character over.
    # number of batches covers full size of arr (no characters dropped)
    if(len(arr) == n_batches*slice_size):
        #for the last target character use first input character
        y = np.roll(x, -1)
    else:
        #for the last target character use first dropped character
        y = arr[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices, then stack them into a 2D matrix 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    # faster alternative
    #x = x.reshape((batch_size, -1))
    #y = y.reshape((batch_size, -1))
    
    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps
    
    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y
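The commented-out `reshape` in `split_data` is described as a faster alternative. A quick check on a toy array suggests the two give the same matrix: both fill the rows from consecutive blocks of the array, and `reshape` can return a view instead of copying the data.

```python
import numpy as np

# Toy "encoded text" of 60 integers split into 5 rows (5 sequences per batch)
toy_arr = np.arange(60)
n_rows = 5

stacked = np.stack(np.split(toy_arr, n_rows))   # approach used in split_data
reshaped = toy_arr.reshape((n_rows, -1))        # the commented-out alternative

print(stacked.shape)
print(np.array_equal(stacked, reshaped))
print(np.shares_memory(reshaped, toy_arr))      # reshape did not copy the data
```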

In [8]:
# example test array
example_arr = np.arange(63)
print(np.array2string(example_arr, max_line_width=100, separator=', '))

[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62]


In [9]:
batch_size = 5 #use 5 sequences in a batch
num_steps = 3 #'size' of sequence in a batch, max_time
#n_batches = int(len(arr)/(batch_size*num_steps)) = int(63/(5*3)) = 4

split_frac=0.9 # TRAIN= int(0.9 * n_batches) = 3, VAL= n_batches - TRAIN = 1

In [10]:
train_x, train_y, val_x, val_y = split_data(example_arr, batch_size, num_steps, split_frac)

In [11]:
train_x

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [12, 13, 14, 15, 16, 17, 18, 19, 20],
       [24, 25, 26, 27, 28, 29, 30, 31, 32],
       [36, 37, 38, 39, 40, 41, 42, 43, 44],
       [48, 49, 50, 51, 52, 53, 54, 55, 56]])

In [12]:
train_y

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [13, 14, 15, 16, 17, 18, 19, 20, 21],
       [25, 26, 27, 28, 29, 30, 31, 32, 33],
       [37, 38, 39, 40, 41, 42, 43, 44, 45],
       [49, 50, 51, 52, 53, 54, 55, 56, 57]])

Next, we create a generator function that yields batches from the arrays made by `split_data`. This lets us iterate over batches and feed them to our network. The arrays have shape (`batch_size, n_batches*num_steps`). Each batch is a consecutive, non-overlapping window of size `batch_size X num_steps` taken from these arrays.
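Because each row of the training arrays is a contiguous slice of the text, row `r` of batch `b+1` continues exactly where row `r` of batch `b` stopped. The sketch below checks this on the toy arrays from above, using a hypothetical list comprehension that mirrors `get_batch`; this continuity is what allows the LSTM state to be carried from one batch to the next during training.

```python
import numpy as np

# Rebuild the toy training inputs from above: 5 rows x 9 columns
toy_x = np.arange(60).reshape(5, 12)[:, :9]
steps = 3

# Hypothetical equivalent of get_batch: consecutive, non-overlapping column windows
windows = [toy_x[:, b*steps:(b+1)*steps] for b in range(toy_x.shape[1] // steps)]

# Row r of window b+1 picks up exactly where row r of window b stopped
for prev, nxt in zip(windows, windows[1:]):
    assert np.all(nxt[:, 0] == prev[:, -1] + 1)

print(len(windows))
print(windows[1])
```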

In [13]:
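def get_batch(arrs, num_steps):
    """Yield successive (batch_size, num_steps) windows from each array in arrs."""
    batch_size, n_cols = arrs[0].shape
    
    # number of full num_steps-wide windows along the time axis
    n_batches = int(n_cols/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]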

In [14]:
train_batches = get_batch([train_x, train_y], num_steps)

Now, we will test getting the first batch:

In [15]:
train_batch_x, train_batch_y = next(train_batches)

In [16]:
train_batch_x

array([[ 0,  1,  2],
       [12, 13, 14],
       [24, 25, 26],
       [36, 37, 38],
       [48, 49, 50]])

In [17]:
train_batch_y

array([[ 1,  2,  3],
       [13, 14, 15],
       [25, 26, 27],
       [37, 38, 39],
       [49, 50, 51]])

Next, we can get the second one:

In [18]:
train_batch_x, train_batch_y = next(train_batches)

In [19]:
train_batch_x

array([[ 3,  4,  5],
       [15, 16, 17],
       [27, 28, 29],
       [39, 40, 41],
       [51, 52, 53]])

In [20]:
train_batch_y

array([[ 4,  5,  6],
       [16, 17, 18],
       [28, 29, 30],
       [40, 41, 42],
       [52, 53, 54]])

In [21]:
'(batch_size, num_steps) = {}'.format(train_batch_x.shape)

'(batch_size, num_steps) = (5, 3)'

In practice, however, we will iterate over batches with a `for` loop:

In [22]:
print("TRAIN BATCHES")
for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
    print('\nBatch {}:'.format(b))
    print(np.stack([x,y]))

TRAIN BATCHES

Batch 1:
[[[ 0  1  2]
  [12 13 14]
  [24 25 26]
  [36 37 38]
  [48 49 50]]

 [[ 1  2  3]
  [13 14 15]
  [25 26 27]
  [37 38 39]
  [49 50 51]]]

Batch 2:
[[[ 3  4  5]
  [15 16 17]
  [27 28 29]
  [39 40 41]
  [51 52 53]]

 [[ 4  5  6]
  [16 17 18]
  [28 29 30]
  [40 41 42]
  [52 53 54]]]

Batch 3:
[[[ 6  7  8]
  [18 19 20]
  [30 31 32]
  [42 43 44]
  [54 55 56]]

 [[ 7  8  9]
  [19 20 21]
  [31 32 33]
  [43 44 45]
  [55 56 57]]]


In [23]:
print("VAL BATCHES")
for b, (x, y) in enumerate(get_batch([val_x, val_y], num_steps), 1):
    print('\nBatch {}:'.format(b))
    print(np.stack([x,y]))

VAL BATCHES

Batch 1:
[[[ 9 10 11]
  [21 22 23]
  [33 34 35]
  [45 46 47]
  [57 58 59]]

 [[10 11 12]
  [22 23 24]
  [34 35 36]
  [46 47 48]
  [58 59 60]]]


## 3.2 Building the model

With our data prepared and the convenience functions `split_data` and `get_batch` in place for handling the data during training, we can finally start building the model with TensorFlow. We break the model building into five parts:
* building input placeholders for x, y and dropout 
* building multi-layer RNN with stacked LSTM cells
* building softmax output layer
* computation for training loss
* building the optimizer for the model parameters





### Inputs
First, we create the input placeholders for the TensorFlow computational graph of the model. Since we are building a supervised learning model, we need placeholders for the inputs (x) and the targets (y). We also one-hot encode the input and target tokens, because they arrive as integer-encoded characters. Finally, we declare a scalar placeholder `keep_prob` for the output keep probability (dropout) of the LSTM cells.

In [24]:
def build_inputs(batch_size, num_steps, num_classes):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        num_classes: Number of classes (vocabulary size) used for one-hot encoding
        
    '''
    # Declare placeholders we'll feed into the graph
    
    # Declare placeholder for inputs (x) and one-hot encode inputs
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
    x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    #Declare placeholder for targets (y) and one hot encode targets
    targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
    y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, x_one_hot, targets, y_one_hot, keep_prob
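`tf.one_hot` appends a new last axis of size `num_classes`, so a `(batch_size, num_steps)` integer tensor becomes `(batch_size, num_steps, num_classes)`. A NumPy equivalent on a toy batch (sizes chosen only for illustration) looks like this:

```python
import numpy as np

n_classes = 5
ids = np.array([[0, 3, 1],
                [4, 2, 2]], dtype=np.int32)   # shape (batch_size=2, num_steps=3)

# Indexing the identity matrix picks one one-hot row per id,
# giving the same (batch, steps, classes) layout as tf.one_hot
ids_one_hot = np.eye(n_classes, dtype=np.float32)[ids]

print(ids_one_hot.shape)
print(ids_one_hot[0, 1])   # one-hot vector for id 3
```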

### Multi-layer LSTM Cell
We first implement the `build_cell` function, which creates the LSTM cell (wrapped with dropout) used in each hidden layer. This cell is the building block of the multi-layer RNN. Next, we implement the `build_lstm` function, which uses `build_cell` to create several LSTM cells and stacks them into layers with `tf.contrib.rnn.MultiRNNCell`. Finally, we create an all-zeros initial state for the `MultiRNNCell`.

In [25]:
def build_cell(lstm_size, keep_prob):
    # Use a basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

    # Add dropout to the cell
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return drop
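`DropoutWrapper` with `output_keep_prob` applies dropout to each cell's output. Assuming it behaves like `tf.nn.dropout` (inverted dropout: surviving units are scaled by `1/keep_prob`), the NumPy sketch below shows why the expected activation stays the same during training, and why feeding `keep_prob = 1` at sampling time turns dropout off.

```python
import numpy as np

def output_dropout(h, keep_prob, rng):
    """Inverted dropout: zero each unit with prob 1 - keep_prob, scale survivors by 1/keep_prob."""
    mask = rng.random(h.shape) < keep_prob
    return np.where(mask, h / keep_prob, 0.0)

rng = np.random.default_rng(0)
h = np.ones((1000, 512))            # stand-in for a batch of LSTM outputs
dropped = output_dropout(h, 0.5, rng)

print(dropped.mean())               # close to 1.0: the expected activation is preserved
print(np.unique(dropped))           # units are either dropped (0) or scaled up (2)
```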

In [26]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state
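To make the parameter shapes concrete, here is a NumPy sketch of one step of a standard LSTM cell. All four gates come from a single matrix multiplication with a kernel of shape `(n_inputs + n_units, 4 * n_units)`, which for 92 inputs and 512 units is `(604, 2048)`, the same shape TensorFlow reports for the first layer further down. The gate order and the absence of a forget-gate bias are our assumptions; `BasicLSTMCell` may differ in such details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, kernel, bias):
    """One step of a standard LSTM cell (gate order is an assumption).

    x: (batch, n_inputs); h, c: (batch, n_units)
    kernel: (n_inputs + n_units, 4 * n_units); bias: (4 * n_units,)
    """
    z = np.concatenate([x, h], axis=1) @ kernel + bias  # all four gates in one matmul
    i, g, f, o = np.split(z, 4, axis=1)                 # input, candidate, forget, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)    # update the cell state
    h_new = sigmoid(o) * np.tanh(c_new)                 # new hidden state (cell output)
    return h_new, c_new

n_inputs, n_units, n_batch = 92, 512, 100
rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.01, size=(n_inputs + n_units, 4 * n_units))
bias = np.zeros(4 * n_units)

x_t = np.eye(n_inputs)[rng.integers(0, n_inputs, size=n_batch)]  # one-hot characters
h_t = np.zeros((n_batch, n_units))                               # zero initial state
c_t = np.zeros((n_batch, n_units))
h_t, c_t = lstm_step(x_t, h_t, c_t, kernel, bias)
print(kernel.shape, h_t.shape, c_t.shape)
```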

### RNN Output
Here we create the output layer. We connect the output of the RNN cells to a fully connected layer with a softmax output, which gives us a probability distribution we can use to predict the next character. The 3D output tensor of size $(batch\_size \times num\_steps \times lstm\_size)$ has to be reshaped to $((batch\_size * num\_steps) \times lstm\_size)$ so that we can multiply it by the softmax weights.


In [27]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: RNN output tensor of shape (batch_size, num_steps, in_size)
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # That is, the shape should be batch_size*num_steps rows by lstm_size columns.
    # dynamic_rnn returns a single 3D tensor, so it can be reshaped directly.
    x = tf.reshape(lstm_output, [-1, in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
        
        # Tensorboard
        tf.summary.histogram('h_softmax_w', softmax_w)
        tf.summary.histogram('h_softmax_b', softmax_b)
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='predictions')
    
    # Tensorboard
    tf.summary.histogram('h_predictions', out)
    
    return out, logits
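To see which row of the reshaped matrix belongs to which sequence and step, here is a small NumPy sketch with made-up sizes. Row `b*num_steps + t` holds step `t` of sequence `b`; the targets are later reshaped in the same row-major order, so each logit row lines up with its target row.

```python
import numpy as np

n_seq, n_steps, n_units = 2, 3, 4
# Fake RNN outputs whose values encode (sequence b, step t) as 10*b + t
outputs = np.array([[[10 * b + t] * n_units for t in range(n_steps)]
                    for b in range(n_seq)], dtype=np.float32)

rows = outputs.reshape(-1, n_units)   # (n_seq * n_steps, n_units), like tf.reshape
print(rows.shape)
print(rows[:, 0])                     # sequence 0 steps 0..2, then sequence 1 steps 0..2
```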

### Training loss

Next up is the training loss. We take the logits and the targets and compute the softmax cross-entropy loss. First, we reshape the one-hot targets into a 2D tensor of size $((batch\_size * num\_steps) \times num\_classes)$ to match the logits. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with `num_classes` units. Then we pass the logits and targets to `tf.nn.softmax_cross_entropy_with_logits` and take the mean to get the loss.

In [28]:
def build_loss(logits, y_one_hot, lstm_size):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        y_one_hot: One-hot encoded targets
        lstm_size: Number of LSTM hidden units (not used in the computation)
        
    '''
    
    # reshape one-hot encoded targets to match logits, one row per batch_size per step
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)
    
    # Tensorboard
    tf.summary.scalar('s_cost', loss)
    
    return loss
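The loss computed by `tf.nn.softmax_cross_entropy_with_logits` followed by `tf.reduce_mean` can be sketched in NumPy as below. This is our own re-implementation for illustration, using the usual max-subtraction trick for numerical stability; it is not TensorFlow's code.

```python
import numpy as np

def softmax_cross_entropy(logits, labels_one_hot):
    """Mean over rows of -sum(labels * log(softmax(logits)))."""
    z = logits - logits.max(axis=1, keepdims=True)            # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return np.mean(-(labels_one_hot * log_probs).sum(axis=1))

toy_logits = np.array([[2.0, 0.5, -1.0],
                       [0.0, 0.0, 3.0]])
toy_targets = np.eye(3)[[0, 2]]       # correct classes: 0 for row 0, 2 for row 1
print(softmax_cross_entropy(toy_logits, toy_targets))
```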

### Optimizer

Here we build the optimizer. Vanilla RNNs suffer from both exploding and vanishing gradients. LSTMs largely fix the vanishing-gradient problem, but the gradients can still grow without bound. To prevent this, we clip the gradients: `tf.clip_by_global_norm` computes the global norm of all gradients taken together and, if it exceeds the threshold `grad_clip`, rescales every gradient by `grad_clip / global_norm`. This keeps the direction of the update while bounding its size. Then we use an `AdamOptimizer` for the learning step.

In [29]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for the global norm of the gradients
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    adam = tf.train.AdamOptimizer(learning_rate)
    train_op = adam.apply_gradients(zip(grads, tvars))
    
    return train_op
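Global-norm clipping can be sketched in NumPy as follows. This re-implements the rule documented for `tf.clip_by_global_norm` (scale every gradient by `clip_norm / max(global_norm, clip_norm)`); the gradient values are made up so that the global norm works out to 13.

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

grads = [np.array([3.0, 4.0]), np.array([[12.0]])]   # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)
print(norm)
print(clipped)   # every gradient shrunk by the same factor 5/13
```

Because all gradients share one scale factor, their relative sizes (and hence the update direction) are unchanged; gradients whose global norm is already below the threshold pass through untouched.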

### Build the network

Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we use [`tf.nn.dynamic_rnn`](https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/nn/dynamic_rnn). This function passes the hidden and cell states across time steps for us. It returns the outputs of the top LSTM layer at each step for each sequence in the mini-batch, as well as the final LSTM state. We save this state as `final_state` so we can feed it as the initial state of the next mini-batch run. To `tf.nn.dynamic_rnn` we pass the cell and initial state we get from `build_lstm`, together with our one-hot encoded input sequences.
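Why does passing `final_state` into the next mini-batch matter? The sketch below uses a plain tanh RNN (a simplified stand-in for the LSTM, with made-up sizes and random weights) to show that running a sequence in two chunks while carrying the state gives exactly the same outputs as one long pass, whereas restarting the second chunk from a zero state does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, T = 4, 8, 6
W_x = rng.normal(scale=0.5, size=(n_in, n_hidden))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))

def run_rnn(xs, h):
    """Plain tanh RNN over a sequence: returns per-step outputs and the final state."""
    outs = []
    for x in xs:
        h = np.tanh(x @ W_x + h @ W_h)
        outs.append(h)
    return np.array(outs), h

xs = rng.normal(size=(T, n_in))
h0 = np.zeros(n_hidden)

full_out, _ = run_rnn(xs, h0)          # one pass over all 6 steps
out_a, state = run_rnn(xs[:3], h0)     # first "mini-batch" of 3 steps
out_b, _ = run_rnn(xs[3:], state)      # second one, fed the previous final state
out_reset, _ = run_rnn(xs[3:], h0)     # second one, restarted from zeros

print(np.allclose(full_out, np.concatenate([out_a, out_b])))   # state carried over
print(np.allclose(full_out[3:], out_reset))                    # state reset
```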

In [30]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors, and one-hot encode the input and target tokens
        self.inputs, x_one_hot, self.targets, y_one_hot, self.keep_prob = \
        build_inputs(batch_size, num_steps, num_classes)
        
        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)
 
        # Run each sequence step through the RNN and collect the outputs
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, y_one_hot, lstm_size)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)
        
        self.summary_merged = tf.summary.merge_all()

### Hyperparameters

Here we declare the hyperparameters for the network. 

* `batch_size` - Number of sequences running through the network in one pass.
* `num_steps` - Number of characters in the sequence the network is trained on. Typically, larger is better, since the network can learn longer-range dependencies, but it takes longer to train. 100 is typically a good number here.
* `lstm_size` - The number of units in the hidden layers.
* `num_layers` - Number of hidden LSTM layers to use.
* `learning_rate` - Learning rate for training.
* `keep_prob` - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network [https://github.com/karpathy/char-rnn#tips-and-tricks](https://github.com/karpathy/char-rnn#tips-and-tricks).



In [31]:
batch_size = 100        # Sequences per batch
num_steps = 100         # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001   # Learning rate
keep_prob = 0.5         # Dropout keep probability
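As a worked check of the batching arithmetic, plugging these hyperparameters and the text size reported earlier into the same formulas used by `split_data` gives the number of training and validation mini-batches. The resulting 194 training batches is the iteration count that appears in the training log below. Separate variable names are used so that the hyperparameters above are not overwritten.

```python
n_chars = 2167951                  # text size printed at the start of the notebook
n_seqs, n_steps, frac = 100, 100, 0.9

slice_size = n_seqs * n_steps      # characters consumed per mini-batch
n_batches = n_chars // slice_size  # full mini-batches that fit in the text
n_train = int(n_batches * frac)    # mini-batches kept for training
n_val = n_batches - n_train        # mini-batches left for validation

print(slice_size, n_batches, n_train, n_val)
print(n_chars - n_batches * slice_size)   # characters dropped at the end
```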

### Number of parameters

LSTM cell: $4 \times \big[N_{units} \times (N_{inputs}+1) + N_{units}^{2}\big]$, where $N_{units}=lstm\_size$, and $N_{inputs}=len(vocab)$ for the first layer ($N_{inputs}=lstm\_size$ for the layers stacked above it).

In [32]:
print("lstm_size = {}".format(lstm_size))
print("input size = {}".format(len(vocab)))

lstm_size = 512
input size = 92


In [33]:
model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

In [34]:
tf.trainable_variables()

[<tf.Variable 'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0' shape=(604, 2048) dtype=float32_ref>,
 <tf.Variable 'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0' shape=(1024, 2048) dtype=float32_ref>,
 <tf.Variable 'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'softmax/Variable:0' shape=(512, 92) dtype=float32_ref>,
 <tf.Variable 'softmax/Variable_1:0' shape=(92,) dtype=float32_ref>]
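Applying the parameter formula per layer reproduces the variable shapes listed above: the first layer's input is the 92-dimensional one-hot character, while the second layer's input is the 512-dimensional output of the first. The sketch below is our own arithmetic (the softmax layer is counted as weights plus biases) comparing the formula with the kernel and bias shapes.

```python
def lstm_params(n_units, n_inputs):
    # 4 gates, each with an (n_inputs + n_units) x n_units weight matrix and n_units biases
    return 4 * (n_units * (n_inputs + 1) + n_units ** 2)

units, n_vocab = 512, 92
layer_0 = lstm_params(units, n_vocab)        # input: one-hot character
layer_1 = lstm_params(units, units)          # input: output of layer 0
softmax_params = units * n_vocab + n_vocab   # softmax weights + biases

print(layer_0, 604 * 2048 + 2048)    # formula vs. kernel (604, 2048) + bias (2048,)
print(layer_1, 1024 * 2048 + 2048)   # formula vs. kernel (1024, 2048) + bias (2048,)
print(layer_0 + layer_1 + softmax_params)
```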

### Write out the graph for TensorBoard

In [35]:
model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    file_writer = tf.summary.FileWriter('assets/logs/1', sess.graph)
    
    file_writer.close()

#run tensorboard from command line by issuing command (e.g. from root repository directory):
#tensorboard --logdir=Day-3/assets/logs/

### Run nodes in the graph

In [36]:
train_x, train_y, val_x, val_y = split_data(encoded_chars, batch_size, num_steps)

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)


with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    file_writer = tf.summary.FileWriter('assets/logs/2', sess.graph)
    
    new_state = sess.run(model.initial_state)
    x, y = next(get_batch([train_x, train_y], num_steps))
    
    feed = {model.inputs: x,
        model.targets: y,
        model.keep_prob: keep_prob,
        model.initial_state: new_state}
    
    summary, batch_loss, new_state = sess.run([model.summary_merged, model.loss, model.final_state], feed_dict=feed)
    
    file_writer.add_summary(summary, 0)
    
    file_writer.close()

In [37]:
batch_loss

4.5195522
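This first loss is roughly what an untrained network should give: if the predictions are close to uniform over the 92 characters, the cross-entropy for any target character is $-\ln(1/92) = \ln 92 \approx 4.52$. A one-line check:

```python
import numpy as np

n_vocab = 92
uniform = np.full(n_vocab, 1.0 / n_vocab)         # a "flat" prediction over all characters
expected_initial_loss = -np.log(uniform[0])       # same value for any target character
print(expected_initial_loss)
```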

In [38]:
'Number of layers : {}'.format(len(new_state))

'Number of layers : 2'

In [39]:
type(new_state[0]), type(new_state[1])

(tensorflow.python.ops.rnn_cell_impl.LSTMStateTuple,
 tensorflow.python.ops.rnn_cell_impl.LSTMStateTuple)

In [40]:
new_state[1].c.shape, new_state[1].h.shape

((100, 512), (100, 512))

## 3.3 Training model

This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by `save_every_n`) we calculate the validation loss and save a checkpoint.

In [41]:
epochs = 1 #20
save_every_n = 200
train_x, train_y, val_x, val_y = split_data(encoded_chars, batch_size, num_steps)


model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('assets/logs/3/train', sess.graph)
    test_writer = tf.summary.FileWriter('assets/logs/3/test')
    
    #############################################################
    # Use the line below to load a checkpoint and resume training
    # Please download the provided checkpoint files from the link
    # https://www.dropbox.com/s/3p9l3s8nzzkg1en/trump_i4320_l512.ckpt.zip?dl=0#
    # and place them in the assets/checkpoints/ssds directory in the repository
    saver.restore(sess, 'assets/checkpoints/ssds/trump_i4320_l512.ckpt')
    #############################################################
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):
        
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.summary_merged, model.loss, model.final_state, model.optimizer], 
                                                 feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
            
            train_writer.add_summary(summary, iteration)
        
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance, notice dropout has been set to 1
                val_loss = []
                # Use a separate state so the training state is not overwritten
                val_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: val_state}
                    summary, batch_loss, val_state = sess.run([model.summary_merged, model.loss, model.final_state], feed_dict=feed)
                    val_loss.append(batch_loss)
                
                test_writer.add_summary(summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                saver.save(sess, "assets/checkpoints/trump/trump_new_i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))

Epoch 1/1  Iteration 1/194 Training loss: 1.4525 3.8374 sec/batch
Epoch 1/1  Iteration 2/194 Training loss: 1.3918 3.3975 sec/batch
Epoch 1/1  Iteration 3/194 Training loss: 1.4020 3.4241 sec/batch
Epoch 1/1  Iteration 4/194 Training loss: 1.3904 3.2976 sec/batch
Epoch 1/1  Iteration 5/194 Training loss: 1.3775 3.2635 sec/batch
Epoch 1/1  Iteration 6/194 Training loss: 1.3779 3.8269 sec/batch
Epoch 1/1  Iteration 7/194 Training loss: 1.3753 3.6497 sec/batch
Epoch 1/1  Iteration 8/194 Training loss: 1.3775 3.8245 sec/batch
Epoch 1/1  Iteration 9/194 Training loss: 1.3767 4.1228 sec/batch
Epoch 1/1  Iteration 10/194 Training loss: 1.3754 4.2724 sec/batch
Epoch 1/1  Iteration 11/194 Training loss: 1.3710 4.1800 sec/batch
Epoch 1/1  Iteration 12/194 Training loss: 1.3699 4.3923 sec/batch
Epoch 1/1  Iteration 13/194 Training loss: 1.3733 4.7221 sec/batch
Epoch 1/1  Iteration 14/194 Training loss: 1.3726 4.3953 sec/batch
Epoch 1/1  Iteration 15/194 Training loss: 1.3707 4.2723 sec/batch

Epoch 1/1  Iteration 124/194 Training loss: 1.3653 4.9670 sec/batch
Epoch 1/1  Iteration 125/194 Training loss: 1.3653 5.2259 sec/batch
Epoch 1/1  Iteration 126/194 Training loss: 1.3653 5.0187 sec/batch
Epoch 1/1  Iteration 127/194 Training loss: 1.3653 4.9575 sec/batch
Epoch 1/1  Iteration 128/194 Training loss: 1.3652 4.9274 sec/batch
Epoch 1/1  Iteration 129/194 Training loss: 1.3654 5.2300 sec/batch
Epoch 1/1  Iteration 130/194 Training loss: 1.3655 6.1638 sec/batch
Epoch 1/1  Iteration 131/194 Training loss: 1.3656 5.7031 sec/batch
Epoch 1/1  Iteration 132/194 Training loss: 1.3658 5.2177 sec/batch
Epoch 1/1  Iteration 133/194 Training loss: 1.3662 5.2397 sec/batch
Epoch 1/1  Iteration 134/194 Training loss: 1.3663 4.9468 sec/batch
Epoch 1/1  Iteration 135/194 Training loss: 1.3665 4.8977 sec/batch
Epoch 1/1  Iteration 136/194 Training loss: 1.3663 5.0583 sec/batch
Epoch 1/1  Iteration 137/194 Training loss: 1.3661 5.1973 sec/batch
Epoch 1/1  Iteration 138/194 Training loss: 1.36

### Saved checkpoints

Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables

In [42]:
tf.train.get_checkpoint_state('assets/checkpoints/trump')

model_checkpoint_path: "assets/checkpoints/trump/trump_new_i194_l512_1.229.ckpt"
all_model_checkpoint_paths: "assets/checkpoints/trump/trump_new_i194_l512_1.229.ckpt"

## 3.4 Testing model - sampling from the model

In [43]:
from IPython.core.display import display, HTML

In [44]:
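def pick_top_n(preds, vocab_size, top_n=5):
    # Squeeze the (1, vocab_size) prediction into a 1D probability vector
    p = np.squeeze(preds)
    # Zero out all but the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalise so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Sample one character id from the truncated distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c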
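A small, self-contained check of the top-n trick on a made-up prediction vector: only the `top_n` most likely characters can ever be drawn. The sketch adds an `rng` argument and copies `preds` before modifying it (both our own changes for the demo). The notebook's `pick_top_n` writes into `preds` in place, since `np.squeeze` returns a view; that is harmless there because the predictions are not reused afterwards.

```python
import numpy as np

def pick_top_n_demo(preds, vocab_size, top_n=5, rng=np.random):
    """Keep the top_n probabilities, renormalise, and sample one id."""
    p = np.squeeze(preds).copy()          # copy so the caller's array is not modified
    p[np.argsort(p)[:-top_n]] = 0         # zero out everything but the top_n entries
    p = p / np.sum(p)
    return rng.choice(vocab_size, 1, p=p)[0]

preds_demo = np.array([[0.05, 0.40, 0.02, 0.30, 0.03, 0.20]])   # fake softmax output, shape (1, 6)
rng = np.random.default_rng(0)
draws = [pick_top_n_demo(preds_demo, 6, top_n=3, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))   # only the three most likely ids (1, 3 and 5) appear
```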

In [45]:
def sample_model(checkpoint, n_samples, lstm_size, vocab_size, num_layers=2, prime="The "):
    samples = [c for c in prime]
    model = CharRNN(vocab_size, lstm_size=lstm_size, num_layers=num_layers, sampling=True)
    saver = tf.train.Saver()
    
    states = []
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
         
        for c in prime:
            x = np.zeros((1, 1), dtype=np.int32)
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)
      
            states.append(new_state)
    
        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])
        states.append(new_state)

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
            states.append(new_state)
        
    return (''.join(samples), states)
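The control flow of `sample_model` does not depend on TensorFlow: it warms up the state on the prime text, samples one character, then repeatedly feeds the sampled character back in. The framework-free sketch below isolates that loop. `step` and `pick` are hypothetical stand-ins for one `sess.run` call and for `pick_top_n`. The toy "model" in the usage example simply predicts the next letter of a cyclic three-letter alphabet:

```python
import numpy as np


def sample_loop(step, init_state, prime, n_samples, vocab_to_int, int_to_vocab, pick):
    """Prime-then-generate loop with the same structure as sample_model.

    step(state, char_id) -> (probs, new_state) plays the role of one sess.run call;
    pick(probs) -> char_id plays the role of pick_top_n. prime must be non-empty.
    """
    samples = list(prime)
    state, probs = init_state, None
    # 1) warm up the state on the prime text; only the last prediction is used
    for ch in prime:
        probs, state = step(state, vocab_to_int[ch])
    # 2) sample the first new character
    c = pick(probs)
    samples.append(int_to_vocab[c])
    # 3) feed each sampled character back in to get the next prediction
    for _ in range(n_samples):
        probs, state = step(state, c)
        c = pick(probs)
        samples.append(int_to_vocab[c])
    return "".join(samples)


# Toy usage: a deterministic "model" over the alphabet "abc" that always predicts the next letter
vocab_to_int = {"a": 0, "b": 1, "c": 2}
int_to_vocab = {i: ch for ch, i in vocab_to_int.items()}
toy_step = lambda state, cid: (np.eye(3)[(cid + 1) % 3], state + 1)
greedy = lambda probs: int(np.argmax(probs))
print(sample_loop(toy_step, 0, "ab", 4, vocab_to_int, int_to_vocab, greedy))  # abcabca
```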

In [47]:
checkpoint = tf.train.latest_checkpoint('assets/checkpoints/trump')
samp, _ = sample_model(checkpoint, 160, lstm_size, len(vocab), prime="Obama")
print(samp)


ObamaCare will be a great president!
What will should have been a friend of the more play to hear and how a great problem who is a bottrest fact. That's a true good l


Please download the provided checkpoint files from the Dropbox shared link
[https://www.dropbox.com/s/3p9l3s8nzzkg1en/trump_i4320_l512.ckpt.zip?dl=0#](https://www.dropbox.com/s/3p9l3s8nzzkg1en/trump_i4320_l512.ckpt.zip?dl=0#)
and place them in the `assets/checkpoints/ssds` directory of the repository.
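To unpack the archive from Python instead, the sketch below extracts an already-downloaded zip into `assets/checkpoints/ssds`. It then lists the files that share the checkpoint prefix used in the next cell. The local zip path is a placeholder. The assumption that the archive stores the checkpoint files at its top level is not confirmed by this notebook. If `checkpoint_files` returns an empty list, check whether the files ended up in a subfolder:

```python
import glob
import os
import zipfile


def install_checkpoint(zip_path, dest_dir="assets/checkpoints/ssds"):
    """Extract a downloaded checkpoint archive into dest_dir and return the archive's member names."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if it is missing
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return sorted(zf.namelist())


def checkpoint_files(prefix):
    """List files on disk that belong to a checkpoint prefix, e.g. '.../trump_i4320_l512.ckpt'."""
    return sorted(glob.glob(prefix + "*"))

# Hypothetical usage (adjust the zip path to wherever the file was saved):
# install_checkpoint("trump_i4320_l512.ckpt.zip")
# checkpoint_files("assets/checkpoints/ssds/trump_i4320_l512.ckpt")
```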

In [48]:
checkpoint = 'assets/checkpoints/ssds/trump_i4320_l512.ckpt'
samp, _ = sample_model(checkpoint, 1000, lstm_size, len(vocab), prime="Obama (")
print(samp)

Obama (cont) http://t.co/IYYYB70k
The polls are gassing about me and a great job in China to be the stupid campaign and there is a tax seasing their country. Will be this more. They are not being a sense for a charator!
I hope @MittRomney was great filst and a great correct of our great call the failing @newsmax as @CNN to me!
The @Macys is terrific. I will give a crazy the man and will another still and allow their. Send its careful in the road.
The @BarackObama still have the real place and telling an extates and money. They have never been better than you can do.
If you lest the possible- that they defind my comments in anywhine will be said that they are going to get them interest our country. We have no sees it was broken back to them and being sport.
If you're designed and is not only saying what the place of the press with me and see the false points in the saming way to be a good thought to make a great tax country.
Interesting and totally support. See you at @BarackObama we ha

In [49]:
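checkpoint = 'assets/checkpoints/ssds/trump_i4320_l512.ckpt'
# build only ONE LSTM layer from the two-layer checkpoint: the first layer and the output layer
# are restored, but the output layer was presumably trained on the second layer's outputs,
# so the samples below degrade into noise
samp, _ = sample_model(checkpoint, 1000, lstm_size, len(vocab), num_layers=1, prime="Obama (")
print(samp)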

Obama (aIHKKaCy`bEerBBEEEEbBBeblhlMnM8n)nMGnccHnr jVylq)FttlfAFSbGlhenbMnaQ5Bes m[FMnksq1FF5nA55nsc nggtngngylhesem{LNFFEKbMWFAns{S`ns]g ngim\Fna3ZeuqNFEAAAbObs3bFnbubr1eurXlnggblncrEr pneplF6bscHnernGrrvvvtlh[ngnwEpnpllwFvAAnc 8InWTlnc `UaQQ[[}*bbTlh\Qq[[]q]FFntls9nF639QF635nb6bnerF6q1zb9FH9BnbbBbnenabMlhbnat`AIF bFlEemLWEAAAbs ZFeHFleAAISOS5FFbFFnb79EbMP@r)zm|bsenbmlhcvlevenFnauWFFFbeFAbTneWFS@ujeniMTm|@{j 2bT1kTRp\nbr9eseublCFbr9lNPMccngr7wRS`252ngbM \ngichhleenWFlnbFbTnrGnseMlUpllFlCFneObBS9NabGenerTneru bMNaucc@Gana]s`S{WHT@plC
AIKgtgatcflwwnggptpTlnwFrbr3ObFFeuub61bBTlnFrt 3zTbRFFntbMna~q51bFlnFEbTngnbt hbrne~=tt ng@JncihTniHF bnrUvhbnalHLKHttepLCQAAIFS315FbFFFFFEnAIF9Uq511bM9no)FMna5zs5ksnscneuururn6bFFFdnT@q\GBlh\hVnauj{S]F@uFNeeZTmccHnaQq)[eupTm]ng@nhchneum`Mnggnwtm]ns`snsem{blhbnalFvrtq9FFF5FF5Ns9neub69eem`nclgenerWvTnwHlwtlnvLT@wMvt"A+sgngggnchr 8nFKbMbRFbm`abMneursckTm[u]bMbFebMnabmlCcflvab69Vb9ehqllFemGAabscnetccorr955552s bncfnrttl
VaQat%}&t~==AObFFbMnaQ5gsg lh$icHLe eom|

In [50]:
checkpoint = 'assets/checkpoints/ssds/trump_i4320_l512.ckpt'
samp, _ = sample_model(checkpoint, 1000, lstm_size, len(vocab), num_layers=2, prime="Obama (")
print(samp)

Obama (cont) http://t.co/THa770x0
"Donald Trump Tops Prime Military All Stars and Failing Tower Along Collection http://t.co/22raax5c   They are all control the bank tal is so many sides?
Trump Intelnational Holly of the Deboce should have a great present that he has been destroyed by the border will defeat them!
@TeleballGroups http://t.co/narccirc
@JandyMinsher  The Apprentice and the world will be a great time for a great campaign from my office. True!
I will beat Moring South Carolina to me in the polls to the media is never talking about the U to be talented about the birthday to the U.S. in the server- and his people are not a begon.
Well weak in Charlottesville for an highly and want to both a golf course in New York. It is too bad!
I will be at @ApprenticeNBC tonight at 9pm. #Trump2016#MakeAmericaGreatAgain https://t.co/A5arI5rnrt
Thank you to the U.S.A. Startes to be interested. Will be amazing! #Irainahttpses at 11:30pm. https://t.co/ta5crca7Yn
I will be interviewed on @foxan

## 3.5 Visualization of memory cell activations

In [51]:
from IPython.core.display import display, HTML
from utils import save_lstm_vis, make_colored_text

In [52]:
checkpoint = 'assets/checkpoints/ssds/trump_i4320_l512.ckpt'
samp, states = sample_model(checkpoint, 1000, lstm_size, len(vocab), prime="Obama (")
print(samp)

Obama (cont) http://t.co/20cYYB04
"The post is the people in America. The rollaut against me whith I want to total can work and a the first policies!
"If you don't have to get it that we are that well." - @BarackObama should be better!
If you win friends in a landslide in the support of our country in the winner. Stay taxes at the U.S. is not too smart!... http://t.co/YxxxBxc6
Travel who would be a still better and with military should be a successful coupt of a big spok why they are not ban and tough and say they are sell or instincts.
@AmericanReasher @MissUniverse @TrumpChicago @TrumpDoral on the @Newsmax_Media by Thank you from the @BarackObama.
I am spending the poll numbers will be a great comments of cast in the signature country.
In that I will be interviewed by @SeanHannity tonight at 10pmE on Fox &amp; Friends at 1:30pmE on @CNBC. Thank you fer your focus on the biggest crowd!
I am starting to be passionation on my so much things work as they content to ach time if we will be

In [53]:
HTML(make_colored_text(samp, states, cell_id=200, layer_id=1))

In [54]:
HTML(make_colored_text(samp, states, cell_id=200, layer_id=0))
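`make_colored_text` comes from the session's `utils` module, which isn't shown here. The sketch below illustrates the idea only and is not the `utils` implementation: read one memory cell's value from every recorded state, then map it to a background color per character. It assumes each entry of `states` is a per-layer tuple of `LSTMStateTuple`-like objects with a `.c` array of shape `(1, lstm_size)`, which is what `sess.run` returns for a multi-layer `final_state` built from LSTM cells. The red/blue scale and the clipping to [-1, 1] are arbitrary choices:

```python
import html


def cell_activations(states, layer_id, cell_id):
    """Read one memory cell's value (the .c part of the state) from each recorded state."""
    return [float(s[layer_id].c[0][cell_id]) for s in states]


def colored_text_sketch(text, activations):
    """Wrap each character in a <span> whose background encodes its activation.

    Values are clipped to [-1, 1]: positive -> red, negative -> blue, opacity = |value|.
    """
    spans = []
    for ch, a in zip(text, activations):
        a = max(-1.0, min(1.0, a))
        rgb = "255,0,0" if a >= 0 else "0,0,255"
        spans.append('<span style="background-color: rgba({},{:.2f})">{}</span>'.format(
            rgb, abs(a), html.escape(ch)))
    return "<pre>" + "".join(spans) + "</pre>"  # <pre> keeps newlines and spacing

# Possible usage in this notebook:
# HTML(colored_text_sketch(samp, cell_activations(states, layer_id=1, cell_id=200)))
```

Unlike `h`, the cell state `c` is not squashed by a `tanh`, so clipping hides magnitudes above 1. Rescaling by the largest absolute value in the sequence is an alternative.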

In [None]:
save_lstm_vis("trump_i4320_l512_ca_L2_S512_E20", samp, states)

Number of layers: 2
Number of memory cells (LSTM size): 512
Saving trump_i4320_l512_ca_L2_S512_E20_0.html...
Saving trump_i4320_l512_ca_L2_S512_E20_1.html...
