![Big Data University](https://ibm.box.com/shared/static/jvcqp2iy2jlx2b32rmzdt0tx8lvxgzkp.png)
# <center> Text generation using RNN/LSTM (Character-level)</center>
<div class="alert alert-block alert-info">
<font size = 3><strong>In this notebook you will learn how to use TensorFlow to create a Recurrent Neural Network</strong></font>
<br>    
- <a href="#intro">Introduction</a>
<br>
- <p><a href="#arch">Architectures</a></p>
    - <a href="#lstm">Long Short-Term Memory Model (LSTM)</a>

- <p><a href="#build">Building an LSTM with TensorFlow</a></p>
</div>
----------------

This code implements a Recurrent Neural Network with LSTM/RNN units for training and sampling from character-level language models. In other words, the model takes a text file as input and trains an RNN that learns to predict the next character in a sequence.  
The RNN can then be used to generate text, character by character, that looks like the original training data. 

This code is based on this [blog](http://karpathy.github.io/2015/05/21/rnn-effectiveness/), and the code is a step-by-step implementation of this [character-level implementation](https://github.com/crazydonkey200/tensorflow-char-rnn).




In [1]:
import tensorflow as tf
import time

### Data loader
The following cell defines a class that helps read data from the input file.

In [2]:
import codecs
import os
import collections
from six.moves import cPickle
import numpy as np

class TextLoader():
    def __init__(self, data_dir, batch_size, seq_length, encoding='utf-8'):
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.encoding = encoding

        input_file = os.path.join(data_dir, "input.txt")
        vocab_file = os.path.join(data_dir, "vocab.pkl")
        tensor_file = os.path.join(data_dir, "data.npy")

        if not (os.path.exists(vocab_file) and os.path.exists(tensor_file)):
            print("reading text file")
            self.preprocess(input_file, vocab_file, tensor_file)
        else:
            print("loading preprocessed files")
            self.load_preprocessed(vocab_file, tensor_file)
        self.create_batches()
        self.reset_batch_pointer()

    def preprocess(self, input_file, vocab_file, tensor_file):
        with codecs.open(input_file, "r", encoding=self.encoding) as f:
            data = f.read()
        counter = collections.Counter(data)
        count_pairs = sorted(counter.items(), key=lambda x: -x[1])
        self.chars, _ = zip(*count_pairs)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        with open(vocab_file, 'wb') as f:
            cPickle.dump(self.chars, f)
        self.tensor = np.array(list(map(self.vocab.get, data)))
        np.save(tensor_file, self.tensor)

    def load_preprocessed(self, vocab_file, tensor_file):
        with open(vocab_file, 'rb') as f:
            self.chars = cPickle.load(f)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        self.tensor = np.load(tensor_file)
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

    def create_batches(self):
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

        # When the data (tensor) is too small, give a clearer error message
        if self.num_batches == 0:
            raise ValueError("Not enough data. Make seq_length and batch_size smaller.")

        self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]
        xdata = self.tensor
        ydata = np.copy(self.tensor)
        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]
        self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
        self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)


    def next_batch(self):
        x, y = self.x_batches[self.pointer], self.y_batches[self.pointer]
        self.pointer += 1
        return x, y

    def reset_batch_pointer(self):
        self.pointer = 0

### Parameters
#### Batch, number_of_batch, batch_size and seq_length
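What are batch, number_of_batch, batch_size and seq_length in the character-level example?  

Let's assume the input is 'here is an example'. Then:
- txt_length = 18  
- seq_length = 3  
- batch_size = 2  
- number_of_batch = 18/(3*2) = 3
- batch (the first one) = array([['h','e','r'],['n',' ','e']]), because TextLoader first lays the text out as batch_size rows ('here is a' and 'n example') and then cuts those rows into seq_length-wide chunks
- sample Seq = 'her'  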
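To check the example above, the following NumPy sketch repeats the slicing steps of `TextLoader.create_batches` on the toy string (only NumPy is assumed; no files are read). It also shows that row i of batch k+1 continues row i of batch k, which is why the training loop later feeds each batch's final state in as the next batch's initial state.

```python
import numpy as np

text = "here is an example"
batch_size, seq_length = 2, 3

data = np.array(list(text))
num_batches = len(data) // (batch_size * seq_length)   # 18 // (2 * 3) = 3

# Same steps as TextLoader.create_batches
xdata = data[:num_batches * batch_size * seq_length]
ydata = np.copy(xdata)
ydata[:-1] = xdata[1:]          # the target is the next character
ydata[-1] = xdata[0]            # the last target wraps around to the first char
x_batches = np.split(xdata.reshape(batch_size, -1), num_batches, 1)
y_batches = np.split(ydata.reshape(batch_size, -1), num_batches, 1)

for i in range(num_batches):
    print(i, ["".join(r) for r in x_batches[i]], ["".join(r) for r in y_batches[i]])
# 0 ['her', 'n e'] ['ere', ' ex']
# 1 ['e i', 'xam'] [' is', 'amp']
# 2 ['s a', 'ple'] [' an', 'leh']
```

Reading down a column, row 0 spells "her", "e i", "s a" (that is, "here is a"), so each row is a continuous stream of text across batches.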


So, what are our actual parameters?


In [3]:
batch_size = 60 # minibatch size, i.e. number of sequences processed in parallel per training step
seq_length = 50 # RNN sequence length (number of unrolled time steps)
num_epochs = 25 # increase to 50 for noticeably better results
learning_rate = 0.002
decay_rate = 0.97
rnn_size = 128 #size of RNN hidden state
num_layers = 2 #number of layers in the RNN
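The `decay_rate` above is applied once per epoch in the training loop, which assigns `learning_rate * (decay_rate ** e)` to the model's learning rate at epoch `e`. A small pure-Python sketch of the resulting schedule, using the values from this cell:

```python
learning_rate = 0.002
decay_rate = 0.97
num_epochs = 25

# Learning rate assigned to model.lr at the start of each epoch e
schedule = [learning_rate * (decay_rate ** e) for e in range(num_epochs)]

for e in (0, 1, 2, num_epochs - 1):
    print(e, round(schedule[e], 6))
```

Exponential decay shrinks the step size smoothly, so late epochs make smaller, finer updates.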

### LSTM Architecture
- Each LSTM cell has an input layer of 128 units. 
- 128 is the dimensionality of the embedding vector.



#### rnn_size = num_units = num_hidden_units = LSTM size


- Each LSTM cell has a hidden layer made up of hidden units.
- The `num_units` argument of BasicLSTMCell (128 here) is the number of hidden units of the LSTM.
- Each LSTM cell keeps a __hidden state__ vector of size num_units = 128.
- The LSTM's memory is updated through time by its gates: the forget and input gates control what is written to the __cell state__, and the output gate controls how much of it is exposed in the hidden state.
- For each LSTM cell we create, we must supply this hidden dimension (128 in this case), which some people call the number of units in the LSTM cell.
- "num_units" is equivalent to "size of RNN hidden state".
- In this notebook, rnn_size = 128 is also used as the embedding dimension for each character.
- An LSTM carries two pieces of information as it propagates through time:
    - A __cell state__ vector (its long-term memory)
    - A __hidden state__ vector, which is also the output of the previous time step
- To make the name num_units more intuitive, you can think of it as the number of hidden units, or memory units, in the cell.
- The number of hidden units is the dimensionality of the LSTM cell's output, which equals the size of each of its state vectors.
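The bullets above can be made concrete with a NumPy sketch of one step of a standard LSTM cell. This is the textbook formulation, not TensorFlow's code: `BasicLSTMCell` follows the same pattern but, for example, also adds a `forget_bias` (1.0 by default) to the forget gate. Sizes are taken from this notebook, and the weights are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

batch_size, input_size, num_units = 60, 128, 128
rng = np.random.default_rng(0)

# One weight matrix produces all four gate pre-activations from [x_t, h_prev]
W = rng.normal(scale=0.1, size=(input_size + num_units, 4 * num_units))
b = np.zeros(4 * num_units)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev], axis=1) @ W + b
    i, g, f, o = np.split(z, 4, axis=1)        # input gate, candidate, forget gate, output gate
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)   # cell state: the long-term memory
    h = sigmoid(o) * np.tanh(c)                          # hidden state = this step's output
    return h, c

h = np.zeros((batch_size, num_units))   # both states start at zero
c = np.zeros((batch_size, num_units))
x_t = rng.normal(size=(batch_size, input_size))
h, c = lstm_step(x_t, h, c)
```

Both `h` and `c` have `num_units` columns, which is why a single `num_units` value fixes the size of the whole cell.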

#### num_layers = 2 
- Number of layers in the RNN.
- MultiRNNCell takes an argument __cells__: a list of RNNCells that are stacked (composed) in that order.

In [4]:
data_loader = TextLoader('/resources/data/', batch_size, seq_length)
vocab_size = data_loader.vocab_size
data_loader.vocab_size

loading preprocessed files


65

In [5]:
data_loader.num_batches

371

### Input and output

In [6]:
x,y = data_loader.next_batch()

In [7]:
x

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ..., 
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]])

In [8]:
x.shape  # batch_size=60, seq_length=50

(60, 50)

In [9]:
y

array([[ 9,  7,  6, ...,  4,  7,  0],
       [ 4, 14, 22, ...,  9, 20,  5],
       [20, 10, 29, ..., 10, 18,  4],
       ..., 
       [ 2,  0,  6, ..., 21,  0,  6],
       [ 7,  7,  4, ...,  2,  3,  0],
       [ 7,  0, 33, ...,  9, 23,  0]])

In [10]:
data_loader.chars[0:5]

(u' ', u'e', u't', u'o', u'a')

In [11]:
data_loader.vocab['t']

2

### Defining stacked RNN Cell

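__BasicRNNCell__ is the most basic RNN cell. Although this notebook is about LSTMs, the cells below use BasicRNNCell for simplicity; `tf.contrib.rnn.BasicLSTMCell(rnn_size)` could be substituted to get an LSTM (its state is then a (cell state, hidden state) pair per layer).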
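For comparison with the LSTM, a BasicRNNCell step is much simpler: the new state is `tanh` of one linear map applied to the current input and the previous state concatenated together, and that new state is also the cell's output. The NumPy sketch below (random weights, notebook sizes) illustrates that idea rather than reproducing TensorFlow's implementation; it also explains why each layer's weight gradient in cell In [46] has shape (256, 128).

```python
import numpy as np

batch_size, input_size, rnn_size = 60, 128, 128
rng = np.random.default_rng(0)

# A single weight matrix acts on [input, previous_state] concatenated: (128 + 128) x 128
W = rng.normal(scale=0.1, size=(input_size + rnn_size, rnn_size))
b = np.zeros(rnn_size)

def basic_rnn_step(x_t, h_prev):
    """h_t = tanh([x_t, h_prev] @ W + b); the output and the new state are the same array."""
    return np.tanh(np.concatenate([x_t, h_prev], axis=1) @ W + b)

h0 = np.zeros((batch_size, rnn_size))            # what zero_state gives for one layer
x_t = rng.normal(size=(batch_size, input_size))
h1 = basic_rnn_step(x_t, h0)
```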

In [12]:
# cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)  # LSTM alternative
cell = tf.contrib.rnn.BasicRNNCell(rnn_size)

In [13]:
# a two-layer cell, with a separate BasicRNNCell object per layer
stacked_cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.BasicRNNCell(rnn_size) for _ in range(num_layers)])

In [14]:
# hidden state size
stacked_cell.output_size

128

In [15]:
stacked_cell.state_size

(128, 128)

In [16]:
input_data = tf.placeholder(tf.int32, [batch_size, seq_length]) # a 60x50 matrix of character ids
targets = tf.placeholder(tf.int32, [batch_size, seq_length])    # a 60x50 matrix of next-character ids


The memory state of the network is initialized with a vector of zeros and gets updated after reading each character.

__BasicRNNCell.zero_state(batch_size, dtype)__ returns zero-filled state tensor(s).

Args:

- batch_size: int, float, or unit Tensor representing the batch size.  
- dtype: the data type to use for the state.

In [17]:
initial_state = stacked_cell.zero_state(batch_size, tf.float32) # one zero state per sequence in the batch: a tuple of two 60x128 tensors (one per layer)

In [18]:
input_data

<tf.Tensor 'Placeholder:0' shape=(60, 50) dtype=int32>

In [19]:
session = tf.Session()

In [20]:
feed_dict={input_data:x, targets:y}

In [21]:

session.run(input_data, feed_dict)

array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ..., 
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]], dtype=int32)

### Embedding

In [22]:
with tf.variable_scope('rnnlm', reuse=False):
    softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) # 128x65
    softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 65
    with tf.device("/cpu:0"):
        embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  # 65x128
        # input_data is a 60x50 matrix of character ids; embedding is a 65x128 table with one vector per character.
        # embedding_lookup replaces each character id in input_data with the corresponding row of the table,
        # creating a 60x50x128 tensor.
        # So the first element of em is a 50x128 matrix whose rows are the vectors of that sequence's characters.
        em = tf.nn.embedding_lookup(embedding, input_data) # em is 60x50x128
        # split: splits a tensor into sub-tensors.
        # syntax (TF 1.x): tf.split(value, num_or_size_splits, axis)
        # here it splits the 60x50x128 tensor along axis 1 into 50 tensors of 60x1x128
        inputs = tf.split(em, seq_length, 1)
        # squeeze removes the size-1 axis, leaving a list of 50 tensors of 60x128
        inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

In [23]:
session.run(tf.global_variables_initializer())
session.run(embedding)

array([[-0.01187611,  0.17317678, -0.04664408, ...,  0.13480385,
        -0.17349039,  0.11645617],
       [-0.09165053, -0.00296672,  0.08263053, ..., -0.10514037,
         0.07863681,  0.13713406],
       [ 0.15487878,  0.04418661, -0.07928608, ..., -0.01012498,
         0.09112112,  0.02380088],
       ..., 
       [-0.02088466, -0.1330204 ,  0.11572404, ...,  0.01860093,
         0.061552  , -0.0512418 ],
       [ 0.15729706, -0.17406626, -0.03938693, ..., -0.13803616,
        -0.15875421, -0.00743723],
       [-0.03634189, -0.10259962,  0.09615393, ...,  0.02035314,
        -0.08798251, -0.01574321]], dtype=float32)

In [24]:
em = tf.nn.embedding_lookup(embedding, input_data)
em

<tf.Tensor 'embedding_lookup:0' shape=(60, 50, 128) dtype=float32>

In [25]:
emp = session.run(em,feed_dict={input_data:x})
emp.shape

(60, 50, 128)

In [26]:
emp[0]


array([[-0.08390545,  0.07798855,  0.04977153, ..., -0.01728435,
         0.14431466, -0.11718105],
       [-0.141221  , -0.06074303,  0.05824159, ..., -0.11142561,
         0.11705144, -0.13420179],
       [-0.10235118,  0.04645897, -0.09329474, ...,  0.04407293,
        -0.15662232, -0.06366031],
       ..., 
       [-0.09165053, -0.00296672,  0.08263053, ..., -0.10514037,
         0.07863681,  0.13713406],
       [ 0.04105198, -0.09031538,  0.08082359, ..., -0.08462127,
         0.07042079, -0.07249702],
       [-0.10235118,  0.04645897, -0.09329474, ...,  0.04407293,
        -0.15662232, -0.06366031]], dtype=float32)

In [27]:
inputs = tf.split(em, seq_length, 1)
inputs[0:5]

[<tf.Tensor 'split:0' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:1' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:2' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:3' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:4' shape=(60, 1, 128) dtype=float32>]

In [28]:
inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
inputs[0:5]

[<tf.Tensor 'Squeeze:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_1:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_2:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_3:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_4:0' shape=(60, 128) dtype=float32>]
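The embedding lookup, split and squeeze above can be mirrored in NumPy to see what each step does to the shapes: indexing the embedding table with the id matrix plays the role of `tf.nn.embedding_lookup`, and `np.split`/`np.squeeze` play the roles of `tf.split`/`tf.squeeze`. The ids and embedding values here are random stand-ins.

```python
import numpy as np

vocab_size, rnn_size = 65, 128
batch_size, seq_length = 60, 50
rng = np.random.default_rng(0)

embedding = rng.normal(size=(vocab_size, rnn_size))                  # 65x128 lookup table
input_data = rng.integers(0, vocab_size, size=(batch_size, seq_length))  # 60x50 character ids

em = embedding[input_data]                        # (60, 50, 128): one vector per character
inputs = np.split(em, seq_length, axis=1)         # 50 arrays of shape (60, 1, 128)
inputs = [np.squeeze(a, axis=1) for a in inputs]  # 50 arrays of shape (60, 128)

# inputs[t] holds the t-th character vector of every sequence in the batch
```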

### Feeding a batch of 60 sequences to an RNN:
- Step 1: the first character of each of the 60 sequences in the batch is input in parallel.  
- Step 2: the second character of each of the 60 sequences is input in parallel. 
- Step n: the nth character of each of the 60 sequences is input in parallel (n runs up to seq_length = 50).  

The parallelism is only for efficiency. All characters of a batch at the same time step are processed together, but the network still sees each sequence one character at a time and updates that sequence's state accordingly.
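A NumPy sketch of what `rnn_decoder` does with the two-layer stacked cell may help here: the loop over time steps is sequential, while each step handles all 60 sequences at once as one matrix. The weights and inputs are random stand-ins, and the per-layer update is a BasicRNNCell-style `tanh` step (an assumption chosen to match the cells used in this notebook).

```python
import numpy as np

batch_size, seq_length, rnn_size, num_layers = 60, 50, 128, 2
rng = np.random.default_rng(0)

# One (W, b) pair per layer; each layer sees [layer_input, its own previous state]
params = [(rng.normal(scale=0.1, size=(2 * rnn_size, rnn_size)), np.zeros(rnn_size))
          for _ in range(num_layers)]

def stacked_step(x_t, states):
    """Run one time step through all layers; return (top-layer output, new states)."""
    new_states = []
    layer_input = x_t
    for (W, b), h_prev in zip(params, states):
        h = np.tanh(np.concatenate([layer_input, h_prev], axis=1) @ W + b)
        new_states.append(h)
        layer_input = h              # this layer's output feeds the next layer
    return layer_input, tuple(new_states)

# A list of seq_length arrays of shape (batch_size, rnn_size), like the squeezed `inputs`
inputs = [rng.normal(size=(batch_size, rnn_size)) for _ in range(seq_length)]
state = tuple(np.zeros((batch_size, rnn_size)) for _ in range(num_layers))  # zero_state

outputs = []
for x_t in inputs:                           # time steps run one after another...
    out, state = stacked_step(x_t, state)    # ...but all 60 sequences move together
    outputs.append(out)
```

As in the TensorFlow version, `outputs` ends up as a list of 50 arrays of shape (60, 128), and the final state is a tuple with one (60, 128) array per layer.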

In [29]:
session.run(inputs[0],feed_dict={input_data:x})

array([[-0.08390545,  0.07798855,  0.04977153, ..., -0.01728435,
         0.14431466, -0.11718105],
       [-0.08402   ,  0.03561696,  0.00605483, ..., -0.15896314,
         0.15770842,  0.16403507],
       [-0.04141414, -0.09957065, -0.08418749, ..., -0.14920859,
         0.09632279,  0.01807462],
       ..., 
       [-0.05428317, -0.0974982 , -0.05203034, ...,  0.16225956,
         0.08755989, -0.088445  ],
       [-0.141221  , -0.06074303,  0.05824159, ..., -0.11142561,
         0.11705144, -0.13420179],
       [-0.16422763, -0.12701379,  0.14181305, ...,  0.10273062,
         0.01999506,  0.11997597]], dtype=float32)

In [30]:
cell.state_size

128

In [31]:
# outputs is a list of 50 tensors, each 60x128
outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, initial_state, stacked_cell, loop_function=None, scope='rnnlm')


In [32]:
outputs[0:5]

[<tf.Tensor 'rnnlm_1/multi_rnn_cell/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_1/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_2/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_3/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_4/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>]

In [33]:
test = outputs[0]
test

<tf.Tensor 'rnnlm_1/multi_rnn_cell/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>

In [34]:
session.run(tf.global_variables_initializer())
session.run(test,feed_dict={input_data:x})

array([[-0.05821293, -0.05047727,  0.03240858, ..., -0.06629939,
        -0.10825536,  0.01807678],
       [ 0.00837175,  0.01522417, -0.02819392, ..., -0.01531586,
         0.1428543 ,  0.0505182 ],
       [ 0.00326554, -0.0340628 , -0.03086306, ...,  0.00487064,
        -0.00767021,  0.05711473],
       ..., 
       [-0.03196747, -0.22419493,  0.121751  , ..., -0.05133714,
         0.07012291,  0.04810454],
       [ 0.0887068 , -0.05727573, -0.07711516, ..., -0.00186371,
         0.18001084,  0.06250025],
       [-0.02076182, -0.06524993, -0.0664552 , ..., -0.01729499,
         0.05801091,  0.02758405]], dtype=float32)

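outputs is a list of 50 tensors of shape [60x128]. We concatenate them along axis 1 and reshape the result to [3000x128] (3000 = 60*50), so that each row holds one (sequence, time step) pair. Then we can calculate the logits and the softmax:

softmax_w is [rnn_size, vocab_size], [128x65]

[3000x128]x[128x65]+[65] = [3000x65]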
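The reshape relies on a specific row order. The NumPy sketch below (random stand-in values) reproduces `tf.reshape(tf.concat(outputs, 1), [-1, rnn_size])` and checks that row `b * seq_length + t` holds sequence `b` at step `t`, which is the same order produced by `tf.reshape(targets, [-1])` in the loss; it then applies the softmax projection.

```python
import numpy as np

batch_size, seq_length, rnn_size, vocab_size = 60, 50, 128, 65
rng = np.random.default_rng(0)

# Stand-ins for rnn_decoder's outputs: seq_length arrays of shape (batch_size, rnn_size)
outputs = [rng.normal(size=(batch_size, rnn_size)) for _ in range(seq_length)]
softmax_w = rng.normal(scale=0.1, size=(rnn_size, vocab_size))
softmax_b = np.zeros(vocab_size)

# Equivalent of tf.reshape(tf.concat(outputs, 1), [-1, rnn_size])
output = np.concatenate(outputs, axis=1).reshape(-1, rnn_size)    # (3000, 128)

# Row b * seq_length + t is sequence b at time step t
b, t = 7, 11
assert np.array_equal(output[b * seq_length + t], outputs[t][b])

logits = output @ softmax_w + softmax_b                  # (3000, 65)
shifted = logits - logits.max(axis=1, keepdims=True)     # for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
```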

In [35]:
output = tf.reshape(tf.concat(outputs, 1), [-1, rnn_size])
output

<tf.Tensor 'Reshape:0' shape=(3000, 128) dtype=float32>

In [36]:
logits = tf.matmul(output, softmax_w) + softmax_b
logits

<tf.Tensor 'add:0' shape=(3000, 65) dtype=float32>

In [37]:
probs = tf.nn.softmax(logits)
probs

<tf.Tensor 'Softmax:0' shape=(3000, 65) dtype=float32>

In [38]:
session.run(tf.global_variables_initializer())
session.run(probs,feed_dict={input_data:x})

array([[ 0.01452553,  0.01277425,  0.01863852, ...,  0.01512464,
         0.01406453,  0.01335054],
       [ 0.01180578,  0.01229866,  0.01665466, ...,  0.01562338,
         0.01562235,  0.01671582],
       [ 0.01471618,  0.01498685,  0.01340279, ...,  0.01755621,
         0.01526808,  0.01378074],
       ..., 
       [ 0.01179038,  0.0098834 ,  0.01629242, ...,  0.01548402,
         0.01515435,  0.01294479],
       [ 0.01756067,  0.01485729,  0.01396407, ...,  0.01787437,
         0.01747026,  0.01640356],
       [ 0.01610383,  0.01111977,  0.01510501, ...,  0.01382569,
         0.0106961 ,  0.01195219]], dtype=float32)

In [39]:
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([logits],
                [tf.reshape(targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)

In [40]:
cost = tf.reduce_sum(loss) / batch_size / seq_length
cost
        

<tf.Tensor 'div_1:0' shape=() dtype=float32>
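With all weights set to one, `sequence_loss_by_example` returns the softmax cross-entropy `-log p(target)` for each of the 3000 rows, and `cost` averages it over `batch_size * seq_length`. The NumPy sketch below (the `sequence_cost` helper is hypothetical, written only for illustration) computes the same average and shows that a model predicting all 65 characters uniformly scores ln(65) ≈ 4.17, close to the first training loss of 4.24 reported later in this notebook.

```python
import math
import numpy as np

def sequence_cost(logits, targets):
    """Hypothetical helper: mean softmax cross-entropy over all rows.

    logits: (N, vocab_size) floats; targets: (N,) integer class ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_example = -log_probs[np.arange(len(targets)), targets]   # -log p(target)
    return per_example.mean()

vocab_size, n = 65, 60 * 50
rng = np.random.default_rng(0)
targets = rng.integers(0, vocab_size, size=n)

# An untrained-looking model: every character equally likely
uniform_cost = sequence_cost(np.zeros((n, vocab_size)), targets)

# A very confident, correct model: large logit on the target character
peaked = np.full((n, vocab_size), -10.0)
peaked[np.arange(n), targets] = 10.0
peaked_cost = sequence_cost(peaked, targets)

print(uniform_cost, math.log(vocab_size), peaked_cost)
```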

In [41]:
final_state = last_state
final_state

(<tf.Tensor 'rnnlm_1/multi_rnn_cell_49/cell_0/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_49/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>)

In [42]:
lr = tf.Variable(0.0, trainable=False)

In [43]:
grad_clip = 5. # maximum global norm of the gradients
tvars = tf.trainable_variables()

In [44]:
tvars

[<tensorflow.python.ops.variables.Variable at 0x11f840050>,
 <tensorflow.python.ops.variables.Variable at 0x11f82bf50>,
 <tensorflow.python.ops.variables.Variable at 0x11f85bf50>,
 <tensorflow.python.ops.variables.Variable at 0x11f9f3190>,
 <tensorflow.python.ops.variables.Variable at 0x11f9f3050>,
 <tensorflow.python.ops.variables.Variable at 0x11fd68310>,
 <tensorflow.python.ops.variables.Variable at 0x11fd757d0>]

In [45]:
session.run(tf.global_variables_initializer())
[v.name for v in tf.global_variables()]

[u'rnnlm/softmax_w:0',
 u'rnnlm/softmax_b:0',
 u'rnnlm/embedding:0',
 u'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/weights:0',
 u'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/biases:0',
 u'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/weights:0',
 u'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/biases:0',
 u'Variable:0']

In [46]:
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
grads

[<tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_0:0' shape=(128, 65) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_1:0' shape=(65,) dtype=float32>,
 <tensorflow.python.framework.ops.IndexedSlices at 0x122071990>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_3:0' shape=(256, 128) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_4:0' shape=(128,) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_5:0' shape=(256, 128) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_6:0' shape=(128,) dtype=float32>]
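`tf.clip_by_global_norm` treats all gradients as one long vector: if their combined L2 norm exceeds `grad_clip`, every gradient is scaled by `grad_clip / global_norm`, so directions are kept and only the overall length is capped. A minimal NumPy sketch of that rule (the helper below mirrors the TensorFlow function's name but is our own code):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients by clip_norm / max(global_norm, clip_norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Global norm is sqrt(3^2 + 4^2 + 12^2) = 13 > 5, so everything is scaled by 5/13
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm = clip_by_global_norm(grads, clip_norm=5.0)

# Gradients already inside the limit are left unchanged
small, small_norm = clip_by_global_norm([np.array([0.3, 0.4])], clip_norm=5.0)
```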

In [47]:
session.run(grads, feed_dict)[0]

array([[ -1.15889707e-04,   4.72889794e-03,   4.97568049e-04, ...,
         -3.00469866e-04,  -2.99643085e-04,  -2.73773738e-04],
       [ -9.02880263e-03,  -1.17456540e-03,  -5.33561828e-03, ...,
          1.13346858e-03,   1.15507946e-03,   9.97719471e-04],
       [ -3.42746335e-03,  -5.40494744e-04,  -1.60877389e-04, ...,
          4.20372613e-04,   3.28794529e-04,   2.47172808e-04],
       ..., 
       [ -7.71436235e-03,  -6.17193757e-03,   2.12614439e-04, ...,
          7.91810919e-04,   7.32082524e-04,   6.88683067e-04],
       [  9.13359970e-03,   5.39305434e-03,   2.35419301e-03, ...,
         -9.34848678e-04,  -9.68575769e-04,  -9.20143095e-04],
       [  2.37912731e-03,  -1.45130185e-03,  -6.81445003e-04, ...,
         -5.27885277e-06,  -1.81617615e-05,  -3.04286186e-06]], dtype=float32)

In [48]:
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.apply_gradients(zip(grads, tvars))

# Using classes
Now that we have learned how the network works, we can put it all together:

In [49]:
import tensorflow as tf

import numpy as np

class LSTMModel():
    def __init__(self,sample=False):
        rnn_size = 128 # size of RNN hidden state vector
        batch_size = 60 # minibatch size, i.e. number of sequences per training step
        seq_length = 50 # RNN sequence length
        num_layers = 2 # number of layers in the RNN
        vocab_size = 65
        grad_clip = 5.
        if sample:
            print("sample mode")
            batch_size = 1
            seq_length = 1
        # The core of the model is a stack of recurrent cells that processes one char at a time and computes probabilities of the possible continuations of the char.
        # (BasicRNNCell is used here; a BasicLSTMCell could be substituted to get an LSTM.)
        # self.stacked_cell.state_size is (128, 128)
        self.stacked_cell = tf.contrib.rnn.MultiRNNCell(
            [tf.contrib.rnn.BasicRNNCell(rnn_size) for _ in range(num_layers)])

        self.input_data = tf.placeholder(tf.int32, [batch_size, seq_length])
        self.targets = tf.placeholder(tf.int32, [batch_size, seq_length])
        # Initial state of the recurrent memory.
        # The memory state of the network is initialized with a vector of zeros and gets updated after reading each char.
        # zero_state needs batch_size because every sequence in the batch keeps its own state.
        self.initial_state = self.stacked_cell.zero_state(batch_size, tf.float32)

        with tf.variable_scope('rnnlm_class1'):
            softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) #128x65
            softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65
            with tf.device("/cpu:0"):
                embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  #65x128
                inputs = tf.split(tf.nn.embedding_lookup(embedding, self.input_data), seq_length, 1)
                inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

        # The value of state is updated after processing each batch of chars.
        outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, self.initial_state, self.stacked_cell, loop_function=None, scope='rnnlm_class1')
        output = tf.reshape(tf.concat(outputs,1), [-1, rnn_size])
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)
        loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([self.logits],
                [tf.reshape(self.targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)
        self.cost = tf.reduce_sum(loss) / batch_size / seq_length
        self.final_state = last_state
        self.lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),grad_clip)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))
        
    def sample(self, sess, chars, vocab, num=200, prime='The ', sampling_type=1):
        state = sess.run(self.stacked_cell.zero_state(1, tf.float32))
        print(state)  # debug: show the initial (zero) state
        for char in prime[:-1]:
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [state] = sess.run([self.final_state], feed)

        def weighted_pick(weights):
            t = np.cumsum(weights)
            s = np.sum(weights)
            return(int(np.searchsorted(t, np.random.rand(1)*s)))

        ret = prime
        char = prime[-1]
        for n in range(num):
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [probs, state] = sess.run([self.probs, self.final_state], feed)
            p = probs[0]

            if sampling_type == 0:
                sample = np.argmax(p)
            elif sampling_type == 2:
                if char == ' ':
                    sample = weighted_pick(p)
                else:
                    sample = np.argmax(p)
            else: # sampling_type == 1 default:
                sample = weighted_pick(p)

            pred = chars[sample]
            ret += pred
            char = pred
        return ret


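The input to the RNN cell at each time step is always a matrix of shape [n x m], where n is the batch size and m is the feature size. In our case, each step's input is [60 x 128] (batch_size x embedding size).

 
The data used for training contains 1,113,000 characters (create_batches keeps only a whole number of batches): there are 371 batches, the batch size is 60 and the sequence length is 50, so 50*60*371 = 1113000.

We train for num_epochs = 25 epochs. 
Each input batch produces one parameter update, so each epoch performs 371 updates.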
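The batch and update counts above can be checked with a few lines of arithmetic; the total matches the `/9275` denominator in the training log.

```python
data_size = 1113000
batch_size, seq_length, num_epochs = 60, 50, 25

num_batches = data_size // (batch_size * seq_length)   # characters per batch: 3000
total_updates = num_epochs * num_batches                # one update per batch
print(num_batches, total_updates)  # 371 9275
```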

### Creating the LSTM object

In [50]:
with tf.variable_scope("rnn"):
    model = LSTMModel()

In [51]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
e=1
sess.run(tf.assign(model.lr, learning_rate * (decay_rate ** e)))
data_loader.reset_batch_pointer()
state = sess.run(model.initial_state)
state

(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),
 array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32))

In [52]:
x, y = data_loader.next_batch()
feed = {model.input_data: x, model.targets: y, model.initial_state:state}

In [53]:
train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
train_loss

4.2406917

In [54]:
state

(array([[ 0.16106769,  0.09671967,  0.05100814, ..., -0.01635377,
         -0.05336112,  0.05017365],
        [ 0.03706374,  0.09157047,  0.16957512, ...,  0.35060707,
          0.06505036,  0.16768083],
        [-0.06861316,  0.35269612,  0.01392379, ..., -0.3016403 ,
          0.15793857, -0.03690971],
        ..., 
        [-0.09782275,  0.30612332,  0.01817104, ...,  0.12719224,
         -0.07495581, -0.02366222],
        [ 0.12180551,  0.20894924, -0.01887005, ...,  0.00422742,
          0.11226939, -0.06858645],
        [-0.09287884, -0.01052966,  0.20708023, ..., -0.0488694 ,
         -0.0396448 , -0.02352033]], dtype=float32),
 array([[-0.14369003, -0.15190348, -0.10946867, ..., -0.19686896,
         -0.26301304,  0.02058302],
        [ 0.01852476, -0.09826656,  0.04434421, ...,  0.05310607,
          0.07389302, -0.05907106],
        [ 0.0751107 ,  0.22561377, -0.0044577 , ...,  0.28249529,
         -0.25592875,  0.29575139],
        ..., 
        [-0.07900319, -0.03534848,  0

# Train using the LSTMModel class

In [55]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(num_epochs): # num_epochs is 25 here; 50 gives better results
        sess.run(tf.assign(model.lr, learning_rate * (decay_rate ** e)))
        data_loader.reset_batch_pointer()
        state = sess.run(model.initial_state) # (2x[60x128])
        for b in range(data_loader.num_batches): #for each batch
            start = time.time()
            x, y = data_loader.next_batch()
            feed = {model.input_data: x, model.targets: y, model.initial_state:state}
            train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
            end = time.time()
        print("{}/{} (epoch {}), train_loss = {:.3f}, time/batch = {:.3f}" \
                .format(e * data_loader.num_batches + b, num_epochs * data_loader.num_batches, e, train_loss, end - start))
        #model.sample(sess, data_loader.chars , data_loader.vocab, num=200, prime='The ', sampling_type=1)

370/9275 (epoch 0), train_loss = 1.875, time/batch = 0.043
741/9275 (epoch 1), train_loss = 1.708, time/batch = 0.041
1112/9275 (epoch 2), train_loss = 1.633, time/batch = 0.054
1483/9275 (epoch 3), train_loss = 1.593, time/batch = 0.046
1854/9275 (epoch 4), train_loss = 1.567, time/batch = 0.053
2225/9275 (epoch 5), train_loss = 1.548, time/batch = 0.041
2596/9275 (epoch 6), train_loss = 1.535, time/batch = 0.039
2967/9275 (epoch 7), train_loss = 1.525, time/batch = 0.049
3338/9275 (epoch 8), train_loss = 1.517, time/batch = 0.040
3709/9275 (epoch 9), train_loss = 1.509, time/batch = 0.043
4080/9275 (epoch 10), train_loss = 1.502, time/batch = 0.047
4451/9275 (epoch 11), train_loss = 1.494, time/batch = 0.040
4822/9275 (epoch 12), train_loss = 1.487, time/batch = 0.036
5193/9275 (epoch 13), train_loss = 1.481, time/batch = 0.039
5564/9275 (epoch 14), train_loss = 1.477, time/batch = 0.038
5935/9275 (epoch 15), train_loss = 1.473, time/batch = 0.036
6306/9275 (epoch 16), train_loss = 1

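# Sample

Note: the sampling model below is built in a new variable scope (`sample_test`) and its weights are freshly initialized with `tf.global_variables_initializer()`, so it does not use the weights trained above (their values were discarded when the training session closed). The generated text is therefore essentially random. To sample from the trained network, the training and sampling models would need to share variables and run in the same session, or the trained weights would need to be saved and restored, for example with `tf.train.Saver`.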

In [56]:
sess = tf.InteractiveSession()
with tf.variable_scope("sample_test"):
    m = LSTMModel(sample=True)
sess.run(tf.global_variables_initializer())  # initialize after the model's variables exist

sample mode


In [57]:
prime='The '
num=200
sampling_type=1
vocab=data_loader.vocab
chars=data_loader.chars 

In [58]:
sess.run(m.initial_state)

(array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32),
 array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
          0.,  0.,  0.,  0.,  

In [59]:
# Warm up the state by feeding the prime text (all but its last character)
sess.run(tf.global_variables_initializer())
state=sess.run(m.initial_state)
for char in prime[:-1]:
    x = np.zeros((1, 1))
    x[0, 0] = vocab[char]
    feed = {m.input_data: x, m.initial_state:state}
    [state] = sess.run([m.final_state], feed)

In [60]:
state

(array([[-0.05878098,  0.05151086, -0.01658276, -0.01361227, -0.07602809,
          0.09598655, -0.15346964, -0.29651856, -0.14367703, -0.03699737,
         -0.13778403,  0.0206705 , -0.0425937 ,  0.06101717,  0.01471224,
          0.07944712, -0.00359382,  0.07064421, -0.05881016,  0.02421402,
          0.27702171,  0.09948913,  0.01396049,  0.24405159, -0.15082668,
          0.09722668,  0.04478887, -0.13433081, -0.18657054, -0.0896575 ,
          0.02983936,  0.14230509, -0.08031639, -0.07138512, -0.01722506,
         -0.07718454, -0.06950866, -0.16460122,  0.0498888 , -0.01911711,
         -0.00944835,  0.03433346, -0.19992422, -0.12076587,  0.07408935,
         -0.01468888,  0.32369787, -0.09865332,  0.19121619,  0.09646589,
          0.14042749,  0.11068583, -0.00829003,  0.02230815, -0.08730035,
         -0.13893236, -0.1292657 , -0.18696299,  0.14410619,  0.1721911 ,
         -0.06488436,  0.17704102,  0.13975242, -0.16831143,  0.04572584,
         -0.07830398, -0.14049169,  0.

In [61]:
def weighted_pick(weights):
    t = np.cumsum(weights)
    s = np.sum(weights)
    return(int(np.searchsorted(t, np.random.rand(1)*s)))

ret = prime
char = prime[-1]
for n in range(num):
    x = np.zeros((1, 1))
    x[0, 0] = vocab[char]
    feed = {m.input_data: x, m.initial_state:state}
    [probs, state] = sess.run([m.probs, m.final_state], feed)
    p = probs[0]

    if sampling_type == 0:
        sample = np.argmax(p)
    elif sampling_type == 2:
        if char == ' ':
            sample = weighted_pick(p)
        else:
            sample = np.argmax(p)
    else: # sampling_type == 1 default:
        sample = weighted_pick(p)

    pred = chars[sample]
    ret += pred
    char = pred


In [62]:
ret

u"The grWax'WNnm$oO;3KPcu!REmeIIcbN.DDUzp\nlNvF!NyKJgAr'qeSKgR'h,;WHOkBpKdGJjGX'x?IKJLQ!JEkzpqCwM-\n'hZCxF;rSVNEdXelGltCJOlj:Gua:pvyr&oQ\nkVvMh''D?B$AgxEX!i:P:w.cz'JV!U'TNKvYwTizAgJ?fUgZkSI$kJnp3'lWJi$&ZIARViD"
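The `weighted_pick` helper used above draws a character by inverse-CDF sampling: it takes the cumulative sum of the probabilities and finds where a uniform random number falls in it. The sketch below is a variant that takes an explicit NumPy `Generator` (an assumption added for reproducibility) and checks empirically that index i is picked with probability p[i]. For reference, `sampling_type=0` always takes the argmax, `1` samples at every step, and `2` samples only after a space and takes the argmax otherwise.

```python
import numpy as np

def weighted_pick(weights, rng):
    """Inverse-CDF sampling: returns index i with probability weights[i] / sum(weights)."""
    t = np.cumsum(weights)
    s = np.sum(weights)
    return int(np.searchsorted(t, rng.random() * s))

rng = np.random.default_rng(0)
p = np.array([0.1, 0.2, 0.7])

draws = [weighted_pick(p, rng) for _ in range(100_000)]
freqs = np.bincount(draws, minlength=len(p)) / len(draws)   # close to p

greedy = int(np.argmax(p))   # what sampling_type=0 would always choose
```

Sampling (rather than always taking the argmax) is what lets the generated text vary from run to run instead of repeating the single most likely continuation.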

# Sample using the `sample` method

In [63]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
state=sess.run(m.initial_state)
m.sample(sess, data_loader.chars , data_loader.vocab, num=200, prime='The ', sampling_type=1)

(array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]], dtype=float32), array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,

u"The  B&;x-kHZTs;suVlcjvgAe-M!EnlK$EaeseKcRCsTGWEC..M?Gudy$lIu'&KbvUOE!BJ&HGb,:CdJcpSdiV'ifX..:rdl.Dio:qIMxgsdXotdxJV i3'BekMuQ;BQdWqmdgW$f&Crw'UKMYGag:Hegq l$?Twh$wmHsbEP,kssVksshzAxY$ggHQe$dOxrwQHHP\n3g$v"