<a href="https://colab.research.google.com/github/shubhamsks/deep-learning/blob/master/predicting_next_character_RNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

In [0]:
with open('anna.txt','r') as f:
  text = f.read()


In [0]:
vocab = sorted(set(text))

In [0]:
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))

# Encoding the character to integers because computers don't understand text
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

In [15]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [16]:
encoded[:100]

array([25, 57, 50, 65, 69, 54, 67,  1, 11,  0,  0,  0, 30, 50, 65, 65, 74,
        1, 55, 50, 62, 58, 61, 58, 54, 68,  1, 50, 67, 54,  1, 50, 61, 61,
        1, 50, 61, 58, 60, 54, 21,  1, 54, 71, 54, 67, 74,  1, 70, 63, 57,
       50, 65, 65, 74,  1, 55, 50, 62, 58, 61, 74,  1, 58, 68,  1, 70, 63,
       57, 50, 65, 65, 74,  1, 58, 63,  1, 58, 69, 68,  1, 64, 72, 63,  0,
       72, 50, 74,  9,  0,  0, 27, 71, 54, 67, 74, 69, 57, 58, 63],
      dtype=int32)

In [17]:
len(vocab)

76

In [0]:
def get_batches(arr, batch_size, n_steps):
    '''Create a generator that returns batches of size
       batch_size x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       batch_size: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    chars_per_batch = batch_size * n_steps
    n_batches = len(arr)//chars_per_batch
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches * chars_per_batch]
    
    # Reshape into batch_size rows
    arr = arr.reshape((batch_size, -1))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y_temp = arr[:, n+1:n+n_steps+1]
        #trick by udacity 
        # For the very last batch, y will be one character short at the end of 
        # the sequences which breaks things. To get around this, I'll make an 
        # array of the appropriate size first, of all zeros, then add the targets.
        # This will introduce a small artifact in the last batch, but it won't matter.
        y = np.zeros(x.shape, dtype=x.dtype)
        y[:,:y_temp.shape[1]] = y_temp
        
        yield x, y

In [0]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)

In [20]:
print('x')
print(x[:10,:20])
print('y')
print(y[:10,:20])

x
[[25 57 50 65 69 54 67  1 11  0  0  0 30 50 65 65 74  1 55 50]
 [69 57 54  1 66 70 54 68 69 58 64 63  1 58 63  1 69 57 54  1]
 [68 70 55 55 54 67 58 63 56  7  1 50 63 53  1 52 67 70 54 61]
 [63 20  1 69 72 64  1 55 54 68 69 58 71 54  1 74 64 70 63 56]
 [53 53 54 63 61 74  7  1 61 50 74 58 63 56  1 53 64 72 63  1]
 [54 61 61  2  1 24 70 69  1 31  1 54 73 65 54 52 69  1 50  1]
 [70 56 57 69  1 57 54  1 62 70 68 69  1 55 50 61 61  7  1 50]
 [54  1 74 64 70  9  1 42 54 50  1 58 63  1 69 57 54  1 61 58]
 [54 67 68  1 51 54 52 50 70 68 54  1 69 57 54  1 65 54 64 65]
 [72 64 62 50 63  9  1 28 64 67  1 29 64 53  4 68  1 68 50 60]]
y
[[57 50 65 69 54 67  1 11  0  0  0 30 50 65 65 74  1 55 50 62]
 [57 54  1 66 70 54 68 69 58 64 63  1 58 63  1 69 57 54  1 62]
 [70 55 55 54 67 58 63 56  7  1 50 63 53  1 52 67 70 54 61  7]
 [20  1 69 72 64  1 55 54 68 69 58 71 54  1 74 64 70 63 56  1]
 [53 54 63 61 74  7  1 61 50 74 58 63 56  1 53 64 72 63  1 69]
 [61 61  2  1 24 70 69  1 31  1 54 73 65 54 52 69 

In [0]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(shape = (batch_size, num_steps), dtype = tf.int32, name='inputs')
    targets = tf.placeholder(shape =(batch_size, num_steps),dtype = tf.int32, name = 'targets')
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name = 'keep_prob')
    
    return inputs, targets, keep_prob

In [0]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    ### Build the LSTM Cell
    
    def build_cell(lstm_size, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        
        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop
    
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state

### RNN Output

Here we'll create the output layer. We need to connect the output of the RNN cells to a full connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character.

If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$. The output of each LSTM cell has size $L$, we have $M$ of them, one for each sequence step, and we have $N$ sequences. So the total size is $N \times M \times L$.

We are using the same fully connected layer, the same weights, for each of the outputs. Then, to make things easier, we should reshape the outputs into a 2D tensor with shape $(M * N) \times L$. That is, one row for each sequence and step, where the values of each row are the output from the LSTM cells.

One we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with `tf.variable_scope(scope_name)` because there are weights being created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will be default. To avoid this, we wrap the variables in a variable scope so we can give them unique names.

In [0]:
# RNN output
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    seq_output = tf.concat(lstm_output, axis = 1)
    # Reshape seq_output to a 2D tensor with lstm_size columns
    x = tf.reshape(seq_output,[-1, in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size),stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name = 'predictions')
    
    return out, logits

In [0]:
# Training Loss
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped =  tf.reshape(y_one_hot,logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(labels = y_reshaped, logits = logits)
    
    
    return loss

In [0]:
# optmizer
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optmizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer

In [0]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling == True:
            batch_size, num_steps = 1, 1
        else:
            batch_size, num_steps = batch_size, num_steps

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size,num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN with tf.nn.dynamic_rnn 
        outputs, state = tf.nn.dynamic_rnn(cell= cell, inputs = x_one_hot,initial_state= self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs,lstm_size,num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss =  build_loss(self.logits,self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

In [0]:
# setting the hyper parameters 
batch_size = 100        # Sequences per batch
num_steps = 100         # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001   # Learning rate
keep_prob = 0.5         # Dropout keep probability

In [29]:
epochs = 20
# Print losses every N interations
print_every_n = 50

# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            if (counter % print_every_n == 0):
                end = time.time()
                print('Epoch: {}/{}... '.format(e+1, epochs),
                      'Training Step: {}... '.format(counter),
                      'Training loss: {}... '.format(batch_loss),
                      '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

Epoch: 1/20...  Training Step: 50...  Training loss: [3.8588972 2.8508182 4.275669  ... 1.9458618 4.499959  2.0578387]...  0.1464 sec/batch
Epoch: 1/20...  Training Step: 100...  Training loss: [1.2806416  0.9940436  2.469316   ... 1.1012967  0.47052938 3.466614  ]...  0.1569 sec/batch
Epoch: 2/20...  Training Step: 150...  Training loss: [0.19318552 2.758006   0.6432263  ... 1.0827253  2.5662653  0.8128898 ]...  0.1500 sec/batch
Epoch: 2/20...  Training Step: 200...  Training loss: [0.5710872  5.759434   0.06132247 ... 1.1293353  1.4260061  3.1424685 ]...  0.1585 sec/batch
Epoch: 3/20...  Training Step: 250...  Training loss: [4.3598833  0.74093246 2.3057022  ... 4.369235   1.925463   0.5429019 ]...  0.1548 sec/batch
Epoch: 3/20...  Training Step: 300...  Training loss: [0.95164865 3.1261916  0.357482   ... 0.35111293 4.680373   0.77234316]...  0.1519 sec/batch
Epoch: 4/20...  Training Step: 350...  Training loss: [2.2194192 2.5235195 0.445639  ... 3.7747998 1.626569  3.4196167]...  0

In [30]:
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i2040_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2040_l512.ckpt"

In [0]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c

In [0]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = CharRNN(len(vocab), lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

In [33]:
tf.train.latest_checkpoint('checkpoints')

'checkpoints/i2040_l512.ckpt'

In [34]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

W0901 17:49:37.816656 140704822318976 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


Farence, he
had so that she cuuld be a still because, but he went on: "To the mother
with her to be angaids, the people can be stream of it. I all
sorry of it?"

"Yes!."

She saw the statious for the sturn at the troom, and as how the mistake he
felt as the calmiss with his fateres, and had been answered, and too to her
sen, a consciousness was a smile.

"Why, a someone and I was to be to be done, would not be afread of a
shurters of as a moment. It were at in all, but there was to myself,"
asked Stepan Arkadyevitch, sister as settled in his hands in his must
before her face.

He wanted to talk of her son, sight of a stream of this works and he come
to all the way in the same simply one and still and worked it or how to a
minute shades brown first to be sont of the fellow. But in a conversation had
been same to hear and all about him in the sound of her horses, and he had
not seared a little.

"It is impossible to say this in his been feeling of my coming our
often their subject. I was

In [35]:
checkpoint = 'checkpoints/i200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Fart, and
hit out shat the hat ins apes is ofele the tith tart he sonterse fromith, whing with we hos to he
soringiton, and seiled, anding heress to the hins his sontit of his the chics on hit hint, and his hougt thouls and tho the sale ant the sall wing the wishe and thousstenterstered of in has torigent time on and the thongented ind ous and of tor thard
and, she she hard
ofthith wast hat has shit ase tont hat the wasd hishid. 
And her tin she somen teen of the tare singe fas inte sat at her
ince the ther tom ithe his att alsing hourd had had wassele wasdind and to her hes its the to cimse to shis sound the sore tith, the sher as in ther hest had shithe fallingele he his he wals there she the the sithen with hit hers in the shrowis the toule the har wittele heredste of
inter that ard to calle her, the sind and thin whe sad of the momed in wan and wish whas head to core ho hishand the cimlessed it and and of in at intinting of
his hither soult and
it the hat to thet sone to he
tariste

In [36]:
checkpoint = 'checkpoints/i600_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farning as a cart and that they.

The ploase, a conlly as they wan of it, and what had been steplow him to at allove the colligh, wish and say, to be with the clater," she said. "I dad you was the sand? But'y a leed be must and had been take,. What wish you ase her as hid sore wene... What said they he was allouse.... What wime they say all, and that was now that to the crunter at in the sort the clild of all sering of heres, ald and took over, but is. I day's beare they so was that's a serting as they wine so the motting to all the work, was at the could stenting him havis to be at him."

Stepal Arkadyevitch was not see a lask in the was and sail
of the cantions, the poresion had a stelp him that theyer, he had to see in the complearanters of a marest, and her his
charres. She whule sat home her he was to her thisk op of and and sat
intenting the way to an his wonters, whom atay ther all to hand a comares as
he did not can that was a child,
betine have something hears at her as is wal

In [37]:
checkpoint = 'checkpoints/i1200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farning off her face, she was the princess,
and all the telled that he was not the came of seticiation of
the same stricted hen there, and as a from thinging on the statting of
his face. And there's no dring all the doiling with his face and the
people, he said to her shining of, and he saw, a child things of the master
of
templess that it seemed and sen out of his back in all the prestion of
the mother. Have note the consequent the same them a dream, to the
sons.

"I am to speak of it. That's
to be?" he said.

"If it tell you with me. It thought it is, and I suppossed you.
The measurss the sume as it was a laborer in an inside, a cartiant
of his brought?" he said.

"If it is a person of an orce, and though there would be something in the
round of the proncession of her," said Alexey Alexandrovitch
had not before the prencess of the starding of a sont of the sawes and
senst of insuctess in the servant, that
she cared that to sut to her all sure anything to the
start, the plonce was sor