In [1]:
import numpy as np
import time
import tensorflow as tf
from collections import namedtuple



First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input to the network.

In [5]:
with open('anna.txt','r') as f:
    text=f.read()
vocab=sorted(set(text))
chat_to_int = {c:i for i,c in enumerate(vocab)}
int_to_car = dict(enumerate(vocab))
encoded = np.array([chat_to_int[c] for c in text],dtype=np.int32)

In [7]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [8]:
encoded[:100]

array([31, 64, 57, 72, 76, 61, 74,  1, 16,  0,  0,  0, 36, 57, 72, 72, 81,
        1, 62, 57, 69, 65, 68, 65, 61, 75,  1, 57, 74, 61,  1, 57, 68, 68,
        1, 57, 68, 65, 67, 61, 26,  1, 61, 78, 61, 74, 81,  1, 77, 70, 64,
       57, 72, 72, 81,  1, 62, 57, 69, 65, 68, 81,  1, 65, 75,  1, 77, 70,
       64, 57, 72, 72, 81,  1, 65, 70,  1, 65, 76, 75,  1, 71, 79, 70,  0,
       79, 57, 81, 13,  0,  0, 33, 78, 61, 74, 81, 76, 64, 65, 70])
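
To check that the two dictionaries really are inverses of each other, it helps to do a round trip on a short string. This is a minimal sketch: `sample_text` is a toy stand-in for the novel, and the names `to_int` and `to_char` are made up for this example rather than taken from the cell above.

```python
# Toy stand-in for the contents of the text file (assumption, illustration only).
sample_text = "happy families"

# Same construction as in the notebook: one integer per distinct character.
sample_vocab = sorted(set(sample_text))
to_int = {c: i for i, c in enumerate(sample_vocab)}
to_char = dict(enumerate(sample_vocab))

# Encode to integers, then decode back to a string.
sample_encoded = [to_int[c] for c in sample_text]
sample_decoded = "".join(to_char[i] for i in sample_encoded)

print(sample_encoded)
print(sample_decoded)
```

If the decoded string matches the original, the mapping loses no information, which is what we need later when turning predicted integers back into characters.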

Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text. Here's how many 'classes' our network has to pick from.

In [11]:
len(vocab)

83

We have our text encoded as integers as one long array in encoded. Let's create a function that will give us an iterator for our batches. I like using generator functions to do this. Then we can pass encoded into this function and get our batch generator.

The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. To get the number of batches we can make from some array arr, divide the length of arr by the number of characters per batch, $N \times M$, and round down. Once you know the number of batches and the number of characters per batch, you can get the total number of characters to keep.
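
Here is that arithmetic worked through with plain integers. It is only a sketch: the text length is the value that encoded.shape reports in this notebook, and $N = 10$, $M = 50$ are the settings used in the visualization cell further down.

```python
total_chars = 1985223          # length of `encoded`, as reported by encoded.shape
n_seqs, n_steps = 10, 50       # N sequences per batch, M steps per sequence

chars_per_batch = n_seqs * n_steps              # N * M
n_batches = total_chars // chars_per_batch      # integer division drops the remainder
chars_kept = n_batches * chars_per_batch
chars_dropped = total_chars - chars_kept

# After reshaping into n_seqs rows, each row holds M * K characters.
row_length = chars_kept // n_seqs

print(n_batches, chars_kept, chars_dropped, row_length)
```

The row length computed here should agree with the second dimension printed by the visualization cell.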

After that, we need to split arr into $N$ sequences. You can do this using arr.reshape(size), where size is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (n_seqs below), so let's make that the size of the first dimension. For the second dimension, you can use -1 as a placeholder in the size, and NumPy will work out the appropriate length for you. After this, you should have an array that is $N \times (M * K)$ where $K$ is the number of batches.

Now that we have this array, we can iterate through it to get our batches. The idea is each batch is a $N \times M$ window on the array. For each subsequent batch, the window moves over by n_steps. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over one character. You'll usually see the first input character used as the last target character, so something like this:

y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
where x is the input batch and y is the target batch.

The way I like to do this window is to use range to take steps of size n_steps from $0$ to arr.shape[1], the total number of steps in each sequence. That way, the integers you get from range always point to the start of a batch, and each window is n_steps wide.
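
The whole scheme (trim, split into rows, slide a window, shift for targets) can be sketched on a tiny sequence with plain Python lists. This is not the NumPy implementation used in this notebook; `toy_batches` is a hypothetical helper written only to make the indexing visible.

```python
def toy_batches(seq, n_seqs, n_steps):
    """Yield (x, y) batches from a flat list; a list-based sketch of the scheme."""
    chars_per_batch = n_seqs * n_steps
    n_batches = len(seq) // chars_per_batch
    seq = seq[:chars_per_batch * n_batches]       # keep only full batches

    row_len = len(seq) // n_seqs                  # what reshape's -1 works out to
    rows = [seq[r * row_len:(r + 1) * row_len] for r in range(n_seqs)]

    for start in range(0, row_len, n_steps):      # start index of each window
        x = [row[start:start + n_steps] for row in rows]
        # Targets: inputs shifted left by one, first input wraps to the end.
        y = [row[1:] + row[:1] for row in x]
        yield x, y

# 25 integers, 2 sequences, 3 steps: the last integer is discarded.
batches = list(toy_batches(list(range(25)), n_seqs=2, n_steps=3))
first_x, first_y = batches[0]
print(first_x)
print(first_y)
```

Using consecutive integers as the "text" makes it easy to see which positions end up in which row and window.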

In [15]:
encoded.shape

(1985223,)

In [120]:
def create_batches(arr,seq_n,step_n):
    '''Create a generator that returns batches of size
       seq_n x step_n from arr.
    seq_n = batch size, the number of sequences per batch
    step_n = number of time steps per sequence in a batch
    '''
    # Get the number of characters per batch and number of batches we can make
    totalnochar=seq_n*step_n
    n_batches= len(arr)//totalnochar
    
    # Keep only enough characters to make full batches
    arr=arr[:totalnochar*n_batches]
    
    # Reshape into n_seqs rows
    arr = arr.reshape((seq_n,-1))
    #print (arr.shape)
    
    for n in range(0,arr.shape[1],step_n):
        # The features
        x = arr[:, n:n+step_n]
        # The targets, shifted by one
        y=np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:,0]
        yield x,y
    
    
    

In [18]:
# Visualize what is happening

# In this code, n_steps is the width of the window that moves along the rows
# and n_seqs is the number of rows.
# y has the same shape as x, with the values shifted left by one step
n_seqs = 10
n_steps = 50
arr = encoded

characters_per_batch = n_seqs*n_steps
n_batches = len(arr)//characters_per_batch

arr = arr[:characters_per_batch*n_batches]
# Reshape into n_seqs rows
arr = arr.reshape((n_seqs,-1))
print(arr.shape)
print(arr)
for n in range(0, arr.shape[1], n_steps)[:1]:
    # The features
    x = arr[:, n:n+n_steps]
    # The targets, shifted by one
    y = np.zeros_like(x)
    y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
    print("x is: ", x)
    print("y is: ", y)

(10, 198500)
[[31 64 57 ... 11  1 37]
 [ 1 57 69 ...  1 40 61]
 [78 65 70 ... 61 78 65]
 ...
 [26  1 58 ... 81  1 65]
 [76  1 65 ... 75 64 61]
 [ 1 75 57 ... 65 71 70]]
x is:  [[31 64 57 72 76 61 74  1 16  0  0  0 36 57 72 72 81  1 62 57 69 65 68 65
  61 75  1 57 74 61  1 57 68 68  1 57 68 65 67 61 26  1 61 78 61 74 81  1
  77 70]
 [ 1 57 69  1 70 71 76  1 63 71 65 70 63  1 76 71  1 75 76 57 81 11  3  1
  57 70 75 79 61 74 61 60  1 29 70 70 57 11  1 75 69 65 68 65 70 63 11  1
  58 77]
 [78 65 70 13  0  0  3 53 61 75 11  1 65 76  7 75  1 75 61 76 76 68 61 60
  13  1 48 64 61  1 72 74 65 59 61  1 65 75  1 69 57 63 70 65 62 65 59 61
  70 76]
 [70  1 60 77 74 65 70 63  1 64 65 75  1 59 71 70 78 61 74 75 57 76 65 71
  70  1 79 65 76 64  1 64 65 75  0 58 74 71 76 64 61 74  1 79 57 75  1 76
  64 65]
 [ 1 65 76  1 65 75 11  1 75 65 74  2  3  1 75 57 65 60  1 76 64 61  1 71
  68 60  1 69 57 70 11  1 63 61 76 76 65 70 63  1 77 72 11  1 57 70 60  0
  59 74]
 [ 1 37 76  1 79 57 75  0 71 70 68 81  

In [88]:
batches=create_batches(encoded,10,50)
x,y=next(batches)

In [93]:
x[9,:]

array([ 1, 75, 57, 65, 60,  1, 76, 71,  1, 64, 61, 74, 75, 61, 68, 62, 11,
        1, 57, 70, 60,  1, 58, 61, 63, 57, 70,  1, 57, 63, 57, 65, 70,  1,
       62, 74, 71, 69,  1, 76, 64, 61,  1, 58, 61, 63, 65, 70, 70, 65])

# Inputs

First off we'll create our input placeholders. As usual we need placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called keep_prob. This will be a scalar, that is a 0-D tensor. To make a scalar, you create a placeholder without giving it a size.

In [26]:
def build_inputs(batch_size,num_steps):
    inputs = tf.placeholder(tf.int32,shape=[batch_size,num_steps],name='input')
    output = tf.placeholder(tf.int32,shape=[batch_size,num_steps],name='output')
    keep_proba = tf.placeholder(tf.float32,name='keep_proba')
    
    return inputs,output,keep_proba

# LSTM Cell

Here we will create the LSTM cell we'll use in the hidden layer. We'll use this cell as a building block for the RNN. So we aren't actually defining the RNN here, just the type of cell we'll use in the hidden layer.

We first create a basic LSTM cell with

lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
where num_units is the number of units in the hidden layers in the cell. Then we can add dropout by wrapping it with

tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
You pass in a cell and it will automatically add dropout to the inputs or outputs. Finally, we can stack up the LSTM cells into layers with tf.contrib.rnn.MultiRNNCell. With this, you pass in a list of cells and it will send the output of one cell into the next cell. Previously with TensorFlow 1.0, you could do this

tf.contrib.rnn.MultiRNNCell([cell]*num_layers)
This might look a little weird if you know Python well because this will create a list of the same cell object. However, TensorFlow 1.0 will create different weight matrices for all cell objects. But, starting with TensorFlow 1.1 you actually need to create new cell objects in the list. To get it to work in TensorFlow 1.1, it should look like

def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    return drop

tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])
Even though this is actually multiple LSTM cells stacked on each other, you can treat the multiple layers as one cell.
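
The difference between [cell]*num_layers and the list comprehension comes down to ordinary Python list semantics, which can be shown without TensorFlow. In this sketch, `ToyCell` is a made-up stand-in for a cell object, not a TensorFlow class.

```python
class ToyCell:
    """Hypothetical stand-in for an RNN cell object (not part of TensorFlow)."""
    def __init__(self, num_units):
        self.num_units = num_units

num_layers = 3

# List multiplication repeats a reference to ONE object.
shared = [ToyCell(128)] * num_layers

# A list comprehension calls the constructor once per layer.
distinct = [ToyCell(128) for _ in range(num_layers)]

n_shared_objects = len({id(c) for c in shared})
n_distinct_objects = len({id(c) for c in distinct})
print(n_shared_objects, n_distinct_objects)
```

This is why the build_cell helper is called inside the comprehension: each layer gets its own cell object.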

We also need to create an initial cell state of all zeros. This can be done like so

initial_state = cell.zero_state(batch_size, tf.float32)
Below, we implement the build_lstm function to create these LSTM cells and the initial state.

In [46]:
def build_lstm(lstm_size,num_layers,batch_size,keep_proba):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_proba: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    
    def build_cell(lstm_size,keep_proba):
        cell=tf.contrib.rnn.BasicLSTMCell(lstm_size)
        drop=tf.contrib.rnn.DropoutWrapper(cell,output_keep_prob=keep_proba)
        
        return drop
    # Stack up multiple LSTM layers, for deep learning
    
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size,keep_proba) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size,tf.float32)
    
    return cell,initial_state
        

# RNN Output

Here we'll create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character, so we want this layer to have size $C$, the number of classes/characters we have in our text.

If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$. The output of each LSTM cell has size $L$, we have $M$ of them, one for each sequence step, and we have $N$ sequences. So the total size is $N \times M \times L$

We are using the same fully connected layer, the same weights, for each of the outputs. Then, to make things easier, we should reshape the outputs into a 2D tensor with shape $(M * N) \times L$. That is, one row for each sequence and step, where the values of each row are the output from the LSTM cells. We get the LSTM output as a list, lstm_output. First we need to concatenate this whole list into one array with tf.concat. Then, reshape it (with tf.reshape) to size $(M * N) \times L$.
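
To see what the reshape does to the layout, here is a small sketch using nested lists in place of tensors. The sizes are toy values chosen for illustration, and `row_index` is a hypothetical helper; the point is only that row $n \cdot M + m$ of the flattened output holds the output for sequence $n$ at step $m$.

```python
N, M, L = 2, 3, 4   # toy batch size, number of steps, hidden units (assumed values)

# outputs[n][m] is the length-L output vector for sequence n at step m.
outputs = [[[100 * n + 10 * m + k for k in range(L)]
            for m in range(M)]
           for n in range(N)]

# Flatten the first two axes: one row per (sequence, step) pair, L columns.
flat = [outputs[n][m] for n in range(N) for m in range(M)]

def row_index(n, m):
    """Row of the flattened output that holds sequence n, step m."""
    return n * M + m

print(len(flat), len(flat[0]))
print(flat[row_index(1, 2)])
```

Multiplying this $(M * N) \times L$ layout by an $L \times C$ weight matrix then gives one row of $C$ logits per sequence and step.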

Once we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with tf.variable_scope(scope_name) because there are weights being created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will by default. To avoid this, we wrap the variables in a variable scope so we can give them unique names.

In [56]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''
    
    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    
    seq_out = tf.concat(lstm_output,axis=1)
    
    # Reshape seq_output to a 2D tensor with lstm_size columns
    
    x = tf.reshape(seq_out,([-1,in_size]))
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w=tf.Variable(tf.truncated_normal((in_size,out_size),stddev=0.1))
        softmax_b=tf.Variable(tf.zeros(out_size))
        
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x,softmax_w)+softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits,name='predictions')
    
    return out, logits

# Training loss

Next up is the training loss. We get the logits and targets and calculate the softmax cross-entropy loss. First we need to one-hot encode the targets, since we're getting them as integer-encoded characters. Then we reshape the one-hot targets into a 2D tensor with size $(M*N) \times C$, where $C$ is the number of classes/characters we have. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $C$ units, so our logits will also have size $(M*N) \times C$.

Then we run the logits and targets through tf.nn.softmax_cross_entropy_with_logits and find the mean to get the loss.
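
For intuition, the same quantity can be computed by hand for a few rows. This is a plain-Python sketch of softmax cross-entropy followed by a mean, not the TensorFlow op itself; the function names are made up for this example. With a one-hot target, the loss for a row is just the negative log of the probability assigned to the correct class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Cross-entropy of one row against a one-hot target at target_index."""
    return -math.log(softmax(logits)[target_index])

def mean_loss(logit_rows, target_indices):
    """Average the per-row losses, as the mean reduction does."""
    losses = [cross_entropy(r, t) for r, t in zip(logit_rows, target_indices)]
    return sum(losses) / len(losses)

# A network that spreads its belief evenly over 83 characters.
num_classes = 83
uniform_loss = cross_entropy([0.0] * num_classes, target_index=5)
print(uniform_loss, math.log(num_classes))
```

A uniform prediction over $C$ classes gives a loss of $\ln C$, which for 83 characters is about 4.42. That is a handy sanity check: an untrained network should start near this value.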

In [57]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets,num_classes)
    y_reshaped = tf.reshape(y_one_hot,(logits.get_shape()))
    
    # Softmax cross entropy loss
    
    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_reshaped,logits=logits)
    loss = tf.reduce_mean(loss)
    
    return loss
    

# Optimizer

Here we build the optimizer. Normal RNNs have issues with gradients exploding and vanishing. LSTMs largely fix the vanishing problem, but the gradients can still grow without bound. To fix this, we clip the gradients at some threshold. The code below uses tf.clip_by_global_norm, which rescales all of the gradients together whenever their combined norm exceeds the threshold, so the combined norm never ends up larger than the threshold. This ensures the gradients never grow overly large. Then we use an AdamOptimizer for the learning step.
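
The optimizer code in this section calls tf.clip_by_global_norm. As a rough sketch of what global-norm clipping computes, here is a plain-Python version operating on lists of floats. It is an illustration of the formula, not TensorFlow's implementation, and the toy gradients are made up.

```python
import math

def clip_by_global_norm(grads, clip_norm):
    """Scale a list of gradient vectors so their joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # If the norm is already small enough, the scale factor is exactly 1.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in vec] for vec in grads], global_norm

grads = [[3.0, 4.0], [12.0]]          # toy gradients for two variables
clipped, norm_before = clip_by_global_norm(grads, clip_norm=5.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the direction of the overall update is preserved and only its length shrinks.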

In [58]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for the global norm of the gradients
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    
    tvars = tf.trainable_variables()
    grads,_= tf.clip_by_global_norm(tf.gradients(loss,tvars),grad_clip)
    train_op=tf.train.AdamOptimizer(learning_rate=learning_rate)
    optimizer = train_op.apply_gradients(zip(grads,tvars))
    
    return optimizer

# Build the network

Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use tf.nn.dynamic_rnn. This function will pass the hidden and cell states across LSTM cells appropriately for us. It returns the outputs for each LSTM cell at each step for each sequence in the mini-batch. It also gives us the final LSTM state. We want to save this state as final_state so we can pass it to the first LSTM cell in the next mini-batch run. For tf.nn.dynamic_rnn, we pass in the cell and initial state we get from build_lstm, as well as our input sequences. Also, we need to one-hot encode the inputs before going into the RNN.
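
One-hot encoding turns an $N \times M$ batch of integers into an $N \times M \times C$ array with a single 1 in each innermost vector. Here is a small list-based sketch of that transformation. The `one_hot` helper and the toy batch are assumptions for illustration, not the TensorFlow op.

```python
def one_hot(index, depth):
    """Return a list of length depth with a 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

batch = [[2, 0, 1],
         [1, 1, 3]]        # toy batch: N = 2 sequences, M = 3 steps
depth = 4                  # toy number of classes C

encoded_batch = [[one_hot(i, depth) for i in row] for row in batch]

print(encoded_batch[0][0])
print(len(encoded_batch), len(encoded_batch[0]), len(encoded_batch[0][0]))
```

In the network, the depth is the number of distinct characters, so each input step becomes a vector with one entry per character.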

In [59]:
class charRNN:
    
    def __init__(self, numClasses,batch_size=64,num_steps=50,lstm_size=128,num_layers=2,learning_rate=0.001,
                grad_clip=5,sampling=False):
        
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size,num_steps=1,1
            
        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_proba = build_inputs(batch_size,num_steps)
        
        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size,num_layers,batch_size,self.keep_proba) 
        
        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs,numClasses)
        
        # Run each sequence step through the RNN with tf.nn.dynamic_rnn
        
        outputs, state = tf.nn.dynamic_rnn(cell,x_one_hot,initial_state=self.initial_state)
        self.final_state =state
        
        # Get softmax predictions and logits
        self.prediction,self.logits = build_output(outputs,lstm_size,numClasses)
        
        #Loss and optimizer (with gradient clipping )
        self.loss = build_loss(self.logits,self.targets,lstm_size,numClasses)
        self.optimizer = build_optimizer(self.loss,learning_rate,grad_clip)
        
        
        
        
        

In [122]:

batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001    # Learning rate
keep_prob = 0.5         # Dropout keep probability

# Time for training
This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by save_every_n) I save a checkpoint.

Here I'm saving checkpoints with the format

i{iteration number}_l{# hidden layer units}.ckpt
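
To get a feel for which files this produces, here is a small sketch that works out the checkpoint names from the settings used in this notebook (text length 1985223, 100 sequences of 100 steps, 512 LSTM units, 20 epochs, a save every 200 steps). It only formats strings and does not touch TensorFlow or the file system.

```python
total_chars = 1985223                       # length of the encoded text
batch_size, num_steps, lstm_size = 100, 100, 512
epochs, save_every_n = 20, 200

# Full batches per epoch, and training steps over the whole run.
steps_per_epoch = total_chars // (batch_size * num_steps)
total_steps = epochs * steps_per_epoch

# One checkpoint every save_every_n steps, plus one at the very end.
periodic = ["checkpoints/i{}_l{}.ckpt".format(step, lstm_size)
            for step in range(save_every_n, total_steps + 1, save_every_n)]
final = "checkpoints/i{}_l{}.ckpt".format(total_steps, lstm_size)

print(steps_per_epoch, total_steps)
print(periodic[0], final)
```

With these numbers the run stays well under the max_to_keep limit passed to the saver, so no checkpoints get deleted along the way.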

In [123]:
epoch = 20
save_every_n = 200

model = charRNN(len(vocab),batch_size=batch_size,num_steps=num_steps,
                lstm_size=lstm_size,num_layers=num_layers,learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    
    counter = 0
    
    for e in range (epoch):
        new_state = sess.run(model.initial_state)
        loss = 0
        for x,y in create_batches(encoded, batch_size, num_steps):
        #while True:
            #x,y=next(create_batches(encoded, batch_size, num_steps))
            counter +=1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_proba: keep_prob,
                    model.initial_state: new_state}
            
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epoch),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  #'Training state: {:.4f}... '.format(new_state),
                  '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
        

Epoch: 1/20...  Training Step: 1...  Training loss: 4.4188...  4.9150 sec/batch
Epoch: 1/20...  Training Step: 2...  Training loss: 4.3292...  4.4780 sec/batch
Epoch: 1/20...  Training Step: 3...  Training loss: 3.8415...  4.7900 sec/batch
Epoch: 1/20...  Training Step: 4...  Training loss: 5.7999...  4.7750 sec/batch
Epoch: 1/20...  Training Step: 5...  Training loss: 4.1311...  4.7030 sec/batch
Epoch: 1/20...  Training Step: 6...  Training loss: 3.8876...  4.4570 sec/batch
Epoch: 1/20...  Training Step: 7...  Training loss: 3.7852...  4.8420 sec/batch
Epoch: 1/20...  Training Step: 8...  Training loss: 3.6761...  5.2670 sec/batch
Epoch: 1/20...  Training Step: 9...  Training loss: 3.5410...  5.0290 sec/batch
Epoch: 1/20...  Training Step: 10...  Training loss: 3.4720...  4.6490 sec/batch
Epoch: 1/20...  Training Step: 11...  Training loss: 3.3952...  4.9040 sec/batch
Epoch: 1/20...  Training Step: 12...  Training loss: 3.3888...  4.6340 sec/batch
Epoch: 1/20...  Training Step: 13... 

Epoch: 1/20...  Training Step: 103...  Training loss: 3.1137...  10.4080 sec/batch
Epoch: 1/20...  Training Step: 104...  Training loss: 3.1094...  10.4570 sec/batch
Epoch: 1/20...  Training Step: 105...  Training loss: 3.1059...  7.8470 sec/batch
Epoch: 1/20...  Training Step: 106...  Training loss: 3.1035...  6.5900 sec/batch
Epoch: 1/20...  Training Step: 107...  Training loss: 3.1085...  5.4590 sec/batch
Epoch: 1/20...  Training Step: 108...  Training loss: 3.1000...  5.3790 sec/batch
Epoch: 1/20...  Training Step: 109...  Training loss: 3.1209...  5.1960 sec/batch
Epoch: 1/20...  Training Step: 110...  Training loss: 3.0858...  5.2070 sec/batch
Epoch: 1/20...  Training Step: 111...  Training loss: 3.0946...  5.2490 sec/batch
Epoch: 1/20...  Training Step: 112...  Training loss: 3.0999...  5.3920 sec/batch
Epoch: 1/20...  Training Step: 113...  Training loss: 3.0965...  5.1320 sec/batch
Epoch: 1/20...  Training Step: 114...  Training loss: 3.0781...  5.0940 sec/batch
Epoch: 1/20...

Epoch: 2/20...  Training Step: 203...  Training loss: 2.5478...  4.8660 sec/batch
Epoch: 2/20...  Training Step: 204...  Training loss: 2.5408...  5.0830 sec/batch
Epoch: 2/20...  Training Step: 205...  Training loss: 2.5375...  5.8410 sec/batch
Epoch: 2/20...  Training Step: 206...  Training loss: 2.5493...  5.2760 sec/batch
Epoch: 2/20...  Training Step: 207...  Training loss: 2.5564...  5.2060 sec/batch
Epoch: 2/20...  Training Step: 208...  Training loss: 2.5209...  5.2820 sec/batch
Epoch: 2/20...  Training Step: 209...  Training loss: 2.5222...  5.3930 sec/batch
Epoch: 2/20...  Training Step: 210...  Training loss: 2.5318...  5.7850 sec/batch
Epoch: 2/20...  Training Step: 211...  Training loss: 2.5155...  5.2940 sec/batch
Epoch: 2/20...  Training Step: 212...  Training loss: 2.5731...  5.3500 sec/batch
Epoch: 2/20...  Training Step: 213...  Training loss: 2.5400...  5.5410 sec/batch
Epoch: 2/20...  Training Step: 214...  Training loss: 2.5193...  5.1910 sec/batch
Epoch: 2/20...  

Epoch: 2/20...  Training Step: 303...  Training loss: 2.2743...  5.6030 sec/batch
Epoch: 2/20...  Training Step: 304...  Training loss: 2.2863...  4.9520 sec/batch
Epoch: 2/20...  Training Step: 305...  Training loss: 2.2768...  5.7040 sec/batch
Epoch: 2/20...  Training Step: 306...  Training loss: 2.3070...  5.8740 sec/batch
Epoch: 2/20...  Training Step: 307...  Training loss: 2.3012...  6.4960 sec/batch
Epoch: 2/20...  Training Step: 308...  Training loss: 2.2620...  5.8380 sec/batch
Epoch: 2/20...  Training Step: 309...  Training loss: 2.2873...  5.6310 sec/batch
Epoch: 2/20...  Training Step: 310...  Training loss: 2.2909...  5.6540 sec/batch
Epoch: 2/20...  Training Step: 311...  Training loss: 2.2732...  5.1090 sec/batch
Epoch: 2/20...  Training Step: 312...  Training loss: 2.2629...  6.0100 sec/batch
Epoch: 2/20...  Training Step: 313...  Training loss: 2.2572...  5.7580 sec/batch
Epoch: 2/20...  Training Step: 314...  Training loss: 2.2316...  5.6040 sec/batch
Epoch: 2/20...  

Epoch: 3/20...  Training Step: 403...  Training loss: 2.1209...  6.3160 sec/batch
Epoch: 3/20...  Training Step: 404...  Training loss: 2.1275...  4.9310 sec/batch
Epoch: 3/20...  Training Step: 405...  Training loss: 2.1617...  4.9740 sec/batch
Epoch: 3/20...  Training Step: 406...  Training loss: 2.1181...  5.4010 sec/batch
Epoch: 3/20...  Training Step: 407...  Training loss: 2.1062...  5.2580 sec/batch
Epoch: 3/20...  Training Step: 408...  Training loss: 2.1024...  5.4820 sec/batch
Epoch: 3/20...  Training Step: 409...  Training loss: 2.1209...  5.6280 sec/batch
Epoch: 3/20...  Training Step: 410...  Training loss: 2.1517...  5.6840 sec/batch
Epoch: 3/20...  Training Step: 411...  Training loss: 2.1171...  5.3200 sec/batch
Epoch: 3/20...  Training Step: 412...  Training loss: 2.1019...  5.0390 sec/batch
Epoch: 3/20...  Training Step: 413...  Training loss: 2.1113...  5.4660 sec/batch
Epoch: 3/20...  Training Step: 414...  Training loss: 2.1513...  5.1390 sec/batch
Epoch: 3/20...  

Epoch: 3/20...  Training Step: 503...  Training loss: 1.9990...  4.9210 sec/batch
Epoch: 3/20...  Training Step: 504...  Training loss: 2.0134...  5.9000 sec/batch
Epoch: 3/20...  Training Step: 505...  Training loss: 2.0078...  4.8950 sec/batch
Epoch: 3/20...  Training Step: 506...  Training loss: 2.0019...  5.7630 sec/batch
Epoch: 3/20...  Training Step: 507...  Training loss: 1.9984...  5.7070 sec/batch
Epoch: 3/20...  Training Step: 508...  Training loss: 1.9916...  5.2530 sec/batch
Epoch: 3/20...  Training Step: 509...  Training loss: 1.9882...  5.2240 sec/batch
Epoch: 3/20...  Training Step: 510...  Training loss: 1.9816...  5.4900 sec/batch
Epoch: 3/20...  Training Step: 511...  Training loss: 1.9757...  6.2390 sec/batch
Epoch: 3/20...  Training Step: 512...  Training loss: 1.9497...  6.0280 sec/batch
Epoch: 3/20...  Training Step: 513...  Training loss: 1.9904...  4.8960 sec/batch
Epoch: 3/20...  Training Step: 514...  Training loss: 1.9810...  5.6810 sec/batch
Epoch: 3/20...  

Epoch: 4/20...  Training Step: 603...  Training loss: 1.9415...  5.1150 sec/batch
Epoch: 4/20...  Training Step: 604...  Training loss: 1.9011...  5.8840 sec/batch
Epoch: 4/20...  Training Step: 605...  Training loss: 1.8884...  5.1580 sec/batch
Epoch: 4/20...  Training Step: 606...  Training loss: 1.8779...  5.1220 sec/batch
Epoch: 4/20...  Training Step: 607...  Training loss: 1.8973...  5.7650 sec/batch
Epoch: 4/20...  Training Step: 608...  Training loss: 1.9334...  5.4470 sec/batch
Epoch: 4/20...  Training Step: 609...  Training loss: 1.8897...  5.2670 sec/batch
Epoch: 4/20...  Training Step: 610...  Training loss: 1.8762...  5.1540 sec/batch
Epoch: 4/20...  Training Step: 611...  Training loss: 1.8956...  4.6940 sec/batch
Epoch: 4/20...  Training Step: 612...  Training loss: 1.9334...  5.4320 sec/batch
Epoch: 4/20...  Training Step: 613...  Training loss: 1.8989...  6.5050 sec/batch
Epoch: 4/20...  Training Step: 614...  Training loss: 1.9005...  6.3560 sec/batch
Epoch: 4/20...  

Epoch: 4/20...  Training Step: 703...  Training loss: 1.8348...  6.8090 sec/batch
Epoch: 4/20...  Training Step: 704...  Training loss: 1.8363...  6.4880 sec/batch
Epoch: 4/20...  Training Step: 705...  Training loss: 1.8281...  5.9380 sec/batch
Epoch: 4/20...  Training Step: 706...  Training loss: 1.8180...  5.4300 sec/batch
Epoch: 4/20...  Training Step: 707...  Training loss: 1.8314...  5.6440 sec/batch
Epoch: 4/20...  Training Step: 708...  Training loss: 1.8199...  5.8730 sec/batch
Epoch: 4/20...  Training Step: 709...  Training loss: 1.8121...  6.6070 sec/batch
Epoch: 4/20...  Training Step: 710...  Training loss: 1.7800...  6.1680 sec/batch
Epoch: 4/20...  Training Step: 711...  Training loss: 1.8311...  5.1630 sec/batch
Epoch: 4/20...  Training Step: 712...  Training loss: 1.8117...  5.6960 sec/batch
Epoch: 4/20...  Training Step: 713...  Training loss: 1.8177...  5.9260 sec/batch
Epoch: 4/20...  Training Step: 714...  Training loss: 1.8193...  7.5770 sec/batch
Epoch: 4/20...  

Epoch: 5/20...  Training Step: 803...  Training loss: 1.7468...  5.7580 sec/batch
Epoch: 5/20...  Training Step: 804...  Training loss: 1.7408...  5.6750 sec/batch
Epoch: 5/20...  Training Step: 805...  Training loss: 1.7520...  6.3730 sec/batch
Epoch: 5/20...  Training Step: 806...  Training loss: 1.8043...  5.6670 sec/batch
Epoch: 5/20...  Training Step: 807...  Training loss: 1.7623...  5.4660 sec/batch
Epoch: 5/20...  Training Step: 808...  Training loss: 1.7486...  7.0950 sec/batch
Epoch: 5/20...  Training Step: 809...  Training loss: 1.7735...  6.9810 sec/batch
Epoch: 5/20...  Training Step: 810...  Training loss: 1.7981...  5.8280 sec/batch
Epoch: 5/20...  Training Step: 811...  Training loss: 1.7742...  5.5480 sec/batch
Epoch: 5/20...  Training Step: 812...  Training loss: 1.7642...  6.8080 sec/batch
Epoch: 5/20...  Training Step: 813...  Training loss: 1.7545...  6.5450 sec/batch
Epoch: 5/20...  Training Step: 814...  Training loss: 1.7868...  5.0690 sec/batch
Epoch: 5/20...  

Epoch: 5/20...  Training Step: 903...  Training loss: 1.6992...  5.2370 sec/batch
Epoch: 5/20...  Training Step: 904...  Training loss: 1.7032...  6.0810 sec/batch
Epoch: 5/20...  Training Step: 905...  Training loss: 1.6896...  5.2410 sec/batch
Epoch: 5/20...  Training Step: 906...  Training loss: 1.6919...  5.4740 sec/batch
Epoch: 5/20...  Training Step: 907...  Training loss: 1.6887...  5.3230 sec/batch
Epoch: 5/20...  Training Step: 908...  Training loss: 1.6715...  5.0470 sec/batch
Epoch: 5/20...  Training Step: 909...  Training loss: 1.7102...  5.1210 sec/batch
Epoch: 5/20...  Training Step: 910...  Training loss: 1.6966...  5.8140 sec/batch
Epoch: 5/20...  Training Step: 911...  Training loss: 1.6974...  6.6250 sec/batch
Epoch: 5/20...  Training Step: 912...  Training loss: 1.6964...  7.3110 sec/batch
Epoch: 5/20...  Training Step: 913...  Training loss: 1.7137...  5.5850 sec/batch
Epoch: 5/20...  Training Step: 914...  Training loss: 1.6779...  5.3470 sec/batch
Epoch: 5/20...  

Epoch: 6/20...  Training Step: 1003...  Training loss: 1.6505...  5.4990 sec/batch
Epoch: 6/20...  Training Step: 1004...  Training loss: 1.6911...  6.9560 sec/batch
Epoch: 6/20...  Training Step: 1005...  Training loss: 1.6510...  6.5990 sec/batch
Epoch: 6/20...  Training Step: 1006...  Training loss: 1.6303...  5.7760 sec/batch
Epoch: 6/20...  Training Step: 1007...  Training loss: 1.6591...  5.9840 sec/batch
Epoch: 6/20...  Training Step: 1008...  Training loss: 1.6886...  6.0060 sec/batch
Epoch: 6/20...  Training Step: 1009...  Training loss: 1.6662...  5.3090 sec/batch
Epoch: 6/20...  Training Step: 1010...  Training loss: 1.6729...  7.5290 sec/batch
Epoch: 6/20...  Training Step: 1011...  Training loss: 1.6540...  7.0770 sec/batch
Epoch: 6/20...  Training Step: 1012...  Training loss: 1.6783...  5.7060 sec/batch
Epoch: 6/20...  Training Step: 1013...  Training loss: 1.6523...  5.3940 sec/batch
Epoch: 6/20...  Training Step: 1014...  Training loss: 1.6674...  6.7860 sec/batch
Epoc

Epoch: 6/20...  Training Step: 1102...  Training loss: 1.6083...  5.7580 sec/batch
Epoch: 6/20...  Training Step: 1103...  Training loss: 1.6149...  5.6330 sec/batch
Epoch: 6/20...  Training Step: 1104...  Training loss: 1.6058...  6.6130 sec/batch
Epoch: 6/20...  Training Step: 1105...  Training loss: 1.5905...  7.2820 sec/batch
Epoch: 6/20...  Training Step: 1106...  Training loss: 1.5650...  5.3850 sec/batch
Epoch: 6/20...  Training Step: 1107...  Training loss: 1.6074...  5.9800 sec/batch
Epoch: 6/20...  Training Step: 1108...  Training loss: 1.6168...  6.2710 sec/batch
Epoch: 6/20...  Training Step: 1109...  Training loss: 1.6193...  5.7340 sec/batch
Epoch: 6/20...  Training Step: 1110...  Training loss: 1.6125...  5.1090 sec/batch
Epoch: 6/20...  Training Step: 1111...  Training loss: 1.6203...  4.9970 sec/batch
Epoch: 6/20...  Training Step: 1112...  Training loss: 1.5769...  5.7030 sec/batch
Epoch: 6/20...  Training Step: 1113...  Training loss: 1.5696...  5.8440 sec/batch
Epoc

Epoch: 7/20...  Training Step: 1201...  Training loss: 1.5655...  5.4000 sec/batch
Epoch: 7/20...  Training Step: 1202...  Training loss: 1.6091...  4.9710 sec/batch
Epoch: 7/20...  Training Step: 1203...  Training loss: 1.5696...  5.5570 sec/batch
Epoch: 7/20...  Training Step: 1204...  Training loss: 1.5507...  5.6830 sec/batch
Epoch: 7/20...  Training Step: 1205...  Training loss: 1.5864...  5.6990 sec/batch
Epoch: 7/20...  Training Step: 1206...  Training loss: 1.6083...  5.7370 sec/batch
Epoch: 7/20...  Training Step: 1207...  Training loss: 1.5915...  5.6440 sec/batch
Epoch: 7/20...  Training Step: 1208...  Training loss: 1.5965...  5.6280 sec/batch
Epoch: 7/20...  Training Step: 1209...  Training loss: 1.5717...  5.5600 sec/batch
Epoch: 7/20...  Training Step: 1210...  Training loss: 1.6118...  5.3960 sec/batch
Epoch: 7/20...  Training Step: 1211...  Training loss: 1.5695...  5.9290 sec/batch
Epoch: 7/20...  Training Step: 1212...  Training loss: 1.5868...  5.4850 sec/batch
Epoc

Epoch: 7/20...  Training Step: 1300...  Training loss: 1.5244...  5.9450 sec/batch
Epoch: 7/20...  Training Step: 1301...  Training loss: 1.5247...  6.0800 sec/batch
Epoch: 7/20...  Training Step: 1302...  Training loss: 1.5229...  5.5270 sec/batch
Epoch: 7/20...  Training Step: 1303...  Training loss: 1.5120...  5.4210 sec/batch
Epoch: 7/20...  Training Step: 1304...  Training loss: 1.4925...  5.2570 sec/batch
Epoch: 7/20...  Training Step: 1305...  Training loss: 1.5375...  5.2520 sec/batch
Epoch: 7/20...  Training Step: 1306...  Training loss: 1.5385...  5.0090 sec/batch
Epoch: 7/20...  Training Step: 1307...  Training loss: 1.5264...  4.9230 sec/batch
Epoch: 7/20...  Training Step: 1308...  Training loss: 1.5340...  4.8980 sec/batch
Epoch: 7/20...  Training Step: 1309...  Training loss: 1.5368...  4.9300 sec/batch
Epoch: 7/20...  Training Step: 1310...  Training loss: 1.5018...  5.4230 sec/batch
Epoch: 7/20...  Training Step: 1311...  Training loss: 1.4880...  4.8880 sec/batch
Epoc

Epoch: 8/20...  Training Step: 1399...  Training loss: 1.5112...  5.2840 sec/batch
Epoch: 8/20...  Training Step: 1400...  Training loss: 1.5324...  5.6760 sec/batch
Epoch: 8/20...  Training Step: 1401...  Training loss: 1.4999...  7.7550 sec/batch
Epoch: 8/20...  Training Step: 1402...  Training loss: 1.4795...  7.4910 sec/batch
Epoch: 8/20...  Training Step: 1403...  Training loss: 1.5224...  6.1380 sec/batch
Epoch: 8/20...  Training Step: 1404...  Training loss: 1.5237...  5.5960 sec/batch
Epoch: 8/20...  Training Step: 1405...  Training loss: 1.5071...  5.6380 sec/batch
Epoch: 8/20...  Training Step: 1406...  Training loss: 1.5245...  5.4910 sec/batch
Epoch: 8/20...  Training Step: 1407...  Training loss: 1.5106...  5.2510 sec/batch
Epoch: 8/20...  Training Step: 1408...  Training loss: 1.5385...  5.5010 sec/batch
Epoch: 8/20...  Training Step: 1409...  Training loss: 1.4974...  5.5370 sec/batch
Epoch: 8/20...  Training Step: 1410...  Training loss: 1.5267...  6.1490 sec/batch
Epoc

Epoch: 8/20...  Training Step: 1498...  Training loss: 1.4830...  5.1980 sec/batch
Epoch: 8/20...  Training Step: 1499...  Training loss: 1.4852...  6.1582 sec/batch
Epoch: 8/20...  Training Step: 1500...  Training loss: 1.4737...  6.1712 sec/batch
Epoch: 8/20...  Training Step: 1501...  Training loss: 1.4532...  4.2919 sec/batch
Epoch: 8/20...  Training Step: 1502...  Training loss: 1.4369...  4.3269 sec/batch
Epoch: 8/20...  Training Step: 1503...  Training loss: 1.4744...  4.7329 sec/batch
Epoch: 8/20...  Training Step: 1504...  Training loss: 1.4765...  4.9690 sec/batch
Epoch: 8/20...  Training Step: 1505...  Training loss: 1.4708...  6.8944 sec/batch
Epoch: 8/20...  Training Step: 1506...  Training loss: 1.4679...  5.7021 sec/batch
Epoch: 8/20...  Training Step: 1507...  Training loss: 1.4777...  4.4039 sec/batch
Epoch: 8/20...  Training Step: 1508...  Training loss: 1.4393...  4.5099 sec/batch
Epoch: 8/20...  Training Step: 1509...  Training loss: 1.4307...  4.8330 sec/batch
Epoc

Epoch: 9/20...  Training Step: 1597...  Training loss: 1.4543...  6.3476 sec/batch
Epoch: 9/20...  Training Step: 1598...  Training loss: 1.4940...  7.4397 sec/batch
Epoch: 9/20...  Training Step: 1599...  Training loss: 1.4611...  5.4005 sec/batch
Epoch: 9/20...  Training Step: 1600...  Training loss: 1.4339...  4.9005 sec/batch
Epoch: 9/20...  Training Step: 1601...  Training loss: 1.4788...  5.5366 sec/batch
Epoch: 9/20...  Training Step: 1602...  Training loss: 1.4917...  5.7856 sec/batch
Epoch: 9/20...  Training Step: 1603...  Training loss: 1.4660...  7.4867 sec/batch
Epoch: 9/20...  Training Step: 1604...  Training loss: 1.4859...  5.5376 sec/batch
Epoch: 9/20...  Training Step: 1605...  Training loss: 1.4499...  4.6715 sec/batch
Epoch: 9/20...  Training Step: 1606...  Training loss: 1.4649...  4.7845 sec/batch
Epoch: 9/20...  Training Step: 1607...  Training loss: 1.4561...  5.2815 sec/batch
Epoch: 9/20...  Training Step: 1608...  Training loss: 1.4702...  7.0787 sec/batch
Epoc

Epoch: 9/20...  Training Step: 1696...  Training loss: 1.4353...  4.5083 sec/batch
Epoch: 9/20...  Training Step: 1697...  Training loss: 1.4380...  5.0300 sec/batch
Epoch: 9/20...  Training Step: 1698...  Training loss: 1.4231...  5.2719 sec/batch
Epoch: 9/20...  Training Step: 1699...  Training loss: 1.4114...  6.9275 sec/batch
Epoch: 9/20...  Training Step: 1700...  Training loss: 1.3949...  5.1525 sec/batch
Epoch: 9/20...  Training Step: 1701...  Training loss: 1.4229...  5.6040 sec/batch
Epoch: 9/20...  Training Step: 1702...  Training loss: 1.4276...  5.4605 sec/batch
Epoch: 9/20...  Training Step: 1703...  Training loss: 1.4280...  6.2381 sec/batch
Epoch: 9/20...  Training Step: 1704...  Training loss: 1.4266...  6.7780 sec/batch
Epoch: 9/20...  Training Step: 1705...  Training loss: 1.4302...  4.8414 sec/batch
Epoch: 9/20...  Training Step: 1706...  Training loss: 1.3983...  5.3170 sec/batch
Epoch: 9/20...  Training Step: 1707...  Training loss: 1.3850...  4.9458 sec/batch
Epoc

Epoch: 10/20...  Training Step: 1795...  Training loss: 1.4312...  4.5454 sec/batch
Epoch: 10/20...  Training Step: 1796...  Training loss: 1.4489...  4.7627 sec/batch
Epoch: 10/20...  Training Step: 1797...  Training loss: 1.4217...  4.7387 sec/batch
Epoch: 10/20...  Training Step: 1798...  Training loss: 1.3988...  4.7647 sec/batch
Epoch: 10/20...  Training Step: 1799...  Training loss: 1.4375...  5.2523 sec/batch
Epoch: 10/20...  Training Step: 1800...  Training loss: 1.4514...  7.5561 sec/batch
Epoch: 10/20...  Training Step: 1801...  Training loss: 1.4288...  6.0138 sec/batch
Epoch: 10/20...  Training Step: 1802...  Training loss: 1.4607...  5.1316 sec/batch
Epoch: 10/20...  Training Step: 1803...  Training loss: 1.4230...  5.3058 sec/batch
Epoch: 10/20...  Training Step: 1804...  Training loss: 1.4454...  5.0290 sec/batch
Epoch: 10/20...  Training Step: 1805...  Training loss: 1.4157...  5.1216 sec/batch
Epoch: 10/20...  Training Step: 1806...  Training loss: 1.4393...  4.8468 se

Epoch: 10/20...  Training Step: 1893...  Training loss: 1.3849...  5.6596 sec/batch
Epoch: 10/20...  Training Step: 1894...  Training loss: 1.4068...  6.0529 sec/batch
Epoch: 10/20...  Training Step: 1895...  Training loss: 1.4021...  5.6195 sec/batch
Epoch: 10/20...  Training Step: 1896...  Training loss: 1.3900...  5.2814 sec/batch
Epoch: 10/20...  Training Step: 1897...  Training loss: 1.3775...  5.3095 sec/batch
Epoch: 10/20...  Training Step: 1898...  Training loss: 1.3553...  5.1118 sec/batch
Epoch: 10/20...  Training Step: 1899...  Training loss: 1.3980...  4.7065 sec/batch
Epoch: 10/20...  Training Step: 1900...  Training loss: 1.4069...  4.6964 sec/batch
Epoch: 10/20...  Training Step: 1901...  Training loss: 1.3932...  4.9423 sec/batch
Epoch: 10/20...  Training Step: 1902...  Training loss: 1.3975...  4.8249 sec/batch
Epoch: 10/20...  Training Step: 1903...  Training loss: 1.3935...  4.9593 sec/batch
Epoch: 10/20...  Training Step: 1904...  Training loss: 1.3580...  4.8770 se

Epoch: 11/20...  Training Step: 1991...  Training loss: 1.3816...  4.9965 sec/batch
Epoch: 11/20...  Training Step: 1992...  Training loss: 1.3965...  4.9845 sec/batch
Epoch: 11/20...  Training Step: 1993...  Training loss: 1.3856...  4.9324 sec/batch
Epoch: 11/20...  Training Step: 1994...  Training loss: 1.4227...  5.3861 sec/batch
Epoch: 11/20...  Training Step: 1995...  Training loss: 1.3908...  5.8369 sec/batch
Epoch: 11/20...  Training Step: 1996...  Training loss: 1.3725...  6.3008 sec/batch
Epoch: 11/20...  Training Step: 1997...  Training loss: 1.4149...  6.4779 sec/batch
Epoch: 11/20...  Training Step: 1998...  Training loss: 1.4211...  5.5519 sec/batch
Epoch: 11/20...  Training Step: 1999...  Training loss: 1.3902...  5.5329 sec/batch
Epoch: 11/20...  Training Step: 2000...  Training loss: 1.4110...  5.7876 sec/batch
Epoch: 11/20...  Training Step: 2001...  Training loss: 1.3876...  5.8436 sec/batch
Epoch: 11/20...  Training Step: 2002...  Training loss: 1.4039...  5.4180 se

Epoch: 11/20...  Training Step: 2089...  Training loss: 1.3658...  5.6823 sec/batch
Epoch: 11/20...  Training Step: 2090...  Training loss: 1.3934...  6.3725 sec/batch
Epoch: 11/20...  Training Step: 2091...  Training loss: 1.3561...  8.0722 sec/batch
Epoch: 11/20...  Training Step: 2092...  Training loss: 1.3712...  6.9358 sec/batch
Epoch: 11/20...  Training Step: 2093...  Training loss: 1.3753...  7.0638 sec/batch
Epoch: 11/20...  Training Step: 2094...  Training loss: 1.3660...  5.9624 sec/batch
Epoch: 11/20...  Training Step: 2095...  Training loss: 1.3479...  5.6363 sec/batch
Epoch: 11/20...  Training Step: 2096...  Training loss: 1.3221...  5.5182 sec/batch
Epoch: 11/20...  Training Step: 2097...  Training loss: 1.3702...  5.3571 sec/batch
Epoch: 11/20...  Training Step: 2098...  Training loss: 1.3807...  5.2851 sec/batch
Epoch: 11/20...  Training Step: 2099...  Training loss: 1.3740...  5.2271 sec/batch
Epoch: 11/20...  Training Step: 2100...  Training loss: 1.3660...  5.7453 se

Epoch: 12/20...  Training Step: 2187...  Training loss: 1.3804...  4.8882 sec/batch
Epoch: 12/20...  Training Step: 2188...  Training loss: 1.3546...  4.9740 sec/batch
Epoch: 12/20...  Training Step: 2189...  Training loss: 1.3470...  5.5102 sec/batch
Epoch: 12/20...  Training Step: 2190...  Training loss: 1.3720...  5.8736 sec/batch
Epoch: 12/20...  Training Step: 2191...  Training loss: 1.3649...  5.2047 sec/batch
Epoch: 12/20...  Training Step: 2192...  Training loss: 1.3910...  5.5821 sec/batch
Epoch: 12/20...  Training Step: 2193...  Training loss: 1.3541...  5.9135 sec/batch
Epoch: 12/20...  Training Step: 2194...  Training loss: 1.3311...  5.2356 sec/batch
Epoch: 12/20...  Training Step: 2195...  Training loss: 1.3808...  5.2156 sec/batch
Epoch: 12/20...  Training Step: 2196...  Training loss: 1.3921...  5.0669 sec/batch
Epoch: 12/20...  Training Step: 2197...  Training loss: 1.3776...  4.9980 sec/batch
Epoch: 12/20...  Training Step: 2198...  Training loss: 1.3889...  5.1368 se

Epoch: 12/20...  Training Step: 2285...  Training loss: 1.3445...  5.1170 sec/batch
Epoch: 12/20...  Training Step: 2286...  Training loss: 1.3487...  5.1930 sec/batch
Epoch: 12/20...  Training Step: 2287...  Training loss: 1.3348...  5.5170 sec/batch
Epoch: 12/20...  Training Step: 2288...  Training loss: 1.3733...  5.2830 sec/batch
Epoch: 12/20...  Training Step: 2289...  Training loss: 1.3266...  5.8980 sec/batch
Epoch: 12/20...  Training Step: 2290...  Training loss: 1.3545...  5.0940 sec/batch
Epoch: 12/20...  Training Step: 2291...  Training loss: 1.3502...  5.0750 sec/batch
Epoch: 12/20...  Training Step: 2292...  Training loss: 1.3409...  5.1990 sec/batch
Epoch: 12/20...  Training Step: 2293...  Training loss: 1.3203...  5.3930 sec/batch
Epoch: 12/20...  Training Step: 2294...  Training loss: 1.3086...  4.8940 sec/batch
Epoch: 12/20...  Training Step: 2295...  Training loss: 1.3574...  4.8270 sec/batch
Epoch: 12/20...  Training Step: 2296...  Training loss: 1.3565...  4.7730 se

Epoch: 13/20...  Training Step: 2383...  Training loss: 1.3644...  5.3853 sec/batch
Epoch: 13/20...  Training Step: 2384...  Training loss: 1.3377...  4.7998 sec/batch
Epoch: 13/20...  Training Step: 2385...  Training loss: 1.3491...  4.7818 sec/batch
Epoch: 13/20...  Training Step: 2386...  Training loss: 1.3360...  4.8017 sec/batch
Epoch: 13/20...  Training Step: 2387...  Training loss: 1.3298...  4.7298 sec/batch
Epoch: 13/20...  Training Step: 2388...  Training loss: 1.3403...  4.8437 sec/batch
Epoch: 13/20...  Training Step: 2389...  Training loss: 1.3505...  4.6789 sec/batch
Epoch: 13/20...  Training Step: 2390...  Training loss: 1.3602...  4.7498 sec/batch
Epoch: 13/20...  Training Step: 2391...  Training loss: 1.3340...  4.8347 sec/batch
Epoch: 13/20...  Training Step: 2392...  Training loss: 1.3270...  4.7069 sec/batch
Epoch: 13/20...  Training Step: 2393...  Training loss: 1.3570...  4.7868 sec/batch
Epoch: 13/20...  Training Step: 2394...  Training loss: 1.3703...  4.7418 se

Epoch: 13/20...  Training Step: 2481...  Training loss: 1.3157...  4.7920 sec/batch
Epoch: 13/20...  Training Step: 2482...  Training loss: 1.3180...  4.6650 sec/batch
Epoch: 13/20...  Training Step: 2483...  Training loss: 1.3309...  4.8150 sec/batch
Epoch: 13/20...  Training Step: 2484...  Training loss: 1.3319...  4.7260 sec/batch
Epoch: 13/20...  Training Step: 2485...  Training loss: 1.3155...  4.6820 sec/batch
Epoch: 13/20...  Training Step: 2486...  Training loss: 1.3435...  4.7780 sec/batch
Epoch: 13/20...  Training Step: 2487...  Training loss: 1.3086...  4.7330 sec/batch
Epoch: 13/20...  Training Step: 2488...  Training loss: 1.3282...  4.7170 sec/batch
Epoch: 13/20...  Training Step: 2489...  Training loss: 1.3264...  4.7580 sec/batch
Epoch: 13/20...  Training Step: 2490...  Training loss: 1.3209...  5.1810 sec/batch
Epoch: 13/20...  Training Step: 2491...  Training loss: 1.3024...  5.8460 sec/batch
Epoch: 13/20...  Training Step: 2492...  Training loss: 1.2839...  5.2110 se

Epoch: 14/20...  Training Step: 2579...  Training loss: 1.3032...  5.9135 sec/batch
Epoch: 14/20...  Training Step: 2580...  Training loss: 1.2992...  5.0270 sec/batch
Epoch: 14/20...  Training Step: 2581...  Training loss: 1.3454...  4.6508 sec/batch
Epoch: 14/20...  Training Step: 2582...  Training loss: 1.3160...  4.7819 sec/batch
Epoch: 14/20...  Training Step: 2583...  Training loss: 1.3357...  4.6988 sec/batch
Epoch: 14/20...  Training Step: 2584...  Training loss: 1.3114...  4.7929 sec/batch
Epoch: 14/20...  Training Step: 2585...  Training loss: 1.3059...  4.6608 sec/batch
Epoch: 14/20...  Training Step: 2586...  Training loss: 1.3053...  4.7178 sec/batch
Epoch: 14/20...  Training Step: 2587...  Training loss: 1.3192...  4.7579 sec/batch
Epoch: 14/20...  Training Step: 2588...  Training loss: 1.3380...  4.6918 sec/batch
Epoch: 14/20...  Training Step: 2589...  Training loss: 1.3156...  4.6638 sec/batch
Epoch: 14/20...  Training Step: 2590...  Training loss: 1.2963...  4.6438 se

Epoch: 14/20...  Training Step: 2677...  Training loss: 1.3194...  4.7825 sec/batch
Epoch: 14/20...  Training Step: 2678...  Training loss: 1.3063...  4.6545 sec/batch
Epoch: 14/20...  Training Step: 2679...  Training loss: 1.3005...  4.8155 sec/batch
Epoch: 14/20...  Training Step: 2680...  Training loss: 1.2994...  4.6905 sec/batch
Epoch: 14/20...  Training Step: 2681...  Training loss: 1.3109...  4.7535 sec/batch
Epoch: 14/20...  Training Step: 2682...  Training loss: 1.3092...  4.7485 sec/batch
Epoch: 14/20...  Training Step: 2683...  Training loss: 1.2926...  4.6285 sec/batch
Epoch: 14/20...  Training Step: 2684...  Training loss: 1.3264...  4.6615 sec/batch
Epoch: 14/20...  Training Step: 2685...  Training loss: 1.2931...  5.2655 sec/batch
Epoch: 14/20...  Training Step: 2686...  Training loss: 1.3171...  5.3195 sec/batch
Epoch: 14/20...  Training Step: 2687...  Training loss: 1.3129...  5.5104 sec/batch
Epoch: 14/20...  Training Step: 2688...  Training loss: 1.3057...  5.3575 se

Epoch: 15/20...  Training Step: 2775...  Training loss: 1.3083...  4.5844 sec/batch
Epoch: 15/20...  Training Step: 2776...  Training loss: 1.3242...  4.6114 sec/batch
Epoch: 15/20...  Training Step: 2777...  Training loss: 1.2702...  4.6634 sec/batch
Epoch: 15/20...  Training Step: 2778...  Training loss: 1.2728...  4.6762 sec/batch
Epoch: 15/20...  Training Step: 2779...  Training loss: 1.3172...  4.5883 sec/batch
Epoch: 15/20...  Training Step: 2780...  Training loss: 1.2922...  4.6993 sec/batch
Epoch: 15/20...  Training Step: 2781...  Training loss: 1.3199...  4.6373 sec/batch
Epoch: 15/20...  Training Step: 2782...  Training loss: 1.2906...  4.6353 sec/batch
Epoch: 15/20...  Training Step: 2783...  Training loss: 1.2773...  4.8174 sec/batch
Epoch: 15/20...  Training Step: 2784...  Training loss: 1.2948...  4.6523 sec/batch
Epoch: 15/20...  Training Step: 2785...  Training loss: 1.2978...  4.6183 sec/batch
Epoch: 15/20...  Training Step: 2786...  Training loss: 1.3154...  4.6623 se

Epoch: 15/20...  Training Step: 2873...  Training loss: 1.2865...  5.1050 sec/batch
Epoch: 15/20...  Training Step: 2874...  Training loss: 1.2978...  5.3310 sec/batch
Epoch: 15/20...  Training Step: 2875...  Training loss: 1.2977...  5.1160 sec/batch
Epoch: 15/20...  Training Step: 2876...  Training loss: 1.2807...  4.9830 sec/batch
Epoch: 15/20...  Training Step: 2877...  Training loss: 1.2840...  4.6420 sec/batch
Epoch: 15/20...  Training Step: 2878...  Training loss: 1.2947...  4.6390 sec/batch
Epoch: 15/20...  Training Step: 2879...  Training loss: 1.2946...  4.6350 sec/batch
Epoch: 15/20...  Training Step: 2880...  Training loss: 1.3013...  4.6380 sec/batch
Epoch: 15/20...  Training Step: 2881...  Training loss: 1.2783...  4.6800 sec/batch
Epoch: 15/20...  Training Step: 2882...  Training loss: 1.3038...  4.6170 sec/batch
Epoch: 15/20...  Training Step: 2883...  Training loss: 1.2783...  4.6650 sec/batch
Epoch: 15/20...  Training Step: 2884...  Training loss: 1.3003...  4.7000 se

Epoch: 16/20...  Training Step: 2971...  Training loss: 1.4150...  4.6479 sec/batch
Epoch: 16/20...  Training Step: 2972...  Training loss: 1.3124...  4.6749 sec/batch
Epoch: 16/20...  Training Step: 2973...  Training loss: 1.2876...  4.5769 sec/batch
Epoch: 16/20...  Training Step: 2974...  Training loss: 1.3088...  4.6679 sec/batch
Epoch: 16/20...  Training Step: 2975...  Training loss: 1.2687...  4.7269 sec/batch
Epoch: 16/20...  Training Step: 2976...  Training loss: 1.2502...  4.6629 sec/batch
Epoch: 16/20...  Training Step: 2977...  Training loss: 1.2952...  4.6109 sec/batch
Epoch: 16/20...  Training Step: 2978...  Training loss: 1.2858...  4.6159 sec/batch
Epoch: 16/20...  Training Step: 2979...  Training loss: 1.2925...  4.6159 sec/batch
Epoch: 16/20...  Training Step: 2980...  Training loss: 1.2764...  4.6089 sec/batch
Epoch: 16/20...  Training Step: 2981...  Training loss: 1.2709...  4.6579 sec/batch
Epoch: 16/20...  Training Step: 2982...  Training loss: 1.2772...  5.0950 se

Epoch: 16/20...  Training Step: 3069...  Training loss: 1.2584...  4.6722 sec/batch
Epoch: 16/20...  Training Step: 3070...  Training loss: 1.2458...  4.6132 sec/batch
Epoch: 16/20...  Training Step: 3071...  Training loss: 1.2750...  4.6072 sec/batch
Epoch: 16/20...  Training Step: 3072...  Training loss: 1.2670...  4.5972 sec/batch
Epoch: 16/20...  Training Step: 3073...  Training loss: 1.2763...  4.5952 sec/batch
Epoch: 16/20...  Training Step: 3074...  Training loss: 1.2712...  4.6262 sec/batch
Epoch: 16/20...  Training Step: 3075...  Training loss: 1.2725...  4.7012 sec/batch
Epoch: 16/20...  Training Step: 3076...  Training loss: 1.2699...  4.6432 sec/batch
Epoch: 16/20...  Training Step: 3077...  Training loss: 1.2833...  4.8152 sec/batch
Epoch: 16/20...  Training Step: 3078...  Training loss: 1.2874...  4.8520 sec/batch
Epoch: 16/20...  Training Step: 3079...  Training loss: 1.2684...  5.0730 sec/batch
Epoch: 16/20...  Training Step: 3080...  Training loss: 1.2895...  4.9740 se

Epoch: 16/20...  Training Step: 3167...  Training loss: 1.2619...  4.8100 sec/batch
Epoch: 16/20...  Training Step: 3168...  Training loss: 1.2613...  5.1530 sec/batch
Epoch: 17/20...  Training Step: 3169...  Training loss: 1.3901...  5.0650 sec/batch
Epoch: 17/20...  Training Step: 3170...  Training loss: 1.2861...  5.0930 sec/batch
Epoch: 17/20...  Training Step: 3171...  Training loss: 1.2799...  5.2170 sec/batch
Epoch: 17/20...  Training Step: 3172...  Training loss: 1.3011...  5.2621 sec/batch
Epoch: 17/20...  Training Step: 3173...  Training loss: 1.2466...  5.1060 sec/batch
Epoch: 17/20...  Training Step: 3174...  Training loss: 1.2414...  5.2170 sec/batch
Epoch: 17/20...  Training Step: 3175...  Training loss: 1.2851...  4.8380 sec/batch
Epoch: 17/20...  Training Step: 3176...  Training loss: 1.2675...  4.6329 sec/batch
Epoch: 17/20...  Training Step: 3177...  Training loss: 1.2799...  4.6469 sec/batch
Epoch: 17/20...  Training Step: 3178...  Training loss: 1.2630...  4.6769 se

Epoch: 17/20...  Training Step: 3265...  Training loss: 1.2631...  4.8336 sec/batch
Epoch: 17/20...  Training Step: 3266...  Training loss: 1.2437...  4.8686 sec/batch
Epoch: 17/20...  Training Step: 3267...  Training loss: 1.2635...  4.6887 sec/batch
Epoch: 17/20...  Training Step: 3268...  Training loss: 1.2448...  4.7996 sec/batch
Epoch: 17/20...  Training Step: 3269...  Training loss: 1.2655...  4.8086 sec/batch
Epoch: 17/20...  Training Step: 3270...  Training loss: 1.2500...  5.0175 sec/batch
Epoch: 17/20...  Training Step: 3271...  Training loss: 1.2795...  5.2914 sec/batch
Epoch: 17/20...  Training Step: 3272...  Training loss: 1.2581...  5.0305 sec/batch
Epoch: 17/20...  Training Step: 3273...  Training loss: 1.2560...  4.7796 sec/batch
Epoch: 17/20...  Training Step: 3274...  Training loss: 1.2625...  4.8436 sec/batch
Epoch: 17/20...  Training Step: 3275...  Training loss: 1.2509...  4.7646 sec/batch
Epoch: 17/20...  Training Step: 3276...  Training loss: 1.2697...  4.6177 se

Epoch: 17/20...  Training Step: 3363...  Training loss: 1.2211...  5.0315 sec/batch
Epoch: 17/20...  Training Step: 3364...  Training loss: 1.2609...  4.8455 sec/batch
Epoch: 17/20...  Training Step: 3365...  Training loss: 1.2496...  5.1665 sec/batch
Epoch: 17/20...  Training Step: 3366...  Training loss: 1.2467...  5.0485 sec/batch
Epoch: 18/20...  Training Step: 3367...  Training loss: 1.3799...  4.7575 sec/batch
Epoch: 18/20...  Training Step: 3368...  Training loss: 1.2756...  4.6905 sec/batch
Epoch: 18/20...  Training Step: 3369...  Training loss: 1.2614...  4.6445 sec/batch
Epoch: 18/20...  Training Step: 3370...  Training loss: 1.2850...  5.4215 sec/batch
Epoch: 18/20...  Training Step: 3371...  Training loss: 1.2423...  5.3205 sec/batch
Epoch: 18/20...  Training Step: 3372...  Training loss: 1.2187...  4.9895 sec/batch
Epoch: 18/20...  Training Step: 3373...  Training loss: 1.2617...  4.7445 sec/batch
Epoch: 18/20...  Training Step: 3374...  Training loss: 1.2516...  4.6615 se

Epoch: 18/20...  Training Step: 3461...  Training loss: 1.2394...  5.0660 sec/batch
Epoch: 18/20...  Training Step: 3462...  Training loss: 1.2619...  5.3751 sec/batch
Epoch: 18/20...  Training Step: 3463...  Training loss: 1.2625...  5.2901 sec/batch
Epoch: 18/20...  Training Step: 3464...  Training loss: 1.2240...  4.8060 sec/batch
Epoch: 18/20...  Training Step: 3465...  Training loss: 1.2405...  5.2450 sec/batch
Epoch: 18/20...  Training Step: 3466...  Training loss: 1.2364...  4.9050 sec/batch
Epoch: 18/20...  Training Step: 3467...  Training loss: 1.2577...  4.8350 sec/batch
Epoch: 18/20...  Training Step: 3468...  Training loss: 1.2345...  4.9490 sec/batch
Epoch: 18/20...  Training Step: 3469...  Training loss: 1.2547...  5.0520 sec/batch
Epoch: 18/20...  Training Step: 3470...  Training loss: 1.2451...  5.1140 sec/batch
Epoch: 18/20...  Training Step: 3471...  Training loss: 1.2359...  5.2380 sec/batch
Epoch: 18/20...  Training Step: 3472...  Training loss: 1.2490...  5.0890 se

Epoch: 18/20...  Training Step: 3559...  Training loss: 1.2577...  4.8450 sec/batch
Epoch: 18/20...  Training Step: 3560...  Training loss: 1.2453...  5.1200 sec/batch
Epoch: 18/20...  Training Step: 3561...  Training loss: 1.2114...  5.4201 sec/batch
Epoch: 18/20...  Training Step: 3562...  Training loss: 1.2581...  5.3501 sec/batch
Epoch: 18/20...  Training Step: 3563...  Training loss: 1.2354...  5.2671 sec/batch
Epoch: 18/20...  Training Step: 3564...  Training loss: 1.2386...  4.7930 sec/batch
Epoch: 19/20...  Training Step: 3565...  Training loss: 1.3552...  4.7079 sec/batch
Epoch: 19/20...  Training Step: 3566...  Training loss: 1.2601...  4.6709 sec/batch
Epoch: 19/20...  Training Step: 3567...  Training loss: 1.2511...  4.6359 sec/batch
Epoch: 19/20...  Training Step: 3568...  Training loss: 1.2658...  4.7139 sec/batch
Epoch: 19/20...  Training Step: 3569...  Training loss: 1.2305...  4.7019 sec/batch
Epoch: 19/20...  Training Step: 3570...  Training loss: 1.2068...  4.6339 se

Epoch: 19/20...  Training Step: 3657...  Training loss: 1.2202...  4.8596 sec/batch
Epoch: 19/20...  Training Step: 3658...  Training loss: 1.2244...  5.0884 sec/batch
Epoch: 19/20...  Training Step: 3659...  Training loss: 1.2263...  4.7677 sec/batch
Epoch: 19/20...  Training Step: 3660...  Training loss: 1.2466...  4.8217 sec/batch
Epoch: 19/20...  Training Step: 3661...  Training loss: 1.2463...  4.8646 sec/batch
Epoch: 19/20...  Training Step: 3662...  Training loss: 1.2137...  4.6878 sec/batch
Epoch: 19/20...  Training Step: 3663...  Training loss: 1.2326...  4.6158 sec/batch
Epoch: 19/20...  Training Step: 3664...  Training loss: 1.2240...  4.6298 sec/batch
Epoch: 19/20...  Training Step: 3665...  Training loss: 1.2466...  4.5939 sec/batch
Epoch: 19/20...  Training Step: 3666...  Training loss: 1.2314...  4.5759 sec/batch
Epoch: 19/20...  Training Step: 3667...  Training loss: 1.2418...  4.8966 sec/batch
Epoch: 19/20...  Training Step: 3668...  Training loss: 1.2258...  5.1953 se

Epoch: 19/20...  Training Step: 3755...  Training loss: 1.2266...  4.9750 sec/batch
Epoch: 19/20...  Training Step: 3756...  Training loss: 1.2265...  4.9580 sec/batch
Epoch: 19/20...  Training Step: 3757...  Training loss: 1.2442...  4.7500 sec/batch
Epoch: 19/20...  Training Step: 3758...  Training loss: 1.2257...  4.5714 sec/batch
Epoch: 19/20...  Training Step: 3759...  Training loss: 1.1992...  4.6420 sec/batch
Epoch: 19/20...  Training Step: 3760...  Training loss: 1.2480...  4.5790 sec/batch
Epoch: 19/20...  Training Step: 3761...  Training loss: 1.2240...  4.6560 sec/batch
Epoch: 19/20...  Training Step: 3762...  Training loss: 1.2261...  4.5950 sec/batch
Epoch: 20/20...  Training Step: 3763...  Training loss: 1.3487...  4.6540 sec/batch
Epoch: 20/20...  Training Step: 3764...  Training loss: 1.2495...  4.5860 sec/batch
Epoch: 20/20...  Training Step: 3765...  Training loss: 1.2389...  4.6480 sec/batch
Epoch: 20/20...  Training Step: 3766...  Training loss: 1.2671...  4.7820 se

Epoch: 20/20...  Training Step: 3853...  Training loss: 1.2080...  4.6825 sec/batch
Epoch: 20/20...  Training Step: 3854...  Training loss: 1.2216...  4.6615 sec/batch
Epoch: 20/20...  Training Step: 3855...  Training loss: 1.2054...  4.7745 sec/batch
Epoch: 20/20...  Training Step: 3856...  Training loss: 1.2000...  4.6065 sec/batch
Epoch: 20/20...  Training Step: 3857...  Training loss: 1.2237...  4.6285 sec/batch
Epoch: 20/20...  Training Step: 3858...  Training loss: 1.2449...  4.8455 sec/batch
Epoch: 20/20...  Training Step: 3859...  Training loss: 1.2393...  4.7835 sec/batch
Epoch: 20/20...  Training Step: 3860...  Training loss: 1.2062...  4.9005 sec/batch
Epoch: 20/20...  Training Step: 3861...  Training loss: 1.2138...  4.8155 sec/batch
Epoch: 20/20...  Training Step: 3862...  Training loss: 1.2126...  4.8325 sec/batch
Epoch: 20/20...  Training Step: 3863...  Training loss: 1.2317...  4.7705 sec/batch
Epoch: 20/20...  Training Step: 3864...  Training loss: 1.2247...  4.8075 se

Epoch: 20/20...  Training Step: 3951...  Training loss: 1.2261...  4.8760 sec/batch
Epoch: 20/20...  Training Step: 3952...  Training loss: 1.2221...  4.8270 sec/batch
Epoch: 20/20...  Training Step: 3953...  Training loss: 1.2149...  4.9710 sec/batch
Epoch: 20/20...  Training Step: 3954...  Training loss: 1.2076...  5.0870 sec/batch
Epoch: 20/20...  Training Step: 3955...  Training loss: 1.2255...  5.6401 sec/batch
Epoch: 20/20...  Training Step: 3956...  Training loss: 1.2173...  6.2933 sec/batch
Epoch: 20/20...  Training Step: 3957...  Training loss: 1.2002...  5.5591 sec/batch
Epoch: 20/20...  Training Step: 3958...  Training loss: 1.2278...  5.6381 sec/batch
Epoch: 20/20...  Training Step: 3959...  Training loss: 1.2275...  5.2531 sec/batch
Epoch: 20/20...  Training Step: 3960...  Training loss: 1.2200...  4.9730 sec/batch
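
To summarize a log like the one above without scrolling through it, the lines can be parsed with a regular expression. This is a minimal sketch, assuming the line format shown in the output. The sample lines are copied from the log, and parse_log is a hypothetical helper, not part of the notebook. Truncated lines are skipped.

```python
import re

# Matches one complete log line; truncated lines will not match
LINE_RE = re.compile(
    r"Epoch: (\d+)/(\d+)\.\.\.\s+Training Step: (\d+)\.\.\.\s+"
    r"Training loss: (\d+\.\d+)\.\.\.\s+(\d+\.\d+) sec/batch"
)

def parse_log(lines):
    """Return a list of (epoch, step, loss, sec_per_batch) tuples."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip blank or truncated lines
        epoch, _, step, loss, sec = m.groups()
        records.append((int(epoch), int(step), float(loss), float(sec)))
    return records

log = [
    "Epoch: 20/20...  Training Step: 3958...  Training loss: 1.2278...  5.6381 sec/batch",
    "Epoch: 20/20...  Training Step: 3959...  Training loss: 1.2275...  5.2531 sec/batch",
    "Epoch: 20/20...  Training Step: 3960...  Training loss: 1.2200...  4.9730 sec/batch",
    "Epoch: 20/20...  Training Step: 3864...  Training loss: 1.2247...  4.8075 se",
    "Epoc",
]
records = parse_log(log)
mean_loss = sum(r[2] for r in records) / len(records)
print(len(records), round(mean_loss, 4))
```

Only the three complete lines are kept, so the mean is taken over the last three training steps.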


In [124]:
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints\\i3960_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i3000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i3200_l512.ckpt"
all_mo

# Sampling
Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next character. We then feed that prediction back in to predict the one after it, and keep doing this to generate entirely new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

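The network gives us a probability for each character in the vocabulary. To reduce noise and make things a little less random, I'm only going to choose the new character from the top N most likely characters: the probabilities of all the others are set to zero, and the remaining ones are renormalized to sum to 1 before sampling.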

In [125]:
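def pick_top_N(preds,vocab_size,top_n=5):
    # Copy so the caller's prediction array is not modified in place
    p = np.squeeze(preds).copy()
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]]=0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Sample one character index from the truncated distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c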
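
As a quick illustration of what the top-N filter does, here is a minimal standalone sketch on a made-up six-character distribution. The probabilities and the helper name top_n_filter are assumptions for the example, not part of the notebook. The filtering steps mirror pick_top_N.

```python
import numpy as np

def top_n_filter(probs, top_n=5):
    """Return a copy of probs with all but the top_n entries zeroed, renormalized."""
    p = np.asarray(probs, dtype=np.float64).copy()
    # argsort is ascending, so everything before the last top_n indices is dropped
    p[np.argsort(p)[:-top_n]] = 0
    return p / p.sum()

# Hypothetical distribution over a 6-character vocabulary
probs = [0.40, 0.05, 0.25, 0.02, 0.20, 0.08]
filtered = top_n_filter(probs, top_n=3)
print(filtered)

# Draw many samples: only the three most likely indices (0, 2 and 4) can appear
rng = np.random.RandomState(0)
draws = rng.choice(len(probs), size=1000, p=filtered)
print(sorted(set(draws.tolist())))
```

A smaller top_n makes the generated text more conservative, while a larger one lets less likely characters through.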

In [168]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    # Start the output with the priming text
    samples = [c for c in prime]
    model = charRNN(vocab_size,lstm_size=lstm_size,sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess,checkpoint)
        new_state = sess.run(model.initial_state)
        # Input buffer holding a single character, shape (1, 1)
        x = np.zeros((1,1))
        # Feed the priming text one character at a time to build up the state
        for c in prime:
            x[0,0]=chat_to_int[c]
            feed_dict = { model.inputs:x,
                        model.keep_proba:1,
                        model.initial_state:new_state}
            preds,new_state = sess.run([model.prediction,model.final_state],feed_dict=feed_dict)

        # First generated character, predicted from the last priming character
        c = pick_top_N(preds,vocab_size)
        samples.append(int_to_car[c])
        # Feed each sampled character back in to predict the next one
        for i in range(n_samples):
            x[0,0]=c
            feed_dict = { model.inputs:x,
                        model.keep_proba:1,
                        model.initial_state:new_state}
            preds,new_state = sess.run([model.prediction,model.final_state],feed_dict=feed_dict)

            c = pick_top_N(preds,vocab_size)
            samples.append(int_to_car[c])

    return ''.join(samples)
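
The prime-then-generate pattern in sample does not depend on TensorFlow. The sketch below shows the same loop with a stand-in model: toy_step is a hypothetical function that plays the role of the sess.run call, and its "state" is just a step counter rather than an LSTM state. The vocabulary and all names here are made up for illustration.

```python
import numpy as np

vocab = sorted(set("abc "))                  # tiny hypothetical vocabulary
char_to_int = {c: i for i, c in enumerate(vocab)}
int_to_char = dict(enumerate(vocab))

def toy_step(char_id, state):
    """Stand-in for one network step: returns (probabilities, new_state)."""
    new_state = state + 1                    # the "state" just counts steps seen
    logits = np.zeros(len(vocab))
    logits[(char_id + 1) % len(vocab)] = 2.0 # favor the next character in vocab order
    probs = np.exp(logits) / np.exp(logits).sum()
    return probs, new_state

def toy_sample(n_samples, prime="ab", seed=0):
    rng = np.random.RandomState(seed)
    samples = list(prime)
    state = 0
    # Priming: feed each prime character, keeping the state and the last prediction
    for c in prime:
        probs, state = toy_step(char_to_int[c], state)
    # Generation: sample a character, then feed it back in as the next input
    for _ in range(n_samples):
        c_id = int(rng.choice(len(vocab), p=probs))
        samples.append(int_to_char[c_id])
        probs, state = toy_step(c_id, state)
    return "".join(samples), state

text, steps = toy_sample(10, prime="ab")
print(text, steps)
```

The key point is that the state returned by one step is passed into the next, so the prime text influences everything generated after it.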
                       
        


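Here we pass in the path to a checkpoint and sample from the network. tf.train.latest_checkpoint returns the path of the most recently saved checkpoint in a directory.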

In [169]:
tf.train.latest_checkpoint('checkpoints')

'checkpoints\\i3960_l512.ckpt'

In [170]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i3960_l512.ckpt
Farriath was
all, to her the sound of the people. The point of the country and
had bought in the past, and he went to spake towers the steams, and
to her heart. He saw that they and he saw she was that he was in tell the
province, would have said something. Turns of her cases and at the table
with which she shameded all the same way as he had at the same
crack of any sterning to the country in a force that they were both
as seemed to to hear and when he saw her hellow only of the same she
had said:

"That's all about you," said the baby shooting, but she felt so
show herself and decided in anyone when he was seeing, and was sorry
of the more and marshing which in haspen back he would have than
ever see how in his brather and thinking, he had been such a man to
believe in the proposed, though to dress in any words would have
been some straight of interest in, and the day about him. And all
too had no death seemed it h

In [171]:
checkpoint = "checkpoints\\i200_l512.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints\i200_l512.ckpt
Farng had atinnd. " on te ortis the san hos he shes set to atinn ant an at theed an he wis on here he th an tant an hang ose shos he an tore an ot and wer tit ale san oserad as ooth tha tis thar anth the tha ing wh ise tot the as ite hit ared. he athes hhite he ho tes ha aneg on heese hee and asestharesat ifo te son his tot ise tar han sithire so te tho he ware te hons hos, son thar soton sot orer ald ont ise he her althes theresend,, the the wire ans her ons ans tou sart int oond ate the arinnte al ond th as the thers tortitit he tho tire heres totin ton onte se an alt an otithe sot hind torin the wim tansid there sothe
heessand
on that hes anthan an the se athin on who that so he sasitos war tou the wans int onthat on aress ho the hang ho ta than thit ang whe thiser that the wos to has so on an tis ant at as is her sot the sirind tho he sor tho wh ter thon he antile wos ithere wind whe the hand tou serat ho times an