# Anna KaRNNa

In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). I also drew on material [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use.

In [2]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [4]:
chars[:100]

array([15, 19, 65, 57, 71, 37, 62, 10, 29, 17, 17, 17, 44, 65, 57, 57,  2,
       10, 13, 65, 26, 27, 14, 27, 37, 52, 10, 65, 62, 37, 10, 65, 14, 14,
       10, 65, 14, 27, 40, 37, 79, 10, 37, 46, 37, 62,  2, 10, 18, 16, 19,
       65, 57, 57,  2, 10, 13, 65, 26, 27, 14,  2, 10, 27, 52, 10, 18, 16,
       19, 65, 57, 57,  2, 10, 27, 16, 10, 27, 71, 52, 10, 82, 12, 16, 17,
       12, 65,  2, 32, 17, 17, 64, 46, 37, 62,  2, 71, 19, 27, 16], dtype=int32)
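One caveat about the encoding above: iterating over `set(text)` gives no guaranteed order for strings across Python sessions (string hashing is randomized by default), so the integer assigned to each character can differ from run to run. Here is a minimal sketch of a reproducible variant, assuming that sorting the vocabulary is acceptable. `sample_text`, `encode`, and `decode` are illustrative names, not part of this notebook.

```python
import numpy as np

# Stand-in for the contents of anna.txt (assumption for illustration)
sample_text = "Happy families are all alike"

# Sorting gives every character the same integer on every run
vocab = sorted(set(sample_text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))

def encode(s):
    """Map a string to an int32 array using vocab_to_int."""
    return np.array([vocab_to_int[c] for c in s], dtype=np.int32)

def decode(arr):
    """Map an integer array back to a string using int_to_vocab."""
    return ''.join(int_to_vocab[int(i)] for i in arr)

encoded = encode(sample_text)
roundtrip = decode(encoded)
```

Decoding the encoded array should give back the original string, which is a quick way to check that the two dictionaries are consistent with each other.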

Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.

Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.

The idea here is to make a 2D matrix where the number of rows is equal to the batch size, that is, the number of sequences in each batch. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. With the default of 0.9, this keeps the first 90% of the batches in the training set and the remaining 10% in the validation set.
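To make the shapes concrete, here is a toy sketch of the same steps on a tiny array. `chars_demo` and the small sizes are made up for illustration; the slicing mirrors what `split_data` does.

```python
import numpy as np

chars_demo = np.arange(21)          # stand-in for the encoded text
batch_size, num_steps = 2, 5
slice_size = batch_size * num_steps
n_batches = len(chars_demo) // slice_size   # number of full batches

# Targets are the inputs shifted one character over
x = chars_demo[: n_batches * slice_size]
y = chars_demo[1: n_batches * slice_size + 1]

# One row per sequence in the batch: shape (batch_size, n_batches * num_steps)
x = np.stack(np.split(x, batch_size))
y = np.stack(np.split(y, batch_size))

# Keep the first split_frac of the batches for training, the rest for validation
split_idx = int(n_batches * 0.9)
train_x, val_x = x[:, :split_idx * num_steps], x[:, split_idx * num_steps:]
```

Each row of `x` is a contiguous stretch of the data, and `y` holds the next character for every position in `x`.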

In [5]:
def split_data(chars, batch_size, num_steps, split_frac=0.9):
    """ 
    Split character data into training and validation sets, inputs and targets for each set.
    
    Arguments
    ---------
    chars: character array
    batch_size: Number of sequences in each batch
    num_steps: Number of sequence steps to keep in the input and pass to the network
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    slice_size = batch_size * num_steps
    n_batches = int(len(chars) / slice_size)
    
    # Drop the last few characters to make only full batches
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices, then stack them into a 2D matrix 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    
    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps
    
    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y

In [6]:
train_x, train_y, val_x, val_y = split_data(chars, 10, 200)

In [7]:
train_x.shape

(10, 178400)

In [8]:
train_x[:,:10]

array([[15, 19, 65, 57, 71, 37, 62, 10, 29, 17],
       [49, 16, 58, 10, 19, 37, 10, 26, 82, 46],
       [10, 72, 65, 71, 72, 19, 27, 16, 66, 10],
       [82, 71, 19, 37, 62, 10, 12, 82, 18, 14],
       [10, 71, 19, 37, 10, 14, 65, 16, 58, 20],
       [10, 61, 19, 62, 82, 18, 66, 19, 10, 14],
       [71, 10, 71, 82, 17, 58, 82, 32, 17, 17],
       [82, 10, 19, 37, 62, 52, 37, 14, 13,  4],
       [19, 65, 71, 10, 27, 52, 10, 71, 19, 37],
       [37, 62, 52, 37, 14, 13, 10, 65, 16, 58]], dtype=int32)

I'll write another function to grab batches out of the arrays made by `split_data`. Here each batch will be a sliding window on these arrays with size `batch_size X num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window to the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will carry over from one batch to the next.

In [9]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    
    n_batches = int(slice_size/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]
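A small sketch of the windowing on toy data may help. `iter_windows` below is an illustrative stand-in that follows the same logic as `get_batch`, and `x_demo` and `y_demo` are made-up arrays.

```python
import numpy as np

def iter_windows(arrs, num_steps):
    """Yield consecutive [batch_size, num_steps] windows from each array in arrs."""
    n_windows = arrs[0].shape[1] // num_steps
    for b in range(n_windows):
        yield [a[:, b * num_steps:(b + 1) * num_steps] for a in arrs]

x_demo = np.arange(12).reshape(2, 6)   # 2 sequences, 6 characters each
y_demo = x_demo + 1                    # targets shifted by one
batches = list(iter_windows([x_demo, y_demo], num_steps=3))

first_x, first_y = batches[0]
second_x, second_y = batches[1]
```

Row 0 of `second_x` starts exactly where row 0 of `first_x` stopped, which is why the LSTM state from one batch is a sensible starting state for the next.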

In [10]:
def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,
              learning_rate=0.001, grad_clip=5, sampling=False):
        
    if sampling:
        batch_size, num_steps = 1, 1

    tf.reset_default_graph()
    
    # Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
        x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    with tf.name_scope('targets'):
        targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
        y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
        y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])
    
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    # Build the RNN layers
    with tf.name_scope("RNN_cells"):
        # Build a separate dropout-wrapped LSTM cell for each layer
        cell = tf.contrib.rnn.MultiRNNCell([
            tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.BasicLSTMCell(lstm_size),
                                          output_keep_prob=keep_prob)
            for _ in range(num_layers)])
    
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)

    # Run the data through the RNN layers
    with tf.name_scope("RNN_forward"):
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=initial_state)
    
    final_state = state
    
    # Reshape output so it's a bunch of rows, one row for each cell output
    with tf.name_scope('sequence_reshape'):
        seq_output = tf.concat(outputs, axis=1,name='seq_output')
        output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')
    
    # Now connect the RNN outputs to a softmax layer and calculate the cost
    with tf.name_scope('logits'):
        softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),
                               name='softmax_w')
        softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')
        logits = tf.matmul(output, softmax_w) + softmax_b
        tf.summary.histogram('softmax_w', softmax_w)
        tf.summary.histogram('softmax_b', softmax_b)

    with tf.name_scope('predictions'):
        preds = tf.nn.softmax(logits, name='predictions')
        tf.summary.histogram('predictions', preds)
    
    with tf.name_scope('cost'):
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')
        cost = tf.reduce_mean(loss, name='cost')
        tf.summary.scalar('cost', cost)

    # Optimizer for training, using gradient clipping to control exploding gradients
    with tf.name_scope('train'):
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
        train_op = tf.train.AdamOptimizer(learning_rate)
        optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    merged = tf.summary.merge_all()
    
    # Export the nodes 
    export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',
                    'keep_prob', 'cost', 'preds', 'optimizer', 'merged']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])
    
    return graph
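The gradient clipping step in `build_rnn` rescales all gradients together when their combined norm exceeds `grad_clip`. Here is a minimal NumPy sketch of that rule, written from the documented formula for `tf.clip_by_global_norm`. `clip_by_global_norm_np` and the toy gradients are illustrative and not part of the graph above.

```python
import numpy as np

def clip_by_global_norm_np(grads, clip_norm):
    """Scale every gradient by clip_norm / max(global_norm, clip_norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# Two toy gradient arrays whose combined norm is 5
grads = [np.array([3.0, 4.0]), np.array([0.0])]

clipped, norm = clip_by_global_norm_np(grads, clip_norm=1.0)     # rescaled
unchanged, _ = clip_by_global_norm_np(grads, clip_norm=10.0)     # below threshold
```

Because every gradient is multiplied by the same factor, the direction of the overall update is preserved and only its length is capped.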

## Hyperparameters

Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in each LSTM layer and the number of LSTM layers, respectively. Making these larger gives the network more capacity, which usually lowers the training loss, but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. In that case, decrease the size of the network or decrease the dropout keep probability.
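To get a feel for how quickly the network grows with these two settings, here is a rough parameter count. It assumes the standard LSTM layout of four gates, each with weights over the concatenated input and hidden state plus a bias, and a vocabulary of 83 characters (an assumption; the notebook does not print `len(vocab)`). The helper names are illustrative.

```python
def lstm_layer_params(input_size, lstm_size):
    """Weights and biases for one LSTM layer: 4 gates over [input, hidden] plus bias."""
    return 4 * lstm_size * (input_size + lstm_size + 1)

def network_params(num_classes, lstm_size, num_layers):
    """Total trainable parameters: stacked LSTM layers plus the softmax layer."""
    total = 0
    input_size = num_classes          # first layer sees one-hot characters
    for _ in range(num_layers):
        total += lstm_layer_params(input_size, lstm_size)
        input_size = lstm_size        # later layers see the previous layer's output
    total += lstm_size * num_classes + num_classes   # softmax_w and softmax_b
    return total

num_classes = 83  # assumed vocabulary size
for lstm_size in [128, 256, 512]:
    for num_layers in [1, 2]:
        print(lstm_size, num_layers, network_params(num_classes, lstm_size, num_layers))
```

The count grows roughly with the square of `lstm_size`, so doubling the hidden size costs far more than adding a layer of the same size.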

In [15]:
batch_size = 50#100
num_steps = 50#100
lstm_size = 256#512
num_layers = 2
learning_rate = 0.001

## Training

Time for training, which is pretty straightforward. Here I pass in some data and get an LSTM state back. Then I pass that state back into the network so the next batch can continue the state from the previous batch. The version of `train` below doesn't calculate the validation loss or save checkpoints; instead it writes the merged summaries to `file_writer` on every iteration so the runs can be compared in TensorBoard.

In [16]:
!mkdir -p checkpoints/anna

In [17]:
def train(model, epochs, file_writer):
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Use the line below to load a checkpoint and resume training
        #saver.restore(sess, 'checkpoints/anna20.ckpt')

        n_batches = int(train_x.shape[1]/num_steps)
        iterations = n_batches * epochs
        for e in range(epochs):

            # Train network
            new_state = sess.run(model.initial_state)
            loss = 0
            for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
                iteration = e*n_batches + b
                start = time.time()
                feed = {model.inputs: x,
                        model.targets: y,
                        model.keep_prob: 0.5,
                        model.initial_state: new_state}
                summary, batch_loss, new_state, _ = sess.run([model.merged, model.cost, 
                                                              model.final_state, model.optimizer], 
                                                              feed_dict=feed)
                loss += batch_loss
                end = time.time()
                print('Epoch {}/{} '.format(e+1, epochs),
                      'Iteration {}/{}'.format(iteration, iterations),
                      'Training loss: {:.4f}'.format(loss/b),
                      '{:.4f} sec/batch'.format((end-start)))

                file_writer.add_summary(summary, iteration)
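The `Iteration n/890` denominator in the training log can be reproduced from the sizes involved. This is a rough sketch, assuming the encoded text holds about 1.985 million characters (a value inferred from the `(10, 178400)` shape printed earlier, not stated in the notebook) and using `batch_size = 100`, `num_steps = 100`, and 5 epochs as in the run that follows. `count_iterations` is an illustrative helper.

```python
def count_iterations(n_chars, batch_size, num_steps, epochs, split_frac=0.9):
    """Mirror the arithmetic in split_data and train to count training iterations."""
    slice_size = batch_size * num_steps
    n_batches = n_chars // slice_size            # full batches in the whole text
    split_idx = int(n_batches * split_frac)      # batches kept for training
    train_steps = split_idx * num_steps          # columns in train_x
    batches_per_epoch = train_steps // num_steps
    return batches_per_epoch * epochs

n_chars = 1_985_000  # assumed length of the encoded text
total_iterations = count_iterations(n_chars, batch_size=100, num_steps=100, epochs=5)
print(total_iterations)
```

Under that assumption there are 178 training batches per epoch, which is consistent with the 890 iterations reported over 5 epochs.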

In [None]:
epochs = 5#20
batch_size = 100
num_steps = 100
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)

for lstm_size in [128,256,512]:
    for num_layers in [1, 2]:
        for learning_rate in [0.002, 0.001]:
            log_string = 'logs/4/lr={},rl={},ru={}'.format(learning_rate, num_layers, lstm_size)
            writer = tf.summary.FileWriter(log_string)
            model = build_rnn(len(vocab), 
                    batch_size=batch_size,
                    num_steps=num_steps,
                    learning_rate=learning_rate,
                    lstm_size=lstm_size,
                    num_layers=num_layers)
            
            train(model, epochs, writer)

Epoch 1/5  Iteration 1/890 Training loss: 4.4204 1.0047 sec/batch
Epoch 1/5  Iteration 2/890 Training loss: 4.4085 0.8002 sec/batch
Epoch 1/5  Iteration 3/890 Training loss: 4.3950 0.7837 sec/batch
Epoch 1/5  Iteration 4/890 Training loss: 4.3759 0.8324 sec/batch
Epoch 1/5  Iteration 5/890 Training loss: 4.3371 0.8020 sec/batch
Epoch 1/5  Iteration 6/890 Training loss: 4.2600 0.8325 sec/batch
Epoch 1/5  Iteration 7/890 Training loss: 4.1707 0.7997 sec/batch
Epoch 1/5  Iteration 8/890 Training loss: 4.0897 0.7947 sec/batch
Epoch 1/5  Iteration 9/890 Training loss: 4.0183 0.8000 sec/batch
Epoch 1/5  Iteration 10/890 Training loss: 3.9582 0.8001 sec/batch
Epoch 1/5  Iteration 11/890 Training loss: 3.9033 0.8217 sec/batch
Epoch 1/5  Iteration 12/890 Training loss: 3.8572 0.8033 sec/batch
Epoch 1/5  Iteration 13/890 Training loss: 3.8157 0.7815 sec/batch
Epoch 1/5  Iteration 14/890 Training loss: 3.7804 0.8313 sec/batch
Epoch 1/5  Iteration 15/890 Training loss: 3.7487 0.8236 sec/batch
Epoc

Epoch 1/5  Iteration 124/890 Training loss: 3.1216 0.5313 sec/batch
Epoch 1/5  Iteration 125/890 Training loss: 3.1187 0.5535 sec/batch
Epoch 1/5  Iteration 126/890 Training loss: 3.1156 0.5552 sec/batch
Epoch 1/5  Iteration 127/890 Training loss: 3.1128 0.5510 sec/batch
Epoch 1/5  Iteration 128/890 Training loss: 3.1100 0.5345 sec/batch
Epoch 1/5  Iteration 129/890 Training loss: 3.1071 0.5431 sec/batch
Epoch 1/5  Iteration 130/890 Training loss: 3.1042 0.5486 sec/batch
Epoch 1/5  Iteration 131/890 Training loss: 3.1014 0.5587 sec/batch
Epoch 1/5  Iteration 132/890 Training loss: 3.0985 0.5459 sec/batch
Epoch 1/5  Iteration 133/890 Training loss: 3.0957 0.5496 sec/batch
Epoch 1/5  Iteration 134/890 Training loss: 3.0928 0.5445 sec/batch
Epoch 1/5  Iteration 135/890 Training loss: 3.0897 0.5588 sec/batch
Epoch 1/5  Iteration 136/890 Training loss: 3.0868 0.5277 sec/batch
Epoch 1/5  Iteration 137/890 Training loss: 3.0839 0.5404 sec/batch
Epoch 1/5  Iteration 138/890 Training loss: 3.08

Epoch 2/5  Iteration 245/890 Training loss: 2.4793 0.6168 sec/batch
Epoch 2/5  Iteration 246/890 Training loss: 2.4781 0.6006 sec/batch
Epoch 2/5  Iteration 247/890 Training loss: 2.4770 0.5987 sec/batch
Epoch 2/5  Iteration 248/890 Training loss: 2.4762 0.6310 sec/batch
Epoch 2/5  Iteration 249/890 Training loss: 2.4755 0.6137 sec/batch
Epoch 2/5  Iteration 250/890 Training loss: 2.4749 0.6175 sec/batch
Epoch 2/5  Iteration 251/890 Training loss: 2.4741 0.6075 sec/batch
Epoch 2/5  Iteration 252/890 Training loss: 2.4733 0.6201 sec/batch
Epoch 2/5  Iteration 253/890 Training loss: 2.4725 0.6165 sec/batch
Epoch 2/5  Iteration 254/890 Training loss: 2.4722 0.6188 sec/batch
Epoch 2/5  Iteration 255/890 Training loss: 2.4714 0.6247 sec/batch
Epoch 2/5  Iteration 256/890 Training loss: 2.4709 0.6216 sec/batch
Epoch 2/5  Iteration 257/890 Training loss: 2.4699 0.6247 sec/batch
Epoch 2/5  Iteration 258/890 Training loss: 2.4691 0.6323 sec/batch
Epoch 2/5  Iteration 259/890 Training loss: 2.46

Epoch 3/5  Iteration 366/890 Training loss: 2.3177 0.7386 sec/batch
Epoch 3/5  Iteration 367/890 Training loss: 2.3162 0.8582 sec/batch
Epoch 3/5  Iteration 368/890 Training loss: 2.3159 0.7455 sec/batch
Epoch 3/5  Iteration 369/890 Training loss: 2.3163 0.7260 sec/batch
Epoch 3/5  Iteration 370/890 Training loss: 2.3186 0.9335 sec/batch
Epoch 3/5  Iteration 371/890 Training loss: 2.3180 0.9681 sec/batch
Epoch 3/5  Iteration 372/890 Training loss: 2.3178 0.7371 sec/batch
Epoch 3/5  Iteration 373/890 Training loss: 2.3174 0.6946 sec/batch
Epoch 3/5  Iteration 374/890 Training loss: 2.3194 0.7255 sec/batch
Epoch 3/5  Iteration 375/890 Training loss: 2.3194 0.7861 sec/batch
Epoch 3/5  Iteration 376/890 Training loss: 2.3183 0.6923 sec/batch
Epoch 3/5  Iteration 377/890 Training loss: 2.3178 0.6775 sec/batch
Epoch 3/5  Iteration 378/890 Training loss: 2.3191 0.7043 sec/batch
Epoch 3/5  Iteration 379/890 Training loss: 2.3185 0.7012 sec/batch
Epoch 3/5  Iteration 380/890 Training loss: 2.31

Epoch 3/5  Iteration 487/890 Training loss: 2.2742 0.7067 sec/batch
Epoch 3/5  Iteration 488/890 Training loss: 2.2736 0.7040 sec/batch
Epoch 3/5  Iteration 489/890 Training loss: 2.2734 0.6859 sec/batch
Epoch 3/5  Iteration 490/890 Training loss: 2.2732 0.7013 sec/batch
Epoch 3/5  Iteration 491/890 Training loss: 2.2729 0.6851 sec/batch
Epoch 3/5  Iteration 492/890 Training loss: 2.2727 0.6835 sec/batch
Epoch 3/5  Iteration 493/890 Training loss: 2.2724 0.6915 sec/batch
Epoch 3/5  Iteration 494/890 Training loss: 2.2723 0.6928 sec/batch
Epoch 3/5  Iteration 495/890 Training loss: 2.2723 0.6677 sec/batch
Epoch 3/5  Iteration 496/890 Training loss: 2.2719 0.7124 sec/batch
Epoch 3/5  Iteration 497/890 Training loss: 2.2718 0.6644 sec/batch
Epoch 3/5  Iteration 498/890 Training loss: 2.2716 0.6869 sec/batch
Epoch 3/5  Iteration 499/890 Training loss: 2.2714 0.6917 sec/batch
Epoch 3/5  Iteration 500/890 Training loss: 2.2711 0.6743 sec/batch
Epoch 3/5  Iteration 501/890 Training loss: 2.27

Epoch 4/5  Iteration 608/890 Training loss: 2.1929 0.7016 sec/batch
Epoch 4/5  Iteration 609/890 Training loss: 2.1927 0.7072 sec/batch
Epoch 4/5  Iteration 610/890 Training loss: 2.1928 0.6985 sec/batch
Epoch 4/5  Iteration 611/890 Training loss: 2.1925 0.6875 sec/batch
Epoch 4/5  Iteration 612/890 Training loss: 2.1925 0.6945 sec/batch
Epoch 4/5  Iteration 613/890 Training loss: 2.1919 0.6873 sec/batch
Epoch 4/5  Iteration 614/890 Training loss: 2.1916 0.6883 sec/batch
Epoch 4/5  Iteration 615/890 Training loss: 2.1910 0.6868 sec/batch
Epoch 4/5  Iteration 616/890 Training loss: 2.1910 0.7089 sec/batch
Epoch 4/5  Iteration 617/890 Training loss: 2.1905 0.6815 sec/batch
Epoch 4/5  Iteration 618/890 Training loss: 2.1900 0.6874 sec/batch
Epoch 4/5  Iteration 619/890 Training loss: 2.1893 0.6962 sec/batch
Epoch 4/5  Iteration 620/890 Training loss: 2.1888 0.6990 sec/batch
Epoch 4/5  Iteration 621/890 Training loss: 2.1886 0.7015 sec/batch
Epoch 4/5  Iteration 622/890 Training loss: 2.18

Epoch 5/5  Iteration 729/890 Training loss: 2.1196 0.7053 sec/batch
Epoch 5/5  Iteration 730/890 Training loss: 2.1217 0.7109 sec/batch
Epoch 5/5  Iteration 731/890 Training loss: 2.1217 0.6907 sec/batch
Epoch 5/5  Iteration 732/890 Training loss: 2.1212 0.7080 sec/batch
Epoch 5/5  Iteration 733/890 Training loss: 2.1204 0.7004 sec/batch
Epoch 5/5  Iteration 734/890 Training loss: 2.1217 0.6907 sec/batch
Epoch 5/5  Iteration 735/890 Training loss: 2.1214 0.7077 sec/batch
Epoch 5/5  Iteration 736/890 Training loss: 2.1208 0.7099 sec/batch
Epoch 5/5  Iteration 737/890 Training loss: 2.1204 0.6978 sec/batch
Epoch 5/5  Iteration 738/890 Training loss: 2.1199 0.7074 sec/batch
Epoch 5/5  Iteration 739/890 Training loss: 2.1189 0.7035 sec/batch
Epoch 5/5  Iteration 740/890 Training loss: 2.1191 0.6913 sec/batch
Epoch 5/5  Iteration 741/890 Training loss: 2.1199 0.7035 sec/batch
Epoch 5/5  Iteration 742/890 Training loss: 2.1200 0.7006 sec/batch
Epoch 5/5  Iteration 743/890 Training loss: 2.12

Epoch 5/5  Iteration 850/890 Training loss: 2.0907 1.3415 sec/batch
Epoch 5/5  Iteration 851/890 Training loss: 2.0908 1.0857 sec/batch
Epoch 5/5  Iteration 852/890 Training loss: 2.0904 1.0772 sec/batch
Epoch 5/5  Iteration 853/890 Training loss: 2.0905 1.1534 sec/batch
Epoch 5/5  Iteration 854/890 Training loss: 2.0902 1.2031 sec/batch
Epoch 5/5  Iteration 855/890 Training loss: 2.0901 1.1379 sec/batch
Epoch 5/5  Iteration 856/890 Training loss: 2.0899 1.1590 sec/batch
Epoch 5/5  Iteration 857/890 Training loss: 2.0896 1.0541 sec/batch
Epoch 5/5  Iteration 858/890 Training loss: 2.0896 1.0463 sec/batch
Epoch 5/5  Iteration 859/890 Training loss: 2.0894 1.0146 sec/batch
Epoch 5/5  Iteration 860/890 Training loss: 2.0895 1.0601 sec/batch
Epoch 5/5  Iteration 861/890 Training loss: 2.0892 1.0397 sec/batch
Epoch 5/5  Iteration 862/890 Training loss: 2.0890 1.0104 sec/batch
Epoch 5/5  Iteration 863/890 Training loss: 2.0888 1.0383 sec/batch
Epoch 5/5  Iteration 864/890 Training loss: 2.08

Epoch 1/5  Iteration 82/890 Training loss: 3.3684 1.0291 sec/batch
Epoch 1/5  Iteration 83/890 Training loss: 3.3657 1.0027 sec/batch
Epoch 1/5  Iteration 84/890 Training loss: 3.3629 1.0075 sec/batch
Epoch 1/5  Iteration 85/890 Training loss: 3.3600 1.0071 sec/batch
Epoch 1/5  Iteration 86/890 Training loss: 3.3573 1.0266 sec/batch
Epoch 1/5  Iteration 87/890 Training loss: 3.3546 0.9831 sec/batch
Epoch 1/5  Iteration 88/890 Training loss: 3.3520 1.0160 sec/batch
Epoch 1/5  Iteration 89/890 Training loss: 3.3496 1.0265 sec/batch
Epoch 1/5  Iteration 90/890 Training loss: 3.3472 1.0355 sec/batch
Epoch 1/5  Iteration 91/890 Training loss: 3.3448 1.0157 sec/batch
Epoch 1/5  Iteration 92/890 Training loss: 3.3423 1.0420 sec/batch
Epoch 1/5  Iteration 93/890 Training loss: 3.3399 1.0662 sec/batch
Epoch 1/5  Iteration 94/890 Training loss: 3.3377 1.0521 sec/batch
Epoch 1/5  Iteration 95/890 Training loss: 3.3353 1.0452 sec/batch
Epoch 1/5  Iteration 96/890 Training loss: 3.3331 1.0849 sec/batch

Epoch 2/5  Iteration 203/890 Training loss: 2.8651 1.0258 sec/batch
Epoch 2/5  Iteration 204/890 Training loss: 2.8636 1.0292 sec/batch
Epoch 2/5  Iteration 205/890 Training loss: 2.8621 1.0387 sec/batch
Epoch 2/5  Iteration 206/890 Training loss: 2.8600 1.0357 sec/batch
Epoch 2/5  Iteration 207/890 Training loss: 2.8582 1.0712 sec/batch
Epoch 2/5  Iteration 208/890 Training loss: 2.8566 1.0653 sec/batch
Epoch 2/5  Iteration 209/890 Training loss: 2.8555 1.0098 sec/batch
Epoch 2/5  Iteration 210/890 Training loss: 2.8535 1.0278 sec/batch
Epoch 2/5  Iteration 211/890 Training loss: 2.8514 1.0305 sec/batch
Epoch 2/5  Iteration 212/890 Training loss: 2.8496 1.0528 sec/batch
Epoch 2/5  Iteration 213/890 Training loss: 2.8474 1.0634 sec/batch
Epoch 2/5  Iteration 214/890 Training loss: 2.8459 1.0659 sec/batch
Epoch 2/5  Iteration 215/890 Training loss: 2.8438 1.0519 sec/batch
Epoch 2/5  Iteration 216/890 Training loss: 2.8415 1.0597 sec/batch
Epoch 2/5  Iteration 217/890 Training loss: 2.83

Epoch 2/5  Iteration 324/890 Training loss: 2.6871 1.0352 sec/batch
Epoch 2/5  Iteration 325/890 Training loss: 2.6861 1.0612 sec/batch
Epoch 2/5  Iteration 326/890 Training loss: 2.6854 1.0541 sec/batch
Epoch 2/5  Iteration 327/890 Training loss: 2.6843 1.0333 sec/batch
Epoch 2/5  Iteration 328/890 Training loss: 2.6833 1.0391 sec/batch
Epoch 2/5  Iteration 329/890 Training loss: 2.6825 1.0525 sec/batch
Epoch 2/5  Iteration 330/890 Training loss: 2.6819 1.0427 sec/batch
Epoch 2/5  Iteration 331/890 Training loss: 2.6811 1.0526 sec/batch
Epoch 2/5  Iteration 332/890 Training loss: 2.6802 1.0386 sec/batch
Epoch 2/5  Iteration 333/890 Training loss: 2.6792 1.0983 sec/batch
Epoch 2/5  Iteration 334/890 Training loss: 2.6782 1.0642 sec/batch
Epoch 2/5  Iteration 335/890 Training loss: 2.6772 1.0525 sec/batch
Epoch 2/5  Iteration 336/890 Training loss: 2.6762 1.0674 sec/batch
Epoch 2/5  Iteration 337/890 Training loss: 2.6751 1.0804 sec/batch
Epoch 2/5  Iteration 338/890 Training loss: 2.67

Epoch 3/5  Iteration 445/890 Training loss: 2.4592 1.0252 sec/batch
Epoch 3/5  Iteration 446/890 Training loss: 2.4589 1.0268 sec/batch
Epoch 3/5  Iteration 447/890 Training loss: 2.4583 1.0463 sec/batch
Epoch 3/5  Iteration 448/890 Training loss: 2.4580 1.0428 sec/batch
Epoch 3/5  Iteration 449/890 Training loss: 2.4575 1.0361 sec/batch
Epoch 3/5  Iteration 450/890 Training loss: 2.4569 1.0544 sec/batch
Epoch 3/5  Iteration 451/890 Training loss: 2.4562 1.0528 sec/batch
Epoch 3/5  Iteration 452/890 Training loss: 2.4557 1.0686 sec/batch
Epoch 3/5  Iteration 453/890 Training loss: 2.4552 1.0905 sec/batch
Epoch 3/5  Iteration 454/890 Training loss: 2.4547 1.0175 sec/batch
Epoch 3/5  Iteration 455/890 Training loss: 2.4542 1.0580 sec/batch
Epoch 3/5  Iteration 456/890 Training loss: 2.4537 1.0618 sec/batch
Epoch 3/5  Iteration 457/890 Training loss: 2.4535 1.0709 sec/batch
Epoch 3/5  Iteration 458/890 Training loss: 2.4531 1.0567 sec/batch
Epoch 3/5  Iteration 459/890 Training loss: 2.45

Epoch 4/5  Iteration 566/890 Training loss: 2.3682 1.0384 sec/batch
Epoch 4/5  Iteration 567/890 Training loss: 2.3675 1.0594 sec/batch
Epoch 4/5  Iteration 568/890 Training loss: 2.3675 1.0369 sec/batch
Epoch 4/5  Iteration 569/890 Training loss: 2.3673 1.0330 sec/batch
Epoch 4/5  Iteration 570/890 Training loss: 2.3670 1.0560 sec/batch
Epoch 4/5  Iteration 571/890 Training loss: 2.3666 1.0523 sec/batch
Epoch 4/5  Iteration 572/890 Training loss: 2.3656 1.0632 sec/batch
Epoch 4/5  Iteration 573/890 Training loss: 2.3646 1.0283 sec/batch
Epoch 4/5  Iteration 574/890 Training loss: 2.3640 1.0604 sec/batch
Epoch 4/5  Iteration 575/890 Training loss: 2.3634 1.0484 sec/batch
Epoch 4/5  Iteration 576/890 Training loss: 2.3627 1.0460 sec/batch
Epoch 4/5  Iteration 577/890 Training loss: 2.3619 1.0482 sec/batch
Epoch 4/5  Iteration 578/890 Training loss: 2.3612 1.0624 sec/batch
Epoch 4/5  Iteration 579/890 Training loss: 2.3607 1.0629 sec/batch
Epoch 4/5  Iteration 580/890 Training loss: 2.35

Epoch 4/5  Iteration 687/890 Training loss: 2.3299 1.0304 sec/batch
Epoch 4/5  Iteration 688/890 Training loss: 2.3298 1.0522 sec/batch
Epoch 4/5  Iteration 689/890 Training loss: 2.3296 1.0766 sec/batch
Epoch 4/5  Iteration 690/890 Training loss: 2.3295 1.0327 sec/batch
Epoch 4/5  Iteration 691/890 Training loss: 2.3292 1.0758 sec/batch
Epoch 4/5  Iteration 692/890 Training loss: 2.3289 1.0767 sec/batch
Epoch 4/5  Iteration 693/890 Training loss: 2.3286 1.0840 sec/batch
Epoch 4/5  Iteration 694/890 Training loss: 2.3286 1.1029 sec/batch
Epoch 4/5  Iteration 695/890 Training loss: 2.3285 1.0600 sec/batch
Epoch 4/5  Iteration 696/890 Training loss: 2.3281 1.1029 sec/batch
Epoch 4/5  Iteration 697/890 Training loss: 2.3279 1.0669 sec/batch
Epoch 4/5  Iteration 698/890 Training loss: 2.3277 1.0579 sec/batch
Epoch 4/5  Iteration 699/890 Training loss: 2.3275 1.0536 sec/batch
Epoch 4/5  Iteration 700/890 Training loss: 2.3274 1.0361 sec/batch
Epoch 4/5  Iteration 701/890 Training loss: 2.32

Epoch 5/5  Iteration 808/890 Training loss: 2.2640 1.0552 sec/batch
Epoch 5/5  Iteration 809/890 Training loss: 2.2637 1.0897 sec/batch
Epoch 5/5  Iteration 810/890 Training loss: 2.2634 1.0310 sec/batch
Epoch 5/5  Iteration 811/890 Training loss: 2.2630 1.0538 sec/batch
Epoch 5/5  Iteration 812/890 Training loss: 2.2625 1.0612 sec/batch
Epoch 5/5  Iteration 813/890 Training loss: 2.2624 1.0491 sec/batch
Epoch 5/5  Iteration 814/890 Training loss: 2.2622 1.0809 sec/batch
Epoch 5/5  Iteration 815/890 Training loss: 2.2617 1.0550 sec/batch
Epoch 5/5  Iteration 816/890 Training loss: 2.2615 1.0462 sec/batch
Epoch 5/5  Iteration 817/890 Training loss: 2.2611 1.0658 sec/batch
Epoch 5/5  Iteration 818/890 Training loss: 2.2608 1.0644 sec/batch
Epoch 5/5  Iteration 819/890 Training loss: 2.2606 1.0924 sec/batch
Epoch 5/5  Iteration 820/890 Training loss: 2.2605 1.0354 sec/batch
Epoch 5/5  Iteration 821/890 Training loss: 2.2604 1.0270 sec/batch
Epoch 5/5  Iteration 822/890 Training loss: 2.26

Epoch 1/5  Iteration 40/890 Training loss: 3.3971 1.7391 sec/batch
Epoch 1/5  Iteration 41/890 Training loss: 3.3911 1.7027 sec/batch
Epoch 1/5  Iteration 42/890 Training loss: 3.3855 1.6288 sec/batch
Epoch 1/5  Iteration 43/890 Training loss: 3.3801 1.7507 sec/batch
Epoch 1/5  Iteration 44/890 Training loss: 3.3749 1.7371 sec/batch
Epoch 1/5  Iteration 45/890 Training loss: 3.3698 1.7018 sec/batch
Epoch 1/5  Iteration 46/890 Training loss: 3.3652 1.8059 sec/batch
Epoch 1/5  Iteration 47/890 Training loss: 3.3608 1.7106 sec/batch
Epoch 1/5  Iteration 48/890 Training loss: 3.3568 1.6831 sec/batch
Epoch 1/5  Iteration 49/890 Training loss: 3.3529 1.6859 sec/batch
Epoch 1/5  Iteration 50/890 Training loss: 3.3491 1.6516 sec/batch
Epoch 1/5  Iteration 51/890 Training loss: 3.3452 1.7240 sec/batch
Epoch 1/5  Iteration 52/890 Training loss: 3.3414 1.6836 sec/batch
Epoch 1/5  Iteration 53/890 Training loss: 3.3379 1.6990 sec/batch
Epoch 1/5  Iteration 54/890 Training loss: 3.3342 1.6874 sec/batch

Epoch 1/5  Iteration 162/890 Training loss: 3.1283 1.6791 sec/batch
Epoch 1/5  Iteration 163/890 Training loss: 3.1259 1.6523 sec/batch
Epoch 1/5  Iteration 164/890 Training loss: 3.1237 1.6729 sec/batch
Epoch 1/5  Iteration 165/890 Training loss: 3.1214 1.6879 sec/batch
Epoch 1/5  Iteration 166/890 Training loss: 3.1191 1.7036 sec/batch
Epoch 1/5  Iteration 167/890 Training loss: 3.1169 1.6440 sec/batch
Epoch 1/5  Iteration 168/890 Training loss: 3.1146 1.6998 sec/batch
Epoch 1/5  Iteration 169/890 Training loss: 3.1124 1.7037 sec/batch
Epoch 1/5  Iteration 170/890 Training loss: 3.1101 1.6501 sec/batch
Epoch 1/5  Iteration 171/890 Training loss: 3.1080 1.6787 sec/batch
Epoch 1/5  Iteration 172/890 Training loss: 3.1060 1.6796 sec/batch
Epoch 1/5  Iteration 173/890 Training loss: 3.1039 1.6788 sec/batch
Epoch 1/5  Iteration 174/890 Training loss: 3.1020 1.6757 sec/batch
Epoch 1/5  Iteration 175/890 Training loss: 3.1000 1.7509 sec/batch
Epoch 1/5  Iteration 176/890 Training loss: 3.09

Epoch 2/5  Iteration 283/890 Training loss: 2.5567 1.7828 sec/batch
Epoch 2/5  Iteration 284/890 Training loss: 2.5558 1.6818 sec/batch
Epoch 2/5  Iteration 285/890 Training loss: 2.5548 1.6695 sec/batch
Epoch 2/5  Iteration 286/890 Training loss: 2.5540 1.7772 sec/batch
Epoch 2/5  Iteration 287/890 Training loss: 2.5532 1.7325 sec/batch
Epoch 2/5  Iteration 288/890 Training loss: 2.5521 1.7047 sec/batch
Epoch 2/5  Iteration 289/890 Training loss: 2.5513 1.7511 sec/batch
Epoch 2/5  Iteration 290/890 Training loss: 2.5506 1.7851 sec/batch
Epoch 2/5  Iteration 291/890 Training loss: 2.5497 1.6931 sec/batch
Epoch 2/5  Iteration 292/890 Training loss: 2.5489 1.6718 sec/batch
Epoch 2/5  Iteration 293/890 Training loss: 2.5480 1.6795 sec/batch
Epoch 2/5  Iteration 294/890 Training loss: 2.5469 1.7811 sec/batch
Epoch 2/5  Iteration 295/890 Training loss: 2.5460 1.7063 sec/batch
Epoch 2/5  Iteration 296/890 Training loss: 2.5453 1.7295 sec/batch
Epoch 2/5  Iteration 297/890 Training loss: 2.54

Epoch 3/5  Iteration 404/890 Training loss: 2.3591 1.7761 sec/batch
Epoch 3/5  Iteration 405/890 Training loss: 2.3585 2.2746 sec/batch
Epoch 3/5  Iteration 406/890 Training loss: 2.3586 1.6119 sec/batch
Epoch 3/5  Iteration 407/890 Training loss: 2.3579 1.5992 sec/batch
Epoch 3/5  Iteration 408/890 Training loss: 2.3577 1.8076 sec/batch
Epoch 3/5  Iteration 409/890 Training loss: 2.3572 1.3795 sec/batch
Epoch 3/5  Iteration 410/890 Training loss: 2.3566 1.4312 sec/batch
Epoch 3/5  Iteration 411/890 Training loss: 2.3560 1.2909 sec/batch
Epoch 3/5  Iteration 412/890 Training loss: 2.3557 1.2542 sec/batch
Epoch 3/5  Iteration 413/890 Training loss: 2.3553 1.2175 sec/batch
Epoch 3/5  Iteration 414/890 Training loss: 2.3547 1.3650 sec/batch
Epoch 3/5  Iteration 415/890 Training loss: 2.3540 1.2705 sec/batch
Epoch 3/5  Iteration 416/890 Training loss: 2.3539 1.3411 sec/batch
Epoch 3/5  Iteration 417/890 Training loss: 2.3534 1.2033 sec/batch
Epoch 3/5  Iteration 418/890 Training loss: 2.35

Epoch 3/5  Iteration 525/890 Training loss: 2.3117 1.6547 sec/batch
Epoch 3/5  Iteration 526/890 Training loss: 2.3113 1.5334 sec/batch
Epoch 3/5  Iteration 527/890 Training loss: 2.3111 1.5695 sec/batch
Epoch 3/5  Iteration 528/890 Training loss: 2.3109 1.4013 sec/batch
Epoch 3/5  Iteration 529/890 Training loss: 2.3109 1.6765 sec/batch
Epoch 3/5  Iteration 530/890 Training loss: 2.3109 1.5985 sec/batch
Epoch 3/5  Iteration 531/890 Training loss: 2.3108 1.5367 sec/batch
Epoch 3/5  Iteration 532/890 Training loss: 2.3105 1.8375 sec/batch
Epoch 3/5  Iteration 533/890 Training loss: 2.3101 1.7065 sec/batch
Epoch 3/5  Iteration 534/890 Training loss: 2.3096 1.4881 sec/batch
Epoch 4/5  Iteration 535/890 Training loss: 2.2944 1.2141 sec/batch
Epoch 4/5  Iteration 536/890 Training loss: 2.2614 1.2586 sec/batch
Epoch 4/5  Iteration 537/890 Training loss: 2.2501 1.5904 sec/batch
Epoch 4/5  Iteration 538/890 Training loss: 2.2478 1.6913 sec/batch
Epoch 4/5  Iteration 539/890 Training loss: 2.24

Epoch 4/5  Iteration 646/890 Training loss: 2.2159 1.3136 sec/batch
Epoch 4/5  Iteration 647/890 Training loss: 2.2156 1.3445 sec/batch
Epoch 4/5  Iteration 648/890 Training loss: 2.2153 1.2132 sec/batch
Epoch 4/5  Iteration 649/890 Training loss: 2.2149 1.1755 sec/batch
Epoch 4/5  Iteration 650/890 Training loss: 2.2144 1.1955 sec/batch
Epoch 4/5  Iteration 651/890 Training loss: 2.2141 1.1868 sec/batch
Epoch 4/5  Iteration 652/890 Training loss: 2.2139 1.1738 sec/batch
Epoch 4/5  Iteration 653/890 Training loss: 2.2138 1.1888 sec/batch
Epoch 4/5  Iteration 654/890 Training loss: 2.2135 1.1800 sec/batch
Epoch 4/5  Iteration 655/890 Training loss: 2.2135 1.1859 sec/batch
Epoch 4/5  Iteration 656/890 Training loss: 2.2132 1.1782 sec/batch
Epoch 4/5  Iteration 657/890 Training loss: 2.2129 1.1751 sec/batch
Epoch 4/5  Iteration 658/890 Training loss: 2.2128 1.1841 sec/batch
Epoch 4/5  Iteration 659/890 Training loss: 2.2125 1.1931 sec/batch
Epoch 4/5  Iteration 660/890 Training loss: 2.21

Epoch 5/5  Iteration 767/890 Training loss: 2.1481 1.1745 sec/batch
Epoch 5/5  Iteration 768/890 Training loss: 2.1478 1.1784 sec/batch
Epoch 5/5  Iteration 769/890 Training loss: 2.1479 1.1774 sec/batch
Epoch 5/5  Iteration 770/890 Training loss: 2.1475 1.2027 sec/batch
Epoch 5/5  Iteration 771/890 Training loss: 2.1471 1.1653 sec/batch
Epoch 5/5  Iteration 772/890 Training loss: 2.1469 1.1755 sec/batch
Epoch 5/5  Iteration 773/890 Training loss: 2.1466 1.1677 sec/batch
Epoch 5/5  Iteration 774/890 Training loss: 2.1469 1.1794 sec/batch
Epoch 5/5  Iteration 775/890 Training loss: 2.1470 1.1924 sec/batch
Epoch 5/5  Iteration 776/890 Training loss: 2.1468 1.1897 sec/batch
Epoch 5/5  Iteration 777/890 Training loss: 2.1464 1.1824 sec/batch
Epoch 5/5  Iteration 778/890 Training loss: 2.1465 1.1626 sec/batch
Epoch 5/5  Iteration 779/890 Training loss: 2.1464 1.1565 sec/batch
Epoch 5/5  Iteration 780/890 Training loss: 2.1456 1.1674 sec/batch
Epoch 5/5  Iteration 781/890 Training loss: 2.14

Epoch 5/5  Iteration 888/890 Training loss: 2.1227 1.1874 sec/batch
Epoch 5/5  Iteration 889/890 Training loss: 2.1224 1.1690 sec/batch
Epoch 5/5  Iteration 890/890 Training loss: 2.1222 1.1794 sec/batch
Epoch 1/5  Iteration 1/890 Training loss: 4.4191 1.2571 sec/batch
Epoch 1/5  Iteration 2/890 Training loss: 4.4120 1.1711 sec/batch
Epoch 1/5  Iteration 3/890 Training loss: 4.4038 1.1851 sec/batch
Epoch 1/5  Iteration 4/890 Training loss: 4.3931 1.2008 sec/batch
Epoch 1/5  Iteration 5/890 Training loss: 4.3781 1.1650 sec/batch
Epoch 1/5  Iteration 6/890 Training loss: 4.3548 1.1660 sec/batch
Epoch 1/5  Iteration 7/890 Training loss: 4.3162 1.1593 sec/batch
Epoch 1/5  Iteration 8/890 Training loss: 4.2600 1.1825 sec/batch
Epoch 1/5  Iteration 9/890 Training loss: 4.1992 1.1747 sec/batch
Epoch 1/5  Iteration 10/890 Training loss: 4.1443 1.3713 sec/batch
Epoch 1/5  Iteration 11/890 Training loss: 4.0948 1.2405 sec/batch
Epoch 1/5  Iteration 12/890 Training loss: 4.0507 1.1935 sec/batch
E

Epoch 1/5  Iteration 121/890 Training loss: 3.2943 1.1758 sec/batch
Epoch 1/5  Iteration 122/890 Training loss: 3.2931 1.1736 sec/batch
Epoch 1/5  Iteration 123/890 Training loss: 3.2919 1.1842 sec/batch
Epoch 1/5  Iteration 124/890 Training loss: 3.2908 1.1808 sec/batch
Epoch 1/5  Iteration 125/890 Training loss: 3.2895 1.1952 sec/batch
Epoch 1/5  Iteration 126/890 Training loss: 3.2881 1.1738 sec/batch
Epoch 1/5  Iteration 127/890 Training loss: 3.2870 1.1890 sec/batch
Epoch 1/5  Iteration 128/890 Training loss: 3.2859 1.1833 sec/batch
Epoch 1/5  Iteration 129/890 Training loss: 3.2847 1.1969 sec/batch
Epoch 1/5  Iteration 130/890 Training loss: 3.2835 1.1857 sec/batch
Epoch 1/5  Iteration 131/890 Training loss: 3.2825 1.1790 sec/batch
Epoch 1/5  Iteration 132/890 Training loss: 3.2813 1.1603 sec/batch
Epoch 1/5  Iteration 133/890 Training loss: 3.2802 1.1937 sec/batch
Epoch 1/5  Iteration 134/890 Training loss: 3.2790 1.1857 sec/batch
Epoch 1/5  Iteration 135/890 Training loss: 3.27

Epoch 2/5  Iteration 242/890 Training loss: 2.8767 1.1774 sec/batch
Epoch 2/5  Iteration 243/890 Training loss: 2.8743 1.1782 sec/batch
Epoch 2/5  Iteration 244/890 Training loss: 2.8726 1.1786 sec/batch
Epoch 2/5  Iteration 245/890 Training loss: 2.8704 1.1892 sec/batch
Epoch 2/5  Iteration 246/890 Training loss: 2.8677 1.1846 sec/batch
Epoch 2/5  Iteration 247/890 Training loss: 2.8652 1.1574 sec/batch
Epoch 2/5  Iteration 248/890 Training loss: 2.8633 1.1990 sec/batch
Epoch 2/5  Iteration 249/890 Training loss: 2.8613 1.1658 sec/batch
Epoch 2/5  Iteration 250/890 Training loss: 2.8596 1.1782 sec/batch
Epoch 2/5  Iteration 251/890 Training loss: 2.8576 1.1812 sec/batch
Epoch 2/5  Iteration 252/890 Training loss: 2.8557 1.1786 sec/batch
Epoch 2/5  Iteration 253/890 Training loss: 2.8538 1.1788 sec/batch
Epoch 2/5  Iteration 254/890 Training loss: 2.8523 1.1635 sec/batch
Epoch 2/5  Iteration 255/890 Training loss: 2.8504 1.2069 sec/batch
Epoch 2/5  Iteration 256/890 Training loss: 2.84

Epoch 3/5  Iteration 363/890 Training loss: 2.5367 1.1745 sec/batch
Epoch 3/5  Iteration 364/890 Training loss: 2.5370 1.1989 sec/batch
Epoch 3/5  Iteration 365/890 Training loss: 2.5374 1.2198 sec/batch
Epoch 3/5  Iteration 366/890 Training loss: 2.5359 1.2062 sec/batch
Epoch 3/5  Iteration 367/890 Training loss: 2.5343 1.1882 sec/batch
Epoch 3/5  Iteration 368/890 Training loss: 2.5339 1.2007 sec/batch
Epoch 3/5  Iteration 369/890 Training loss: 2.5332 1.1746 sec/batch
Epoch 3/5  Iteration 370/890 Training loss: 2.5344 1.2217 sec/batch
Epoch 3/5  Iteration 371/890 Training loss: 2.5337 1.2070 sec/batch
Epoch 3/5  Iteration 372/890 Training loss: 2.5337 1.1848 sec/batch
Epoch 3/5  Iteration 373/890 Training loss: 2.5328 1.1794 sec/batch
Epoch 3/5  Iteration 374/890 Training loss: 2.5344 1.1830 sec/batch
Epoch 3/5  Iteration 375/890 Training loss: 2.5339 1.1978 sec/batch
Epoch 3/5  Iteration 376/890 Training loss: 2.5322 1.2028 sec/batch
Epoch 3/5  Iteration 377/890 Training loss: 2.53

Epoch 3/5  Iteration 484/890 Training loss: 2.4792 1.1547 sec/batch
Epoch 3/5  Iteration 485/890 Training loss: 2.4789 1.1757 sec/batch
Epoch 3/5  Iteration 486/890 Training loss: 2.4787 1.1882 sec/batch
Epoch 3/5  Iteration 487/890 Training loss: 2.4783 1.1899 sec/batch
Epoch 3/5  Iteration 488/890 Training loss: 2.4779 1.1781 sec/batch
Epoch 3/5  Iteration 489/890 Training loss: 2.4776 1.1796 sec/batch
Epoch 3/5  Iteration 490/890 Training loss: 2.4775 1.1769 sec/batch
Epoch 3/5  Iteration 491/890 Training loss: 2.4769 1.1956 sec/batch
Epoch 3/5  Iteration 492/890 Training loss: 2.4766 1.1935 sec/batch
Epoch 3/5  Iteration 493/890 Training loss: 2.4763 1.1759 sec/batch
Epoch 3/5  Iteration 494/890 Training loss: 2.4760 1.1719 sec/batch
Epoch 3/5  Iteration 495/890 Training loss: 2.4757 1.1815 sec/batch
Epoch 3/5  Iteration 496/890 Training loss: 2.4754 1.1854 sec/batch
Epoch 3/5  Iteration 497/890 Training loss: 2.4752 1.1898 sec/batch
Epoch 3/5  Iteration 498/890 Training loss: 2.47

Epoch 4/5  Iteration 605/890 Training loss: 2.3867 1.1615 sec/batch
Epoch 4/5  Iteration 606/890 Training loss: 2.3866 1.2116 sec/batch
Epoch 4/5  Iteration 607/890 Training loss: 2.3865 1.2325 sec/batch
Epoch 4/5  Iteration 608/890 Training loss: 2.3860 1.2076 sec/batch
Epoch 4/5  Iteration 609/890 Training loss: 2.3856 1.2141 sec/batch
Epoch 4/5  Iteration 610/890 Training loss: 2.3858 1.1880 sec/batch
Epoch 4/5  Iteration 611/890 Training loss: 2.3854 1.1975 sec/batch
Epoch 4/5  Iteration 612/890 Training loss: 2.3852 1.2030 sec/batch
Epoch 4/5  Iteration 613/890 Training loss: 2.3847 1.1760 sec/batch
Epoch 4/5  Iteration 614/890 Training loss: 2.3842 1.1759 sec/batch
Epoch 4/5  Iteration 615/890 Training loss: 2.3837 1.1706 sec/batch
Epoch 4/5  Iteration 616/890 Training loss: 2.3836 1.1992 sec/batch
Epoch 4/5  Iteration 617/890 Training loss: 2.3832 1.1863 sec/batch
Epoch 4/5  Iteration 618/890 Training loss: 2.3827 1.1981 sec/batch
Epoch 4/5  Iteration 619/890 Training loss: 2.38

Epoch 5/5  Iteration 726/890 Training loss: 2.3159 1.2050 sec/batch
Epoch 5/5  Iteration 727/890 Training loss: 2.3161 1.1840 sec/batch
Epoch 5/5  Iteration 728/890 Training loss: 2.3161 1.1656 sec/batch
Epoch 5/5  Iteration 729/890 Training loss: 2.3161 1.1655 sec/batch
Epoch 5/5  Iteration 730/890 Training loss: 2.3174 1.1807 sec/batch
Epoch 5/5  Iteration 731/890 Training loss: 2.3175 1.1827 sec/batch
Epoch 5/5  Iteration 732/890 Training loss: 2.3165 1.1801 sec/batch
Epoch 5/5  Iteration 733/890 Training loss: 2.3158 1.1849 sec/batch
Epoch 5/5  Iteration 734/890 Training loss: 2.3172 1.2107 sec/batch
Epoch 5/5  Iteration 735/890 Training loss: 2.3170 1.1906 sec/batch
Epoch 5/5  Iteration 736/890 Training loss: 2.3164 1.1583 sec/batch
Epoch 5/5  Iteration 737/890 Training loss: 2.3156 1.1783 sec/batch
Epoch 5/5  Iteration 738/890 Training loss: 2.3154 1.1767 sec/batch
Epoch 5/5  Iteration 739/890 Training loss: 2.3149 1.1732 sec/batch
Epoch 5/5  Iteration 740/890 Training loss: 2.31

Epoch 5/5  Iteration 847/890 Training loss: 2.2863 1.1772 sec/batch
Epoch 5/5  Iteration 848/890 Training loss: 2.2862 1.1927 sec/batch
Epoch 5/5  Iteration 849/890 Training loss: 2.2860 1.1781 sec/batch
Epoch 5/5  Iteration 850/890 Training loss: 2.2859 1.1897 sec/batch
Epoch 5/5  Iteration 851/890 Training loss: 2.2859 1.1931 sec/batch
Epoch 5/5  Iteration 852/890 Training loss: 2.2855 1.1776 sec/batch
Epoch 5/5  Iteration 853/890 Training loss: 2.2855 1.2030 sec/batch
Epoch 5/5  Iteration 854/890 Training loss: 2.2852 1.1791 sec/batch
Epoch 5/5  Iteration 855/890 Training loss: 2.2851 1.1687 sec/batch
Epoch 5/5  Iteration 856/890 Training loss: 2.2849 1.1762 sec/batch
Epoch 5/5  Iteration 857/890 Training loss: 2.2847 1.1811 sec/batch
Epoch 5/5  Iteration 858/890 Training loss: 2.2847 1.2843 sec/batch
Epoch 5/5  Iteration 859/890 Training loss: 2.2846 1.2265 sec/batch
Epoch 5/5  Iteration 860/890 Training loss: 2.2846 1.2192 sec/batch
Epoch 5/5  Iteration 861/890 Training loss: 2.28

Epoch 1/5  Iteration 79/890 Training loss: 3.1518 1.6160 sec/batch
Epoch 1/5  Iteration 80/890 Training loss: 3.1476 1.6090 sec/batch
Epoch 1/5  Iteration 81/890 Training loss: 3.1435 1.5959 sec/batch
Epoch 1/5  Iteration 82/890 Training loss: 3.1396 1.6056 sec/batch
Epoch 1/5  Iteration 83/890 Training loss: 3.1355 1.6156 sec/batch
Epoch 1/5  Iteration 84/890 Training loss: 3.1313 1.6072 sec/batch
Epoch 1/5  Iteration 85/890 Training loss: 3.1268 1.6384 sec/batch
Epoch 1/5  Iteration 86/890 Training loss: 3.1224 1.5832 sec/batch
Epoch 1/5  Iteration 87/890 Training loss: 3.1181 1.6050 sec/batch
Epoch 1/5  Iteration 88/890 Training loss: 3.1137 1.6296 sec/batch
Epoch 1/5  Iteration 89/890 Training loss: 3.1093 1.6046 sec/batch
Epoch 1/5  Iteration 90/890 Training loss: 3.1051 1.5699 sec/batch
Epoch 1/5  Iteration 91/890 Training loss: 3.1007 1.5856 sec/batch
Epoch 1/5  Iteration 92/890 Training loss: 3.0964 1.6047 sec/batch
Epoch 1/5  Iteration 93/890 Training loss: 3.0922 1.6162 sec/b

Epoch 2/5  Iteration 200/890 Training loss: 2.3032 1.6254 sec/batch
Epoch 2/5  Iteration 201/890 Training loss: 2.3017 1.6342 sec/batch
Epoch 2/5  Iteration 202/890 Training loss: 2.3000 1.6116 sec/batch
Epoch 2/5  Iteration 203/890 Training loss: 2.2986 1.6076 sec/batch
Epoch 2/5  Iteration 204/890 Training loss: 2.2973 1.6358 sec/batch
Epoch 2/5  Iteration 205/890 Training loss: 2.2958 1.6512 sec/batch
Epoch 2/5  Iteration 206/890 Training loss: 2.2949 1.6147 sec/batch
Epoch 2/5  Iteration 207/890 Training loss: 2.2947 1.6538 sec/batch
Epoch 2/5  Iteration 208/890 Training loss: 2.2941 1.6049 sec/batch
Epoch 2/5  Iteration 209/890 Training loss: 2.2933 1.6189 sec/batch
Epoch 2/5  Iteration 210/890 Training loss: 2.2919 1.6105 sec/batch
Epoch 2/5  Iteration 211/890 Training loss: 2.2906 1.6351 sec/batch
Epoch 2/5  Iteration 212/890 Training loss: 2.2899 1.6126 sec/batch
Epoch 2/5  Iteration 213/890 Training loss: 2.2885 1.6252 sec/batch
Epoch 2/5  Iteration 214/890 Training loss: 2.28

Epoch 2/5  Iteration 321/890 Training loss: 2.2006 1.6287 sec/batch
Epoch 2/5  Iteration 322/890 Training loss: 2.1999 1.6147 sec/batch
Epoch 2/5  Iteration 323/890 Training loss: 2.1992 1.6220 sec/batch
Epoch 2/5  Iteration 324/890 Training loss: 2.1987 1.6260 sec/batch
Epoch 2/5  Iteration 325/890 Training loss: 2.1981 1.6171 sec/batch
Epoch 2/5  Iteration 326/890 Training loss: 2.1976 1.6269 sec/batch
Epoch 2/5  Iteration 327/890 Training loss: 2.1968 1.6412 sec/batch
Epoch 2/5  Iteration 328/890 Training loss: 2.1960 1.6146 sec/batch
Epoch 2/5  Iteration 329/890 Training loss: 2.1952 1.6363 sec/batch
Epoch 2/5  Iteration 330/890 Training loss: 2.1948 1.6085 sec/batch
Epoch 2/5  Iteration 331/890 Training loss: 2.1941 1.5878 sec/batch
Epoch 2/5  Iteration 332/890 Training loss: 2.1935 1.6056 sec/batch
Epoch 2/5  Iteration 333/890 Training loss: 2.1928 1.6191 sec/batch
Epoch 2/5  Iteration 334/890 Training loss: 2.1921 1.5960 sec/batch
Epoch 2/5  Iteration 335/890 Training loss: 2.19

Epoch 3/5  Iteration 442/890 Training loss: 2.0138 1.6224 sec/batch
Epoch 3/5  Iteration 443/890 Training loss: 2.0132 1.6301 sec/batch
Epoch 3/5  Iteration 444/890 Training loss: 2.0125 1.6164 sec/batch
Epoch 3/5  Iteration 445/890 Training loss: 2.0117 1.6197 sec/batch
Epoch 3/5  Iteration 446/890 Training loss: 2.0114 1.6085 sec/batch
Epoch 3/5  Iteration 447/890 Training loss: 2.0107 1.6120 sec/batch
Epoch 3/5  Iteration 448/890 Training loss: 2.0102 1.6089 sec/batch
Epoch 3/5  Iteration 449/890 Training loss: 2.0093 1.6321 sec/batch
Epoch 3/5  Iteration 450/890 Training loss: 2.0086 1.6262 sec/batch
Epoch 3/5  Iteration 451/890 Training loss: 2.0078 1.6487 sec/batch
Epoch 3/5  Iteration 452/890 Training loss: 2.0073 1.6368 sec/batch
Epoch 3/5  Iteration 453/890 Training loss: 2.0067 1.6183 sec/batch
Epoch 3/5  Iteration 454/890 Training loss: 2.0060 1.6157 sec/batch
Epoch 3/5  Iteration 455/890 Training loss: 2.0053 1.5901 sec/batch
Epoch 3/5  Iteration 456/890 Training loss: 2.00

Epoch 4/5  Iteration 563/890 Training loss: 1.8913 1.6383 sec/batch
Epoch 4/5  Iteration 564/890 Training loss: 1.8916 1.6434 sec/batch
Epoch 4/5  Iteration 565/890 Training loss: 1.8912 1.6117 sec/batch
Epoch 4/5  Iteration 566/890 Training loss: 1.8901 1.6321 sec/batch
Epoch 4/5  Iteration 567/890 Training loss: 1.8899 1.6200 sec/batch
Epoch 4/5  Iteration 568/890 Training loss: 1.8904 1.6006 sec/batch
Epoch 4/5  Iteration 569/890 Training loss: 1.8899 1.6314 sec/batch
Epoch 4/5  Iteration 570/890 Training loss: 1.8894 1.6386 sec/batch
Epoch 4/5  Iteration 571/890 Training loss: 1.8886 1.6226 sec/batch
Epoch 4/5  Iteration 572/890 Training loss: 1.8870 1.6320 sec/batch
Epoch 4/5  Iteration 573/890 Training loss: 1.8856 1.6209 sec/batch
Epoch 4/5  Iteration 574/890 Training loss: 1.8847 1.6265 sec/batch
Epoch 4/5  Iteration 575/890 Training loss: 1.8841 1.6357 sec/batch
Epoch 4/5  Iteration 576/890 Training loss: 1.8842 1.6380 sec/batch
Epoch 4/5  Iteration 577/890 Training loss: 1.88

Epoch 4/5  Iteration 684/890 Training loss: 1.8456 1.6169 sec/batch
Epoch 4/5  Iteration 685/890 Training loss: 1.8451 1.6164 sec/batch
Epoch 4/5  Iteration 686/890 Training loss: 1.8450 1.6251 sec/batch
Epoch 4/5  Iteration 687/890 Training loss: 1.8448 1.6196 sec/batch
Epoch 4/5  Iteration 688/890 Training loss: 1.8446 1.6060 sec/batch
Epoch 4/5  Iteration 689/890 Training loss: 1.8444 1.6087 sec/batch
Epoch 4/5  Iteration 690/890 Training loss: 1.8442 1.5952 sec/batch
Epoch 4/5  Iteration 691/890 Training loss: 1.8440 1.6369 sec/batch
Epoch 4/5  Iteration 692/890 Training loss: 1.8436 1.6175 sec/batch
Epoch 4/5  Iteration 693/890 Training loss: 1.8431 1.6114 sec/batch
Epoch 4/5  Iteration 694/890 Training loss: 1.8431 1.6326 sec/batch
Epoch 4/5  Iteration 695/890 Training loss: 1.8430 1.6283 sec/batch
Epoch 4/5  Iteration 696/890 Training loss: 1.8427 1.6087 sec/batch
Epoch 4/5  Iteration 697/890 Training loss: 1.8425 1.6356 sec/batch
Epoch 4/5  Iteration 698/890 Training loss: 1.84

Epoch 5/5  Iteration 805/890 Training loss: 1.7670 1.6010 sec/batch
Epoch 5/5  Iteration 806/890 Training loss: 1.7665 1.6112 sec/batch
Epoch 5/5  Iteration 807/890 Training loss: 1.7659 1.6008 sec/batch
Epoch 5/5  Iteration 808/890 Training loss: 1.7656 1.6139 sec/batch
Epoch 5/5  Iteration 809/890 Training loss: 1.7653 1.6029 sec/batch
Epoch 5/5  Iteration 810/890 Training loss: 1.7647 1.6037 sec/batch
Epoch 5/5  Iteration 811/890 Training loss: 1.7643 1.6329 sec/batch
Epoch 5/5  Iteration 812/890 Training loss: 1.7636 1.6281 sec/batch
Epoch 5/5  Iteration 813/890 Training loss: 1.7634 1.6194 sec/batch
Epoch 5/5  Iteration 814/890 Training loss: 1.7631 1.6306 sec/batch
Epoch 5/5  Iteration 815/890 Training loss: 1.7627 1.6239 sec/batch
Epoch 5/5  Iteration 816/890 Training loss: 1.7624 1.5963 sec/batch
Epoch 5/5  Iteration 817/890 Training loss: 1.7620 1.6148 sec/batch
Epoch 5/5  Iteration 818/890 Training loss: 1.7617 1.6283 sec/batch
Epoch 5/5  Iteration 819/890 Training loss: 1.76

Epoch 1/5  Iteration 37/890 Training loss: 3.4578 1.6258 sec/batch
Epoch 1/5  Iteration 38/890 Training loss: 3.4501 1.6277 sec/batch
Epoch 1/5  Iteration 39/890 Training loss: 3.4427 1.6065 sec/batch
Epoch 1/5  Iteration 40/890 Training loss: 3.4359 1.7375 sec/batch
Epoch 1/5  Iteration 41/890 Training loss: 3.4291 1.6346 sec/batch
Epoch 1/5  Iteration 42/890 Training loss: 3.4226 1.5984 sec/batch
Epoch 1/5  Iteration 43/890 Training loss: 3.4164 1.6222 sec/batch
Epoch 1/5  Iteration 44/890 Training loss: 3.4104 1.6233 sec/batch
Epoch 1/5  Iteration 45/890 Training loss: 3.4044 1.6243 sec/batch
Epoch 1/5  Iteration 46/890 Training loss: 3.3991 1.5948 sec/batch
Epoch 1/5  Iteration 47/890 Training loss: 3.3940 1.6232 sec/batch
Epoch 1/5  Iteration 48/890 Training loss: 3.3891 1.6313 sec/batch
Epoch 1/5  Iteration 49/890 Training loss: 3.3845 1.6063 sec/batch
Epoch 1/5  Iteration 50/890 Training loss: 3.3799 1.5994 sec/batch
Epoch 1/5  Iteration 51/890 Training loss: 3.3752 1.6182 sec/b

Epoch 1/5  Iteration 159/890 Training loss: 3.1104 1.6289 sec/batch
Epoch 1/5  Iteration 160/890 Training loss: 3.1079 1.6201 sec/batch
Epoch 1/5  Iteration 161/890 Training loss: 3.1055 1.6133 sec/batch
Epoch 1/5  Iteration 162/890 Training loss: 3.1030 1.6080 sec/batch
Epoch 1/5  Iteration 163/890 Training loss: 3.1003 1.6330 sec/batch
Epoch 1/5  Iteration 164/890 Training loss: 3.0979 1.6101 sec/batch
Epoch 1/5  Iteration 165/890 Training loss: 3.0954 1.6035 sec/batch
Epoch 1/5  Iteration 166/890 Training loss: 3.0930 1.6213 sec/batch
Epoch 1/5  Iteration 167/890 Training loss: 3.0905 1.6468 sec/batch
Epoch 1/5  Iteration 168/890 Training loss: 3.0881 1.6118 sec/batch
Epoch 1/5  Iteration 169/890 Training loss: 3.0856 1.6062 sec/batch
Epoch 1/5  Iteration 170/890 Training loss: 3.0831 1.6015 sec/batch
Epoch 1/5  Iteration 171/890 Training loss: 3.0807 1.6021 sec/batch
Epoch 1/5  Iteration 172/890 Training loss: 3.0784 1.6136 sec/batch
Epoch 1/5  Iteration 173/890 Training loss: 3.07

Epoch 2/5  Iteration 280/890 Training loss: 2.4665 1.6027 sec/batch
Epoch 2/5  Iteration 281/890 Training loss: 2.4653 1.5973 sec/batch
Epoch 2/5  Iteration 282/890 Training loss: 2.4641 1.6136 sec/batch
Epoch 2/5  Iteration 283/890 Training loss: 2.4630 1.6473 sec/batch
Epoch 2/5  Iteration 284/890 Training loss: 2.4620 1.6321 sec/batch
Epoch 2/5  Iteration 285/890 Training loss: 2.4609 1.6329 sec/batch
Epoch 2/5  Iteration 286/890 Training loss: 2.4601 1.6294 sec/batch
Epoch 2/5  Iteration 287/890 Training loss: 2.4591 1.6353 sec/batch
Epoch 2/5  Iteration 288/890 Training loss: 2.4579 1.6143 sec/batch
Epoch 2/5  Iteration 289/890 Training loss: 2.4569 1.6327 sec/batch
Epoch 2/5  Iteration 290/890 Training loss: 2.4560 1.6023 sec/batch
Epoch 2/5  Iteration 291/890 Training loss: 2.4550 1.6014 sec/batch
Epoch 2/5  Iteration 292/890 Training loss: 2.4539 1.6195 sec/batch
Epoch 2/5  Iteration 293/890 Training loss: 2.4529 1.5939 sec/batch
Epoch 2/5  Iteration 294/890 Training loss: 2.45

Epoch 3/5  Iteration 401/890 Training loss: 2.2508 1.6177 sec/batch
Epoch 3/5  Iteration 402/890 Training loss: 2.2491 1.6311 sec/batch
Epoch 3/5  Iteration 403/890 Training loss: 2.2490 1.6084 sec/batch
Epoch 3/5  Iteration 404/890 Training loss: 2.2482 1.6172 sec/batch
Epoch 3/5  Iteration 405/890 Training loss: 2.2476 1.5978 sec/batch
Epoch 3/5  Iteration 406/890 Training loss: 2.2476 1.6154 sec/batch
Epoch 3/5  Iteration 407/890 Training loss: 2.2467 1.6130 sec/batch
Epoch 3/5  Iteration 408/890 Training loss: 2.2467 1.6260 sec/batch
Epoch 3/5  Iteration 409/890 Training loss: 2.2462 1.6982 sec/batch
Epoch 3/5  Iteration 410/890 Training loss: 2.2455 1.6616 sec/batch
Epoch 3/5  Iteration 411/890 Training loss: 2.2449 1.6236 sec/batch
Epoch 3/5  Iteration 412/890 Training loss: 2.2446 1.6286 sec/batch
Epoch 3/5  Iteration 413/890 Training loss: 2.2442 1.6308 sec/batch
Epoch 3/5  Iteration 414/890 Training loss: 2.2436 1.6504 sec/batch
Epoch 3/5  Iteration 415/890 Training loss: 2.24

Epoch 3/5  Iteration 522/890 Training loss: 2.2004 1.6025 sec/batch
Epoch 3/5  Iteration 523/890 Training loss: 2.2001 1.6165 sec/batch
Epoch 3/5  Iteration 524/890 Training loss: 2.1999 1.5991 sec/batch
Epoch 3/5  Iteration 525/890 Training loss: 2.1997 1.6021 sec/batch
Epoch 3/5  Iteration 526/890 Training loss: 2.1993 1.6010 sec/batch
Epoch 3/5  Iteration 527/890 Training loss: 2.1989 1.6401 sec/batch
Epoch 3/5  Iteration 528/890 Training loss: 2.1985 1.6093 sec/batch
Epoch 3/5  Iteration 529/890 Training loss: 2.1984 1.5874 sec/batch
Epoch 3/5  Iteration 530/890 Training loss: 2.1981 1.5815 sec/batch
Epoch 3/5  Iteration 531/890 Training loss: 2.1979 1.6107 sec/batch
Epoch 3/5  Iteration 532/890 Training loss: 2.1976 1.5987 sec/batch
Epoch 3/5  Iteration 533/890 Training loss: 2.1971 1.6084 sec/batch
Epoch 3/5  Iteration 534/890 Training loss: 2.1968 1.6266 sec/batch
Epoch 4/5  Iteration 535/890 Training loss: 2.2161 1.6266 sec/batch
Epoch 4/5  Iteration 536/890 Training loss: 2.16

Epoch 4/5  Iteration 643/890 Training loss: 2.1003 1.6414 sec/batch
Epoch 4/5  Iteration 644/890 Training loss: 2.1000 1.6084 sec/batch
Epoch 4/5  Iteration 645/890 Training loss: 2.0997 1.6190 sec/batch
Epoch 4/5  Iteration 646/890 Training loss: 2.0995 1.6487 sec/batch
Epoch 4/5  Iteration 647/890 Training loss: 2.0991 1.6196 sec/batch
Epoch 4/5  Iteration 648/890 Training loss: 2.0987 1.6256 sec/batch
Epoch 4/5  Iteration 649/890 Training loss: 2.0983 1.6213 sec/batch
Epoch 4/5  Iteration 650/890 Training loss: 2.0976 1.6333 sec/batch
Epoch 4/5  Iteration 651/890 Training loss: 2.0974 1.6183 sec/batch
Epoch 4/5  Iteration 652/890 Training loss: 2.0971 1.6189 sec/batch
Epoch 4/5  Iteration 653/890 Training loss: 2.0970 1.6306 sec/batch
Epoch 4/5  Iteration 654/890 Training loss: 2.0966 1.6246 sec/batch
Epoch 4/5  Iteration 655/890 Training loss: 2.0965 1.6257 sec/batch
Epoch 4/5  Iteration 656/890 Training loss: 2.0961 1.6320 sec/batch
Epoch 4/5  Iteration 657/890 Training loss: 2.09

Epoch 5/5  Iteration 764/890 Training loss: 2.0204 1.6041 sec/batch
Epoch 5/5  Iteration 765/890 Training loss: 2.0201 1.6265 sec/batch
Epoch 5/5  Iteration 766/890 Training loss: 2.0198 1.6051 sec/batch
Epoch 5/5  Iteration 767/890 Training loss: 2.0194 1.6321 sec/batch
Epoch 5/5  Iteration 768/890 Training loss: 2.0196 1.5916 sec/batch
Epoch 5/5  Iteration 769/890 Training loss: 2.0195 1.6427 sec/batch
Epoch 5/5  Iteration 770/890 Training loss: 2.0191 1.6142 sec/batch
Epoch 5/5  Iteration 771/890 Training loss: 2.0185 1.6153 sec/batch
Epoch 5/5  Iteration 772/890 Training loss: 2.0190 1.6079 sec/batch
Epoch 5/5  Iteration 773/890 Training loss: 2.0185 1.6334 sec/batch
Epoch 5/5  Iteration 774/890 Training loss: 2.0190 1.6332 sec/batch
Epoch 5/5  Iteration 775/890 Training loss: 2.0193 1.6461 sec/batch
Epoch 5/5  Iteration 776/890 Training loss: 2.0193 1.5978 sec/batch
Epoch 5/5  Iteration 777/890 Training loss: 2.0190 1.6037 sec/batch
Epoch 5/5  Iteration 778/890 Training loss: 2.01

Epoch 5/5  Iteration 885/890 Training loss: 1.9929 1.6105 sec/batch
Epoch 5/5  Iteration 886/890 Training loss: 1.9928 1.6202 sec/batch
Epoch 5/5  Iteration 887/890 Training loss: 1.9927 1.6264 sec/batch
Epoch 5/5  Iteration 888/890 Training loss: 1.9925 1.6123 sec/batch
Epoch 5/5  Iteration 889/890 Training loss: 1.9922 1.6513 sec/batch
Epoch 5/5  Iteration 890/890 Training loss: 1.9921 1.6163 sec/batch
Epoch 1/5  Iteration 1/890 Training loss: 4.4199 3.5055 sec/batch
Epoch 1/5  Iteration 2/890 Training loss: 4.3884 3.4141 sec/batch
Epoch 1/5  Iteration 3/890 Training loss: 4.2491 3.4756 sec/batch
Epoch 1/5  Iteration 4/890 Training loss: 4.1641 3.4606 sec/batch
Epoch 1/5  Iteration 5/890 Training loss: 4.0390 3.4696 sec/batch
Epoch 1/5  Iteration 6/890 Training loss: 3.9446 3.4700 sec/batch
Epoch 1/5  Iteration 7/890 Training loss: 3.8771 3.5003 sec/batch
Epoch 1/5  Iteration 8/890 Training loss: 3.8205 3.5014 sec/batch
Epoch 1/5  Iteration 9/890 Training loss: 3.7699 3.4503 sec/batc

Epoch 1/5  Iteration 118/890 Training loss: 3.1225 3.4405 sec/batch
Epoch 1/5  Iteration 119/890 Training loss: 3.1193 3.4176 sec/batch
Epoch 1/5  Iteration 120/890 Training loss: 3.1160 3.5241 sec/batch
Epoch 1/5  Iteration 121/890 Training loss: 3.1127 3.4499 sec/batch
Epoch 1/5  Iteration 122/890 Training loss: 3.1094 3.4536 sec/batch
Epoch 1/5  Iteration 123/890 Training loss: 3.1060 3.4205 sec/batch
Epoch 1/5  Iteration 124/890 Training loss: 3.1027 3.4874 sec/batch
Epoch 1/5  Iteration 125/890 Training loss: 3.0993 3.4979 sec/batch
Epoch 1/5  Iteration 126/890 Training loss: 3.0956 3.4514 sec/batch
Epoch 1/5  Iteration 127/890 Training loss: 3.0922 3.4565 sec/batch
Epoch 1/5  Iteration 128/890 Training loss: 3.0889 3.4702 sec/batch
Epoch 1/5  Iteration 129/890 Training loss: 3.0853 3.4795 sec/batch
Epoch 1/5  Iteration 130/890 Training loss: 3.0817 3.4837 sec/batch
Epoch 1/5  Iteration 131/890 Training loss: 3.0782 3.4635 sec/batch
Epoch 1/5  Iteration 132/890 Training loss: 3.07

Epoch 2/5  Iteration 239/890 Training loss: 2.3156 3.4821 sec/batch
Epoch 2/5  Iteration 240/890 Training loss: 2.3145 3.4679 sec/batch
Epoch 2/5  Iteration 241/890 Training loss: 2.3135 3.4744 sec/batch
Epoch 2/5  Iteration 242/890 Training loss: 2.3122 3.4663 sec/batch
Epoch 2/5  Iteration 243/890 Training loss: 2.3107 3.4577 sec/batch
Epoch 2/5  Iteration 244/890 Training loss: 2.3097 3.4543 sec/batch
Epoch 2/5  Iteration 245/890 Training loss: 2.3085 3.4712 sec/batch
Epoch 2/5  Iteration 246/890 Training loss: 2.3069 3.4510 sec/batch
Epoch 2/5  Iteration 247/890 Training loss: 2.3052 3.4669 sec/batch
Epoch 2/5  Iteration 248/890 Training loss: 2.3041 3.4290 sec/batch
Epoch 2/5  Iteration 249/890 Training loss: 2.3030 3.4434 sec/batch
Epoch 2/5  Iteration 250/890 Training loss: 2.3020 3.4725 sec/batch
Epoch 2/5  Iteration 251/890 Training loss: 2.3009 3.4619 sec/batch
Epoch 2/5  Iteration 252/890 Training loss: 2.2994 3.4532 sec/batch
Epoch 2/5  Iteration 253/890 Training loss: 2.29

Epoch 3/5  Iteration 360/890 Training loss: 2.0283 3.4525 sec/batch
Epoch 3/5  Iteration 361/890 Training loss: 2.0257 3.6025 sec/batch
Epoch 3/5  Iteration 362/890 Training loss: 2.0197 3.5139 sec/batch
Epoch 3/5  Iteration 363/890 Training loss: 2.0204 3.4590 sec/batch
Epoch 3/5  Iteration 364/890 Training loss: 2.0202 3.5112 sec/batch
Epoch 3/5  Iteration 365/890 Training loss: 2.0219 3.4472 sec/batch
Epoch 3/5  Iteration 366/890 Training loss: 2.0206 3.4476 sec/batch
Epoch 3/5  Iteration 367/890 Training loss: 2.0164 3.4634 sec/batch
Epoch 3/5  Iteration 368/890 Training loss: 2.0143 3.4368 sec/batch
Epoch 3/5  Iteration 369/890 Training loss: 2.0142 3.4897 sec/batch
Epoch 3/5  Iteration 370/890 Training loss: 2.0162 3.4660 sec/batch
Epoch 3/5  Iteration 371/890 Training loss: 2.0152 3.4607 sec/batch
Epoch 3/5  Iteration 372/890 Training loss: 2.0134 3.4660 sec/batch
Epoch 3/5  Iteration 373/890 Training loss: 2.0123 3.4991 sec/batch
Epoch 3/5  Iteration 374/890 Training loss: 2.01

Epoch 3/5  Iteration 481/890 Training loss: 1.9405 3.4560 sec/batch
Epoch 3/5  Iteration 482/890 Training loss: 1.9398 3.4361 sec/batch
Epoch 3/5  Iteration 483/890 Training loss: 1.9394 3.4790 sec/batch
Epoch 3/5  Iteration 484/890 Training loss: 1.9389 3.4014 sec/batch
Epoch 3/5  Iteration 485/890 Training loss: 1.9383 3.4141 sec/batch
Epoch 3/5  Iteration 486/890 Training loss: 1.9378 3.4489 sec/batch
Epoch 3/5  Iteration 487/890 Training loss: 1.9371 3.4249 sec/batch
Epoch 3/5  Iteration 488/890 Training loss: 1.9363 3.4229 sec/batch
Epoch 3/5  Iteration 489/890 Training loss: 1.9359 3.4471 sec/batch
Epoch 3/5  Iteration 490/890 Training loss: 1.9355 3.4540 sec/batch
Epoch 3/5  Iteration 491/890 Training loss: 1.9350 3.4460 sec/batch
Epoch 3/5  Iteration 492/890 Training loss: 1.9346 3.4302 sec/batch
Epoch 3/5  Iteration 493/890 Training loss: 1.9342 3.4226 sec/batch
Epoch 3/5  Iteration 494/890 Training loss: 1.9337 3.4768 sec/batch
Epoch 3/5  Iteration 495/890 Training loss: 1.93

Epoch 4/5  Iteration 602/890 Training loss: 1.8046 3.4833 sec/batch
Epoch 4/5  Iteration 603/890 Training loss: 1.8043 3.4704 sec/batch
Epoch 4/5  Iteration 604/890 Training loss: 1.8040 3.5504 sec/batch
Epoch 4/5  Iteration 605/890 Training loss: 1.8040 3.4399 sec/batch
Epoch 4/5  Iteration 606/890 Training loss: 1.8039 3.4375 sec/batch
Epoch 4/5  Iteration 607/890 Training loss: 1.8041 3.4286 sec/batch
Epoch 4/5  Iteration 608/890 Training loss: 1.8036 3.4377 sec/batch
Epoch 4/5  Iteration 609/890 Training loss: 1.8032 3.4990 sec/batch
Epoch 4/5  Iteration 610/890 Training loss: 1.8032 3.4469 sec/batch
Epoch 4/5  Iteration 611/890 Training loss: 1.8028 3.4833 sec/batch
Epoch 4/5  Iteration 612/890 Training loss: 1.8026 3.4671 sec/batch
Epoch 4/5  Iteration 613/890 Training loss: 1.8019 3.4895 sec/batch
Epoch 4/5  Iteration 614/890 Training loss: 1.8015 3.4421 sec/batch
Epoch 4/5  Iteration 615/890 Training loss: 1.8007 3.4658 sec/batch
Epoch 4/5  Iteration 616/890 Training loss: 1.80

Epoch 5/5  Iteration 723/890 Training loss: 1.7150 3.4530 sec/batch
Epoch 5/5  Iteration 724/890 Training loss: 1.7141 3.4967 sec/batch
Epoch 5/5  Iteration 725/890 Training loss: 1.7133 3.4827 sec/batch
Epoch 5/5  Iteration 726/890 Training loss: 1.7156 3.4498 sec/batch
Epoch 5/5  Iteration 727/890 Training loss: 1.7144 3.4865 sec/batch
Epoch 5/5  Iteration 728/890 Training loss: 1.7126 3.4829 sec/batch
Epoch 5/5  Iteration 729/890 Training loss: 1.7122 3.4396 sec/batch
Epoch 5/5  Iteration 730/890 Training loss: 1.7138 3.4386 sec/batch
Epoch 5/5  Iteration 731/890 Training loss: 1.7144 3.4702 sec/batch
Epoch 5/5  Iteration 732/890 Training loss: 1.7144 3.4598 sec/batch
Epoch 5/5  Iteration 733/890 Training loss: 1.7136 3.4548 sec/batch
Epoch 5/5  Iteration 734/890 Training loss: 1.7147 3.4798 sec/batch
Epoch 5/5  Iteration 735/890 Training loss: 1.7137 3.4798 sec/batch
Epoch 5/5  Iteration 736/890 Training loss: 1.7133 3.4582 sec/batch
Epoch 5/5  Iteration 737/890 Training loss: 1.71

Epoch 5/5  Iteration 844/890 Training loss: 1.6761 3.4780 sec/batch
Epoch 5/5  Iteration 845/890 Training loss: 1.6761 3.5717 sec/batch
Epoch 5/5  Iteration 846/890 Training loss: 1.6760 3.4580 sec/batch
Epoch 5/5  Iteration 847/890 Training loss: 1.6758 3.4898 sec/batch
Epoch 5/5  Iteration 848/890 Training loss: 1.6757 3.4669 sec/batch
Epoch 5/5  Iteration 849/890 Training loss: 1.6757 3.4457 sec/batch
Epoch 5/5  Iteration 850/890 Training loss: 1.6755 3.4496 sec/batch
Epoch 5/5  Iteration 851/890 Training loss: 1.6755 3.4570 sec/batch
Epoch 5/5  Iteration 852/890 Training loss: 1.6752 3.4667 sec/batch
Epoch 5/5  Iteration 853/890 Training loss: 1.6754 3.4787 sec/batch
Epoch 5/5  Iteration 854/890 Training loss: 1.6751 3.5031 sec/batch
Epoch 5/5  Iteration 855/890 Training loss: 1.6748 3.4523 sec/batch
Epoch 5/5  Iteration 856/890 Training loss: 1.6747 3.4622 sec/batch
Epoch 5/5  Iteration 857/890 Training loss: 1.6745 3.4721 sec/batch
Epoch 5/5  Iteration 858/890 Training loss: 1.67

Epoch 1/5  Iteration 76/890 Training loss: 3.2998 3.4567 sec/batch
Epoch 1/5  Iteration 77/890 Training loss: 3.2981 3.4824 sec/batch
Epoch 1/5  Iteration 78/890 Training loss: 3.2962 3.4639 sec/batch
Epoch 1/5  Iteration 79/890 Training loss: 3.2943 3.4314 sec/batch
Epoch 1/5  Iteration 80/890 Training loss: 3.2923 3.4638 sec/batch
Epoch 1/5  Iteration 81/890 Training loss: 3.2903 3.4580 sec/batch
Epoch 1/5  Iteration 82/890 Training loss: 3.2886 3.4189 sec/batch
Epoch 1/5  Iteration 83/890 Training loss: 3.2870 3.4631 sec/batch
Epoch 1/5  Iteration 84/890 Training loss: 3.2852 3.4761 sec/batch
Epoch 1/5  Iteration 85/890 Training loss: 3.2833 3.4670 sec/batch
Epoch 1/5  Iteration 86/890 Training loss: 3.2816 3.4249 sec/batch
Epoch 1/5  Iteration 87/890 Training loss: 3.2798 3.4643 sec/batch
Epoch 1/5  Iteration 88/890 Training loss: 3.2781 3.4542 sec/batch
Epoch 1/5  Iteration 89/890 Training loss: 3.2766 3.4838 sec/batch
Epoch 1/5  Iteration 90/890 Training loss: 3.2751 3.4702 sec/b

Epoch 2/5  Iteration 197/890 Training loss: 2.6151 3.4335 sec/batch
Epoch 2/5  Iteration 198/890 Training loss: 2.6118 3.4911 sec/batch
Epoch 2/5  Iteration 199/890 Training loss: 2.6095 3.4593 sec/batch
Epoch 2/5  Iteration 200/890 Training loss: 2.6086 3.4455 sec/batch
Epoch 2/5  Iteration 201/890 Training loss: 2.6068 3.4763 sec/batch
Epoch 2/5  Iteration 202/890 Training loss: 2.6047 3.4578 sec/batch
Epoch 2/5  Iteration 203/890 Training loss: 2.6024 3.4641 sec/batch
Epoch 2/5  Iteration 204/890 Training loss: 2.6006 3.4714 sec/batch
Epoch 2/5  Iteration 205/890 Training loss: 2.5984 3.4471 sec/batch
Epoch 2/5  Iteration 206/890 Training loss: 2.5964 3.4659 sec/batch
Epoch 2/5  Iteration 207/890 Training loss: 2.5948 3.4695 sec/batch
Epoch 2/5  Iteration 208/890 Training loss: 2.5931 3.4722 sec/batch
Epoch 2/5  Iteration 209/890 Training loss: 2.5920 3.4949 sec/batch
Epoch 2/5  Iteration 210/890 Training loss: 2.5898 3.4846 sec/batch
Epoch 2/5  Iteration 211/890 Training loss: 2.58

Epoch 2/5  Iteration 318/890 Training loss: 2.4502 3.4449 sec/batch
Epoch 2/5  Iteration 319/890 Training loss: 2.4494 3.4348 sec/batch
Epoch 2/5  Iteration 320/890 Training loss: 2.4484 3.4511 sec/batch
Epoch 2/5  Iteration 321/890 Training loss: 2.4475 3.4473 sec/batch
Epoch 2/5  Iteration 322/890 Training loss: 2.4465 3.4882 sec/batch
Epoch 2/5  Iteration 323/890 Training loss: 2.4456 3.4538 sec/batch
Epoch 2/5  Iteration 324/890 Training loss: 2.4449 3.4806 sec/batch
Epoch 2/5  Iteration 325/890 Training loss: 2.4440 3.4763 sec/batch
Epoch 2/5  Iteration 326/890 Training loss: 2.4433 3.4753 sec/batch
Epoch 2/5  Iteration 327/890 Training loss: 2.4423 3.4827 sec/batch
Epoch 2/5  Iteration 328/890 Training loss: 2.4413 3.4396 sec/batch
Epoch 2/5  Iteration 329/890 Training loss: 2.4406 3.4593 sec/batch
Epoch 2/5  Iteration 330/890 Training loss: 2.4399 3.4862 sec/batch
Epoch 2/5  Iteration 331/890 Training loss: 2.4391 3.4616 sec/batch
Epoch 2/5  Iteration 332/890 Training loss: 2.43

Epoch 3/5  Iteration 439/890 Training loss: 2.2046 3.4618 sec/batch
Epoch 3/5  Iteration 440/890 Training loss: 2.2040 3.4340 sec/batch
Epoch 3/5  Iteration 441/890 Training loss: 2.2030 3.4711 sec/batch
Epoch 3/5  Iteration 442/890 Training loss: 2.2022 3.4494 sec/batch
Epoch 3/5  Iteration 443/890 Training loss: 2.2017 3.4575 sec/batch
Epoch 3/5  Iteration 444/890 Training loss: 2.2010 3.4917 sec/batch
Epoch 3/5  Iteration 445/890 Training loss: 2.2001 3.4520 sec/batch
Epoch 3/5  Iteration 446/890 Training loss: 2.1996 3.4697 sec/batch
Epoch 3/5  Iteration 447/890 Training loss: 2.1989 3.4986 sec/batch
Epoch 3/5  Iteration 448/890 Training loss: 2.1982 3.4886 sec/batch
Epoch 3/5  Iteration 449/890 Training loss: 2.1974 3.4535 sec/batch
Epoch 3/5  Iteration 450/890 Training loss: 2.1965 3.4655 sec/batch
Epoch 3/5  Iteration 451/890 Training loss: 2.1957 3.4667 sec/batch
Epoch 3/5  Iteration 452/890 Training loss: 2.1949 3.4154 sec/batch
Epoch 3/5  Iteration 453/890 Training loss: 2.19

Epoch 4/5  Iteration 560/890 Training loss: 2.0476 3.4612 sec/batch
Epoch 4/5  Iteration 561/890 Training loss: 2.0463 3.4651 sec/batch
Epoch 4/5  Iteration 562/890 Training loss: 2.0466 3.4512 sec/batch
Epoch 4/5  Iteration 563/890 Training loss: 2.0472 3.4897 sec/batch
Epoch 4/5  Iteration 564/890 Training loss: 2.0475 3.4545 sec/batch
Epoch 4/5  Iteration 565/890 Training loss: 2.0469 3.4633 sec/batch
Epoch 4/5  Iteration 566/890 Training loss: 2.0464 3.4620 sec/batch
Epoch 4/5  Iteration 567/890 Training loss: 2.0460 3.4611 sec/batch
Epoch 4/5  Iteration 568/890 Training loss: 2.0464 3.4701 sec/batch
Epoch 4/5  Iteration 569/890 Training loss: 2.0457 3.4585 sec/batch
Epoch 4/5  Iteration 570/890 Training loss: 2.0451 3.4979 sec/batch
Epoch 4/5  Iteration 571/890 Training loss: 2.0447 3.5069 sec/batch
Epoch 4/5  Iteration 572/890 Training loss: 2.0432 3.4454 sec/batch
Epoch 4/5  Iteration 573/890 Training loss: 2.0417 3.4931 sec/batch
Epoch 4/5  Iteration 574/890 Training loss: 2.04

Epoch 4/5  Iteration 681/890 Training loss: 1.9942 3.4421 sec/batch
Epoch 4/5  Iteration 682/890 Training loss: 1.9940 3.4715 sec/batch
Epoch 4/5  Iteration 683/890 Training loss: 1.9937 3.4189 sec/batch
Epoch 4/5  Iteration 684/890 Training loss: 1.9932 3.4382 sec/batch
Epoch 4/5  Iteration 685/890 Training loss: 1.9929 3.4799 sec/batch
Epoch 4/5  Iteration 686/890 Training loss: 1.9928 3.4520 sec/batch
Epoch 4/5  Iteration 687/890 Training loss: 1.9925 3.4507 sec/batch
Epoch 4/5  Iteration 688/890 Training loss: 1.9923 3.4845 sec/batch
Epoch 4/5  Iteration 689/890 Training loss: 1.9920 3.4639 sec/batch
Epoch 4/5  Iteration 690/890 Training loss: 1.9916 3.4651 sec/batch
Epoch 4/5  Iteration 691/890 Training loss: 1.9913 3.4466 sec/batch
Epoch 4/5  Iteration 692/890 Training loss: 1.9909 3.4798 sec/batch
Epoch 4/5  Iteration 693/890 Training loss: 1.9904 3.4843 sec/batch
Epoch 4/5  Iteration 694/890 Training loss: 1.9903 3.4219 sec/batch
Epoch 4/5  Iteration 695/890 Training loss: 1.99

Epoch 5/5  Iteration 802/890 Training loss: 1.8958 3.4534 sec/batch
Epoch 5/5  Iteration 803/890 Training loss: 1.8954 3.4537 sec/batch
Epoch 5/5  Iteration 804/890 Training loss: 1.8951 3.4608 sec/batch
Epoch 5/5  Iteration 805/890 Training loss: 1.8944 3.4465 sec/batch
Epoch 5/5  Iteration 806/890 Training loss: 1.8940 3.4763 sec/batch
Epoch 5/5  Iteration 807/890 Training loss: 1.8935 3.4659 sec/batch
Epoch 5/5  Iteration 808/890 Training loss: 1.8932 3.4239 sec/batch
Epoch 5/5  Iteration 809/890 Training loss: 1.8930 3.4288 sec/batch
Epoch 5/5  Iteration 810/890 Training loss: 1.8924 3.4516 sec/batch
Epoch 5/5  Iteration 811/890 Training loss: 1.8919 3.4855 sec/batch
Epoch 5/5  Iteration 812/890 Training loss: 1.8913 3.4201 sec/batch
Epoch 5/5  Iteration 813/890 Training loss: 1.8911 3.4282 sec/batch
Epoch 5/5  Iteration 814/890 Training loss: 1.8909 3.4274 sec/batch
Epoch 5/5  Iteration 815/890 Training loss: 1.8904 3.4567 sec/batch
Epoch 5/5  Iteration 816/890 Training loss: 1.88

Epoch 1/5  Iteration 34/890 Training loss: 3.3408 4.8074 sec/batch
Epoch 1/5  Iteration 35/890 Training loss: 3.3335 4.8014 sec/batch
Epoch 1/5  Iteration 36/890 Training loss: 3.3272 4.7420 sec/batch
Epoch 1/5  Iteration 37/890 Training loss: 3.3202 4.7537 sec/batch
Epoch 1/5  Iteration 38/890 Training loss: 3.3136 4.7054 sec/batch
Epoch 1/5  Iteration 39/890 Training loss: 3.3071 4.7457 sec/batch
Epoch 1/5  Iteration 40/890 Training loss: 3.3011 4.7525 sec/batch
Epoch 1/5  Iteration 41/890 Training loss: 3.2951 4.7454 sec/batch
Epoch 1/5  Iteration 42/890 Training loss: 3.2895 4.7758 sec/batch
Epoch 1/5  Iteration 43/890 Training loss: 3.2839 4.7265 sec/batch
Epoch 1/5  Iteration 44/890 Training loss: 3.2784 4.7977 sec/batch
Epoch 1/5  Iteration 45/890 Training loss: 3.2731 4.7419 sec/batch
Epoch 1/5  Iteration 46/890 Training loss: 3.2681 4.7245 sec/batch
Epoch 1/5  Iteration 47/890 Training loss: 3.2634 4.8576 sec/batch
Epoch 1/5  Iteration 48/890 Training loss: 3.2589 4.7359 sec/b

Epoch 1/5  Iteration 156/890 Training loss: 2.7860 4.7668 sec/batch
Epoch 1/5  Iteration 157/890 Training loss: 2.7826 4.7543 sec/batch
Epoch 1/5  Iteration 158/890 Training loss: 2.7793 4.8170 sec/batch
Epoch 1/5  Iteration 159/890 Training loss: 2.7760 4.7229 sec/batch
Epoch 1/5  Iteration 160/890 Training loss: 2.7730 4.7438 sec/batch
Epoch 1/5  Iteration 161/890 Training loss: 2.7699 4.7088 sec/batch
Epoch 1/5  Iteration 162/890 Training loss: 2.7667 4.7067 sec/batch
Epoch 1/5  Iteration 163/890 Training loss: 2.7635 4.7727 sec/batch
Epoch 1/5  Iteration 164/890 Training loss: 2.7604 4.7455 sec/batch
Epoch 1/5  Iteration 165/890 Training loss: 2.7574 4.6935 sec/batch
Epoch 1/5  Iteration 166/890 Training loss: 2.7544 4.7830 sec/batch
Epoch 1/5  Iteration 167/890 Training loss: 2.7514 4.7810 sec/batch
Epoch 1/5  Iteration 168/890 Training loss: 2.7484 4.7533 sec/batch
Epoch 1/5  Iteration 169/890 Training loss: 2.7455 4.7904 sec/batch
Epoch 1/5  Iteration 170/890 Training loss: 2.74

Epoch 2/5  Iteration 277/890 Training loss: 2.1172 4.7527 sec/batch
Epoch 2/5  Iteration 278/890 Training loss: 2.1158 4.7354 sec/batch
Epoch 2/5  Iteration 279/890 Training loss: 2.1150 4.6740 sec/batch
Epoch 2/5  Iteration 280/890 Training loss: 2.1141 4.7120 sec/batch
Epoch 2/5  Iteration 281/890 Training loss: 2.1129 4.7645 sec/batch
Epoch 2/5  Iteration 282/890 Training loss: 2.1119 4.7749 sec/batch
Epoch 2/5  Iteration 283/890 Training loss: 2.1108 4.7754 sec/batch
Epoch 2/5  Iteration 284/890 Training loss: 2.1098 4.7352 sec/batch
Epoch 2/5  Iteration 285/890 Training loss: 2.1089 4.8185 sec/batch
Epoch 2/5  Iteration 286/890 Training loss: 2.1082 4.7315 sec/batch
Epoch 2/5  Iteration 287/890 Training loss: 2.1074 4.7363 sec/batch
Epoch 2/5  Iteration 288/890 Training loss: 2.1064 4.7284 sec/batch
Epoch 2/5  Iteration 289/890 Training loss: 2.1056 4.8074 sec/batch
Epoch 2/5  Iteration 290/890 Training loss: 2.1046 4.7386 sec/batch
Epoch 2/5  Iteration 291/890 Training loss: 2.10

Epoch 3/5  Iteration 398/890 Training loss: 1.8985 4.7575 sec/batch
Epoch 3/5  Iteration 399/890 Training loss: 1.8974 4.7087 sec/batch
Epoch 3/5  Iteration 400/890 Training loss: 1.8959 4.8626 sec/batch
Epoch 3/5  Iteration 401/890 Training loss: 1.8956 4.7037 sec/batch
Epoch 3/5  Iteration 402/890 Training loss: 1.8937 4.6965 sec/batch
Epoch 3/5  Iteration 403/890 Training loss: 1.8931 4.7123 sec/batch
Epoch 3/5  Iteration 404/890 Training loss: 1.8921 4.7064 sec/batch
Epoch 3/5  Iteration 405/890 Training loss: 1.8913 4.6749 sec/batch
Epoch 3/5  Iteration 406/890 Training loss: 1.8914 4.7362 sec/batch
Epoch 3/5  Iteration 407/890 Training loss: 1.8905 4.8177 sec/batch
Epoch 3/5  Iteration 408/890 Training loss: 1.8905 4.7123 sec/batch
Epoch 3/5  Iteration 409/890 Training loss: 1.8899 4.7358 sec/batch
Epoch 3/5  Iteration 410/890 Training loss: 1.8890 4.7397 sec/batch
Epoch 3/5  Iteration 411/890 Training loss: 1.8882 4.7615 sec/batch
Epoch 3/5  Iteration 412/890 Training loss: 1.88

Epoch 3/5  Iteration 519/890 Training loss: 1.8308 4.7571 sec/batch
Epoch 3/5  Iteration 520/890 Training loss: 1.8304 4.7583 sec/batch
Epoch 3/5  Iteration 521/890 Training loss: 1.8300 4.8179 sec/batch
Epoch 3/5  Iteration 522/890 Training loss: 1.8296 4.8261 sec/batch
Epoch 3/5  Iteration 523/890 Training loss: 1.8293 4.7799 sec/batch
Epoch 3/5  Iteration 524/890 Training loss: 1.8292 4.7880 sec/batch
Epoch 3/5  Iteration 525/890 Training loss: 1.8287 4.7234 sec/batch
Epoch 3/5  Iteration 526/890 Training loss: 1.8283 4.9020 sec/batch
Epoch 3/5  Iteration 527/890 Training loss: 1.8278 4.7642 sec/batch
Epoch 3/5  Iteration 528/890 Training loss: 1.8273 4.7271 sec/batch
Epoch 3/5  Iteration 529/890 Training loss: 1.8269 4.7058 sec/batch
Epoch 3/5  Iteration 530/890 Training loss: 1.8265 4.7527 sec/batch
Epoch 3/5  Iteration 531/890 Training loss: 1.8261 4.7245 sec/batch
Epoch 3/5  Iteration 532/890 Training loss: 1.8257 4.7682 sec/batch
Epoch 3/5  Iteration 533/890 Training loss: 1.82

Epoch 4/5  Iteration 640/890 Training loss: 1.7008 4.7290 sec/batch
Epoch 4/5  Iteration 641/890 Training loss: 1.7005 4.7117 sec/batch
Epoch 4/5  Iteration 642/890 Training loss: 1.7002 4.7719 sec/batch
Epoch 4/5  Iteration 643/890 Training loss: 1.7000 4.7509 sec/batch
Epoch 4/5  Iteration 644/890 Training loss: 1.6997 4.7601 sec/batch
Epoch 4/5  Iteration 645/890 Training loss: 1.6993 4.7073 sec/batch
Epoch 4/5  Iteration 646/890 Training loss: 1.6990 4.7888 sec/batch
Epoch 4/5  Iteration 647/890 Training loss: 1.6986 4.7203 sec/batch
Epoch 4/5  Iteration 648/890 Training loss: 1.6982 4.8343 sec/batch
Epoch 4/5  Iteration 649/890 Training loss: 1.6978 4.7644 sec/batch
Epoch 4/5  Iteration 650/890 Training loss: 1.6971 4.7324 sec/batch
Epoch 4/5  Iteration 651/890 Training loss: 1.6969 4.7543 sec/batch
Epoch 4/5  Iteration 652/890 Training loss: 1.6967 4.8386 sec/batch
Epoch 4/5  Iteration 653/890 Training loss: 1.6963 4.7772 sec/batch
Epoch 4/5  Iteration 654/890 Training loss: 1.69

Epoch 5/5  Iteration 761/890 Training loss: 1.6080 4.7633 sec/batch
Epoch 5/5  Iteration 762/890 Training loss: 1.6085 4.7410 sec/batch
Epoch 5/5  Iteration 763/890 Training loss: 1.6079 4.7667 sec/batch
Epoch 5/5  Iteration 764/890 Training loss: 1.6084 4.6866 sec/batch
Epoch 5/5  Iteration 765/890 Training loss: 1.6081 4.7084 sec/batch
Epoch 5/5  Iteration 766/890 Training loss: 1.6080 4.7970 sec/batch
Epoch 5/5  Iteration 767/890 Training loss: 1.6076 4.7533 sec/batch
Epoch 5/5  Iteration 768/890 Training loss: 1.6076 4.7379 sec/batch
Epoch 5/5  Iteration 769/890 Training loss: 1.6078 4.7765 sec/batch
Epoch 5/5  Iteration 770/890 Training loss: 1.6073 4.7712 sec/batch
Epoch 5/5  Iteration 771/890 Training loss: 1.6065 4.6997 sec/batch
Epoch 5/5  Iteration 772/890 Training loss: 1.6070 4.6994 sec/batch
Epoch 5/5  Iteration 773/890 Training loss: 1.6067 4.7179 sec/batch
Epoch 5/5  Iteration 774/890 Training loss: 1.6073 4.8238 sec/batch
Epoch 5/5  Iteration 775/890 Training loss: 1.60

Epoch 5/5  Iteration 882/890 Training loss: 1.5790 4.7521 sec/batch
Epoch 5/5  Iteration 883/890 Training loss: 1.5787 4.7443 sec/batch
Epoch 5/5  Iteration 884/890 Training loss: 1.5784 4.7444 sec/batch
Epoch 5/5  Iteration 885/890 Training loss: 1.5783 4.7872 sec/batch
Epoch 5/5  Iteration 886/890 Training loss: 1.5781 4.8078 sec/batch
Epoch 5/5  Iteration 887/890 Training loss: 1.5780 4.8406 sec/batch
Epoch 5/5  Iteration 888/890 Training loss: 1.5777 4.7878 sec/batch
Epoch 5/5  Iteration 889/890 Training loss: 1.5773 4.8254 sec/batch
Epoch 5/5  Iteration 890/890 Training loss: 1.5773 4.7794 sec/batch
Epoch 1/5  Iteration 1/890 Training loss: 4.4183 4.7732 sec/batch
Epoch 1/5  Iteration 2/890 Training loss: 4.3923 4.7402 sec/batch
Epoch 1/5  Iteration 3/890 Training loss: 4.3504 4.7250 sec/batch
Epoch 1/5  Iteration 4/890 Training loss: 4.2205 4.7543 sec/batch
Epoch 1/5  Iteration 5/890 Training loss: 4.1182 4.8084 sec/batch
Epoch 1/5  Iteration 6/890 Training loss: 4.0272 4.8227 se

Epoch 1/5  Iteration 115/890 Training loss: 3.0716 4.7582 sec/batch
Epoch 1/5  Iteration 116/890 Training loss: 3.0678 4.7973 sec/batch
Epoch 1/5  Iteration 117/890 Training loss: 3.0641 4.7332 sec/batch
Epoch 1/5  Iteration 118/890 Training loss: 3.0606 4.8262 sec/batch
Epoch 1/5  Iteration 119/890 Training loss: 3.0572 4.7549 sec/batch
Epoch 1/5  Iteration 120/890 Training loss: 3.0536 4.8093 sec/batch
Epoch 1/5  Iteration 121/890 Training loss: 3.0501 4.7584 sec/batch
Epoch 1/5  Iteration 122/890 Training loss: 3.0465 4.7733 sec/batch
Epoch 1/5  Iteration 123/890 Training loss: 3.0428 4.8367 sec/batch
Epoch 1/5  Iteration 124/890 Training loss: 3.0392 4.8077 sec/batch
Epoch 1/5  Iteration 125/890 Training loss: 3.0356 4.7498 sec/batch
Epoch 1/5  Iteration 126/890 Training loss: 3.0318 4.8027 sec/batch
Epoch 1/5  Iteration 127/890 Training loss: 3.0282 4.7958 sec/batch
Epoch 1/5  Iteration 128/890 Training loss: 3.0248 4.6869 sec/batch
Epoch 1/5  Iteration 129/890 Training loss: 3.02

Epoch 2/5  Iteration 236/890 Training loss: 2.3155 4.8401 sec/batch
Epoch 2/5  Iteration 237/890 Training loss: 2.3143 4.7868 sec/batch
Epoch 2/5  Iteration 238/890 Training loss: 2.3138 4.7956 sec/batch
Epoch 2/5  Iteration 239/890 Training loss: 2.3126 4.7105 sec/batch
Epoch 2/5  Iteration 240/890 Training loss: 2.3118 4.9691 sec/batch
Epoch 2/5  Iteration 241/890 Training loss: 2.3114 4.7219 sec/batch
Epoch 2/5  Iteration 242/890 Training loss: 2.3105 4.8208 sec/batch
Epoch 2/5  Iteration 243/890 Training loss: 2.3094 4.7467 sec/batch
Epoch 2/5  Iteration 244/890 Training loss: 2.3088 4.7418 sec/batch
Epoch 2/5  Iteration 245/890 Training loss: 2.3078 4.8093 sec/batch
Epoch 2/5  Iteration 246/890 Training loss: 2.3066 4.8606 sec/batch
Epoch 2/5  Iteration 247/890 Training loss: 2.3055 4.7553 sec/batch
Epoch 2/5  Iteration 248/890 Training loss: 2.3046 4.7652 sec/batch
Epoch 2/5  Iteration 249/890 Training loss: 2.3039 4.7652 sec/batch
Epoch 2/5  Iteration 250/890 Training loss: 2.30

Epoch 3/5  Iteration 357/890 Training loss: 2.2124 4.7956 sec/batch
Epoch 3/5  Iteration 358/890 Training loss: 2.1549 4.7534 sec/batch
Epoch 3/5  Iteration 359/890 Training loss: 2.1394 4.7040 sec/batch
Epoch 3/5  Iteration 360/890 Training loss: 2.1322 4.7273 sec/batch
Epoch 3/5  Iteration 361/890 Training loss: 2.1284 4.8243 sec/batch
Epoch 3/5  Iteration 362/890 Training loss: 2.1213 4.7804 sec/batch
Epoch 3/5  Iteration 363/890 Training loss: 2.1217 4.8128 sec/batch
Epoch 3/5  Iteration 364/890 Training loss: 2.1221 5.7458 sec/batch
Epoch 3/5  Iteration 365/890 Training loss: 2.1238 4.8548 sec/batch
Epoch 3/5  Iteration 366/890 Training loss: 2.1228 4.7977 sec/batch
Epoch 3/5  Iteration 367/890 Training loss: 2.1200 4.7778 sec/batch
Epoch 3/5  Iteration 368/890 Training loss: 2.1180 4.7091 sec/batch
Epoch 3/5  Iteration 369/890 Training loss: 2.1177 4.7957 sec/batch
Epoch 3/5  Iteration 370/890 Training loss: 2.1194 4.7552 sec/batch
Epoch 3/5  Iteration 371/890 Training loss: 2.11

Epoch 3/5  Iteration 478/890 Training loss: 2.0634 4.7771 sec/batch
Epoch 3/5  Iteration 479/890 Training loss: 2.0628 4.8070 sec/batch
Epoch 3/5  Iteration 480/890 Training loss: 2.0626 4.7342 sec/batch
Epoch 3/5  Iteration 481/890 Training loss: 2.0621 4.7220 sec/batch
Epoch 3/5  Iteration 482/890 Training loss: 2.0614 4.7495 sec/batch
Epoch 3/5  Iteration 483/890 Training loss: 2.0611 4.8070 sec/batch
Epoch 3/5  Iteration 484/890 Training loss: 2.0608 4.6928 sec/batch
Epoch 3/5  Iteration 485/890 Training loss: 2.0604 4.7274 sec/batch
Epoch 3/5  Iteration 486/890 Training loss: 2.0600 4.7059 sec/batch
Epoch 3/5  Iteration 487/890 Training loss: 2.0594 4.8139 sec/batch
Epoch 3/5  Iteration 488/890 Training loss: 2.0588 4.7012 sec/batch
Epoch 3/5  Iteration 489/890 Training loss: 2.0584 4.8240 sec/batch
Epoch 3/5  Iteration 490/890 Training loss: 2.0580 4.7220 sec/batch
Epoch 3/5  Iteration 491/890 Training loss: 2.0577 4.8501 sec/batch
Epoch 3/5  Iteration 492/890 Training loss: 2.05

Epoch 4/5  Iteration 599/890 Training loss: 1.9511 4.7455 sec/batch
Epoch 4/5  Iteration 600/890 Training loss: 1.9510 4.8009 sec/batch
Epoch 4/5  Iteration 601/890 Training loss: 1.9507 4.8061 sec/batch
Epoch 4/5  Iteration 602/890 Training loss: 1.9499 4.6944 sec/batch
Epoch 4/5  Iteration 603/890 Training loss: 1.9493 4.7136 sec/batch
Epoch 4/5  Iteration 604/890 Training loss: 1.9490 4.8879 sec/batch
Epoch 4/5  Iteration 605/890 Training loss: 1.9491 4.8236 sec/batch
Epoch 4/5  Iteration 606/890 Training loss: 1.9489 4.7173 sec/batch
Epoch 4/5  Iteration 607/890 Training loss: 1.9490 4.8099 sec/batch
Epoch 4/5  Iteration 608/890 Training loss: 1.9483 4.7900 sec/batch
Epoch 4/5  Iteration 609/890 Training loss: 1.9479 4.7299 sec/batch
Epoch 4/5  Iteration 610/890 Training loss: 1.9481 4.6851 sec/batch
Epoch 4/5  Iteration 611/890 Training loss: 1.9478 4.7317 sec/batch
Epoch 4/5  Iteration 612/890 Training loss: 1.9477 4.8161 sec/batch
Epoch 4/5  Iteration 613/890 Training loss: 1.94

Epoch 5/5  Iteration 720/890 Training loss: 1.8541 4.7299 sec/batch
Epoch 5/5  Iteration 721/890 Training loss: 1.8568 4.7838 sec/batch
Epoch 5/5  Iteration 722/890 Training loss: 1.8550 4.7685 sec/batch
Epoch 5/5  Iteration 723/890 Training loss: 1.8523 4.7219 sec/batch
Epoch 5/5  Iteration 724/890 Training loss: 1.8498 4.7697 sec/batch
Epoch 5/5  Iteration 725/890 Training loss: 1.8497 4.7419 sec/batch
Epoch 5/5  Iteration 726/890 Training loss: 1.8521 4.8079 sec/batch
Epoch 5/5  Iteration 727/890 Training loss: 1.8515 4.7164 sec/batch
Epoch 5/5  Iteration 728/890 Training loss: 1.8500 4.8274 sec/batch
Epoch 5/5  Iteration 729/890 Training loss: 1.8494 4.7285 sec/batch
Epoch 5/5  Iteration 730/890 Training loss: 1.8517 4.7016 sec/batch
Epoch 5/5  Iteration 731/890 Training loss: 1.8516 4.7349 sec/batch
Epoch 5/5  Iteration 732/890 Training loss: 1.8523 4.6887 sec/batch
Epoch 5/5  Iteration 733/890 Training loss: 1.8515 4.7309 sec/batch
Epoch 5/5  Iteration 734/890 Training loss: 1.85

Epoch 5/5  Iteration 841/890 Training loss: 1.8130 4.7455 sec/batch
Epoch 5/5  Iteration 842/890 Training loss: 1.8126 4.7977 sec/batch
Epoch 5/5  Iteration 843/890 Training loss: 1.8121 4.8860 sec/batch
Epoch 5/5  Iteration 844/890 Training loss: 1.8116 4.7329 sec/batch
Epoch 5/5  Iteration 845/890 Training loss: 1.8113 4.7373 sec/batch
Epoch 5/5  Iteration 846/890 Training loss: 1.8111 4.7417 sec/batch
Epoch 5/5  Iteration 847/890 Training loss: 1.8109 4.6978 sec/batch
Epoch 5/5  Iteration 848/890 Training loss: 1.8107 4.7378 sec/batch
Epoch 5/5  Iteration 849/890 Training loss: 1.8107 4.8198 sec/batch
Epoch 5/5  Iteration 850/890 Training loss: 1.8105 4.8519 sec/batch
Epoch 5/5  Iteration 851/890 Training loss: 1.8105 4.7890 sec/batch
Epoch 5/5  Iteration 852/890 Training loss: 1.8102 4.9818 sec/batch
Epoch 5/5  Iteration 853/890 Training loss: 1.8102 5.1345 sec/batch
Epoch 5/5  Iteration 854/890 Training loss: 1.8099 7.6409 sec/batch
Epoch 5/5  Iteration 855/890 Training loss: 1.80

Epoch 1/5  Iteration 72/890 Training loss: 3.2292 11.8643 sec/batch
Epoch 1/5  Iteration 73/890 Training loss: 3.2257 11.5208 sec/batch
Epoch 1/5  Iteration 74/890 Training loss: 3.2221 11.5343 sec/batch
Epoch 1/5  Iteration 75/890 Training loss: 3.2185 11.6628 sec/batch
Epoch 1/5  Iteration 76/890 Training loss: 3.2150 11.5345 sec/batch
Epoch 1/5  Iteration 77/890 Training loss: 3.2113 11.5999 sec/batch
Epoch 1/5  Iteration 78/890 Training loss: 3.2078 11.6216 sec/batch
Epoch 1/5  Iteration 79/890 Training loss: 3.2040 11.4717 sec/batch
Epoch 1/5  Iteration 80/890 Training loss: 3.2000 11.4812 sec/batch
Epoch 1/5  Iteration 81/890 Training loss: 3.1962 11.5479 sec/batch
Epoch 1/5  Iteration 82/890 Training loss: 3.1923 11.8010 sec/batch
Epoch 1/5  Iteration 83/890 Training loss: 3.1885 11.5663 sec/batch
Epoch 1/5  Iteration 84/890 Training loss: 3.1845 11.7164 sec/batch
Epoch 1/5  Iteration 85/890 Training loss: 3.1801 11.7271 sec/batch
Epoch 1/5  Iteration 86/890 Training loss: 3.176

Epoch 2/5  Iteration 192/890 Training loss: 2.1931 11.6471 sec/batch
Epoch 2/5  Iteration 193/890 Training loss: 2.1910 11.6412 sec/batch
Epoch 2/5  Iteration 194/890 Training loss: 2.1885 11.6668 sec/batch
Epoch 2/5  Iteration 195/890 Training loss: 2.1862 11.9721 sec/batch
Epoch 2/5  Iteration 196/890 Training loss: 2.1869 11.5848 sec/batch
Epoch 2/5  Iteration 197/890 Training loss: 2.1854 11.5721 sec/batch
Epoch 2/5  Iteration 198/890 Training loss: 2.1829 11.6676 sec/batch
Epoch 2/5  Iteration 199/890 Training loss: 2.1808 11.6062 sec/batch
Epoch 2/5  Iteration 200/890 Training loss: 2.1809 11.6629 sec/batch
Epoch 2/5  Iteration 201/890 Training loss: 2.1784 11.7844 sec/batch
Epoch 2/5  Iteration 202/890 Training loss: 2.1755 11.7025 sec/batch
Epoch 2/5  Iteration 203/890 Training loss: 2.1734 11.5491 sec/batch
Epoch 2/5  Iteration 204/890 Training loss: 2.1709 11.6519 sec/batch
Epoch 2/5  Iteration 205/890 Training loss: 2.1682 11.7793 sec/batch
Epoch 2/5  Iteration 206/890 Train

Epoch 2/5  Iteration 311/890 Training loss: 2.0127 11.8114 sec/batch
Epoch 2/5  Iteration 312/890 Training loss: 2.0117 11.6770 sec/batch
Epoch 2/5  Iteration 313/890 Training loss: 2.0106 11.5499 sec/batch
Epoch 2/5  Iteration 314/890 Training loss: 2.0095 11.6179 sec/batch
Epoch 2/5  Iteration 315/890 Training loss: 2.0085 11.6836 sec/batch
Epoch 2/5  Iteration 316/890 Training loss: 2.0075 11.5563 sec/batch
Epoch 2/5  Iteration 317/890 Training loss: 2.0066 11.8022 sec/batch
Epoch 2/5  Iteration 318/890 Training loss: 2.0055 11.9776 sec/batch
Epoch 2/5  Iteration 319/890 Training loss: 2.0047 11.7246 sec/batch
Epoch 2/5  Iteration 320/890 Training loss: 2.0036 11.5758 sec/batch
Epoch 2/5  Iteration 321/890 Training loss: 2.0025 11.7774 sec/batch
Epoch 2/5  Iteration 322/890 Training loss: 2.0015 11.5681 sec/batch
Epoch 2/5  Iteration 323/890 Training loss: 2.0003 11.6130 sec/batch
Epoch 2/5  Iteration 324/890 Training loss: 1.9995 11.7342 sec/batch
Epoch 2/5  Iteration 325/890 Train

In [44]:
tf.train.get_checkpoint_state('checkpoints/anna')

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character, and the network predicts the next character. We then feed that prediction back in to predict the one after it, and keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us a probability for every character in the vocabulary. To reduce noise and make things a little less random, I'm only going to sample the next character from the top N most likely characters.



In [17]:
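def pick_top_n(preds, vocab_size, top_n=5):
    # Copy so the caller's prediction array is not modified in place
    p = np.squeeze(preds).copy()
    # argsort is ascending, so this zeroes everything except the top_n entries
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c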
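
As a quick check of what the top-N filtering does, here is a small sketch on a made-up distribution over a 6-character vocabulary. The helper name `top_n_distribution` and the numbers in `toy_preds` are assumptions for illustration only; they are not part of the notebook or the trained model.

```python
import numpy as np

def top_n_distribution(preds, top_n=5):
    """Zero all but the top_n probabilities and renormalize (hypothetical helper)."""
    # astype returns a copy by default, so the input array is left untouched
    p = np.squeeze(preds).astype(np.float64)
    # argsort is ascending: everything except the last top_n indices is dropped
    p[np.argsort(p)[:-top_n]] = 0
    return p / p.sum()

# Made-up softmax output with shape (1, vocab_size), like a single prediction
toy_preds = np.array([[0.05, 0.40, 0.10, 0.25, 0.15, 0.05]])
filtered = top_n_distribution(toy_preds, top_n=3)
print(filtered)  # only indices 1, 3 and 4 keep non-zero probability

# Sampling from the filtered distribution can only return one of the top 3 indices
rng = np.random.default_rng(0)
draws = rng.choice(len(filtered), size=1000, p=filtered)
print(np.unique(draws))
```

A smaller `top_n` makes the output more conservative; a larger one lets less likely characters through, at the cost of more noise.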

In [41]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        # Prime the network: feed the prime string one character at a time
        # so the LSTM state reflects it before we start generating
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state],
                                         feed_dict=feed)

        # First generated character, sampled from the prediction after the prime
        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        # Feed each sampled character back in, carrying the state forward
        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state],
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])

    return ''.join(samples)
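
The prime-then-generate loop can also be tried without TensorFlow. The sketch below swaps the LSTM for a bigram count table, so the only "state" carried forward is the previous character. Everything in it (`corpus`, `probs`, `toy_sample`) is a hypothetical stand-in for illustration, not the notebook's model, and it generates exactly `n_samples` characters after the prime.

```python
import numpy as np

# Tiny made-up corpus, adapted from the opening line of the book
corpus = "happy families are all alike; every unhappy family is unhappy in its own way."
vocab = sorted(set(corpus))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))

# Bigram counts with add-one smoothing, so every row is a valid distribution
counts = np.ones((len(vocab), len(vocab)))
for a, b in zip(corpus[:-1], corpus[1:]):
    counts[vocab_to_int[a], vocab_to_int[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def toy_sample(n_samples, prime="ha", top_n=3, seed=0):
    rng = np.random.default_rng(seed)
    samples = list(prime)
    # "Priming": walk through the prime; here the state is just the last character
    for ch in prime:
        c = vocab_to_int[ch]
    for _ in range(n_samples):
        # Keep only the top_n most likely next characters, then renormalize
        p = probs[c].copy()
        p[np.argsort(p)[:-top_n]] = 0
        p /= p.sum()
        # Sample the next character and feed it back in on the next step
        c = int(rng.choice(len(vocab), p=p))
        samples.append(int_to_vocab[c])
    return "".join(samples)

print(toy_sample(40))
```

The structure mirrors `sample` above: build up state from the prime, then repeatedly sample one character and use it as the next input.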

In [44]:
checkpoint = "checkpoints/anna/i3560_l512_1.122.ckpt"
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

Farlathit that if had so
like it that it were. He could not trouble to his wife, and there was
anything in them of the side of his weaky in the creature at his forteren
to him.

"What is it? I can't bread to those," said Stepan Arkadyevitch. "It's not
my children, and there is an almost this arm, true it mays already,
and tell you what I have say to you, and was not looking at the peasant,
why is, I don't know him out, and she doesn't speak to me immediately, as
you would say the countess and the more frest an angelembre, and time and
things's silent, but I was not in my stand that is in my head. But if he
say, and was so feeling with his soul. A child--in his soul of his
soul of his soul. He should not see that any of that sense of. Here he
had not been so composed and to speak for as in a whole picture, but
all the setting and her excellent and society, who had been delighted
and see to anywing had been being troed to thousand words on them,
we liked him.

That set in her money at th

In [43]:
checkpoint = "checkpoints/anna/i200_l512_2.432.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farnt him oste wha sorind thans tout thint asd an sesand an hires on thime sind thit aled, ban thand and out hore as the ter hos ton ho te that, was tis tart al the hand sostint him sore an tit an son thes, win he se ther san ther hher tas tarereng,.

Anl at an ades in ond hesiln, ad hhe torers teans, wast tar arering tho this sos alten sorer has hhas an siton ther him he had sin he ard ate te anling the sosin her ans and
arins asd and ther ale te tot an tand tanginge wath and ho ald, so sot th asend sat hare sother horesinnd, he hesense wing ante her so tith tir sherinn, anded and to the toul anderin he sorit he torsith she se atere an ting ot hand and thit hhe so the te wile har
ens ont in the sersise, and we he seres tar aterer, to ato tat or has he he wan ton here won and sen heren he sosering, to to theer oo adent har herere the wosh oute, was serild ward tous hed astend..

I's sint on alt in har tor tit her asd hade shithans ored he talereng an soredendere tim tot hees. Tise sor 

In [46]:
checkpoint = "checkpoints/anna/i600_l512_1.750.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Fard as astice her said he celatice of to seress in the raice, and to be the some and sere allats to that said to that the sark and a cast a the wither ald the pacinesse of her had astition, he said to the sount as she west at hissele. Af the cond it he was a fact onthis astisarianing.


"Or a ton to to be that's a more at aspestale as the sont of anstiring as
thours and trey.

The same wo dangring the
raterst, who sore and somethy had ast out an of his book. "We had's beane were that, and a morted a thay he had to tere. Then to
her homent andertersed his his ancouted to the pirsted, the soution for of the pirsice inthirgest and stenciol, with the hard and and
a colrice of to be oneres,
the song to this anderssad.
The could ounterss the said to serom of
soment a carsed of sheres of she
torded
har and want in their of hould, but
her told in that in he tad a the same to her. Serghing an her has and with the seed, and the camt ont his about of the
sail, the her then all houg ant or to hus

In [47]:
checkpoint = "checkpoints/anna/i1000_l512_1.484.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farrat, his felt has at it.

"When the pose ther hor exceed
to his sheant was," weat a sime of his sounsed. The coment and the facily that which had began terede a marilicaly whice whether the pose of his hand, at she was alligated herself the same on she had to
taiking to his forthing and streath how to hand
began in a lang at some at it, this he cholded not set all her. "Wo love that is setthing. Him anstering as seen that."

"Yes in the man that say the mare a crances is it?" said Sergazy Ivancatching. "You doon think were somether is ifficult of a mone of
though the most at the countes that the
mean on the come to say the most, to
his feesing of
a man she, whilo he
sained and well, that he would still at to said. He wind at his for the sore in the most
of hoss and almoved to see him. They have betine the sumper into at he his stire, and what he was that at the so steate of the
sound, and shin should have a geest of shall feet on the conderation to she had been at that imporsing the