# Anna KaRNNa

In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based on Andrej Karpathy's [post on RNNs](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) and [implementation in Torch](https://github.com/karpathy/char-rnn). It also draws on information [here at r2rt](http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html) and from [Sherjil Ozair](https://github.com/sherjilozair/char-rnn-tensorflow) on GitHub. Below is the general architecture of the character-wise RNN.

<img src="assets/charseq.jpeg" width="500">

In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf



First we'll load the text file and convert it into integers for our network to use.

In [2]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
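
The mapping above depends on the iteration order of a Python `set`, which for strings can differ from one interpreter run to the next, so the integer IDs printed in this notebook are specific to the run that produced them. Here is a minimal sketch of the same encode/decode round trip on a toy string. It uses `sorted(set(...))` to get a reproducible mapping, which is my assumption and not what the cell above does. The names `sample`, `vocab_sorted`, `encode`, and `decode` are illustrative only.

```python
sample = "happy families"  # toy stand-in for the book text

# sorted() fixes the character order, so the IDs are stable across runs
vocab_sorted = sorted(set(sample))
char_to_id = {c: i for i, c in enumerate(vocab_sorted)}
id_to_char = dict(enumerate(vocab_sorted))

def encode(s):
    """Map each character to its integer ID."""
    return [char_to_id[c] for c in s]

def decode(ids):
    """Map integer IDs back to a string."""
    return "".join(id_to_char[i] for i in ids)

encoded = encode(sample)
print(encoded)
print(decode(encoded) == sample)
```

Decoding the encoded IDs should give back the original string, which is the property the sampling code relies on when it turns predicted integers back into text.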

In [3]:
text[:100]

'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

In [4]:
chars[:100]

array([47, 71, 58, 15, 38, 69, 35, 42, 79, 68, 68, 68, 77, 58, 15, 15, 54,
       42, 22, 58, 10, 41,  7, 41, 69, 20, 42, 58, 35, 69, 42, 58,  7,  7,
       42, 58,  7, 41, 23, 69, 31, 42, 69, 24, 69, 35, 54, 42, 81, 33, 71,
       58, 15, 15, 54, 42, 22, 58, 10, 41,  7, 54, 42, 41, 20, 42, 81, 33,
       71, 58, 15, 15, 54, 42, 41, 33, 42, 41, 38, 20, 42, 66, 67, 33, 68,
       67, 58, 54, 25, 68, 68, 43, 24, 69, 35, 54, 38, 71, 41, 33], dtype=int32)

Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.

Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.

The idea here is to make a 2D matrix where the number of rows is equal to `batch_size`, the number of sequences in each batch. Each row will be one long contiguous slice of the character data. We'll split this data into a training set and validation set using the `split_frac` keyword. With the default `split_frac=0.9`, this keeps the first 90% of the batches in the training set and the remaining 10% in the validation set.

In [5]:
def split_data(chars, batch_size, num_steps, split_frac=0.9):
    """ 
    Split character data into training and validation sets, inputs and targets for each set.
    
    Arguments
    ---------
    chars: character array
    batch_size: Number of sequences in each batch
    num_steps: Number of sequence steps to keep in the input and pass to the network
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    slice_size = batch_size * num_steps
    n_batches = int(len(chars) / slice_size)
    
    # Drop the last few characters to make only full batches
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices, then stack them into a 2D matrix 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    
    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps
    
    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y
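
To make the row layout and the one-character shift concrete, here is a list-based sketch that mirrors `split_data` on a toy sequence. It uses plain Python lists instead of NumPy so it runs without dependencies, and the name `split_rows` and the toy sizes are my own choices for illustration.

```python
def split_rows(seq, batch_size, num_steps, split_frac=0.9):
    """List-based mirror of split_data, for illustration only."""
    slice_size = batch_size * num_steps
    n_batches = len(seq) // slice_size
    usable = n_batches * slice_size

    x = seq[:usable]
    y = seq[1:usable + 1]  # targets are the inputs shifted by one position

    # Cut the long sequence into batch_size contiguous rows
    row_len = n_batches * num_steps
    x_rows = [x[r * row_len:(r + 1) * row_len] for r in range(batch_size)]
    y_rows = [y[r * row_len:(r + 1) * row_len] for r in range(batch_size)]

    # The first split_frac of the batches (columns) go to training
    cut = int(n_batches * split_frac) * num_steps
    train = ([r[:cut] for r in x_rows], [r[:cut] for r in y_rows])
    val = ([r[cut:] for r in x_rows], [r[cut:] for r in y_rows])
    return train, val

(toy_train_x, toy_train_y), (toy_val_x, toy_val_y) = split_rows(
    list(range(25)), batch_size=2, num_steps=3, split_frac=0.75)

print(toy_train_x)
print(toy_train_y)
print(toy_val_x)
```

Each target row is its input row shifted by one element. The last target needs one element beyond the usable span, so the sequence has to be at least one element longer than `n_batches * slice_size`.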

In [6]:
train_x, train_y, val_x, val_y = split_data(chars, 10, 200)

In [7]:
train_x.shape

(10, 178400)

In [8]:
train_x[:,:10]

array([[47, 71, 58, 15, 38, 69, 35, 42, 79, 68],
       [73, 33,  9, 42, 71, 69, 42, 10, 66, 24],
       [42,  4, 58, 38,  4, 71, 41, 33,  2, 42],
       [66, 38, 71, 69, 35, 42, 67, 66, 81,  7],
       [42, 38, 71, 69, 42,  7, 58, 33,  9, 30],
       [42, 72, 71, 35, 66, 81,  2, 71, 42,  7],
       [38, 42, 38, 66, 68,  9, 66, 25, 68, 68],
       [66, 42, 71, 69, 35, 20, 69,  7, 22, 39],
       [71, 58, 38, 42, 41, 20, 42, 38, 71, 69],
       [69, 35, 20, 69,  7, 22, 42, 58, 33,  9]], dtype=int32)

I'll write another function to grab batches out of the arrays made by `split_data`. Here each batch will be a sliding window on these arrays with size `batch_size x num_steps`. For example, if we want our network to train on a sequence of 100 characters, `num_steps = 100`. For the next batch, we'll shift this window to the next sequence of `num_steps` characters. In this way we can feed batches to the network and the cell states will carry over from each batch to the next.

In [9]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    
    n_batches = int(slice_size/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]
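
As a quick illustration of the windowing, here is a list-based sketch of the same generator on two tiny arrays. The name `get_batch_lists` and the toy data are assumptions for this example, and it uses nested lists in place of NumPy slicing.

```python
def get_batch_lists(arrs, num_steps):
    """List-based mirror of get_batch: yield aligned windows from each array."""
    slice_size = len(arrs[0][0])
    n_batches = slice_size // num_steps
    for b in range(n_batches):
        yield [[row[b * num_steps:(b + 1) * num_steps] for row in arr]
               for arr in arrs]

toy_x = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
toy_y = [[1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]]

toy_batches = list(get_batch_lists([toy_x, toy_y], num_steps=3))
for x, y in toy_batches:
    print(x, y)
```

Row `r` of one batch picks up exactly where row `r` of the previous batch stopped, which is why it makes sense to pass the final LSTM state of one batch in as the initial state of the next.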

In [10]:
def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,
              learning_rate=0.001, grad_clip=5, sampling=False):
        
    if sampling:
        batch_size, num_steps = 1, 1

    tf.reset_default_graph()
    
    # Declare placeholders we'll feed into the graph
    with tf.name_scope('inputs'):
        inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
        x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    with tf.name_scope('targets'):
        targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
        y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
        y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])
    
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    # Build the RNN layers
    def build_cell(lstm_size, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        
        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop
    with tf.name_scope("RNN_cells"):
        cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)

    # Run the data through the RNN layers
    with tf.name_scope("RNN_forward"):
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=initial_state)
    
    final_state = state
    
    # Reshape output so it's a bunch of rows, one row for each cell output
    with tf.name_scope('sequence_reshape'):
        seq_output = tf.concat(outputs, axis=1,name='seq_output')
        output = tf.reshape(seq_output, [-1, lstm_size], name='graph_output')
    
    # Now connect the RNN outputs to a softmax layer and calculate the cost
    with tf.name_scope('logits'):
        softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1),
                               name='softmax_w')
        softmax_b = tf.Variable(tf.zeros(num_classes), name='softmax_b')
        logits = tf.matmul(output, softmax_w) + softmax_b
        tf.summary.histogram('softmax_w', softmax_w)
        tf.summary.histogram('softmax_b', softmax_b)

    with tf.name_scope('predictions'):
        preds = tf.nn.softmax(logits, name='predictions')
        tf.summary.histogram('predictions', preds)
    
    with tf.name_scope('cost'):
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')
        cost = tf.reduce_mean(loss, name='cost')
        tf.summary.scalar('cost', cost)

    # Optimizer for training, using gradient clipping to control exploding gradients
    with tf.name_scope('train'):
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
        train_op = tf.train.AdamOptimizer(learning_rate)
        optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    merged = tf.summary.merge_all()
    
    # Export the nodes 
    export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',
                    'keep_prob', 'cost', 'preds', 'optimizer', 'merged']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])
    
    return graph
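
The `train` scope above uses `tf.clip_by_global_norm`, which rescales all gradients together by `clip_norm / max(global_norm, clip_norm)`, where `global_norm` is the L2 norm over every gradient value. Below is a small pure-Python sketch of that rule on flat lists of numbers. The helper name `clip_by_global_norm_py` is hypothetical, and it is meant only to show the arithmetic, not to replace the TensorFlow op.

```python
import math

def clip_by_global_norm_py(grads, clip_norm):
    """Scale every gradient by one shared factor so the joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm

toy_grads = [[3.0, 4.0], [12.0]]  # global norm = sqrt(9 + 16 + 144) = 13
clipped, norm_before = clip_by_global_norm_py(toy_grads, clip_norm=5.0)
norm_after = math.sqrt(sum(g * g for grad in clipped for g in grad))

print(norm_before, norm_after)
```

Because every gradient is multiplied by the same factor, the direction of the overall update is preserved and only its length is capped. Gradients whose global norm is already below `grad_clip` pass through unchanged.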

## Hyperparameters

Here I'm defining the hyperparameters for the network. The two you probably haven't seen before are `lstm_size` and `num_layers`. These set the number of hidden units in each LSTM layer and the number of LSTM layers, respectively. Making these larger gives the network more capacity, which usually lowers the training loss, but you'll have to watch out for overfitting. If your validation loss is much larger than the training loss, you're probably overfitting. In that case, decrease the size of the network or decrease the dropout keep probability (that is, apply more dropout).
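
To get a feel for how `lstm_size` and `num_layers` change the size of the network, here is a rough parameter count. It assumes the standard LSTM layout (four gates, each with weights over the concatenated input and hidden state plus a bias) and a softmax layer on top. The vocabulary size of 83 is an assumption for illustration, since the notebook never prints `len(vocab)`, and the function names are my own.

```python
def lstm_layer_params(input_size, lstm_size):
    # Four gates, each with weights over [input, hidden] plus a bias vector
    return 4 * ((input_size + lstm_size) * lstm_size + lstm_size)

def network_params(num_classes, lstm_size, num_layers):
    total = 0
    input_size = num_classes  # the first layer sees one-hot characters
    for _ in range(num_layers):
        total += lstm_layer_params(input_size, lstm_size)
        input_size = lstm_size  # deeper layers see the previous layer's output
    total += lstm_size * num_classes + num_classes  # softmax weights and biases
    return total

assumed_num_classes = 83  # assumed vocabulary size, not taken from the notebook
for size in (128, 256, 512):
    print(size, network_params(assumed_num_classes, size, num_layers=2))
```

The count grows roughly with the square of `lstm_size`, so doubling the hidden size is a much bigger jump in capacity than adding one more layer of the same size.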

In [11]:
batch_size = 50
num_steps = 100
lstm_size = 256
num_layers = 2
learning_rate = 0.001

## Training

Time for training, which is pretty straightforward. Here I pass in some data and get an LSTM state back. Then I pass that state back into the network so the next batch can continue the state from the previous batch. Every so often (set by `save_every_n`) I calculate the validation loss and save a checkpoint.
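
Here is a toy sketch of why carrying the state between batches matters. The update rule below is a made-up stand-in for a recurrent cell (it is not an LSTM), chosen so the arithmetic is easy to follow. The names `run_cell`, `carried`, and `reset` are illustrative.

```python
def run_cell(xs, state):
    """Toy recurrent update (not an LSTM): new_state = 0.5 * state + x."""
    for x in xs:
        state = 0.5 * state + x
    return state

sequence = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Process the whole sequence in one pass
full = run_cell(sequence, state=0.0)

# Two windows of 3 steps, feeding the final state of the first into the second
carried = run_cell(sequence[:3], state=0.0)
carried = run_cell(sequence[3:], state=carried)

# Same second window, but with the state reset to zero first
reset = run_cell(sequence[3:], state=0.0)

print(full, carried, reset)
```

With the state carried over, processing the sequence in windows gives the same final state as processing it in one pass. Resetting the state at each window throws away what the earlier characters contributed.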

In [12]:
!mkdir -p checkpoints/anna

In [13]:
epochs = 10
save_every_n = 100
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)

model = build_rnn(len(vocab), 
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./logs/2/train', sess.graph)
    test_writer = tf.summary.FileWriter('./logs/2/test')
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/anna20.ckpt')
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):
        
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.merged, model.cost, 
                                                          model.final_state, model.optimizer], 
                                                          feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
            
            train_writer.add_summary(summary, iteration)
        
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance on the validation set; keep_prob is set to 1 to disable dropout
                val_loss = []
                # Use a separate state so validation doesn't overwrite the training state
                val_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: val_state}
                    summary, batch_loss, val_state = sess.run([model.merged, model.cost, 
                                                               model.final_state], feed_dict=feed)
                    val_loss.append(batch_loss)
                    
                test_writer.add_summary(summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                #saver.save(sess, "checkpoints/anna/i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))

Epoch 1/10  Iteration 1/3570 Training loss: 4.4170 0.3839 sec/batch
Epoch 1/10  Iteration 2/3570 Training loss: 4.3990 0.2020 sec/batch
Epoch 1/10  Iteration 3/3570 Training loss: 4.3707 0.1780 sec/batch
Epoch 1/10  Iteration 4/3570 Training loss: 4.3054 0.1586 sec/batch
Epoch 1/10  Iteration 5/3570 Training loss: 4.1990 0.1496 sec/batch
Epoch 1/10  Iteration 6/3570 Training loss: 4.1023 0.1474 sec/batch
Epoch 1/10  Iteration 7/3570 Training loss: 4.0215 0.1471 sec/batch
Epoch 1/10  Iteration 8/3570 Training loss: 3.9531 0.1493 sec/batch
Epoch 1/10  Iteration 9/3570 Training loss: 3.8966 0.1467 sec/batch
Epoch 1/10  Iteration 10/3570 Training loss: 3.8490 0.1485 sec/batch
Epoch 1/10  Iteration 11/3570 Training loss: 3.8068 0.1495 sec/batch
Epoch 1/10  Iteration 12/3570 Training loss: 3.7712 0.1499 sec/batch
Epoch 1/10  Iteration 13/3570 Training loss: 3.7375 0.1473 sec/batch
Epoch 1/10  Iteration 14/3570 Training loss: 3.7066 0.1486 sec/batch
...

Epoch 1/10  Iteration 120/3570 Training loss: 3.2013 0.1515 sec/batch
Epoch 1/10  Iteration 121/3570 Training loss: 3.1988 0.1474 sec/batch
Epoch 1/10  Iteration 122/3570 Training loss: 3.1965 0.1510 sec/batch
Epoch 1/10  Iteration 123/3570 Training loss: 3.1937 0.1481 sec/batch
Epoch 1/10  Iteration 124/3570 Training loss: 3.1911 0.1489 sec/batch
Epoch 1/10  Iteration 125/3570 Training loss: 3.1882 0.1498 sec/batch
Epoch 1/10  Iteration 126/3570 Training loss: 3.1858 0.1511 sec/batch
Epoch 1/10  Iteration 127/3570 Training loss: 3.1830 0.1509 sec/batch
Epoch 1/10  Iteration 128/3570 Training loss: 3.1803 0.1510 sec/batch
Epoch 1/10  Iteration 129/3570 Training loss: 3.1778 0.1458 sec/batch
Epoch 1/10  Iteration 130/3570 Training loss: 3.1751 0.1506 sec/batch
Epoch 1/10  Iteration 131/3570 Training loss: 3.1726 0.1480 sec/batch
Epoch 1/10  Iteration 132/3570 Training loss: 3.1700 0.1516 sec/batch
Epoch 1/10  Iteration 133/3570 Training loss: 3.1674 0.1477 sec/batch
...

Epoch 1/10  Iteration 238/3570 Training loss: 2.8986 0.1485 sec/batch
Epoch 1/10  Iteration 239/3570 Training loss: 2.8965 0.1487 sec/batch
Epoch 1/10  Iteration 240/3570 Training loss: 2.8945 0.1473 sec/batch
Epoch 1/10  Iteration 241/3570 Training loss: 2.8925 0.1472 sec/batch
Epoch 1/10  Iteration 242/3570 Training loss: 2.8906 0.1497 sec/batch
Epoch 1/10  Iteration 243/3570 Training loss: 2.8885 0.1474 sec/batch
Epoch 1/10  Iteration 244/3570 Training loss: 2.8867 0.1620 sec/batch
Epoch 1/10  Iteration 245/3570 Training loss: 2.8847 0.1480 sec/batch
Epoch 1/10  Iteration 246/3570 Training loss: 2.8828 0.1462 sec/batch
Epoch 1/10  Iteration 247/3570 Training loss: 2.8809 0.1514 sec/batch
Epoch 1/10  Iteration 248/3570 Training loss: 2.8790 0.1488 sec/batch
Epoch 1/10  Iteration 249/3570 Training loss: 2.8770 0.1514 sec/batch
Epoch 1/10  Iteration 250/3570 Training loss: 2.8752 0.1499 sec/batch
Epoch 1/10  Iteration 251/3570 Training loss: 2.8733 0.1476 sec/batch
...

Epoch 1/10  Iteration 356/3570 Training loss: 2.7141 0.1487 sec/batch
Epoch 1/10  Iteration 357/3570 Training loss: 2.7129 0.1481 sec/batch
Epoch 2/10  Iteration 358/3570 Training loss: 2.3393 0.1491 sec/batch
Epoch 2/10  Iteration 359/3570 Training loss: 2.2741 0.1504 sec/batch
Epoch 2/10  Iteration 360/3570 Training loss: 2.2681 0.1480 sec/batch
Epoch 2/10  Iteration 361/3570 Training loss: 2.2564 0.1491 sec/batch
Epoch 2/10  Iteration 362/3570 Training loss: 2.2615 0.1488 sec/batch
Epoch 2/10  Iteration 363/3570 Training loss: 2.2595 0.1507 sec/batch
Epoch 2/10  Iteration 364/3570 Training loss: 2.2539 0.1515 sec/batch
Epoch 2/10  Iteration 365/3570 Training loss: 2.2542 0.1496 sec/batch
Epoch 2/10  Iteration 366/3570 Training loss: 2.2537 0.1477 sec/batch
Epoch 2/10  Iteration 367/3570 Training loss: 2.2549 0.1477 sec/batch
Epoch 2/10  Iteration 368/3570 Training loss: 2.2543 0.1491 sec/batch
Epoch 2/10  Iteration 369/3570 Training loss: 2.2543 0.1484 sec/batch
...

Epoch 2/10  Iteration 474/3570 Training loss: 2.1835 0.1522 sec/batch
Epoch 2/10  Iteration 475/3570 Training loss: 2.1831 0.1503 sec/batch
Epoch 2/10  Iteration 476/3570 Training loss: 2.1822 0.1471 sec/batch
Epoch 2/10  Iteration 477/3570 Training loss: 2.1814 0.1482 sec/batch
Epoch 2/10  Iteration 478/3570 Training loss: 2.1812 0.1467 sec/batch
Epoch 2/10  Iteration 479/3570 Training loss: 2.1808 0.1487 sec/batch
Epoch 2/10  Iteration 480/3570 Training loss: 2.1806 0.1469 sec/batch
Epoch 2/10  Iteration 481/3570 Training loss: 2.1799 0.1470 sec/batch
Epoch 2/10  Iteration 482/3570 Training loss: 2.1794 0.1481 sec/batch
Epoch 2/10  Iteration 483/3570 Training loss: 2.1793 0.1483 sec/batch
Epoch 2/10  Iteration 484/3570 Training loss: 2.1785 0.1472 sec/batch
Epoch 2/10  Iteration 485/3570 Training loss: 2.1778 0.1494 sec/batch
Epoch 2/10  Iteration 486/3570 Training loss: 2.1774 0.1503 sec/batch
Epoch 2/10  Iteration 487/3570 Training loss: 2.1769 0.1477 sec/batch
...

Epoch 2/10  Iteration 592/3570 Training loss: 2.1278 0.1514 sec/batch
Epoch 2/10  Iteration 593/3570 Training loss: 2.1274 0.1493 sec/batch
Epoch 2/10  Iteration 594/3570 Training loss: 2.1270 0.1491 sec/batch
Epoch 2/10  Iteration 595/3570 Training loss: 2.1265 0.1495 sec/batch
Epoch 2/10  Iteration 596/3570 Training loss: 2.1259 0.1521 sec/batch
Epoch 2/10  Iteration 597/3570 Training loss: 2.1256 0.1513 sec/batch
Epoch 2/10  Iteration 598/3570 Training loss: 2.1251 0.1473 sec/batch
Epoch 2/10  Iteration 599/3570 Training loss: 2.1247 0.1506 sec/batch
Epoch 2/10  Iteration 600/3570 Training loss: 2.1243 0.1478 sec/batch
Validation loss: 1.91499 Saving checkpoint!
Epoch 2/10  Iteration 601/3570 Training loss: 2.1242 0.1472 sec/batch
Epoch 2/10  Iteration 602/3570 Training loss: 2.1237 0.1510 sec/batch
Epoch 2/10  Iteration 603/3570 Training loss: 2.1233 0.1479 sec/batch
Epoch 2/10  Iteration 604/3570 Training loss: 2.1230 0.1498 sec/batch
...

Epoch 2/10  Iteration 708/3570 Training loss: 2.0845 0.1482 sec/batch
Epoch 2/10  Iteration 709/3570 Training loss: 2.0841 0.1478 sec/batch
Epoch 2/10  Iteration 710/3570 Training loss: 2.0838 0.1476 sec/batch
Epoch 2/10  Iteration 711/3570 Training loss: 2.0833 0.1495 sec/batch
Epoch 2/10  Iteration 712/3570 Training loss: 2.0829 0.1475 sec/batch
Epoch 2/10  Iteration 713/3570 Training loss: 2.0827 0.1484 sec/batch
Epoch 2/10  Iteration 714/3570 Training loss: 2.0824 0.1479 sec/batch
Epoch 3/10  Iteration 715/3570 Training loss: 2.0153 0.1486 sec/batch
Epoch 3/10  Iteration 716/3570 Training loss: 1.9648 0.1499 sec/batch
Epoch 3/10  Iteration 717/3570 Training loss: 1.9547 0.1482 sec/batch
Epoch 3/10  Iteration 718/3570 Training loss: 1.9449 0.1480 sec/batch
Epoch 3/10  Iteration 719/3570 Training loss: 1.9536 0.1480 sec/batch
Epoch 3/10  Iteration 720/3570 Training loss: 1.9575 0.1469 sec/batch
Epoch 3/10  Iteration 721/3570 Training loss: 1.9504 0.1494 sec/batch
...

Epoch 3/10  Iteration 826/3570 Training loss: 1.9129 0.1491 sec/batch
Epoch 3/10  Iteration 827/3570 Training loss: 1.9126 0.1461 sec/batch
Epoch 3/10  Iteration 828/3570 Training loss: 1.9123 0.1507 sec/batch
Epoch 3/10  Iteration 829/3570 Training loss: 1.9116 0.1487 sec/batch
Epoch 3/10  Iteration 830/3570 Training loss: 1.9109 0.1502 sec/batch
Epoch 3/10  Iteration 831/3570 Training loss: 1.9106 0.1478 sec/batch
Epoch 3/10  Iteration 832/3570 Training loss: 1.9105 0.1495 sec/batch
Epoch 3/10  Iteration 833/3570 Training loss: 1.9099 0.1475 sec/batch
Epoch 3/10  Iteration 834/3570 Training loss: 1.9094 0.1483 sec/batch
Epoch 3/10  Iteration 835/3570 Training loss: 1.9096 0.1480 sec/batch
Epoch 3/10  Iteration 836/3570 Training loss: 1.9094 0.1482 sec/batch
Epoch 3/10  Iteration 837/3570 Training loss: 1.9096 0.1459 sec/batch
Epoch 3/10  Iteration 838/3570 Training loss: 1.9092 0.1476 sec/batch
Epoch 3/10  Iteration 839/3570 Training loss: 1.9090 0.1508 sec/batch
...

Epoch 3/10  Iteration 944/3570 Training loss: 1.8847 0.1483 sec/batch
Epoch 3/10  Iteration 945/3570 Training loss: 1.8844 0.1497 sec/batch
Epoch 3/10  Iteration 946/3570 Training loss: 1.8842 0.1492 sec/batch
Epoch 3/10  Iteration 947/3570 Training loss: 1.8839 0.1475 sec/batch
Epoch 3/10  Iteration 948/3570 Training loss: 1.8838 0.1455 sec/batch
Epoch 3/10  Iteration 949/3570 Training loss: 1.8835 0.1487 sec/batch
Epoch 3/10  Iteration 950/3570 Training loss: 1.8833 0.1482 sec/batch
Epoch 3/10  Iteration 951/3570 Training loss: 1.8831 0.1476 sec/batch
Epoch 3/10  Iteration 952/3570 Training loss: 1.8830 0.1482 sec/batch
Epoch 3/10  Iteration 953/3570 Training loss: 1.8825 0.1497 sec/batch
Epoch 3/10  Iteration 954/3570 Training loss: 1.8824 0.1471 sec/batch
Epoch 3/10  Iteration 955/3570 Training loss: 1.8822 0.1466 sec/batch
Epoch 3/10  Iteration 956/3570 Training loss: 1.8820 0.1497 sec/batch
Epoch 3/10  Iteration 957/3570 Training loss: 1.8819 0.1528 sec/batch
...

Epoch 3/10  Iteration 1060/3570 Training loss: 1.8651 0.1477 sec/batch
Epoch 3/10  Iteration 1061/3570 Training loss: 1.8650 0.1494 sec/batch
Epoch 3/10  Iteration 1062/3570 Training loss: 1.8649 0.1473 sec/batch
Epoch 3/10  Iteration 1063/3570 Training loss: 1.8646 0.1486 sec/batch
Epoch 3/10  Iteration 1064/3570 Training loss: 1.8643 0.1467 sec/batch
Epoch 3/10  Iteration 1065/3570 Training loss: 1.8641 0.1482 sec/batch
Epoch 3/10  Iteration 1066/3570 Training loss: 1.8639 0.1479 sec/batch
Epoch 3/10  Iteration 1067/3570 Training loss: 1.8638 0.1480 sec/batch
Epoch 3/10  Iteration 1068/3570 Training loss: 1.8635 0.1495 sec/batch
Epoch 3/10  Iteration 1069/3570 Training loss: 1.8633 0.1483 sec/batch
Epoch 3/10  Iteration 1070/3570 Training loss: 1.8633 0.1502 sec/batch
Epoch 3/10  Iteration 1071/3570 Training loss: 1.8632 0.1529 sec/batch
Epoch 4/10  Iteration 1072/3570 Training loss: 1.8593 0.1502 sec/batch
Epoch 4/10  Iteration 1073/3570 Training loss: 1.8096 0.1494 sec/batch
...

Epoch 4/10  Iteration 1176/3570 Training loss: 1.7717 0.1480 sec/batch
Epoch 4/10  Iteration 1177/3570 Training loss: 1.7715 0.1492 sec/batch
Epoch 4/10  Iteration 1178/3570 Training loss: 1.7713 0.1474 sec/batch
Epoch 4/10  Iteration 1179/3570 Training loss: 1.7709 0.1479 sec/batch
Epoch 4/10  Iteration 1180/3570 Training loss: 1.7707 0.1481 sec/batch
Epoch 4/10  Iteration 1181/3570 Training loss: 1.7706 0.1498 sec/batch
Epoch 4/10  Iteration 1182/3570 Training loss: 1.7702 0.1478 sec/batch
Epoch 4/10  Iteration 1183/3570 Training loss: 1.7704 0.1488 sec/batch
Epoch 4/10  Iteration 1184/3570 Training loss: 1.7701 0.1501 sec/batch
Epoch 4/10  Iteration 1185/3570 Training loss: 1.7700 0.1474 sec/batch
Epoch 4/10  Iteration 1186/3570 Training loss: 1.7693 0.1471 sec/batch
Epoch 4/10  Iteration 1187/3570 Training loss: 1.7687 0.1477 sec/batch
Epoch 4/10  Iteration 1188/3570 Training loss: 1.7684 0.1469 sec/batch
Epoch 4/10  Iteration 1189/3570 Training loss: 1.7683 0.1493 sec/batch
...

Epoch 4/10  Iteration 1292/3570 Training loss: 1.7538 0.1490 sec/batch
Epoch 4/10  Iteration 1293/3570 Training loss: 1.7538 0.1482 sec/batch
Epoch 4/10  Iteration 1294/3570 Training loss: 1.7536 0.1464 sec/batch
Epoch 4/10  Iteration 1295/3570 Training loss: 1.7534 0.1480 sec/batch
Epoch 4/10  Iteration 1296/3570 Training loss: 1.7532 0.1496 sec/batch
Epoch 4/10  Iteration 1297/3570 Training loss: 1.7532 0.1482 sec/batch
Epoch 4/10  Iteration 1298/3570 Training loss: 1.7532 0.1484 sec/batch
Epoch 4/10  Iteration 1299/3570 Training loss: 1.7531 0.1474 sec/batch
Epoch 4/10  Iteration 1300/3570 Training loss: 1.7531 0.1490 sec/batch
Validation loss: 1.60252 Saving checkpoint!
Epoch 4/10  Iteration 1301/3570 Training loss: 1.7534 0.1483 sec/batch
Epoch 4/10  Iteration 1302/3570 Training loss: 1.7532 0.1508 sec/batch
Epoch 4/10  Iteration 1303/3570 Training loss: 1.7531 0.1507 sec/batch
Epoch 4/10  Iteration 1304/3570 Training loss: 1.7529 0.1481 sec/batch
...

Epoch 4/10  Iteration 1408/3570 Training loss: 1.7442 0.1480 sec/batch
Epoch 4/10  Iteration 1409/3570 Training loss: 1.7442 0.1506 sec/batch
Epoch 4/10  Iteration 1410/3570 Training loss: 1.7442 0.1482 sec/batch
Epoch 4/10  Iteration 1411/3570 Training loss: 1.7442 0.1488 sec/batch
Epoch 4/10  Iteration 1412/3570 Training loss: 1.7440 0.1503 sec/batch
Epoch 4/10  Iteration 1413/3570 Training loss: 1.7440 0.1503 sec/batch
Epoch 4/10  Iteration 1414/3570 Training loss: 1.7438 0.1477 sec/batch
Epoch 4/10  Iteration 1415/3570 Training loss: 1.7436 0.1443 sec/batch
Epoch 4/10  Iteration 1416/3570 Training loss: 1.7435 0.1483 sec/batch
Epoch 4/10  Iteration 1417/3570 Training loss: 1.7434 0.1478 sec/batch
Epoch 4/10  Iteration 1418/3570 Training loss: 1.7434 0.1488 sec/batch
Epoch 4/10  Iteration 1419/3570 Training loss: 1.7433 0.1480 sec/batch
Epoch 4/10  Iteration 1420/3570 Training loss: 1.7432 0.1497 sec/batch
Epoch 4/10  Iteration 1421/3570 Training loss: 1.7429 0.1450 sec/batch
...

Epoch 5/10  Iteration 1524/3570 Training loss: 1.6820 0.1510 sec/batch
Epoch 5/10  Iteration 1525/3570 Training loss: 1.6816 0.1487 sec/batch
Epoch 5/10  Iteration 1526/3570 Training loss: 1.6815 0.1493 sec/batch
Epoch 5/10  Iteration 1527/3570 Training loss: 1.6813 0.1489 sec/batch
Epoch 5/10  Iteration 1528/3570 Training loss: 1.6814 0.1484 sec/batch
Epoch 5/10  Iteration 1529/3570 Training loss: 1.6810 0.1506 sec/batch
Epoch 5/10  Iteration 1530/3570 Training loss: 1.6808 0.1470 sec/batch
Epoch 5/10  Iteration 1531/3570 Training loss: 1.6808 0.1516 sec/batch
Epoch 5/10  Iteration 1532/3570 Training loss: 1.6802 0.1458 sec/batch
Epoch 5/10  Iteration 1533/3570 Training loss: 1.6795 0.1488 sec/batch
Epoch 5/10  Iteration 1534/3570 Training loss: 1.6792 0.1493 sec/batch
Epoch 5/10  Iteration 1535/3570 Training loss: 1.6790 0.1464 sec/batch
Epoch 5/10  Iteration 1536/3570 Training loss: 1.6785 0.1486 sec/batch
Epoch 5/10  Iteration 1537/3570 Training loss: 1.6783 0.1486 sec/batch
...

Epoch 5/10  Iteration 1640/3570 Training loss: 1.6680 0.1478 sec/batch
Epoch 5/10  Iteration 1641/3570 Training loss: 1.6678 0.1505 sec/batch
Epoch 5/10  Iteration 1642/3570 Training loss: 1.6679 0.1476 sec/batch
Epoch 5/10  Iteration 1643/3570 Training loss: 1.6675 0.1477 sec/batch
Epoch 5/10  Iteration 1644/3570 Training loss: 1.6673 0.1482 sec/batch
Epoch 5/10  Iteration 1645/3570 Training loss: 1.6674 0.1483 sec/batch
Epoch 5/10  Iteration 1646/3570 Training loss: 1.6675 0.1493 sec/batch
Epoch 5/10  Iteration 1647/3570 Training loss: 1.6672 0.1515 sec/batch
Epoch 5/10  Iteration 1648/3570 Training loss: 1.6671 0.1505 sec/batch
Epoch 5/10  Iteration 1649/3570 Training loss: 1.6670 0.1513 sec/batch
Epoch 5/10  Iteration 1650/3570 Training loss: 1.6670 0.1482 sec/batch
Epoch 5/10  Iteration 1651/3570 Training loss: 1.6668 0.1460 sec/batch
Epoch 5/10  Iteration 1652/3570 Training loss: 1.6667 0.1502 sec/batch
Epoch 5/10  Iteration 1653/3570 Training loss: 1.6665 0.1479 sec/batch
...

Epoch 5/10  Iteration 1756/3570 Training loss: 1.6628 0.1574 sec/batch
Epoch 5/10  Iteration 1757/3570 Training loss: 1.6628 0.1601 sec/batch
Epoch 5/10  Iteration 1758/3570 Training loss: 1.6628 0.1677 sec/batch
Epoch 5/10  Iteration 1759/3570 Training loss: 1.6627 0.1594 sec/batch
Epoch 5/10  Iteration 1760/3570 Training loss: 1.6628 0.1687 sec/batch
Epoch 5/10  Iteration 1761/3570 Training loss: 1.6628 0.1626 sec/batch
Epoch 5/10  Iteration 1762/3570 Training loss: 1.6627 0.1459 sec/batch
Epoch 5/10  Iteration 1763/3570 Training loss: 1.6628 0.1481 sec/batch
Epoch 5/10  Iteration 1764/3570 Training loss: 1.6628 0.1472 sec/batch
Epoch 5/10  Iteration 1765/3570 Training loss: 1.6626 0.1479 sec/batch
Epoch 5/10  Iteration 1766/3570 Training loss: 1.6626 0.1489 sec/batch
Epoch 5/10  Iteration 1767/3570 Training loss: 1.6626 0.1479 sec/batch
Epoch 5/10  Iteration 1768/3570 Training loss: 1.6626 0.1483 sec/batch
Epoch 5/10  Iteration 1769/3570 Training loss: 1.6624 0.1514 sec/batch
...

Epoch 6/10  Iteration 1872/3570 Training loss: 1.6232 0.1502 sec/batch
Epoch 6/10  Iteration 1873/3570 Training loss: 1.6228 0.1514 sec/batch
Epoch 6/10  Iteration 1874/3570 Training loss: 1.6221 0.1488 sec/batch
Epoch 6/10  Iteration 1875/3570 Training loss: 1.6211 0.1495 sec/batch
Epoch 6/10  Iteration 1876/3570 Training loss: 1.6210 0.1513 sec/batch
Epoch 6/10  Iteration 1877/3570 Training loss: 1.6202 0.1484 sec/batch
Epoch 6/10  Iteration 1878/3570 Training loss: 1.6197 0.1480 sec/batch
Epoch 6/10  Iteration 1879/3570 Training loss: 1.6189 0.1464 sec/batch
Epoch 6/10  Iteration 1880/3570 Training loss: 1.6186 0.1498 sec/batch
Epoch 6/10  Iteration 1881/3570 Training loss: 1.6182 0.1476 sec/batch
Epoch 6/10  Iteration 1882/3570 Training loss: 1.6179 0.1482 sec/batch
Epoch 6/10  Iteration 1883/3570 Training loss: 1.6179 0.1473 sec/batch
Epoch 6/10  Iteration 1884/3570 Training loss: 1.6175 0.1479 sec/batch
Epoch 6/10  Iteration 1885/3570 Training loss: 1.6176 0.1484 sec/batch
...

Epoch 6/10  Iteration 1988/3570 Training loss: 1.6084 0.1488 sec/batch
Epoch 6/10  Iteration 1989/3570 Training loss: 1.6083 0.1482 sec/batch
Epoch 6/10  Iteration 1990/3570 Training loss: 1.6082 0.1485 sec/batch
Epoch 6/10  Iteration 1991/3570 Training loss: 1.6079 0.1482 sec/batch
Epoch 6/10  Iteration 1992/3570 Training loss: 1.6076 0.1472 sec/batch
Epoch 6/10  Iteration 1993/3570 Training loss: 1.6074 0.1484 sec/batch
Epoch 6/10  Iteration 1994/3570 Training loss: 1.6072 0.1474 sec/batch
Epoch 6/10  Iteration 1995/3570 Training loss: 1.6073 0.1481 sec/batch
Epoch 6/10  Iteration 1996/3570 Training loss: 1.6072 0.1505 sec/batch
Epoch 6/10  Iteration 1997/3570 Training loss: 1.6073 0.1456 sec/batch
Epoch 6/10  Iteration 1998/3570 Training loss: 1.6072 0.1470 sec/batch
Epoch 6/10  Iteration 1999/3570 Training loss: 1.6073 0.1486 sec/batch
Epoch 6/10  Iteration 2000/3570 Training loss: 1.6069 0.1472 sec/batch
Validation loss: 1.47119 Saving checkpoint!
...

Epoch 6/10  Iteration 2104/3570 Training loss: 1.6044 0.1480 sec/batch
Epoch 6/10  Iteration 2105/3570 Training loss: 1.6045 0.1475 sec/batch
Epoch 6/10  Iteration 2106/3570 Training loss: 1.6046 0.1483 sec/batch
Epoch 6/10  Iteration 2107/3570 Training loss: 1.6046 0.1500 sec/batch
Epoch 6/10  Iteration 2108/3570 Training loss: 1.6047 0.1482 sec/batch
Epoch 6/10  Iteration 2109/3570 Training loss: 1.6047 0.1500 sec/batch
Epoch 6/10  Iteration 2110/3570 Training loss: 1.6046 0.1483 sec/batch
Epoch 6/10  Iteration 2111/3570 Training loss: 1.6048 0.1474 sec/batch
Epoch 6/10  Iteration 2112/3570 Training loss: 1.6047 0.1516 sec/batch
Epoch 6/10  Iteration 2113/3570 Training loss: 1.6047 0.1495 sec/batch
Epoch 6/10  Iteration 2114/3570 Training loss: 1.6048 0.1508 sec/batch
Epoch 6/10  Iteration 2115/3570 Training loss: 1.6047 0.1504 sec/batch
Epoch 6/10  Iteration 2116/3570 Training loss: 1.6047 0.1479 sec/batch
Epoch 6/10  Iteration 2117/3570 Training loss: 1.6048 0.1500 sec/batch
...

Epoch 7/10  Iteration 2220/3570 Training loss: 1.5758 0.1491 sec/batch
Epoch 7/10  Iteration 2221/3570 Training loss: 1.5752 0.1476 sec/batch
Epoch 7/10  Iteration 2222/3570 Training loss: 1.5761 0.1502 sec/batch
Epoch 7/10  Iteration 2223/3570 Training loss: 1.5760 0.1468 sec/batch
Epoch 7/10  Iteration 2224/3570 Training loss: 1.5756 0.1469 sec/batch
Epoch 7/10  Iteration 2225/3570 Training loss: 1.5756 0.1488 sec/batch
Epoch 7/10  Iteration 2226/3570 Training loss: 1.5754 0.1523 sec/batch
Epoch 7/10  Iteration 2227/3570 Training loss: 1.5748 0.1479 sec/batch
Epoch 7/10  Iteration 2228/3570 Training loss: 1.5747 0.1473 sec/batch
Epoch 7/10  Iteration 2229/3570 Training loss: 1.5748 0.1469 sec/batch
Epoch 7/10  Iteration 2230/3570 Training loss: 1.5743 0.1485 sec/batch
Epoch 7/10  Iteration 2231/3570 Training loss: 1.5737 0.1459 sec/batch
Epoch 7/10  Iteration 2232/3570 Training loss: 1.5727 0.1515 sec/batch
Epoch 7/10  Iteration 2233/3570 Training loss: 1.5727 0.1466 sec/batch
Epoch 

Epoch 7/10  Iteration 2336/3570 Training loss: 1.5623 0.1494 sec/batch
Epoch 7/10  Iteration 2337/3570 Training loss: 1.5623 0.1519 sec/batch
Epoch 7/10  Iteration 2338/3570 Training loss: 1.5622 0.1489 sec/batch
Epoch 7/10  Iteration 2339/3570 Training loss: 1.5620 0.1486 sec/batch
Epoch 7/10  Iteration 2340/3570 Training loss: 1.5621 0.1503 sec/batch
Epoch 7/10  Iteration 2341/3570 Training loss: 1.5621 0.1497 sec/batch
Epoch 7/10  Iteration 2342/3570 Training loss: 1.5619 0.1483 sec/batch
Epoch 7/10  Iteration 2343/3570 Training loss: 1.5617 0.1504 sec/batch
Epoch 7/10  Iteration 2344/3570 Training loss: 1.5615 0.1469 sec/batch
Epoch 7/10  Iteration 2345/3570 Training loss: 1.5615 0.1468 sec/batch
Epoch 7/10  Iteration 2346/3570 Training loss: 1.5615 0.1496 sec/batch
Epoch 7/10  Iteration 2347/3570 Training loss: 1.5614 0.1523 sec/batch
Epoch 7/10  Iteration 2348/3570 Training loss: 1.5611 0.1495 sec/batch
Epoch 7/10  Iteration 2349/3570 Training loss: 1.5609 0.1501 sec/batch
Epoch 

Epoch 7/10  Iteration 2452/3570 Training loss: 1.5592 0.1485 sec/batch
Epoch 7/10  Iteration 2453/3570 Training loss: 1.5593 0.1488 sec/batch
Epoch 7/10  Iteration 2454/3570 Training loss: 1.5593 0.1526 sec/batch
Epoch 7/10  Iteration 2455/3570 Training loss: 1.5592 0.1476 sec/batch
Epoch 7/10  Iteration 2456/3570 Training loss: 1.5593 0.1476 sec/batch
Epoch 7/10  Iteration 2457/3570 Training loss: 1.5593 0.1473 sec/batch
Epoch 7/10  Iteration 2458/3570 Training loss: 1.5593 0.1508 sec/batch
Epoch 7/10  Iteration 2459/3570 Training loss: 1.5594 0.1480 sec/batch
Epoch 7/10  Iteration 2460/3570 Training loss: 1.5594 0.1482 sec/batch
Epoch 7/10  Iteration 2461/3570 Training loss: 1.5593 0.1492 sec/batch
Epoch 7/10  Iteration 2462/3570 Training loss: 1.5594 0.1482 sec/batch
Epoch 7/10  Iteration 2463/3570 Training loss: 1.5595 0.1493 sec/batch
Epoch 7/10  Iteration 2464/3570 Training loss: 1.5594 0.1505 sec/batch
Epoch 7/10  Iteration 2465/3570 Training loss: 1.5595 0.1467 sec/batch
Epoch 

Epoch 8/10  Iteration 2568/3570 Training loss: 1.5398 0.1481 sec/batch
Epoch 8/10  Iteration 2569/3570 Training loss: 1.5395 0.1497 sec/batch
Epoch 8/10  Iteration 2570/3570 Training loss: 1.5393 0.1485 sec/batch
Epoch 8/10  Iteration 2571/3570 Training loss: 1.5392 0.1472 sec/batch
Epoch 8/10  Iteration 2572/3570 Training loss: 1.5390 0.1479 sec/batch
Epoch 8/10  Iteration 2573/3570 Training loss: 1.5385 0.1477 sec/batch
Epoch 8/10  Iteration 2574/3570 Training loss: 1.5380 0.1480 sec/batch
Epoch 8/10  Iteration 2575/3570 Training loss: 1.5378 0.1492 sec/batch
Epoch 8/10  Iteration 2576/3570 Training loss: 1.5372 0.1492 sec/batch
Epoch 8/10  Iteration 2577/3570 Training loss: 1.5373 0.1444 sec/batch
Epoch 8/10  Iteration 2578/3570 Training loss: 1.5368 0.1482 sec/batch
Epoch 8/10  Iteration 2579/3570 Training loss: 1.5377 0.1468 sec/batch
Epoch 8/10  Iteration 2580/3570 Training loss: 1.5376 0.1507 sec/batch
Epoch 8/10  Iteration 2581/3570 Training loss: 1.5374 0.1491 sec/batch
Epoch 

Epoch 8/10  Iteration 2684/3570 Training loss: 1.5266 0.1469 sec/batch
Epoch 8/10  Iteration 2685/3570 Training loss: 1.5266 0.1493 sec/batch
Epoch 8/10  Iteration 2686/3570 Training loss: 1.5267 0.1498 sec/batch
Epoch 8/10  Iteration 2687/3570 Training loss: 1.5264 0.1474 sec/batch
Epoch 8/10  Iteration 2688/3570 Training loss: 1.5263 0.1473 sec/batch
Epoch 8/10  Iteration 2689/3570 Training loss: 1.5261 0.1492 sec/batch
Epoch 8/10  Iteration 2690/3570 Training loss: 1.5260 0.1473 sec/batch
Epoch 8/10  Iteration 2691/3570 Training loss: 1.5260 0.1478 sec/batch
Epoch 8/10  Iteration 2692/3570 Training loss: 1.5259 0.1490 sec/batch
Epoch 8/10  Iteration 2693/3570 Training loss: 1.5260 0.1483 sec/batch
Epoch 8/10  Iteration 2694/3570 Training loss: 1.5260 0.1476 sec/batch
Epoch 8/10  Iteration 2695/3570 Training loss: 1.5260 0.1602 sec/batch
Epoch 8/10  Iteration 2696/3570 Training loss: 1.5257 0.1475 sec/batch
Epoch 8/10  Iteration 2697/3570 Training loss: 1.5258 0.1480 sec/batch
Epoch 

Epoch 8/10  Iteration 2800/3570 Training loss: 1.5247 0.1487 sec/batch
Validation loss: 1.39109 Saving checkpoint!
Epoch 8/10  Iteration 2801/3570 Training loss: 1.5249 0.1467 sec/batch
Epoch 8/10  Iteration 2802/3570 Training loss: 1.5248 0.1481 sec/batch
Epoch 8/10  Iteration 2803/3570 Training loss: 1.5248 0.1502 sec/batch
Epoch 8/10  Iteration 2804/3570 Training loss: 1.5248 0.1475 sec/batch
Epoch 8/10  Iteration 2805/3570 Training loss: 1.5246 0.1484 sec/batch
Epoch 8/10  Iteration 2806/3570 Training loss: 1.5247 0.1480 sec/batch
Epoch 8/10  Iteration 2807/3570 Training loss: 1.5247 0.1458 sec/batch
Epoch 8/10  Iteration 2808/3570 Training loss: 1.5247 0.1569 sec/batch
Epoch 8/10  Iteration 2809/3570 Training loss: 1.5247 0.1478 sec/batch
Epoch 8/10  Iteration 2810/3570 Training loss: 1.5248 0.1474 sec/batch
Epoch 8/10  Iteration 2811/3570 Training loss: 1.5248 0.1488 sec/batch
Epoch 8/10  Iteration 2812/3570 Training loss: 1.5247 0.1518 sec/batch
Epoch 8/10  Iteration 2813/3570 T

Epoch 9/10  Iteration 2916/3570 Training loss: 1.5124 0.1496 sec/batch
Epoch 9/10  Iteration 2917/3570 Training loss: 1.5120 0.1475 sec/batch
Epoch 9/10  Iteration 2918/3570 Training loss: 1.5116 0.1473 sec/batch
Epoch 9/10  Iteration 2919/3570 Training loss: 1.5115 0.1474 sec/batch
Epoch 9/10  Iteration 2920/3570 Training loss: 1.5117 0.1464 sec/batch
Epoch 9/10  Iteration 2921/3570 Training loss: 1.5112 0.1504 sec/batch
Epoch 9/10  Iteration 2922/3570 Training loss: 1.5110 0.1460 sec/batch
Epoch 9/10  Iteration 2923/3570 Training loss: 1.5109 0.1453 sec/batch
Epoch 9/10  Iteration 2924/3570 Training loss: 1.5104 0.1480 sec/batch
Epoch 9/10  Iteration 2925/3570 Training loss: 1.5104 0.1475 sec/batch
Epoch 9/10  Iteration 2926/3570 Training loss: 1.5100 0.1497 sec/batch
Epoch 9/10  Iteration 2927/3570 Training loss: 1.5099 0.1477 sec/batch
Epoch 9/10  Iteration 2928/3570 Training loss: 1.5097 0.1480 sec/batch
Epoch 9/10  Iteration 2929/3570 Training loss: 1.5097 0.1499 sec/batch
Epoch 

Epoch 9/10  Iteration 3032/3570 Training loss: 1.4973 0.1503 sec/batch
Epoch 9/10  Iteration 3033/3570 Training loss: 1.4972 0.1499 sec/batch
Epoch 9/10  Iteration 3034/3570 Training loss: 1.4971 0.1487 sec/batch
Epoch 9/10  Iteration 3035/3570 Training loss: 1.4972 0.1481 sec/batch
Epoch 9/10  Iteration 3036/3570 Training loss: 1.4972 0.1483 sec/batch
Epoch 9/10  Iteration 3037/3570 Training loss: 1.4973 0.1472 sec/batch
Epoch 9/10  Iteration 3038/3570 Training loss: 1.4973 0.1523 sec/batch
Epoch 9/10  Iteration 3039/3570 Training loss: 1.4973 0.1468 sec/batch
Epoch 9/10  Iteration 3040/3570 Training loss: 1.4975 0.1481 sec/batch
Epoch 9/10  Iteration 3041/3570 Training loss: 1.4973 0.1476 sec/batch
Epoch 9/10  Iteration 3042/3570 Training loss: 1.4972 0.1473 sec/batch
Epoch 9/10  Iteration 3043/3570 Training loss: 1.4972 0.1497 sec/batch
Epoch 9/10  Iteration 3044/3570 Training loss: 1.4970 0.1479 sec/batch
Epoch 9/10  Iteration 3045/3570 Training loss: 1.4969 0.1478 sec/batch
Epoch 

Epoch 9/10  Iteration 3148/3570 Training loss: 1.4952 0.1469 sec/batch
Epoch 9/10  Iteration 3149/3570 Training loss: 1.4951 0.1470 sec/batch
Epoch 9/10  Iteration 3150/3570 Training loss: 1.4952 0.1463 sec/batch
Epoch 9/10  Iteration 3151/3570 Training loss: 1.4954 0.1471 sec/batch
Epoch 9/10  Iteration 3152/3570 Training loss: 1.4955 0.1485 sec/batch
Epoch 9/10  Iteration 3153/3570 Training loss: 1.4956 0.1479 sec/batch
Epoch 9/10  Iteration 3154/3570 Training loss: 1.4956 0.1504 sec/batch
Epoch 9/10  Iteration 3155/3570 Training loss: 1.4957 0.1500 sec/batch
Epoch 9/10  Iteration 3156/3570 Training loss: 1.4958 0.1480 sec/batch
Epoch 9/10  Iteration 3157/3570 Training loss: 1.4956 0.1480 sec/batch
Epoch 9/10  Iteration 3158/3570 Training loss: 1.4956 0.1504 sec/batch
Epoch 9/10  Iteration 3159/3570 Training loss: 1.4955 0.1474 sec/batch
Epoch 9/10  Iteration 3160/3570 Training loss: 1.4955 0.1482 sec/batch
Epoch 9/10  Iteration 3161/3570 Training loss: 1.4954 0.1469 sec/batch
Epoch 

Epoch 10/10  Iteration 3264/3570 Training loss: 1.4875 0.1498 sec/batch
Epoch 10/10  Iteration 3265/3570 Training loss: 1.4873 0.1471 sec/batch
Epoch 10/10  Iteration 3266/3570 Training loss: 1.4867 0.1485 sec/batch
Epoch 10/10  Iteration 3267/3570 Training loss: 1.4858 0.1480 sec/batch
Epoch 10/10  Iteration 3268/3570 Training loss: 1.4849 0.1503 sec/batch
Epoch 10/10  Iteration 3269/3570 Training loss: 1.4845 0.1478 sec/batch
Epoch 10/10  Iteration 3270/3570 Training loss: 1.4851 0.1497 sec/batch
Epoch 10/10  Iteration 3271/3570 Training loss: 1.4854 0.1476 sec/batch
Epoch 10/10  Iteration 3272/3570 Training loss: 1.4851 0.1460 sec/batch
Epoch 10/10  Iteration 3273/3570 Training loss: 1.4852 0.1472 sec/batch
Epoch 10/10  Iteration 3274/3570 Training loss: 1.4850 0.1500 sec/batch
Epoch 10/10  Iteration 3275/3570 Training loss: 1.4843 0.1482 sec/batch
Epoch 10/10  Iteration 3276/3570 Training loss: 1.4844 0.1490 sec/batch
Epoch 10/10  Iteration 3277/3570 Training loss: 1.4846 0.1465 se

Epoch 10/10  Iteration 3378/3570 Training loss: 1.4724 0.1541 sec/batch
Epoch 10/10  Iteration 3379/3570 Training loss: 1.4724 0.1461 sec/batch
Epoch 10/10  Iteration 3380/3570 Training loss: 1.4723 0.1472 sec/batch
Epoch 10/10  Iteration 3381/3570 Training loss: 1.4721 0.1511 sec/batch
Epoch 10/10  Iteration 3382/3570 Training loss: 1.4718 0.1450 sec/batch
Epoch 10/10  Iteration 3383/3570 Training loss: 1.4717 0.1488 sec/batch
Epoch 10/10  Iteration 3384/3570 Training loss: 1.4720 0.1526 sec/batch
Epoch 10/10  Iteration 3385/3570 Training loss: 1.4722 0.1513 sec/batch
Epoch 10/10  Iteration 3386/3570 Training loss: 1.4726 0.1500 sec/batch
Epoch 10/10  Iteration 3387/3570 Training loss: 1.4726 0.1473 sec/batch
Epoch 10/10  Iteration 3388/3570 Training loss: 1.4728 0.1469 sec/batch
Epoch 10/10  Iteration 3389/3570 Training loss: 1.4730 0.1483 sec/batch
Epoch 10/10  Iteration 3390/3570 Training loss: 1.4729 0.1472 sec/batch
Epoch 10/10  Iteration 3391/3570 Training loss: 1.4728 0.1501 se

Epoch 10/10  Iteration 3492/3570 Training loss: 1.4722 0.1481 sec/batch
Epoch 10/10  Iteration 3493/3570 Training loss: 1.4724 0.1485 sec/batch
Epoch 10/10  Iteration 3494/3570 Training loss: 1.4724 0.1472 sec/batch
Epoch 10/10  Iteration 3495/3570 Training loss: 1.4724 0.1473 sec/batch
Epoch 10/10  Iteration 3496/3570 Training loss: 1.4724 0.1496 sec/batch
Epoch 10/10  Iteration 3497/3570 Training loss: 1.4722 0.1483 sec/batch
Epoch 10/10  Iteration 3498/3570 Training loss: 1.4722 0.1476 sec/batch
Epoch 10/10  Iteration 3499/3570 Training loss: 1.4722 0.1518 sec/batch
Epoch 10/10  Iteration 3500/3570 Training loss: 1.4719 0.1510 sec/batch
Validation loss: 1.34641 Saving checkpoint!
Epoch 10/10  Iteration 3501/3570 Training loss: 1.4722 0.1600 sec/batch
Epoch 10/10  Iteration 3502/3570 Training loss: 1.4722 0.1559 sec/batch
Epoch 10/10  Iteration 3503/3570 Training loss: 1.4723 0.1504 sec/batch
Epoch 10/10  Iteration 3504/3570 Training loss: 1.4722 0.1490 sec/batch
Epoch 10/10  Iterati

In [None]:
tf.train.get_checkpoint_state('checkpoints/anna')

## Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character, and the network predicts the next character. We then feed that new character back in to predict the one after it, and keep doing this to generate all new text. I also included some functionality to prime the network with some text by passing in a string and building up a state from that.

The network gives us predictions for each character. To reduce noise and make things a little less random, I'm only going to choose the next character from the top N most likely characters.



In [17]:
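def pick_top_n(preds, vocab_size, top_n=5):
    # Flatten the predictions to a 1-D array of probabilities
    p = np.squeeze(preds)
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Draw one character index from the filtered distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c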
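
To see what the top-N filter does to a distribution, here's a minimal sketch on a made-up five-character prediction. The name `top_n_distribution` and the toy numbers are assumptions for illustration only; it mirrors the zero-out-and-renormalize step of `pick_top_n` but works on a copy and returns the filtered distribution instead of a sampled index.

```python
import numpy as np

def top_n_distribution(preds, top_n=5):
    """Hypothetical helper: keep the top_n largest probabilities, renormalize the rest away."""
    # np.array copies, so the caller's array is left untouched
    p = np.array(preds, dtype=np.float64).squeeze()
    # argsort is ascending, so every index except the last top_n gets zeroed
    p[np.argsort(p)[:-top_n]] = 0
    return p / p.sum()

# Toy softmax output over a 5-character vocabulary (made-up numbers)
toy_preds = np.array([[0.05, 0.40, 0.10, 0.25, 0.20]])
filtered = top_n_distribution(toy_preds, top_n=2)

# Only indices 1 and 3 can be drawn now
rng = np.random.default_rng(0)
c = rng.choice(len(filtered), p=filtered)
print(filtered, c)
```

With `top_n=2`, only the two most likely characters keep any probability mass, so the sampler can never pick one of the unlikely ones.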

In [41]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    # Start the output with the priming text
    samples = [c for c in prime]
    model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        # Feed the prime one character at a time to build up the LSTM state
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state],
                                         feed_dict=feed)

        # First generated character, predicted from the last prime character
        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        # Feed each sampled character back in to predict the next one
        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state],
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])

    return ''.join(samples)
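
The prime-then-generate loop can also be tried without a checkpoint. Below is a rough sketch that swaps the trained network for a hypothetical stand-in: a smoothed bigram table built from a toy string. The names `predict` and `sample_text` and the toy corpus are assumptions for illustration. Unlike the LSTM, this stand-in has no memory beyond the last character, so priming only matters through the final prime character.

```python
import numpy as np

# Toy corpus and vocabulary (made up for this sketch)
corpus = "the cat sat on the mat. the rat sat on the cat."
vocab = sorted(set(corpus))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))

# Bigram counts with add-one smoothing, so every row is a valid distribution
counts = np.ones((len(vocab), len(vocab)))
for a, b in zip(corpus[:-1], corpus[1:]):
    counts[vocab_to_int[a], vocab_to_int[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

def predict(char_id):
    # Stand-in for running the network: next-character probabilities
    return probs[char_id].copy()

def sample_text(prime, n_samples, top_n=3, seed=0):
    rng = np.random.default_rng(seed)
    samples = list(prime)
    # Priming: step through the prime, keeping the last prediction
    for ch in prime:
        preds = predict(vocab_to_int[ch])
    # Generation: sample from the top_n, then feed the sample back in
    for _ in range(n_samples):
        preds[np.argsort(preds)[:-top_n]] = 0
        preds /= preds.sum()
        c = rng.choice(len(vocab), p=preds)
        samples.append(int_to_vocab[c])
        preds = predict(c)
    return ''.join(samples)

generated = sample_text("the ", 40)
print(generated)
```

The structure is the same as in `sample`: one pass over the prime to get an initial prediction, then a loop where each sampled character becomes the next input.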

In [44]:
checkpoint = "checkpoints/anna/i3560_l512_1.122.ckpt"
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)

Farlathit that if had so
like it that it were. He could not trouble to his wife, and there was
anything in them of the side of his weaky in the creature at his forteren
to him.

"What is it? I can't bread to those," said Stepan Arkadyevitch. "It's not
my children, and there is an almost this arm, true it mays already,
and tell you what I have say to you, and was not looking at the peasant,
why is, I don't know him out, and she doesn't speak to me immediately, as
you would say the countess and the more frest an angelembre, and time and
things's silent, but I was not in my stand that is in my head. But if he
say, and was so feeling with his soul. A child--in his soul of his
soul of his soul. He should not see that any of that sense of. Here he
had not been so composed and to speak for as in a whole picture, but
all the setting and her excellent and society, who had been delighted
and see to anywing had been being troed to thousand words on them,
we liked him.

That set in her money at th

In [43]:
checkpoint = "checkpoints/anna/i200_l512_2.432.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farnt him oste wha sorind thans tout thint asd an sesand an hires on thime sind thit aled, ban thand and out hore as the ter hos ton ho te that, was tis tart al the hand sostint him sore an tit an son thes, win he se ther san ther hher tas tarereng,.

Anl at an ades in ond hesiln, ad hhe torers teans, wast tar arering tho this sos alten sorer has hhas an siton ther him he had sin he ard ate te anling the sosin her ans and
arins asd and ther ale te tot an tand tanginge wath and ho ald, so sot th asend sat hare sother horesinnd, he hesense wing ante her so tith tir sherinn, anded and to the toul anderin he sorit he torsith she se atere an ting ot hand and thit hhe so the te wile har
ens ont in the sersise, and we he seres tar aterer, to ato tat or has he he wan ton here won and sen heren he sosering, to to theer oo adent har herere the wosh oute, was serild ward tous hed astend..

I's sint on alt in har tor tit her asd hade shithans ored he talereng an soredendere tim tot hees. Tise sor 

In [46]:
checkpoint = "checkpoints/anna/i600_l512_1.750.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Fard as astice her said he celatice of to seress in the raice, and to be the some and sere allats to that said to that the sark and a cast a the wither ald the pacinesse of her had astition, he said to the sount as she west at hissele. Af the cond it he was a fact onthis astisarianing.


"Or a ton to to be that's a more at aspestale as the sont of anstiring as
thours and trey.

The same wo dangring the
raterst, who sore and somethy had ast out an of his book. "We had's beane were that, and a morted a thay he had to tere. Then to
her homent andertersed his his ancouted to the pirsted, the soution for of the pirsice inthirgest and stenciol, with the hard and and
a colrice of to be oneres,
the song to this anderssad.
The could ounterss the said to serom of
soment a carsed of sheres of she
torded
har and want in their of hould, but
her told in that in he tad a the same to her. Serghing an her has and with the seed, and the camt ont his about of the
sail, the her then all houg ant or to hus

In [47]:
checkpoint = "checkpoints/anna/i1000_l512_1.484.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)

Farrat, his felt has at it.

"When the pose ther hor exceed
to his sheant was," weat a sime of his sounsed. The coment and the facily that which had began terede a marilicaly whice whether the pose of his hand, at she was alligated herself the same on she had to
taiking to his forthing and streath how to hand
began in a lang at some at it, this he cholded not set all her. "Wo love that is setthing. Him anstering as seen that."

"Yes in the man that say the mare a crances is it?" said Sergazy Ivancatching. "You doon think were somether is ifficult of a mone of
though the most at the countes that the
mean on the come to say the most, to
his feesing of
a man she, whilo he
sained and well, that he would still at to said. He wind at his for the sore in the most
of hoss and almoved to see him. They have betine the sumper into at he his stire, and what he was that at the so steate of the
sound, and shin should have a geest of shall feet on the conderation to she had been at that imporsing the