In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

import pickle # to save/load objects
import sys
sys.path.append("../helper/") # add the helper directory to the import path
from functions_preprocess import *
from functions_model import *

%load_ext autoreload
%autoreload 2

### Data Preprocessing
Here I collect, clean, and process the data. I intentionally mixed bands with different characteristics, hoping this would reduce overfitting and provide a richer word pool. The data covers almost the entire discography of each band, including demos for some.

I'm also creating two dictionaries here, `vocab_to_int` and `int_to_vocab`. The model needs a numerical representation of the characters to be able to compute weights, biases, etc.
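
As a small illustration of the two lookup tables, here is a sketch on a toy string. The names `sample_text`, `sample_vocab`, and the other `sample_*` variables are made up for this example and are not part of the notebook. `sorted()` is used only so the mapping is the same on every run.

```python
# Toy text standing in for the combined lyrics (hypothetical example data).
sample_text = "ride the lightning"

# sorted() is used here only to make the mapping reproducible across runs.
sample_vocab = sorted(set(sample_text))
sample_vocab_to_int = {c: i for i, c in enumerate(sample_vocab)}
sample_int_to_vocab = dict(enumerate(sample_vocab))

# Encode characters to integers, then decode them back.
encoded = [sample_vocab_to_int[c] for c in sample_text]
decoded = ''.join(sample_int_to_vocab[i] for i in encoded)

print(encoded[:5])
print(decoded == sample_text)
```

The round trip only works when both dictionaries come from the same enumeration of the vocabulary, which is why they are built together.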

In [2]:
# prepare training data
artist_list = ['metallica', 'megadeth', 'slayer', 'judaspriest', 'ironmaiden', 'motorhead', 
               'blacksabbath', 'testament', 'overkill', 'anthrax', 'pantera', 'icedearth',
               'manowar', 'annihilator', 'exodus', 'dreamtheater']
text = ''
folder_name = ''
for i in artist_list:
    text = text + combine_songs(i)
    folder_name = folder_name + i[:2] # I know this isn't the greatest folder name
    
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
print(folder_name, "ready with", len(chars), "chars")

memesljuirmoblteovanpaicmaanexdr ready with 3079342 chars


### Saving
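I need to save the objects created so far. The lookup dictionaries are built by enumerating `vocab`, which is a Python set, and sets have no guaranteed order. With string hash randomization enabled (the default in Python 3), the same characters can be assigned different integers in each session. Because I need to use my model later, I have to save these lookup dictionaries.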

In [3]:
# Saving the objects
with open('checkpoints/{}/vars.pckl'.format(folder_name), 'wb') as f:
    pickle.dump([text, vocab, vocab_to_int, int_to_vocab, chars, artist_list], f, protocol=2)
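
To use the model in a later session, the saved list has to be read back and unpacked in the same order it was dumped. The sketch below shows that round trip with small stand-in objects and a temporary directory. The `demo_*` and `loaded_*` names are hypothetical; in the notebook the path would be the `vars.pckl` file written above.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the notebook's lookup dictionaries.
demo_vocab_to_int = {'a': 0, 'b': 1}
demo_int_to_vocab = {0: 'a', 1: 'b'}

# A temporary directory stands in for the checkpoint folder.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'vars.pckl')

with open(demo_path, 'wb') as f:
    pickle.dump([demo_vocab_to_int, demo_int_to_vocab], f, protocol=2)

# Later session: read the list back and unpack it in the order it was dumped.
with open(demo_path, 'rb') as f:
    loaded_vocab_to_int, loaded_int_to_vocab = pickle.load(f)

print(loaded_vocab_to_int == demo_vocab_to_int)
```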

I highly encourage you to check the function `split_data` in `functions_model.py`. The target is the next character in the sequence.  
`x = chars[: n_batches*slice_size]`  
`y = chars[1: n_batches*slice_size + 1]`
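
For readers who do not have `functions_model.py` at hand, here is a minimal sketch of what a function like `split_data` might do. It is an assumption built from the two slices above and the 0.9 default, not the author's actual code. The name `split_data_sketch` and the reshape-based layout are mine.

```python
import numpy as np

def split_data_sketch(chars, batch_size, num_steps, split_frac=0.9):
    """Hypothetical reconstruction; the real split_data may differ."""
    slice_size = batch_size * num_steps
    n_batches = len(chars) // slice_size  # leftover characters are dropped

    # Inputs, and targets shifted one character to the right.
    # Assumes at least one leftover character so y is as long as x.
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]

    # One row per sequence in the batch.
    x = x.reshape(batch_size, -1)
    y = y.reshape(batch_size, -1)

    # First split_frac of the batches for training, the rest for validation.
    split_idx = int(n_batches * split_frac) * num_steps
    return x[:, :split_idx], y[:, :split_idx], x[:, split_idx:], y[:, split_idx:]

demo_chars = np.arange(1005, dtype=np.int32)
tr_x, tr_y, va_x, va_y = split_data_sketch(demo_chars, batch_size=2, num_steps=10)
print(tr_x.shape, va_x.shape)
```

With integer stand-ins for characters, every target equals its input plus one, which makes the "next character" relationship easy to check.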

In [4]:
# Split training and validation sets
train_x, train_y, val_x, val_y = split_data(chars, 10, 200) # default fraction for training is 0.9
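
The training loop further down walks these arrays with `get_batch`, which is also imported from the helper modules and not shown here. A plausible sketch, assuming it simply yields consecutive windows of `num_steps` columns from every array, is below. The name `get_batch_sketch` and its body are assumptions.

```python
import numpy as np

def get_batch_sketch(arrs, num_steps):
    """Hypothetical reconstruction; the real get_batch may differ."""
    n_batches = arrs[0].shape[1] // num_steps
    for b in range(n_batches):
        # Same column window from every array, so x and y stay aligned.
        yield [a[:, b*num_steps:(b+1)*num_steps] for a in arrs]

demo_x = np.arange(40).reshape(2, 20)
demo_y = demo_x + 1
batches = list(get_batch_sketch([demo_x, demo_y], num_steps=5))
print(len(batches), batches[0][0].shape)
```

If the windows are consecutive like this, each row of one batch continues the text of the same row in the previous batch, which is what makes it reasonable to carry the LSTM state from one batch to the next.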

In [5]:
# hyperparameters
batch_size = 100
num_steps = 100
lstm_size = 512
num_layers = 2
learning_rate = 0.001
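
A quick sanity check on how these hyperparameters translate into training iterations. It assumes `split_data` drops the characters that do not fill a whole batch and keeps the first 0.9 of the batches for training. Both are assumptions about the helper, although the result agrees with the `11040` total printed in the training log.

```python
n_chars = 3079342      # character count printed by the preprocessing cell
batch_size = 100
num_steps = 100
epochs = 40            # value used in the training cell
split_frac = 0.9       # default training fraction noted for split_data

total_batches = n_chars // (batch_size * num_steps)   # assumed: remainder dropped
train_batches = int(total_batches * split_frac)       # assumed split rule
iterations = train_batches * epochs
print(total_batches, train_batches, iterations)  # prints 307 276 11040
```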

In [6]:
start_time = time.time() # always measure the time
epochs = 40
save_every_n = 500
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)

model = build_rnn(len(vocab), 
                  batch_size=batch_size,
                  num_steps=num_steps,
                  learning_rate=learning_rate,
                  lstm_size=lstm_size,
                  num_layers=num_layers)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter('./logs/{}/train'.format(folder_name), sess.graph)
    test_writer = tf.summary.FileWriter('./logs/{}/test'.format(folder_name))
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/lyr20.ckpt')
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    for e in range(epochs):
        
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            iteration = e*n_batches + b
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.merged, model.cost, 
                                                          model.final_state, model.optimizer], 
                                                          feed_dict=feed)
            loss += batch_loss
            end = time.time()
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
            
            train_writer.add_summary(summary, iteration)
        
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance on the validation set. keep_prob is set to 1
                # (dropout disabled) because this is validation, not training.
                # A separate val_state keeps the training state (new_state) intact.
                val_loss = []
                val_state = sess.run(model.initial_state)
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: val_state}
                    summary, batch_loss, val_state = sess.run([model.merged, model.cost, 
                                                               model.final_state], feed_dict=feed)
                    val_loss.append(batch_loss)
                    
                test_writer.add_summary(summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                saver.save(sess, "checkpoints/{}/i{}_l{}_{:.3f}.ckpt".format(folder_name, iteration, lstm_size, np.mean(val_loss)))

print(epochs, "epochs done in", round((time.time() - start_time)/60, 2), "mins")
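
A note on reading the log: the printed training loss is `loss/b`, a running mean over the batches seen so far in the current epoch, and `loss` is reset to 0 when an epoch starts. The first values of each epoch therefore reflect only a handful of batches and can jump relative to the end of the previous epoch. The sketch below reproduces that bookkeeping with made-up per-batch losses.

```python
# Hypothetical per-batch losses for two short epochs.
batch_losses_by_epoch = [[4.0, 3.0, 2.0], [2.0, 1.0, 1.5]]

reported = []
for epoch_losses in batch_losses_by_epoch:
    loss = 0  # reset at the start of every epoch, as in the training loop
    for b, batch_loss in enumerate(epoch_losses, 1):
        loss += batch_loss
        reported.append(loss / b)  # the value printed as "Training loss"

print(reported)
```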

Epoch 1/40  Iteration 1/11040 Training loss: 4.4761 0.9269 sec/batch
Epoch 1/40  Iteration 2/11040 Training loss: 4.4324 0.4878 sec/batch
Epoch 1/40  Iteration 3/11040 Training loss: 4.2446 0.4667 sec/batch
Epoch 1/40  Iteration 4/11040 Training loss: 4.3949 0.4858 sec/batch
Epoch 1/40  Iteration 5/11040 Training loss: 4.3521 0.4824 sec/batch
Epoch 1/40  Iteration 6/11040 Training loss: 4.2877 0.4763 sec/batch
Epoch 1/40  Iteration 7/11040 Training loss: 4.2215 0.4735 sec/batch
Epoch 1/40  Iteration 8/11040 Training loss: 4.1513 0.4875 sec/batch
Epoch 1/40  Iteration 9/11040 Training loss: 4.0862 0.4765 sec/batch
Epoch 1/40  Iteration 10/11040 Training loss: 4.0323 0.4958 sec/batch
Epoch 1/40  Iteration 11/11040 Training loss: 3.9828 0.4804 sec/batch
Epoch 1/40  Iteration 12/11040 Training loss: 3.9386 0.4785 sec/batch
Epoch 1/40  Iteration 13/11040 Training loss: 3.9028 0.4814 sec/batch
Epoch 1/40  Iteration 14/11040 Training loss: 3.8712 0.4815 sec/batch
...

Epoch 1/40  Iteration 118/11040 Training loss: 3.3084 0.4754 sec/batch
Epoch 1/40  Iteration 119/11040 Training loss: 3.3071 0.4824 sec/batch
Epoch 1/40  Iteration 120/11040 Training loss: 3.3060 0.4754 sec/batch
Epoch 1/40  Iteration 121/11040 Training loss: 3.3049 0.4766 sec/batch
Epoch 1/40  Iteration 122/11040 Training loss: 3.3037 0.4745 sec/batch
Epoch 1/40  Iteration 123/11040 Training loss: 3.3025 0.4793 sec/batch
Epoch 1/40  Iteration 124/11040 Training loss: 3.3013 0.4763 sec/batch
Epoch 1/40  Iteration 125/11040 Training loss: 3.3001 0.4744 sec/batch
Epoch 1/40  Iteration 126/11040 Training loss: 3.2987 0.4744 sec/batch
Epoch 1/40  Iteration 127/11040 Training loss: 3.2974 0.4774 sec/batch
Epoch 1/40  Iteration 128/11040 Training loss: 3.2960 0.4766 sec/batch
Epoch 1/40  Iteration 129/11040 Training loss: 3.2947 0.4744 sec/batch
Epoch 1/40  Iteration 130/11040 Training loss: 3.2933 0.4733 sec/batch
Epoch 1/40  Iteration 131/11040 Training loss: 3.2919 0.4774 sec/batch
...

Epoch 1/40  Iteration 234/11040 Training loss: 3.0591 0.4795 sec/batch
Epoch 1/40  Iteration 235/11040 Training loss: 3.0564 0.4794 sec/batch
Epoch 1/40  Iteration 236/11040 Training loss: 3.0537 0.4785 sec/batch
Epoch 1/40  Iteration 237/11040 Training loss: 3.0510 0.4784 sec/batch
Epoch 1/40  Iteration 238/11040 Training loss: 3.0484 0.4815 sec/batch
Epoch 1/40  Iteration 239/11040 Training loss: 3.0459 0.4784 sec/batch
Epoch 1/40  Iteration 240/11040 Training loss: 3.0433 0.4794 sec/batch
Epoch 1/40  Iteration 241/11040 Training loss: 3.0407 0.4764 sec/batch
Epoch 1/40  Iteration 242/11040 Training loss: 3.0381 0.4784 sec/batch
Epoch 1/40  Iteration 243/11040 Training loss: 3.0354 0.4804 sec/batch
Epoch 1/40  Iteration 244/11040 Training loss: 3.0327 0.4773 sec/batch
Epoch 1/40  Iteration 245/11040 Training loss: 3.0301 0.4783 sec/batch
Epoch 1/40  Iteration 246/11040 Training loss: 3.0276 0.4844 sec/batch
Epoch 1/40  Iteration 247/11040 Training loss: 3.0251 0.4775 sec/batch
...

Epoch 2/40  Iteration 350/11040 Training loss: 2.2782 0.4841 sec/batch
Epoch 2/40  Iteration 351/11040 Training loss: 2.2771 0.4788 sec/batch
Epoch 2/40  Iteration 352/11040 Training loss: 2.2761 0.4790 sec/batch
Epoch 2/40  Iteration 353/11040 Training loss: 2.2751 0.4816 sec/batch
Epoch 2/40  Iteration 354/11040 Training loss: 2.2743 0.4773 sec/batch
Epoch 2/40  Iteration 355/11040 Training loss: 2.2730 0.4656 sec/batch
Epoch 2/40  Iteration 356/11040 Training loss: 2.2717 0.4750 sec/batch
Epoch 2/40  Iteration 357/11040 Training loss: 2.2702 0.4809 sec/batch
Epoch 2/40  Iteration 358/11040 Training loss: 2.2687 0.4752 sec/batch
Epoch 2/40  Iteration 359/11040 Training loss: 2.2674 0.4832 sec/batch
Epoch 2/40  Iteration 360/11040 Training loss: 2.2658 0.4768 sec/batch
Epoch 2/40  Iteration 361/11040 Training loss: 2.2645 0.4655 sec/batch
Epoch 2/40  Iteration 362/11040 Training loss: 2.2634 0.4751 sec/batch
Epoch 2/40  Iteration 363/11040 Training loss: 2.2622 0.4859 sec/batch
...

Epoch 2/40  Iteration 466/11040 Training loss: 2.1903 0.4676 sec/batch
Epoch 2/40  Iteration 467/11040 Training loss: 2.1896 0.4772 sec/batch
Epoch 2/40  Iteration 468/11040 Training loss: 2.1888 0.4795 sec/batch
Epoch 2/40  Iteration 469/11040 Training loss: 2.1881 0.4904 sec/batch
Epoch 2/40  Iteration 470/11040 Training loss: 2.1875 0.4871 sec/batch
Epoch 2/40  Iteration 471/11040 Training loss: 2.1870 0.4788 sec/batch
Epoch 2/40  Iteration 472/11040 Training loss: 2.1864 0.4830 sec/batch
Epoch 2/40  Iteration 473/11040 Training loss: 2.1857 0.4915 sec/batch
Epoch 2/40  Iteration 474/11040 Training loss: 2.1851 0.4780 sec/batch
Epoch 2/40  Iteration 475/11040 Training loss: 2.1847 0.4727 sec/batch
Epoch 2/40  Iteration 476/11040 Training loss: 2.1841 0.4810 sec/batch
Epoch 2/40  Iteration 477/11040 Training loss: 2.1834 0.4764 sec/batch
Epoch 2/40  Iteration 478/11040 Training loss: 2.1827 0.4806 sec/batch
Epoch 2/40  Iteration 479/11040 Training loss: 2.1822 0.4684 sec/batch
...

Epoch 3/40  Iteration 581/11040 Training loss: 1.9673 0.4733 sec/batch
Epoch 3/40  Iteration 582/11040 Training loss: 1.9676 0.4764 sec/batch
Epoch 3/40  Iteration 583/11040 Training loss: 1.9667 0.4784 sec/batch
Epoch 3/40  Iteration 584/11040 Training loss: 1.9656 0.4755 sec/batch
Epoch 3/40  Iteration 585/11040 Training loss: 1.9659 0.4744 sec/batch
Epoch 3/40  Iteration 586/11040 Training loss: 1.9658 0.4774 sec/batch
Epoch 3/40  Iteration 587/11040 Training loss: 1.9656 0.4744 sec/batch
Epoch 3/40  Iteration 588/11040 Training loss: 1.9654 0.4784 sec/batch
Epoch 3/40  Iteration 589/11040 Training loss: 1.9651 0.4765 sec/batch
Epoch 3/40  Iteration 590/11040 Training loss: 1.9646 0.4744 sec/batch
Epoch 3/40  Iteration 591/11040 Training loss: 1.9640 0.4754 sec/batch
Epoch 3/40  Iteration 592/11040 Training loss: 1.9644 0.4734 sec/batch
Epoch 3/40  Iteration 593/11040 Training loss: 1.9637 0.4744 sec/batch
Epoch 3/40  Iteration 594/11040 Training loss: 1.9627 0.4774 sec/batch
...

Epoch 3/40  Iteration 697/11040 Training loss: 1.9231 0.4783 sec/batch
Epoch 3/40  Iteration 698/11040 Training loss: 1.9228 0.4783 sec/batch
Epoch 3/40  Iteration 699/11040 Training loss: 1.9224 0.4765 sec/batch
Epoch 3/40  Iteration 700/11040 Training loss: 1.9219 0.4794 sec/batch
Epoch 3/40  Iteration 701/11040 Training loss: 1.9213 0.4784 sec/batch
Epoch 3/40  Iteration 702/11040 Training loss: 1.9210 0.4804 sec/batch
Epoch 3/40  Iteration 703/11040 Training loss: 1.9208 0.4754 sec/batch
Epoch 3/40  Iteration 704/11040 Training loss: 1.9205 0.4764 sec/batch
Epoch 3/40  Iteration 705/11040 Training loss: 1.9200 0.4763 sec/batch
Epoch 3/40  Iteration 706/11040 Training loss: 1.9197 0.4776 sec/batch
Epoch 3/40  Iteration 707/11040 Training loss: 1.9194 0.4785 sec/batch
Epoch 3/40  Iteration 708/11040 Training loss: 1.9190 0.4746 sec/batch
Epoch 3/40  Iteration 709/11040 Training loss: 1.9185 0.4764 sec/batch
Epoch 3/40  Iteration 710/11040 Training loss: 1.9183 0.4766 sec/batch
...

Epoch 3/40  Iteration 813/11040 Training loss: 1.8872 0.4813 sec/batch
Epoch 3/40  Iteration 814/11040 Training loss: 1.8870 0.4803 sec/batch
Epoch 3/40  Iteration 815/11040 Training loss: 1.8868 0.4763 sec/batch
Epoch 3/40  Iteration 816/11040 Training loss: 1.8865 0.4774 sec/batch
Epoch 3/40  Iteration 817/11040 Training loss: 1.8862 0.4793 sec/batch
Epoch 3/40  Iteration 818/11040 Training loss: 1.8860 0.4814 sec/batch
Epoch 3/40  Iteration 819/11040 Training loss: 1.8857 0.4764 sec/batch
Epoch 3/40  Iteration 820/11040 Training loss: 1.8854 0.4774 sec/batch
Epoch 3/40  Iteration 821/11040 Training loss: 1.8852 0.4773 sec/batch
Epoch 3/40  Iteration 822/11040 Training loss: 1.8851 0.4774 sec/batch
Epoch 3/40  Iteration 823/11040 Training loss: 1.8851 0.4764 sec/batch
Epoch 3/40  Iteration 824/11040 Training loss: 1.8850 0.4785 sec/batch
Epoch 3/40  Iteration 825/11040 Training loss: 1.8847 0.4744 sec/batch
Epoch 3/40  Iteration 826/11040 Training loss: 1.8844 0.4744 sec/batch
...

Epoch 4/40  Iteration 929/11040 Training loss: 1.7692 0.4783 sec/batch
Epoch 4/40  Iteration 930/11040 Training loss: 1.7689 0.4763 sec/batch
Epoch 4/40  Iteration 931/11040 Training loss: 1.7681 0.4784 sec/batch
Epoch 4/40  Iteration 932/11040 Training loss: 1.7682 0.4773 sec/batch
Epoch 4/40  Iteration 933/11040 Training loss: 1.7677 0.4783 sec/batch
Epoch 4/40  Iteration 934/11040 Training loss: 1.7675 0.4783 sec/batch
Epoch 4/40  Iteration 935/11040 Training loss: 1.7679 0.4784 sec/batch
Epoch 4/40  Iteration 936/11040 Training loss: 1.7681 0.4793 sec/batch
Epoch 4/40  Iteration 937/11040 Training loss: 1.7683 0.4804 sec/batch
Epoch 4/40  Iteration 938/11040 Training loss: 1.7681 0.4803 sec/batch
Epoch 4/40  Iteration 939/11040 Training loss: 1.7681 0.4773 sec/batch
Epoch 4/40  Iteration 940/11040 Training loss: 1.7680 0.4743 sec/batch
Epoch 4/40  Iteration 941/11040 Training loss: 1.7681 0.4783 sec/batch
Epoch 4/40  Iteration 942/11040 Training loss: 1.7682 0.4803 sec/batch
...

Epoch 4/40  Iteration 1044/11040 Training loss: 1.7549 0.4824 sec/batch
Epoch 4/40  Iteration 1045/11040 Training loss: 1.7546 0.4773 sec/batch
Epoch 4/40  Iteration 1046/11040 Training loss: 1.7543 0.4784 sec/batch
Epoch 4/40  Iteration 1047/11040 Training loss: 1.7541 0.4787 sec/batch
Epoch 4/40  Iteration 1048/11040 Training loss: 1.7540 0.4775 sec/batch
Epoch 4/40  Iteration 1049/11040 Training loss: 1.7539 0.4774 sec/batch
Epoch 4/40  Iteration 1050/11040 Training loss: 1.7535 0.4783 sec/batch
Epoch 4/40  Iteration 1051/11040 Training loss: 1.7533 0.4774 sec/batch
Epoch 4/40  Iteration 1052/11040 Training loss: 1.7532 0.4814 sec/batch
Epoch 4/40  Iteration 1053/11040 Training loss: 1.7532 0.4804 sec/batch
Epoch 4/40  Iteration 1054/11040 Training loss: 1.7530 0.4796 sec/batch
Epoch 4/40  Iteration 1055/11040 Training loss: 1.7527 0.4804 sec/batch
Epoch 4/40  Iteration 1056/11040 Training loss: 1.7523 0.4796 sec/batch
Epoch 4/40  Iteration 1057/11040 Training loss: 1.7522 0.4793 sec/batch
...

Epoch 5/40  Iteration 1158/11040 Training loss: 1.6749 0.4673 sec/batch
Epoch 5/40  Iteration 1159/11040 Training loss: 1.6748 0.4713 sec/batch
Epoch 5/40  Iteration 1160/11040 Training loss: 1.6747 0.4752 sec/batch
Epoch 5/40  Iteration 1161/11040 Training loss: 1.6744 0.4752 sec/batch
Epoch 5/40  Iteration 1162/11040 Training loss: 1.6743 0.4699 sec/batch
Epoch 5/40  Iteration 1163/11040 Training loss: 1.6746 0.4709 sec/batch
Epoch 5/40  Iteration 1164/11040 Training loss: 1.6742 0.4836 sec/batch
Epoch 5/40  Iteration 1165/11040 Training loss: 1.6739 0.4747 sec/batch
Epoch 5/40  Iteration 1166/11040 Training loss: 1.6731 0.4626 sec/batch
Epoch 5/40  Iteration 1167/11040 Training loss: 1.6728 0.4785 sec/batch
Epoch 5/40  Iteration 1168/11040 Training loss: 1.6726 0.4749 sec/batch
Epoch 5/40  Iteration 1169/11040 Training loss: 1.6724 0.4768 sec/batch
Epoch 5/40  Iteration 1170/11040 Training loss: 1.6717 0.4644 sec/batch
Epoch 5/40  Iteration 1171/11040 Training loss: 1.6711 0.4763 sec/batch
...

Epoch 5/40  Iteration 1272/11040 Training loss: 1.6619 0.4738 sec/batch
Epoch 5/40  Iteration 1273/11040 Training loss: 1.6615 0.4818 sec/batch
Epoch 5/40  Iteration 1274/11040 Training loss: 1.6614 0.4737 sec/batch
Epoch 5/40  Iteration 1275/11040 Training loss: 1.6615 0.4772 sec/batch
Epoch 5/40  Iteration 1276/11040 Training loss: 1.6614 0.4634 sec/batch
Epoch 5/40  Iteration 1277/11040 Training loss: 1.6613 0.4721 sec/batch
Epoch 5/40  Iteration 1278/11040 Training loss: 1.6611 0.4851 sec/batch
Epoch 5/40  Iteration 1279/11040 Training loss: 1.6610 0.4654 sec/batch
Epoch 5/40  Iteration 1280/11040 Training loss: 1.6608 0.4909 sec/batch
Epoch 5/40  Iteration 1281/11040 Training loss: 1.6607 0.4809 sec/batch
Epoch 5/40  Iteration 1282/11040 Training loss: 1.6606 0.4766 sec/batch
Epoch 5/40  Iteration 1283/11040 Training loss: 1.6607 0.4736 sec/batch
Epoch 5/40  Iteration 1284/11040 Training loss: 1.6607 0.4651 sec/batch
Epoch 5/40  Iteration 1285/11040 Training loss: 1.6605 0.4598 sec/batch
...

Epoch 6/40  Iteration 1386/11040 Training loss: 1.6306 0.4648 sec/batch
Epoch 6/40  Iteration 1387/11040 Training loss: 1.6215 0.4784 sec/batch
Epoch 6/40  Iteration 1388/11040 Training loss: 1.6197 0.4762 sec/batch
Epoch 6/40  Iteration 1389/11040 Training loss: 1.6177 0.4649 sec/batch
Epoch 6/40  Iteration 1390/11040 Training loss: 1.6172 0.4784 sec/batch
Epoch 6/40  Iteration 1391/11040 Training loss: 1.6166 0.4640 sec/batch
Epoch 6/40  Iteration 1392/11040 Training loss: 1.6168 0.4579 sec/batch
Epoch 6/40  Iteration 1393/11040 Training loss: 1.6142 0.4751 sec/batch
Epoch 6/40  Iteration 1394/11040 Training loss: 1.6111 0.4827 sec/batch
Epoch 6/40  Iteration 1395/11040 Training loss: 1.6105 0.4753 sec/batch
Epoch 6/40  Iteration 1396/11040 Training loss: 1.6103 0.4753 sec/batch
Epoch 6/40  Iteration 1397/11040 Training loss: 1.6076 0.4753 sec/batch
Epoch 6/40  Iteration 1398/11040 Training loss: 1.6064 0.4773 sec/batch
Epoch 6/40  Iteration 1399/11040 Training loss: 1.6056 0.4760 sec/batch
...

Epoch 6/40  Iteration 1500/11040 Training loss: 1.5931 0.4791 sec/batch
Validation loss: 1.50083 Saving checkpoint!
Epoch 6/40  Iteration 1501/11040 Training loss: 1.5937 0.4845 sec/batch
Epoch 6/40  Iteration 1502/11040 Training loss: 1.5937 0.4825 sec/batch
Epoch 6/40  Iteration 1503/11040 Training loss: 1.5941 0.4794 sec/batch
Epoch 6/40  Iteration 1504/11040 Training loss: 1.5945 0.4793 sec/batch
Epoch 6/40  Iteration 1505/11040 Training loss: 1.5945 0.4791 sec/batch
Epoch 6/40  Iteration 1506/11040 Training loss: 1.5948 0.4843 sec/batch
Epoch 6/40  Iteration 1507/11040 Training loss: 1.5947 0.4787 sec/batch
Epoch 6/40  Iteration 1508/11040 Training loss: 1.5947 0.4790 sec/batch
Epoch 6/40  Iteration 1509/11040 Training loss: 1.5945 0.4797 sec/batch
Epoch 6/40  Iteration 1510/11040 Training loss: 1.5946 0.4777 sec/batch
Epoch 6/40  Iteration 1511/11040 Training loss: 1.5944 0.4635 sec/batch
Epoch 6/40  Iteration 1512/11040 Training loss: 1.5940 0.4782 sec/batch
...

Epoch 6/40  Iteration 1614/11040 Training loss: 1.5870 0.4840 sec/batch
Epoch 6/40  Iteration 1615/11040 Training loss: 1.5870 0.4799 sec/batch
Epoch 6/40  Iteration 1616/11040 Training loss: 1.5868 0.4799 sec/batch
Epoch 6/40  Iteration 1617/11040 Training loss: 1.5866 0.4781 sec/batch
Epoch 6/40  Iteration 1618/11040 Training loss: 1.5865 0.4687 sec/batch
Epoch 6/40  Iteration 1619/11040 Training loss: 1.5865 0.4783 sec/batch
Epoch 6/40  Iteration 1620/11040 Training loss: 1.5863 0.4643 sec/batch
Epoch 6/40  Iteration 1621/11040 Training loss: 1.5862 0.4639 sec/batch
Epoch 6/40  Iteration 1622/11040 Training loss: 1.5860 0.4626 sec/batch
Epoch 6/40  Iteration 1623/11040 Training loss: 1.5858 0.4641 sec/batch
Epoch 6/40  Iteration 1624/11040 Training loss: 1.5855 0.4790 sec/batch
Epoch 6/40  Iteration 1625/11040 Training loss: 1.5853 0.4626 sec/batch
Epoch 6/40  Iteration 1626/11040 Training loss: 1.5852 0.4791 sec/batch
Epoch 6/40  Iteration 1627/11040 Training loss: 1.5851 0.4794 sec/batch
...

Epoch 7/40  Iteration 1728/11040 Training loss: 1.5408 0.4790 sec/batch
Epoch 7/40  Iteration 1729/11040 Training loss: 1.5401 0.4630 sec/batch
Epoch 7/40  Iteration 1730/11040 Training loss: 1.5400 0.4592 sec/batch
Epoch 7/40  Iteration 1731/11040 Training loss: 1.5396 0.4780 sec/batch
Epoch 7/40  Iteration 1732/11040 Training loss: 1.5397 0.4796 sec/batch
Epoch 7/40  Iteration 1733/11040 Training loss: 1.5394 0.4798 sec/batch
Epoch 7/40  Iteration 1734/11040 Training loss: 1.5394 0.4777 sec/batch
Epoch 7/40  Iteration 1735/11040 Training loss: 1.5389 0.4846 sec/batch
Epoch 7/40  Iteration 1736/11040 Training loss: 1.5387 0.4742 sec/batch
Epoch 7/40  Iteration 1737/11040 Training loss: 1.5382 0.4837 sec/batch
Epoch 7/40  Iteration 1738/11040 Training loss: 1.5380 0.4740 sec/batch
Epoch 7/40  Iteration 1739/11040 Training loss: 1.5376 0.4689 sec/batch
Epoch 7/40  Iteration 1740/11040 Training loss: 1.5373 0.4797 sec/batch
Epoch 7/40  Iteration 1741/11040 Training loss: 1.5369 0.4630 sec/batch
...

Epoch 7/40  Iteration 1842/11040 Training loss: 1.5387 0.4865 sec/batch
Epoch 7/40  Iteration 1843/11040 Training loss: 1.5389 0.4735 sec/batch
Epoch 7/40  Iteration 1844/11040 Training loss: 1.5389 0.4825 sec/batch
Epoch 7/40  Iteration 1845/11040 Training loss: 1.5387 0.4786 sec/batch
Epoch 7/40  Iteration 1846/11040 Training loss: 1.5386 0.4789 sec/batch
Epoch 7/40  Iteration 1847/11040 Training loss: 1.5384 0.4796 sec/batch
Epoch 7/40  Iteration 1848/11040 Training loss: 1.5383 0.4794 sec/batch
Epoch 7/40  Iteration 1849/11040 Training loss: 1.5381 0.4792 sec/batch
Epoch 7/40  Iteration 1850/11040 Training loss: 1.5381 0.4841 sec/batch
Epoch 7/40  Iteration 1851/11040 Training loss: 1.5383 0.4739 sec/batch
Epoch 7/40  Iteration 1852/11040 Training loss: 1.5383 0.4629 sec/batch
Epoch 7/40  Iteration 1853/11040 Training loss: 1.5383 0.4648 sec/batch
Epoch 7/40  Iteration 1854/11040 Training loss: 1.5383 0.4681 sec/batch
Epoch 7/40  Iteration 1855/11040 Training loss: 1.5385 0.4731 sec/batch
...

Epoch 8/40  Iteration 1956/11040 Training loss: 1.5110 0.4793 sec/batch
Epoch 8/40  Iteration 1957/11040 Training loss: 1.5096 0.4803 sec/batch
Epoch 8/40  Iteration 1958/11040 Training loss: 1.5087 0.4841 sec/batch
Epoch 8/40  Iteration 1959/11040 Training loss: 1.5076 0.4788 sec/batch
Epoch 8/40  Iteration 1960/11040 Training loss: 1.5073 0.4792 sec/batch
Epoch 8/40  Iteration 1961/11040 Training loss: 1.5070 0.4794 sec/batch
Epoch 8/40  Iteration 1962/11040 Training loss: 1.5075 0.4781 sec/batch
Epoch 8/40  Iteration 1963/11040 Training loss: 1.5069 0.4689 sec/batch
Epoch 8/40  Iteration 1964/11040 Training loss: 1.5066 0.4630 sec/batch
Epoch 8/40  Iteration 1965/11040 Training loss: 1.5068 0.4804 sec/batch
Epoch 8/40  Iteration 1966/11040 Training loss: 1.5064 0.4777 sec/batch
Epoch 8/40  Iteration 1967/11040 Training loss: 1.5064 0.4845 sec/batch
Epoch 8/40  Iteration 1968/11040 Training loss: 1.5067 0.4839 sec/batch
Epoch 8/40  Iteration 1969/11040 Training loss: 1.5066 0.4852 sec/batch
...

Epoch 8/40  Iteration 2070/11040 Training loss: 1.5002 0.4711 sec/batch
Epoch 8/40  Iteration 2071/11040 Training loss: 1.5003 0.4711 sec/batch
Epoch 8/40  Iteration 2072/11040 Training loss: 1.5003 0.4789 sec/batch
Epoch 8/40  Iteration 2073/11040 Training loss: 1.5003 0.4739 sec/batch
Epoch 8/40  Iteration 2074/11040 Training loss: 1.5004 0.4686 sec/batch
Epoch 8/40  Iteration 2075/11040 Training loss: 1.5002 0.4643 sec/batch
Epoch 8/40  Iteration 2076/11040 Training loss: 1.5001 0.4684 sec/batch
Epoch 8/40  Iteration 2077/11040 Training loss: 1.5000 0.4633 sec/batch
Epoch 8/40  Iteration 2078/11040 Training loss: 1.4999 0.4798 sec/batch
Epoch 8/40  Iteration 2079/11040 Training loss: 1.4999 0.4782 sec/batch
Epoch 8/40  Iteration 2080/11040 Training loss: 1.4997 0.4839 sec/batch
Epoch 8/40  Iteration 2081/11040 Training loss: 1.4993 0.4799 sec/batch
Epoch 8/40  Iteration 2082/11040 Training loss: 1.4993 0.4738 sec/batch
Epoch 8/40  Iteration 2083/11040 Training loss: 1.4993 0.4686 sec/batch
...

Epoch 8/40  Iteration 2184/11040 Training loss: 1.4959 0.4843 sec/batch
Epoch 8/40  Iteration 2185/11040 Training loss: 1.4959 0.4849 sec/batch
Epoch 8/40  Iteration 2186/11040 Training loss: 1.4960 0.4842 sec/batch
Epoch 8/40  Iteration 2187/11040 Training loss: 1.4959 0.4843 sec/batch
Epoch 8/40  Iteration 2188/11040 Training loss: 1.4959 0.4793 sec/batch
Epoch 8/40  Iteration 2189/11040 Training loss: 1.4958 0.4785 sec/batch
Epoch 8/40  Iteration 2190/11040 Training loss: 1.4958 0.4635 sec/batch
Epoch 8/40  Iteration 2191/11040 Training loss: 1.4957 0.4798 sec/batch
Epoch 8/40  Iteration 2192/11040 Training loss: 1.4956 0.4836 sec/batch
Epoch 8/40  Iteration 2193/11040 Training loss: 1.4955 0.4743 sec/batch
Epoch 8/40  Iteration 2194/11040 Training loss: 1.4956 0.4839 sec/batch
Epoch 8/40  Iteration 2195/11040 Training loss: 1.4955 0.4842 sec/batch
Epoch 8/40  Iteration 2196/11040 Training loss: 1.4956 0.4843 sec/batch
Epoch 8/40  Iteration 2197/11040 Training loss: 1.4955 0.4752 sec/batch
...

Epoch 9/40  Iteration 2298/11040 Training loss: 1.4610 0.4681 sec/batch
Epoch 9/40  Iteration 2299/11040 Training loss: 1.4615 0.4638 sec/batch
Epoch 9/40  Iteration 2300/11040 Training loss: 1.4614 0.4794 sec/batch
Epoch 9/40  Iteration 2301/11040 Training loss: 1.4612 0.4785 sec/batch
Epoch 9/40  Iteration 2302/11040 Training loss: 1.4615 0.4806 sec/batch
Epoch 9/40  Iteration 2303/11040 Training loss: 1.4620 0.4694 sec/batch
Epoch 9/40  Iteration 2304/11040 Training loss: 1.4623 0.4824 sec/batch
Epoch 9/40  Iteration 2305/11040 Training loss: 1.4627 0.4793 sec/batch
Epoch 9/40  Iteration 2306/11040 Training loss: 1.4630 0.4742 sec/batch
Epoch 9/40  Iteration 2307/11040 Training loss: 1.4631 0.4782 sec/batch
Epoch 9/40  Iteration 2308/11040 Training loss: 1.4628 0.4802 sec/batch
Epoch 9/40  Iteration 2309/11040 Training loss: 1.4630 0.4834 sec/batch
Epoch 9/40  Iteration 2310/11040 Training loss: 1.4628 0.4867 sec/batch
Epoch 9/40  Iteration 2311/11040 Training loss: 1.4622 0.4703 sec/batch
...

Epoch 9/40  Iteration 2412/11040 Training loss: 1.4666 0.4807 sec/batch
Epoch 9/40  Iteration 2413/11040 Training loss: 1.4665 0.4791 sec/batch
Epoch 9/40  Iteration 2414/11040 Training loss: 1.4664 0.4776 sec/batch
Epoch 9/40  Iteration 2415/11040 Training loss: 1.4664 0.4739 sec/batch
Epoch 9/40  Iteration 2416/11040 Training loss: 1.4663 0.4846 sec/batch
Epoch 9/40  Iteration 2417/11040 Training loss: 1.4663 0.4790 sec/batch
Epoch 9/40  Iteration 2418/11040 Training loss: 1.4662 0.4798 sec/batch
Epoch 9/40  Iteration 2419/11040 Training loss: 1.4662 0.4729 sec/batch
Epoch 9/40  Iteration 2420/11040 Training loss: 1.4662 0.4853 sec/batch
Epoch 9/40  Iteration 2421/11040 Training loss: 1.4662 0.4783 sec/batch
Epoch 9/40  Iteration 2422/11040 Training loss: 1.4661 0.4804 sec/batch
Epoch 9/40  Iteration 2423/11040 Training loss: 1.4659 0.4779 sec/batch
Epoch 9/40  Iteration 2424/11040 Training loss: 1.4659 0.4789 sec/batch
Epoch 9/40  Iteration 2425/11040 Training loss: 1.4657 0.4795 sec/batch
...

Epoch 10/40  Iteration 2525/11040 Training loss: 1.4463 0.4639 sec/batch
Epoch 10/40  Iteration 2526/11040 Training loss: 1.4454 0.4790 sec/batch
Epoch 10/40  Iteration 2527/11040 Training loss: 1.4457 0.4787 sec/batch
Epoch 10/40  Iteration 2528/11040 Training loss: 1.4455 0.4791 sec/batch
Epoch 10/40  Iteration 2529/11040 Training loss: 1.4453 0.4790 sec/batch
Epoch 10/40  Iteration 2530/11040 Training loss: 1.4447 0.4798 sec/batch
Epoch 10/40  Iteration 2531/11040 Training loss: 1.4440 0.4735 sec/batch
Epoch 10/40  Iteration 2532/11040 Training loss: 1.4437 0.4844 sec/batch
Epoch 10/40  Iteration 2533/11040 Training loss: 1.4428 0.4789 sec/batch
Epoch 10/40  Iteration 2534/11040 Training loss: 1.4431 0.4805 sec/batch
Epoch 10/40  Iteration 2535/11040 Training loss: 1.4429 0.4731 sec/batch
Epoch 10/40  Iteration 2536/11040 Training loss: 1.4424 0.4840 sec/batch
Epoch 10/40  Iteration 2537/11040 Training loss: 1.4428 0.4789 sec/batch
...

Epoch 10/40  Iteration 2638/11040 Training loss: 1.4394 0.4769 sec/batch
Epoch 10/40  Iteration 2639/11040 Training loss: 1.4393 0.4728 sec/batch
Epoch 10/40  Iteration 2640/11040 Training loss: 1.4393 0.4734 sec/batch
Epoch 10/40  Iteration 2641/11040 Training loss: 1.4391 0.4763 sec/batch
Epoch 10/40  Iteration 2642/11040 Training loss: 1.4391 0.4742 sec/batch
Epoch 10/40  Iteration 2643/11040 Training loss: 1.4392 0.4570 sec/batch
Epoch 10/40  Iteration 2644/11040 Training loss: 1.4393 0.4732 sec/batch
Epoch 10/40  Iteration 2645/11040 Training loss: 1.4394 0.4754 sec/batch
Epoch 10/40  Iteration 2646/11040 Training loss: 1.4394 0.4746 sec/batch
Epoch 10/40  Iteration 2647/11040 Training loss: 1.4394 0.4754 sec/batch
Epoch 10/40  Iteration 2648/11040 Training loss: 1.4395 0.4740 sec/batch
Epoch 10/40  Iteration 2649/11040 Training loss: 1.4395 0.4738 sec/batch
Epoch 10/40  Iteration 2650/11040 Training loss: 1.4397 0.4615 sec/batch
...

Epoch 10/40  Iteration 2751/11040 Training loss: 1.4369 0.4741 sec/batch
Epoch 10/40  Iteration 2752/11040 Training loss: 1.4369 0.4767 sec/batch
Epoch 10/40  Iteration 2753/11040 Training loss: 1.4370 0.4737 sec/batch
Epoch 10/40  Iteration 2754/11040 Training loss: 1.4370 0.4770 sec/batch
Epoch 10/40  Iteration 2755/11040 Training loss: 1.4373 0.4748 sec/batch
Epoch 10/40  Iteration 2756/11040 Training loss: 1.4374 0.4763 sec/batch
Epoch 10/40  Iteration 2757/11040 Training loss: 1.4375 0.4749 sec/batch
Epoch 10/40  Iteration 2758/11040 Training loss: 1.4374 0.4918 sec/batch
Epoch 10/40  Iteration 2759/11040 Training loss: 1.4373 0.4737 sec/batch
Epoch 10/40  Iteration 2760/11040 Training loss: 1.4373 0.4770 sec/batch
Epoch 11/40  Iteration 2761/11040 Training loss: 1.4978 0.4856 sec/batch
Epoch 11/40  Iteration 2762/11040 Training loss: 1.4544 0.4823 sec/batch
Epoch 11/40  Iteration 2763/11040 Training loss: 1.4494 0.4759 sec/batch
...

Epoch 11/40  Iteration 2864/11040 Training loss: 1.4126 0.4745 sec/batch
Epoch 11/40  Iteration 2865/11040 Training loss: 1.4123 0.4732 sec/batch
Epoch 11/40  Iteration 2866/11040 Training loss: 1.4123 0.4616 sec/batch
Epoch 11/40  Iteration 2867/11040 Training loss: 1.4126 0.4574 sec/batch
Epoch 11/40  Iteration 2868/11040 Training loss: 1.4128 0.4598 sec/batch
Epoch 11/40  Iteration 2869/11040 Training loss: 1.4131 0.4586 sec/batch
Epoch 11/40  Iteration 2870/11040 Training loss: 1.4132 0.4749 sec/batch
Epoch 11/40  Iteration 2871/11040 Training loss: 1.4133 0.4737 sec/batch
Epoch 11/40  Iteration 2872/11040 Training loss: 1.4135 0.4769 sec/batch
Epoch 11/40  Iteration 2873/11040 Training loss: 1.4138 0.4742 sec/batch
Epoch 11/40  Iteration 2874/11040 Training loss: 1.4140 0.4782 sec/batch
Epoch 11/40  Iteration 2875/11040 Training loss: 1.4145 0.4694 sec/batch
Epoch 11/40  Iteration 2876/11040 Training loss: 1.4149 0.4752 sec/batch
...

Epoch 11/40  Iteration 2977/11040 Training loss: 1.4168 0.4863 sec/batch
Epoch 11/40  Iteration 2978/11040 Training loss: 1.4166 0.4752 sec/batch
Epoch 11/40  Iteration 2979/11040 Training loss: 1.4165 0.4740 sec/batch
Epoch 11/40  Iteration 2980/11040 Training loss: 1.4165 0.4740 sec/batch
Epoch 11/40  Iteration 2981/11040 Training loss: 1.4164 0.4745 sec/batch
Epoch 11/40  Iteration 2982/11040 Training loss: 1.4163 0.4611 sec/batch
Epoch 11/40  Iteration 2983/11040 Training loss: 1.4161 0.4583 sec/batch
Epoch 11/40  Iteration 2984/11040 Training loss: 1.4163 0.4697 sec/batch
Epoch 11/40  Iteration 2985/11040 Training loss: 1.4163 0.4806 sec/batch
Epoch 11/40  Iteration 2986/11040 Training loss: 1.4163 0.4756 sec/batch
Epoch 11/40  Iteration 2987/11040 Training loss: 1.4161 0.4720 sec/batch
Epoch 11/40  Iteration 2988/11040 Training loss: 1.4160 0.4747 sec/batch
Epoch 11/40  Iteration 2989/11040 Training loss: 1.4160 0.4762 sec/batch
Epoch 11/40  Iteration 2990/11040 Training loss: 1.

Epoch 12/40  Iteration 3089/11040 Training loss: 1.3965 0.4750 sec/batch
Epoch 12/40  Iteration 3090/11040 Training loss: 1.3963 0.4739 sec/batch
Epoch 12/40  Iteration 3091/11040 Training loss: 1.3964 0.4750 sec/batch
Epoch 12/40  Iteration 3092/11040 Training loss: 1.3964 0.4741 sec/batch
Epoch 12/40  Iteration 3093/11040 Training loss: 1.3961 0.4612 sec/batch
Epoch 12/40  Iteration 3094/11040 Training loss: 1.3961 0.4584 sec/batch
Epoch 12/40  Iteration 3095/11040 Training loss: 1.3964 0.4593 sec/batch
Epoch 12/40  Iteration 3096/11040 Training loss: 1.3960 0.4585 sec/batch
Epoch 12/40  Iteration 3097/11040 Training loss: 1.3961 0.4739 sec/batch
Epoch 12/40  Iteration 3098/11040 Training loss: 1.3955 0.4727 sec/batch
Epoch 12/40  Iteration 3099/11040 Training loss: 1.3954 0.4624 sec/batch
Epoch 12/40  Iteration 3100/11040 Training loss: 1.3952 0.4732 sec/batch
Epoch 12/40  Iteration 3101/11040 Training loss: 1.3951 0.4611 sec/batch
Epoch 12/40  Iteration 3102/11040 Training loss: 1.

Epoch 12/40  Iteration 3202/11040 Training loss: 1.3951 0.4705 sec/batch
Epoch 12/40  Iteration 3203/11040 Training loss: 1.3951 0.4729 sec/batch
Epoch 12/40  Iteration 3204/11040 Training loss: 1.3950 0.4747 sec/batch
Epoch 12/40  Iteration 3205/11040 Training loss: 1.3949 0.9804 sec/batch
Epoch 12/40  Iteration 3206/11040 Training loss: 1.3948 0.8787 sec/batch
Epoch 12/40  Iteration 3207/11040 Training loss: 1.3950 0.7119 sec/batch
Epoch 12/40  Iteration 3208/11040 Training loss: 1.3951 0.7237 sec/batch
Epoch 12/40  Iteration 3209/11040 Training loss: 1.3951 0.6660 sec/batch
Epoch 12/40  Iteration 3210/11040 Training loss: 1.3951 0.6315 sec/batch
Epoch 12/40  Iteration 3211/11040 Training loss: 1.3951 0.4718 sec/batch
Epoch 12/40  Iteration 3212/11040 Training loss: 1.3951 0.4737 sec/batch
Epoch 12/40  Iteration 3213/11040 Training loss: 1.3951 0.4737 sec/batch
Epoch 12/40  Iteration 3214/11040 Training loss: 1.3951 0.4750 sec/batch
Epoch 12/40  Iteration 3215/11040 Training loss: 1.

Epoch 13/40  Iteration 3315/11040 Training loss: 1.4108 0.4753 sec/batch
Epoch 13/40  Iteration 3316/11040 Training loss: 1.4098 0.4750 sec/batch
Epoch 13/40  Iteration 3317/11040 Training loss: 1.4101 0.4738 sec/batch
Epoch 13/40  Iteration 3318/11040 Training loss: 1.4064 0.4605 sec/batch
Epoch 13/40  Iteration 3319/11040 Training loss: 1.3984 0.4587 sec/batch
Epoch 13/40  Iteration 3320/11040 Training loss: 1.3962 0.4594 sec/batch
Epoch 13/40  Iteration 3321/11040 Training loss: 1.3939 0.4585 sec/batch
Epoch 13/40  Iteration 3322/11040 Training loss: 1.3943 0.4749 sec/batch
Epoch 13/40  Iteration 3323/11040 Training loss: 1.3922 0.4759 sec/batch
Epoch 13/40  Iteration 3324/11040 Training loss: 1.3918 0.4744 sec/batch
Epoch 13/40  Iteration 3325/11040 Training loss: 1.3898 0.4596 sec/batch
Epoch 13/40  Iteration 3326/11040 Training loss: 1.3868 0.4582 sec/batch
Epoch 13/40  Iteration 3327/11040 Training loss: 1.3870 0.4756 sec/batch
Epoch 13/40  Iteration 3328/11040 Training loss: 1.

Epoch 13/40  Iteration 3428/11040 Training loss: 1.3747 0.4730 sec/batch
Epoch 13/40  Iteration 3429/11040 Training loss: 1.3746 0.4747 sec/batch
Epoch 13/40  Iteration 3430/11040 Training loss: 1.3750 0.4762 sec/batch
Epoch 13/40  Iteration 3431/11040 Training loss: 1.3752 0.4730 sec/batch
Epoch 13/40  Iteration 3432/11040 Training loss: 1.3754 0.4742 sec/batch
Epoch 13/40  Iteration 3433/11040 Training loss: 1.3754 0.4782 sec/batch
Epoch 13/40  Iteration 3434/11040 Training loss: 1.3754 0.4736 sec/batch
Epoch 13/40  Iteration 3435/11040 Training loss: 1.3758 0.4731 sec/batch
Epoch 13/40  Iteration 3436/11040 Training loss: 1.3762 0.4738 sec/batch
Epoch 13/40  Iteration 3437/11040 Training loss: 1.3762 0.4755 sec/batch
Epoch 13/40  Iteration 3438/11040 Training loss: 1.3765 0.4751 sec/batch
Epoch 13/40  Iteration 3439/11040 Training loss: 1.3765 0.4740 sec/batch
Epoch 13/40  Iteration 3440/11040 Training loss: 1.3765 0.4734 sec/batch
Epoch 13/40  Iteration 3441/11040 Training loss: 1.

Epoch 13/40  Iteration 3540/11040 Training loss: 1.3776 0.4584 sec/batch
Epoch 13/40  Iteration 3541/11040 Training loss: 1.3777 0.4750 sec/batch
Epoch 13/40  Iteration 3542/11040 Training loss: 1.3778 0.4747 sec/batch
Epoch 13/40  Iteration 3543/11040 Training loss: 1.3778 0.4756 sec/batch
Epoch 13/40  Iteration 3544/11040 Training loss: 1.3778 0.4744 sec/batch
Epoch 13/40  Iteration 3545/11040 Training loss: 1.3777 0.4732 sec/batch
Epoch 13/40  Iteration 3546/11040 Training loss: 1.3775 0.4771 sec/batch
Epoch 13/40  Iteration 3547/11040 Training loss: 1.3776 0.4748 sec/batch
Epoch 13/40  Iteration 3548/11040 Training loss: 1.3776 0.4734 sec/batch
Epoch 13/40  Iteration 3549/11040 Training loss: 1.3775 0.4740 sec/batch
Epoch 13/40  Iteration 3550/11040 Training loss: 1.3775 0.4749 sec/batch
Epoch 13/40  Iteration 3551/11040 Training loss: 1.3775 0.4596 sec/batch
Epoch 13/40  Iteration 3552/11040 Training loss: 1.3776 0.4741 sec/batch
Epoch 13/40  Iteration 3553/11040 Training loss: 1.

Epoch 14/40  Iteration 3653/11040 Training loss: 1.3600 0.4740 sec/batch
Epoch 14/40  Iteration 3654/11040 Training loss: 1.3596 0.4753 sec/batch
Epoch 14/40  Iteration 3655/11040 Training loss: 1.3594 0.4742 sec/batch
Epoch 14/40  Iteration 3656/11040 Training loss: 1.3594 0.4772 sec/batch
Epoch 14/40  Iteration 3657/11040 Training loss: 1.3591 0.4720 sec/batch
Epoch 14/40  Iteration 3658/11040 Training loss: 1.3591 0.4615 sec/batch
Epoch 14/40  Iteration 3659/11040 Training loss: 1.3587 0.4593 sec/batch
Epoch 14/40  Iteration 3660/11040 Training loss: 1.3582 0.4586 sec/batch
Epoch 14/40  Iteration 3661/11040 Training loss: 1.3576 0.4576 sec/batch
Epoch 14/40  Iteration 3662/11040 Training loss: 1.3573 0.4743 sec/batch
Epoch 14/40  Iteration 3663/11040 Training loss: 1.3572 0.4595 sec/batch
Epoch 14/40  Iteration 3664/11040 Training loss: 1.3574 0.4741 sec/batch
Epoch 14/40  Iteration 3665/11040 Training loss: 1.3571 0.4760 sec/batch
Epoch 14/40  Iteration 3666/11040 Training loss: 1.

Epoch 14/40  Iteration 3766/11040 Training loss: 1.3602 0.4747 sec/batch
Epoch 14/40  Iteration 3767/11040 Training loss: 1.3604 0.4724 sec/batch
Epoch 14/40  Iteration 3768/11040 Training loss: 1.3604 0.4750 sec/batch
Epoch 14/40  Iteration 3769/11040 Training loss: 1.3604 0.4762 sec/batch
Epoch 14/40  Iteration 3770/11040 Training loss: 1.3604 0.4745 sec/batch
Epoch 14/40  Iteration 3771/11040 Training loss: 1.3603 0.4603 sec/batch
Epoch 14/40  Iteration 3772/11040 Training loss: 1.3604 0.4730 sec/batch
Epoch 14/40  Iteration 3773/11040 Training loss: 1.3603 0.4786 sec/batch
Epoch 14/40  Iteration 3774/11040 Training loss: 1.3604 0.4700 sec/batch
Epoch 14/40  Iteration 3775/11040 Training loss: 1.3606 0.4615 sec/batch
Epoch 14/40  Iteration 3776/11040 Training loss: 1.3607 0.4576 sec/batch
Epoch 14/40  Iteration 3777/11040 Training loss: 1.3607 0.4740 sec/batch
Epoch 14/40  Iteration 3778/11040 Training loss: 1.3607 0.4761 sec/batch
Epoch 14/40  Iteration 3779/11040 Training loss: 1.

Epoch 15/40  Iteration 3879/11040 Training loss: 1.3550 0.4607 sec/batch
Epoch 15/40  Iteration 3880/11040 Training loss: 1.3550 0.4574 sec/batch
Epoch 15/40  Iteration 3881/11040 Training loss: 1.3528 0.4751 sec/batch
Epoch 15/40  Iteration 3882/11040 Training loss: 1.3514 0.4589 sec/batch
Epoch 15/40  Iteration 3883/11040 Training loss: 1.3505 0.4602 sec/batch
Epoch 15/40  Iteration 3884/11040 Training loss: 1.3506 0.4731 sec/batch
Epoch 15/40  Iteration 3885/11040 Training loss: 1.3516 0.4603 sec/batch
Epoch 15/40  Iteration 3886/11040 Training loss: 1.3521 0.4749 sec/batch
Epoch 15/40  Iteration 3887/11040 Training loss: 1.3524 0.4594 sec/batch
Epoch 15/40  Iteration 3888/11040 Training loss: 1.3529 0.4585 sec/batch
Epoch 15/40  Iteration 3889/11040 Training loss: 1.3521 0.4586 sec/batch
Epoch 15/40  Iteration 3890/11040 Training loss: 1.3511 0.4593 sec/batch
Epoch 15/40  Iteration 3891/11040 Training loss: 1.3507 0.4598 sec/batch
Epoch 15/40  Iteration 3892/11040 Training loss: 1.

Epoch 15/40  Iteration 3992/11040 Training loss: 1.3441 0.4737 sec/batch
Epoch 15/40  Iteration 3993/11040 Training loss: 1.3440 0.4761 sec/batch
Epoch 15/40  Iteration 3994/11040 Training loss: 1.3442 0.4733 sec/batch
Epoch 15/40  Iteration 3995/11040 Training loss: 1.3440 0.4764 sec/batch
Epoch 15/40  Iteration 3996/11040 Training loss: 1.3438 0.4739 sec/batch
Epoch 15/40  Iteration 3997/11040 Training loss: 1.3440 0.4750 sec/batch
Epoch 15/40  Iteration 3998/11040 Training loss: 1.3439 0.4730 sec/batch
Epoch 15/40  Iteration 3999/11040 Training loss: 1.3441 0.4761 sec/batch
Epoch 15/40  Iteration 4000/11040 Training loss: 1.3443 0.4732 sec/batch
Validation loss: 1.31312 Saving checkpoint!
Epoch 15/40  Iteration 4001/11040 Training loss: 1.3450 0.4617 sec/batch
Epoch 15/40  Iteration 4002/11040 Training loss: 1.3453 0.4812 sec/batch
Epoch 15/40  Iteration 4003/11040 Training loss: 1.3454 0.4734 sec/batch
Epoch 15/40  Iteration 4004/11040 Training loss: 1.3455 0.4741 sec/batch
Epoch 1

Epoch 15/40  Iteration 4104/11040 Training loss: 1.3455 0.4600 sec/batch
Epoch 15/40  Iteration 4105/11040 Training loss: 1.3455 0.4575 sec/batch
Epoch 15/40  Iteration 4106/11040 Training loss: 1.3455 0.4604 sec/batch
Epoch 15/40  Iteration 4107/11040 Training loss: 1.3454 0.4730 sec/batch
Epoch 15/40  Iteration 4108/11040 Training loss: 1.3453 0.4760 sec/batch
Epoch 15/40  Iteration 4109/11040 Training loss: 1.3452 0.4731 sec/batch
Epoch 15/40  Iteration 4110/11040 Training loss: 1.3452 0.4617 sec/batch
Epoch 15/40  Iteration 4111/11040 Training loss: 1.3452 0.4722 sec/batch
Epoch 15/40  Iteration 4112/11040 Training loss: 1.3451 0.4761 sec/batch
Epoch 15/40  Iteration 4113/11040 Training loss: 1.3451 0.4584 sec/batch
Epoch 15/40  Iteration 4114/11040 Training loss: 1.3451 0.4597 sec/batch
Epoch 15/40  Iteration 4115/11040 Training loss: 1.3450 0.4743 sec/batch
Epoch 15/40  Iteration 4116/11040 Training loss: 1.3449 0.4740 sec/batch
Epoch 15/40  Iteration 4117/11040 Training loss: 1.

Epoch 16/40  Iteration 4217/11040 Training loss: 1.3276 0.4749 sec/batch
Epoch 16/40  Iteration 4218/11040 Training loss: 1.3272 0.4706 sec/batch
Epoch 16/40  Iteration 4219/11040 Training loss: 1.3270 0.4780 sec/batch
Epoch 16/40  Iteration 4220/11040 Training loss: 1.3269 0.4613 sec/batch
Epoch 16/40  Iteration 4221/11040 Training loss: 1.3267 0.4725 sec/batch
Epoch 16/40  Iteration 4222/11040 Training loss: 1.3268 0.4738 sec/batch
Epoch 16/40  Iteration 4223/11040 Training loss: 1.3267 0.4560 sec/batch
Epoch 16/40  Iteration 4224/11040 Training loss: 1.3266 0.4751 sec/batch
Epoch 16/40  Iteration 4225/11040 Training loss: 1.3263 0.4752 sec/batch
Epoch 16/40  Iteration 4226/11040 Training loss: 1.3263 0.4762 sec/batch
Epoch 16/40  Iteration 4227/11040 Training loss: 1.3258 0.4574 sec/batch
Epoch 16/40  Iteration 4228/11040 Training loss: 1.3249 0.4698 sec/batch
Epoch 16/40  Iteration 4229/11040 Training loss: 1.3249 0.4792 sec/batch
Epoch 16/40  Iteration 4230/11040 Training loss: 1.

Epoch 16/40  Iteration 4330/11040 Training loss: 1.3319 0.4588 sec/batch
Epoch 16/40  Iteration 4331/11040 Training loss: 1.3317 0.4588 sec/batch
Epoch 16/40  Iteration 4332/11040 Training loss: 1.3317 0.4597 sec/batch
Epoch 16/40  Iteration 4333/11040 Training loss: 1.3317 0.4583 sec/batch
Epoch 16/40  Iteration 4334/11040 Training loss: 1.3317 0.4604 sec/batch
Epoch 16/40  Iteration 4335/11040 Training loss: 1.3319 0.4580 sec/batch
Epoch 16/40  Iteration 4336/11040 Training loss: 1.3319 0.4602 sec/batch
Epoch 16/40  Iteration 4337/11040 Training loss: 1.3320 0.4571 sec/batch
Epoch 16/40  Iteration 4338/11040 Training loss: 1.3321 0.4748 sec/batch
Epoch 16/40  Iteration 4339/11040 Training loss: 1.3322 0.4605 sec/batch
Epoch 16/40  Iteration 4340/11040 Training loss: 1.3323 0.4732 sec/batch
Epoch 16/40  Iteration 4341/11040 Training loss: 1.3325 0.4742 sec/batch
Epoch 16/40  Iteration 4342/11040 Training loss: 1.3327 0.4770 sec/batch
Epoch 16/40  Iteration 4343/11040 Training loss: 1.

Epoch 17/40  Iteration 4443/11040 Training loss: 1.3250 0.4598 sec/batch
Epoch 17/40  Iteration 4444/11040 Training loss: 1.3250 0.4731 sec/batch
Epoch 17/40  Iteration 4445/11040 Training loss: 1.3250 0.4749 sec/batch
Epoch 17/40  Iteration 4446/11040 Training loss: 1.3251 0.4739 sec/batch
Epoch 17/40  Iteration 4447/11040 Training loss: 1.3248 0.4742 sec/batch
Epoch 17/40  Iteration 4448/11040 Training loss: 1.3246 0.4764 sec/batch
Epoch 17/40  Iteration 4449/11040 Training loss: 1.3240 0.4743 sec/batch
Epoch 17/40  Iteration 4450/11040 Training loss: 1.3234 0.4751 sec/batch
Epoch 17/40  Iteration 4451/11040 Training loss: 1.3234 0.4723 sec/batch
Epoch 17/40  Iteration 4452/11040 Training loss: 1.3240 0.4606 sec/batch
Epoch 17/40  Iteration 4453/11040 Training loss: 1.3239 0.4762 sec/batch
Epoch 17/40  Iteration 4454/11040 Training loss: 1.3231 0.4744 sec/batch
Epoch 17/40  Iteration 4455/11040 Training loss: 1.3223 0.4761 sec/batch
Epoch 17/40  Iteration 4456/11040 Training loss: 1.

Epoch 17/40  Iteration 4555/11040 Training loss: 1.3183 0.4607 sec/batch
Epoch 17/40  Iteration 4556/11040 Training loss: 1.3183 0.4730 sec/batch
Epoch 17/40  Iteration 4557/11040 Training loss: 1.3183 0.4606 sec/batch
Epoch 17/40  Iteration 4558/11040 Training loss: 1.3184 0.4717 sec/batch
Epoch 17/40  Iteration 4559/11040 Training loss: 1.3184 0.4627 sec/batch
Epoch 17/40  Iteration 4560/11040 Training loss: 1.3182 0.4576 sec/batch
Epoch 17/40  Iteration 4561/11040 Training loss: 1.3182 0.4606 sec/batch
Epoch 17/40  Iteration 4562/11040 Training loss: 1.3182 0.4732 sec/batch
Epoch 17/40  Iteration 4563/11040 Training loss: 1.3182 0.4759 sec/batch
Epoch 17/40  Iteration 4564/11040 Training loss: 1.3181 0.4587 sec/batch
Epoch 17/40  Iteration 4565/11040 Training loss: 1.3179 0.4597 sec/batch
Epoch 17/40  Iteration 4566/11040 Training loss: 1.3179 0.4739 sec/batch
Epoch 17/40  Iteration 4567/11040 Training loss: 1.3179 0.4604 sec/batch
Epoch 17/40  Iteration 4568/11040 Training loss: 1.

Epoch 17/40  Iteration 4668/11040 Training loss: 1.3184 0.4618 sec/batch
Epoch 17/40  Iteration 4669/11040 Training loss: 1.3184 0.4575 sec/batch
Epoch 17/40  Iteration 4670/11040 Training loss: 1.3184 0.4593 sec/batch
Epoch 17/40  Iteration 4671/11040 Training loss: 1.3184 0.4584 sec/batch
Epoch 17/40  Iteration 4672/11040 Training loss: 1.3183 0.4594 sec/batch
Epoch 17/40  Iteration 4673/11040 Training loss: 1.3184 0.4589 sec/batch
Epoch 17/40  Iteration 4674/11040 Training loss: 1.3184 0.4599 sec/batch
Epoch 17/40  Iteration 4675/11040 Training loss: 1.3183 0.4595 sec/batch
Epoch 17/40  Iteration 4676/11040 Training loss: 1.3183 0.4730 sec/batch
Epoch 17/40  Iteration 4677/11040 Training loss: 1.3183 0.4617 sec/batch
Epoch 17/40  Iteration 4678/11040 Training loss: 1.3184 0.4734 sec/batch
Epoch 17/40  Iteration 4679/11040 Training loss: 1.3185 0.4713 sec/batch
Epoch 17/40  Iteration 4680/11040 Training loss: 1.3186 0.4585 sec/batch
Epoch 17/40  Iteration 4681/11040 Training loss: 1.

Epoch 18/40  Iteration 4781/11040 Training loss: 1.2986 0.4762 sec/batch
Epoch 18/40  Iteration 4782/11040 Training loss: 1.2988 0.4731 sec/batch
Epoch 18/40  Iteration 4783/11040 Training loss: 1.2990 0.4814 sec/batch
Epoch 18/40  Iteration 4784/11040 Training loss: 1.2991 0.4704 sec/batch
Epoch 18/40  Iteration 4785/11040 Training loss: 1.2989 0.4719 sec/batch
Epoch 18/40  Iteration 4786/11040 Training loss: 1.2993 0.4614 sec/batch
Epoch 18/40  Iteration 4787/11040 Training loss: 1.2999 0.4605 sec/batch
Epoch 18/40  Iteration 4788/11040 Training loss: 1.3002 0.4721 sec/batch
Epoch 18/40  Iteration 4789/11040 Training loss: 1.3006 0.4731 sec/batch
Epoch 18/40  Iteration 4790/11040 Training loss: 1.3010 0.4761 sec/batch
Epoch 18/40  Iteration 4791/11040 Training loss: 1.3011 0.4743 sec/batch
Epoch 18/40  Iteration 4792/11040 Training loss: 1.3009 0.4759 sec/batch
Epoch 18/40  Iteration 4793/11040 Training loss: 1.3011 0.4574 sec/batch
Epoch 18/40  Iteration 4794/11040 Training loss: 1.

Epoch 18/40  Iteration 4894/11040 Training loss: 1.3074 0.4749 sec/batch
Epoch 18/40  Iteration 4895/11040 Training loss: 1.3076 0.4758 sec/batch
Epoch 18/40  Iteration 4896/11040 Training loss: 1.3078 0.4598 sec/batch
Epoch 18/40  Iteration 4897/11040 Training loss: 1.3077 0.4753 sec/batch
Epoch 18/40  Iteration 4898/11040 Training loss: 1.3077 0.4738 sec/batch
Epoch 18/40  Iteration 4899/11040 Training loss: 1.3076 0.4749 sec/batch
Epoch 18/40  Iteration 4900/11040 Training loss: 1.3077 0.4741 sec/batch
Epoch 18/40  Iteration 4901/11040 Training loss: 1.3076 0.4760 sec/batch
Epoch 18/40  Iteration 4902/11040 Training loss: 1.3076 0.4720 sec/batch
Epoch 18/40  Iteration 4903/11040 Training loss: 1.3076 0.4764 sec/batch
Epoch 18/40  Iteration 4904/11040 Training loss: 1.3076 0.4585 sec/batch
Epoch 18/40  Iteration 4905/11040 Training loss: 1.3076 0.4757 sec/batch
Epoch 18/40  Iteration 4906/11040 Training loss: 1.3076 0.4740 sec/batch
Epoch 18/40  Iteration 4907/11040 Training loss: 1.

Epoch 19/40  Iteration 5006/11040 Training loss: 1.3014 0.4731 sec/batch
Epoch 19/40  Iteration 5007/11040 Training loss: 1.3008 0.4615 sec/batch
Epoch 19/40  Iteration 5008/11040 Training loss: 1.3008 0.4566 sec/batch
Epoch 19/40  Iteration 5009/11040 Training loss: 1.3002 0.4740 sec/batch
Epoch 19/40  Iteration 5010/11040 Training loss: 1.2989 0.4752 sec/batch
Epoch 19/40  Iteration 5011/11040 Training loss: 1.2989 0.4743 sec/batch
Epoch 19/40  Iteration 5012/11040 Training loss: 1.2984 0.4769 sec/batch
Epoch 19/40  Iteration 5013/11040 Training loss: 1.2979 0.4718 sec/batch
Epoch 19/40  Iteration 5014/11040 Training loss: 1.2975 0.4762 sec/batch
Epoch 19/40  Iteration 5015/11040 Training loss: 1.2967 0.4742 sec/batch
Epoch 19/40  Iteration 5016/11040 Training loss: 1.2964 0.4593 sec/batch
Epoch 19/40  Iteration 5017/11040 Training loss: 1.2957 0.4589 sec/batch
Epoch 19/40  Iteration 5018/11040 Training loss: 1.2960 0.4596 sec/batch
Epoch 19/40  Iteration 5019/11040 Training loss: 1.

Epoch 19/40  Iteration 5119/11040 Training loss: 1.2935 0.4734 sec/batch
Epoch 19/40  Iteration 5120/11040 Training loss: 1.2937 0.4713 sec/batch
Epoch 19/40  Iteration 5121/11040 Training loss: 1.2938 0.4776 sec/batch
Epoch 19/40  Iteration 5122/11040 Training loss: 1.2939 0.4747 sec/batch
Epoch 19/40  Iteration 5123/11040 Training loss: 1.2939 0.4759 sec/batch
Epoch 19/40  Iteration 5124/11040 Training loss: 1.2939 0.4733 sec/batch
Epoch 19/40  Iteration 5125/11040 Training loss: 1.2938 0.4759 sec/batch
Epoch 19/40  Iteration 5126/11040 Training loss: 1.2940 0.4756 sec/batch
Epoch 19/40  Iteration 5127/11040 Training loss: 1.2941 0.4733 sec/batch
Epoch 19/40  Iteration 5128/11040 Training loss: 1.2941 0.4751 sec/batch
Epoch 19/40  Iteration 5129/11040 Training loss: 1.2942 0.4754 sec/batch
Epoch 19/40  Iteration 5130/11040 Training loss: 1.2942 0.4886 sec/batch
Epoch 19/40  Iteration 5131/11040 Training loss: 1.2943 0.4606 sec/batch
Epoch 19/40  Iteration 5132/11040 Training loss: 1.

Epoch 19/40  Iteration 5232/11040 Training loss: 1.2957 0.4690 sec/batch
Epoch 19/40  Iteration 5233/11040 Training loss: 1.2956 0.4817 sec/batch
Epoch 19/40  Iteration 5234/11040 Training loss: 1.2957 0.4722 sec/batch
Epoch 19/40  Iteration 5235/11040 Training loss: 1.2959 0.4603 sec/batch
Epoch 19/40  Iteration 5236/11040 Training loss: 1.2959 0.4741 sec/batch
Epoch 19/40  Iteration 5237/11040 Training loss: 1.2960 0.4749 sec/batch
Epoch 19/40  Iteration 5238/11040 Training loss: 1.2961 0.4760 sec/batch
Epoch 19/40  Iteration 5239/11040 Training loss: 1.2964 0.4730 sec/batch
Epoch 19/40  Iteration 5240/11040 Training loss: 1.2966 0.4606 sec/batch
Epoch 19/40  Iteration 5241/11040 Training loss: 1.2967 0.4744 sec/batch
Epoch 19/40  Iteration 5242/11040 Training loss: 1.2967 0.4603 sec/batch
Epoch 19/40  Iteration 5243/11040 Training loss: 1.2967 0.4730 sec/batch
Epoch 19/40  Iteration 5244/11040 Training loss: 1.2968 0.4744 sec/batch
Epoch 20/40  Iteration 5245/11040 Training loss: 1.

Epoch 20/40  Iteration 5345/11040 Training loss: 1.2786 0.4595 sec/batch
Epoch 20/40  Iteration 5346/11040 Training loss: 1.2787 0.4585 sec/batch
Epoch 20/40  Iteration 5347/11040 Training loss: 1.2783 0.4603 sec/batch
Epoch 20/40  Iteration 5348/11040 Training loss: 1.2784 0.4731 sec/batch
Epoch 20/40  Iteration 5349/11040 Training loss: 1.2783 0.4761 sec/batch
Epoch 20/40  Iteration 5350/11040 Training loss: 1.2781 0.4730 sec/batch
Epoch 20/40  Iteration 5351/11040 Training loss: 1.2781 0.4776 sec/batch
Epoch 20/40  Iteration 5352/11040 Training loss: 1.2783 0.4732 sec/batch
Epoch 20/40  Iteration 5353/11040 Training loss: 1.2787 0.4744 sec/batch
Epoch 20/40  Iteration 5354/11040 Training loss: 1.2787 0.4604 sec/batch
Epoch 20/40  Iteration 5355/11040 Training loss: 1.2787 0.4729 sec/batch
Epoch 20/40  Iteration 5356/11040 Training loss: 1.2790 0.4748 sec/batch
Epoch 20/40  Iteration 5357/11040 Training loss: 1.2792 0.4751 sec/batch
Epoch 20/40  Iteration 5358/11040 Training loss: 1.

Epoch 20/40  Iteration 5458/11040 Training loss: 1.2857 0.4748 sec/batch
Epoch 20/40  Iteration 5459/11040 Training loss: 1.2855 0.4733 sec/batch
Epoch 20/40  Iteration 5460/11040 Training loss: 1.2854 0.4757 sec/batch
Epoch 20/40  Iteration 5461/11040 Training loss: 1.2853 0.4751 sec/batch
Epoch 20/40  Iteration 5462/11040 Training loss: 1.2852 0.4741 sec/batch
Epoch 20/40  Iteration 5463/11040 Training loss: 1.2852 0.4742 sec/batch
Epoch 20/40  Iteration 5464/11040 Training loss: 1.2851 0.4606 sec/batch
Epoch 20/40  Iteration 5465/11040 Training loss: 1.2850 0.4574 sec/batch
Epoch 20/40  Iteration 5466/11040 Training loss: 1.2848 0.4599 sec/batch
Epoch 20/40  Iteration 5467/11040 Training loss: 1.2847 0.4741 sec/batch
Epoch 20/40  Iteration 5468/11040 Training loss: 1.2848 0.4746 sec/batch
Epoch 20/40  Iteration 5469/11040 Training loss: 1.2847 0.4758 sec/batch
Epoch 20/40  Iteration 5470/11040 Training loss: 1.2847 0.4594 sec/batch
Epoch 20/40  Iteration 5471/11040 Training loss: 1.

Epoch 21/40  Iteration 5570/11040 Training loss: 1.2715 0.4588 sec/batch
Epoch 21/40  Iteration 5571/11040 Training loss: 1.2712 0.4604 sec/batch
Epoch 21/40  Iteration 5572/11040 Training loss: 1.2710 0.4588 sec/batch
Epoch 21/40  Iteration 5573/11040 Training loss: 1.2712 0.4575 sec/batch
Epoch 21/40  Iteration 5574/11040 Training loss: 1.2706 0.4602 sec/batch
Epoch 21/40  Iteration 5575/11040 Training loss: 1.2705 0.4733 sec/batch
Epoch 21/40  Iteration 5576/11040 Training loss: 1.2705 0.4604 sec/batch
Epoch 21/40  Iteration 5577/11040 Training loss: 1.2705 0.4732 sec/batch
Epoch 21/40  Iteration 5578/11040 Training loss: 1.2704 0.4606 sec/batch
Epoch 21/40  Iteration 5579/11040 Training loss: 1.2706 0.4748 sec/batch
Epoch 21/40  Iteration 5580/11040 Training loss: 1.2703 0.4752 sec/batch
Epoch 21/40  Iteration 5581/11040 Training loss: 1.2704 0.4593 sec/batch
Epoch 21/40  Iteration 5582/11040 Training loss: 1.2703 0.4744 sec/batch
Epoch 21/40  Iteration 5583/11040 Training loss: 1.

Epoch 21/40  Iteration 5683/11040 Training loss: 1.2720 0.4731 sec/batch
Epoch 21/40  Iteration 5684/11040 Training loss: 1.2722 0.4742 sec/batch
Epoch 21/40  Iteration 5685/11040 Training loss: 1.2723 0.4600 sec/batch
Epoch 21/40  Iteration 5686/11040 Training loss: 1.2726 0.4590 sec/batch
Epoch 21/40  Iteration 5687/11040 Training loss: 1.2726 0.4596 sec/batch
Epoch 21/40  Iteration 5688/11040 Training loss: 1.2725 0.4584 sec/batch
Epoch 21/40  Iteration 5689/11040 Training loss: 1.2725 0.4760 sec/batch
Epoch 21/40  Iteration 5690/11040 Training loss: 1.2725 0.4730 sec/batch
Epoch 21/40  Iteration 5691/11040 Training loss: 1.2727 0.4618 sec/batch
Epoch 21/40  Iteration 5692/11040 Training loss: 1.2728 0.4575 sec/batch
Epoch 21/40  Iteration 5693/11040 Training loss: 1.2728 0.4754 sec/batch
Epoch 21/40  Iteration 5694/11040 Training loss: 1.2728 0.4754 sec/batch
Epoch 21/40  Iteration 5695/11040 Training loss: 1.2729 0.4740 sec/batch
Epoch 21/40  Iteration 5696/11040 Training loss: 1.

Epoch 21/40  Iteration 5796/11040 Training loss: 1.2754 0.4840 sec/batch
Epoch 22/40  Iteration 5797/11040 Training loss: 1.3540 0.4781 sec/batch
Epoch 22/40  Iteration 5798/11040 Training loss: 1.3152 0.4750 sec/batch
Epoch 22/40  Iteration 5799/11040 Training loss: 1.3118 0.4752 sec/batch
Epoch 22/40  Iteration 5800/11040 Training loss: 1.3080 0.4605 sec/batch
Epoch 22/40  Iteration 5801/11040 Training loss: 1.3061 0.4738 sec/batch
Epoch 22/40  Iteration 5802/11040 Training loss: 1.3031 0.4740 sec/batch
Epoch 22/40  Iteration 5803/11040 Training loss: 1.2934 0.4589 sec/batch
Epoch 22/40  Iteration 5804/11040 Training loss: 1.2906 0.4603 sec/batch
Epoch 22/40  Iteration 5805/11040 Training loss: 1.2868 0.4734 sec/batch
Epoch 22/40  Iteration 5806/11040 Training loss: 1.2865 0.4751 sec/batch
Epoch 22/40  Iteration 5807/11040 Training loss: 1.2828 0.4751 sec/batch
Epoch 22/40  Iteration 5808/11040 Training loss: 1.2821 0.4732 sec/batch
Epoch 22/40  Iteration 5809/11040 Training loss: 1.

Epoch 22/40  Iteration 5909/11040 Training loss: 1.2604 0.4688 sec/batch
Epoch 22/40  Iteration 5910/11040 Training loss: 1.2604 0.4810 sec/batch
Epoch 22/40  Iteration 5911/11040 Training loss: 1.2608 0.4746 sec/batch
Epoch 22/40  Iteration 5912/11040 Training loss: 1.2611 0.4748 sec/batch
Epoch 22/40  Iteration 5913/11040 Training loss: 1.2610 0.4585 sec/batch
Epoch 22/40  Iteration 5914/11040 Training loss: 1.2613 0.4597 sec/batch
Epoch 22/40  Iteration 5915/11040 Training loss: 1.2615 0.4740 sec/batch
Epoch 22/40  Iteration 5916/11040 Training loss: 1.2616 0.4749 sec/batch
Epoch 22/40  Iteration 5917/11040 Training loss: 1.2617 0.4743 sec/batch
Epoch 22/40  Iteration 5918/11040 Training loss: 1.2617 0.4746 sec/batch
Epoch 22/40  Iteration 5919/11040 Training loss: 1.2620 0.4762 sec/batch
Epoch 22/40  Iteration 5920/11040 Training loss: 1.2625 0.4732 sec/batch
Epoch 22/40  Iteration 5921/11040 Training loss: 1.2626 0.4771 sec/batch
Epoch 22/40  Iteration 5922/11040 Training loss: 1.

Epoch 22/40  Iteration 6021/11040 Training loss: 1.2664 0.4748 sec/batch
Epoch 22/40  Iteration 6022/11040 Training loss: 1.2664 0.4600 sec/batch
Epoch 22/40  Iteration 6023/11040 Training loss: 1.2662 0.4593 sec/batch
Epoch 22/40  Iteration 6024/11040 Training loss: 1.2661 0.4584 sec/batch
Epoch 22/40  Iteration 6025/11040 Training loss: 1.2662 0.4606 sec/batch
Epoch 22/40  Iteration 6026/11040 Training loss: 1.2663 0.4592 sec/batch
Epoch 22/40  Iteration 6027/11040 Training loss: 1.2662 0.4729 sec/batch
Epoch 22/40  Iteration 6028/11040 Training loss: 1.2663 0.4751 sec/batch
Epoch 22/40  Iteration 6029/11040 Training loss: 1.2662 0.4740 sec/batch
Epoch 22/40  Iteration 6030/11040 Training loss: 1.2661 0.4760 sec/batch
Epoch 22/40  Iteration 6031/11040 Training loss: 1.2662 0.4733 sec/batch
Epoch 22/40  Iteration 6032/11040 Training loss: 1.2662 0.4760 sec/batch
Epoch 22/40  Iteration 6033/11040 Training loss: 1.2663 0.4741 sec/batch
Epoch 22/40  Iteration 6034/11040 Training loss: 1.

Epoch 23/40  Iteration 6134/11040 Training loss: 1.2526 0.4701 sec/batch
Epoch 23/40  Iteration 6135/11040 Training loss: 1.2526 0.4636 sec/batch
Epoch 23/40  Iteration 6136/11040 Training loss: 1.2524 0.4605 sec/batch
Epoch 23/40  Iteration 6137/11040 Training loss: 1.2526 0.4732 sec/batch
Epoch 23/40  Iteration 6138/11040 Training loss: 1.2524 0.4753 sec/batch
Epoch 23/40  Iteration 6139/11040 Training loss: 1.2523 0.4705 sec/batch
Epoch 23/40  Iteration 6140/11040 Training loss: 1.2525 0.4742 sec/batch
Epoch 23/40  Iteration 6141/11040 Training loss: 1.2524 0.4797 sec/batch
Epoch 23/40  Iteration 6142/11040 Training loss: 1.2523 0.4694 sec/batch
Epoch 23/40  Iteration 6143/11040 Training loss: 1.2520 0.4589 sec/batch
Epoch 23/40  Iteration 6144/11040 Training loss: 1.2514 0.4611 sec/batch
Epoch 23/40  Iteration 6145/11040 Training loss: 1.2509 0.4575 sec/batch
Epoch 23/40  Iteration 6146/11040 Training loss: 1.2505 0.4581 sec/batch
Epoch 23/40  Iteration 6147/11040 Training loss: 1.

Epoch 23/40  Iteration 6247/11040 Training loss: 1.2547 0.4744 sec/batch
Epoch 23/40  Iteration 6248/11040 Training loss: 1.2548 0.4743 sec/batch
Epoch 23/40  Iteration 6249/11040 Training loss: 1.2549 0.4752 sec/batch
Epoch 23/40  Iteration 6250/11040 Training loss: 1.2549 0.4712 sec/batch
Epoch 23/40  Iteration 6251/11040 Training loss: 1.2551 0.4784 sec/batch
Epoch 23/40  Iteration 6252/11040 Training loss: 1.2552 0.4763 sec/batch
Epoch 23/40  Iteration 6253/11040 Training loss: 1.2553 0.4737 sec/batch
Epoch 23/40  Iteration 6254/11040 Training loss: 1.2552 0.4921 sec/batch
Epoch 23/40  Iteration 6255/11040 Training loss: 1.2551 0.4768 sec/batch
Epoch 23/40  Iteration 6256/11040 Training loss: 1.2553 0.4738 sec/batch
Epoch 23/40  Iteration 6257/11040 Training loss: 1.2554 0.4754 sec/batch
Epoch 23/40  Iteration 6258/11040 Training loss: 1.2555 0.4612 sec/batch
Epoch 23/40  Iteration 6259/11040 Training loss: 1.2557 0.4733 sec/batch
Epoch 23/40  Iteration 6260/11040 Training loss: 1.

Epoch 24/40  Iteration 6360/11040 Training loss: 1.2642 0.4755 sec/batch
Epoch 24/40  Iteration 6361/11040 Training loss: 1.2606 0.4742 sec/batch
Epoch 24/40  Iteration 6362/11040 Training loss: 1.2581 0.4761 sec/batch
Epoch 24/40  Iteration 6363/11040 Training loss: 1.2571 0.4772 sec/batch
Epoch 24/40  Iteration 6364/11040 Training loss: 1.2569 0.4889 sec/batch
Epoch 24/40  Iteration 6365/11040 Training loss: 1.2545 0.4773 sec/batch
Epoch 24/40  Iteration 6366/11040 Training loss: 1.2522 0.4752 sec/batch
Epoch 24/40  Iteration 6367/11040 Training loss: 1.2517 0.4643 sec/batch
Epoch 24/40  Iteration 6368/11040 Training loss: 1.2517 0.4879 sec/batch
Epoch 24/40  Iteration 6369/11040 Training loss: 1.2535 0.4738 sec/batch
Epoch 24/40  Iteration 6370/11040 Training loss: 1.2541 0.4756 sec/batch
Epoch 24/40  Iteration 6371/11040 Training loss: 1.2538 0.4916 sec/batch
Epoch 24/40  Iteration 6372/11040 Training loss: 1.2539 0.4748 sec/batch
Epoch 24/40  Iteration 6373/11040 Training loss: 1.

Epoch 24/40  Iteration 6473/11040 Training loss: 1.2442 0.4758 sec/batch
Epoch 24/40  Iteration 6474/11040 Training loss: 1.2445 0.4745 sec/batch
Epoch 24/40  Iteration 6475/11040 Training loss: 1.2446 0.4731 sec/batch
Epoch 24/40  Iteration 6476/11040 Training loss: 1.2446 0.4752 sec/batch
Epoch 24/40  Iteration 6477/11040 Training loss: 1.2445 0.4595 sec/batch
Epoch 24/40  Iteration 6478/11040 Training loss: 1.2447 0.4595 sec/batch
Epoch 24/40  Iteration 6479/11040 Training loss: 1.2447 0.4743 sec/batch
Epoch 24/40  Iteration 6480/11040 Training loss: 1.2445 0.4612 sec/batch
Epoch 24/40  Iteration 6481/11040 Training loss: 1.2445 0.4741 sec/batch
Epoch 24/40  Iteration 6482/11040 Training loss: 1.2444 0.4759 sec/batch
Epoch 24/40  Iteration 6483/11040 Training loss: 1.2446 0.4752 sec/batch
Epoch 24/40  Iteration 6484/11040 Training loss: 1.2447 0.4748 sec/batch
Epoch 24/40  Iteration 6485/11040 Training loss: 1.2446 0.4741 sec/batch
Epoch 24/40  Iteration 6486/11040 Training loss: 1.

Epoch 24/40  Iteration 6585/11040 Training loss: 1.2484 0.4745 sec/batch
Epoch 24/40  Iteration 6586/11040 Training loss: 1.2484 0.4643 sec/batch
Epoch 24/40  Iteration 6587/11040 Training loss: 1.2485 0.4703 sec/batch
Epoch 24/40  Iteration 6588/11040 Training loss: 1.2486 0.4631 sec/batch
Epoch 24/40  Iteration 6589/11040 Training loss: 1.2487 0.4733 sec/batch
Epoch 24/40  Iteration 6590/11040 Training loss: 1.2486 0.4734 sec/batch
Epoch 24/40  Iteration 6591/11040 Training loss: 1.2486 0.4762 sec/batch
Epoch 24/40  Iteration 6592/11040 Training loss: 1.2486 0.4737 sec/batch
Epoch 24/40  Iteration 6593/11040 Training loss: 1.2485 0.4759 sec/batch
Epoch 24/40  Iteration 6594/11040 Training loss: 1.2486 0.4789 sec/batch
Epoch 24/40  Iteration 6595/11040 Training loss: 1.2486 0.4717 sec/batch
Epoch 24/40  Iteration 6596/11040 Training loss: 1.2485 0.4774 sec/batch
Epoch 24/40  Iteration 6597/11040 Training loss: 1.2485 0.4732 sec/batch
Epoch 24/40  Iteration 6598/11040 Training loss: 1.

Epoch 25/40  Iteration 6698/11040 Training loss: 1.2343 0.4729 sec/batch
Epoch 25/40  Iteration 6699/11040 Training loss: 1.2343 0.4749 sec/batch
Epoch 25/40  Iteration 6700/11040 Training loss: 1.2347 0.4641 sec/batch
Epoch 25/40  Iteration 6701/11040 Training loss: 1.2342 0.4694 sec/batch
Epoch 25/40  Iteration 6702/11040 Training loss: 1.2338 0.4742 sec/batch
Epoch 25/40  Iteration 6703/11040 Training loss: 1.2338 0.4775 sec/batch
Epoch 25/40  Iteration 6704/11040 Training loss: 1.2338 0.4747 sec/batch
Epoch 25/40  Iteration 6705/11040 Training loss: 1.2337 0.4755 sec/batch
Epoch 25/40  Iteration 6706/11040 Training loss: 1.2338 0.4743 sec/batch
Epoch 25/40  Iteration 6707/11040 Training loss: 1.2338 0.4759 sec/batch
Epoch 25/40  Iteration 6708/11040 Training loss: 1.2337 0.4733 sec/batch
Epoch 25/40  Iteration 6709/11040 Training loss: 1.2335 0.4766 sec/batch
Epoch 25/40  Iteration 6710/11040 Training loss: 1.2334 0.4753 sec/batch
Epoch 25/40  Iteration 6711/11040 Training loss: 1.

Epoch 25/40  Iteration 6811/11040 Training loss: 1.2394 0.4588 sec/batch
Epoch 25/40  Iteration 6812/11040 Training loss: 1.2395 0.4912 sec/batch
Epoch 25/40  Iteration 6813/11040 Training loss: 1.2395 0.4745 sec/batch
Epoch 25/40  Iteration 6814/11040 Training loss: 1.2394 0.4642 sec/batch
Epoch 25/40  Iteration 6815/11040 Training loss: 1.2393 0.4718 sec/batch
Epoch 25/40  Iteration 6816/11040 Training loss: 1.2393 0.4747 sec/batch
Epoch 25/40  Iteration 6817/11040 Training loss: 1.2394 0.4756 sec/batch
Epoch 25/40  Iteration 6818/11040 Training loss: 1.2394 0.4791 sec/batch
Epoch 25/40  Iteration 6819/11040 Training loss: 1.2396 0.4716 sec/batch
Epoch 25/40  Iteration 6820/11040 Training loss: 1.2395 0.4739 sec/batch
Epoch 25/40  Iteration 6821/11040 Training loss: 1.2396 0.4601 sec/batch
Epoch 25/40  Iteration 6822/11040 Training loss: 1.2396 0.4743 sec/batch
Epoch 25/40  Iteration 6823/11040 Training loss: 1.2397 0.4763 sec/batch
Epoch 25/40  Iteration 6824/11040 Training loss: 1.

Epoch 26/40  Iteration 6924/11040 Training loss: 1.2374 0.4740 sec/batch
Epoch 26/40  Iteration 6925/11040 Training loss: 1.2363 0.4773 sec/batch
Epoch 26/40  Iteration 6926/11040 Training loss: 1.2359 0.4733 sec/batch
Epoch 26/40  Iteration 6927/11040 Training loss: 1.2355 0.4746 sec/batch
Epoch 26/40  Iteration 6928/11040 Training loss: 1.2351 0.4744 sec/batch
Epoch 26/40  Iteration 6929/11040 Training loss: 1.2348 0.4764 sec/batch
Epoch 26/40  Iteration 6930/11040 Training loss: 1.2351 0.4754 sec/batch
Epoch 26/40  Iteration 6931/11040 Training loss: 1.2349 0.4744 sec/batch
Epoch 26/40  Iteration 6932/11040 Training loss: 1.2346 0.4753 sec/batch
Epoch 26/40  Iteration 6933/11040 Training loss: 1.2340 0.4752 sec/batch
Epoch 26/40  Iteration 6934/11040 Training loss: 1.2336 0.4751 sec/batch
Epoch 26/40  Iteration 6935/11040 Training loss: 1.2337 0.4743 sec/batch
Epoch 26/40  Iteration 6936/11040 Training loss: 1.2343 0.4743 sec/batch
Epoch 26/40  Iteration 6937/11040 Training loss: 1.

Epoch 26/40  Iteration 7036/11040 Training loss: 1.2296 0.4745 sec/batch
Epoch 26/40  Iteration 7037/11040 Training loss: 1.2294 0.4779 sec/batch
Epoch 26/40  Iteration 7038/11040 Training loss: 1.2295 0.4730 sec/batch
Epoch 26/40  Iteration 7039/11040 Training loss: 1.2297 0.4758 sec/batch
Epoch 26/40  Iteration 7040/11040 Training loss: 1.2298 0.4912 sec/batch
Epoch 26/40  Iteration 7041/11040 Training loss: 1.2299 0.4765 sec/batch
Epoch 26/40  Iteration 7042/11040 Training loss: 1.2300 0.4741 sec/batch
Epoch 26/40  Iteration 7043/11040 Training loss: 1.2300 0.4753 sec/batch
Epoch 26/40  Iteration 7044/11040 Training loss: 1.2299 0.4743 sec/batch
Epoch 26/40  Iteration 7045/11040 Training loss: 1.2299 0.4771 sec/batch
Epoch 26/40  Iteration 7046/11040 Training loss: 1.2299 0.4733 sec/batch
Epoch 26/40  Iteration 7047/11040 Training loss: 1.2300 0.4745 sec/batch
Epoch 26/40  Iteration 7048/11040 Training loss: 1.2299 0.4616 sec/batch
Epoch 26/40  Iteration 7049/11040 Training loss: 1.

Epoch 26/40  Iteration 7149/11040 Training loss: 1.2325 0.4760 sec/batch
Epoch 26/40  Iteration 7150/11040 Training loss: 1.2325 0.4736 sec/batch
Epoch 26/40  Iteration 7151/11040 Training loss: 1.2324 0.4735 sec/batch
Epoch 26/40  Iteration 7152/11040 Training loss: 1.2324 0.4769 sec/batch
Epoch 26/40  Iteration 7153/11040 Training loss: 1.2324 0.4769 sec/batch
Epoch 26/40  Iteration 7154/11040 Training loss: 1.2325 0.4706 sec/batch
Epoch 26/40  Iteration 7155/11040 Training loss: 1.2324 0.4836 sec/batch
Epoch 26/40  Iteration 7156/11040 Training loss: 1.2325 0.4849 sec/batch
Epoch 26/40  Iteration 7157/11040 Training loss: 1.2326 0.4758 sec/batch
Epoch 26/40  Iteration 7158/11040 Training loss: 1.2327 0.4759 sec/batch
Epoch 26/40  Iteration 7159/11040 Training loss: 1.2327 0.4775 sec/batch
Epoch 26/40  Iteration 7160/11040 Training loss: 1.2327 0.4742 sec/batch
Epoch 26/40  Iteration 7161/11040 Training loss: 1.2327 0.4874 sec/batch
Epoch 26/40  Iteration 7162/11040 Training loss: 1.

Epoch 27/40  Iteration 7262/11040 Training loss: 1.2171 0.4758 sec/batch
Epoch 27/40  Iteration 7263/11040 Training loss: 1.2166 0.4766 sec/batch
Epoch 27/40  Iteration 7264/11040 Training loss: 1.2157 0.4738 sec/batch
Epoch 27/40  Iteration 7265/11040 Training loss: 1.2157 0.4766 sec/batch
Epoch 27/40  Iteration 7266/11040 Training loss: 1.2157 0.4756 sec/batch
Epoch 27/40  Iteration 7267/11040 Training loss: 1.2158 0.4735 sec/batch
Epoch 27/40  Iteration 7268/11040 Training loss: 1.2157 0.4754 sec/batch
Epoch 27/40  Iteration 7269/11040 Training loss: 1.2156 0.4903 sec/batch
Epoch 27/40  Iteration 7270/11040 Training loss: 1.2158 0.4745 sec/batch
Epoch 27/40  Iteration 7271/11040 Training loss: 1.2164 0.4752 sec/batch
Epoch 27/40  Iteration 7272/11040 Training loss: 1.2166 0.4743 sec/batch
Epoch 27/40  Iteration 7273/11040 Training loss: 1.2171 0.4753 sec/batch
Epoch 27/40  Iteration 7274/11040 Training loss: 1.2176 0.4746 sec/batch
Epoch 27/40  Iteration 7275/11040 Training loss: 1.

Epoch 27/40  Iteration 7375/11040 Training loss: 1.2239 0.4752 sec/batch
Epoch 27/40  Iteration 7376/11040 Training loss: 1.2240 0.4762 sec/batch
Epoch 27/40  Iteration 7377/11040 Training loss: 1.2241 0.4741 sec/batch
Epoch 27/40  Iteration 7378/11040 Training loss: 1.2243 0.4732 sec/batch
Epoch 27/40  Iteration 7379/11040 Training loss: 1.2244 0.4748 sec/batch
Epoch 27/40  Iteration 7380/11040 Training loss: 1.2245 0.4773 sec/batch
Epoch 27/40  Iteration 7381/11040 Training loss: 1.2245 0.4576 sec/batch
Epoch 27/40  Iteration 7382/11040 Training loss: 1.2244 0.4743 sec/batch
Epoch 27/40  Iteration 7383/11040 Training loss: 1.2245 0.4751 sec/batch
Epoch 27/40  Iteration 7384/11040 Training loss: 1.2245 0.4754 sec/batch
Epoch 27/40  Iteration 7385/11040 Training loss: 1.2244 0.4607 sec/batch
Epoch 27/40  Iteration 7386/11040 Training loss: 1.2243 0.4743 sec/batch
Epoch 27/40  Iteration 7387/11040 Training loss: 1.2243 0.4738 sec/batch
Epoch 27/40  Iteration 7388/11040 Training loss: 1.

Epoch 28/40  Iteration 7488/11040 Training loss: 1.2194 0.4902 sec/batch
Epoch 28/40  Iteration 7489/11040 Training loss: 1.2194 0.4753 sec/batch
Epoch 28/40  Iteration 7490/11040 Training loss: 1.2187 0.4745 sec/batch
Epoch 28/40  Iteration 7491/11040 Training loss: 1.2181 0.4928 sec/batch
Epoch 28/40  Iteration 7492/11040 Training loss: 1.2180 0.4761 sec/batch
Epoch 28/40  Iteration 7493/11040 Training loss: 1.2176 0.4698 sec/batch
Epoch 28/40  Iteration 7494/11040 Training loss: 1.2164 0.4643 sec/batch
Epoch 28/40  Iteration 7495/11040 Training loss: 1.2165 0.4746 sec/batch
Epoch 28/40  Iteration 7496/11040 Training loss: 1.2156 0.4770 sec/batch
Epoch 28/40  Iteration 7497/11040 Training loss: 1.2151 0.4764 sec/batch
Epoch 28/40  Iteration 7498/11040 Training loss: 1.2143 0.4906 sec/batch
Epoch 28/40  Iteration 7499/11040 Training loss: 1.2136 0.4750 sec/batch
Epoch 28/40  Iteration 7500/11040 Training loss: 1.2135 0.4752 sec/batch
Validation loss: 1.24845 Saving checkpoint!
Epoch 2

Epoch 28/40  Iteration 7600/11040 Training loss: 1.2136 0.4747 sec/batch
Epoch 28/40  Iteration 7601/11040 Training loss: 1.2136 0.4754 sec/batch
Epoch 28/40  Iteration 7602/11040 Training loss: 1.2136 0.4768 sec/batch
Epoch 28/40  Iteration 7603/11040 Training loss: 1.2136 0.4825 sec/batch
Epoch 28/40  Iteration 7604/11040 Training loss: 1.2138 0.4823 sec/batch
Epoch 28/40  Iteration 7605/11040 Training loss: 1.2139 0.4734 sec/batch
Epoch 28/40  Iteration 7606/11040 Training loss: 1.2139 0.4600 sec/batch
Epoch 28/40  Iteration 7607/11040 Training loss: 1.2140 0.4784 sec/batch
Epoch 28/40  Iteration 7608/11040 Training loss: 1.2140 0.4704 sec/batch
Epoch 28/40  Iteration 7609/11040 Training loss: 1.2139 0.4649 sec/batch
Epoch 28/40  Iteration 7610/11040 Training loss: 1.2141 0.4703 sec/batch
Epoch 28/40  Iteration 7611/11040 Training loss: 1.2143 0.4753 sec/batch
Epoch 28/40  Iteration 7612/11040 Training loss: 1.2143 0.4754 sec/batch
Epoch 28/40  Iteration 7613/11040 Training loss: 1.

Epoch 28/40  Iteration 7713/11040 Training loss: 1.2173 0.4752 sec/batch
Epoch 28/40  Iteration 7714/11040 Training loss: 1.2175 0.4750 sec/batch
Epoch 28/40  Iteration 7715/11040 Training loss: 1.2175 0.4904 sec/batch
Epoch 28/40  Iteration 7716/11040 Training loss: 1.2176 0.4766 sec/batch
Epoch 28/40  Iteration 7717/11040 Training loss: 1.2176 0.4746 sec/batch
Epoch 28/40  Iteration 7718/11040 Training loss: 1.2177 0.4749 sec/batch
Epoch 28/40  Iteration 7719/11040 Training loss: 1.2179 0.4770 sec/batch
Epoch 28/40  Iteration 7720/11040 Training loss: 1.2180 0.4662 sec/batch
Epoch 28/40  Iteration 7721/11040 Training loss: 1.2181 0.4699 sec/batch
Epoch 28/40  Iteration 7722/11040 Training loss: 1.2182 0.4741 sec/batch
Epoch 28/40  Iteration 7723/11040 Training loss: 1.2185 0.4761 sec/batch
Epoch 28/40  Iteration 7724/11040 Training loss: 1.2188 0.4763 sec/batch
Epoch 28/40  Iteration 7725/11040 Training loss: 1.2189 0.4740 sec/batch
Epoch 28/40  Iteration 7726/11040 Training loss: 1.

Epoch 29/40  Iteration 7826/11040 Training loss: 1.2027 0.4745 sec/batch
Epoch 29/40  Iteration 7827/11040 Training loss: 1.2030 0.4748 sec/batch
Epoch 29/40  Iteration 7828/11040 Training loss: 1.2029 0.4764 sec/batch
Epoch 29/40  Iteration 7829/11040 Training loss: 1.2031 0.4751 sec/batch
Epoch 29/40  Iteration 7830/11040 Training loss: 1.2033 0.4737 sec/batch
Epoch 29/40  Iteration 7831/11040 Training loss: 1.2029 0.4771 sec/batch
Epoch 29/40  Iteration 7832/11040 Training loss: 1.2031 0.4732 sec/batch
Epoch 29/40  Iteration 7833/11040 Training loss: 1.2029 0.4646 sec/batch
Epoch 29/40  Iteration 7834/11040 Training loss: 1.2028 0.4700 sec/batch
Epoch 29/40  Iteration 7835/11040 Training loss: 1.2026 0.4757 sec/batch
Epoch 29/40  Iteration 7836/11040 Training loss: 1.2029 0.4930 sec/batch
Epoch 29/40  Iteration 7837/11040 Training loss: 1.2033 0.4902 sec/batch
Epoch 29/40  Iteration 7838/11040 Training loss: 1.2033 0.4747 sec/batch
Epoch 29/40  Iteration 7839/11040 Training loss: 1.

Epoch 29/40  Iteration 7939/11040 Training loss: 1.2105 0.4901 sec/batch
Epoch 29/40  Iteration 7940/11040 Training loss: 1.2105 0.4735 sec/batch
Epoch 29/40  Iteration 7941/11040 Training loss: 1.2105 0.4919 sec/batch
Epoch 29/40  Iteration 7942/11040 Training loss: 1.2104 0.4794 sec/batch
Epoch 29/40  Iteration 7943/11040 Training loss: 1.2103 0.4707 sec/batch
Epoch 29/40  Iteration 7944/11040 Training loss: 1.2102 0.4756 sec/batch
Epoch 29/40  Iteration 7945/11040 Training loss: 1.2101 0.4761 sec/batch
Epoch 29/40  Iteration 7946/11040 Training loss: 1.2100 0.4737 sec/batch
Epoch 29/40  Iteration 7947/11040 Training loss: 1.2099 0.4767 sec/batch
Epoch 29/40  Iteration 7948/11040 Training loss: 1.2098 0.4918 sec/batch
Epoch 29/40  Iteration 7949/11040 Training loss: 1.2097 0.4745 sec/batch
Epoch 29/40  Iteration 7950/11040 Training loss: 1.2095 0.4687 sec/batch
Epoch 29/40  Iteration 7951/11040 Training loss: 1.2095 0.4718 sec/batch
Epoch 29/40  Iteration 7952/11040 Training loss: 1.

Epoch 30/40  Iteration 8051/11040 Training loss: 1.2008 0.4740 sec/batch
Epoch 30/40  Iteration 8052/11040 Training loss: 1.2004 0.4757 sec/batch
Epoch 30/40  Iteration 8053/11040 Training loss: 1.2001 0.4737 sec/batch
Epoch 30/40  Iteration 8054/11040 Training loss: 1.2005 0.4791 sec/batch
Epoch 30/40  Iteration 8055/11040 Training loss: 1.2003 0.4862 sec/batch
Epoch 30/40  Iteration 8056/11040 Training loss: 1.2001 0.4765 sec/batch
Epoch 30/40  Iteration 8057/11040 Training loss: 1.2006 0.4754 sec/batch
Epoch 30/40  Iteration 8058/11040 Training loss: 1.2004 0.4753 sec/batch
Epoch 30/40  Iteration 8059/11040 Training loss: 1.2004 0.4745 sec/batch
Epoch 30/40  Iteration 8060/11040 Training loss: 1.2005 0.4742 sec/batch
Epoch 30/40  Iteration 8061/11040 Training loss: 1.2003 0.4754 sec/batch
Epoch 30/40  Iteration 8062/11040 Training loss: 1.2002 0.4758 sec/batch
Epoch 30/40  Iteration 8063/11040 Training loss: 1.2004 0.4755 sec/batch
Epoch 30/40  Iteration 8064/11040 Training loss: 1.

Epoch 30/40  Iteration 8164/11040 Training loss: 1.2016 0.4737 sec/batch
Epoch 30/40  Iteration 8165/11040 Training loss: 1.2017 0.4744 sec/batch
Epoch 30/40  Iteration 8166/11040 Training loss: 1.2018 0.4749 sec/batch
Epoch 30/40  Iteration 8167/11040 Training loss: 1.2018 0.4768 sec/batch
Epoch 30/40  Iteration 8168/11040 Training loss: 1.2020 0.4738 sec/batch
Epoch 30/40  Iteration 8169/11040 Training loss: 1.2022 0.4759 sec/batch
Epoch 30/40  Iteration 8170/11040 Training loss: 1.2025 0.4747 sec/batch
Epoch 30/40  Iteration 8171/11040 Training loss: 1.2025 0.4756 sec/batch
Epoch 30/40  Iteration 8172/11040 Training loss: 1.2024 0.4772 sec/batch
Epoch 30/40  Iteration 8173/11040 Training loss: 1.2025 0.4747 sec/batch
Epoch 30/40  Iteration 8174/11040 Training loss: 1.2024 0.4742 sec/batch
Epoch 30/40  Iteration 8175/11040 Training loss: 1.2027 0.4921 sec/batch
Epoch 30/40  Iteration 8176/11040 Training loss: 1.2028 0.4760 sec/batch
Epoch 30/40  Iteration 8177/11040 Training loss: 1.

Epoch 30/40  Iteration 8277/11040 Training loss: 1.2060 0.4918 sec/batch
Epoch 30/40  Iteration 8278/11040 Training loss: 1.2061 0.4728 sec/batch
Epoch 30/40  Iteration 8279/11040 Training loss: 1.2061 0.4748 sec/batch
Epoch 30/40  Iteration 8280/11040 Training loss: 1.2063 0.4770 sec/batch
Epoch 31/40  Iteration 8281/11040 Training loss: 1.2818 0.4746 sec/batch
Epoch 31/40  Iteration 8282/11040 Training loss: 1.2490 0.4772 sec/batch
Epoch 31/40  Iteration 8283/11040 Training loss: 1.2407 0.4894 sec/batch
Epoch 31/40  Iteration 8284/11040 Training loss: 1.2376 0.4760 sec/batch
Epoch 31/40  Iteration 8285/11040 Training loss: 1.2352 0.4750 sec/batch
Epoch 31/40  Iteration 8286/11040 Training loss: 1.2311 0.4734 sec/batch
Epoch 31/40  Iteration 8287/11040 Training loss: 1.2219 0.4808 sec/batch
Epoch 31/40  Iteration 8288/11040 Training loss: 1.2192 0.4709 sec/batch
Epoch 31/40  Iteration 8289/11040 Training loss: 1.2155 0.4757 sec/batch
Epoch 31/40  Iteration 8290/11040 Training loss: 1.

Epoch 31/40  Iteration 8390/11040 Training loss: 1.1898 0.4746 sec/batch
Epoch 31/40  Iteration 8391/11040 Training loss: 1.1898 0.4760 sec/batch
Epoch 31/40  Iteration 8392/11040 Training loss: 1.1900 0.4751 sec/batch
Epoch 31/40  Iteration 8393/11040 Training loss: 1.1901 0.4747 sec/batch
Epoch 31/40  Iteration 8394/11040 Training loss: 1.1901 0.4768 sec/batch
Epoch 31/40  Iteration 8395/11040 Training loss: 1.1904 0.4766 sec/batch
Epoch 31/40  Iteration 8396/11040 Training loss: 1.1907 0.4742 sec/batch
Epoch 31/40  Iteration 8397/11040 Training loss: 1.1906 0.4764 sec/batch
Epoch 31/40  Iteration 8398/11040 Training loss: 1.1908 0.4752 sec/batch
Epoch 31/40  Iteration 8399/11040 Training loss: 1.1909 0.4911 sec/batch
Epoch 31/40  Iteration 8400/11040 Training loss: 1.1910 0.4889 sec/batch
Epoch 31/40  Iteration 8401/11040 Training loss: 1.1910 0.4784 sec/batch
Epoch 31/40  Iteration 8402/11040 Training loss: 1.1911 0.4753 sec/batch
Epoch 31/40  Iteration 8403/11040 Training loss: 1.

Epoch 31/40  Iteration 8502/11040 Training loss: 1.1974 0.4624 sec/batch
Epoch 31/40  Iteration 8503/11040 Training loss: 1.1974 0.4724 sec/batch
Epoch 31/40  Iteration 8504/11040 Training loss: 1.1975 0.4756 sec/batch
Epoch 31/40  Iteration 8505/11040 Training loss: 1.1974 0.4732 sec/batch
Epoch 31/40  Iteration 8506/11040 Training loss: 1.1974 0.4738 sec/batch
Epoch 31/40  Iteration 8507/11040 Training loss: 1.1973 0.4602 sec/batch
Epoch 31/40  Iteration 8508/11040 Training loss: 1.1972 0.4596 sec/batch
Epoch 31/40  Iteration 8509/11040 Training loss: 1.1973 0.4583 sec/batch
Epoch 31/40  Iteration 8510/11040 Training loss: 1.1973 0.4750 sec/batch
Epoch 31/40  Iteration 8511/11040 Training loss: 1.1974 0.4741 sec/batch
Epoch 31/40  Iteration 8512/11040 Training loss: 1.1974 0.4751 sec/batch
Epoch 31/40  Iteration 8513/11040 Training loss: 1.1974 0.4756 sec/batch
Epoch 31/40  Iteration 8514/11040 Training loss: 1.1973 0.4745 sec/batch
Epoch 31/40  Iteration 8515/11040 Training loss: 1.

Epoch 32/40  Iteration 8615/11040 Training loss: 1.1870 0.4919 sec/batch
Epoch 32/40  Iteration 8616/11040 Training loss: 1.1865 0.4739 sec/batch
Epoch 32/40  Iteration 8617/11040 Training loss: 1.1865 0.4766 sec/batch
Epoch 32/40  Iteration 8618/11040 Training loss: 1.1862 0.4738 sec/batch
Epoch 32/40  Iteration 8619/11040 Training loss: 1.1864 0.4766 sec/batch
Epoch 32/40  Iteration 8620/11040 Training loss: 1.1863 0.4738 sec/batch
Epoch 32/40  Iteration 8621/11040 Training loss: 1.1866 0.4761 sec/batch
Epoch 32/40  Iteration 8622/11040 Training loss: 1.1865 0.4906 sec/batch
Epoch 32/40  Iteration 8623/11040 Training loss: 1.1863 0.4765 sec/batch
Epoch 32/40  Iteration 8624/11040 Training loss: 1.1863 0.4761 sec/batch
Epoch 32/40  Iteration 8625/11040 Training loss: 1.1862 0.4727 sec/batch
Epoch 32/40  Iteration 8626/11040 Training loss: 1.1860 0.4915 sec/batch
Epoch 32/40  Iteration 8627/11040 Training loss: 1.1858 0.4746 sec/batch
Epoch 32/40  Iteration 8628/11040 Training loss: 1.

Epoch 32/40  Iteration 8728/11040 Training loss: 1.1891 0.4733 sec/batch
Epoch 32/40  Iteration 8729/11040 Training loss: 1.1891 0.4746 sec/batch
Epoch 32/40  Iteration 8730/11040 Training loss: 1.1891 0.4600 sec/batch
Epoch 32/40  Iteration 8731/11040 Training loss: 1.1891 0.4760 sec/batch
Epoch 32/40  Iteration 8732/11040 Training loss: 1.1892 0.4745 sec/batch
Epoch 32/40  Iteration 8733/11040 Training loss: 1.1893 0.4902 sec/batch
Epoch 32/40  Iteration 8734/11040 Training loss: 1.1893 0.4754 sec/batch
Epoch 32/40  Iteration 8735/11040 Training loss: 1.1896 0.4750 sec/batch
Epoch 32/40  Iteration 8736/11040 Training loss: 1.1896 0.4730 sec/batch
Epoch 32/40  Iteration 8737/11040 Training loss: 1.1897 0.4773 sec/batch
Epoch 32/40  Iteration 8738/11040 Training loss: 1.1897 0.4749 sec/batch
Epoch 32/40  Iteration 8739/11040 Training loss: 1.1896 0.4806 sec/batch
Epoch 32/40  Iteration 8740/11040 Training loss: 1.1897 0.4852 sec/batch
Epoch 32/40  Iteration 8741/11040 Training loss: 1.

Epoch 33/40  Iteration 8841/11040 Training loss: 1.2059 0.4769 sec/batch
Epoch 33/40  Iteration 8842/11040 Training loss: 1.2050 0.4764 sec/batch
Epoch 33/40  Iteration 8843/11040 Training loss: 1.2000 0.4925 sec/batch
Epoch 33/40  Iteration 8844/11040 Training loss: 1.1991 0.4728 sec/batch
Epoch 33/40  Iteration 8845/11040 Training loss: 1.1955 0.4761 sec/batch
Epoch 33/40  Iteration 8846/11040 Training loss: 1.1930 0.4894 sec/batch
Epoch 33/40  Iteration 8847/11040 Training loss: 1.1922 0.4772 sec/batch
Epoch 33/40  Iteration 8848/11040 Training loss: 1.1917 0.4752 sec/batch
Epoch 33/40  Iteration 8849/11040 Training loss: 1.1888 0.4760 sec/batch
Epoch 33/40  Iteration 8850/11040 Training loss: 1.1867 0.4745 sec/batch
Epoch 33/40  Iteration 8851/11040 Training loss: 1.1859 0.4759 sec/batch
Epoch 33/40  Iteration 8852/11040 Training loss: 1.1855 0.4759 sec/batch
Epoch 33/40  Iteration 8853/11040 Training loss: 1.1871 0.4748 sec/batch
Epoch 33/40  Iteration 8854/11040 Training loss: 1.

Epoch 33/40  Iteration 8954/11040 Training loss: 1.1797 0.4745 sec/batch
Epoch 33/40  Iteration 8955/11040 Training loss: 1.1800 0.4766 sec/batch
Epoch 33/40  Iteration 8956/11040 Training loss: 1.1805 0.4742 sec/batch
Epoch 33/40  Iteration 8957/11040 Training loss: 1.1806 0.4607 sec/batch
Epoch 33/40  Iteration 8958/11040 Training loss: 1.1808 0.4742 sec/batch
Epoch 33/40  Iteration 8959/11040 Training loss: 1.1808 0.4738 sec/batch
Epoch 33/40  Iteration 8960/11040 Training loss: 1.1809 0.4773 sec/batch
Epoch 33/40  Iteration 8961/11040 Training loss: 1.1807 0.4743 sec/batch
Epoch 33/40  Iteration 8962/11040 Training loss: 1.1810 0.4754 sec/batch
Epoch 33/40  Iteration 8963/11040 Training loss: 1.1809 0.4732 sec/batch
Epoch 33/40  Iteration 8964/11040 Training loss: 1.1808 0.4603 sec/batch
Epoch 33/40  Iteration 8965/11040 Training loss: 1.1808 0.4771 sec/batch
Epoch 33/40  Iteration 8966/11040 Training loss: 1.1806 0.4719 sec/batch
Epoch 33/40  Iteration 8967/11040 Training loss: 1.

Epoch 33/40  Iteration 9066/11040 Training loss: 1.1855 0.4761 sec/batch
Epoch 33/40  Iteration 9067/11040 Training loss: 1.1856 0.4890 sec/batch
Epoch 33/40  Iteration 9068/11040 Training loss: 1.1857 0.4697 sec/batch
Epoch 33/40  Iteration 9069/11040 Training loss: 1.1857 0.4797 sec/batch
Epoch 33/40  Iteration 9070/11040 Training loss: 1.1858 0.4857 sec/batch
Epoch 33/40  Iteration 9071/11040 Training loss: 1.1859 0.4763 sec/batch
Epoch 33/40  Iteration 9072/11040 Training loss: 1.1860 0.4752 sec/batch
Epoch 33/40  Iteration 9073/11040 Training loss: 1.1860 0.4734 sec/batch
Epoch 33/40  Iteration 9074/11040 Training loss: 1.1861 0.4747 sec/batch
Epoch 33/40  Iteration 9075/11040 Training loss: 1.1860 0.4757 sec/batch
Epoch 33/40  Iteration 9076/11040 Training loss: 1.1861 0.4757 sec/batch
Epoch 33/40  Iteration 9077/11040 Training loss: 1.1860 0.4751 sec/batch
Epoch 33/40  Iteration 9078/11040 Training loss: 1.1861 0.4750 sec/batch
Epoch 33/40  Iteration 9079/11040 Training loss: 1.

Epoch 34/40  Iteration 9179/11040 Training loss: 1.1752 0.4735 sec/batch
Epoch 34/40  Iteration 9180/11040 Training loss: 1.1748 0.4615 sec/batch
Epoch 34/40  Iteration 9181/11040 Training loss: 1.1743 0.4737 sec/batch
Epoch 34/40  Iteration 9182/11040 Training loss: 1.1740 0.4766 sec/batch
Epoch 34/40  Iteration 9183/11040 Training loss: 1.1739 0.4750 sec/batch
Epoch 34/40  Iteration 9184/11040 Training loss: 1.1742 0.4763 sec/batch
Epoch 34/40  Iteration 9185/11040 Training loss: 1.1738 0.4726 sec/batch
Epoch 34/40  Iteration 9186/11040 Training loss: 1.1734 0.4759 sec/batch
Epoch 34/40  Iteration 9187/11040 Training loss: 1.1732 0.4749 sec/batch
Epoch 34/40  Iteration 9188/11040 Training loss: 1.1732 0.4758 sec/batch
Epoch 34/40  Iteration 9189/11040 Training loss: 1.1728 0.4613 sec/batch
Epoch 34/40  Iteration 9190/11040 Training loss: 1.1730 0.4738 sec/batch
Epoch 34/40  Iteration 9191/11040 Training loss: 1.1731 0.4767 sec/batch
Epoch 34/40  Iteration 9192/11040 Training loss: 1.

Epoch 34/40  Iteration 9292/11040 Training loss: 1.1787 0.4778 sec/batch
Epoch 34/40  Iteration 9293/11040 Training loss: 1.1788 0.4748 sec/batch
Epoch 34/40  Iteration 9294/11040 Training loss: 1.1788 0.4917 sec/batch
Epoch 34/40  Iteration 9295/11040 Training loss: 1.1791 0.4925 sec/batch
Epoch 34/40  Iteration 9296/11040 Training loss: 1.1791 0.4933 sec/batch
Epoch 34/40  Iteration 9297/11040 Training loss: 1.1790 0.4913 sec/batch
Epoch 34/40  Iteration 9298/11040 Training loss: 1.1790 0.4900 sec/batch
Epoch 34/40  Iteration 9299/11040 Training loss: 1.1789 0.4789 sec/batch
Epoch 34/40  Iteration 9300/11040 Training loss: 1.1789 0.4908 sec/batch
Epoch 34/40  Iteration 9301/11040 Training loss: 1.1789 0.4942 sec/batch
Epoch 34/40  Iteration 9302/11040 Training loss: 1.1790 0.4910 sec/batch
Epoch 34/40  Iteration 9303/11040 Training loss: 1.1792 0.4921 sec/batch
Epoch 34/40  Iteration 9304/11040 Training loss: 1.1790 0.4899 sec/batch
Epoch 34/40  Iteration 9305/11040 Training loss: 1.

Epoch 35/40  Iteration 9405/11040 Training loss: 1.1765 0.4782 sec/batch
Epoch 35/40  Iteration 9406/11040 Training loss: 1.1773 0.4879 sec/batch
Epoch 35/40  Iteration 9407/11040 Training loss: 1.1767 0.4752 sec/batch
Epoch 35/40  Iteration 9408/11040 Training loss: 1.1772 0.4771 sec/batch
Epoch 35/40  Iteration 9409/11040 Training loss: 1.1767 0.4804 sec/batch
Epoch 35/40  Iteration 9410/11040 Training loss: 1.1763 0.4891 sec/batch
Epoch 35/40  Iteration 9411/11040 Training loss: 1.1758 0.4864 sec/batch
Epoch 35/40  Iteration 9412/11040 Training loss: 1.1757 0.4764 sec/batch
Epoch 35/40  Iteration 9413/11040 Training loss: 1.1760 0.4770 sec/batch
Epoch 35/40  Iteration 9414/11040 Training loss: 1.1765 0.4748 sec/batch
Epoch 35/40  Iteration 9415/11040 Training loss: 1.1763 0.4752 sec/batch
Epoch 35/40  Iteration 9416/11040 Training loss: 1.1759 0.4919 sec/batch
Epoch 35/40  Iteration 9417/11040 Training loss: 1.1749 0.4898 sec/batch
Epoch 35/40  Iteration 9418/11040 Training loss: 1.

Epoch 35/40  Iteration 9517/11040 Training loss: 1.1702 0.4606 sec/batch
Epoch 35/40  Iteration 9518/11040 Training loss: 1.1700 0.4742 sec/batch
Epoch 35/40  Iteration 9519/11040 Training loss: 1.1703 0.4624 sec/batch
Epoch 35/40  Iteration 9520/11040 Training loss: 1.1703 0.4711 sec/batch
Epoch 35/40  Iteration 9521/11040 Training loss: 1.1702 0.4743 sec/batch
Epoch 35/40  Iteration 9522/11040 Training loss: 1.1703 0.4755 sec/batch
Epoch 35/40  Iteration 9523/11040 Training loss: 1.1705 0.4638 sec/batch
Epoch 35/40  Iteration 9524/11040 Training loss: 1.1706 0.4714 sec/batch
Epoch 35/40  Iteration 9525/11040 Training loss: 1.1707 0.4734 sec/batch
Epoch 35/40  Iteration 9526/11040 Training loss: 1.1708 0.4768 sec/batch
Epoch 35/40  Iteration 9527/11040 Training loss: 1.1708 0.4607 sec/batch
Epoch 35/40  Iteration 9528/11040 Training loss: 1.1707 0.4730 sec/batch
Epoch 35/40  Iteration 9529/11040 Training loss: 1.1708 0.4602 sec/batch
Epoch 35/40  Iteration 9530/11040 Training loss: 1.

Epoch 35/40  Iteration 9630/11040 Training loss: 1.1750 0.4750 sec/batch
Epoch 35/40  Iteration 9631/11040 Training loss: 1.1750 0.4731 sec/batch
Epoch 35/40  Iteration 9632/11040 Training loss: 1.1750 0.4788 sec/batch
Epoch 35/40  Iteration 9633/11040 Training loss: 1.1750 0.4740 sec/batch
Epoch 35/40  Iteration 9634/11040 Training loss: 1.1750 0.4744 sec/batch
Epoch 35/40  Iteration 9635/11040 Training loss: 1.1750 0.4753 sec/batch
Epoch 35/40  Iteration 9636/11040 Training loss: 1.1750 0.4753 sec/batch
Epoch 35/40  Iteration 9637/11040 Training loss: 1.1751 0.4745 sec/batch
Epoch 35/40  Iteration 9638/11040 Training loss: 1.1751 0.4787 sec/batch
Epoch 35/40  Iteration 9639/11040 Training loss: 1.1751 0.4736 sec/batch
Epoch 35/40  Iteration 9640/11040 Training loss: 1.1753 0.4740 sec/batch
Epoch 35/40  Iteration 9641/11040 Training loss: 1.1753 0.4768 sec/batch
Epoch 35/40  Iteration 9642/11040 Training loss: 1.1754 0.4726 sec/batch
Epoch 35/40  Iteration 9643/11040 Training loss: 1.

Epoch 36/40  Iteration 9743/11040 Training loss: 1.1624 0.4720 sec/batch
Epoch 36/40  Iteration 9744/11040 Training loss: 1.1622 0.4751 sec/batch
Epoch 36/40  Iteration 9745/11040 Training loss: 1.1620 0.4895 sec/batch
Epoch 36/40  Iteration 9746/11040 Training loss: 1.1618 0.4759 sec/batch
Epoch 36/40  Iteration 9747/11040 Training loss: 1.1611 0.4770 sec/batch
Epoch 36/40  Iteration 9748/11040 Training loss: 1.1600 0.4739 sec/batch
Epoch 36/40  Iteration 9749/11040 Training loss: 1.1600 0.4613 sec/batch
Epoch 36/40  Iteration 9750/11040 Training loss: 1.1601 0.4735 sec/batch
Epoch 36/40  Iteration 9751/11040 Training loss: 1.1601 0.4906 sec/batch
Epoch 36/40  Iteration 9752/11040 Training loss: 1.1599 0.4749 sec/batch
Epoch 36/40  Iteration 9753/11040 Training loss: 1.1597 0.4760 sec/batch
Epoch 36/40  Iteration 9754/11040 Training loss: 1.1600 0.4915 sec/batch
Epoch 36/40  Iteration 9755/11040 Training loss: 1.1605 0.4749 sec/batch
Epoch 36/40  Iteration 9756/11040 Training loss: 1.

Epoch 36/40  Iteration 9856/11040 Training loss: 1.1678 0.4725 sec/batch
Epoch 36/40  Iteration 9857/11040 Training loss: 1.1679 0.4770 sec/batch
Epoch 36/40  Iteration 9858/11040 Training loss: 1.1679 0.4744 sec/batch
Epoch 36/40  Iteration 9859/11040 Training loss: 1.1681 0.4753 sec/batch
Epoch 36/40  Iteration 9860/11040 Training loss: 1.1682 0.4617 sec/batch
Epoch 36/40  Iteration 9861/11040 Training loss: 1.1684 0.4747 sec/batch
Epoch 36/40  Iteration 9862/11040 Training loss: 1.1686 0.4899 sec/batch
Epoch 36/40  Iteration 9863/11040 Training loss: 1.1687 0.4750 sec/batch
Epoch 36/40  Iteration 9864/11040 Training loss: 1.1688 0.4921 sec/batch
Epoch 36/40  Iteration 9865/11040 Training loss: 1.1688 0.4800 sec/batch
Epoch 36/40  Iteration 9866/11040 Training loss: 1.1688 0.4860 sec/batch
Epoch 36/40  Iteration 9867/11040 Training loss: 1.1688 0.4790 sec/batch
Epoch 36/40  Iteration 9868/11040 Training loss: 1.1689 0.4851 sec/batch
Epoch 36/40  Iteration 9869/11040 Training loss: 1.

Epoch 37/40  Iteration 9969/11040 Training loss: 1.1652 0.4771 sec/batch
Epoch 37/40  Iteration 9970/11040 Training loss: 1.1652 0.4694 sec/batch
Epoch 37/40  Iteration 9971/11040 Training loss: 1.1650 0.4796 sec/batch
Epoch 37/40  Iteration 9972/11040 Training loss: 1.1657 0.4746 sec/batch
Epoch 37/40  Iteration 9973/11040 Training loss: 1.1656 0.4769 sec/batch
Epoch 37/40  Iteration 9974/11040 Training loss: 1.1649 0.4738 sec/batch
Epoch 37/40  Iteration 9975/11040 Training loss: 1.1644 0.4782 sec/batch
Epoch 37/40  Iteration 9976/11040 Training loss: 1.1647 0.4732 sec/batch
Epoch 37/40  Iteration 9977/11040 Training loss: 1.1644 0.4744 sec/batch
Epoch 37/40  Iteration 9978/11040 Training loss: 1.1634 0.4746 sec/batch
Epoch 37/40  Iteration 9979/11040 Training loss: 1.1637 0.4615 sec/batch
Epoch 37/40  Iteration 9980/11040 Training loss: 1.1628 0.4788 sec/batch
Epoch 37/40  Iteration 9981/11040 Training loss: 1.1622 0.4658 sec/batch
Epoch 37/40  Iteration 9982/11040 Training loss: 1.

Epoch 37/40  Iteration 10080/11040 Training loss: 1.1633 0.4750 sec/batch
Epoch 37/40  Iteration 10081/11040 Training loss: 1.1633 0.4735 sec/batch
Epoch 37/40  Iteration 10082/11040 Training loss: 1.1633 0.4774 sec/batch
Epoch 37/40  Iteration 10083/11040 Training loss: 1.1633 0.4746 sec/batch
Epoch 37/40  Iteration 10084/11040 Training loss: 1.1633 0.4746 sec/batch
Epoch 37/40  Iteration 10085/11040 Training loss: 1.1632 0.4839 sec/batch
Epoch 37/40  Iteration 10086/11040 Training loss: 1.1633 0.4666 sec/batch
Epoch 37/40  Iteration 10087/11040 Training loss: 1.1632 0.4731 sec/batch
Epoch 37/40  Iteration 10088/11040 Training loss: 1.1633 0.4744 sec/batch
Epoch 37/40  Iteration 10089/11040 Training loss: 1.1635 0.4764 sec/batch
Epoch 37/40  Iteration 10090/11040 Training loss: 1.1636 0.4770 sec/batch
Epoch 37/40  Iteration 10091/11040 Training loss: 1.1637 0.4737 sec/batch
Epoch 37/40  Iteration 10092/11040 Training loss: 1.1637 0.4743 sec/batch
Epoch 37/40  Iteration 10093/11040 Tra

Epoch 37/40  Iteration 10191/11040 Training loss: 1.1667 0.4743 sec/batch
Epoch 37/40  Iteration 10192/11040 Training loss: 1.1667 0.4745 sec/batch
Epoch 37/40  Iteration 10193/11040 Training loss: 1.1668 0.4764 sec/batch
Epoch 37/40  Iteration 10194/11040 Training loss: 1.1668 0.4924 sec/batch
Epoch 37/40  Iteration 10195/11040 Training loss: 1.1668 0.4890 sec/batch
Epoch 37/40  Iteration 10196/11040 Training loss: 1.1669 0.4879 sec/batch
Epoch 37/40  Iteration 10197/11040 Training loss: 1.1669 0.4793 sec/batch
Epoch 37/40  Iteration 10198/11040 Training loss: 1.1671 0.4822 sec/batch
Epoch 37/40  Iteration 10199/11040 Training loss: 1.1672 0.4807 sec/batch
Epoch 37/40  Iteration 10200/11040 Training loss: 1.1673 0.4875 sec/batch
Epoch 37/40  Iteration 10201/11040 Training loss: 1.1673 0.4780 sec/batch
Epoch 37/40  Iteration 10202/11040 Training loss: 1.1674 0.4979 sec/batch
Epoch 37/40  Iteration 10203/11040 Training loss: 1.1676 0.4765 sec/batch
Epoch 37/40  Iteration 10204/11040 Tra

Epoch 38/40  Iteration 10302/11040 Training loss: 1.1498 0.4751 sec/batch
Epoch 38/40  Iteration 10303/11040 Training loss: 1.1498 0.4756 sec/batch
Epoch 38/40  Iteration 10304/11040 Training loss: 1.1497 0.4905 sec/batch
Epoch 38/40  Iteration 10305/11040 Training loss: 1.1494 0.4766 sec/batch
Epoch 38/40  Iteration 10306/11040 Training loss: 1.1498 0.4879 sec/batch
Epoch 38/40  Iteration 10307/11040 Training loss: 1.1502 0.4785 sec/batch
Epoch 38/40  Iteration 10308/11040 Training loss: 1.1505 0.4903 sec/batch
Epoch 38/40  Iteration 10309/11040 Training loss: 1.1510 0.4747 sec/batch
Epoch 38/40  Iteration 10310/11040 Training loss: 1.1515 0.4758 sec/batch
Epoch 38/40  Iteration 10311/11040 Training loss: 1.1518 0.4738 sec/batch
Epoch 38/40  Iteration 10312/11040 Training loss: 1.1516 0.4764 sec/batch
Epoch 38/40  Iteration 10313/11040 Training loss: 1.1518 0.4746 sec/batch
Epoch 38/40  Iteration 10314/11040 Training loss: 1.1521 0.4750 sec/batch
Epoch 38/40  Iteration 10315/11040 Tra

Epoch 38/40  Iteration 10413/11040 Training loss: 1.1590 0.4893 sec/batch
Epoch 38/40  Iteration 10414/11040 Training loss: 1.1592 0.4750 sec/batch
Epoch 38/40  Iteration 10415/11040 Training loss: 1.1593 0.4754 sec/batch
Epoch 38/40  Iteration 10416/11040 Training loss: 1.1594 0.4763 sec/batch
Epoch 38/40  Iteration 10417/11040 Training loss: 1.1593 0.4764 sec/batch
Epoch 38/40  Iteration 10418/11040 Training loss: 1.1593 0.4761 sec/batch
Epoch 38/40  Iteration 10419/11040 Training loss: 1.1594 0.4900 sec/batch
Epoch 38/40  Iteration 10420/11040 Training loss: 1.1594 0.4910 sec/batch
Epoch 38/40  Iteration 10421/11040 Training loss: 1.1593 0.4760 sec/batch
Epoch 38/40  Iteration 10422/11040 Training loss: 1.1592 0.4754 sec/batch
Epoch 38/40  Iteration 10423/11040 Training loss: 1.1592 0.4917 sec/batch
Epoch 38/40  Iteration 10424/11040 Training loss: 1.1593 0.4931 sec/batch
Epoch 38/40  Iteration 10425/11040 Training loss: 1.1592 0.4888 sec/batch
Epoch 38/40  Iteration 10426/11040 Tra

Epoch 39/40  Iteration 10524/11040 Training loss: 1.1593 0.4770 sec/batch
Epoch 39/40  Iteration 10525/11040 Training loss: 1.1592 0.4723 sec/batch
Epoch 39/40  Iteration 10526/11040 Training loss: 1.1586 0.4726 sec/batch
Epoch 39/40  Iteration 10527/11040 Training loss: 1.1580 0.4608 sec/batch
Epoch 39/40  Iteration 10528/11040 Training loss: 1.1579 0.4733 sec/batch
Epoch 39/40  Iteration 10529/11040 Training loss: 1.1575 0.4756 sec/batch
Epoch 39/40  Iteration 10530/11040 Training loss: 1.1565 0.4757 sec/batch
Epoch 39/40  Iteration 10531/11040 Training loss: 1.1565 0.4734 sec/batch
Epoch 39/40  Iteration 10532/11040 Training loss: 1.1555 0.4741 sec/batch
Epoch 39/40  Iteration 10533/11040 Training loss: 1.1548 0.4775 sec/batch
Epoch 39/40  Iteration 10534/11040 Training loss: 1.1539 0.4737 sec/batch
Epoch 39/40  Iteration 10535/11040 Training loss: 1.1529 0.4754 sec/batch
Epoch 39/40  Iteration 10536/11040 Training loss: 1.1529 0.4746 sec/batch
Epoch 39/40  Iteration 10537/11040 Tra

Epoch 39/40  Iteration 10635/11040 Training loss: 1.1509 0.4756 sec/batch
Epoch 39/40  Iteration 10636/11040 Training loss: 1.1510 0.4757 sec/batch
Epoch 39/40  Iteration 10637/11040 Training loss: 1.1509 0.4891 sec/batch
Epoch 39/40  Iteration 10638/11040 Training loss: 1.1509 0.4771 sec/batch
Epoch 39/40  Iteration 10639/11040 Training loss: 1.1508 0.4743 sec/batch
Epoch 39/40  Iteration 10640/11040 Training loss: 1.1510 0.4752 sec/batch
Epoch 39/40  Iteration 10641/11040 Training loss: 1.1511 0.4746 sec/batch
Epoch 39/40  Iteration 10642/11040 Training loss: 1.1513 0.4757 sec/batch
Epoch 39/40  Iteration 10643/11040 Training loss: 1.1513 0.4746 sec/batch
Epoch 39/40  Iteration 10644/11040 Training loss: 1.1513 0.4617 sec/batch
Epoch 39/40  Iteration 10645/11040 Training loss: 1.1512 0.4735 sec/batch
Epoch 39/40  Iteration 10646/11040 Training loss: 1.1513 0.4753 sec/batch
Epoch 39/40  Iteration 10647/11040 Training loss: 1.1515 0.4762 sec/batch
Epoch 39/40  Iteration 10648/11040 Tra

Epoch 39/40  Iteration 10746/11040 Training loss: 1.1544 0.4744 sec/batch
Epoch 39/40  Iteration 10747/11040 Training loss: 1.1544 0.4765 sec/batch
Epoch 39/40  Iteration 10748/11040 Training loss: 1.1545 0.4742 sec/batch
Epoch 39/40  Iteration 10749/11040 Training loss: 1.1546 0.4604 sec/batch
Epoch 39/40  Iteration 10750/11040 Training loss: 1.1548 0.4731 sec/batch
Epoch 39/40  Iteration 10751/11040 Training loss: 1.1549 0.4745 sec/batch
Epoch 39/40  Iteration 10752/11040 Training loss: 1.1550 0.4604 sec/batch
Epoch 39/40  Iteration 10753/11040 Training loss: 1.1550 0.4749 sec/batch
Epoch 39/40  Iteration 10754/11040 Training loss: 1.1552 0.4752 sec/batch
Epoch 39/40  Iteration 10755/11040 Training loss: 1.1554 0.4751 sec/batch
Epoch 39/40  Iteration 10756/11040 Training loss: 1.1554 0.4588 sec/batch
Epoch 39/40  Iteration 10757/11040 Training loss: 1.1555 0.4762 sec/batch
Epoch 39/40  Iteration 10758/11040 Training loss: 1.1556 0.4733 sec/batch
Epoch 39/40  Iteration 10759/11040 Tra

Epoch 40/40  Iteration 10857/11040 Training loss: 1.1390 0.4742 sec/batch
Epoch 40/40  Iteration 10858/11040 Training loss: 1.1393 0.4770 sec/batch
Epoch 40/40  Iteration 10859/11040 Training loss: 1.1397 0.4760 sec/batch
Epoch 40/40  Iteration 10860/11040 Training loss: 1.1400 0.4723 sec/batch
Epoch 40/40  Iteration 10861/11040 Training loss: 1.1406 0.4755 sec/batch
Epoch 40/40  Iteration 10862/11040 Training loss: 1.1409 0.4763 sec/batch
Epoch 40/40  Iteration 10863/11040 Training loss: 1.1412 0.4901 sec/batch
Epoch 40/40  Iteration 10864/11040 Training loss: 1.1410 0.4774 sec/batch
Epoch 40/40  Iteration 10865/11040 Training loss: 1.1412 0.4722 sec/batch
Epoch 40/40  Iteration 10866/11040 Training loss: 1.1415 0.4763 sec/batch
Epoch 40/40  Iteration 10867/11040 Training loss: 1.1412 0.4613 sec/batch
Epoch 40/40  Iteration 10868/11040 Training loss: 1.1415 0.4725 sec/batch
Epoch 40/40  Iteration 10869/11040 Training loss: 1.1414 0.4903 sec/batch
Epoch 40/40  Iteration 10870/11040 Tra

Epoch 40/40  Iteration 10968/11040 Training loss: 1.1492 0.4595 sec/batch
Epoch 40/40  Iteration 10969/11040 Training loss: 1.1492 0.4755 sec/batch
Epoch 40/40  Iteration 10970/11040 Training loss: 1.1492 0.4719 sec/batch
Epoch 40/40  Iteration 10971/11040 Training loss: 1.1492 0.4770 sec/batch
Epoch 40/40  Iteration 10972/11040 Training loss: 1.1493 0.4749 sec/batch
Epoch 40/40  Iteration 10973/11040 Training loss: 1.1491 0.4746 sec/batch
Epoch 40/40  Iteration 10974/11040 Training loss: 1.1491 0.4639 sec/batch
Epoch 40/40  Iteration 10975/11040 Training loss: 1.1491 0.4691 sec/batch
Epoch 40/40  Iteration 10976/11040 Training loss: 1.1491 0.4762 sec/batch
Epoch 40/40  Iteration 10977/11040 Training loss: 1.1491 0.4752 sec/batch
Epoch 40/40  Iteration 10978/11040 Training loss: 1.1491 0.4740 sec/batch
Epoch 40/40  Iteration 10979/11040 Training loss: 1.1490 0.4743 sec/batch
Epoch 40/40  Iteration 10980/11040 Training loss: 1.1489 0.4753 sec/batch
Epoch 40/40  Iteration 10981/11040 Tra

Here is the loss graph:
<img src="assets/loss_graph.JPG" width="900">

### Test
There is no test step here, because I'm not predicting house prices. The model is meant to generate text, not to predict a value that can be checked against a label.

In [7]:
# list all checkpoints
tf.train.get_checkpoint_state('checkpoints/{}'.format(folder_name))

model_checkpoint_path: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i11040_l512_1.233.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i500_l512_1.944.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i1000_l512_1.637.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i1500_l512_1.501.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i2000_l512_1.428.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i2500_l512_1.378.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i3000_l512_1.353.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i3500_l512_1.333.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i4000_l512_1.313.ckpt"
all_model_checkpoint_paths: "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i4500_l512_1.298.ckpt"
all_model_checkpoint_path
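
The checkpoint file names in the listing above appear to follow the pattern `i<iteration>_l<lstm_size>_<loss>.ckpt`. That reading is an assumption based only on the names shown, and the trailing number is presumably a loss recorded when the checkpoint was saved. Under that assumption, here is a minimal sketch for picking the checkpoint with the lowest recorded loss. The helper `parse_checkpoint` is a hypothetical name and is not part of `functions_model.py`.

```python
import re

# Assumed naming pattern, inferred from the listing: i<iteration>_l<lstm_size>_<loss>.ckpt
CKPT_RE = re.compile(r"i(\d+)_l(\d+)_(\d+\.\d+)\.ckpt$")

def parse_checkpoint(path):
    """Return (iteration, lstm_size, loss) parsed from a checkpoint path (hypothetical helper)."""
    match = CKPT_RE.search(path)
    if match is None:
        raise ValueError("unexpected checkpoint name: {}".format(path))
    return int(match.group(1)), int(match.group(2)), float(match.group(3))

# A few of the paths printed above (note the Windows-style backslash separator)
paths = [
    "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i500_l512_1.944.ckpt",
    "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i1000_l512_1.637.ckpt",
    "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i4500_l512_1.298.ckpt",
    "checkpoints/memesljuirmoblteovanpaicmaanexdr\\i11040_l512_1.233.ckpt",
]

# Pick the path whose file name carries the smallest loss value
best = min(paths, key=lambda p: parse_checkpoint(p)[2])
print(best)
```

The regular expression is anchored only at the end of the string, so it works with either `/` or `\\` as the path separator.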

In [17]:
checkpoint = "checkpoints/{}/i10500_l512_1.231.ckpt".format(folder_name)
samp = sample(checkpoint, 700, lstm_size, len(vocab), vocab_to_int, vocab, int_to_vocab, prime="The ", top_n=3)
print(samp)

The Skin And The Stand
The Painkiller
The core was all the truth

There's no tomorrow, will never live to stop
The world is a conscience



Trapped us to the blood and the son and stare
The sun is calling to you
And the street of tomorrow's bleeding and darkness
Where the cores of the desires too much
I would be a shot or start to stone by the sky

I walk the earth to strength when the world's so falling
In the stars too long to the streets
And the fallen calls come for the same as they say the will to see

In my heart, I was a lie
The face that I see to save my sanity

I can't see my hands around me

I'm a most soul is all I want and I don't wanna see me anymore


There are the same and then the
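
The sample above was drawn with `top_n=3`. The `sample` function itself lives in `functions_model.py` and is not reproduced here, so what follows is only a minimal sketch of what such an argument usually does: keep the `top_n` most likely next characters, renormalize, and sample among them. The name `pick_top_n` and its parameters are hypothetical and may not match the helper's actual code.

```python
import numpy as np

def pick_top_n(probs, vocab_size, top_n=3, rng=None):
    """Sample one character index from the top_n most likely entries (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # Copy into a flat float array so the caller's predictions are not modified
    p = np.array(probs, dtype=np.float64).reshape(-1)
    # Zero out everything except the top_n largest probabilities (assumes top_n >= 1)
    p[np.argsort(p)[:-top_n]] = 0.0
    # Renormalize so the remaining probabilities sum to 1
    p /= p.sum()
    return int(rng.choice(vocab_size, p=p))

# Toy distribution over a 5-character vocabulary
probs = [0.05, 0.5, 0.3, 0.1, 0.05]
idx = pick_top_n(probs, vocab_size=5, top_n=3, rng=np.random.default_rng(0))
print(idx)  # always one of the indices 1, 2 or 3
```

A small `top_n` keeps the output closer to real words at the cost of variety, which would be consistent with the second sample (no `top_n` passed in the call) containing more misspellings such as "arconged" and "goena".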


In [27]:
checkpoint = "checkpoints/{}/i10020_l512_0.682.ckpt".format(folder_name)
samp = sample(checkpoint, 700, lstm_size, len(vocab), vocab_to_int, vocab, int_to_vocab, prime="Highway to Hell")
print(samp)

Highway to Hell
I am running ran away

In the night that search out
And the life I do is real
The sin I bring you where they soul
That he's suck a shame of someone else
I did to go down the fight
But they arconged straight one disgrace

I am the face that I'm going to
The fire in bed with someone else around

I walk always that you don't know it
If I can't take it any more again
The world is goena look behind all the storm
Why are you're feelings
In my sense that someone should haad
I'll never live the other day
It's my godna take that wounds
In the heart and that's all the way
This is the feot your sins of minutes
This is my war is all you'd die

And I'm going to be tile the proce is where you call
All thi


### Source
- [Udacity Deep Learning](https://github.com/udacity/deep-learning)  
- [Mat Leonard](https://github.com/mcleonard)