## RNN 创作古诗
在这一章中我们了解到循环神经网络非常擅长处理序列和自然语言处理，文本都是由单词或者汉字按照序列顺序组成的，那么如何能够生成文本呢？下面我们来讲一讲原理，需要你根据这个原理来实现整个网络。

### 原理介绍
前面我们介绍过 RNN 的输入和输出存在多种关系，比如多个输入对一个输出，这个时候输入是一个序列，输出是一个分类结果，就像使用 RNN 做图像分类。

这里我们使用 RNN 来生成文本，网络的输入是一个序列，同时输出也是一个相同长度的序列，结构如下

<img src=https://ws1.sinaimg.cn/large/006tNc79gy1fob5kq3r8jj30mt09dq2r.jpg width=700>

在上面的网络流程中，输入是一个序列 "床 前 明 月 光"，输出也是一个序列 "前 明 月 光 床"。如果你仔细观察可以发现网络的每一步输出都是下一步的输入，这就是其设计思路。

那么对于任意的一段话，比如 "我喜欢小猫"，我们可以将其拆分 "我 喜 欢 小 猫" 这个长度为 5 的序列，网络的每一步输出就是 "喜 欢 小 猫 我"，也就是每个字符的输出就是其**紧跟**的后一个字符。

当然对于一个序列，其最后一个字符后面并没有其他的字符，所以有多种方式选择，比如将序列的第一个字符作为其输出，也就是 "光" 的输出是 "床"，或者将其本身作为输出，也就是 "光" 的输出是 "光"，这里的选择可以有很多，我们使用一种循环的连接，将第一个字符作为最后一个字符的输出。

### 生成文本
这样设计网络的训练流程是为了非常好地生成文本，下面我们说明一下如何进行文本的生成。

首先需要输入网络一段初始的序列进行预热，预热的过程并不需要实际的输出结果，只是为了生成拥有记忆效果的隐藏状态，并将隐藏状态保留下来，接着我们开始正式生成文本，每个字符作为输入都可以得到输出，然后将输出作为下一步的输入，这样就可以不断地生成新的句子，这个过程是可以无限循环下去，或者到达我们的要求输出长度，具体可以看看下面的图示

<img src=https://ws2.sinaimg.cn/large/006tNc79gy1fob5z06w1uj30qh09m0sl.jpg width=800>

讲完了原理之后，下面就该你亲自动手去实现这个网络

In [1]:
from __future__ import print_function
from __future__ import division
from __future__ import absolute_import

import tensorflow as tf

首先我们可以探索一下数据集是什么样的

In [2]:
with open('./dataset/poetry.txt', 'r') as f:
    poetry_corpus = f.read()

我们取得了前100个字符的结果，其中 `\n` 表示换行符

In [3]:
poetry_corpus[:100]

'寒随穷律变，春逐鸟声开。\n初风飘带柳，晚雪间花梅。\n碧林青旧竹，绿沼翠新苔。\n芝田初雁去，绮树巧莺来。\n晚霞聊自怡，初晴弥可喜。\n日晃百花色，风动千林翠。\n池鱼跃不同，园鸟声还异。\n寄言博通者，知予物'

In [4]:
# 看看字符数
print('总的字符数: {}'.format(len(poetry_corpus)))

总的字符数: 942681


为了可视化比较方便，我们将换行字符 `\n` 替换成空格

In [5]:
poetry_corpus = poetry_corpus.replace('\n', ' ').replace('\r', ' ').replace('，', ' ').replace('。', ' ')
print(poetry_corpus[:100])

寒随穷律变 春逐鸟声开  初风飘带柳 晚雪间花梅  碧林青旧竹 绿沼翠新苔  芝田初雁去 绮树巧莺来  晚霞聊自怡 初晴弥可喜  日晃百花色 风动千林翠  池鱼跃不同 园鸟声还异  寄言博通者 知予物


### 文本数值表示
对于每个文字，电脑并不能像人一样能够有效地识别，所以必须做一个转换，将文字转换成电脑能够识别的数字，相当于每个不同的汉字，都用不同的数字去表示，可以对所有非重复的字符，从 0 开始建立索引

同时可能古诗中会出现一些生僻的字，这些字可能只会出现几次，甚至只会出现一次，引入这些字会增大模型的复杂度，同时也会影响模型的训练，可以将这些词频比较低的字去掉

关于汉字和数字的转换，我们已经为你实现好了一个转换器，感兴趣的同学可以去 `utils.py` 中查看，在之后的练习中，你可以使用这个转换器进行生成文本的转换，下面我们先看看例子

In [6]:
import numpy as np
from utils import TextConverter

In [7]:
convert = TextConverter('./dataset/poetry.txt', max_vocab=10000)


上面我们通过数据集建立好了这个转换器 `convert`，下面我们看看如何去调用

In [8]:
# 得到原始的文本结果
txt_char = poetry_corpus[:11]
print('原始的文本结果: {}'.format(txt_char))
print()

# 通过 convert 将文字转换成数字
num_char = convert.text_to_arr(txt_char)
print('转换成数字之后的结果: {}'.format(num_char))
print()

# 通过 convert 将数字转换成文字
origin_txt_char = convert.arr_to_text(num_char)
print('将数字重新转换成文字: {}'.format(origin_txt_char))

原始的文本结果: 寒随穷律变 春逐鸟声开

转换成数字之后的结果: [ 40 166 358 935 565   0  10 367 108  63  78]

将数字重新转换成文字: 寒随穷律变 春逐鸟声开


通过上面的例子，你可以看到，能够使用 `convert.text_to_arr` 对一个文本进行数字的转换，通过 `convert.arr_to_text` 将数字转换成文本 

### 构造时序样本数据
对于一整段文本，并不适合全部输入到循环神经网络中，因为我们前面了解到循环神经网络存在着长时依赖的问题，所以需要将整个文本分成很多个序列文本，然后将这些序列文本输入到循环神经网络中进行训练，只要我们定好每个序列的长度，那么序列个数也就被决定了。

In [9]:
# 每个序列的长度，你可以自行修改
n_step = 20

# 总的序列个数
num_seq = int(len(poetry_corpus) / n_step)

# 去掉最后不足一个序列长度的部分
text = poetry_corpus[:num_seq*n_step]

print('序列的个数: {}'.format(num_seq))

序列的个数: 47134


接着需要将序列中所有的文字转换成数字表示，同时重新排列成 **$(num\_seq \times n\_step)$** 的矩阵

完成下面的 `#todo` 的部分

In [10]:
arr = convert.text_to_arr(text) #todo: 使用 convert 将文本 text 转换成数字表示的数组
arr = arr.reshape(num_seq, n_step) #todo: 将转换之后的数组重新排列成 (num_seq x n_step) 的形状
arr = arr.astype(np.int32)

In [11]:
# 不要修改下面的代码
# ================== test =================
if arr.shape == (num_seq, n_step):
    print('Successful!')
else:
    print('Failed!')

Successful!


据此，我们可以构建 Tensorflow 中的数据读取来训练网络，这里我们将最后一个字符的输出 label 定为输入的第一个字符，也就是"床前明月光"的输出是"前明月光床"，完成下面 #todo 的部分

In [12]:
class TextDataset(object):
    def __init__(self, arr):
        self.arr = arr
        
    def __getitem__(self, item):
        #TODO: 取得 arr 中的 item 这一个序列
        x = self.arr[item, :]

        #TODO: 构造上述描述的 label
        y = np.zeros_like(x)
#         y = torch.zeros(x.size())
#         y = np.hstack((x[1:20], x[:1]))
        y[:-1], y[-1] = x[1:], x[0]

        return x, y
    
    def __len__(self):
        return self.arr.shape[0]

如果你构造好了这个数据集类，我们可以将其实例化

In [13]:
train_set = TextDataset(arr)

下面我们可以取出其中一个数据集参看一下是否是我们描述的这样，这个数据集需要像上面描述的一样，请自行检查

In [14]:
x, y = train_set[0]
# for x, y in train_set:
#     print('输入的文字序列 x: {}'.format(convert.arr_to_text(x)))
#     print('输出的文字序列 y: {}'.format(convert.arr_to_text(y)))
print("type(y):", type(y))
print('输入的文字序列 x: {}'.format(convert.arr_to_text(x)))
print('输出的文字序列 y: {}'.format(convert.arr_to_text(y)))

type(y): <class 'numpy.ndarray'>
输入的文字序列 x: 寒随穷律变 春逐鸟声开  初风飘带柳 晚
输出的文字序列 y: 随穷律变 春逐鸟声开  初风飘带柳 晚寒


### 建立模型
下面我们需要构建这个循环神经网路的网络结构，模型可以定义成非常简单的两层
- 第一层是 RNN 层, **LSTM (GRU)**
- 第二层是线性层，做分类问题，最后输出预测的字符 **slim.fully_connected**

只需要按照提示填写下面的 #todo 部分

- 构造输入

    首先构造一些placeholder作为网络的输入,方便之后代入数据, 需要构建的是:
    - inputs: placeholder, 接收 `[batch_size, n_step]` 的输入, 是输入的词
    - targets: placeholder, 接收 `[batch_size, n_step]` 的输入, 是输入词的对应词, 也就是 label
    - keep_prob: placeholder, 用来表示 dropout 的保留概率

In [15]:
def build_inputs(batch_size, n_step):
    '''
    
    args:
        batch_size: 一个批次中有多少个序列输入
        n_steps: 一个序列中有多少个词
        
    return:
        inputs: 输入的词
        targets: 输入词的对应词
        keep_prob: dropout 保留概率
    '''
    inputs = tf.placeholder(shape=(batch_size, n_step), dtype=tf.int32)
    targets = tf.placeholder(shape=(batch_size, n_step), dtype=tf.int32)
    keep_prob = tf.placeholder(name='keep_prob', dtype=tf.float32)
    
    return inputs, targets, keep_prob

- 构造 RNN

    然后我们开始构造 RNN, 将一个序列中的每个词产生一个输出词. 

    这里我们可以构造一个多层的 RNN, 可以使用 LSTM 或者 GRU 作为 RNN 的基本单元.

In [16]:
def build_rnn(hidden_size, num_layers, batch_size, keep_prob):
    '''
    
    args:
        keep_prob: dropout 保留概率
        hidden_size: RNN 隐藏层大小
        num_layers: RNN 隐藏层个数
        batch_size: batch size

    return:
        cell: RNN cell
        initial_state: RNN输入时的初始状态
    '''
    
    def build_cell(hidden_size, keep_prob):
        #todo: 得到一个 rnn cell, 可以是 rnn 或者 lstm 或者 gru
        lstm = tf.contrib.rnn.BasicLSTMCell(num_units=hidden_size)
        
        #todo: 添加 dropout
        lstm_drop = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        
        return lstm_drop
    
    
    #todo: 得到一个多层的 rnn cell
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(hidden_size, keep_prob) for _ in range(num_layers)])
    
    #todo: 得到 cell 的初始状态
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state

- 构造分类层

    现在我们要构造以 RNN 输出的结果为输入, 一个词为输出的全连接层, 这是一个分类问题, 有多少个词就是多少分类

In [17]:
import tensorflow.contrib.slim as slim

def build_output(rnn_out, in_size, out_size):
    '''
    
    args:
        rnn_out: 上一步rnn的输出
        in_size: rnn输出的特征个数
        out_size: 词的总个数(分类数)
    
    return:
        out: 输出词的概率向量
        logits: softmax之前的结果
    '''

    #todo: rnn_out 的形状是 (batch, n_step, rnn_size), 将形状改成 (batch x n_step, rnn_size)
    # 变成一个2阶矩阵才可以参与到下一步的分类层
#     seq_output = tf.concat(rnn_out, axis=1)
    x = tf.reshape(rnn_out, [-1, in_size])
    
    #todo: 一个全连阶层作为分类层
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
        
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b
    
    #todo: softmax 得到概率
    out = tf.nn.softmax(logits, name='predictions')
    
    return out, logits,softmax_w,softmax_b

- 构造损失函数

    这是一个分类问题, 因此我们使用 softmax_with_logits 作为损失函数

In [18]:
def build_loss(logits, targets, lstm_size, num_classes, softmax_w, softmax_b):
    '''
    
    args:
        logits: softmax之前的结果
        targets: 目标词
        lstm_size: Number of LSTM hidden units
        num_classes: 词的总个数(分类数)
    
    return:
        loss: loss tensor.
    '''
    
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    
    #todo: softmax 分类损失函数
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_reshaped, logits=logits)
    loss = tf.reduce_mean(loss)
    
    return loss

- 构造训练过程

    接下来我们需要构造训练过程. 前面说到过, RNN 经常会遇到梯度爆炸的问题, 但有一种方法可以避免这类问题, 就是"梯度裁剪".
    
    tensorflow 可以通过**`tf.clip_by_global_norm(tensors, grad_clip)`**函数对`tensors`进行梯度裁剪. 
    
    形象地说, 如果大于`grad_clip`就会将大于的部分剪掉, 这样操作之后, 所有的`tensors`都比`grad_clip`要小, 也就不会存在爆炸的问题了.
    
    因此在这里我们会用到这个方法, 那么我们就不再使用最简单的`optimizer.minimize`这个函数去构造训练过程了, 需要把这个过程拆开, 具体来说分为下面几步:
    
        - 计算所有可训练变量的梯度
        - 对所有梯度进行裁剪
        - 再将梯度应用到原来的变量上去
        
    第一步和第二步应该都知道如何去做, 第三步需要用到一个全新的函数, **`optimizer.apply_gradients`**. 
    
    `optimizer`就是前面我们定义的比如说梯度下降法方法, Momentum方法, Adam方法等等优化器, 每个优化器都有`apply_gradients`方法, 这里不具体展开如何使用这个函数, 大家可以查看下面的函数说明或者参考[这里](https://tensorflow.google.cn/api_docs/python/tf/train/AdamOptimizer#apply_gradients)

---

<img src="https://image.ibb.co/dx7cRn/apply_gradient.png" alt="apply gradient" border="0" />

In [19]:
def build_optimizer(loss, learning_rate, grad_clip):
    '''
    
    args:
        loss: loss tensor
        learning_rate: 学习率
    
    return:
        optimizer: 优化方法
    '''

    #todo: 获取所有的可训练变量
    tvars = tf.trainable_variables()
    
    #todo: 获取 loss 对 tvars 的梯度
    grads = tf.gradients(loss, tvars)
    
    #todo: 使用 tf.clip_by_global_norm 进行梯度裁剪
    grads_clipped, _ = tf.clip_by_global_norm(grads, grad_clip)
#     grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), 5)
    
    #todo: 生成一个 Adam 优化器
    train_op = tf.train.AdamOptimizer(learning_rate)
    
    #todo: 使用 apply_gradients 生成一个参数更新 op
    optimizer = train_op.apply_gradients(zip(grads_clipped, tvars))
    
    return optimizer

- 构建完整的 **CharRNN**

In [20]:
class CharRNN:
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       rnn_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
        '''
        
        args:
            num_classes: 分类数, 也就是字符总数
            batch_size: batch size
            num_steps: 一个序列中出现的字符个数
            rnn_size: rnn 的隐藏层大小
            num_layers: rnn 中的隐藏层个数
            learning_rate: 学习率
            grad_clip: 梯度裁剪常数
            sampling: 是否进行采样
            
        '''
        # 之后我们用这个网络进行 inference 的时候, 我们会传入一个字符进来, 而不是训练时候的
        # 传入 n_step 个字符, 因此在这里用 sampling 来控制
        if sampling == True:
            batch_size, num_steps = 1, 1
        else:
            batch_size, num_steps = batch_size, num_steps

        tf.reset_default_graph()
        
        with tf.name_scope('Inputs'):
        # Build the input placeholder tensors
            self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)
            
        #todo: 构建输入
        with tf.name_scope('Lstm'):
            cell, self.initial_state = build_rnn(rnn_size, num_layers, batch_size, self.keep_prob)
            
#         self.inputs, self.targets, self.keep_prob = tf.placeholder(tf.int32, [batch_size, None]),
#         tf.placeholder(tf.int32, [batch_size, None]),
#         tf.placeholder(tf.float32, name='keep_prob')

#         #todo: 构建RNN的cell
#         lstm = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size,state_is_tuple=True)     
#         #todo: 添加 dropout
#         lstm = tf.nn.rnn_cell.DropoutWrapper(lstm, output_keep_prob=keep_prob)   
#         #todo: 得到一个多层的 rnn cell
#         cell = tf.contrib.rnn.MultiRNNCell([lstm] * num_layers,state_is_tuple=True)
#         cell, self.initial_state = cell, cell.zero_state(batch_size, tf.float32)

        ### 用RNN跑一遍输入得到输出
        # 首先将输入转化成one_hot形式, 相当于给字符编码
        # 这里你也可以使用我们之前讲过的 word_embedding, 将字符嵌入到一个向量里面去
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        #todo: 运行RNN得到输出和最终状态(提示: 使用 tf.nn.dynamic_rnn)
        outputs, state = tf.nn.dynamic_rnn(cell,x_one_hot,initial_state=self.initial_state)
        
        #todo: 将最后的状态保存在 final_state 中
        self.final_state = state
        
        #todo: 得到分类层的结果
        self.prediction, self.logits, self.softmax_w,self.softmax_b = build_output(outputs, rnn_size, num_classes)
        
        #todo: 得到损失函数
        # Loss and optimizer (with gradient clipping)
        with tf.name_scope('Loss'):
            self.loss = build_loss(self.logits, self.targets, rnn_size, num_classes,self.softmax_w,self.softmax_b)
        
        #todo: 得到优化算子
        with tf.name_scope('SGD'):
            self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

接下来我们再定义网络的参数

In [21]:
# batch_size = None        # batch_size
# rnn_size = None          # rnn中隐藏层的大小
# num_layers = None        # rnn中隐藏层的个数
# learning_rate = None     # 学习率
# keep_prob = None         # dropout保留概率

batch_size = 256    # Sequences per batch
num_steps = 20    # Number of sequence steps per batch
rnn_size = 512     # Size of hidden layers in LSTMs
num_layers = 2         # Number of LSTM layers
learning_rate = 0.003    # Learning rate
keep_prob = 0.5     # Dropout keep probability

使用上面的`CharRNN`构造model

In [22]:
#todo
model =CharRNN(convert.vocab_size, batch_size=batch_size, num_steps=num_steps,
                rnn_size=rnn_size, num_layers=num_layers, 
                learning_rate=learning_rate)

In [23]:
epochs = 100

# Print losses every N interations
print_every_n = 1

# Save every N iterations
save_every_n = 1000

下面构造一个读取数据的`generator`, 也可以自行实现

In [24]:
import random

class build_data_generator:
    def __init__(self, data, batch_size, shuffle=False):
        self.data = data
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.nb_of_examples = len(data)
    
    def __call__(self):
        ind = 0
        indices = list(range(self.nb_of_examples))
        
        if self.shuffle:
            random.shuffle(indices)

        while ind + self.batch_size <= self.nb_of_examples:
            x, y = self.data[ind: ind + batch_size]
            ind += batch_size

            yield x, y

        return
    
    def __len__(self):
        return self.nb_of_examples // batch_size

In [25]:
import time

开始训练

In [26]:
saver = tf.train.Saver(max_to_keep=50)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    counter = 0
    
    for e in range(epochs):
        #todo: 
        #用 sess 去得到初始 state 保存在 new_state 中
        new_state = sess.run(model.initial_state)
        
        loss = 0
        dataset = build_data_generator(train_set, batch_size, True)
#         print("dataset:\n", dataset)
        
        for x, y in dataset():
            counter += 1
#             print("x:\n", x)
#             print("y:\n", y)
            start = time.time()
            
            #todo:
            # 构造 feed_dict
            # 这里, 我们需要得到model.inputs, model.targets, model.keep_prob, model.initial_state的输入
            # 需要将上一步得到的state作为这一步的model.initial_state
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            
            #todo:
            # 使用上面定义的 feed_dict, 运行 session, 获得当前 batch 的 loss, state, 并运行 model.optimizer
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            if (counter % print_every_n == 0):
                end = time.time()
                print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/checkpoints/i{}_l{}.ckpt".format(counter, rnn_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, rnn_size, batch_loss))

Epoch: 1/100...  Training Step: 1...  Training loss: 8.5918...  1.7820 sec/batch
Epoch: 1/100...  Training Step: 2...  Training loss: 8.4878...  0.3435 sec/batch
Epoch: 1/100...  Training Step: 3...  Training loss: 9.4271...  0.3306 sec/batch
Epoch: 1/100...  Training Step: 4...  Training loss: 7.4339...  0.3334 sec/batch
Epoch: 1/100...  Training Step: 5...  Training loss: 7.6500...  0.3351 sec/batch
Epoch: 1/100...  Training Step: 6...  Training loss: 6.9588...  0.3350 sec/batch
Epoch: 1/100...  Training Step: 7...  Training loss: 6.6767...  0.3340 sec/batch
Epoch: 1/100...  Training Step: 8...  Training loss: 6.5810...  0.3349 sec/batch
Epoch: 1/100...  Training Step: 9...  Training loss: 6.6424...  0.3346 sec/batch
Epoch: 1/100...  Training Step: 10...  Training loss: 6.6177...  0.3337 sec/batch
Epoch: 1/100...  Training Step: 11...  Training loss: 6.3776...  0.3323 sec/batch
Epoch: 1/100...  Training Step: 12...  Training loss: 6.1868...  0.3315 sec/batch
Epoch: 1/100...  Training

Epoch: 1/100...  Training Step: 102...  Training loss: 5.6661...  0.3365 sec/batch
Epoch: 1/100...  Training Step: 103...  Training loss: 5.8227...  0.3362 sec/batch
Epoch: 1/100...  Training Step: 104...  Training loss: 5.7235...  0.3370 sec/batch
Epoch: 1/100...  Training Step: 105...  Training loss: 5.7773...  0.3363 sec/batch
Epoch: 1/100...  Training Step: 106...  Training loss: 5.7724...  0.3365 sec/batch
Epoch: 1/100...  Training Step: 107...  Training loss: 5.7560...  0.3349 sec/batch
Epoch: 1/100...  Training Step: 108...  Training loss: 5.5436...  0.3353 sec/batch
Epoch: 1/100...  Training Step: 109...  Training loss: 5.5355...  0.3327 sec/batch
Epoch: 1/100...  Training Step: 110...  Training loss: 5.5502...  0.3359 sec/batch
Epoch: 1/100...  Training Step: 111...  Training loss: 5.5329...  0.3383 sec/batch
Epoch: 1/100...  Training Step: 112...  Training loss: 5.6993...  0.3343 sec/batch
Epoch: 1/100...  Training Step: 113...  Training loss: 5.6548...  0.3372 sec/batch
Epoc

Epoch: 2/100...  Training Step: 201...  Training loss: 5.7605...  0.3374 sec/batch
Epoch: 2/100...  Training Step: 202...  Training loss: 5.5873...  0.3400 sec/batch
Epoch: 2/100...  Training Step: 203...  Training loss: 5.6592...  0.3404 sec/batch
Epoch: 2/100...  Training Step: 204...  Training loss: 5.6164...  0.3391 sec/batch
Epoch: 2/100...  Training Step: 205...  Training loss: 5.6804...  0.3373 sec/batch
Epoch: 2/100...  Training Step: 206...  Training loss: 5.6030...  0.3350 sec/batch
Epoch: 2/100...  Training Step: 207...  Training loss: 5.6373...  0.3371 sec/batch
Epoch: 2/100...  Training Step: 208...  Training loss: 5.6952...  0.3343 sec/batch
Epoch: 2/100...  Training Step: 209...  Training loss: 5.6030...  0.3355 sec/batch
Epoch: 2/100...  Training Step: 210...  Training loss: 5.5529...  0.3355 sec/batch
Epoch: 2/100...  Training Step: 211...  Training loss: 5.5456...  0.3359 sec/batch
Epoch: 2/100...  Training Step: 212...  Training loss: 5.5536...  0.3368 sec/batch
Epoc

Epoch: 2/100...  Training Step: 300...  Training loss: 5.3303...  0.3370 sec/batch
Epoch: 2/100...  Training Step: 301...  Training loss: 5.5105...  0.3354 sec/batch
Epoch: 2/100...  Training Step: 302...  Training loss: 5.3467...  0.3362 sec/batch
Epoch: 2/100...  Training Step: 303...  Training loss: 5.3109...  0.3369 sec/batch
Epoch: 2/100...  Training Step: 304...  Training loss: 5.4235...  0.3374 sec/batch
Epoch: 2/100...  Training Step: 305...  Training loss: 5.6227...  0.3385 sec/batch
Epoch: 2/100...  Training Step: 306...  Training loss: 5.5127...  0.3352 sec/batch
Epoch: 2/100...  Training Step: 307...  Training loss: 5.4087...  0.3357 sec/batch
Epoch: 2/100...  Training Step: 308...  Training loss: 5.3785...  0.3385 sec/batch
Epoch: 2/100...  Training Step: 309...  Training loss: 5.3132...  0.3355 sec/batch
Epoch: 2/100...  Training Step: 310...  Training loss: 5.3090...  0.3395 sec/batch
Epoch: 2/100...  Training Step: 311...  Training loss: 5.3995...  0.3363 sec/batch
Epoc

Epoch: 3/100...  Training Step: 399...  Training loss: 5.3120...  0.3376 sec/batch
Epoch: 3/100...  Training Step: 400...  Training loss: 5.3051...  0.3401 sec/batch
Epoch: 3/100...  Training Step: 401...  Training loss: 5.0679...  0.3364 sec/batch
Epoch: 3/100...  Training Step: 402...  Training loss: 5.1430...  0.3377 sec/batch
Epoch: 3/100...  Training Step: 403...  Training loss: 5.2891...  0.3377 sec/batch
Epoch: 3/100...  Training Step: 404...  Training loss: 5.4554...  0.3361 sec/batch
Epoch: 3/100...  Training Step: 405...  Training loss: 5.3807...  0.3377 sec/batch
Epoch: 3/100...  Training Step: 406...  Training loss: 5.4038...  0.3359 sec/batch
Epoch: 3/100...  Training Step: 407...  Training loss: 5.2884...  0.3366 sec/batch
Epoch: 3/100...  Training Step: 408...  Training loss: 5.2730...  0.3385 sec/batch
Epoch: 3/100...  Training Step: 409...  Training loss: 5.3838...  0.3368 sec/batch
Epoch: 3/100...  Training Step: 410...  Training loss: 5.3357...  0.3399 sec/batch
Epoc

Epoch: 3/100...  Training Step: 498...  Training loss: 5.2789...  0.3367 sec/batch
Epoch: 3/100...  Training Step: 499...  Training loss: 5.2845...  0.3372 sec/batch
Epoch: 3/100...  Training Step: 500...  Training loss: 5.3228...  0.3401 sec/batch
Epoch: 3/100...  Training Step: 501...  Training loss: 5.3731...  0.3404 sec/batch
Epoch: 3/100...  Training Step: 502...  Training loss: 5.1509...  0.3359 sec/batch
Epoch: 3/100...  Training Step: 503...  Training loss: 5.2230...  0.3411 sec/batch
Epoch: 3/100...  Training Step: 504...  Training loss: 5.1043...  0.3395 sec/batch
Epoch: 3/100...  Training Step: 505...  Training loss: 5.4191...  0.3416 sec/batch
Epoch: 3/100...  Training Step: 506...  Training loss: 5.2299...  0.3400 sec/batch
Epoch: 3/100...  Training Step: 507...  Training loss: 5.3472...  0.3405 sec/batch
Epoch: 3/100...  Training Step: 508...  Training loss: 5.8009...  0.3425 sec/batch
Epoch: 3/100...  Training Step: 509...  Training loss: 5.7385...  0.3415 sec/batch
Epoc

Epoch: 4/100...  Training Step: 597...  Training loss: 5.3087...  0.3384 sec/batch
Epoch: 4/100...  Training Step: 598...  Training loss: 5.3131...  0.3404 sec/batch
Epoch: 4/100...  Training Step: 599...  Training loss: 5.3548...  0.3395 sec/batch
Epoch: 4/100...  Training Step: 600...  Training loss: 5.2365...  0.3401 sec/batch
Epoch: 4/100...  Training Step: 601...  Training loss: 5.2936...  0.3429 sec/batch
Epoch: 4/100...  Training Step: 602...  Training loss: 5.5175...  0.3415 sec/batch
Epoch: 4/100...  Training Step: 603...  Training loss: 5.4185...  0.3409 sec/batch
Epoch: 4/100...  Training Step: 604...  Training loss: 5.3022...  0.3389 sec/batch
Epoch: 4/100...  Training Step: 605...  Training loss: 5.4163...  0.3413 sec/batch
Epoch: 4/100...  Training Step: 606...  Training loss: 5.4383...  0.3414 sec/batch
Epoch: 4/100...  Training Step: 607...  Training loss: 5.4318...  0.3369 sec/batch
Epoch: 4/100...  Training Step: 608...  Training loss: 5.3137...  0.3397 sec/batch
Epoc

Epoch: 4/100...  Training Step: 696...  Training loss: 5.2666...  0.3403 sec/batch
Epoch: 4/100...  Training Step: 697...  Training loss: 5.2380...  0.3372 sec/batch
Epoch: 4/100...  Training Step: 698...  Training loss: 5.2753...  0.3388 sec/batch
Epoch: 4/100...  Training Step: 699...  Training loss: 5.4055...  0.3398 sec/batch
Epoch: 4/100...  Training Step: 700...  Training loss: 5.3397...  0.3382 sec/batch
Epoch: 4/100...  Training Step: 701...  Training loss: 5.3405...  0.3396 sec/batch
Epoch: 4/100...  Training Step: 702...  Training loss: 5.2792...  0.3417 sec/batch
Epoch: 4/100...  Training Step: 703...  Training loss: 5.3065...  0.3383 sec/batch
Epoch: 4/100...  Training Step: 704...  Training loss: 5.0585...  0.3382 sec/batch
Epoch: 4/100...  Training Step: 705...  Training loss: 5.3315...  0.3431 sec/batch
Epoch: 4/100...  Training Step: 706...  Training loss: 5.2424...  0.3425 sec/batch
Epoch: 4/100...  Training Step: 707...  Training loss: 5.3407...  0.3412 sec/batch
Epoc

Epoch: 5/100...  Training Step: 795...  Training loss: 5.1678...  0.3375 sec/batch
Epoch: 5/100...  Training Step: 796...  Training loss: 5.2121...  0.3390 sec/batch
Epoch: 5/100...  Training Step: 797...  Training loss: 5.2067...  0.3382 sec/batch
Epoch: 5/100...  Training Step: 798...  Training loss: 5.2478...  0.3419 sec/batch
Epoch: 5/100...  Training Step: 799...  Training loss: 5.3409...  0.3398 sec/batch
Epoch: 5/100...  Training Step: 800...  Training loss: 5.3647...  0.3422 sec/batch
Epoch: 5/100...  Training Step: 801...  Training loss: 5.2018...  0.3418 sec/batch
Epoch: 5/100...  Training Step: 802...  Training loss: 5.2408...  0.3387 sec/batch
Epoch: 5/100...  Training Step: 803...  Training loss: 5.2555...  0.3389 sec/batch
Epoch: 5/100...  Training Step: 804...  Training loss: 5.1270...  0.3409 sec/batch
Epoch: 5/100...  Training Step: 805...  Training loss: 5.3713...  0.3415 sec/batch
Epoch: 5/100...  Training Step: 806...  Training loss: 5.2656...  0.3429 sec/batch
Epoc

Epoch: 5/100...  Training Step: 894...  Training loss: 5.3552...  0.3395 sec/batch
Epoch: 5/100...  Training Step: 895...  Training loss: 5.2534...  0.3395 sec/batch
Epoch: 5/100...  Training Step: 896...  Training loss: 5.2442...  0.3386 sec/batch
Epoch: 5/100...  Training Step: 897...  Training loss: 5.2547...  0.3417 sec/batch
Epoch: 5/100...  Training Step: 898...  Training loss: 5.3257...  0.3390 sec/batch
Epoch: 5/100...  Training Step: 899...  Training loss: 5.3381...  0.3423 sec/batch
Epoch: 5/100...  Training Step: 900...  Training loss: 5.3679...  0.3407 sec/batch
Epoch: 5/100...  Training Step: 901...  Training loss: 5.5567...  0.3408 sec/batch
Epoch: 5/100...  Training Step: 902...  Training loss: 5.4762...  0.3426 sec/batch
Epoch: 5/100...  Training Step: 903...  Training loss: 5.3739...  0.3411 sec/batch
Epoch: 5/100...  Training Step: 904...  Training loss: 5.2779...  0.3393 sec/batch
Epoch: 5/100...  Training Step: 905...  Training loss: 5.2444...  0.3393 sec/batch
Epoc

Epoch: 6/100...  Training Step: 993...  Training loss: 5.1132...  0.3430 sec/batch
Epoch: 6/100...  Training Step: 994...  Training loss: 5.1889...  0.3426 sec/batch
Epoch: 6/100...  Training Step: 995...  Training loss: 5.1439...  0.3396 sec/batch
Epoch: 6/100...  Training Step: 996...  Training loss: 5.1468...  0.3376 sec/batch
Epoch: 6/100...  Training Step: 997...  Training loss: 5.2440...  0.3386 sec/batch
Epoch: 6/100...  Training Step: 998...  Training loss: 5.2769...  0.3403 sec/batch
Epoch: 6/100...  Training Step: 999...  Training loss: 5.3192...  0.3417 sec/batch
Epoch: 6/100...  Training Step: 1000...  Training loss: 5.5363...  0.3386 sec/batch
Epoch: 6/100...  Training Step: 1001...  Training loss: 5.4459...  0.3961 sec/batch
Epoch: 6/100...  Training Step: 1002...  Training loss: 5.4285...  0.3501 sec/batch
Epoch: 6/100...  Training Step: 1003...  Training loss: 5.4066...  0.3388 sec/batch
Epoch: 6/100...  Training Step: 1004...  Training loss: 5.3607...  0.3380 sec/batch

Epoch: 6/100...  Training Step: 1091...  Training loss: 5.2488...  0.3405 sec/batch
Epoch: 6/100...  Training Step: 1092...  Training loss: 5.2734...  0.3416 sec/batch
Epoch: 6/100...  Training Step: 1093...  Training loss: 5.3842...  0.3386 sec/batch
Epoch: 6/100...  Training Step: 1094...  Training loss: 5.3382...  0.3377 sec/batch
Epoch: 6/100...  Training Step: 1095...  Training loss: 5.3496...  0.3414 sec/batch
Epoch: 6/100...  Training Step: 1096...  Training loss: 5.2614...  0.3373 sec/batch
Epoch: 6/100...  Training Step: 1097...  Training loss: 5.3179...  0.3398 sec/batch
Epoch: 6/100...  Training Step: 1098...  Training loss: 5.2602...  0.3399 sec/batch
Epoch: 6/100...  Training Step: 1099...  Training loss: 5.2460...  0.3391 sec/batch
Epoch: 6/100...  Training Step: 1100...  Training loss: 5.2684...  0.3413 sec/batch
Epoch: 6/100...  Training Step: 1101...  Training loss: 5.6620...  0.3399 sec/batch
Epoch: 6/100...  Training Step: 1102...  Training loss: 5.5289...  0.3408 se

Epoch: 7/100...  Training Step: 1189...  Training loss: 5.4981...  0.3370 sec/batch
Epoch: 7/100...  Training Step: 1190...  Training loss: 5.3785...  0.3378 sec/batch
Epoch: 7/100...  Training Step: 1191...  Training loss: 5.3236...  0.3420 sec/batch
Epoch: 7/100...  Training Step: 1192...  Training loss: 5.4333...  0.3403 sec/batch
Epoch: 7/100...  Training Step: 1193...  Training loss: 5.4838...  0.3391 sec/batch
Epoch: 7/100...  Training Step: 1194...  Training loss: 5.4450...  0.3384 sec/batch
Epoch: 7/100...  Training Step: 1195...  Training loss: 5.4374...  0.3388 sec/batch
Epoch: 7/100...  Training Step: 1196...  Training loss: 5.1745...  0.3403 sec/batch
Epoch: 7/100...  Training Step: 1197...  Training loss: 5.4589...  0.3406 sec/batch
Epoch: 7/100...  Training Step: 1198...  Training loss: 5.4785...  0.3518 sec/batch
Epoch: 7/100...  Training Step: 1199...  Training loss: 5.3877...  0.3436 sec/batch
Epoch: 7/100...  Training Step: 1200...  Training loss: 5.2799...  0.3399 se

Epoch: 7/100...  Training Step: 1287...  Training loss: 5.5630...  0.3411 sec/batch
Epoch: 7/100...  Training Step: 1288...  Training loss: 5.4921...  0.3371 sec/batch
Epoch: 8/100...  Training Step: 1289...  Training loss: 5.7489...  0.3435 sec/batch
Epoch: 8/100...  Training Step: 1290...  Training loss: 5.6917...  0.3402 sec/batch
Epoch: 8/100...  Training Step: 1291...  Training loss: 5.4880...  0.3390 sec/batch
Epoch: 8/100...  Training Step: 1292...  Training loss: 5.4088...  0.3422 sec/batch
Epoch: 8/100...  Training Step: 1293...  Training loss: 5.4074...  0.3403 sec/batch
Epoch: 8/100...  Training Step: 1294...  Training loss: 5.2980...  0.3376 sec/batch
Epoch: 8/100...  Training Step: 1295...  Training loss: 5.5568...  0.3400 sec/batch
Epoch: 8/100...  Training Step: 1296...  Training loss: 5.3473...  0.3400 sec/batch
Epoch: 8/100...  Training Step: 1297...  Training loss: 5.4300...  0.3392 sec/batch
Epoch: 8/100...  Training Step: 1298...  Training loss: 5.2756...  0.3402 se

Epoch: 8/100...  Training Step: 1385...  Training loss: 5.2681...  0.3416 sec/batch
Epoch: 8/100...  Training Step: 1386...  Training loss: 5.2458...  0.3400 sec/batch
Epoch: 8/100...  Training Step: 1387...  Training loss: 5.3397...  0.3402 sec/batch
Epoch: 8/100...  Training Step: 1388...  Training loss: 5.2620...  0.3399 sec/batch
Epoch: 8/100...  Training Step: 1389...  Training loss: 5.3002...  0.3408 sec/batch
Epoch: 8/100...  Training Step: 1390...  Training loss: 5.3104...  0.3392 sec/batch
Epoch: 8/100...  Training Step: 1391...  Training loss: 5.4602...  0.3380 sec/batch
Epoch: 8/100...  Training Step: 1392...  Training loss: 5.3194...  0.3386 sec/batch
Epoch: 8/100...  Training Step: 1393...  Training loss: 5.3996...  0.3403 sec/batch
Epoch: 8/100...  Training Step: 1394...  Training loss: 5.4208...  0.3397 sec/batch
Epoch: 8/100...  Training Step: 1395...  Training loss: 5.3583...  0.3391 sec/batch
Epoch: 8/100...  Training Step: 1396...  Training loss: 5.1232...  0.3387 se

Epoch: 9/100...  Training Step: 1483...  Training loss: 5.2846...  0.3409 sec/batch
Epoch: 9/100...  Training Step: 1484...  Training loss: 5.2643...  0.3405 sec/batch
Epoch: 9/100...  Training Step: 1485...  Training loss: 5.2538...  0.3397 sec/batch
Epoch: 9/100...  Training Step: 1486...  Training loss: 5.3948...  0.3416 sec/batch
Epoch: 9/100...  Training Step: 1487...  Training loss: 5.3703...  0.3440 sec/batch
Epoch: 9/100...  Training Step: 1488...  Training loss: 5.3627...  0.3427 sec/batch
Epoch: 9/100...  Training Step: 1489...  Training loss: 5.3703...  0.3392 sec/batch
Epoch: 9/100...  Training Step: 1490...  Training loss: 5.2426...  0.3438 sec/batch
Epoch: 9/100...  Training Step: 1491...  Training loss: 5.3783...  0.3394 sec/batch
Epoch: 9/100...  Training Step: 1492...  Training loss: 5.2511...  0.3420 sec/batch
Epoch: 9/100...  Training Step: 1493...  Training loss: 5.2965...  0.3427 sec/batch
Epoch: 9/100...  Training Step: 1494...  Training loss: 5.2707...  0.3435 se

Epoch: 9/100...  Training Step: 1581...  Training loss: 5.1270...  0.3408 sec/batch
Epoch: 9/100...  Training Step: 1582...  Training loss: 5.1372...  0.3390 sec/batch
Epoch: 9/100...  Training Step: 1583...  Training loss: 5.1275...  0.3378 sec/batch
Epoch: 9/100...  Training Step: 1584...  Training loss: 5.2300...  0.3434 sec/batch
Epoch: 9/100...  Training Step: 1585...  Training loss: 5.2134...  0.3441 sec/batch
Epoch: 9/100...  Training Step: 1586...  Training loss: 5.2061...  0.3429 sec/batch
Epoch: 9/100...  Training Step: 1587...  Training loss: 5.0828...  0.3430 sec/batch
Epoch: 9/100...  Training Step: 1588...  Training loss: 5.0980...  0.3421 sec/batch
Epoch: 9/100...  Training Step: 1589...  Training loss: 5.3074...  0.3410 sec/batch
Epoch: 9/100...  Training Step: 1590...  Training loss: 5.1379...  0.3429 sec/batch
Epoch: 9/100...  Training Step: 1591...  Training loss: 5.1137...  0.3435 sec/batch
Epoch: 9/100...  Training Step: 1592...  Training loss: 5.2071...  0.3432 se

Epoch: 10/100...  Training Step: 1679...  Training loss: 5.2527...  0.3389 sec/batch
Epoch: 10/100...  Training Step: 1680...  Training loss: 5.3010...  0.3378 sec/batch
Epoch: 10/100...  Training Step: 1681...  Training loss: 5.2686...  0.3394 sec/batch
Epoch: 10/100...  Training Step: 1682...  Training loss: 5.1867...  0.3405 sec/batch
Epoch: 10/100...  Training Step: 1683...  Training loss: 5.2378...  0.3388 sec/batch
Epoch: 10/100...  Training Step: 1684...  Training loss: 5.2200...  0.3387 sec/batch
Epoch: 10/100...  Training Step: 1685...  Training loss: 5.2415...  0.3402 sec/batch
Epoch: 10/100...  Training Step: 1686...  Training loss: 5.2444...  0.3392 sec/batch
Epoch: 10/100...  Training Step: 1687...  Training loss: 5.1946...  0.3376 sec/batch
Epoch: 10/100...  Training Step: 1688...  Training loss: 5.1734...  0.3402 sec/batch
Epoch: 10/100...  Training Step: 1689...  Training loss: 4.8858...  0.3388 sec/batch
Epoch: 10/100...  Training Step: 1690...  Training loss: 4.9908..

Epoch: 10/100...  Training Step: 1776...  Training loss: 5.1807...  0.3423 sec/batch
Epoch: 10/100...  Training Step: 1777...  Training loss: 5.3811...  0.3405 sec/batch
Epoch: 10/100...  Training Step: 1778...  Training loss: 5.2899...  0.3412 sec/batch
Epoch: 10/100...  Training Step: 1779...  Training loss: 5.1770...  0.3407 sec/batch
Epoch: 10/100...  Training Step: 1780...  Training loss: 5.1686...  0.3415 sec/batch
Epoch: 10/100...  Training Step: 1781...  Training loss: 5.0873...  0.3390 sec/batch
Epoch: 10/100...  Training Step: 1782...  Training loss: 5.1074...  0.3414 sec/batch
Epoch: 10/100...  Training Step: 1783...  Training loss: 5.1908...  0.3400 sec/batch
Epoch: 10/100...  Training Step: 1784...  Training loss: 5.2185...  0.3411 sec/batch
Epoch: 10/100...  Training Step: 1785...  Training loss: 5.3589...  0.3399 sec/batch
Epoch: 10/100...  Training Step: 1786...  Training loss: 5.1855...  0.3392 sec/batch
Epoch: 10/100...  Training Step: 1787...  Training loss: 5.1904..

Epoch: 11/100...  Training Step: 1873...  Training loss: 4.8740...  0.3419 sec/batch
Epoch: 11/100...  Training Step: 1874...  Training loss: 4.9733...  0.3411 sec/batch
Epoch: 11/100...  Training Step: 1875...  Training loss: 5.1161...  0.3405 sec/batch
Epoch: 11/100...  Training Step: 1876...  Training loss: 5.3360...  0.3414 sec/batch
Epoch: 11/100...  Training Step: 1877...  Training loss: 5.2047...  0.3413 sec/batch
Epoch: 11/100...  Training Step: 1878...  Training loss: 5.2749...  0.3401 sec/batch
Epoch: 11/100...  Training Step: 1879...  Training loss: 5.1615...  0.3409 sec/batch
Epoch: 11/100...  Training Step: 1880...  Training loss: 5.1928...  0.3401 sec/batch
Epoch: 11/100...  Training Step: 1881...  Training loss: 5.2912...  0.3414 sec/batch
Epoch: 11/100...  Training Step: 1882...  Training loss: 5.2507...  0.3440 sec/batch
Epoch: 11/100...  Training Step: 1883...  Training loss: 5.3219...  0.3399 sec/batch
Epoch: 11/100...  Training Step: 1884...  Training loss: 5.3030..

Epoch: 11/100...  Training Step: 1970...  Training loss: 5.1571...  0.3438 sec/batch
Epoch: 11/100...  Training Step: 1971...  Training loss: 5.1851...  0.3443 sec/batch
Epoch: 11/100...  Training Step: 1972...  Training loss: 5.2240...  0.3430 sec/batch
Epoch: 11/100...  Training Step: 1973...  Training loss: 5.2488...  0.3414 sec/batch
Epoch: 11/100...  Training Step: 1974...  Training loss: 5.0255...  0.3426 sec/batch
Epoch: 11/100...  Training Step: 1975...  Training loss: 5.1040...  0.3452 sec/batch
Epoch: 11/100...  Training Step: 1976...  Training loss: 5.0159...  0.3425 sec/batch
Epoch: 11/100...  Training Step: 1977...  Training loss: 5.3049...  0.3424 sec/batch
Epoch: 11/100...  Training Step: 1978...  Training loss: 5.1107...  0.3409 sec/batch
Epoch: 11/100...  Training Step: 1979...  Training loss: 5.2054...  0.3424 sec/batch
Epoch: 11/100...  Training Step: 1980...  Training loss: 5.6428...  0.3441 sec/batch
Epoch: 11/100...  Training Step: 1981...  Training loss: 5.5929..

Epoch: 12/100...  Training Step: 2067...  Training loss: 5.3205...  0.3425 sec/batch
Epoch: 12/100...  Training Step: 2068...  Training loss: 5.2984...  0.3447 sec/batch
Epoch: 12/100...  Training Step: 2069...  Training loss: 5.1887...  0.3400 sec/batch
Epoch: 12/100...  Training Step: 2070...  Training loss: 5.1742...  0.3405 sec/batch
Epoch: 12/100...  Training Step: 2071...  Training loss: 5.2463...  0.3407 sec/batch
Epoch: 12/100...  Training Step: 2072...  Training loss: 5.0990...  0.3409 sec/batch
Epoch: 12/100...  Training Step: 2073...  Training loss: 5.1770...  0.3412 sec/batch
Epoch: 12/100...  Training Step: 2074...  Training loss: 5.3787...  0.3407 sec/batch
Epoch: 12/100...  Training Step: 2075...  Training loss: 5.2655...  0.3416 sec/batch
Epoch: 12/100...  Training Step: 2076...  Training loss: 5.1604...  0.3410 sec/batch
Epoch: 12/100...  Training Step: 2077...  Training loss: 5.2478...  0.3417 sec/batch
Epoch: 12/100...  Training Step: 2078...  Training loss: 5.2901..

Epoch: 12/100...  Training Step: 2164...  Training loss: 5.6046...  0.3412 sec/batch
Epoch: 12/100...  Training Step: 2165...  Training loss: 5.5798...  0.3422 sec/batch
Epoch: 12/100...  Training Step: 2166...  Training loss: 5.3828...  0.3391 sec/batch
Epoch: 12/100...  Training Step: 2167...  Training loss: 5.1348...  0.3418 sec/batch
Epoch: 12/100...  Training Step: 2168...  Training loss: 5.0901...  0.3398 sec/batch
Epoch: 12/100...  Training Step: 2169...  Training loss: 5.0826...  0.3427 sec/batch
Epoch: 12/100...  Training Step: 2170...  Training loss: 5.1492...  0.3417 sec/batch
Epoch: 12/100...  Training Step: 2171...  Training loss: 5.2746...  0.3410 sec/batch
Epoch: 12/100...  Training Step: 2172...  Training loss: 5.2047...  0.3415 sec/batch
Epoch: 12/100...  Training Step: 2173...  Training loss: 5.2167...  0.3416 sec/batch
Epoch: 12/100...  Training Step: 2174...  Training loss: 5.1499...  0.3403 sec/batch
Epoch: 12/100...  Training Step: 2175...  Training loss: 5.1714..

Epoch: 13/100...  Training Step: 2261...  Training loss: 5.2535...  0.3402 sec/batch
Epoch: 13/100...  Training Step: 2262...  Training loss: 5.2781...  0.3414 sec/batch
Epoch: 13/100...  Training Step: 2263...  Training loss: 5.2856...  0.3411 sec/batch
Epoch: 13/100...  Training Step: 2264...  Training loss: 5.1943...  0.3387 sec/batch
Epoch: 13/100...  Training Step: 2265...  Training loss: 5.0830...  0.3387 sec/batch
Epoch: 13/100...  Training Step: 2266...  Training loss: 5.2368...  0.3401 sec/batch
Epoch: 13/100...  Training Step: 2267...  Training loss: 5.0402...  0.3412 sec/batch
Epoch: 13/100...  Training Step: 2268...  Training loss: 5.0841...  0.3412 sec/batch
Epoch: 13/100...  Training Step: 2269...  Training loss: 5.0917...  0.3410 sec/batch
Epoch: 13/100...  Training Step: 2270...  Training loss: 5.1555...  0.3405 sec/batch
Epoch: 13/100...  Training Step: 2271...  Training loss: 5.2171...  0.3394 sec/batch
Epoch: 13/100...  Training Step: 2272...  Training loss: 5.2387..

Epoch: 13/100...  Training Step: 2358...  Training loss: 5.1436...  0.3417 sec/batch
Epoch: 13/100...  Training Step: 2359...  Training loss: 5.1644...  0.3398 sec/batch
Epoch: 13/100...  Training Step: 2360...  Training loss: 4.9590...  0.3394 sec/batch
Epoch: 13/100...  Training Step: 2361...  Training loss: 5.1697...  0.3439 sec/batch
Epoch: 13/100...  Training Step: 2362...  Training loss: 5.0935...  0.3422 sec/batch
Epoch: 13/100...  Training Step: 2363...  Training loss: 5.1884...  0.3388 sec/batch
Epoch: 13/100...  Training Step: 2364...  Training loss: 5.1530...  0.3428 sec/batch
Epoch: 13/100...  Training Step: 2365...  Training loss: 5.2258...  0.3428 sec/batch
Epoch: 13/100...  Training Step: 2366...  Training loss: 5.2601...  0.3404 sec/batch
Epoch: 13/100...  Training Step: 2367...  Training loss: 5.1491...  0.3416 sec/batch
Epoch: 13/100...  Training Step: 2368...  Training loss: 5.1420...  0.3388 sec/batch
Epoch: 13/100...  Training Step: 2369...  Training loss: 5.1521..

Epoch: 14/100...  Training Step: 2455...  Training loss: 5.2171...  0.3391 sec/batch
Epoch: 14/100...  Training Step: 2456...  Training loss: 5.2254...  0.3396 sec/batch
Epoch: 14/100...  Training Step: 2457...  Training loss: 5.0718...  0.3417 sec/batch
Epoch: 14/100...  Training Step: 2458...  Training loss: 5.0878...  0.3422 sec/batch
Epoch: 14/100...  Training Step: 2459...  Training loss: 5.1376...  0.3397 sec/batch
Epoch: 14/100...  Training Step: 2460...  Training loss: 5.0111...  0.3412 sec/batch
Epoch: 14/100...  Training Step: 2461...  Training loss: 5.2354...  0.3420 sec/batch
Epoch: 14/100...  Training Step: 2462...  Training loss: 5.1524...  0.3395 sec/batch
Epoch: 14/100...  Training Step: 2463...  Training loss: 5.1882...  0.3406 sec/batch
Epoch: 14/100...  Training Step: 2464...  Training loss: 4.9887...  0.3413 sec/batch
Epoch: 14/100...  Training Step: 2465...  Training loss: 5.0257...  0.3394 sec/batch
Epoch: 14/100...  Training Step: 2466...  Training loss: 5.0953..

Epoch: 14/100...  Training Step: 2552...  Training loss: 5.1144...  0.3392 sec/batch
Epoch: 14/100...  Training Step: 2553...  Training loss: 5.1357...  0.3413 sec/batch
Epoch: 14/100...  Training Step: 2554...  Training loss: 5.1921...  0.3384 sec/batch
Epoch: 14/100...  Training Step: 2555...  Training loss: 5.2132...  0.3418 sec/batch
Epoch: 14/100...  Training Step: 2556...  Training loss: 5.2337...  0.3424 sec/batch
Epoch: 14/100...  Training Step: 2557...  Training loss: 5.3989...  0.3428 sec/batch
Epoch: 14/100...  Training Step: 2558...  Training loss: 5.3194...  0.3438 sec/batch
Epoch: 14/100...  Training Step: 2559...  Training loss: 5.2348...  0.3400 sec/batch
Epoch: 14/100...  Training Step: 2560...  Training loss: 5.1677...  0.3434 sec/batch
Epoch: 14/100...  Training Step: 2561...  Training loss: 5.1364...  0.3411 sec/batch
Epoch: 14/100...  Training Step: 2562...  Training loss: 5.1076...  0.3393 sec/batch
Epoch: 14/100...  Training Step: 2563...  Training loss: 5.0939..

Epoch: 15/100...  Training Step: 2649...  Training loss: 5.0169...  0.3430 sec/batch
Epoch: 15/100...  Training Step: 2650...  Training loss: 5.0737...  0.3426 sec/batch
Epoch: 15/100...  Training Step: 2651...  Training loss: 5.0448...  0.3409 sec/batch
Epoch: 15/100...  Training Step: 2652...  Training loss: 5.0473...  0.3408 sec/batch
Epoch: 15/100...  Training Step: 2653...  Training loss: 5.1375...  0.3417 sec/batch
Epoch: 15/100...  Training Step: 2654...  Training loss: 5.1481...  0.3394 sec/batch
Epoch: 15/100...  Training Step: 2655...  Training loss: 5.2006...  0.3412 sec/batch
Epoch: 15/100...  Training Step: 2656...  Training loss: 5.3668...  0.3408 sec/batch
Epoch: 15/100...  Training Step: 2657...  Training loss: 5.3171...  0.3424 sec/batch
Epoch: 15/100...  Training Step: 2658...  Training loss: 5.3037...  0.3413 sec/batch
Epoch: 15/100...  Training Step: 2659...  Training loss: 5.2829...  0.3416 sec/batch
Epoch: 15/100...  Training Step: 2660...  Training loss: 5.2255..

Epoch: 15/100...  Training Step: 2746...  Training loss: 5.0917...  0.3445 sec/batch
Epoch: 15/100...  Training Step: 2747...  Training loss: 5.0933...  0.3417 sec/batch
Epoch: 15/100...  Training Step: 2748...  Training loss: 5.1360...  0.3418 sec/batch
Epoch: 15/100...  Training Step: 2749...  Training loss: 5.2316...  0.3455 sec/batch
Epoch: 15/100...  Training Step: 2750...  Training loss: 5.1797...  0.3427 sec/batch
Epoch: 15/100...  Training Step: 2751...  Training loss: 5.1742...  0.3421 sec/batch
Epoch: 15/100...  Training Step: 2752...  Training loss: 5.1098...  0.3403 sec/batch
Epoch: 15/100...  Training Step: 2753...  Training loss: 5.1681...  0.3442 sec/batch
Epoch: 15/100...  Training Step: 2754...  Training loss: 5.1206...  0.3404 sec/batch
Epoch: 15/100...  Training Step: 2755...  Training loss: 5.1000...  0.3438 sec/batch
Epoch: 15/100...  Training Step: 2756...  Training loss: 5.1290...  0.3458 sec/batch
Epoch: 15/100...  Training Step: 2757...  Training loss: 5.5211..

Epoch: 16/100...  Training Step: 2843...  Training loss: 5.2628...  0.3444 sec/batch
Epoch: 16/100...  Training Step: 2844...  Training loss: 5.2210...  0.3424 sec/batch
Epoch: 16/100...  Training Step: 2845...  Training loss: 5.3487...  0.3430 sec/batch
Epoch: 16/100...  Training Step: 2846...  Training loss: 5.2367...  0.3426 sec/batch
Epoch: 16/100...  Training Step: 2847...  Training loss: 5.1616...  0.3446 sec/batch
Epoch: 16/100...  Training Step: 2848...  Training loss: 5.2725...  0.3399 sec/batch
Epoch: 16/100...  Training Step: 2849...  Training loss: 5.3452...  0.3426 sec/batch
Epoch: 16/100...  Training Step: 2850...  Training loss: 5.3349...  0.3428 sec/batch
Epoch: 16/100...  Training Step: 2851...  Training loss: 5.3018...  0.3419 sec/batch
Epoch: 16/100...  Training Step: 2852...  Training loss: 5.0443...  0.3420 sec/batch
Epoch: 16/100...  Training Step: 2853...  Training loss: 5.3143...  0.3425 sec/batch
Epoch: 16/100...  Training Step: 2854...  Training loss: 5.3147..

Epoch: 16/100...  Training Step: 2940...  Training loss: 5.1106...  0.3396 sec/batch
Epoch: 16/100...  Training Step: 2941...  Training loss: 5.4934...  0.3415 sec/batch
Epoch: 16/100...  Training Step: 2942...  Training loss: 5.3669...  0.3429 sec/batch
Epoch: 16/100...  Training Step: 2943...  Training loss: 5.4117...  0.3413 sec/batch
Epoch: 16/100...  Training Step: 2944...  Training loss: 5.3182...  0.3441 sec/batch
Epoch: 17/100...  Training Step: 2945...  Training loss: 5.4772...  0.3443 sec/batch
Epoch: 17/100...  Training Step: 2946...  Training loss: 5.5527...  0.3435 sec/batch
Epoch: 17/100...  Training Step: 2947...  Training loss: 5.3133...  0.3435 sec/batch
Epoch: 17/100...  Training Step: 2948...  Training loss: 5.2779...  0.3449 sec/batch
Epoch: 17/100...  Training Step: 2949...  Training loss: 5.2755...  0.3428 sec/batch
Epoch: 17/100...  Training Step: 2950...  Training loss: 5.1581...  0.3441 sec/batch
Epoch: 17/100...  Training Step: 2951...  Training loss: 5.4099..

Epoch: 17/100...  Training Step: 3037...  Training loss: 5.2977...  0.3427 sec/batch
Epoch: 17/100...  Training Step: 3038...  Training loss: 5.3000...  0.3421 sec/batch
Epoch: 17/100...  Training Step: 3039...  Training loss: 5.2178...  0.3409 sec/batch
Epoch: 17/100...  Training Step: 3040...  Training loss: 5.1709...  0.3405 sec/batch
Epoch: 17/100...  Training Step: 3041...  Training loss: 5.1312...  0.3410 sec/batch
Epoch: 17/100...  Training Step: 3042...  Training loss: 5.1065...  0.3417 sec/batch
Epoch: 17/100...  Training Step: 3043...  Training loss: 5.1517...  0.3383 sec/batch
Epoch: 17/100...  Training Step: 3044...  Training loss: 5.0979...  0.3422 sec/batch
Epoch: 17/100...  Training Step: 3045...  Training loss: 5.1065...  0.3424 sec/batch
Epoch: 17/100...  Training Step: 3046...  Training loss: 5.1110...  0.3414 sec/batch
Epoch: 17/100...  Training Step: 3047...  Training loss: 5.2943...  0.3437 sec/batch
Epoch: 17/100...  Training Step: 3048...  Training loss: 5.1879..

Epoch: 18/100...  Training Step: 3134...  Training loss: 5.1357...  0.3406 sec/batch
Epoch: 18/100...  Training Step: 3135...  Training loss: 5.4002...  0.3404 sec/batch
Epoch: 18/100...  Training Step: 3136...  Training loss: 5.2074...  0.3407 sec/batch
Epoch: 18/100...  Training Step: 3137...  Training loss: 5.2790...  0.3455 sec/batch
Epoch: 18/100...  Training Step: 3138...  Training loss: 5.1439...  0.3404 sec/batch
Epoch: 18/100...  Training Step: 3139...  Training loss: 5.1674...  0.3410 sec/batch
Epoch: 18/100...  Training Step: 3140...  Training loss: 5.1197...  0.3441 sec/batch
Epoch: 18/100...  Training Step: 3141...  Training loss: 5.1008...  0.3408 sec/batch
Epoch: 18/100...  Training Step: 3142...  Training loss: 5.2277...  0.3437 sec/batch
Epoch: 18/100...  Training Step: 3143...  Training loss: 5.2217...  0.3402 sec/batch
Epoch: 18/100...  Training Step: 3144...  Training loss: 5.2194...  0.3400 sec/batch
Epoch: 18/100...  Training Step: 3145...  Training loss: 5.2518..

Epoch: 18/100...  Training Step: 3231...  Training loss: 5.2985...  0.3430 sec/batch
Epoch: 18/100...  Training Step: 3232...  Training loss: 5.1798...  0.3435 sec/batch
Epoch: 18/100...  Training Step: 3233...  Training loss: 5.2526...  0.3407 sec/batch
Epoch: 18/100...  Training Step: 3234...  Training loss: 5.2868...  0.3418 sec/batch
Epoch: 18/100...  Training Step: 3235...  Training loss: 5.2226...  0.3422 sec/batch
Epoch: 18/100...  Training Step: 3236...  Training loss: 5.0253...  0.3415 sec/batch
Epoch: 18/100...  Training Step: 3237...  Training loss: 5.0389...  0.3408 sec/batch
Epoch: 18/100...  Training Step: 3238...  Training loss: 5.0208...  0.3410 sec/batch
Epoch: 18/100...  Training Step: 3239...  Training loss: 5.0258...  0.3424 sec/batch
Epoch: 18/100...  Training Step: 3240...  Training loss: 5.1403...  0.3426 sec/batch
Epoch: 18/100...  Training Step: 3241...  Training loss: 5.0927...  0.3398 sec/batch
Epoch: 18/100...  Training Step: 3242...  Training loss: 5.0689..

Epoch: 19/100...  Training Step: 3328...  Training loss: 5.1763...  0.3424 sec/batch
Epoch: 19/100...  Training Step: 3329...  Training loss: 5.2061...  0.3409 sec/batch
Epoch: 19/100...  Training Step: 3330...  Training loss: 5.0888...  0.3406 sec/batch
Epoch: 19/100...  Training Step: 3331...  Training loss: 5.2117...  0.3413 sec/batch
Epoch: 19/100...  Training Step: 3332...  Training loss: 5.1097...  0.3412 sec/batch
Epoch: 19/100...  Training Step: 3333...  Training loss: 5.1315...  0.3450 sec/batch
Epoch: 19/100...  Training Step: 3334...  Training loss: 5.1098...  0.3423 sec/batch
Epoch: 19/100...  Training Step: 3335...  Training loss: 5.1132...  0.3406 sec/batch
Epoch: 19/100...  Training Step: 3336...  Training loss: 5.1415...  0.3410 sec/batch
Epoch: 19/100...  Training Step: 3337...  Training loss: 5.1369...  0.3429 sec/batch
Epoch: 19/100...  Training Step: 3338...  Training loss: 5.0805...  0.3444 sec/batch
Epoch: 19/100...  Training Step: 3339...  Training loss: 5.1104..

Epoch: 19/100...  Training Step: 3425...  Training loss: 5.0834...  0.3407 sec/batch
Epoch: 19/100...  Training Step: 3426...  Training loss: 5.0585...  0.3400 sec/batch
Epoch: 19/100...  Training Step: 3427...  Training loss: 4.9868...  0.3406 sec/batch
Epoch: 19/100...  Training Step: 3428...  Training loss: 4.9670...  0.3420 sec/batch
Epoch: 19/100...  Training Step: 3429...  Training loss: 5.1486...  0.3412 sec/batch
Epoch: 19/100...  Training Step: 3430...  Training loss: 5.0010...  0.3389 sec/batch
Epoch: 19/100...  Training Step: 3431...  Training loss: 4.9818...  0.3406 sec/batch
Epoch: 19/100...  Training Step: 3432...  Training loss: 5.0612...  0.3391 sec/batch
Epoch: 19/100...  Training Step: 3433...  Training loss: 5.2438...  0.3424 sec/batch
Epoch: 19/100...  Training Step: 3434...  Training loss: 5.1567...  0.3393 sec/batch
Epoch: 19/100...  Training Step: 3435...  Training loss: 5.0766...  0.3407 sec/batch
Epoch: 19/100...  Training Step: 3436...  Training loss: 5.0631..

Epoch: 20/100...  Training Step: 3522...  Training loss: 5.0430...  0.3421 sec/batch
Epoch: 20/100...  Training Step: 3523...  Training loss: 5.1004...  0.3413 sec/batch
Epoch: 20/100...  Training Step: 3524...  Training loss: 5.0797...  0.3408 sec/batch
Epoch: 20/100...  Training Step: 3525...  Training loss: 5.0964...  0.3421 sec/batch
Epoch: 20/100...  Training Step: 3526...  Training loss: 5.0929...  0.3429 sec/batch
Epoch: 20/100...  Training Step: 3527...  Training loss: 5.0683...  0.3413 sec/batch
Epoch: 20/100...  Training Step: 3528...  Training loss: 5.0356...  0.3424 sec/batch
Epoch: 20/100...  Training Step: 3529...  Training loss: 4.7807...  0.3425 sec/batch
Epoch: 20/100...  Training Step: 3530...  Training loss: 4.8627...  0.3423 sec/batch
Epoch: 20/100...  Training Step: 3531...  Training loss: 5.0026...  0.3395 sec/batch
Epoch: 20/100...  Training Step: 3532...  Training loss: 5.2087...  0.3405 sec/batch
Epoch: 20/100...  Training Step: 3533...  Training loss: 5.1249..

Epoch: 20/100...  Training Step: 3619...  Training loss: 5.0520...  0.3410 sec/batch
Epoch: 20/100...  Training Step: 3620...  Training loss: 5.0339...  0.3404 sec/batch
Epoch: 20/100...  Training Step: 3621...  Training loss: 4.9640...  0.3414 sec/batch
Epoch: 20/100...  Training Step: 3622...  Training loss: 4.9919...  0.3401 sec/batch
Epoch: 20/100...  Training Step: 3623...  Training loss: 5.0672...  0.3413 sec/batch
Epoch: 20/100...  Training Step: 3624...  Training loss: 5.1017...  0.3409 sec/batch
Epoch: 20/100...  Training Step: 3625...  Training loss: 5.2105...  0.3394 sec/batch
Epoch: 20/100...  Training Step: 3626...  Training loss: 5.0702...  0.3399 sec/batch
Epoch: 20/100...  Training Step: 3627...  Training loss: 5.0723...  0.3414 sec/batch
Epoch: 20/100...  Training Step: 3628...  Training loss: 5.0932...  0.3412 sec/batch
Epoch: 20/100...  Training Step: 3629...  Training loss: 5.1347...  0.3398 sec/batch
Epoch: 20/100...  Training Step: 3630...  Training loss: 4.9252..

Epoch: 21/100...  Training Step: 3716...  Training loss: 5.1802...  0.3414 sec/batch
Epoch: 21/100...  Training Step: 3717...  Training loss: 5.0704...  0.3384 sec/batch
Epoch: 21/100...  Training Step: 3718...  Training loss: 5.1242...  0.3433 sec/batch
Epoch: 21/100...  Training Step: 3719...  Training loss: 5.0112...  0.3412 sec/batch
Epoch: 21/100...  Training Step: 3720...  Training loss: 5.0471...  0.3423 sec/batch
Epoch: 21/100...  Training Step: 3721...  Training loss: 5.1582...  0.3420 sec/batch
Epoch: 21/100...  Training Step: 3722...  Training loss: 5.1241...  0.3405 sec/batch
Epoch: 21/100...  Training Step: 3723...  Training loss: 5.1884...  0.3395 sec/batch
Epoch: 21/100...  Training Step: 3724...  Training loss: 5.1731...  0.3398 sec/batch
Epoch: 21/100...  Training Step: 3725...  Training loss: 5.0882...  0.3436 sec/batch
Epoch: 21/100...  Training Step: 3726...  Training loss: 5.0581...  0.3390 sec/batch
Epoch: 21/100...  Training Step: 3727...  Training loss: 5.1181..

Epoch: 21/100...  Training Step: 3813...  Training loss: 5.1326...  0.3414 sec/batch
Epoch: 21/100...  Training Step: 3814...  Training loss: 4.9067...  0.3391 sec/batch
Epoch: 21/100...  Training Step: 3815...  Training loss: 4.9721...  0.3427 sec/batch
Epoch: 21/100...  Training Step: 3816...  Training loss: 4.9145...  0.3420 sec/batch
Epoch: 21/100...  Training Step: 3817...  Training loss: 5.1332...  0.3442 sec/batch
Epoch: 21/100...  Training Step: 3818...  Training loss: 4.9779...  0.3417 sec/batch
Epoch: 21/100...  Training Step: 3819...  Training loss: 5.0642...  0.3436 sec/batch
Epoch: 21/100...  Training Step: 3820...  Training loss: 5.3647...  0.3410 sec/batch
Epoch: 21/100...  Training Step: 3821...  Training loss: 5.3572...  0.3404 sec/batch
Epoch: 21/100...  Training Step: 3822...  Training loss: 5.2027...  0.3437 sec/batch
Epoch: 21/100...  Training Step: 3823...  Training loss: 4.9666...  0.3388 sec/batch
Epoch: 21/100...  Training Step: 3824...  Training loss: 4.9291..

Epoch: 22/100...  Training Step: 3910...  Training loss: 5.0322...  0.3426 sec/batch
Epoch: 22/100...  Training Step: 3911...  Training loss: 5.1156...  0.3411 sec/batch
Epoch: 22/100...  Training Step: 3912...  Training loss: 4.9703...  0.3434 sec/batch
Epoch: 22/100...  Training Step: 3913...  Training loss: 5.0474...  0.3400 sec/batch
Epoch: 22/100...  Training Step: 3914...  Training loss: 5.2002...  0.3409 sec/batch
Epoch: 22/100...  Training Step: 3915...  Training loss: 5.0963...  0.3427 sec/batch
Epoch: 22/100...  Training Step: 3916...  Training loss: 5.0172...  0.3402 sec/batch
Epoch: 22/100...  Training Step: 3917...  Training loss: 5.0925...  0.3426 sec/batch
Epoch: 22/100...  Training Step: 3918...  Training loss: 5.1253...  0.3444 sec/batch
Epoch: 22/100...  Training Step: 3919...  Training loss: 5.1369...  0.3438 sec/batch
Epoch: 22/100...  Training Step: 3920...  Training loss: 5.0461...  0.3401 sec/batch
Epoch: 22/100...  Training Step: 3921...  Training loss: 4.9735..

Epoch: 22/100...  Training Step: 4007...  Training loss: 4.9618...  0.3433 sec/batch
Epoch: 22/100...  Training Step: 4008...  Training loss: 4.9374...  0.3434 sec/batch
Epoch: 22/100...  Training Step: 4009...  Training loss: 4.9172...  0.3404 sec/batch
Epoch: 22/100...  Training Step: 4010...  Training loss: 4.9810...  0.3399 sec/batch
Epoch: 22/100...  Training Step: 4011...  Training loss: 5.0941...  0.3397 sec/batch
Epoch: 22/100...  Training Step: 4012...  Training loss: 5.0396...  0.3416 sec/batch
Epoch: 22/100...  Training Step: 4013...  Training loss: 5.0564...  0.3419 sec/batch
Epoch: 22/100...  Training Step: 4014...  Training loss: 4.9973...  0.3413 sec/batch
Epoch: 22/100...  Training Step: 4015...  Training loss: 5.0226...  0.3388 sec/batch
Epoch: 22/100...  Training Step: 4016...  Training loss: 4.8414...  0.3421 sec/batch
Epoch: 22/100...  Training Step: 4017...  Training loss: 5.0353...  0.3400 sec/batch
Epoch: 22/100...  Training Step: 4018...  Training loss: 4.9714..

Epoch: 23/100...  Training Step: 4104...  Training loss: 5.0485...  0.3415 sec/batch
Epoch: 23/100...  Training Step: 4105...  Training loss: 4.9510...  0.3439 sec/batch
Epoch: 23/100...  Training Step: 4106...  Training loss: 5.0767...  0.3410 sec/batch
Epoch: 23/100...  Training Step: 4107...  Training loss: 4.8992...  0.3448 sec/batch
Epoch: 23/100...  Training Step: 4108...  Training loss: 4.9279...  0.3445 sec/batch
Epoch: 23/100...  Training Step: 4109...  Training loss: 4.9378...  0.3421 sec/batch
Epoch: 23/100...  Training Step: 4110...  Training loss: 5.0017...  0.3436 sec/batch
Epoch: 23/100...  Training Step: 4111...  Training loss: 5.0603...  0.3495 sec/batch
Epoch: 23/100...  Training Step: 4112...  Training loss: 5.0581...  0.3443 sec/batch
Epoch: 23/100...  Training Step: 4113...  Training loss: 4.9196...  0.3457 sec/batch
Epoch: 23/100...  Training Step: 4114...  Training loss: 4.9459...  0.3444 sec/batch
Epoch: 23/100...  Training Step: 4115...  Training loss: 4.9923..

Epoch: 23/100...  Training Step: 4201...  Training loss: 5.0190...  0.3409 sec/batch
Epoch: 23/100...  Training Step: 4202...  Training loss: 4.9411...  0.3408 sec/batch
Epoch: 23/100...  Training Step: 4203...  Training loss: 5.0410...  0.3436 sec/batch
Epoch: 23/100...  Training Step: 4204...  Training loss: 4.9751...  0.3432 sec/batch
Epoch: 23/100...  Training Step: 4205...  Training loss: 5.0480...  0.3419 sec/batch
Epoch: 23/100...  Training Step: 4206...  Training loss: 5.0880...  0.3427 sec/batch
Epoch: 23/100...  Training Step: 4207...  Training loss: 4.9863...  0.3412 sec/batch
Epoch: 23/100...  Training Step: 4208...  Training loss: 4.9652...  0.3436 sec/batch
Epoch: 23/100...  Training Step: 4209...  Training loss: 5.0087...  0.3426 sec/batch
Epoch: 23/100...  Training Step: 4210...  Training loss: 5.0487...  0.3394 sec/batch
Epoch: 23/100...  Training Step: 4211...  Training loss: 5.0509...  0.3431 sec/batch
Epoch: 23/100...  Training Step: 4212...  Training loss: 5.0758..

Epoch: 24/100...  Training Step: 4298...  Training loss: 4.9254...  0.3424 sec/batch
Epoch: 24/100...  Training Step: 4299...  Training loss: 4.9442...  0.3396 sec/batch
Epoch: 24/100...  Training Step: 4300...  Training loss: 4.8585...  0.3432 sec/batch
Epoch: 24/100...  Training Step: 4301...  Training loss: 5.0666...  0.3434 sec/batch
Epoch: 24/100...  Training Step: 4302...  Training loss: 4.9741...  0.3392 sec/batch
Epoch: 24/100...  Training Step: 4303...  Training loss: 5.0006...  0.3405 sec/batch
Epoch: 24/100...  Training Step: 4304...  Training loss: 4.8183...  0.3414 sec/batch
Epoch: 24/100...  Training Step: 4305...  Training loss: 4.8835...  0.3434 sec/batch
Epoch: 24/100...  Training Step: 4306...  Training loss: 4.9352...  0.3403 sec/batch
Epoch: 24/100...  Training Step: 4307...  Training loss: 4.9004...  0.3387 sec/batch
Epoch: 24/100...  Training Step: 4308...  Training loss: 4.9276...  0.3426 sec/batch
Epoch: 24/100...  Training Step: 4309...  Training loss: 5.0046..

Epoch: 24/100...  Training Step: 4395...  Training loss: 5.0222...  0.3401 sec/batch
Epoch: 24/100...  Training Step: 4396...  Training loss: 5.0699...  0.3415 sec/batch
Epoch: 24/100...  Training Step: 4397...  Training loss: 5.2122...  0.3424 sec/batch
Epoch: 24/100...  Training Step: 4398...  Training loss: 5.1145...  0.3392 sec/batch
Epoch: 24/100...  Training Step: 4399...  Training loss: 5.0576...  0.3393 sec/batch
Epoch: 24/100...  Training Step: 4400...  Training loss: 5.0171...  0.3407 sec/batch
Epoch: 24/100...  Training Step: 4401...  Training loss: 4.9973...  0.3409 sec/batch
Epoch: 24/100...  Training Step: 4402...  Training loss: 4.9837...  0.3400 sec/batch
Epoch: 24/100...  Training Step: 4403...  Training loss: 4.9540...  0.3413 sec/batch
Epoch: 24/100...  Training Step: 4404...  Training loss: 4.9709...  0.3437 sec/batch
Epoch: 24/100...  Training Step: 4405...  Training loss: 5.0496...  0.3418 sec/batch
Epoch: 24/100...  Training Step: 4406...  Training loss: 5.0136..

Epoch: 25/100...  Training Step: 4492...  Training loss: 4.9022...  0.3411 sec/batch
Epoch: 25/100...  Training Step: 4493...  Training loss: 4.9941...  0.3414 sec/batch
Epoch: 25/100...  Training Step: 4494...  Training loss: 4.9561...  0.3402 sec/batch
Epoch: 25/100...  Training Step: 4495...  Training loss: 5.0185...  0.3399 sec/batch
Epoch: 25/100...  Training Step: 4496...  Training loss: 5.1521...  0.3400 sec/batch
Epoch: 25/100...  Training Step: 4497...  Training loss: 5.1091...  0.3420 sec/batch
Epoch: 25/100...  Training Step: 4498...  Training loss: 5.0820...  0.3423 sec/batch
Epoch: 25/100...  Training Step: 4499...  Training loss: 5.0741...  0.3403 sec/batch
Epoch: 25/100...  Training Step: 4500...  Training loss: 5.0443...  0.3395 sec/batch
Epoch: 25/100...  Training Step: 4501...  Training loss: 5.1648...  0.3407 sec/batch
Epoch: 25/100...  Training Step: 4502...  Training loss: 5.0637...  0.3430 sec/batch
Epoch: 25/100...  Training Step: 4503...  Training loss: 4.9918..

Epoch: 25/100...  Training Step: 4589...  Training loss: 5.0295...  0.3398 sec/batch
Epoch: 25/100...  Training Step: 4590...  Training loss: 4.9926...  0.3417 sec/batch
Epoch: 25/100...  Training Step: 4591...  Training loss: 4.9745...  0.3404 sec/batch
Epoch: 25/100...  Training Step: 4592...  Training loss: 4.9321...  0.3427 sec/batch
Epoch: 25/100...  Training Step: 4593...  Training loss: 5.0026...  0.3422 sec/batch
Epoch: 25/100...  Training Step: 4594...  Training loss: 4.9726...  0.3421 sec/batch
Epoch: 25/100...  Training Step: 4595...  Training loss: 4.9463...  0.3406 sec/batch
Epoch: 25/100...  Training Step: 4596...  Training loss: 4.9698...  0.3401 sec/batch
Epoch: 25/100...  Training Step: 4597...  Training loss: 5.3022...  0.3419 sec/batch
Epoch: 25/100...  Training Step: 4598...  Training loss: 5.1800...  0.3416 sec/batch
Epoch: 25/100...  Training Step: 4599...  Training loss: 5.1851...  0.3416 sec/batch
Epoch: 25/100...  Training Step: 4600...  Training loss: 5.1116..

Epoch: 26/100...  Training Step: 4686...  Training loss: 5.0271...  0.3396 sec/batch
Epoch: 26/100...  Training Step: 4687...  Training loss: 4.9740...  0.3391 sec/batch
Epoch: 26/100...  Training Step: 4688...  Training loss: 5.0542...  0.3413 sec/batch
Epoch: 26/100...  Training Step: 4689...  Training loss: 5.1435...  0.3413 sec/batch
Epoch: 26/100...  Training Step: 4690...  Training loss: 5.1532...  0.3423 sec/batch
Epoch: 26/100...  Training Step: 4691...  Training loss: 5.0820...  0.3409 sec/batch
Epoch: 26/100...  Training Step: 4692...  Training loss: 4.8464...  0.3386 sec/batch
Epoch: 26/100...  Training Step: 4693...  Training loss: 5.0591...  0.3405 sec/batch
Epoch: 26/100...  Training Step: 4694...  Training loss: 5.0812...  0.3418 sec/batch
Epoch: 26/100...  Training Step: 4695...  Training loss: 5.0226...  0.3409 sec/batch
Epoch: 26/100...  Training Step: 4696...  Training loss: 4.9625...  0.3409 sec/batch
Epoch: 26/100...  Training Step: 4697...  Training loss: 4.9567..

Epoch: 26/100...  Training Step: 4783...  Training loss: 5.1904...  0.3403 sec/batch
Epoch: 26/100...  Training Step: 4784...  Training loss: 5.1054...  0.3420 sec/batch
Epoch: 27/100...  Training Step: 4785...  Training loss: 5.2492...  0.3429 sec/batch
Epoch: 27/100...  Training Step: 4786...  Training loss: 5.2465...  0.3424 sec/batch
Epoch: 27/100...  Training Step: 4787...  Training loss: 5.0538...  0.3411 sec/batch
Epoch: 27/100...  Training Step: 4788...  Training loss: 5.0284...  0.3397 sec/batch
Epoch: 27/100...  Training Step: 4789...  Training loss: 5.0325...  0.3424 sec/batch
Epoch: 27/100...  Training Step: 4790...  Training loss: 4.9351...  0.3386 sec/batch
Epoch: 27/100...  Training Step: 4791...  Training loss: 5.1236...  0.3441 sec/batch
Epoch: 27/100...  Training Step: 4792...  Training loss: 4.9976...  0.3412 sec/batch
Epoch: 27/100...  Training Step: 4793...  Training loss: 5.0249...  0.3400 sec/batch
Epoch: 27/100...  Training Step: 4794...  Training loss: 4.9412..

Epoch: 27/100...  Training Step: 4880...  Training loss: 4.9230...  0.3405 sec/batch
Epoch: 27/100...  Training Step: 4881...  Training loss: 4.8989...  0.3404 sec/batch
Epoch: 27/100...  Training Step: 4882...  Training loss: 4.8788...  0.3424 sec/batch
Epoch: 27/100...  Training Step: 4883...  Training loss: 4.9118...  0.3405 sec/batch
Epoch: 27/100...  Training Step: 4884...  Training loss: 4.8775...  0.3411 sec/batch
Epoch: 27/100...  Training Step: 4885...  Training loss: 4.8972...  0.3405 sec/batch
Epoch: 27/100...  Training Step: 4886...  Training loss: 4.8996...  0.3421 sec/batch
Epoch: 27/100...  Training Step: 4887...  Training loss: 5.0495...  0.3394 sec/batch
Epoch: 27/100...  Training Step: 4888...  Training loss: 4.9459...  0.3385 sec/batch
Epoch: 27/100...  Training Step: 4889...  Training loss: 4.9942...  0.3396 sec/batch
Epoch: 27/100...  Training Step: 4890...  Training loss: 5.0377...  0.3412 sec/batch
Epoch: 27/100...  Training Step: 4891...  Training loss: 4.9938..

Epoch: 28/100...  Training Step: 4977...  Training loss: 5.0169...  0.3411 sec/batch
Epoch: 28/100...  Training Step: 4978...  Training loss: 4.9315...  0.3411 sec/batch
Epoch: 28/100...  Training Step: 4979...  Training loss: 4.9176...  0.3397 sec/batch
Epoch: 28/100...  Training Step: 4980...  Training loss: 4.8636...  0.3397 sec/batch
Epoch: 28/100...  Training Step: 4981...  Training loss: 4.8485...  0.3439 sec/batch
Epoch: 28/100...  Training Step: 4982...  Training loss: 4.9376...  0.3433 sec/batch
Epoch: 28/100...  Training Step: 4983...  Training loss: 4.9499...  0.3403 sec/batch
Epoch: 28/100...  Training Step: 4984...  Training loss: 4.8951...  0.3412 sec/batch
Epoch: 28/100...  Training Step: 4985...  Training loss: 4.9798...  0.3400 sec/batch
Epoch: 28/100...  Training Step: 4986...  Training loss: 4.8710...  0.3409 sec/batch
Epoch: 28/100...  Training Step: 4987...  Training loss: 4.9987...  0.3403 sec/batch
Epoch: 28/100...  Training Step: 4988...  Training loss: 4.9103..

Epoch: 28/100...  Training Step: 5074...  Training loss: 5.0254...  0.3407 sec/batch
Epoch: 28/100...  Training Step: 5075...  Training loss: 4.9673...  0.3437 sec/batch
Epoch: 28/100...  Training Step: 5076...  Training loss: 4.8177...  0.3437 sec/batch
Epoch: 28/100...  Training Step: 5077...  Training loss: 4.8131...  0.3457 sec/batch
Epoch: 28/100...  Training Step: 5078...  Training loss: 4.8395...  0.3450 sec/batch
Epoch: 28/100...  Training Step: 5079...  Training loss: 4.8157...  0.3398 sec/batch
Epoch: 28/100...  Training Step: 5080...  Training loss: 4.8888...  0.3413 sec/batch
Epoch: 28/100...  Training Step: 5081...  Training loss: 4.8612...  0.3412 sec/batch
Epoch: 28/100...  Training Step: 5082...  Training loss: 4.8464...  0.3430 sec/batch
Epoch: 28/100...  Training Step: 5083...  Training loss: 4.8120...  0.3444 sec/batch
Epoch: 28/100...  Training Step: 5084...  Training loss: 4.8188...  0.3450 sec/batch
Epoch: 28/100...  Training Step: 5085...  Training loss: 4.9555..

Epoch: 29/100...  Training Step: 5171...  Training loss: 4.9641...  0.3405 sec/batch
Epoch: 29/100...  Training Step: 5172...  Training loss: 4.8719...  0.3402 sec/batch
Epoch: 29/100...  Training Step: 5173...  Training loss: 4.8610...  0.3403 sec/batch
Epoch: 29/100...  Training Step: 5174...  Training loss: 4.8538...  0.3399 sec/batch
Epoch: 29/100...  Training Step: 5175...  Training loss: 4.8684...  0.3395 sec/batch
Epoch: 29/100...  Training Step: 5176...  Training loss: 4.8867...  0.3439 sec/batch
Epoch: 29/100...  Training Step: 5177...  Training loss: 4.9051...  0.3399 sec/batch
Epoch: 29/100...  Training Step: 5178...  Training loss: 4.8461...  0.3403 sec/batch
Epoch: 29/100...  Training Step: 5179...  Training loss: 4.8624...  0.3399 sec/batch
Epoch: 29/100...  Training Step: 5180...  Training loss: 4.8611...  0.3409 sec/batch
Epoch: 29/100...  Training Step: 5181...  Training loss: 4.8828...  0.3405 sec/batch
Epoch: 29/100...  Training Step: 5182...  Training loss: 4.9190..

Epoch: 29/100...  Training Step: 5268...  Training loss: 4.7784...  0.3408 sec/batch
Epoch: 29/100...  Training Step: 5269...  Training loss: 4.9253...  0.3408 sec/batch
Epoch: 29/100...  Training Step: 5270...  Training loss: 4.7722...  0.3415 sec/batch
Epoch: 29/100...  Training Step: 5271...  Training loss: 4.7942...  0.3426 sec/batch
Epoch: 29/100...  Training Step: 5272...  Training loss: 4.8349...  0.3400 sec/batch
Epoch: 29/100...  Training Step: 5273...  Training loss: 5.0158...  0.3438 sec/batch
Epoch: 29/100...  Training Step: 5274...  Training loss: 4.9488...  0.3411 sec/batch
Epoch: 29/100...  Training Step: 5275...  Training loss: 4.8889...  0.3409 sec/batch
Epoch: 29/100...  Training Step: 5276...  Training loss: 4.8631...  0.3425 sec/batch
Epoch: 29/100...  Training Step: 5277...  Training loss: 4.8001...  0.3405 sec/batch
Epoch: 29/100...  Training Step: 5278...  Training loss: 4.8155...  0.3400 sec/batch
Epoch: 29/100...  Training Step: 5279...  Training loss: 4.8802..

Epoch: 30/100...  Training Step: 5365...  Training loss: 4.8784...  0.3448 sec/batch
Epoch: 30/100...  Training Step: 5366...  Training loss: 4.9042...  0.3390 sec/batch
Epoch: 30/100...  Training Step: 5367...  Training loss: 4.8543...  0.3411 sec/batch
Epoch: 30/100...  Training Step: 5368...  Training loss: 4.8216...  0.3407 sec/batch
Epoch: 30/100...  Training Step: 5369...  Training loss: 4.5663...  0.3427 sec/batch
Epoch: 30/100...  Training Step: 5370...  Training loss: 4.6888...  0.3408 sec/batch
Epoch: 30/100...  Training Step: 5371...  Training loss: 4.8122...  0.3414 sec/batch
Epoch: 30/100...  Training Step: 5372...  Training loss: 4.9699...  0.3430 sec/batch
Epoch: 30/100...  Training Step: 5373...  Training loss: 4.8972...  0.3413 sec/batch
Epoch: 30/100...  Training Step: 5374...  Training loss: 4.9448...  0.3412 sec/batch
Epoch: 30/100...  Training Step: 5375...  Training loss: 4.8674...  0.3398 sec/batch
Epoch: 30/100...  Training Step: 5376...  Training loss: 4.8828..

Epoch: 30/100...  Training Step: 5462...  Training loss: 4.8056...  0.3413 sec/batch
Epoch: 30/100...  Training Step: 5463...  Training loss: 4.8503...  0.3394 sec/batch
Epoch: 30/100...  Training Step: 5464...  Training loss: 4.8746...  0.3437 sec/batch
Epoch: 30/100...  Training Step: 5465...  Training loss: 4.9803...  0.3423 sec/batch
Epoch: 30/100...  Training Step: 5466...  Training loss: 4.8554...  0.3407 sec/batch
Epoch: 30/100...  Training Step: 5467...  Training loss: 4.8660...  0.3439 sec/batch
Epoch: 30/100...  Training Step: 5468...  Training loss: 4.8616...  0.3451 sec/batch
Epoch: 30/100...  Training Step: 5469...  Training loss: 4.9298...  0.3406 sec/batch
Epoch: 30/100...  Training Step: 5470...  Training loss: 4.7291...  0.3421 sec/batch
Epoch: 30/100...  Training Step: 5471...  Training loss: 4.8103...  0.3441 sec/batch
Epoch: 30/100...  Training Step: 5472...  Training loss: 4.7478...  0.3437 sec/batch
Epoch: 30/100...  Training Step: 5473...  Training loss: 4.9304..

Epoch: 31/100...  Training Step: 5559...  Training loss: 4.8563...  0.3407 sec/batch
Epoch: 31/100...  Training Step: 5560...  Training loss: 4.8709...  0.3406 sec/batch
Epoch: 31/100...  Training Step: 5561...  Training loss: 4.9545...  0.3416 sec/batch
Epoch: 31/100...  Training Step: 5562...  Training loss: 4.9005...  0.3421 sec/batch
Epoch: 31/100...  Training Step: 5563...  Training loss: 4.9763...  0.3444 sec/batch
Epoch: 31/100...  Training Step: 5564...  Training loss: 4.9352...  0.3399 sec/batch
Epoch: 31/100...  Training Step: 5565...  Training loss: 4.8772...  0.3415 sec/batch
Epoch: 31/100...  Training Step: 5566...  Training loss: 4.8330...  0.3428 sec/batch
Epoch: 31/100...  Training Step: 5567...  Training loss: 4.9090...  0.3396 sec/batch
Epoch: 31/100...  Training Step: 5568...  Training loss: 4.7941...  0.3428 sec/batch
Epoch: 31/100...  Training Step: 5569...  Training loss: 4.8517...  0.3418 sec/batch
Epoch: 31/100...  Training Step: 5570...  Training loss: 4.9903..

Epoch: 31/100...  Training Step: 5656...  Training loss: 4.7279...  0.3436 sec/batch
Epoch: 31/100...  Training Step: 5657...  Training loss: 4.9134...  0.3422 sec/batch
Epoch: 31/100...  Training Step: 5658...  Training loss: 4.7827...  0.3414 sec/batch
Epoch: 31/100...  Training Step: 5659...  Training loss: 4.8644...  0.3434 sec/batch
Epoch: 31/100...  Training Step: 5660...  Training loss: 5.0795...  0.3410 sec/batch
Epoch: 31/100...  Training Step: 5661...  Training loss: 5.0685...  0.3446 sec/batch
Epoch: 31/100...  Training Step: 5662...  Training loss: 4.9837...  0.3426 sec/batch
Epoch: 31/100...  Training Step: 5663...  Training loss: 4.8111...  0.3432 sec/batch
Epoch: 31/100...  Training Step: 5664...  Training loss: 4.7838...  0.3424 sec/batch
Epoch: 31/100...  Training Step: 5665...  Training loss: 4.7999...  0.3436 sec/batch
Epoch: 31/100...  Training Step: 5666...  Training loss: 4.8379...  0.3415 sec/batch
Epoch: 31/100...  Training Step: 5667...  Training loss: 4.8994..

Epoch: 32/100...  Training Step: 5753...  Training loss: 4.8357...  0.3405 sec/batch
Epoch: 32/100...  Training Step: 5754...  Training loss: 4.9576...  0.3403 sec/batch
Epoch: 32/100...  Training Step: 5755...  Training loss: 4.8884...  0.3413 sec/batch
Epoch: 32/100...  Training Step: 5756...  Training loss: 4.8317...  0.3394 sec/batch
Epoch: 32/100...  Training Step: 5757...  Training loss: 4.8990...  0.3397 sec/batch
Epoch: 32/100...  Training Step: 5758...  Training loss: 4.9279...  0.3412 sec/batch
Epoch: 32/100...  Training Step: 5759...  Training loss: 4.9187...  0.3411 sec/batch
Epoch: 32/100...  Training Step: 5760...  Training loss: 4.8516...  0.3410 sec/batch
Epoch: 32/100...  Training Step: 5761...  Training loss: 4.7857...  0.3393 sec/batch
Epoch: 32/100...  Training Step: 5762...  Training loss: 4.9000...  0.3453 sec/batch
Epoch: 32/100...  Training Step: 5763...  Training loss: 4.7451...  0.3397 sec/batch
Epoch: 32/100...  Training Step: 5764...  Training loss: 4.7682..

Epoch: 32/100...  Training Step: 5850...  Training loss: 4.8089...  0.3435 sec/batch
Epoch: 32/100...  Training Step: 5851...  Training loss: 4.8872...  0.3407 sec/batch
Epoch: 32/100...  Training Step: 5852...  Training loss: 4.8435...  0.3387 sec/batch
Epoch: 32/100...  Training Step: 5853...  Training loss: 4.8644...  0.3448 sec/batch
Epoch: 32/100...  Training Step: 5854...  Training loss: 4.7943...  0.3456 sec/batch
Epoch: 32/100...  Training Step: 5855...  Training loss: 4.8431...  0.3410 sec/batch
Epoch: 32/100...  Training Step: 5856...  Training loss: 4.7107...  0.3425 sec/batch
Epoch: 32/100...  Training Step: 5857...  Training loss: 4.8428...  0.3402 sec/batch
Epoch: 32/100...  Training Step: 5858...  Training loss: 4.7995...  0.3409 sec/batch
Epoch: 32/100...  Training Step: 5859...  Training loss: 4.8725...  0.3403 sec/batch
Epoch: 32/100...  Training Step: 5860...  Training loss: 4.8440...  0.3435 sec/batch
Epoch: 32/100...  Training Step: 5861...  Training loss: 4.9005..

Epoch: 33/100...  Training Step: 5947...  Training loss: 4.7044...  0.3454 sec/batch
Epoch: 33/100...  Training Step: 5948...  Training loss: 4.7308...  0.3455 sec/batch
Epoch: 33/100...  Training Step: 5949...  Training loss: 4.7460...  0.3445 sec/batch
Epoch: 33/100...  Training Step: 5950...  Training loss: 4.8046...  0.3458 sec/batch
Epoch: 33/100...  Training Step: 5951...  Training loss: 4.8501...  0.3422 sec/batch
Epoch: 33/100...  Training Step: 5952...  Training loss: 4.8643...  0.3422 sec/batch
Epoch: 33/100...  Training Step: 5953...  Training loss: 4.7532...  0.3410 sec/batch
Epoch: 33/100...  Training Step: 5954...  Training loss: 4.7606...  0.3444 sec/batch
Epoch: 33/100...  Training Step: 5955...  Training loss: 4.8002...  0.3411 sec/batch
Epoch: 33/100...  Training Step: 5956...  Training loss: 4.7172...  0.3437 sec/batch
Epoch: 33/100...  Training Step: 5957...  Training loss: 4.8767...  0.3443 sec/batch
Epoch: 33/100...  Training Step: 5958...  Training loss: 4.8265..

Epoch: 33/100...  Training Step: 6044...  Training loss: 4.8001...  0.3407 sec/batch
Epoch: 33/100...  Training Step: 6045...  Training loss: 4.8438...  0.3376 sec/batch
Epoch: 33/100...  Training Step: 6046...  Training loss: 4.8786...  0.3378 sec/batch
Epoch: 33/100...  Training Step: 6047...  Training loss: 4.7364...  0.3411 sec/batch
Epoch: 33/100...  Training Step: 6048...  Training loss: 4.7676...  0.3372 sec/batch
Epoch: 33/100...  Training Step: 6049...  Training loss: 4.7875...  0.3386 sec/batch
Epoch: 33/100...  Training Step: 6050...  Training loss: 4.8277...  0.3398 sec/batch
Epoch: 33/100...  Training Step: 6051...  Training loss: 4.8146...  0.3392 sec/batch
Epoch: 33/100...  Training Step: 6052...  Training loss: 4.8147...  0.3417 sec/batch
Epoch: 33/100...  Training Step: 6053...  Training loss: 4.9311...  0.3388 sec/batch
Epoch: 33/100...  Training Step: 6054...  Training loss: 4.8424...  0.3415 sec/batch
Epoch: 33/100...  Training Step: 6055...  Training loss: 4.8012..

Epoch: 34/100...  Training Step: 6141...  Training loss: 4.8529...  0.3366 sec/batch
Epoch: 34/100...  Training Step: 6142...  Training loss: 4.7789...  0.3422 sec/batch
Epoch: 34/100...  Training Step: 6143...  Training loss: 4.7841...  0.3423 sec/batch
Epoch: 34/100...  Training Step: 6144...  Training loss: 4.6303...  0.3419 sec/batch
Epoch: 34/100...  Training Step: 6145...  Training loss: 4.6648...  0.3421 sec/batch
Epoch: 34/100...  Training Step: 6146...  Training loss: 4.7030...  0.3406 sec/batch
Epoch: 34/100...  Training Step: 6147...  Training loss: 4.6866...  0.3408 sec/batch
Epoch: 34/100...  Training Step: 6148...  Training loss: 4.7092...  0.3416 sec/batch
Epoch: 34/100...  Training Step: 6149...  Training loss: 4.7593...  0.3372 sec/batch
Epoch: 34/100...  Training Step: 6150...  Training loss: 4.7508...  0.3404 sec/batch
Epoch: 34/100...  Training Step: 6151...  Training loss: 4.8285...  0.3383 sec/batch
Epoch: 34/100...  Training Step: 6152...  Training loss: 4.9163..

Epoch: 34/100...  Training Step: 6238...  Training loss: 4.8280...  0.3419 sec/batch
Epoch: 34/100...  Training Step: 6239...  Training loss: 4.7561...  0.3414 sec/batch
Epoch: 34/100...  Training Step: 6240...  Training loss: 4.7777...  0.3377 sec/batch
Epoch: 34/100...  Training Step: 6241...  Training loss: 4.7860...  0.3384 sec/batch
Epoch: 34/100...  Training Step: 6242...  Training loss: 4.7556...  0.3392 sec/batch
Epoch: 34/100...  Training Step: 6243...  Training loss: 4.7461...  0.3404 sec/batch
Epoch: 34/100...  Training Step: 6244...  Training loss: 4.7409...  0.3422 sec/batch
Epoch: 34/100...  Training Step: 6245...  Training loss: 4.7744...  0.3400 sec/batch
Epoch: 34/100...  Training Step: 6246...  Training loss: 4.7478...  0.3391 sec/batch
Epoch: 34/100...  Training Step: 6247...  Training loss: 4.7249...  0.3380 sec/batch
Epoch: 34/100...  Training Step: 6248...  Training loss: 4.7000...  0.3394 sec/batch
Epoch: 34/100...  Training Step: 6249...  Training loss: 4.7829..

Epoch: 35/100...  Training Step: 6335...  Training loss: 4.7877...  0.3376 sec/batch
Epoch: 35/100...  Training Step: 6336...  Training loss: 4.8499...  0.3425 sec/batch
Epoch: 35/100...  Training Step: 6337...  Training loss: 4.8641...  0.3379 sec/batch
Epoch: 35/100...  Training Step: 6338...  Training loss: 4.8359...  0.3429 sec/batch
Epoch: 35/100...  Training Step: 6339...  Training loss: 4.8132...  0.3416 sec/batch
Epoch: 35/100...  Training Step: 6340...  Training loss: 4.7802...  0.3391 sec/batch
Epoch: 35/100...  Training Step: 6341...  Training loss: 4.8183...  0.3405 sec/batch
Epoch: 35/100...  Training Step: 6342...  Training loss: 4.7811...  0.3382 sec/batch
Epoch: 35/100...  Training Step: 6343...  Training loss: 4.7537...  0.3430 sec/batch
Epoch: 35/100...  Training Step: 6344...  Training loss: 4.7742...  0.3425 sec/batch
Epoch: 35/100...  Training Step: 6345...  Training loss: 4.8226...  0.3422 sec/batch
Epoch: 35/100...  Training Step: 6346...  Training loss: 4.8554..

Epoch: 35/100...  Training Step: 6432...  Training loss: 4.6781...  0.3418 sec/batch
Epoch: 35/100...  Training Step: 6433...  Training loss: 4.7329...  0.3433 sec/batch
Epoch: 35/100...  Training Step: 6434...  Training loss: 4.7198...  0.3433 sec/batch
Epoch: 35/100...  Training Step: 6435...  Training loss: 4.7015...  0.3405 sec/batch
Epoch: 35/100...  Training Step: 6436...  Training loss: 4.6747...  0.3407 sec/batch
Epoch: 35/100...  Training Step: 6437...  Training loss: 4.9278...  0.3390 sec/batch
Epoch: 35/100...  Training Step: 6438...  Training loss: 4.8556...  0.3431 sec/batch
Epoch: 35/100...  Training Step: 6439...  Training loss: 4.8767...  0.3394 sec/batch
Epoch: 35/100...  Training Step: 6440...  Training loss: 4.8307...  0.3432 sec/batch
Epoch: 36/100...  Training Step: 6441...  Training loss: 4.9328...  0.3427 sec/batch
Epoch: 36/100...  Training Step: 6442...  Training loss: 4.8703...  0.3425 sec/batch
Epoch: 36/100...  Training Step: 6443...  Training loss: 4.7453..

Epoch: 36/100...  Training Step: 6529...  Training loss: 4.7935...  0.3398 sec/batch
Epoch: 36/100...  Training Step: 6530...  Training loss: 4.8428...  0.3399 sec/batch
Epoch: 36/100...  Training Step: 6531...  Training loss: 4.7663...  0.3393 sec/batch
Epoch: 36/100...  Training Step: 6532...  Training loss: 4.6235...  0.3406 sec/batch
Epoch: 36/100...  Training Step: 6533...  Training loss: 4.7659...  0.3377 sec/batch
Epoch: 36/100...  Training Step: 6534...  Training loss: 4.7738...  0.3398 sec/batch
Epoch: 36/100...  Training Step: 6535...  Training loss: 4.7429...  0.3416 sec/batch
Epoch: 36/100...  Training Step: 6536...  Training loss: 4.7256...  0.3400 sec/batch
Epoch: 36/100...  Training Step: 6537...  Training loss: 4.7096...  0.3395 sec/batch
Epoch: 36/100...  Training Step: 6538...  Training loss: 4.6792...  0.3401 sec/batch
Epoch: 36/100...  Training Step: 6539...  Training loss: 4.7152...  0.3378 sec/batch
Epoch: 36/100...  Training Step: 6540...  Training loss: 4.6790..

Epoch: 37/100...  Training Step: 6626...  Training loss: 4.8539...  0.3419 sec/batch
Epoch: 37/100...  Training Step: 6627...  Training loss: 4.7112...  0.3421 sec/batch
Epoch: 37/100...  Training Step: 6628...  Training loss: 4.7582...  0.3405 sec/batch
Epoch: 37/100...  Training Step: 6629...  Training loss: 4.7330...  0.3409 sec/batch
Epoch: 37/100...  Training Step: 6630...  Training loss: 4.6330...  0.3421 sec/batch
Epoch: 37/100...  Training Step: 6631...  Training loss: 4.8480...  0.3375 sec/batch
Epoch: 37/100...  Training Step: 6632...  Training loss: 4.7784...  0.3381 sec/batch
Epoch: 37/100...  Training Step: 6633...  Training loss: 4.7697...  0.3424 sec/batch
Epoch: 37/100...  Training Step: 6634...  Training loss: 4.7400...  0.3405 sec/batch
Epoch: 37/100...  Training Step: 6635...  Training loss: 4.7152...  0.3407 sec/batch
Epoch: 37/100...  Training Step: 6636...  Training loss: 4.7038...  0.3429 sec/batch
Epoch: 37/100...  Training Step: 6637...  Training loss: 4.6858..

Epoch: 37/100...  Training Step: 6723...  Training loss: 4.6440...  0.3393 sec/batch
Epoch: 37/100...  Training Step: 6724...  Training loss: 4.6439...  0.3401 sec/batch
Epoch: 37/100...  Training Step: 6725...  Training loss: 4.6709...  0.3401 sec/batch
Epoch: 37/100...  Training Step: 6726...  Training loss: 4.6392...  0.3397 sec/batch
Epoch: 37/100...  Training Step: 6727...  Training loss: 4.7703...  0.3389 sec/batch
Epoch: 37/100...  Training Step: 6728...  Training loss: 4.6486...  0.3426 sec/batch
Epoch: 37/100...  Training Step: 6729...  Training loss: 4.7404...  0.3395 sec/batch
Epoch: 37/100...  Training Step: 6730...  Training loss: 4.7654...  0.3410 sec/batch
Epoch: 37/100...  Training Step: 6731...  Training loss: 4.7160...  0.3422 sec/batch
Epoch: 37/100...  Training Step: 6732...  Training loss: 4.6094...  0.3414 sec/batch
Epoch: 37/100...  Training Step: 6733...  Training loss: 4.6126...  0.3402 sec/batch
Epoch: 37/100...  Training Step: 6734...  Training loss: 4.6234..

Epoch: 38/100...  Training Step: 6820...  Training loss: 4.6779...  0.3399 sec/batch
Epoch: 38/100...  Training Step: 6821...  Training loss: 4.6578...  0.3389 sec/batch
Epoch: 38/100...  Training Step: 6822...  Training loss: 4.7203...  0.3384 sec/batch
Epoch: 38/100...  Training Step: 6823...  Training loss: 4.7280...  0.3400 sec/batch
Epoch: 38/100...  Training Step: 6824...  Training loss: 4.6677...  0.3384 sec/batch
Epoch: 38/100...  Training Step: 6825...  Training loss: 4.7385...  0.3390 sec/batch
Epoch: 38/100...  Training Step: 6826...  Training loss: 4.6535...  0.3395 sec/batch
Epoch: 38/100...  Training Step: 6827...  Training loss: 4.7303...  0.3411 sec/batch
Epoch: 38/100...  Training Step: 6828...  Training loss: 4.6526...  0.3401 sec/batch
Epoch: 38/100...  Training Step: 6829...  Training loss: 4.6088...  0.3415 sec/batch
Epoch: 38/100...  Training Step: 6830...  Training loss: 4.5811...  0.3421 sec/batch
Epoch: 38/100...  Training Step: 6831...  Training loss: 4.5754..

Epoch: 38/100...  Training Step: 6917...  Training loss: 4.5827...  0.3387 sec/batch
Epoch: 38/100...  Training Step: 6918...  Training loss: 4.6164...  0.3390 sec/batch
Epoch: 38/100...  Training Step: 6919...  Training loss: 4.6431...  0.3422 sec/batch
Epoch: 38/100...  Training Step: 6920...  Training loss: 4.6771...  0.3397 sec/batch
Epoch: 38/100...  Training Step: 6921...  Training loss: 4.6669...  0.3397 sec/batch
Epoch: 38/100...  Training Step: 6922...  Training loss: 4.6161...  0.3407 sec/batch
Epoch: 38/100...  Training Step: 6923...  Training loss: 4.6205...  0.3378 sec/batch
Epoch: 38/100...  Training Step: 6924...  Training loss: 4.6050...  0.3414 sec/batch
Epoch: 38/100...  Training Step: 6925...  Training loss: 4.7127...  0.3379 sec/batch
Epoch: 38/100...  Training Step: 6926...  Training loss: 4.6222...  0.3413 sec/batch
Epoch: 38/100...  Training Step: 6927...  Training loss: 4.6510...  0.3403 sec/batch
Epoch: 38/100...  Training Step: 6928...  Training loss: 4.6800..

Epoch: 39/100...  Training Step: 7014...  Training loss: 4.5726...  0.3395 sec/batch
Epoch: 39/100...  Training Step: 7015...  Training loss: 4.5731...  0.3426 sec/batch
Epoch: 39/100...  Training Step: 7016...  Training loss: 4.5892...  0.3416 sec/batch
Epoch: 39/100...  Training Step: 7017...  Training loss: 4.6533...  0.3414 sec/batch
Epoch: 39/100...  Training Step: 7018...  Training loss: 4.6118...  0.3436 sec/batch
Epoch: 39/100...  Training Step: 7019...  Training loss: 4.6386...  0.3442 sec/batch
Epoch: 39/100...  Training Step: 7020...  Training loss: 4.6404...  0.3404 sec/batch
Epoch: 39/100...  Training Step: 7021...  Training loss: 4.6770...  0.3411 sec/batch
Epoch: 39/100...  Training Step: 7022...  Training loss: 4.7295...  0.3409 sec/batch
Epoch: 39/100...  Training Step: 7023...  Training loss: 4.6730...  0.3405 sec/batch
Epoch: 39/100...  Training Step: 7024...  Training loss: 4.6428...  0.3402 sec/batch
Epoch: 39/100...  Training Step: 7025...  Training loss: 4.4218..

Epoch: 39/100...  Training Step: 7111...  Training loss: 4.6291...  0.3425 sec/batch
Epoch: 39/100...  Training Step: 7112...  Training loss: 4.6615...  0.3423 sec/batch
Epoch: 39/100...  Training Step: 7113...  Training loss: 4.8496...  0.3404 sec/batch
Epoch: 39/100...  Training Step: 7114...  Training loss: 4.7709...  0.3416 sec/batch
Epoch: 39/100...  Training Step: 7115...  Training loss: 4.7174...  0.3411 sec/batch
Epoch: 39/100...  Training Step: 7116...  Training loss: 4.6690...  0.3410 sec/batch
Epoch: 39/100...  Training Step: 7117...  Training loss: 4.6168...  0.3416 sec/batch
Epoch: 39/100...  Training Step: 7118...  Training loss: 4.6359...  0.3404 sec/batch
Epoch: 39/100...  Training Step: 7119...  Training loss: 4.6391...  0.3424 sec/batch
Epoch: 39/100...  Training Step: 7120...  Training loss: 4.6603...  0.3416 sec/batch
Epoch: 39/100...  Training Step: 7121...  Training loss: 4.7560...  0.3422 sec/batch
Epoch: 39/100...  Training Step: 7122...  Training loss: 4.6743..

Epoch: 40/100...  Training Step: 7208...  Training loss: 4.6371...  0.3418 sec/batch
Epoch: 40/100...  Training Step: 7209...  Training loss: 4.4111...  0.3411 sec/batch
Epoch: 40/100...  Training Step: 7210...  Training loss: 4.4952...  0.3395 sec/batch
Epoch: 40/100...  Training Step: 7211...  Training loss: 4.6179...  0.3397 sec/batch
Epoch: 40/100...  Training Step: 7212...  Training loss: 4.7270...  0.3415 sec/batch
Epoch: 40/100...  Training Step: 7213...  Training loss: 4.7055...  0.3407 sec/batch
Epoch: 40/100...  Training Step: 7214...  Training loss: 4.7042...  0.3419 sec/batch
Epoch: 40/100...  Training Step: 7215...  Training loss: 4.6402...  0.3424 sec/batch
Epoch: 40/100...  Training Step: 7216...  Training loss: 4.6534...  0.3401 sec/batch
Epoch: 40/100...  Training Step: 7217...  Training loss: 4.7192...  0.3404 sec/batch
Epoch: 40/100...  Training Step: 7218...  Training loss: 4.7083...  0.3399 sec/batch
Epoch: 40/100...  Training Step: 7219...  Training loss: 4.7596..

Epoch: 40/100...  Training Step: 7305...  Training loss: 4.7299...  0.3420 sec/batch
Epoch: 40/100...  Training Step: 7306...  Training loss: 4.6717...  0.3433 sec/batch
Epoch: 40/100...  Training Step: 7307...  Training loss: 4.6672...  0.3416 sec/batch
Epoch: 40/100...  Training Step: 7308...  Training loss: 4.6407...  0.3437 sec/batch
Epoch: 40/100...  Training Step: 7309...  Training loss: 4.6837...  0.3416 sec/batch
Epoch: 40/100...  Training Step: 7310...  Training loss: 4.5450...  0.3431 sec/batch
Epoch: 40/100...  Training Step: 7311...  Training loss: 4.6082...  0.3391 sec/batch
Epoch: 40/100...  Training Step: 7312...  Training loss: 4.5601...  0.3418 sec/batch
Epoch: 40/100...  Training Step: 7313...  Training loss: 4.6723...  0.3414 sec/batch
Epoch: 40/100...  Training Step: 7314...  Training loss: 4.5613...  0.3401 sec/batch
Epoch: 40/100...  Training Step: 7315...  Training loss: 4.6582...  0.3420 sec/batch
Epoch: 40/100...  Training Step: 7316...  Training loss: 4.7349..

Epoch: 41/100...  Training Step: 7402...  Training loss: 4.6896...  0.3405 sec/batch
Epoch: 41/100...  Training Step: 7403...  Training loss: 4.7442...  0.3427 sec/batch
Epoch: 41/100...  Training Step: 7404...  Training loss: 4.6981...  0.3419 sec/batch
Epoch: 41/100...  Training Step: 7405...  Training loss: 4.6373...  0.3406 sec/batch
Epoch: 41/100...  Training Step: 7406...  Training loss: 4.5888...  0.3424 sec/batch
Epoch: 41/100...  Training Step: 7407...  Training loss: 4.6600...  0.3401 sec/batch
Epoch: 41/100...  Training Step: 7408...  Training loss: 4.5636...  0.3441 sec/batch
Epoch: 41/100...  Training Step: 7409...  Training loss: 4.6267...  0.3400 sec/batch
Epoch: 41/100...  Training Step: 7410...  Training loss: 4.6947...  0.3408 sec/batch
Epoch: 41/100...  Training Step: 7411...  Training loss: 4.6689...  0.3404 sec/batch
Epoch: 41/100...  Training Step: 7412...  Training loss: 4.6572...  0.3413 sec/batch
Epoch: 41/100...  Training Step: 7413...  Training loss: 4.6797..

Epoch: 41/100...  Training Step: 7499...  Training loss: 4.6226...  0.3417 sec/batch
Epoch: 41/100...  Training Step: 7500...  Training loss: 4.6796...  0.3408 sec/batch
Epoch: 41/100...  Training Step: 7501...  Training loss: 4.7360...  0.3406 sec/batch
Epoch: 41/100...  Training Step: 7502...  Training loss: 4.6949...  0.3387 sec/batch
Epoch: 41/100...  Training Step: 7503...  Training loss: 4.5662...  0.3435 sec/batch
Epoch: 41/100...  Training Step: 7504...  Training loss: 4.5131...  0.3401 sec/batch
Epoch: 41/100...  Training Step: 7505...  Training loss: 4.5673...  0.3394 sec/batch
Epoch: 41/100...  Training Step: 7506...  Training loss: 4.5983...  0.3415 sec/batch
Epoch: 41/100...  Training Step: 7507...  Training loss: 4.6199...  0.3422 sec/batch
Epoch: 41/100...  Training Step: 7508...  Training loss: 4.5856...  0.3430 sec/batch
Epoch: 41/100...  Training Step: 7509...  Training loss: 4.6294...  0.3411 sec/batch
Epoch: 41/100...  Training Step: 7510...  Training loss: 4.5735..

Epoch: 42/100...  Training Step: 7596...  Training loss: 4.6251...  0.3398 sec/batch
Epoch: 42/100...  Training Step: 7597...  Training loss: 4.6750...  0.3415 sec/batch
Epoch: 42/100...  Training Step: 7598...  Training loss: 4.6810...  0.3400 sec/batch
Epoch: 42/100...  Training Step: 7599...  Training loss: 4.6769...  0.3417 sec/batch
Epoch: 42/100...  Training Step: 7600...  Training loss: 4.6348...  0.3403 sec/batch
Epoch: 42/100...  Training Step: 7601...  Training loss: 4.5638...  0.3390 sec/batch
Epoch: 42/100...  Training Step: 7602...  Training loss: 4.6434...  0.3432 sec/batch
Epoch: 42/100...  Training Step: 7603...  Training loss: 4.5415...  0.3395 sec/batch
Epoch: 42/100...  Training Step: 7604...  Training loss: 4.5092...  0.3413 sec/batch
Epoch: 42/100...  Training Step: 7605...  Training loss: 4.5419...  0.3427 sec/batch
Epoch: 42/100...  Training Step: 7606...  Training loss: 4.6011...  0.3408 sec/batch
Epoch: 42/100...  Training Step: 7607...  Training loss: 4.6171..

Epoch: 42/100...  Training Step: 7693...  Training loss: 4.5988...  0.3402 sec/batch
Epoch: 42/100...  Training Step: 7694...  Training loss: 4.5656...  0.3421 sec/batch
Epoch: 42/100...  Training Step: 7695...  Training loss: 4.6050...  0.3419 sec/batch
Epoch: 42/100...  Training Step: 7696...  Training loss: 4.5312...  0.3426 sec/batch
Epoch: 42/100...  Training Step: 7697...  Training loss: 4.6070...  0.3423 sec/batch
Epoch: 42/100...  Training Step: 7698...  Training loss: 4.5741...  0.3395 sec/batch
Epoch: 42/100...  Training Step: 7699...  Training loss: 4.6355...  0.3421 sec/batch
Epoch: 42/100...  Training Step: 7700...  Training loss: 4.6031...  0.3420 sec/batch
Epoch: 42/100...  Training Step: 7701...  Training loss: 4.6274...  0.3428 sec/batch
Epoch: 42/100...  Training Step: 7702...  Training loss: 4.6951...  0.3395 sec/batch
Epoch: 42/100...  Training Step: 7703...  Training loss: 4.5184...  0.3444 sec/batch
Epoch: 42/100...  Training Step: 7704...  Training loss: 4.5607..

Epoch: 43/100...  Training Step: 7790...  Training loss: 4.5943...  0.3415 sec/batch
Epoch: 43/100...  Training Step: 7791...  Training loss: 4.6083...  0.3440 sec/batch
Epoch: 43/100...  Training Step: 7792...  Training loss: 4.6248...  0.3417 sec/batch
Epoch: 43/100...  Training Step: 7793...  Training loss: 4.5519...  0.3398 sec/batch
Epoch: 43/100...  Training Step: 7794...  Training loss: 4.5425...  0.3408 sec/batch
Epoch: 43/100...  Training Step: 7795...  Training loss: 4.5716...  0.3412 sec/batch
Epoch: 43/100...  Training Step: 7796...  Training loss: 4.4943...  0.3411 sec/batch
Epoch: 43/100...  Training Step: 7797...  Training loss: 4.5794...  0.3437 sec/batch
Epoch: 43/100...  Training Step: 7798...  Training loss: 4.5495...  0.3409 sec/batch
Epoch: 43/100...  Training Step: 7799...  Training loss: 4.5495...  0.3418 sec/batch
Epoch: 43/100...  Training Step: 7800...  Training loss: 4.4272...  0.3403 sec/batch
Epoch: 43/100...  Training Step: 7801...  Training loss: 4.4700..

Epoch: 43/100...  Training Step: 7887...  Training loss: 4.4751...  0.3403 sec/batch
Epoch: 43/100...  Training Step: 7888...  Training loss: 4.5502...  0.3427 sec/batch
Epoch: 43/100...  Training Step: 7889...  Training loss: 4.5666...  0.3388 sec/batch
Epoch: 43/100...  Training Step: 7890...  Training loss: 4.5913...  0.3399 sec/batch
Epoch: 43/100...  Training Step: 7891...  Training loss: 4.5713...  0.3419 sec/batch
Epoch: 43/100...  Training Step: 7892...  Training loss: 4.5530...  0.3426 sec/batch
Epoch: 43/100...  Training Step: 7893...  Training loss: 4.6588...  0.3450 sec/batch
Epoch: 43/100...  Training Step: 7894...  Training loss: 4.5939...  0.3414 sec/batch
Epoch: 43/100...  Training Step: 7895...  Training loss: 4.5407...  0.3421 sec/batch
Epoch: 43/100...  Training Step: 7896...  Training loss: 4.5294...  0.3384 sec/batch
Epoch: 43/100...  Training Step: 7897...  Training loss: 4.5473...  0.3405 sec/batch
Epoch: 43/100...  Training Step: 7898...  Training loss: 4.5831..

Epoch: 44/100...  Training Step: 7984...  Training loss: 4.3955...  0.3406 sec/batch
Epoch: 44/100...  Training Step: 7985...  Training loss: 4.4425...  0.3415 sec/batch
Epoch: 44/100...  Training Step: 7986...  Training loss: 4.4478...  0.3424 sec/batch
Epoch: 44/100...  Training Step: 7987...  Training loss: 4.4502...  0.3423 sec/batch
Epoch: 44/100...  Training Step: 7988...  Training loss: 4.4761...  0.3416 sec/batch
Epoch: 44/100...  Training Step: 7989...  Training loss: 4.5200...  0.3416 sec/batch
Epoch: 44/100...  Training Step: 7990...  Training loss: 4.4670...  0.3406 sec/batch
Epoch: 44/100...  Training Step: 7991...  Training loss: 4.5312...  0.3403 sec/batch
Epoch: 44/100...  Training Step: 7992...  Training loss: 4.5452...  0.3399 sec/batch
Epoch: 44/100...  Training Step: 7993...  Training loss: 4.5700...  0.3434 sec/batch
Epoch: 44/100...  Training Step: 7994...  Training loss: 4.5509...  0.3420 sec/batch
Epoch: 44/100...  Training Step: 7995...  Training loss: 4.5519..

Epoch: 44/100...  Training Step: 8081...  Training loss: 4.5491...  0.3406 sec/batch
Epoch: 44/100...  Training Step: 8082...  Training loss: 4.5580...  0.3385 sec/batch
Epoch: 44/100...  Training Step: 8083...  Training loss: 4.5296...  0.3401 sec/batch
Epoch: 44/100...  Training Step: 8084...  Training loss: 4.4905...  0.3403 sec/batch
Epoch: 44/100...  Training Step: 8085...  Training loss: 4.5012...  0.3408 sec/batch
Epoch: 44/100...  Training Step: 8086...  Training loss: 4.4748...  0.3391 sec/batch
Epoch: 44/100...  Training Step: 8087...  Training loss: 4.4611...  0.3392 sec/batch
Epoch: 44/100...  Training Step: 8088...  Training loss: 4.5205...  0.3403 sec/batch
Epoch: 44/100...  Training Step: 8089...  Training loss: 4.5683...  0.3401 sec/batch
Epoch: 44/100...  Training Step: 8090...  Training loss: 4.5301...  0.3392 sec/batch
Epoch: 44/100...  Training Step: 8091...  Training loss: 4.5360...  0.3401 sec/batch
Epoch: 44/100...  Training Step: 8092...  Training loss: 4.4818..

Epoch: 45/100...  Training Step: 8178...  Training loss: 4.5067...  0.3393 sec/batch
Epoch: 45/100...  Training Step: 8179...  Training loss: 4.5117...  0.3383 sec/batch
Epoch: 45/100...  Training Step: 8180...  Training loss: 4.5195...  0.3417 sec/batch
Epoch: 45/100...  Training Step: 8181...  Training loss: 4.4993...  0.3415 sec/batch
Epoch: 45/100...  Training Step: 8182...  Training loss: 4.5279...  0.3421 sec/batch
Epoch: 45/100...  Training Step: 8183...  Training loss: 4.4677...  0.3388 sec/batch
Epoch: 45/100...  Training Step: 8184...  Training loss: 4.4765...  0.3397 sec/batch
Epoch: 45/100...  Training Step: 8185...  Training loss: 4.5201...  0.3401 sec/batch
Epoch: 45/100...  Training Step: 8186...  Training loss: 4.5975...  0.3399 sec/batch
Epoch: 45/100...  Training Step: 8187...  Training loss: 4.5110...  0.3389 sec/batch
Epoch: 45/100...  Training Step: 8188...  Training loss: 4.4048...  0.3402 sec/batch
Epoch: 45/100...  Training Step: 8189...  Training loss: 4.5086..

Epoch: 45/100...  Training Step: 8275...  Training loss: 4.5408...  0.3399 sec/batch
Epoch: 45/100...  Training Step: 8276...  Training loss: 4.4816...  0.3387 sec/batch
Epoch: 45/100...  Training Step: 8277...  Training loss: 4.6048...  0.3414 sec/batch
Epoch: 45/100...  Training Step: 8278...  Training loss: 4.5378...  0.3387 sec/batch
Epoch: 45/100...  Training Step: 8279...  Training loss: 4.5384...  0.3367 sec/batch
Epoch: 45/100...  Training Step: 8280...  Training loss: 4.5496...  0.3416 sec/batch
Epoch: 46/100...  Training Step: 8281...  Training loss: 4.5982...  0.3407 sec/batch
Epoch: 46/100...  Training Step: 8282...  Training loss: 4.4911...  0.3401 sec/batch
Epoch: 46/100...  Training Step: 8283...  Training loss: 4.4116...  0.3399 sec/batch
Epoch: 46/100...  Training Step: 8284...  Training loss: 4.4755...  0.3402 sec/batch
Epoch: 46/100...  Training Step: 8285...  Training loss: 4.4326...  0.3403 sec/batch
Epoch: 46/100...  Training Step: 8286...  Training loss: 4.3901..

Epoch: 46/100...  Training Step: 8372...  Training loss: 4.3786...  0.3412 sec/batch
Epoch: 46/100...  Training Step: 8373...  Training loss: 4.4785...  0.3400 sec/batch
Epoch: 46/100...  Training Step: 8374...  Training loss: 4.4641...  0.3403 sec/batch
Epoch: 46/100...  Training Step: 8375...  Training loss: 4.4808...  0.3385 sec/batch
Epoch: 46/100...  Training Step: 8376...  Training loss: 4.4669...  0.3387 sec/batch
Epoch: 46/100...  Training Step: 8377...  Training loss: 4.4203...  0.3387 sec/batch
Epoch: 46/100...  Training Step: 8378...  Training loss: 4.3895...  0.3422 sec/batch
Epoch: 46/100...  Training Step: 8379...  Training loss: 4.4131...  0.3395 sec/batch
Epoch: 46/100...  Training Step: 8380...  Training loss: 4.4429...  0.3394 sec/batch
Epoch: 46/100...  Training Step: 8381...  Training loss: 4.4511...  0.3401 sec/batch
Epoch: 46/100...  Training Step: 8382...  Training loss: 4.4446...  0.3386 sec/batch
Epoch: 46/100...  Training Step: 8383...  Training loss: 4.5459..

Epoch: 47/100...  Training Step: 8469...  Training loss: 4.3723...  0.3411 sec/batch
Epoch: 47/100...  Training Step: 8470...  Training loss: 4.3379...  0.3407 sec/batch
Epoch: 47/100...  Training Step: 8471...  Training loss: 4.5390...  0.3390 sec/batch
Epoch: 47/100...  Training Step: 8472...  Training loss: 4.4661...  0.3404 sec/batch
Epoch: 47/100...  Training Step: 8473...  Training loss: 4.4802...  0.3395 sec/batch
Epoch: 47/100...  Training Step: 8474...  Training loss: 4.4825...  0.3375 sec/batch
Epoch: 47/100...  Training Step: 8475...  Training loss: 4.4547...  0.3398 sec/batch
Epoch: 47/100...  Training Step: 8476...  Training loss: 4.4265...  0.3398 sec/batch
Epoch: 47/100...  Training Step: 8477...  Training loss: 4.4178...  0.3417 sec/batch
Epoch: 47/100...  Training Step: 8478...  Training loss: 4.4360...  0.3404 sec/batch
Epoch: 47/100...  Training Step: 8479...  Training loss: 4.4832...  0.3408 sec/batch
Epoch: 47/100...  Training Step: 8480...  Training loss: 4.4294..

Epoch: 47/100...  Training Step: 8566...  Training loss: 4.4086...  0.3412 sec/batch
Epoch: 47/100...  Training Step: 8567...  Training loss: 4.5080...  0.3391 sec/batch
Epoch: 47/100...  Training Step: 8568...  Training loss: 4.3895...  0.3398 sec/batch
Epoch: 47/100...  Training Step: 8569...  Training loss: 4.4948...  0.3386 sec/batch
Epoch: 47/100...  Training Step: 8570...  Training loss: 4.4804...  0.3394 sec/batch
Epoch: 47/100...  Training Step: 8571...  Training loss: 4.4383...  0.3421 sec/batch
Epoch: 47/100...  Training Step: 8572...  Training loss: 4.4102...  0.3404 sec/batch
Epoch: 47/100...  Training Step: 8573...  Training loss: 4.3957...  0.3391 sec/batch
Epoch: 47/100...  Training Step: 8574...  Training loss: 4.4074...  0.3408 sec/batch
Epoch: 47/100...  Training Step: 8575...  Training loss: 4.4012...  0.3403 sec/batch
Epoch: 47/100...  Training Step: 8576...  Training loss: 4.3954...  0.3401 sec/batch
Epoch: 47/100...  Training Step: 8577...  Training loss: 4.4115..

Epoch: 48/100...  Training Step: 8663...  Training loss: 4.4755...  0.3398 sec/batch
Epoch: 48/100...  Training Step: 8664...  Training loss: 4.4184...  0.3396 sec/batch
Epoch: 48/100...  Training Step: 8665...  Training loss: 4.5033...  0.3388 sec/batch
Epoch: 48/100...  Training Step: 8666...  Training loss: 4.4356...  0.3379 sec/batch
Epoch: 48/100...  Training Step: 8667...  Training loss: 4.5344...  0.3383 sec/batch
Epoch: 48/100...  Training Step: 8668...  Training loss: 4.4482...  0.3428 sec/batch
Epoch: 48/100...  Training Step: 8669...  Training loss: 4.4181...  0.3416 sec/batch
Epoch: 48/100...  Training Step: 8670...  Training loss: 4.3714...  0.3403 sec/batch
Epoch: 48/100...  Training Step: 8671...  Training loss: 4.3755...  0.3385 sec/batch
Epoch: 48/100...  Training Step: 8672...  Training loss: 4.3737...  0.3399 sec/batch
Epoch: 48/100...  Training Step: 8673...  Training loss: 4.4391...  0.3396 sec/batch
Epoch: 48/100...  Training Step: 8674...  Training loss: 4.4469..

Epoch: 48/100...  Training Step: 8760...  Training loss: 4.4239...  0.3391 sec/batch
Epoch: 48/100...  Training Step: 8761...  Training loss: 4.4194...  0.3383 sec/batch
Epoch: 48/100...  Training Step: 8762...  Training loss: 4.3472...  0.3388 sec/batch
Epoch: 48/100...  Training Step: 8763...  Training loss: 4.4038...  0.3398 sec/batch
Epoch: 48/100...  Training Step: 8764...  Training loss: 4.3536...  0.3407 sec/batch
Epoch: 48/100...  Training Step: 8765...  Training loss: 4.4379...  0.3394 sec/batch
Epoch: 48/100...  Training Step: 8766...  Training loss: 4.3441...  0.3390 sec/batch
Epoch: 48/100...  Training Step: 8767...  Training loss: 4.3889...  0.3397 sec/batch
Epoch: 48/100...  Training Step: 8768...  Training loss: 4.3990...  0.3387 sec/batch
Epoch: 48/100...  Training Step: 8769...  Training loss: 4.5612...  0.3436 sec/batch
Epoch: 48/100...  Training Step: 8770...  Training loss: 4.4982...  0.3403 sec/batch
Epoch: 48/100...  Training Step: 8771...  Training loss: 4.4546..

Epoch: 49/100...  Training Step: 8857...  Training loss: 4.4366...  0.3420 sec/batch
Epoch: 49/100...  Training Step: 8858...  Training loss: 4.4206...  0.3407 sec/batch
Epoch: 49/100...  Training Step: 8859...  Training loss: 4.4663...  0.3391 sec/batch
Epoch: 49/100...  Training Step: 8860...  Training loss: 4.4193...  0.3376 sec/batch
Epoch: 49/100...  Training Step: 8861...  Training loss: 4.5061...  0.3376 sec/batch
Epoch: 49/100...  Training Step: 8862...  Training loss: 4.5439...  0.3396 sec/batch
Epoch: 49/100...  Training Step: 8863...  Training loss: 4.5104...  0.3408 sec/batch
Epoch: 49/100...  Training Step: 8864...  Training loss: 4.5068...  0.3390 sec/batch
Epoch: 49/100...  Training Step: 8865...  Training loss: 4.3076...  0.3400 sec/batch
Epoch: 49/100...  Training Step: 8866...  Training loss: 4.3746...  0.3392 sec/batch
Epoch: 49/100...  Training Step: 8867...  Training loss: 4.4452...  0.3395 sec/batch
Epoch: 49/100...  Training Step: 8868...  Training loss: 4.5712..

Epoch: 49/100...  Training Step: 8954...  Training loss: 4.5091...  0.3417 sec/batch
Epoch: 49/100...  Training Step: 8955...  Training loss: 4.4683...  0.3403 sec/batch
Epoch: 49/100...  Training Step: 8956...  Training loss: 4.4086...  0.3450 sec/batch
Epoch: 49/100...  Training Step: 8957...  Training loss: 4.3950...  0.3443 sec/batch
Epoch: 49/100...  Training Step: 8958...  Training loss: 4.4361...  0.3418 sec/batch
Epoch: 49/100...  Training Step: 8959...  Training loss: 4.4073...  0.3395 sec/batch
Epoch: 49/100...  Training Step: 8960...  Training loss: 4.4010...  0.3417 sec/batch
Epoch: 49/100...  Training Step: 8961...  Training loss: 4.5017...  0.3432 sec/batch
Epoch: 49/100...  Training Step: 8962...  Training loss: 4.4524...  0.3406 sec/batch
Epoch: 49/100...  Training Step: 8963...  Training loss: 4.4995...  0.3403 sec/batch
Epoch: 49/100...  Training Step: 8964...  Training loss: 4.4522...  0.3419 sec/batch
Epoch: 49/100...  Training Step: 8965...  Training loss: 4.4547..

Epoch: 50/100...  Training Step: 9051...  Training loss: 4.4443...  0.3408 sec/batch
Epoch: 50/100...  Training Step: 9052...  Training loss: 4.5325...  0.3433 sec/batch
Epoch: 50/100...  Training Step: 9053...  Training loss: 4.5102...  0.3444 sec/batch
Epoch: 50/100...  Training Step: 9054...  Training loss: 4.5165...  0.3407 sec/batch
Epoch: 50/100...  Training Step: 9055...  Training loss: 4.4686...  0.3429 sec/batch
Epoch: 50/100...  Training Step: 9056...  Training loss: 4.4723...  0.3436 sec/batch
Epoch: 50/100...  Training Step: 9057...  Training loss: 4.5100...  0.3457 sec/batch
Epoch: 50/100...  Training Step: 9058...  Training loss: 4.5417...  0.3428 sec/batch
Epoch: 50/100...  Training Step: 9059...  Training loss: 4.5619...  0.3460 sec/batch
Epoch: 50/100...  Training Step: 9060...  Training loss: 4.5217...  0.3457 sec/batch
Epoch: 50/100...  Training Step: 9061...  Training loss: 4.4811...  0.3445 sec/batch
Epoch: 50/100...  Training Step: 9062...  Training loss: 4.4383..

Epoch: 50/100...  Training Step: 9148...  Training loss: 4.4102...  0.3424 sec/batch
Epoch: 50/100...  Training Step: 9149...  Training loss: 4.4611...  0.3452 sec/batch
Epoch: 50/100...  Training Step: 9150...  Training loss: 4.3567...  0.3447 sec/batch
Epoch: 50/100...  Training Step: 9151...  Training loss: 4.3931...  0.3459 sec/batch
Epoch: 50/100...  Training Step: 9152...  Training loss: 4.3315...  0.3454 sec/batch
Epoch: 50/100...  Training Step: 9153...  Training loss: 4.4004...  0.3451 sec/batch
Epoch: 50/100...  Training Step: 9154...  Training loss: 4.3523...  0.3452 sec/batch
Epoch: 50/100...  Training Step: 9155...  Training loss: 4.4502...  0.3399 sec/batch
Epoch: 50/100...  Training Step: 9156...  Training loss: 4.4140...  0.3405 sec/batch
Epoch: 50/100...  Training Step: 9157...  Training loss: 4.4929...  0.3440 sec/batch
Epoch: 50/100...  Training Step: 9158...  Training loss: 4.4311...  0.3440 sec/batch
Epoch: 50/100...  Training Step: 9159...  Training loss: 4.3556..

Epoch: 51/100...  Training Step: 9245...  Training loss: 4.4400...  0.3473 sec/batch
Epoch: 51/100...  Training Step: 9246...  Training loss: 4.3917...  0.3458 sec/batch
Epoch: 51/100...  Training Step: 9247...  Training loss: 4.4730...  0.3454 sec/batch
Epoch: 51/100...  Training Step: 9248...  Training loss: 4.3726...  0.3399 sec/batch
Epoch: 51/100...  Training Step: 9249...  Training loss: 4.4191...  0.3428 sec/batch
Epoch: 51/100...  Training Step: 9250...  Training loss: 4.4589...  0.3455 sec/batch
Epoch: 51/100...  Training Step: 9251...  Training loss: 4.4511...  0.3415 sec/batch
Epoch: 51/100...  Training Step: 9252...  Training loss: 4.4312...  0.3439 sec/batch
Epoch: 51/100...  Training Step: 9253...  Training loss: 4.4475...  0.3434 sec/batch
Epoch: 51/100...  Training Step: 9254...  Training loss: 4.4706...  0.3454 sec/batch
Epoch: 51/100...  Training Step: 9255...  Training loss: 4.4623...  0.3449 sec/batch
Epoch: 51/100...  Training Step: 9256...  Training loss: 4.4508..

Epoch: 51/100...  Training Step: 9342...  Training loss: 4.4200...  0.3425 sec/batch
Epoch: 51/100...  Training Step: 9343...  Training loss: 4.3453...  0.3447 sec/batch
Epoch: 51/100...  Training Step: 9344...  Training loss: 4.2954...  0.3457 sec/batch
Epoch: 51/100...  Training Step: 9345...  Training loss: 4.3629...  0.3450 sec/batch
Epoch: 51/100...  Training Step: 9346...  Training loss: 4.3467...  0.3459 sec/batch
Epoch: 51/100...  Training Step: 9347...  Training loss: 4.3665...  0.3451 sec/batch
Epoch: 51/100...  Training Step: 9348...  Training loss: 4.3775...  0.3436 sec/batch
Epoch: 51/100...  Training Step: 9349...  Training loss: 4.3943...  0.3441 sec/batch
Epoch: 51/100...  Training Step: 9350...  Training loss: 4.3119...  0.3425 sec/batch
Epoch: 51/100...  Training Step: 9351...  Training loss: 4.3570...  0.3418 sec/batch
Epoch: 51/100...  Training Step: 9352...  Training loss: 4.3138...  0.3438 sec/batch
Epoch: 51/100...  Training Step: 9353...  Training loss: 4.3574..

Epoch: 52/100...  Training Step: 9439...  Training loss: 4.4272...  0.3456 sec/batch
Epoch: 52/100...  Training Step: 9440...  Training loss: 4.4351...  0.3443 sec/batch
Epoch: 52/100...  Training Step: 9441...  Training loss: 4.3242...  0.3443 sec/batch
Epoch: 52/100...  Training Step: 9442...  Training loss: 4.3884...  0.3440 sec/batch
Epoch: 52/100...  Training Step: 9443...  Training loss: 4.3150...  0.3420 sec/batch
Epoch: 52/100...  Training Step: 9444...  Training loss: 4.3160...  0.3460 sec/batch
Epoch: 52/100...  Training Step: 9445...  Training loss: 4.3576...  0.3452 sec/batch
Epoch: 52/100...  Training Step: 9446...  Training loss: 4.3725...  0.3403 sec/batch
Epoch: 52/100...  Training Step: 9447...  Training loss: 4.3789...  0.3435 sec/batch
Epoch: 52/100...  Training Step: 9448...  Training loss: 4.4523...  0.3418 sec/batch
Epoch: 52/100...  Training Step: 9449...  Training loss: 4.3382...  0.3459 sec/batch
Epoch: 52/100...  Training Step: 9450...  Training loss: 4.3586..

Epoch: 52/100...  Training Step: 9536...  Training loss: 4.3027...  0.3457 sec/batch
Epoch: 52/100...  Training Step: 9537...  Training loss: 4.3506...  0.3438 sec/batch
Epoch: 52/100...  Training Step: 9538...  Training loss: 4.3710...  0.3413 sec/batch
Epoch: 52/100...  Training Step: 9539...  Training loss: 4.3980...  0.3428 sec/batch
Epoch: 52/100...  Training Step: 9540...  Training loss: 4.3870...  0.3436 sec/batch
Epoch: 52/100...  Training Step: 9541...  Training loss: 4.4147...  0.3408 sec/batch
Epoch: 52/100...  Training Step: 9542...  Training loss: 4.4671...  0.3426 sec/batch
Epoch: 52/100...  Training Step: 9543...  Training loss: 4.3102...  0.3436 sec/batch
Epoch: 52/100...  Training Step: 9544...  Training loss: 4.3827...  0.3445 sec/batch
Epoch: 52/100...  Training Step: 9545...  Training loss: 4.4167...  0.3467 sec/batch
Epoch: 52/100...  Training Step: 9546...  Training loss: 4.4202...  0.3458 sec/batch
Epoch: 52/100...  Training Step: 9547...  Training loss: 4.3489..

Epoch: 53/100...  Training Step: 9633...  Training loss: 4.3463...  0.3413 sec/batch
Epoch: 53/100...  Training Step: 9634...  Training loss: 4.3435...  0.3460 sec/batch
Epoch: 53/100...  Training Step: 9635...  Training loss: 4.3423...  0.3452 sec/batch
Epoch: 53/100...  Training Step: 9636...  Training loss: 4.2625...  0.3439 sec/batch
Epoch: 53/100...  Training Step: 9637...  Training loss: 4.3683...  0.3427 sec/batch
Epoch: 53/100...  Training Step: 9638...  Training loss: 4.3833...  0.3428 sec/batch
Epoch: 53/100...  Training Step: 9639...  Training loss: 4.3512...  0.3421 sec/batch
Epoch: 53/100...  Training Step: 9640...  Training loss: 4.2408...  0.3461 sec/batch
Epoch: 53/100...  Training Step: 9641...  Training loss: 4.2798...  0.3437 sec/batch
Epoch: 53/100...  Training Step: 9642...  Training loss: 4.2624...  0.3426 sec/batch
Epoch: 53/100...  Training Step: 9643...  Training loss: 4.2572...  0.3422 sec/batch
Epoch: 53/100...  Training Step: 9644...  Training loss: 4.2889..

Epoch: 53/100...  Training Step: 9730...  Training loss: 4.3739...  0.3445 sec/batch
Epoch: 53/100...  Training Step: 9731...  Training loss: 4.3262...  0.3455 sec/batch
Epoch: 53/100...  Training Step: 9732...  Training loss: 4.3201...  0.3446 sec/batch
Epoch: 53/100...  Training Step: 9733...  Training loss: 4.4024...  0.3450 sec/batch
Epoch: 53/100...  Training Step: 9734...  Training loss: 4.3434...  0.3449 sec/batch
Epoch: 53/100...  Training Step: 9735...  Training loss: 4.3140...  0.3411 sec/batch
Epoch: 53/100...  Training Step: 9736...  Training loss: 4.3242...  0.3406 sec/batch
Epoch: 53/100...  Training Step: 9737...  Training loss: 4.3575...  0.3400 sec/batch
Epoch: 53/100...  Training Step: 9738...  Training loss: 4.3703...  0.3428 sec/batch
Epoch: 53/100...  Training Step: 9739...  Training loss: 4.3206...  0.3432 sec/batch
Epoch: 53/100...  Training Step: 9740...  Training loss: 4.2803...  0.3408 sec/batch
Epoch: 53/100...  Training Step: 9741...  Training loss: 4.2677..

Epoch: 54/100...  Training Step: 9827...  Training loss: 4.2569...  0.3407 sec/batch
Epoch: 54/100...  Training Step: 9828...  Training loss: 4.2877...  0.3450 sec/batch
Epoch: 54/100...  Training Step: 9829...  Training loss: 4.3125...  0.3442 sec/batch
Epoch: 54/100...  Training Step: 9830...  Training loss: 4.2706...  0.3430 sec/batch
Epoch: 54/100...  Training Step: 9831...  Training loss: 4.3096...  0.3428 sec/batch
Epoch: 54/100...  Training Step: 9832...  Training loss: 4.3040...  0.3447 sec/batch
Epoch: 54/100...  Training Step: 9833...  Training loss: 4.3392...  0.3428 sec/batch
Epoch: 54/100...  Training Step: 9834...  Training loss: 4.3068...  0.3449 sec/batch
Epoch: 54/100...  Training Step: 9835...  Training loss: 4.3002...  0.3413 sec/batch
Epoch: 54/100...  Training Step: 9836...  Training loss: 4.2882...  0.3439 sec/batch
Epoch: 54/100...  Training Step: 9837...  Training loss: 4.3271...  0.3412 sec/batch
Epoch: 54/100...  Training Step: 9838...  Training loss: 4.3406..

Epoch: 54/100...  Training Step: 9924...  Training loss: 4.2544...  0.3403 sec/batch
Epoch: 54/100...  Training Step: 9925...  Training loss: 4.2690...  0.3441 sec/batch
Epoch: 54/100...  Training Step: 9926...  Training loss: 4.2569...  0.3439 sec/batch
Epoch: 54/100...  Training Step: 9927...  Training loss: 4.1841...  0.3424 sec/batch
Epoch: 54/100...  Training Step: 9928...  Training loss: 4.2730...  0.3466 sec/batch
Epoch: 54/100...  Training Step: 9929...  Training loss: 4.2799...  0.3412 sec/batch
Epoch: 54/100...  Training Step: 9930...  Training loss: 4.2811...  0.3428 sec/batch
Epoch: 54/100...  Training Step: 9931...  Training loss: 4.2738...  0.3453 sec/batch
Epoch: 54/100...  Training Step: 9932...  Training loss: 4.2587...  0.3434 sec/batch
Epoch: 54/100...  Training Step: 9933...  Training loss: 4.3085...  0.3400 sec/batch
Epoch: 54/100...  Training Step: 9934...  Training loss: 4.3045...  0.3418 sec/batch
Epoch: 54/100...  Training Step: 9935...  Training loss: 4.2857..

Epoch: 55/100...  Training Step: 10021...  Training loss: 4.2839...  0.3408 sec/batch
Epoch: 55/100...  Training Step: 10022...  Training loss: 4.2998...  0.3435 sec/batch
Epoch: 55/100...  Training Step: 10023...  Training loss: 4.2521...  0.3437 sec/batch
Epoch: 55/100...  Training Step: 10024...  Training loss: 4.2180...  0.3445 sec/batch
Epoch: 55/100...  Training Step: 10025...  Training loss: 4.2962...  0.3443 sec/batch
Epoch: 55/100...  Training Step: 10026...  Training loss: 4.3667...  0.3441 sec/batch
Epoch: 55/100...  Training Step: 10027...  Training loss: 4.3015...  0.3413 sec/batch
Epoch: 55/100...  Training Step: 10028...  Training loss: 4.2159...  0.3467 sec/batch
Epoch: 55/100...  Training Step: 10029...  Training loss: 4.2977...  0.3416 sec/batch
Epoch: 55/100...  Training Step: 10030...  Training loss: 4.2651...  0.3407 sec/batch
Epoch: 55/100...  Training Step: 10031...  Training loss: 4.3409...  0.3425 sec/batch
Epoch: 55/100...  Training Step: 10032...  Training lo

Epoch: 55/100...  Training Step: 10117...  Training loss: 4.3096...  0.3418 sec/batch
Epoch: 55/100...  Training Step: 10118...  Training loss: 4.2687...  0.3448 sec/batch
Epoch: 55/100...  Training Step: 10119...  Training loss: 4.2786...  0.3410 sec/batch
Epoch: 55/100...  Training Step: 10120...  Training loss: 4.3202...  0.3439 sec/batch
Epoch: 56/100...  Training Step: 10121...  Training loss: 4.2580...  0.3447 sec/batch
Epoch: 56/100...  Training Step: 10122...  Training loss: 4.1161...  0.3419 sec/batch
Epoch: 56/100...  Training Step: 10123...  Training loss: 4.0917...  0.3448 sec/batch
Epoch: 56/100...  Training Step: 10124...  Training loss: 4.1664...  0.3417 sec/batch
Epoch: 56/100...  Training Step: 10125...  Training loss: 4.1530...  0.3414 sec/batch
Epoch: 56/100...  Training Step: 10126...  Training loss: 4.1078...  0.3410 sec/batch
Epoch: 56/100...  Training Step: 10127...  Training loss: 4.3068...  0.3438 sec/batch
Epoch: 56/100...  Training Step: 10128...  Training lo

Epoch: 56/100...  Training Step: 10213...  Training loss: 4.2619...  0.3419 sec/batch
Epoch: 56/100...  Training Step: 10214...  Training loss: 4.2525...  0.3400 sec/batch
Epoch: 56/100...  Training Step: 10215...  Training loss: 4.2800...  0.3448 sec/batch
Epoch: 56/100...  Training Step: 10216...  Training loss: 4.2884...  0.3424 sec/batch
Epoch: 56/100...  Training Step: 10217...  Training loss: 4.2307...  0.3449 sec/batch
Epoch: 56/100...  Training Step: 10218...  Training loss: 4.2041...  0.3461 sec/batch
Epoch: 56/100...  Training Step: 10219...  Training loss: 4.2405...  0.3419 sec/batch
Epoch: 56/100...  Training Step: 10220...  Training loss: 4.2366...  0.3421 sec/batch
Epoch: 56/100...  Training Step: 10221...  Training loss: 4.2901...  0.3423 sec/batch
Epoch: 56/100...  Training Step: 10222...  Training loss: 4.2267...  0.3410 sec/batch
Epoch: 56/100...  Training Step: 10223...  Training loss: 4.2797...  0.3417 sec/batch
Epoch: 56/100...  Training Step: 10224...  Training lo

Epoch: 57/100...  Training Step: 10309...  Training loss: 4.1282...  0.3458 sec/batch
Epoch: 57/100...  Training Step: 10310...  Training loss: 4.0685...  0.3440 sec/batch
Epoch: 57/100...  Training Step: 10311...  Training loss: 4.2859...  0.3426 sec/batch
Epoch: 57/100...  Training Step: 10312...  Training loss: 4.2374...  0.3435 sec/batch
Epoch: 57/100...  Training Step: 10313...  Training loss: 4.2544...  0.3442 sec/batch
Epoch: 57/100...  Training Step: 10314...  Training loss: 4.3022...  0.3412 sec/batch
Epoch: 57/100...  Training Step: 10315...  Training loss: 4.2336...  0.3429 sec/batch
Epoch: 57/100...  Training Step: 10316...  Training loss: 4.2291...  0.3396 sec/batch
Epoch: 57/100...  Training Step: 10317...  Training loss: 4.2097...  0.3410 sec/batch
Epoch: 57/100...  Training Step: 10318...  Training loss: 4.2285...  0.3444 sec/batch
Epoch: 57/100...  Training Step: 10319...  Training loss: 4.2818...  0.3473 sec/batch
Epoch: 57/100...  Training Step: 10320...  Training lo

Epoch: 57/100...  Training Step: 10405...  Training loss: 4.2424...  0.3432 sec/batch
Epoch: 57/100...  Training Step: 10406...  Training loss: 4.2241...  0.3421 sec/batch
Epoch: 57/100...  Training Step: 10407...  Training loss: 4.2620...  0.3432 sec/batch
Epoch: 57/100...  Training Step: 10408...  Training loss: 4.1501...  0.3459 sec/batch
Epoch: 57/100...  Training Step: 10409...  Training loss: 4.2708...  0.3440 sec/batch
Epoch: 57/100...  Training Step: 10410...  Training loss: 4.2645...  0.3451 sec/batch
Epoch: 57/100...  Training Step: 10411...  Training loss: 4.2248...  0.3463 sec/batch
Epoch: 57/100...  Training Step: 10412...  Training loss: 4.1899...  0.3467 sec/batch
Epoch: 57/100...  Training Step: 10413...  Training loss: 4.1526...  0.3455 sec/batch
Epoch: 57/100...  Training Step: 10414...  Training loss: 4.2328...  0.3459 sec/batch
Epoch: 57/100...  Training Step: 10415...  Training loss: 4.2775...  0.3448 sec/batch
Epoch: 57/100...  Training Step: 10416...  Training lo

Epoch: 58/100...  Training Step: 10501...  Training loss: 4.2163...  0.3419 sec/batch
Epoch: 58/100...  Training Step: 10502...  Training loss: 4.2048...  0.3421 sec/batch
Epoch: 58/100...  Training Step: 10503...  Training loss: 4.2675...  0.3441 sec/batch
Epoch: 58/100...  Training Step: 10504...  Training loss: 4.2336...  0.3456 sec/batch
Epoch: 58/100...  Training Step: 10505...  Training loss: 4.2894...  0.3440 sec/batch
Epoch: 58/100...  Training Step: 10506...  Training loss: 4.2034...  0.3416 sec/batch
Epoch: 58/100...  Training Step: 10507...  Training loss: 4.2800...  0.3441 sec/batch
Epoch: 58/100...  Training Step: 10508...  Training loss: 4.2607...  0.3436 sec/batch
Epoch: 58/100...  Training Step: 10509...  Training loss: 4.2096...  0.3409 sec/batch
Epoch: 58/100...  Training Step: 10510...  Training loss: 4.1491...  0.3414 sec/batch
Epoch: 58/100...  Training Step: 10511...  Training loss: 4.1563...  0.3459 sec/batch
Epoch: 58/100...  Training Step: 10512...  Training lo

Epoch: 58/100...  Training Step: 10597...  Training loss: 4.1610...  0.3437 sec/batch
Epoch: 58/100...  Training Step: 10598...  Training loss: 4.1803...  0.3457 sec/batch
Epoch: 58/100...  Training Step: 10599...  Training loss: 4.2371...  0.3445 sec/batch
Epoch: 58/100...  Training Step: 10600...  Training loss: 4.2333...  0.3408 sec/batch
Epoch: 58/100...  Training Step: 10601...  Training loss: 4.2439...  0.3426 sec/batch
Epoch: 58/100...  Training Step: 10602...  Training loss: 4.1913...  0.3409 sec/batch
Epoch: 58/100...  Training Step: 10603...  Training loss: 4.2516...  0.3439 sec/batch
Epoch: 58/100...  Training Step: 10604...  Training loss: 4.2128...  0.3439 sec/batch
Epoch: 58/100...  Training Step: 10605...  Training loss: 4.2484...  0.3454 sec/batch
Epoch: 58/100...  Training Step: 10606...  Training loss: 4.1577...  0.3426 sec/batch
Epoch: 58/100...  Training Step: 10607...  Training loss: 4.2273...  0.3426 sec/batch
Epoch: 58/100...  Training Step: 10608...  Training lo

Epoch: 59/100...  Training Step: 10693...  Training loss: 4.1902...  0.3433 sec/batch
Epoch: 59/100...  Training Step: 10694...  Training loss: 4.1328...  0.3430 sec/batch
Epoch: 59/100...  Training Step: 10695...  Training loss: 4.1442...  0.3424 sec/batch
Epoch: 59/100...  Training Step: 10696...  Training loss: 4.1604...  0.3428 sec/batch
Epoch: 59/100...  Training Step: 10697...  Training loss: 4.2324...  0.3428 sec/batch
Epoch: 59/100...  Training Step: 10698...  Training loss: 4.2349...  0.3445 sec/batch
Epoch: 59/100...  Training Step: 10699...  Training loss: 4.2316...  0.3440 sec/batch
Epoch: 59/100...  Training Step: 10700...  Training loss: 4.2205...  0.3415 sec/batch
Epoch: 59/100...  Training Step: 10701...  Training loss: 4.2619...  0.3446 sec/batch
Epoch: 59/100...  Training Step: 10702...  Training loss: 4.3030...  0.3414 sec/batch
Epoch: 59/100...  Training Step: 10703...  Training loss: 4.2762...  0.3432 sec/batch
Epoch: 59/100...  Training Step: 10704...  Training lo

Epoch: 59/100...  Training Step: 10789...  Training loss: 4.2355...  0.3448 sec/batch
Epoch: 59/100...  Training Step: 10790...  Training loss: 4.1572...  0.3446 sec/batch
Epoch: 59/100...  Training Step: 10791...  Training loss: 4.2270...  0.3455 sec/batch
Epoch: 59/100...  Training Step: 10792...  Training loss: 4.2163...  0.3466 sec/batch
Epoch: 59/100...  Training Step: 10793...  Training loss: 4.3492...  0.3443 sec/batch
Epoch: 59/100...  Training Step: 10794...  Training loss: 4.3050...  0.3459 sec/batch
Epoch: 59/100...  Training Step: 10795...  Training loss: 4.2936...  0.3425 sec/batch
Epoch: 59/100...  Training Step: 10796...  Training loss: 4.2300...  0.3438 sec/batch
Epoch: 59/100...  Training Step: 10797...  Training loss: 4.2764...  0.3436 sec/batch
Epoch: 59/100...  Training Step: 10798...  Training loss: 4.2693...  0.3435 sec/batch
Epoch: 59/100...  Training Step: 10799...  Training loss: 4.2438...  0.3440 sec/batch
Epoch: 59/100...  Training Step: 10800...  Training lo

Epoch: 60/100...  Training Step: 10885...  Training loss: 4.2262...  0.3409 sec/batch
Epoch: 60/100...  Training Step: 10886...  Training loss: 4.2545...  0.3431 sec/batch
Epoch: 60/100...  Training Step: 10887...  Training loss: 4.2471...  0.3423 sec/batch
Epoch: 60/100...  Training Step: 10888...  Training loss: 4.2272...  0.3451 sec/batch
Epoch: 60/100...  Training Step: 10889...  Training loss: 4.0685...  0.3443 sec/batch
Epoch: 60/100...  Training Step: 10890...  Training loss: 4.1401...  0.3397 sec/batch
Epoch: 60/100...  Training Step: 10891...  Training loss: 4.2388...  0.3424 sec/batch
Epoch: 60/100...  Training Step: 10892...  Training loss: 4.2643...  0.3428 sec/batch
Epoch: 60/100...  Training Step: 10893...  Training loss: 4.2907...  0.3426 sec/batch
Epoch: 60/100...  Training Step: 10894...  Training loss: 4.2555...  0.3404 sec/batch
Epoch: 60/100...  Training Step: 10895...  Training loss: 4.2521...  0.3450 sec/batch
Epoch: 60/100...  Training Step: 10896...  Training lo

Epoch: 60/100...  Training Step: 10981...  Training loss: 4.2246...  0.3446 sec/batch
Epoch: 60/100...  Training Step: 10982...  Training loss: 4.2397...  0.3415 sec/batch
Epoch: 60/100...  Training Step: 10983...  Training loss: 4.2463...  0.3438 sec/batch
Epoch: 60/100...  Training Step: 10984...  Training loss: 4.2366...  0.3410 sec/batch
Epoch: 60/100...  Training Step: 10985...  Training loss: 4.3072...  0.3443 sec/batch
Epoch: 60/100...  Training Step: 10986...  Training loss: 4.2471...  0.3434 sec/batch
Epoch: 60/100...  Training Step: 10987...  Training loss: 4.2735...  0.3413 sec/batch
Epoch: 60/100...  Training Step: 10988...  Training loss: 4.2388...  0.3424 sec/batch
Epoch: 60/100...  Training Step: 10989...  Training loss: 4.2479...  0.3458 sec/batch
Epoch: 60/100...  Training Step: 10990...  Training loss: 4.2031...  0.3412 sec/batch
Epoch: 60/100...  Training Step: 10991...  Training loss: 4.2696...  0.3441 sec/batch
Epoch: 60/100...  Training Step: 10992...  Training lo

Epoch: 61/100...  Training Step: 11077...  Training loss: 4.2859...  0.3422 sec/batch
Epoch: 61/100...  Training Step: 11078...  Training loss: 4.2596...  0.3442 sec/batch
Epoch: 61/100...  Training Step: 11079...  Training loss: 4.2466...  0.3426 sec/batch
Epoch: 61/100...  Training Step: 11080...  Training loss: 4.2471...  0.3454 sec/batch
Epoch: 61/100...  Training Step: 11081...  Training loss: 4.2616...  0.3438 sec/batch
Epoch: 61/100...  Training Step: 11082...  Training loss: 4.3224...  0.3444 sec/batch
Epoch: 61/100...  Training Step: 11083...  Training loss: 4.3672...  0.3447 sec/batch
Epoch: 61/100...  Training Step: 11084...  Training loss: 4.2851...  0.3441 sec/batch
Epoch: 61/100...  Training Step: 11085...  Training loss: 4.2746...  0.3427 sec/batch
Epoch: 61/100...  Training Step: 11086...  Training loss: 4.2240...  0.3432 sec/batch
Epoch: 61/100...  Training Step: 11087...  Training loss: 4.2883...  0.3415 sec/batch
Epoch: 61/100...  Training Step: 11088...  Training lo

Epoch: 61/100...  Training Step: 11173...  Training loss: 4.2662...  0.3442 sec/batch
Epoch: 61/100...  Training Step: 11174...  Training loss: 4.1596...  0.3408 sec/batch
Epoch: 61/100...  Training Step: 11175...  Training loss: 4.2369...  0.3411 sec/batch
Epoch: 61/100...  Training Step: 11176...  Training loss: 4.1505...  0.3425 sec/batch
Epoch: 61/100...  Training Step: 11177...  Training loss: 4.2132...  0.3449 sec/batch
Epoch: 61/100...  Training Step: 11178...  Training loss: 4.2102...  0.3419 sec/batch
Epoch: 61/100...  Training Step: 11179...  Training loss: 4.2555...  0.3425 sec/batch
Epoch: 61/100...  Training Step: 11180...  Training loss: 4.1749...  0.3438 sec/batch
Epoch: 61/100...  Training Step: 11181...  Training loss: 4.2623...  0.3423 sec/batch
Epoch: 61/100...  Training Step: 11182...  Training loss: 4.2279...  0.3418 sec/batch
Epoch: 61/100...  Training Step: 11183...  Training loss: 4.1883...  0.3425 sec/batch
Epoch: 61/100...  Training Step: 11184...  Training lo

Epoch: 62/100...  Training Step: 11269...  Training loss: 4.2667...  0.3424 sec/batch
Epoch: 62/100...  Training Step: 11270...  Training loss: 4.2030...  0.3406 sec/batch
Epoch: 62/100...  Training Step: 11271...  Training loss: 4.2571...  0.3399 sec/batch
Epoch: 62/100...  Training Step: 11272...  Training loss: 4.1834...  0.3419 sec/batch
Epoch: 62/100...  Training Step: 11273...  Training loss: 4.2112...  0.3426 sec/batch
Epoch: 62/100...  Training Step: 11274...  Training loss: 4.2617...  0.3410 sec/batch
Epoch: 62/100...  Training Step: 11275...  Training loss: 4.2541...  0.3398 sec/batch
Epoch: 62/100...  Training Step: 11276...  Training loss: 4.2582...  0.3398 sec/batch
Epoch: 62/100...  Training Step: 11277...  Training loss: 4.2480...  0.3420 sec/batch
Epoch: 62/100...  Training Step: 11278...  Training loss: 4.2760...  0.3411 sec/batch
Epoch: 62/100...  Training Step: 11279...  Training loss: 4.2622...  0.3406 sec/batch
Epoch: 62/100...  Training Step: 11280...  Training lo

Epoch: 62/100...  Training Step: 11365...  Training loss: 4.2670...  0.3403 sec/batch
Epoch: 62/100...  Training Step: 11366...  Training loss: 4.2182...  0.3399 sec/batch
Epoch: 62/100...  Training Step: 11367...  Training loss: 4.1871...  0.3425 sec/batch
Epoch: 62/100...  Training Step: 11368...  Training loss: 4.1389...  0.3442 sec/batch
Epoch: 62/100...  Training Step: 11369...  Training loss: 4.1732...  0.3417 sec/batch
Epoch: 62/100...  Training Step: 11370...  Training loss: 4.1655...  0.3434 sec/batch
Epoch: 62/100...  Training Step: 11371...  Training loss: 4.1571...  0.3430 sec/batch
Epoch: 62/100...  Training Step: 11372...  Training loss: 4.1909...  0.3402 sec/batch
Epoch: 62/100...  Training Step: 11373...  Training loss: 4.1977...  0.3437 sec/batch
Epoch: 62/100...  Training Step: 11374...  Training loss: 4.1431...  0.3434 sec/batch
Epoch: 62/100...  Training Step: 11375...  Training loss: 4.1917...  0.3407 sec/batch
Epoch: 62/100...  Training Step: 11376...  Training lo

Epoch: 63/100...  Training Step: 11461...  Training loss: 4.2527...  0.3429 sec/batch
Epoch: 63/100...  Training Step: 11462...  Training loss: 4.2492...  0.3415 sec/batch
Epoch: 63/100...  Training Step: 11463...  Training loss: 4.2482...  0.3419 sec/batch
Epoch: 63/100...  Training Step: 11464...  Training loss: 4.2480...  0.3412 sec/batch
Epoch: 63/100...  Training Step: 11465...  Training loss: 4.1721...  0.3414 sec/batch
Epoch: 63/100...  Training Step: 11466...  Training loss: 4.2060...  0.3413 sec/batch
Epoch: 63/100...  Training Step: 11467...  Training loss: 4.1144...  0.3449 sec/batch
Epoch: 63/100...  Training Step: 11468...  Training loss: 4.1098...  0.3450 sec/batch
Epoch: 63/100...  Training Step: 11469...  Training loss: 4.1258...  0.3443 sec/batch
Epoch: 63/100...  Training Step: 11470...  Training loss: 4.1877...  0.3410 sec/batch
Epoch: 63/100...  Training Step: 11471...  Training loss: 4.1624...  0.3433 sec/batch
Epoch: 63/100...  Training Step: 11472...  Training lo

Epoch: 63/100...  Training Step: 11557...  Training loss: 4.2040...  0.3433 sec/batch
Epoch: 63/100...  Training Step: 11558...  Training loss: 4.1142...  0.3446 sec/batch
Epoch: 63/100...  Training Step: 11559...  Training loss: 4.1577...  0.3446 sec/batch
Epoch: 63/100...  Training Step: 11560...  Training loss: 4.1178...  0.3448 sec/batch
Epoch: 63/100...  Training Step: 11561...  Training loss: 4.1779...  0.3427 sec/batch
Epoch: 63/100...  Training Step: 11562...  Training loss: 4.1608...  0.3461 sec/batch
Epoch: 63/100...  Training Step: 11563...  Training loss: 4.2031...  0.3434 sec/batch
Epoch: 63/100...  Training Step: 11564...  Training loss: 4.1930...  0.3418 sec/batch
Epoch: 63/100...  Training Step: 11565...  Training loss: 4.1883...  0.3440 sec/batch
Epoch: 63/100...  Training Step: 11566...  Training loss: 4.2311...  0.3449 sec/batch
Epoch: 63/100...  Training Step: 11567...  Training loss: 4.0923...  0.3450 sec/batch
Epoch: 63/100...  Training Step: 11568...  Training lo

Epoch: 64/100...  Training Step: 11653...  Training loss: 4.1432...  0.3419 sec/batch
Epoch: 64/100...  Training Step: 11654...  Training loss: 4.1514...  0.3396 sec/batch
Epoch: 64/100...  Training Step: 11655...  Training loss: 4.1344...  0.3425 sec/batch
Epoch: 64/100...  Training Step: 11656...  Training loss: 4.2155...  0.3402 sec/batch
Epoch: 64/100...  Training Step: 11657...  Training loss: 4.1342...  0.3419 sec/batch
Epoch: 64/100...  Training Step: 11658...  Training loss: 4.1326...  0.3404 sec/batch
Epoch: 64/100...  Training Step: 11659...  Training loss: 4.1658...  0.3424 sec/batch
Epoch: 64/100...  Training Step: 11660...  Training loss: 4.1055...  0.3409 sec/batch
Epoch: 64/100...  Training Step: 11661...  Training loss: 4.1881...  0.3408 sec/batch
Epoch: 64/100...  Training Step: 11662...  Training loss: 4.2226...  0.3414 sec/batch
Epoch: 64/100...  Training Step: 11663...  Training loss: 4.1381...  0.3402 sec/batch
Epoch: 64/100...  Training Step: 11664...  Training lo

Epoch: 64/100...  Training Step: 11749...  Training loss: 4.1969...  0.3432 sec/batch
Epoch: 64/100...  Training Step: 11750...  Training loss: 4.2467...  0.3405 sec/batch
Epoch: 64/100...  Training Step: 11751...  Training loss: 4.0966...  0.3419 sec/batch
Epoch: 64/100...  Training Step: 11752...  Training loss: 4.1203...  0.3419 sec/batch
Epoch: 64/100...  Training Step: 11753...  Training loss: 4.1278...  0.3412 sec/batch
Epoch: 64/100...  Training Step: 11754...  Training loss: 4.1782...  0.3410 sec/batch
Epoch: 64/100...  Training Step: 11755...  Training loss: 4.1047...  0.3416 sec/batch
Epoch: 64/100...  Training Step: 11756...  Training loss: 4.0934...  0.3437 sec/batch
Epoch: 64/100...  Training Step: 11757...  Training loss: 4.1627...  0.3428 sec/batch
Epoch: 64/100...  Training Step: 11758...  Training loss: 4.1226...  0.3420 sec/batch
Epoch: 64/100...  Training Step: 11759...  Training loss: 4.1353...  0.3426 sec/batch
Epoch: 64/100...  Training Step: 11760...  Training lo

Epoch: 65/100...  Training Step: 11845...  Training loss: 4.1985...  0.3410 sec/batch
Epoch: 65/100...  Training Step: 11846...  Training loss: 4.2074...  0.3430 sec/batch
Epoch: 65/100...  Training Step: 11847...  Training loss: 4.1853...  0.3406 sec/batch
Epoch: 65/100...  Training Step: 11848...  Training loss: 4.0714...  0.3390 sec/batch
Epoch: 65/100...  Training Step: 11849...  Training loss: 4.1125...  0.3416 sec/batch
Epoch: 65/100...  Training Step: 11850...  Training loss: 4.1225...  0.3410 sec/batch
Epoch: 65/100...  Training Step: 11851...  Training loss: 4.1120...  0.3428 sec/batch
Epoch: 65/100...  Training Step: 11852...  Training loss: 4.1184...  0.3428 sec/batch
Epoch: 65/100...  Training Step: 11853...  Training loss: 4.1616...  0.3419 sec/batch
Epoch: 65/100...  Training Step: 11854...  Training loss: 4.1060...  0.3427 sec/batch
Epoch: 65/100...  Training Step: 11855...  Training loss: 4.1532...  0.3418 sec/batch
Epoch: 65/100...  Training Step: 11856...  Training lo

Epoch: 65/100...  Training Step: 11941...  Training loss: 4.1610...  0.3395 sec/batch
Epoch: 65/100...  Training Step: 11942...  Training loss: 4.1275...  0.3424 sec/batch
Epoch: 65/100...  Training Step: 11943...  Training loss: 4.0883...  0.3425 sec/batch
Epoch: 65/100...  Training Step: 11944...  Training loss: 4.1190...  0.3400 sec/batch
Epoch: 65/100...  Training Step: 11945...  Training loss: 4.1614...  0.3396 sec/batch
Epoch: 65/100...  Training Step: 11946...  Training loss: 4.1943...  0.3425 sec/batch
Epoch: 65/100...  Training Step: 11947...  Training loss: 4.1345...  0.3395 sec/batch
Epoch: 65/100...  Training Step: 11948...  Training loss: 4.0876...  0.3447 sec/batch
Epoch: 65/100...  Training Step: 11949...  Training loss: 4.0688...  0.3410 sec/batch
Epoch: 65/100...  Training Step: 11950...  Training loss: 4.0621...  0.3432 sec/batch
Epoch: 65/100...  Training Step: 11951...  Training loss: 4.0523...  0.3415 sec/batch
Epoch: 65/100...  Training Step: 11952...  Training lo

Epoch: 66/100...  Training Step: 12037...  Training loss: 4.1302...  0.3408 sec/batch
Epoch: 66/100...  Training Step: 12038...  Training loss: 4.0418...  0.3408 sec/batch
Epoch: 66/100...  Training Step: 12039...  Training loss: 4.1439...  0.3437 sec/batch
Epoch: 66/100...  Training Step: 12040...  Training loss: 4.0715...  0.3412 sec/batch
Epoch: 66/100...  Training Step: 12041...  Training loss: 4.1493...  0.3391 sec/batch
Epoch: 66/100...  Training Step: 12042...  Training loss: 4.1006...  0.3444 sec/batch
Epoch: 66/100...  Training Step: 12043...  Training loss: 4.1223...  0.3442 sec/batch
Epoch: 66/100...  Training Step: 12044...  Training loss: 4.0852...  0.3387 sec/batch
Epoch: 66/100...  Training Step: 12045...  Training loss: 4.0830...  0.3384 sec/batch
Epoch: 66/100...  Training Step: 12046...  Training loss: 4.1048...  0.3388 sec/batch
Epoch: 66/100...  Training Step: 12047...  Training loss: 4.0637...  0.3401 sec/batch
Epoch: 66/100...  Training Step: 12048...  Training lo

Epoch: 66/100...  Training Step: 12133...  Training loss: 4.0310...  0.3390 sec/batch
Epoch: 66/100...  Training Step: 12134...  Training loss: 4.0400...  0.3422 sec/batch
Epoch: 66/100...  Training Step: 12135...  Training loss: 3.9901...  0.3443 sec/batch
Epoch: 66/100...  Training Step: 12136...  Training loss: 4.1028...  0.3401 sec/batch
Epoch: 66/100...  Training Step: 12137...  Training loss: 4.1198...  0.3420 sec/batch
Epoch: 66/100...  Training Step: 12138...  Training loss: 4.0763...  0.3424 sec/batch
Epoch: 66/100...  Training Step: 12139...  Training loss: 4.1259...  0.3393 sec/batch
Epoch: 66/100...  Training Step: 12140...  Training loss: 4.0856...  0.3420 sec/batch
Epoch: 66/100...  Training Step: 12141...  Training loss: 4.1363...  0.3389 sec/batch
Epoch: 66/100...  Training Step: 12142...  Training loss: 4.0905...  0.3423 sec/batch
Epoch: 66/100...  Training Step: 12143...  Training loss: 4.0703...  0.3432 sec/batch
Epoch: 66/100...  Training Step: 12144...  Training lo

Epoch: 67/100...  Training Step: 12229...  Training loss: 4.0618...  0.3404 sec/batch
Epoch: 67/100...  Training Step: 12230...  Training loss: 4.0838...  0.3404 sec/batch
Epoch: 67/100...  Training Step: 12231...  Training loss: 4.0153...  0.3412 sec/batch
Epoch: 67/100...  Training Step: 12232...  Training loss: 3.9827...  0.3420 sec/batch
Epoch: 67/100...  Training Step: 12233...  Training loss: 4.0928...  0.3401 sec/batch
Epoch: 67/100...  Training Step: 12234...  Training loss: 4.1697...  0.3403 sec/batch
Epoch: 67/100...  Training Step: 12235...  Training loss: 4.1037...  0.3433 sec/batch
Epoch: 67/100...  Training Step: 12236...  Training loss: 4.0286...  0.3381 sec/batch
Epoch: 67/100...  Training Step: 12237...  Training loss: 4.0727...  0.3403 sec/batch
Epoch: 67/100...  Training Step: 12238...  Training loss: 4.0575...  0.3402 sec/batch
Epoch: 67/100...  Training Step: 12239...  Training loss: 4.1144...  0.3386 sec/batch
Epoch: 67/100...  Training Step: 12240...  Training lo

Epoch: 67/100...  Training Step: 12325...  Training loss: 4.1077...  0.3420 sec/batch
Epoch: 67/100...  Training Step: 12326...  Training loss: 4.0734...  0.3427 sec/batch
Epoch: 67/100...  Training Step: 12327...  Training loss: 4.0331...  0.3437 sec/batch
Epoch: 67/100...  Training Step: 12328...  Training loss: 4.0776...  0.3399 sec/batch
Epoch: 68/100...  Training Step: 12329...  Training loss: 4.0541...  0.3434 sec/batch
Epoch: 68/100...  Training Step: 12330...  Training loss: 3.8467...  0.3417 sec/batch
Epoch: 68/100...  Training Step: 12331...  Training loss: 3.8594...  0.3390 sec/batch
Epoch: 68/100...  Training Step: 12332...  Training loss: 3.9616...  0.3437 sec/batch
Epoch: 68/100...  Training Step: 12333...  Training loss: 3.9225...  0.3442 sec/batch
Epoch: 68/100...  Training Step: 12334...  Training loss: 3.8739...  0.3424 sec/batch
Epoch: 68/100...  Training Step: 12335...  Training loss: 4.0548...  0.3395 sec/batch
Epoch: 68/100...  Training Step: 12336...  Training lo

Epoch: 68/100...  Training Step: 12421...  Training loss: 4.0436...  0.3400 sec/batch
Epoch: 68/100...  Training Step: 12422...  Training loss: 4.0584...  0.3434 sec/batch
Epoch: 68/100...  Training Step: 12423...  Training loss: 4.0885...  0.3418 sec/batch
Epoch: 68/100...  Training Step: 12424...  Training loss: 4.1004...  0.3398 sec/batch
Epoch: 68/100...  Training Step: 12425...  Training loss: 4.0035...  0.3414 sec/batch
Epoch: 68/100...  Training Step: 12426...  Training loss: 4.0210...  0.3411 sec/batch
Epoch: 68/100...  Training Step: 12427...  Training loss: 4.0260...  0.3431 sec/batch
Epoch: 68/100...  Training Step: 12428...  Training loss: 4.0272...  0.3402 sec/batch
Epoch: 68/100...  Training Step: 12429...  Training loss: 4.0857...  0.3415 sec/batch
Epoch: 68/100...  Training Step: 12430...  Training loss: 4.0318...  0.3415 sec/batch
Epoch: 68/100...  Training Step: 12431...  Training loss: 4.0594...  0.3407 sec/batch
Epoch: 68/100...  Training Step: 12432...  Training lo

Epoch: 69/100...  Training Step: 12517...  Training loss: 3.9083...  0.3401 sec/batch
Epoch: 69/100...  Training Step: 12518...  Training loss: 3.8597...  0.3424 sec/batch
Epoch: 69/100...  Training Step: 12519...  Training loss: 4.0176...  0.3386 sec/batch
Epoch: 69/100...  Training Step: 12520...  Training loss: 4.0413...  0.3428 sec/batch
Epoch: 69/100...  Training Step: 12521...  Training loss: 4.0348...  0.3427 sec/batch
Epoch: 69/100...  Training Step: 12522...  Training loss: 4.0577...  0.3397 sec/batch
Epoch: 69/100...  Training Step: 12523...  Training loss: 4.0142...  0.3429 sec/batch
Epoch: 69/100...  Training Step: 12524...  Training loss: 4.0073...  0.3394 sec/batch
Epoch: 69/100...  Training Step: 12525...  Training loss: 3.9842...  0.3402 sec/batch
Epoch: 69/100...  Training Step: 12526...  Training loss: 3.9996...  0.3411 sec/batch
Epoch: 69/100...  Training Step: 12527...  Training loss: 4.0336...  0.3404 sec/batch
Epoch: 69/100...  Training Step: 12528...  Training lo

Epoch: 69/100...  Training Step: 12613...  Training loss: 4.0702...  0.3429 sec/batch
Epoch: 69/100...  Training Step: 12614...  Training loss: 4.0318...  0.3396 sec/batch
Epoch: 69/100...  Training Step: 12615...  Training loss: 4.0412...  0.3420 sec/batch
Epoch: 69/100...  Training Step: 12616...  Training loss: 3.9420...  0.3404 sec/batch
Epoch: 69/100...  Training Step: 12617...  Training loss: 4.0538...  0.3416 sec/batch
Epoch: 69/100...  Training Step: 12618...  Training loss: 4.0632...  0.3406 sec/batch
Epoch: 69/100...  Training Step: 12619...  Training loss: 4.0262...  0.3402 sec/batch
Epoch: 69/100...  Training Step: 12620...  Training loss: 3.9966...  0.3417 sec/batch
Epoch: 69/100...  Training Step: 12621...  Training loss: 4.0096...  0.3413 sec/batch
Epoch: 69/100...  Training Step: 12622...  Training loss: 4.0040...  0.3403 sec/batch
Epoch: 69/100...  Training Step: 12623...  Training loss: 4.0589...  0.3417 sec/batch
Epoch: 69/100...  Training Step: 12624...  Training lo

Epoch: 70/100...  Training Step: 12709...  Training loss: 3.9931...  0.3407 sec/batch
Epoch: 70/100...  Training Step: 12710...  Training loss: 3.9722...  0.3422 sec/batch
Epoch: 70/100...  Training Step: 12711...  Training loss: 4.0062...  0.3414 sec/batch
Epoch: 70/100...  Training Step: 12712...  Training loss: 4.0166...  0.3408 sec/batch
Epoch: 70/100...  Training Step: 12713...  Training loss: 4.0980...  0.3403 sec/batch
Epoch: 70/100...  Training Step: 12714...  Training loss: 3.9941...  0.3409 sec/batch
Epoch: 70/100...  Training Step: 12715...  Training loss: 4.0898...  0.3430 sec/batch
Epoch: 70/100...  Training Step: 12716...  Training loss: 4.1089...  0.3436 sec/batch
Epoch: 70/100...  Training Step: 12717...  Training loss: 4.0254...  0.3417 sec/batch
Epoch: 70/100...  Training Step: 12718...  Training loss: 3.9371...  0.3400 sec/batch
Epoch: 70/100...  Training Step: 12719...  Training loss: 3.9803...  0.3410 sec/batch
Epoch: 70/100...  Training Step: 12720...  Training lo

Epoch: 70/100...  Training Step: 12805...  Training loss: 4.0040...  0.3424 sec/batch
Epoch: 70/100...  Training Step: 12806...  Training loss: 3.9972...  0.3418 sec/batch
Epoch: 70/100...  Training Step: 12807...  Training loss: 4.0430...  0.3451 sec/batch
Epoch: 70/100...  Training Step: 12808...  Training loss: 4.0059...  0.3441 sec/batch
Epoch: 70/100...  Training Step: 12809...  Training loss: 4.0332...  0.3396 sec/batch
Epoch: 70/100...  Training Step: 12810...  Training loss: 4.0024...  0.3403 sec/batch
Epoch: 70/100...  Training Step: 12811...  Training loss: 4.0722...  0.3415 sec/batch
Epoch: 70/100...  Training Step: 12812...  Training loss: 4.0462...  0.3436 sec/batch
Epoch: 70/100...  Training Step: 12813...  Training loss: 4.0813...  0.3426 sec/batch
Epoch: 70/100...  Training Step: 12814...  Training loss: 3.9890...  0.3402 sec/batch
Epoch: 70/100...  Training Step: 12815...  Training loss: 4.0640...  0.3390 sec/batch
Epoch: 70/100...  Training Step: 12816...  Training lo

Epoch: 71/100...  Training Step: 12901...  Training loss: 4.0072...  0.3414 sec/batch
Epoch: 71/100...  Training Step: 12902...  Training loss: 3.9015...  0.3436 sec/batch
Epoch: 71/100...  Training Step: 12903...  Training loss: 3.9564...  0.3414 sec/batch
Epoch: 71/100...  Training Step: 12904...  Training loss: 3.9689...  0.3439 sec/batch
Epoch: 71/100...  Training Step: 12905...  Training loss: 4.0012...  0.3430 sec/batch
Epoch: 71/100...  Training Step: 12906...  Training loss: 3.9964...  0.3433 sec/batch
Epoch: 71/100...  Training Step: 12907...  Training loss: 4.0066...  0.3450 sec/batch
Epoch: 71/100...  Training Step: 12908...  Training loss: 4.0132...  0.3447 sec/batch
Epoch: 71/100...  Training Step: 12909...  Training loss: 4.0421...  0.3440 sec/batch
Epoch: 71/100...  Training Step: 12910...  Training loss: 4.1126...  0.3429 sec/batch
Epoch: 71/100...  Training Step: 12911...  Training loss: 4.0851...  0.3433 sec/batch
Epoch: 71/100...  Training Step: 12912...  Training lo

Epoch: 71/100...  Training Step: 12997...  Training loss: 4.0564...  0.3449 sec/batch
Epoch: 71/100...  Training Step: 12998...  Training loss: 3.9790...  0.3408 sec/batch
Epoch: 71/100...  Training Step: 12999...  Training loss: 4.0819...  0.3408 sec/batch
Epoch: 71/100...  Training Step: 13000...  Training loss: 4.0229...  0.3433 sec/batch
Epoch: 71/100...  Training Step: 13001...  Training loss: 4.1861...  0.3842 sec/batch
Epoch: 71/100...  Training Step: 13002...  Training loss: 4.1433...  0.3476 sec/batch
Epoch: 71/100...  Training Step: 13003...  Training loss: 4.1036...  0.3437 sec/batch
Epoch: 71/100...  Training Step: 13004...  Training loss: 3.9973...  0.3435 sec/batch
Epoch: 71/100...  Training Step: 13005...  Training loss: 4.0447...  0.3427 sec/batch
Epoch: 71/100...  Training Step: 13006...  Training loss: 4.0646...  0.3430 sec/batch
Epoch: 71/100...  Training Step: 13007...  Training loss: 4.0733...  0.3391 sec/batch
Epoch: 71/100...  Training Step: 13008...  Training lo

Epoch: 72/100...  Training Step: 13093...  Training loss: 3.9974...  0.3470 sec/batch
Epoch: 72/100...  Training Step: 13094...  Training loss: 4.1037...  0.3384 sec/batch
Epoch: 72/100...  Training Step: 13095...  Training loss: 4.0515...  0.3406 sec/batch
Epoch: 72/100...  Training Step: 13096...  Training loss: 4.0323...  0.3402 sec/batch
Epoch: 72/100...  Training Step: 13097...  Training loss: 3.9109...  0.3373 sec/batch
Epoch: 72/100...  Training Step: 13098...  Training loss: 3.9338...  0.3405 sec/batch
Epoch: 72/100...  Training Step: 13099...  Training loss: 4.0263...  0.3401 sec/batch
Epoch: 72/100...  Training Step: 13100...  Training loss: 4.0812...  0.3407 sec/batch
Epoch: 72/100...  Training Step: 13101...  Training loss: 4.1331...  0.3391 sec/batch
Epoch: 72/100...  Training Step: 13102...  Training loss: 4.0706...  0.3378 sec/batch
Epoch: 72/100...  Training Step: 13103...  Training loss: 4.0808...  0.3392 sec/batch
Epoch: 72/100...  Training Step: 13104...  Training lo

Epoch: 72/100...  Training Step: 13189...  Training loss: 4.0648...  0.3392 sec/batch
Epoch: 72/100...  Training Step: 13190...  Training loss: 4.0860...  0.3402 sec/batch
Epoch: 72/100...  Training Step: 13191...  Training loss: 4.0717...  0.3410 sec/batch
Epoch: 72/100...  Training Step: 13192...  Training loss: 4.1039...  0.3408 sec/batch
Epoch: 72/100...  Training Step: 13193...  Training loss: 4.1236...  0.3405 sec/batch
Epoch: 72/100...  Training Step: 13194...  Training loss: 4.0997...  0.3382 sec/batch
Epoch: 72/100...  Training Step: 13195...  Training loss: 4.1277...  0.3406 sec/batch
Epoch: 72/100...  Training Step: 13196...  Training loss: 4.0395...  0.3381 sec/batch
Epoch: 72/100...  Training Step: 13197...  Training loss: 4.0927...  0.3384 sec/batch
Epoch: 72/100...  Training Step: 13198...  Training loss: 4.0307...  0.3415 sec/batch
Epoch: 72/100...  Training Step: 13199...  Training loss: 4.0778...  0.3393 sec/batch
Epoch: 72/100...  Training Step: 13200...  Training lo

Epoch: 73/100...  Training Step: 13285...  Training loss: 4.0804...  0.3407 sec/batch
Epoch: 73/100...  Training Step: 13286...  Training loss: 4.0555...  0.3429 sec/batch
Epoch: 73/100...  Training Step: 13287...  Training loss: 4.0817...  0.3389 sec/batch
Epoch: 73/100...  Training Step: 13288...  Training loss: 4.0887...  0.3417 sec/batch
Epoch: 73/100...  Training Step: 13289...  Training loss: 4.0492...  0.3441 sec/batch
Epoch: 73/100...  Training Step: 13290...  Training loss: 4.1005...  0.3393 sec/batch
Epoch: 73/100...  Training Step: 13291...  Training loss: 4.1620...  0.3429 sec/batch
Epoch: 73/100...  Training Step: 13292...  Training loss: 4.0984...  0.3403 sec/batch
Epoch: 73/100...  Training Step: 13293...  Training loss: 4.0566...  0.3411 sec/batch
Epoch: 73/100...  Training Step: 13294...  Training loss: 4.0030...  0.3420 sec/batch
Epoch: 73/100...  Training Step: 13295...  Training loss: 4.1005...  0.3410 sec/batch
Epoch: 73/100...  Training Step: 13296...  Training lo

Epoch: 73/100...  Training Step: 13381...  Training loss: 4.0796...  0.3404 sec/batch
Epoch: 73/100...  Training Step: 13382...  Training loss: 4.0447...  0.3390 sec/batch
Epoch: 73/100...  Training Step: 13383...  Training loss: 4.1169...  0.3396 sec/batch
Epoch: 73/100...  Training Step: 13384...  Training loss: 3.9864...  0.3404 sec/batch
Epoch: 73/100...  Training Step: 13385...  Training loss: 4.0170...  0.3415 sec/batch
Epoch: 73/100...  Training Step: 13386...  Training loss: 4.0249...  0.3401 sec/batch
Epoch: 73/100...  Training Step: 13387...  Training loss: 4.0615...  0.3436 sec/batch
Epoch: 73/100...  Training Step: 13388...  Training loss: 3.9781...  0.3422 sec/batch
Epoch: 73/100...  Training Step: 13389...  Training loss: 4.0732...  0.3422 sec/batch
Epoch: 73/100...  Training Step: 13390...  Training loss: 4.0640...  0.3407 sec/batch
Epoch: 73/100...  Training Step: 13391...  Training loss: 4.0091...  0.3396 sec/batch
Epoch: 73/100...  Training Step: 13392...  Training lo

Epoch: 74/100...  Training Step: 13477...  Training loss: 4.0434...  0.3411 sec/batch
Epoch: 74/100...  Training Step: 13478...  Training loss: 4.0194...  0.3431 sec/batch
Epoch: 74/100...  Training Step: 13479...  Training loss: 4.0737...  0.3423 sec/batch
Epoch: 74/100...  Training Step: 13480...  Training loss: 3.9989...  0.3399 sec/batch
Epoch: 74/100...  Training Step: 13481...  Training loss: 4.0055...  0.3410 sec/batch
Epoch: 74/100...  Training Step: 13482...  Training loss: 4.0728...  0.3422 sec/batch
Epoch: 74/100...  Training Step: 13483...  Training loss: 4.0590...  0.3441 sec/batch
Epoch: 74/100...  Training Step: 13484...  Training loss: 4.1040...  0.3437 sec/batch
Epoch: 74/100...  Training Step: 13485...  Training loss: 4.0818...  0.3425 sec/batch
Epoch: 74/100...  Training Step: 13486...  Training loss: 4.0775...  0.3448 sec/batch
Epoch: 74/100...  Training Step: 13487...  Training loss: 4.0622...  0.3423 sec/batch
Epoch: 74/100...  Training Step: 13488...  Training lo

Epoch: 74/100...  Training Step: 13573...  Training loss: 4.0457...  0.3431 sec/batch
Epoch: 74/100...  Training Step: 13574...  Training loss: 4.0750...  0.3438 sec/batch
Epoch: 74/100...  Training Step: 13575...  Training loss: 3.9903...  0.3404 sec/batch
Epoch: 74/100...  Training Step: 13576...  Training loss: 4.0017...  0.3426 sec/batch
Epoch: 74/100...  Training Step: 13577...  Training loss: 4.0113...  0.3391 sec/batch
Epoch: 74/100...  Training Step: 13578...  Training loss: 4.0095...  0.3429 sec/batch
Epoch: 74/100...  Training Step: 13579...  Training loss: 3.9957...  0.3438 sec/batch
Epoch: 74/100...  Training Step: 13580...  Training loss: 4.0413...  0.3407 sec/batch
Epoch: 74/100...  Training Step: 13581...  Training loss: 4.0347...  0.3405 sec/batch
Epoch: 74/100...  Training Step: 13582...  Training loss: 3.9402...  0.3415 sec/batch
Epoch: 74/100...  Training Step: 13583...  Training loss: 3.9773...  0.3419 sec/batch
Epoch: 74/100...  Training Step: 13584...  Training lo

Epoch: 75/100...  Training Step: 13669...  Training loss: 4.0616...  0.3411 sec/batch
Epoch: 75/100...  Training Step: 13670...  Training loss: 4.0690...  0.3404 sec/batch
Epoch: 75/100...  Training Step: 13671...  Training loss: 4.0759...  0.3428 sec/batch
Epoch: 75/100...  Training Step: 13672...  Training loss: 4.0667...  0.3421 sec/batch
Epoch: 75/100...  Training Step: 13673...  Training loss: 3.9660...  0.3440 sec/batch
Epoch: 75/100...  Training Step: 13674...  Training loss: 4.0225...  0.3429 sec/batch
Epoch: 75/100...  Training Step: 13675...  Training loss: 3.9167...  0.3409 sec/batch
Epoch: 75/100...  Training Step: 13676...  Training loss: 3.9265...  0.3427 sec/batch
Epoch: 75/100...  Training Step: 13677...  Training loss: 3.9770...  0.3434 sec/batch
Epoch: 75/100...  Training Step: 13678...  Training loss: 3.9817...  0.3406 sec/batch
Epoch: 75/100...  Training Step: 13679...  Training loss: 3.9791...  0.3438 sec/batch
Epoch: 75/100...  Training Step: 13680...  Training lo

Epoch: 75/100...  Training Step: 13765...  Training loss: 4.0519...  0.3385 sec/batch
Epoch: 75/100...  Training Step: 13766...  Training loss: 3.9304...  0.3429 sec/batch
Epoch: 75/100...  Training Step: 13767...  Training loss: 4.0030...  0.3444 sec/batch
Epoch: 75/100...  Training Step: 13768...  Training loss: 3.9620...  0.3421 sec/batch
Epoch: 75/100...  Training Step: 13769...  Training loss: 3.9892...  0.3442 sec/batch
Epoch: 75/100...  Training Step: 13770...  Training loss: 4.0214...  0.3396 sec/batch
Epoch: 75/100...  Training Step: 13771...  Training loss: 4.0207...  0.3456 sec/batch
Epoch: 75/100...  Training Step: 13772...  Training loss: 4.0483...  0.3429 sec/batch
Epoch: 75/100...  Training Step: 13773...  Training loss: 4.0322...  0.3427 sec/batch
Epoch: 75/100...  Training Step: 13774...  Training loss: 4.0745...  0.3393 sec/batch
Epoch: 75/100...  Training Step: 13775...  Training loss: 3.9471...  0.3424 sec/batch
Epoch: 75/100...  Training Step: 13776...  Training lo

Epoch: 76/100...  Training Step: 13861...  Training loss: 3.9480...  0.3429 sec/batch
Epoch: 76/100...  Training Step: 13862...  Training loss: 3.9899...  0.3414 sec/batch
Epoch: 76/100...  Training Step: 13863...  Training loss: 3.9773...  0.3429 sec/batch
Epoch: 76/100...  Training Step: 13864...  Training loss: 4.0450...  0.3421 sec/batch
Epoch: 76/100...  Training Step: 13865...  Training loss: 3.9655...  0.3425 sec/batch
Epoch: 76/100...  Training Step: 13866...  Training loss: 3.9481...  0.3412 sec/batch
Epoch: 76/100...  Training Step: 13867...  Training loss: 3.9455...  0.3442 sec/batch
Epoch: 76/100...  Training Step: 13868...  Training loss: 3.9056...  0.3429 sec/batch
Epoch: 76/100...  Training Step: 13869...  Training loss: 4.0025...  0.3424 sec/batch
Epoch: 76/100...  Training Step: 13870...  Training loss: 4.0241...  0.3397 sec/batch
Epoch: 76/100...  Training Step: 13871...  Training loss: 3.9309...  0.3453 sec/batch
Epoch: 76/100...  Training Step: 13872...  Training lo

Epoch: 76/100...  Training Step: 13957...  Training loss: 4.0333...  0.3412 sec/batch
Epoch: 76/100...  Training Step: 13958...  Training loss: 4.0390...  0.3419 sec/batch
Epoch: 76/100...  Training Step: 13959...  Training loss: 3.9118...  0.3442 sec/batch
Epoch: 76/100...  Training Step: 13960...  Training loss: 3.9521...  0.3394 sec/batch
Epoch: 76/100...  Training Step: 13961...  Training loss: 3.9710...  0.3439 sec/batch
Epoch: 76/100...  Training Step: 13962...  Training loss: 4.0379...  0.3440 sec/batch
Epoch: 76/100...  Training Step: 13963...  Training loss: 3.9268...  0.3406 sec/batch
Epoch: 76/100...  Training Step: 13964...  Training loss: 3.8914...  0.3421 sec/batch
Epoch: 76/100...  Training Step: 13965...  Training loss: 3.9773...  0.3390 sec/batch
Epoch: 76/100...  Training Step: 13966...  Training loss: 3.9319...  0.3436 sec/batch
Epoch: 76/100...  Training Step: 13967...  Training loss: 3.9221...  0.3421 sec/batch
Epoch: 76/100...  Training Step: 13968...  Training lo

Epoch: 77/100...  Training Step: 14053...  Training loss: 3.9477...  0.3442 sec/batch
Epoch: 77/100...  Training Step: 14054...  Training loss: 4.0228...  0.3448 sec/batch
Epoch: 77/100...  Training Step: 14055...  Training loss: 3.9082...  0.3407 sec/batch
Epoch: 77/100...  Training Step: 14056...  Training loss: 3.8697...  0.3449 sec/batch
Epoch: 77/100...  Training Step: 14057...  Training loss: 3.9130...  0.3444 sec/batch
Epoch: 77/100...  Training Step: 14058...  Training loss: 3.9130...  0.3448 sec/batch
Epoch: 77/100...  Training Step: 14059...  Training loss: 3.9241...  0.3467 sec/batch
Epoch: 77/100...  Training Step: 14060...  Training loss: 3.9665...  0.3405 sec/batch
Epoch: 77/100...  Training Step: 14061...  Training loss: 3.9852...  0.3442 sec/batch
Epoch: 77/100...  Training Step: 14062...  Training loss: 3.9016...  0.3440 sec/batch
Epoch: 77/100...  Training Step: 14063...  Training loss: 3.9913...  0.3420 sec/batch
Epoch: 77/100...  Training Step: 14064...  Training lo

Epoch: 77/100...  Training Step: 14149...  Training loss: 3.9881...  0.3457 sec/batch
Epoch: 77/100...  Training Step: 14150...  Training loss: 3.9062...  0.3432 sec/batch
Epoch: 77/100...  Training Step: 14151...  Training loss: 3.9035...  0.3432 sec/batch
Epoch: 77/100...  Training Step: 14152...  Training loss: 3.8801...  0.3447 sec/batch
Epoch: 77/100...  Training Step: 14153...  Training loss: 3.9881...  0.3409 sec/batch
Epoch: 77/100...  Training Step: 14154...  Training loss: 3.9984...  0.3410 sec/batch
Epoch: 77/100...  Training Step: 14155...  Training loss: 3.9323...  0.3424 sec/batch
Epoch: 77/100...  Training Step: 14156...  Training loss: 3.9183...  0.3458 sec/batch
Epoch: 77/100...  Training Step: 14157...  Training loss: 3.8742...  0.3422 sec/batch
Epoch: 77/100...  Training Step: 14158...  Training loss: 3.8785...  0.3403 sec/batch
Epoch: 77/100...  Training Step: 14159...  Training loss: 3.8361...  0.3430 sec/batch
Epoch: 77/100...  Training Step: 14160...  Training lo

Epoch: 78/100...  Training Step: 14245...  Training loss: 3.9282...  0.3434 sec/batch
Epoch: 78/100...  Training Step: 14246...  Training loss: 3.8562...  0.3431 sec/batch
Epoch: 78/100...  Training Step: 14247...  Training loss: 3.9328...  0.3450 sec/batch
Epoch: 78/100...  Training Step: 14248...  Training loss: 3.8813...  0.3424 sec/batch
Epoch: 78/100...  Training Step: 14249...  Training loss: 3.9678...  0.3404 sec/batch
Epoch: 78/100...  Training Step: 14250...  Training loss: 3.9376...  0.3451 sec/batch
Epoch: 78/100...  Training Step: 14251...  Training loss: 3.9354...  0.3454 sec/batch
Epoch: 78/100...  Training Step: 14252...  Training loss: 3.8792...  0.3448 sec/batch
Epoch: 78/100...  Training Step: 14253...  Training loss: 3.9007...  0.3455 sec/batch
Epoch: 78/100...  Training Step: 14254...  Training loss: 3.9571...  0.3452 sec/batch
Epoch: 78/100...  Training Step: 14255...  Training loss: 3.9021...  0.3433 sec/batch
Epoch: 78/100...  Training Step: 14256...  Training lo

Epoch: 78/100...  Training Step: 14341...  Training loss: 3.8605...  0.3433 sec/batch
Epoch: 78/100...  Training Step: 14342...  Training loss: 3.8611...  0.3429 sec/batch
Epoch: 78/100...  Training Step: 14343...  Training loss: 3.8634...  0.3433 sec/batch
Epoch: 78/100...  Training Step: 14344...  Training loss: 3.9213...  0.3437 sec/batch
Epoch: 78/100...  Training Step: 14345...  Training loss: 3.9733...  0.3450 sec/batch
Epoch: 78/100...  Training Step: 14346...  Training loss: 3.9449...  0.3465 sec/batch
Epoch: 78/100...  Training Step: 14347...  Training loss: 3.9509...  0.3445 sec/batch
Epoch: 78/100...  Training Step: 14348...  Training loss: 3.9193...  0.3445 sec/batch
Epoch: 78/100...  Training Step: 14349...  Training loss: 3.8799...  0.3452 sec/batch
Epoch: 78/100...  Training Step: 14350...  Training loss: 3.9167...  0.3385 sec/batch
Epoch: 78/100...  Training Step: 14351...  Training loss: 3.8847...  0.3447 sec/batch
Epoch: 78/100...  Training Step: 14352...  Training lo

Epoch: 79/100...  Training Step: 14437...  Training loss: 3.9361...  0.3439 sec/batch
Epoch: 79/100...  Training Step: 14438...  Training loss: 3.9357...  0.3439 sec/batch
Epoch: 79/100...  Training Step: 14439...  Training loss: 3.9014...  0.3410 sec/batch
Epoch: 79/100...  Training Step: 14440...  Training loss: 3.8259...  0.3463 sec/batch
Epoch: 79/100...  Training Step: 14441...  Training loss: 3.9146...  0.3441 sec/batch
Epoch: 79/100...  Training Step: 14442...  Training loss: 3.9948...  0.3452 sec/batch
Epoch: 79/100...  Training Step: 14443...  Training loss: 3.9159...  0.3430 sec/batch
Epoch: 79/100...  Training Step: 14444...  Training loss: 3.8331...  0.3460 sec/batch
Epoch: 79/100...  Training Step: 14445...  Training loss: 3.8851...  0.3437 sec/batch
Epoch: 79/100...  Training Step: 14446...  Training loss: 3.8910...  0.3418 sec/batch
Epoch: 79/100...  Training Step: 14447...  Training loss: 3.9572...  0.3425 sec/batch
Epoch: 79/100...  Training Step: 14448...  Training lo

Epoch: 79/100...  Training Step: 14533...  Training loss: 3.8734...  0.3454 sec/batch
Epoch: 79/100...  Training Step: 14534...  Training loss: 3.8755...  0.3445 sec/batch
Epoch: 79/100...  Training Step: 14535...  Training loss: 3.8430...  0.3443 sec/batch
Epoch: 79/100...  Training Step: 14536...  Training loss: 3.9356...  0.3426 sec/batch
Epoch: 80/100...  Training Step: 14537...  Training loss: 3.8247...  0.3435 sec/batch
Epoch: 80/100...  Training Step: 14538...  Training loss: 3.6502...  0.3395 sec/batch
Epoch: 80/100...  Training Step: 14539...  Training loss: 3.6735...  0.3439 sec/batch
Epoch: 80/100...  Training Step: 14540...  Training loss: 3.7282...  0.3414 sec/batch
Epoch: 80/100...  Training Step: 14541...  Training loss: 3.7142...  0.3411 sec/batch
Epoch: 80/100...  Training Step: 14542...  Training loss: 3.6767...  0.3457 sec/batch
Epoch: 80/100...  Training Step: 14543...  Training loss: 3.8357...  0.3445 sec/batch
Epoch: 80/100...  Training Step: 14544...  Training lo

Epoch: 80/100...  Training Step: 14629...  Training loss: 3.9032...  0.3410 sec/batch
Epoch: 80/100...  Training Step: 14630...  Training loss: 3.8935...  0.3442 sec/batch
Epoch: 80/100...  Training Step: 14631...  Training loss: 3.9543...  0.3448 sec/batch
Epoch: 80/100...  Training Step: 14632...  Training loss: 3.9754...  0.3452 sec/batch
Epoch: 80/100...  Training Step: 14633...  Training loss: 3.9333...  0.3447 sec/batch
Epoch: 80/100...  Training Step: 14634...  Training loss: 3.8730...  0.3405 sec/batch
Epoch: 80/100...  Training Step: 14635...  Training loss: 3.9001...  0.3407 sec/batch
Epoch: 80/100...  Training Step: 14636...  Training loss: 3.8977...  0.3431 sec/batch
Epoch: 80/100...  Training Step: 14637...  Training loss: 3.9243...  0.3430 sec/batch
Epoch: 80/100...  Training Step: 14638...  Training loss: 3.8541...  0.3427 sec/batch
Epoch: 80/100...  Training Step: 14639...  Training loss: 3.8877...  0.3434 sec/batch
Epoch: 80/100...  Training Step: 14640...  Training lo

Epoch: 81/100...  Training Step: 14725...  Training loss: 3.6727...  0.3426 sec/batch
Epoch: 81/100...  Training Step: 14726...  Training loss: 3.6367...  0.3447 sec/batch
Epoch: 81/100...  Training Step: 14727...  Training loss: 3.8256...  0.3451 sec/batch
Epoch: 81/100...  Training Step: 14728...  Training loss: 3.8641...  0.3419 sec/batch
Epoch: 81/100...  Training Step: 14729...  Training loss: 3.8526...  0.3452 sec/batch
Epoch: 81/100...  Training Step: 14730...  Training loss: 3.8990...  0.3405 sec/batch
Epoch: 81/100...  Training Step: 14731...  Training loss: 3.8311...  0.3437 sec/batch
Epoch: 81/100...  Training Step: 14732...  Training loss: 3.8545...  0.3440 sec/batch
Epoch: 81/100...  Training Step: 14733...  Training loss: 3.8613...  0.3445 sec/batch
Epoch: 81/100...  Training Step: 14734...  Training loss: 3.8153...  0.3415 sec/batch
Epoch: 81/100...  Training Step: 14735...  Training loss: 3.8873...  0.3451 sec/batch
Epoch: 81/100...  Training Step: 14736...  Training lo

Epoch: 81/100...  Training Step: 14821...  Training loss: 3.8878...  0.3458 sec/batch
Epoch: 81/100...  Training Step: 14822...  Training loss: 3.8465...  0.3415 sec/batch
Epoch: 81/100...  Training Step: 14823...  Training loss: 3.8992...  0.3410 sec/batch
Epoch: 81/100...  Training Step: 14824...  Training loss: 3.7508...  0.3435 sec/batch
Epoch: 81/100...  Training Step: 14825...  Training loss: 3.9004...  0.3433 sec/batch
Epoch: 81/100...  Training Step: 14826...  Training loss: 3.9056...  0.3444 sec/batch
Epoch: 81/100...  Training Step: 14827...  Training loss: 3.8252...  0.3416 sec/batch
Epoch: 81/100...  Training Step: 14828...  Training loss: 3.8238...  0.3424 sec/batch
Epoch: 81/100...  Training Step: 14829...  Training loss: 3.8397...  0.3413 sec/batch
Epoch: 81/100...  Training Step: 14830...  Training loss: 3.8881...  0.3405 sec/batch
Epoch: 81/100...  Training Step: 14831...  Training loss: 3.9310...  0.3446 sec/batch
Epoch: 81/100...  Training Step: 14832...  Training lo

Epoch: 82/100...  Training Step: 14917...  Training loss: 3.8149...  0.3421 sec/batch
Epoch: 82/100...  Training Step: 14918...  Training loss: 3.8057...  0.3419 sec/batch
Epoch: 82/100...  Training Step: 14919...  Training loss: 3.8280...  0.3429 sec/batch
Epoch: 82/100...  Training Step: 14920...  Training loss: 3.8223...  0.3404 sec/batch
Epoch: 82/100...  Training Step: 14921...  Training loss: 3.9054...  0.3418 sec/batch
Epoch: 82/100...  Training Step: 14922...  Training loss: 3.8422...  0.3407 sec/batch
Epoch: 82/100...  Training Step: 14923...  Training loss: 3.9091...  0.3432 sec/batch
Epoch: 82/100...  Training Step: 14924...  Training loss: 3.8902...  0.3446 sec/batch
Epoch: 82/100...  Training Step: 14925...  Training loss: 3.8269...  0.3414 sec/batch
Epoch: 82/100...  Training Step: 14926...  Training loss: 3.7939...  0.3414 sec/batch
Epoch: 82/100...  Training Step: 14927...  Training loss: 3.7640...  0.3449 sec/batch
Epoch: 82/100...  Training Step: 14928...  Training lo

Epoch: 82/100...  Training Step: 15013...  Training loss: 3.8475...  0.3413 sec/batch
Epoch: 82/100...  Training Step: 15014...  Training loss: 3.8175...  0.3426 sec/batch
Epoch: 82/100...  Training Step: 15015...  Training loss: 3.9059...  0.3405 sec/batch
Epoch: 82/100...  Training Step: 15016...  Training loss: 3.8398...  0.3446 sec/batch
Epoch: 82/100...  Training Step: 15017...  Training loss: 3.8448...  0.3406 sec/batch
Epoch: 82/100...  Training Step: 15018...  Training loss: 3.7978...  0.3456 sec/batch
Epoch: 82/100...  Training Step: 15019...  Training loss: 3.9221...  0.3424 sec/batch
Epoch: 82/100...  Training Step: 15020...  Training loss: 3.8559...  0.3450 sec/batch
Epoch: 82/100...  Training Step: 15021...  Training loss: 3.8849...  0.3418 sec/batch
Epoch: 82/100...  Training Step: 15022...  Training loss: 3.7936...  0.3410 sec/batch
Epoch: 82/100...  Training Step: 15023...  Training loss: 3.8813...  0.3412 sec/batch
Epoch: 82/100...  Training Step: 15024...  Training lo

Epoch: 83/100...  Training Step: 15109...  Training loss: 3.7989...  0.3422 sec/batch
Epoch: 83/100...  Training Step: 15110...  Training loss: 3.7194...  0.3399 sec/batch
Epoch: 83/100...  Training Step: 15111...  Training loss: 3.7560...  0.3424 sec/batch
Epoch: 83/100...  Training Step: 15112...  Training loss: 3.7790...  0.3402 sec/batch
Epoch: 83/100...  Training Step: 15113...  Training loss: 3.8611...  0.3411 sec/batch
Epoch: 83/100...  Training Step: 15114...  Training loss: 3.8485...  0.3442 sec/batch
Epoch: 83/100...  Training Step: 15115...  Training loss: 3.9059...  0.3436 sec/batch
Epoch: 83/100...  Training Step: 15116...  Training loss: 3.8348...  0.3437 sec/batch
Epoch: 83/100...  Training Step: 15117...  Training loss: 3.8601...  0.3424 sec/batch
Epoch: 83/100...  Training Step: 15118...  Training loss: 3.9149...  0.3397 sec/batch
Epoch: 83/100...  Training Step: 15119...  Training loss: 3.9077...  0.3424 sec/batch
Epoch: 83/100...  Training Step: 15120...  Training lo

Epoch: 83/100...  Training Step: 15205...  Training loss: 3.8906...  0.3424 sec/batch
Epoch: 83/100...  Training Step: 15206...  Training loss: 3.7874...  0.3413 sec/batch
Epoch: 83/100...  Training Step: 15207...  Training loss: 3.8784...  0.3399 sec/batch
Epoch: 83/100...  Training Step: 15208...  Training loss: 3.8434...  0.3408 sec/batch
Epoch: 83/100...  Training Step: 15209...  Training loss: 3.9817...  0.3393 sec/batch
Epoch: 83/100...  Training Step: 15210...  Training loss: 3.9572...  0.3396 sec/batch
Epoch: 83/100...  Training Step: 15211...  Training loss: 3.9562...  0.3410 sec/batch
Epoch: 83/100...  Training Step: 15212...  Training loss: 3.8404...  0.3399 sec/batch
Epoch: 83/100...  Training Step: 15213...  Training loss: 3.8986...  0.3394 sec/batch
Epoch: 83/100...  Training Step: 15214...  Training loss: 3.9256...  0.3416 sec/batch
Epoch: 83/100...  Training Step: 15215...  Training loss: 3.9245...  0.3412 sec/batch
Epoch: 83/100...  Training Step: 15216...  Training lo

Epoch: 84/100...  Training Step: 15301...  Training loss: 3.8113...  0.3412 sec/batch
Epoch: 84/100...  Training Step: 15302...  Training loss: 3.9070...  0.3429 sec/batch
Epoch: 84/100...  Training Step: 15303...  Training loss: 3.8842...  0.3397 sec/batch
Epoch: 84/100...  Training Step: 15304...  Training loss: 3.8426...  0.3409 sec/batch
Epoch: 84/100...  Training Step: 15305...  Training loss: 3.7078...  0.3430 sec/batch
Epoch: 84/100...  Training Step: 15306...  Training loss: 3.7683...  0.3406 sec/batch
Epoch: 84/100...  Training Step: 15307...  Training loss: 3.8659...  0.3393 sec/batch
Epoch: 84/100...  Training Step: 15308...  Training loss: 3.9480...  0.3418 sec/batch
Epoch: 84/100...  Training Step: 15309...  Training loss: 3.9476...  0.3424 sec/batch
Epoch: 84/100...  Training Step: 15310...  Training loss: 3.8872...  0.3386 sec/batch
Epoch: 84/100...  Training Step: 15311...  Training loss: 3.8792...  0.3387 sec/batch
Epoch: 84/100...  Training Step: 15312...  Training lo

Epoch: 84/100...  Training Step: 15397...  Training loss: 3.8986...  0.3405 sec/batch
Epoch: 84/100...  Training Step: 15398...  Training loss: 3.9140...  0.3439 sec/batch
Epoch: 84/100...  Training Step: 15399...  Training loss: 3.8606...  0.3421 sec/batch
Epoch: 84/100...  Training Step: 15400...  Training loss: 3.8873...  0.3430 sec/batch
Epoch: 84/100...  Training Step: 15401...  Training loss: 3.9290...  0.3431 sec/batch
Epoch: 84/100...  Training Step: 15402...  Training loss: 3.9446...  0.3399 sec/batch
Epoch: 84/100...  Training Step: 15403...  Training loss: 3.9660...  0.3409 sec/batch
Epoch: 84/100...  Training Step: 15404...  Training loss: 3.8842...  0.3409 sec/batch
Epoch: 84/100...  Training Step: 15405...  Training loss: 3.9054...  0.3398 sec/batch
Epoch: 84/100...  Training Step: 15406...  Training loss: 3.8347...  0.3403 sec/batch
Epoch: 84/100...  Training Step: 15407...  Training loss: 3.9235...  0.3430 sec/batch
Epoch: 84/100...  Training Step: 15408...  Training lo

Epoch: 85/100...  Training Step: 15493...  Training loss: 3.9420...  0.3410 sec/batch
Epoch: 85/100...  Training Step: 15494...  Training loss: 3.9094...  0.3408 sec/batch
Epoch: 85/100...  Training Step: 15495...  Training loss: 3.9035...  0.3420 sec/batch
Epoch: 85/100...  Training Step: 15496...  Training loss: 3.9758...  0.3429 sec/batch
Epoch: 85/100...  Training Step: 15497...  Training loss: 3.9537...  0.3449 sec/batch
Epoch: 85/100...  Training Step: 15498...  Training loss: 4.0042...  0.3409 sec/batch
Epoch: 85/100...  Training Step: 15499...  Training loss: 4.0540...  0.3414 sec/batch
Epoch: 85/100...  Training Step: 15500...  Training loss: 3.9347...  0.3432 sec/batch
Epoch: 85/100...  Training Step: 15501...  Training loss: 3.9255...  0.3436 sec/batch
Epoch: 85/100...  Training Step: 15502...  Training loss: 3.9178...  0.3403 sec/batch
Epoch: 85/100...  Training Step: 15503...  Training loss: 3.9609...  0.3430 sec/batch
Epoch: 85/100...  Training Step: 15504...  Training lo

Epoch: 85/100...  Training Step: 15589...  Training loss: 3.8841...  0.3386 sec/batch
Epoch: 85/100...  Training Step: 15590...  Training loss: 3.8661...  0.3411 sec/batch
Epoch: 85/100...  Training Step: 15591...  Training loss: 3.8701...  0.3439 sec/batch
Epoch: 85/100...  Training Step: 15592...  Training loss: 3.8461...  0.3393 sec/batch
Epoch: 85/100...  Training Step: 15593...  Training loss: 3.8400...  0.3404 sec/batch
Epoch: 85/100...  Training Step: 15594...  Training loss: 3.8577...  0.3422 sec/batch
Epoch: 85/100...  Training Step: 15595...  Training loss: 3.9005...  0.3429 sec/batch
Epoch: 85/100...  Training Step: 15596...  Training loss: 3.8093...  0.3426 sec/batch
Epoch: 85/100...  Training Step: 15597...  Training loss: 3.8963...  0.3412 sec/batch
Epoch: 85/100...  Training Step: 15598...  Training loss: 3.8966...  0.3393 sec/batch
Epoch: 85/100...  Training Step: 15599...  Training loss: 3.8493...  0.3415 sec/batch
Epoch: 85/100...  Training Step: 15600...  Training lo

Epoch: 86/100...  Training Step: 15685...  Training loss: 3.9260...  0.3435 sec/batch
Epoch: 86/100...  Training Step: 15686...  Training loss: 3.8946...  0.3433 sec/batch
Epoch: 86/100...  Training Step: 15687...  Training loss: 3.9586...  0.3402 sec/batch
Epoch: 86/100...  Training Step: 15688...  Training loss: 3.8652...  0.3401 sec/batch
Epoch: 86/100...  Training Step: 15689...  Training loss: 3.9284...  0.3412 sec/batch
Epoch: 86/100...  Training Step: 15690...  Training loss: 3.9317...  0.3409 sec/batch
Epoch: 86/100...  Training Step: 15691...  Training loss: 3.9810...  0.3398 sec/batch
Epoch: 86/100...  Training Step: 15692...  Training loss: 3.9771...  0.3416 sec/batch
Epoch: 86/100...  Training Step: 15693...  Training loss: 3.9847...  0.3391 sec/batch
Epoch: 86/100...  Training Step: 15694...  Training loss: 3.9472...  0.3420 sec/batch
Epoch: 86/100...  Training Step: 15695...  Training loss: 3.9244...  0.3392 sec/batch
Epoch: 86/100...  Training Step: 15696...  Training lo

Epoch: 86/100...  Training Step: 15781...  Training loss: 3.8555...  0.3391 sec/batch
Epoch: 86/100...  Training Step: 15782...  Training loss: 3.8729...  0.3441 sec/batch
Epoch: 86/100...  Training Step: 15783...  Training loss: 3.8167...  0.3400 sec/batch
Epoch: 86/100...  Training Step: 15784...  Training loss: 3.7934...  0.3411 sec/batch
Epoch: 86/100...  Training Step: 15785...  Training loss: 3.8601...  0.3435 sec/batch
Epoch: 86/100...  Training Step: 15786...  Training loss: 3.8689...  0.3389 sec/batch
Epoch: 86/100...  Training Step: 15787...  Training loss: 3.8450...  0.3407 sec/batch
Epoch: 86/100...  Training Step: 15788...  Training loss: 3.8487...  0.3433 sec/batch
Epoch: 86/100...  Training Step: 15789...  Training loss: 3.8608...  0.3426 sec/batch
Epoch: 86/100...  Training Step: 15790...  Training loss: 3.7670...  0.3424 sec/batch
Epoch: 86/100...  Training Step: 15791...  Training loss: 3.8124...  0.3448 sec/batch
Epoch: 86/100...  Training Step: 15792...  Training lo

Epoch: 87/100...  Training Step: 15877...  Training loss: 3.9398...  0.3450 sec/batch
Epoch: 87/100...  Training Step: 15878...  Training loss: 3.9222...  0.3418 sec/batch
Epoch: 87/100...  Training Step: 15879...  Training loss: 3.9249...  0.3431 sec/batch
Epoch: 87/100...  Training Step: 15880...  Training loss: 3.8835...  0.3420 sec/batch
Epoch: 87/100...  Training Step: 15881...  Training loss: 3.8403...  0.3453 sec/batch
Epoch: 87/100...  Training Step: 15882...  Training loss: 3.9013...  0.3430 sec/batch
Epoch: 87/100...  Training Step: 15883...  Training loss: 3.7835...  0.3443 sec/batch
Epoch: 87/100...  Training Step: 15884...  Training loss: 3.7817...  0.3458 sec/batch
Epoch: 87/100...  Training Step: 15885...  Training loss: 3.8446...  0.3409 sec/batch
Epoch: 87/100...  Training Step: 15886...  Training loss: 3.8934...  0.3428 sec/batch
Epoch: 87/100...  Training Step: 15887...  Training loss: 3.8627...  0.3437 sec/batch
Epoch: 87/100...  Training Step: 15888...  Training lo

Epoch: 87/100...  Training Step: 15973...  Training loss: 3.8672...  0.3406 sec/batch
Epoch: 87/100...  Training Step: 15974...  Training loss: 3.7728...  0.3406 sec/batch
Epoch: 87/100...  Training Step: 15975...  Training loss: 3.8106...  0.3408 sec/batch
Epoch: 87/100...  Training Step: 15976...  Training loss: 3.8191...  0.3430 sec/batch
Epoch: 87/100...  Training Step: 15977...  Training loss: 3.8253...  0.3435 sec/batch
Epoch: 87/100...  Training Step: 15978...  Training loss: 3.8476...  0.3402 sec/batch
Epoch: 87/100...  Training Step: 15979...  Training loss: 3.8672...  0.3401 sec/batch
Epoch: 87/100...  Training Step: 15980...  Training loss: 3.8916...  0.3396 sec/batch
Epoch: 87/100...  Training Step: 15981...  Training loss: 3.8579...  0.3437 sec/batch
Epoch: 87/100...  Training Step: 15982...  Training loss: 3.9246...  0.3405 sec/batch
Epoch: 87/100...  Training Step: 15983...  Training loss: 3.7520...  0.3414 sec/batch
Epoch: 87/100...  Training Step: 15984...  Training lo

Epoch: 88/100...  Training Step: 16069...  Training loss: 3.8068...  0.3437 sec/batch
Epoch: 88/100...  Training Step: 16070...  Training loss: 3.8263...  0.3434 sec/batch
Epoch: 88/100...  Training Step: 16071...  Training loss: 3.8465...  0.3427 sec/batch
Epoch: 88/100...  Training Step: 16072...  Training loss: 3.8968...  0.3435 sec/batch
Epoch: 88/100...  Training Step: 16073...  Training loss: 3.8035...  0.3422 sec/batch
Epoch: 88/100...  Training Step: 16074...  Training loss: 3.8413...  0.3412 sec/batch
Epoch: 88/100...  Training Step: 16075...  Training loss: 3.8131...  0.3439 sec/batch
Epoch: 88/100...  Training Step: 16076...  Training loss: 3.7740...  0.3442 sec/batch
Epoch: 88/100...  Training Step: 16077...  Training loss: 3.8107...  0.3449 sec/batch
Epoch: 88/100...  Training Step: 16078...  Training loss: 3.8656...  0.3452 sec/batch
Epoch: 88/100...  Training Step: 16079...  Training loss: 3.7923...  0.3394 sec/batch
Epoch: 88/100...  Training Step: 16080...  Training lo

Epoch: 88/100...  Training Step: 16165...  Training loss: 3.8580...  0.3466 sec/batch
Epoch: 88/100...  Training Step: 16166...  Training loss: 3.9009...  0.3444 sec/batch
Epoch: 88/100...  Training Step: 16167...  Training loss: 3.7459...  0.3449 sec/batch
Epoch: 88/100...  Training Step: 16168...  Training loss: 3.8280...  0.3444 sec/batch
Epoch: 88/100...  Training Step: 16169...  Training loss: 3.8178...  0.3447 sec/batch
Epoch: 88/100...  Training Step: 16170...  Training loss: 3.8540...  0.3416 sec/batch
Epoch: 88/100...  Training Step: 16171...  Training loss: 3.7590...  0.3447 sec/batch
Epoch: 88/100...  Training Step: 16172...  Training loss: 3.7245...  0.3409 sec/batch
Epoch: 88/100...  Training Step: 16173...  Training loss: 3.8225...  0.3455 sec/batch
Epoch: 88/100...  Training Step: 16174...  Training loss: 3.7601...  0.3427 sec/batch
Epoch: 88/100...  Training Step: 16175...  Training loss: 3.7660...  0.3460 sec/batch
Epoch: 88/100...  Training Step: 16176...  Training lo

Epoch: 89/100...  Training Step: 16261...  Training loss: 3.8286...  0.3438 sec/batch
Epoch: 89/100...  Training Step: 16262...  Training loss: 3.8561...  0.3416 sec/batch
Epoch: 89/100...  Training Step: 16263...  Training loss: 3.7897...  0.3445 sec/batch
Epoch: 89/100...  Training Step: 16264...  Training loss: 3.7267...  0.3404 sec/batch
Epoch: 89/100...  Training Step: 16265...  Training loss: 3.7593...  0.3459 sec/batch
Epoch: 89/100...  Training Step: 16266...  Training loss: 3.7607...  0.3436 sec/batch
Epoch: 89/100...  Training Step: 16267...  Training loss: 3.7907...  0.3458 sec/batch
Epoch: 89/100...  Training Step: 16268...  Training loss: 3.7752...  0.3422 sec/batch
Epoch: 89/100...  Training Step: 16269...  Training loss: 3.8074...  0.3424 sec/batch
Epoch: 89/100...  Training Step: 16270...  Training loss: 3.7349...  0.3442 sec/batch
Epoch: 89/100...  Training Step: 16271...  Training loss: 3.8194...  0.3437 sec/batch
Epoch: 89/100...  Training Step: 16272...  Training lo

Epoch: 89/100...  Training Step: 16357...  Training loss: 3.8171...  0.3444 sec/batch
Epoch: 89/100...  Training Step: 16358...  Training loss: 3.7562...  0.3414 sec/batch
Epoch: 89/100...  Training Step: 16359...  Training loss: 3.7257...  0.3442 sec/batch
Epoch: 89/100...  Training Step: 16360...  Training loss: 3.7824...  0.3437 sec/batch
Epoch: 89/100...  Training Step: 16361...  Training loss: 3.8739...  0.3440 sec/batch
Epoch: 89/100...  Training Step: 16362...  Training loss: 3.8601...  0.3454 sec/batch
Epoch: 89/100...  Training Step: 16363...  Training loss: 3.8138...  0.3429 sec/batch
Epoch: 89/100...  Training Step: 16364...  Training loss: 3.8005...  0.3453 sec/batch
Epoch: 89/100...  Training Step: 16365...  Training loss: 3.7004...  0.3467 sec/batch
Epoch: 89/100...  Training Step: 16366...  Training loss: 3.7351...  0.3434 sec/batch
Epoch: 89/100...  Training Step: 16367...  Training loss: 3.6751...  0.3444 sec/batch
Epoch: 89/100...  Training Step: 16368...  Training lo

Epoch: 90/100...  Training Step: 16453...  Training loss: 3.7883...  0.3435 sec/batch
Epoch: 90/100...  Training Step: 16454...  Training loss: 3.7358...  0.3439 sec/batch
Epoch: 90/100...  Training Step: 16455...  Training loss: 3.8028...  0.3423 sec/batch
Epoch: 90/100...  Training Step: 16456...  Training loss: 3.7332...  0.3406 sec/batch
Epoch: 90/100...  Training Step: 16457...  Training loss: 3.8118...  0.3412 sec/batch
Epoch: 90/100...  Training Step: 16458...  Training loss: 3.7929...  0.3422 sec/batch
Epoch: 90/100...  Training Step: 16459...  Training loss: 3.7962...  0.3437 sec/batch
Epoch: 90/100...  Training Step: 16460...  Training loss: 3.7724...  0.3409 sec/batch
Epoch: 90/100...  Training Step: 16461...  Training loss: 3.7765...  0.3429 sec/batch
Epoch: 90/100...  Training Step: 16462...  Training loss: 3.8195...  0.3400 sec/batch
Epoch: 90/100...  Training Step: 16463...  Training loss: 3.7640...  0.3417 sec/batch
Epoch: 90/100...  Training Step: 16464...  Training lo

Epoch: 90/100...  Training Step: 16549...  Training loss: 3.7298...  0.3441 sec/batch
Epoch: 90/100...  Training Step: 16550...  Training loss: 3.7217...  0.3443 sec/batch
Epoch: 90/100...  Training Step: 16551...  Training loss: 3.6844...  0.3439 sec/batch
Epoch: 90/100...  Training Step: 16552...  Training loss: 3.7753...  0.3430 sec/batch
Epoch: 90/100...  Training Step: 16553...  Training loss: 3.8238...  0.3441 sec/batch
Epoch: 90/100...  Training Step: 16554...  Training loss: 3.7652...  0.3444 sec/batch
Epoch: 90/100...  Training Step: 16555...  Training loss: 3.7790...  0.3412 sec/batch
Epoch: 90/100...  Training Step: 16556...  Training loss: 3.7432...  0.3452 sec/batch
Epoch: 90/100...  Training Step: 16557...  Training loss: 3.6970...  0.3446 sec/batch
Epoch: 90/100...  Training Step: 16558...  Training loss: 3.7229...  0.3451 sec/batch
Epoch: 90/100...  Training Step: 16559...  Training loss: 3.7117...  0.3407 sec/batch
Epoch: 90/100...  Training Step: 16560...  Training lo

Epoch: 91/100...  Training Step: 16645...  Training loss: 3.7959...  0.3439 sec/batch
Epoch: 91/100...  Training Step: 16646...  Training loss: 3.8286...  0.3432 sec/batch
Epoch: 91/100...  Training Step: 16647...  Training loss: 3.7484...  0.3411 sec/batch
Epoch: 91/100...  Training Step: 16648...  Training loss: 3.7173...  0.3451 sec/batch
Epoch: 91/100...  Training Step: 16649...  Training loss: 3.8242...  0.3415 sec/batch
Epoch: 91/100...  Training Step: 16650...  Training loss: 3.8876...  0.3422 sec/batch
Epoch: 91/100...  Training Step: 16651...  Training loss: 3.8176...  0.3456 sec/batch
Epoch: 91/100...  Training Step: 16652...  Training loss: 3.7427...  0.3464 sec/batch
Epoch: 91/100...  Training Step: 16653...  Training loss: 3.7619...  0.3459 sec/batch
Epoch: 91/100...  Training Step: 16654...  Training loss: 3.7461...  0.3457 sec/batch
Epoch: 91/100...  Training Step: 16655...  Training loss: 3.8240...  0.3423 sec/batch
Epoch: 91/100...  Training Step: 16656...  Training lo

Epoch: 91/100...  Training Step: 16741...  Training loss: 3.6722...  0.3430 sec/batch
Epoch: 91/100...  Training Step: 16742...  Training loss: 3.7417...  0.3443 sec/batch
Epoch: 91/100...  Training Step: 16743...  Training loss: 3.7461...  0.3448 sec/batch
Epoch: 91/100...  Training Step: 16744...  Training loss: 3.7555...  0.3435 sec/batch
Epoch: 92/100...  Training Step: 16745...  Training loss: 3.6931...  0.3435 sec/batch
Epoch: 92/100...  Training Step: 16746...  Training loss: 3.4992...  0.3419 sec/batch
Epoch: 92/100...  Training Step: 16747...  Training loss: 3.4792...  0.3446 sec/batch
Epoch: 92/100...  Training Step: 16748...  Training loss: 3.5894...  0.3421 sec/batch
Epoch: 92/100...  Training Step: 16749...  Training loss: 3.5582...  0.3406 sec/batch
Epoch: 92/100...  Training Step: 16750...  Training loss: 3.5222...  0.3414 sec/batch
Epoch: 92/100...  Training Step: 16751...  Training loss: 3.6806...  0.3451 sec/batch
Epoch: 92/100...  Training Step: 16752...  Training lo

Epoch: 92/100...  Training Step: 16837...  Training loss: 3.7155...  0.3423 sec/batch
Epoch: 92/100...  Training Step: 16838...  Training loss: 3.7238...  0.3422 sec/batch
Epoch: 92/100...  Training Step: 16839...  Training loss: 3.8239...  0.3418 sec/batch
Epoch: 92/100...  Training Step: 16840...  Training loss: 3.8122...  0.3424 sec/batch
Epoch: 92/100...  Training Step: 16841...  Training loss: 3.7528...  0.3426 sec/batch
Epoch: 92/100...  Training Step: 16842...  Training loss: 3.7476...  0.3446 sec/batch
Epoch: 92/100...  Training Step: 16843...  Training loss: 3.7520...  0.3450 sec/batch
Epoch: 92/100...  Training Step: 16844...  Training loss: 3.7944...  0.3429 sec/batch
Epoch: 92/100...  Training Step: 16845...  Training loss: 3.8248...  0.3440 sec/batch
Epoch: 92/100...  Training Step: 16846...  Training loss: 3.7343...  0.3443 sec/batch
Epoch: 92/100...  Training Step: 16847...  Training loss: 3.7570...  0.3451 sec/batch
Epoch: 92/100...  Training Step: 16848...  Training lo

Epoch: 93/100...  Training Step: 16933...  Training loss: 3.5473...  0.3449 sec/batch
Epoch: 93/100...  Training Step: 16934...  Training loss: 3.5608...  0.3413 sec/batch
Epoch: 93/100...  Training Step: 16935...  Training loss: 3.6903...  0.3398 sec/batch
Epoch: 93/100...  Training Step: 16936...  Training loss: 3.7312...  0.3414 sec/batch
Epoch: 93/100...  Training Step: 16937...  Training loss: 3.6833...  0.3414 sec/batch
Epoch: 93/100...  Training Step: 16938...  Training loss: 3.7386...  0.3444 sec/batch
Epoch: 93/100...  Training Step: 16939...  Training loss: 3.6558...  0.3417 sec/batch
Epoch: 93/100...  Training Step: 16940...  Training loss: 3.6564...  0.3417 sec/batch
Epoch: 93/100...  Training Step: 16941...  Training loss: 3.6840...  0.3426 sec/batch
Epoch: 93/100...  Training Step: 16942...  Training loss: 3.6176...  0.3449 sec/batch
Epoch: 93/100...  Training Step: 16943...  Training loss: 3.6702...  0.3446 sec/batch
Epoch: 93/100...  Training Step: 16944...  Training lo

Epoch: 93/100...  Training Step: 17029...  Training loss: 3.7871...  0.3408 sec/batch
Epoch: 93/100...  Training Step: 17030...  Training loss: 3.7099...  0.3434 sec/batch
Epoch: 93/100...  Training Step: 17031...  Training loss: 3.7434...  0.3441 sec/batch
Epoch: 93/100...  Training Step: 17032...  Training loss: 3.6445...  0.3421 sec/batch
Epoch: 93/100...  Training Step: 17033...  Training loss: 3.7677...  0.3425 sec/batch
Epoch: 93/100...  Training Step: 17034...  Training loss: 3.7338...  0.3448 sec/batch
Epoch: 93/100...  Training Step: 17035...  Training loss: 3.7018...  0.3413 sec/batch
Epoch: 93/100...  Training Step: 17036...  Training loss: 3.7331...  0.3411 sec/batch
Epoch: 93/100...  Training Step: 17037...  Training loss: 3.7235...  0.3423 sec/batch
Epoch: 93/100...  Training Step: 17038...  Training loss: 3.7131...  0.3428 sec/batch
Epoch: 93/100...  Training Step: 17039...  Training loss: 3.7879...  0.3433 sec/batch
Epoch: 93/100...  Training Step: 17040...  Training lo

Epoch: 94/100...  Training Step: 17125...  Training loss: 3.6767...  0.3408 sec/batch
Epoch: 94/100...  Training Step: 17126...  Training loss: 3.6344...  0.3434 sec/batch
Epoch: 94/100...  Training Step: 17127...  Training loss: 3.6663...  0.3428 sec/batch
Epoch: 94/100...  Training Step: 17128...  Training loss: 3.6630...  0.3410 sec/batch
Epoch: 94/100...  Training Step: 17129...  Training loss: 3.7158...  0.3447 sec/batch
Epoch: 94/100...  Training Step: 17130...  Training loss: 3.6759...  0.3397 sec/batch
Epoch: 94/100...  Training Step: 17131...  Training loss: 3.7172...  0.3426 sec/batch
Epoch: 94/100...  Training Step: 17132...  Training loss: 3.7226...  0.3423 sec/batch
Epoch: 94/100...  Training Step: 17133...  Training loss: 3.6721...  0.3416 sec/batch
Epoch: 94/100...  Training Step: 17134...  Training loss: 3.5830...  0.3397 sec/batch
Epoch: 94/100...  Training Step: 17135...  Training loss: 3.6288...  0.3431 sec/batch
Epoch: 94/100...  Training Step: 17136...  Training lo

Epoch: 94/100...  Training Step: 17221...  Training loss: 3.7215...  0.3431 sec/batch
Epoch: 94/100...  Training Step: 17222...  Training loss: 3.7196...  0.3399 sec/batch
Epoch: 94/100...  Training Step: 17223...  Training loss: 3.7958...  0.3430 sec/batch
Epoch: 94/100...  Training Step: 17224...  Training loss: 3.7121...  0.3433 sec/batch
Epoch: 94/100...  Training Step: 17225...  Training loss: 3.7096...  0.3387 sec/batch
Epoch: 94/100...  Training Step: 17226...  Training loss: 3.7031...  0.3402 sec/batch
Epoch: 94/100...  Training Step: 17227...  Training loss: 3.8021...  0.3385 sec/batch
Epoch: 94/100...  Training Step: 17228...  Training loss: 3.7195...  0.3422 sec/batch
Epoch: 94/100...  Training Step: 17229...  Training loss: 3.7386...  0.3388 sec/batch
Epoch: 94/100...  Training Step: 17230...  Training loss: 3.6838...  0.3442 sec/batch
Epoch: 94/100...  Training Step: 17231...  Training loss: 3.7950...  0.3390 sec/batch
Epoch: 94/100...  Training Step: 17232...  Training lo

Epoch: 95/100...  Training Step: 17317...  Training loss: 3.6843...  0.3435 sec/batch
Epoch: 95/100...  Training Step: 17318...  Training loss: 3.5463...  0.3435 sec/batch
Epoch: 95/100...  Training Step: 17319...  Training loss: 3.6122...  0.3428 sec/batch
Epoch: 95/100...  Training Step: 17320...  Training loss: 3.6297...  0.3387 sec/batch
Epoch: 95/100...  Training Step: 17321...  Training loss: 3.7227...  0.3411 sec/batch
Epoch: 95/100...  Training Step: 17322...  Training loss: 3.7100...  0.3409 sec/batch
Epoch: 95/100...  Training Step: 17323...  Training loss: 3.7475...  0.3421 sec/batch
Epoch: 95/100...  Training Step: 17324...  Training loss: 3.6888...  0.3416 sec/batch
Epoch: 95/100...  Training Step: 17325...  Training loss: 3.7667...  0.3403 sec/batch
Epoch: 95/100...  Training Step: 17326...  Training loss: 3.8079...  0.3410 sec/batch
Epoch: 95/100...  Training Step: 17327...  Training loss: 3.7805...  0.3427 sec/batch
Epoch: 95/100...  Training Step: 17328...  Training lo

Epoch: 95/100...  Training Step: 17413...  Training loss: 3.7252...  0.3439 sec/batch
Epoch: 95/100...  Training Step: 17414...  Training loss: 3.6921...  0.3404 sec/batch
Epoch: 95/100...  Training Step: 17415...  Training loss: 3.7683...  0.3400 sec/batch
Epoch: 95/100...  Training Step: 17416...  Training loss: 3.7311...  0.3391 sec/batch
Epoch: 95/100...  Training Step: 17417...  Training loss: 3.8784...  0.3410 sec/batch
Epoch: 95/100...  Training Step: 17418...  Training loss: 3.8445...  0.3409 sec/batch
Epoch: 95/100...  Training Step: 17419...  Training loss: 3.8288...  0.3399 sec/batch
Epoch: 95/100...  Training Step: 17420...  Training loss: 3.7001...  0.3413 sec/batch
Epoch: 95/100...  Training Step: 17421...  Training loss: 3.7815...  0.3388 sec/batch
Epoch: 95/100...  Training Step: 17422...  Training loss: 3.7981...  0.3425 sec/batch
Epoch: 95/100...  Training Step: 17423...  Training loss: 3.7440...  0.3402 sec/batch
Epoch: 95/100...  Training Step: 17424...  Training lo

Epoch: 96/100...  Training Step: 17509...  Training loss: 3.7229...  0.3418 sec/batch
Epoch: 96/100...  Training Step: 17510...  Training loss: 3.8073...  0.3428 sec/batch
Epoch: 96/100...  Training Step: 17511...  Training loss: 3.7782...  0.3407 sec/batch
Epoch: 96/100...  Training Step: 17512...  Training loss: 3.7112...  0.3392 sec/batch
Epoch: 96/100...  Training Step: 17513...  Training loss: 3.6237...  0.3405 sec/batch
Epoch: 96/100...  Training Step: 17514...  Training loss: 3.6425...  0.3404 sec/batch
Epoch: 96/100...  Training Step: 17515...  Training loss: 3.7060...  0.3413 sec/batch
Epoch: 96/100...  Training Step: 17516...  Training loss: 3.7635...  0.3381 sec/batch
Epoch: 96/100...  Training Step: 17517...  Training loss: 3.8009...  0.3421 sec/batch
Epoch: 96/100...  Training Step: 17518...  Training loss: 3.7329...  0.3389 sec/batch
Epoch: 96/100...  Training Step: 17519...  Training loss: 3.7431...  0.3374 sec/batch
Epoch: 96/100...  Training Step: 17520...  Training lo

Epoch: 96/100...  Training Step: 17605...  Training loss: 3.7696...  0.3391 sec/batch
Epoch: 96/100...  Training Step: 17606...  Training loss: 3.7745...  0.3404 sec/batch
Epoch: 96/100...  Training Step: 17607...  Training loss: 3.7588...  0.3382 sec/batch
Epoch: 96/100...  Training Step: 17608...  Training loss: 3.7598...  0.3392 sec/batch
Epoch: 96/100...  Training Step: 17609...  Training loss: 3.8134...  0.3417 sec/batch
Epoch: 96/100...  Training Step: 17610...  Training loss: 3.8349...  0.3397 sec/batch
Epoch: 96/100...  Training Step: 17611...  Training loss: 3.8383...  0.3400 sec/batch
Epoch: 96/100...  Training Step: 17612...  Training loss: 3.7533...  0.3399 sec/batch
Epoch: 96/100...  Training Step: 17613...  Training loss: 3.8149...  0.3387 sec/batch
Epoch: 96/100...  Training Step: 17614...  Training loss: 3.7814...  0.3370 sec/batch
Epoch: 96/100...  Training Step: 17615...  Training loss: 3.8110...  0.3402 sec/batch
Epoch: 96/100...  Training Step: 17616...  Training lo

Epoch: 97/100...  Training Step: 17701...  Training loss: 3.7498...  0.3383 sec/batch
Epoch: 97/100...  Training Step: 17702...  Training loss: 3.6879...  0.3380 sec/batch
Epoch: 97/100...  Training Step: 17703...  Training loss: 3.7581...  0.3381 sec/batch
Epoch: 97/100...  Training Step: 17704...  Training loss: 3.8156...  0.3400 sec/batch
Epoch: 97/100...  Training Step: 17705...  Training loss: 3.7653...  0.3388 sec/batch
Epoch: 97/100...  Training Step: 17706...  Training loss: 3.8191...  0.3386 sec/batch
Epoch: 97/100...  Training Step: 17707...  Training loss: 3.8914...  0.3420 sec/batch
Epoch: 97/100...  Training Step: 17708...  Training loss: 3.7755...  0.3401 sec/batch
Epoch: 97/100...  Training Step: 17709...  Training loss: 3.7402...  0.3422 sec/batch
Epoch: 97/100...  Training Step: 17710...  Training loss: 3.7234...  0.3379 sec/batch
Epoch: 97/100...  Training Step: 17711...  Training loss: 3.8163...  0.3377 sec/batch
Epoch: 97/100...  Training Step: 17712...  Training lo

Epoch: 97/100...  Training Step: 17797...  Training loss: 3.8237...  0.3421 sec/batch
Epoch: 97/100...  Training Step: 17798...  Training loss: 3.7802...  0.3449 sec/batch
Epoch: 97/100...  Training Step: 17799...  Training loss: 3.7926...  0.3411 sec/batch
Epoch: 97/100...  Training Step: 17800...  Training loss: 3.7686...  0.3399 sec/batch
Epoch: 97/100...  Training Step: 17801...  Training loss: 3.7103...  0.3394 sec/batch
Epoch: 97/100...  Training Step: 17802...  Training loss: 3.7698...  0.3444 sec/batch
Epoch: 97/100...  Training Step: 17803...  Training loss: 3.8014...  0.3431 sec/batch
Epoch: 97/100...  Training Step: 17804...  Training loss: 3.6373...  0.3405 sec/batch
Epoch: 97/100...  Training Step: 17805...  Training loss: 3.7668...  0.3443 sec/batch
Epoch: 97/100...  Training Step: 17806...  Training loss: 3.7433...  0.3438 sec/batch
Epoch: 97/100...  Training Step: 17807...  Training loss: 3.6997...  0.3402 sec/batch
Epoch: 97/100...  Training Step: 17808...  Training lo

Epoch: 98/100...  Training Step: 17893...  Training loss: 3.7567...  0.3396 sec/batch
Epoch: 98/100...  Training Step: 17894...  Training loss: 3.7308...  0.3407 sec/batch
Epoch: 98/100...  Training Step: 17895...  Training loss: 3.7531...  0.3421 sec/batch
Epoch: 98/100...  Training Step: 17896...  Training loss: 3.7251...  0.3443 sec/batch
Epoch: 98/100...  Training Step: 17897...  Training loss: 3.7271...  0.3410 sec/batch
Epoch: 98/100...  Training Step: 17898...  Training loss: 3.7601...  0.3426 sec/batch
Epoch: 98/100...  Training Step: 17899...  Training loss: 3.7989...  0.3412 sec/batch
Epoch: 98/100...  Training Step: 17900...  Training loss: 3.8447...  0.3400 sec/batch
Epoch: 98/100...  Training Step: 17901...  Training loss: 3.8359...  0.3409 sec/batch
Epoch: 98/100...  Training Step: 17902...  Training loss: 3.8195...  0.3395 sec/batch
Epoch: 98/100...  Training Step: 17903...  Training loss: 3.8333...  0.3434 sec/batch
Epoch: 98/100...  Training Step: 17904...  Training lo

Epoch: 98/100...  Training Step: 17989...  Training loss: 3.7370...  0.3434 sec/batch
Epoch: 98/100...  Training Step: 17990...  Training loss: 3.7761...  0.3428 sec/batch
Epoch: 98/100...  Training Step: 17991...  Training loss: 3.7144...  0.3416 sec/batch
Epoch: 98/100...  Training Step: 17992...  Training loss: 3.6896...  0.3393 sec/batch
Epoch: 98/100...  Training Step: 17993...  Training loss: 3.7509...  0.3424 sec/batch
Epoch: 98/100...  Training Step: 17994...  Training loss: 3.7124...  0.3398 sec/batch
Epoch: 98/100...  Training Step: 17995...  Training loss: 3.7414...  0.3433 sec/batch
Epoch: 98/100...  Training Step: 17996...  Training loss: 3.7591...  0.3403 sec/batch
Epoch: 98/100...  Training Step: 17997...  Training loss: 3.7742...  0.3413 sec/batch
Epoch: 98/100...  Training Step: 17998...  Training loss: 3.6792...  0.3406 sec/batch
Epoch: 98/100...  Training Step: 17999...  Training loss: 3.7239...  0.3407 sec/batch
Epoch: 98/100...  Training Step: 18000...  Training lo

Epoch: 99/100...  Training Step: 18085...  Training loss: 3.8065...  0.3414 sec/batch
Epoch: 99/100...  Training Step: 18086...  Training loss: 3.7839...  0.3407 sec/batch
Epoch: 99/100...  Training Step: 18087...  Training loss: 3.7979...  0.3452 sec/batch
Epoch: 99/100...  Training Step: 18088...  Training loss: 3.8141...  0.3449 sec/batch
Epoch: 99/100...  Training Step: 18089...  Training loss: 3.7589...  0.3431 sec/batch
Epoch: 99/100...  Training Step: 18090...  Training loss: 3.7936...  0.3456 sec/batch
Epoch: 99/100...  Training Step: 18091...  Training loss: 3.7061...  0.3449 sec/batch
Epoch: 99/100...  Training Step: 18092...  Training loss: 3.6718...  0.3412 sec/batch
Epoch: 99/100...  Training Step: 18093...  Training loss: 3.7408...  0.3415 sec/batch
Epoch: 99/100...  Training Step: 18094...  Training loss: 3.7496...  0.3431 sec/batch
Epoch: 99/100...  Training Step: 18095...  Training loss: 3.7367...  0.3459 sec/batch
Epoch: 99/100...  Training Step: 18096...  Training lo

Epoch: 99/100...  Training Step: 18181...  Training loss: 3.7486...  0.3426 sec/batch
Epoch: 99/100...  Training Step: 18182...  Training loss: 3.6699...  0.3440 sec/batch
Epoch: 99/100...  Training Step: 18183...  Training loss: 3.7264...  0.3464 sec/batch
Epoch: 99/100...  Training Step: 18184...  Training loss: 3.7236...  0.3404 sec/batch
Epoch: 99/100...  Training Step: 18185...  Training loss: 3.7107...  0.3450 sec/batch
Epoch: 99/100...  Training Step: 18186...  Training loss: 3.7634...  0.3408 sec/batch
Epoch: 99/100...  Training Step: 18187...  Training loss: 3.7602...  0.3416 sec/batch
Epoch: 99/100...  Training Step: 18188...  Training loss: 3.7945...  0.3422 sec/batch
Epoch: 99/100...  Training Step: 18189...  Training loss: 3.7239...  0.3428 sec/batch
Epoch: 99/100...  Training Step: 18190...  Training loss: 3.7957...  0.3417 sec/batch
Epoch: 99/100...  Training Step: 18191...  Training loss: 3.6429...  0.3413 sec/batch
Epoch: 99/100...  Training Step: 18192...  Training lo

Epoch: 100/100...  Training Step: 18276...  Training loss: 3.6642...  0.3417 sec/batch
Epoch: 100/100...  Training Step: 18277...  Training loss: 3.7261...  0.3444 sec/batch
Epoch: 100/100...  Training Step: 18278...  Training loss: 3.7540...  0.3428 sec/batch
Epoch: 100/100...  Training Step: 18279...  Training loss: 3.7448...  0.3388 sec/batch
Epoch: 100/100...  Training Step: 18280...  Training loss: 3.7821...  0.3402 sec/batch
Epoch: 100/100...  Training Step: 18281...  Training loss: 3.7097...  0.3436 sec/batch
Epoch: 100/100...  Training Step: 18282...  Training loss: 3.7069...  0.3445 sec/batch
Epoch: 100/100...  Training Step: 18283...  Training loss: 3.6887...  0.3404 sec/batch
Epoch: 100/100...  Training Step: 18284...  Training loss: 3.6678...  0.3409 sec/batch
Epoch: 100/100...  Training Step: 18285...  Training loss: 3.7282...  0.3417 sec/batch
Epoch: 100/100...  Training Step: 18286...  Training loss: 3.7640...  0.3389 sec/batch
Epoch: 100/100...  Training Step: 18287... 

Epoch: 100/100...  Training Step: 18371...  Training loss: 3.7679...  0.3412 sec/batch
Epoch: 100/100...  Training Step: 18372...  Training loss: 3.8026...  0.3447 sec/batch
Epoch: 100/100...  Training Step: 18373...  Training loss: 3.7478...  0.3416 sec/batch
Epoch: 100/100...  Training Step: 18374...  Training loss: 3.7725...  0.3403 sec/batch
Epoch: 100/100...  Training Step: 18375...  Training loss: 3.6509...  0.3402 sec/batch
Epoch: 100/100...  Training Step: 18376...  Training loss: 3.7214...  0.3400 sec/batch
Epoch: 100/100...  Training Step: 18377...  Training loss: 3.7156...  0.3400 sec/batch
Epoch: 100/100...  Training Step: 18378...  Training loss: 3.7085...  0.3438 sec/batch
Epoch: 100/100...  Training Step: 18379...  Training loss: 3.6613...  0.3434 sec/batch
Epoch: 100/100...  Training Step: 18380...  Training loss: 3.6372...  0.3448 sec/batch
Epoch: 100/100...  Training Step: 18381...  Training loss: 3.6983...  0.3447 sec/batch
Epoch: 100/100...  Training Step: 18382... 

然后我们可以查看在 checkpoints 中所有保存的模型

In [27]:
tf.train.get_checkpoint_state('checkpoints')

model_checkpoint_path: "checkpoints/i18400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i3000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i4000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i5000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i6000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i7000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i8000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i9000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i10000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i11000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i12000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/checkpoints/i13000_l512.ckpt"
all_model_checkpoint_paths: "che

在训练完成之后, 可以从 checkpoints 中恢复模型, 然后我们再给网络输入一个字符, 再让 CharRNN 不断生成新的字符, 也就是让神经网络"写诗"

为了让输出的字符更加丰富随意, 这里我们从模型输出的概率向量中随机选取前几个中的一个作为最后的输出

In [28]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c

现在是时候看看我们训练的神经网络能够写出什么样的诗了, 这里封装成了一个 sample 函数, 需要完成下面的 #todo 部分

注意, 我们在用 CharRNN 写诗的时候, 需要先用输入的字符对网络进行"预热", 这个过程我们不采用网络输出字符的结果, 但是可以得到一个更好的状态, 然后就可以用这个状态作为后续生成新字符的初始状态, 从而获得更好的效果

In [29]:
def sample(checkpoint, n_samples, rnn_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    
    # 构造CharRNN模型
    # 注意, 这里我们处于测试状态, batch_size 和 n_step 都应为1
    model = CharRNN(convert.vocab_size, rnn_size=rnn_size, sampling=True)
    
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)

        #todo:
        # 得到初始 state
        new_state = sess.run(model.initial_state)
        
        # 这一步我们先将输入的几个字符对网络进行"预热", 
        # 这样可以得到更好的 state
        for c in prime:
            x = np.zeros((1, 1))
            x[0, 0] = convert.word_to_int(c)
            
            #todo:
            # 像之前一样设定feed_dict
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            
            #todo:
            # 得到概率输出以及当前输出状态
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, convert.vocab_size)
        samples.append(convert.int_to_word(c))

        for i in range(n_samples):
            x[0,0] = c
            
            #todo:
            # 像之前一样设定feed_dict
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            
            #todo:
            # 得到概率输出以及当前输出状态
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, convert.vocab_size)
            samples.append(convert.int_to_word(c))
        
    return ''.join(samples)

看一下训练时长最长的模型的效果

In [30]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, convert.vocab_size, rnn_size, convert.vocab_size, prime="天青色等烟雨")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/i18400_l512.ckpt
天青色等烟雨无看年夜  在  何人有路心  不为学天人  故国上云人  凭山宋叶衣  危葭落夕云  庭河上望云  吾思寄何游  话  长人入上声  谁看上塞人  空看楚塞秋月风  莫叹如兹时  知君亦为襟  大禽飞水下  兹尺学金饰  素  天山在为心  萧洲逼蜀时  花山动白楼  春风落落风  江山上雾新霭风  如知圣柄明  当朝帝帝骢  明时意时行  何夜上云尘  愿言命上来  故门多此人  寒门满暮楼  更来天月深  吾君载古心  自知千涧云   翩盖降车列  共受丰中年  金旆在文台  徒此圣花开  圣光方大回  金飞织管空        风  秋  春风一不心  一怜支署微  还来思落山  山泉生竹生  寒吟向夏门  孤风隔海桥  无看入雪山  何堪白里中  谢君策水深  萧桡思钓然  登余入海庭  江流隔汉城  几朝杨桂尘  裴条有此情  空看映蕙文  佳传奉彩文  六媒四表书  清飘玉帐舒  含门意应知  何能素妾情  妾声看芳叶  相对花光清月同  妾妾折罗台月    光妾娇  上袖纤妆叶  拂舞含光扇  虚乐表秦恩  仙  天 何日上中开  清首拂花声  新门绝火风  无逢老此风  君言黄士来  因朝杨月来  关林月月寒  还因比化衣  金驾大川城上人  东山在水楼  春风雁路深  江峰洒洞衣  楚人过顶稀  茅冰落晚寒山心  一念在中情  相言白道时  青袍入石篱  终来得古年  眼冰桃玉冰空     风山映尘发  王条意幽曲月香  红粉拂飞衣  乍彩湿花飞  无  何去是无人  一窥鹓塞尘  平山不在时为风  早  明日复如里  自言尘贵玉不风  春看发花人  一为赴塞人  烟河落树斜  孤馆信无行  天枫北里来  行来发水楼  春风吹不行  一从出上中  谁知车下春    □风逐天发  因当二宰人  生精礼圣知  朝天渭水来  凭将逐镜台  花看映翠风  风风散彩罗  六士敌鱼时  吾君重恶簪  圣然游钓邑人归  不著入荆枝  结听黄风襟  周符神礼馀  天晨意中新  香  何风不如归  清看缑殿衣  江山有望飞山山  一是雪花时  莫是见君人  江看西远城  东城游雁归  故日戎铜衣人   风得映兰萼

看一下刚刚开始训练时模型的效果

In [32]:
checkpoint = 'checkpoints/checkpoints/i1000_l512.ckpt'
samp = sample(checkpoint, convert.vocab_size, rnn_size, convert.vocab_size, prime="天青色等烟雨")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/checkpoints/i1000_l512.ckpt
天青色等烟雨人山人  何是自中中  不山不上时  今风自自春  何风入自人  今  山人一自风       人山入自人      风风入人时  今人一路风                   风风有自时                山年有自人  不风不不风  何山一不春  不风入上中    何风一相人  山日不人归  不山不上人  风风见不时                        风人入不中  今  山人入相时  何日一人中    不年入自中                        春山不不风        山人入相中  何日不相春  一风有不春  山风一不人  何日不山中    何年一山人  何  不人入自人  山日有相人  何水入山归  山  山  春  何风有不春                     风山不自中                                                  风山入路人  今人有上中  今人不路人  今人不上中  何风不不人  山山有自人  何风不不风  今山有水人  不人不水春    何日不山中          春人不上人  何日自相时  何  山风入不稀  不风不不风  一山无水生  今人不上风                       山人不不时  何风有自春  风人不自人  何人一自中  不  山风入人时  山人见自稀  何  何人不水人  风  何人一路归                                  风年不不风  不年入上人  不人入路中      何人一不时       风年入上春    风风有不春无山上  不年一山中  不人不不人                                 山山不不人  何山不路时       人山在上人                                风山有路年  不山有水中  风年入上人       山生一不时  今风入水中                                                         

In [33]:
checkpoint = 'checkpoints/checkpoints/i5000_l512.ckpt'
samp = sample(checkpoint, convert.vocab_size, rnn_size, convert.vocab_size, prime="天青色等烟雨")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/checkpoints/i5000_l512.ckpt
天青色等烟雨  花歌玉月开  金山玉日筵    不中不天水  天山不自中  天人何日归  风云出不人  何山日月人  何人日上人  山山入不人  何人日水中  长来日日衣  何来日不中  长风出不来  何人出上归  云来不上心  无山无日心  无风日水归  风山日月归不不里      风山日风    青人风    山人不水衣  天来不不中  何山日水人  风云日上中  山来不上归  山风不不中  风人有水归  风风不未深  谁君何上心  不君无不来  长人望上来  山来有月人  空来入月人  长山入上来  云山有上中  无山日日来  山山日日归  空云入上流  长风不未中  山山无里人  风来不月归  风云有不中  何人不月心  还风不未人  不人一月来  长云日不中  何风日上来  山人一上中  长人入水来  山云入不中  山来入上人  长风一水归何中去  月  风风一云树  山山何不人  何人有水归  山山不水中  何风入水中  空山不不中  长山入不时  何人日不人  何来出月人  风云日不来  山人望上归  山山不水来  云云玉上来  山山入不深  何山日水人  长人有日来  何来入月来  空云日日归  山人不日深  山山何里心  何山日上深  无人无月心  空山落月流  风山日月人  山来入水深  长人何月来  何来不未深  江风一未来  江风不上心  相山日不深  山人有月归  空山日上来  风人日日中  长来不上深  山山无上来  何来落水来  山来不未归  长人入不人  江人不不人  不风有水来  山来不上归  山山落月归  风风不上深  长人不上来  风山出水人  云风日路人  山云落上开  江人不日深  山人一水深  长山有里人  江山不不深  山来不不归  长来不不归  何年日不春  高山有不时  不山有水来  山山入月归  山风不不春  风人不不春  风风不水人  风人日月来  长人入月人  空云不日归  山人不不人  山来不未人  何来何月来  何人不北人  何来出不人  风人不不春  风来不水人  风风日未中  山风日上来  云风有月人  云来不上春  风来入不时  何来不不来

看看模型的中间结果

In [34]:
checkpoint = 'checkpoints/checkpoints/i10000_l512.ckpt'
samp = sample(checkpoint, convert.vocab_size, rnn_size, convert.vocab_size, prime="天青色等烟雨")
print(samp)

INFO:tensorflow:Restoring parameters from checkpoints/checkpoints/i10000_l512.ckpt
天青色等烟雨  含花意月流  青山日素生龙归  不与山山心  云日见花深  江色宿人间    弱马白云存  一事有人衣  白日出山山  野发见山人  野  何此风已流  不为无此归  春云见路深  白山山水间山云  不在不沾城  山人日云路人人  独里夜云人  不雪暮风人  相知白上深  山门秋水心不人  不风风水归  江云发路稀  孤风日水深  月山山路间  山山入木中    风  何日不云山  江城雨雨深  山山见不人  江山过客深  高山独暮山  江风不客归有花暮人山  不月有关归  山日意何人  山花月叶清  寒声入客人  故人思路稀  春云日雨归  江山草水稀  江花草雪云有花路人长  江事有山秋  云雨入云流  相君无不人    风君此不云  寒山日木中  江花云雪长长山  一日风山山  不与不门城  月月过知过人人  山月夜人城  云月落花水未人  风风积竹山    鸟  不来不未见  山波不不时  孤风不自人    云风何未时  风山花水花  野家山故生  何思有山归无山雨人人  一月有人深  江思人不春  江风落月中    马  白日有山见  归思思不时  野风吹白移  不知云远间  春风月水间  山帆水木人  江山见一年  风河白木深  江风积竹流    水  不君思不时  离风此见时  山风风海人  江来日路人  江城日海山  山云白路深  白风春上山  云风入草深  江山天水来   山上上人人  山风水海云  江人见帝臣  相时白路时  风花落照声  莫君秋草间  东山秋草间  东江秋自人  山花出水流    风山白不死  孤公就白行人归  云草出山城  野日过山下自归  寒山入色来   风见有无时  幽上独东山  风望一风深  何有整门中  素烛带新阴自山  山为白沈间  一月白山时  高风积水低   山雪竹新衣  山风竹木流  山山水照新山云月  不暮空不发  自日惜麟心  野  山云见不春  山光落水中  江山水水深  白心不不归有人  风花白紫香  高条芳节稀  山花带月时  江云白雨阴  禅门不水人  月人寒上来云山  自人秋路时  江山过雪船  风云雪上山  有来寒上

可以看到, 随着模型不断训练, 得到的语句越来越丰富, 越来越完整, 也就是说效果越来越好. 但是本质上这还是一个概率模型, 