# Character-level language models

This tutorial shows how to train a character-level language model with a multilayer recurrent neural network. In particular, we will train a multilayer LSTM network that is able to generate text.



## Import necessary packages

In [1]:
import os
import urllib
import zipfile

## Prepare data
We first open the target file and show the first few characters.

In [2]:
# target_file = './data/shediao.txt'
target_file = './data/shediao_11_17.txt'

In [3]:
with open(target_file, 'r') as f:
    print(f.read()[0:1000])


　　射雕英雄传

　　

　　第十一回长春服输

　　沙通天见师弟危殆，跃起急格，挡开了梅超风这一抓，两人手腕相交，都感臂酸心惊。这时左边嗤嗤连声，彭连虎的连珠钱镖也已袭到。梅超风顺手把侯通海身子往钱镖上掷去，“啊啾一声大叫，侯通海身上中镖。黄蓉百忙中叫道：“三头蛟，恭喜发财，得了这么多铜钱１沙通天见这一掷势道十分劲急，师弟撞到地下，必受重伤，倏地飞身过去，伸掌在他腰间向上一托。侯通海犹如纸鹞般飞了起来，待得再行落地，那已是自然之势，他一身武功，这般摔一交便毫不相干。只不过左手给这般势道甩了起来，挥拳打出，手臂长短恰到好处，又是重重的打在三个肉瘤之上。
　　梅超风掷人、沙通天救师弟，都只是眨眼间之事，侯通海肉瘤上刚刚中拳，彭连虎的钱镖又已陆续向梅超风打到，同时欧


Then we define a few utility functions to pre-process the dataset.

In [4]:
word_per_line = 64

def read_content_seperate(path):
    max_word_number = word_per_line
    result_string = ''
    with open(path) as ins:   
        for line in ins:
            temp_string = line.decode('utf-8')
            if len(temp_string) < max_word_number:
                result_string = result_string + '\n' + temp_string
            else:
                segment_number = len(temp_string) // max_word_number
                for i in range(segment_number):
                    result_string = result_string + '\n' + temp_string[i*max_word_number: (i+1)*max_word_number]
                # Append whatever remains after the last full segment.
                result_string = result_string + '\n' + temp_string[segment_number*max_word_number:]
                
    return result_string

def read_content_whole(path):
    with open(path) as ins:        
        return ins.read().decode('utf-8')
    
def read_content(path):
    return read_content_seperate(path)
        
        
# Return a dict that maps each character to a unique integer id
def build_vocab(path):
    content = list(read_content(path))
    idx = 1 # 0 is left for zero-padding
    the_vocab = {}
    for word in content:
        if len(word) == 0:
            continue
        if word not in the_vocab:
            the_vocab[word] = idx
            idx += 1
    return the_vocab

# Encode a sentence with int ids
def text2id(sentence, the_vocab):
    words = list(sentence)
    return [the_vocab[w] for w in words if len(w) > 0]
            
# Build the character vocabulary from the input file
vocab = build_vocab(target_file)
print('vocab size = %d' %(len(vocab)))

vocab size = 3042
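
To make the preprocessing concrete, here is a minimal sketch on an in-memory string rather than the novel text. The helper names `split_line`, `build_toy_vocab`, and `encode` are hypothetical and written for Python 3; `split_line` is a simplified variant of the segmenting logic above, not the notebook's exact function.

```python
# Toy illustration of the preprocessing steps (hypothetical helper names).

def split_line(line, width):
    """Split one line into consecutive segments of at most `width` characters."""
    return [line[i:i + width] for i in range(0, len(line), width)]

def build_toy_vocab(text):
    """Map each distinct character to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for ch in text:
        if ch not in vocab:
            vocab[ch] = len(vocab) + 1
    return vocab

def encode(text, vocab):
    """Encode a string as a list of character ids."""
    return [vocab[ch] for ch in text]

sample = "hello world"
segments = split_line(sample, 4)      # short segments stand in for the 64-char lines
toy_vocab = build_toy_vocab(sample)
ids = encode("hello", toy_vocab)

print(segments)
print(toy_vocab)
print(ids)
```

Ids start at 1 so that 0 stays free for zero-padding, mirroring the `idx = 1` choice in `build_vocab`.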


## Create LSTM Model

Now we create a multilayer LSTM model. The LSTM cell is defined in [lstm.py](https://github.com/dmlc/mxnet-notebooks/blob/master/python/tutorials/lstm.py).

In [5]:
import lstm
# Each line contains at most 65 chars (word_per_line + 1).
seq_len = word_per_line+1
# embedding dimension, which maps a character to a 256-dimension vector
num_embed = 256
# number of lstm layers
num_lstm_layer = 3
# number of hidden units in each LSTM cell
num_hidden = 512

symbol = lstm.lstm_unroll(
    num_lstm_layer, 
    seq_len,
    len(vocab) + 1,
    num_hidden=num_hidden,
    num_embed=num_embed,
    num_label=len(vocab) + 1, 
    dropout=0.2)
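
As a rough sanity check on the size of this configuration, the sketch below estimates the number of trainable parameters. It assumes each LSTM layer consists of one input-to-hidden and one hidden-to-hidden fully connected layer (weights plus bias), each producing the four gate pre-activations, and that the embedding has no bias. These are assumptions about `lstm.py`, not something this tutorial states, and the function names are hypothetical.

```python
# Back-of-the-envelope parameter count under the assumptions stated above.

def lstm_layer_params(input_size, hidden_size):
    """Parameters of one LSTM layer: i2h and h2h projections to 4 gates."""
    i2h = 4 * hidden_size * input_size + 4 * hidden_size
    h2h = 4 * hidden_size * hidden_size + 4 * hidden_size
    return i2h + h2h

def model_params(vocab_size, num_embed, num_hidden, num_layers):
    """Embedding + stacked LSTM layers + output projection."""
    embed = vocab_size * num_embed
    lstm = 0
    for layer in range(num_layers):
        # The first layer reads embeddings; later layers read hidden states.
        input_size = num_embed if layer == 0 else num_hidden
        lstm += lstm_layer_params(input_size, num_hidden)
    output = num_hidden * vocab_size + vocab_size
    return {"embed": embed, "lstm": lstm, "output": output,
            "total": embed + lstm + output}

# 3042 characters plus the padding id, as in len(vocab) + 1 above.
counts = model_params(vocab_size=3042 + 1, num_embed=256,
                      num_hidden=512, num_layers=3)
print(counts)
```

Under these assumptions the stacked LSTM layers account for most of the parameters, with the embedding and output projection contributing the rest because of the large character vocabulary.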


## Train

First, we create a data iterator.

In [6]:
import bucket_io

# The batch size for training
batch_size = 16

# initialize states for LSTM
init_c = [('l%d_init_c'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
init_h = [('l%d_init_h'%l, (batch_size, num_hidden)) for l in range(num_lstm_layer)]
init_states = init_c + init_h

# Even though BucketSentenceIter supports variable-length examples,
# we simply use a single fixed-length bucket here
data_train = bucket_io.BucketSentenceIter(
    target_file, 
    vocab, 
    [seq_len], 
    batch_size,             
    init_states, 
    seperate_char='\n',
    text2id=text2id, 
    read_content=read_content)

bucket of len  65 : 2924 samples
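
The sketch below illustrates what a fixed-length iterator of this kind typically produces for one line. It assumes that short sequences are padded with id 0 and that the label at each step is the next character of the input; this is an assumption about the behavior of `bucket_io`, and the helper names are hypothetical.

```python
# Illustration of fixed-length padding and next-character labels
# (assumed behavior, hypothetical helper names).

def pad_to_length(ids, seq_len, pad_id=0):
    """Right-pad a list of character ids to exactly seq_len entries."""
    if len(ids) > seq_len:
        raise ValueError("sequence longer than bucket length")
    return ids + [pad_id] * (seq_len - len(ids))

def next_char_labels(padded, pad_id=0):
    """Label at step t is the input at step t+1; the last label is padding."""
    return padded[1:] + [pad_id]

data = pad_to_length([5, 9, 2], seq_len=6)
label = next_char_labels(data)

# Number of full batches per epoch for the sample count reported above.
full_batches = 2924 // 16

print(data)
print(label)
print(full_batches)
```

With 2924 samples and a batch size of 16 there are 182 full batches, which is consistent with the training log reporting up to batch 180 at 20-batch intervals.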


Then we can train with the standard `model.fit` approach.

In [7]:
# @@@ AUTOTEST_OUTPUT_IGNORED_CELL
import mxnet as mx
import numpy as np
import logging
logging.getLogger().setLevel(logging.DEBUG)

# Train for 300 epochs here. For a quick demo, set this to 1
num_epoch = 300
# learning rate 
learning_rate = 0.01

# checkpoint_path = './checkpoint/shediao'
checkpoint_path = './checkpoint/shediao_part'

# Evaluation metric
def Perplexity(label, pred):
    loss = 0.
    for i in range(pred.shape[0]):
        loss += -np.log(max(1e-10, pred[i][int(label[i])]))
    return np.exp(loss / label.size)

model = mx.mod.Module(symbol=symbol,
                      data_names=[x[0] for x in data_train.provide_data],
                      label_names=[y[0] for y in data_train.provide_label],
                      context=[mx.gpu(0)])

model.fit(train_data=data_train,
          num_epoch=num_epoch,
          optimizer='sgd',
          optimizer_params={'learning_rate':learning_rate, 'momentum':0, 'wd':0.0001},
          initializer=mx.init.Xavier(factor_type="in", magnitude=2.34),
          eval_metric=mx.metric.np(Perplexity),
          batch_end_callback=mx.callback.Speedometer(batch_size, 20),
          epoch_end_callback=mx.callback.do_checkpoint(checkpoint_path, 5))

INFO:root:Epoch[0] Batch [20]	Speed: 135.29 samples/sec	Perplexity=2162.170146
INFO:root:Epoch[0] Batch [40]	Speed: 134.40 samples/sec	Perplexity=1668.860761
INFO:root:Epoch[0] Batch [60]	Speed: 137.85 samples/sec	Perplexity=1200.667986
INFO:root:Epoch[0] Batch [80]	Speed: 135.70 samples/sec	Perplexity=4337.388763
INFO:root:Epoch[0] Batch [100]	Speed: 135.89 samples/sec	Perplexity=799.584588
INFO:root:Epoch[0] Batch [120]	Speed: 138.65 samples/sec	Perplexity=562.060644
INFO:root:Epoch[0] Batch [140]	Speed: 134.80 samples/sec	Perplexity=465.964565
INFO:root:Epoch[0] Batch [160]	Speed: 132.73 samples/sec	Perplexity=428.486783
INFO:root:Epoch[0] Batch [180]	Speed: 134.85 samples/sec	Perplexity=450.312711
INFO:root:Epoch[0] Train-Perplexity=334.233429
INFO:root:Epoch[0] Time cost=22.128
INFO:root:Epoch[1] Batch [20]	Speed: 135.99 samples/sec	Perplexity=399.842340
INFO:root:Epoch[1] Batch [40]	Speed: 137.06 samples/sec	Perplexity=422.363518
INFO:root:Epoch[1] Batch [60]	Speed: 137.60 sample

INFO:root:Epoch[10] Batch [60]	Speed: 133.13 samples/sec	Perplexity=218.815480
INFO:root:Epoch[10] Batch [80]	Speed: 136.05 samples/sec	Perplexity=251.876378
INFO:root:Epoch[10] Batch [100]	Speed: 136.69 samples/sec	Perplexity=251.621474
INFO:root:Epoch[10] Batch [120]	Speed: 137.73 samples/sec	Perplexity=255.080627
INFO:root:Epoch[10] Batch [140]	Speed: 133.92 samples/sec	Perplexity=229.731658
INFO:root:Epoch[10] Batch [160]	Speed: 135.48 samples/sec	Perplexity=215.455716
INFO:root:Epoch[10] Batch [180]	Speed: 136.81 samples/sec	Perplexity=226.247079
INFO:root:Epoch[10] Train-Perplexity=154.687689
INFO:root:Epoch[10] Time cost=21.475
INFO:root:Epoch[11] Batch [20]	Speed: 137.12 samples/sec	Perplexity=210.673870
INFO:root:Epoch[11] Batch [40]	Speed: 137.28 samples/sec	Perplexity=226.551023
INFO:root:Epoch[11] Batch [60]	Speed: 135.50 samples/sec	Perplexity=211.265385
INFO:root:Epoch[11] Batch [80]	Speed: 132.47 samples/sec	Perplexity=240.708036
INFO:root:Epoch[11] Batch [100]	Speed: 13

INFO:root:Epoch[20] Batch [80]	Speed: 133.32 samples/sec	Perplexity=179.299526
INFO:root:Epoch[20] Batch [100]	Speed: 134.15 samples/sec	Perplexity=178.451834
INFO:root:Epoch[20] Batch [120]	Speed: 135.51 samples/sec	Perplexity=180.834568
INFO:root:Epoch[20] Batch [140]	Speed: 135.00 samples/sec	Perplexity=166.328884
INFO:root:Epoch[20] Batch [160]	Speed: 137.53 samples/sec	Perplexity=156.872062
INFO:root:Epoch[20] Batch [180]	Speed: 138.06 samples/sec	Perplexity=160.015455
INFO:root:Epoch[20] Train-Perplexity=107.894673
INFO:root:Epoch[20] Time cost=21.548
INFO:root:Epoch[21] Batch [20]	Speed: 136.49 samples/sec	Perplexity=152.507144
INFO:root:Epoch[21] Batch [40]	Speed: 136.86 samples/sec	Perplexity=160.988265
INFO:root:Epoch[21] Batch [60]	Speed: 135.66 samples/sec	Perplexity=150.412693
INFO:root:Epoch[21] Batch [80]	Speed: 136.23 samples/sec	Perplexity=174.137713
INFO:root:Epoch[21] Batch [100]	Speed: 135.69 samples/sec	Perplexity=174.111817
INFO:root:Epoch[21] Batch [120]	Speed: 1

INFO:root:Epoch[30] Batch [100]	Speed: 136.78 samples/sec	Perplexity=128.271057
INFO:root:Epoch[30] Batch [120]	Speed: 137.74 samples/sec	Perplexity=131.654851
INFO:root:Epoch[30] Batch [140]	Speed: 134.46 samples/sec	Perplexity=122.318546
INFO:root:Epoch[30] Batch [160]	Speed: 136.11 samples/sec	Perplexity=113.625406
INFO:root:Epoch[30] Batch [180]	Speed: 131.71 samples/sec	Perplexity=116.066102
INFO:root:Epoch[30] Train-Perplexity=78.935585
INFO:root:Epoch[30] Time cost=21.461
INFO:root:Epoch[31] Batch [20]	Speed: 136.08 samples/sec	Perplexity=112.754378
INFO:root:Epoch[31] Batch [40]	Speed: 134.59 samples/sec	Perplexity=118.964622
INFO:root:Epoch[31] Batch [60]	Speed: 137.09 samples/sec	Perplexity=110.879722
INFO:root:Epoch[31] Batch [80]	Speed: 135.13 samples/sec	Perplexity=124.908171
INFO:root:Epoch[31] Batch [100]	Speed: 136.76 samples/sec	Perplexity=124.656820
INFO:root:Epoch[31] Batch [120]	Speed: 138.10 samples/sec	Perplexity=127.585769
INFO:root:Epoch[31] Batch [140]	Speed: 1

INFO:root:Epoch[40] Batch [140]	Speed: 135.91 samples/sec	Perplexity=90.756650
INFO:root:Epoch[40] Batch [160]	Speed: 138.19 samples/sec	Perplexity=85.590317
INFO:root:Epoch[40] Batch [180]	Speed: 136.78 samples/sec	Perplexity=87.029403
INFO:root:Epoch[40] Train-Perplexity=61.319505
INFO:root:Epoch[40] Time cost=21.358
INFO:root:Epoch[41] Batch [20]	Speed: 136.35 samples/sec	Perplexity=84.506714
INFO:root:Epoch[41] Batch [40]	Speed: 137.33 samples/sec	Perplexity=89.883357
INFO:root:Epoch[41] Batch [60]	Speed: 136.73 samples/sec	Perplexity=83.915104
INFO:root:Epoch[41] Batch [80]	Speed: 134.99 samples/sec	Perplexity=94.572494
INFO:root:Epoch[41] Batch [100]	Speed: 137.20 samples/sec	Perplexity=94.076939
INFO:root:Epoch[41] Batch [120]	Speed: 138.24 samples/sec	Perplexity=95.434353
INFO:root:Epoch[41] Batch [140]	Speed: 136.08 samples/sec	Perplexity=87.917393
INFO:root:Epoch[41] Batch [160]	Speed: 136.81 samples/sec	Perplexity=83.028108
INFO:root:Epoch[41] Batch [180]	Speed: 135.47 sampl

INFO:root:Epoch[50] Batch [180]	Speed: 136.43 samples/sec	Perplexity=67.060269
INFO:root:Epoch[50] Train-Perplexity=48.746382
INFO:root:Epoch[50] Time cost=21.467
INFO:root:Epoch[51] Batch [20]	Speed: 136.19 samples/sec	Perplexity=65.977634
INFO:root:Epoch[51] Batch [40]	Speed: 135.62 samples/sec	Perplexity=69.446734
INFO:root:Epoch[51] Batch [60]	Speed: 136.36 samples/sec	Perplexity=66.452547
INFO:root:Epoch[51] Batch [80]	Speed: 135.53 samples/sec	Perplexity=73.496914
INFO:root:Epoch[51] Batch [100]	Speed: 133.20 samples/sec	Perplexity=73.029767
INFO:root:Epoch[51] Batch [120]	Speed: 136.00 samples/sec	Perplexity=72.832739
INFO:root:Epoch[51] Batch [140]	Speed: 133.88 samples/sec	Perplexity=69.138803
INFO:root:Epoch[51] Batch [160]	Speed: 136.55 samples/sec	Perplexity=64.722169
INFO:root:Epoch[51] Batch [180]	Speed: 137.77 samples/sec	Perplexity=66.311251
INFO:root:Epoch[51] Train-Perplexity=48.108898
INFO:root:Epoch[51] Time cost=21.498
INFO:root:Epoch[52] Batch [20]	Speed: 137.47 s

INFO:root:Epoch[61] Batch [20]	Speed: 135.32 samples/sec	Perplexity=52.483622
INFO:root:Epoch[61] Batch [40]	Speed: 136.87 samples/sec	Perplexity=55.250102
INFO:root:Epoch[61] Batch [60]	Speed: 136.43 samples/sec	Perplexity=53.311304
INFO:root:Epoch[61] Batch [80]	Speed: 137.13 samples/sec	Perplexity=58.201427
INFO:root:Epoch[61] Batch [100]	Speed: 134.59 samples/sec	Perplexity=58.224816
INFO:root:Epoch[61] Batch [120]	Speed: 134.08 samples/sec	Perplexity=59.103233
INFO:root:Epoch[61] Batch [140]	Speed: 136.53 samples/sec	Perplexity=54.333209
INFO:root:Epoch[61] Batch [160]	Speed: 136.32 samples/sec	Perplexity=51.449643
INFO:root:Epoch[61] Batch [180]	Speed: 137.03 samples/sec	Perplexity=53.148337
INFO:root:Epoch[61] Train-Perplexity=38.337717
INFO:root:Epoch[61] Time cost=21.457
INFO:root:Epoch[62] Batch [20]	Speed: 134.98 samples/sec	Perplexity=52.453541
INFO:root:Epoch[62] Batch [40]	Speed: 138.08 samples/sec	Perplexity=54.160767
INFO:root:Epoch[62] Batch [60]	Speed: 136.87 samples/

INFO:root:Epoch[71] Batch [60]	Speed: 136.16 samples/sec	Perplexity=42.954017
INFO:root:Epoch[71] Batch [80]	Speed: 136.06 samples/sec	Perplexity=47.411885
INFO:root:Epoch[71] Batch [100]	Speed: 136.34 samples/sec	Perplexity=47.019448
INFO:root:Epoch[71] Batch [120]	Speed: 136.74 samples/sec	Perplexity=47.710027
INFO:root:Epoch[71] Batch [140]	Speed: 135.48 samples/sec	Perplexity=43.669224
INFO:root:Epoch[71] Batch [160]	Speed: 137.03 samples/sec	Perplexity=41.302453
INFO:root:Epoch[71] Batch [180]	Speed: 135.04 samples/sec	Perplexity=43.742522
INFO:root:Epoch[71] Train-Perplexity=33.159523
INFO:root:Epoch[71] Time cost=21.502
INFO:root:Epoch[72] Batch [20]	Speed: 136.78 samples/sec	Perplexity=41.908800
INFO:root:Epoch[72] Batch [40]	Speed: 137.65 samples/sec	Perplexity=43.930617
INFO:root:Epoch[72] Batch [60]	Speed: 137.45 samples/sec	Perplexity=42.058278
INFO:root:Epoch[72] Batch [80]	Speed: 134.82 samples/sec	Perplexity=46.723153
INFO:root:Epoch[72] Batch [100]	Speed: 134.46 samples

INFO:root:Epoch[81] Batch [100]	Speed: 137.68 samples/sec	Perplexity=36.718012
INFO:root:Epoch[81] Batch [120]	Speed: 136.62 samples/sec	Perplexity=37.200072
INFO:root:Epoch[81] Batch [140]	Speed: 137.41 samples/sec	Perplexity=35.551412
INFO:root:Epoch[81] Batch [160]	Speed: 135.88 samples/sec	Perplexity=32.981812
INFO:root:Epoch[81] Batch [180]	Speed: 136.60 samples/sec	Perplexity=34.646065
INFO:root:Epoch[81] Train-Perplexity=27.956404
INFO:root:Epoch[81] Time cost=21.364
INFO:root:Epoch[82] Batch [20]	Speed: 135.38 samples/sec	Perplexity=33.780513
INFO:root:Epoch[82] Batch [40]	Speed: 135.36 samples/sec	Perplexity=34.948005
INFO:root:Epoch[82] Batch [60]	Speed: 135.91 samples/sec	Perplexity=33.936469
INFO:root:Epoch[82] Batch [80]	Speed: 138.64 samples/sec	Perplexity=37.240224
INFO:root:Epoch[82] Batch [100]	Speed: 134.91 samples/sec	Perplexity=35.950547
INFO:root:Epoch[82] Batch [120]	Speed: 137.92 samples/sec	Perplexity=37.330525
INFO:root:Epoch[82] Batch [140]	Speed: 135.87 sampl

INFO:root:Epoch[91] Batch [140]	Speed: 136.90 samples/sec	Perplexity=28.876351
INFO:root:Epoch[91] Batch [160]	Speed: 137.70 samples/sec	Perplexity=26.628516
INFO:root:Epoch[91] Batch [180]	Speed: 137.10 samples/sec	Perplexity=27.855009
INFO:root:Epoch[91] Train-Perplexity=22.472272
INFO:root:Epoch[91] Time cost=21.501
INFO:root:Epoch[92] Batch [20]	Speed: 134.16 samples/sec	Perplexity=27.094261
INFO:root:Epoch[92] Batch [40]	Speed: 137.84 samples/sec	Perplexity=28.047034
INFO:root:Epoch[92] Batch [60]	Speed: 134.77 samples/sec	Perplexity=27.333281
INFO:root:Epoch[92] Batch [80]	Speed: 133.86 samples/sec	Perplexity=29.735255
INFO:root:Epoch[92] Batch [100]	Speed: 136.04 samples/sec	Perplexity=28.948967
INFO:root:Epoch[92] Batch [120]	Speed: 136.38 samples/sec	Perplexity=29.825768
INFO:root:Epoch[92] Batch [140]	Speed: 136.48 samples/sec	Perplexity=28.360247
INFO:root:Epoch[92] Batch [160]	Speed: 135.46 samples/sec	Perplexity=25.836858
INFO:root:Epoch[92] Batch [180]	Speed: 135.30 sampl

INFO:root:Epoch[101] Batch [180]	Speed: 134.74 samples/sec	Perplexity=23.105190
INFO:root:Epoch[101] Train-Perplexity=19.018007
INFO:root:Epoch[101] Time cost=21.415
INFO:root:Epoch[102] Batch [20]	Speed: 135.30 samples/sec	Perplexity=22.358735
INFO:root:Epoch[102] Batch [40]	Speed: 137.50 samples/sec	Perplexity=23.261236
INFO:root:Epoch[102] Batch [60]	Speed: 136.92 samples/sec	Perplexity=22.666531
INFO:root:Epoch[102] Batch [80]	Speed: 136.16 samples/sec	Perplexity=23.993995
INFO:root:Epoch[102] Batch [100]	Speed: 136.28 samples/sec	Perplexity=23.481525
INFO:root:Epoch[102] Batch [120]	Speed: 136.53 samples/sec	Perplexity=23.985352
INFO:root:Epoch[102] Batch [140]	Speed: 135.38 samples/sec	Perplexity=22.854045
INFO:root:Epoch[102] Batch [160]	Speed: 115.17 samples/sec	Perplexity=21.191865
INFO:root:Epoch[102] Batch [180]	Speed: 107.64 samples/sec	Perplexity=22.449671
INFO:root:Epoch[102] Train-Perplexity=19.011853
INFO:root:Epoch[102] Time cost=22.519
INFO:root:Epoch[103] Batch [20]	

INFO:root:Epoch[111] Train-Perplexity=16.537019
INFO:root:Epoch[111] Time cost=21.418
INFO:root:Epoch[112] Batch [20]	Speed: 135.64 samples/sec	Perplexity=18.086249
INFO:root:Epoch[112] Batch [40]	Speed: 136.50 samples/sec	Perplexity=18.884656
INFO:root:Epoch[112] Batch [60]	Speed: 136.28 samples/sec	Perplexity=18.505704
INFO:root:Epoch[112] Batch [80]	Speed: 136.33 samples/sec	Perplexity=19.405193
INFO:root:Epoch[112] Batch [100]	Speed: 135.47 samples/sec	Perplexity=19.221456
INFO:root:Epoch[112] Batch [120]	Speed: 136.32 samples/sec	Perplexity=19.273862
INFO:root:Epoch[112] Batch [140]	Speed: 135.18 samples/sec	Perplexity=19.095537
INFO:root:Epoch[112] Batch [160]	Speed: 137.15 samples/sec	Perplexity=17.426532
INFO:root:Epoch[112] Batch [180]	Speed: 134.41 samples/sec	Perplexity=18.210732
INFO:root:Epoch[112] Train-Perplexity=16.596179
INFO:root:Epoch[112] Time cost=21.471
INFO:root:Epoch[113] Batch [20]	Speed: 134.71 samples/sec	Perplexity=17.938694
INFO:root:Epoch[113] Batch [40]	S

INFO:root:Epoch[121] Time cost=21.458
INFO:root:Epoch[122] Batch [20]	Speed: 133.82 samples/sec	Perplexity=15.438426
INFO:root:Epoch[122] Batch [40]	Speed: 137.72 samples/sec	Perplexity=15.587961
INFO:root:Epoch[122] Batch [60]	Speed: 136.52 samples/sec	Perplexity=15.445463
INFO:root:Epoch[122] Batch [80]	Speed: 135.36 samples/sec	Perplexity=16.735428
INFO:root:Epoch[122] Batch [100]	Speed: 135.01 samples/sec	Perplexity=15.951826
INFO:root:Epoch[122] Batch [120]	Speed: 137.30 samples/sec	Perplexity=15.751425
INFO:root:Epoch[122] Batch [140]	Speed: 135.74 samples/sec	Perplexity=15.506725
INFO:root:Epoch[122] Batch [160]	Speed: 137.42 samples/sec	Perplexity=14.172351
INFO:root:Epoch[122] Batch [180]	Speed: 135.88 samples/sec	Perplexity=14.945201
INFO:root:Epoch[122] Train-Perplexity=13.114279
INFO:root:Epoch[122] Time cost=21.444
INFO:root:Epoch[123] Batch [20]	Speed: 138.86 samples/sec	Perplexity=14.938976
INFO:root:Epoch[123] Batch [40]	Speed: 138.32 samples/sec	Perplexity=15.108580

INFO:root:Epoch[132] Batch [20]	Speed: 133.91 samples/sec	Perplexity=12.720547
INFO:root:Epoch[132] Batch [40]	Speed: 134.95 samples/sec	Perplexity=12.987731
INFO:root:Epoch[132] Batch [60]	Speed: 138.30 samples/sec	Perplexity=13.039092
INFO:root:Epoch[132] Batch [80]	Speed: 137.73 samples/sec	Perplexity=13.528944
INFO:root:Epoch[132] Batch [100]	Speed: 136.99 samples/sec	Perplexity=13.411225
INFO:root:Epoch[132] Batch [120]	Speed: 137.31 samples/sec	Perplexity=13.380356
INFO:root:Epoch[132] Batch [140]	Speed: 136.97 samples/sec	Perplexity=12.792463
INFO:root:Epoch[132] Batch [160]	Speed: 137.43 samples/sec	Perplexity=12.061631
INFO:root:Epoch[132] Batch [180]	Speed: 137.76 samples/sec	Perplexity=12.578523
INFO:root:Epoch[132] Train-Perplexity=10.876413
INFO:root:Epoch[132] Time cost=21.323
INFO:root:Epoch[133] Batch [20]	Speed: 134.55 samples/sec	Perplexity=12.695587
INFO:root:Epoch[133] Batch [40]	Speed: 133.14 samples/sec	Perplexity=13.045742
INFO:root:Epoch[133] Batch [60]	Speed: 1

INFO:root:Epoch[142] Batch [40]	Speed: 136.07 samples/sec	Perplexity=10.760992
INFO:root:Epoch[142] Batch [60]	Speed: 136.16 samples/sec	Perplexity=11.115360
INFO:root:Epoch[142] Batch [80]	Speed: 138.96 samples/sec	Perplexity=11.486159
INFO:root:Epoch[142] Batch [100]	Speed: 136.15 samples/sec	Perplexity=11.306746
INFO:root:Epoch[142] Batch [120]	Speed: 136.55 samples/sec	Perplexity=11.005531
INFO:root:Epoch[142] Batch [140]	Speed: 138.06 samples/sec	Perplexity=10.771108
INFO:root:Epoch[142] Batch [160]	Speed: 135.14 samples/sec	Perplexity=10.143455
INFO:root:Epoch[142] Batch [180]	Speed: 136.53 samples/sec	Perplexity=10.816691
INFO:root:Epoch[142] Train-Perplexity=9.439250
INFO:root:Epoch[142] Time cost=21.343
INFO:root:Epoch[143] Batch [20]	Speed: 137.58 samples/sec	Perplexity=10.385903
INFO:root:Epoch[143] Batch [40]	Speed: 134.73 samples/sec	Perplexity=10.773050
INFO:root:Epoch[143] Batch [60]	Speed: 136.08 samples/sec	Perplexity=11.019647
INFO:root:Epoch[143] Batch [80]	Speed: 13

INFO:root:Epoch[152] Batch [60]	Speed: 135.29 samples/sec	Perplexity=9.660647
INFO:root:Epoch[152] Batch [80]	Speed: 137.16 samples/sec	Perplexity=9.846250
INFO:root:Epoch[152] Batch [100]	Speed: 135.52 samples/sec	Perplexity=9.623202
INFO:root:Epoch[152] Batch [120]	Speed: 136.25 samples/sec	Perplexity=9.687733
INFO:root:Epoch[152] Batch [140]	Speed: 136.39 samples/sec	Perplexity=9.274661
INFO:root:Epoch[152] Batch [160]	Speed: 137.17 samples/sec	Perplexity=8.761987
INFO:root:Epoch[152] Batch [180]	Speed: 135.93 samples/sec	Perplexity=9.281335
INFO:root:Epoch[152] Train-Perplexity=8.578845
INFO:root:Epoch[152] Time cost=21.390
INFO:root:Epoch[153] Batch [20]	Speed: 136.22 samples/sec	Perplexity=9.089267
INFO:root:Epoch[153] Batch [40]	Speed: 138.66 samples/sec	Perplexity=9.102090
INFO:root:Epoch[153] Batch [60]	Speed: 136.44 samples/sec	Perplexity=9.568697
INFO:root:Epoch[153] Batch [80]	Speed: 136.02 samples/sec	Perplexity=9.665831
INFO:root:Epoch[153] Batch [100]	Speed: 136.60 sampl

INFO:root:Epoch[162] Batch [100]	Speed: 103.57 samples/sec	Perplexity=8.270794
INFO:root:Epoch[162] Batch [120]	Speed: 100.71 samples/sec	Perplexity=8.298972
INFO:root:Epoch[162] Batch [140]	Speed: 93.84 samples/sec	Perplexity=8.013091
INFO:root:Epoch[162] Batch [160]	Speed: 118.01 samples/sec	Perplexity=7.655347
INFO:root:Epoch[162] Batch [180]	Speed: 136.07 samples/sec	Perplexity=8.003455
INFO:root:Epoch[162] Train-Perplexity=7.904192
INFO:root:Epoch[162] Time cost=27.599
INFO:root:Epoch[163] Batch [20]	Speed: 135.83 samples/sec	Perplexity=7.965963
INFO:root:Epoch[163] Batch [40]	Speed: 136.01 samples/sec	Perplexity=8.020078
INFO:root:Epoch[163] Batch [60]	Speed: 136.47 samples/sec	Perplexity=8.271546
INFO:root:Epoch[163] Batch [80]	Speed: 136.51 samples/sec	Perplexity=8.449961
INFO:root:Epoch[163] Batch [100]	Speed: 137.81 samples/sec	Perplexity=8.178477
INFO:root:Epoch[163] Batch [120]	Speed: 133.43 samples/sec	Perplexity=8.134797
INFO:root:Epoch[163] Batch [140]	Speed: 136.06 samp

INFO:root:Epoch[172] Batch [140]	Speed: 136.90 samples/sec	Perplexity=7.253074
INFO:root:Epoch[172] Batch [160]	Speed: 134.91 samples/sec	Perplexity=6.809251
INFO:root:Epoch[172] Batch [180]	Speed: 136.33 samples/sec	Perplexity=7.154248
INFO:root:Epoch[172] Train-Perplexity=6.844307
INFO:root:Epoch[172] Time cost=21.505
INFO:root:Epoch[173] Batch [20]	Speed: 135.79 samples/sec	Perplexity=7.091058
INFO:root:Epoch[173] Batch [40]	Speed: 136.71 samples/sec	Perplexity=7.038659
INFO:root:Epoch[173] Batch [60]	Speed: 138.33 samples/sec	Perplexity=7.428236
INFO:root:Epoch[173] Batch [80]	Speed: 137.24 samples/sec	Perplexity=7.697150
INFO:root:Epoch[173] Batch [100]	Speed: 138.13 samples/sec	Perplexity=7.221827
INFO:root:Epoch[173] Batch [120]	Speed: 135.99 samples/sec	Perplexity=7.196935
INFO:root:Epoch[173] Batch [140]	Speed: 137.16 samples/sec	Perplexity=7.230321
INFO:root:Epoch[173] Batch [160]	Speed: 137.24 samples/sec	Perplexity=6.529904
INFO:root:Epoch[173] Batch [180]	Speed: 136.27 sam

INFO:root:Epoch[182] Batch [180]	Speed: 136.55 samples/sec	Perplexity=6.378430
INFO:root:Epoch[182] Train-Perplexity=6.566023
INFO:root:Epoch[182] Time cost=21.551
INFO:root:Epoch[183] Batch [20]	Speed: 136.15 samples/sec	Perplexity=6.239196
INFO:root:Epoch[183] Batch [40]	Speed: 137.44 samples/sec	Perplexity=6.480176
INFO:root:Epoch[183] Batch [60]	Speed: 137.14 samples/sec	Perplexity=6.489790
INFO:root:Epoch[183] Batch [80]	Speed: 135.80 samples/sec	Perplexity=6.866194
INFO:root:Epoch[183] Batch [100]	Speed: 135.25 samples/sec	Perplexity=6.424793
INFO:root:Epoch[183] Batch [120]	Speed: 135.79 samples/sec	Perplexity=6.524450
INFO:root:Epoch[183] Batch [140]	Speed: 137.24 samples/sec	Perplexity=6.409157
INFO:root:Epoch[183] Batch [160]	Speed: 135.16 samples/sec	Perplexity=5.878740
INFO:root:Epoch[183] Batch [180]	Speed: 136.53 samples/sec	Perplexity=6.297262
INFO:root:Epoch[183] Train-Perplexity=6.185495
INFO:root:Epoch[183] Time cost=21.416
INFO:root:Epoch[184] Batch [20]	Speed: 136.4

INFO:root:Epoch[193] Batch [20]	Speed: 135.25 samples/sec	Perplexity=5.605834
INFO:root:Epoch[193] Batch [40]	Speed: 135.26 samples/sec	Perplexity=5.906745
INFO:root:Epoch[193] Batch [60]	Speed: 134.52 samples/sec	Perplexity=5.954474
INFO:root:Epoch[193] Batch [80]	Speed: 136.01 samples/sec	Perplexity=6.027732
INFO:root:Epoch[193] Batch [100]	Speed: 134.74 samples/sec	Perplexity=5.870132
INFO:root:Epoch[193] Batch [120]	Speed: 137.17 samples/sec	Perplexity=5.704815
INFO:root:Epoch[193] Batch [140]	Speed: 138.69 samples/sec	Perplexity=5.800181
INFO:root:Epoch[193] Batch [160]	Speed: 137.83 samples/sec	Perplexity=5.343468
INFO:root:Epoch[193] Batch [180]	Speed: 137.60 samples/sec	Perplexity=5.687202
INFO:root:Epoch[193] Train-Perplexity=5.075154
INFO:root:Epoch[193] Time cost=21.404
INFO:root:Epoch[194] Batch [20]	Speed: 134.67 samples/sec	Perplexity=5.534637
INFO:root:Epoch[194] Batch [40]	Speed: 132.83 samples/sec	Perplexity=5.701405
INFO:root:Epoch[194] Batch [60]	Speed: 133.81 sample

INFO:root:Epoch[203] Batch [60]	Speed: 135.63 samples/sec	Perplexity=5.324242
INFO:root:Epoch[203] Batch [80]	Speed: 138.46 samples/sec	Perplexity=5.599601
INFO:root:Epoch[203] Batch [100]	Speed: 136.64 samples/sec	Perplexity=5.351364
INFO:root:Epoch[203] Batch [120]	Speed: 136.73 samples/sec	Perplexity=5.296945
INFO:root:Epoch[203] Batch [140]	Speed: 136.39 samples/sec	Perplexity=5.191911
INFO:root:Epoch[203] Batch [160]	Speed: 137.72 samples/sec	Perplexity=4.925436
INFO:root:Epoch[203] Batch [180]	Speed: 137.51 samples/sec	Perplexity=5.198151
INFO:root:Epoch[203] Train-Perplexity=4.647330
INFO:root:Epoch[203] Time cost=21.329
INFO:root:Epoch[204] Batch [20]	Speed: 138.50 samples/sec	Perplexity=5.122490
INFO:root:Epoch[204] Batch [40]	Speed: 137.73 samples/sec	Perplexity=5.116365
INFO:root:Epoch[204] Batch [60]	Speed: 137.13 samples/sec	Perplexity=5.307428
INFO:root:Epoch[204] Batch [80]	Speed: 137.54 samples/sec	Perplexity=5.377876
INFO:root:Epoch[204] Batch [100]	Speed: 137.45 sampl

INFO:root:Epoch[213] Batch [100]	Speed: 137.16 samples/sec	Perplexity=4.835868
INFO:root:Epoch[213] Batch [120]	Speed: 135.79 samples/sec	Perplexity=4.902615
INFO:root:Epoch[213] Batch [140]	Speed: 136.76 samples/sec	Perplexity=4.777629
INFO:root:Epoch[213] Batch [160]	Speed: 135.93 samples/sec	Perplexity=4.574909
INFO:root:Epoch[213] Batch [180]	Speed: 130.60 samples/sec	Perplexity=4.854170
INFO:root:Epoch[213] Train-Perplexity=5.822443
INFO:root:Epoch[213] Time cost=21.507
INFO:root:Epoch[214] Batch [20]	Speed: 136.14 samples/sec	Perplexity=4.595949
INFO:root:Epoch[214] Batch [40]	Speed: 135.41 samples/sec	Perplexity=4.766049
INFO:root:Epoch[214] Batch [60]	Speed: 137.17 samples/sec	Perplexity=4.955549
INFO:root:Epoch[214] Batch [80]	Speed: 132.54 samples/sec	Perplexity=5.064602
INFO:root:Epoch[214] Batch [100]	Speed: 135.02 samples/sec	Perplexity=4.836996
INFO:root:Epoch[214] Batch [120]	Speed: 136.90 samples/sec	Perplexity=4.841711
INFO:root:Epoch[214] Batch [140]	Speed: 137.06 sam

INFO:root:Epoch[223] Batch [140]	Speed: 136.58 samples/sec	Perplexity=4.465665
INFO:root:Epoch[223] Batch [160]	Speed: 137.15 samples/sec	Perplexity=4.329429
INFO:root:Epoch[223] Batch [180]	Speed: 136.88 samples/sec	Perplexity=4.413125
INFO:root:Epoch[223] Train-Perplexity=3.826162
INFO:root:Epoch[223] Time cost=21.486
INFO:root:Epoch[224] Batch [20]	Speed: 135.50 samples/sec	Perplexity=4.586156
INFO:root:Epoch[224] Batch [40]	Speed: 135.94 samples/sec	Perplexity=4.352446
INFO:root:Epoch[224] Batch [60]	Speed: 136.95 samples/sec	Perplexity=4.573578
INFO:root:Epoch[224] Batch [80]	Speed: 137.00 samples/sec	Perplexity=4.702141
INFO:root:Epoch[224] Batch [100]	Speed: 138.18 samples/sec	Perplexity=4.595490
INFO:root:Epoch[224] Batch [120]	Speed: 136.19 samples/sec	Perplexity=4.491194
INFO:root:Epoch[224] Batch [140]	Speed: 138.98 samples/sec	Perplexity=4.453747
INFO:root:Epoch[224] Batch [160]	Speed: 137.45 samples/sec	Perplexity=4.157486
INFO:root:Epoch[224] Batch [180]	Speed: 137.08 sam

INFO:root:Epoch[233] Batch [180]	Speed: 134.36 samples/sec	Perplexity=4.086481
INFO:root:Epoch[233] Train-Perplexity=4.510376
INFO:root:Epoch[233] Time cost=21.406
INFO:root:Epoch[234] Batch [20]	Speed: 136.55 samples/sec	Perplexity=4.120104
INFO:root:Epoch[234] Batch [40]	Speed: 138.98 samples/sec	Perplexity=4.165306
INFO:root:Epoch[234] Batch [60]	Speed: 138.36 samples/sec	Perplexity=4.297152
INFO:root:Epoch[234] Batch [80]	Speed: 135.10 samples/sec	Perplexity=4.357692
INFO:root:Epoch[234] Batch [100]	Speed: 137.85 samples/sec	Perplexity=4.122818
INFO:root:Epoch[234] Batch [120]	Speed: 136.85 samples/sec	Perplexity=4.174193
INFO:root:Epoch[234] Batch [140]	Speed: 135.98 samples/sec	Perplexity=4.060846
INFO:root:Epoch[234] Batch [160]	Speed: 134.09 samples/sec	Perplexity=3.918152
INFO:root:Epoch[234] Batch [180]	Speed: 134.85 samples/sec	Perplexity=4.003228
INFO:root:Epoch[234] Train-Perplexity=3.959405
INFO:root:Epoch[234] Time cost=21.372
INFO:root:Saved checkpoint to "./checkpoint/

INFO:root:Epoch[244] Batch [20]	Speed: 135.09 samples/sec	Perplexity=3.782319
INFO:root:Epoch[244] Batch [40]	Speed: 135.68 samples/sec	Perplexity=3.976937
INFO:root:Epoch[244] Batch [60]	Speed: 135.51 samples/sec	Perplexity=3.926526
INFO:root:Epoch[244] Batch [80]	Speed: 135.33 samples/sec	Perplexity=3.982367
INFO:root:Epoch[244] Batch [100]	Speed: 134.10 samples/sec	Perplexity=3.883554
INFO:root:Epoch[244] Batch [120]	Speed: 135.86 samples/sec	Perplexity=3.931775
INFO:root:Epoch[244] Batch [140]	Speed: 136.62 samples/sec	Perplexity=3.784502
INFO:root:Epoch[244] Batch [160]	Speed: 138.52 samples/sec	Perplexity=3.675207
INFO:root:Epoch[244] Batch [180]	Speed: 136.07 samples/sec	Perplexity=3.921634
INFO:root:Epoch[244] Train-Perplexity=3.896060
INFO:root:Epoch[244] Time cost=21.483
INFO:root:Saved checkpoint to "./checkpoint/shediao_part-0245.params"
INFO:root:Epoch[245] Batch [20]	Speed: 139.00 samples/sec	Perplexity=3.837480
INFO:root:Epoch[245] Batch [40]	Speed: 137.48 samples/sec	Pe

INFO:root:Epoch[254] Batch [60]	Speed: 101.71 samples/sec	Perplexity=3.751933
INFO:root:Epoch[254] Batch [80]	Speed: 98.43 samples/sec	Perplexity=4.020225
INFO:root:Epoch[254] Batch [100]	Speed: 98.36 samples/sec	Perplexity=3.638975
INFO:root:Epoch[254] Batch [120]	Speed: 103.48 samples/sec	Perplexity=3.703654
INFO:root:Epoch[254] Batch [140]	Speed: 104.93 samples/sec	Perplexity=3.450927
INFO:root:Epoch[254] Batch [160]	Speed: 102.73 samples/sec	Perplexity=3.421646
INFO:root:Epoch[254] Batch [180]	Speed: 104.20 samples/sec	Perplexity=3.577016
INFO:root:Epoch[254] Train-Perplexity=3.990318
INFO:root:Epoch[254] Time cost=28.468
INFO:root:Saved checkpoint to "./checkpoint/shediao_part-0255.params"
INFO:root:Epoch[255] Batch [20]	Speed: 100.75 samples/sec	Perplexity=3.587546
INFO:root:Epoch[255] Batch [40]	Speed: 112.54 samples/sec	Perplexity=3.566221
INFO:root:Epoch[255] Batch [60]	Speed: 134.98 samples/sec	Perplexity=3.767176
INFO:root:Epoch[255] Batch [80]	Speed: 137.86 samples/sec	Perp

INFO:root:Epoch[264] Batch [100]	Speed: 137.42 samples/sec	Perplexity=3.387052
INFO:root:Epoch[264] Batch [120]	Speed: 134.83 samples/sec	Perplexity=3.454314
INFO:root:Epoch[264] Batch [140]	Speed: 137.13 samples/sec	Perplexity=3.435755
INFO:root:Epoch[264] Batch [160]	Speed: 136.04 samples/sec	Perplexity=3.226901
INFO:root:Epoch[264] Batch [180]	Speed: 136.18 samples/sec	Perplexity=3.478800
INFO:root:Epoch[264] Train-Perplexity=3.245876
INFO:root:Epoch[264] Time cost=21.425
INFO:root:Saved checkpoint to "./checkpoint/shediao_part-0265.params"
INFO:root:Epoch[265] Batch [20]	Speed: 138.66 samples/sec	Perplexity=3.484174
INFO:root:Epoch[265] Batch [40]	Speed: 138.11 samples/sec	Perplexity=3.783809
INFO:root:Epoch[265] Batch [60]	Speed: 137.23 samples/sec	Perplexity=3.776184
INFO:root:Epoch[265] Batch [80]	Speed: 133.38 samples/sec	Perplexity=3.769586
INFO:root:Epoch[265] Batch [100]	Speed: 138.17 samples/sec	Perplexity=3.543584
INFO:root:Epoch[265] Batch [120]	Speed: 137.46 samples/sec	

INFO:root:Epoch[274] Batch [140]	Speed: 137.31 samples/sec	Perplexity=3.169024
INFO:root:Epoch[274] Batch [160]	Speed: 135.53 samples/sec	Perplexity=3.155725
INFO:root:Epoch[274] Batch [180]	Speed: 135.28 samples/sec	Perplexity=3.288111
INFO:root:Epoch[274] Train-Perplexity=3.163533
INFO:root:Epoch[274] Time cost=21.470
INFO:root:Saved checkpoint to "./checkpoint/shediao_part-0275.params"
INFO:root:Epoch[275] Batch [20]	Speed: 133.73 samples/sec	Perplexity=3.160969
INFO:root:Epoch[275] Batch [40]	Speed: 135.19 samples/sec	Perplexity=3.217887
INFO:root:Epoch[275] Batch [60]	Speed: 135.49 samples/sec	Perplexity=3.363209
INFO:root:Epoch[275] Batch [80]	Speed: 136.49 samples/sec	Perplexity=3.332137
INFO:root:Epoch[275] Batch [100]	Speed: 135.29 samples/sec	Perplexity=3.245790
INFO:root:Epoch[275] Batch [120]	Speed: 134.18 samples/sec	Perplexity=3.315350
INFO:root:Epoch[275] Batch [140]	Speed: 135.89 samples/sec	Perplexity=3.103540
INFO:root:Epoch[275] Batch [160]	Speed: 136.02 samples/sec	

INFO:root:Epoch[284] Batch [180]	Speed: 135.82 samples/sec	Perplexity=3.073875
INFO:root:Epoch[284] Train-Perplexity=2.900739
INFO:root:Epoch[284] Time cost=21.368
INFO:root:Saved checkpoint to "./checkpoint/shediao_part-0285.params"
INFO:root:Epoch[285] Batch [20]	Speed: 134.81 samples/sec	Perplexity=3.019751
INFO:root:Epoch[285] Batch [40]	Speed: 134.99 samples/sec	Perplexity=3.189451
INFO:root:Epoch[285] Batch [60]	Speed: 135.63 samples/sec	Perplexity=3.202553
INFO:root:Epoch[285] Batch [80]	Speed: 134.75 samples/sec	Perplexity=3.289937
INFO:root:Epoch[285] Batch [100]	Speed: 136.59 samples/sec	Perplexity=3.107184
INFO:root:Epoch[285] Batch [120]	Speed: 136.47 samples/sec	Perplexity=3.152654
INFO:root:Epoch[285] Batch [140]	Speed: 136.88 samples/sec	Perplexity=3.079745
INFO:root:Epoch[285] Batch [160]	Speed: 136.27 samples/sec	Perplexity=2.936281
INFO:root:Epoch[285] Batch [180]	Speed: 137.13 samples/sec	Perplexity=3.040042
INFO:root:Epoch[285] Train-Perplexity=2.701407

INFO:root:Saved checkpoint to "./checkpoint/shediao_part-0295.params"
INFO:root:Epoch[295] Batch [20]	Speed: 135.35 samples/sec	Perplexity=2.907000
INFO:root:Epoch[295] Batch [40]	Speed: 137.27 samples/sec	Perplexity=3.014757
INFO:root:Epoch[295] Batch [60]	Speed: 136.78 samples/sec	Perplexity=3.066231
INFO:root:Epoch[295] Batch [80]	Speed: 135.50 samples/sec	Perplexity=3.155882
INFO:root:Epoch[295] Batch [100]	Speed: 136.43 samples/sec	Perplexity=2.963087
INFO:root:Epoch[295] Batch [120]	Speed: 135.08 samples/sec	Perplexity=2.911963
INFO:root:Epoch[295] Batch [140]	Speed: 136.20 samples/sec	Perplexity=2.922032
INFO:root:Epoch[295] Batch [160]	Speed: 136.59 samples/sec	Perplexity=2.777879
INFO:root:Epoch[295] Batch [180]	Speed: 138.00 samples/sec	Perplexity=2.923332
INFO:root:Epoch[295] Train-Perplexity=3.073210
INFO:root:Epoch[295] Time cost=21.383
INFO:root:Epoch[296] Batch [20]	Speed: 136.01 samples/sec	Perplexity=2.981348
INFO:root:Epoch[296] Batch [40]	Speed: 134.50 samples/sec	Pe
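
To help read the perplexity values in the log, here is a small standalone sketch of the same quantity computed by the `Perplexity` metric: the exponential of the mean negative log-probability assigned to the correct character. It uses only the standard library and takes the true-class probabilities directly, which is a simplification of the metric above.

```python
import math

def perplexity(true_class_probs, floor=1e-10):
    """exp(mean negative log-probability) over the correct characters."""
    loss = 0.0
    for p in true_class_probs:
        # Clamp tiny probabilities, as the training metric does, to avoid log(0).
        loss += -math.log(max(floor, p))
    return math.exp(loss / len(true_class_probs))

# A model that guesses uniformly over all 3042 characters plus padding.
vocab_plus_pad = 3042 + 1
uniform = perplexity([1.0 / vocab_plus_pad] * 100)

# A model that gives the correct character probability 0.5 or 0.25.
confident = perplexity([0.5, 0.25, 0.5, 0.25])

print(uniform)
print(confident)
```

A uniform guess over 3043 symbols has a perplexity of about 3043, so the first logged value of roughly 2162 is only slightly better than chance, while a value near 3 late in training corresponds to the model being about as uncertain as a choice among three equally likely characters. These are training-set figures, since no held-out data is evaluated here.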