Namespace(batch_size=50, data_name='SST-2', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='rand')
Use gpu0
Downloading data/sst-2/train-61f1f238.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/sst-2/train-61f1f238.zip...
Downloading data/sst-2/test-a39c1db6.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/sst-2/test-a39c1db6.zip...
Downloading data/sst-2/dev-65511587.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/sst-2/dev-65511587.zip...
maximum length (in tokens): 53
Done! Tokenizing Time=0.83s, #Sentences=76961
Done! Tokenizing Time=0.03s, #Sentences=1821
Done! Tokenizing Time=0.01s, #Sentences=872
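The batch counts reported later in the log follow from these sentence counts and batch_size=50. This assumes the final, partial batch is kept rather than dropped. Under that assumption, 76961 sentences give the 1540 training batches, and 1821 sentences give the 37 batches seen when testing. That suggests the three Tokenizing lines are train, test and dev, in the same order as the downloads. A quick check (the helper name is illustrative):

```python
import math

BATCH_SIZE = 50  # from Namespace(batch_size=50)

# Sentence counts from the three "Done! Tokenizing" lines, assumed to be
# train, test and dev in the same order as the downloads above.
split_sizes = {"train": 76961, "test": 1821, "dev": 872}


def num_batches(n_sentences: int, batch_size: int = BATCH_SIZE) -> int:
    """Batches per pass when the last, partial batch is kept (assumption)."""
    return math.ceil(n_sentences / batch_size)


for name, n in split_sizes.items():
    print(name, num_batches(n))
```

This prints 1540 for train, 37 for test and 18 for dev.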
SentimentNet(
  (embedding): Embedding(17244 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
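The printed SentimentNet matches the hyperparameters of Kim's (2014) CNN for sentence classification:

- 300-dimensional embeddings over a 17,244-word vocabulary.
- Three parallel Conv1D branches, each with 100 filters, of widths 3, 4 and 5.
- Each branch is reduced to one value per filter, giving a 300-dimensional concatenation.
- Dropout 0.5, then a 2-way linear output.

The NumPy sketch below reproduces that forward pass at inference time. It uses random stand-in weights; with model_mode='rand' the embeddings presumably also start random. It assumes two things the repr does not show. First, that each HybridLambda performs max-over-time pooling, since the repr only prints `<lambda>`. Second, that the input is a single sentence of token ids padded to at least 5 tokens. It illustrates the architecture and is not the GluonNLP implementation.

```python
import numpy as np

# Sizes read off the repr above.
VOCAB, EMB, FILTERS, KERNELS, CLASSES = 17244, 300, 100, (3, 4, 5), 2

rng = np.random.default_rng(0)
# Random stand-in weights (the log does not contain trained values).
params = {
    "embedding": rng.normal(0.0, 0.01, (VOCAB, EMB)),
    "convs": [(rng.normal(0.0, 0.01, (FILTERS, EMB, k)), np.zeros(FILTERS)) for k in KERNELS],
    "dense": (rng.normal(0.0, 0.01, (CLASSES, FILTERS * len(KERNELS))), np.zeros(CLASSES)),
}


def conv1d(x, w, b):
    """Valid, stride-1 1-D convolution. x: (EMB, T), w: (FILTERS, EMB, k) -> (FILTERS, T - k + 1)."""
    k, T = w.shape[2], x.shape[1]
    out = np.empty((w.shape[0], T - k + 1))
    for t in range(T - k + 1):
        # Contract over the channel and window axes for every filter at once.
        out[:, t] = np.tensordot(w, x[:, t:t + k], axes=([1, 2], [0, 1])) + b
    return out


def forward(token_ids):
    """Inference-time logits for one padded sentence of token ids."""
    x = params["embedding"][token_ids].T  # (EMB, T): channels first, as in Conv1D's NCW layout
    pooled = []
    for w, b in params["convs"]:
        h = conv1d(x, w, b).max(axis=1)    # max-over-time pooling (assumed HybridLambda)
        pooled.append(np.maximum(h, 0.0))  # Activation(relu)
    z = np.concatenate(pooled)             # (300,)
    # Dropout(p=0.5) is the identity at inference time, so it is skipped here.
    W, c = params["dense"]
    return W @ z + c                       # Dense(300 -> 2), linear


def count_params():
    """Total number of weights and biases in the sketch."""
    n = params["embedding"].size
    n += sum(w.size + b.size for w, b in params["convs"])
    W, c = params["dense"]
    return n + W.size + c.size


tokens = rng.integers(0, VOCAB, size=53)  # 53 = maximum length (in tokens) reported above
print(forward(tokens).shape, count_params())
```

ReLU is monotonic, so applying it after the max, as the repr order (Conv1D, HybridLambda, Activation) suggests, gives the same result as applying it before. The parameter count is 5,534,102, and the embedding table accounts for 5,173,200 of it.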
[Epoch 0 Batch 30/1540] avg loss 0.0138004, throughput 0.748774K wps
[Epoch 0 Batch 60/1540] avg loss 0.0138109, throughput 2.84564K wps
[Epoch 0 Batch 90/1540] avg loss 0.0137973, throughput 2.86792K wps
[Epoch 0 Batch 120/1540] avg loss 0.0138202, throughput 2.84788K wps
[Epoch 0 Batch 150/1540] avg loss 0.013823, throughput 2.84192K wps
[Epoch 0 Batch 180/1540] avg loss 0.0137685, throughput 2.87025K wps
[Epoch 0 Batch 210/1540] avg loss 0.0137934, throughput 2.85502K wps
[Epoch 0 Batch 240/1540] avg loss 0.0138037, throughput 2.86384K wps
[Epoch 0 Batch 270/1540] avg loss 0.0137626, throughput 2.83265K wps
[Epoch 0 Batch 300/1540] avg loss 0.0137211, throughput 2.85611K wps
[Epoch 0 Batch 330/1540] avg loss 0.0138154, throughput 2.84877K wps
[Epoch 0 Batch 360/1540] avg loss 0.0137426, throughput 2.80955K wps
[Epoch 0 Batch 390/1540] avg loss 0.0138279, throughput 2.85582K wps
[Epoch 0 Batch 420/1540] avg loss 0.0136362, throughput 2.82855K wps
[Epoch 0 Batch 450/1540] avg loss 0.0136607, throughput 2.84329K wps
[Epoch 0 Batch 480/1540] avg loss 0.0138069, throughput 2.812K wps
[Epoch 0 Batch 510/1540] avg loss 0.0137447, throughput 2.83391K wps
[Epoch 0 Batch 540/1540] avg loss 0.0136492, throughput 2.8742K wps
[Epoch 0 Batch 570/1540] avg loss 0.0137062, throughput 2.87247K wps
[Epoch 0 Batch 600/1540] avg loss 0.0135451, throughput 2.878K wps
[Epoch 0 Batch 630/1540] avg loss 0.0136163, throughput 2.85093K wps
[Epoch 0 Batch 660/1540] avg loss 0.0137568, throughput 2.86394K wps
[Epoch 0 Batch 690/1540] avg loss 0.0137163, throughput 2.84508K wps
[Epoch 0 Batch 720/1540] avg loss 0.013598, throughput 2.79075K wps
[Epoch 0 Batch 750/1540] avg loss 0.0137718, throughput 2.78031K wps
[Epoch 0 Batch 780/1540] avg loss 0.0137142, throughput 2.86873K wps
[Epoch 0 Batch 810/1540] avg loss 0.0135996, throughput 2.86383K wps
[Epoch 0 Batch 840/1540] avg loss 0.013755, throughput 2.77059K wps
[Epoch 0 Batch 870/1540] avg loss 0.0137318, throughput 2.8167K wps
[Epoch 0 Batch 900/1540] avg loss 0.0136668, throughput 2.84457K wps
[Epoch 0 Batch 930/1540] avg loss 0.0137071, throughput 2.80267K wps
[Epoch 0 Batch 960/1540] avg loss 0.0137056, throughput 2.87838K wps
[Epoch 0 Batch 990/1540] avg loss 0.0137057, throughput 2.84067K wps
[Epoch 0 Batch 1020/1540] avg loss 0.0136072, throughput 2.82948K wps
[Epoch 0 Batch 1050/1540] avg loss 0.0136353, throughput 2.87821K wps
[Epoch 0 Batch 1080/1540] avg loss 0.013673, throughput 2.8781K wps
[Epoch 0 Batch 1110/1540] avg loss 0.0136834, throughput 2.87317K wps
[Epoch 0 Batch 1140/1540] avg loss 0.0136265, throughput 2.83966K wps
[Epoch 0 Batch 1170/1540] avg loss 0.0137202, throughput 2.83108K wps
[Epoch 0 Batch 1200/1540] avg loss 0.0136948, throughput 2.85214K wps
[Epoch 0 Batch 1230/1540] avg loss 0.0137064, throughput 2.8669K wps
[Epoch 0 Batch 1260/1540] avg loss 0.0137498, throughput 2.80595K wps
[Epoch 0 Batch 1290/1540] avg loss 0.0137196, throughput 2.77533K wps
[Epoch 0 Batch 1320/1540] avg loss 0.0136522, throughput 2.8534K wps
[Epoch 0 Batch 1350/1540] avg loss 0.013645, throughput 2.865K wps
[Epoch 0 Batch 1380/1540] avg loss 0.0136207, throughput 2.85378K wps
[Epoch 0 Batch 1410/1540] avg loss 0.0136265, throughput 2.84326K wps
[Epoch 0 Batch 1440/1540] avg loss 0.0136989, throughput 2.79218K wps
[Epoch 0 Batch 1470/1540] avg loss 0.013661, throughput 2.8306K wps
[Epoch 0 Batch 1500/1540] avg loss 0.013638, throughput 2.87707K wps
[Epoch 0 Batch 1530/1540] avg loss 0.0136524, throughput 2.87419K wps
Begin Testing...
[Epoch 0] train avg loss 0.013714, dev acc 0.5596, dev avg loss 0.687023, throughput 2.60711K wps
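The training losses and the dev loss differ by a factor of about 50, which is batch_size. Early batch values are near 0.0138, and the epoch-0 train average is 0.013714, against a dev loss of 0.687023. Chance-level cross-entropy for two classes is ln 2 ≈ 0.693. Both numbers sit near that value if the training figure is divided by the batch size once more than the dev figure. This is an inference from the numbers, not something the log states. The arithmetic:

```python
import math

BATCH_SIZE = 50  # Namespace(batch_size=50)

chance_ce = math.log(2)  # cross-entropy of a uniform prediction over 2 classes
print(round(chance_ce, 4))               # 0.6931: the scale of "dev avg loss"
print(round(chance_ce / BATCH_SIZE, 5))  # 0.01386: the scale of the early batch "avg loss" values

# Values from the [Epoch 0] summary line.
train_avg, dev_avg = 0.013714, 0.687023
print(round(dev_avg / train_avg, 1))     # 50.1, i.e. roughly BATCH_SIZE
```

Read this way, the epoch-0 model is still close to chance, which agrees with its dev accuracy of 0.5596.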
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 1 Batch 30/1540] avg loss 0.0135832, throughput 2.86227K wps
[Epoch 1 Batch 60/1540] avg loss 0.0135932, throughput 2.82791K wps
[Epoch 1 Batch 90/1540] avg loss 0.0136373, throughput 2.87172K wps
[Epoch 1 Batch 120/1540] avg loss 0.0134875, throughput 2.86972K wps
[Epoch 1 Batch 150/1540] avg loss 0.0136505, throughput 2.85811K wps
[Epoch 1 Batch 180/1540] avg loss 0.0136201, throughput 2.82788K wps
[Epoch 1 Batch 210/1540] avg loss 0.013589, throughput 2.85694K wps
[Epoch 1 Batch 240/1540] avg loss 0.0135679, throughput 2.87736K wps
[Epoch 1 Batch 270/1540] avg loss 0.0135552, throughput 2.87208K wps
[Epoch 1 Batch 300/1540] avg loss 0.0136459, throughput 2.84159K wps
[Epoch 1 Batch 330/1540] avg loss 0.0136596, throughput 2.78248K wps
[Epoch 1 Batch 360/1540] avg loss 0.0135834, throughput 2.85935K wps
[Epoch 1 Batch 390/1540] avg loss 0.0136186, throughput 2.84167K wps
[Epoch 1 Batch 420/1540] avg loss 0.0136105, throughput 2.80499K wps
[Epoch 1 Batch 450/1540] avg loss 0.0137079, throughput 2.8343K wps
[Epoch 1 Batch 480/1540] avg loss 0.0134815, throughput 2.80809K wps
[Epoch 1 Batch 510/1540] avg loss 0.0136119, throughput 2.88064K wps
[Epoch 1 Batch 540/1540] avg loss 0.0134643, throughput 2.87233K wps
[Epoch 1 Batch 570/1540] avg loss 0.0136057, throughput 2.80412K wps
[Epoch 1 Batch 600/1540] avg loss 0.0135286, throughput 2.8745K wps
[Epoch 1 Batch 630/1540] avg loss 0.0134499, throughput 2.83625K wps
[Epoch 1 Batch 660/1540] avg loss 0.0135644, throughput 2.83596K wps
[Epoch 1 Batch 690/1540] avg loss 0.0135062, throughput 2.86191K wps
[Epoch 1 Batch 720/1540] avg loss 0.0135646, throughput 2.87746K wps
[Epoch 1 Batch 750/1540] avg loss 0.0136507, throughput 2.87634K wps
[Epoch 1 Batch 780/1540] avg loss 0.0135311, throughput 2.82194K wps
[Epoch 1 Batch 810/1540] avg loss 0.0136298, throughput 2.84957K wps
[Epoch 1 Batch 840/1540] avg loss 0.0135529, throughput 2.832K wps
[Epoch 1 Batch 870/1540] avg loss 0.0135618, throughput 2.87618K wps
[Epoch 1 Batch 900/1540] avg loss 0.013478, throughput 2.87753K wps
[Epoch 1 Batch 930/1540] avg loss 0.0136277, throughput 2.83478K wps
[Epoch 1 Batch 960/1540] avg loss 0.0134312, throughput 2.84745K wps
[Epoch 1 Batch 990/1540] avg loss 0.0135353, throughput 2.86332K wps
[Epoch 1 Batch 1020/1540] avg loss 0.0134932, throughput 2.82394K wps
[Epoch 1 Batch 1050/1540] avg loss 0.0134916, throughput 2.86456K wps
[Epoch 1 Batch 1080/1540] avg loss 0.0135267, throughput 2.88255K wps
[Epoch 1 Batch 1110/1540] avg loss 0.0135142, throughput 2.84973K wps
[Epoch 1 Batch 1140/1540] avg loss 0.0134453, throughput 2.78757K wps
[Epoch 1 Batch 1170/1540] avg loss 0.0134157, throughput 2.7842K wps
[Epoch 1 Batch 1200/1540] avg loss 0.0134685, throughput 2.83834K wps
[Epoch 1 Batch 1230/1540] avg loss 0.0136388, throughput 2.8008K wps
[Epoch 1 Batch 1260/1540] avg loss 0.0134214, throughput 2.85152K wps
[Epoch 1 Batch 1290/1540] avg loss 0.0133604, throughput 2.86685K wps
[Epoch 1 Batch 1320/1540] avg loss 0.0134374, throughput 2.86778K wps
[Epoch 1 Batch 1350/1540] avg loss 0.0135148, throughput 2.86068K wps
[Epoch 1 Batch 1380/1540] avg loss 0.0134876, throughput 2.86039K wps
[Epoch 1 Batch 1410/1540] avg loss 0.0134357, throughput 2.86624K wps
[Epoch 1 Batch 1440/1540] avg loss 0.013369, throughput 2.85966K wps
[Epoch 1 Batch 1470/1540] avg loss 0.0134107, throughput 2.85568K wps
[Epoch 1 Batch 1500/1540] avg loss 0.0134085, throughput 2.84484K wps
[Epoch 1 Batch 1530/1540] avg loss 0.0134981, throughput 2.87142K wps
Begin Testing...
[Epoch 1] train avg loss 0.0135392, dev acc 0.6307, dev avg loss 0.672574, throughput 2.8479K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 2 Batch 30/1540] avg loss 0.0133759, throughput 2.87948K wps
[Epoch 2 Batch 60/1540] avg loss 0.0132897, throughput 2.87358K wps
[Epoch 2 Batch 90/1540] avg loss 0.0134477, throughput 2.87964K wps
[Epoch 2 Batch 120/1540] avg loss 0.0133849, throughput 2.88407K wps
[Epoch 2 Batch 150/1540] avg loss 0.0134694, throughput 2.85355K wps
[Epoch 2 Batch 180/1540] avg loss 0.0134142, throughput 2.87801K wps
[Epoch 2 Batch 210/1540] avg loss 0.0133096, throughput 2.88083K wps
[Epoch 2 Batch 240/1540] avg loss 0.0134146, throughput 2.85674K wps
[Epoch 2 Batch 270/1540] avg loss 0.0134303, throughput 2.82348K wps
[Epoch 2 Batch 300/1540] avg loss 0.0133716, throughput 2.84946K wps
[Epoch 2 Batch 330/1540] avg loss 0.0133088, throughput 2.85484K wps
[Epoch 2 Batch 360/1540] avg loss 0.0133007, throughput 2.8751K wps
[Epoch 2 Batch 390/1540] avg loss 0.0133131, throughput 2.87699K wps
[Epoch 2 Batch 420/1540] avg loss 0.0133858, throughput 2.87544K wps
[Epoch 2 Batch 450/1540] avg loss 0.0133583, throughput 2.83946K wps
[Epoch 2 Batch 480/1540] avg loss 0.0132419, throughput 2.84357K wps
[Epoch 2 Batch 510/1540] avg loss 0.0133307, throughput 2.87255K wps
[Epoch 2 Batch 540/1540] avg loss 0.0131001, throughput 2.86574K wps
[Epoch 2 Batch 570/1540] avg loss 0.0133911, throughput 2.86791K wps
[Epoch 2 Batch 600/1540] avg loss 0.0133887, throughput 2.8555K wps
[Epoch 2 Batch 630/1540] avg loss 0.0133636, throughput 2.85588K wps
[Epoch 2 Batch 660/1540] avg loss 0.0133111, throughput 2.86026K wps
[Epoch 2 Batch 690/1540] avg loss 0.0132738, throughput 2.86662K wps
[Epoch 2 Batch 720/1540] avg loss 0.0131999, throughput 2.85669K wps
[Epoch 2 Batch 750/1540] avg loss 0.0132786, throughput 2.85791K wps
[Epoch 2 Batch 780/1540] avg loss 0.0132843, throughput 2.83769K wps
[Epoch 2 Batch 810/1540] avg loss 0.0133804, throughput 2.84937K wps
[Epoch 2 Batch 840/1540] avg loss 0.013238, throughput 2.83274K wps
[Epoch 2 Batch 870/1540] avg loss 0.0134047, throughput 2.83185K wps
[Epoch 2 Batch 900/1540] avg loss 0.0132575, throughput 2.86519K wps
[Epoch 2 Batch 930/1540] avg loss 0.0131964, throughput 2.81784K wps
[Epoch 2 Batch 960/1540] avg loss 0.0132731, throughput 2.85009K wps
[Epoch 2 Batch 990/1540] avg loss 0.0132442, throughput 2.84527K wps
[Epoch 2 Batch 1020/1540] avg loss 0.0132967, throughput 2.87948K wps
[Epoch 2 Batch 1050/1540] avg loss 0.0132044, throughput 2.85214K wps
[Epoch 2 Batch 1080/1540] avg loss 0.0131556, throughput 2.87514K wps
[Epoch 2 Batch 1110/1540] avg loss 0.0131895, throughput 2.86857K wps
[Epoch 2 Batch 1140/1540] avg loss 0.0132053, throughput 2.86702K wps
[Epoch 2 Batch 1170/1540] avg loss 0.0130771, throughput 2.85977K wps
[Epoch 2 Batch 1200/1540] avg loss 0.0131146, throughput 2.83268K wps
[Epoch 2 Batch 1230/1540] avg loss 0.0132125, throughput 2.87218K wps
[Epoch 2 Batch 1260/1540] avg loss 0.0130669, throughput 2.87333K wps
[Epoch 2 Batch 1290/1540] avg loss 0.013026, throughput 2.87589K wps
[Epoch 2 Batch 1320/1540] avg loss 0.012992, throughput 2.87523K wps
[Epoch 2 Batch 1350/1540] avg loss 0.0130996, throughput 2.87466K wps
[Epoch 2 Batch 1380/1540] avg loss 0.0131247, throughput 2.82573K wps
[Epoch 2 Batch 1410/1540] avg loss 0.0128969, throughput 2.85584K wps
[Epoch 2 Batch 1440/1540] avg loss 0.0130748, throughput 2.86088K wps
[Epoch 2 Batch 1470/1540] avg loss 0.0130485, throughput 2.87395K wps
[Epoch 2 Batch 1500/1540] avg loss 0.013037, throughput 2.86989K wps
[Epoch 2 Batch 1530/1540] avg loss 0.0130386, throughput 2.85385K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132523, dev acc 0.6800, dev avg loss 0.645965, throughput 2.85944K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 3 Batch 30/1540] avg loss 0.0129923, throughput 2.92239K wps
[Epoch 3 Batch 60/1540] avg loss 0.0130136, throughput 2.8446K wps
[Epoch 3 Batch 90/1540] avg loss 0.0130823, throughput 2.87191K wps
[Epoch 3 Batch 120/1540] avg loss 0.0131139, throughput 2.84983K wps
[Epoch 3 Batch 150/1540] avg loss 0.0128815, throughput 2.86898K wps
[Epoch 3 Batch 180/1540] avg loss 0.0129892, throughput 2.87856K wps
[Epoch 3 Batch 210/1540] avg loss 0.0129723, throughput 2.84054K wps
[Epoch 3 Batch 240/1540] avg loss 0.0130261, throughput 2.87275K wps
[Epoch 3 Batch 270/1540] avg loss 0.0130874, throughput 2.87694K wps
[Epoch 3 Batch 300/1540] avg loss 0.0129992, throughput 2.84739K wps
[Epoch 3 Batch 330/1540] avg loss 0.0127983, throughput 2.82931K wps
[Epoch 3 Batch 360/1540] avg loss 0.0128736, throughput 2.81006K wps
[Epoch 3 Batch 390/1540] avg loss 0.0126295, throughput 2.84396K wps
[Epoch 3 Batch 420/1540] avg loss 0.0129446, throughput 2.86718K wps
[Epoch 3 Batch 450/1540] avg loss 0.0129793, throughput 2.85866K wps
[Epoch 3 Batch 480/1540] avg loss 0.0128536, throughput 2.86042K wps
[Epoch 3 Batch 510/1540] avg loss 0.0128249, throughput 2.84524K wps
[Epoch 3 Batch 540/1540] avg loss 0.0128234, throughput 2.87389K wps
[Epoch 3 Batch 570/1540] avg loss 0.0125954, throughput 2.86267K wps
[Epoch 3 Batch 600/1540] avg loss 0.0128308, throughput 2.83533K wps
[Epoch 3 Batch 630/1540] avg loss 0.0129443, throughput 2.88006K wps
[Epoch 3 Batch 660/1540] avg loss 0.0127793, throughput 2.87001K wps
[Epoch 3 Batch 690/1540] avg loss 0.012889, throughput 2.87715K wps
[Epoch 3 Batch 720/1540] avg loss 0.0126644, throughput 2.85958K wps
[Epoch 3 Batch 750/1540] avg loss 0.0127483, throughput 2.84097K wps
[Epoch 3 Batch 780/1540] avg loss 0.0126084, throughput 2.83583K wps
[Epoch 3 Batch 810/1540] avg loss 0.0126001, throughput 2.86778K wps
[Epoch 3 Batch 840/1540] avg loss 0.0126695, throughput 2.85772K wps
[Epoch 3 Batch 870/1540] avg loss 0.0125709, throughput 2.87201K wps
[Epoch 3 Batch 900/1540] avg loss 0.0126231, throughput 2.82978K wps
[Epoch 3 Batch 930/1540] avg loss 0.0126425, throughput 2.85174K wps
[Epoch 3 Batch 960/1540] avg loss 0.0126107, throughput 2.80688K wps
[Epoch 3 Batch 990/1540] avg loss 0.0125201, throughput 2.85561K wps
[Epoch 3 Batch 1020/1540] avg loss 0.0127235, throughput 2.86877K wps
[Epoch 3 Batch 1050/1540] avg loss 0.0125715, throughput 2.84838K wps
[Epoch 3 Batch 1080/1540] avg loss 0.0124887, throughput 2.86394K wps
[Epoch 3 Batch 1110/1540] avg loss 0.0127491, throughput 2.83163K wps
[Epoch 3 Batch 1140/1540] avg loss 0.0124604, throughput 2.8722K wps
[Epoch 3 Batch 1170/1540] avg loss 0.0124426, throughput 2.83278K wps
[Epoch 3 Batch 1200/1540] avg loss 0.0125199, throughput 2.78996K wps
[Epoch 3 Batch 1230/1540] avg loss 0.0125532, throughput 2.84109K wps
[Epoch 3 Batch 1260/1540] avg loss 0.0124685, throughput 2.83328K wps
[Epoch 3 Batch 1290/1540] avg loss 0.0124559, throughput 2.86705K wps
[Epoch 3 Batch 1320/1540] avg loss 0.0125179, throughput 2.8554K wps
[Epoch 3 Batch 1350/1540] avg loss 0.0122735, throughput 2.87615K wps
[Epoch 3 Batch 1380/1540] avg loss 0.0125621, throughput 2.87624K wps
[Epoch 3 Batch 1410/1540] avg loss 0.0127375, throughput 2.87155K wps
[Epoch 3 Batch 1440/1540] avg loss 0.0122948, throughput 2.86514K wps
[Epoch 3 Batch 1470/1540] avg loss 0.0125175, throughput 2.83488K wps
[Epoch 3 Batch 1500/1540] avg loss 0.0123382, throughput 2.86703K wps
[Epoch 3 Batch 1530/1540] avg loss 0.012455, throughput 2.86083K wps
Begin Testing...
[Epoch 3] train avg loss 0.012718, dev acc 0.6755, dev avg loss 0.60974, throughput 2.85526K wps
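Epoch 3 is the first epoch without an "Observed Improvement." line. Its dev loss still fell (0.645965 to 0.60974); only dev accuracy dropped (0.6800 to 0.6755). That pattern suggests the script tracks the best dev accuracy, not dev loss, and re-runs the test set only when that accuracy improves. The sketch below replays that rule on the dev accuracies of epochs 0 to 3. The function name is hypothetical, and the strict `>` comparison is an assumption, since the log cannot distinguish it from `>=`.

```python
def improvement_flags(dev_accs):
    """Return, per epoch, whether dev accuracy beat the best value seen so far."""
    best = float("-inf")
    flags = []
    for acc in dev_accs:
        improved = acc > best  # strict improvement on dev accuracy, not dev loss
        if improved:
            best = acc  # presumably where the real script saves the model and runs the test set
        flags.append(improved)
    return flags


# Dev accuracies from the [Epoch 0] .. [Epoch 3] summary lines.
print(improvement_flags([0.5596, 0.6307, 0.6800, 0.6755]))  # [True, True, True, False]
```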
[Epoch 4 Batch 30/1540] avg loss 0.0125572, throughput 2.92357K wps
[Epoch 4 Batch 60/1540] avg loss 0.0122396, throughput 2.86992K wps
[Epoch 4 Batch 90/1540] avg loss 0.0123901, throughput 2.85577K wps
[Epoch 4 Batch 120/1540] avg loss 0.012153, throughput 2.87528K wps
[Epoch 4 Batch 150/1540] avg loss 0.0123142, throughput 2.86424K wps
[Epoch 4 Batch 180/1540] avg loss 0.0122087, throughput 2.83155K wps
[Epoch 4 Batch 210/1540] avg loss 0.0120872, throughput 2.87105K wps
[Epoch 4 Batch 240/1540] avg loss 0.0121988, throughput 2.87384K wps
[Epoch 4 Batch 270/1540] avg loss 0.0121247, throughput 2.83189K wps
[Epoch 4 Batch 300/1540] avg loss 0.012065, throughput 2.8535K wps
[Epoch 4 Batch 330/1540] avg loss 0.0120954, throughput 2.82428K wps
[Epoch 4 Batch 360/1540] avg loss 0.0123357, throughput 2.84309K wps
[Epoch 4 Batch 390/1540] avg loss 0.0121202, throughput 2.84239K wps
[Epoch 4 Batch 420/1540] avg loss 0.0122495, throughput 2.87225K wps
[Epoch 4 Batch 450/1540] avg loss 0.0121583, throughput 2.85514K wps
[Epoch 4 Batch 480/1540] avg loss 0.0119734, throughput 2.85955K wps
[Epoch 4 Batch 510/1540] avg loss 0.0120053, throughput 2.86879K wps
[Epoch 4 Batch 540/1540] avg loss 0.0120942, throughput 2.85683K wps
[Epoch 4 Batch 570/1540] avg loss 0.0122327, throughput 2.85976K wps
[Epoch 4 Batch 600/1540] avg loss 0.0118178, throughput 2.84185K wps
[Epoch 4 Batch 630/1540] avg loss 0.0119157, throughput 2.79804K wps
[Epoch 4 Batch 660/1540] avg loss 0.0118483, throughput 2.85401K wps
[Epoch 4 Batch 690/1540] avg loss 0.0117789, throughput 2.86022K wps
[Epoch 4 Batch 720/1540] avg loss 0.011994, throughput 2.84566K wps
[Epoch 4 Batch 750/1540] avg loss 0.0117548, throughput 2.77875K wps
[Epoch 4 Batch 780/1540] avg loss 0.0116677, throughput 2.84742K wps
[Epoch 4 Batch 810/1540] avg loss 0.0116204, throughput 2.87215K wps
[Epoch 4 Batch 840/1540] avg loss 0.0119183, throughput 2.88186K wps
[Epoch 4 Batch 870/1540] avg loss 0.0118464, throughput 2.87139K wps
[Epoch 4 Batch 900/1540] avg loss 0.0118322, throughput 2.81555K wps
[Epoch 4 Batch 930/1540] avg loss 0.011724, throughput 2.86834K wps
[Epoch 4 Batch 960/1540] avg loss 0.0121262, throughput 2.87378K wps
[Epoch 4 Batch 990/1540] avg loss 0.0119363, throughput 2.86169K wps
[Epoch 4 Batch 1020/1540] avg loss 0.0118567, throughput 2.87693K wps
[Epoch 4 Batch 1050/1540] avg loss 0.0117349, throughput 2.80259K wps
[Epoch 4 Batch 1080/1540] avg loss 0.0113797, throughput 2.85144K wps
[Epoch 4 Batch 1110/1540] avg loss 0.011804, throughput 2.8186K wps
[Epoch 4 Batch 1140/1540] avg loss 0.0115873, throughput 2.86864K wps
[Epoch 4 Batch 1170/1540] avg loss 0.0117139, throughput 2.85465K wps
[Epoch 4 Batch 1200/1540] avg loss 0.0116297, throughput 2.86959K wps
[Epoch 4 Batch 1230/1540] avg loss 0.0116782, throughput 2.85461K wps
[Epoch 4 Batch 1260/1540] avg loss 0.0113475, throughput 2.846K wps
[Epoch 4 Batch 1290/1540] avg loss 0.0114436, throughput 2.864K wps
[Epoch 4 Batch 1320/1540] avg loss 0.011606, throughput 2.87069K wps
[Epoch 4 Batch 1350/1540] avg loss 0.0113964, throughput 2.85623K wps
[Epoch 4 Batch 1380/1540] avg loss 0.0113831, throughput 2.84508K wps
[Epoch 4 Batch 1410/1540] avg loss 0.0115043, throughput 2.87117K wps
[Epoch 4 Batch 1440/1540] avg loss 0.0114194, throughput 2.86567K wps
[Epoch 4 Batch 1470/1540] avg loss 0.0115309, throughput 2.82584K wps
[Epoch 4 Batch 1500/1540] avg loss 0.011224, throughput 2.78014K wps
[Epoch 4 Batch 1530/1540] avg loss 0.0115118, throughput 2.81791K wps
Begin Testing...
[Epoch 4] train avg loss 0.0118697, dev acc 0.7328, dev avg loss 0.563926, throughput 2.85161K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 5 Batch 30/1540] avg loss 0.0112126, throughput 2.89362K wps
[Epoch 5 Batch 60/1540] avg loss 0.0112174, throughput 2.84711K wps
[Epoch 5 Batch 90/1540] avg loss 0.0111179, throughput 2.77902K wps
[Epoch 5 Batch 120/1540] avg loss 0.0111249, throughput 2.81623K wps
[Epoch 5 Batch 150/1540] avg loss 0.010953, throughput 2.86461K wps
[Epoch 5 Batch 180/1540] avg loss 0.0110536, throughput 2.83347K wps
[Epoch 5 Batch 210/1540] avg loss 0.0108626, throughput 2.85891K wps
[Epoch 5 Batch 240/1540] avg loss 0.0108945, throughput 2.82328K wps
[Epoch 5 Batch 270/1540] avg loss 0.0109154, throughput 2.83112K wps
[Epoch 5 Batch 300/1540] avg loss 0.0109706, throughput 2.81987K wps
[Epoch 5 Batch 330/1540] avg loss 0.0110977, throughput 2.87863K wps
[Epoch 5 Batch 360/1540] avg loss 0.0109776, throughput 2.87388K wps
[Epoch 5 Batch 390/1540] avg loss 0.0111206, throughput 2.86252K wps
[Epoch 5 Batch 420/1540] avg loss 0.0108639, throughput 2.80473K wps
[Epoch 5 Batch 450/1540] avg loss 0.0111617, throughput 2.83524K wps
[Epoch 5 Batch 480/1540] avg loss 0.0108822, throughput 2.87329K wps
[Epoch 5 Batch 510/1540] avg loss 0.0109191, throughput 2.86906K wps
[Epoch 5 Batch 540/1540] avg loss 0.0107287, throughput 2.86481K wps
[Epoch 5 Batch 570/1540] avg loss 0.0106061, throughput 2.8623K wps
[Epoch 5 Batch 600/1540] avg loss 0.0109577, throughput 2.85949K wps
[Epoch 5 Batch 630/1540] avg loss 0.0108638, throughput 2.86017K wps
[Epoch 5 Batch 660/1540] avg loss 0.0108706, throughput 2.87993K wps
[Epoch 5 Batch 690/1540] avg loss 0.0108266, throughput 2.80922K wps
[Epoch 5 Batch 720/1540] avg loss 0.0107844, throughput 2.87708K wps
[Epoch 5 Batch 750/1540] avg loss 0.010623, throughput 2.88049K wps
[Epoch 5 Batch 780/1540] avg loss 0.010808, throughput 2.87552K wps
[Epoch 5 Batch 810/1540] avg loss 0.0105911, throughput 2.87466K wps
[Epoch 5 Batch 840/1540] avg loss 0.0104744, throughput 2.8703K wps
[Epoch 5 Batch 870/1540] avg loss 0.0105409, throughput 2.87683K wps
[Epoch 5 Batch 900/1540] avg loss 0.0104029, throughput 2.82022K wps
[Epoch 5 Batch 930/1540] avg loss 0.0107438, throughput 2.87093K wps
[Epoch 5 Batch 960/1540] avg loss 0.0104448, throughput 2.85399K wps
[Epoch 5 Batch 990/1540] avg loss 0.0106657, throughput 2.80573K wps
[Epoch 5 Batch 1020/1540] avg loss 0.0101842, throughput 2.81487K wps
[Epoch 5 Batch 1050/1540] avg loss 0.0105361, throughput 2.83669K wps
[Epoch 5 Batch 1080/1540] avg loss 0.0102072, throughput 2.85008K wps
[Epoch 5 Batch 1110/1540] avg loss 0.010488, throughput 2.8502K wps
[Epoch 5 Batch 1140/1540] avg loss 0.0103501, throughput 2.79995K wps
[Epoch 5 Batch 1170/1540] avg loss 0.0101048, throughput 2.79695K wps
[Epoch 5 Batch 1200/1540] avg loss 0.0100308, throughput 2.80336K wps
[Epoch 5 Batch 1230/1540] avg loss 0.0103184, throughput 2.83569K wps
[Epoch 5 Batch 1260/1540] avg loss 0.0103771, throughput 2.85997K wps
[Epoch 5 Batch 1290/1540] avg loss 0.0105557, throughput 2.84893K wps
[Epoch 5 Batch 1320/1540] avg loss 0.0102983, throughput 2.83776K wps
[Epoch 5 Batch 1350/1540] avg loss 0.0100994, throughput 2.85406K wps
[Epoch 5 Batch 1380/1540] avg loss 0.0100077, throughput 2.86905K wps
[Epoch 5 Batch 1410/1540] avg loss 0.00984836, throughput 2.8778K wps
[Epoch 5 Batch 1440/1540] avg loss 0.0102557, throughput 2.84244K wps
[Epoch 5 Batch 1470/1540] avg loss 0.00991223, throughput 2.79344K wps
[Epoch 5 Batch 1500/1540] avg loss 0.00954781, throughput 2.77435K wps
[Epoch 5 Batch 1530/1540] avg loss 0.0100746, throughput 2.86985K wps
Begin Testing...
[Epoch 5] train avg loss 0.0106185, dev acc 0.7787, dev avg loss 0.519779, throughput 2.84537K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 6 Batch 30/1540] avg loss 0.0098245, throughput 2.88896K wps
[Epoch 6 Batch 60/1540] avg loss 0.00962132, throughput 2.84569K wps
[Epoch 6 Batch 90/1540] avg loss 0.0100202, throughput 2.84797K wps
[Epoch 6 Batch 120/1540] avg loss 0.00959598, throughput 2.85914K wps
[Epoch 6 Batch 150/1540] avg loss 0.00988194, throughput 2.88108K wps
[Epoch 6 Batch 180/1540] avg loss 0.0096392, throughput 2.85567K wps
[Epoch 6 Batch 210/1540] avg loss 0.00951429, throughput 2.81937K wps
[Epoch 6 Batch 240/1540] avg loss 0.00972758, throughput 2.84235K wps
[Epoch 6 Batch 270/1540] avg loss 0.00959005, throughput 2.81022K wps
[Epoch 6 Batch 300/1540] avg loss 0.00984259, throughput 2.85913K wps
[Epoch 6 Batch 330/1540] avg loss 0.0094906, throughput 2.87195K wps
[Epoch 6 Batch 360/1540] avg loss 0.00961345, throughput 2.83716K wps
[Epoch 6 Batch 390/1540] avg loss 0.0096149, throughput 2.84824K wps
[Epoch 6 Batch 420/1540] avg loss 0.00944561, throughput 2.80174K wps
[Epoch 6 Batch 450/1540] avg loss 0.00941356, throughput 2.78631K wps
[Epoch 6 Batch 480/1540] avg loss 0.00941492, throughput 2.79546K wps
[Epoch 6 Batch 510/1540] avg loss 0.00898626, throughput 2.81181K wps
[Epoch 6 Batch 540/1540] avg loss 0.00908801, throughput 2.86957K wps
[Epoch 6 Batch 570/1540] avg loss 0.00921519, throughput 2.85765K wps
[Epoch 6 Batch 600/1540] avg loss 0.00905014, throughput 2.85542K wps
[Epoch 6 Batch 630/1540] avg loss 0.00946734, throughput 2.86637K wps
[Epoch 6 Batch 660/1540] avg loss 0.00939207, throughput 2.8701K wps
[Epoch 6 Batch 690/1540] avg loss 0.00924371, throughput 2.84635K wps
[Epoch 6 Batch 720/1540] avg loss 0.00938426, throughput 2.86549K wps
[Epoch 6 Batch 750/1540] avg loss 0.00933438, throughput 2.87202K wps
[Epoch 6 Batch 780/1540] avg loss 0.00915644, throughput 2.84552K wps
[Epoch 6 Batch 810/1540] avg loss 0.00931434, throughput 2.87413K wps
[Epoch 6 Batch 840/1540] avg loss 0.00927732, throughput 2.87069K wps
[Epoch 6 Batch 870/1540] avg loss 0.00948765, throughput 2.83603K wps
[Epoch 6 Batch 900/1540] avg loss 0.00919847, throughput 2.81271K wps
[Epoch 6 Batch 930/1540] avg loss 0.00885069, throughput 2.78791K wps
[Epoch 6 Batch 960/1540] avg loss 0.00918094, throughput 2.78323K wps
[Epoch 6 Batch 990/1540] avg loss 0.00878389, throughput 2.76527K wps
[Epoch 6 Batch 1020/1540] avg loss 0.00903069, throughput 2.79435K wps
[Epoch 6 Batch 1050/1540] avg loss 0.00894816, throughput 2.81653K wps
[Epoch 6 Batch 1080/1540] avg loss 0.00915588, throughput 2.76909K wps
[Epoch 6 Batch 1110/1540] avg loss 0.0089336, throughput 2.80962K wps
[Epoch 6 Batch 1140/1540] avg loss 0.00915533, throughput 2.8671K wps
[Epoch 6 Batch 1170/1540] avg loss 0.00866029, throughput 2.85805K wps
[Epoch 6 Batch 1200/1540] avg loss 0.00894738, throughput 2.85913K wps
[Epoch 6 Batch 1230/1540] avg loss 0.00877337, throughput 2.86589K wps
[Epoch 6 Batch 1260/1540] avg loss 0.00891224, throughput 2.80636K wps
[Epoch 6 Batch 1290/1540] avg loss 0.00906667, throughput 2.83283K wps
[Epoch 6 Batch 1320/1540] avg loss 0.00862772, throughput 2.85331K wps
[Epoch 6 Batch 1350/1540] avg loss 0.00891718, throughput 2.81811K wps
[Epoch 6 Batch 1380/1540] avg loss 0.00897235, throughput 2.86313K wps
[Epoch 6 Batch 1410/1540] avg loss 0.00842482, throughput 2.86556K wps
[Epoch 6 Batch 1440/1540] avg loss 0.0087864, throughput 2.86064K wps
[Epoch 6 Batch 1470/1540] avg loss 0.00869268, throughput 2.86509K wps
[Epoch 6 Batch 1500/1540] avg loss 0.00844809, throughput 2.83551K wps
[Epoch 6 Batch 1530/1540] avg loss 0.00866327, throughput 2.80195K wps
Begin Testing...
[Epoch 6] train avg loss 0.00920906, dev acc 0.7970, dev avg loss 0.460756, throughput 2.83874K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 7 Batch 30/1540] avg loss 0.00825991, throughput 2.89203K wps
[Epoch 7 Batch 60/1540] avg loss 0.00881551, throughput 2.825K wps
[Epoch 7 Batch 90/1540] avg loss 0.00833516, throughput 2.85913K wps
[Epoch 7 Batch 120/1540] avg loss 0.00806622, throughput 2.85202K wps
[Epoch 7 Batch 150/1540] avg loss 0.00843859, throughput 2.83694K wps
[Epoch 7 Batch 180/1540] avg loss 0.00834409, throughput 2.84418K wps
[Epoch 7 Batch 210/1540] avg loss 0.00839169, throughput 2.86917K wps
[Epoch 7 Batch 240/1540] avg loss 0.00828628, throughput 2.85639K wps
[Epoch 7 Batch 270/1540] avg loss 0.00829326, throughput 2.84642K wps
[Epoch 7 Batch 300/1540] avg loss 0.00826126, throughput 2.86427K wps
[Epoch 7 Batch 330/1540] avg loss 0.00788044, throughput 2.85554K wps
[Epoch 7 Batch 360/1540] avg loss 0.00816125, throughput 2.8748K wps
[Epoch 7 Batch 390/1540] avg loss 0.00798977, throughput 2.87324K wps
[Epoch 7 Batch 420/1540] avg loss 0.00834245, throughput 2.8599K wps
[Epoch 7 Batch 450/1540] avg loss 0.0081372, throughput 2.87742K wps
[Epoch 7 Batch 480/1540] avg loss 0.00785917, throughput 2.87735K wps
[Epoch 7 Batch 510/1540] avg loss 0.00822373, throughput 2.87238K wps
[Epoch 7 Batch 540/1540] avg loss 0.00817121, throughput 2.80606K wps
[Epoch 7 Batch 570/1540] avg loss 0.00813353, throughput 2.84701K wps
[Epoch 7 Batch 600/1540] avg loss 0.00797193, throughput 2.87978K wps
[Epoch 7 Batch 630/1540] avg loss 0.00791665, throughput 2.87629K wps
[Epoch 7 Batch 660/1540] avg loss 0.0081805, throughput 2.83898K wps
[Epoch 7 Batch 690/1540] avg loss 0.00804365, throughput 2.80083K wps
[Epoch 7 Batch 720/1540] avg loss 0.00820712, throughput 2.78857K wps
[Epoch 7 Batch 750/1540] avg loss 0.00820856, throughput 2.8715K wps
[Epoch 7 Batch 780/1540] avg loss 0.00800739, throughput 2.87654K wps
[Epoch 7 Batch 810/1540] avg loss 0.00797157, throughput 2.87081K wps
[Epoch 7 Batch 840/1540] avg loss 0.0077511, throughput 2.87122K wps
[Epoch 7 Batch 870/1540] avg loss 0.00811865, throughput 2.87788K wps
[Epoch 7 Batch 900/1540] avg loss 0.00765177, throughput 2.84477K wps
[Epoch 7 Batch 930/1540] avg loss 0.00757489, throughput 2.80149K wps
[Epoch 7 Batch 960/1540] avg loss 0.00796375, throughput 2.83747K wps
[Epoch 7 Batch 990/1540] avg loss 0.00761744, throughput 2.86349K wps
[Epoch 7 Batch 1020/1540] avg loss 0.00782431, throughput 2.87655K wps
[Epoch 7 Batch 1050/1540] avg loss 0.00752091, throughput 2.8687K wps
[Epoch 7 Batch 1080/1540] avg loss 0.00769699, throughput 2.85456K wps
[Epoch 7 Batch 1110/1540] avg loss 0.00732822, throughput 2.87136K wps
[Epoch 7 Batch 1140/1540] avg loss 0.00779556, throughput 2.84986K wps
[Epoch 7 Batch 1170/1540] avg loss 0.00795337, throughput 2.84892K wps
[Epoch 7 Batch 1200/1540] avg loss 0.00762326, throughput 2.87471K wps
[Epoch 7 Batch 1230/1540] avg loss 0.00788853, throughput 2.85078K wps
[Epoch 7 Batch 1260/1540] avg loss 0.0078197, throughput 2.83619K wps
[Epoch 7 Batch 1290/1540] avg loss 0.00777646, throughput 2.78379K wps
[Epoch 7 Batch 1320/1540] avg loss 0.00789422, throughput 2.84311K wps
[Epoch 7 Batch 1350/1540] avg loss 0.00731866, throughput 2.86707K wps
[Epoch 7 Batch 1380/1540] avg loss 0.00752436, throughput 2.87333K wps
[Epoch 7 Batch 1410/1540] avg loss 0.00738245, throughput 2.87458K wps
[Epoch 7 Batch 1440/1540] avg loss 0.00764089, throughput 2.84285K wps
[Epoch 7 Batch 1470/1540] avg loss 0.00760671, throughput 2.86642K wps
[Epoch 7 Batch 1500/1540] avg loss 0.00744158, throughput 2.86612K wps
[Epoch 7 Batch 1530/1540] avg loss 0.00759741, throughput 2.87982K wps
Begin Testing...
[Epoch 7] train avg loss 0.00794728, dev acc 0.8005, dev avg loss 0.437719, throughput 2.85496K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
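A regular expression can pull the epoch summary lines out of this log, so the dev-accuracy trend can be followed without the per-batch lines. This is a minimal sketch. It assumes every summary line has exactly the shape of the `[Epoch N] train avg loss ...` lines above, with plain decimal numbers and no scientific notation. The function and field names are illustrative.

```python
import re

SUMMARY = re.compile(
    r"\[Epoch (?P<epoch>\d+)\] train avg loss (?P<train_loss>[\d.]+), "
    r"dev acc (?P<dev_acc>[\d.]+), dev avg loss (?P<dev_loss>[\d.]+), "
    r"throughput (?P<kwps>[\d.]+)K wps"
)


def parse_summaries(lines):
    """Yield one dict per epoch summary line; batch and testing lines are skipped."""
    for line in lines:
        m = SUMMARY.search(line)
        if m:
            yield {
                "epoch": int(m["epoch"]),
                "train_loss": float(m["train_loss"]),
                "dev_acc": float(m["dev_acc"]),
                "dev_loss": float(m["dev_loss"]),
                "kwps": float(m["kwps"]),
            }


sample = """[Epoch 7 Batch 1530/1540] avg loss 0.00759741, throughput 2.87982K wps
Begin Testing...
[Epoch 7] train avg loss 0.00794728, dev acc 0.8005, dev avg loss 0.437719, throughput 2.85496K wps
Observed Improvement."""

for row in parse_summaries(sample.splitlines()):
    print(row)
```

Run over epochs 0 to 7 of this log, the rule would show dev accuracy rising from 0.5596 to 0.8005, with the single dip at epoch 3.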
[Epoch 8 Batch 30/1540] avg loss 0.00700426, throughput 2.89692K wps
[Epoch 8 Batch 60/1540] avg loss 0.0071169, throughput 2.87137K wps
[Epoch 8 Batch 90/1540] avg loss 0.00701109, throughput 2.85659K wps
[Epoch 8 Batch 120/1540] avg loss 0.00726235, throughput 2.86725K wps
[Epoch 8 Batch 150/1540] avg loss 0.00723402, throughput 2.83711K wps
[Epoch 8 Batch 180/1540] avg loss 0.00723113, throughput 2.87776K wps
[Epoch 8 Batch 210/1540] avg loss 0.00725831, throughput 2.87068K wps
[Epoch 8 Batch 240/1540] avg loss 0.00715983, throughput 2.86004K wps
[Epoch 8 Batch 270/1540] avg loss 0.00693161, throughput 2.86467K wps
[Epoch 8 Batch 300/1540] avg loss 0.00693836, throughput 2.87248K wps
[Epoch 8 Batch 330/1540] avg loss 0.0071374, throughput 2.86431K wps
[Epoch 8 Batch 360/1540] avg loss 0.0073231, throughput 2.84465K wps
[Epoch 8 Batch 390/1540] avg loss 0.00678718, throughput 2.87325K wps
[Epoch 8 Batch 420/1540] avg loss 0.0070796, throughput 2.8738K wps
[Epoch 8 Batch 450/1540] avg loss 0.00725956, throughput 2.84717K wps
[Epoch 8 Batch 480/1540] avg loss 0.00720291, throughput 2.86924K wps
[Epoch 8 Batch 510/1540] avg loss 0.00717188, throughput 2.87498K wps
[Epoch 8 Batch 540/1540] avg loss 0.00691361, throughput 2.85918K wps
[Epoch 8 Batch 570/1540] avg loss 0.00734546, throughput 2.79205K wps
[Epoch 8 Batch 600/1540] avg loss 0.00704686, throughput 2.78737K wps
[Epoch 8 Batch 630/1540] avg loss 0.0071282, throughput 2.84455K wps
[Epoch 8 Batch 660/1540] avg loss 0.00688982, throughput 2.84754K wps
[Epoch 8 Batch 690/1540] avg loss 0.00730296, throughput 2.87633K wps
[Epoch 8 Batch 720/1540] avg loss 0.00687909, throughput 2.83198K wps
[Epoch 8 Batch 750/1540] avg loss 0.00730986, throughput 2.78896K wps
[Epoch 8 Batch 780/1540] avg loss 0.0070227, throughput 2.81495K wps
[Epoch 8 Batch 810/1540] avg loss 0.00707117, throughput 2.85642K wps
[Epoch 8 Batch 840/1540] avg loss 0.00674418, throughput 2.87021K wps
[Epoch 8 Batch 870/1540] avg loss 0.00707607, throughput 2.87595K wps
[Epoch 8 Batch 900/1540] avg loss 0.00675896, throughput 2.86891K wps
[Epoch 8 Batch 930/1540] avg loss 0.00669084, throughput 2.87824K wps
[Epoch 8 Batch 960/1540] avg loss 0.00637316, throughput 2.86867K wps
[Epoch 8 Batch 990/1540] avg loss 0.00726551, throughput 2.86205K wps
[Epoch 8 Batch 1020/1540] avg loss 0.00734874, throughput 2.78227K wps
[Epoch 8 Batch 1050/1540] avg loss 0.00646123, throughput 2.85748K wps
[Epoch 8 Batch 1080/1540] avg loss 0.0065004, throughput 2.87453K wps
[Epoch 8 Batch 1110/1540] avg loss 0.00704716, throughput 2.87549K wps
[Epoch 8 Batch 1140/1540] avg loss 0.00704922, throughput 2.87203K wps
[Epoch 8 Batch 1170/1540] avg loss 0.00695946, throughput 2.86819K wps
[Epoch 8 Batch 1200/1540] avg loss 0.00747787, throughput 2.85317K wps
[Epoch 8 Batch 1230/1540] avg loss 0.00705022, throughput 2.78019K wps
[Epoch 8 Batch 1260/1540] avg loss 0.00652246, throughput 2.81089K wps
[Epoch 8 Batch 1290/1540] avg loss 0.00654109, throughput 2.80301K wps
[Epoch 8 Batch 1320/1540] avg loss 0.00685872, throughput 2.88059K wps
[Epoch 8 Batch 1350/1540] avg loss 0.0066386, throughput 2.85119K wps
[Epoch 8 Batch 1380/1540] avg loss 0.00672991, throughput 2.87041K wps
[Epoch 8 Batch 1410/1540] avg loss 0.00642001, throughput 2.85193K wps
[Epoch 8 Batch 1440/1540] avg loss 0.00635343, throughput 2.8692K wps
[Epoch 8 Batch 1470/1540] avg loss 0.00715519, throughput 2.87862K wps
[Epoch 8 Batch 1500/1540] avg loss 0.00679608, throughput 2.86928K wps
[Epoch 8 Batch 1530/1540] avg loss 0.00666721, throughput 2.85659K wps
Begin Testing...
[Epoch 8] train avg loss 0.0069764, dev acc 0.8154, dev avg loss 0.419131, throughput 2.85368K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 9 Batch 30/1540] avg loss 0.00619578, throughput 2.88586K wps
[Epoch 9 Batch 60/1540] avg loss 0.00605075, throughput 2.86157K wps
[Epoch 9 Batch 90/1540] avg loss 0.00617871, throughput 2.86141K wps
[Epoch 9 Batch 120/1540] avg loss 0.00646453, throughput 2.8703K wps
[Epoch 9 Batch 150/1540] avg loss 0.00673407, throughput 2.87575K wps
[Epoch 9 Batch 180/1540] avg loss 0.00656423, throughput 2.83105K wps
[Epoch 9 Batch 210/1540] avg loss 0.00645156, throughput 2.87681K wps
[Epoch 9 Batch 240/1540] avg loss 0.00635163, throughput 2.85517K wps
[Epoch 9 Batch 270/1540] avg loss 0.00630716, throughput 2.84929K wps
[Epoch 9 Batch 300/1540] avg loss 0.00643651, throughput 2.84773K wps
[Epoch 9 Batch 330/1540] avg loss 0.00613593, throughput 2.86889K wps
[Epoch 9 Batch 360/1540] avg loss 0.00643492, throughput 2.87121K wps
[Epoch 9 Batch 390/1540] avg loss 0.00636353, throughput 2.80756K wps
[Epoch 9 Batch 420/1540] avg loss 0.0062237, throughput 2.84315K wps
[Epoch 9 Batch 450/1540] avg loss 0.00640715, throughput 2.873K wps
[Epoch 9 Batch 480/1540] avg loss 0.00652341, throughput 2.84392K wps
[Epoch 9 Batch 510/1540] avg loss 0.00642114, throughput 2.86765K wps
[Epoch 9 Batch 540/1540] avg loss 0.00618197, throughput 2.86862K wps
[Epoch 9 Batch 570/1540] avg loss 0.00655068, throughput 2.85865K wps
[Epoch 9 Batch 600/1540] avg loss 0.00626943, throughput 2.84815K wps
[Epoch 9 Batch 630/1540] avg loss 0.00605462, throughput 2.86374K wps
[Epoch 9 Batch 660/1540] avg loss 0.00659492, throughput 2.86841K wps
[Epoch 9 Batch 690/1540] avg loss 0.00614132, throughput 2.85061K wps
[Epoch 9 Batch 720/1540] avg loss 0.00642699, throughput 2.87782K wps
[Epoch 9 Batch 750/1540] avg loss 0.00595195, throughput 2.86046K wps
[Epoch 9 Batch 780/1540] avg loss 0.00627769, throughput 2.82313K wps
[Epoch 9 Batch 810/1540] avg loss 0.00628799, throughput 2.87666K wps
[Epoch 9 Batch 840/1540] avg loss 0.00583507, throughput 2.85026K wps
[Epoch 9 Batch 870/1540] avg loss 0.00636069, throughput 2.8064K wps
[Epoch 9 Batch 900/1540] avg loss 0.00613266, throughput 2.8593K wps
[Epoch 9 Batch 930/1540] avg loss 0.0065748, throughput 2.83976K wps
[Epoch 9 Batch 960/1540] avg loss 0.00600959, throughput 2.86833K wps
[Epoch 9 Batch 990/1540] avg loss 0.00654631, throughput 2.86886K wps
[Epoch 9 Batch 1020/1540] avg loss 0.00608594, throughput 2.85835K wps
[Epoch 9 Batch 1050/1540] avg loss 0.00638485, throughput 2.82088K wps
[Epoch 9 Batch 1080/1540] avg loss 0.00587679, throughput 2.84836K wps
[Epoch 9 Batch 1110/1540] avg loss 0.00592697, throughput 2.84355K wps
[Epoch 9 Batch 1140/1540] avg loss 0.00603676, throughput 2.861K wps
[Epoch 9 Batch 1170/1540] avg loss 0.00620328, throughput 2.80242K wps
[Epoch 9 Batch 1200/1540] avg loss 0.00596717, throughput 2.84354K wps
[Epoch 9 Batch 1230/1540] avg loss 0.00649963, throughput 2.85948K wps
[Epoch 9 Batch 1260/1540] avg loss 0.00661678, throughput 2.85709K wps
[Epoch 9 Batch 1290/1540] avg loss 0.00619676, throughput 2.85552K wps
[Epoch 9 Batch 1320/1540] avg loss 0.00608795, throughput 2.8648K wps
[Epoch 9 Batch 1350/1540] avg loss 0.00603408, throughput 2.86484K wps
[Epoch 9 Batch 1380/1540] avg loss 0.00601827, throughput 2.86878K wps
[Epoch 9 Batch 1410/1540] avg loss 0.00602206, throughput 2.87319K wps
[Epoch 9 Batch 1440/1540] avg loss 0.00625086, throughput 2.87469K wps
[Epoch 9 Batch 1470/1540] avg loss 0.00580507, throughput 2.85548K wps
[Epoch 9 Batch 1500/1540] avg loss 0.00612506, throughput 2.87051K wps
[Epoch 9 Batch 1530/1540] avg loss 0.00595437, throughput 2.82238K wps
Begin Testing...
[Epoch 9] train avg loss 0.00625529, dev acc 0.8085, dev avg loss 0.417866, throughput 2.8552K wps
[Epoch 10 Batch 30/1540] avg loss 0.00560278, throughput 2.86309K wps
[Epoch 10 Batch 60/1540] avg loss 0.00576539, throughput 2.7933K wps
[Epoch 10 Batch 90/1540] avg loss 0.00591446, throughput 2.80802K wps
[Epoch 10 Batch 120/1540] avg loss 0.00560873, throughput 2.81841K wps
[Epoch 10 Batch 150/1540] avg loss 0.00552855, throughput 2.83278K wps
[Epoch 10 Batch 180/1540] avg loss 0.00532143, throughput 2.85648K wps
[Epoch 10 Batch 210/1540] avg loss 0.00561921, throughput 2.85415K wps
[Epoch 10 Batch 240/1540] avg loss 0.00606827, throughput 2.87784K wps
[Epoch 10 Batch 270/1540] avg loss 0.00598494, throughput 2.824K wps
[Epoch 10 Batch 300/1540] avg loss 0.00540564, throughput 2.83927K wps
[Epoch 10 Batch 330/1540] avg loss 0.00539115, throughput 2.8692K wps
[Epoch 10 Batch 360/1540] avg loss 0.00580298, throughput 2.88198K wps
[Epoch 10 Batch 390/1540] avg loss 0.00586173, throughput 2.83944K wps
[Epoch 10 Batch 420/1540] avg loss 0.00576411, throughput 2.85458K wps
[Epoch 10 Batch 450/1540] avg loss 0.00568409, throughput 2.81329K wps
[Epoch 10 Batch 480/1540] avg loss 0.0059675, throughput 2.86029K wps
[Epoch 10 Batch 510/1540] avg loss 0.00548594, throughput 2.86642K wps
[Epoch 10 Batch 540/1540] avg loss 0.00594244, throughput 2.79066K wps
[Epoch 10 Batch 570/1540] avg loss 0.00568997, throughput 2.80611K wps
[Epoch 10 Batch 600/1540] avg loss 0.00594787, throughput 2.85608K wps
[Epoch 10 Batch 630/1540] avg loss 0.00653243, throughput 2.86725K wps
[Epoch 10 Batch 660/1540] avg loss 0.00568763, throughput 2.86858K wps
[Epoch 10 Batch 690/1540] avg loss 0.00546361, throughput 2.87145K wps
[Epoch 10 Batch 720/1540] avg loss 0.0057613, throughput 2.8699K wps
[Epoch 10 Batch 750/1540] avg loss 0.0055321, throughput 2.86078K wps
[Epoch 10 Batch 780/1540] avg loss 0.00585637, throughput 2.84849K wps
[Epoch 10 Batch 810/1540] avg loss 0.00540297, throughput 2.87501K wps
[Epoch 10 Batch 840/1540] avg loss 0.00549143, throughput 2.87859K wps
[Epoch 10 Batch 870/1540] avg loss 0.00587356, throughput 2.85276K wps
[Epoch 10 Batch 900/1540] avg loss 0.00607192, throughput 2.83686K wps
[Epoch 10 Batch 930/1540] avg loss 0.00589352, throughput 2.86395K wps
[Epoch 10 Batch 960/1540] avg loss 0.00573896, throughput 2.87819K wps
[Epoch 10 Batch 990/1540] avg loss 0.0059876, throughput 2.87876K wps
[Epoch 10 Batch 1020/1540] avg loss 0.00566614, throughput 2.8719K wps
[Epoch 10 Batch 1050/1540] avg loss 0.00580399, throughput 2.85761K wps
[Epoch 10 Batch 1080/1540] avg loss 0.00627715, throughput 2.85497K wps
[Epoch 10 Batch 1110/1540] avg loss 0.00526307, throughput 2.8603K wps
[Epoch 10 Batch 1140/1540] avg loss 0.00545778, throughput 2.8599K wps
[Epoch 10 Batch 1170/1540] avg loss 0.00545641, throughput 2.87173K wps
[Epoch 10 Batch 1200/1540] avg loss 0.00546712, throughput 2.87661K wps
[Epoch 10 Batch 1230/1540] avg loss 0.00576475, throughput 2.84398K wps
[Epoch 10 Batch 1260/1540] avg loss 0.00554865, throughput 2.84947K wps
[Epoch 10 Batch 1290/1540] avg loss 0.00613281, throughput 2.86158K wps
[Epoch 10 Batch 1320/1540] avg loss 0.00559378, throughput 2.85915K wps
[Epoch 10 Batch 1350/1540] avg loss 0.00519544, throughput 2.8594K wps
[Epoch 10 Batch 1380/1540] avg loss 0.00518721, throughput 2.85886K wps
[Epoch 10 Batch 1410/1540] avg loss 0.00532817, throughput 2.86864K wps
[Epoch 10 Batch 1440/1540] avg loss 0.00539702, throughput 2.87141K wps
[Epoch 10 Batch 1470/1540] avg loss 0.00548314, throughput 2.86375K wps
[Epoch 10 Batch 1500/1540] avg loss 0.00576672, throughput 2.85622K wps
[Epoch 10 Batch 1530/1540] avg loss 0.00564908, throughput 2.85837K wps
Begin Testing...
[Epoch 10] train avg loss 0.00568935, dev acc 0.8268, dev avg loss 0.417476, throughput 2.85396K wps
Observed Improvement.
Begin Testing...
[Batch 30/37] elapsed 0.12 s
[Epoch 11 Batch 30/1540] avg loss 0.00529954, throughput 2.92189K wps
[Epoch 11 Batch 60/1540] avg loss 0.00544322, throughput 2.86505K wps
[Epoch 11 Batch 90/1540] avg loss 0.00502778, throughput 2.8135K wps
[Epoch 11 Batch 120/1540] avg loss 0.0053774, throughput 2.86688K wps
[Epoch 11 Batch 150/1540] avg loss 0.0054993, throughput 2.85092K wps
[Epoch 11 Batch 180/1540] avg loss 0.00511095, throughput 2.87244K wps
[Epoch 11 Batch 210/1540] avg loss 0.00577389, throughput 2.82656K wps
[Epoch 11 Batch 240/1540] avg loss 0.00503056, throughput 2.87752K wps
[Epoch 11 Batch 270/1540] avg loss 0.00527163, throughput 2.87603K wps
[Epoch 11 Batch 300/1540] avg loss 0.00486225, throughput 2.86389K wps
[Epoch 11 Batch 330/1540] avg loss 0.0053856, throughput 2.83982K wps
[Epoch 11 Batch 360/1540] avg loss 0.00505071, throughput 2.87728K wps
[Epoch 11 Batch 390/1540] avg loss 0.00544922, throughput 2.87923K wps
[Epoch 11 Batch 420/1540] avg loss 0.00586109, throughput 2.85331K wps
[Epoch 11 Batch 450/1540] avg loss 0.00548735, throughput 2.84435K wps
[Epoch 11 Batch 480/1540] avg loss 0.00539436, throughput 2.81618K wps
[Epoch 11 Batch 510/1540] avg loss 0.00549712, throughput 2.80748K wps
[Epoch 11 Batch 540/1540] avg loss 0.00525779, throughput 2.82319K wps
[Epoch 11 Batch 570/1540] avg loss 0.0049041, throughput 2.80543K wps
[Epoch 11 Batch 600/1540] avg loss 0.00467523, throughput 2.8282K wps
[Epoch 11 Batch 630/1540] avg loss 0.00520027, throughput 2.86484K wps
[Epoch 11 Batch 660/1540] avg loss 0.00535244, throughput 2.87777K wps
[Epoch 11 Batch 690/1540] avg loss 0.0050197, throughput 2.84967K wps
[Epoch 11 Batch 720/1540] avg loss 0.00491594, throughput 2.87223K wps
[Epoch 11 Batch 750/1540] avg loss 0.00531045, throughput 2.82622K wps
[Epoch 11 Batch 780/1540] avg loss 0.00504664, throughput 2.8625K wps
[Epoch 11 Batch 810/1540] avg loss 0.00499072, throughput 2.84293K wps
[Epoch 11 Batch 840/1540] avg loss 0.00484892, throughput 2.84938K wps
[Epoch 11 Batch 870/1540] avg loss 0.00493278, throughput 2.86382K wps
[Epoch 11 Batch 900/1540] avg loss 0.00548666, throughput 2.86781K wps
[Epoch 11 Batch 930/1540] avg loss 0.0053415, throughput 2.86029K wps
[Epoch 11 Batch 960/1540] avg loss 0.00524888, throughput 2.81201K wps
[Epoch 11 Batch 990/1540] avg loss 0.00536742, throughput 2.85772K wps
[Epoch 11 Batch 1020/1540] avg loss 0.00525738, throughput 2.82844K wps
[Epoch 11 Batch 1050/1540] avg loss 0.00481887, throughput 2.87807K wps
[Epoch 11 Batch 1080/1540] avg loss 0.00501868, throughput 2.80777K wps
[Epoch 11 Batch 1110/1540] avg loss 0.00503706, throughput 2.85932K wps
[Epoch 11 Batch 1140/1540] avg loss 0.00531201, throughput 2.86339K wps
[Epoch 11 Batch 1170/1540] avg loss 0.00537489, throughput 2.83131K wps
[Epoch 11 Batch 1200/1540] avg loss 0.00524029, throughput 2.788K wps
[Epoch 11 Batch 1230/1540] avg loss 0.00489996, throughput 2.82875K wps
[Epoch 11 Batch 1260/1540] avg loss 0.00506119, throughput 2.80727K wps
[Epoch 11 Batch 1290/1540] avg loss 0.00501068, throughput 2.80065K wps
[Epoch 11 Batch 1320/1540] avg loss 0.00530482, throughput 2.87433K wps
[Epoch 11 Batch 1350/1540] avg loss 0.00560463, throughput 2.85454K wps
[Epoch 11 Batch 1380/1540] avg loss 0.0053076, throughput 2.85734K wps
[Epoch 11 Batch 1410/1540] avg loss 0.00528334, throughput 2.86835K wps
[Epoch 11 Batch 1440/1540] avg loss 0.00551849, throughput 2.87039K wps
[Epoch 11 Batch 1470/1540] avg loss 0.0052075, throughput 2.86872K wps
[Epoch 11 Batch 1500/1540] avg loss 0.00519176, throughput 2.86183K wps
[Epoch 11 Batch 1530/1540] avg loss 0.00525325, throughput 2.86227K wps
Begin Testing...
[Epoch 11] train avg loss 0.00522733, dev acc 0.8154, dev avg loss 0.41487, throughput 2.84941K wps
[Epoch 12 Batch 30/1540] avg loss 0.00459408, throughput 2.85194K wps
[Epoch 12 Batch 60/1540] avg loss 0.00491037, throughput 2.86956K wps
[Epoch 12 Batch 90/1540] avg loss 0.00524657, throughput 2.8686K wps
[Epoch 12 Batch 120/1540] avg loss 0.00539493, throughput 2.85217K wps
[Epoch 12 Batch 150/1540] avg loss 0.00507731, throughput 2.8672K wps
[Epoch 12 Batch 180/1540] avg loss 0.00448498, throughput 2.86041K wps
[Epoch 12 Batch 210/1540] avg loss 0.00459914, throughput 2.85766K wps
[Epoch 12 Batch 240/1540] avg loss 0.00471152, throughput 2.7843K wps
[Epoch 12 Batch 270/1540] avg loss 0.00501468, throughput 2.79973K wps
[Epoch 12 Batch 300/1540] avg loss 0.00484297, throughput 2.87177K wps
[Epoch 12 Batch 330/1540] avg loss 0.00501018, throughput 2.79047K wps
[Epoch 12 Batch 360/1540] avg loss 0.00471138, throughput 2.87457K wps
[Epoch 12 Batch 390/1540] avg loss 0.0049432, throughput 2.87765K wps
[Epoch 12 Batch 420/1540] avg loss 0.00490452, throughput 2.87606K wps
[Epoch 12 Batch 450/1540] avg loss 0.00516997, throughput 2.84712K wps
[Epoch 12 Batch 480/1540] avg loss 0.00509852, throughput 2.84603K wps
[Epoch 12 Batch 510/1540] avg loss 0.00436913, throughput 2.82328K wps
[Epoch 12 Batch 540/1540] avg loss 0.00468779, throughput 2.8765K wps
[Epoch 12 Batch 570/1540] avg loss 0.00502387, throughput 2.83461K wps
[Epoch 12 Batch 600/1540] avg loss 0.0047895, throughput 2.79215K wps
[Epoch 12 Batch 630/1540] avg loss 0.00513222, throughput 2.85132K wps
[Epoch 12 Batch 660/1540] avg loss 0.00509643, throughput 2.84477K wps
[Epoch 12 Batch 690/1540] avg loss 0.00473696, throughput 2.8722K wps
[Epoch 12 Batch 720/1540] avg loss 0.00505605, throughput 2.81686K wps
[Epoch 12 Batch 750/1540] avg loss 0.00457419, throughput 2.87451K wps
[Epoch 12 Batch 780/1540] avg loss 0.0046172, throughput 2.85142K wps
[Epoch 12 Batch 810/1540] avg loss 0.00483807, throughput 2.8754K wps
[Epoch 12 Batch 840/1540] avg loss 0.00514774, throughput 2.82651K wps
[Epoch 12 Batch 870/1540] avg loss 0.00531144, throughput 2.85358K wps
[Epoch 12 Batch 900/1540] avg loss 0.00478178, throughput 2.83389K wps
[Epoch 12 Batch 930/1540] avg loss 0.00471039, throughput 2.87719K wps
[Epoch 12 Batch 960/1540] avg loss 0.00496347, throughput 2.83385K wps
[Epoch 12 Batch 990/1540] avg loss 0.0050525, throughput 2.79573K wps
[Epoch 12 Batch 1020/1540] avg loss 0.00468869, throughput 2.82976K wps
[Epoch 12 Batch 1050/1540] avg loss 0.00454686, throughput 2.87201K wps
[Epoch 12 Batch 1080/1540] avg loss 0.00469205, throughput 2.87248K wps
[Epoch 12 Batch 1110/1540] avg loss 0.00497689, throughput 2.87541K wps
[Epoch 12 Batch 1140/1540] avg loss 0.00445265, throughput 2.86966K wps
[Epoch 12 Batch 1170/1540] avg loss 0.00505778, throughput 2.8492K wps
[Epoch 12 Batch 1200/1540] avg loss 0.00482485, throughput 2.87481K wps
[Epoch 12 Batch 1230/1540] avg loss 0.00504717, throughput 2.81775K wps
[Epoch 12 Batch 1260/1540] avg loss 0.00473321, throughput 2.82531K wps
[Epoch 12 Batch 1290/1540] avg loss 0.00507508, throughput 2.87582K wps
[Epoch 12 Batch 1320/1540] avg loss 0.00442842, throughput 2.87338K wps
[Epoch 12 Batch 1350/1540] avg loss 0.00489596, throughput 2.86631K wps
[Epoch 12 Batch 1380/1540] avg loss 0.0044795, throughput 2.88059K wps
[Epoch 12 Batch 1410/1540] avg loss 0.00467905, throughput 2.85294K wps
[Epoch 12 Batch 1440/1540] avg loss 0.00440097, throughput 2.7771K wps
[Epoch 12 Batch 1470/1540] avg loss 0.00455162, throughput 2.81136K wps
[Epoch 12 Batch 1500/1540] avg loss 0.00475475, throughput 2.78996K wps
[Epoch 12 Batch 1530/1540] avg loss 0.00497422, throughput 2.84052K wps
Begin Testing...
[Epoch 12] train avg loss 0.00483987, dev acc 0.8200, dev avg loss 0.43082, throughput 2.84601K wps
[Epoch 13 Batch 30/1540] avg loss 0.00446236, throughput 2.83165K wps
[Epoch 13 Batch 60/1540] avg loss 0.00474609, throughput 2.7882K wps
[Epoch 13 Batch 90/1540] avg loss 0.00481771, throughput 2.78453K wps
[Epoch 13 Batch 120/1540] avg loss 0.00455366, throughput 2.82144K wps
[Epoch 13 Batch 150/1540] avg loss 0.00468424, throughput 2.82126K wps
[Epoch 13 Batch 180/1540] avg loss 0.00467242, throughput 2.82396K wps
[Epoch 13 Batch 210/1540] avg loss 0.00463527, throughput 2.81668K wps
[Epoch 13 Batch 240/1540] avg loss 0.00451349, throughput 2.8703K wps
[Epoch 13 Batch 270/1540] avg loss 0.00432513, throughput 2.82377K wps
[Epoch 13 Batch 300/1540] avg loss 0.00436745, throughput 2.84135K wps
[Epoch 13 Batch 330/1540] avg loss 0.00443646, throughput 2.86605K wps
[Epoch 13 Batch 360/1540] avg loss 0.0045516, throughput 2.87585K wps
[Epoch 13 Batch 390/1540] avg loss 0.00417121, throughput 2.83353K wps
[Epoch 13 Batch 420/1540] avg loss 0.00457967, throughput 2.83476K wps
[Epoch 13 Batch 450/1540] avg loss 0.00449326, throughput 2.86706K wps
[Epoch 13 Batch 480/1540] avg loss 0.00426371, throughput 2.86733K wps
[Epoch 13 Batch 510/1540] avg loss 0.00439008, throughput 2.88155K wps
[Epoch 13 Batch 540/1540] avg loss 0.00452512, throughput 2.82744K wps
[Epoch 13 Batch 570/1540] avg loss 0.00425974, throughput 2.87104K wps
[Epoch 13 Batch 600/1540] avg loss 0.00416599, throughput 2.87146K wps
[Epoch 13 Batch 630/1540] avg loss 0.00469689, throughput 2.87303K wps
[Epoch 13 Batch 660/1540] avg loss 0.00448656, throughput 2.87424K wps
[Epoch 13 Batch 690/1540] avg loss 0.00439577, throughput 2.8576K wps
[Epoch 13 Batch 720/1540] avg loss 0.00462097, throughput 2.8352K wps
[Epoch 13 Batch 750/1540] avg loss 0.00449958, throughput 2.86557K wps
[Epoch 13 Batch 780/1540] avg loss 0.00453113, throughput 2.79336K wps
[Epoch 13 Batch 810/1540] avg loss 0.00434875, throughput 2.84386K wps
[Epoch 13 Batch 840/1540] avg loss 0.00457832, throughput 2.87667K wps
[Epoch 13 Batch 870/1540] avg loss 0.00500019, throughput 2.85573K wps
[Epoch 13 Batch 900/1540] avg loss 0.00473433, throughput 2.85732K wps
[Epoch 13 Batch 930/1540] avg loss 0.00463568, throughput 2.87828K wps
[Epoch 13 Batch 960/1540] avg loss 0.00428502, throughput 2.85308K wps
[Epoch 13 Batch 990/1540] avg loss 0.0043615, throughput 2.82854K wps
[Epoch 13 Batch 1020/1540] avg loss 0.00450924, throughput 2.87002K wps
[Epoch 13 Batch 1050/1540] avg loss 0.00436647, throughput 2.8819K wps
[Epoch 13 Batch 1080/1540] avg loss 0.00451411, throughput 2.87603K wps
[Epoch 13 Batch 1110/1540] avg loss 0.00448811, throughput 2.84644K wps
[Epoch 13 Batch 1140/1540] avg loss 0.00484055, throughput 2.87198K wps
[Epoch 13 Batch 1170/1540] avg loss 0.0047167, throughput 2.87767K wps
[Epoch 13 Batch 1200/1540] avg loss 0.00501602, throughput 2.88106K wps
[Epoch 13 Batch 1230/1540] avg loss 0.00452643, throughput 2.87166K wps
[Epoch 13 Batch 1260/1540] avg loss 0.00454048, throughput 2.87546K wps
[Epoch 13 Batch 1290/1540] avg loss 0.00456253, throughput 2.88161K wps
[Epoch 13 Batch 1320/1540] avg loss 0.00476569, throughput 2.8565K wps
[Epoch 13 Batch 1350/1540] avg loss 0.00418505, throughput 2.85399K wps
[Epoch 13 Batch 1380/1540] avg loss 0.00445918, throughput 2.87694K wps
[Epoch 13 Batch 1410/1540] avg loss 0.00435116, throughput 2.87921K wps
[Epoch 13 Batch 1440/1540] avg loss 0.00479225, throughput 2.86933K wps
[Epoch 13 Batch 1470/1540] avg loss 0.00425588, throughput 2.8674K wps
[Epoch 13 Batch 1500/1540] avg loss 0.00433084, throughput 2.87503K wps
[Epoch 13 Batch 1530/1540] avg loss 0.00457583, throughput 2.87123K wps
Begin Testing...
[Epoch 13] train avg loss 0.00452305, dev acc 0.8131, dev avg loss 0.419574, throughput 2.85466K wps
[Epoch 14 Batch 30/1540] avg loss 0.00461691, throughput 2.90346K wps
[Epoch 14 Batch 60/1540] avg loss 0.00409082, throughput 2.86606K wps
[Epoch 14 Batch 90/1540] avg loss 0.00411367, throughput 2.87014K wps
[Epoch 14 Batch 120/1540] avg loss 0.00441572, throughput 2.84007K wps
[Epoch 14 Batch 150/1540] avg loss 0.00415351, throughput 2.85366K wps
[Epoch 14 Batch 180/1540] avg loss 0.004476, throughput 2.84757K wps
[Epoch 14 Batch 210/1540] avg loss 0.00440516, throughput 2.87513K wps
[Epoch 14 Batch 240/1540] avg loss 0.00403609, throughput 2.85843K wps
[Epoch 14 Batch 270/1540] avg loss 0.00436929, throughput 2.84186K wps
[Epoch 14 Batch 300/1540] avg loss 0.00429113, throughput 2.87756K wps
[Epoch 14 Batch 330/1540] avg loss 0.00443148, throughput 2.85188K wps
[Epoch 14 Batch 360/1540] avg loss 0.0043789, throughput 2.8415K wps
[Epoch 14 Batch 390/1540] avg loss 0.00430021, throughput 2.8736K wps
[Epoch 14 Batch 420/1540] avg loss 0.00399232, throughput 2.87152K wps
[Epoch 14 Batch 450/1540] avg loss 0.00451988, throughput 2.82441K wps
[Epoch 14 Batch 480/1540] avg loss 0.00399028, throughput 2.82411K wps
[Epoch 14 Batch 510/1540] avg loss 0.00442665, throughput 2.86096K wps
[Epoch 14 Batch 540/1540] avg loss 0.00434507, throughput 2.87596K wps
[Epoch 14 Batch 570/1540] avg loss 0.00463962, throughput 2.87929K wps
[Epoch 14 Batch 600/1540] avg loss 0.00411575, throughput 2.88231K wps
[Epoch 14 Batch 630/1540] avg loss 0.00417867, throughput 2.87889K wps
[Epoch 14 Batch 660/1540] avg loss 0.0045338, throughput 2.8693K wps
[Epoch 14 Batch 690/1540] avg loss 0.00399325, throughput 2.86181K wps
[Epoch 14 Batch 720/1540] avg loss 0.00423137, throughput 2.87301K wps
[Epoch 14 Batch 750/1540] avg loss 0.00413027, throughput 2.86344K wps
[Epoch 14 Batch 780/1540] avg loss 0.00404059, throughput 2.87574K wps
[Epoch 14 Batch 810/1540] avg loss 0.00409784, throughput 2.87673K wps
[Epoch 14 Batch 840/1540] avg loss 0.00402593, throughput 2.87102K wps
[Epoch 14 Batch 870/1540] avg loss 0.00402846, throughput 2.87835K wps
[Epoch 14 Batch 900/1540] avg loss 0.00413051, throughput 2.86132K wps
[Epoch 14 Batch 930/1540] avg loss 0.00425607, throughput 2.8608K wps
[Epoch 14 Batch 960/1540] avg loss 0.00422695, throughput 2.84248K wps
[Epoch 14 Batch 990/1540] avg loss 0.00441385, throughput 2.84117K wps
[Epoch 14 Batch 1020/1540] avg loss 0.00415427, throughput 2.87268K wps
[Epoch 14 Batch 1050/1540] avg loss 0.00376706, throughput 2.82777K wps
[Epoch 14 Batch 1080/1540] avg loss 0.00476164, throughput 2.87051K wps
[Epoch 14 Batch 1110/1540] avg loss 0.00441269, throughput 2.8621K wps
[Epoch 14 Batch 1140/1540] avg loss 0.00449428, throughput 2.80276K wps
[Epoch 14 Batch 1170/1540] avg loss 0.00418358, throughput 2.84116K wps
[Epoch 14 Batch 1200/1540] avg loss 0.00422666, throughput 2.87902K wps
[Epoch 14 Batch 1230/1540] avg loss 0.0043409, throughput 2.86527K wps
[Epoch 14 Batch 1260/1540] avg loss 0.00404391, throughput 2.82409K wps
[Epoch 14 Batch 1290/1540] avg loss 0.00451377, throughput 2.86621K wps
[Epoch 14 Batch 1320/1540] avg loss 0.0042835, throughput 2.83833K wps
[Epoch 14 Batch 1350/1540] avg loss 0.00434493, throughput 2.80139K wps
[Epoch 14 Batch 1380/1540] avg loss 0.00438983, throughput 2.8662K wps
[Epoch 14 Batch 1410/1540] avg loss 0.00423225, throughput 2.85795K wps
[Epoch 14 Batch 1440/1540] avg loss 0.00471089, throughput 2.81824K wps
[Epoch 14 Batch 1470/1540] avg loss 0.0042575, throughput 2.84146K wps
[Epoch 14 Batch 1500/1540] avg loss 0.00381145, throughput 2.87356K wps
[Epoch 14 Batch 1530/1540] avg loss 0.00411584, throughput 2.85227K wps
Begin Testing...
[Epoch 14] train avg loss 0.00426876, dev acc 0.8119, dev avg loss 0.431456, throughput 2.85742K wps
[Epoch 15 Batch 30/1540] avg loss 0.00407376, throughput 2.91374K wps
[Epoch 15 Batch 60/1540] avg loss 0.00401679, throughput 2.86123K wps
[Epoch 15 Batch 90/1540] avg loss 0.00392947, throughput 2.83942K wps
[Epoch 15 Batch 120/1540] avg loss 0.00422627, throughput 2.81369K wps
[Epoch 15 Batch 150/1540] avg loss 0.00453781, throughput 2.79111K wps
[Epoch 15 Batch 180/1540] avg loss 0.00383111, throughput 2.79639K wps
[Epoch 15 Batch 210/1540] avg loss 0.00426815, throughput 2.81802K wps
[Epoch 15 Batch 240/1540] avg loss 0.00367191, throughput 2.86546K wps
[Epoch 15 Batch 270/1540] avg loss 0.00390078, throughput 2.87507K wps
[Epoch 15 Batch 300/1540] avg loss 0.00391309, throughput 2.83758K wps
[Epoch 15 Batch 330/1540] avg loss 0.00372025, throughput 2.85992K wps
[Epoch 15 Batch 360/1540] avg loss 0.00388494, throughput 2.78417K wps
[Epoch 15 Batch 390/1540] avg loss 0.00373275, throughput 2.81732K wps
[Epoch 15 Batch 420/1540] avg loss 0.00396817, throughput 2.82379K wps
[Epoch 15 Batch 450/1540] avg loss 0.00436736, throughput 2.84744K wps
[Epoch 15 Batch 480/1540] avg loss 0.00373357, throughput 2.86786K wps
[Epoch 15 Batch 510/1540] avg loss 0.00368596, throughput 2.85527K wps
[Epoch 15 Batch 540/1540] avg loss 0.00371191, throughput 2.87065K wps
[Epoch 15 Batch 570/1540] avg loss 0.00422839, throughput 2.86757K wps
[Epoch 15 Batch 600/1540] avg loss 0.00409697, throughput 2.86764K wps
[Epoch 15 Batch 630/1540] avg loss 0.00381312, throughput 2.82558K wps
[Epoch 15 Batch 660/1540] avg loss 0.00431842, throughput 2.86827K wps
[Epoch 15 Batch 690/1540] avg loss 0.0037886, throughput 2.84364K wps
[Epoch 15 Batch 720/1540] avg loss 0.00409405, throughput 2.7893K wps
[Epoch 15 Batch 750/1540] avg loss 0.00368649, throughput 2.84533K wps
[Epoch 15 Batch 780/1540] avg loss 0.00435725, throughput 2.88253K wps
[Epoch 15 Batch 810/1540] avg loss 0.0039976, throughput 2.86638K wps
[Epoch 15 Batch 840/1540] avg loss 0.00391333, throughput 2.83815K wps
[Epoch 15 Batch 870/1540] avg loss 0.00444651, throughput 2.87223K wps
[Epoch 15 Batch 900/1540] avg loss 0.00426778, throughput 2.87275K wps
[Epoch 15 Batch 930/1540] avg loss 0.00473772, throughput 2.87182K wps
[Epoch 15 Batch 960/1540] avg loss 0.00413148, throughput 2.86173K wps
[Epoch 15 Batch 990/1540] avg loss 0.00388711, throughput 2.87103K wps
[Epoch 15 Batch 1020/1540] avg loss 0.00363337, throughput 2.8719K wps
[Epoch 15 Batch 1050/1540] avg loss 0.00394424, throughput 2.86909K wps
[Epoch 15 Batch 1080/1540] avg loss 0.00408659, throughput 2.86277K wps
[Epoch 15 Batch 1110/1540] avg loss 0.0036926, throughput 2.86438K wps
[Epoch 15 Batch 1140/1540] avg loss 0.00416388, throughput 2.85054K wps
[Epoch 15 Batch 1170/1540] avg loss 0.00405194, throughput 2.85722K wps
[Epoch 15 Batch 1200/1540] avg loss 0.00350064, throughput 2.87805K wps
[Epoch 15 Batch 1230/1540] avg loss 0.00421358, throughput 2.86051K wps
[Epoch 15 Batch 1260/1540] avg loss 0.00361744, throughput 2.87947K wps
[Epoch 15 Batch 1290/1540] avg loss 0.00420992, throughput 2.88045K wps
[Epoch 15 Batch 1320/1540] avg loss 0.00381196, throughput 2.85958K wps
[Epoch 15 Batch 1350/1540] avg loss 0.00409585, throughput 2.8782K wps
[Epoch 15 Batch 1380/1540] avg loss 0.00405585, throughput 2.87984K wps
[Epoch 15 Batch 1410/1540] avg loss 0.00439197, throughput 2.82591K wps
[Epoch 15 Batch 1440/1540] avg loss 0.00391647, throughput 2.82319K wps
[Epoch 15 Batch 1470/1540] avg loss 0.00429827, throughput 2.87291K wps
[Epoch 15 Batch 1500/1540] avg loss 0.00439724, throughput 2.88058K wps
[Epoch 15 Batch 1530/1540] avg loss 0.00413388, throughput 2.86962K wps
Begin Testing...
[Epoch 15] train avg loss 0.00402739, dev acc 0.8177, dev avg loss 0.427729, throughput 2.85364K wps
[Epoch 16 Batch 30/1540] avg loss 0.00364828, throughput 2.91324K wps
[Epoch 16 Batch 60/1540] avg loss 0.00346847, throughput 2.85303K wps
[Epoch 16 Batch 90/1540] avg loss 0.00373049, throughput 2.87388K wps
[Epoch 16 Batch 120/1540] avg loss 0.00393828, throughput 2.87238K wps
[Epoch 16 Batch 150/1540] avg loss 0.0035373, throughput 2.87733K wps
[Epoch 16 Batch 180/1540] avg loss 0.00401865, throughput 2.83449K wps
[Epoch 16 Batch 210/1540] avg loss 0.00394978, throughput 2.78224K wps
[Epoch 16 Batch 240/1540] avg loss 0.00388588, throughput 2.85971K wps
[Epoch 16 Batch 270/1540] avg loss 0.00357529, throughput 2.85854K wps
[Epoch 16 Batch 300/1540] avg loss 0.00413567, throughput 2.87997K wps
[Epoch 16 Batch 330/1540] avg loss 0.00358932, throughput 2.86505K wps
[Epoch 16 Batch 360/1540] avg loss 0.00350209, throughput 2.83764K wps
[Epoch 16 Batch 390/1540] avg loss 0.00366215, throughput 2.87282K wps
[Epoch 16 Batch 420/1540] avg loss 0.00393011, throughput 2.8481K wps
[Epoch 16 Batch 450/1540] avg loss 0.00378705, throughput 2.87037K wps
[Epoch 16 Batch 480/1540] avg loss 0.00404772, throughput 2.86988K wps
[Epoch 16 Batch 510/1540] avg loss 0.00393417, throughput 2.87401K wps
[Epoch 16 Batch 540/1540] avg loss 0.00400132, throughput 2.82079K wps
[Epoch 16 Batch 570/1540] avg loss 0.00382737, throughput 2.84271K wps
[Epoch 16 Batch 600/1540] avg loss 0.00365693, throughput 2.86856K wps
[Epoch 16 Batch 630/1540] avg loss 0.00367696, throughput 2.87466K wps
[Epoch 16 Batch 660/1540] avg loss 0.00396527, throughput 2.85173K wps
[Epoch 16 Batch 690/1540] avg loss 0.00398598, throughput 2.83263K wps
[Epoch 16 Batch 720/1540] avg loss 0.00357823, throughput 2.8294K wps
[Epoch 16 Batch 750/1540] avg loss 0.00361394, throughput 2.86624K wps
[Epoch 16 Batch 780/1540] avg loss 0.00360104, throughput 2.8646K wps
[Epoch 16 Batch 810/1540] avg loss 0.00364887, throughput 2.83687K wps
[Epoch 16 Batch 840/1540] avg loss 0.00380038, throughput 2.83233K wps
[Epoch 16 Batch 870/1540] avg loss 0.00367782, throughput 2.83244K wps
[Epoch 16 Batch 900/1540] avg loss 0.00383111, throughput 2.85035K wps
[Epoch 16 Batch 930/1540] avg loss 0.00394563, throughput 2.87646K wps
[Epoch 16 Batch 960/1540] avg loss 0.00400893, throughput 2.83558K wps
[Epoch 16 Batch 990/1540] avg loss 0.00376215, throughput 2.79682K wps
[Epoch 16 Batch 1020/1540] avg loss 0.00371723, throughput 2.79242K wps
[Epoch 16 Batch 1050/1540] avg loss 0.00377845, throughput 2.87491K wps
[Epoch 16 Batch 1080/1540] avg loss 0.0040392, throughput 2.84615K wps
[Epoch 16 Batch 1110/1540] avg loss 0.00410519, throughput 2.79365K wps
[Epoch 16 Batch 1140/1540] avg loss 0.00410984, throughput 2.84937K wps
[Epoch 16 Batch 1170/1540] avg loss 0.00360057, throughput 2.88165K wps
[Epoch 16 Batch 1200/1540] avg loss 0.00419759, throughput 2.87876K wps
[Epoch 16 Batch 1230/1540] avg loss 0.0038205, throughput 2.85643K wps
[Epoch 16 Batch 1260/1540] avg loss 0.00364247, throughput 2.87129K wps
[Epoch 16 Batch 1290/1540] avg loss 0.00401488, throughput 2.87944K wps
[Epoch 16 Batch 1320/1540] avg loss 0.00436021, throughput 2.84787K wps
[Epoch 16 Batch 1350/1540] avg loss 0.00362354, throughput 2.87069K wps
[Epoch 16 Batch 1380/1540] avg loss 0.0036643, throughput 2.86974K wps
[Epoch 16 Batch 1410/1540] avg loss 0.00433668, throughput 2.87245K wps
[Epoch 16 Batch 1440/1540] avg loss 0.00382339, throughput 2.87811K wps
[Epoch 16 Batch 1470/1540] avg loss 0.0038438, throughput 2.86514K wps
[Epoch 16 Batch 1500/1540] avg loss 0.0040651, throughput 2.8789K wps
[Epoch 16 Batch 1530/1540] avg loss 0.00382273, throughput 2.86611K wps
Begin Testing...
[Epoch 16] train avg loss 0.00383502, dev acc 0.8234, dev avg loss 0.435514, throughput 2.85508K wps
[Epoch 17 Batch 30/1540] avg loss 0.00319087, throughput 2.92501K wps
[Epoch 17 Batch 60/1540] avg loss 0.00332997, throughput 2.79723K wps
[Epoch 17 Batch 90/1540] avg loss 0.00377095, throughput 2.86783K wps
[Epoch 17 Batch 120/1540] avg loss 0.00342876, throughput 2.87935K wps
[Epoch 17 Batch 150/1540] avg loss 0.00361603, throughput 2.86129K wps
[Epoch 17 Batch 180/1540] avg loss 0.00398759, throughput 2.81541K wps
[Epoch 17 Batch 210/1540] avg loss 0.00365921, throughput 2.86253K wps
[Epoch 17 Batch 240/1540] avg loss 0.00333436, throughput 2.8684K wps
[Epoch 17 Batch 270/1540] avg loss 0.00319246, throughput 2.85613K wps
[Epoch 17 Batch 300/1540] avg loss 0.00355645, throughput 2.86481K wps
[Epoch 17 Batch 330/1540] avg loss 0.00378092, throughput 2.8106K wps
[Epoch 17 Batch 360/1540] avg loss 0.00339321, throughput 2.86744K wps
[Epoch 17 Batch 390/1540] avg loss 0.00355143, throughput 2.87489K wps
[Epoch 17 Batch 420/1540] avg loss 0.00338337, throughput 2.84091K wps
[Epoch 17 Batch 450/1540] avg loss 0.0034537, throughput 2.87476K wps
[Epoch 17 Batch 480/1540] avg loss 0.00350938, throughput 2.87454K wps
[Epoch 17 Batch 510/1540] avg loss 0.00372047, throughput 2.87112K wps
[Epoch 17 Batch 540/1540] avg loss 0.00341108, throughput 2.8482K wps
[Epoch 17 Batch 570/1540] avg loss 0.0034723, throughput 2.8272K wps
[Epoch 17 Batch 600/1540] avg loss 0.00364292, throughput 2.82127K wps
[Epoch 17 Batch 630/1540] avg loss 0.00350363, throughput 2.86217K wps
[Epoch 17 Batch 660/1540] avg loss 0.00402372, throughput 2.84594K wps
[Epoch 17 Batch 690/1540] avg loss 0.00364954, throughput 2.83362K wps
[Epoch 17 Batch 720/1540] avg loss 0.00374546, throughput 2.79015K wps
[Epoch 17 Batch 750/1540] avg loss 0.00352202, throughput 2.83113K wps
[Epoch 17 Batch 780/1540] avg loss 0.00363937, throughput 2.8698K wps
[Epoch 17 Batch 810/1540] avg loss 0.00400565, throughput 2.87314K wps
[Epoch 17 Batch 840/1540] avg loss 0.00355017, throughput 2.87075K wps
[Epoch 17 Batch 870/1540] avg loss 0.00372163, throughput 2.82721K wps
[Epoch 17 Batch 900/1540] avg loss 0.00393952, throughput 2.79891K wps
[Epoch 17 Batch 930/1540] avg loss 0.00345285, throughput 2.87991K wps
[Epoch 17 Batch 960/1540] avg loss 0.00373159, throughput 2.8686K wps
[Epoch 17 Batch 990/1540] avg loss 0.00373004, throughput 2.86531K wps
[Epoch 17 Batch 1020/1540] avg loss 0.00389974, throughput 2.84328K wps
[Epoch 17 Batch 1050/1540] avg loss 0.00369152, throughput 2.84203K wps
[Epoch 17 Batch 1080/1540] avg loss 0.00339621, throughput 2.86918K wps
[Epoch 17 Batch 1110/1540] avg loss 0.00371069, throughput 2.83962K wps
[Epoch 17 Batch 1140/1540] avg loss 0.00360007, throughput 2.83966K wps
[Epoch 17 Batch 1170/1540] avg loss 0.00341526, throughput 2.86066K wps
[Epoch 17 Batch 1200/1540] avg loss 0.00430276, throughput 2.84973K wps
[Epoch 17 Batch 1230/1540] avg loss 0.00392419, throughput 2.84217K wps
[Epoch 17 Batch 1260/1540] avg loss 0.00415668, throughput 2.8437K wps
[Epoch 17 Batch 1290/1540] avg loss 0.00331118, throughput 2.86902K wps
[Epoch 17 Batch 1320/1540] avg loss 0.00361056, throughput 2.84208K wps
[Epoch 17 Batch 1350/1540] avg loss 0.00372032, throughput 2.85573K wps
[Epoch 17 Batch 1380/1540] avg loss 0.00364981, throughput 2.85255K wps
[Epoch 17 Batch 1410/1540] avg loss 0.00370076, throughput 2.85557K wps
[Epoch 17 Batch 1440/1540] avg loss 0.00401664, throughput 2.87357K wps
[Epoch 17 Batch 1470/1540] avg loss 0.00412138, throughput 2.87957K wps
[Epoch 17 Batch 1500/1540] avg loss 0.00385896, throughput 2.85209K wps
[Epoch 17 Batch 1530/1540] avg loss 0.00437636, throughput 2.82669K wps
Begin Testing...
[Epoch 17] train avg loss 0.00367402, dev acc 0.8165, dev avg loss 0.43482, throughput 2.85161K wps
[Epoch 18 Batch 30/1540] avg loss 0.00357906, throughput 2.93864K wps
[Epoch 18 Batch 60/1540] avg loss 0.00337154, throughput 2.85325K wps
[Epoch 18 Batch 90/1540] avg loss 0.00297107, throughput 2.87903K wps
[Epoch 18 Batch 120/1540] avg loss 0.00333438, throughput 2.87263K wps
[Epoch 18 Batch 150/1540] avg loss 0.00338964, throughput 2.85472K wps
[Epoch 18 Batch 180/1540] avg loss 0.0030976, throughput 2.87709K wps
[Epoch 18 Batch 210/1540] avg loss 0.00295231, throughput 2.87133K wps
[Epoch 18 Batch 240/1540] avg loss 0.00355337, throughput 2.86864K wps
[Epoch 18 Batch 270/1540] avg loss 0.003575, throughput 2.87699K wps
[Epoch 18 Batch 300/1540] avg loss 0.00324794, throughput 2.8589K wps
[Epoch 18 Batch 330/1540] avg loss 0.00356497, throughput 2.84314K wps
[Epoch 18 Batch 360/1540] avg loss 0.00356339, throughput 2.87465K wps
[Epoch 18 Batch 390/1540] avg loss 0.00375464, throughput 2.86385K wps
[Epoch 18 Batch 420/1540] avg loss 0.00346486, throughput 2.86775K wps
[Epoch 18 Batch 450/1540] avg loss 0.00338985, throughput 2.87758K wps
[Epoch 18 Batch 480/1540] avg loss 0.00340048, throughput 2.85165K wps
[Epoch 18 Batch 510/1540] avg loss 0.00334323, throughput 2.82923K wps
[Epoch 18 Batch 540/1540] avg loss 0.00368448, throughput 2.86647K wps
[Epoch 18 Batch 570/1540] avg loss 0.00321845, throughput 2.85408K wps
[Epoch 18 Batch 600/1540] avg loss 0.00348318, throughput 2.86232K wps
[Epoch 18 Batch 630/1540] avg loss 0.00324679, throughput 2.87064K wps
[Epoch 18 Batch 660/1540] avg loss 0.00364707, throughput 2.86977K wps
[Epoch 18 Batch 690/1540] avg loss 0.00367947, throughput 2.87485K wps
[Epoch 18 Batch 720/1540] avg loss 0.00323967, throughput 2.81789K wps
[Epoch 18 Batch 750/1540] avg loss 0.00300659, throughput 2.86046K wps
[Epoch 18 Batch 780/1540] avg loss 0.00320991, throughput 2.81983K wps
[Epoch 18 Batch 810/1540] avg loss 0.00350821, throughput 2.88281K wps
[Epoch 18 Batch 840/1540] avg loss 0.00367232, throughput 2.85983K wps
[Epoch 18 Batch 870/1540] avg loss 0.00358799, throughput 2.85452K wps
[Epoch 18 Batch 900/1540] avg loss 0.00377593, throughput 2.87457K wps
[Epoch 18 Batch 930/1540] avg loss 0.00355176, throughput 2.8255K wps
[Epoch 18 Batch 960/1540] avg loss 0.00354487, throughput 2.7912K wps
[Epoch 18 Batch 990/1540] avg loss 0.00406148, throughput 2.81019K wps
[Epoch 18 Batch 1020/1540] avg loss 0.00371452, throughput 2.8685K wps
[Epoch 18 Batch 1050/1540] avg loss 0.00362137, throughput 2.80297K wps
[Epoch 18 Batch 1080/1540] avg loss 0.00383899, throughput 2.79751K wps
[Epoch 18 Batch 1110/1540] avg loss 0.00376215, throughput 2.85163K wps
[Epoch 18 Batch 1140/1540] avg loss 0.00357949, throughput 2.82323K wps
[Epoch 18 Batch 1170/1540] avg loss 0.00304835, throughput 2.871K wps
[Epoch 18 Batch 1200/1540] avg loss 0.00402783, throughput 2.87504K wps
[Epoch 18 Batch 1230/1540] avg loss 0.00391006, throughput 2.8776K wps
[Epoch 18 Batch 1260/1540] avg loss 0.00325532, throughput 2.85809K wps
[Epoch 18 Batch 1290/1540] avg loss 0.003581, throughput 2.86853K wps
[Epoch 18 Batch 1320/1540] avg loss 0.00310963, throughput 2.85105K wps
[Epoch 18 Batch 1350/1540] avg loss 0.00383194, throughput 2.85997K wps
[Epoch 18 Batch 1380/1540] avg loss 0.00394252, throughput 2.84992K wps
[Epoch 18 Batch 1410/1540] avg loss 0.00324707, throughput 2.86976K wps
[Epoch 18 Batch 1440/1540] avg loss 0.0036005, throughput 2.81174K wps
[Epoch 18 Batch 1470/1540] avg loss 0.00342126, throughput 2.82035K wps
[Epoch 18 Batch 1500/1540] avg loss 0.00342725, throughput 2.81288K wps
[Epoch 18 Batch 1530/1540] avg loss 0.00331513, throughput 2.81493K wps
Begin Testing...
[Epoch 18] train avg loss 0.00349049, dev acc 0.8096, dev avg loss 0.451863, throughput 2.85304K wps
[Epoch 19 Batch 30/1540] avg loss 0.00318735, throughput 2.86704K wps
[Epoch 19 Batch 60/1540] avg loss 0.00329827, throughput 2.84633K wps
[Epoch 19 Batch 90/1540] avg loss 0.00295942, throughput 2.87147K wps
[Epoch 19 Batch 120/1540] avg loss 0.00317722, throughput 2.86827K wps
[Epoch 19 Batch 150/1540] avg loss 0.00319235, throughput 2.84556K wps
[Epoch 19 Batch 180/1540] avg loss 0.00358871, throughput 2.87535K wps
[Epoch 19 Batch 210/1540] avg loss 0.0032216, throughput 2.81075K wps
[Epoch 19 Batch 240/1540] avg loss 0.00331344, throughput 2.78933K wps
[Epoch 19 Batch 270/1540] avg loss 0.00383577, throughput 2.86782K wps
[Epoch 19 Batch 300/1540] avg loss 0.00300314, throughput 2.86806K wps
[Epoch 19 Batch 330/1540] avg loss 0.00362143, throughput 2.78831K wps
[Epoch 19 Batch 360/1540] avg loss 0.00310181, throughput 2.85062K wps
[Epoch 19 Batch 390/1540] avg loss 0.00328461, throughput 2.8693K wps
[Epoch 19 Batch 420/1540] avg loss 0.0033123, throughput 2.84265K wps
[Epoch 19 Batch 450/1540] avg loss 0.00406618, throughput 2.81006K wps
[Epoch 19 Batch 480/1540] avg loss 0.0033118, throughput 2.87635K wps
[Epoch 19 Batch 510/1540] avg loss 0.0037313, throughput 2.86046K wps
[Epoch 19 Batch 540/1540] avg loss 0.00365229, throughput 2.86706K wps
[Epoch 19 Batch 570/1540] avg loss 0.00356581, throughput 2.87633K wps
[Epoch 19 Batch 600/1540] avg loss 0.00318694, throughput 2.87783K wps
[Epoch 19 Batch 630/1540] avg loss 0.00320496, throughput 2.88167K wps
[Epoch 19 Batch 660/1540] avg loss 0.00324332, throughput 2.86276K wps
[Epoch 19 Batch 690/1540] avg loss 0.00345855, throughput 2.86727K wps
[Epoch 19 Batch 720/1540] avg loss 0.00314163, throughput 2.86091K wps
[Epoch 19 Batch 750/1540] avg loss 0.00355368, throughput 2.81585K wps
[Epoch 19 Batch 780/1540] avg loss 0.00324602, throughput 2.86045K wps
[Epoch 19 Batch 810/1540] avg loss 0.00370243, throughput 2.83406K wps
[Epoch 19 Batch 840/1540] avg loss 0.00368683, throughput 2.86505K wps
[Epoch 19 Batch 870/1540] avg loss 0.00327352, throughput 2.80645K wps
[Epoch 19 Batch 900/1540] avg loss 0.00332074, throughput 2.82758K wps
[Epoch 19 Batch 930/1540] avg loss 0.00361743, throughput 2.87025K wps
[Epoch 19 Batch 960/1540] avg loss 0.00300051, throughput 2.86002K wps
[Epoch 19 Batch 990/1540] avg loss 0.00321304, throughput 2.80352K wps
[Epoch 19 Batch 1020/1540] avg loss 0.00369922, throughput 2.87685K wps
[Epoch 19 Batch 1050/1540] avg loss 0.00305007, throughput 2.83707K wps
[Epoch 19 Batch 1080/1540] avg loss 0.00358453, throughput 2.82909K wps
[Epoch 19 Batch 1110/1540] avg loss 0.00343419, throughput 2.88046K wps
[Epoch 19 Batch 1140/1540] avg loss 0.00326795, throughput 2.8748K wps
[Epoch 19 Batch 1170/1540] avg loss 0.00373046, throughput 2.87656K wps
[Epoch 19 Batch 1200/1540] avg loss 0.0034502, throughput 2.87201K wps
[Epoch 19 Batch 1230/1540] avg loss 0.00334682, throughput 2.87895K wps
[Epoch 19 Batch 1260/1540] avg loss 0.00339547, throughput 2.80381K wps
[Epoch 19 Batch 1290/1540] avg loss 0.00339138, throughput 2.84529K wps
[Epoch 19 Batch 1320/1540] avg loss 0.00357528, throughput 2.83887K wps
[Epoch 19 Batch 1350/1540] avg loss 0.00364623, throughput 2.87K wps
[Epoch 19 Batch 1380/1540] avg loss 0.00313599, throughput 2.85182K wps
[Epoch 19 Batch 1410/1540] avg loss 0.00299692, throughput 2.87934K wps
[Epoch 19 Batch 1440/1540] avg loss 0.00317699, throughput 2.87542K wps
[Epoch 19 Batch 1470/1540] avg loss 0.00334431, throughput 2.83182K wps
[Epoch 19 Batch 1500/1540] avg loss 0.00341411, throughput 2.83651K wps
[Epoch 19 Batch 1530/1540] avg loss 0.00376304, throughput 2.87732K wps
Begin Testing...
[Epoch 19] train avg loss 0.00338364, dev acc 0.8131, dev avg loss 0.446204, throughput 2.85203K wps
[Epoch 20 Batch 30/1540] avg loss 0.00322291, throughput 2.92462K wps
[Epoch 20 Batch 60/1540] avg loss 0.00287007, throughput 2.8529K wps
[Epoch 20 Batch 90/1540] avg loss 0.00303769, throughput 2.79475K wps
[Epoch 20 Batch 120/1540] avg loss 0.00276825, throughput 2.86558K wps
[Epoch 20 Batch 150/1540] avg loss 0.00305162, throughput 2.86841K wps
[Epoch 20 Batch 180/1540] avg loss 0.0032986, throughput 2.87617K wps
[Epoch 20 Batch 210/1540] avg loss 0.00350877, throughput 2.85078K wps
[Epoch 20 Batch 240/1540] avg loss 0.003069, throughput 2.79045K wps
[Epoch 20 Batch 270/1540] avg loss 0.0034995, throughput 2.87852K wps
[Epoch 20 Batch 300/1540] avg loss 0.00309145, throughput 2.8165K wps
[Epoch 20 Batch 330/1540] avg loss 0.00335601, throughput 2.84819K wps
[Epoch 20 Batch 360/1540] avg loss 0.00325714, throughput 2.84457K wps
[Epoch 20 Batch 390/1540] avg loss 0.00325351, throughput 2.86507K wps
[Epoch 20 Batch 420/1540] avg loss 0.00315532, throughput 2.8773K wps
[Epoch 20 Batch 450/1540] avg loss 0.00318432, throughput 2.86665K wps
[Epoch 20 Batch 480/1540] avg loss 0.00316532, throughput 2.87281K wps
[Epoch 20 Batch 510/1540] avg loss 0.00335802, throughput 2.78694K wps
[Epoch 20 Batch 540/1540] avg loss 0.0032944, throughput 2.87322K wps
[Epoch 20 Batch 570/1540] avg loss 0.00307869, throughput 2.88114K wps
[Epoch 20 Batch 600/1540] avg loss 0.00326062, throughput 2.88304K wps
[Epoch 20 Batch 630/1540] avg loss 0.00329051, throughput 2.87915K wps
[Epoch 20 Batch 660/1540] avg loss 0.00316112, throughput 2.85358K wps
[Epoch 20 Batch 690/1540] avg loss 0.00332212, throughput 2.87665K wps
[Epoch 20 Batch 720/1540] avg loss 0.00349856, throughput 2.85323K wps
[Epoch 20 Batch 750/1540] avg loss 0.00342787, throughput 2.8696K wps
[Epoch 20 Batch 780/1540] avg loss 0.00292761, throughput 2.84878K wps
[Epoch 20 Batch 810/1540] avg loss 0.00309589, throughput 2.86684K wps
[Epoch 20 Batch 840/1540] avg loss 0.00315749, throughput 2.84673K wps
[Epoch 20 Batch 870/1540] avg loss 0.00320325, throughput 2.79478K wps
[Epoch 20 Batch 900/1540] avg loss 0.0032549, throughput 2.87752K wps
[Epoch 20 Batch 930/1540] avg loss 0.00311146, throughput 2.86185K wps
[Epoch 20 Batch 960/1540] avg loss 0.00353997, throughput 2.85309K wps
[Epoch 20 Batch 990/1540] avg loss 0.0037487, throughput 2.86043K wps
[Epoch 20 Batch 1020/1540] avg loss 0.00315399, throughput 2.85797K wps
[Epoch 20 Batch 1050/1540] avg loss 0.00326294, throughput 2.87623K wps
[Epoch 20 Batch 1080/1540] avg loss 0.00298629, throughput 2.82635K wps
[Epoch 20 Batch 1110/1540] avg loss 0.00295518, throughput 2.85597K wps
[Epoch 20 Batch 1140/1540] avg loss 0.00334232, throughput 2.86511K wps
[Epoch 20 Batch 1170/1540] avg loss 0.00361975, throughput 2.79842K wps
[Epoch 20 Batch 1200/1540] avg loss 0.00332491, throughput 2.80976K wps
[Epoch 20 Batch 1230/1540] avg loss 0.00338287, throughput 2.8808K wps
[Epoch 20 Batch 1260/1540] avg loss 0.0026053, throughput 2.87229K wps
[Epoch 20 Batch 1290/1540] avg loss 0.0033712, throughput 2.86433K wps
[Epoch 20 Batch 1320/1540] avg loss 0.00339137, throughput 2.884K wps
[Epoch 20 Batch 1350/1540] avg loss 0.00349193, throughput 2.85217K wps
[Epoch 20 Batch 1380/1540] avg loss 0.00339059, throughput 2.7952K wps
[Epoch 20 Batch 1410/1540] avg loss 0.00371818, throughput 2.8401K wps
[Epoch 20 Batch 1440/1540] avg loss 0.00305314, throughput 2.86594K wps
[Epoch 20 Batch 1470/1540] avg loss 0.00323899, throughput 2.87791K wps
[Epoch 20 Batch 1500/1540] avg loss 0.00352124, throughput 2.87684K wps
[Epoch 20 Batch 1530/1540] avg loss 0.0033381, throughput 2.85945K wps
Begin Testing...
[Epoch 20] train avg loss 0.00324707, dev acc 0.8211, dev avg loss 0.449274, throughput 2.85455K wps
[Epoch 21 Batch 30/1540] avg loss 0.00313334, throughput 2.91814K wps
[Epoch 21 Batch 60/1540] avg loss 0.00317189, throughput 2.81536K wps
[Epoch 21 Batch 90/1540] avg loss 0.00303604, throughput 2.8796K wps
[Epoch 21 Batch 120/1540] avg loss 0.00275792, throughput 2.8732K wps
[Epoch 21 Batch 150/1540] avg loss 0.00330277, throughput 2.87519K wps
[Epoch 21 Batch 180/1540] avg loss 0.00352442, throughput 2.86927K wps
[Epoch 21 Batch 210/1540] avg loss 0.00324427, throughput 2.86545K wps
[Epoch 21 Batch 240/1540] avg loss 0.00354215, throughput 2.86938K wps
[Epoch 21 Batch 270/1540] avg loss 0.00286154, throughput 2.84255K wps
[Epoch 21 Batch 300/1540] avg loss 0.00266311, throughput 2.88382K wps
[Epoch 21 Batch 330/1540] avg loss 0.00324052, throughput 2.88248K wps
[Epoch 21 Batch 360/1540] avg loss 0.00296289, throughput 2.84734K wps
[Epoch 21 Batch 390/1540] avg loss 0.002875, throughput 2.87143K wps
[Epoch 21 Batch 420/1540] avg loss 0.00312895, throughput 2.81585K wps
[Epoch 21 Batch 450/1540] avg loss 0.00293955, throughput 2.83926K wps
[Epoch 21 Batch 480/1540] avg loss 0.00308991, throughput 2.86278K wps
[Epoch 21 Batch 510/1540] avg loss 0.00306917, throughput 2.82739K wps
[Epoch 21 Batch 540/1540] avg loss 0.00324099, throughput 2.83513K wps
[Epoch 21 Batch 570/1540] avg loss 0.00321909, throughput 2.87749K wps
[Epoch 21 Batch 600/1540] avg loss 0.00311013, throughput 2.86831K wps
[Epoch 21 Batch 630/1540] avg loss 0.00327312, throughput 2.80174K wps
[Epoch 21 Batch 660/1540] avg loss 0.00344773, throughput 2.85005K wps
[Epoch 21 Batch 690/1540] avg loss 0.00363532, throughput 2.85606K wps
[Epoch 21 Batch 720/1540] avg loss 0.00302558, throughput 2.86123K wps
[Epoch 21 Batch 750/1540] avg loss 0.00288845, throughput 2.86808K wps
[Epoch 21 Batch 780/1540] avg loss 0.00319066, throughput 2.86136K wps
[Epoch 21 Batch 810/1540] avg loss 0.00309795, throughput 2.83712K wps
[Epoch 21 Batch 840/1540] avg loss 0.00340161, throughput 2.87855K wps
[Epoch 21 Batch 870/1540] avg loss 0.00289211, throughput 2.87989K wps
[Epoch 21 Batch 900/1540] avg loss 0.00327628, throughput 2.85901K wps
[Epoch 21 Batch 930/1540] avg loss 0.0030395, throughput 2.8682K wps
[Epoch 21 Batch 960/1540] avg loss 0.00335489, throughput 2.87423K wps
[Epoch 21 Batch 990/1540] avg loss 0.00378134, throughput 2.86996K wps
[Epoch 21 Batch 1020/1540] avg loss 0.00306519, throughput 2.86663K wps
[Epoch 21 Batch 1050/1540] avg loss 0.00310666, throughput 2.84069K wps
[Epoch 21 Batch 1080/1540] avg loss 0.00307959, throughput 2.8486K wps
[Epoch 21 Batch 1110/1540] avg loss 0.00294899, throughput 2.87957K wps
[Epoch 21 Batch 1140/1540] avg loss 0.00323469, throughput 2.87273K wps
[Epoch 21 Batch 1170/1540] avg loss 0.00335863, throughput 2.86675K wps
[Epoch 21 Batch 1200/1540] avg loss 0.00277829, throughput 2.86083K wps
[Epoch 21 Batch 1230/1540] avg loss 0.00321977, throughput 2.84172K wps
[Epoch 21 Batch 1260/1540] avg loss 0.00280524, throughput 2.85028K wps
[Epoch 21 Batch 1290/1540] avg loss 0.0033545, throughput 2.83499K wps
[Epoch 21 Batch 1320/1540] avg loss 0.00313087, throughput 2.88148K wps
[Epoch 21 Batch 1350/1540] avg loss 0.0031993, throughput 2.86098K wps
[Epoch 21 Batch 1380/1540] avg loss 0.00338701, throughput 2.865K wps
[Epoch 21 Batch 1410/1540] avg loss 0.0032147, throughput 2.87203K wps
[Epoch 21 Batch 1440/1540] avg loss 0.00295939, throughput 2.85929K wps
[Epoch 21 Batch 1470/1540] avg loss 0.0030755, throughput 2.80241K wps
[Epoch 21 Batch 1500/1540] avg loss 0.00287541, throughput 2.87201K wps
[Epoch 21 Batch 1530/1540] avg loss 0.00299449, throughput 2.8429K wps
Begin Testing...
[Epoch 21] train avg loss 0.00313782, dev acc 0.8177, dev avg loss 0.467425, throughput 2.85884K wps
[Epoch 22 Batch 30/1540] avg loss 0.00282733, throughput 2.92973K wps
[Epoch 22 Batch 60/1540] avg loss 0.00278696, throughput 2.86666K wps
[Epoch 22 Batch 90/1540] avg loss 0.00324692, throughput 2.87315K wps
[Epoch 22 Batch 120/1540] avg loss 0.00300725, throughput 2.86509K wps
[Epoch 22 Batch 150/1540] avg loss 0.00297408, throughput 2.87337K wps
[Epoch 22 Batch 180/1540] avg loss 0.00286091, throughput 2.86416K wps
[Epoch 22 Batch 210/1540] avg loss 0.00278911, throughput 2.86701K wps
[Epoch 22 Batch 240/1540] avg loss 0.00291024, throughput 2.86352K wps
[Epoch 22 Batch 270/1540] avg loss 0.00322752, throughput 2.86028K wps
[Epoch 22 Batch 300/1540] avg loss 0.00288793, throughput 2.87147K wps
[Epoch 22 Batch 330/1540] avg loss 0.0030328, throughput 2.85605K wps
[Epoch 22 Batch 360/1540] avg loss 0.00317558, throughput 2.8152K wps
[Epoch 22 Batch 390/1540] avg loss 0.00287794, throughput 2.8662K wps
[Epoch 22 Batch 420/1540] avg loss 0.00272743, throughput 2.85039K wps
[Epoch 22 Batch 450/1540] avg loss 0.00289203, throughput 2.86104K wps
[Epoch 22 Batch 480/1540] avg loss 0.0028317, throughput 2.85957K wps
[Epoch 22 Batch 510/1540] avg loss 0.00292185, throughput 2.8766K wps
[Epoch 22 Batch 540/1540] avg loss 0.0029785, throughput 2.832K wps
[Epoch 22 Batch 570/1540] avg loss 0.00281449, throughput 2.81408K wps
[Epoch 22 Batch 600/1540] avg loss 0.00328502, throughput 2.7855K wps
[Epoch 22 Batch 630/1540] avg loss 0.00315896, throughput 2.8011K wps
[Epoch 22 Batch 660/1540] avg loss 0.00294471, throughput 2.87003K wps
[Epoch 22 Batch 690/1540] avg loss 0.00310993, throughput 2.87529K wps
[Epoch 22 Batch 720/1540] avg loss 0.00277707, throughput 2.86435K wps
[Epoch 22 Batch 750/1540] avg loss 0.00309243, throughput 2.87763K wps
[Epoch 22 Batch 780/1540] avg loss 0.00353386, throughput 2.85315K wps
[Epoch 22 Batch 810/1540] avg loss 0.00325083, throughput 2.86563K wps
[Epoch 22 Batch 840/1540] avg loss 0.00247711, throughput 2.87933K wps
[Epoch 22 Batch 870/1540] avg loss 0.00344158, throughput 2.87441K wps
[Epoch 22 Batch 900/1540] avg loss 0.00262524, throughput 2.86554K wps
[Epoch 22 Batch 930/1540] avg loss 0.00314025, throughput 2.85158K wps
[Epoch 22 Batch 960/1540] avg loss 0.00300459, throughput 2.87485K wps
[Epoch 22 Batch 990/1540] avg loss 0.00305307, throughput 2.86408K wps
[Epoch 22 Batch 1020/1540] avg loss 0.00294867, throughput 2.81877K wps
[Epoch 22 Batch 1050/1540] avg loss 0.00308634, throughput 2.8534K wps
[Epoch 22 Batch 1080/1540] avg loss 0.00327687, throughput 2.8535K wps
[Epoch 22 Batch 1110/1540] avg loss 0.00297118, throughput 2.81816K wps
[Epoch 22 Batch 1140/1540] avg loss 0.0027898, throughput 2.86765K wps
[Epoch 22 Batch 1170/1540] avg loss 0.00292083, throughput 2.87226K wps
[Epoch 22 Batch 1200/1540] avg loss 0.00318261, throughput 2.8646K wps
[Epoch 22 Batch 1230/1540] avg loss 0.00327711, throughput 2.83095K wps
[Epoch 22 Batch 1260/1540] avg loss 0.00303457, throughput 2.87204K wps
[Epoch 22 Batch 1290/1540] avg loss 0.00325259, throughput 2.88018K wps
[Epoch 22 Batch 1320/1540] avg loss 0.00316599, throughput 2.84129K wps
[Epoch 22 Batch 1350/1540] avg loss 0.00336852, throughput 2.84523K wps
[Epoch 22 Batch 1380/1540] avg loss 0.0029842, throughput 2.88054K wps
[Epoch 22 Batch 1410/1540] avg loss 0.00290526, throughput 2.87453K wps
[Epoch 22 Batch 1440/1540] avg loss 0.00316248, throughput 2.8697K wps
[Epoch 22 Batch 1470/1540] avg loss 0.00331497, throughput 2.87896K wps
[Epoch 22 Batch 1500/1540] avg loss 0.00289617, throughput 2.87031K wps
[Epoch 22 Batch 1530/1540] avg loss 0.00327411, throughput 2.7794K wps
Begin Testing...
[Epoch 22] train avg loss 0.00302828, dev acc 0.8142, dev avg loss 0.479162, throughput 2.85694K wps
[Epoch 23 Batch 30/1540] avg loss 0.00286773, throughput 2.92895K wps
[Epoch 23 Batch 60/1540] avg loss 0.00274873, throughput 2.87904K wps
[Epoch 23 Batch 90/1540] avg loss 0.00293792, throughput 2.87483K wps
[Epoch 23 Batch 120/1540] avg loss 0.00280889, throughput 2.87123K wps
[Epoch 23 Batch 150/1540] avg loss 0.00246361, throughput 2.83668K wps
[Epoch 23 Batch 180/1540] avg loss 0.00281801, throughput 2.86564K wps
[Epoch 23 Batch 210/1540] avg loss 0.00238389, throughput 2.87084K wps
[Epoch 23 Batch 240/1540] avg loss 0.0029477, throughput 2.86648K wps
[Epoch 23 Batch 270/1540] avg loss 0.00272324, throughput 2.8809K wps
[Epoch 23 Batch 300/1540] avg loss 0.00254789, throughput 2.83165K wps
[Epoch 23 Batch 330/1540] avg loss 0.00289714, throughput 2.86351K wps
[Epoch 23 Batch 360/1540] avg loss 0.00318455, throughput 2.86726K wps
[Epoch 23 Batch 390/1540] avg loss 0.00310095, throughput 2.86282K wps
[Epoch 23 Batch 420/1540] avg loss 0.00300737, throughput 2.86004K wps
[Epoch 23 Batch 450/1540] avg loss 0.00274201, throughput 2.86597K wps
[Epoch 23 Batch 480/1540] avg loss 0.00291038, throughput 2.87539K wps
[Epoch 23 Batch 510/1540] avg loss 0.0028225, throughput 2.8711K wps
[Epoch 23 Batch 540/1540] avg loss 0.00319809, throughput 2.87392K wps
[Epoch 23 Batch 570/1540] avg loss 0.002833, throughput 2.87782K wps
[Epoch 23 Batch 600/1540] avg loss 0.00253814, throughput 2.81454K wps
[Epoch 23 Batch 630/1540] avg loss 0.00291726, throughput 2.83638K wps
[Epoch 23 Batch 660/1540] avg loss 0.00304206, throughput 2.87139K wps
[Epoch 23 Batch 690/1540] avg loss 0.00237228, throughput 2.8755K wps
[Epoch 23 Batch 720/1540] avg loss 0.00331781, throughput 2.85092K wps
[Epoch 23 Batch 750/1540] avg loss 0.00322385, throughput 2.8784K wps
[Epoch 23 Batch 780/1540] avg loss 0.00290321, throughput 2.87555K wps
[Epoch 23 Batch 810/1540] avg loss 0.00286966, throughput 2.82096K wps
[Epoch 23 Batch 840/1540] avg loss 0.00346712, throughput 2.86078K wps
[Epoch 23 Batch 870/1540] avg loss 0.00290061, throughput 2.84107K wps
[Epoch 23 Batch 900/1540] avg loss 0.00289271, throughput 2.84332K wps
[Epoch 23 Batch 930/1540] avg loss 0.00275391, throughput 2.8727K wps
[Epoch 23 Batch 960/1540] avg loss 0.00269665, throughput 2.87774K wps
[Epoch 23 Batch 990/1540] avg loss 0.00309035, throughput 2.81989K wps
[Epoch 23 Batch 1020/1540] avg loss 0.00296705, throughput 2.87164K wps
[Epoch 23 Batch 1050/1540] avg loss 0.0031935, throughput 2.8112K wps
[Epoch 23 Batch 1080/1540] avg loss 0.00318004, throughput 2.87269K wps
[Epoch 23 Batch 1110/1540] avg loss 0.00297858, throughput 2.86409K wps
[Epoch 23 Batch 1140/1540] avg loss 0.00295833, throughput 2.85563K wps
[Epoch 23 Batch 1170/1540] avg loss 0.00288467, throughput 2.86134K wps
[Epoch 23 Batch 1200/1540] avg loss 0.00293214, throughput 2.87326K wps
[Epoch 23 Batch 1230/1540] avg loss 0.00321484, throughput 2.88035K wps
[Epoch 23 Batch 1260/1540] avg loss 0.00313475, throughput 2.87544K wps
[Epoch 23 Batch 1290/1540] avg loss 0.00250774, throughput 2.87365K wps
[Epoch 23 Batch 1320/1540] avg loss 0.00277714, throughput 2.79115K wps
[Epoch 23 Batch 1350/1540] avg loss 0.00278417, throughput 2.80668K wps
[Epoch 23 Batch 1380/1540] avg loss 0.00280519, throughput 2.81328K wps
[Epoch 23 Batch 1410/1540] avg loss 0.0031202, throughput 2.8734K wps
[Epoch 23 Batch 1440/1540] avg loss 0.00281972, throughput 2.86365K wps
[Epoch 23 Batch 1470/1540] avg loss 0.00299263, throughput 2.88113K wps
[Epoch 23 Batch 1500/1540] avg loss 0.00325375, throughput 2.8546K wps
[Epoch 23 Batch 1530/1540] avg loss 0.00294169, throughput 2.79523K wps
Begin Testing...
[Epoch 23] train avg loss 0.00290931, dev acc 0.8245, dev avg loss 0.471665, throughput 2.85823K wps
[Epoch 24 Batch 30/1540] avg loss 0.0027535, throughput 2.93546K wps
[Epoch 24 Batch 60/1540] avg loss 0.00283967, throughput 2.87195K wps
[Epoch 24 Batch 90/1540] avg loss 0.00267324, throughput 2.87079K wps
[Epoch 24 Batch 120/1540] avg loss 0.00243618, throughput 2.8045K wps
[Epoch 24 Batch 150/1540] avg loss 0.00254263, throughput 2.87777K wps
[Epoch 24 Batch 180/1540] avg loss 0.00257524, throughput 2.81295K wps
[Epoch 24 Batch 210/1540] avg loss 0.0026824, throughput 2.86583K wps
[Epoch 24 Batch 240/1540] avg loss 0.00251548, throughput 2.86809K wps
[Epoch 24 Batch 270/1540] avg loss 0.00259465, throughput 2.84216K wps
[Epoch 24 Batch 300/1540] avg loss 0.00249184, throughput 2.83959K wps
[Epoch 24 Batch 330/1540] avg loss 0.00278713, throughput 2.86815K wps
[Epoch 24 Batch 360/1540] avg loss 0.00294722, throughput 2.86137K wps
[Epoch 24 Batch 390/1540] avg loss 0.00280355, throughput 2.86184K wps
[Epoch 24 Batch 420/1540] avg loss 0.00284389, throughput 2.8335K wps
[Epoch 24 Batch 450/1540] avg loss 0.0028293, throughput 2.86712K wps
[Epoch 24 Batch 480/1540] avg loss 0.00303276, throughput 2.86092K wps
[Epoch 24 Batch 510/1540] avg loss 0.00294255, throughput 2.86071K wps
[Epoch 24 Batch 540/1540] avg loss 0.00292161, throughput 2.87517K wps
[Epoch 24 Batch 570/1540] avg loss 0.00309398, throughput 2.81509K wps
[Epoch 24 Batch 600/1540] avg loss 0.00292812, throughput 2.88067K wps
[Epoch 24 Batch 630/1540] avg loss 0.00316362, throughput 2.88013K wps
[Epoch 24 Batch 660/1540] avg loss 0.00309564, throughput 2.86913K wps
[Epoch 24 Batch 690/1540] avg loss 0.0027192, throughput 2.88115K wps
[Epoch 24 Batch 720/1540] avg loss 0.00274694, throughput 2.87721K wps
[Epoch 24 Batch 750/1540] avg loss 0.00286279, throughput 2.82806K wps
[Epoch 24 Batch 780/1540] avg loss 0.00266064, throughput 2.87895K wps
[Epoch 24 Batch 810/1540] avg loss 0.00291759, throughput 2.83291K wps
[Epoch 24 Batch 840/1540] avg loss 0.00253504, throughput 2.88458K wps
[Epoch 24 Batch 870/1540] avg loss 0.00322491, throughput 2.87172K wps
[Epoch 24 Batch 900/1540] avg loss 0.00266742, throughput 2.87498K wps
[Epoch 24 Batch 930/1540] avg loss 0.00297642, throughput 2.873K wps
[Epoch 24 Batch 960/1540] avg loss 0.00286695, throughput 2.86398K wps
[Epoch 24 Batch 990/1540] avg loss 0.0030373, throughput 2.86228K wps
[Epoch 24 Batch 1020/1540] avg loss 0.00314148, throughput 2.86995K wps
[Epoch 24 Batch 1050/1540] avg loss 0.00314522, throughput 2.85576K wps
[Epoch 24 Batch 1080/1540] avg loss 0.0027721, throughput 2.84225K wps
[Epoch 24 Batch 1110/1540] avg loss 0.00318235, throughput 2.86272K wps
[Epoch 24 Batch 1140/1540] avg loss 0.00322502, throughput 2.84763K wps
[Epoch 24 Batch 1170/1540] avg loss 0.00279356, throughput 2.86821K wps
[Epoch 24 Batch 1200/1540] avg loss 0.00241805, throughput 2.8647K wps
[Epoch 24 Batch 1230/1540] avg loss 0.00299595, throughput 2.86942K wps
[Epoch 24 Batch 1260/1540] avg loss 0.00276273, throughput 2.88167K wps
[Epoch 24 Batch 1290/1540] avg loss 0.00288227, throughput 2.84005K wps
[Epoch 24 Batch 1320/1540] avg loss 0.0026863, throughput 2.87175K wps
[Epoch 24 Batch 1350/1540] avg loss 0.00283038, throughput 2.86456K wps
[Epoch 24 Batch 1380/1540] avg loss 0.00274184, throughput 2.83716K wps
[Epoch 24 Batch 1410/1540] avg loss 0.0027683, throughput 2.87105K wps
[Epoch 24 Batch 1440/1540] avg loss 0.00272055, throughput 2.85368K wps
[Epoch 24 Batch 1470/1540] avg loss 0.00313037, throughput 2.83046K wps
[Epoch 24 Batch 1500/1540] avg loss 0.00280752, throughput 2.88113K wps
[Epoch 24 Batch 1530/1540] avg loss 0.00301511, throughput 2.86691K wps
Begin Testing...
[Epoch 24] train avg loss 0.0028412, dev acc 0.8177, dev avg loss 0.483309, throughput 2.86136K wps
[Epoch 25 Batch 30/1540] avg loss 0.00266207, throughput 2.84153K wps
[Epoch 25 Batch 60/1540] avg loss 0.00244357, throughput 2.79583K wps
[Epoch 25 Batch 90/1540] avg loss 0.00282656, throughput 2.85275K wps
[Epoch 25 Batch 120/1540] avg loss 0.00274608, throughput 2.85391K wps
[Epoch 25 Batch 150/1540] avg loss 0.00252018, throughput 2.85551K wps
[Epoch 25 Batch 180/1540] avg loss 0.00252817, throughput 2.87158K wps
[Epoch 25 Batch 210/1540] avg loss 0.00248763, throughput 2.81833K wps
[Epoch 25 Batch 240/1540] avg loss 0.00273261, throughput 2.80737K wps
[Epoch 25 Batch 270/1540] avg loss 0.00271718, throughput 2.87341K wps
[Epoch 25 Batch 300/1540] avg loss 0.00262288, throughput 2.85194K wps
[Epoch 25 Batch 330/1540] avg loss 0.00246257, throughput 2.80893K wps
[Epoch 25 Batch 360/1540] avg loss 0.00255895, throughput 2.83862K wps
[Epoch 25 Batch 390/1540] avg loss 0.0026047, throughput 2.8407K wps
[Epoch 25 Batch 420/1540] avg loss 0.00264311, throughput 2.88042K wps
[Epoch 25 Batch 450/1540] avg loss 0.00265039, throughput 2.87435K wps
[Epoch 25 Batch 480/1540] avg loss 0.00279353, throughput 2.87696K wps
[Epoch 25 Batch 510/1540] avg loss 0.00270168, throughput 2.8759K wps
[Epoch 25 Batch 540/1540] avg loss 0.00272125, throughput 2.87432K wps
[Epoch 25 Batch 570/1540] avg loss 0.00287344, throughput 2.77828K wps
[Epoch 25 Batch 600/1540] avg loss 0.00253746, throughput 2.80119K wps
[Epoch 25 Batch 630/1540] avg loss 0.00224462, throughput 2.87785K wps
[Epoch 25 Batch 660/1540] avg loss 0.00269117, throughput 2.84239K wps
[Epoch 25 Batch 690/1540] avg loss 0.00285172, throughput 2.8165K wps
[Epoch 25 Batch 720/1540] avg loss 0.00284792, throughput 2.86593K wps
[Epoch 25 Batch 750/1540] avg loss 0.00320145, throughput 2.85385K wps
[Epoch 25 Batch 780/1540] avg loss 0.00298935, throughput 2.86935K wps
[Epoch 25 Batch 810/1540] avg loss 0.00289699, throughput 2.8144K wps
[Epoch 25 Batch 840/1540] avg loss 0.00262486, throughput 2.82211K wps
[Epoch 25 Batch 870/1540] avg loss 0.00248235, throughput 2.82522K wps
[Epoch 25 Batch 900/1540] avg loss 0.00269453, throughput 2.83888K wps
[Epoch 25 Batch 930/1540] avg loss 0.002994, throughput 2.85975K wps
[Epoch 25 Batch 960/1540] avg loss 0.00270713, throughput 2.87674K wps
[Epoch 25 Batch 990/1540] avg loss 0.00260148, throughput 2.87123K wps
[Epoch 25 Batch 1020/1540] avg loss 0.00303346, throughput 2.86993K wps
[Epoch 25 Batch 1050/1540] avg loss 0.00281362, throughput 2.80162K wps
[Epoch 25 Batch 1080/1540] avg loss 0.00283846, throughput 2.8296K wps
[Epoch 25 Batch 1110/1540] avg loss 0.00261867, throughput 2.86226K wps
[Epoch 25 Batch 1140/1540] avg loss 0.00289001, throughput 2.86796K wps
[Epoch 25 Batch 1170/1540] avg loss 0.00282253, throughput 2.868K wps
[Epoch 25 Batch 1200/1540] avg loss 0.00306379, throughput 2.84872K wps
[Epoch 25 Batch 1230/1540] avg loss 0.00297488, throughput 2.86199K wps
[Epoch 25 Batch 1260/1540] avg loss 0.00305977, throughput 2.87063K wps
[Epoch 25 Batch 1290/1540] avg loss 0.00293299, throughput 2.82155K wps
[Epoch 25 Batch 1320/1540] avg loss 0.00288766, throughput 2.79851K wps
[Epoch 25 Batch 1350/1540] avg loss 0.0028363, throughput 2.87568K wps
[Epoch 25 Batch 1380/1540] avg loss 0.00288244, throughput 2.87729K wps
[Epoch 25 Batch 1410/1540] avg loss 0.00294812, throughput 2.87132K wps
[Epoch 25 Batch 1440/1540] avg loss 0.00299643, throughput 2.87707K wps
[Epoch 25 Batch 1470/1540] avg loss 0.00273169, throughput 2.858K wps
[Epoch 25 Batch 1500/1540] avg loss 0.00283984, throughput 2.87097K wps
[Epoch 25 Batch 1530/1540] avg loss 0.00273135, throughput 2.85077K wps
Begin Testing...
[Epoch 25] train avg loss 0.00275846, dev acc 0.8119, dev avg loss 0.491096, throughput 2.84844K wps
[Epoch 26 Batch 30/1540] avg loss 0.00218321, throughput 2.85747K wps
[Epoch 26 Batch 60/1540] avg loss 0.00264815, throughput 2.87151K wps
[Epoch 26 Batch 90/1540] avg loss 0.00234005, throughput 2.8791K wps
[Epoch 26 Batch 120/1540] avg loss 0.00258123, throughput 2.87254K wps
[Epoch 26 Batch 150/1540] avg loss 0.00260827, throughput 2.87477K wps
[Epoch 26 Batch 180/1540] avg loss 0.00264439, throughput 2.84173K wps
[Epoch 26 Batch 210/1540] avg loss 0.0026221, throughput 2.87244K wps
[Epoch 26 Batch 240/1540] avg loss 0.00277864, throughput 2.79499K wps
[Epoch 26 Batch 270/1540] avg loss 0.00240666, throughput 2.84094K wps
[Epoch 26 Batch 300/1540] avg loss 0.00246204, throughput 2.86753K wps
[Epoch 26 Batch 330/1540] avg loss 0.00268191, throughput 2.87132K wps
[Epoch 26 Batch 360/1540] avg loss 0.00270588, throughput 2.83776K wps
[Epoch 26 Batch 390/1540] avg loss 0.0026429, throughput 2.77473K wps
[Epoch 26 Batch 420/1540] avg loss 0.00312285, throughput 2.85647K wps
[Epoch 26 Batch 450/1540] avg loss 0.00238521, throughput 2.84916K wps
[Epoch 26 Batch 480/1540] avg loss 0.00221521, throughput 2.87635K wps
[Epoch 26 Batch 510/1540] avg loss 0.00246514, throughput 2.87318K wps
[Epoch 26 Batch 540/1540] avg loss 0.00282442, throughput 2.83832K wps
[Epoch 26 Batch 570/1540] avg loss 0.00277465, throughput 2.78992K wps
[Epoch 26 Batch 600/1540] avg loss 0.0024088, throughput 2.86898K wps
[Epoch 26 Batch 630/1540] avg loss 0.00233399, throughput 2.82512K wps
[Epoch 26 Batch 660/1540] avg loss 0.00269161, throughput 2.7973K wps
[Epoch 26 Batch 690/1540] avg loss 0.00246152, throughput 2.86296K wps
[Epoch 26 Batch 720/1540] avg loss 0.00296261, throughput 2.88123K wps
[Epoch 26 Batch 750/1540] avg loss 0.00211901, throughput 2.86477K wps
[Epoch 26 Batch 780/1540] avg loss 0.00255868, throughput 2.84577K wps
[Epoch 26 Batch 810/1540] avg loss 0.00273517, throughput 2.83244K wps
[Epoch 26 Batch 840/1540] avg loss 0.00265202, throughput 2.83766K wps
[Epoch 26 Batch 870/1540] avg loss 0.00223408, throughput 2.86792K wps
[Epoch 26 Batch 900/1540] avg loss 0.00255829, throughput 2.86664K wps
[Epoch 26 Batch 930/1540] avg loss 0.00268503, throughput 2.87112K wps
[Epoch 26 Batch 960/1540] avg loss 0.00246925, throughput 2.87054K wps
[Epoch 26 Batch 990/1540] avg loss 0.00255938, throughput 2.86426K wps
[Epoch 26 Batch 1020/1540] avg loss 0.0029436, throughput 2.87613K wps
[Epoch 26 Batch 1050/1540] avg loss 0.00266188, throughput 2.86584K wps
[Epoch 26 Batch 1080/1540] avg loss 0.00259524, throughput 2.8627K wps
[Epoch 26 Batch 1110/1540] avg loss 0.00264649, throughput 2.86904K wps
[Epoch 26 Batch 1140/1540] avg loss 0.00239486, throughput 2.85016K wps
[Epoch 26 Batch 1170/1540] avg loss 0.00316717, throughput 2.87064K wps
[Epoch 26 Batch 1200/1540] avg loss 0.00298445, throughput 2.8159K wps
[Epoch 26 Batch 1230/1540] avg loss 0.00304746, throughput 2.83937K wps
[Epoch 26 Batch 1260/1540] avg loss 0.00324882, throughput 2.87217K wps
[Epoch 26 Batch 1290/1540] avg loss 0.00293062, throughput 2.87396K wps
[Epoch 26 Batch 1320/1540] avg loss 0.00291952, throughput 2.7904K wps
[Epoch 26 Batch 1350/1540] avg loss 0.00283546, throughput 2.8595K wps
[Epoch 26 Batch 1380/1540] avg loss 0.00302675, throughput 2.87004K wps
[Epoch 26 Batch 1410/1540] avg loss 0.00285152, throughput 2.83345K wps
[Epoch 26 Batch 1440/1540] avg loss 0.00274447, throughput 2.809K wps
[Epoch 26 Batch 1470/1540] avg loss 0.0028066, throughput 2.86874K wps
[Epoch 26 Batch 1500/1540] avg loss 0.00269647, throughput 2.80441K wps
[Epoch 26 Batch 1530/1540] avg loss 0.00270831, throughput 2.87358K wps
Begin Testing...
[Epoch 26] train avg loss 0.00266137, dev acc 0.8096, dev avg loss 0.495286, throughput 2.85082K wps
[Epoch 27 Batch 30/1540] avg loss 0.00246118, throughput 2.87808K wps
[Epoch 27 Batch 60/1540] avg loss 0.00240395, throughput 2.8556K wps
[Epoch 27 Batch 90/1540] avg loss 0.00269905, throughput 2.86007K wps
[Epoch 27 Batch 120/1540] avg loss 0.00271742, throughput 2.85273K wps
[Epoch 27 Batch 150/1540] avg loss 0.0022783, throughput 2.87022K wps
[Epoch 27 Batch 180/1540] avg loss 0.00225897, throughput 2.81318K wps
[Epoch 27 Batch 210/1540] avg loss 0.00256017, throughput 2.78767K wps
[Epoch 27 Batch 240/1540] avg loss 0.00245576, throughput 2.82512K wps
[Epoch 27 Batch 270/1540] avg loss 0.00228544, throughput 2.8763K wps
[Epoch 27 Batch 300/1540] avg loss 0.00231599, throughput 2.84594K wps
[Epoch 27 Batch 330/1540] avg loss 0.00232084, throughput 2.82182K wps
[Epoch 27 Batch 360/1540] avg loss 0.00227463, throughput 2.85401K wps
[Epoch 27 Batch 390/1540] avg loss 0.00279022, throughput 2.81756K wps
[Epoch 27 Batch 420/1540] avg loss 0.00263864, throughput 2.86821K wps
[Epoch 27 Batch 450/1540] avg loss 0.00269081, throughput 2.79516K wps
[Epoch 27 Batch 480/1540] avg loss 0.0025738, throughput 2.79656K wps
[Epoch 27 Batch 510/1540] avg loss 0.00286828, throughput 2.87269K wps
[Epoch 27 Batch 540/1540] avg loss 0.00263527, throughput 2.87103K wps
[Epoch 27 Batch 570/1540] avg loss 0.00217656, throughput 2.87664K wps
[Epoch 27 Batch 600/1540] avg loss 0.00253169, throughput 2.87971K wps
[Epoch 27 Batch 630/1540] avg loss 0.00255721, throughput 2.87781K wps
[Epoch 27 Batch 660/1540] avg loss 0.002426, throughput 2.87048K wps
[Epoch 27 Batch 690/1540] avg loss 0.00230612, throughput 2.84023K wps
[Epoch 27 Batch 720/1540] avg loss 0.00288651, throughput 2.82388K wps
[Epoch 27 Batch 750/1540] avg loss 0.00271023, throughput 2.8069K wps
[Epoch 27 Batch 780/1540] avg loss 0.00287535, throughput 2.88074K wps
[Epoch 27 Batch 810/1540] avg loss 0.0028715, throughput 2.8615K wps
[Epoch 27 Batch 840/1540] avg loss 0.00245226, throughput 2.87299K wps
[Epoch 27 Batch 870/1540] avg loss 0.0027245, throughput 2.8653K wps
[Epoch 27 Batch 900/1540] avg loss 0.00272985, throughput 2.83066K wps
[Epoch 27 Batch 930/1540] avg loss 0.00266352, throughput 2.87067K wps
[Epoch 27 Batch 960/1540] avg loss 0.00266061, throughput 2.81529K wps
[Epoch 27 Batch 990/1540] avg loss 0.00269541, throughput 2.8664K wps
[Epoch 27 Batch 1020/1540] avg loss 0.00274142, throughput 2.83468K wps
[Epoch 27 Batch 1050/1540] avg loss 0.00238116, throughput 2.87795K wps
[Epoch 27 Batch 1080/1540] avg loss 0.00230881, throughput 2.85835K wps
[Epoch 27 Batch 1110/1540] avg loss 0.00247955, throughput 2.85671K wps
[Epoch 27 Batch 1140/1540] avg loss 0.00282179, throughput 2.84238K wps
[Epoch 27 Batch 1170/1540] avg loss 0.00283841, throughput 2.83261K wps
[Epoch 27 Batch 1200/1540] avg loss 0.00233445, throughput 2.87606K wps
[Epoch 27 Batch 1230/1540] avg loss 0.00273525, throughput 2.87638K wps
[Epoch 27 Batch 1260/1540] avg loss 0.00261169, throughput 2.84653K wps
[Epoch 27 Batch 1290/1540] avg loss 0.00235058, throughput 2.86823K wps
[Epoch 27 Batch 1320/1540] avg loss 0.00244526, throughput 2.88128K wps
[Epoch 27 Batch 1350/1540] avg loss 0.00319654, throughput 2.86168K wps
[Epoch 27 Batch 1380/1540] avg loss 0.00279997, throughput 2.85505K wps
[Epoch 27 Batch 1410/1540] avg loss 0.00290737, throughput 2.84507K wps
[Epoch 27 Batch 1440/1540] avg loss 0.00255679, throughput 2.85172K wps
[Epoch 27 Batch 1470/1540] avg loss 0.00270105, throughput 2.79048K wps
[Epoch 27 Batch 1500/1540] avg loss 0.00274212, throughput 2.87828K wps
[Epoch 27 Batch 1530/1540] avg loss 0.00237778, throughput 2.87945K wps
Begin Testing...
[Epoch 27] train avg loss 0.0025902, dev acc 0.7982, dev avg loss 0.515343, throughput 2.85119K wps
[Epoch 28 Batch 30/1540] avg loss 0.00228338, throughput 2.89132K wps
[Epoch 28 Batch 60/1540] avg loss 0.00233125, throughput 2.81261K wps
[Epoch 28 Batch 90/1540] avg loss 0.00253886, throughput 2.86148K wps
[Epoch 28 Batch 120/1540] avg loss 0.00245367, throughput 2.84798K wps
[Epoch 28 Batch 150/1540] avg loss 0.0023146, throughput 2.86013K wps
[Epoch 28 Batch 180/1540] avg loss 0.00223532, throughput 2.86028K wps
[Epoch 28 Batch 210/1540] avg loss 0.00242364, throughput 2.84039K wps
[Epoch 28 Batch 240/1540] avg loss 0.00249072, throughput 2.86368K wps
[Epoch 28 Batch 270/1540] avg loss 0.00197505, throughput 2.87967K wps
[Epoch 28 Batch 300/1540] avg loss 0.00259979, throughput 2.88021K wps
[Epoch 28 Batch 330/1540] avg loss 0.00234831, throughput 2.87102K wps
[Epoch 28 Batch 360/1540] avg loss 0.00216768, throughput 2.87226K wps
[Epoch 28 Batch 390/1540] avg loss 0.00285854, throughput 2.86992K wps
[Epoch 28 Batch 420/1540] avg loss 0.00253271, throughput 2.85416K wps
[Epoch 28 Batch 450/1540] avg loss 0.00258677, throughput 2.87274K wps
[Epoch 28 Batch 480/1540] avg loss 0.00235767, throughput 2.87232K wps
[Epoch 28 Batch 510/1540] avg loss 0.00235342, throughput 2.8453K wps
[Epoch 28 Batch 540/1540] avg loss 0.00245616, throughput 2.87947K wps
[Epoch 28 Batch 570/1540] avg loss 0.00274652, throughput 2.87849K wps
[Epoch 28 Batch 600/1540] avg loss 0.00284016, throughput 2.87783K wps
[Epoch 28 Batch 630/1540] avg loss 0.00238182, throughput 2.87911K wps
[Epoch 28 Batch 660/1540] avg loss 0.00248645, throughput 2.86713K wps
[Epoch 28 Batch 690/1540] avg loss 0.00251373, throughput 2.88577K wps
[Epoch 28 Batch 720/1540] avg loss 0.00251666, throughput 2.85217K wps
[Epoch 28 Batch 750/1540] avg loss 0.00284883, throughput 2.8649K wps
[Epoch 28 Batch 780/1540] avg loss 0.00242574, throughput 2.81753K wps
[Epoch 28 Batch 810/1540] avg loss 0.00247685, throughput 2.80774K wps
[Epoch 28 Batch 840/1540] avg loss 0.00262612, throughput 2.83531K wps
[Epoch 28 Batch 870/1540] avg loss 0.00244886, throughput 2.86931K wps
[Epoch 28 Batch 900/1540] avg loss 0.00250142, throughput 2.88178K wps
[Epoch 28 Batch 930/1540] avg loss 0.00267023, throughput 2.8876K wps
[Epoch 28 Batch 960/1540] avg loss 0.00237948, throughput 2.85451K wps
[Epoch 28 Batch 990/1540] avg loss 0.00239708, throughput 2.86119K wps
[Epoch 28 Batch 1020/1540] avg loss 0.0024045, throughput 2.82237K wps
[Epoch 28 Batch 1050/1540] avg loss 0.00250392, throughput 2.87394K wps
[Epoch 28 Batch 1080/1540] avg loss 0.0024827, throughput 2.85558K wps
[Epoch 28 Batch 1110/1540] avg loss 0.00277882, throughput 2.87575K wps
[Epoch 28 Batch 1140/1540] avg loss 0.0025416, throughput 2.8487K wps
[Epoch 28 Batch 1170/1540] avg loss 0.00266478, throughput 2.81752K wps
[Epoch 28 Batch 1200/1540] avg loss 0.00228312, throughput 2.86405K wps
[Epoch 28 Batch 1230/1540] avg loss 0.00263939, throughput 2.83652K wps
[Epoch 28 Batch 1260/1540] avg loss 0.00272009, throughput 2.83712K wps
[Epoch 28 Batch 1290/1540] avg loss 0.00238782, throughput 2.84294K wps
[Epoch 28 Batch 1320/1540] avg loss 0.00269143, throughput 2.87406K wps
[Epoch 28 Batch 1350/1540] avg loss 0.00261923, throughput 2.8436K wps
[Epoch 28 Batch 1380/1540] avg loss 0.00244292, throughput 2.86726K wps
[Epoch 28 Batch 1410/1540] avg loss 0.00283314, throughput 2.85948K wps
[Epoch 28 Batch 1440/1540] avg loss 0.00296519, throughput 2.85382K wps
[Epoch 28 Batch 1470/1540] avg loss 0.0028381, throughput 2.85992K wps
[Epoch 28 Batch 1500/1540] avg loss 0.00274499, throughput 2.8747K wps
[Epoch 28 Batch 1530/1540] avg loss 0.00244384, throughput 2.86842K wps
Begin Testing...
[Epoch 28] train avg loss 0.00251822, dev acc 0.8096, dev avg loss 0.504692, throughput 2.85917K wps
[Epoch 29 Batch 30/1540] avg loss 0.00220243, throughput 2.93209K wps
[Epoch 29 Batch 60/1540] avg loss 0.00210587, throughput 2.87543K wps
[Epoch 29 Batch 90/1540] avg loss 0.00197657, throughput 2.83075K wps
[Epoch 29 Batch 120/1540] avg loss 0.00226427, throughput 2.88428K wps
[Epoch 29 Batch 150/1540] avg loss 0.00238262, throughput 2.87188K wps
[Epoch 29 Batch 180/1540] avg loss 0.00260317, throughput 2.87276K wps
[Epoch 29 Batch 210/1540] avg loss 0.00275745, throughput 2.85899K wps
[Epoch 29 Batch 240/1540] avg loss 0.00265075, throughput 2.88135K wps
[Epoch 29 Batch 270/1540] avg loss 0.00217534, throughput 2.81097K wps
[Epoch 29 Batch 300/1540] avg loss 0.00235779, throughput 2.88226K wps
[Epoch 29 Batch 330/1540] avg loss 0.00241207, throughput 2.81043K wps
[Epoch 29 Batch 360/1540] avg loss 0.00216112, throughput 2.87903K wps
[Epoch 29 Batch 390/1540] avg loss 0.00275802, throughput 2.86638K wps
[Epoch 29 Batch 420/1540] avg loss 0.00228035, throughput 2.8688K wps
[Epoch 29 Batch 450/1540] avg loss 0.00223184, throughput 2.83573K wps
[Epoch 29 Batch 480/1540] avg loss 0.00209789, throughput 2.83949K wps
[Epoch 29 Batch 510/1540] avg loss 0.0024355, throughput 2.85662K wps
[Epoch 29 Batch 540/1540] avg loss 0.00264685, throughput 2.86145K wps
[Epoch 29 Batch 570/1540] avg loss 0.00261647, throughput 2.86975K wps
[Epoch 29 Batch 600/1540] avg loss 0.00249194, throughput 2.87534K wps
[Epoch 29 Batch 630/1540] avg loss 0.00268032, throughput 2.88172K wps
[Epoch 29 Batch 660/1540] avg loss 0.00258318, throughput 2.88031K wps
[Epoch 29 Batch 690/1540] avg loss 0.00276549, throughput 2.86385K wps
[Epoch 29 Batch 720/1540] avg loss 0.00240866, throughput 2.86482K wps
[Epoch 29 Batch 750/1540] avg loss 0.00249284, throughput 2.87368K wps
[Epoch 29 Batch 780/1540] avg loss 0.00273856, throughput 2.88159K wps
[Epoch 29 Batch 810/1540] avg loss 0.00262014, throughput 2.86703K wps
[Epoch 29 Batch 840/1540] avg loss 0.00233887, throughput 2.83469K wps
[Epoch 29 Batch 870/1540] avg loss 0.00244755, throughput 2.88054K wps
[Epoch 29 Batch 900/1540] avg loss 0.00303119, throughput 2.81453K wps
[Epoch 29 Batch 930/1540] avg loss 0.00222622, throughput 2.86964K wps
[Epoch 29 Batch 960/1540] avg loss 0.00252494, throughput 2.87435K wps
[Epoch 29 Batch 990/1540] avg loss 0.00274682, throughput 2.86684K wps
[Epoch 29 Batch 1020/1540] avg loss 0.00224375, throughput 2.86833K wps
[Epoch 29 Batch 1050/1540] avg loss 0.00262307, throughput 2.86041K wps
[Epoch 29 Batch 1080/1540] avg loss 0.00263917, throughput 2.78711K wps
[Epoch 29 Batch 1110/1540] avg loss 0.0026632, throughput 2.8619K wps
[Epoch 29 Batch 1140/1540] avg loss 0.00230619, throughput 2.87159K wps
[Epoch 29 Batch 1170/1540] avg loss 0.00230045, throughput 2.84917K wps
[Epoch 29 Batch 1200/1540] avg loss 0.0023643, throughput 2.79813K wps
[Epoch 29 Batch 1230/1540] avg loss 0.00225357, throughput 2.84219K wps
[Epoch 29 Batch 1260/1540] avg loss 0.00230815, throughput 2.81972K wps
[Epoch 29 Batch 1290/1540] avg loss 0.00267943, throughput 2.87262K wps
[Epoch 29 Batch 1320/1540] avg loss 0.00268418, throughput 2.85925K wps
[Epoch 29 Batch 1350/1540] avg loss 0.00269808, throughput 2.80299K wps
[Epoch 29 Batch 1380/1540] avg loss 0.00256758, throughput 2.86826K wps
[Epoch 29 Batch 1410/1540] avg loss 0.00212008, throughput 2.85703K wps
[Epoch 29 Batch 1440/1540] avg loss 0.00247645, throughput 2.8653K wps
[Epoch 29 Batch 1470/1540] avg loss 0.00270505, throughput 2.8239K wps
[Epoch 29 Batch 1500/1540] avg loss 0.00279965, throughput 2.877K wps
[Epoch 29 Batch 1530/1540] avg loss 0.00279424, throughput 2.87309K wps
Begin Testing...
[Epoch 29] train avg loss 0.00248326, dev acc 0.8177, dev avg loss 0.516655, throughput 2.85814K wps
[Epoch 30 Batch 30/1540] avg loss 0.00249684, throughput 2.84692K wps
[Epoch 30 Batch 60/1540] avg loss 0.00224051, throughput 2.80346K wps
[Epoch 30 Batch 90/1540] avg loss 0.00211147, throughput 2.79707K wps
[Epoch 30 Batch 120/1540] avg loss 0.00249741, throughput 2.82393K wps
[Epoch 30 Batch 150/1540] avg loss 0.00240749, throughput 2.87294K wps
[Epoch 30 Batch 180/1540] avg loss 0.00228212, throughput 2.85892K wps
[Epoch 30 Batch 210/1540] avg loss 0.00233092, throughput 2.83533K wps
[Epoch 30 Batch 240/1540] avg loss 0.00182019, throughput 2.88076K wps
[Epoch 30 Batch 270/1540] avg loss 0.00241824, throughput 2.87565K wps
[Epoch 30 Batch 300/1540] avg loss 0.00217923, throughput 2.87798K wps
[Epoch 30 Batch 330/1540] avg loss 0.00247861, throughput 2.83475K wps
[Epoch 30 Batch 360/1540] avg loss 0.00249643, throughput 2.8807K wps
[Epoch 30 Batch 390/1540] avg loss 0.00233962, throughput 2.86414K wps
[Epoch 30 Batch 420/1540] avg loss 0.00263992, throughput 2.83819K wps
[Epoch 30 Batch 450/1540] avg loss 0.00224389, throughput 2.87914K wps
[Epoch 30 Batch 480/1540] avg loss 0.00267016, throughput 2.86616K wps
[Epoch 30 Batch 510/1540] avg loss 0.00271132, throughput 2.8525K wps
[Epoch 30 Batch 540/1540] avg loss 0.00266553, throughput 2.87063K wps
[Epoch 30 Batch 570/1540] avg loss 0.00252963, throughput 2.78864K wps
[Epoch 30 Batch 600/1540] avg loss 0.00208616, throughput 2.84121K wps
[Epoch 30 Batch 630/1540] avg loss 0.00200684, throughput 2.86594K wps
[Epoch 30 Batch 660/1540] avg loss 0.00237625, throughput 2.81404K wps
[Epoch 30 Batch 690/1540] avg loss 0.00218422, throughput 2.79076K wps
[Epoch 30 Batch 720/1540] avg loss 0.00219682, throughput 2.80399K wps
[Epoch 30 Batch 750/1540] avg loss 0.00229147, throughput 2.87999K wps
[Epoch 30 Batch 780/1540] avg loss 0.00233648, throughput 2.87323K wps
[Epoch 30 Batch 810/1540] avg loss 0.00264328, throughput 2.87955K wps
[Epoch 30 Batch 840/1540] avg loss 0.00238581, throughput 2.84479K wps
[Epoch 30 Batch 870/1540] avg loss 0.00240443, throughput 2.87833K wps
[Epoch 30 Batch 900/1540] avg loss 0.00262961, throughput 2.88165K wps
[Epoch 30 Batch 930/1540] avg loss 0.002459, throughput 2.80111K wps
[Epoch 30 Batch 960/1540] avg loss 0.00222966, throughput 2.8803K wps
[Epoch 30 Batch 990/1540] avg loss 0.00289183, throughput 2.8867K wps
[Epoch 30 Batch 1020/1540] avg loss 0.00255286, throughput 2.88239K wps
[Epoch 30 Batch 1050/1540] avg loss 0.00268509, throughput 2.87941K wps
[Epoch 30 Batch 1080/1540] avg loss 0.00229046, throughput 2.88672K wps
[Epoch 30 Batch 1110/1540] avg loss 0.00231073, throughput 2.87443K wps
[Epoch 30 Batch 1140/1540] avg loss 0.00241119, throughput 2.88173K wps
[Epoch 30 Batch 1170/1540] avg loss 0.00260786, throughput 2.88001K wps
[Epoch 30 Batch 1200/1540] avg loss 0.00254282, throughput 2.87745K wps
[Epoch 30 Batch 1230/1540] avg loss 0.00230332, throughput 2.85858K wps
[Epoch 30 Batch 1260/1540] avg loss 0.00292424, throughput 2.87277K wps
[Epoch 30 Batch 1290/1540] avg loss 0.00230319, throughput 2.87746K wps
[Epoch 30 Batch 1320/1540] avg loss 0.00301152, throughput 2.87651K wps
[Epoch 30 Batch 1350/1540] avg loss 0.00254093, throughput 2.88325K wps
[Epoch 30 Batch 1380/1540] avg loss 0.00246553, throughput 2.85992K wps
[Epoch 30 Batch 1410/1540] avg loss 0.00257992, throughput 2.81248K wps
[Epoch 30 Batch 1440/1540] avg loss 0.0022797, throughput 2.87931K wps
[Epoch 30 Batch 1470/1540] avg loss 0.00257673, throughput 2.83721K wps
[Epoch 30 Batch 1500/1540] avg loss 0.00243389, throughput 2.88206K wps
[Epoch 30 Batch 1530/1540] avg loss 0.00253533, throughput 2.8694K wps
Begin Testing...
[Epoch 30] train avg loss 0.00243702, dev acc 0.8154, dev avg loss 0.516638, throughput 2.85755K wps
[Epoch 31 Batch 30/1540] avg loss 0.00219411, throughput 2.90708K wps
[Epoch 31 Batch 60/1540] avg loss 0.00205942, throughput 2.88101K wps
[Epoch 31 Batch 90/1540] avg loss 0.00222492, throughput 2.80635K wps
[Epoch 31 Batch 120/1540] avg loss 0.00214728, throughput 2.8347K wps
[Epoch 31 Batch 150/1540] avg loss 0.00207643, throughput 2.87381K wps
[Epoch 31 Batch 180/1540] avg loss 0.00196659, throughput 2.82518K wps
[Epoch 31 Batch 210/1540] avg loss 0.00230899, throughput 2.86587K wps
[Epoch 31 Batch 240/1540] avg loss 0.0023188, throughput 2.84854K wps
[Epoch 31 Batch 270/1540] avg loss 0.00229944, throughput 2.83883K wps
[Epoch 31 Batch 300/1540] avg loss 0.00225168, throughput 2.87791K wps
[Epoch 31 Batch 330/1540] avg loss 0.00251071, throughput 2.83947K wps
[Epoch 31 Batch 360/1540] avg loss 0.00222413, throughput 2.82538K wps
[Epoch 31 Batch 390/1540] avg loss 0.00217893, throughput 2.87949K wps
[Epoch 31 Batch 420/1540] avg loss 0.00220619, throughput 2.87669K wps
[Epoch 31 Batch 450/1540] avg loss 0.00239147, throughput 2.86577K wps
[Epoch 31 Batch 480/1540] avg loss 0.00211935, throughput 2.84098K wps
[Epoch 31 Batch 510/1540] avg loss 0.00214073, throughput 2.87617K wps
[Epoch 31 Batch 540/1540] avg loss 0.00245475, throughput 2.81411K wps
[Epoch 31 Batch 570/1540] avg loss 0.00237386, throughput 2.85405K wps
[Epoch 31 Batch 600/1540] avg loss 0.00250113, throughput 2.81768K wps
[Epoch 31 Batch 630/1540] avg loss 0.00277091, throughput 2.80837K wps
[Epoch 31 Batch 660/1540] avg loss 0.00235698, throughput 2.88049K wps
[Epoch 31 Batch 690/1540] avg loss 0.00254591, throughput 2.80085K wps
[Epoch 31 Batch 720/1540] avg loss 0.00250336, throughput 2.85967K wps
[Epoch 31 Batch 750/1540] avg loss 0.00243692, throughput 2.88311K wps
[Epoch 31 Batch 780/1540] avg loss 0.00261564, throughput 2.84254K wps
[Epoch 31 Batch 810/1540] avg loss 0.00241309, throughput 2.87043K wps
[Epoch 31 Batch 840/1540] avg loss 0.00248606, throughput 2.8825K wps
[Epoch 31 Batch 870/1540] avg loss 0.0024686, throughput 2.88461K wps
[Epoch 31 Batch 900/1540] avg loss 0.00222189, throughput 2.87109K wps
[Epoch 31 Batch 930/1540] avg loss 0.00234837, throughput 2.84498K wps
[Epoch 31 Batch 960/1540] avg loss 0.00223518, throughput 2.87765K wps
[Epoch 31 Batch 990/1540] avg loss 0.00232127, throughput 2.87271K wps
[Epoch 31 Batch 1020/1540] avg loss 0.00239307, throughput 2.82028K wps
[Epoch 31 Batch 1050/1540] avg loss 0.00217816, throughput 2.80185K wps
[Epoch 31 Batch 1080/1540] avg loss 0.00226759, throughput 2.85077K wps
[Epoch 31 Batch 1110/1540] avg loss 0.00235543, throughput 2.85122K wps
[Epoch 31 Batch 1140/1540] avg loss 0.00260483, throughput 2.81749K wps
[Epoch 31 Batch 1170/1540] avg loss 0.00245269, throughput 2.86976K wps
[Epoch 31 Batch 1200/1540] avg loss 0.00220001, throughput 2.83038K wps
[Epoch 31 Batch 1230/1540] avg loss 0.0026128, throughput 2.86337K wps
[Epoch 31 Batch 1260/1540] avg loss 0.00229015, throughput 2.88265K wps
[Epoch 31 Batch 1290/1540] avg loss 0.00242302, throughput 2.86374K wps
[Epoch 31 Batch 1320/1540] avg loss 0.00209875, throughput 2.8733K wps
[Epoch 31 Batch 1350/1540] avg loss 0.00238127, throughput 2.88107K wps
[Epoch 31 Batch 1380/1540] avg loss 0.00266171, throughput 2.87448K wps
[Epoch 31 Batch 1410/1540] avg loss 0.00246678, throughput 2.87371K wps
[Epoch 31 Batch 1440/1540] avg loss 0.00290741, throughput 2.8734K wps
[Epoch 31 Batch 1470/1540] avg loss 0.00234536, throughput 2.88207K wps
[Epoch 31 Batch 1500/1540] avg loss 0.00267041, throughput 2.88081K wps
[Epoch 31 Batch 1530/1540] avg loss 0.0024056, throughput 2.86819K wps
Begin Testing...
[Epoch 31] train avg loss 0.00235816, dev acc 0.8165, dev avg loss 0.52717, throughput 2.85616K wps
[Epoch 32 Batch 30/1540] avg loss 0.00211436, throughput 2.90277K wps
[Epoch 32 Batch 60/1540] avg loss 0.00203118, throughput 2.87101K wps
[Epoch 32 Batch 90/1540] avg loss 0.00183313, throughput 2.87019K wps
[Epoch 32 Batch 120/1540] avg loss 0.00222616, throughput 2.85972K wps
[Epoch 32 Batch 150/1540] avg loss 0.00187605, throughput 2.87751K wps
[Epoch 32 Batch 180/1540] avg loss 0.00207781, throughput 2.88123K wps
[Epoch 32 Batch 210/1540] avg loss 0.0021645, throughput 2.87135K wps
[Epoch 32 Batch 240/1540] avg loss 0.00224863, throughput 2.87482K wps
[Epoch 32 Batch 270/1540] avg loss 0.00267401, throughput 2.85655K wps
[Epoch 32 Batch 300/1540] avg loss 0.00209379, throughput 2.86763K wps
[Epoch 32 Batch 330/1540] avg loss 0.00217112, throughput 2.83764K wps
[Epoch 32 Batch 360/1540] avg loss 0.00208362, throughput 2.87869K wps
[Epoch 32 Batch 390/1540] avg loss 0.00222926, throughput 2.87161K wps
[Epoch 32 Batch 420/1540] avg loss 0.00215904, throughput 2.87215K wps
[Epoch 32 Batch 450/1540] avg loss 0.00205951, throughput 2.87373K wps
[Epoch 32 Batch 480/1540] avg loss 0.00213944, throughput 2.85716K wps
[Epoch 32 Batch 510/1540] avg loss 0.002194, throughput 2.81045K wps
[Epoch 32 Batch 540/1540] avg loss 0.00200761, throughput 2.85571K wps
[Epoch 32 Batch 570/1540] avg loss 0.00241187, throughput 2.78898K wps
[Epoch 32 Batch 600/1540] avg loss 0.00244464, throughput 2.83995K wps
[Epoch 32 Batch 630/1540] avg loss 0.00209909, throughput 2.86353K wps
[Epoch 32 Batch 660/1540] avg loss 0.00224067, throughput 2.86819K wps
[Epoch 32 Batch 690/1540] avg loss 0.00267477, throughput 2.86914K wps
[Epoch 32 Batch 720/1540] avg loss 0.002766, throughput 2.86665K wps
[Epoch 32 Batch 750/1540] avg loss 0.00190201, throughput 2.84002K wps
[Epoch 32 Batch 780/1540] avg loss 0.00211018, throughput 2.8583K wps
[Epoch 32 Batch 810/1540] avg loss 0.00250544, throughput 2.84365K wps
[Epoch 32 Batch 840/1540] avg loss 0.00231367, throughput 2.7983K wps
[Epoch 32 Batch 870/1540] avg loss 0.00279162, throughput 2.84359K wps
[Epoch 32 Batch 900/1540] avg loss 0.00228412, throughput 2.86716K wps
[Epoch 32 Batch 930/1540] avg loss 0.00245598, throughput 2.86161K wps
[Epoch 32 Batch 960/1540] avg loss 0.00232981, throughput 2.83784K wps
[Epoch 32 Batch 990/1540] avg loss 0.00234548, throughput 2.88126K wps
[Epoch 32 Batch 1020/1540] avg loss 0.00233246, throughput 2.85673K wps
[Epoch 32 Batch 1050/1540] avg loss 0.00195456, throughput 2.87016K wps
[Epoch 32 Batch 1080/1540] avg loss 0.00239443, throughput 2.87208K wps
[Epoch 32 Batch 1110/1540] avg loss 0.00197155, throughput 2.79251K wps
[Epoch 32 Batch 1140/1540] avg loss 0.00240771, throughput 2.88495K wps
[Epoch 32 Batch 1170/1540] avg loss 0.0021934, throughput 2.82483K wps
[Epoch 32 Batch 1200/1540] avg loss 0.00240031, throughput 2.87241K wps
[Epoch 32 Batch 1230/1540] avg loss 0.00225106, throughput 2.82701K wps
[Epoch 32 Batch 1260/1540] avg loss 0.00301525, throughput 2.87448K wps
[Epoch 32 Batch 1290/1540] avg loss 0.00240669, throughput 2.8722K wps
[Epoch 32 Batch 1320/1540] avg loss 0.00229283, throughput 2.87912K wps
[Epoch 32 Batch 1350/1540] avg loss 0.00261169, throughput 2.87254K wps
[Epoch 32 Batch 1380/1540] avg loss 0.00237748, throughput 2.87887K wps
[Epoch 32 Batch 1410/1540] avg loss 0.00237952, throughput 2.88371K wps
[Epoch 32 Batch 1440/1540] avg loss 0.00254333, throughput 2.86681K wps
[Epoch 32 Batch 1470/1540] avg loss 0.00250628, throughput 2.87861K wps
[Epoch 32 Batch 1500/1540] avg loss 0.00251843, throughput 2.87753K wps
[Epoch 32 Batch 1530/1540] avg loss 0.00225214, throughput 2.86891K wps
Begin Testing...
[Epoch 32] train avg loss 0.00229141, dev acc 0.8154, dev avg loss 0.536363, throughput 2.86009K wps
[Epoch 33 Batch 30/1540] avg loss 0.00225799, throughput 2.92338K wps
[Epoch 33 Batch 60/1540] avg loss 0.00214523, throughput 2.83063K wps
[Epoch 33 Batch 90/1540] avg loss 0.00236228, throughput 2.83426K wps
[Epoch 33 Batch 120/1540] avg loss 0.0021248, throughput 2.84179K wps
[Epoch 33 Batch 150/1540] avg loss 0.00187805, throughput 2.85181K wps
[Epoch 33 Batch 180/1540] avg loss 0.00185418, throughput 2.859K wps
[Epoch 33 Batch 210/1540] avg loss 0.00210872, throughput 2.86489K wps
[Epoch 33 Batch 240/1540] avg loss 0.00196198, throughput 2.86395K wps
[Epoch 33 Batch 270/1540] avg loss 0.00232542, throughput 2.84213K wps
[Epoch 33 Batch 300/1540] avg loss 0.00211268, throughput 2.86001K wps
[Epoch 33 Batch 330/1540] avg loss 0.00204724, throughput 2.87371K wps
[Epoch 33 Batch 360/1540] avg loss 0.00203589, throughput 2.87163K wps
[Epoch 33 Batch 390/1540] avg loss 0.00195799, throughput 2.87002K wps
[Epoch 33 Batch 420/1540] avg loss 0.00240106, throughput 2.87638K wps
[Epoch 33 Batch 450/1540] avg loss 0.00233312, throughput 2.87021K wps
[Epoch 33 Batch 480/1540] avg loss 0.00245062, throughput 2.8402K wps
[Epoch 33 Batch 510/1540] avg loss 0.00208068, throughput 2.82314K wps
[Epoch 33 Batch 540/1540] avg loss 0.00211097, throughput 2.85254K wps
[Epoch 33 Batch 570/1540] avg loss 0.00222104, throughput 2.85858K wps
[Epoch 33 Batch 600/1540] avg loss 0.00233959, throughput 2.8616K wps
[Epoch 33 Batch 630/1540] avg loss 0.00219302, throughput 2.83379K wps
[Epoch 33 Batch 660/1540] avg loss 0.00192819, throughput 2.85889K wps
[Epoch 33 Batch 690/1540] avg loss 0.00214907, throughput 2.86604K wps
[Epoch 33 Batch 720/1540] avg loss 0.00253004, throughput 2.87428K wps
[Epoch 33 Batch 750/1540] avg loss 0.00211787, throughput 2.84251K wps
[Epoch 33 Batch 780/1540] avg loss 0.00247597, throughput 2.85586K wps
[Epoch 33 Batch 810/1540] avg loss 0.00234229, throughput 2.83306K wps
[Epoch 33 Batch 840/1540] avg loss 0.00210054, throughput 2.81666K wps
[Epoch 33 Batch 870/1540] avg loss 0.0023982, throughput 2.88144K wps
[Epoch 33 Batch 900/1540] avg loss 0.00250825, throughput 2.88074K wps
[Epoch 33 Batch 930/1540] avg loss 0.00245583, throughput 2.84135K wps
[Epoch 33 Batch 960/1540] avg loss 0.00208338, throughput 2.82925K wps
[Epoch 33 Batch 990/1540] avg loss 0.00245968, throughput 2.86317K wps
[Epoch 33 Batch 1020/1540] avg loss 0.00217815, throughput 2.87821K wps
[Epoch 33 Batch 1050/1540] avg loss 0.0023714, throughput 2.84873K wps
[Epoch 33 Batch 1080/1540] avg loss 0.00217928, throughput 2.8604K wps
[Epoch 33 Batch 1110/1540] avg loss 0.0023243, throughput 2.87394K wps
[Epoch 33 Batch 1140/1540] avg loss 0.00252394, throughput 2.85549K wps
[Epoch 33 Batch 1170/1540] avg loss 0.00227218, throughput 2.86604K wps
[Epoch 33 Batch 1200/1540] avg loss 0.00232203, throughput 2.85986K wps
[Epoch 33 Batch 1230/1540] avg loss 0.00205795, throughput 2.88381K wps
[Epoch 33 Batch 1260/1540] avg loss 0.00225449, throughput 2.88363K wps
[Epoch 33 Batch 1290/1540] avg loss 0.00243244, throughput 2.80266K wps
[Epoch 33 Batch 1320/1540] avg loss 0.00238188, throughput 2.82778K wps
[Epoch 33 Batch 1350/1540] avg loss 0.00240426, throughput 2.81736K wps
[Epoch 33 Batch 1380/1540] avg loss 0.00240073, throughput 2.8224K wps
[Epoch 33 Batch 1410/1540] avg loss 0.00218195, throughput 2.82969K wps
[Epoch 33 Batch 1440/1540] avg loss 0.00224162, throughput 2.87371K wps
[Epoch 33 Batch 1470/1540] avg loss 0.00230394, throughput 2.83565K wps
[Epoch 33 Batch 1500/1540] avg loss 0.00211028, throughput 2.83565K wps
[Epoch 33 Batch 1530/1540] avg loss 0.00268628, throughput 2.8257K wps
Begin Testing...
[Epoch 33] train avg loss 0.00224773, dev acc 0.8096, dev avg loss 0.542604, throughput 2.85297K wps
[Epoch 34 Batch 30/1540] avg loss 0.00185699, throughput 2.88105K wps
[Epoch 34 Batch 60/1540] avg loss 0.00221423, throughput 2.83323K wps
[Epoch 34 Batch 90/1540] avg loss 0.00216446, throughput 2.80278K wps
[Epoch 34 Batch 120/1540] avg loss 0.00224172, throughput 2.8684K wps
[Epoch 34 Batch 150/1540] avg loss 0.00250105, throughput 2.85259K wps
[Epoch 34 Batch 180/1540] avg loss 0.0017713, throughput 2.81965K wps
[Epoch 34 Batch 210/1540] avg loss 0.00223324, throughput 2.84702K wps
[Epoch 34 Batch 240/1540] avg loss 0.00199088, throughput 2.87065K wps
[Epoch 34 Batch 270/1540] avg loss 0.00221241, throughput 2.87288K wps
[Epoch 34 Batch 300/1540] avg loss 0.00180324, throughput 2.81408K wps
[Epoch 34 Batch 330/1540] avg loss 0.00201388, throughput 2.86678K wps
[Epoch 34 Batch 360/1540] avg loss 0.00208103, throughput 2.87282K wps
[Epoch 34 Batch 390/1540] avg loss 0.00172043, throughput 2.82917K wps
[Epoch 34 Batch 420/1540] avg loss 0.0019157, throughput 2.8736K wps
[Epoch 34 Batch 450/1540] avg loss 0.00223167, throughput 2.88166K wps
[Epoch 34 Batch 480/1540] avg loss 0.0022687, throughput 2.86746K wps
[Epoch 34 Batch 510/1540] avg loss 0.00233402, throughput 2.84351K wps
[Epoch 34 Batch 540/1540] avg loss 0.00221038, throughput 2.87219K wps
[Epoch 34 Batch 570/1540] avg loss 0.0020499, throughput 2.87179K wps
[Epoch 34 Batch 600/1540] avg loss 0.00236584, throughput 2.86971K wps
[Epoch 34 Batch 630/1540] avg loss 0.00192739, throughput 2.87805K wps
[Epoch 34 Batch 660/1540] avg loss 0.00194575, throughput 2.88104K wps
[Epoch 34 Batch 690/1540] avg loss 0.00209689, throughput 2.84906K wps
[Epoch 34 Batch 720/1540] avg loss 0.00235259, throughput 2.87877K wps
[Epoch 34 Batch 750/1540] avg loss 0.00210533, throughput 2.80525K wps
[Epoch 34 Batch 780/1540] avg loss 0.00207546, throughput 2.85425K wps
[Epoch 34 Batch 810/1540] avg loss 0.00257213, throughput 2.82433K wps
[Epoch 34 Batch 840/1540] avg loss 0.00240324, throughput 2.841K wps
[Epoch 34 Batch 870/1540] avg loss 0.0021328, throughput 2.8693K wps
[Epoch 34 Batch 900/1540] avg loss 0.00200586, throughput 2.87921K wps
[Epoch 34 Batch 930/1540] avg loss 0.00251672, throughput 2.80258K wps
[Epoch 34 Batch 960/1540] avg loss 0.00215009, throughput 2.85116K wps
[Epoch 34 Batch 990/1540] avg loss 0.00255931, throughput 2.8681K wps
[Epoch 34 Batch 1020/1540] avg loss 0.00232108, throughput 2.8421K wps
[Epoch 34 Batch 1050/1540] avg loss 0.00242086, throughput 2.85429K wps
[Epoch 34 Batch 1080/1540] avg loss 0.00202535, throughput 2.85449K wps
[Epoch 34 Batch 1110/1540] avg loss 0.00198176, throughput 2.85708K wps
[Epoch 34 Batch 1140/1540] avg loss 0.00227741, throughput 2.86429K wps
[Epoch 34 Batch 1170/1540] avg loss 0.00268322, throughput 2.8361K wps
[Epoch 34 Batch 1200/1540] avg loss 0.00257884, throughput 2.86798K wps
[Epoch 34 Batch 1230/1540] avg loss 0.00246881, throughput 2.87453K wps
[Epoch 34 Batch 1260/1540] avg loss 0.00230341, throughput 2.86539K wps
[Epoch 34 Batch 1290/1540] avg loss 0.00227805, throughput 2.86089K wps
[Epoch 34 Batch 1320/1540] avg loss 0.00254792, throughput 2.78563K wps
[Epoch 34 Batch 1350/1540] avg loss 0.00216339, throughput 2.84127K wps
[Epoch 34 Batch 1380/1540] avg loss 0.00214711, throughput 2.87647K wps
[Epoch 34 Batch 1410/1540] avg loss 0.00242091, throughput 2.87934K wps
[Epoch 34 Batch 1440/1540] avg loss 0.00206269, throughput 2.83214K wps
[Epoch 34 Batch 1470/1540] avg loss 0.00244916, throughput 2.77757K wps
[Epoch 34 Batch 1500/1540] avg loss 0.00247765, throughput 2.79383K wps
[Epoch 34 Batch 1530/1540] avg loss 0.00243584, throughput 2.8428K wps
Begin Testing...
[Epoch 34] train avg loss 0.00221815, dev acc 0.8108, dev avg loss 0.538255, throughput 2.8504K wps
[Epoch 35 Batch 30/1540] avg loss 0.00185896, throughput 2.93092K wps
[Epoch 35 Batch 60/1540] avg loss 0.00194331, throughput 2.88066K wps
[Epoch 35 Batch 90/1540] avg loss 0.0018295, throughput 2.88343K wps
[Epoch 35 Batch 120/1540] avg loss 0.00200853, throughput 2.8639K wps
[Epoch 35 Batch 150/1540] avg loss 0.00207023, throughput 2.81308K wps
[Epoch 35 Batch 180/1540] avg loss 0.00245854, throughput 2.83842K wps
[Epoch 35 Batch 210/1540] avg loss 0.00176947, throughput 2.81358K wps
[Epoch 35 Batch 240/1540] avg loss 0.00214897, throughput 2.86969K wps
[Epoch 35 Batch 270/1540] avg loss 0.00180631, throughput 2.88416K wps
[Epoch 35 Batch 300/1540] avg loss 0.00201603, throughput 2.87079K wps
[Epoch 35 Batch 330/1540] avg loss 0.00234687, throughput 2.87166K wps
[Epoch 35 Batch 360/1540] avg loss 0.00213212, throughput 2.85862K wps
[Epoch 35 Batch 390/1540] avg loss 0.00212888, throughput 2.85493K wps
[Epoch 35 Batch 420/1540] avg loss 0.0019776, throughput 2.8532K wps
[Epoch 35 Batch 450/1540] avg loss 0.00215199, throughput 2.85115K wps
[Epoch 35 Batch 480/1540] avg loss 0.00237216, throughput 2.7891K wps
[Epoch 35 Batch 510/1540] avg loss 0.00208151, throughput 2.80287K wps
[Epoch 35 Batch 540/1540] avg loss 0.00231398, throughput 2.87873K wps
[Epoch 35 Batch 570/1540] avg loss 0.00197178, throughput 2.84386K wps
[Epoch 35 Batch 600/1540] avg loss 0.00220246, throughput 2.86811K wps
[Epoch 35 Batch 630/1540] avg loss 0.00192577, throughput 2.82235K wps
[Epoch 35 Batch 660/1540] avg loss 0.00196806, throughput 2.80827K wps
[Epoch 35 Batch 690/1540] avg loss 0.00209082, throughput 2.8324K wps
[Epoch 35 Batch 720/1540] avg loss 0.00198852, throughput 2.80425K wps
[Epoch 35 Batch 750/1540] avg loss 0.00216805, throughput 2.81132K wps
[Epoch 35 Batch 780/1540] avg loss 0.00210384, throughput 2.86782K wps
[Epoch 35 Batch 810/1540] avg loss 0.00246273, throughput 2.87144K wps
[Epoch 35 Batch 840/1540] avg loss 0.00224649, throughput 2.84351K wps
[Epoch 35 Batch 870/1540] avg loss 0.00198038, throughput 2.82386K wps
[Epoch 35 Batch 900/1540] avg loss 0.0020952, throughput 2.81345K wps
[Epoch 35 Batch 930/1540] avg loss 0.00227447, throughput 2.8633K wps
[Epoch 35 Batch 960/1540] avg loss 0.00240877, throughput 2.83807K wps
[Epoch 35 Batch 990/1540] avg loss 0.00233842, throughput 2.85507K wps
[Epoch 35 Batch 1020/1540] avg loss 0.00214066, throughput 2.86261K wps
[Epoch 35 Batch 1050/1540] avg loss 0.00222928, throughput 2.81984K wps
[Epoch 35 Batch 1080/1540] avg loss 0.00200818, throughput 2.82665K wps
[Epoch 35 Batch 1110/1540] avg loss 0.00203176, throughput 2.86025K wps
[Epoch 35 Batch 1140/1540] avg loss 0.00272861, throughput 2.86462K wps
[Epoch 35 Batch 1170/1540] avg loss 0.00232807, throughput 2.86727K wps
[Epoch 35 Batch 1200/1540] avg loss 0.00235257, throughput 2.87302K wps
[Epoch 35 Batch 1230/1540] avg loss 0.00257599, throughput 2.85327K wps
[Epoch 35 Batch 1260/1540] avg loss 0.00246118, throughput 2.8617K wps
[Epoch 35 Batch 1290/1540] avg loss 0.00209027, throughput 2.81844K wps
[Epoch 35 Batch 1320/1540] avg loss 0.00223507, throughput 2.87909K wps
[Epoch 35 Batch 1350/1540] avg loss 0.00210801, throughput 2.88154K wps
[Epoch 35 Batch 1380/1540] avg loss 0.00217735, throughput 2.87081K wps
[Epoch 35 Batch 1410/1540] avg loss 0.0023871, throughput 2.81082K wps
[Epoch 35 Batch 1440/1540] avg loss 0.0021464, throughput 2.84424K wps
[Epoch 35 Batch 1470/1540] avg loss 0.002057, throughput 2.87136K wps
[Epoch 35 Batch 1500/1540] avg loss 0.00200549, throughput 2.85297K wps
[Epoch 35 Batch 1530/1540] avg loss 0.00257826, throughput 2.83305K wps
Begin Testing...
[Epoch 35] train avg loss 0.0021636, dev acc 0.8165, dev avg loss 0.553693, throughput 2.8489K wps
[Epoch 36 Batch 30/1540] avg loss 0.0021115, throughput 2.89839K wps
[Epoch 36 Batch 60/1540] avg loss 0.00207805, throughput 2.80277K wps
[Epoch 36 Batch 90/1540] avg loss 0.00205815, throughput 2.87815K wps
[Epoch 36 Batch 120/1540] avg loss 0.00205485, throughput 2.87807K wps
[Epoch 36 Batch 150/1540] avg loss 0.00184931, throughput 2.8713K wps
[Epoch 36 Batch 180/1540] avg loss 0.00186755, throughput 2.86863K wps
[Epoch 36 Batch 210/1540] avg loss 0.00186693, throughput 2.87464K wps
[Epoch 36 Batch 240/1540] avg loss 0.00177197, throughput 2.87898K wps
[Epoch 36 Batch 270/1540] avg loss 0.00197384, throughput 2.8652K wps
[Epoch 36 Batch 300/1540] avg loss 0.0019993, throughput 2.84924K wps
[Epoch 36 Batch 330/1540] avg loss 0.00199787, throughput 2.85005K wps
[Epoch 36 Batch 360/1540] avg loss 0.00188837, throughput 2.86355K wps
[Epoch 36 Batch 390/1540] avg loss 0.00217756, throughput 2.86712K wps
[Epoch 36 Batch 420/1540] avg loss 0.00212238, throughput 2.87535K wps
[Epoch 36 Batch 450/1540] avg loss 0.00173762, throughput 2.8571K wps
[Epoch 36 Batch 480/1540] avg loss 0.00241791, throughput 2.85129K wps
[Epoch 36 Batch 510/1540] avg loss 0.00211141, throughput 2.82308K wps
[Epoch 36 Batch 540/1540] avg loss 0.00233695, throughput 2.83648K wps
[Epoch 36 Batch 570/1540] avg loss 0.0019144, throughput 2.87751K wps
[Epoch 36 Batch 600/1540] avg loss 0.00202713, throughput 2.83897K wps
[Epoch 36 Batch 630/1540] avg loss 0.00210756, throughput 2.85252K wps
[Epoch 36 Batch 660/1540] avg loss 0.0021456, throughput 2.81444K wps
[Epoch 36 Batch 690/1540] avg loss 0.00225181, throughput 2.8229K wps
[Epoch 36 Batch 720/1540] avg loss 0.00219742, throughput 2.83646K wps
[Epoch 36 Batch 750/1540] avg loss 0.00240111, throughput 2.86988K wps
[Epoch 36 Batch 780/1540] avg loss 0.00221919, throughput 2.86896K wps
[Epoch 36 Batch 810/1540] avg loss 0.00228287, throughput 2.86865K wps
[Epoch 36 Batch 840/1540] avg loss 0.0022033, throughput 2.80233K wps
[Epoch 36 Batch 870/1540] avg loss 0.00204664, throughput 2.79094K wps
[Epoch 36 Batch 900/1540] avg loss 0.0018718, throughput 2.84466K wps
[Epoch 36 Batch 930/1540] avg loss 0.00209306, throughput 2.872K wps
[Epoch 36 Batch 960/1540] avg loss 0.00203598, throughput 2.86635K wps
[Epoch 36 Batch 990/1540] avg loss 0.00242442, throughput 2.86215K wps
[Epoch 36 Batch 1020/1540] avg loss 0.0021988, throughput 2.88412K wps
[Epoch 36 Batch 1050/1540] avg loss 0.00248499, throughput 2.87011K wps
[Epoch 36 Batch 1080/1540] avg loss 0.00222674, throughput 2.87588K wps
[Epoch 36 Batch 1110/1540] avg loss 0.00213861, throughput 2.8748K wps
[Epoch 36 Batch 1140/1540] avg loss 0.00200535, throughput 2.83911K wps
[Epoch 36 Batch 1170/1540] avg loss 0.00220015, throughput 2.87841K wps
[Epoch 36 Batch 1200/1540] avg loss 0.00201907, throughput 2.83236K wps
[Epoch 36 Batch 1230/1540] avg loss 0.0021985, throughput 2.86731K wps
[Epoch 36 Batch 1260/1540] avg loss 0.0020466, throughput 2.8519K wps
[Epoch 36 Batch 1290/1540] avg loss 0.00217537, throughput 2.8769K wps
[Epoch 36 Batch 1320/1540] avg loss 0.00198458, throughput 2.86548K wps
[Epoch 36 Batch 1350/1540] avg loss 0.0025639, throughput 2.84324K wps
[Epoch 36 Batch 1380/1540] avg loss 0.00238164, throughput 2.78711K wps
[Epoch 36 Batch 1410/1540] avg loss 0.00242361, throughput 2.86416K wps
[Epoch 36 Batch 1440/1540] avg loss 0.00219124, throughput 2.85147K wps
[Epoch 36 Batch 1470/1540] avg loss 0.00200508, throughput 2.84818K wps
[Epoch 36 Batch 1500/1540] avg loss 0.00229067, throughput 2.84719K wps
[Epoch 36 Batch 1530/1540] avg loss 0.00223855, throughput 2.87129K wps
Begin Testing...
[Epoch 36] train avg loss 0.00212507, dev acc 0.8096, dev avg loss 0.554618, throughput 2.85464K wps
[Epoch 37 Batch 30/1540] avg loss 0.00210436, throughput 2.93619K wps
[Epoch 37 Batch 60/1540] avg loss 0.00180592, throughput 2.86578K wps
[Epoch 37 Batch 90/1540] avg loss 0.00234734, throughput 2.79721K wps
[Epoch 37 Batch 120/1540] avg loss 0.00184235, throughput 2.87658K wps
[Epoch 37 Batch 150/1540] avg loss 0.00206192, throughput 2.8674K wps
[Epoch 37 Batch 180/1540] avg loss 0.00202121, throughput 2.82734K wps
[Epoch 37 Batch 210/1540] avg loss 0.00187125, throughput 2.7942K wps
[Epoch 37 Batch 240/1540] avg loss 0.00167456, throughput 2.85961K wps
[Epoch 37 Batch 270/1540] avg loss 0.00230817, throughput 2.8587K wps
[Epoch 37 Batch 300/1540] avg loss 0.00196444, throughput 2.8271K wps
[Epoch 37 Batch 330/1540] avg loss 0.00194583, throughput 2.78978K wps
[Epoch 37 Batch 360/1540] avg loss 0.00184367, throughput 2.7902K wps
[Epoch 37 Batch 390/1540] avg loss 0.0022973, throughput 2.83419K wps
[Epoch 37 Batch 420/1540] avg loss 0.00210572, throughput 2.86415K wps
[Epoch 37 Batch 450/1540] avg loss 0.00204075, throughput 2.85247K wps
[Epoch 37 Batch 480/1540] avg loss 0.00254676, throughput 2.88298K wps
[Epoch 37 Batch 510/1540] avg loss 0.00197504, throughput 2.88082K wps
[Epoch 37 Batch 540/1540] avg loss 0.00200175, throughput 2.84706K wps
[Epoch 37 Batch 570/1540] avg loss 0.00201804, throughput 2.85232K wps
[Epoch 37 Batch 600/1540] avg loss 0.00218968, throughput 2.80061K wps
[Epoch 37 Batch 630/1540] avg loss 0.00180424, throughput 2.87844K wps
[Epoch 37 Batch 660/1540] avg loss 0.00219104, throughput 2.8K wps
[Epoch 37 Batch 690/1540] avg loss 0.00228279, throughput 2.85192K wps
[Epoch 37 Batch 720/1540] avg loss 0.00185204, throughput 2.8787K wps
[Epoch 37 Batch 750/1540] avg loss 0.00217934, throughput 2.81664K wps
[Epoch 37 Batch 780/1540] avg loss 0.00217237, throughput 2.83286K wps
[Epoch 37 Batch 810/1540] avg loss 0.00203625, throughput 2.87717K wps
[Epoch 37 Batch 840/1540] avg loss 0.00208638, throughput 2.8852K wps
[Epoch 37 Batch 870/1540] avg loss 0.00218632, throughput 2.87763K wps
[Epoch 37 Batch 900/1540] avg loss 0.00229563, throughput 2.87357K wps
[Epoch 37 Batch 930/1540] avg loss 0.00192038, throughput 2.85562K wps
[Epoch 37 Batch 960/1540] avg loss 0.00198916, throughput 2.87994K wps
[Epoch 37 Batch 990/1540] avg loss 0.00186056, throughput 2.87933K wps
[Epoch 37 Batch 1020/1540] avg loss 0.00229617, throughput 2.87639K wps
[Epoch 37 Batch 1050/1540] avg loss 0.00196085, throughput 2.87697K wps
[Epoch 37 Batch 1080/1540] avg loss 0.00195098, throughput 2.87589K wps
[Epoch 37 Batch 1110/1540] avg loss 0.00233814, throughput 2.87953K wps
[Epoch 37 Batch 1140/1540] avg loss 0.00271168, throughput 2.87449K wps
[Epoch 37 Batch 1170/1540] avg loss 0.00206024, throughput 2.87665K wps
[Epoch 37 Batch 1200/1540] avg loss 0.00206163, throughput 2.87314K wps
[Epoch 37 Batch 1230/1540] avg loss 0.00231264, throughput 2.82295K wps
[Epoch 37 Batch 1260/1540] avg loss 0.00204586, throughput 2.83773K wps
[Epoch 37 Batch 1290/1540] avg loss 0.00216765, throughput 2.8621K wps
[Epoch 37 Batch 1320/1540] avg loss 0.00167285, throughput 2.88343K wps
[Epoch 37 Batch 1350/1540] avg loss 0.00211054, throughput 2.8806K wps
[Epoch 37 Batch 1380/1540] avg loss 0.00216382, throughput 2.86759K wps
[Epoch 37 Batch 1410/1540] avg loss 0.00248898, throughput 2.88103K wps
[Epoch 37 Batch 1440/1540] avg loss 0.00217757, throughput 2.79742K wps
[Epoch 37 Batch 1470/1540] avg loss 0.0021514, throughput 2.86971K wps
[Epoch 37 Batch 1500/1540] avg loss 0.00193607, throughput 2.87656K wps
[Epoch 37 Batch 1530/1540] avg loss 0.00231819, throughput 2.82095K wps
Begin Testing...
[Epoch 37] train avg loss 0.00209352, dev acc 0.8096, dev avg loss 0.565882, throughput 2.85459K wps
[Epoch 38 Batch 30/1540] avg loss 0.00180095, throughput 2.9147K wps
[Epoch 38 Batch 60/1540] avg loss 0.0017208, throughput 2.86075K wps
[Epoch 38 Batch 90/1540] avg loss 0.00201787, throughput 2.85928K wps
[Epoch 38 Batch 120/1540] avg loss 0.00202718, throughput 2.88261K wps
[Epoch 38 Batch 150/1540] avg loss 0.00205406, throughput 2.87638K wps
[Epoch 38 Batch 180/1540] avg loss 0.00204528, throughput 2.85737K wps
[Epoch 38 Batch 210/1540] avg loss 0.0017033, throughput 2.88081K wps
[Epoch 38 Batch 240/1540] avg loss 0.00215977, throughput 2.85978K wps
[Epoch 38 Batch 270/1540] avg loss 0.00183184, throughput 2.83699K wps
[Epoch 38 Batch 300/1540] avg loss 0.00156806, throughput 2.87484K wps
[Epoch 38 Batch 330/1540] avg loss 0.00180864, throughput 2.82375K wps
[Epoch 38 Batch 360/1540] avg loss 0.00199462, throughput 2.87238K wps
[Epoch 38 Batch 390/1540] avg loss 0.00191809, throughput 2.85462K wps
[Epoch 38 Batch 420/1540] avg loss 0.00216524, throughput 2.84924K wps
[Epoch 38 Batch 450/1540] avg loss 0.002003, throughput 2.79958K wps
[Epoch 38 Batch 480/1540] avg loss 0.0025177, throughput 2.82679K wps
[Epoch 38 Batch 510/1540] avg loss 0.00214566, throughput 2.81577K wps
[Epoch 38 Batch 540/1540] avg loss 0.0019765, throughput 2.81903K wps
[Epoch 38 Batch 570/1540] avg loss 0.00174708, throughput 2.86512K wps
[Epoch 38 Batch 600/1540] avg loss 0.0019421, throughput 2.87982K wps
[Epoch 38 Batch 630/1540] avg loss 0.00179642, throughput 2.86988K wps
[Epoch 38 Batch 660/1540] avg loss 0.00216928, throughput 2.87994K wps
[Epoch 38 Batch 690/1540] avg loss 0.00224404, throughput 2.87609K wps
[Epoch 38 Batch 720/1540] avg loss 0.0021105, throughput 2.8659K wps
[Epoch 38 Batch 750/1540] avg loss 0.00182211, throughput 2.84648K wps
[Epoch 38 Batch 780/1540] avg loss 0.00223217, throughput 2.84967K wps
[Epoch 38 Batch 810/1540] avg loss 0.00233699, throughput 2.85761K wps
[Epoch 38 Batch 840/1540] avg loss 0.00202785, throughput 2.88677K wps
[Epoch 38 Batch 870/1540] avg loss 0.00233091, throughput 2.8758K wps
[Epoch 38 Batch 900/1540] avg loss 0.00209934, throughput 2.83856K wps
[Epoch 38 Batch 930/1540] avg loss 0.00186704, throughput 2.81176K wps
[Epoch 38 Batch 960/1540] avg loss 0.0021071, throughput 2.87064K wps
[Epoch 38 Batch 990/1540] avg loss 0.00211989, throughput 2.85586K wps
[Epoch 38 Batch 1020/1540] avg loss 0.00280565, throughput 2.81515K wps
[Epoch 38 Batch 1050/1540] avg loss 0.00215591, throughput 2.85516K wps
[Epoch 38 Batch 1080/1540] avg loss 0.00191293, throughput 2.84864K wps
[Epoch 38 Batch 1110/1540] avg loss 0.00208784, throughput 2.84009K wps
[Epoch 38 Batch 1140/1540] avg loss 0.0019147, throughput 2.83967K wps
[Epoch 38 Batch 1170/1540] avg loss 0.00202231, throughput 2.87934K wps
[Epoch 38 Batch 1200/1540] avg loss 0.00196531, throughput 2.8398K wps
[Epoch 38 Batch 1230/1540] avg loss 0.00231756, throughput 2.8469K wps
[Epoch 38 Batch 1260/1540] avg loss 0.00211065, throughput 2.84669K wps
[Epoch 38 Batch 1290/1540] avg loss 0.00191927, throughput 2.80657K wps
[Epoch 38 Batch 1320/1540] avg loss 0.00209143, throughput 2.87651K wps
[Epoch 38 Batch 1350/1540] avg loss 0.00218222, throughput 2.87075K wps
[Epoch 38 Batch 1380/1540] avg loss 0.00206696, throughput 2.87966K wps
[Epoch 38 Batch 1410/1540] avg loss 0.00201909, throughput 2.87737K wps
[Epoch 38 Batch 1440/1540] avg loss 0.00202192, throughput 2.8337K wps
[Epoch 38 Batch 1470/1540] avg loss 0.00184309, throughput 2.86892K wps
[Epoch 38 Batch 1500/1540] avg loss 0.0023982, throughput 2.84931K wps
[Epoch 38 Batch 1530/1540] avg loss 0.00211437, throughput 2.86319K wps
Begin Testing...
[Epoch 38] train avg loss 0.00204496, dev acc 0.8096, dev avg loss 0.562796, throughput 2.85538K wps
[Epoch 39 Batch 30/1540] avg loss 0.00153012, throughput 2.91749K wps
[Epoch 39 Batch 60/1540] avg loss 0.00195696, throughput 2.88058K wps
[Epoch 39 Batch 90/1540] avg loss 0.00184284, throughput 2.79921K wps
[Epoch 39 Batch 120/1540] avg loss 0.00197958, throughput 2.84031K wps
[Epoch 39 Batch 150/1540] avg loss 0.00164252, throughput 2.87511K wps
[Epoch 39 Batch 180/1540] avg loss 0.00189357, throughput 2.83699K wps
[Epoch 39 Batch 210/1540] avg loss 0.00196673, throughput 2.85114K wps
[Epoch 39 Batch 240/1540] avg loss 0.00165407, throughput 2.85445K wps
[Epoch 39 Batch 270/1540] avg loss 0.00193866, throughput 2.86385K wps
[Epoch 39 Batch 300/1540] avg loss 0.00216221, throughput 2.82249K wps
[Epoch 39 Batch 330/1540] avg loss 0.00211454, throughput 2.8172K wps
[Epoch 39 Batch 360/1540] avg loss 0.00188173, throughput 2.87644K wps
[Epoch 39 Batch 390/1540] avg loss 0.00218001, throughput 2.8755K wps
[Epoch 39 Batch 420/1540] avg loss 0.00167905, throughput 2.8677K wps
[Epoch 39 Batch 450/1540] avg loss 0.00192989, throughput 2.85733K wps
[Epoch 39 Batch 480/1540] avg loss 0.0020026, throughput 2.86614K wps
[Epoch 39 Batch 510/1540] avg loss 0.0021556, throughput 2.85441K wps
[Epoch 39 Batch 540/1540] avg loss 0.00215714, throughput 2.86689K wps
[Epoch 39 Batch 570/1540] avg loss 0.00225613, throughput 2.85375K wps
[Epoch 39 Batch 600/1540] avg loss 0.00186799, throughput 2.82486K wps
[Epoch 39 Batch 630/1540] avg loss 0.00208429, throughput 2.87744K wps
[Epoch 39 Batch 660/1540] avg loss 0.00190992, throughput 2.8771K wps
[Epoch 39 Batch 690/1540] avg loss 0.00176221, throughput 2.88322K wps
[Epoch 39 Batch 720/1540] avg loss 0.0018107, throughput 2.8193K wps
[Epoch 39 Batch 750/1540] avg loss 0.00233997, throughput 2.87724K wps
[Epoch 39 Batch 780/1540] avg loss 0.00196236, throughput 2.8813K wps
[Epoch 39 Batch 810/1540] avg loss 0.00205055, throughput 2.87512K wps
[Epoch 39 Batch 840/1540] avg loss 0.00209519, throughput 2.87575K wps
[Epoch 39 Batch 870/1540] avg loss 0.00214716, throughput 2.87615K wps
[Epoch 39 Batch 900/1540] avg loss 0.00195357, throughput 2.86409K wps
[Epoch 39 Batch 930/1540] avg loss 0.00183115, throughput 2.86873K wps
[Epoch 39 Batch 960/1540] avg loss 0.00191255, throughput 2.8755K wps
[Epoch 39 Batch 990/1540] avg loss 0.00217186, throughput 2.88237K wps
[Epoch 39 Batch 1020/1540] avg loss 0.00219625, throughput 2.87137K wps
[Epoch 39 Batch 1050/1540] avg loss 0.00187426, throughput 2.81665K wps
[Epoch 39 Batch 1080/1540] avg loss 0.00195803, throughput 2.8332K wps
[Epoch 39 Batch 1110/1540] avg loss 0.00181246, throughput 2.83685K wps
[Epoch 39 Batch 1140/1540] avg loss 0.00216798, throughput 2.8402K wps
[Epoch 39 Batch 1170/1540] avg loss 0.00206864, throughput 2.82461K wps
[Epoch 39 Batch 1200/1540] avg loss 0.00226328, throughput 2.84692K wps
[Epoch 39 Batch 1230/1540] avg loss 0.00221326, throughput 2.80599K wps
[Epoch 39 Batch 1260/1540] avg loss 0.00223156, throughput 2.88376K wps
[Epoch 39 Batch 1290/1540] avg loss 0.00189147, throughput 2.88401K wps
[Epoch 39 Batch 1320/1540] avg loss 0.00252209, throughput 2.85678K wps
[Epoch 39 Batch 1350/1540] avg loss 0.00219865, throughput 2.804K wps
[Epoch 39 Batch 1380/1540] avg loss 0.00233366, throughput 2.85097K wps
[Epoch 39 Batch 1410/1540] avg loss 0.00211024, throughput 2.8594K wps
[Epoch 39 Batch 1440/1540] avg loss 0.00191699, throughput 2.88425K wps
[Epoch 39 Batch 1470/1540] avg loss 0.0024491, throughput 2.84465K wps
[Epoch 39 Batch 1500/1540] avg loss 0.00232505, throughput 2.88042K wps
[Epoch 39 Batch 1530/1540] avg loss 0.00194146, throughput 2.84917K wps
Begin Testing...
[Epoch 39] train avg loss 0.00202279, dev acc 0.8131, dev avg loss 0.578561, throughput 2.85695K wps
[Epoch 40 Batch 30/1540] avg loss 0.00161785, throughput 2.93513K wps
[Epoch 40 Batch 60/1540] avg loss 0.00185255, throughput 2.86558K wps
[Epoch 40 Batch 90/1540] avg loss 0.00198759, throughput 2.88039K wps
[Epoch 40 Batch 120/1540] avg loss 0.00210937, throughput 2.87688K wps
[Epoch 40 Batch 150/1540] avg loss 0.00174658, throughput 2.8789K wps
[Epoch 40 Batch 180/1540] avg loss 0.00204225, throughput 2.8578K wps
[Epoch 40 Batch 210/1540] avg loss 0.00174001, throughput 2.82342K wps
[Epoch 40 Batch 240/1540] avg loss 0.00200586, throughput 2.87841K wps
[Epoch 40 Batch 270/1540] avg loss 0.00161794, throughput 2.85678K wps
[Epoch 40 Batch 300/1540] avg loss 0.0016471, throughput 2.85809K wps
[Epoch 40 Batch 330/1540] avg loss 0.0016341, throughput 2.87828K wps
[Epoch 40 Batch 360/1540] avg loss 0.0020939, throughput 2.86677K wps
[Epoch 40 Batch 390/1540] avg loss 0.00186234, throughput 2.8736K wps
[Epoch 40 Batch 420/1540] avg loss 0.00215001, throughput 2.85342K wps
[Epoch 40 Batch 450/1540] avg loss 0.00184841, throughput 2.83403K wps
[Epoch 40 Batch 480/1540] avg loss 0.00179378, throughput 2.88689K wps
[Epoch 40 Batch 510/1540] avg loss 0.00207638, throughput 2.85525K wps
[Epoch 40 Batch 540/1540] avg loss 0.00181646, throughput 2.86999K wps
[Epoch 40 Batch 570/1540] avg loss 0.00201161, throughput 2.87428K wps
[Epoch 40 Batch 600/1540] avg loss 0.00222042, throughput 2.87041K wps
[Epoch 40 Batch 630/1540] avg loss 0.00209101, throughput 2.8543K wps
[Epoch 40 Batch 660/1540] avg loss 0.00209549, throughput 2.82199K wps
[Epoch 40 Batch 690/1540] avg loss 0.00203968, throughput 2.83375K wps
[Epoch 40 Batch 720/1540] avg loss 0.00223001, throughput 2.8769K wps
[Epoch 40 Batch 750/1540] avg loss 0.00206453, throughput 2.87183K wps
[Epoch 40 Batch 780/1540] avg loss 0.00187584, throughput 2.88221K wps
[Epoch 40 Batch 810/1540] avg loss 0.00177128, throughput 2.87872K wps
[Epoch 40 Batch 840/1540] avg loss 0.00228532, throughput 2.84758K wps
[Epoch 40 Batch 870/1540] avg loss 0.00183352, throughput 2.85755K wps
[Epoch 40 Batch 900/1540] avg loss 0.00173027, throughput 2.87702K wps
[Epoch 40 Batch 930/1540] avg loss 0.00205834, throughput 2.87429K wps
[Epoch 40 Batch 960/1540] avg loss 0.0019566, throughput 2.84227K wps
[Epoch 40 Batch 990/1540] avg loss 0.00197673, throughput 2.78424K wps
[Epoch 40 Batch 1020/1540] avg loss 0.0018756, throughput 2.83753K wps
[Epoch 40 Batch 1050/1540] avg loss 0.00193235, throughput 2.86105K wps
[Epoch 40 Batch 1080/1540] avg loss 0.0021129, throughput 2.877K wps
[Epoch 40 Batch 1110/1540] avg loss 0.00222743, throughput 2.87656K wps
[Epoch 40 Batch 1140/1540] avg loss 0.00189794, throughput 2.88059K wps
[Epoch 40 Batch 1170/1540] avg loss 0.00192612, throughput 2.87509K wps
[Epoch 40 Batch 1200/1540] avg loss 0.00243412, throughput 2.86397K wps
[Epoch 40 Batch 1230/1540] avg loss 0.00214396, throughput 2.86037K wps
[Epoch 40 Batch 1260/1540] avg loss 0.00188561, throughput 2.83051K wps
[Epoch 40 Batch 1290/1540] avg loss 0.0020524, throughput 2.85282K wps
[Epoch 40 Batch 1320/1540] avg loss 0.00202463, throughput 2.8324K wps
[Epoch 40 Batch 1350/1540] avg loss 0.00198515, throughput 2.85979K wps
[Epoch 40 Batch 1380/1540] avg loss 0.00202081, throughput 2.88306K wps
[Epoch 40 Batch 1410/1540] avg loss 0.00213992, throughput 2.87167K wps
[Epoch 40 Batch 1440/1540] avg loss 0.00187656, throughput 2.88043K wps
[Epoch 40 Batch 1470/1540] avg loss 0.00188926, throughput 2.8728K wps
[Epoch 40 Batch 1500/1540] avg loss 0.00203265, throughput 2.85918K wps
[Epoch 40 Batch 1530/1540] avg loss 0.00197755, throughput 2.88599K wps
Begin Testing...
[Epoch 40] train avg loss 0.00196974, dev acc 0.8142, dev avg loss 0.57774, throughput 2.86342K wps
[Epoch 41 Batch 30/1540] avg loss 0.00159096, throughput 2.91503K wps
[Epoch 41 Batch 60/1540] avg loss 0.0017278, throughput 2.87797K wps
[Epoch 41 Batch 90/1540] avg loss 0.00153473, throughput 2.86723K wps
[Epoch 41 Batch 120/1540] avg loss 0.00190806, throughput 2.86437K wps
[Epoch 41 Batch 150/1540] avg loss 0.00193701, throughput 2.85801K wps
[Epoch 41 Batch 180/1540] avg loss 0.00153396, throughput 2.85826K wps
[Epoch 41 Batch 210/1540] avg loss 0.00195348, throughput 2.86768K wps
[Epoch 41 Batch 240/1540] avg loss 0.00170882, throughput 2.87695K wps
[Epoch 41 Batch 270/1540] avg loss 0.00212377, throughput 2.88251K wps
[Epoch 41 Batch 300/1540] avg loss 0.00190021, throughput 2.84439K wps
[Epoch 41 Batch 330/1540] avg loss 0.00209046, throughput 2.79188K wps
[Epoch 41 Batch 360/1540] avg loss 0.00188598, throughput 2.84022K wps
[Epoch 41 Batch 390/1540] avg loss 0.0015331, throughput 2.84926K wps
[Epoch 41 Batch 420/1540] avg loss 0.00158417, throughput 2.81069K wps
[Epoch 41 Batch 450/1540] avg loss 0.00204872, throughput 2.80834K wps
[Epoch 41 Batch 480/1540] avg loss 0.00165684, throughput 2.87865K wps
[Epoch 41 Batch 510/1540] avg loss 0.00205232, throughput 2.87426K wps
[Epoch 41 Batch 540/1540] avg loss 0.00212382, throughput 2.87473K wps
[Epoch 41 Batch 570/1540] avg loss 0.00203334, throughput 2.88035K wps
[Epoch 41 Batch 600/1540] avg loss 0.00151008, throughput 2.8703K wps
[Epoch 41 Batch 630/1540] avg loss 0.00196644, throughput 2.87743K wps
[Epoch 41 Batch 660/1540] avg loss 0.00191291, throughput 2.8829K wps
[Epoch 41 Batch 690/1540] avg loss 0.00181601, throughput 2.87624K wps
[Epoch 41 Batch 720/1540] avg loss 0.00174525, throughput 2.84444K wps
[Epoch 41 Batch 750/1540] avg loss 0.00190222, throughput 2.85897K wps
[Epoch 41 Batch 780/1540] avg loss 0.00186207, throughput 2.86562K wps
[Epoch 41 Batch 810/1540] avg loss 0.00223136, throughput 2.8678K wps
[Epoch 41 Batch 840/1540] avg loss 0.00161833, throughput 2.81958K wps
[Epoch 41 Batch 870/1540] avg loss 0.00190202, throughput 2.80535K wps
[Epoch 41 Batch 900/1540] avg loss 0.00182166, throughput 2.81536K wps
[Epoch 41 Batch 930/1540] avg loss 0.00184733, throughput 2.88499K wps
[Epoch 41 Batch 960/1540] avg loss 0.00229674, throughput 2.83816K wps
[Epoch 41 Batch 990/1540] avg loss 0.00196698, throughput 2.79831K wps
[Epoch 41 Batch 1020/1540] avg loss 0.001903, throughput 2.8513K wps
[Epoch 41 Batch 1050/1540] avg loss 0.00209876, throughput 2.88039K wps
[Epoch 41 Batch 1080/1540] avg loss 0.00218946, throughput 2.8412K wps
[Epoch 41 Batch 1110/1540] avg loss 0.0019529, throughput 2.86067K wps
[Epoch 41 Batch 1140/1540] avg loss 0.00218116, throughput 2.86407K wps
[Epoch 41 Batch 1170/1540] avg loss 0.00223796, throughput 2.86837K wps
[Epoch 41 Batch 1200/1540] avg loss 0.00194345, throughput 2.81795K wps
[Epoch 41 Batch 1230/1540] avg loss 0.00248052, throughput 2.85221K wps
[Epoch 41 Batch 1260/1540] avg loss 0.00226129, throughput 2.86653K wps
[Epoch 41 Batch 1290/1540] avg loss 0.00197151, throughput 2.88136K wps
[Epoch 41 Batch 1320/1540] avg loss 0.00182588, throughput 2.88406K wps
[Epoch 41 Batch 1350/1540] avg loss 0.00218226, throughput 2.8724K wps
[Epoch 41 Batch 1380/1540] avg loss 0.00221001, throughput 2.86963K wps
[Epoch 41 Batch 1410/1540] avg loss 0.00202806, throughput 2.88403K wps
[Epoch 41 Batch 1440/1540] avg loss 0.00202143, throughput 2.80294K wps
[Epoch 41 Batch 1470/1540] avg loss 0.00168593, throughput 2.84984K wps
[Epoch 41 Batch 1500/1540] avg loss 0.00174493, throughput 2.87264K wps
[Epoch 41 Batch 1530/1540] avg loss 0.00187284, throughput 2.84898K wps
Begin Testing...
[Epoch 41] train avg loss 0.0019289, dev acc 0.8028, dev avg loss 0.622761, throughput 2.85616K wps
[Epoch 42 Batch 30/1540] avg loss 0.00190722, throughput 2.90303K wps
[Epoch 42 Batch 60/1540] avg loss 0.00193595, throughput 2.88001K wps
[Epoch 42 Batch 90/1540] avg loss 0.00172442, throughput 2.88052K wps
[Epoch 42 Batch 120/1540] avg loss 0.00166201, throughput 2.87698K wps
[Epoch 42 Batch 150/1540] avg loss 0.00157431, throughput 2.82095K wps
[Epoch 42 Batch 180/1540] avg loss 0.00191111, throughput 2.8819K wps
[Epoch 42 Batch 210/1540] avg loss 0.00162648, throughput 2.83551K wps
[Epoch 42 Batch 240/1540] avg loss 0.00199454, throughput 2.80359K wps
[Epoch 42 Batch 270/1540] avg loss 0.00152483, throughput 2.83863K wps
[Epoch 42 Batch 300/1540] avg loss 0.00201804, throughput 2.80255K wps
[Epoch 42 Batch 330/1540] avg loss 0.00172918, throughput 2.84714K wps
[Epoch 42 Batch 360/1540] avg loss 0.00220597, throughput 2.80327K wps
[Epoch 42 Batch 390/1540] avg loss 0.00193881, throughput 2.86271K wps
[Epoch 42 Batch 420/1540] avg loss 0.00189905, throughput 2.87896K wps
[Epoch 42 Batch 450/1540] avg loss 0.00173614, throughput 2.8801K wps
[Epoch 42 Batch 480/1540] avg loss 0.00178563, throughput 2.87524K wps
[Epoch 42 Batch 510/1540] avg loss 0.0018636, throughput 2.7933K wps
[Epoch 42 Batch 540/1540] avg loss 0.00167232, throughput 2.8636K wps
[Epoch 42 Batch 570/1540] avg loss 0.00156458, throughput 2.87752K wps
[Epoch 42 Batch 600/1540] avg loss 0.00184272, throughput 2.8716K wps
[Epoch 42 Batch 630/1540] avg loss 0.00239768, throughput 2.88581K wps
[Epoch 42 Batch 660/1540] avg loss 0.00227655, throughput 2.87782K wps
[Epoch 42 Batch 690/1540] avg loss 0.00215603, throughput 2.85792K wps
[Epoch 42 Batch 720/1540] avg loss 0.00187691, throughput 2.85784K wps
[Epoch 42 Batch 750/1540] avg loss 0.00230324, throughput 2.86878K wps
[Epoch 42 Batch 780/1540] avg loss 0.00185111, throughput 2.87284K wps
[Epoch 42 Batch 810/1540] avg loss 0.00242319, throughput 2.86667K wps
[Epoch 42 Batch 840/1540] avg loss 0.00192204, throughput 2.8778K wps
[Epoch 42 Batch 870/1540] avg loss 0.00194514, throughput 2.87004K wps
[Epoch 42 Batch 900/1540] avg loss 0.00190372, throughput 2.88296K wps
[Epoch 42 Batch 930/1540] avg loss 0.00160831, throughput 2.87305K wps
[Epoch 42 Batch 960/1540] avg loss 0.00186844, throughput 2.87113K wps
[Epoch 42 Batch 990/1540] avg loss 0.00222196, throughput 2.848K wps
[Epoch 42 Batch 1020/1540] avg loss 0.00207424, throughput 2.81987K wps
[Epoch 42 Batch 1050/1540] avg loss 0.00174084, throughput 2.88152K wps
[Epoch 42 Batch 1080/1540] avg loss 0.00181984, throughput 2.87557K wps
[Epoch 42 Batch 1110/1540] avg loss 0.00213637, throughput 2.8874K wps
[Epoch 42 Batch 1140/1540] avg loss 0.00165722, throughput 2.85784K wps
[Epoch 42 Batch 1170/1540] avg loss 0.00188067, throughput 2.87093K wps
[Epoch 42 Batch 1200/1540] avg loss 0.00205945, throughput 2.88238K wps
[Epoch 42 Batch 1230/1540] avg loss 0.00200224, throughput 2.86357K wps
[Epoch 42 Batch 1260/1540] avg loss 0.00181519, throughput 2.87912K wps
[Epoch 42 Batch 1290/1540] avg loss 0.00203238, throughput 2.86838K wps
[Epoch 42 Batch 1320/1540] avg loss 0.00202622, throughput 2.81746K wps
[Epoch 42 Batch 1350/1540] avg loss 0.00168596, throughput 2.8753K wps
[Epoch 42 Batch 1380/1540] avg loss 0.00225139, throughput 2.88652K wps
[Epoch 42 Batch 1410/1540] avg loss 0.00193772, throughput 2.86547K wps
[Epoch 42 Batch 1440/1540] avg loss 0.00201013, throughput 2.8597K wps
[Epoch 42 Batch 1470/1540] avg loss 0.00193114, throughput 2.85347K wps
[Epoch 42 Batch 1500/1540] avg loss 0.00176515, throughput 2.86694K wps
[Epoch 42 Batch 1530/1540] avg loss 0.00201825, throughput 2.82703K wps
Begin Testing...
[Epoch 42] train avg loss 0.00191661, dev acc 0.8142, dev avg loss 0.588445, throughput 2.86122K wps
[Epoch 43 Batch 30/1540] avg loss 0.0016286, throughput 2.88612K wps
[Epoch 43 Batch 60/1540] avg loss 0.00158185, throughput 2.84539K wps
[Epoch 43 Batch 90/1540] avg loss 0.00171192, throughput 2.88685K wps
[Epoch 43 Batch 120/1540] avg loss 0.00191114, throughput 2.88587K wps
[Epoch 43 Batch 150/1540] avg loss 0.00175032, throughput 2.87427K wps
[Epoch 43 Batch 180/1540] avg loss 0.00164616, throughput 2.8836K wps
[Epoch 43 Batch 210/1540] avg loss 0.0017189, throughput 2.87754K wps
[Epoch 43 Batch 240/1540] avg loss 0.00169152, throughput 2.88269K wps
[Epoch 43 Batch 270/1540] avg loss 0.00151935, throughput 2.8665K wps
[Epoch 43 Batch 300/1540] avg loss 0.00185193, throughput 2.797K wps
[Epoch 43 Batch 330/1540] avg loss 0.00197746, throughput 2.83852K wps
[Epoch 43 Batch 360/1540] avg loss 0.00169875, throughput 2.85068K wps
[Epoch 43 Batch 390/1540] avg loss 0.00176788, throughput 2.87417K wps
[Epoch 43 Batch 420/1540] avg loss 0.00172462, throughput 2.88196K wps
[Epoch 43 Batch 450/1540] avg loss 0.00167622, throughput 2.88287K wps
[Epoch 43 Batch 480/1540] avg loss 0.00215706, throughput 2.8806K wps
[Epoch 43 Batch 510/1540] avg loss 0.00158564, throughput 2.88036K wps
[Epoch 43 Batch 540/1540] avg loss 0.00218011, throughput 2.88012K wps
[Epoch 43 Batch 570/1540] avg loss 0.00165376, throughput 2.83097K wps
[Epoch 43 Batch 600/1540] avg loss 0.00174864, throughput 2.88535K wps
[Epoch 43 Batch 630/1540] avg loss 0.0017642, throughput 2.87727K wps
[Epoch 43 Batch 660/1540] avg loss 0.00175718, throughput 2.86833K wps
[Epoch 43 Batch 690/1540] avg loss 0.00189264, throughput 2.88004K wps
[Epoch 43 Batch 720/1540] avg loss 0.00168219, throughput 2.87401K wps
[Epoch 43 Batch 750/1540] avg loss 0.00182581, throughput 2.88294K wps
[Epoch 43 Batch 780/1540] avg loss 0.00214202, throughput 2.86423K wps
[Epoch 43 Batch 810/1540] avg loss 0.00214839, throughput 2.84585K wps
[Epoch 43 Batch 840/1540] avg loss 0.00177299, throughput 2.86342K wps
[Epoch 43 Batch 870/1540] avg loss 0.00169535, throughput 2.87863K wps
[Epoch 43 Batch 900/1540] avg loss 0.00190947, throughput 2.86849K wps
[Epoch 43 Batch 930/1540] avg loss 0.00221284, throughput 2.83783K wps
[Epoch 43 Batch 960/1540] avg loss 0.0017121, throughput 2.8367K wps
[Epoch 43 Batch 990/1540] avg loss 0.00186384, throughput 2.88425K wps
[Epoch 43 Batch 1020/1540] avg loss 0.00185923, throughput 2.8751K wps
[Epoch 43 Batch 1050/1540] avg loss 0.00175324, throughput 2.85775K wps
[Epoch 43 Batch 1080/1540] avg loss 0.00194426, throughput 2.88608K wps
[Epoch 43 Batch 1110/1540] avg loss 0.00220522, throughput 2.82518K wps
[Epoch 43 Batch 1140/1540] avg loss 0.0017786, throughput 2.82424K wps
[Epoch 43 Batch 1170/1540] avg loss 0.00221141, throughput 2.87599K wps
[Epoch 43 Batch 1200/1540] avg loss 0.0018622, throughput 2.87812K wps
[Epoch 43 Batch 1230/1540] avg loss 0.00187319, throughput 2.86987K wps
[Epoch 43 Batch 1260/1540] avg loss 0.00207581, throughput 2.83564K wps
[Epoch 43 Batch 1290/1540] avg loss 0.00208124, throughput 2.83146K wps
[Epoch 43 Batch 1320/1540] avg loss 0.00208163, throughput 2.86232K wps
[Epoch 43 Batch 1350/1540] avg loss 0.0018199, throughput 2.87988K wps
[Epoch 43 Batch 1380/1540] avg loss 0.00171421, throughput 2.87651K wps
[Epoch 43 Batch 1410/1540] avg loss 0.00181201, throughput 2.86419K wps
[Epoch 43 Batch 1440/1540] avg loss 0.00197139, throughput 2.79543K wps
[Epoch 43 Batch 1470/1540] avg loss 0.00193528, throughput 2.88361K wps
[Epoch 43 Batch 1500/1540] avg loss 0.00207711, throughput 2.88105K wps
[Epoch 43 Batch 1530/1540] avg loss 0.00208025, throughput 2.86387K wps
Begin Testing...
[Epoch 43] train avg loss 0.00186166, dev acc 0.7982, dev avg loss 0.619389, throughput 2.86409K wps
[Epoch 44 Batch 30/1540] avg loss 0.00178507, throughput 2.87912K wps
[Epoch 44 Batch 60/1540] avg loss 0.00165861, throughput 2.80237K wps
[Epoch 44 Batch 90/1540] avg loss 0.00197858, throughput 2.81557K wps
[Epoch 44 Batch 120/1540] avg loss 0.00197197, throughput 2.88459K wps
[Epoch 44 Batch 150/1540] avg loss 0.00176752, throughput 2.85957K wps
[Epoch 44 Batch 180/1540] avg loss 0.00158065, throughput 2.82309K wps
[Epoch 44 Batch 210/1540] avg loss 0.00154851, throughput 2.87378K wps
[Epoch 44 Batch 240/1540] avg loss 0.00170776, throughput 2.81527K wps
[Epoch 44 Batch 270/1540] avg loss 0.00179112, throughput 2.82628K wps
[Epoch 44 Batch 300/1540] avg loss 0.00183949, throughput 2.83323K wps
[Epoch 44 Batch 330/1540] avg loss 0.00187602, throughput 2.87004K wps
[Epoch 44 Batch 360/1540] avg loss 0.00206642, throughput 2.88941K wps
[Epoch 44 Batch 390/1540] avg loss 0.00208146, throughput 2.81891K wps
[Epoch 44 Batch 420/1540] avg loss 0.00196363, throughput 2.79994K wps
[Epoch 44 Batch 450/1540] avg loss 0.00202088, throughput 2.88159K wps
[Epoch 44 Batch 480/1540] avg loss 0.00179531, throughput 2.86017K wps
[Epoch 44 Batch 510/1540] avg loss 0.0018409, throughput 2.83784K wps
[Epoch 44 Batch 540/1540] avg loss 0.00165213, throughput 2.86967K wps
[Epoch 44 Batch 570/1540] avg loss 0.00175087, throughput 2.88213K wps
[Epoch 44 Batch 600/1540] avg loss 0.0017629, throughput 2.86913K wps
[Epoch 44 Batch 630/1540] avg loss 0.0016445, throughput 2.84831K wps
[Epoch 44 Batch 660/1540] avg loss 0.00172755, throughput 2.86975K wps
[Epoch 44 Batch 690/1540] avg loss 0.0017686, throughput 2.84785K wps
[Epoch 44 Batch 720/1540] avg loss 0.00185187, throughput 2.7996K wps
[Epoch 44 Batch 750/1540] avg loss 0.00180182, throughput 2.86716K wps
[Epoch 44 Batch 780/1540] avg loss 0.00193594, throughput 2.87887K wps
[Epoch 44 Batch 810/1540] avg loss 0.00178564, throughput 2.87781K wps
[Epoch 44 Batch 840/1540] avg loss 0.0018291, throughput 2.84193K wps
[Epoch 44 Batch 870/1540] avg loss 0.00171159, throughput 2.79293K wps
[Epoch 44 Batch 900/1540] avg loss 0.0018483, throughput 2.8684K wps
[Epoch 44 Batch 930/1540] avg loss 0.00211075, throughput 2.8696K wps
[Epoch 44 Batch 960/1540] avg loss 0.00188132, throughput 2.86379K wps
[Epoch 44 Batch 990/1540] avg loss 0.00183744, throughput 2.88191K wps
[Epoch 44 Batch 1020/1540] avg loss 0.00161534, throughput 2.87052K wps
[Epoch 44 Batch 1050/1540] avg loss 0.00171603, throughput 2.83464K wps
[Epoch 44 Batch 1080/1540] avg loss 0.00190541, throughput 2.83753K wps
[Epoch 44 Batch 1110/1540] avg loss 0.00173751, throughput 2.87264K wps
[Epoch 44 Batch 1140/1540] avg loss 0.00191962, throughput 2.87443K wps
[Epoch 44 Batch 1170/1540] avg loss 0.00164151, throughput 2.87377K wps
[Epoch 44 Batch 1200/1540] avg loss 0.00225167, throughput 2.87922K wps
[Epoch 44 Batch 1230/1540] avg loss 0.00192108, throughput 2.8674K wps
[Epoch 44 Batch 1260/1540] avg loss 0.00165659, throughput 2.87201K wps
[Epoch 44 Batch 1290/1540] avg loss 0.00184463, throughput 2.83537K wps
[Epoch 44 Batch 1320/1540] avg loss 0.00159044, throughput 2.86568K wps
[Epoch 44 Batch 1350/1540] avg loss 0.00177793, throughput 2.88235K wps
[Epoch 44 Batch 1380/1540] avg loss 0.00191169, throughput 2.87852K wps
[Epoch 44 Batch 1410/1540] avg loss 0.00194151, throughput 2.83297K wps
[Epoch 44 Batch 1440/1540] avg loss 0.00214917, throughput 2.8545K wps
[Epoch 44 Batch 1470/1540] avg loss 0.00181885, throughput 2.84949K wps
[Epoch 44 Batch 1500/1540] avg loss 0.00160105, throughput 2.82361K wps
[Epoch 44 Batch 1530/1540] avg loss 0.001903, throughput 2.84041K wps
Begin Testing...
[Epoch 44] train avg loss 0.00182497, dev acc 0.8096, dev avg loss 0.613337, throughput 2.85346K wps
[Epoch 45 Batch 30/1540] avg loss 0.00171434, throughput 2.90244K wps
[Epoch 45 Batch 60/1540] avg loss 0.00159487, throughput 2.8753K wps
[Epoch 45 Batch 90/1540] avg loss 0.001776, throughput 2.82457K wps
[Epoch 45 Batch 120/1540] avg loss 0.0017065, throughput 2.88304K wps
[Epoch 45 Batch 150/1540] avg loss 0.00174585, throughput 2.87514K wps
[Epoch 45 Batch 180/1540] avg loss 0.0016277, throughput 2.86141K wps
[Epoch 45 Batch 210/1540] avg loss 0.00173884, throughput 2.87678K wps
[Epoch 45 Batch 240/1540] avg loss 0.00155326, throughput 2.85033K wps
[Epoch 45 Batch 270/1540] avg loss 0.00149791, throughput 2.88213K wps
[Epoch 45 Batch 300/1540] avg loss 0.00160162, throughput 2.85586K wps
[Epoch 45 Batch 330/1540] avg loss 0.0018944, throughput 2.86662K wps
[Epoch 45 Batch 360/1540] avg loss 0.00169698, throughput 2.87056K wps
[Epoch 45 Batch 390/1540] avg loss 0.00169082, throughput 2.8544K wps
[Epoch 45 Batch 420/1540] avg loss 0.00156559, throughput 2.80012K wps
[Epoch 45 Batch 450/1540] avg loss 0.00159113, throughput 2.8777K wps
[Epoch 45 Batch 480/1540] avg loss 0.00184089, throughput 2.85231K wps
[Epoch 45 Batch 510/1540] avg loss 0.00209554, throughput 2.87428K wps
[Epoch 45 Batch 540/1540] avg loss 0.00182985, throughput 2.88761K wps
[Epoch 45 Batch 570/1540] avg loss 0.00156126, throughput 2.8832K wps
[Epoch 45 Batch 600/1540] avg loss 0.00153607, throughput 2.84943K wps
[Epoch 45 Batch 630/1540] avg loss 0.0019637, throughput 2.88435K wps
[Epoch 45 Batch 660/1540] avg loss 0.00178111, throughput 2.87907K wps
[Epoch 45 Batch 690/1540] avg loss 0.00180482, throughput 2.83217K wps
[Epoch 45 Batch 720/1540] avg loss 0.00188739, throughput 2.86336K wps
[Epoch 45 Batch 750/1540] avg loss 0.00197911, throughput 2.86382K wps
[Epoch 45 Batch 780/1540] avg loss 0.00189525, throughput 2.87392K wps
[Epoch 45 Batch 810/1540] avg loss 0.00178324, throughput 2.88339K wps
[Epoch 45 Batch 840/1540] avg loss 0.0020528, throughput 2.88345K wps
[Epoch 45 Batch 870/1540] avg loss 0.00215353, throughput 2.87997K wps
[Epoch 45 Batch 900/1540] avg loss 0.00167501, throughput 2.80657K wps
[Epoch 45 Batch 930/1540] avg loss 0.00161396, throughput 2.84381K wps
[Epoch 45 Batch 960/1540] avg loss 0.00194591, throughput 2.86771K wps
[Epoch 45 Batch 990/1540] avg loss 0.00210268, throughput 2.86975K wps
[Epoch 45 Batch 1020/1540] avg loss 0.0020844, throughput 2.83211K wps
[Epoch 45 Batch 1050/1540] avg loss 0.00182598, throughput 2.85628K wps
[Epoch 45 Batch 1080/1540] avg loss 0.00177023, throughput 2.82975K wps
[Epoch 45 Batch 1110/1540] avg loss 0.00176237, throughput 2.85869K wps
[Epoch 45 Batch 1140/1540] avg loss 0.0020235, throughput 2.8277K wps
[Epoch 45 Batch 1170/1540] avg loss 0.00180428, throughput 2.87863K wps
[Epoch 45 Batch 1200/1540] avg loss 0.00189951, throughput 2.87642K wps
[Epoch 45 Batch 1230/1540] avg loss 0.00172664, throughput 2.82487K wps
[Epoch 45 Batch 1260/1540] avg loss 0.00221405, throughput 2.85798K wps
[Epoch 45 Batch 1290/1540] avg loss 0.00171446, throughput 2.88294K wps
[Epoch 45 Batch 1320/1540] avg loss 0.00201468, throughput 2.81486K wps
[Epoch 45 Batch 1350/1540] avg loss 0.00198801, throughput 2.80496K wps
[Epoch 45 Batch 1380/1540] avg loss 0.00168462, throughput 2.85308K wps
[Epoch 45 Batch 1410/1540] avg loss 0.00203671, throughput 2.84285K wps
[Epoch 45 Batch 1440/1540] avg loss 0.00184934, throughput 2.84346K wps
[Epoch 45 Batch 1470/1540] avg loss 0.00167197, throughput 2.86366K wps
[Epoch 45 Batch 1500/1540] avg loss 0.00166206, throughput 2.88433K wps
[Epoch 45 Batch 1530/1540] avg loss 0.00188029, throughput 2.85535K wps
Begin Testing...
[Epoch 45] train avg loss 0.00181071, dev acc 0.7959, dev avg loss 0.625502, throughput 2.85888K wps
[Epoch 46 Batch 30/1540] avg loss 0.00138348, throughput 2.93907K wps
[Epoch 46 Batch 60/1540] avg loss 0.00178699, throughput 2.8808K wps
[Epoch 46 Batch 90/1540] avg loss 0.00154693, throughput 2.86191K wps
[Epoch 46 Batch 120/1540] avg loss 0.00182151, throughput 2.82615K wps
[Epoch 46 Batch 150/1540] avg loss 0.00170104, throughput 2.84934K wps
[Epoch 46 Batch 180/1540] avg loss 0.00169737, throughput 2.87323K wps
[Epoch 46 Batch 210/1540] avg loss 0.00148025, throughput 2.88292K wps
[Epoch 46 Batch 240/1540] avg loss 0.00169853, throughput 2.88375K wps
[Epoch 46 Batch 270/1540] avg loss 0.00149021, throughput 2.88025K wps
[Epoch 46 Batch 300/1540] avg loss 0.00160924, throughput 2.85698K wps
[Epoch 46 Batch 330/1540] avg loss 0.00181857, throughput 2.81839K wps
[Epoch 46 Batch 360/1540] avg loss 0.00192724, throughput 2.8558K wps
[Epoch 46 Batch 390/1540] avg loss 0.00171371, throughput 2.813K wps
[Epoch 46 Batch 420/1540] avg loss 0.00213249, throughput 2.87733K wps
[Epoch 46 Batch 450/1540] avg loss 0.00167015, throughput 2.88302K wps
[Epoch 46 Batch 480/1540] avg loss 0.00197623, throughput 2.88455K wps
[Epoch 46 Batch 510/1540] avg loss 0.00182287, throughput 2.88781K wps
[Epoch 46 Batch 540/1540] avg loss 0.00180968, throughput 2.85806K wps
[Epoch 46 Batch 570/1540] avg loss 0.00194523, throughput 2.82132K wps
[Epoch 46 Batch 600/1540] avg loss 0.00175154, throughput 2.81794K wps
[Epoch 46 Batch 630/1540] avg loss 0.0017486, throughput 2.8144K wps
[Epoch 46 Batch 660/1540] avg loss 0.0017276, throughput 2.83956K wps
[Epoch 46 Batch 690/1540] avg loss 0.00171071, throughput 2.87707K wps
[Epoch 46 Batch 720/1540] avg loss 0.00196381, throughput 2.85515K wps
[Epoch 46 Batch 750/1540] avg loss 0.00167457, throughput 2.87818K wps
[Epoch 46 Batch 780/1540] avg loss 0.00184698, throughput 2.87997K wps
[Epoch 46 Batch 810/1540] avg loss 0.0016432, throughput 2.88377K wps
[Epoch 46 Batch 840/1540] avg loss 0.00194091, throughput 2.85161K wps
[Epoch 46 Batch 870/1540] avg loss 0.00156575, throughput 2.80319K wps
[Epoch 46 Batch 900/1540] avg loss 0.00143551, throughput 2.84065K wps
[Epoch 46 Batch 930/1540] avg loss 0.00174613, throughput 2.88765K wps
[Epoch 46 Batch 960/1540] avg loss 0.00181177, throughput 2.86624K wps
[Epoch 46 Batch 990/1540] avg loss 0.00193004, throughput 2.79921K wps
[Epoch 46 Batch 1020/1540] avg loss 0.00214417, throughput 2.86146K wps
[Epoch 46 Batch 1050/1540] avg loss 0.00157426, throughput 2.84438K wps
[Epoch 46 Batch 1080/1540] avg loss 0.00198349, throughput 2.8589K wps
[Epoch 46 Batch 1110/1540] avg loss 0.00173747, throughput 2.83648K wps
[Epoch 46 Batch 1140/1540] avg loss 0.00160478, throughput 2.88412K wps
[Epoch 46 Batch 1170/1540] avg loss 0.00194609, throughput 2.87455K wps
[Epoch 46 Batch 1200/1540] avg loss 0.00171604, throughput 2.87752K wps
[Epoch 46 Batch 1230/1540] avg loss 0.00183921, throughput 2.86419K wps
[Epoch 46 Batch 1260/1540] avg loss 0.0018187, throughput 2.85657K wps
[Epoch 46 Batch 1290/1540] avg loss 0.00193736, throughput 2.87741K wps
[Epoch 46 Batch 1320/1540] avg loss 0.00193955, throughput 2.82687K wps
[Epoch 46 Batch 1350/1540] avg loss 0.00154859, throughput 2.88471K wps
[Epoch 46 Batch 1380/1540] avg loss 0.00148342, throughput 2.87997K wps
[Epoch 46 Batch 1410/1540] avg loss 0.00167957, throughput 2.87053K wps
[Epoch 46 Batch 1440/1540] avg loss 0.0018172, throughput 2.8633K wps
[Epoch 46 Batch 1470/1540] avg loss 0.00219718, throughput 2.80442K wps
[Epoch 46 Batch 1500/1540] avg loss 0.00169882, throughput 2.79634K wps
[Epoch 46 Batch 1530/1540] avg loss 0.00248202, throughput 2.80379K wps
Begin Testing...
[Epoch 46] train avg loss 0.00177807, dev acc 0.8085, dev avg loss 0.615314, throughput 2.85599K wps
[Epoch 47 Batch 30/1540] avg loss 0.00157056, throughput 2.93802K wps
[Epoch 47 Batch 60/1540] avg loss 0.00136046, throughput 2.88167K wps
[Epoch 47 Batch 90/1540] avg loss 0.0017547, throughput 2.87118K wps
[Epoch 47 Batch 120/1540] avg loss 0.00147473, throughput 2.83433K wps
[Epoch 47 Batch 150/1540] avg loss 0.00150423, throughput 2.81132K wps
[Epoch 47 Batch 180/1540] avg loss 0.00162904, throughput 2.84428K wps
[Epoch 47 Batch 210/1540] avg loss 0.00172555, throughput 2.80706K wps
[Epoch 47 Batch 240/1540] avg loss 0.00178632, throughput 2.85658K wps
[Epoch 47 Batch 270/1540] avg loss 0.00176977, throughput 2.83037K wps
[Epoch 47 Batch 300/1540] avg loss 0.0015759, throughput 2.87857K wps
[Epoch 47 Batch 330/1540] avg loss 0.0019696, throughput 2.87434K wps
[Epoch 47 Batch 360/1540] avg loss 0.00171098, throughput 2.85555K wps
[Epoch 47 Batch 390/1540] avg loss 0.00193088, throughput 2.88024K wps
[Epoch 47 Batch 420/1540] avg loss 0.00167592, throughput 2.87563K wps
[Epoch 47 Batch 450/1540] avg loss 0.00146858, throughput 2.87789K wps
[Epoch 47 Batch 480/1540] avg loss 0.00191875, throughput 2.83376K wps
[Epoch 47 Batch 510/1540] avg loss 0.00182365, throughput 2.87382K wps
[Epoch 47 Batch 540/1540] avg loss 0.00201161, throughput 2.87046K wps
[Epoch 47 Batch 570/1540] avg loss 0.00144965, throughput 2.87868K wps
[Epoch 47 Batch 600/1540] avg loss 0.00165746, throughput 2.877K wps
[Epoch 47 Batch 630/1540] avg loss 0.00209909, throughput 2.82639K wps
[Epoch 47 Batch 660/1540] avg loss 0.00184565, throughput 2.88203K wps
[Epoch 47 Batch 690/1540] avg loss 0.00172305, throughput 2.88365K wps
[Epoch 47 Batch 720/1540] avg loss 0.00207514, throughput 2.87607K wps
[Epoch 47 Batch 750/1540] avg loss 0.00178216, throughput 2.83608K wps
[Epoch 47 Batch 780/1540] avg loss 0.00167015, throughput 2.79372K wps
[Epoch 47 Batch 810/1540] avg loss 0.00167317, throughput 2.83739K wps
[Epoch 47 Batch 840/1540] avg loss 0.00141187, throughput 2.87453K wps
[Epoch 47 Batch 870/1540] avg loss 0.00171077, throughput 2.86279K wps
[Epoch 47 Batch 900/1540] avg loss 0.00205578, throughput 2.8737K wps
[Epoch 47 Batch 930/1540] avg loss 0.00176794, throughput 2.86183K wps
[Epoch 47 Batch 960/1540] avg loss 0.00166052, throughput 2.87742K wps
[Epoch 47 Batch 990/1540] avg loss 0.0020022, throughput 2.8589K wps
[Epoch 47 Batch 1020/1540] avg loss 0.00142442, throughput 2.86572K wps
[Epoch 47 Batch 1050/1540] avg loss 0.00219105, throughput 2.82347K wps
[Epoch 47 Batch 1080/1540] avg loss 0.00168032, throughput 2.807K wps
[Epoch 47 Batch 1110/1540] avg loss 0.00149384, throughput 2.88438K wps
[Epoch 47 Batch 1140/1540] avg loss 0.00213952, throughput 2.88474K wps
[Epoch 47 Batch 1170/1540] avg loss 0.00169201, throughput 2.8475K wps
[Epoch 47 Batch 1200/1540] avg loss 0.00157078, throughput 2.8508K wps
[Epoch 47 Batch 1230/1540] avg loss 0.00192521, throughput 2.86745K wps
[Epoch 47 Batch 1260/1540] avg loss 0.00186675, throughput 2.87239K wps
[Epoch 47 Batch 1290/1540] avg loss 0.001902, throughput 2.88354K wps
[Epoch 47 Batch 1320/1540] avg loss 0.00158413, throughput 2.87918K wps
[Epoch 47 Batch 1350/1540] avg loss 0.0020471, throughput 2.84578K wps
[Epoch 47 Batch 1380/1540] avg loss 0.00188353, throughput 2.88848K wps
[Epoch 47 Batch 1410/1540] avg loss 0.0021026, throughput 2.87713K wps
[Epoch 47 Batch 1440/1540] avg loss 0.00197176, throughput 2.88129K wps
[Epoch 47 Batch 1470/1540] avg loss 0.00184436, throughput 2.87658K wps
[Epoch 47 Batch 1500/1540] avg loss 0.00172456, throughput 2.8775K wps
[Epoch 47 Batch 1530/1540] avg loss 0.00179076, throughput 2.88328K wps
Begin Testing...
[Epoch 47] train avg loss 0.00176988, dev acc 0.7936, dev avg loss 0.673759, throughput 2.86245K wps
[Epoch 48 Batch 30/1540] avg loss 0.00175607, throughput 2.86271K wps
[Epoch 48 Batch 60/1540] avg loss 0.00145654, throughput 2.87054K wps
[Epoch 48 Batch 90/1540] avg loss 0.0014888, throughput 2.88619K wps
[Epoch 48 Batch 120/1540] avg loss 0.00171718, throughput 2.88511K wps
[Epoch 48 Batch 150/1540] avg loss 0.00170884, throughput 2.87456K wps
[Epoch 48 Batch 180/1540] avg loss 0.0014389, throughput 2.81489K wps
[Epoch 48 Batch 210/1540] avg loss 0.00199017, throughput 2.8633K wps
[Epoch 48 Batch 240/1540] avg loss 0.00148378, throughput 2.84141K wps
[Epoch 48 Batch 270/1540] avg loss 0.00138904, throughput 2.84404K wps
[Epoch 48 Batch 300/1540] avg loss 0.00148846, throughput 2.79687K wps
[Epoch 48 Batch 330/1540] avg loss 0.00162118, throughput 2.88334K wps
[Epoch 48 Batch 360/1540] avg loss 0.00167091, throughput 2.88031K wps
[Epoch 48 Batch 390/1540] avg loss 0.00153836, throughput 2.88353K wps
[Epoch 48 Batch 420/1540] avg loss 0.00143608, throughput 2.87374K wps
[Epoch 48 Batch 450/1540] avg loss 0.00154231, throughput 2.81888K wps
[Epoch 48 Batch 480/1540] avg loss 0.00156606, throughput 2.81708K wps
[Epoch 48 Batch 510/1540] avg loss 0.0019006, throughput 2.882K wps
[Epoch 48 Batch 540/1540] avg loss 0.00167015, throughput 2.88186K wps
[Epoch 48 Batch 570/1540] avg loss 0.00151218, throughput 2.86549K wps
[Epoch 48 Batch 600/1540] avg loss 0.00187291, throughput 2.8103K wps
[Epoch 48 Batch 630/1540] avg loss 0.00165066, throughput 2.87911K wps
[Epoch 48 Batch 660/1540] avg loss 0.0016774, throughput 2.80148K wps
[Epoch 48 Batch 690/1540] avg loss 0.00170924, throughput 2.85539K wps
[Epoch 48 Batch 720/1540] avg loss 0.00190137, throughput 2.85001K wps
[Epoch 48 Batch 750/1540] avg loss 0.00146695, throughput 2.88182K wps
[Epoch 48 Batch 780/1540] avg loss 0.0015782, throughput 2.86212K wps
[Epoch 48 Batch 810/1540] avg loss 0.00174739, throughput 2.8276K wps
[Epoch 48 Batch 840/1540] avg loss 0.00198502, throughput 2.79085K wps
[Epoch 48 Batch 870/1540] avg loss 0.00193054, throughput 2.86033K wps
[Epoch 48 Batch 900/1540] avg loss 0.00167951, throughput 2.88211K wps
[Epoch 48 Batch 930/1540] avg loss 0.00193494, throughput 2.88307K wps
[Epoch 48 Batch 960/1540] avg loss 0.00187257, throughput 2.87711K wps
[Epoch 48 Batch 990/1540] avg loss 0.00160432, throughput 2.88114K wps
[Epoch 48 Batch 1020/1540] avg loss 0.00200682, throughput 2.87867K wps
[Epoch 48 Batch 1050/1540] avg loss 0.00189173, throughput 2.88229K wps
[Epoch 48 Batch 1080/1540] avg loss 0.00169249, throughput 2.87951K wps
[Epoch 48 Batch 1110/1540] avg loss 0.00191599, throughput 2.85751K wps
[Epoch 48 Batch 1140/1540] avg loss 0.00188669, throughput 2.86883K wps
[Epoch 48 Batch 1170/1540] avg loss 0.00160894, throughput 2.84068K wps
[Epoch 48 Batch 1200/1540] avg loss 0.00175084, throughput 2.83614K wps
[Epoch 48 Batch 1230/1540] avg loss 0.00190402, throughput 2.82925K wps
[Epoch 48 Batch 1260/1540] avg loss 0.00199674, throughput 2.8842K wps
[Epoch 48 Batch 1290/1540] avg loss 0.00192296, throughput 2.83489K wps
[Epoch 48 Batch 1320/1540] avg loss 0.00188224, throughput 2.80184K wps
[Epoch 48 Batch 1350/1540] avg loss 0.00192133, throughput 2.84411K wps
[Epoch 48 Batch 1380/1540] avg loss 0.00187858, throughput 2.88464K wps
[Epoch 48 Batch 1410/1540] avg loss 0.00159738, throughput 2.88368K wps
[Epoch 48 Batch 1440/1540] avg loss 0.00193976, throughput 2.86487K wps
[Epoch 48 Batch 1470/1540] avg loss 0.00177031, throughput 2.87801K wps
[Epoch 48 Batch 1500/1540] avg loss 0.00198447, throughput 2.87705K wps
[Epoch 48 Batch 1530/1540] avg loss 0.00192413, throughput 2.8452K wps
Begin Testing...
[Epoch 48] train avg loss 0.00173643, dev acc 0.8062, dev avg loss 0.618443, throughput 2.85697K wps
[Epoch 49 Batch 30/1540] avg loss 0.0016943, throughput 2.93468K wps
[Epoch 49 Batch 60/1540] avg loss 0.00124838, throughput 2.8716K wps
[Epoch 49 Batch 90/1540] avg loss 0.00136113, throughput 2.83249K wps
[Epoch 49 Batch 120/1540] avg loss 0.00145599, throughput 2.85676K wps
[Epoch 49 Batch 150/1540] avg loss 0.00171545, throughput 2.871K wps
[Epoch 49 Batch 180/1540] avg loss 0.00164624, throughput 2.86478K wps
[Epoch 49 Batch 210/1540] avg loss 0.00156309, throughput 2.8727K wps
[Epoch 49 Batch 240/1540] avg loss 0.00161133, throughput 2.84047K wps
[Epoch 49 Batch 270/1540] avg loss 0.00178276, throughput 2.86026K wps
[Epoch 49 Batch 300/1540] avg loss 0.00160855, throughput 2.82949K wps
[Epoch 49 Batch 330/1540] avg loss 0.00157482, throughput 2.87458K wps
[Epoch 49 Batch 360/1540] avg loss 0.00164562, throughput 2.86538K wps
[Epoch 49 Batch 390/1540] avg loss 0.00181035, throughput 2.87588K wps
[Epoch 49 Batch 420/1540] avg loss 0.00188506, throughput 2.85297K wps
[Epoch 49 Batch 450/1540] avg loss 0.00162336, throughput 2.84857K wps
[Epoch 49 Batch 480/1540] avg loss 0.00164353, throughput 2.87921K wps
[Epoch 49 Batch 510/1540] avg loss 0.00151523, throughput 2.83718K wps
[Epoch 49 Batch 540/1540] avg loss 0.00161554, throughput 2.8629K wps
[Epoch 49 Batch 570/1540] avg loss 0.00170464, throughput 2.81153K wps
[Epoch 49 Batch 600/1540] avg loss 0.00208358, throughput 2.87018K wps
[Epoch 49 Batch 630/1540] avg loss 0.00157615, throughput 2.86624K wps
[Epoch 49 Batch 660/1540] avg loss 0.00174352, throughput 2.87056K wps
[Epoch 49 Batch 690/1540] avg loss 0.00157808, throughput 2.81648K wps
[Epoch 49 Batch 720/1540] avg loss 0.00194906, throughput 2.84786K wps
[Epoch 49 Batch 750/1540] avg loss 0.00168906, throughput 2.88708K wps
[Epoch 49 Batch 780/1540] avg loss 0.00190747, throughput 2.87445K wps
[Epoch 49 Batch 810/1540] avg loss 0.00186282, throughput 2.82714K wps
[Epoch 49 Batch 840/1540] avg loss 0.00171289, throughput 2.85518K wps
[Epoch 49 Batch 870/1540] avg loss 0.00154115, throughput 2.88588K wps
[Epoch 49 Batch 900/1540] avg loss 0.0016406, throughput 2.8219K wps
[Epoch 49 Batch 930/1540] avg loss 0.00177296, throughput 2.83533K wps
[Epoch 49 Batch 960/1540] avg loss 0.00175854, throughput 2.84656K wps
[Epoch 49 Batch 990/1540] avg loss 0.00159266, throughput 2.86158K wps
[Epoch 49 Batch 1020/1540] avg loss 0.00143534, throughput 2.88603K wps
[Epoch 49 Batch 1050/1540] avg loss 0.0017494, throughput 2.81286K wps
[Epoch 49 Batch 1080/1540] avg loss 0.00154035, throughput 2.88389K wps
[Epoch 49 Batch 1110/1540] avg loss 0.00162611, throughput 2.86666K wps
[Epoch 49 Batch 1140/1540] avg loss 0.00165233, throughput 2.83998K wps
[Epoch 49 Batch 1170/1540] avg loss 0.00197636, throughput 2.88421K wps
[Epoch 49 Batch 1200/1540] avg loss 0.00181606, throughput 2.85449K wps
[Epoch 49 Batch 1230/1540] avg loss 0.00190196, throughput 2.88059K wps
[Epoch 49 Batch 1260/1540] avg loss 0.00162723, throughput 2.88538K wps
[Epoch 49 Batch 1290/1540] avg loss 0.00176593, throughput 2.88781K wps
[Epoch 49 Batch 1320/1540] avg loss 0.00225487, throughput 2.86241K wps
[Epoch 49 Batch 1350/1540] avg loss 0.00206513, throughput 2.87755K wps
[Epoch 49 Batch 1380/1540] avg loss 0.00172451, throughput 2.8654K wps
[Epoch 49 Batch 1410/1540] avg loss 0.00177676, throughput 2.81384K wps
[Epoch 49 Batch 1440/1540] avg loss 0.00181851, throughput 2.85457K wps
[Epoch 49 Batch 1470/1540] avg loss 0.001651, throughput 2.88176K wps
[Epoch 49 Batch 1500/1540] avg loss 0.00184771, throughput 2.88309K wps
[Epoch 49 Batch 1530/1540] avg loss 0.00181548, throughput 2.87947K wps
Begin Testing...
[Epoch 49] train avg loss 0.00170663, dev acc 0.8028, dev avg loss 0.616752, throughput 2.86044K wps
[Epoch 50 Batch 30/1540] avg loss 0.00163128, throughput 2.89354K wps
[Epoch 50 Batch 60/1540] avg loss 0.00169885, throughput 2.81463K wps
[Epoch 50 Batch 90/1540] avg loss 0.00187768, throughput 2.86848K wps
[Epoch 50 Batch 120/1540] avg loss 0.00128065, throughput 2.87812K wps
[Epoch 50 Batch 150/1540] avg loss 0.00128999, throughput 2.8775K wps
[Epoch 50 Batch 180/1540] avg loss 0.00146044, throughput 2.87629K wps
[Epoch 50 Batch 210/1540] avg loss 0.00144011, throughput 2.8564K wps
[Epoch 50 Batch 240/1540] avg loss 0.00174062, throughput 2.81673K wps
[Epoch 50 Batch 270/1540] avg loss 0.00157989, throughput 2.84236K wps
[Epoch 50 Batch 300/1540] avg loss 0.00164608, throughput 2.85494K wps
[Epoch 50 Batch 330/1540] avg loss 0.0016891, throughput 2.81003K wps
[Epoch 50 Batch 360/1540] avg loss 0.00138429, throughput 2.86407K wps
[Epoch 50 Batch 390/1540] avg loss 0.00162261, throughput 2.88265K wps
[Epoch 50 Batch 420/1540] avg loss 0.00151709, throughput 2.88337K wps
[Epoch 50 Batch 450/1540] avg loss 0.00162266, throughput 2.87881K wps
[Epoch 50 Batch 480/1540] avg loss 0.00153045, throughput 2.83972K wps
[Epoch 50 Batch 510/1540] avg loss 0.00170866, throughput 2.79679K wps
[Epoch 50 Batch 540/1540] avg loss 0.00175352, throughput 2.85787K wps
[Epoch 50 Batch 570/1540] avg loss 0.00190543, throughput 2.83455K wps
[Epoch 50 Batch 600/1540] avg loss 0.00176354, throughput 2.88139K wps
[Epoch 50 Batch 630/1540] avg loss 0.00188401, throughput 2.86266K wps
[Epoch 50 Batch 660/1540] avg loss 0.00156928, throughput 2.79782K wps
[Epoch 50 Batch 690/1540] avg loss 0.00150282, throughput 2.80323K wps
[Epoch 50 Batch 720/1540] avg loss 0.00152066, throughput 2.85371K wps
[Epoch 50 Batch 750/1540] avg loss 0.00157031, throughput 2.87251K wps
[Epoch 50 Batch 780/1540] avg loss 0.0014994, throughput 2.86666K wps
[Epoch 50 Batch 810/1540] avg loss 0.00179906, throughput 2.85472K wps
[Epoch 50 Batch 840/1540] avg loss 0.00144611, throughput 2.87967K wps
[Epoch 50 Batch 870/1540] avg loss 0.00175845, throughput 2.87776K wps
[Epoch 50 Batch 900/1540] avg loss 0.0016952, throughput 2.83258K wps
[Epoch 50 Batch 930/1540] avg loss 0.00164074, throughput 2.86407K wps
[Epoch 50 Batch 960/1540] avg loss 0.00160138, throughput 2.77913K wps
[Epoch 50 Batch 990/1540] avg loss 0.00196298, throughput 2.84479K wps
[Epoch 50 Batch 1020/1540] avg loss 0.00185084, throughput 2.8832K wps
[Epoch 50 Batch 1050/1540] avg loss 0.00189249, throughput 2.85467K wps
[Epoch 50 Batch 1080/1540] avg loss 0.00153275, throughput 2.86995K wps
[Epoch 50 Batch 1110/1540] avg loss 0.00151609, throughput 2.87461K wps
[Epoch 50 Batch 1140/1540] avg loss 0.00175254, throughput 2.87721K wps
[Epoch 50 Batch 1170/1540] avg loss 0.00165091, throughput 2.88459K wps
[Epoch 50 Batch 1200/1540] avg loss 0.00158004, throughput 2.85394K wps
[Epoch 50 Batch 1230/1540] avg loss 0.00204346, throughput 2.87629K wps
[Epoch 50 Batch 1260/1540] avg loss 0.00195682, throughput 2.88044K wps
[Epoch 50 Batch 1290/1540] avg loss 0.00141836, throughput 2.8778K wps
[Epoch 50 Batch 1320/1540] avg loss 0.00190664, throughput 2.867K wps
[Epoch 50 Batch 1350/1540] avg loss 0.00190241, throughput 2.82131K wps
[Epoch 50 Batch 1380/1540] avg loss 0.00186319, throughput 2.87229K wps
[Epoch 50 Batch 1410/1540] avg loss 0.00179353, throughput 2.83436K wps
[Epoch 50 Batch 1440/1540] avg loss 0.00166092, throughput 2.86155K wps
[Epoch 50 Batch 1470/1540] avg loss 0.0019531, throughput 2.88182K wps
[Epoch 50 Batch 1500/1540] avg loss 0.00170867, throughput 2.88187K wps
[Epoch 50 Batch 1530/1540] avg loss 0.00214667, throughput 2.87111K wps
Begin Testing...
[Epoch 50] train avg loss 0.00168246, dev acc 0.7913, dev avg loss 0.660416, throughput 2.85713K wps
[Epoch 51 Batch 30/1540] avg loss 0.00142113, throughput 2.86834K wps
[Epoch 51 Batch 60/1540] avg loss 0.00145163, throughput 2.8741K wps
[Epoch 51 Batch 90/1540] avg loss 0.00134122, throughput 2.85712K wps
[Epoch 51 Batch 120/1540] avg loss 0.00182151, throughput 2.87014K wps
[Epoch 51 Batch 150/1540] avg loss 0.00124264, throughput 2.85787K wps
[Epoch 51 Batch 180/1540] avg loss 0.0014713, throughput 2.8658K wps
[Epoch 51 Batch 210/1540] avg loss 0.00152796, throughput 2.87085K wps
[Epoch 51 Batch 240/1540] avg loss 0.0015178, throughput 2.86691K wps
[Epoch 51 Batch 270/1540] avg loss 0.00131892, throughput 2.87903K wps
[Epoch 51 Batch 300/1540] avg loss 0.00159005, throughput 2.86954K wps
[Epoch 51 Batch 330/1540] avg loss 0.00151198, throughput 2.85942K wps
[Epoch 51 Batch 360/1540] avg loss 0.00182372, throughput 2.86946K wps
[Epoch 51 Batch 390/1540] avg loss 0.00154303, throughput 2.82293K wps
[Epoch 51 Batch 420/1540] avg loss 0.00156805, throughput 2.87063K wps
[Epoch 51 Batch 450/1540] avg loss 0.00161715, throughput 2.86853K wps
[Epoch 51 Batch 480/1540] avg loss 0.00152731, throughput 2.86659K wps
[Epoch 51 Batch 510/1540] avg loss 0.00149679, throughput 2.85674K wps
[Epoch 51 Batch 540/1540] avg loss 0.00180771, throughput 2.86877K wps
[Epoch 51 Batch 570/1540] avg loss 0.00185839, throughput 2.87935K wps
[Epoch 51 Batch 600/1540] avg loss 0.00139146, throughput 2.86637K wps
[Epoch 51 Batch 630/1540] avg loss 0.00180738, throughput 2.85123K wps
[Epoch 51 Batch 660/1540] avg loss 0.00176172, throughput 2.8692K wps
[Epoch 51 Batch 690/1540] avg loss 0.00171076, throughput 2.8605K wps
[Epoch 51 Batch 720/1540] avg loss 0.00165953, throughput 2.81385K wps
[Epoch 51 Batch 750/1540] avg loss 0.00151253, throughput 2.88476K wps
[Epoch 51 Batch 780/1540] avg loss 0.00167912, throughput 2.8696K wps
[Epoch 51 Batch 810/1540] avg loss 0.00202941, throughput 2.88401K wps
[Epoch 51 Batch 840/1540] avg loss 0.00154469, throughput 2.88482K wps
[Epoch 51 Batch 870/1540] avg loss 0.00173649, throughput 2.88579K wps
[Epoch 51 Batch 900/1540] avg loss 0.00190205, throughput 2.88536K wps
[Epoch 51 Batch 930/1540] avg loss 0.00187698, throughput 2.85479K wps
[Epoch 51 Batch 960/1540] avg loss 0.00162177, throughput 2.88415K wps
[Epoch 51 Batch 990/1540] avg loss 0.00176214, throughput 2.88701K wps
[Epoch 51 Batch 1020/1540] avg loss 0.00160294, throughput 2.88404K wps
[Epoch 51 Batch 1050/1540] avg loss 0.00170194, throughput 2.88482K wps
[Epoch 51 Batch 1080/1540] avg loss 0.00175652, throughput 2.87339K wps
[Epoch 51 Batch 1110/1540] avg loss 0.00187039, throughput 2.81641K wps
[Epoch 51 Batch 1140/1540] avg loss 0.00158918, throughput 2.88452K wps
[Epoch 51 Batch 1170/1540] avg loss 0.0015759, throughput 2.87086K wps
[Epoch 51 Batch 1200/1540] avg loss 0.00160434, throughput 2.88149K wps
[Epoch 51 Batch 1230/1540] avg loss 0.00165133, throughput 2.80859K wps
[Epoch 51 Batch 1260/1540] avg loss 0.0018894, throughput 2.88342K wps
[Epoch 51 Batch 1290/1540] avg loss 0.00192602, throughput 2.87718K wps
[Epoch 51 Batch 1320/1540] avg loss 0.00176111, throughput 2.87233K wps
[Epoch 51 Batch 1350/1540] avg loss 0.00200248, throughput 2.88203K wps
[Epoch 51 Batch 1380/1540] avg loss 0.001681, throughput 2.87404K wps
[Epoch 51 Batch 1410/1540] avg loss 0.00171559, throughput 2.85847K wps
[Epoch 51 Batch 1440/1540] avg loss 0.00179991, throughput 2.88215K wps
[Epoch 51 Batch 1470/1540] avg loss 0.0015412, throughput 2.87263K wps
[Epoch 51 Batch 1500/1540] avg loss 0.00171294, throughput 2.87423K wps
[Epoch 51 Batch 1530/1540] avg loss 0.00185338, throughput 2.88346K wps
Begin Testing...
[Epoch 51] train avg loss 0.00165996, dev acc 0.8062, dev avg loss 0.633453, throughput 2.86816K wps
[Epoch 52 Batch 30/1540] avg loss 0.00162537, throughput 2.86169K wps
[Epoch 52 Batch 60/1540] avg loss 0.00148024, throughput 2.8646K wps
[Epoch 52 Batch 90/1540] avg loss 0.00142208, throughput 2.8131K wps
[Epoch 52 Batch 120/1540] avg loss 0.00144493, throughput 2.85401K wps
[Epoch 52 Batch 150/1540] avg loss 0.00148324, throughput 2.86412K wps
[Epoch 52 Batch 180/1540] avg loss 0.00135026, throughput 2.81061K wps
[Epoch 52 Batch 210/1540] avg loss 0.00172884, throughput 2.80338K wps
[Epoch 52 Batch 240/1540] avg loss 0.00171706, throughput 2.87511K wps
[Epoch 52 Batch 270/1540] avg loss 0.00148485, throughput 2.88383K wps
[Epoch 52 Batch 300/1540] avg loss 0.00147482, throughput 2.87568K wps
[Epoch 52 Batch 330/1540] avg loss 0.00154557, throughput 2.85388K wps
[Epoch 52 Batch 360/1540] avg loss 0.00159798, throughput 2.82289K wps
[Epoch 52 Batch 390/1540] avg loss 0.00156701, throughput 2.87947K wps
[Epoch 52 Batch 420/1540] avg loss 0.00173656, throughput 2.85971K wps
[Epoch 52 Batch 450/1540] avg loss 0.00177486, throughput 2.88332K wps
[Epoch 52 Batch 480/1540] avg loss 0.00185811, throughput 2.83916K wps
[Epoch 52 Batch 510/1540] avg loss 0.00184607, throughput 2.82222K wps
[Epoch 52 Batch 540/1540] avg loss 0.00151204, throughput 2.86881K wps
[Epoch 52 Batch 570/1540] avg loss 0.00176057, throughput 2.84178K wps
[Epoch 52 Batch 600/1540] avg loss 0.00202902, throughput 2.85813K wps
[Epoch 52 Batch 630/1540] avg loss 0.00171614, throughput 2.86147K wps
[Epoch 52 Batch 660/1540] avg loss 0.00173948, throughput 2.85963K wps
[Epoch 52 Batch 690/1540] avg loss 0.00148115, throughput 2.87524K wps
[Epoch 52 Batch 720/1540] avg loss 0.00192555, throughput 2.86326K wps
[Epoch 52 Batch 750/1540] avg loss 0.00171542, throughput 2.79699K wps
[Epoch 52 Batch 780/1540] avg loss 0.00150116, throughput 2.78204K wps
[Epoch 52 Batch 810/1540] avg loss 0.00156722, throughput 2.80226K wps
[Epoch 52 Batch 840/1540] avg loss 0.00200594, throughput 2.87808K wps
[Epoch 52 Batch 870/1540] avg loss 0.00158052, throughput 2.87695K wps
[Epoch 52 Batch 900/1540] avg loss 0.00140851, throughput 2.86751K wps
[Epoch 52 Batch 930/1540] avg loss 0.00146551, throughput 2.86954K wps
[Epoch 52 Batch 960/1540] avg loss 0.00167644, throughput 2.87498K wps
[Epoch 52 Batch 990/1540] avg loss 0.00168439, throughput 2.8548K wps
[Epoch 52 Batch 1020/1540] avg loss 0.00165303, throughput 2.87458K wps
[Epoch 52 Batch 1050/1540] avg loss 0.00162632, throughput 2.8074K wps
[Epoch 52 Batch 1080/1540] avg loss 0.00168283, throughput 2.81428K wps
[Epoch 52 Batch 1110/1540] avg loss 0.0018335, throughput 2.7795K wps
[Epoch 52 Batch 1140/1540] avg loss 0.00167305, throughput 2.85064K wps
[Epoch 52 Batch 1170/1540] avg loss 0.00187811, throughput 2.88094K wps
[Epoch 52 Batch 1200/1540] avg loss 0.00180566, throughput 2.85154K wps
[Epoch 52 Batch 1230/1540] avg loss 0.00160238, throughput 2.87958K wps
[Epoch 52 Batch 1260/1540] avg loss 0.00205542, throughput 2.83911K wps
[Epoch 52 Batch 1290/1540] avg loss 0.00154399, throughput 2.85099K wps
[Epoch 52 Batch 1320/1540] avg loss 0.00145603, throughput 2.87235K wps
[Epoch 52 Batch 1350/1540] avg loss 0.00138091, throughput 2.87373K wps
[Epoch 52 Batch 1380/1540] avg loss 0.00171187, throughput 2.86495K wps
[Epoch 52 Batch 1410/1540] avg loss 0.00166282, throughput 2.87544K wps
[Epoch 52 Batch 1440/1540] avg loss 0.00208419, throughput 2.86192K wps
[Epoch 52 Batch 1470/1540] avg loss 0.00186393, throughput 2.85K wps
[Epoch 52 Batch 1500/1540] avg loss 0.00212444, throughput 2.54186K wps
[Epoch 52 Batch 1530/1540] avg loss 0.0017496, throughput 2.87165K wps
Begin Testing...
[Epoch 52] train avg loss 0.00167242, dev acc 0.8050, dev avg loss 0.655776, throughput 2.84503K wps
[Epoch 53 Batch 30/1540] avg loss 0.00138012, throughput 2.86331K wps
[Epoch 53 Batch 60/1540] avg loss 0.00157063, throughput 2.87591K wps
[Epoch 53 Batch 90/1540] avg loss 0.00139439, throughput 2.88467K wps
[Epoch 53 Batch 120/1540] avg loss 0.00133002, throughput 2.88124K wps
[Epoch 53 Batch 150/1540] avg loss 0.00164539, throughput 2.88087K wps
[Epoch 53 Batch 180/1540] avg loss 0.00152802, throughput 2.82601K wps
[Epoch 53 Batch 210/1540] avg loss 0.00147438, throughput 2.84066K wps
[Epoch 53 Batch 240/1540] avg loss 0.00178636, throughput 2.88152K wps
[Epoch 53 Batch 270/1540] avg loss 0.00140357, throughput 2.88066K wps
[Epoch 53 Batch 300/1540] avg loss 0.00154853, throughput 2.87625K wps
[Epoch 53 Batch 330/1540] avg loss 0.00170519, throughput 2.88164K wps
[Epoch 53 Batch 360/1540] avg loss 0.00177349, throughput 2.85953K wps
[Epoch 53 Batch 390/1540] avg loss 0.00168981, throughput 2.80832K wps
[Epoch 53 Batch 420/1540] avg loss 0.00165937, throughput 2.85164K wps
[Epoch 53 Batch 450/1540] avg loss 0.00157112, throughput 2.87831K wps
[Epoch 53 Batch 480/1540] avg loss 0.00181099, throughput 2.87934K wps
[Epoch 53 Batch 510/1540] avg loss 0.00169652, throughput 2.88212K wps
[Epoch 53 Batch 540/1540] avg loss 0.00149292, throughput 2.87237K wps
[Epoch 53 Batch 570/1540] avg loss 0.00170467, throughput 2.87726K wps
[Epoch 53 Batch 600/1540] avg loss 0.00154741, throughput 2.82721K wps
[Epoch 53 Batch 630/1540] avg loss 0.00133283, throughput 2.85948K wps
[Epoch 53 Batch 660/1540] avg loss 0.00159838, throughput 2.82048K wps
[Epoch 53 Batch 690/1540] avg loss 0.00180991, throughput 2.87223K wps
[Epoch 53 Batch 720/1540] avg loss 0.00192515, throughput 2.83291K wps
[Epoch 53 Batch 750/1540] avg loss 0.00160697, throughput 2.82786K wps
[Epoch 53 Batch 780/1540] avg loss 0.00147094, throughput 2.86267K wps
[Epoch 53 Batch 810/1540] avg loss 0.00150221, throughput 2.87425K wps
[Epoch 53 Batch 840/1540] avg loss 0.00154197, throughput 2.87199K wps
[Epoch 53 Batch 870/1540] avg loss 0.00150239, throughput 2.87396K wps
[Epoch 53 Batch 900/1540] avg loss 0.00155505, throughput 2.792K wps
[Epoch 53 Batch 930/1540] avg loss 0.00162928, throughput 2.87322K wps
[Epoch 53 Batch 960/1540] avg loss 0.00168191, throughput 2.86278K wps
[Epoch 53 Batch 990/1540] avg loss 0.00182835, throughput 2.79529K wps
[Epoch 53 Batch 1020/1540] avg loss 0.0016458, throughput 2.80344K wps
[Epoch 53 Batch 1050/1540] avg loss 0.0018479, throughput 2.86896K wps
[Epoch 53 Batch 1080/1540] avg loss 0.00120049, throughput 2.87188K wps
[Epoch 53 Batch 1110/1540] avg loss 0.00170693, throughput 2.86333K wps
[Epoch 53 Batch 1140/1540] avg loss 0.00163802, throughput 2.87176K wps
[Epoch 53 Batch 1170/1540] avg loss 0.00193366, throughput 2.8769K wps
[Epoch 53 Batch 1200/1540] avg loss 0.00144651, throughput 2.8557K wps
[Epoch 53 Batch 1230/1540] avg loss 0.00173029, throughput 2.85871K wps
[Epoch 53 Batch 1260/1540] avg loss 0.00192992, throughput 2.86149K wps
[Epoch 53 Batch 1290/1540] avg loss 0.00162727, throughput 2.87382K wps
[Epoch 53 Batch 1320/1540] avg loss 0.00199145, throughput 2.81355K wps
[Epoch 53 Batch 1350/1540] avg loss 0.00177148, throughput 2.86949K wps
[Epoch 53 Batch 1380/1540] avg loss 0.0015248, throughput 2.88547K wps
[Epoch 53 Batch 1410/1540] avg loss 0.00183203, throughput 2.84754K wps
[Epoch 53 Batch 1440/1540] avg loss 0.00190012, throughput 2.79583K wps
[Epoch 53 Batch 1470/1540] avg loss 0.00157552, throughput 2.83925K wps
[Epoch 53 Batch 1500/1540] avg loss 0.00165161, throughput 2.86491K wps
[Epoch 53 Batch 1530/1540] avg loss 0.0016431, throughput 2.83309K wps
Begin Testing...
[Epoch 53] train avg loss 0.00163323, dev acc 0.8085, dev avg loss 0.635103, throughput 2.8564K wps
[Epoch 54 Batch 30/1540] avg loss 0.00163539, throughput 2.91631K wps
[Epoch 54 Batch 60/1540] avg loss 0.00147205, throughput 2.87638K wps
[Epoch 54 Batch 90/1540] avg loss 0.00134389, throughput 2.87884K wps
[Epoch 54 Batch 120/1540] avg loss 0.00128834, throughput 2.86632K wps
[Epoch 54 Batch 150/1540] avg loss 0.00166614, throughput 2.88295K wps
[Epoch 54 Batch 180/1540] avg loss 0.00129822, throughput 2.87777K wps
[Epoch 54 Batch 210/1540] avg loss 0.00159612, throughput 2.8603K wps
[Epoch 54 Batch 240/1540] avg loss 0.00132451, throughput 2.87012K wps
[Epoch 54 Batch 270/1540] avg loss 0.00132529, throughput 2.84394K wps
[Epoch 54 Batch 300/1540] avg loss 0.0014761, throughput 2.83706K wps
[Epoch 54 Batch 330/1540] avg loss 0.00151154, throughput 2.80271K wps
[Epoch 54 Batch 360/1540] avg loss 0.0016082, throughput 2.85393K wps
[Epoch 54 Batch 390/1540] avg loss 0.00133203, throughput 2.82578K wps
[Epoch 54 Batch 420/1540] avg loss 0.00174571, throughput 2.86889K wps
[Epoch 54 Batch 450/1540] avg loss 0.00137586, throughput 2.88483K wps
[Epoch 54 Batch 480/1540] avg loss 0.00172536, throughput 2.87634K wps
[Epoch 54 Batch 510/1540] avg loss 0.00167793, throughput 2.87835K wps
[Epoch 54 Batch 540/1540] avg loss 0.00156317, throughput 2.872K wps
[Epoch 54 Batch 570/1540] avg loss 0.00151037, throughput 2.87435K wps
[Epoch 54 Batch 600/1540] avg loss 0.00157829, throughput 2.80165K wps
[Epoch 54 Batch 630/1540] avg loss 0.00108632, throughput 2.84868K wps
[Epoch 54 Batch 660/1540] avg loss 0.00177253, throughput 2.83419K wps
[Epoch 54 Batch 690/1540] avg loss 0.00169602, throughput 2.85752K wps
[Epoch 54 Batch 720/1540] avg loss 0.0017772, throughput 2.8811K wps
[Epoch 54 Batch 750/1540] avg loss 0.00169919, throughput 2.87949K wps
[Epoch 54 Batch 780/1540] avg loss 0.00140909, throughput 2.88607K wps
[Epoch 54 Batch 810/1540] avg loss 0.00168464, throughput 2.88158K wps
[Epoch 54 Batch 840/1540] avg loss 0.00155154, throughput 2.86996K wps
[Epoch 54 Batch 870/1540] avg loss 0.00168542, throughput 2.84763K wps
[Epoch 54 Batch 900/1540] avg loss 0.00185692, throughput 2.80756K wps
[Epoch 54 Batch 930/1540] avg loss 0.00184696, throughput 2.86485K wps
[Epoch 54 Batch 960/1540] avg loss 0.00193656, throughput 2.82947K wps
[Epoch 54 Batch 990/1540] avg loss 0.00185081, throughput 2.8521K wps
[Epoch 54 Batch 1020/1540] avg loss 0.00132971, throughput 2.82283K wps
[Epoch 54 Batch 1050/1540] avg loss 0.00161826, throughput 2.8322K wps
[Epoch 54 Batch 1080/1540] avg loss 0.00145201, throughput 2.88581K wps
[Epoch 54 Batch 1110/1540] avg loss 0.00178705, throughput 2.88436K wps
[Epoch 54 Batch 1140/1540] avg loss 0.00189586, throughput 2.85717K wps
[Epoch 54 Batch 1170/1540] avg loss 0.00180082, throughput 2.83546K wps
[Epoch 54 Batch 1200/1540] avg loss 0.00141061, throughput 2.87258K wps
[Epoch 54 Batch 1230/1540] avg loss 0.00157085, throughput 2.86546K wps
[Epoch 54 Batch 1260/1540] avg loss 0.00160421, throughput 2.80864K wps
[Epoch 54 Batch 1290/1540] avg loss 0.00185784, throughput 2.8629K wps
[Epoch 54 Batch 1320/1540] avg loss 0.00169134, throughput 2.8121K wps
[Epoch 54 Batch 1350/1540] avg loss 0.00159699, throughput 2.84051K wps
[Epoch 54 Batch 1380/1540] avg loss 0.00146613, throughput 2.87057K wps
[Epoch 54 Batch 1410/1540] avg loss 0.00151952, throughput 2.81977K wps
[Epoch 54 Batch 1440/1540] avg loss 0.00142099, throughput 2.81522K wps
[Epoch 54 Batch 1470/1540] avg loss 0.00185266, throughput 2.88317K wps
[Epoch 54 Batch 1500/1540] avg loss 0.00148345, throughput 2.82458K wps
[Epoch 54 Batch 1530/1540] avg loss 0.00186429, throughput 2.80659K wps
Begin Testing...
[Epoch 54] train avg loss 0.00159458, dev acc 0.8005, dev avg loss 0.657767, throughput 2.85406K wps
[Epoch 55 Batch 30/1540] avg loss 0.00124492, throughput 2.94217K wps
[Epoch 55 Batch 60/1540] avg loss 0.00147765, throughput 2.87665K wps
[Epoch 55 Batch 90/1540] avg loss 0.0014715, throughput 2.88333K wps
[Epoch 55 Batch 120/1540] avg loss 0.00127482, throughput 2.86448K wps
[Epoch 55 Batch 150/1540] avg loss 0.00158108, throughput 2.88045K wps
[Epoch 55 Batch 180/1540] avg loss 0.00153626, throughput 2.88327K wps
[Epoch 55 Batch 210/1540] avg loss 0.00131076, throughput 2.86745K wps
[Epoch 55 Batch 240/1540] avg loss 0.00130059, throughput 2.87916K wps
[Epoch 55 Batch 270/1540] avg loss 0.00122245, throughput 2.87604K wps
[Epoch 55 Batch 300/1540] avg loss 0.00161689, throughput 2.87422K wps
[Epoch 55 Batch 330/1540] avg loss 0.00148988, throughput 2.883K wps
[Epoch 55 Batch 360/1540] avg loss 0.0014667, throughput 2.85607K wps
[Epoch 55 Batch 390/1540] avg loss 0.00174823, throughput 2.87035K wps
[Epoch 55 Batch 420/1540] avg loss 0.00149389, throughput 2.86326K wps
[Epoch 55 Batch 450/1540] avg loss 0.00179486, throughput 2.87781K wps
[Epoch 55 Batch 480/1540] avg loss 0.00157939, throughput 2.87517K wps
[Epoch 55 Batch 510/1540] avg loss 0.00150466, throughput 2.87598K wps
[Epoch 55 Batch 540/1540] avg loss 0.0013907, throughput 2.87165K wps
[Epoch 55 Batch 570/1540] avg loss 0.00143572, throughput 2.83788K wps
[Epoch 55 Batch 600/1540] avg loss 0.0015909, throughput 2.82607K wps
[Epoch 55 Batch 630/1540] avg loss 0.00160312, throughput 2.88278K wps
[Epoch 55 Batch 660/1540] avg loss 0.00149862, throughput 2.87852K wps
[Epoch 55 Batch 690/1540] avg loss 0.00173266, throughput 2.87597K wps
[Epoch 55 Batch 720/1540] avg loss 0.00148964, throughput 2.79391K wps
[Epoch 55 Batch 750/1540] avg loss 0.0015979, throughput 2.84196K wps
[Epoch 55 Batch 780/1540] avg loss 0.00177361, throughput 2.87524K wps
[Epoch 55 Batch 810/1540] avg loss 0.00160129, throughput 2.87262K wps
[Epoch 55 Batch 840/1540] avg loss 0.0013774, throughput 2.87185K wps
[Epoch 55 Batch 870/1540] avg loss 0.00168366, throughput 2.82126K wps
[Epoch 55 Batch 900/1540] avg loss 0.00148098, throughput 2.79922K wps
[Epoch 55 Batch 930/1540] avg loss 0.0016431, throughput 2.8793K wps
[Epoch 55 Batch 960/1540] avg loss 0.00185475, throughput 2.88677K wps
[Epoch 55 Batch 990/1540] avg loss 0.00207485, throughput 2.87555K wps
[Epoch 55 Batch 1020/1540] avg loss 0.00150112, throughput 2.86345K wps
[Epoch 55 Batch 1050/1540] avg loss 0.00173785, throughput 2.86573K wps
[Epoch 55 Batch 1080/1540] avg loss 0.00136793, throughput 2.879K wps
[Epoch 55 Batch 1110/1540] avg loss 0.00142333, throughput 2.89026K wps
[Epoch 55 Batch 1140/1540] avg loss 0.00143336, throughput 2.88188K wps
[Epoch 55 Batch 1170/1540] avg loss 0.00148153, throughput 2.88846K wps
[Epoch 55 Batch 1200/1540] avg loss 0.00167603, throughput 2.83104K wps
[Epoch 55 Batch 1230/1540] avg loss 0.00159012, throughput 2.88364K wps
[Epoch 55 Batch 1260/1540] avg loss 0.00151897, throughput 2.85706K wps
[Epoch 55 Batch 1290/1540] avg loss 0.00166813, throughput 2.81419K wps
[Epoch 55 Batch 1320/1540] avg loss 0.00161345, throughput 2.80834K wps
[Epoch 55 Batch 1350/1540] avg loss 0.00177007, throughput 2.88157K wps
[Epoch 55 Batch 1380/1540] avg loss 0.00169843, throughput 2.87526K wps
[Epoch 55 Batch 1410/1540] avg loss 0.00175196, throughput 2.87951K wps
[Epoch 55 Batch 1440/1540] avg loss 0.0017585, throughput 2.85879K wps
[Epoch 55 Batch 1470/1540] avg loss 0.00170235, throughput 2.83321K wps
[Epoch 55 Batch 1500/1540] avg loss 0.00170964, throughput 2.84866K wps
[Epoch 55 Batch 1530/1540] avg loss 0.00165325, throughput 2.83324K wps
Begin Testing...
[Epoch 55] train avg loss 0.00156992, dev acc 0.7959, dev avg loss 0.685755, throughput 2.86445K wps
[Epoch 56 Batch 30/1540] avg loss 0.00180244, throughput 2.86188K wps
[Epoch 56 Batch 60/1540] avg loss 0.00105024, throughput 2.88728K wps
[Epoch 56 Batch 90/1540] avg loss 0.0015301, throughput 2.8593K wps
[Epoch 56 Batch 120/1540] avg loss 0.00146804, throughput 2.84137K wps
[Epoch 56 Batch 150/1540] avg loss 0.00129625, throughput 2.86782K wps
[Epoch 56 Batch 180/1540] avg loss 0.00156668, throughput 2.87431K wps
[Epoch 56 Batch 210/1540] avg loss 0.00157356, throughput 2.87594K wps
[Epoch 56 Batch 240/1540] avg loss 0.00163485, throughput 2.86238K wps
[Epoch 56 Batch 270/1540] avg loss 0.00141531, throughput 2.88131K wps
[Epoch 56 Batch 300/1540] avg loss 0.00143207, throughput 2.87271K wps
[Epoch 56 Batch 330/1540] avg loss 0.00162955, throughput 2.8747K wps
[Epoch 56 Batch 360/1540] avg loss 0.00133976, throughput 2.86268K wps
[Epoch 56 Batch 390/1540] avg loss 0.00158179, throughput 2.80506K wps
[Epoch 56 Batch 420/1540] avg loss 0.00168417, throughput 2.87031K wps
[Epoch 56 Batch 450/1540] avg loss 0.00146233, throughput 2.86751K wps
[Epoch 56 Batch 480/1540] avg loss 0.00140249, throughput 2.87294K wps
[Epoch 56 Batch 510/1540] avg loss 0.00155015, throughput 2.87101K wps
[Epoch 56 Batch 540/1540] avg loss 0.00142226, throughput 2.87101K wps
[Epoch 56 Batch 570/1540] avg loss 0.0016842, throughput 2.86902K wps
[Epoch 56 Batch 600/1540] avg loss 0.00188379, throughput 2.86726K wps
[Epoch 56 Batch 630/1540] avg loss 0.00145573, throughput 2.8615K wps
[Epoch 56 Batch 660/1540] avg loss 0.00156801, throughput 2.84122K wps
[Epoch 56 Batch 690/1540] avg loss 0.00164164, throughput 2.85993K wps
[Epoch 56 Batch 720/1540] avg loss 0.00146044, throughput 2.85565K wps
[Epoch 56 Batch 750/1540] avg loss 0.00146909, throughput 2.86088K wps
[Epoch 56 Batch 780/1540] avg loss 0.0020578, throughput 2.77242K wps
[Epoch 56 Batch 810/1540] avg loss 0.00158001, throughput 2.86375K wps
[Epoch 56 Batch 840/1540] avg loss 0.00153885, throughput 2.87653K wps
[Epoch 56 Batch 870/1540] avg loss 0.00148596, throughput 2.8639K wps
[Epoch 56 Batch 900/1540] avg loss 0.00162622, throughput 2.8721K wps
[Epoch 56 Batch 930/1540] avg loss 0.00147645, throughput 2.86784K wps
[Epoch 56 Batch 960/1540] avg loss 0.00136855, throughput 2.87083K wps
[Epoch 56 Batch 990/1540] avg loss 0.00168565, throughput 2.8515K wps
[Epoch 56 Batch 1020/1540] avg loss 0.00167823, throughput 2.82489K wps
[Epoch 56 Batch 1050/1540] avg loss 0.00169685, throughput 2.83455K wps
[Epoch 56 Batch 1080/1540] avg loss 0.00195807, throughput 2.87659K wps
[Epoch 56 Batch 1110/1540] avg loss 0.00159418, throughput 2.85473K wps
[Epoch 56 Batch 1140/1540] avg loss 0.00175116, throughput 2.83794K wps
[Epoch 56 Batch 1170/1540] avg loss 0.00168192, throughput 2.87374K wps
[Epoch 56 Batch 1200/1540] avg loss 0.00147368, throughput 2.83907K wps
[Epoch 56 Batch 1230/1540] avg loss 0.00170464, throughput 2.80119K wps
[Epoch 56 Batch 1260/1540] avg loss 0.00152644, throughput 2.78366K wps
[Epoch 56 Batch 1290/1540] avg loss 0.00141947, throughput 2.83598K wps
[Epoch 56 Batch 1320/1540] avg loss 0.00167867, throughput 2.79508K wps
[Epoch 56 Batch 1350/1540] avg loss 0.00149218, throughput 2.87341K wps
[Epoch 56 Batch 1380/1540] avg loss 0.00177889, throughput 2.87519K wps
[Epoch 56 Batch 1410/1540] avg loss 0.00147678, throughput 2.8847K wps
[Epoch 56 Batch 1440/1540] avg loss 0.00175865, throughput 2.88313K wps
[Epoch 56 Batch 1470/1540] avg loss 0.00149894, throughput 2.84294K wps
[Epoch 56 Batch 1500/1540] avg loss 0.00157985, throughput 2.84399K wps
[Epoch 56 Batch 1530/1540] avg loss 0.00169065, throughput 2.85541K wps
Begin Testing...
[Epoch 56] train avg loss 0.00157215, dev acc 0.8073, dev avg loss 0.657178, throughput 2.85576K wps
[Epoch 57 Batch 30/1540] avg loss 0.00129406, throughput 2.90139K wps
[Epoch 57 Batch 60/1540] avg loss 0.00133932, throughput 2.88339K wps
[Epoch 57 Batch 90/1540] avg loss 0.00155277, throughput 2.884K wps
[Epoch 57 Batch 120/1540] avg loss 0.00154759, throughput 2.87071K wps
[Epoch 57 Batch 150/1540] avg loss 0.00138028, throughput 2.88335K wps
[Epoch 57 Batch 180/1540] avg loss 0.00125524, throughput 2.79425K wps
[Epoch 57 Batch 210/1540] avg loss 0.00139049, throughput 2.85905K wps
[Epoch 57 Batch 240/1540] avg loss 0.00135416, throughput 2.88176K wps
[Epoch 57 Batch 270/1540] avg loss 0.00147437, throughput 2.87061K wps
[Epoch 57 Batch 300/1540] avg loss 0.0017262, throughput 2.86509K wps
[Epoch 57 Batch 330/1540] avg loss 0.00131657, throughput 2.85015K wps
[Epoch 57 Batch 360/1540] avg loss 0.00141367, throughput 2.88182K wps
[Epoch 57 Batch 390/1540] avg loss 0.00142911, throughput 2.88341K wps
[Epoch 57 Batch 420/1540] avg loss 0.00165859, throughput 2.8507K wps
[Epoch 57 Batch 450/1540] avg loss 0.0013082, throughput 2.85843K wps
[Epoch 57 Batch 480/1540] avg loss 0.00169842, throughput 2.80663K wps
[Epoch 57 Batch 510/1540] avg loss 0.00119489, throughput 2.84821K wps
[Epoch 57 Batch 540/1540] avg loss 0.00152175, throughput 2.85803K wps
[Epoch 57 Batch 570/1540] avg loss 0.00137764, throughput 2.85072K wps
[Epoch 57 Batch 600/1540] avg loss 0.00144227, throughput 2.88327K wps
[Epoch 57 Batch 630/1540] avg loss 0.00150098, throughput 2.86997K wps
[Epoch 57 Batch 660/1540] avg loss 0.00111399, throughput 2.85787K wps
[Epoch 57 Batch 690/1540] avg loss 0.00137922, throughput 2.8314K wps
[Epoch 57 Batch 720/1540] avg loss 0.00172716, throughput 2.87603K wps
[Epoch 57 Batch 750/1540] avg loss 0.00162057, throughput 2.81583K wps
[Epoch 57 Batch 780/1540] avg loss 0.0016453, throughput 2.80872K wps
[Epoch 57 Batch 810/1540] avg loss 0.00132312, throughput 2.88822K wps
[Epoch 57 Batch 840/1540] avg loss 0.00178436, throughput 2.88075K wps
[Epoch 57 Batch 870/1540] avg loss 0.00160303, throughput 2.8273K wps
[Epoch 57 Batch 900/1540] avg loss 0.00163372, throughput 2.87672K wps
[Epoch 57 Batch 930/1540] avg loss 0.00186424, throughput 2.86459K wps
[Epoch 57 Batch 960/1540] avg loss 0.00176927, throughput 2.86477K wps
[Epoch 57 Batch 990/1540] avg loss 0.00163969, throughput 2.86864K wps
[Epoch 57 Batch 1020/1540] avg loss 0.00209186, throughput 2.8736K wps
[Epoch 57 Batch 1050/1540] avg loss 0.00158411, throughput 2.87072K wps
[Epoch 57 Batch 1080/1540] avg loss 0.00177683, throughput 2.82401K wps
[Epoch 57 Batch 1110/1540] avg loss 0.00161747, throughput 2.83668K wps
[Epoch 57 Batch 1140/1540] avg loss 0.00145895, throughput 2.84477K wps
[Epoch 57 Batch 1170/1540] avg loss 0.00167422, throughput 2.88238K wps
[Epoch 57 Batch 1200/1540] avg loss 0.00161537, throughput 2.8365K wps
[Epoch 57 Batch 1230/1540] avg loss 0.00164841, throughput 2.81447K wps
[Epoch 57 Batch 1260/1540] avg loss 0.00164964, throughput 2.82555K wps
[Epoch 57 Batch 1290/1540] avg loss 0.00161747, throughput 2.85627K wps
[Epoch 57 Batch 1320/1540] avg loss 0.00179286, throughput 2.85611K wps
[Epoch 57 Batch 1350/1540] avg loss 0.0016082, throughput 2.88363K wps
[Epoch 57 Batch 1380/1540] avg loss 0.0014972, throughput 2.83392K wps
[Epoch 57 Batch 1410/1540] avg loss 0.00192229, throughput 2.84645K wps
[Epoch 57 Batch 1440/1540] avg loss 0.00138704, throughput 2.82374K wps
[Epoch 57 Batch 1470/1540] avg loss 0.00145789, throughput 2.83121K wps
[Epoch 57 Batch 1500/1540] avg loss 0.00170267, throughput 2.87094K wps
[Epoch 57 Batch 1530/1540] avg loss 0.00161501, throughput 2.83726K wps
Begin Testing...
[Epoch 57] train avg loss 0.00154779, dev acc 0.8005, dev avg loss 0.664925, throughput 2.85511K wps
[Epoch 58 Batch 30/1540] avg loss 0.00129303, throughput 2.85846K wps
[Epoch 58 Batch 60/1540] avg loss 0.00113181, throughput 2.86961K wps
[Epoch 58 Batch 90/1540] avg loss 0.00168711, throughput 2.83382K wps
[Epoch 58 Batch 120/1540] avg loss 0.00148442, throughput 2.82927K wps
[Epoch 58 Batch 150/1540] avg loss 0.00142422, throughput 2.88219K wps
[Epoch 58 Batch 180/1540] avg loss 0.00129752, throughput 2.88008K wps
[Epoch 58 Batch 210/1540] avg loss 0.00124683, throughput 2.87841K wps
[Epoch 58 Batch 240/1540] avg loss 0.00137467, throughput 2.86908K wps
[Epoch 58 Batch 270/1540] avg loss 0.00138056, throughput 2.87517K wps
[Epoch 58 Batch 300/1540] avg loss 0.00138423, throughput 2.87K wps
[Epoch 58 Batch 330/1540] avg loss 0.00165849, throughput 2.88302K wps
[Epoch 58 Batch 360/1540] avg loss 0.00151811, throughput 2.86743K wps
[Epoch 58 Batch 390/1540] avg loss 0.00127148, throughput 2.87K wps
[Epoch 58 Batch 420/1540] avg loss 0.00151953, throughput 2.87107K wps
[Epoch 58 Batch 450/1540] avg loss 0.00161737, throughput 2.80169K wps
[Epoch 58 Batch 480/1540] avg loss 0.001325, throughput 2.83576K wps
[Epoch 58 Batch 510/1540] avg loss 0.00168517, throughput 2.85643K wps
[Epoch 58 Batch 540/1540] avg loss 0.0014829, throughput 2.88118K wps
[Epoch 58 Batch 570/1540] avg loss 0.00150121, throughput 2.88089K wps
[Epoch 58 Batch 600/1540] avg loss 0.00169595, throughput 2.88035K wps
[Epoch 58 Batch 630/1540] avg loss 0.00138872, throughput 2.84243K wps
[Epoch 58 Batch 660/1540] avg loss 0.0014494, throughput 2.88051K wps
[Epoch 58 Batch 690/1540] avg loss 0.00151378, throughput 2.84643K wps
[Epoch 58 Batch 720/1540] avg loss 0.00204804, throughput 2.84273K wps
[Epoch 58 Batch 750/1540] avg loss 0.00156839, throughput 2.88487K wps
[Epoch 58 Batch 780/1540] avg loss 0.00127075, throughput 2.88554K wps
[Epoch 58 Batch 810/1540] avg loss 0.00141047, throughput 2.86696K wps
[Epoch 58 Batch 840/1540] avg loss 0.00135412, throughput 2.80089K wps
[Epoch 58 Batch 870/1540] avg loss 0.00153328, throughput 2.79483K wps
[Epoch 58 Batch 900/1540] avg loss 0.00136481, throughput 2.86328K wps
[Epoch 58 Batch 930/1540] avg loss 0.00147852, throughput 2.86779K wps
[Epoch 58 Batch 960/1540] avg loss 0.00172654, throughput 2.8827K wps
[Epoch 58 Batch 990/1540] avg loss 0.00169902, throughput 2.84401K wps
[Epoch 58 Batch 1020/1540] avg loss 0.00136539, throughput 2.84906K wps
[Epoch 58 Batch 1050/1540] avg loss 0.0016565, throughput 2.86984K wps
[Epoch 58 Batch 1080/1540] avg loss 0.00170997, throughput 2.8055K wps
[Epoch 58 Batch 1110/1540] avg loss 0.00146026, throughput 2.86358K wps
[Epoch 58 Batch 1140/1540] avg loss 0.00144474, throughput 2.87833K wps
[Epoch 58 Batch 1170/1540] avg loss 0.00166011, throughput 2.87509K wps
[Epoch 58 Batch 1200/1540] avg loss 0.00172343, throughput 2.88205K wps
[Epoch 58 Batch 1230/1540] avg loss 0.00140404, throughput 2.87473K wps
[Epoch 58 Batch 1260/1540] avg loss 0.00157241, throughput 2.83549K wps
[Epoch 58 Batch 1290/1540] avg loss 0.00149957, throughput 2.80755K wps
[Epoch 58 Batch 1320/1540] avg loss 0.00153691, throughput 2.8537K wps
[Epoch 58 Batch 1350/1540] avg loss 0.0015925, throughput 2.82142K wps
[Epoch 58 Batch 1380/1540] avg loss 0.00166416, throughput 2.79352K wps
[Epoch 58 Batch 1410/1540] avg loss 0.00181628, throughput 2.86475K wps
[Epoch 58 Batch 1440/1540] avg loss 0.00153643, throughput 2.84471K wps
[Epoch 58 Batch 1470/1540] avg loss 0.00146927, throughput 2.81547K wps
[Epoch 58 Batch 1500/1540] avg loss 0.00180491, throughput 2.84358K wps
[Epoch 58 Batch 1530/1540] avg loss 0.00144835, throughput 2.87319K wps
Begin Testing...
[Epoch 58] train avg loss 0.00151459, dev acc 0.8039, dev avg loss 0.684326, throughput 2.85525K wps
[Epoch 59 Batch 30/1540] avg loss 0.00145384, throughput 2.85579K wps
[Epoch 59 Batch 60/1540] avg loss 0.00143124, throughput 2.86851K wps
[Epoch 59 Batch 90/1540] avg loss 0.00116401, throughput 2.84883K wps
[Epoch 59 Batch 120/1540] avg loss 0.00143982, throughput 2.87951K wps
[Epoch 59 Batch 150/1540] avg loss 0.0014341, throughput 2.87778K wps
[Epoch 59 Batch 180/1540] avg loss 0.00122985, throughput 2.79998K wps
[Epoch 59 Batch 210/1540] avg loss 0.00153862, throughput 2.83904K wps
[Epoch 59 Batch 240/1540] avg loss 0.00132901, throughput 2.82556K wps
[Epoch 59 Batch 270/1540] avg loss 0.00140804, throughput 2.84223K wps
[Epoch 59 Batch 300/1540] avg loss 0.00145515, throughput 2.86795K wps
[Epoch 59 Batch 330/1540] avg loss 0.00157652, throughput 2.85145K wps
[Epoch 59 Batch 360/1540] avg loss 0.00135847, throughput 2.87217K wps
[Epoch 59 Batch 390/1540] avg loss 0.00158372, throughput 2.88692K wps
[Epoch 59 Batch 420/1540] avg loss 0.00147964, throughput 2.88757K wps
[Epoch 59 Batch 450/1540] avg loss 0.00136952, throughput 2.87489K wps
[Epoch 59 Batch 480/1540] avg loss 0.00140544, throughput 2.87945K wps
[Epoch 59 Batch 510/1540] avg loss 0.00160071, throughput 2.8827K wps
[Epoch 59 Batch 540/1540] avg loss 0.00136304, throughput 2.88699K wps
[Epoch 59 Batch 570/1540] avg loss 0.00140471, throughput 2.84877K wps
[Epoch 59 Batch 600/1540] avg loss 0.00138006, throughput 2.84303K wps
[Epoch 59 Batch 630/1540] avg loss 0.00188352, throughput 2.86643K wps
[Epoch 59 Batch 660/1540] avg loss 0.00152358, throughput 2.8837K wps
[Epoch 59 Batch 690/1540] avg loss 0.0013352, throughput 2.87859K wps
[Epoch 59 Batch 720/1540] avg loss 0.00152586, throughput 2.88133K wps
[Epoch 59 Batch 750/1540] avg loss 0.00156961, throughput 2.87534K wps
[Epoch 59 Batch 780/1540] avg loss 0.00143647, throughput 2.85028K wps
[Epoch 59 Batch 810/1540] avg loss 0.00146483, throughput 2.86558K wps
[Epoch 59 Batch 840/1540] avg loss 0.00151164, throughput 2.88385K wps
[Epoch 59 Batch 870/1540] avg loss 0.00156588, throughput 2.88141K wps
[Epoch 59 Batch 900/1540] avg loss 0.00137261, throughput 2.88307K wps
[Epoch 59 Batch 930/1540] avg loss 0.00170766, throughput 2.858K wps
[Epoch 59 Batch 960/1540] avg loss 0.0013793, throughput 2.88333K wps
[Epoch 59 Batch 990/1540] avg loss 0.00138645, throughput 2.87969K wps
[Epoch 59 Batch 1020/1540] avg loss 0.00183637, throughput 2.85528K wps
[Epoch 59 Batch 1050/1540] avg loss 0.00165778, throughput 2.83266K wps
[Epoch 59 Batch 1080/1540] avg loss 0.00162576, throughput 2.8038K wps
[Epoch 59 Batch 1110/1540] avg loss 0.00165606, throughput 2.85031K wps
[Epoch 59 Batch 1140/1540] avg loss 0.00140849, throughput 2.8768K wps
[Epoch 59 Batch 1170/1540] avg loss 0.00161777, throughput 2.88153K wps
[Epoch 59 Batch 1200/1540] avg loss 0.00131224, throughput 2.86136K wps
[Epoch 59 Batch 1230/1540] avg loss 0.00187434, throughput 2.808K wps
[Epoch 59 Batch 1260/1540] avg loss 0.001587, throughput 2.87115K wps
[Epoch 59 Batch 1290/1540] avg loss 0.00183639, throughput 2.88597K wps
[Epoch 59 Batch 1320/1540] avg loss 0.00179215, throughput 2.88217K wps
[Epoch 59 Batch 1350/1540] avg loss 0.00157088, throughput 2.8781K wps
[Epoch 59 Batch 1380/1540] avg loss 0.00178483, throughput 2.88048K wps
[Epoch 59 Batch 1410/1540] avg loss 0.00152591, throughput 2.84313K wps
[Epoch 59 Batch 1440/1540] avg loss 0.00141969, throughput 2.80303K wps
[Epoch 59 Batch 1470/1540] avg loss 0.00139359, throughput 2.86278K wps
[Epoch 59 Batch 1500/1540] avg loss 0.00162533, throughput 2.87786K wps
[Epoch 59 Batch 1530/1540] avg loss 0.0013556, throughput 2.85951K wps
Begin Testing...
[Epoch 59] train avg loss 0.00150726, dev acc 0.7959, dev avg loss 0.676562, throughput 2.86252K wps
[Epoch 60 Batch 30/1540] avg loss 0.00120245, throughput 2.90566K wps
[Epoch 60 Batch 60/1540] avg loss 0.00137074, throughput 2.85587K wps
[Epoch 60 Batch 90/1540] avg loss 0.00128877, throughput 2.87844K wps
[Epoch 60 Batch 120/1540] avg loss 0.00125785, throughput 2.87349K wps
[Epoch 60 Batch 150/1540] avg loss 0.00141898, throughput 2.8439K wps
[Epoch 60 Batch 180/1540] avg loss 0.00126264, throughput 2.8512K wps
[Epoch 60 Batch 210/1540] avg loss 0.00120799, throughput 2.84115K wps
[Epoch 60 Batch 240/1540] avg loss 0.00132141, throughput 2.86506K wps
[Epoch 60 Batch 270/1540] avg loss 0.00157662, throughput 2.85713K wps
[Epoch 60 Batch 300/1540] avg loss 0.00131525, throughput 2.88203K wps
[Epoch 60 Batch 330/1540] avg loss 0.00146616, throughput 2.885K wps
[Epoch 60 Batch 360/1540] avg loss 0.00122601, throughput 2.8866K wps
[Epoch 60 Batch 390/1540] avg loss 0.00167913, throughput 2.88201K wps
[Epoch 60 Batch 420/1540] avg loss 0.00136496, throughput 2.87522K wps
[Epoch 60 Batch 450/1540] avg loss 0.00127847, throughput 2.86898K wps
[Epoch 60 Batch 480/1540] avg loss 0.00159243, throughput 2.84925K wps
[Epoch 60 Batch 510/1540] avg loss 0.00119511, throughput 2.87089K wps
[Epoch 60 Batch 540/1540] avg loss 0.00159871, throughput 2.84967K wps
[Epoch 60 Batch 570/1540] avg loss 0.00172966, throughput 2.87777K wps
[Epoch 60 Batch 600/1540] avg loss 0.00144906, throughput 2.87777K wps
[Epoch 60 Batch 630/1540] avg loss 0.00138828, throughput 2.85421K wps
[Epoch 60 Batch 660/1540] avg loss 0.0016171, throughput 2.80197K wps
[Epoch 60 Batch 690/1540] avg loss 0.00142942, throughput 2.81281K wps
[Epoch 60 Batch 720/1540] avg loss 0.00160852, throughput 2.85536K wps
[Epoch 60 Batch 750/1540] avg loss 0.00165803, throughput 2.88133K wps
[Epoch 60 Batch 780/1540] avg loss 0.00145743, throughput 2.8776K wps
[Epoch 60 Batch 810/1540] avg loss 0.00107512, throughput 2.84693K wps
[Epoch 60 Batch 840/1540] avg loss 0.00165124, throughput 2.8666K wps
[Epoch 60 Batch 870/1540] avg loss 0.00157041, throughput 2.87031K wps
[Epoch 60 Batch 900/1540] avg loss 0.00132127, throughput 2.85795K wps
[Epoch 60 Batch 930/1540] avg loss 0.00132974, throughput 2.87086K wps
[Epoch 60 Batch 960/1540] avg loss 0.00193235, throughput 2.81055K wps
[Epoch 60 Batch 990/1540] avg loss 0.00127326, throughput 2.86677K wps
[Epoch 60 Batch 1020/1540] avg loss 0.00132862, throughput 2.86278K wps
[Epoch 60 Batch 1050/1540] avg loss 0.00146038, throughput 2.78755K wps
[Epoch 60 Batch 1080/1540] avg loss 0.00171618, throughput 2.8044K wps
[Epoch 60 Batch 1110/1540] avg loss 0.00143493, throughput 2.80335K wps
[Epoch 60 Batch 1140/1540] avg loss 0.00170767, throughput 2.83207K wps
[Epoch 60 Batch 1170/1540] avg loss 0.00170345, throughput 2.87742K wps
[Epoch 60 Batch 1200/1540] avg loss 0.00183156, throughput 2.88353K wps
[Epoch 60 Batch 1230/1540] avg loss 0.00150363, throughput 2.87424K wps
[Epoch 60 Batch 1260/1540] avg loss 0.00173297, throughput 2.87826K wps
[Epoch 60 Batch 1290/1540] avg loss 0.00176204, throughput 2.87485K wps
[Epoch 60 Batch 1320/1540] avg loss 0.00151444, throughput 2.84072K wps
[Epoch 60 Batch 1350/1540] avg loss 0.00153623, throughput 2.87639K wps
[Epoch 60 Batch 1380/1540] avg loss 0.00132984, throughput 2.86083K wps
[Epoch 60 Batch 1410/1540] avg loss 0.00181687, throughput 2.80846K wps
[Epoch 60 Batch 1440/1540] avg loss 0.00143378, throughput 2.87494K wps
[Epoch 60 Batch 1470/1540] avg loss 0.00150053, throughput 2.88486K wps
[Epoch 60 Batch 1500/1540] avg loss 0.00155848, throughput 2.88407K wps
[Epoch 60 Batch 1530/1540] avg loss 0.00154024, throughput 2.86852K wps
Begin Testing...
[Epoch 60] train avg loss 0.00148106, dev acc 0.7970, dev avg loss 0.69161, throughput 2.85874K wps
[Epoch 61 Batch 30/1540] avg loss 0.00131478, throughput 2.94079K wps
[Epoch 61 Batch 60/1540] avg loss 0.00117909, throughput 2.88309K wps
[Epoch 61 Batch 90/1540] avg loss 0.00132449, throughput 2.8644K wps
[Epoch 61 Batch 120/1540] avg loss 0.00138634, throughput 2.86265K wps
[Epoch 61 Batch 150/1540] avg loss 0.00126874, throughput 2.80704K wps
[Epoch 61 Batch 180/1540] avg loss 0.00140365, throughput 2.85976K wps
[Epoch 61 Batch 210/1540] avg loss 0.00146773, throughput 2.82958K wps
[Epoch 61 Batch 240/1540] avg loss 0.00146206, throughput 2.80376K wps
[Epoch 61 Batch 270/1540] avg loss 0.00140736, throughput 2.84261K wps
[Epoch 61 Batch 300/1540] avg loss 0.00168363, throughput 2.88317K wps
[Epoch 61 Batch 330/1540] avg loss 0.00140963, throughput 2.87256K wps
[Epoch 61 Batch 360/1540] avg loss 0.00131855, throughput 2.8763K wps
[Epoch 61 Batch 390/1540] avg loss 0.00137739, throughput 2.84741K wps
[Epoch 61 Batch 420/1540] avg loss 0.0011672, throughput 2.85129K wps
[Epoch 61 Batch 450/1540] avg loss 0.00151568, throughput 2.8817K wps
[Epoch 61 Batch 480/1540] avg loss 0.00120829, throughput 2.8761K wps
[Epoch 61 Batch 510/1540] avg loss 0.00156115, throughput 2.87526K wps
[Epoch 61 Batch 540/1540] avg loss 0.00158817, throughput 2.87552K wps
[Epoch 61 Batch 570/1540] avg loss 0.00171521, throughput 2.86958K wps
[Epoch 61 Batch 600/1540] avg loss 0.00143131, throughput 2.86281K wps
[Epoch 61 Batch 630/1540] avg loss 0.00138841, throughput 2.85439K wps
[Epoch 61 Batch 660/1540] avg loss 0.00171108, throughput 2.88492K wps
[Epoch 61 Batch 690/1540] avg loss 0.00164868, throughput 2.8861K wps
[Epoch 61 Batch 720/1540] avg loss 0.00176892, throughput 2.79362K wps
[Epoch 61 Batch 750/1540] avg loss 0.00163437, throughput 2.80869K wps
[Epoch 61 Batch 780/1540] avg loss 0.00159856, throughput 2.8753K wps
[Epoch 61 Batch 810/1540] avg loss 0.00114459, throughput 2.86836K wps
[Epoch 61 Batch 840/1540] avg loss 0.00157669, throughput 2.84343K wps
[Epoch 61 Batch 870/1540] avg loss 0.00156843, throughput 2.88413K wps
[Epoch 61 Batch 900/1540] avg loss 0.00174577, throughput 2.88986K wps
[Epoch 61 Batch 930/1540] avg loss 0.00104375, throughput 2.86718K wps
[Epoch 61 Batch 960/1540] avg loss 0.00129181, throughput 2.83703K wps
[Epoch 61 Batch 990/1540] avg loss 0.00150291, throughput 2.87685K wps
[Epoch 61 Batch 1020/1540] avg loss 0.00163178, throughput 2.80451K wps
[Epoch 61 Batch 1050/1540] avg loss 0.00156491, throughput 2.85209K wps
[Epoch 61 Batch 1080/1540] avg loss 0.00151249, throughput 2.88075K wps
[Epoch 61 Batch 1110/1540] avg loss 0.00142286, throughput 2.88792K wps
[Epoch 61 Batch 1140/1540] avg loss 0.00151636, throughput 2.87864K wps
[Epoch 61 Batch 1170/1540] avg loss 0.00162214, throughput 2.8719K wps
[Epoch 61 Batch 1200/1540] avg loss 0.00169515, throughput 2.8652K wps
[Epoch 61 Batch 1230/1540] avg loss 0.00161447, throughput 2.8497K wps
[Epoch 61 Batch 1260/1540] avg loss 0.00192721, throughput 2.84378K wps
[Epoch 61 Batch 1290/1540] avg loss 0.00151312, throughput 2.86872K wps
[Epoch 61 Batch 1320/1540] avg loss 0.0016705, throughput 2.87906K wps
[Epoch 61 Batch 1350/1540] avg loss 0.00172396, throughput 2.87306K wps
[Epoch 61 Batch 1380/1540] avg loss 0.0012977, throughput 2.86888K wps
[Epoch 61 Batch 1410/1540] avg loss 0.00156535, throughput 2.84118K wps
[Epoch 61 Batch 1440/1540] avg loss 0.0011826, throughput 2.79363K wps
[Epoch 61 Batch 1470/1540] avg loss 0.0013929, throughput 2.78764K wps
[Epoch 61 Batch 1500/1540] avg loss 0.00133268, throughput 2.80173K wps
[Epoch 61 Batch 1530/1540] avg loss 0.00159632, throughput 2.8706K wps
Begin Testing...
[Epoch 61] train avg loss 0.00148174, dev acc 0.8050, dev avg loss 0.692253, throughput 2.85759K wps
[Epoch 62 Batch 30/1540] avg loss 0.0014608, throughput 2.92165K wps
[Epoch 62 Batch 60/1540] avg loss 0.00139396, throughput 2.89025K wps
[Epoch 62 Batch 90/1540] avg loss 0.00130864, throughput 2.88545K wps
[Epoch 62 Batch 120/1540] avg loss 0.00148001, throughput 2.83477K wps
[Epoch 62 Batch 150/1540] avg loss 0.00158948, throughput 2.88042K wps
[Epoch 62 Batch 180/1540] avg loss 0.00161449, throughput 2.87582K wps
[Epoch 62 Batch 210/1540] avg loss 0.00113817, throughput 2.83337K wps
[Epoch 62 Batch 240/1540] avg loss 0.0012825, throughput 2.80321K wps
[Epoch 62 Batch 270/1540] avg loss 0.0012889, throughput 2.83398K wps
[Epoch 62 Batch 300/1540] avg loss 0.00132653, throughput 2.88194K wps
[Epoch 62 Batch 330/1540] avg loss 0.00135628, throughput 2.88002K wps
[Epoch 62 Batch 360/1540] avg loss 0.00131182, throughput 2.88305K wps
[Epoch 62 Batch 390/1540] avg loss 0.00140409, throughput 2.87318K wps
[Epoch 62 Batch 420/1540] avg loss 0.00148405, throughput 2.87951K wps
[Epoch 62 Batch 450/1540] avg loss 0.00135808, throughput 2.87499K wps
[Epoch 62 Batch 480/1540] avg loss 0.00155737, throughput 2.86941K wps
[Epoch 62 Batch 510/1540] avg loss 0.00157236, throughput 2.86635K wps
[Epoch 62 Batch 540/1540] avg loss 0.00115881, throughput 2.87332K wps
[Epoch 62 Batch 570/1540] avg loss 0.0010958, throughput 2.87222K wps
[Epoch 62 Batch 600/1540] avg loss 0.00144549, throughput 2.8689K wps
[Epoch 62 Batch 630/1540] avg loss 0.00133805, throughput 2.87077K wps
[Epoch 62 Batch 660/1540] avg loss 0.00136056, throughput 2.86781K wps
[Epoch 62 Batch 690/1540] avg loss 0.0014986, throughput 2.83742K wps
[Epoch 62 Batch 720/1540] avg loss 0.00114309, throughput 2.88658K wps
[Epoch 62 Batch 750/1540] avg loss 0.00143869, throughput 2.88241K wps
[Epoch 62 Batch 780/1540] avg loss 0.00162726, throughput 2.87088K wps
[Epoch 62 Batch 810/1540] avg loss 0.00141126, throughput 2.8821K wps
[Epoch 62 Batch 840/1540] avg loss 0.00167746, throughput 2.88886K wps
[Epoch 62 Batch 870/1540] avg loss 0.00150443, throughput 2.88728K wps
[Epoch 62 Batch 900/1540] avg loss 0.00155171, throughput 2.87555K wps
[Epoch 62 Batch 930/1540] avg loss 0.00174564, throughput 2.8744K wps
[Epoch 62 Batch 960/1540] avg loss 0.0015592, throughput 2.88285K wps
[Epoch 62 Batch 990/1540] avg loss 0.00159293, throughput 2.88175K wps
[Epoch 62 Batch 1020/1540] avg loss 0.00113801, throughput 2.87906K wps
[Epoch 62 Batch 1050/1540] avg loss 0.0015816, throughput 2.8716K wps
[Epoch 62 Batch 1080/1540] avg loss 0.00142941, throughput 2.87067K wps
[Epoch 62 Batch 1110/1540] avg loss 0.0017649, throughput 2.80366K wps
[Epoch 62 Batch 1140/1540] avg loss 0.00178137, throughput 2.85208K wps
[Epoch 62 Batch 1170/1540] avg loss 0.00140872, throughput 2.84158K wps
[Epoch 62 Batch 1200/1540] avg loss 0.00147524, throughput 2.87764K wps
[Epoch 62 Batch 1230/1540] avg loss 0.00135903, throughput 2.87695K wps
[Epoch 62 Batch 1260/1540] avg loss 0.00154432, throughput 2.87553K wps
[Epoch 62 Batch 1290/1540] avg loss 0.00167385, throughput 2.88205K wps
[Epoch 62 Batch 1320/1540] avg loss 0.00172234, throughput 2.87754K wps
[Epoch 62 Batch 1350/1540] avg loss 0.00146875, throughput 2.884K wps
[Epoch 62 Batch 1380/1540] avg loss 0.00138098, throughput 2.88698K wps
[Epoch 62 Batch 1410/1540] avg loss 0.00134596, throughput 2.83201K wps
[Epoch 62 Batch 1440/1540] avg loss 0.00147188, throughput 2.82777K wps
[Epoch 62 Batch 1470/1540] avg loss 0.00129315, throughput 2.88291K wps
[Epoch 62 Batch 1500/1540] avg loss 0.00186492, throughput 2.82783K wps
[Epoch 62 Batch 1530/1540] avg loss 0.00142341, throughput 2.86984K wps
Begin Testing...
[Epoch 62] train avg loss 0.00145591, dev acc 0.7936, dev avg loss 0.693543, throughput 2.86815K wps
[Epoch 63 Batch 30/1540] avg loss 0.00138637, throughput 2.89955K wps
[Epoch 63 Batch 60/1540] avg loss 0.00134032, throughput 2.88748K wps
[Epoch 63 Batch 90/1540] avg loss 0.00104969, throughput 2.87956K wps
[Epoch 63 Batch 120/1540] avg loss 0.00121618, throughput 2.88255K wps
[Epoch 63 Batch 150/1540] avg loss 0.00116692, throughput 2.87174K wps
[Epoch 63 Batch 180/1540] avg loss 0.00122934, throughput 2.87306K wps
[Epoch 63 Batch 210/1540] avg loss 0.00126226, throughput 2.86195K wps
[Epoch 63 Batch 240/1540] avg loss 0.00149916, throughput 2.80273K wps
[Epoch 63 Batch 270/1540] avg loss 0.00113927, throughput 2.8021K wps
[Epoch 63 Batch 300/1540] avg loss 0.0011117, throughput 2.84091K wps
[Epoch 63 Batch 330/1540] avg loss 0.00104884, throughput 2.84041K wps
[Epoch 63 Batch 360/1540] avg loss 0.00150771, throughput 2.87184K wps
[Epoch 63 Batch 390/1540] avg loss 0.00154229, throughput 2.86519K wps
[Epoch 63 Batch 420/1540] avg loss 0.00127385, throughput 2.83782K wps
[Epoch 63 Batch 450/1540] avg loss 0.00136837, throughput 2.86213K wps
[Epoch 63 Batch 480/1540] avg loss 0.00155353, throughput 2.88213K wps
[Epoch 63 Batch 510/1540] avg loss 0.00146031, throughput 2.88643K wps
[Epoch 63 Batch 540/1540] avg loss 0.00120304, throughput 2.87815K wps
[Epoch 63 Batch 570/1540] avg loss 0.00119945, throughput 2.88073K wps
[Epoch 63 Batch 600/1540] avg loss 0.00128824, throughput 2.82694K wps
[Epoch 63 Batch 630/1540] avg loss 0.00145563, throughput 2.83179K wps
[Epoch 63 Batch 660/1540] avg loss 0.0015314, throughput 2.87304K wps
[Epoch 63 Batch 690/1540] avg loss 0.00139662, throughput 2.82344K wps
[Epoch 63 Batch 720/1540] avg loss 0.00148877, throughput 2.8759K wps
[Epoch 63 Batch 750/1540] avg loss 0.00140208, throughput 2.8804K wps
[Epoch 63 Batch 780/1540] avg loss 0.00131155, throughput 2.88282K wps
[Epoch 63 Batch 810/1540] avg loss 0.00169101, throughput 2.88817K wps
[Epoch 63 Batch 840/1540] avg loss 0.00141596, throughput 2.87956K wps
[Epoch 63 Batch 870/1540] avg loss 0.00160579, throughput 2.85719K wps
[Epoch 63 Batch 900/1540] avg loss 0.00131012, throughput 2.88552K wps
[Epoch 63 Batch 930/1540] avg loss 0.00206336, throughput 2.87453K wps
[Epoch 63 Batch 960/1540] avg loss 0.00144444, throughput 2.83439K wps
[Epoch 63 Batch 990/1540] avg loss 0.00156808, throughput 2.83647K wps
[Epoch 63 Batch 1020/1540] avg loss 0.00154591, throughput 2.83812K wps
[Epoch 63 Batch 1050/1540] avg loss 0.00137975, throughput 2.85648K wps
[Epoch 63 Batch 1080/1540] avg loss 0.0014682, throughput 2.85223K wps
[Epoch 63 Batch 1110/1540] avg loss 0.00165528, throughput 2.8628K wps
[Epoch 63 Batch 1140/1540] avg loss 0.00142698, throughput 2.88403K wps
[Epoch 63 Batch 1170/1540] avg loss 0.00123293, throughput 2.86632K wps
[Epoch 63 Batch 1200/1540] avg loss 0.00177847, throughput 2.86481K wps
[Epoch 63 Batch 1230/1540] avg loss 0.00147326, throughput 2.82887K wps
[Epoch 63 Batch 1260/1540] avg loss 0.00148242, throughput 2.86474K wps
[Epoch 63 Batch 1290/1540] avg loss 0.00156482, throughput 2.8835K wps
[Epoch 63 Batch 1320/1540] avg loss 0.00182512, throughput 2.87427K wps
[Epoch 63 Batch 1350/1540] avg loss 0.00177525, throughput 2.87688K wps
[Epoch 63 Batch 1380/1540] avg loss 0.00150945, throughput 2.81666K wps
[Epoch 63 Batch 1410/1540] avg loss 0.00155303, throughput 2.83051K wps
[Epoch 63 Batch 1440/1540] avg loss 0.00125028, throughput 2.81601K wps
[Epoch 63 Batch 1470/1540] avg loss 0.00145952, throughput 2.86639K wps
[Epoch 63 Batch 1500/1540] avg loss 0.00189058, throughput 2.83066K wps
[Epoch 63 Batch 1530/1540] avg loss 0.00176596, throughput 2.88222K wps
Begin Testing...
[Epoch 63] train avg loss 0.00144307, dev acc 0.7982, dev avg loss 0.701463, throughput 2.85976K wps
[Epoch 64 Batch 30/1540] avg loss 0.00117769, throughput 2.9385K wps
[Epoch 64 Batch 60/1540] avg loss 0.00105273, throughput 2.8767K wps
[Epoch 64 Batch 90/1540] avg loss 0.00133862, throughput 2.88633K wps
[Epoch 64 Batch 120/1540] avg loss 0.00133198, throughput 2.87575K wps
[Epoch 64 Batch 150/1540] avg loss 0.00134809, throughput 2.86167K wps
[Epoch 64 Batch 180/1540] avg loss 0.00149852, throughput 2.87512K wps
[Epoch 64 Batch 210/1540] avg loss 0.00129664, throughput 2.87205K wps
[Epoch 64 Batch 240/1540] avg loss 0.00130449, throughput 2.8572K wps
[Epoch 64 Batch 270/1540] avg loss 0.00147784, throughput 2.80031K wps
[Epoch 64 Batch 300/1540] avg loss 0.000978645, throughput 2.8115K wps
[Epoch 64 Batch 330/1540] avg loss 0.00121831, throughput 2.84587K wps
[Epoch 64 Batch 360/1540] avg loss 0.00141165, throughput 2.81797K wps
[Epoch 64 Batch 390/1540] avg loss 0.00135575, throughput 2.87736K wps
[Epoch 64 Batch 420/1540] avg loss 0.00159126, throughput 2.84174K wps
[Epoch 64 Batch 450/1540] avg loss 0.00122625, throughput 2.80295K wps
[Epoch 64 Batch 480/1540] avg loss 0.00159038, throughput 2.87636K wps
[Epoch 64 Batch 510/1540] avg loss 0.00157439, throughput 2.84765K wps
[Epoch 64 Batch 540/1540] avg loss 0.00178447, throughput 2.8624K wps
[Epoch 64 Batch 570/1540] avg loss 0.00151507, throughput 2.88425K wps
[Epoch 64 Batch 600/1540] avg loss 0.00149407, throughput 2.87679K wps
[Epoch 64 Batch 630/1540] avg loss 0.00119697, throughput 2.86501K wps
[Epoch 64 Batch 660/1540] avg loss 0.00145981, throughput 2.8453K wps
[Epoch 64 Batch 690/1540] avg loss 0.00124368, throughput 2.80698K wps
[Epoch 64 Batch 720/1540] avg loss 0.00143419, throughput 2.87793K wps
[Epoch 64 Batch 750/1540] avg loss 0.00135994, throughput 2.874K wps
[Epoch 64 Batch 780/1540] avg loss 0.00135243, throughput 2.87837K wps
[Epoch 64 Batch 810/1540] avg loss 0.00142273, throughput 2.88021K wps
[Epoch 64 Batch 840/1540] avg loss 0.00136815, throughput 2.86895K wps
[Epoch 64 Batch 870/1540] avg loss 0.00129468, throughput 2.82776K wps
[Epoch 64 Batch 900/1540] avg loss 0.00146663, throughput 2.80377K wps
[Epoch 64 Batch 930/1540] avg loss 0.0015077, throughput 2.87689K wps
[Epoch 64 Batch 960/1540] avg loss 0.00148238, throughput 2.84893K wps
[Epoch 64 Batch 990/1540] avg loss 0.00154602, throughput 2.88033K wps
[Epoch 64 Batch 1020/1540] avg loss 0.00137776, throughput 2.87609K wps
[Epoch 64 Batch 1050/1540] avg loss 0.00138246, throughput 2.87598K wps
[Epoch 64 Batch 1080/1540] avg loss 0.00174767, throughput 2.87554K wps
[Epoch 64 Batch 1110/1540] avg loss 0.00164823, throughput 2.87847K wps
[Epoch 64 Batch 1140/1540] avg loss 0.00159439, throughput 2.83226K wps
[Epoch 64 Batch 1170/1540] avg loss 0.00141982, throughput 2.87407K wps
[Epoch 64 Batch 1200/1540] avg loss 0.00133075, throughput 2.793K wps
[Epoch 64 Batch 1230/1540] avg loss 0.00146379, throughput 2.81158K wps
[Epoch 64 Batch 1260/1540] avg loss 0.00156105, throughput 2.86784K wps
[Epoch 64 Batch 1290/1540] avg loss 0.0012701, throughput 2.86829K wps
[Epoch 64 Batch 1320/1540] avg loss 0.00150198, throughput 2.87024K wps
[Epoch 64 Batch 1350/1540] avg loss 0.0013165, throughput 2.85929K wps
[Epoch 64 Batch 1380/1540] avg loss 0.00139111, throughput 2.87207K wps
[Epoch 64 Batch 1410/1540] avg loss 0.00130643, throughput 2.84801K wps
[Epoch 64 Batch 1440/1540] avg loss 0.00176947, throughput 2.86659K wps
[Epoch 64 Batch 1470/1540] avg loss 0.00128754, throughput 2.87209K wps
[Epoch 64 Batch 1500/1540] avg loss 0.00146498, throughput 2.85767K wps
[Epoch 64 Batch 1530/1540] avg loss 0.00123243, throughput 2.87921K wps
Begin Testing...
[Epoch 64] train avg loss 0.00140469, dev acc 0.7982, dev avg loss 0.705999, throughput 2.85859K wps
[Epoch 65 Batch 30/1540] avg loss 0.00135864, throughput 2.86169K wps
[Epoch 65 Batch 60/1540] avg loss 0.00116708, throughput 2.84618K wps
[Epoch 65 Batch 90/1540] avg loss 0.0011244, throughput 2.88195K wps
[Epoch 65 Batch 120/1540] avg loss 0.0015675, throughput 2.85975K wps
[Epoch 65 Batch 150/1540] avg loss 0.00134793, throughput 2.87694K wps
[Epoch 65 Batch 180/1540] avg loss 0.00121742, throughput 2.87982K wps
[Epoch 65 Batch 210/1540] avg loss 0.00160864, throughput 2.8753K wps
[Epoch 65 Batch 240/1540] avg loss 0.00110615, throughput 2.87081K wps
[Epoch 65 Batch 270/1540] avg loss 0.00172414, throughput 2.82316K wps
[Epoch 65 Batch 300/1540] avg loss 0.00124927, throughput 2.81787K wps
[Epoch 65 Batch 330/1540] avg loss 0.0013517, throughput 2.86386K wps
[Epoch 65 Batch 360/1540] avg loss 0.00146411, throughput 2.85726K wps
[Epoch 65 Batch 390/1540] avg loss 0.00132283, throughput 2.80253K wps
[Epoch 65 Batch 420/1540] avg loss 0.00119892, throughput 2.87759K wps
[Epoch 65 Batch 450/1540] avg loss 0.00150975, throughput 2.86186K wps
[Epoch 65 Batch 480/1540] avg loss 0.00141949, throughput 2.86194K wps
[Epoch 65 Batch 510/1540] avg loss 0.00137127, throughput 2.87699K wps
[Epoch 65 Batch 540/1540] avg loss 0.00141181, throughput 2.88105K wps
[Epoch 65 Batch 570/1540] avg loss 0.00142723, throughput 2.85221K wps
[Epoch 65 Batch 600/1540] avg loss 0.00169488, throughput 2.87258K wps
[Epoch 65 Batch 630/1540] avg loss 0.00129474, throughput 2.87877K wps
[Epoch 65 Batch 660/1540] avg loss 0.00170231, throughput 2.87048K wps
[Epoch 65 Batch 690/1540] avg loss 0.00141203, throughput 2.84175K wps
[Epoch 65 Batch 720/1540] avg loss 0.00138896, throughput 2.87316K wps
[Epoch 65 Batch 750/1540] avg loss 0.00133425, throughput 2.85649K wps
[Epoch 65 Batch 780/1540] avg loss 0.00124187, throughput 2.86494K wps
[Epoch 65 Batch 810/1540] avg loss 0.00137831, throughput 2.87852K wps
[Epoch 65 Batch 840/1540] avg loss 0.00105445, throughput 2.87955K wps
[Epoch 65 Batch 870/1540] avg loss 0.00135916, throughput 2.87503K wps
[Epoch 65 Batch 900/1540] avg loss 0.00153359, throughput 2.88337K wps
[Epoch 65 Batch 930/1540] avg loss 0.00136313, throughput 2.87167K wps
[Epoch 65 Batch 960/1540] avg loss 0.00121467, throughput 2.88571K wps
[Epoch 65 Batch 990/1540] avg loss 0.00124894, throughput 2.87564K wps
[Epoch 65 Batch 1020/1540] avg loss 0.00156046, throughput 2.88027K wps
[Epoch 65 Batch 1050/1540] avg loss 0.00135905, throughput 2.8521K wps
[Epoch 65 Batch 1080/1540] avg loss 0.00169769, throughput 2.82365K wps
[Epoch 65 Batch 1110/1540] avg loss 0.00151309, throughput 2.8764K wps
[Epoch 65 Batch 1140/1540] avg loss 0.00132438, throughput 2.88522K wps
[Epoch 65 Batch 1170/1540] avg loss 0.00139231, throughput 2.87414K wps
[Epoch 65 Batch 1200/1540] avg loss 0.0014443, throughput 2.87629K wps
[Epoch 65 Batch 1230/1540] avg loss 0.0014279, throughput 2.8273K wps
[Epoch 65 Batch 1260/1540] avg loss 0.00146021, throughput 2.83588K wps
[Epoch 65 Batch 1290/1540] avg loss 0.00153723, throughput 2.87703K wps
[Epoch 65 Batch 1320/1540] avg loss 0.00182373, throughput 2.87664K wps
[Epoch 65 Batch 1350/1540] avg loss 0.00162494, throughput 2.86786K wps
[Epoch 65 Batch 1380/1540] avg loss 0.00148996, throughput 2.85584K wps
[Epoch 65 Batch 1410/1540] avg loss 0.00134852, throughput 2.78684K wps
[Epoch 65 Batch 1440/1540] avg loss 0.0015024, throughput 2.85603K wps
[Epoch 65 Batch 1470/1540] avg loss 0.00141099, throughput 2.84567K wps
[Epoch 65 Batch 1500/1540] avg loss 0.00136961, throughput 2.87861K wps
[Epoch 65 Batch 1530/1540] avg loss 0.00106372, throughput 2.81411K wps
Begin Testing...
[Epoch 65] train avg loss 0.00140416, dev acc 0.7947, dev avg loss 0.712189, throughput 2.86115K wps
[Epoch 66 Batch 30/1540] avg loss 0.00127296, throughput 2.92888K wps
[Epoch 66 Batch 60/1540] avg loss 0.00131052, throughput 2.87637K wps
[Epoch 66 Batch 90/1540] avg loss 0.00136172, throughput 2.85959K wps
[Epoch 66 Batch 120/1540] avg loss 0.00129223, throughput 2.87284K wps
[Epoch 66 Batch 150/1540] avg loss 0.0011656, throughput 2.87542K wps
[Epoch 66 Batch 180/1540] avg loss 0.00138673, throughput 2.88035K wps
[Epoch 66 Batch 210/1540] avg loss 0.00108541, throughput 2.83625K wps
[Epoch 66 Batch 240/1540] avg loss 0.0014519, throughput 2.87392K wps
[Epoch 66 Batch 270/1540] avg loss 0.0011461, throughput 2.88087K wps
[Epoch 66 Batch 300/1540] avg loss 0.00139481, throughput 2.87824K wps
[Epoch 66 Batch 330/1540] avg loss 0.00133102, throughput 2.88493K wps
[Epoch 66 Batch 360/1540] avg loss 0.00106814, throughput 2.88212K wps
[Epoch 66 Batch 390/1540] avg loss 0.00126711, throughput 2.86149K wps
[Epoch 66 Batch 420/1540] avg loss 0.001258, throughput 2.85771K wps
[Epoch 66 Batch 450/1540] avg loss 0.00165762, throughput 2.86148K wps
[Epoch 66 Batch 480/1540] avg loss 0.00122495, throughput 2.87629K wps
[Epoch 66 Batch 510/1540] avg loss 0.00164661, throughput 2.84927K wps
[Epoch 66 Batch 540/1540] avg loss 0.00144247, throughput 2.8845K wps
[Epoch 66 Batch 570/1540] avg loss 0.00154407, throughput 2.85623K wps
[Epoch 66 Batch 600/1540] avg loss 0.00126339, throughput 2.85504K wps
[Epoch 66 Batch 630/1540] avg loss 0.00156497, throughput 2.86681K wps
[Epoch 66 Batch 660/1540] avg loss 0.00121659, throughput 2.84364K wps
[Epoch 66 Batch 690/1540] avg loss 0.00140733, throughput 2.88197K wps
[Epoch 66 Batch 720/1540] avg loss 0.00142489, throughput 2.87231K wps
[Epoch 66 Batch 750/1540] avg loss 0.00142205, throughput 2.83656K wps
[Epoch 66 Batch 780/1540] avg loss 0.00135829, throughput 2.85096K wps
[Epoch 66 Batch 810/1540] avg loss 0.00160215, throughput 2.88509K wps
[Epoch 66 Batch 840/1540] avg loss 0.0014316, throughput 2.85204K wps
[Epoch 66 Batch 870/1540] avg loss 0.0012224, throughput 2.8204K wps
[Epoch 66 Batch 900/1540] avg loss 0.00165676, throughput 2.87359K wps
[Epoch 66 Batch 930/1540] avg loss 0.00162252, throughput 2.86087K wps
[Epoch 66 Batch 960/1540] avg loss 0.00146342, throughput 2.87085K wps
[Epoch 66 Batch 990/1540] avg loss 0.00130375, throughput 2.83267K wps
[Epoch 66 Batch 1020/1540] avg loss 0.00150989, throughput 2.87397K wps
[Epoch 66 Batch 1050/1540] avg loss 0.00130153, throughput 2.86305K wps
[Epoch 66 Batch 1080/1540] avg loss 0.00163389, throughput 2.86859K wps
[Epoch 66 Batch 1110/1540] avg loss 0.0015353, throughput 2.79704K wps
[Epoch 66 Batch 1140/1540] avg loss 0.00125883, throughput 2.82674K wps
[Epoch 66 Batch 1170/1540] avg loss 0.00137844, throughput 2.87599K wps
[Epoch 66 Batch 1200/1540] avg loss 0.00129854, throughput 2.84851K wps
[Epoch 66 Batch 1230/1540] avg loss 0.00112331, throughput 2.84541K wps
[Epoch 66 Batch 1260/1540] avg loss 0.00121611, throughput 2.87717K wps
[Epoch 66 Batch 1290/1540] avg loss 0.00146363, throughput 2.88149K wps
[Epoch 66 Batch 1320/1540] avg loss 0.00129456, throughput 2.86987K wps
[Epoch 66 Batch 1350/1540] avg loss 0.00150397, throughput 2.87768K wps
[Epoch 66 Batch 1380/1540] avg loss 0.00142348, throughput 2.8718K wps
[Epoch 66 Batch 1410/1540] avg loss 0.00170415, throughput 2.81785K wps
[Epoch 66 Batch 1440/1540] avg loss 0.00150022, throughput 2.86702K wps
[Epoch 66 Batch 1470/1540] avg loss 0.00134069, throughput 2.84054K wps
[Epoch 66 Batch 1500/1540] avg loss 0.00132167, throughput 2.79714K wps
[Epoch 66 Batch 1530/1540] avg loss 0.00154356, throughput 2.79342K wps
Begin Testing...
[Epoch 66] train avg loss 0.00138724, dev acc 0.8050, dev avg loss 0.710395, throughput 2.85955K wps
[Epoch 67 Batch 30/1540] avg loss 0.00128354, throughput 2.86872K wps
[Epoch 67 Batch 60/1540] avg loss 0.0011927, throughput 2.78274K wps
[Epoch 67 Batch 90/1540] avg loss 0.00106239, throughput 2.86273K wps
[Epoch 67 Batch 120/1540] avg loss 0.00111617, throughput 2.87462K wps
[Epoch 67 Batch 150/1540] avg loss 0.00120439, throughput 2.88046K wps
[Epoch 67 Batch 180/1540] avg loss 0.00120761, throughput 2.86997K wps
[Epoch 67 Batch 210/1540] avg loss 0.00114949, throughput 2.8641K wps
[Epoch 67 Batch 240/1540] avg loss 0.00138368, throughput 2.82855K wps
[Epoch 67 Batch 270/1540] avg loss 0.00110411, throughput 2.86554K wps
[Epoch 67 Batch 300/1540] avg loss 0.00142149, throughput 2.81316K wps
[Epoch 67 Batch 330/1540] avg loss 0.0013373, throughput 2.86886K wps
[Epoch 67 Batch 360/1540] avg loss 0.00147957, throughput 2.88423K wps
[Epoch 67 Batch 390/1540] avg loss 0.00132644, throughput 2.87652K wps
[Epoch 67 Batch 420/1540] avg loss 0.00144741, throughput 2.80832K wps
[Epoch 67 Batch 450/1540] avg loss 0.0013313, throughput 2.84244K wps
[Epoch 67 Batch 480/1540] avg loss 0.00116393, throughput 2.80987K wps
[Epoch 67 Batch 510/1540] avg loss 0.00155002, throughput 2.82448K wps
[Epoch 67 Batch 540/1540] avg loss 0.00125249, throughput 2.8738K wps
[Epoch 67 Batch 570/1540] avg loss 0.00152926, throughput 2.87288K wps
[Epoch 67 Batch 600/1540] avg loss 0.000913149, throughput 2.84989K wps
[Epoch 67 Batch 630/1540] avg loss 0.00153403, throughput 2.77917K wps
[Epoch 67 Batch 660/1540] avg loss 0.00126011, throughput 2.82527K wps
[Epoch 67 Batch 690/1540] avg loss 0.00118272, throughput 2.8754K wps
[Epoch 67 Batch 720/1540] avg loss 0.00122676, throughput 2.86646K wps
[Epoch 67 Batch 750/1540] avg loss 0.00142917, throughput 2.82181K wps
[Epoch 67 Batch 780/1540] avg loss 0.00131482, throughput 2.81759K wps
[Epoch 67 Batch 810/1540] avg loss 0.00114612, throughput 2.85379K wps
[Epoch 67 Batch 840/1540] avg loss 0.0013978, throughput 2.84192K wps
[Epoch 67 Batch 870/1540] avg loss 0.00143805, throughput 2.8398K wps
[Epoch 67 Batch 900/1540] avg loss 0.00160063, throughput 2.84294K wps
[Epoch 67 Batch 930/1540] avg loss 0.00128115, throughput 2.86905K wps
[Epoch 67 Batch 960/1540] avg loss 0.00133373, throughput 2.87127K wps
[Epoch 67 Batch 990/1540] avg loss 0.00147027, throughput 2.8769K wps
[Epoch 67 Batch 1020/1540] avg loss 0.00147324, throughput 2.86696K wps
[Epoch 67 Batch 1050/1540] avg loss 0.00124212, throughput 2.86474K wps
[Epoch 67 Batch 1080/1540] avg loss 0.00183638, throughput 2.87769K wps
[Epoch 67 Batch 1110/1540] avg loss 0.00124529, throughput 2.87085K wps
[Epoch 67 Batch 1140/1540] avg loss 0.00157788, throughput 2.87128K wps
[Epoch 67 Batch 1170/1540] avg loss 0.00132704, throughput 2.85737K wps
[Epoch 67 Batch 1200/1540] avg loss 0.00148816, throughput 2.87289K wps
[Epoch 67 Batch 1230/1540] avg loss 0.00139041, throughput 2.87859K wps
[Epoch 67 Batch 1260/1540] avg loss 0.00153247, throughput 2.87816K wps
[Epoch 67 Batch 1290/1540] avg loss 0.00133618, throughput 2.87481K wps
[Epoch 67 Batch 1320/1540] avg loss 0.00164512, throughput 2.87206K wps
[Epoch 67 Batch 1350/1540] avg loss 0.00131037, throughput 2.86419K wps
[Epoch 67 Batch 1380/1540] avg loss 0.00147887, throughput 2.85676K wps
[Epoch 67 Batch 1410/1540] avg loss 0.00154637, throughput 2.85223K wps
[Epoch 67 Batch 1440/1540] avg loss 0.00149396, throughput 2.86109K wps
[Epoch 67 Batch 1470/1540] avg loss 0.00170094, throughput 2.86461K wps
[Epoch 67 Batch 1500/1540] avg loss 0.00129582, throughput 2.85808K wps
[Epoch 67 Batch 1530/1540] avg loss 0.00151147, throughput 2.86517K wps
Begin Testing...
[Epoch 67] train avg loss 0.00136488, dev acc 0.8016, dev avg loss 0.710172, throughput 2.85505K wps
[Epoch 68 Batch 30/1540] avg loss 0.00138491, throughput 2.85946K wps
[Epoch 68 Batch 60/1540] avg loss 0.00113984, throughput 2.84486K wps
[Epoch 68 Batch 90/1540] avg loss 0.00115619, throughput 2.86415K wps
[Epoch 68 Batch 120/1540] avg loss 0.00138668, throughput 2.85831K wps
[Epoch 68 Batch 150/1540] avg loss 0.00118362, throughput 2.83261K wps
[Epoch 68 Batch 180/1540] avg loss 0.00120757, throughput 2.81643K wps
[Epoch 68 Batch 210/1540] avg loss 0.00144233, throughput 2.87768K wps
[Epoch 68 Batch 240/1540] avg loss 0.00124299, throughput 2.88561K wps
[Epoch 68 Batch 270/1540] avg loss 0.00146579, throughput 2.87066K wps
[Epoch 68 Batch 300/1540] avg loss 0.00136144, throughput 2.84863K wps
[Epoch 68 Batch 330/1540] avg loss 0.00127276, throughput 2.88082K wps
[Epoch 68 Batch 360/1540] avg loss 0.00118131, throughput 2.87662K wps
[Epoch 68 Batch 390/1540] avg loss 0.00121002, throughput 2.87696K wps
[Epoch 68 Batch 420/1540] avg loss 0.00148879, throughput 2.84353K wps
[Epoch 68 Batch 450/1540] avg loss 0.00118207, throughput 2.85552K wps
[Epoch 68 Batch 480/1540] avg loss 0.00126864, throughput 2.88298K wps
[Epoch 68 Batch 510/1540] avg loss 0.00136322, throughput 2.83384K wps
[Epoch 68 Batch 540/1540] avg loss 0.00135538, throughput 2.85802K wps
[Epoch 68 Batch 570/1540] avg loss 0.00136025, throughput 2.8796K wps
[Epoch 68 Batch 600/1540] avg loss 0.00139865, throughput 2.86736K wps
[Epoch 68 Batch 630/1540] avg loss 0.00146072, throughput 2.87701K wps
[Epoch 68 Batch 660/1540] avg loss 0.00134608, throughput 2.88846K wps
[Epoch 68 Batch 690/1540] avg loss 0.00142137, throughput 2.85585K wps
[Epoch 68 Batch 720/1540] avg loss 0.00147293, throughput 2.87349K wps
[Epoch 68 Batch 750/1540] avg loss 0.00124596, throughput 2.84184K wps
[Epoch 68 Batch 780/1540] avg loss 0.00134268, throughput 2.86536K wps
[Epoch 68 Batch 810/1540] avg loss 0.00118601, throughput 2.88157K wps
[Epoch 68 Batch 840/1540] avg loss 0.00135527, throughput 2.85917K wps
[Epoch 68 Batch 870/1540] avg loss 0.00143757, throughput 2.83632K wps
[Epoch 68 Batch 900/1540] avg loss 0.00128205, throughput 2.81374K wps
[Epoch 68 Batch 930/1540] avg loss 0.00138094, throughput 2.86322K wps
[Epoch 68 Batch 960/1540] avg loss 0.00138868, throughput 2.87073K wps
[Epoch 68 Batch 990/1540] avg loss 0.00138386, throughput 2.87422K wps
[Epoch 68 Batch 1020/1540] avg loss 0.00131553, throughput 2.84358K wps
[Epoch 68 Batch 1050/1540] avg loss 0.00140384, throughput 2.88381K wps
[Epoch 68 Batch 1080/1540] avg loss 0.00161892, throughput 2.77232K wps
[Epoch 68 Batch 1110/1540] avg loss 0.00148205, throughput 2.81465K wps
[Epoch 68 Batch 1140/1540] avg loss 0.00153431, throughput 2.84545K wps
[Epoch 68 Batch 1170/1540] avg loss 0.00138349, throughput 2.81873K wps
[Epoch 68 Batch 1200/1540] avg loss 0.00140747, throughput 2.86449K wps
[Epoch 68 Batch 1230/1540] avg loss 0.00159407, throughput 2.87334K wps
[Epoch 68 Batch 1260/1540] avg loss 0.00124019, throughput 2.83877K wps
[Epoch 68 Batch 1290/1540] avg loss 0.0013837, throughput 2.88K wps
[Epoch 68 Batch 1320/1540] avg loss 0.0013709, throughput 2.87043K wps
[Epoch 68 Batch 1350/1540] avg loss 0.00134785, throughput 2.82007K wps
[Epoch 68 Batch 1380/1540] avg loss 0.00174242, throughput 2.86801K wps
[Epoch 68 Batch 1410/1540] avg loss 0.00128647, throughput 2.86212K wps
[Epoch 68 Batch 1440/1540] avg loss 0.00138818, throughput 2.87309K wps
[Epoch 68 Batch 1470/1540] avg loss 0.00119453, throughput 2.88283K wps
[Epoch 68 Batch 1500/1540] avg loss 0.00134178, throughput 2.8793K wps
[Epoch 68 Batch 1530/1540] avg loss 0.00151366, throughput 2.86484K wps
Begin Testing...
[Epoch 68] train avg loss 0.00136594, dev acc 0.8005, dev avg loss 0.730073, throughput 2.85813K wps
[Epoch 69 Batch 30/1540] avg loss 0.00108677, throughput 2.93316K wps
[Epoch 69 Batch 60/1540] avg loss 0.00134887, throughput 2.87616K wps
[Epoch 69 Batch 90/1540] avg loss 0.00123452, throughput 2.88014K wps
[Epoch 69 Batch 120/1540] avg loss 0.00115922, throughput 2.79294K wps
[Epoch 69 Batch 150/1540] avg loss 0.00120173, throughput 2.85999K wps
[Epoch 69 Batch 180/1540] avg loss 0.00133515, throughput 2.87775K wps
[Epoch 69 Batch 210/1540] avg loss 0.00130771, throughput 2.87925K wps
[Epoch 69 Batch 240/1540] avg loss 0.00115591, throughput 2.86455K wps
[Epoch 69 Batch 270/1540] avg loss 0.00136098, throughput 2.86961K wps
[Epoch 69 Batch 300/1540] avg loss 0.00159568, throughput 2.88481K wps
[Epoch 69 Batch 330/1540] avg loss 0.0010951, throughput 2.87791K wps
[Epoch 69 Batch 360/1540] avg loss 0.00108801, throughput 2.87676K wps
[Epoch 69 Batch 390/1540] avg loss 0.00119064, throughput 2.8744K wps
[Epoch 69 Batch 420/1540] avg loss 0.00130206, throughput 2.86436K wps
[Epoch 69 Batch 450/1540] avg loss 0.000966684, throughput 2.83045K wps
[Epoch 69 Batch 480/1540] avg loss 0.00122758, throughput 2.8812K wps
[Epoch 69 Batch 510/1540] avg loss 0.0012711, throughput 2.86265K wps
[Epoch 69 Batch 540/1540] avg loss 0.00124629, throughput 2.86758K wps
[Epoch 69 Batch 570/1540] avg loss 0.00133702, throughput 2.83996K wps
[Epoch 69 Batch 600/1540] avg loss 0.00153548, throughput 2.87091K wps
[Epoch 69 Batch 630/1540] avg loss 0.0011835, throughput 2.81232K wps
[Epoch 69 Batch 660/1540] avg loss 0.00145778, throughput 2.86717K wps
[Epoch 69 Batch 690/1540] avg loss 0.0014978, throughput 2.81336K wps
[Epoch 69 Batch 720/1540] avg loss 0.00138051, throughput 2.80532K wps
[Epoch 69 Batch 750/1540] avg loss 0.00164519, throughput 2.86651K wps
[Epoch 69 Batch 780/1540] avg loss 0.00156605, throughput 2.8027K wps
[Epoch 69 Batch 810/1540] avg loss 0.00134563, throughput 2.86924K wps
[Epoch 69 Batch 840/1540] avg loss 0.00136487, throughput 2.88479K wps
[Epoch 69 Batch 870/1540] avg loss 0.00128929, throughput 2.87521K wps
[Epoch 69 Batch 900/1540] avg loss 0.00115969, throughput 2.86471K wps
[Epoch 69 Batch 930/1540] avg loss 0.00139723, throughput 2.8698K wps
[Epoch 69 Batch 960/1540] avg loss 0.00113859, throughput 2.86811K wps
[Epoch 69 Batch 990/1540] avg loss 0.00159912, throughput 2.80308K wps
[Epoch 69 Batch 1020/1540] avg loss 0.00145419, throughput 2.86868K wps
[Epoch 69 Batch 1050/1540] avg loss 0.00120267, throughput 2.83571K wps
[Epoch 69 Batch 1080/1540] avg loss 0.00146918, throughput 2.87051K wps
[Epoch 69 Batch 1110/1540] avg loss 0.0015798, throughput 2.87018K wps
[Epoch 69 Batch 1140/1540] avg loss 0.00136862, throughput 2.86393K wps
[Epoch 69 Batch 1170/1540] avg loss 0.00149289, throughput 2.84741K wps