Namespace(batch_size=50, data_name='MR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='rand')
Use gpu0
Downloading data/mr/all-7606efec.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/mr/all-7606efec.zip...
maximum length (in tokens): 56
Done! Tokenizing Time=0.22s, #Sentences=10662
SentimentNet(
  (embedding): Embedding(18768 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/173] avg loss 0.013858, throughput 0.762433K wps
[Epoch 0 Batch 60/173] avg loss 0.0138496, throughput 2.79303K wps
[Epoch 0 Batch 90/173] avg loss 0.0138834, throughput 2.77536K wps
[Epoch 0 Batch 120/173] avg loss 0.0138722, throughput 2.82234K wps
[Epoch 0 Batch 150/173] avg loss 0.0138735, throughput 2.79989K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138898, dev acc 0.5422, dev avg loss 0.69218, throughput 1.51929K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138447, throughput 2.83353K wps
[Epoch 1 Batch 60/173] avg loss 0.0138189, throughput 2.79188K wps
[Epoch 1 Batch 90/173] avg loss 0.0138579, throughput 2.83003K wps
[Epoch 1 Batch 120/173] avg loss 0.0138425, throughput 2.83742K wps
[Epoch 1 Batch 150/173] avg loss 0.0138478, throughput 2.84627K wps
Begin Testing...
[Epoch 1] train avg loss 0.0138566, dev acc 0.5047, dev avg loss 0.691591, throughput 2.82536K wps
[Epoch 2 Batch 30/173] avg loss 0.0137782, throughput 2.82725K wps
[Epoch 2 Batch 60/173] avg loss 0.0138161, throughput 2.8143K wps
[Epoch 2 Batch 90/173] avg loss 0.0137921, throughput 2.81419K wps
[Epoch 2 Batch 120/173] avg loss 0.0138244, throughput 2.81153K wps
[Epoch 2 Batch 150/173] avg loss 0.0137816, throughput 2.82565K wps
Begin Testing...
[Epoch 2] train avg loss 0.0138207, dev acc 0.5766, dev avg loss 0.689835, throughput 2.81958K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0138143, throughput 2.87501K wps
[Epoch 3 Batch 60/173] avg loss 0.0137879, throughput 2.81266K wps
[Epoch 3 Batch 90/173] avg loss 0.0137789, throughput 2.86306K wps
[Epoch 3 Batch 120/173] avg loss 0.0137555, throughput 2.81338K wps
[Epoch 3 Batch 150/173] avg loss 0.0137677, throughput 2.81463K wps
Begin Testing...
[Epoch 3] train avg loss 0.0137984, dev acc 0.5600, dev avg loss 0.688921, throughput 2.83038K wps
[Epoch 4 Batch 30/173] avg loss 0.0137649, throughput 2.88036K wps
[Epoch 4 Batch 60/173] avg loss 0.0137381, throughput 2.82196K wps
[Epoch 4 Batch 90/173] avg loss 0.0137586, throughput 2.83709K wps
[Epoch 4 Batch 120/173] avg loss 0.0137859, throughput 2.84627K wps
[Epoch 4 Batch 150/173] avg loss 0.0137141, throughput 2.84843K wps
Begin Testing...
[Epoch 4] train avg loss 0.0137753, dev acc 0.5839, dev avg loss 0.686919, throughput 2.84992K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0137411, throughput 2.93813K wps
[Epoch 5 Batch 60/173] avg loss 0.0136844, throughput 2.79729K wps
[Epoch 5 Batch 90/173] avg loss 0.0137189, throughput 2.78059K wps
[Epoch 5 Batch 120/173] avg loss 0.0137056, throughput 2.85769K wps
[Epoch 5 Batch 150/173] avg loss 0.0137215, throughput 2.86254K wps
Begin Testing...
[Epoch 5] train avg loss 0.0137371, dev acc 0.5912, dev avg loss 0.685388, throughput 2.84576K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0136487, throughput 2.8991K wps
[Epoch 6 Batch 60/173] avg loss 0.0136887, throughput 2.84304K wps
[Epoch 6 Batch 90/173] avg loss 0.0137035, throughput 2.82467K wps
[Epoch 6 Batch 120/173] avg loss 0.0136893, throughput 2.84226K wps
[Epoch 6 Batch 150/173] avg loss 0.0136893, throughput 2.78838K wps
Begin Testing...
[Epoch 6] train avg loss 0.0137062, dev acc 0.5829, dev avg loss 0.68377, throughput 2.8394K wps
[Epoch 7 Batch 30/173] avg loss 0.0136731, throughput 2.84974K wps
[Epoch 7 Batch 60/173] avg loss 0.0136708, throughput 2.82122K wps
[Epoch 7 Batch 90/173] avg loss 0.0136266, throughput 2.82213K wps
[Epoch 7 Batch 120/173] avg loss 0.0136371, throughput 2.81602K wps
[Epoch 7 Batch 150/173] avg loss 0.0136621, throughput 2.79375K wps
Begin Testing...
[Epoch 7] train avg loss 0.0136738, dev acc 0.5902, dev avg loss 0.682368, throughput 2.8177K wps
[Epoch 8 Batch 30/173] avg loss 0.0136065, throughput 2.88073K wps
[Epoch 8 Batch 60/173] avg loss 0.0136484, throughput 2.82679K wps
[Epoch 8 Batch 90/173] avg loss 0.0136317, throughput 2.78794K wps
[Epoch 8 Batch 120/173] avg loss 0.0136507, throughput 2.83045K wps
[Epoch 8 Batch 150/173] avg loss 0.0136405, throughput 2.79362K wps
Begin Testing...
[Epoch 8] train avg loss 0.0136478, dev acc 0.5808, dev avg loss 0.681027, throughput 2.82163K wps
[Epoch 9 Batch 30/173] avg loss 0.0136628, throughput 2.85078K wps
[Epoch 9 Batch 60/173] avg loss 0.0136173, throughput 2.80469K wps
[Epoch 9 Batch 90/173] avg loss 0.013673, throughput 2.84321K wps
[Epoch 9 Batch 120/173] avg loss 0.0135095, throughput 2.81161K wps
[Epoch 9 Batch 150/173] avg loss 0.0136111, throughput 2.80942K wps
Begin Testing...
[Epoch 9] train avg loss 0.0136241, dev acc 0.5819, dev avg loss 0.679631, throughput 2.81984K wps
[Epoch 10 Batch 30/173] avg loss 0.0135318, throughput 2.86662K wps
[Epoch 10 Batch 60/173] avg loss 0.0135711, throughput 2.82626K wps
[Epoch 10 Batch 90/173] avg loss 0.013652, throughput 2.83093K wps
[Epoch 10 Batch 120/173] avg loss 0.0134934, throughput 2.85345K wps
[Epoch 10 Batch 150/173] avg loss 0.0135837, throughput 2.85902K wps
Begin Testing...
[Epoch 10] train avg loss 0.0135883, dev acc 0.5902, dev avg loss 0.679213, throughput 2.83681K wps
[Epoch 11 Batch 30/173] avg loss 0.0135771, throughput 2.86138K wps
[Epoch 11 Batch 60/173] avg loss 0.0135672, throughput 2.83241K wps
[Epoch 11 Batch 90/173] avg loss 0.0135108, throughput 2.8306K wps
[Epoch 11 Batch 120/173] avg loss 0.0135247, throughput 2.85073K wps
[Epoch 11 Batch 150/173] avg loss 0.0135443, throughput 2.84801K wps
Begin Testing...
[Epoch 11] train avg loss 0.0135647, dev acc 0.5850, dev avg loss 0.677075, throughput 2.84191K wps
[Epoch 12 Batch 30/173] avg loss 0.0134818, throughput 2.85652K wps
[Epoch 12 Batch 60/173] avg loss 0.0134947, throughput 2.85453K wps
[Epoch 12 Batch 90/173] avg loss 0.0134501, throughput 2.83772K wps
[Epoch 12 Batch 120/173] avg loss 0.0135248, throughput 2.83518K wps
[Epoch 12 Batch 150/173] avg loss 0.0134445, throughput 2.8326K wps
Begin Testing...
[Epoch 12] train avg loss 0.0135126, dev acc 0.5912, dev avg loss 0.675472, throughput 2.8477K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0134934, throughput 2.93718K wps
[Epoch 13 Batch 60/173] avg loss 0.0134772, throughput 2.85952K wps
[Epoch 13 Batch 90/173] avg loss 0.0134846, throughput 2.82341K wps
[Epoch 13 Batch 120/173] avg loss 0.0134049, throughput 2.84997K wps
[Epoch 13 Batch 150/173] avg loss 0.0134738, throughput 2.86765K wps
Begin Testing...
[Epoch 13] train avg loss 0.0134702, dev acc 0.5954, dev avg loss 0.67374, throughput 2.86205K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.0135191, throughput 2.9097K wps
[Epoch 14 Batch 60/173] avg loss 0.0134871, throughput 2.87342K wps
[Epoch 14 Batch 90/173] avg loss 0.0134244, throughput 2.86084K wps
[Epoch 14 Batch 120/173] avg loss 0.0133643, throughput 2.86418K wps
[Epoch 14 Batch 150/173] avg loss 0.0134448, throughput 2.87526K wps
Begin Testing...
[Epoch 14] train avg loss 0.0134617, dev acc 0.5975, dev avg loss 0.672423, throughput 2.87689K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.0133662, throughput 2.88074K wps
[Epoch 15 Batch 60/173] avg loss 0.013414, throughput 2.80659K wps
[Epoch 15 Batch 90/173] avg loss 0.0134034, throughput 2.78849K wps
[Epoch 15 Batch 120/173] avg loss 0.0134373, throughput 2.79753K wps
[Epoch 15 Batch 150/173] avg loss 0.0132903, throughput 2.82186K wps
Begin Testing...
[Epoch 15] train avg loss 0.0133975, dev acc 0.6017, dev avg loss 0.670618, throughput 2.81678K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0133347, throughput 2.89711K wps
[Epoch 16 Batch 60/173] avg loss 0.0132968, throughput 2.83123K wps
[Epoch 16 Batch 90/173] avg loss 0.0133784, throughput 2.86253K wps
[Epoch 16 Batch 120/173] avg loss 0.0134435, throughput 2.85705K wps
[Epoch 16 Batch 150/173] avg loss 0.0133243, throughput 2.8678K wps
Begin Testing...
[Epoch 16] train avg loss 0.0133723, dev acc 0.6006, dev avg loss 0.668996, throughput 2.86339K wps
[Epoch 17 Batch 30/173] avg loss 0.0132163, throughput 2.94108K wps
[Epoch 17 Batch 60/173] avg loss 0.0134144, throughput 2.83627K wps
[Epoch 17 Batch 90/173] avg loss 0.0132425, throughput 2.82779K wps
[Epoch 17 Batch 120/173] avg loss 0.0133472, throughput 2.81064K wps
[Epoch 17 Batch 150/173] avg loss 0.0132521, throughput 2.80339K wps
Begin Testing...
[Epoch 17] train avg loss 0.0133049, dev acc 0.6048, dev avg loss 0.667247, throughput 2.83487K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.0133428, throughput 2.86247K wps
[Epoch 18 Batch 60/173] avg loss 0.0132422, throughput 2.77369K wps
[Epoch 18 Batch 90/173] avg loss 0.0132231, throughput 2.82482K wps
[Epoch 18 Batch 120/173] avg loss 0.0132252, throughput 2.79923K wps
[Epoch 18 Batch 150/173] avg loss 0.0133112, throughput 2.81078K wps
Begin Testing...
[Epoch 18] train avg loss 0.0132943, dev acc 0.5985, dev avg loss 0.66658, throughput 2.80645K wps
[Epoch 19 Batch 30/173] avg loss 0.0132426, throughput 2.86364K wps
[Epoch 19 Batch 60/173] avg loss 0.0132867, throughput 2.78265K wps
[Epoch 19 Batch 90/173] avg loss 0.01327, throughput 2.84042K wps
[Epoch 19 Batch 120/173] avg loss 0.0131207, throughput 2.82015K wps
[Epoch 19 Batch 150/173] avg loss 0.0131522, throughput 2.84182K wps
Begin Testing...
[Epoch 19] train avg loss 0.01324, dev acc 0.6069, dev avg loss 0.664014, throughput 2.82469K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/173] avg loss 0.0131864, throughput 2.93899K wps
[Epoch 20 Batch 60/173] avg loss 0.0131016, throughput 2.87168K wps
[Epoch 20 Batch 90/173] avg loss 0.0132824, throughput 2.85806K wps
[Epoch 20 Batch 120/173] avg loss 0.0131796, throughput 2.85686K wps
[Epoch 20 Batch 150/173] avg loss 0.0132, throughput 2.84666K wps
Begin Testing...
[Epoch 20] train avg loss 0.0131956, dev acc 0.6163, dev avg loss 0.66184, throughput 2.86432K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/173] avg loss 0.0131521, throughput 2.91728K wps
[Epoch 21 Batch 60/173] avg loss 0.013115, throughput 2.85299K wps
[Epoch 21 Batch 90/173] avg loss 0.0131104, throughput 2.80085K wps
[Epoch 21 Batch 120/173] avg loss 0.0130376, throughput 2.87797K wps
[Epoch 21 Batch 150/173] avg loss 0.0131251, throughput 2.87154K wps
Begin Testing...
[Epoch 21] train avg loss 0.0131229, dev acc 0.6173, dev avg loss 0.659579, throughput 2.86443K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.0129512, throughput 2.93767K wps
[Epoch 22 Batch 60/173] avg loss 0.0130849, throughput 2.82667K wps
[Epoch 22 Batch 90/173] avg loss 0.0131647, throughput 2.85706K wps
[Epoch 22 Batch 120/173] avg loss 0.013079, throughput 2.8537K wps
[Epoch 22 Batch 150/173] avg loss 0.0129188, throughput 2.81473K wps
Begin Testing...
[Epoch 22] train avg loss 0.0130697, dev acc 0.6173, dev avg loss 0.657897, throughput 2.85306K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/173] avg loss 0.0129347, throughput 2.81637K wps
[Epoch 23 Batch 60/173] avg loss 0.0128498, throughput 2.79162K wps
[Epoch 23 Batch 90/173] avg loss 0.0131128, throughput 2.78956K wps
[Epoch 23 Batch 120/173] avg loss 0.0129352, throughput 2.81724K wps
[Epoch 23 Batch 150/173] avg loss 0.0129007, throughput 2.80051K wps
Begin Testing...
[Epoch 23] train avg loss 0.0129677, dev acc 0.6152, dev avg loss 0.65606, throughput 2.80967K wps
[Epoch 24 Batch 30/173] avg loss 0.0129382, throughput 2.94473K wps
[Epoch 24 Batch 60/173] avg loss 0.0129373, throughput 2.87612K wps
[Epoch 24 Batch 90/173] avg loss 0.0129668, throughput 2.87711K wps
[Epoch 24 Batch 120/173] avg loss 0.0130905, throughput 2.87184K wps
[Epoch 24 Batch 150/173] avg loss 0.0128588, throughput 2.83659K wps
Begin Testing...
[Epoch 24] train avg loss 0.012958, dev acc 0.6277, dev avg loss 0.652918, throughput 2.87249K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.0130018, throughput 2.92795K wps
[Epoch 25 Batch 60/173] avg loss 0.0127448, throughput 2.85109K wps
[Epoch 25 Batch 90/173] avg loss 0.0128802, throughput 2.81428K wps
[Epoch 25 Batch 120/173] avg loss 0.0128509, throughput 2.8283K wps
[Epoch 25 Batch 150/173] avg loss 0.0127495, throughput 2.80297K wps
Begin Testing...
[Epoch 25] train avg loss 0.0128703, dev acc 0.6350, dev avg loss 0.650426, throughput 2.83684K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.0128737, throughput 2.86179K wps
[Epoch 26 Batch 60/173] avg loss 0.0128789, throughput 2.81829K wps
[Epoch 26 Batch 90/173] avg loss 0.0126877, throughput 2.80239K wps
[Epoch 26 Batch 120/173] avg loss 0.0127662, throughput 2.84039K wps
[Epoch 26 Batch 150/173] avg loss 0.0127523, throughput 2.80538K wps
Begin Testing...
[Epoch 26] train avg loss 0.0128087, dev acc 0.6288, dev avg loss 0.648372, throughput 2.82385K wps
[Epoch 27 Batch 30/173] avg loss 0.0127802, throughput 2.86895K wps
[Epoch 27 Batch 60/173] avg loss 0.0127903, throughput 2.79538K wps
[Epoch 27 Batch 90/173] avg loss 0.012767, throughput 2.82843K wps
[Epoch 27 Batch 120/173] avg loss 0.0128405, throughput 2.8531K wps
[Epoch 27 Batch 150/173] avg loss 0.0125621, throughput 2.83516K wps
Begin Testing...
[Epoch 27] train avg loss 0.0127629, dev acc 0.6330, dev avg loss 0.64615, throughput 2.83825K wps
[Epoch 28 Batch 30/173] avg loss 0.0126, throughput 2.94126K wps
[Epoch 28 Batch 60/173] avg loss 0.0126406, throughput 2.86966K wps
[Epoch 28 Batch 90/173] avg loss 0.0126047, throughput 2.87088K wps
[Epoch 28 Batch 120/173] avg loss 0.0128322, throughput 2.87456K wps
[Epoch 28 Batch 150/173] avg loss 0.0125731, throughput 2.87009K wps
Begin Testing...
[Epoch 28] train avg loss 0.0126636, dev acc 0.6434, dev avg loss 0.642774, throughput 2.87931K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.012581, throughput 2.92737K wps
[Epoch 29 Batch 60/173] avg loss 0.0126474, throughput 2.83175K wps
[Epoch 29 Batch 90/173] avg loss 0.0126519, throughput 2.77202K wps
[Epoch 29 Batch 120/173] avg loss 0.0125789, throughput 2.79702K wps
[Epoch 29 Batch 150/173] avg loss 0.012517, throughput 2.83076K wps
Begin Testing...
[Epoch 29] train avg loss 0.0126127, dev acc 0.6423, dev avg loss 0.640157, throughput 2.82758K wps
[Epoch 30 Batch 30/173] avg loss 0.0125573, throughput 2.86426K wps
[Epoch 30 Batch 60/173] avg loss 0.0125171, throughput 2.74151K wps
[Epoch 30 Batch 90/173] avg loss 0.0124179, throughput 2.79426K wps
[Epoch 30 Batch 120/173] avg loss 0.0124316, throughput 2.77382K wps
[Epoch 30 Batch 150/173] avg loss 0.012546, throughput 2.81879K wps
Begin Testing...
[Epoch 30] train avg loss 0.012486, dev acc 0.6538, dev avg loss 0.638809, throughput 2.79954K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/173] avg loss 0.0123892, throughput 2.87259K wps
[Epoch 31 Batch 60/173] avg loss 0.0124444, throughput 2.83109K wps
[Epoch 31 Batch 90/173] avg loss 0.0122544, throughput 2.80389K wps
[Epoch 31 Batch 120/173] avg loss 0.0124181, throughput 2.7875K wps
[Epoch 31 Batch 150/173] avg loss 0.0126043, throughput 2.84444K wps
Begin Testing...
[Epoch 31] train avg loss 0.0124664, dev acc 0.6611, dev avg loss 0.634368, throughput 2.83272K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.012314, throughput 2.9194K wps
[Epoch 32 Batch 60/173] avg loss 0.0123579, throughput 2.84725K wps
[Epoch 32 Batch 90/173] avg loss 0.0123689, throughput 2.84264K wps
[Epoch 32 Batch 120/173] avg loss 0.0125033, throughput 2.78425K wps
[Epoch 32 Batch 150/173] avg loss 0.0122131, throughput 2.76528K wps
Begin Testing...
[Epoch 32] train avg loss 0.0123723, dev acc 0.6621, dev avg loss 0.631311, throughput 2.83074K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/173] avg loss 0.0122469, throughput 2.87305K wps
[Epoch 33 Batch 60/173] avg loss 0.0122004, throughput 2.8201K wps
[Epoch 33 Batch 90/173] avg loss 0.0121345, throughput 2.83496K wps
[Epoch 33 Batch 120/173] avg loss 0.0122122, throughput 2.84116K wps
[Epoch 33 Batch 150/173] avg loss 0.0124643, throughput 2.83955K wps
Begin Testing...
[Epoch 33] train avg loss 0.0122679, dev acc 0.6705, dev avg loss 0.630289, throughput 2.84199K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.0122934, throughput 2.86332K wps
[Epoch 34 Batch 60/173] avg loss 0.0121103, throughput 2.8098K wps
[Epoch 34 Batch 90/173] avg loss 0.0122489, throughput 2.77717K wps
[Epoch 34 Batch 120/173] avg loss 0.0121176, throughput 2.77783K wps
[Epoch 34 Batch 150/173] avg loss 0.0121982, throughput 2.75004K wps
Begin Testing...
[Epoch 34] train avg loss 0.0122037, dev acc 0.6674, dev avg loss 0.62594, throughput 2.79478K wps
[Epoch 35 Batch 30/173] avg loss 0.0122618, throughput 2.8664K wps
[Epoch 35 Batch 60/173] avg loss 0.0121416, throughput 2.85317K wps
[Epoch 35 Batch 90/173] avg loss 0.0119672, throughput 2.7785K wps
[Epoch 35 Batch 120/173] avg loss 0.0119667, throughput 2.78181K wps
[Epoch 35 Batch 150/173] avg loss 0.0120108, throughput 2.7915K wps
Begin Testing...
[Epoch 35] train avg loss 0.0120902, dev acc 0.6684, dev avg loss 0.621378, throughput 2.81196K wps
[Epoch 36 Batch 30/173] avg loss 0.0119505, throughput 2.85784K wps
[Epoch 36 Batch 60/173] avg loss 0.0120262, throughput 2.82871K wps
[Epoch 36 Batch 90/173] avg loss 0.0119942, throughput 2.85106K wps
[Epoch 36 Batch 120/173] avg loss 0.0120489, throughput 2.84755K wps
[Epoch 36 Batch 150/173] avg loss 0.0119616, throughput 2.85951K wps
Begin Testing...
[Epoch 36] train avg loss 0.0120167, dev acc 0.6705, dev avg loss 0.618352, throughput 2.84851K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/173] avg loss 0.0117772, throughput 2.89367K wps
[Epoch 37 Batch 60/173] avg loss 0.012061, throughput 2.88041K wps
[Epoch 37 Batch 90/173] avg loss 0.0119101, throughput 2.84421K wps
[Epoch 37 Batch 120/173] avg loss 0.0119083, throughput 2.82538K wps
[Epoch 37 Batch 150/173] avg loss 0.0116779, throughput 2.86788K wps
Begin Testing...
[Epoch 37] train avg loss 0.0119086, dev acc 0.6788, dev avg loss 0.61476, throughput 2.86161K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/173] avg loss 0.0116992, throughput 2.8921K wps
[Epoch 38 Batch 60/173] avg loss 0.0117128, throughput 2.82853K wps
[Epoch 38 Batch 90/173] avg loss 0.0119333, throughput 2.81985K wps
[Epoch 38 Batch 120/173] avg loss 0.0119745, throughput 2.81597K wps
[Epoch 38 Batch 150/173] avg loss 0.0116262, throughput 2.82154K wps
Begin Testing...
[Epoch 38] train avg loss 0.0117892, dev acc 0.6820, dev avg loss 0.610994, throughput 2.83434K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.0117576, throughput 2.8398K wps
[Epoch 39 Batch 60/173] avg loss 0.0116196, throughput 2.76717K wps
[Epoch 39 Batch 90/173] avg loss 0.0115376, throughput 2.80029K wps
[Epoch 39 Batch 120/173] avg loss 0.0118645, throughput 2.76185K wps
[Epoch 39 Batch 150/173] avg loss 0.0114711, throughput 2.81771K wps
Begin Testing...
[Epoch 39] train avg loss 0.0116802, dev acc 0.6966, dev avg loss 0.608097, throughput 2.80355K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/173] avg loss 0.0115817, throughput 2.90773K wps
[Epoch 40 Batch 60/173] avg loss 0.0114582, throughput 2.87105K wps
[Epoch 40 Batch 90/173] avg loss 0.0114915, throughput 2.84442K wps
[Epoch 40 Batch 120/173] avg loss 0.0115942, throughput 2.86819K wps
[Epoch 40 Batch 150/173] avg loss 0.0115432, throughput 2.82321K wps
Begin Testing...
[Epoch 40] train avg loss 0.0115475, dev acc 0.6934, dev avg loss 0.602782, throughput 2.86322K wps
[Epoch 41 Batch 30/173] avg loss 0.0113578, throughput 2.90167K wps
[Epoch 41 Batch 60/173] avg loss 0.0113838, throughput 2.8414K wps
[Epoch 41 Batch 90/173] avg loss 0.0115125, throughput 2.83155K wps
[Epoch 41 Batch 120/173] avg loss 0.011407, throughput 2.81586K wps
[Epoch 41 Batch 150/173] avg loss 0.0113664, throughput 2.82739K wps
Begin Testing...
[Epoch 41] train avg loss 0.0114453, dev acc 0.6882, dev avg loss 0.600322, throughput 2.84122K wps
[Epoch 42 Batch 30/173] avg loss 0.0112032, throughput 2.86589K wps
[Epoch 42 Batch 60/173] avg loss 0.0113178, throughput 2.82529K wps
[Epoch 42 Batch 90/173] avg loss 0.0112512, throughput 2.79947K wps
[Epoch 42 Batch 120/173] avg loss 0.0113452, throughput 2.79559K wps
[Epoch 42 Batch 150/173] avg loss 0.0112048, throughput 2.82014K wps
Begin Testing...
[Epoch 42] train avg loss 0.0112701, dev acc 0.6997, dev avg loss 0.593351, throughput 2.8155K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/173] avg loss 0.0111501, throughput 2.895K wps
[Epoch 43 Batch 60/173] avg loss 0.0111514, throughput 2.82007K wps
[Epoch 43 Batch 90/173] avg loss 0.0111877, throughput 2.80141K wps
[Epoch 43 Batch 120/173] avg loss 0.0111339, throughput 2.7659K wps
[Epoch 43 Batch 150/173] avg loss 0.0110888, throughput 2.7911K wps
Begin Testing...
[Epoch 43] train avg loss 0.0111506, dev acc 0.7007, dev avg loss 0.588458, throughput 2.82063K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/173] avg loss 0.0110519, throughput 2.92558K wps
[Epoch 44 Batch 60/173] avg loss 0.0109655, throughput 2.83079K wps
[Epoch 44 Batch 90/173] avg loss 0.0110673, throughput 2.84174K wps
[Epoch 44 Batch 120/173] avg loss 0.0110981, throughput 2.86743K wps
[Epoch 44 Batch 150/173] avg loss 0.0108446, throughput 2.84226K wps
Begin Testing...
[Epoch 44] train avg loss 0.0109834, dev acc 0.7091, dev avg loss 0.584133, throughput 2.85928K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/173] avg loss 0.0109133, throughput 2.88495K wps
[Epoch 45 Batch 60/173] avg loss 0.0108108, throughput 2.84918K wps
[Epoch 45 Batch 90/173] avg loss 0.0109239, throughput 2.83224K wps
[Epoch 45 Batch 120/173] avg loss 0.0109531, throughput 2.79935K wps
[Epoch 45 Batch 150/173] avg loss 0.0106193, throughput 2.8237K wps
Begin Testing...
[Epoch 45] train avg loss 0.0108617, dev acc 0.7122, dev avg loss 0.579403, throughput 2.82872K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/173] avg loss 0.0108413, throughput 2.91204K wps
[Epoch 46 Batch 60/173] avg loss 0.0108481, throughput 2.86967K wps
[Epoch 46 Batch 90/173] avg loss 0.0105969, throughput 2.82418K wps
[Epoch 46 Batch 120/173] avg loss 0.0107021, throughput 2.82986K wps
[Epoch 46 Batch 150/173] avg loss 0.0105239, throughput 2.78903K wps
Begin Testing...
[Epoch 46] train avg loss 0.01073, dev acc 0.7101, dev avg loss 0.573997, throughput 2.83126K wps
[Epoch 47 Batch 30/173] avg loss 0.0102763, throughput 2.86943K wps
[Epoch 47 Batch 60/173] avg loss 0.0105362, throughput 2.847K wps
[Epoch 47 Batch 90/173] avg loss 0.0105802, throughput 2.81188K wps
[Epoch 47 Batch 120/173] avg loss 0.0106278, throughput 2.84942K wps
[Epoch 47 Batch 150/173] avg loss 0.0105746, throughput 2.7711K wps
Begin Testing...
[Epoch 47] train avg loss 0.0105047, dev acc 0.7091, dev avg loss 0.569951, throughput 2.82396K wps
[Epoch 48 Batch 30/173] avg loss 0.0103775, throughput 2.89806K wps
[Epoch 48 Batch 60/173] avg loss 0.0103894, throughput 2.85863K wps
[Epoch 48 Batch 90/173] avg loss 0.0104147, throughput 2.82972K wps
[Epoch 48 Batch 120/173] avg loss 0.0103617, throughput 2.85075K wps
[Epoch 48 Batch 150/173] avg loss 0.0104161, throughput 2.85204K wps
Begin Testing...
[Epoch 48] train avg loss 0.0103821, dev acc 0.7164, dev avg loss 0.563885, throughput 2.85051K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/173] avg loss 0.010285, throughput 2.92778K wps
[Epoch 49 Batch 60/173] avg loss 0.0101716, throughput 2.83842K wps
[Epoch 49 Batch 90/173] avg loss 0.0101329, throughput 2.80002K wps
[Epoch 49 Batch 120/173] avg loss 0.00994213, throughput 2.833K wps
[Epoch 49 Batch 150/173] avg loss 0.010308, throughput 2.83958K wps
Begin Testing...
[Epoch 49] train avg loss 0.0101861, dev acc 0.7404, dev avg loss 0.560171, throughput 2.84601K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/173] avg loss 0.00986438, throughput 2.89123K wps
[Epoch 50 Batch 60/173] avg loss 0.0100241, throughput 2.78586K wps
[Epoch 50 Batch 90/173] avg loss 0.00993399, throughput 2.82807K wps
[Epoch 50 Batch 120/173] avg loss 0.0101724, throughput 2.78747K wps
[Epoch 50 Batch 150/173] avg loss 0.0101361, throughput 2.82571K wps
Begin Testing...
[Epoch 50] train avg loss 0.0100675, dev acc 0.7445, dev avg loss 0.559735, throughput 2.82248K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/173] avg loss 0.00947852, throughput 2.89659K wps
[Epoch 51 Batch 60/173] avg loss 0.00979739, throughput 2.75455K wps
[Epoch 51 Batch 90/173] avg loss 0.0101167, throughput 2.7622K wps
[Epoch 51 Batch 120/173] avg loss 0.00979585, throughput 2.80788K wps
[Epoch 51 Batch 150/173] avg loss 0.00982677, throughput 2.83035K wps
Begin Testing...
[Epoch 51] train avg loss 0.00987029, dev acc 0.7435, dev avg loss 0.551499, throughput 2.81066K wps
[Epoch 52 Batch 30/173] avg loss 0.00972131, throughput 2.8769K wps
[Epoch 52 Batch 60/173] avg loss 0.00978643, throughput 2.8436K wps
[Epoch 52 Batch 90/173] avg loss 0.00957626, throughput 2.84734K wps
[Epoch 52 Batch 120/173] avg loss 0.00993765, throughput 2.83047K wps
[Epoch 52 Batch 150/173] avg loss 0.00951943, throughput 2.81223K wps
Begin Testing...
[Epoch 52] train avg loss 0.00970485, dev acc 0.7372, dev avg loss 0.544785, throughput 2.83065K wps
[Epoch 53 Batch 30/173] avg loss 0.00954047, throughput 2.86373K wps
[Epoch 53 Batch 60/173] avg loss 0.0094582, throughput 2.87255K wps
[Epoch 53 Batch 90/173] avg loss 0.00932399, throughput 2.87269K wps
[Epoch 53 Batch 120/173] avg loss 0.00918729, throughput 2.87796K wps
[Epoch 53 Batch 150/173] avg loss 0.00953964, throughput 2.85571K wps
Begin Testing...
[Epoch 53] train avg loss 0.00946817, dev acc 0.7351, dev avg loss 0.539095, throughput 2.86734K wps
[Epoch 54 Batch 30/173] avg loss 0.00921399, throughput 2.82324K wps
[Epoch 54 Batch 60/173] avg loss 0.00955666, throughput 2.76707K wps
[Epoch 54 Batch 90/173] avg loss 0.00962566, throughput 2.82328K wps
[Epoch 54 Batch 120/173] avg loss 0.00909644, throughput 2.78294K wps
[Epoch 54 Batch 150/173] avg loss 0.00913621, throughput 2.83901K wps
Begin Testing...
[Epoch 54] train avg loss 0.00932897, dev acc 0.7497, dev avg loss 0.537563, throughput 2.80821K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/173] avg loss 0.00894636, throughput 2.88883K wps
[Epoch 55 Batch 60/173] avg loss 0.00919017, throughput 2.87058K wps
[Epoch 55 Batch 90/173] avg loss 0.00929837, throughput 2.81403K wps
[Epoch 55 Batch 120/173] avg loss 0.00897295, throughput 2.79308K wps
[Epoch 55 Batch 150/173] avg loss 0.00921189, throughput 2.84668K wps
Begin Testing...
[Epoch 55] train avg loss 0.00915892, dev acc 0.7508, dev avg loss 0.530376, throughput 2.84381K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/173] avg loss 0.00909759, throughput 2.89226K wps
[Epoch 56 Batch 60/173] avg loss 0.00886636, throughput 2.85397K wps
[Epoch 56 Batch 90/173] avg loss 0.00897176, throughput 2.85759K wps
[Epoch 56 Batch 120/173] avg loss 0.00875004, throughput 2.86856K wps
[Epoch 56 Batch 150/173] avg loss 0.00890541, throughput 2.81987K wps
Begin Testing...
[Epoch 56] train avg loss 0.00891004, dev acc 0.7435, dev avg loss 0.524942, throughput 2.85491K wps
[Epoch 57 Batch 30/173] avg loss 0.00884751, throughput 2.88498K wps
[Epoch 57 Batch 60/173] avg loss 0.00867285, throughput 2.84107K wps
[Epoch 57 Batch 90/173] avg loss 0.00852449, throughput 2.79731K wps
[Epoch 57 Batch 120/173] avg loss 0.00865284, throughput 2.78065K wps
[Epoch 57 Batch 150/173] avg loss 0.00875395, throughput 2.77926K wps
Begin Testing...
[Epoch 57] train avg loss 0.00866548, dev acc 0.7570, dev avg loss 0.524794, throughput 2.8192K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/173] avg loss 0.008396, throughput 2.84244K wps
[Epoch 58 Batch 60/173] avg loss 0.0087398, throughput 2.83203K wps
[Epoch 58 Batch 90/173] avg loss 0.00858158, throughput 2.75378K wps
[Epoch 58 Batch 120/173] avg loss 0.00858318, throughput 2.8251K wps
[Epoch 58 Batch 150/173] avg loss 0.00860381, throughput 2.80327K wps
Begin Testing...
[Epoch 58] train avg loss 0.00854979, dev acc 0.7404, dev avg loss 0.517342, throughput 2.81465K wps
[Epoch 59 Batch 30/173] avg loss 0.00825582, throughput 2.86287K wps
[Epoch 59 Batch 60/173] avg loss 0.00838383, throughput 2.84834K wps
[Epoch 59 Batch 90/173] avg loss 0.00828314, throughput 2.80032K wps
[Epoch 59 Batch 120/173] avg loss 0.00813999, throughput 2.8515K wps
[Epoch 59 Batch 150/173] avg loss 0.00830236, throughput 2.82865K wps
Begin Testing...
[Epoch 59] train avg loss 0.00831674, dev acc 0.7487, dev avg loss 0.513146, throughput 2.83105K wps
[Epoch 60 Batch 30/173] avg loss 0.00813958, throughput 2.87878K wps
[Epoch 60 Batch 60/173] avg loss 0.00844412, throughput 2.80211K wps
[Epoch 60 Batch 90/173] avg loss 0.00804438, throughput 2.82917K wps
[Epoch 60 Batch 120/173] avg loss 0.00824252, throughput 2.87512K wps
[Epoch 60 Batch 150/173] avg loss 0.00803406, throughput 2.82136K wps
Begin Testing...
[Epoch 60] train avg loss 0.00818787, dev acc 0.7550, dev avg loss 0.510735, throughput 2.84434K wps
[Epoch 61 Batch 30/173] avg loss 0.00804397, throughput 2.89971K wps
[Epoch 61 Batch 60/173] avg loss 0.00800026, throughput 2.84552K wps
[Epoch 61 Batch 90/173] avg loss 0.00811896, throughput 2.84653K wps
[Epoch 61 Batch 120/173] avg loss 0.00794034, throughput 2.82467K wps
[Epoch 61 Batch 150/173] avg loss 0.00801316, throughput 2.81427K wps
Begin Testing...
[Epoch 61] train avg loss 0.00799783, dev acc 0.7550, dev avg loss 0.50842, throughput 2.84867K wps
[Epoch 62 Batch 30/173] avg loss 0.00755217, throughput 2.92047K wps
[Epoch 62 Batch 60/173] avg loss 0.0079229, throughput 2.87197K wps
[Epoch 62 Batch 90/173] avg loss 0.00772239, throughput 2.85032K wps
[Epoch 62 Batch 120/173] avg loss 0.00736654, throughput 2.8659K wps
[Epoch 62 Batch 150/173] avg loss 0.00785625, throughput 2.86859K wps
Begin Testing...
[Epoch 62] train avg loss 0.00770815, dev acc 0.7602, dev avg loss 0.502434, throughput 2.87137K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/173] avg loss 0.00745631, throughput 2.92708K wps
[Epoch 63 Batch 60/173] avg loss 0.00755982, throughput 2.85831K wps
[Epoch 63 Batch 90/173] avg loss 0.00756875, throughput 2.78068K wps
[Epoch 63 Batch 120/173] avg loss 0.00756388, throughput 2.79757K wps
[Epoch 63 Batch 150/173] avg loss 0.00781641, throughput 2.84634K wps
Begin Testing...
[Epoch 63] train avg loss 0.00759379, dev acc 0.7560, dev avg loss 0.499807, throughput 2.8376K wps
[Epoch 64 Batch 30/173] avg loss 0.00746125, throughput 2.92758K wps
[Epoch 64 Batch 60/173] avg loss 0.00746036, throughput 2.83773K wps
[Epoch 64 Batch 90/173] avg loss 0.00742351, throughput 2.857K wps
[Epoch 64 Batch 120/173] avg loss 0.00737278, throughput 2.79695K wps
[Epoch 64 Batch 150/173] avg loss 0.00752938, throughput 2.78863K wps
Begin Testing...
[Epoch 64] train avg loss 0.00740685, dev acc 0.7612, dev avg loss 0.496414, throughput 2.83267K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/173] avg loss 0.00757778, throughput 2.88394K wps
[Epoch 65 Batch 60/173] avg loss 0.00705095, throughput 2.87832K wps
[Epoch 65 Batch 90/173] avg loss 0.00726517, throughput 2.87459K wps
[Epoch 65 Batch 120/173] avg loss 0.00721143, throughput 2.86947K wps
[Epoch 65 Batch 150/173] avg loss 0.00710216, throughput 2.84778K wps
Begin Testing...
[Epoch 65] train avg loss 0.00724954, dev acc 0.7612, dev avg loss 0.501041, throughput 2.87033K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/173] avg loss 0.00702961, throughput 2.92772K wps
[Epoch 66 Batch 60/173] avg loss 0.00692905, throughput 2.86088K wps
[Epoch 66 Batch 90/173] avg loss 0.00695078, throughput 2.84867K wps
[Epoch 66 Batch 120/173] avg loss 0.00685481, throughput 2.86952K wps
[Epoch 66 Batch 150/173] avg loss 0.00723832, throughput 2.87762K wps
Begin Testing...
[Epoch 66] train avg loss 0.00705549, dev acc 0.7602, dev avg loss 0.501334, throughput 2.87665K wps
[Epoch 67 Batch 30/173] avg loss 0.00698178, throughput 2.86041K wps
[Epoch 67 Batch 60/173] avg loss 0.00684538, throughput 2.79525K wps
[Epoch 67 Batch 90/173] avg loss 0.00683384, throughput 2.7863K wps
[Epoch 67 Batch 120/173] avg loss 0.00654259, throughput 2.81907K wps
[Epoch 67 Batch 150/173] avg loss 0.00688228, throughput 2.80941K wps
Begin Testing...
[Epoch 67] train avg loss 0.00685806, dev acc 0.7633, dev avg loss 0.491034, throughput 2.82071K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/173] avg loss 0.00656866, throughput 2.85387K wps
[Epoch 68 Batch 60/173] avg loss 0.00642453, throughput 2.80912K wps
[Epoch 68 Batch 90/173] avg loss 0.0067182, throughput 2.82685K wps
[Epoch 68 Batch 120/173] avg loss 0.00671938, throughput 2.81232K wps
[Epoch 68 Batch 150/173] avg loss 0.00683007, throughput 2.84898K wps
Begin Testing...
[Epoch 68] train avg loss 0.00664053, dev acc 0.7591, dev avg loss 0.486373, throughput 2.83511K wps
[Epoch 69 Batch 30/173] avg loss 0.00639165, throughput 2.90402K wps
[Epoch 69 Batch 60/173] avg loss 0.00642038, throughput 2.85866K wps
[Epoch 69 Batch 90/173] avg loss 0.00663319, throughput 2.85515K wps
[Epoch 69 Batch 120/173] avg loss 0.00647176, throughput 2.81777K wps
[Epoch 69 Batch 150/173] avg loss 0.00662325, throughput 2.86153K wps
Begin Testing...
[Epoch 69] train avg loss 0.0065147, dev acc 0.7643, dev avg loss 0.485193, throughput 2.85538K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/173] avg loss 0.00606192, throughput 2.90644K wps
[Epoch 70 Batch 60/173] avg loss 0.00647944, throughput 2.85772K wps
[Epoch 70 Batch 90/173] avg loss 0.00628499, throughput 2.79983K wps
[Epoch 70 Batch 120/173] avg loss 0.0062122, throughput 2.85216K wps
[Epoch 70 Batch 150/173] avg loss 0.00646151, throughput 2.86836K wps
Begin Testing...
[Epoch 70] train avg loss 0.0063219, dev acc 0.7612, dev avg loss 0.484005, throughput 2.85948K wps
[Epoch 71 Batch 30/173] avg loss 0.00608254, throughput 2.92554K wps
[Epoch 71 Batch 60/173] avg loss 0.00624739, throughput 2.81936K wps
[Epoch 71 Batch 90/173] avg loss 0.00628715, throughput 2.79212K wps
[Epoch 71 Batch 120/173] avg loss 0.00600629, throughput 2.8421K wps
[Epoch 71 Batch 150/173] avg loss 0.00576564, throughput 2.84258K wps
Begin Testing...
[Epoch 71] train avg loss 0.00611613, dev acc 0.7612, dev avg loss 0.4831, throughput 2.84424K wps
[Epoch 72 Batch 30/173] avg loss 0.00605358, throughput 2.85228K wps
[Epoch 72 Batch 60/173] avg loss 0.00595848, throughput 2.81193K wps
[Epoch 72 Batch 90/173] avg loss 0.0060039, throughput 2.79088K wps
[Epoch 72 Batch 120/173] avg loss 0.0058261, throughput 2.84442K wps
[Epoch 72 Batch 150/173] avg loss 0.00586891, throughput 2.82497K wps
Begin Testing...
[Epoch 72] train avg loss 0.00601217, dev acc 0.7602, dev avg loss 0.485847, throughput 2.82886K wps
[Epoch 73 Batch 30/173] avg loss 0.00568607, throughput 2.87159K wps
[Epoch 73 Batch 60/173] avg loss 0.00602512, throughput 2.84466K wps
[Epoch 73 Batch 90/173] avg loss 0.00585513, throughput 2.81504K wps
[Epoch 73 Batch 120/173] avg loss 0.00597348, throughput 2.7963K wps
[Epoch 73 Batch 150/173] avg loss 0.00560968, throughput 2.78098K wps
Begin Testing...
[Epoch 73] train avg loss 0.00584022, dev acc 0.7602, dev avg loss 0.487624, throughput 2.8149K wps
[Epoch 74 Batch 30/173] avg loss 0.00588142, throughput 2.82514K wps
[Epoch 74 Batch 60/173] avg loss 0.00548567, throughput 2.81258K wps
[Epoch 74 Batch 90/173] avg loss 0.0056114, throughput 2.8252K wps
[Epoch 74 Batch 120/173] avg loss 0.0057425, throughput 2.83365K wps
[Epoch 74 Batch 150/173] avg loss 0.00569834, throughput 2.85591K wps
Begin Testing...
[Epoch 74] train avg loss 0.00567125, dev acc 0.7602, dev avg loss 0.48121, throughput 2.83416K wps
[Epoch 75 Batch 30/173] avg loss 0.00564009, throughput 2.88215K wps
[Epoch 75 Batch 60/173] avg loss 0.00558412, throughput 2.81902K wps
[Epoch 75 Batch 90/173] avg loss 0.00549746, throughput 2.84618K wps
[Epoch 75 Batch 120/173] avg loss 0.00546291, throughput 2.83179K wps
[Epoch 75 Batch 150/173] avg loss 0.00537892, throughput 2.77719K wps
Begin Testing...
[Epoch 75] train avg loss 0.00548476, dev acc 0.7550, dev avg loss 0.485025, throughput 2.82299K wps
[Epoch 76 Batch 30/173] avg loss 0.00521998, throughput 2.82045K wps
[Epoch 76 Batch 60/173] avg loss 0.00513972, throughput 2.76239K wps
[Epoch 76 Batch 90/173] avg loss 0.00532332, throughput 2.78006K wps
[Epoch 76 Batch 120/173] avg loss 0.00549032, throughput 2.84295K wps
[Epoch 76 Batch 150/173] avg loss 0.00519219, throughput 2.86116K wps
Begin Testing...
[Epoch 76] train avg loss 0.00529749, dev acc 0.7581, dev avg loss 0.47859, throughput 2.82015K wps
[Epoch 77 Batch 30/173] avg loss 0.00499868, throughput 2.92762K wps
[Epoch 77 Batch 60/173] avg loss 0.00506064, throughput 2.85425K wps
[Epoch 77 Batch 90/173] avg loss 0.00498066, throughput 2.86459K wps
[Epoch 77 Batch 120/173] avg loss 0.00521136, throughput 2.87508K wps
[Epoch 77 Batch 150/173] avg loss 0.00578578, throughput 2.87881K wps
Begin Testing...
[Epoch 77] train avg loss 0.0052061, dev acc 0.7518, dev avg loss 0.492599, throughput 2.87698K wps
[Epoch 78 Batch 30/173] avg loss 0.00472467, throughput 2.91587K wps
[Epoch 78 Batch 60/173] avg loss 0.00498265, throughput 2.86594K wps
[Epoch 78 Batch 90/173] avg loss 0.00517451, throughput 2.87167K wps
[Epoch 78 Batch 120/173] avg loss 0.00505111, throughput 2.87061K wps
[Epoch 78 Batch 150/173] avg loss 0.00539421, throughput 2.83039K wps
Begin Testing...
[Epoch 78] train avg loss 0.00506356, dev acc 0.7654, dev avg loss 0.47702, throughput 2.8606K wps
Observed Improvement.
Begin Testing...
[Epoch 79 Batch 30/173] avg loss 0.00503276, throughput 2.89904K wps
[Epoch 79 Batch 60/173] avg loss 0.00474867, throughput 2.85855K wps
[Epoch 79 Batch 90/173] avg loss 0.00471235, throughput 2.83107K wps
[Epoch 79 Batch 120/173] avg loss 0.00519423, throughput 2.82202K wps
[Epoch 79 Batch 150/173] avg loss 0.00478418, throughput 2.87455K wps
Begin Testing...
[Epoch 79] train avg loss 0.0048855, dev acc 0.7716, dev avg loss 0.477991, throughput 2.85869K wps
Observed Improvement.
Begin Testing...
[Epoch 80 Batch 30/173] avg loss 0.00475975, throughput 2.89193K wps
[Epoch 80 Batch 60/173] avg loss 0.00467823, throughput 2.84956K wps
[Epoch 80 Batch 90/173] avg loss 0.00486739, throughput 2.85042K wps
[Epoch 80 Batch 120/173] avg loss 0.00460199, throughput 2.85358K wps
[Epoch 80 Batch 150/173] avg loss 0.0044568, throughput 2.87602K wps
Begin Testing...
[Epoch 80] train avg loss 0.00469153, dev acc 0.7685, dev avg loss 0.476595, throughput 2.86524K wps
[Epoch 81 Batch 30/173] avg loss 0.00443246, throughput 2.92861K wps
[Epoch 81 Batch 60/173] avg loss 0.00459676, throughput 2.83186K wps
[Epoch 81 Batch 90/173] avg loss 0.00459568, throughput 2.87875K wps
[Epoch 81 Batch 120/173] avg loss 0.00478579, throughput 2.86807K wps
[Epoch 81 Batch 150/173] avg loss 0.00439786, throughput 2.87455K wps
Begin Testing...
[Epoch 81] train avg loss 0.0045956, dev acc 0.7570, dev avg loss 0.480488, throughput 2.87586K wps
[Epoch 82 Batch 30/173] avg loss 0.00439132, throughput 2.93318K wps
[Epoch 82 Batch 60/173] avg loss 0.00456928, throughput 2.87387K wps
[Epoch 82 Batch 90/173] avg loss 0.0043896, throughput 2.87182K wps
[Epoch 82 Batch 120/173] avg loss 0.00442254, throughput 2.85561K wps
[Epoch 82 Batch 150/173] avg loss 0.0044913, throughput 2.80784K wps
Begin Testing...
[Epoch 82] train avg loss 0.00448444, dev acc 0.7716, dev avg loss 0.479281, throughput 2.86292K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/173] avg loss 0.00441654, throughput 2.92794K wps
[Epoch 83 Batch 60/173] avg loss 0.00395316, throughput 2.84793K wps
[Epoch 83 Batch 90/173] avg loss 0.00428407, throughput 2.84613K wps
[Epoch 83 Batch 120/173] avg loss 0.00418587, throughput 2.87509K wps
[Epoch 83 Batch 150/173] avg loss 0.00456569, throughput 2.86275K wps
Begin Testing...
[Epoch 83] train avg loss 0.00430503, dev acc 0.7737, dev avg loss 0.479896, throughput 2.86844K wps
Observed Improvement.
Begin Testing...
[Epoch 84 Batch 30/173] avg loss 0.00425297, throughput 2.9277K wps
[Epoch 84 Batch 60/173] avg loss 0.00405887, throughput 2.86625K wps
[Epoch 84 Batch 90/173] avg loss 0.00415083, throughput 2.87417K wps
[Epoch 84 Batch 120/173] avg loss 0.00429959, throughput 2.86078K wps
[Epoch 84 Batch 150/173] avg loss 0.00407318, throughput 2.84188K wps
Begin Testing...
[Epoch 84] train avg loss 0.00414768, dev acc 0.7685, dev avg loss 0.483577, throughput 2.87084K wps
[Epoch 85 Batch 30/173] avg loss 0.00400147, throughput 2.9379K wps
[Epoch 85 Batch 60/173] avg loss 0.00400624, throughput 2.87111K wps
[Epoch 85 Batch 90/173] avg loss 0.00416031, throughput 2.8541K wps
[Epoch 85 Batch 120/173] avg loss 0.00413827, throughput 2.84004K wps
[Epoch 85 Batch 150/173] avg loss 0.00425507, throughput 2.79799K wps
Begin Testing...
[Epoch 85] train avg loss 0.00408915, dev acc 0.7758, dev avg loss 0.481987, throughput 2.85252K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/173] avg loss 0.00373217, throughput 2.89515K wps
[Epoch 86 Batch 60/173] avg loss 0.00404276, throughput 2.85092K wps
[Epoch 86 Batch 90/173] avg loss 0.00391421, throughput 2.82167K wps
[Epoch 86 Batch 120/173] avg loss 0.00374855, throughput 2.87471K wps
[Epoch 86 Batch 150/173] avg loss 0.00406402, throughput 2.8744K wps
Begin Testing...
[Epoch 86] train avg loss 0.00391905, dev acc 0.7685, dev avg loss 0.480607, throughput 2.86258K wps
[Epoch 87 Batch 30/173] avg loss 0.00365584, throughput 2.91241K wps
[Epoch 87 Batch 60/173] avg loss 0.00367721, throughput 2.80761K wps
[Epoch 87 Batch 90/173] avg loss 0.00388825, throughput 2.82004K wps
[Epoch 87 Batch 120/173] avg loss 0.00371824, throughput 2.83496K wps
[Epoch 87 Batch 150/173] avg loss 0.00377513, throughput 2.80365K wps
Begin Testing...
[Epoch 87] train avg loss 0.00377968, dev acc 0.7623, dev avg loss 0.486025, throughput 2.83931K wps
[Epoch 88 Batch 30/173] avg loss 0.00379235, throughput 2.88318K wps
[Epoch 88 Batch 60/173] avg loss 0.00344702, throughput 2.8647K wps
[Epoch 88 Batch 90/173] avg loss 0.0037828, throughput 2.83893K wps
[Epoch 88 Batch 120/173] avg loss 0.00361686, throughput 2.83435K wps
[Epoch 88 Batch 150/173] avg loss 0.00374892, throughput 2.87438K wps
Begin Testing...
[Epoch 88] train avg loss 0.00366386, dev acc 0.7810, dev avg loss 0.486913, throughput 2.85865K wps
Observed Improvement.
Begin Testing...
[Epoch 89 Batch 30/173] avg loss 0.00360929, throughput 2.88374K wps
[Epoch 89 Batch 60/173] avg loss 0.0036701, throughput 2.81478K wps
[Epoch 89 Batch 90/173] avg loss 0.00356796, throughput 2.79235K wps
[Epoch 89 Batch 120/173] avg loss 0.00382088, throughput 2.81095K wps
[Epoch 89 Batch 150/173] avg loss 0.00345135, throughput 2.87548K wps
Begin Testing...
[Epoch 89] train avg loss 0.00363384, dev acc 0.7789, dev avg loss 0.483989, throughput 2.84055K wps
[Epoch 90 Batch 30/173] avg loss 0.00351617, throughput 2.92064K wps
[Epoch 90 Batch 60/173] avg loss 0.0033299, throughput 2.78159K wps
[Epoch 90 Batch 90/173] avg loss 0.00367888, throughput 2.83433K wps
[Epoch 90 Batch 120/173] avg loss 0.00340261, throughput 2.85825K wps
[Epoch 90 Batch 150/173] avg loss 0.00333494, throughput 2.85879K wps
Begin Testing...
[Epoch 90] train avg loss 0.00344869, dev acc 0.7821, dev avg loss 0.486317, throughput 2.85302K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/173] avg loss 0.00331104, throughput 2.94054K wps
[Epoch 91 Batch 60/173] avg loss 0.00337104, throughput 2.87366K wps
[Epoch 91 Batch 90/173] avg loss 0.00324071, throughput 2.83979K wps
[Epoch 91 Batch 120/173] avg loss 0.00362212, throughput 2.77223K wps
[Epoch 91 Batch 150/173] avg loss 0.00323799, throughput 2.82357K wps
Begin Testing...
[Epoch 91] train avg loss 0.00334817, dev acc 0.7727, dev avg loss 0.489951, throughput 2.85249K wps
[Epoch 92 Batch 30/173] avg loss 0.00313825, throughput 2.8934K wps
[Epoch 92 Batch 60/173] avg loss 0.00312653, throughput 2.80156K wps
[Epoch 92 Batch 90/173] avg loss 0.00316488, throughput 2.87589K wps
[Epoch 92 Batch 120/173] avg loss 0.00310605, throughput 2.85773K wps
[Epoch 92 Batch 150/173] avg loss 0.00323249, throughput 2.85437K wps
Begin Testing...
[Epoch 92] train avg loss 0.00319052, dev acc 0.7769, dev avg loss 0.490764, throughput 2.85824K wps
[Epoch 93 Batch 30/173] avg loss 0.00302491, throughput 2.90091K wps
[Epoch 93 Batch 60/173] avg loss 0.00318061, throughput 2.8573K wps
[Epoch 93 Batch 90/173] avg loss 0.00304692, throughput 2.84879K wps
[Epoch 93 Batch 120/173] avg loss 0.00315468, throughput 2.86477K wps
[Epoch 93 Batch 150/173] avg loss 0.00305979, throughput 2.8752K wps
Begin Testing...
[Epoch 93] train avg loss 0.00311912, dev acc 0.7800, dev avg loss 0.49295, throughput 2.86765K wps
[Epoch 94 Batch 30/173] avg loss 0.00309168, throughput 2.93799K wps
[Epoch 94 Batch 60/173] avg loss 0.00314746, throughput 2.86709K wps
[Epoch 94 Batch 90/173] avg loss 0.00303398, throughput 2.82464K wps
[Epoch 94 Batch 120/173] avg loss 0.00315896, throughput 2.87727K wps
[Epoch 94 Batch 150/173] avg loss 0.00289876, throughput 2.83245K wps
Begin Testing...
[Epoch 94] train avg loss 0.00307985, dev acc 0.7696, dev avg loss 0.492623, throughput 2.85928K wps
[Epoch 95 Batch 30/173] avg loss 0.00282417, throughput 2.88192K wps
[Epoch 95 Batch 60/173] avg loss 0.00293875, throughput 2.81319K wps
[Epoch 95 Batch 90/173] avg loss 0.00279507, throughput 2.84861K wps
[Epoch 95 Batch 120/173] avg loss 0.00303026, throughput 2.87405K wps
[Epoch 95 Batch 150/173] avg loss 0.00281864, throughput 2.86039K wps
Begin Testing...
[Epoch 95] train avg loss 0.00287555, dev acc 0.7633, dev avg loss 0.493491, throughput 2.85678K wps
[Epoch 96 Batch 30/173] avg loss 0.00267692, throughput 2.93619K wps
[Epoch 96 Batch 60/173] avg loss 0.00278509, throughput 2.84719K wps
[Epoch 96 Batch 90/173] avg loss 0.00265713, throughput 2.87394K wps
[Epoch 96 Batch 120/173] avg loss 0.00295929, throughput 2.87529K wps
[Epoch 96 Batch 150/173] avg loss 0.00282857, throughput 2.82056K wps
Begin Testing...
[Epoch 96] train avg loss 0.00282971, dev acc 0.7675, dev avg loss 0.506722, throughput 2.8666K wps
[Epoch 97 Batch 30/173] avg loss 0.00268872, throughput 2.91686K wps
[Epoch 97 Batch 60/173] avg loss 0.00261931, throughput 2.84907K wps
[Epoch 97 Batch 90/173] avg loss 0.00278922, throughput 2.866K wps
[Epoch 97 Batch 120/173] avg loss 0.00283465, throughput 2.82718K wps
[Epoch 97 Batch 150/173] avg loss 0.00284164, throughput 2.866K wps
Begin Testing...
[Epoch 97] train avg loss 0.002764, dev acc 0.7810, dev avg loss 0.496995, throughput 2.86562K wps
[Epoch 98 Batch 30/173] avg loss 0.00262232, throughput 2.90468K wps
[Epoch 98 Batch 60/173] avg loss 0.00277685, throughput 2.85348K wps
[Epoch 98 Batch 90/173] avg loss 0.00260812, throughput 2.83099K wps
[Epoch 98 Batch 120/173] avg loss 0.00256258, throughput 2.79935K wps
[Epoch 98 Batch 150/173] avg loss 0.00256276, throughput 2.80841K wps
Begin Testing...
[Epoch 98] train avg loss 0.00263436, dev acc 0.7758, dev avg loss 0.501531, throughput 2.8436K wps
[Epoch 99 Batch 30/173] avg loss 0.00239553, throughput 2.90969K wps
[Epoch 99 Batch 60/173] avg loss 0.0025825, throughput 2.85234K wps
[Epoch 99 Batch 90/173] avg loss 0.00258089, throughput 2.85675K wps
[Epoch 99 Batch 120/173] avg loss 0.00251916, throughput 2.82083K wps
[Epoch 99 Batch 150/173] avg loss 0.00266175, throughput 2.84255K wps
Begin Testing...
[Epoch 99] train avg loss 0.00257661, dev acc 0.7737, dev avg loss 0.509817, throughput 2.85615K wps
[Epoch 100 Batch 30/173] avg loss 0.00242274, throughput 2.92441K wps
[Epoch 100 Batch 60/173] avg loss 0.00253684, throughput 2.86865K wps
[Epoch 100 Batch 90/173] avg loss 0.00258158, throughput 2.85539K wps
[Epoch 100 Batch 120/173] avg loss 0.00230409, throughput 2.86634K wps
[Epoch 100 Batch 150/173] avg loss 0.00242211, throughput 2.87119K wps
Begin Testing...
[Epoch 100] train avg loss 0.0024785, dev acc 0.7831, dev avg loss 0.502638, throughput 2.87302K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/173] avg loss 0.00253027, throughput 2.90366K wps
[Epoch 101 Batch 60/173] avg loss 0.00242415, throughput 2.82758K wps
[Epoch 101 Batch 90/173] avg loss 0.00251645, throughput 2.86849K wps
[Epoch 101 Batch 120/173] avg loss 0.00242963, throughput 2.86678K wps
[Epoch 101 Batch 150/173] avg loss 0.00227806, throughput 2.82675K wps
Begin Testing...
[Epoch 101] train avg loss 0.00243098, dev acc 0.7737, dev avg loss 0.505213, throughput 2.86082K wps
[Epoch 102 Batch 30/173] avg loss 0.0022233, throughput 2.93862K wps
[Epoch 102 Batch 60/173] avg loss 0.00232013, throughput 2.86643K wps
[Epoch 102 Batch 90/173] avg loss 0.00246824, throughput 2.86532K wps
[Epoch 102 Batch 120/173] avg loss 0.00240448, throughput 2.84189K wps
[Epoch 102 Batch 150/173] avg loss 0.00235744, throughput 2.8497K wps
Begin Testing...
[Epoch 102] train avg loss 0.00236143, dev acc 0.7706, dev avg loss 0.507274, throughput 2.86791K wps
[Epoch 103 Batch 30/173] avg loss 0.00223827, throughput 2.87873K wps
[Epoch 103 Batch 60/173] avg loss 0.00231628, throughput 2.83358K wps
[Epoch 103 Batch 90/173] avg loss 0.00217799, throughput 2.78783K wps
[Epoch 103 Batch 120/173] avg loss 0.00228732, throughput 2.81048K wps
[Epoch 103 Batch 150/173] avg loss 0.00229391, throughput 2.79536K wps
Begin Testing...
[Epoch 103] train avg loss 0.00226993, dev acc 0.7789, dev avg loss 0.509758, throughput 2.82213K wps
[Epoch 104 Batch 30/173] avg loss 0.00199615, throughput 2.87001K wps
[Epoch 104 Batch 60/173] avg loss 0.00214359, throughput 2.87586K wps
[Epoch 104 Batch 90/173] avg loss 0.00213974, throughput 2.82106K wps
[Epoch 104 Batch 120/173] avg loss 0.00219728, throughput 2.84641K wps
[Epoch 104 Batch 150/173] avg loss 0.00247808, throughput 2.86841K wps
Begin Testing...
[Epoch 104] train avg loss 0.00221463, dev acc 0.7758, dev avg loss 0.513239, throughput 2.85804K wps
[Epoch 105 Batch 30/173] avg loss 0.00220942, throughput 2.85121K wps
[Epoch 105 Batch 60/173] avg loss 0.00201673, throughput 2.86156K wps
[Epoch 105 Batch 90/173] avg loss 0.00205683, throughput 2.84639K wps
[Epoch 105 Batch 120/173] avg loss 0.00206979, throughput 2.86762K wps
[Epoch 105 Batch 150/173] avg loss 0.00232809, throughput 2.87184K wps
Begin Testing...
[Epoch 105] train avg loss 0.00213316, dev acc 0.7727, dev avg loss 0.521577, throughput 2.85882K wps
[Epoch 106 Batch 30/173] avg loss 0.00209779, throughput 2.935K wps
[Epoch 106 Batch 60/173] avg loss 0.00219789, throughput 2.85296K wps
[Epoch 106 Batch 90/173] avg loss 0.00196158, throughput 2.78994K wps
[Epoch 106 Batch 120/173] avg loss 0.00201087, throughput 2.84821K wps
[Epoch 106 Batch 150/173] avg loss 0.00206373, throughput 2.85572K wps
Begin Testing...
[Epoch 106] train avg loss 0.0020722, dev acc 0.7685, dev avg loss 0.531851, throughput 2.85821K wps
[Epoch 107 Batch 30/173] avg loss 0.00201423, throughput 2.88559K wps
[Epoch 107 Batch 60/173] avg loss 0.00195168, throughput 2.85626K wps
[Epoch 107 Batch 90/173] avg loss 0.00218915, throughput 2.87924K wps
[Epoch 107 Batch 120/173] avg loss 0.00186709, throughput 2.81945K wps
[Epoch 107 Batch 150/173] avg loss 0.00204496, throughput 2.84209K wps
Begin Testing...
[Epoch 107] train avg loss 0.00202097, dev acc 0.7664, dev avg loss 0.536904, throughput 2.85801K wps
[Epoch 108 Batch 30/173] avg loss 0.00197473, throughput 2.88551K wps
[Epoch 108 Batch 60/173] avg loss 0.0019554, throughput 2.80244K wps
[Epoch 108 Batch 90/173] avg loss 0.00187284, throughput 2.83978K wps
[Epoch 108 Batch 120/173] avg loss 0.00185818, throughput 2.87233K wps
[Epoch 108 Batch 150/173] avg loss 0.00195137, throughput 2.87199K wps
Begin Testing...
[Epoch 108] train avg loss 0.00192904, dev acc 0.7748, dev avg loss 0.521705, throughput 2.85627K wps
[Epoch 109 Batch 30/173] avg loss 0.00185068, throughput 2.88679K wps
[Epoch 109 Batch 60/173] avg loss 0.00191885, throughput 2.83782K wps
[Epoch 109 Batch 90/173] avg loss 0.00190935, throughput 2.8743K wps
[Epoch 109 Batch 120/173] avg loss 0.0018681, throughput 2.82119K wps
[Epoch 109 Batch 150/173] avg loss 0.00203438, throughput 2.83688K wps
Begin Testing...
[Epoch 109] train avg loss 0.0019159, dev acc 0.7789, dev avg loss 0.522547, throughput 2.84823K wps
[Epoch 110 Batch 30/173] avg loss 0.00174273, throughput 2.89512K wps
[Epoch 110 Batch 60/173] avg loss 0.00187691, throughput 2.85512K wps
[Epoch 110 Batch 90/173] avg loss 0.00176391, throughput 2.85022K wps
[Epoch 110 Batch 120/173] avg loss 0.00195703, throughput 2.86611K wps
[Epoch 110 Batch 150/173] avg loss 0.00165098, throughput 2.8675K wps
Begin Testing...
[Epoch 110] train avg loss 0.00178537, dev acc 0.7789, dev avg loss 0.526372, throughput 2.85688K wps
[Epoch 111 Batch 30/173] avg loss 0.00175293, throughput 2.91994K wps
[Epoch 111 Batch 60/173] avg loss 0.00173247, throughput 2.87876K wps
[Epoch 111 Batch 90/173] avg loss 0.00172629, throughput 2.84804K wps
[Epoch 111 Batch 120/173] avg loss 0.00191078, throughput 2.86329K wps
[Epoch 111 Batch 150/173] avg loss 0.00175501, throughput 2.85432K wps
Begin Testing...
[Epoch 111] train avg loss 0.00176655, dev acc 0.7727, dev avg loss 0.52871, throughput 2.87014K wps
[Epoch 112 Batch 30/173] avg loss 0.00169559, throughput 2.85705K wps
[Epoch 112 Batch 60/173] avg loss 0.00166603, throughput 2.78282K wps
[Epoch 112 Batch 90/173] avg loss 0.00171991, throughput 2.87428K wps
[Epoch 112 Batch 120/173] avg loss 0.00175862, throughput 2.87448K wps
[Epoch 112 Batch 150/173] avg loss 0.00173345, throughput 2.87026K wps
Begin Testing...
[Epoch 112] train avg loss 0.00170439, dev acc 0.7769, dev avg loss 0.528991, throughput 2.85242K wps
[Epoch 113 Batch 30/173] avg loss 0.00165558, throughput 2.9251K wps
[Epoch 113 Batch 60/173] avg loss 0.00159226, throughput 2.8575K wps
[Epoch 113 Batch 90/173] avg loss 0.00170415, throughput 2.86722K wps
[Epoch 113 Batch 120/173] avg loss 0.00169144, throughput 2.8683K wps
[Epoch 113 Batch 150/173] avg loss 0.00168289, throughput 2.86212K wps
Begin Testing...
[Epoch 113] train avg loss 0.00166705, dev acc 0.7727, dev avg loss 0.532223, throughput 2.87514K wps
[Epoch 114 Batch 30/173] avg loss 0.00168487, throughput 2.85623K wps
[Epoch 114 Batch 60/173] avg loss 0.00166532, throughput 2.81812K wps
[Epoch 114 Batch 90/173] avg loss 0.00158042, throughput 2.86542K wps
[Epoch 114 Batch 120/173] avg loss 0.001663, throughput 2.85943K wps
[Epoch 114 Batch 150/173] avg loss 0.0015798, throughput 2.81874K wps
Begin Testing...
[Epoch 114] train avg loss 0.00162736, dev acc 0.7591, dev avg loss 0.545488, throughput 2.84811K wps
[Epoch 115 Batch 30/173] avg loss 0.00156013, throughput 2.93506K wps
[Epoch 115 Batch 60/173] avg loss 0.00149659, throughput 2.86796K wps
[Epoch 115 Batch 90/173] avg loss 0.00151983, throughput 2.86641K wps
[Epoch 115 Batch 120/173] avg loss 0.00149692, throughput 2.87497K wps
[Epoch 115 Batch 150/173] avg loss 0.00164167, throughput 2.87572K wps
Begin Testing...
[Epoch 115] train avg loss 0.00154009, dev acc 0.7696, dev avg loss 0.545422, throughput 2.88144K wps
[Epoch 116 Batch 30/173] avg loss 0.00156664, throughput 2.8894K wps
[Epoch 116 Batch 60/173] avg loss 0.00142571, throughput 2.87369K wps
[Epoch 116 Batch 90/173] avg loss 0.00155445, throughput 2.79495K wps
[Epoch 116 Batch 120/173] avg loss 0.00164564, throughput 2.86556K wps
[Epoch 116 Batch 150/173] avg loss 0.00149176, throughput 2.87085K wps
Begin Testing...
[Epoch 116] train avg loss 0.00153647, dev acc 0.7591, dev avg loss 0.559662, throughput 2.85998K wps
[Epoch 117 Batch 30/173] avg loss 0.001533, throughput 2.84337K wps
[Epoch 117 Batch 60/173] avg loss 0.00143696, throughput 2.82402K wps
[Epoch 117 Batch 90/173] avg loss 0.00155137, throughput 2.8492K wps
[Epoch 117 Batch 120/173] avg loss 0.00150253, throughput 2.85619K wps
[Epoch 117 Batch 150/173] avg loss 0.00148142, throughput 2.86881K wps
Begin Testing...
[Epoch 117] train avg loss 0.00149383, dev acc 0.7664, dev avg loss 0.553024, throughput 2.85191K wps
[Epoch 118 Batch 30/173] avg loss 0.00140044, throughput 2.89085K wps
[Epoch 118 Batch 60/173] avg loss 0.00143102, throughput 2.8699K wps
[Epoch 118 Batch 90/173] avg loss 0.00141013, throughput 2.87193K wps
[Epoch 118 Batch 120/173] avg loss 0.00155062, throughput 2.84664K wps
[Epoch 118 Batch 150/173] avg loss 0.00155593, throughput 2.84793K wps
Begin Testing...
[Epoch 118] train avg loss 0.00146497, dev acc 0.7727, dev avg loss 0.547202, throughput 2.86573K wps
[Epoch 119 Batch 30/173] avg loss 0.00146048, throughput 2.94413K wps
[Epoch 119 Batch 60/173] avg loss 0.00141571, throughput 2.82835K wps
[Epoch 119 Batch 90/173] avg loss 0.00144074, throughput 2.85961K wps
[Epoch 119 Batch 120/173] avg loss 0.00141704, throughput 2.85112K wps
[Epoch 119 Batch 150/173] avg loss 0.00144568, throughput 2.8337K wps
Begin Testing...
[Epoch 119] train avg loss 0.00142987, dev acc 0.7675, dev avg loss 0.55453, throughput 2.85836K wps
[Epoch 120 Batch 30/173] avg loss 0.00138572, throughput 2.86967K wps
[Epoch 120 Batch 60/173] avg loss 0.0013167, throughput 2.81746K wps
[Epoch 120 Batch 90/173] avg loss 0.00135784, throughput 2.85281K wps
[Epoch 120 Batch 120/173] avg loss 0.00137723, throughput 2.87619K wps
[Epoch 120 Batch 150/173] avg loss 0.00134223, throughput 2.87245K wps
Begin Testing...
[Epoch 120] train avg loss 0.00136456, dev acc 0.7664, dev avg loss 0.554785, throughput 2.8507K wps
[Epoch 121 Batch 30/173] avg loss 0.00137248, throughput 2.85869K wps
[Epoch 121 Batch 60/173] avg loss 0.0013531, throughput 2.79336K wps
[Epoch 121 Batch 90/173] avg loss 0.00127649, throughput 2.79178K wps
[Epoch 121 Batch 120/173] avg loss 0.00116395, throughput 2.80092K wps
[Epoch 121 Batch 150/173] avg loss 0.0013055, throughput 2.86758K wps
Begin Testing...
[Epoch 121] train avg loss 0.00130149, dev acc 0.7727, dev avg loss 0.555629, throughput 2.82724K wps
[Epoch 122 Batch 30/173] avg loss 0.00114822, throughput 2.88721K wps
[Epoch 122 Batch 60/173] avg loss 0.00136883, throughput 2.87739K wps
[Epoch 122 Batch 90/173] avg loss 0.00129333, throughput 2.87876K wps
[Epoch 122 Batch 120/173] avg loss 0.00130951, throughput 2.87232K wps
[Epoch 122 Batch 150/173] avg loss 0.00125829, throughput 2.86997K wps
Begin Testing...
[Epoch 122] train avg loss 0.00129155, dev acc 0.7706, dev avg loss 0.566128, throughput 2.87227K wps
[Epoch 123 Batch 30/173] avg loss 0.00126724, throughput 2.92497K wps
[Epoch 123 Batch 60/173] avg loss 0.0011657, throughput 2.85299K wps
[Epoch 123 Batch 90/173] avg loss 0.00137064, throughput 2.8337K wps
[Epoch 123 Batch 120/173] avg loss 0.00119956, throughput 2.7984K wps
[Epoch 123 Batch 150/173] avg loss 0.0012459, throughput 2.85472K wps
Begin Testing...
[Epoch 123] train avg loss 0.0012438, dev acc 0.7654, dev avg loss 0.562627, throughput 2.85278K wps
[Epoch 124 Batch 30/173] avg loss 0.00125663, throughput 2.89203K wps
[Epoch 124 Batch 60/173] avg loss 0.00115118, throughput 2.85616K wps
[Epoch 124 Batch 90/173] avg loss 0.0012744, throughput 2.82447K wps
[Epoch 124 Batch 120/173] avg loss 0.00120719, throughput 2.84248K wps
[Epoch 124 Batch 150/173] avg loss 0.00122328, throughput 2.8081K wps
Begin Testing...
[Epoch 124] train avg loss 0.00122724, dev acc 0.7675, dev avg loss 0.566133, throughput 2.84482K wps
[Epoch 125 Batch 30/173] avg loss 0.00115535, throughput 2.92857K wps
[Epoch 125 Batch 60/173] avg loss 0.00118877, throughput 2.86747K wps
[Epoch 125 Batch 90/173] avg loss 0.00121195, throughput 2.86357K wps
[Epoch 125 Batch 120/173] avg loss 0.00120073, throughput 2.85563K wps
[Epoch 125 Batch 150/173] avg loss 0.00120977, throughput 2.86834K wps
Begin Testing...
[Epoch 125] train avg loss 0.00119441, dev acc 0.7696, dev avg loss 0.572289, throughput 2.87691K wps
[Epoch 126 Batch 30/173] avg loss 0.00110072, throughput 2.94045K wps
[Epoch 126 Batch 60/173] avg loss 0.00122275, throughput 2.80056K wps
[Epoch 126 Batch 90/173] avg loss 0.00111423, throughput 2.82665K wps
[Epoch 126 Batch 120/173] avg loss 0.00114325, throughput 2.78012K wps
[Epoch 126 Batch 150/173] avg loss 0.00124566, throughput 2.85274K wps
Begin Testing...
[Epoch 126] train avg loss 0.00117838, dev acc 0.7696, dev avg loss 0.57035, throughput 2.84375K wps
[Epoch 127 Batch 30/173] avg loss 0.00105938, throughput 2.94276K wps
[Epoch 127 Batch 60/173] avg loss 0.00113956, throughput 2.8832K wps
[Epoch 127 Batch 90/173] avg loss 0.00115905, throughput 2.86111K wps
[Epoch 127 Batch 120/173] avg loss 0.00118644, throughput 2.83649K wps
[Epoch 127 Batch 150/173] avg loss 0.00111052, throughput 2.85813K wps
Begin Testing...
[Epoch 127] train avg loss 0.00112613, dev acc 0.7675, dev avg loss 0.569932, throughput 2.87434K wps
[Epoch 128 Batch 30/173] avg loss 0.00102061, throughput 2.86481K wps
[Epoch 128 Batch 60/173] avg loss 0.00109859, throughput 2.82992K wps
[Epoch 128 Batch 90/173] avg loss 0.000980766, throughput 2.8468K wps
[Epoch 128 Batch 120/173] avg loss 0.000996565, throughput 2.82307K wps
[Epoch 128 Batch 150/173] avg loss 0.00118546, throughput 2.80529K wps
Begin Testing...
[Epoch 128] train avg loss 0.0010665, dev acc 0.7696, dev avg loss 0.587727, throughput 2.8286K wps
[Epoch 129 Batch 30/173] avg loss 0.00112301, throughput 2.88446K wps
[Epoch 129 Batch 60/173] avg loss 0.00107454, throughput 2.81908K wps
[Epoch 129 Batch 90/173] avg loss 0.00104919, throughput 2.81295K wps
[Epoch 129 Batch 120/173] avg loss 0.00109359, throughput 2.80353K wps
[Epoch 129 Batch 150/173] avg loss 0.00112029, throughput 2.86769K wps
Begin Testing...
[Epoch 129] train avg loss 0.0010939, dev acc 0.7685, dev avg loss 0.580141, throughput 2.83943K wps
[Epoch 130 Batch 30/173] avg loss 0.00107824, throughput 2.88599K wps
[Epoch 130 Batch 60/173] avg loss 0.00102088, throughput 2.8752K wps
[Epoch 130 Batch 90/173] avg loss 0.00104794, throughput 2.87308K wps
[Epoch 130 Batch 120/173] avg loss 0.00101569, throughput 2.87722K wps
[Epoch 130 Batch 150/173] avg loss 0.00100769, throughput 2.83627K wps
Begin Testing...
[Epoch 130] train avg loss 0.00104029, dev acc 0.7643, dev avg loss 0.586527, throughput 2.85388K wps
[Epoch 131 Batch 30/173] avg loss 0.00105069, throughput 2.92753K wps
[Epoch 131 Batch 60/173] avg loss 0.000970352, throughput 2.86453K wps
[Epoch 131 Batch 90/173] avg loss 0.000969441, throughput 2.81743K wps
[Epoch 131 Batch 120/173] avg loss 0.000981272, throughput 2.83136K wps
[Epoch 131 Batch 150/173] avg loss 0.00108516, throughput 2.85906K wps
Begin Testing...
[Epoch 131] train avg loss 0.00101653, dev acc 0.7654, dev avg loss 0.586792, throughput 2.85235K wps
[Epoch 132 Batch 30/173] avg loss 0.000975704, throughput 2.92785K wps
[Epoch 132 Batch 60/173] avg loss 0.00107232, throughput 2.86272K wps
[Epoch 132 Batch 90/173] avg loss 0.00107018, throughput 2.86714K wps
[Epoch 132 Batch 120/173] avg loss 0.000954982, throughput 2.87338K wps
[Epoch 132 Batch 150/173] avg loss 0.000900522, throughput 2.83869K wps
Begin Testing...
[Epoch 132] train avg loss 0.00101018, dev acc 0.7654, dev avg loss 0.59512, throughput 2.86232K wps
[Epoch 133 Batch 30/173] avg loss 0.000894993, throughput 2.92496K wps
[Epoch 133 Batch 60/173] avg loss 0.000956559, throughput 2.84192K wps
[Epoch 133 Batch 90/173] avg loss 0.00100755, throughput 2.86147K wps
[Epoch 133 Batch 120/173] avg loss 0.000919514, throughput 2.87756K wps
[Epoch 133 Batch 150/173] avg loss 0.00100639, throughput 2.87109K wps
Begin Testing...
[Epoch 133] train avg loss 0.000965343, dev acc 0.7696, dev avg loss 0.592901, throughput 2.86977K wps
[Epoch 134 Batch 30/173] avg loss 0.000884516, throughput 2.84456K wps
[Epoch 134 Batch 60/173] avg loss 0.00107734, throughput 2.78958K wps
[Epoch 134 Batch 90/173] avg loss 0.00099048, throughput 2.84067K wps
[Epoch 134 Batch 120/173] avg loss 0.000858841, throughput 2.82568K wps
[Epoch 134 Batch 150/173] avg loss 0.000916467, throughput 2.8151K wps
Begin Testing...
[Epoch 134] train avg loss 0.000935102, dev acc 0.7570, dev avg loss 0.591921, throughput 2.8235K wps
[Epoch 135 Batch 30/173] avg loss 0.000894676, throughput 2.9153K wps
[Epoch 135 Batch 60/173] avg loss 0.00104374, throughput 2.86656K wps
[Epoch 135 Batch 90/173] avg loss 0.000989842, throughput 2.87047K wps
[Epoch 135 Batch 120/173] avg loss 0.000900656, throughput 2.87094K wps
[Epoch 135 Batch 150/173] avg loss 0.000924091, throughput 2.85362K wps
Begin Testing...
[Epoch 135] train avg loss 0.000948956, dev acc 0.7623, dev avg loss 0.593907, throughput 2.87373K wps
[Epoch 136 Batch 30/173] avg loss 0.000955801, throughput 2.93035K wps
[Epoch 136 Batch 60/173] avg loss 0.000894278, throughput 2.8333K wps
[Epoch 136 Batch 90/173] avg loss 0.000795358, throughput 2.84237K wps
[Epoch 136 Batch 120/173] avg loss 0.000854242, throughput 2.84049K wps
[Epoch 136 Batch 150/173] avg loss 0.000930459, throughput 2.87702K wps
Begin Testing...
[Epoch 136] train avg loss 0.000894664, dev acc 0.7675, dev avg loss 0.594216, throughput 2.86539K wps
[Epoch 137 Batch 30/173] avg loss 0.000848867, throughput 2.89228K wps
[Epoch 137 Batch 60/173] avg loss 0.000854633, throughput 2.86402K wps
[Epoch 137 Batch 90/173] avg loss 0.000802667, throughput 2.8479K wps
[Epoch 137 Batch 120/173] avg loss 0.000824827, throughput 2.79653K wps
[Epoch 137 Batch 150/173] avg loss 0.000934229, throughput 2.8662K wps
Begin Testing...
[Epoch 137] train avg loss 0.000852209, dev acc 0.7675, dev avg loss 0.596253, throughput 2.85569K wps
[Epoch 138 Batch 30/173] avg loss 0.000826272, throughput 2.87348K wps
[Epoch 138 Batch 60/173] avg loss 0.00087403, throughput 2.85036K wps
[Epoch 138 Batch 90/173] avg loss 0.000824386, throughput 2.86614K wps
[Epoch 138 Batch 120/173] avg loss 0.000912783, throughput 2.80729K wps
[Epoch 138 Batch 150/173] avg loss 0.000827083, throughput 2.7889K wps
Begin Testing...
[Epoch 138] train avg loss 0.000862113, dev acc 0.7664, dev avg loss 0.609523, throughput 2.83862K wps
[Epoch 139 Batch 30/173] avg loss 0.000805043, throughput 2.90076K wps
[Epoch 139 Batch 60/173] avg loss 0.000798858, throughput 2.85506K wps
[Epoch 139 Batch 90/173] avg loss 0.000854541, throughput 2.82612K wps
[Epoch 139 Batch 120/173] avg loss 0.000909513, throughput 2.8579K wps
[Epoch 139 Batch 150/173] avg loss 0.000892242, throughput 2.85764K wps
Begin Testing...
[Epoch 139] train avg loss 0.000848526, dev acc 0.7654, dev avg loss 0.607253, throughput 2.86001K wps
[Epoch 140 Batch 30/173] avg loss 0.000744322, throughput 2.93998K wps
[Epoch 140 Batch 60/173] avg loss 0.00081524, throughput 2.86567K wps
[Epoch 140 Batch 90/173] avg loss 0.000833569, throughput 2.86991K wps
[Epoch 140 Batch 120/173] avg loss 0.000828859, throughput 2.85523K wps
[Epoch 140 Batch 150/173] avg loss 0.000840452, throughput 2.86102K wps
Begin Testing...
[Epoch 140] train avg loss 0.000831236, dev acc 0.7633, dev avg loss 0.60443, throughput 2.87759K wps
[Epoch 141 Batch 30/173] avg loss 0.000897869, throughput 2.90504K wps
[Epoch 141 Batch 60/173] avg loss 0.000764528, throughput 2.87543K wps
[Epoch 141 Batch 90/173] avg loss 0.000819716, throughput 2.8591K wps
[Epoch 141 Batch 120/173] avg loss 0.000834055, throughput 2.87777K wps
[Epoch 141 Batch 150/173] avg loss 0.00082402, throughput 2.86922K wps
Begin Testing...
[Epoch 141] train avg loss 0.000824915, dev acc 0.7643, dev avg loss 0.613727, throughput 2.87075K wps
[Epoch 142 Batch 30/173] avg loss 0.000813155, throughput 2.89984K wps
[Epoch 142 Batch 60/173] avg loss 0.000713325, throughput 2.85796K wps
[Epoch 142 Batch 90/173] avg loss 0.000825001, throughput 2.86903K wps
[Epoch 142 Batch 120/173] avg loss 0.000761778, throughput 2.86857K wps
[Epoch 142 Batch 150/173] avg loss 0.000774664, throughput 2.85346K wps
Begin Testing...
[Epoch 142] train avg loss 0.000780133, dev acc 0.7633, dev avg loss 0.610811, throughput 2.86616K wps
[Epoch 143 Batch 30/173] avg loss 0.00070084, throughput 2.91228K wps
[Epoch 143 Batch 60/173] avg loss 0.00078062, throughput 2.82128K wps
[Epoch 143 Batch 90/173] avg loss 0.000781734, throughput 2.81796K wps
[Epoch 143 Batch 120/173] avg loss 0.000729972, throughput 2.87104K wps
[Epoch 143 Batch 150/173] avg loss 0.000913063, throughput 2.84043K wps
Begin Testing...
[Epoch 143] train avg loss 0.000785611, dev acc 0.7664, dev avg loss 0.613935, throughput 2.8522K wps
[Epoch 144 Batch 30/173] avg loss 0.000773712, throughput 2.87511K wps
[Epoch 144 Batch 60/173] avg loss 0.000845819, throughput 2.8331K wps
[Epoch 144 Batch 90/173] avg loss 0.000770848, throughput 2.86606K wps
[Epoch 144 Batch 120/173] avg loss 0.000764075, throughput 2.86432K wps
[Epoch 144 Batch 150/173] avg loss 0.000718469, throughput 2.80874K wps
Begin Testing...
[Epoch 144] train avg loss 0.000768783, dev acc 0.7623, dev avg loss 0.623494, throughput 2.84738K wps
[Epoch 145 Batch 30/173] avg loss 0.000692462, throughput 2.85124K wps
[Epoch 145 Batch 60/173] avg loss 0.000709, throughput 2.88042K wps
[Epoch 145 Batch 90/173] avg loss 0.000728999, throughput 2.87977K wps
[Epoch 145 Batch 120/173] avg loss 0.000802699, throughput 2.85931K wps
[Epoch 145 Batch 150/173] avg loss 0.000786987, throughput 2.85851K wps
Begin Testing...
[Epoch 145] train avg loss 0.000747382, dev acc 0.7654, dev avg loss 0.617538, throughput 2.85701K wps
[Epoch 146 Batch 30/173] avg loss 0.000765782, throughput 2.85998K wps
[Epoch 146 Batch 60/173] avg loss 0.000689442, throughput 2.82713K wps
[Epoch 146 Batch 90/173] avg loss 0.000701194, throughput 2.872K wps
[Epoch 146 Batch 120/173] avg loss 0.000749368, throughput 2.79446K wps
[Epoch 146 Batch 150/173] avg loss 0.000772803, throughput 2.80203K wps
Begin Testing...
[Epoch 146] train avg loss 0.000741741, dev acc 0.7643, dev avg loss 0.623993, throughput 2.83596K wps
[Epoch 147 Batch 30/173] avg loss 0.000628173, throughput 2.92035K wps
[Epoch 147 Batch 60/173] avg loss 0.000729659, throughput 2.86973K wps
[Epoch 147 Batch 90/173] avg loss 0.000745529, throughput 2.86387K wps
[Epoch 147 Batch 120/173] avg loss 0.000713167, throughput 2.80222K wps
[Epoch 147 Batch 150/173] avg loss 0.000720477, throughput 2.82984K wps
Begin Testing...
[Epoch 147] train avg loss 0.000715608, dev acc 0.7623, dev avg loss 0.615317, throughput 2.857K wps
[Epoch 148 Batch 30/173] avg loss 0.000755427, throughput 2.94011K wps
[Epoch 148 Batch 60/173] avg loss 0.000747685, throughput 2.86987K wps
[Epoch 148 Batch 90/173] avg loss 0.000660337, throughput 2.8717K wps
[Epoch 148 Batch 120/173] avg loss 0.000697841, throughput 2.84265K wps
[Epoch 148 Batch 150/173] avg loss 0.000724346, throughput 2.85463K wps
Begin Testing...
[Epoch 148] train avg loss 0.000721145, dev acc 0.7685, dev avg loss 0.62808, throughput 2.87512K wps
[Epoch 149 Batch 30/173] avg loss 0.000708378, throughput 2.93761K wps
[Epoch 149 Batch 60/173] avg loss 0.000732424, throughput 2.8787K wps
[Epoch 149 Batch 90/173] avg loss 0.000703304, throughput 2.86584K wps
[Epoch 149 Batch 120/173] avg loss 0.000655921, throughput 2.86738K wps
[Epoch 149 Batch 150/173] avg loss 0.000702349, throughput 2.86851K wps
Begin Testing...
[Epoch 149] train avg loss 0.000696956, dev acc 0.7612, dev avg loss 0.625138, throughput 2.8814K wps
[Epoch 150 Batch 30/173] avg loss 0.000646834, throughput 2.83964K wps
[Epoch 150 Batch 60/173] avg loss 0.000693303, throughput 2.8554K wps
[Epoch 150 Batch 90/173] avg loss 0.000696259, throughput 2.86966K wps
[Epoch 150 Batch 120/173] avg loss 0.00070911, throughput 2.87039K wps
[Epoch 150 Batch 150/173] avg loss 0.000666949, throughput 2.86582K wps
Begin Testing...
[Epoch 150] train avg loss 0.000688921, dev acc 0.7685, dev avg loss 0.630687, throughput 2.8628K wps
[Epoch 151 Batch 30/173] avg loss 0.000655049, throughput 2.8801K wps
[Epoch 151 Batch 60/173] avg loss 0.000688289, throughput 2.85684K wps
[Epoch 151 Batch 90/173] avg loss 0.000667311, throughput 2.81941K wps
[Epoch 151 Batch 120/173] avg loss 0.000681947, throughput 2.87264K wps
[Epoch 151 Batch 150/173] avg loss 0.000693357, throughput 2.8666K wps
Begin Testing...
[Epoch 151] train avg loss 0.000682094, dev acc 0.7664, dev avg loss 0.63576, throughput 2.85915K wps
[Epoch 152 Batch 30/173] avg loss 0.000643465, throughput 2.8886K wps
[Epoch 152 Batch 60/173] avg loss 0.000653902, throughput 2.86929K wps
[Epoch 152 Batch 90/173] avg loss 0.00066017, throughput 2.86973K wps
[Epoch 152 Batch 120/173] avg loss 0.000656012, throughput 2.86021K wps
[Epoch 152 Batch 150/173] avg loss 0.000742815, throughput 2.82959K wps
Begin Testing...
[Epoch 152] train avg loss 0.000663759, dev acc 0.7581, dev avg loss 0.644257, throughput 2.86122K wps
[Epoch 153 Batch 30/173] avg loss 0.000691346, throughput 2.85221K wps
[Epoch 153 Batch 60/173] avg loss 0.000694334, throughput 2.86521K wps
[Epoch 153 Batch 90/173] avg loss 0.000594736, throughput 2.87345K wps
[Epoch 153 Batch 120/173] avg loss 0.00065442, throughput 2.86455K wps
[Epoch 153 Batch 150/173] avg loss 0.000577845, throughput 2.87135K wps
Begin Testing...
[Epoch 153] train avg loss 0.000646846, dev acc 0.7623, dev avg loss 0.639901, throughput 2.86579K wps
[Epoch 154 Batch 30/173] avg loss 0.000622559, throughput 2.94246K wps
[Epoch 154 Batch 60/173] avg loss 0.000600256, throughput 2.85141K wps
[Epoch 154 Batch 90/173] avg loss 0.000673363, throughput 2.84432K wps
[Epoch 154 Batch 120/173] avg loss 0.000611359, throughput 2.87018K wps
[Epoch 154 Batch 150/173] avg loss 0.000623604, throughput 2.86333K wps
Begin Testing...
[Epoch 154] train avg loss 0.000621977, dev acc 0.7602, dev avg loss 0.651495, throughput 2.872K wps
[Epoch 155 Batch 30/173] avg loss 0.000541586, throughput 2.93017K wps
[Epoch 155 Batch 60/173] avg loss 0.000556843, throughput 2.85875K wps
[Epoch 155 Batch 90/173] avg loss 0.000626988, throughput 2.85382K wps
[Epoch 155 Batch 120/173] avg loss 0.000603699, throughput 2.85522K wps
[Epoch 155 Batch 150/173] avg loss 0.000677207, throughput 2.8266K wps
Begin Testing...
[Epoch 155] train avg loss 0.000598978, dev acc 0.7633, dev avg loss 0.642239, throughput 2.86484K wps
[Epoch 156 Batch 30/173] avg loss 0.000544575, throughput 2.90445K wps
[Epoch 156 Batch 60/173] avg loss 0.00060017, throughput 2.83862K wps
[Epoch 156 Batch 90/173] avg loss 0.000614109, throughput 2.87763K wps
[Epoch 156 Batch 120/173] avg loss 0.000575597, throughput 2.85993K wps
[Epoch 156 Batch 150/173] avg loss 0.000583621, throughput 2.83337K wps
Begin Testing...
[Epoch 156] train avg loss 0.000584694, dev acc 0.7643, dev avg loss 0.64953, throughput 2.86233K wps
[Epoch 157 Batch 30/173] avg loss 0.000607268, throughput 2.90147K wps
[Epoch 157 Batch 60/173] avg loss 0.000556574, throughput 2.86887K wps
[Epoch 157 Batch 90/173] avg loss 0.000549533, throughput 2.80841K wps
[Epoch 157 Batch 120/173] avg loss 0.000612415, throughput 2.86622K wps
[Epoch 157 Batch 150/173] avg loss 0.00062212, throughput 2.86797K wps
Begin Testing...
[Epoch 157] train avg loss 0.000590523, dev acc 0.7675, dev avg loss 0.643366, throughput 2.86372K wps
[Epoch 158 Batch 30/173] avg loss 0.00050577, throughput 2.89973K wps
[Epoch 158 Batch 60/173] avg loss 0.000554748, throughput 2.85948K wps
[Epoch 158 Batch 90/173] avg loss 0.000558393, throughput 2.87411K wps
[Epoch 158 Batch 120/173] avg loss 0.000588883, throughput 2.86915K wps
[Epoch 158 Batch 150/173] avg loss 0.000580535, throughput 2.86628K wps
Begin Testing...
[Epoch 158] train avg loss 0.000565216, dev acc 0.7623, dev avg loss 0.655404, throughput 2.87254K wps
[Epoch 159 Batch 30/173] avg loss 0.000566062, throughput 2.92922K wps
[Epoch 159 Batch 60/173] avg loss 0.000563702, throughput 2.86113K wps
[Epoch 159 Batch 90/173] avg loss 0.000552321, throughput 2.86276K wps
[Epoch 159 Batch 120/173] avg loss 0.000488816, throughput 2.87251K wps
[Epoch 159 Batch 150/173] avg loss 0.000565051, throughput 2.87043K wps
Begin Testing...
[Epoch 159] train avg loss 0.000557585, dev acc 0.7581, dev avg loss 0.660738, throughput 2.87632K wps
[Epoch 160 Batch 30/173] avg loss 0.000515611, throughput 2.92089K wps
[Epoch 160 Batch 60/173] avg loss 0.00054029, throughput 2.85851K wps
[Epoch 160 Batch 90/173] avg loss 0.000569893, throughput 2.85324K wps
[Epoch 160 Batch 120/173] avg loss 0.000578491, throughput 2.86226K wps
[Epoch 160 Batch 150/173] avg loss 0.000577701, throughput 2.83547K wps
Begin Testing...
[Epoch 160] train avg loss 0.000549758, dev acc 0.7643, dev avg loss 0.653276, throughput 2.85928K wps
[Epoch 161 Batch 30/173] avg loss 0.000563186, throughput 2.91011K wps
[Epoch 161 Batch 60/173] avg loss 0.000586084, throughput 2.87936K wps
[Epoch 161 Batch 90/173] avg loss 0.000541829, throughput 2.84529K wps
[Epoch 161 Batch 120/173] avg loss 0.000531075, throughput 2.81745K wps
[Epoch 161 Batch 150/173] avg loss 0.000514894, throughput 2.79428K wps
Begin Testing...
[Epoch 161] train avg loss 0.000554376, dev acc 0.7643, dev avg loss 0.651661, throughput 2.84372K wps
[Epoch 162 Batch 30/173] avg loss 0.000489555, throughput 2.92083K wps
[Epoch 162 Batch 60/173] avg loss 0.000540734, throughput 2.85285K wps
[Epoch 162 Batch 90/173] avg loss 0.00053364, throughput 2.84304K wps
[Epoch 162 Batch 120/173] avg loss 0.000513424, throughput 2.8451K wps
[Epoch 162 Batch 150/173] avg loss 0.000490359, throughput 2.84069K wps
Begin Testing...
[Epoch 162] train avg loss 0.000520399, dev acc 0.7591, dev avg loss 0.662749, throughput 2.86145K wps
[Epoch 163 Batch 30/173] avg loss 0.000507142, throughput 2.93058K wps
[Epoch 163 Batch 60/173] avg loss 0.000496718, throughput 2.85944K wps
[Epoch 163 Batch 90/173] avg loss 0.000530765, throughput 2.85683K wps
[Epoch 163 Batch 120/173] avg loss 0.00054818, throughput 2.81435K wps
[Epoch 163 Batch 150/173] avg loss 0.000540228, throughput 2.87518K wps
Begin Testing...
[Epoch 163] train avg loss 0.000530533, dev acc 0.7591, dev avg loss 0.661066, throughput 2.86794K wps
[Epoch 164 Batch 30/173] avg loss 0.000482183, throughput 2.85635K wps
[Epoch 164 Batch 60/173] avg loss 0.000521526, throughput 2.83626K wps
[Epoch 164 Batch 90/173] avg loss 0.000512957, throughput 2.85097K wps
[Epoch 164 Batch 120/173] avg loss 0.000541019, throughput 2.87507K wps
[Epoch 164 Batch 150/173] avg loss 0.000492544, throughput 2.84909K wps
Begin Testing...
[Epoch 164] train avg loss 0.000512522, dev acc 0.7591, dev avg loss 0.663879, throughput 2.85618K wps
[Epoch 165 Batch 30/173] avg loss 0.000539049, throughput 2.94439K wps
[Epoch 165 Batch 60/173] avg loss 0.000492486, throughput 2.87251K wps
[Epoch 165 Batch 90/173] avg loss 0.000507351, throughput 2.87472K wps
[Epoch 165 Batch 120/173] avg loss 0.000555427, throughput 2.8704K wps
[Epoch 165 Batch 150/173] avg loss 0.000530831, throughput 2.87389K wps
Begin Testing...
[Epoch 165] train avg loss 0.000524823, dev acc 0.7560, dev avg loss 0.686265, throughput 2.88139K wps
[Epoch 166 Batch 30/173] avg loss 0.000454544, throughput 2.86527K wps
[Epoch 166 Batch 60/173] avg loss 0.000479696, throughput 2.83481K wps
[Epoch 166 Batch 90/173] avg loss 0.000515557, throughput 2.8663K wps
[Epoch 166 Batch 120/173] avg loss 0.000565387, throughput 2.86522K wps
[Epoch 166 Batch 150/173] avg loss 0.000557445, throughput 2.82733K wps
Begin Testing...
[Epoch 166] train avg loss 0.000514848, dev acc 0.7591, dev avg loss 0.673569, throughput 2.85279K wps
[Epoch 167 Batch 30/173] avg loss 0.000454323, throughput 2.93465K wps
[Epoch 167 Batch 60/173] avg loss 0.000513719, throughput 2.82616K wps
[Epoch 167 Batch 90/173] avg loss 0.000560447, throughput 2.81692K wps
[Epoch 167 Batch 120/173] avg loss 0.000527449, throughput 2.86529K wps
[Epoch 167 Batch 150/173] avg loss 0.000557255, throughput 2.80137K wps
Begin Testing...
[Epoch 167] train avg loss 0.00052068, dev acc 0.7581, dev avg loss 0.66697, throughput 2.8447K wps
[Epoch 168 Batch 30/173] avg loss 0.000475937, throughput 2.87838K wps
[Epoch 168 Batch 60/173] avg loss 0.000462142, throughput 2.87099K wps
[Epoch 168 Batch 90/173] avg loss 0.000508129, throughput 2.79991K wps
[Epoch 168 Batch 120/173] avg loss 0.00048899, throughput 2.8314K wps
[Epoch 168 Batch 150/173] avg loss 0.000502719, throughput 2.80608K wps
Begin Testing...
[Epoch 168] train avg loss 0.000486289, dev acc 0.7623, dev avg loss 0.671421, throughput 2.83708K wps
[Epoch 169 Batch 30/173] avg loss 0.00047924, throughput 2.89375K wps
[Epoch 169 Batch 60/173] avg loss 0.000510153, throughput 2.87368K wps
[Epoch 169 Batch 90/173] avg loss 0.000459575, throughput 2.87264K wps
[Epoch 169 Batch 120/173] avg loss 0.000477411, throughput 2.87053K wps
[Epoch 169 Batch 150/173] avg loss 0.00046667, throughput 2.85684K wps
Begin Testing...
[Epoch 169] train avg loss 0.000470332, dev acc 0.7623, dev avg loss 0.674157, throughput 2.86974K wps
[Epoch 170 Batch 30/173] avg loss 0.000494731, throughput 2.89248K wps
[Epoch 170 Batch 60/173] avg loss 0.00044401, throughput 2.8016K wps
[Epoch 170 Batch 90/173] avg loss 0.000492301, throughput 2.82947K wps
[Epoch 170 Batch 120/173] avg loss 0.00042653, throughput 2.86966K wps
[Epoch 170 Batch 150/173] avg loss 0.000451971, throughput 2.8512K wps
Begin Testing...
[Epoch 170] train avg loss 0.00046018, dev acc 0.7612, dev avg loss 0.684746, throughput 2.85145K wps
[Epoch 171 Batch 30/173] avg loss 0.000484036, throughput 2.85333K wps
[Epoch 171 Batch 60/173] avg loss 0.000497129, throughput 2.79704K wps
[Epoch 171 Batch 90/173] avg loss 0.000467354, throughput 2.81949K wps
[Epoch 171 Batch 120/173] avg loss 0.000433367, throughput 2.87972K wps
[Epoch 171 Batch 150/173] avg loss 0.000468851, throughput 2.87048K wps
Begin Testing...
[Epoch 171] train avg loss 0.000465953, dev acc 0.7612, dev avg loss 0.683572, throughput 2.84626K wps
[Epoch 172 Batch 30/173] avg loss 0.00044031, throughput 2.91501K wps
[Epoch 172 Batch 60/173] avg loss 0.00042638, throughput 2.85473K wps
[Epoch 172 Batch 90/173] avg loss 0.000415401, throughput 2.86877K wps
[Epoch 172 Batch 120/173] avg loss 0.000447822, throughput 2.86759K wps
[Epoch 172 Batch 150/173] avg loss 0.000508823, throughput 2.8747K wps
Begin Testing...
[Epoch 172] train avg loss 0.000446785, dev acc 0.7560, dev avg loss 0.686119, throughput 2.87285K wps
[Epoch 173 Batch 30/173] avg loss 0.000467807, throughput 2.8764K wps
[Epoch 173 Batch 60/173] avg loss 0.000426704, throughput 2.78049K wps
[Epoch 173 Batch 90/173] avg loss 0.000417188, throughput 2.87187K wps
[Epoch 173 Batch 120/173] avg loss 0.000454802, throughput 2.87902K wps
[Epoch 173 Batch 150/173] avg loss 0.000438894, throughput 2.8356K wps
Begin Testing...
[Epoch 173] train avg loss 0.00043031, dev acc 0.7591, dev avg loss 0.69457, throughput 2.84976K wps
[Epoch 174 Batch 30/173] avg loss 0.000408938, throughput 2.8954K wps
[Epoch 174 Batch 60/173] avg loss 0.000437853, throughput 2.87785K wps
[Epoch 174 Batch 90/173] avg loss 0.000440058, throughput 2.87133K wps
[Epoch 174 Batch 120/173] avg loss 0.000441542, throughput 2.78311K wps
[Epoch 174 Batch 150/173] avg loss 0.000423392, throughput 2.79392K wps
Begin Testing...
[Epoch 174] train avg loss 0.000445428, dev acc 0.7539, dev avg loss 0.696833, throughput 2.84206K wps
[Epoch 175 Batch 30/173] avg loss 0.000450066, throughput 2.8785K wps
[Epoch 175 Batch 60/173] avg loss 0.000420797, throughput 2.79064K wps
[Epoch 175 Batch 90/173] avg loss 0.000433726, throughput 2.82639K wps
[Epoch 175 Batch 120/173] avg loss 0.000481446, throughput 2.86755K wps
[Epoch 175 Batch 150/173] avg loss 0.000450406, throughput 2.81002K wps
Begin Testing...
[Epoch 175] train avg loss 0.000442371, dev acc 0.7602, dev avg loss 0.686005, throughput 2.82788K wps
[Epoch 176 Batch 30/173] avg loss 0.000382277, throughput 2.8876K wps
[Epoch 176 Batch 60/173] avg loss 0.000409993, throughput 2.86796K wps
[Epoch 176 Batch 90/173] avg loss 0.000478035, throughput 2.84031K wps
[Epoch 176 Batch 120/173] avg loss 0.000430066, throughput 2.79506K wps
[Epoch 176 Batch 150/173] avg loss 0.000403315, throughput 2.87841K wps
Begin Testing...
[Epoch 176] train avg loss 0.000417984, dev acc 0.7539, dev avg loss 0.700181, throughput 2.85569K wps
[Epoch 177 Batch 30/173] avg loss 0.000466989, throughput 2.8764K wps
[Epoch 177 Batch 60/173] avg loss 0.000389705, throughput 2.80664K wps
[Epoch 177 Batch 90/173] avg loss 0.000374449, throughput 2.86584K wps
[Epoch 177 Batch 120/173] avg loss 0.000435451, throughput 2.83951K wps
[Epoch 177 Batch 150/173] avg loss 0.000387674, throughput 2.87093K wps
Begin Testing...
[Epoch 177] train avg loss 0.000414489, dev acc 0.7591, dev avg loss 0.694954, throughput 2.8507K wps
[Epoch 178 Batch 30/173] avg loss 0.000408589, throughput 2.86641K wps
[Epoch 178 Batch 60/173] avg loss 0.000391438, throughput 2.79911K wps
[Epoch 178 Batch 90/173] avg loss 0.000412592, throughput 2.87198K wps
[Epoch 178 Batch 120/173] avg loss 0.000395547, throughput 2.83523K wps
[Epoch 178 Batch 150/173] avg loss 0.000446269, throughput 2.857K wps
Begin Testing...
[Epoch 178] train avg loss 0.000410226, dev acc 0.7570, dev avg loss 0.696532, throughput 2.84158K wps
[Epoch 179 Batch 30/173] avg loss 0.000402411, throughput 2.88592K wps
[Epoch 179 Batch 60/173] avg loss 0.000422361, throughput 2.85184K wps
[Epoch 179 Batch 90/173] avg loss 0.000402805, throughput 2.85881K wps
[Epoch 179 Batch 120/173] avg loss 0.000354566, throughput 2.84278K wps
[Epoch 179 Batch 150/173] avg loss 0.00039971, throughput 2.87908K wps
Begin Testing...
[Epoch 179] train avg loss 0.000404665, dev acc 0.7560, dev avg loss 0.702185, throughput 2.86358K wps
[Epoch 180 Batch 30/173] avg loss 0.000411283, throughput 2.86528K wps
[Epoch 180 Batch 60/173] avg loss 0.000373272, throughput 2.83645K wps
[Epoch 180 Batch 90/173] avg loss 0.000373432, throughput 2.8366K wps
[Epoch 180 Batch 120/173] avg loss 0.000402365, throughput 2.82944K wps
[Epoch 180 Batch 150/173] avg loss 0.00036778, throughput 2.83241K wps
Begin Testing...
[Epoch 180] train avg loss 0.000388581, dev acc 0.7550, dev avg loss 0.694688, throughput 2.84549K wps
[Epoch 181 Batch 30/173] avg loss 0.000372763, throughput 2.93911K wps
[Epoch 181 Batch 60/173] avg loss 0.000387304, throughput 2.87599K wps
[Epoch 181 Batch 90/173] avg loss 0.000372149, throughput 2.84183K wps
[Epoch 181 Batch 120/173] avg loss 0.000354759, throughput 2.85467K wps
[Epoch 181 Batch 150/173] avg loss 0.000424698, throughput 2.79803K wps
Begin Testing...
[Epoch 181] train avg loss 0.000384118, dev acc 0.7633, dev avg loss 0.695941, throughput 2.85221K wps
[Epoch 182 Batch 30/173] avg loss 0.000380816, throughput 2.92688K wps
[Epoch 182 Batch 60/173] avg loss 0.00036574, throughput 2.85042K wps
[Epoch 182 Batch 90/173] avg loss 0.000375165, throughput 2.86962K wps
[Epoch 182 Batch 120/173] avg loss 0.000355366, throughput 2.86373K wps
[Epoch 182 Batch 150/173] avg loss 0.000432152, throughput 2.87333K wps
Begin Testing...
[Epoch 182] train avg loss 0.000387869, dev acc 0.7539, dev avg loss 0.696871, throughput 2.86808K wps
[Epoch 183 Batch 30/173] avg loss 0.000322267, throughput 2.85564K wps
[Epoch 183 Batch 60/173] avg loss 0.00036079, throughput 2.8097K wps
[Epoch 183 Batch 90/173] avg loss 0.00032619, throughput 2.79569K wps
[Epoch 183 Batch 120/173] avg loss 0.000353888, throughput 2.8605K wps
[Epoch 183 Batch 150/173] avg loss 0.000385485, throughput 2.83278K wps
Begin Testing...
[Epoch 183] train avg loss 0.000353078, dev acc 0.7602, dev avg loss 0.700504, throughput 2.8325K wps
[Epoch 184 Batch 30/173] avg loss 0.000369639, throughput 2.86275K wps
[Epoch 184 Batch 60/173] avg loss 0.000333555, throughput 2.81318K wps
[Epoch 184 Batch 90/173] avg loss 0.000381211, throughput 2.86731K wps
[Epoch 184 Batch 120/173] avg loss 0.000402062, throughput 2.87827K wps
[Epoch 184 Batch 150/173] avg loss 0.000333346, throughput 2.86957K wps
Begin Testing...
[Epoch 184] train avg loss 0.000361537, dev acc 0.7581, dev avg loss 0.704008, throughput 2.85486K wps
[Epoch 185 Batch 30/173] avg loss 0.000357356, throughput 2.92227K wps
[Epoch 185 Batch 60/173] avg loss 0.000371755, throughput 2.87242K wps
[Epoch 185 Batch 90/173] avg loss 0.000367794, throughput 2.83199K wps
[Epoch 185 Batch 120/173] avg loss 0.000334654, throughput 2.85783K wps
[Epoch 185 Batch 150/173] avg loss 0.000372672, throughput 2.83053K wps
Begin Testing...
[Epoch 185] train avg loss 0.000362814, dev acc 0.7560, dev avg loss 0.705481, throughput 2.86256K wps
[Epoch 186 Batch 30/173] avg loss 0.000402427, throughput 2.8807K wps
[Epoch 186 Batch 60/173] avg loss 0.00036064, throughput 2.86442K wps
[Epoch 186 Batch 90/173] avg loss 0.000357484, throughput 2.86687K wps
[Epoch 186 Batch 120/173] avg loss 0.000340305, throughput 2.86033K wps
[Epoch 186 Batch 150/173] avg loss 0.000330047, throughput 2.87134K wps
Begin Testing...
[Epoch 186] train avg loss 0.000362731, dev acc 0.7529, dev avg loss 0.715251, throughput 2.86788K wps
[Epoch 187 Batch 30/173] avg loss 0.000365122, throughput 2.92596K wps
[Epoch 187 Batch 60/173] avg loss 0.000327291, throughput 2.86387K wps
[Epoch 187 Batch 90/173] avg loss 0.000350211, throughput 2.8604K wps
[Epoch 187 Batch 120/173] avg loss 0.000353024, throughput 2.85248K wps
[Epoch 187 Batch 150/173] avg loss 0.000421229, throughput 2.78778K wps
Begin Testing...
[Epoch 187] train avg loss 0.000361644, dev acc 0.7623, dev avg loss 0.710054, throughput 2.84669K wps
[Epoch 188 Batch 30/173] avg loss 0.00034796, throughput 2.87695K wps
[Epoch 188 Batch 60/173] avg loss 0.000311088, throughput 2.87006K wps
[Epoch 188 Batch 90/173] avg loss 0.000374477, throughput 2.7934K wps
[Epoch 188 Batch 120/173] avg loss 0.000356032, throughput 2.81132K wps
[Epoch 188 Batch 150/173] avg loss 0.000371106, throughput 2.8388K wps
Begin Testing...
[Epoch 188] train avg loss 0.000354074, dev acc 0.7529, dev avg loss 0.713231, throughput 2.84301K wps
[Epoch 189 Batch 30/173] avg loss 0.000336221, throughput 2.85965K wps
[Epoch 189 Batch 60/173] avg loss 0.000364712, throughput 2.83177K wps
[Epoch 189 Batch 90/173] avg loss 0.000381376, throughput 2.86222K wps
[Epoch 189 Batch 120/173] avg loss 0.000364539, throughput 2.87272K wps
[Epoch 189 Batch 150/173] avg loss 0.000358056, throughput 2.83737K wps
Begin Testing...
[Epoch 189] train avg loss 0.000360924, dev acc 0.7560, dev avg loss 0.711901, throughput 2.85579K wps
[Epoch 190 Batch 30/173] avg loss 0.000347886, throughput 2.94045K wps
[Epoch 190 Batch 60/173] avg loss 0.00033295, throughput 2.64561K wps
[Epoch 190 Batch 90/173] avg loss 0.000339859, throughput 2.87187K wps
[Epoch 190 Batch 120/173] avg loss 0.000341305, throughput 2.84657K wps
[Epoch 190 Batch 150/173] avg loss 0.000369424, throughput 2.79057K wps
Begin Testing...
[Epoch 190] train avg loss 0.000350734, dev acc 0.7623, dev avg loss 0.710971, throughput 2.81986K wps
[Epoch 191 Batch 30/173] avg loss 0.00035275, throughput 2.90028K wps
[Epoch 191 Batch 60/173] avg loss 0.000330401, throughput 2.8541K wps
[Epoch 191 Batch 90/173] avg loss 0.000292908, throughput 2.84618K wps
[Epoch 191 Batch 120/173] avg loss 0.000305752, throughput 2.85463K wps
[Epoch 191 Batch 150/173] avg loss 0.000355581, throughput 2.86331K wps
Begin Testing...
[Epoch 191] train avg loss 0.000324718, dev acc 0.7612, dev avg loss 0.716157, throughput 2.86213K wps
[Epoch 192 Batch 30/173] avg loss 0.000322712, throughput 2.93265K wps
[Epoch 192 Batch 60/173] avg loss 0.00030331, throughput 2.84834K wps
[Epoch 192 Batch 90/173] avg loss 0.000373863, throughput 2.87692K wps
[Epoch 192 Batch 120/173] avg loss 0.000359978, throughput 2.87972K wps
[Epoch 192 Batch 150/173] avg loss 0.000313972, throughput 2.83326K wps
Begin Testing...
[Epoch 192] train avg loss 0.000332955, dev acc 0.7550, dev avg loss 0.712311, throughput 2.86265K wps
[Epoch 193 Batch 30/173] avg loss 0.000350382, throughput 2.94098K wps
[Epoch 193 Batch 60/173] avg loss 0.000302455, throughput 2.84862K wps
[Epoch 193 Batch 90/173] avg loss 0.000281266, throughput 2.78865K wps
[Epoch 193 Batch 120/173] avg loss 0.000384448, throughput 2.82203K wps
[Epoch 193 Batch 150/173] avg loss 0.000383022, throughput 2.78048K wps
Begin Testing...
[Epoch 193] train avg loss 0.000339893, dev acc 0.7529, dev avg loss 0.729495, throughput 2.83324K wps
[Epoch 194 Batch 30/173] avg loss 0.000298309, throughput 2.90766K wps
[Epoch 194 Batch 60/173] avg loss 0.000301554, throughput 2.82146K wps
[Epoch 194 Batch 90/173] avg loss 0.000387045, throughput 2.8423K wps
[Epoch 194 Batch 120/173] avg loss 0.00034037, throughput 2.81755K wps
[Epoch 194 Batch 150/173] avg loss 0.000324712, throughput 2.84824K wps
Begin Testing...
[Epoch 194] train avg loss 0.000331662, dev acc 0.7570, dev avg loss 0.720122, throughput 2.84788K wps
[Epoch 195 Batch 30/173] avg loss 0.000360475, throughput 2.92277K wps
[Epoch 195 Batch 60/173] avg loss 0.000319664, throughput 2.81924K wps
[Epoch 195 Batch 90/173] avg loss 0.000312618, throughput 2.87354K wps
[Epoch 195 Batch 120/173] avg loss 0.000342268, throughput 2.83151K wps
[Epoch 195 Batch 150/173] avg loss 0.000337155, throughput 2.79461K wps
Begin Testing...
[Epoch 195] train avg loss 0.000329973, dev acc 0.7591, dev avg loss 0.71842, throughput 2.85106K wps
[Epoch 196 Batch 30/173] avg loss 0.000305367, throughput 2.91886K wps
[Epoch 196 Batch 60/173] avg loss 0.000353206, throughput 2.78089K wps
[Epoch 196 Batch 90/173] avg loss 0.000326119, throughput 2.85261K wps
[Epoch 196 Batch 120/173] avg loss 0.000328934, throughput 2.85207K wps
[Epoch 196 Batch 150/173] avg loss 0.000313202, throughput 2.84323K wps
Begin Testing...
[Epoch 196] train avg loss 0.000325502, dev acc 0.7508, dev avg loss 0.728728, throughput 2.84828K wps
[Epoch 197 Batch 30/173] avg loss 0.000306554, throughput 2.87093K wps
[Epoch 197 Batch 60/173] avg loss 0.000318672, throughput 2.86762K wps
[Epoch 197 Batch 90/173] avg loss 0.000289109, throughput 2.85243K wps
[Epoch 197 Batch 120/173] avg loss 0.000343191, throughput 2.85177K wps
[Epoch 197 Batch 150/173] avg loss 0.000322194, throughput 2.824K wps
Begin Testing...
[Epoch 197] train avg loss 0.000316829, dev acc 0.7508, dev avg loss 0.733286, throughput 2.8534K wps
[Epoch 198 Batch 30/173] avg loss 0.000373384, throughput 2.85873K wps
[Epoch 198 Batch 60/173] avg loss 0.000345327, throughput 2.845K wps
[Epoch 198 Batch 90/173] avg loss 0.000331198, throughput 2.82843K wps
[Epoch 198 Batch 120/173] avg loss 0.000302456, throughput 2.84639K wps
[Epoch 198 Batch 150/173] avg loss 0.000246545, throughput 2.86415K wps
Begin Testing...
[Epoch 198] train avg loss 0.000324215, dev acc 0.7539, dev avg loss 0.735658, throughput 2.84724K wps
[Epoch 199 Batch 30/173] avg loss 0.000296927, throughput 2.91437K wps
[Epoch 199 Batch 60/173] avg loss 0.000300807, throughput 2.86015K wps
[Epoch 199 Batch 90/173] avg loss 0.000301847, throughput 2.83393K wps
[Epoch 199 Batch 120/173] avg loss 0.000302362, throughput 2.81283K wps
[Epoch 199 Batch 150/173] avg loss 0.000296247, throughput 2.87977K wps
Begin Testing...
[Epoch 199] train avg loss 0.000302262, dev acc 0.7570, dev avg loss 0.731076, throughput 2.86198K wps
Test loss 0.533977, test acc 0.7692
Total time cost 710.22s
[Epoch 0 Batch 30/173] avg loss 0.0138679, throughput 2.5018K wps
[Epoch 0 Batch 60/173] avg loss 0.0138752, throughput 2.78835K wps
[Epoch 0 Batch 90/173] avg loss 0.0138553, throughput 2.80111K wps
[Epoch 0 Batch 120/173] avg loss 0.0138726, throughput 2.86697K wps
[Epoch 0 Batch 150/173] avg loss 0.013858, throughput 2.86261K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138827, dev acc 0.5287, dev avg loss 0.691957, throughput 2.77169K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138332, throughput 2.87021K wps
[Epoch 1 Batch 60/173] avg loss 0.0138374, throughput 2.87984K wps
[Epoch 1 Batch 90/173] avg loss 0.0138388, throughput 2.87164K wps
[Epoch 1 Batch 120/173] avg loss 0.0138567, throughput 2.866K wps
[Epoch 1 Batch 150/173] avg loss 0.0138298, throughput 2.87426K wps
Begin Testing...
[Epoch 1] train avg loss 0.0138616, dev acc 0.5693, dev avg loss 0.691667, throughput 2.8722K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0138281, throughput 2.93778K wps
[Epoch 2 Batch 60/173] avg loss 0.0138017, throughput 2.86816K wps
[Epoch 2 Batch 90/173] avg loss 0.013825, throughput 2.84593K wps
[Epoch 2 Batch 120/173] avg loss 0.01381, throughput 2.87101K wps
[Epoch 2 Batch 150/173] avg loss 0.0137848, throughput 2.85943K wps
Begin Testing...
[Epoch 2] train avg loss 0.0138343, dev acc 0.5349, dev avg loss 0.690348, throughput 2.87578K wps
[Epoch 3 Batch 30/173] avg loss 0.013797, throughput 2.89087K wps
[Epoch 3 Batch 60/173] avg loss 0.0138152, throughput 2.82632K wps
[Epoch 3 Batch 90/173] avg loss 0.0138259, throughput 2.84675K wps
[Epoch 3 Batch 120/173] avg loss 0.0137966, throughput 2.87399K wps
[Epoch 3 Batch 150/173] avg loss 0.0137942, throughput 2.87545K wps
Begin Testing...
[Epoch 3] train avg loss 0.0138239, dev acc 0.5871, dev avg loss 0.689534, throughput 2.86446K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0138095, throughput 2.84978K wps
[Epoch 4 Batch 60/173] avg loss 0.0137573, throughput 2.78792K wps
[Epoch 4 Batch 90/173] avg loss 0.0137619, throughput 2.81445K wps
[Epoch 4 Batch 120/173] avg loss 0.0137395, throughput 2.86403K wps
[Epoch 4 Batch 150/173] avg loss 0.0137352, throughput 2.8691K wps
Begin Testing...
[Epoch 4] train avg loss 0.0137794, dev acc 0.5766, dev avg loss 0.688388, throughput 2.84101K wps
[Epoch 5 Batch 30/173] avg loss 0.0137139, throughput 2.88497K wps
[Epoch 5 Batch 60/173] avg loss 0.0137284, throughput 2.85825K wps
[Epoch 5 Batch 90/173] avg loss 0.0137549, throughput 2.83068K wps
[Epoch 5 Batch 120/173] avg loss 0.0137009, throughput 2.8331K wps
[Epoch 5 Batch 150/173] avg loss 0.0136667, throughput 2.87163K wps
Begin Testing...
[Epoch 5] train avg loss 0.013728, dev acc 0.5808, dev avg loss 0.68782, throughput 2.85365K wps
[Epoch 6 Batch 30/173] avg loss 0.0137026, throughput 2.9067K wps
[Epoch 6 Batch 60/173] avg loss 0.0136717, throughput 2.8248K wps
[Epoch 6 Batch 90/173] avg loss 0.0137213, throughput 2.82371K wps
[Epoch 6 Batch 120/173] avg loss 0.0136859, throughput 2.83381K wps
[Epoch 6 Batch 150/173] avg loss 0.0137102, throughput 2.8522K wps
Begin Testing...
[Epoch 6] train avg loss 0.0137111, dev acc 0.5631, dev avg loss 0.685702, throughput 2.84643K wps
[Epoch 7 Batch 30/173] avg loss 0.0136637, throughput 2.88601K wps
[Epoch 7 Batch 60/173] avg loss 0.0136625, throughput 2.84155K wps
[Epoch 7 Batch 90/173] avg loss 0.013668, throughput 2.87694K wps
[Epoch 7 Batch 120/173] avg loss 0.0136357, throughput 2.88025K wps
[Epoch 7 Batch 150/173] avg loss 0.0136502, throughput 2.794K wps
Begin Testing...
[Epoch 7] train avg loss 0.0136801, dev acc 0.5662, dev avg loss 0.684111, throughput 2.84982K wps
[Epoch 8 Batch 30/173] avg loss 0.0136697, throughput 2.89987K wps
[Epoch 8 Batch 60/173] avg loss 0.0136163, throughput 2.83416K wps
[Epoch 8 Batch 90/173] avg loss 0.0136247, throughput 2.79448K wps
[Epoch 8 Batch 120/173] avg loss 0.0136391, throughput 2.82118K wps
[Epoch 8 Batch 150/173] avg loss 0.0136822, throughput 2.81656K wps
Begin Testing...
[Epoch 8] train avg loss 0.0136608, dev acc 0.5662, dev avg loss 0.683529, throughput 2.82936K wps
[Epoch 9 Batch 30/173] avg loss 0.0136234, throughput 2.8592K wps
[Epoch 9 Batch 60/173] avg loss 0.0136238, throughput 2.81873K wps
[Epoch 9 Batch 90/173] avg loss 0.0136602, throughput 2.87514K wps
[Epoch 9 Batch 120/173] avg loss 0.0136074, throughput 2.87476K wps
[Epoch 9 Batch 150/173] avg loss 0.0135815, throughput 2.8508K wps
Begin Testing...
[Epoch 9] train avg loss 0.0136273, dev acc 0.5662, dev avg loss 0.682484, throughput 2.85798K wps
[Epoch 10 Batch 30/173] avg loss 0.0135664, throughput 2.94308K wps
[Epoch 10 Batch 60/173] avg loss 0.0135988, throughput 2.85442K wps
[Epoch 10 Batch 90/173] avg loss 0.0135751, throughput 2.79314K wps
[Epoch 10 Batch 120/173] avg loss 0.0136153, throughput 2.8115K wps
[Epoch 10 Batch 150/173] avg loss 0.0135622, throughput 2.81085K wps
Begin Testing...
[Epoch 10] train avg loss 0.0135981, dev acc 0.5673, dev avg loss 0.682366, throughput 2.84533K wps
[Epoch 11 Batch 30/173] avg loss 0.0135064, throughput 2.89042K wps
[Epoch 11 Batch 60/173] avg loss 0.013597, throughput 2.87307K wps
[Epoch 11 Batch 90/173] avg loss 0.0135366, throughput 2.88091K wps
[Epoch 11 Batch 120/173] avg loss 0.0136239, throughput 2.82121K wps
[Epoch 11 Batch 150/173] avg loss 0.0136006, throughput 2.79534K wps
Begin Testing...
[Epoch 11] train avg loss 0.013584, dev acc 0.5714, dev avg loss 0.67985, throughput 2.8429K wps
[Epoch 12 Batch 30/173] avg loss 0.0135613, throughput 2.8584K wps
[Epoch 12 Batch 60/173] avg loss 0.013513, throughput 2.83821K wps
[Epoch 12 Batch 90/173] avg loss 0.0135269, throughput 2.85598K wps
[Epoch 12 Batch 120/173] avg loss 0.0135665, throughput 2.87342K wps
[Epoch 12 Batch 150/173] avg loss 0.0134657, throughput 2.87196K wps
Begin Testing...
[Epoch 12] train avg loss 0.0135455, dev acc 0.5673, dev avg loss 0.679977, throughput 2.86071K wps
[Epoch 13 Batch 30/173] avg loss 0.0136175, throughput 2.85807K wps
[Epoch 13 Batch 60/173] avg loss 0.013452, throughput 2.87773K wps
[Epoch 13 Batch 90/173] avg loss 0.0134421, throughput 2.87111K wps
[Epoch 13 Batch 120/173] avg loss 0.0134152, throughput 2.8493K wps
[Epoch 13 Batch 150/173] avg loss 0.0135532, throughput 2.85252K wps
Begin Testing...
[Epoch 13] train avg loss 0.0135168, dev acc 0.5714, dev avg loss 0.67826, throughput 2.86376K wps
[Epoch 14 Batch 30/173] avg loss 0.0134865, throughput 2.92958K wps
[Epoch 14 Batch 60/173] avg loss 0.013459, throughput 2.87947K wps
[Epoch 14 Batch 90/173] avg loss 0.0134602, throughput 2.85013K wps
[Epoch 14 Batch 120/173] avg loss 0.0134568, throughput 2.84564K wps
[Epoch 14 Batch 150/173] avg loss 0.0134334, throughput 2.87458K wps
Begin Testing...
[Epoch 14] train avg loss 0.0134865, dev acc 0.5725, dev avg loss 0.677375, throughput 2.86941K wps
[Epoch 15 Batch 30/173] avg loss 0.0135163, throughput 2.86615K wps
[Epoch 15 Batch 60/173] avg loss 0.0133365, throughput 2.80849K wps
[Epoch 15 Batch 90/173] avg loss 0.0134622, throughput 2.85142K wps
[Epoch 15 Batch 120/173] avg loss 0.0134207, throughput 2.86536K wps
[Epoch 15 Batch 150/173] avg loss 0.0134316, throughput 2.87657K wps
Begin Testing...
[Epoch 15] train avg loss 0.0134533, dev acc 0.5766, dev avg loss 0.677596, throughput 2.85652K wps
[Epoch 16 Batch 30/173] avg loss 0.0133664, throughput 2.86393K wps
[Epoch 16 Batch 60/173] avg loss 0.0133536, throughput 2.87779K wps
[Epoch 16 Batch 90/173] avg loss 0.0134423, throughput 2.84887K wps
[Epoch 16 Batch 120/173] avg loss 0.0133278, throughput 2.84376K wps
[Epoch 16 Batch 150/173] avg loss 0.0134173, throughput 2.87125K wps
Begin Testing...
[Epoch 16] train avg loss 0.0134018, dev acc 0.5850, dev avg loss 0.67512, throughput 2.86273K wps
[Epoch 17 Batch 30/173] avg loss 0.0133791, throughput 2.84731K wps
[Epoch 17 Batch 60/173] avg loss 0.0133118, throughput 2.84392K wps
[Epoch 17 Batch 90/173] avg loss 0.0133729, throughput 2.874K wps
[Epoch 17 Batch 120/173] avg loss 0.0133472, throughput 2.86012K wps
[Epoch 17 Batch 150/173] avg loss 0.0133659, throughput 2.81929K wps
Begin Testing...
[Epoch 17] train avg loss 0.0133678, dev acc 0.5975, dev avg loss 0.673909, throughput 2.84581K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.0134008, throughput 2.92911K wps
[Epoch 18 Batch 60/173] avg loss 0.0132005, throughput 2.86432K wps
[Epoch 18 Batch 90/173] avg loss 0.013321, throughput 2.82089K wps
[Epoch 18 Batch 120/173] avg loss 0.0132731, throughput 2.80293K wps
[Epoch 18 Batch 150/173] avg loss 0.0133398, throughput 2.81935K wps
Begin Testing...
[Epoch 18] train avg loss 0.0133333, dev acc 0.5881, dev avg loss 0.673134, throughput 2.84862K wps
[Epoch 19 Batch 30/173] avg loss 0.0131934, throughput 2.92487K wps
[Epoch 19 Batch 60/173] avg loss 0.0132422, throughput 2.87421K wps
[Epoch 19 Batch 90/173] avg loss 0.0132233, throughput 2.87353K wps
[Epoch 19 Batch 120/173] avg loss 0.0132297, throughput 2.86743K wps
[Epoch 19 Batch 150/173] avg loss 0.0133552, throughput 2.86252K wps
Begin Testing...
[Epoch 19] train avg loss 0.0132792, dev acc 0.5985, dev avg loss 0.671288, throughput 2.8773K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/173] avg loss 0.0132677, throughput 2.86979K wps
[Epoch 20 Batch 60/173] avg loss 0.0132043, throughput 2.84702K wps
[Epoch 20 Batch 90/173] avg loss 0.0131491, throughput 2.81298K wps
[Epoch 20 Batch 120/173] avg loss 0.0132254, throughput 2.79254K wps
[Epoch 20 Batch 150/173] avg loss 0.0132672, throughput 2.79779K wps
Begin Testing...
[Epoch 20] train avg loss 0.0132346, dev acc 0.5881, dev avg loss 0.671746, throughput 2.82365K wps
[Epoch 21 Batch 30/173] avg loss 0.0132946, throughput 2.92157K wps
[Epoch 21 Batch 60/173] avg loss 0.0132195, throughput 2.86574K wps
[Epoch 21 Batch 90/173] avg loss 0.0130677, throughput 2.86117K wps
[Epoch 21 Batch 120/173] avg loss 0.0132583, throughput 2.86664K wps
[Epoch 21 Batch 150/173] avg loss 0.0130579, throughput 2.86648K wps
Begin Testing...
[Epoch 21] train avg loss 0.0131802, dev acc 0.5954, dev avg loss 0.669568, throughput 2.86357K wps
[Epoch 22 Batch 30/173] avg loss 0.0131788, throughput 2.91013K wps
[Epoch 22 Batch 60/173] avg loss 0.0131741, throughput 2.80573K wps
[Epoch 22 Batch 90/173] avg loss 0.0131404, throughput 2.86858K wps
[Epoch 22 Batch 120/173] avg loss 0.0131105, throughput 2.87025K wps
[Epoch 22 Batch 150/173] avg loss 0.0130427, throughput 2.86303K wps
Begin Testing...
[Epoch 22] train avg loss 0.0131379, dev acc 0.6027, dev avg loss 0.666697, throughput 2.86023K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/173] avg loss 0.0130942, throughput 2.90255K wps
[Epoch 23 Batch 60/173] avg loss 0.0130167, throughput 2.79107K wps
[Epoch 23 Batch 90/173] avg loss 0.0129412, throughput 2.85845K wps
[Epoch 23 Batch 120/173] avg loss 0.013148, throughput 2.85724K wps
[Epoch 23 Batch 150/173] avg loss 0.0131008, throughput 2.86822K wps
Begin Testing...
[Epoch 23] train avg loss 0.0130881, dev acc 0.6038, dev avg loss 0.665223, throughput 2.85538K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/173] avg loss 0.0130751, throughput 2.88282K wps
[Epoch 24 Batch 60/173] avg loss 0.0131028, throughput 2.81242K wps
[Epoch 24 Batch 90/173] avg loss 0.0130801, throughput 2.87446K wps
[Epoch 24 Batch 120/173] avg loss 0.0130149, throughput 2.87507K wps
[Epoch 24 Batch 150/173] avg loss 0.0129138, throughput 2.82641K wps
Begin Testing...
[Epoch 24] train avg loss 0.0130391, dev acc 0.6173, dev avg loss 0.664006, throughput 2.84758K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.0128653, throughput 2.91711K wps
[Epoch 25 Batch 60/173] avg loss 0.0129385, throughput 2.86502K wps
[Epoch 25 Batch 90/173] avg loss 0.0128427, throughput 2.87965K wps
[Epoch 25 Batch 120/173] avg loss 0.0129785, throughput 2.87734K wps
[Epoch 25 Batch 150/173] avg loss 0.0129967, throughput 2.86837K wps
Begin Testing...
[Epoch 25] train avg loss 0.0129672, dev acc 0.6131, dev avg loss 0.661466, throughput 2.87892K wps
[Epoch 26 Batch 30/173] avg loss 0.0129422, throughput 2.93391K wps
[Epoch 26 Batch 60/173] avg loss 0.0127573, throughput 2.85467K wps
[Epoch 26 Batch 90/173] avg loss 0.0127916, throughput 2.85358K wps
[Epoch 26 Batch 120/173] avg loss 0.0129318, throughput 2.80766K wps
[Epoch 26 Batch 150/173] avg loss 0.0128773, throughput 2.80618K wps
Begin Testing...
[Epoch 26] train avg loss 0.0128933, dev acc 0.6131, dev avg loss 0.659897, throughput 2.85338K wps
[Epoch 27 Batch 30/173] avg loss 0.0128481, throughput 2.92905K wps
[Epoch 27 Batch 60/173] avg loss 0.0128667, throughput 2.83383K wps
[Epoch 27 Batch 90/173] avg loss 0.0125622, throughput 2.8363K wps
[Epoch 27 Batch 120/173] avg loss 0.0128618, throughput 2.86203K wps
[Epoch 27 Batch 150/173] avg loss 0.0127974, throughput 2.82657K wps
Begin Testing...
[Epoch 27] train avg loss 0.0128163, dev acc 0.6131, dev avg loss 0.657566, throughput 2.85099K wps
[Epoch 28 Batch 30/173] avg loss 0.012825, throughput 2.92961K wps
[Epoch 28 Batch 60/173] avg loss 0.0127713, throughput 2.84673K wps
[Epoch 28 Batch 90/173] avg loss 0.0126203, throughput 2.8829K wps
[Epoch 28 Batch 120/173] avg loss 0.0127403, throughput 2.87538K wps
[Epoch 28 Batch 150/173] avg loss 0.0127227, throughput 2.86394K wps
Begin Testing...
[Epoch 28] train avg loss 0.0127582, dev acc 0.6194, dev avg loss 0.656009, throughput 2.87993K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.0127997, throughput 2.8973K wps
[Epoch 29 Batch 60/173] avg loss 0.0126035, throughput 2.83731K wps
[Epoch 29 Batch 90/173] avg loss 0.012628, throughput 2.87622K wps
[Epoch 29 Batch 120/173] avg loss 0.0126562, throughput 2.87533K wps
[Epoch 29 Batch 150/173] avg loss 0.0126432, throughput 2.87435K wps
Begin Testing...
[Epoch 29] train avg loss 0.0126762, dev acc 0.6350, dev avg loss 0.653232, throughput 2.87124K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/173] avg loss 0.0125483, throughput 2.93843K wps
[Epoch 30 Batch 60/173] avg loss 0.0126095, throughput 2.80088K wps
[Epoch 30 Batch 90/173] avg loss 0.0124557, throughput 2.8238K wps
[Epoch 30 Batch 120/173] avg loss 0.0125682, throughput 2.7893K wps
[Epoch 30 Batch 150/173] avg loss 0.0126923, throughput 2.83438K wps
Begin Testing...
[Epoch 30] train avg loss 0.0126074, dev acc 0.6382, dev avg loss 0.650883, throughput 2.83905K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/173] avg loss 0.0123462, throughput 2.88759K wps
[Epoch 31 Batch 60/173] avg loss 0.012548, throughput 2.87726K wps
[Epoch 31 Batch 90/173] avg loss 0.0124915, throughput 2.82035K wps
[Epoch 31 Batch 120/173] avg loss 0.0124922, throughput 2.86567K wps
[Epoch 31 Batch 150/173] avg loss 0.0125529, throughput 2.88098K wps
Begin Testing...
[Epoch 31] train avg loss 0.0125045, dev acc 0.6392, dev avg loss 0.648953, throughput 2.86643K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.0124643, throughput 2.9441K wps
[Epoch 32 Batch 60/173] avg loss 0.0125921, throughput 2.7979K wps
[Epoch 32 Batch 90/173] avg loss 0.0123449, throughput 2.78581K wps
[Epoch 32 Batch 120/173] avg loss 0.0124413, throughput 2.78562K wps
[Epoch 32 Batch 150/173] avg loss 0.0122204, throughput 2.84025K wps
Begin Testing...
[Epoch 32] train avg loss 0.0124235, dev acc 0.6423, dev avg loss 0.646849, throughput 2.83336K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/173] avg loss 0.012496, throughput 2.88158K wps
[Epoch 33 Batch 60/173] avg loss 0.012308, throughput 2.87395K wps
[Epoch 33 Batch 90/173] avg loss 0.0123155, throughput 2.85398K wps
[Epoch 33 Batch 120/173] avg loss 0.0123228, throughput 2.85333K wps
[Epoch 33 Batch 150/173] avg loss 0.0123354, throughput 2.8819K wps
Begin Testing...
[Epoch 33] train avg loss 0.0123483, dev acc 0.6559, dev avg loss 0.643098, throughput 2.86401K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.0123631, throughput 2.90737K wps
[Epoch 34 Batch 60/173] avg loss 0.0123057, throughput 2.84833K wps
[Epoch 34 Batch 90/173] avg loss 0.012201, throughput 2.86754K wps
[Epoch 34 Batch 120/173] avg loss 0.0122137, throughput 2.866K wps
[Epoch 34 Batch 150/173] avg loss 0.0122731, throughput 2.86408K wps
Begin Testing...
[Epoch 34] train avg loss 0.0122724, dev acc 0.6455, dev avg loss 0.641501, throughput 2.87059K wps
[Epoch 35 Batch 30/173] avg loss 0.0122646, throughput 2.93502K wps
[Epoch 35 Batch 60/173] avg loss 0.0122524, throughput 2.87585K wps
[Epoch 35 Batch 90/173] avg loss 0.0120691, throughput 2.87675K wps
[Epoch 35 Batch 120/173] avg loss 0.0122873, throughput 2.87719K wps
[Epoch 35 Batch 150/173] avg loss 0.0121045, throughput 2.86234K wps
Begin Testing...
[Epoch 35] train avg loss 0.0121833, dev acc 0.6465, dev avg loss 0.637831, throughput 2.8821K wps
[Epoch 36 Batch 30/173] avg loss 0.012009, throughput 2.86237K wps
[Epoch 36 Batch 60/173] avg loss 0.0118945, throughput 2.87803K wps
[Epoch 36 Batch 90/173] avg loss 0.0118651, throughput 2.8671K wps
[Epoch 36 Batch 120/173] avg loss 0.0122484, throughput 2.87179K wps
[Epoch 36 Batch 150/173] avg loss 0.0119684, throughput 2.86805K wps
Begin Testing...
[Epoch 36] train avg loss 0.0120156, dev acc 0.6590, dev avg loss 0.634759, throughput 2.86897K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/173] avg loss 0.0119201, throughput 2.93482K wps
[Epoch 37 Batch 60/173] avg loss 0.0120449, throughput 2.85774K wps
[Epoch 37 Batch 90/173] avg loss 0.0118792, throughput 2.85659K wps
[Epoch 37 Batch 120/173] avg loss 0.011994, throughput 2.86332K wps
[Epoch 37 Batch 150/173] avg loss 0.0118964, throughput 2.87783K wps
Begin Testing...
[Epoch 37] train avg loss 0.0119794, dev acc 0.6601, dev avg loss 0.631778, throughput 2.87815K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/173] avg loss 0.0118965, throughput 2.88437K wps
[Epoch 38 Batch 60/173] avg loss 0.0117494, throughput 2.81562K wps
[Epoch 38 Batch 90/173] avg loss 0.0120089, throughput 2.8257K wps
[Epoch 38 Batch 120/173] avg loss 0.0116986, throughput 2.83342K wps
[Epoch 38 Batch 150/173] avg loss 0.0119253, throughput 2.79196K wps
Begin Testing...
[Epoch 38] train avg loss 0.0118614, dev acc 0.6601, dev avg loss 0.628957, throughput 2.82517K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.0118536, throughput 2.91106K wps
[Epoch 39 Batch 60/173] avg loss 0.0119195, throughput 2.87069K wps
[Epoch 39 Batch 90/173] avg loss 0.0117747, throughput 2.86094K wps
[Epoch 39 Batch 120/173] avg loss 0.0116288, throughput 2.84042K wps
[Epoch 39 Batch 150/173] avg loss 0.0115976, throughput 2.82312K wps
Begin Testing...
[Epoch 39] train avg loss 0.0117551, dev acc 0.6569, dev avg loss 0.627466, throughput 2.85707K wps
[Epoch 40 Batch 30/173] avg loss 0.0116356, throughput 2.90208K wps
[Epoch 40 Batch 60/173] avg loss 0.0114572, throughput 2.84837K wps
[Epoch 40 Batch 90/173] avg loss 0.0117605, throughput 2.87682K wps
[Epoch 40 Batch 120/173] avg loss 0.0115494, throughput 2.87597K wps
[Epoch 40 Batch 150/173] avg loss 0.0117538, throughput 2.87381K wps
Begin Testing...
[Epoch 40] train avg loss 0.0116379, dev acc 0.6580, dev avg loss 0.624648, throughput 2.87609K wps
[Epoch 41 Batch 30/173] avg loss 0.0118209, throughput 2.89534K wps
[Epoch 41 Batch 60/173] avg loss 0.0113878, throughput 2.8732K wps
[Epoch 41 Batch 90/173] avg loss 0.0114686, throughput 2.82373K wps
[Epoch 41 Batch 120/173] avg loss 0.0111938, throughput 2.86328K wps
[Epoch 41 Batch 150/173] avg loss 0.0117093, throughput 2.82512K wps
Begin Testing...
[Epoch 41] train avg loss 0.0115313, dev acc 0.6767, dev avg loss 0.617907, throughput 2.85659K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/173] avg loss 0.0115237, throughput 2.90629K wps
[Epoch 42 Batch 60/173] avg loss 0.0114153, throughput 2.87643K wps
[Epoch 42 Batch 90/173] avg loss 0.0113494, throughput 2.8749K wps
[Epoch 42 Batch 120/173] avg loss 0.0111204, throughput 2.87469K wps
[Epoch 42 Batch 150/173] avg loss 0.0114679, throughput 2.87719K wps
Begin Testing...
[Epoch 42] train avg loss 0.0113935, dev acc 0.6788, dev avg loss 0.613739, throughput 2.87611K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/173] avg loss 0.0113825, throughput 2.87803K wps
[Epoch 43 Batch 60/173] avg loss 0.0111899, throughput 2.86851K wps
[Epoch 43 Batch 90/173] avg loss 0.011511, throughput 2.87499K wps
[Epoch 43 Batch 120/173] avg loss 0.011234, throughput 2.81656K wps
[Epoch 43 Batch 150/173] avg loss 0.011291, throughput 2.86548K wps
Begin Testing...
[Epoch 43] train avg loss 0.0113001, dev acc 0.6820, dev avg loss 0.60991, throughput 2.8609K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/173] avg loss 0.0113248, throughput 2.94587K wps
[Epoch 44 Batch 60/173] avg loss 0.011041, throughput 2.86624K wps
[Epoch 44 Batch 90/173] avg loss 0.0110124, throughput 2.8438K wps
[Epoch 44 Batch 120/173] avg loss 0.0111241, throughput 2.85345K wps
[Epoch 44 Batch 150/173] avg loss 0.0112187, throughput 2.87262K wps
Begin Testing...
[Epoch 44] train avg loss 0.0111543, dev acc 0.6799, dev avg loss 0.606075, throughput 2.87434K wps
[Epoch 45 Batch 30/173] avg loss 0.0107836, throughput 2.90344K wps
[Epoch 45 Batch 60/173] avg loss 0.0111219, throughput 2.80339K wps
[Epoch 45 Batch 90/173] avg loss 0.0109179, throughput 2.84094K wps
[Epoch 45 Batch 120/173] avg loss 0.0110828, throughput 2.87652K wps
[Epoch 45 Batch 150/173] avg loss 0.0109708, throughput 2.8748K wps
Begin Testing...
[Epoch 45] train avg loss 0.0109827, dev acc 0.6893, dev avg loss 0.602515, throughput 2.86178K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/173] avg loss 0.0106867, throughput 2.91029K wps
[Epoch 46 Batch 60/173] avg loss 0.0108715, throughput 2.87585K wps
[Epoch 46 Batch 90/173] avg loss 0.0107377, throughput 2.86058K wps
[Epoch 46 Batch 120/173] avg loss 0.0107626, throughput 2.82072K wps
[Epoch 46 Batch 150/173] avg loss 0.0109226, throughput 2.83417K wps
Begin Testing...
[Epoch 46] train avg loss 0.0108344, dev acc 0.6903, dev avg loss 0.596682, throughput 2.86306K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/173] avg loss 0.0106082, throughput 2.93153K wps
[Epoch 47 Batch 60/173] avg loss 0.0107877, throughput 2.85968K wps
[Epoch 47 Batch 90/173] avg loss 0.0105158, throughput 2.86228K wps
[Epoch 47 Batch 120/173] avg loss 0.0106315, throughput 2.87061K wps
[Epoch 47 Batch 150/173] avg loss 0.0107513, throughput 2.84608K wps
Begin Testing...
[Epoch 47] train avg loss 0.0107002, dev acc 0.6986, dev avg loss 0.592339, throughput 2.87285K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/173] avg loss 0.0104102, throughput 2.94081K wps
[Epoch 48 Batch 60/173] avg loss 0.0106314, throughput 2.86708K wps
[Epoch 48 Batch 90/173] avg loss 0.0105008, throughput 2.86342K wps
[Epoch 48 Batch 120/173] avg loss 0.0105427, throughput 2.81023K wps
[Epoch 48 Batch 150/173] avg loss 0.0103575, throughput 2.82737K wps
Begin Testing...
[Epoch 48] train avg loss 0.0105308, dev acc 0.6976, dev avg loss 0.58794, throughput 2.86279K wps
[Epoch 49 Batch 30/173] avg loss 0.0104909, throughput 2.90655K wps
[Epoch 49 Batch 60/173] avg loss 0.0102481, throughput 2.84668K wps
[Epoch 49 Batch 90/173] avg loss 0.0102958, throughput 2.84619K wps
[Epoch 49 Batch 120/173] avg loss 0.0103836, throughput 2.81912K wps
[Epoch 49 Batch 150/173] avg loss 0.0104299, throughput 2.8477K wps
Begin Testing...
[Epoch 49] train avg loss 0.0103739, dev acc 0.7080, dev avg loss 0.581534, throughput 2.85352K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/173] avg loss 0.0101501, throughput 2.90484K wps
[Epoch 50 Batch 60/173] avg loss 0.0105204, throughput 2.86414K wps
[Epoch 50 Batch 90/173] avg loss 0.0102345, throughput 2.87526K wps
[Epoch 50 Batch 120/173] avg loss 0.0101285, throughput 2.8716K wps
[Epoch 50 Batch 150/173] avg loss 0.0100843, throughput 2.86812K wps
Begin Testing...
[Epoch 50] train avg loss 0.0102335, dev acc 0.7080, dev avg loss 0.577113, throughput 2.87456K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/173] avg loss 0.0098681, throughput 2.8642K wps
[Epoch 51 Batch 60/173] avg loss 0.0100564, throughput 2.83286K wps
[Epoch 51 Batch 90/173] avg loss 0.010154, throughput 2.7961K wps
[Epoch 51 Batch 120/173] avg loss 0.00984974, throughput 2.85515K wps
[Epoch 51 Batch 150/173] avg loss 0.0100891, throughput 2.85957K wps
Begin Testing...
[Epoch 51] train avg loss 0.0100636, dev acc 0.7080, dev avg loss 0.573338, throughput 2.83795K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/173] avg loss 0.00982378, throughput 2.89046K wps
[Epoch 52 Batch 60/173] avg loss 0.00963631, throughput 2.82254K wps
[Epoch 52 Batch 90/173] avg loss 0.010116, throughput 2.84898K wps
[Epoch 52 Batch 120/173] avg loss 0.00997202, throughput 2.84132K wps
[Epoch 52 Batch 150/173] avg loss 0.00968267, throughput 2.86112K wps
Begin Testing...
[Epoch 52] train avg loss 0.00987962, dev acc 0.7101, dev avg loss 0.570876, throughput 2.84872K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/173] avg loss 0.00976667, throughput 2.88662K wps
[Epoch 53 Batch 60/173] avg loss 0.00966091, throughput 2.83991K wps
[Epoch 53 Batch 90/173] avg loss 0.0097492, throughput 2.8555K wps
[Epoch 53 Batch 120/173] avg loss 0.00982069, throughput 2.87291K wps
[Epoch 53 Batch 150/173] avg loss 0.00955442, throughput 2.82397K wps
Begin Testing...
[Epoch 53] train avg loss 0.00971569, dev acc 0.7143, dev avg loss 0.560483, throughput 2.85648K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/173] avg loss 0.00945136, throughput 2.93497K wps
[Epoch 54 Batch 60/173] avg loss 0.00956677, throughput 2.80576K wps
[Epoch 54 Batch 90/173] avg loss 0.00937743, throughput 2.78811K wps
[Epoch 54 Batch 120/173] avg loss 0.00951517, throughput 2.85101K wps
[Epoch 54 Batch 150/173] avg loss 0.00944554, throughput 2.87243K wps
Begin Testing...
[Epoch 54] train avg loss 0.00949202, dev acc 0.7195, dev avg loss 0.555841, throughput 2.84797K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/173] avg loss 0.00940937, throughput 2.89257K wps
[Epoch 55 Batch 60/173] avg loss 0.00931964, throughput 2.85137K wps
[Epoch 55 Batch 90/173] avg loss 0.00921804, throughput 2.83446K wps
[Epoch 55 Batch 120/173] avg loss 0.0092314, throughput 2.86339K wps
[Epoch 55 Batch 150/173] avg loss 0.00926398, throughput 2.81644K wps
Begin Testing...
[Epoch 55] train avg loss 0.00931809, dev acc 0.7258, dev avg loss 0.549654, throughput 2.85416K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/173] avg loss 0.00910319, throughput 2.88921K wps
[Epoch 56 Batch 60/173] avg loss 0.00910208, throughput 2.81452K wps
[Epoch 56 Batch 90/173] avg loss 0.00927417, throughput 2.82492K wps
[Epoch 56 Batch 120/173] avg loss 0.0091543, throughput 2.8137K wps
[Epoch 56 Batch 150/173] avg loss 0.00893224, throughput 2.86107K wps
Begin Testing...
[Epoch 56] train avg loss 0.00911768, dev acc 0.7289, dev avg loss 0.545448, throughput 2.83952K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/173] avg loss 0.00871802, throughput 2.9314K wps
[Epoch 57 Batch 60/173] avg loss 0.00868324, throughput 2.82367K wps
[Epoch 57 Batch 90/173] avg loss 0.00905005, throughput 2.83981K wps
[Epoch 57 Batch 120/173] avg loss 0.00908672, throughput 2.87154K wps
[Epoch 57 Batch 150/173] avg loss 0.00921157, throughput 2.83178K wps
Begin Testing...
[Epoch 57] train avg loss 0.00895599, dev acc 0.7414, dev avg loss 0.542639, throughput 2.8598K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/173] avg loss 0.0086247, throughput 2.89357K wps
[Epoch 58 Batch 60/173] avg loss 0.00868315, throughput 2.86322K wps
[Epoch 58 Batch 90/173] avg loss 0.00884672, throughput 2.86477K wps
[Epoch 58 Batch 120/173] avg loss 0.00911834, throughput 2.86435K wps
[Epoch 58 Batch 150/173] avg loss 0.0084772, throughput 2.85568K wps
Begin Testing...
[Epoch 58] train avg loss 0.00872696, dev acc 0.7435, dev avg loss 0.535388, throughput 2.86552K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/173] avg loss 0.00852702, throughput 2.92316K wps
[Epoch 59 Batch 60/173] avg loss 0.00845235, throughput 2.85318K wps
[Epoch 59 Batch 90/173] avg loss 0.00856396, throughput 2.86527K wps
[Epoch 59 Batch 120/173] avg loss 0.00850021, throughput 2.80749K wps
[Epoch 59 Batch 150/173] avg loss 0.00859181, throughput 2.83373K wps
Begin Testing...
[Epoch 59] train avg loss 0.00855654, dev acc 0.7351, dev avg loss 0.530169, throughput 2.85183K wps
[Epoch 60 Batch 30/173] avg loss 0.00830263, throughput 2.90478K wps
[Epoch 60 Batch 60/173] avg loss 0.00844008, throughput 2.85693K wps
[Epoch 60 Batch 90/173] avg loss 0.00862975, throughput 2.82714K wps
[Epoch 60 Batch 120/173] avg loss 0.00844246, throughput 2.84634K wps
[Epoch 60 Batch 150/173] avg loss 0.00831526, throughput 2.83337K wps
Begin Testing...
[Epoch 60] train avg loss 0.00840481, dev acc 0.7529, dev avg loss 0.527661, throughput 2.8533K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/173] avg loss 0.00834229, throughput 2.92261K wps
[Epoch 61 Batch 60/173] avg loss 0.00791324, throughput 2.85312K wps
[Epoch 61 Batch 90/173] avg loss 0.00803759, throughput 2.86676K wps
[Epoch 61 Batch 120/173] avg loss 0.00826904, throughput 2.85374K wps
[Epoch 61 Batch 150/173] avg loss 0.00832077, throughput 2.87321K wps
Begin Testing...
[Epoch 61] train avg loss 0.00817301, dev acc 0.7372, dev avg loss 0.519764, throughput 2.8745K wps
[Epoch 62 Batch 30/173] avg loss 0.00807788, throughput 2.92268K wps
[Epoch 62 Batch 60/173] avg loss 0.00820058, throughput 2.85813K wps
[Epoch 62 Batch 90/173] avg loss 0.00827705, throughput 2.83132K wps
[Epoch 62 Batch 120/173] avg loss 0.00796392, throughput 2.86087K wps
[Epoch 62 Batch 150/173] avg loss 0.00781833, throughput 2.83276K wps
Begin Testing...
[Epoch 62] train avg loss 0.00805148, dev acc 0.7435, dev avg loss 0.516231, throughput 2.85931K wps
[Epoch 63 Batch 30/173] avg loss 0.00793583, throughput 2.93351K wps
[Epoch 63 Batch 60/173] avg loss 0.00787404, throughput 2.87501K wps
[Epoch 63 Batch 90/173] avg loss 0.00779845, throughput 2.88408K wps
[Epoch 63 Batch 120/173] avg loss 0.00764128, throughput 2.87274K wps
[Epoch 63 Batch 150/173] avg loss 0.00784245, throughput 2.8583K wps
Begin Testing...
[Epoch 63] train avg loss 0.00783729, dev acc 0.7550, dev avg loss 0.512314, throughput 2.8825K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/173] avg loss 0.00749398, throughput 2.85985K wps
[Epoch 64 Batch 60/173] avg loss 0.00770603, throughput 2.80227K wps
[Epoch 64 Batch 90/173] avg loss 0.00788355, throughput 2.81285K wps
[Epoch 64 Batch 120/173] avg loss 0.007541, throughput 2.85375K wps
[Epoch 64 Batch 150/173] avg loss 0.0072599, throughput 2.85106K wps
Begin Testing...
[Epoch 64] train avg loss 0.00759917, dev acc 0.7560, dev avg loss 0.507687, throughput 2.84171K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/173] avg loss 0.00742614, throughput 2.9454K wps
[Epoch 65 Batch 60/173] avg loss 0.00738352, throughput 2.83699K wps
[Epoch 65 Batch 90/173] avg loss 0.00757089, throughput 2.87218K wps
[Epoch 65 Batch 120/173] avg loss 0.00723817, throughput 2.88063K wps
[Epoch 65 Batch 150/173] avg loss 0.00742102, throughput 2.80716K wps
Begin Testing...
[Epoch 65] train avg loss 0.00743501, dev acc 0.7518, dev avg loss 0.504293, throughput 2.86322K wps
[Epoch 66 Batch 30/173] avg loss 0.0076235, throughput 2.91435K wps
[Epoch 66 Batch 60/173] avg loss 0.00734888, throughput 2.84639K wps
[Epoch 66 Batch 90/173] avg loss 0.0073426, throughput 2.88267K wps
[Epoch 66 Batch 120/173] avg loss 0.00723786, throughput 2.86359K wps
[Epoch 66 Batch 150/173] avg loss 0.00715314, throughput 2.86824K wps
Begin Testing...
[Epoch 66] train avg loss 0.00731355, dev acc 0.7664, dev avg loss 0.503834, throughput 2.87392K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/173] avg loss 0.00716668, throughput 2.93902K wps
[Epoch 67 Batch 60/173] avg loss 0.00730668, throughput 2.88123K wps
[Epoch 67 Batch 90/173] avg loss 0.0067614, throughput 2.86853K wps
[Epoch 67 Batch 120/173] avg loss 0.00721719, throughput 2.87205K wps
[Epoch 67 Batch 150/173] avg loss 0.00688334, throughput 2.83756K wps
Begin Testing...
[Epoch 67] train avg loss 0.00710641, dev acc 0.7664, dev avg loss 0.500285, throughput 2.87894K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/173] avg loss 0.0069235, throughput 2.89127K wps
[Epoch 68 Batch 60/173] avg loss 0.00699719, throughput 2.87284K wps
[Epoch 68 Batch 90/173] avg loss 0.00717238, throughput 2.86293K wps
[Epoch 68 Batch 120/173] avg loss 0.00646611, throughput 2.81414K wps
[Epoch 68 Batch 150/173] avg loss 0.00704056, throughput 2.83788K wps
Begin Testing...
[Epoch 68] train avg loss 0.00694391, dev acc 0.7675, dev avg loss 0.493898, throughput 2.85703K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/173] avg loss 0.00671286, throughput 2.92114K wps
[Epoch 69 Batch 60/173] avg loss 0.00673842, throughput 2.85643K wps
[Epoch 69 Batch 90/173] avg loss 0.00667344, throughput 2.84246K wps
[Epoch 69 Batch 120/173] avg loss 0.0067687, throughput 2.87853K wps
[Epoch 69 Batch 150/173] avg loss 0.00658727, throughput 2.87176K wps
Begin Testing...
[Epoch 69] train avg loss 0.00672315, dev acc 0.7664, dev avg loss 0.491688, throughput 2.8695K wps
[Epoch 70 Batch 30/173] avg loss 0.00657922, throughput 2.85695K wps
[Epoch 70 Batch 60/173] avg loss 0.00654749, throughput 2.85236K wps
[Epoch 70 Batch 90/173] avg loss 0.00650815, throughput 2.86327K wps
[Epoch 70 Batch 120/173] avg loss 0.00666966, throughput 2.87414K wps
[Epoch 70 Batch 150/173] avg loss 0.00661529, throughput 2.86169K wps
Begin Testing...
[Epoch 70] train avg loss 0.00659926, dev acc 0.7664, dev avg loss 0.488786, throughput 2.85686K wps
[Epoch 71 Batch 30/173] avg loss 0.0062402, throughput 2.87319K wps
[Epoch 71 Batch 60/173] avg loss 0.00647673, throughput 2.84711K wps
[Epoch 71 Batch 90/173] avg loss 0.00620781, throughput 2.85877K wps
[Epoch 71 Batch 120/173] avg loss 0.0065761, throughput 2.79749K wps
[Epoch 71 Batch 150/173] avg loss 0.00631391, throughput 2.863K wps
Begin Testing...
[Epoch 71] train avg loss 0.00633841, dev acc 0.7518, dev avg loss 0.493869, throughput 2.85121K wps
[Epoch 72 Batch 30/173] avg loss 0.00616815, throughput 2.93992K wps
[Epoch 72 Batch 60/173] avg loss 0.00615616, throughput 2.85972K wps
[Epoch 72 Batch 90/173] avg loss 0.00612653, throughput 2.87616K wps
[Epoch 72 Batch 120/173] avg loss 0.00616299, throughput 2.85046K wps
[Epoch 72 Batch 150/173] avg loss 0.00637933, throughput 2.85343K wps
Begin Testing...
[Epoch 72] train avg loss 0.00616668, dev acc 0.7748, dev avg loss 0.485706, throughput 2.86529K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/173] avg loss 0.00574801, throughput 2.935K wps
[Epoch 73 Batch 60/173] avg loss 0.00615646, throughput 2.79996K wps
[Epoch 73 Batch 90/173] avg loss 0.00607605, throughput 2.84348K wps
[Epoch 73 Batch 120/173] avg loss 0.00604387, throughput 2.82781K wps
[Epoch 73 Batch 150/173] avg loss 0.00604298, throughput 2.83901K wps
Begin Testing...
[Epoch 73] train avg loss 0.00601291, dev acc 0.7716, dev avg loss 0.481851, throughput 2.84794K wps
[Epoch 74 Batch 30/173] avg loss 0.00575522, throughput 2.92906K wps
[Epoch 74 Batch 60/173] avg loss 0.00582366, throughput 2.85936K wps
[Epoch 74 Batch 90/173] avg loss 0.00580065, throughput 2.87749K wps
[Epoch 74 Batch 120/173] avg loss 0.00598474, throughput 2.82918K wps
[Epoch 74 Batch 150/173] avg loss 0.00576906, throughput 2.86277K wps
Begin Testing...
[Epoch 74] train avg loss 0.00587692, dev acc 0.7675, dev avg loss 0.481471, throughput 2.87165K wps
[Epoch 75 Batch 30/173] avg loss 0.0058574, throughput 2.93265K wps
[Epoch 75 Batch 60/173] avg loss 0.00592455, throughput 2.82412K wps
[Epoch 75 Batch 90/173] avg loss 0.00556488, throughput 2.87478K wps
[Epoch 75 Batch 120/173] avg loss 0.00561296, throughput 2.85701K wps
[Epoch 75 Batch 150/173] avg loss 0.00601812, throughput 2.87212K wps
Begin Testing...
[Epoch 75] train avg loss 0.00577205, dev acc 0.7789, dev avg loss 0.478988, throughput 2.87236K wps
Observed Improvement.
Begin Testing...
[Epoch 76 Batch 30/173] avg loss 0.00534377, throughput 2.85535K wps
[Epoch 76 Batch 60/173] avg loss 0.00543918, throughput 2.7978K wps
[Epoch 76 Batch 90/173] avg loss 0.00559307, throughput 2.86715K wps
[Epoch 76 Batch 120/173] avg loss 0.00549872, throughput 2.87074K wps
[Epoch 76 Batch 150/173] avg loss 0.00578396, throughput 2.86382K wps
Begin Testing...
[Epoch 76] train avg loss 0.00553951, dev acc 0.7737, dev avg loss 0.478761, throughput 2.85062K wps
[Epoch 77 Batch 30/173] avg loss 0.00529321, throughput 2.86251K wps
[Epoch 77 Batch 60/173] avg loss 0.00567666, throughput 2.82777K wps
[Epoch 77 Batch 90/173] avg loss 0.00530555, throughput 2.85814K wps
[Epoch 77 Batch 120/173] avg loss 0.00521124, throughput 2.86425K wps
[Epoch 77 Batch 150/173] avg loss 0.00533219, throughput 2.86812K wps
Begin Testing...
[Epoch 77] train avg loss 0.00538063, dev acc 0.7769, dev avg loss 0.479112, throughput 2.85869K wps
[Epoch 78 Batch 30/173] avg loss 0.00521082, throughput 2.91522K wps
[Epoch 78 Batch 60/173] avg loss 0.00539247, throughput 2.86595K wps
[Epoch 78 Batch 90/173] avg loss 0.00513986, throughput 2.86975K wps
[Epoch 78 Batch 120/173] avg loss 0.00536724, throughput 2.84033K wps
[Epoch 78 Batch 150/173] avg loss 0.00530364, throughput 2.88131K wps
Begin Testing...
[Epoch 78] train avg loss 0.0052694, dev acc 0.7675, dev avg loss 0.480205, throughput 2.87404K wps
[Epoch 79 Batch 30/173] avg loss 0.00496121, throughput 2.94095K wps
[Epoch 79 Batch 60/173] avg loss 0.00510759, throughput 2.87553K wps
[Epoch 79 Batch 90/173] avg loss 0.00531309, throughput 2.80917K wps
[Epoch 79 Batch 120/173] avg loss 0.00514262, throughput 2.86784K wps
[Epoch 79 Batch 150/173] avg loss 0.00523127, throughput 2.87727K wps
Begin Testing...
[Epoch 79] train avg loss 0.00517194, dev acc 0.7737, dev avg loss 0.4748, throughput 2.87326K wps
[Epoch 80 Batch 30/173] avg loss 0.00510943, throughput 2.86533K wps
[Epoch 80 Batch 60/173] avg loss 0.00502718, throughput 2.86774K wps
[Epoch 80 Batch 90/173] avg loss 0.00494491, throughput 2.87922K wps
[Epoch 80 Batch 120/173] avg loss 0.00494249, throughput 2.85307K wps
[Epoch 80 Batch 150/173] avg loss 0.0048266, throughput 2.86989K wps
Begin Testing...
[Epoch 80] train avg loss 0.00497209, dev acc 0.7748, dev avg loss 0.479881, throughput 2.8669K wps
[Epoch 81 Batch 30/173] avg loss 0.00487023, throughput 2.92615K wps
[Epoch 81 Batch 60/173] avg loss 0.00467616, throughput 2.87247K wps
[Epoch 81 Batch 90/173] avg loss 0.004796, throughput 2.87767K wps
[Epoch 81 Batch 120/173] avg loss 0.00522415, throughput 2.86722K wps
[Epoch 81 Batch 150/173] avg loss 0.00465344, throughput 2.87158K wps
Begin Testing...
[Epoch 81] train avg loss 0.00484188, dev acc 0.7769, dev avg loss 0.475095, throughput 2.87938K wps
[Epoch 82 Batch 30/173] avg loss 0.0046399, throughput 2.85757K wps
[Epoch 82 Batch 60/173] avg loss 0.0045729, throughput 2.85987K wps
[Epoch 82 Batch 90/173] avg loss 0.00456101, throughput 2.82572K wps
[Epoch 82 Batch 120/173] avg loss 0.00477147, throughput 2.80771K wps
[Epoch 82 Batch 150/173] avg loss 0.00461155, throughput 2.8167K wps
Begin Testing...
[Epoch 82] train avg loss 0.00465913, dev acc 0.7716, dev avg loss 0.474944, throughput 2.83366K wps
[Epoch 83 Batch 30/173] avg loss 0.00437607, throughput 2.86354K wps
[Epoch 83 Batch 60/173] avg loss 0.00429266, throughput 2.80545K wps
[Epoch 83 Batch 90/173] avg loss 0.00470259, throughput 2.76936K wps
[Epoch 83 Batch 120/173] avg loss 0.00449187, throughput 2.77519K wps
[Epoch 83 Batch 150/173] avg loss 0.00453992, throughput 2.82442K wps
Begin Testing...
[Epoch 83] train avg loss 0.00451186, dev acc 0.7685, dev avg loss 0.478335, throughput 2.81214K wps
[Epoch 84 Batch 30/173] avg loss 0.00427117, throughput 2.88448K wps
[Epoch 84 Batch 60/173] avg loss 0.00447657, throughput 2.85147K wps
[Epoch 84 Batch 90/173] avg loss 0.00451165, throughput 2.86864K wps
[Epoch 84 Batch 120/173] avg loss 0.00446397, throughput 2.8703K wps
[Epoch 84 Batch 150/173] avg loss 0.00440328, throughput 2.817K wps
Begin Testing...
[Epoch 84] train avg loss 0.00443084, dev acc 0.7800, dev avg loss 0.470976, throughput 2.86019K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/173] avg loss 0.00409881, throughput 2.94338K wps
[Epoch 85 Batch 60/173] avg loss 0.0044515, throughput 2.82905K wps
[Epoch 85 Batch 90/173] avg loss 0.00409558, throughput 2.7867K wps
[Epoch 85 Batch 120/173] avg loss 0.00420122, throughput 2.84829K wps
[Epoch 85 Batch 150/173] avg loss 0.00416802, throughput 2.87288K wps
Begin Testing...
[Epoch 85] train avg loss 0.00424428, dev acc 0.7758, dev avg loss 0.475817, throughput 2.85589K wps
[Epoch 86 Batch 30/173] avg loss 0.00417469, throughput 2.86163K wps
[Epoch 86 Batch 60/173] avg loss 0.0041789, throughput 2.85592K wps
[Epoch 86 Batch 90/173] avg loss 0.00395351, throughput 2.8569K wps
[Epoch 86 Batch 120/173] avg loss 0.00402224, throughput 2.84366K wps
[Epoch 86 Batch 150/173] avg loss 0.00428286, throughput 2.84611K wps
Begin Testing...
[Epoch 86] train avg loss 0.00415759, dev acc 0.7654, dev avg loss 0.483893, throughput 2.84335K wps
[Epoch 87 Batch 30/173] avg loss 0.00382067, throughput 2.86667K wps
[Epoch 87 Batch 60/173] avg loss 0.00402976, throughput 2.82246K wps
[Epoch 87 Batch 90/173] avg loss 0.00413846, throughput 2.81511K wps
[Epoch 87 Batch 120/173] avg loss 0.00396442, throughput 2.83721K wps
[Epoch 87 Batch 150/173] avg loss 0.00390664, throughput 2.86132K wps
Begin Testing...
[Epoch 87] train avg loss 0.00402581, dev acc 0.7800, dev avg loss 0.47189, throughput 2.84062K wps
Observed Improvement.
Begin Testing...
[Epoch 88 Batch 30/173] avg loss 0.00379332, throughput 2.93041K wps
[Epoch 88 Batch 60/173] avg loss 0.0039853, throughput 2.8801K wps
[Epoch 88 Batch 90/173] avg loss 0.00376316, throughput 2.8694K wps
[Epoch 88 Batch 120/173] avg loss 0.00390728, throughput 2.79903K wps
[Epoch 88 Batch 150/173] avg loss 0.00401801, throughput 2.83454K wps
Begin Testing...
[Epoch 88] train avg loss 0.003922, dev acc 0.7758, dev avg loss 0.475086, throughput 2.85352K wps
[Epoch 89 Batch 30/173] avg loss 0.00362125, throughput 2.94035K wps
[Epoch 89 Batch 60/173] avg loss 0.0037589, throughput 2.86869K wps
[Epoch 89 Batch 90/173] avg loss 0.0036437, throughput 2.87806K wps
[Epoch 89 Batch 120/173] avg loss 0.00405042, throughput 2.82027K wps
[Epoch 89 Batch 150/173] avg loss 0.0039823, throughput 2.79355K wps
Begin Testing...
[Epoch 89] train avg loss 0.00381403, dev acc 0.7810, dev avg loss 0.477668, throughput 2.85597K wps
Observed Improvement.
Begin Testing...
[Epoch 90 Batch 30/173] avg loss 0.00373275, throughput 2.86793K wps
[Epoch 90 Batch 60/173] avg loss 0.00358837, throughput 2.79951K wps
[Epoch 90 Batch 90/173] avg loss 0.00341811, throughput 2.86071K wps
[Epoch 90 Batch 120/173] avg loss 0.00371804, throughput 2.87259K wps
[Epoch 90 Batch 150/173] avg loss 0.00379787, throughput 2.8583K wps
Begin Testing...
[Epoch 90] train avg loss 0.00365492, dev acc 0.7758, dev avg loss 0.476289, throughput 2.85377K wps
[Epoch 91 Batch 30/173] avg loss 0.00351804, throughput 2.89245K wps
[Epoch 91 Batch 60/173] avg loss 0.00363351, throughput 2.84862K wps
[Epoch 91 Batch 90/173] avg loss 0.00349702, throughput 2.83009K wps
[Epoch 91 Batch 120/173] avg loss 0.00368513, throughput 2.84198K wps
[Epoch 91 Batch 150/173] avg loss 0.00352125, throughput 2.87196K wps
Begin Testing...
[Epoch 91] train avg loss 0.00355629, dev acc 0.7664, dev avg loss 0.485808, throughput 2.85993K wps
[Epoch 92 Batch 30/173] avg loss 0.00334749, throughput 2.93067K wps
[Epoch 92 Batch 60/173] avg loss 0.00358146, throughput 2.82364K wps
[Epoch 92 Batch 90/173] avg loss 0.00317431, throughput 2.79724K wps
[Epoch 92 Batch 120/173] avg loss 0.00344076, throughput 2.869K wps
[Epoch 92 Batch 150/173] avg loss 0.00328295, throughput 2.83001K wps
Begin Testing...
[Epoch 92] train avg loss 0.00341798, dev acc 0.7748, dev avg loss 0.480037, throughput 2.85139K wps
[Epoch 93 Batch 30/173] avg loss 0.00333343, throughput 2.91809K wps
[Epoch 93 Batch 60/173] avg loss 0.00333137, throughput 2.87997K wps
[Epoch 93 Batch 90/173] avg loss 0.00343815, throughput 2.86462K wps
[Epoch 93 Batch 120/173] avg loss 0.00335661, throughput 2.87969K wps
[Epoch 93 Batch 150/173] avg loss 0.00337156, throughput 2.85187K wps
Begin Testing...
[Epoch 93] train avg loss 0.00334616, dev acc 0.7779, dev avg loss 0.477323, throughput 2.87801K wps
[Epoch 94 Batch 30/173] avg loss 0.00324444, throughput 2.94124K wps
[Epoch 94 Batch 60/173] avg loss 0.00319439, throughput 2.83242K wps
[Epoch 94 Batch 90/173] avg loss 0.00318538, throughput 2.82513K wps
[Epoch 94 Batch 120/173] avg loss 0.00336243, throughput 2.8733K wps
[Epoch 94 Batch 150/173] avg loss 0.00318797, throughput 2.88523K wps
Begin Testing...
[Epoch 94] train avg loss 0.00324829, dev acc 0.7758, dev avg loss 0.483658, throughput 2.87163K wps
[Epoch 95 Batch 30/173] avg loss 0.0030983, throughput 2.8612K wps
[Epoch 95 Batch 60/173] avg loss 0.00310616, throughput 2.82006K wps
[Epoch 95 Batch 90/173] avg loss 0.00325862, throughput 2.87708K wps
[Epoch 95 Batch 120/173] avg loss 0.00306308, throughput 2.87691K wps
[Epoch 95 Batch 150/173] avg loss 0.00321175, throughput 2.83612K wps
Begin Testing...
[Epoch 95] train avg loss 0.00314738, dev acc 0.7831, dev avg loss 0.481076, throughput 2.8477K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/173] avg loss 0.00278917, throughput 2.86607K wps
[Epoch 96 Batch 60/173] avg loss 0.00285839, throughput 2.86969K wps
[Epoch 96 Batch 90/173] avg loss 0.00321495, throughput 2.80292K wps
[Epoch 96 Batch 120/173] avg loss 0.00292705, throughput 2.87005K wps
[Epoch 96 Batch 150/173] avg loss 0.00305573, throughput 2.81188K wps
Begin Testing...
[Epoch 96] train avg loss 0.00298657, dev acc 0.7769, dev avg loss 0.483675, throughput 2.84632K wps
[Epoch 97 Batch 30/173] avg loss 0.00290819, throughput 2.9296K wps
[Epoch 97 Batch 60/173] avg loss 0.00289945, throughput 2.88264K wps
[Epoch 97 Batch 90/173] avg loss 0.00288427, throughput 2.88269K wps
[Epoch 97 Batch 120/173] avg loss 0.00292974, throughput 2.86181K wps
[Epoch 97 Batch 150/173] avg loss 0.0028619, throughput 2.87762K wps
Begin Testing...
[Epoch 97] train avg loss 0.00291057, dev acc 0.7800, dev avg loss 0.488024, throughput 2.88634K wps
[Epoch 98 Batch 30/173] avg loss 0.0027762, throughput 2.94115K wps
[Epoch 98 Batch 60/173] avg loss 0.00294959, throughput 2.82937K wps
[Epoch 98 Batch 90/173] avg loss 0.00273897, throughput 2.86764K wps
[Epoch 98 Batch 120/173] avg loss 0.00296273, throughput 2.86979K wps
[Epoch 98 Batch 150/173] avg loss 0.00271593, throughput 2.85106K wps
Begin Testing...
[Epoch 98] train avg loss 0.00283707, dev acc 0.7800, dev avg loss 0.485353, throughput 2.86104K wps
[Epoch 99 Batch 30/173] avg loss 0.00280012, throughput 2.87972K wps
[Epoch 99 Batch 60/173] avg loss 0.00286694, throughput 2.88036K wps
[Epoch 99 Batch 90/173] avg loss 0.00279446, throughput 2.8266K wps
[Epoch 99 Batch 120/173] avg loss 0.00264196, throughput 2.84499K wps
[Epoch 99 Batch 150/173] avg loss 0.0027607, throughput 2.87482K wps
Begin Testing...
[Epoch 99] train avg loss 0.0027726, dev acc 0.7779, dev avg loss 0.487749, throughput 2.86248K wps
[Epoch 100 Batch 30/173] avg loss 0.00290664, throughput 2.93575K wps
[Epoch 100 Batch 60/173] avg loss 0.00255983, throughput 2.83486K wps
[Epoch 100 Batch 90/173] avg loss 0.00250652, throughput 2.88128K wps
[Epoch 100 Batch 120/173] avg loss 0.00265872, throughput 2.83724K wps
[Epoch 100 Batch 150/173] avg loss 0.00257395, throughput 2.87732K wps
Begin Testing...
[Epoch 100] train avg loss 0.00268074, dev acc 0.7779, dev avg loss 0.490127, throughput 2.87383K wps
[Epoch 101 Batch 30/173] avg loss 0.00270791, throughput 2.88327K wps
[Epoch 101 Batch 60/173] avg loss 0.00250275, throughput 2.86852K wps
[Epoch 101 Batch 90/173] avg loss 0.00260926, throughput 2.86862K wps
[Epoch 101 Batch 120/173] avg loss 0.0025697, throughput 2.87308K wps
[Epoch 101 Batch 150/173] avg loss 0.00245881, throughput 2.83128K wps
Begin Testing...
[Epoch 101] train avg loss 0.00256291, dev acc 0.7779, dev avg loss 0.489156, throughput 2.86732K wps
[Epoch 102 Batch 30/173] avg loss 0.00227622, throughput 2.94796K wps
[Epoch 102 Batch 60/173] avg loss 0.00237558, throughput 2.88311K wps
[Epoch 102 Batch 90/173] avg loss 0.00255605, throughput 2.86416K wps
[Epoch 102 Batch 120/173] avg loss 0.00267646, throughput 2.83902K wps
[Epoch 102 Batch 150/173] avg loss 0.00251917, throughput 2.79477K wps
Begin Testing...
[Epoch 102] train avg loss 0.00246865, dev acc 0.7789, dev avg loss 0.491893, throughput 2.86189K wps
[Epoch 103 Batch 30/173] avg loss 0.00260458, throughput 2.90884K wps
[Epoch 103 Batch 60/173] avg loss 0.00239855, throughput 2.87594K wps
[Epoch 103 Batch 90/173] avg loss 0.00211566, throughput 2.86087K wps
[Epoch 103 Batch 120/173] avg loss 0.0024781, throughput 2.87787K wps
[Epoch 103 Batch 150/173] avg loss 0.0025696, throughput 2.87089K wps
Begin Testing...
[Epoch 103] train avg loss 0.00243756, dev acc 0.7852, dev avg loss 0.4917, throughput 2.87842K wps
Observed Improvement.
Begin Testing...
[Epoch 104 Batch 30/173] avg loss 0.00220425, throughput 2.92348K wps
[Epoch 104 Batch 60/173] avg loss 0.00220144, throughput 2.86528K wps
[Epoch 104 Batch 90/173] avg loss 0.00242179, throughput 2.8534K wps
[Epoch 104 Batch 120/173] avg loss 0.00251295, throughput 2.85043K wps
[Epoch 104 Batch 150/173] avg loss 0.00236771, throughput 2.85362K wps
Begin Testing...
[Epoch 104] train avg loss 0.00231423, dev acc 0.7821, dev avg loss 0.495083, throughput 2.86875K wps
[Epoch 105 Batch 30/173] avg loss 0.00223238, throughput 2.92855K wps
[Epoch 105 Batch 60/173] avg loss 0.00234268, throughput 2.86827K wps
[Epoch 105 Batch 90/173] avg loss 0.00223497, throughput 2.87672K wps
[Epoch 105 Batch 120/173] avg loss 0.00227881, throughput 2.88K wps
[Epoch 105 Batch 150/173] avg loss 0.0022848, throughput 2.86402K wps
Begin Testing...
[Epoch 105] train avg loss 0.0022983, dev acc 0.7821, dev avg loss 0.498376, throughput 2.88283K wps
[Epoch 106 Batch 30/173] avg loss 0.00204174, throughput 2.91186K wps
[Epoch 106 Batch 60/173] avg loss 0.00218882, throughput 2.86007K wps
[Epoch 106 Batch 90/173] avg loss 0.00223022, throughput 2.87885K wps
[Epoch 106 Batch 120/173] avg loss 0.00211089, throughput 2.85585K wps
[Epoch 106 Batch 150/173] avg loss 0.00213683, throughput 2.82577K wps
Begin Testing...
[Epoch 106] train avg loss 0.0021773, dev acc 0.7862, dev avg loss 0.499482, throughput 2.86712K wps
Observed Improvement.
Begin Testing...
[Epoch 107 Batch 30/173] avg loss 0.0019562, throughput 2.89508K wps
[Epoch 107 Batch 60/173] avg loss 0.00220764, throughput 2.86123K wps
[Epoch 107 Batch 90/173] avg loss 0.00214838, throughput 2.87495K wps
[Epoch 107 Batch 120/173] avg loss 0.00211989, throughput 2.865K wps
[Epoch 107 Batch 150/173] avg loss 0.00215959, throughput 2.88185K wps
Begin Testing...
[Epoch 107] train avg loss 0.00213323, dev acc 0.7810, dev avg loss 0.500038, throughput 2.87067K wps
[Epoch 108 Batch 30/173] avg loss 0.00207027, throughput 2.89809K wps
[Epoch 108 Batch 60/173] avg loss 0.00211478, throughput 2.81797K wps
[Epoch 108 Batch 90/173] avg loss 0.00214031, throughput 2.84621K wps
[Epoch 108 Batch 120/173] avg loss 0.00199256, throughput 2.87496K wps
[Epoch 108 Batch 150/173] avg loss 0.00201802, throughput 2.87762K wps
Begin Testing...
[Epoch 108] train avg loss 0.002094, dev acc 0.7831, dev avg loss 0.505049, throughput 2.86452K wps
[Epoch 109 Batch 30/173] avg loss 0.00198236, throughput 2.88013K wps
[Epoch 109 Batch 60/173] avg loss 0.00204515, throughput 2.86116K wps
[Epoch 109 Batch 90/173] avg loss 0.00195322, throughput 2.83314K wps
[Epoch 109 Batch 120/173] avg loss 0.00206533, throughput 2.82952K wps
[Epoch 109 Batch 150/173] avg loss 0.00188844, throughput 2.86907K wps
Begin Testing...
[Epoch 109] train avg loss 0.00199476, dev acc 0.7800, dev avg loss 0.507663, throughput 2.85786K wps
[Epoch 110 Batch 30/173] avg loss 0.00206415, throughput 2.93976K wps
[Epoch 110 Batch 60/173] avg loss 0.00200979, throughput 2.8807K wps
[Epoch 110 Batch 90/173] avg loss 0.00198461, throughput 2.87507K wps
[Epoch 110 Batch 120/173] avg loss 0.00179186, throughput 2.88032K wps
[Epoch 110 Batch 150/173] avg loss 0.00191525, throughput 2.86082K wps
Begin Testing...
[Epoch 110] train avg loss 0.00196172, dev acc 0.7831, dev avg loss 0.511076, throughput 2.88705K wps
[Epoch 111 Batch 30/173] avg loss 0.00189991, throughput 2.91288K wps
[Epoch 111 Batch 60/173] avg loss 0.00182414, throughput 2.83018K wps
[Epoch 111 Batch 90/173] avg loss 0.00173861, throughput 2.78759K wps
[Epoch 111 Batch 120/173] avg loss 0.00194396, throughput 2.80534K wps
[Epoch 111 Batch 150/173] avg loss 0.00183359, throughput 2.87424K wps
Begin Testing...
[Epoch 111] train avg loss 0.00186921, dev acc 0.7821, dev avg loss 0.515874, throughput 2.84628K wps
[Epoch 112 Batch 30/173] avg loss 0.00178804, throughput 2.86531K wps
[Epoch 112 Batch 60/173] avg loss 0.00170772, throughput 2.79053K wps
[Epoch 112 Batch 90/173] avg loss 0.0019849, throughput 2.85886K wps
[Epoch 112 Batch 120/173] avg loss 0.00182558, throughput 2.87295K wps
[Epoch 112 Batch 150/173] avg loss 0.00206546, throughput 2.83887K wps
Begin Testing...
[Epoch 112] train avg loss 0.00188168, dev acc 0.7842, dev avg loss 0.510241, throughput 2.84974K wps
[Epoch 113 Batch 30/173] avg loss 0.00174663, throughput 2.90922K wps
[Epoch 113 Batch 60/173] avg loss 0.00172562, throughput 2.84632K wps
[Epoch 113 Batch 90/173] avg loss 0.00175189, throughput 2.87801K wps
[Epoch 113 Batch 120/173] avg loss 0.00184311, throughput 2.86851K wps
[Epoch 113 Batch 150/173] avg loss 0.00193921, throughput 2.8458K wps
Begin Testing...
[Epoch 113] train avg loss 0.00181451, dev acc 0.7831, dev avg loss 0.513996, throughput 2.86643K wps
[Epoch 114 Batch 30/173] avg loss 0.0017392, throughput 2.90069K wps
[Epoch 114 Batch 60/173] avg loss 0.00170207, throughput 2.88247K wps
[Epoch 114 Batch 90/173] avg loss 0.00174318, throughput 2.83582K wps
[Epoch 114 Batch 120/173] avg loss 0.00179972, throughput 2.88379K wps
[Epoch 114 Batch 150/173] avg loss 0.00177936, throughput 2.81614K wps
Begin Testing...
[Epoch 114] train avg loss 0.00174855, dev acc 0.7789, dev avg loss 0.518193, throughput 2.85618K wps
[Epoch 115 Batch 30/173] avg loss 0.00158164, throughput 2.87681K wps
[Epoch 115 Batch 60/173] avg loss 0.0015774, throughput 2.8094K wps
[Epoch 115 Batch 90/173] avg loss 0.00172989, throughput 2.84715K wps
[Epoch 115 Batch 120/173] avg loss 0.00182329, throughput 2.84917K wps
[Epoch 115 Batch 150/173] avg loss 0.0016391, throughput 2.82531K wps
Begin Testing...
[Epoch 115] train avg loss 0.00169685, dev acc 0.7831, dev avg loss 0.514863, throughput 2.84233K wps
[Epoch 116 Batch 30/173] avg loss 0.00167529, throughput 2.848K wps
[Epoch 116 Batch 60/173] avg loss 0.00156772, throughput 2.80205K wps
[Epoch 116 Batch 90/173] avg loss 0.00162338, throughput 2.87136K wps
[Epoch 116 Batch 120/173] avg loss 0.00173447, throughput 2.85211K wps
[Epoch 116 Batch 150/173] avg loss 0.00173673, throughput 2.88214K wps
Begin Testing...
[Epoch 116] train avg loss 0.00167247, dev acc 0.7727, dev avg loss 0.527224, throughput 2.85519K wps
[Epoch 117 Batch 30/173] avg loss 0.00161673, throughput 2.94369K wps
[Epoch 117 Batch 60/173] avg loss 0.00150159, throughput 2.85416K wps
[Epoch 117 Batch 90/173] avg loss 0.0016313, throughput 2.85984K wps
[Epoch 117 Batch 120/173] avg loss 0.00162143, throughput 2.88003K wps
[Epoch 117 Batch 150/173] avg loss 0.00162883, throughput 2.88024K wps
Begin Testing...
[Epoch 117] train avg loss 0.00159261, dev acc 0.7779, dev avg loss 0.522332, throughput 2.88217K wps
[Epoch 118 Batch 30/173] avg loss 0.00138884, throughput 2.9453K wps
[Epoch 118 Batch 60/173] avg loss 0.00147455, throughput 2.88214K wps
[Epoch 118 Batch 90/173] avg loss 0.00167598, throughput 2.87801K wps
[Epoch 118 Batch 120/173] avg loss 0.00152766, throughput 2.87034K wps
[Epoch 118 Batch 150/173] avg loss 0.00149081, throughput 2.87873K wps
Begin Testing...
[Epoch 118] train avg loss 0.0015256, dev acc 0.7842, dev avg loss 0.523219, throughput 2.88691K wps
[Epoch 119 Batch 30/173] avg loss 0.00148754, throughput 2.88396K wps
[Epoch 119 Batch 60/173] avg loss 0.00154905, throughput 2.88219K wps
[Epoch 119 Batch 90/173] avg loss 0.00151427, throughput 2.88512K wps
[Epoch 119 Batch 120/173] avg loss 0.00147225, throughput 2.87729K wps
[Epoch 119 Batch 150/173] avg loss 0.00140011, throughput 2.82795K wps
Begin Testing...
[Epoch 119] train avg loss 0.00150201, dev acc 0.7821, dev avg loss 0.526151, throughput 2.86728K wps
[Epoch 120 Batch 30/173] avg loss 0.00130813, throughput 2.90082K wps
[Epoch 120 Batch 60/173] avg loss 0.00133357, throughput 2.86754K wps
[Epoch 120 Batch 90/173] avg loss 0.00144867, throughput 2.84676K wps
[Epoch 120 Batch 120/173] avg loss 0.001529, throughput 2.88185K wps
[Epoch 120 Batch 150/173] avg loss 0.00156702, throughput 2.83003K wps
Begin Testing...
[Epoch 120] train avg loss 0.00143815, dev acc 0.7800, dev avg loss 0.534249, throughput 2.85895K wps
[Epoch 121 Batch 30/173] avg loss 0.00131178, throughput 2.94065K wps
[Epoch 121 Batch 60/173] avg loss 0.00141521, throughput 2.81945K wps
[Epoch 121 Batch 90/173] avg loss 0.00148493, throughput 2.87282K wps
[Epoch 121 Batch 120/173] avg loss 0.00141792, throughput 2.88244K wps
[Epoch 121 Batch 150/173] avg loss 0.0013935, throughput 2.86004K wps
Begin Testing...
[Epoch 121] train avg loss 0.00141276, dev acc 0.7831, dev avg loss 0.533358, throughput 2.87215K wps
[Epoch 122 Batch 30/173] avg loss 0.00126056, throughput 2.9366K wps
[Epoch 122 Batch 60/173] avg loss 0.00141567, throughput 2.88171K wps
[Epoch 122 Batch 90/173] avg loss 0.00132262, throughput 2.88065K wps
[Epoch 122 Batch 120/173] avg loss 0.00144387, throughput 2.87065K wps
[Epoch 122 Batch 150/173] avg loss 0.00140449, throughput 2.85188K wps
Begin Testing...
[Epoch 122] train avg loss 0.00136752, dev acc 0.7800, dev avg loss 0.533502, throughput 2.88263K wps
[Epoch 123 Batch 30/173] avg loss 0.00119615, throughput 2.93591K wps
[Epoch 123 Batch 60/173] avg loss 0.00138719, throughput 2.87703K wps
[Epoch 123 Batch 90/173] avg loss 0.00140509, throughput 2.88104K wps
[Epoch 123 Batch 120/173] avg loss 0.00146957, throughput 2.87969K wps
[Epoch 123 Batch 150/173] avg loss 0.00137771, throughput 2.87665K wps
Begin Testing...
[Epoch 123] train avg loss 0.00136905, dev acc 0.7779, dev avg loss 0.540198, throughput 2.88815K wps
[Epoch 124 Batch 30/173] avg loss 0.00136463, throughput 2.92051K wps
[Epoch 124 Batch 60/173] avg loss 0.00125972, throughput 2.86927K wps
[Epoch 124 Batch 90/173] avg loss 0.00139644, throughput 2.82573K wps
[Epoch 124 Batch 120/173] avg loss 0.00121137, throughput 2.87816K wps
[Epoch 124 Batch 150/173] avg loss 0.00127459, throughput 2.83284K wps
Begin Testing...
[Epoch 124] train avg loss 0.00130879, dev acc 0.7810, dev avg loss 0.538456, throughput 2.86553K wps
[Epoch 125 Batch 30/173] avg loss 0.00123032, throughput 2.93761K wps
[Epoch 125 Batch 60/173] avg loss 0.00121717, throughput 2.85501K wps
[Epoch 125 Batch 90/173] avg loss 0.0012987, throughput 2.88345K wps
[Epoch 125 Batch 120/173] avg loss 0.00122105, throughput 2.87279K wps
[Epoch 125 Batch 150/173] avg loss 0.0012437, throughput 2.83946K wps
Begin Testing...
[Epoch 125] train avg loss 0.00126102, dev acc 0.7831, dev avg loss 0.542445, throughput 2.87083K wps
[Epoch 126 Batch 30/173] avg loss 0.0011401, throughput 2.89332K wps
[Epoch 126 Batch 60/173] avg loss 0.00118617, throughput 2.86755K wps
[Epoch 126 Batch 90/173] avg loss 0.00127006, throughput 2.85643K wps
[Epoch 126 Batch 120/173] avg loss 0.00129273, throughput 2.85671K wps
[Epoch 126 Batch 150/173] avg loss 0.00123165, throughput 2.86945K wps
Begin Testing...
[Epoch 126] train avg loss 0.00123741, dev acc 0.7821, dev avg loss 0.542687, throughput 2.87015K wps
[Epoch 127 Batch 30/173] avg loss 0.00119745, throughput 2.864K wps
[Epoch 127 Batch 60/173] avg loss 0.00125089, throughput 2.87215K wps
[Epoch 127 Batch 90/173] avg loss 0.00122607, throughput 2.88426K wps
[Epoch 127 Batch 120/173] avg loss 0.00119181, throughput 2.85512K wps
[Epoch 127 Batch 150/173] avg loss 0.00116492, throughput 2.80098K wps
Begin Testing...
[Epoch 127] train avg loss 0.00120768, dev acc 0.7852, dev avg loss 0.551622, throughput 2.85588K wps
[Epoch 128 Batch 30/173] avg loss 0.00113874, throughput 2.89548K wps
[Epoch 128 Batch 60/173] avg loss 0.00117143, throughput 2.88318K wps
[Epoch 128 Batch 90/173] avg loss 0.00120013, throughput 2.8752K wps
[Epoch 128 Batch 120/173] avg loss 0.00115296, throughput 2.87662K wps
[Epoch 128 Batch 150/173] avg loss 0.00116149, throughput 2.79622K wps
Begin Testing...
[Epoch 128] train avg loss 0.00117261, dev acc 0.7769, dev avg loss 0.549404, throughput 2.85795K wps
[Epoch 129 Batch 30/173] avg loss 0.00117995, throughput 2.90431K wps
[Epoch 129 Batch 60/173] avg loss 0.0012313, throughput 2.8764K wps
[Epoch 129 Batch 90/173] avg loss 0.00115089, throughput 2.87571K wps
[Epoch 129 Batch 120/173] avg loss 0.00104864, throughput 2.85079K wps
[Epoch 129 Batch 150/173] avg loss 0.00119771, throughput 2.84571K wps
Begin Testing...
[Epoch 129] train avg loss 0.00116531, dev acc 0.7831, dev avg loss 0.552436, throughput 2.86382K wps
[Epoch 130 Batch 30/173] avg loss 0.00101835, throughput 2.93291K wps
[Epoch 130 Batch 60/173] avg loss 0.0010998, throughput 2.86563K wps
[Epoch 130 Batch 90/173] avg loss 0.00108281, throughput 2.88428K wps
[Epoch 130 Batch 120/173] avg loss 0.00110131, throughput 2.88404K wps
[Epoch 130 Batch 150/173] avg loss 0.00115237, throughput 2.88407K wps
Begin Testing...
[Epoch 130] train avg loss 0.00108735, dev acc 0.7789, dev avg loss 0.554274, throughput 2.88884K wps
[Epoch 131 Batch 30/173] avg loss 0.00105673, throughput 2.92976K wps
[Epoch 131 Batch 60/173] avg loss 0.00104768, throughput 2.86204K wps
[Epoch 131 Batch 90/173] avg loss 0.00107421, throughput 2.87305K wps
[Epoch 131 Batch 120/173] avg loss 0.00114289, throughput 2.87554K wps
[Epoch 131 Batch 150/173] avg loss 0.00116129, throughput 2.88039K wps
Begin Testing...
[Epoch 131] train avg loss 0.00108857, dev acc 0.7810, dev avg loss 0.554097, throughput 2.88234K wps
[Epoch 132 Batch 30/173] avg loss 0.000967962, throughput 2.92521K wps
[Epoch 132 Batch 60/173] avg loss 0.00108184, throughput 2.88026K wps
[Epoch 132 Batch 90/173] avg loss 0.00107108, throughput 2.87222K wps
[Epoch 132 Batch 120/173] avg loss 0.00106616, throughput 2.87821K wps
[Epoch 132 Batch 150/173] avg loss 0.00108957, throughput 2.8553K wps
Begin Testing...
[Epoch 132] train avg loss 0.0010626, dev acc 0.7800, dev avg loss 0.558571, throughput 2.88154K wps
[Epoch 133 Batch 30/173] avg loss 0.000998322, throughput 2.90834K wps
[Epoch 133 Batch 60/173] avg loss 0.00103707, throughput 2.79239K wps
[Epoch 133 Batch 90/173] avg loss 0.00100027, throughput 2.79301K wps
[Epoch 133 Batch 120/173] avg loss 0.00100925, throughput 2.82919K wps
[Epoch 133 Batch 150/173] avg loss 0.00105396, throughput 2.87997K wps
Begin Testing...
[Epoch 133] train avg loss 0.00104313, dev acc 0.7842, dev avg loss 0.559713, throughput 2.84562K wps
[Epoch 134 Batch 30/173] avg loss 0.000996921, throughput 2.91968K wps
[Epoch 134 Batch 60/173] avg loss 0.00107271, throughput 2.84932K wps
[Epoch 134 Batch 90/173] avg loss 0.00108941, throughput 2.85805K wps
[Epoch 134 Batch 120/173] avg loss 0.00105767, throughput 2.88105K wps
[Epoch 134 Batch 150/173] avg loss 0.000990564, throughput 2.86082K wps
Begin Testing...
[Epoch 134] train avg loss 0.00101564, dev acc 0.7789, dev avg loss 0.562045, throughput 2.87464K wps
[Epoch 135 Batch 30/173] avg loss 0.00101237, throughput 2.87137K wps
[Epoch 135 Batch 60/173] avg loss 0.00100314, throughput 2.79712K wps
[Epoch 135 Batch 90/173] avg loss 0.00100183, throughput 2.86371K wps
[Epoch 135 Batch 120/173] avg loss 0.000964967, throughput 2.8731K wps
[Epoch 135 Batch 150/173] avg loss 0.00097447, throughput 2.8489K wps
Begin Testing...
[Epoch 135] train avg loss 0.000989382, dev acc 0.7769, dev avg loss 0.563644, throughput 2.85534K wps
[Epoch 136 Batch 30/173] avg loss 0.000854856, throughput 2.89079K wps
[Epoch 136 Batch 60/173] avg loss 0.000902219, throughput 2.88171K wps
[Epoch 136 Batch 90/173] avg loss 0.000990633, throughput 2.88071K wps
[Epoch 136 Batch 120/173] avg loss 0.000995564, throughput 2.88404K wps
[Epoch 136 Batch 150/173] avg loss 0.000974996, throughput 2.81549K wps
Begin Testing...
[Epoch 136] train avg loss 0.000946541, dev acc 0.7800, dev avg loss 0.569567, throughput 2.86797K wps
[Epoch 137 Batch 30/173] avg loss 0.000872494, throughput 2.90002K wps
[Epoch 137 Batch 60/173] avg loss 0.000993884, throughput 2.80045K wps
[Epoch 137 Batch 90/173] avg loss 0.000994223, throughput 2.81877K wps
[Epoch 137 Batch 120/173] avg loss 0.000937797, throughput 2.83451K wps
[Epoch 137 Batch 150/173] avg loss 0.000832947, throughput 2.87701K wps
Begin Testing...
[Epoch 137] train avg loss 0.000923885, dev acc 0.7800, dev avg loss 0.570319, throughput 2.84138K wps
[Epoch 138 Batch 30/173] avg loss 0.000904531, throughput 2.94516K wps
[Epoch 138 Batch 60/173] avg loss 0.000872197, throughput 2.84482K wps
[Epoch 138 Batch 90/173] avg loss 0.000862295, throughput 2.81053K wps
[Epoch 138 Batch 120/173] avg loss 0.000960059, throughput 2.83822K wps
[Epoch 138 Batch 150/173] avg loss 0.000930763, throughput 2.83205K wps
Begin Testing...
[Epoch 138] train avg loss 0.000915049, dev acc 0.7810, dev avg loss 0.573799, throughput 2.85501K wps
[Epoch 139 Batch 30/173] avg loss 0.000913305, throughput 2.88808K wps
[Epoch 139 Batch 60/173] avg loss 0.000786423, throughput 2.83634K wps
[Epoch 139 Batch 90/173] avg loss 0.000896951, throughput 2.80524K wps
[Epoch 139 Batch 120/173] avg loss 0.000854187, throughput 2.83777K wps
[Epoch 139 Batch 150/173] avg loss 0.000864716, throughput 2.84859K wps
Begin Testing...
[Epoch 139] train avg loss 0.000867465, dev acc 0.7821, dev avg loss 0.575288, throughput 2.84016K wps
[Epoch 140 Batch 30/173] avg loss 0.000866183, throughput 2.86657K wps
[Epoch 140 Batch 60/173] avg loss 0.000916702, throughput 2.87981K wps
[Epoch 140 Batch 90/173] avg loss 0.000830543, throughput 2.88377K wps
[Epoch 140 Batch 120/173] avg loss 0.000842935, throughput 2.88245K wps
[Epoch 140 Batch 150/173] avg loss 0.00091057, throughput 2.84665K wps
Begin Testing...
[Epoch 140] train avg loss 0.000876688, dev acc 0.7810, dev avg loss 0.578298, throughput 2.87201K wps
[Epoch 141 Batch 30/173] avg loss 0.00084617, throughput 2.94752K wps
[Epoch 141 Batch 60/173] avg loss 0.000947918, throughput 2.85792K wps
[Epoch 141 Batch 90/173] avg loss 0.00090068, throughput 2.83062K wps
[Epoch 141 Batch 120/173] avg loss 0.000847969, throughput 2.78421K wps
[Epoch 141 Batch 150/173] avg loss 0.00086266, throughput 2.86795K wps
Begin Testing...
[Epoch 141] train avg loss 0.000868036, dev acc 0.7810, dev avg loss 0.578616, throughput 2.84996K wps
[Epoch 142 Batch 30/173] avg loss 0.000839891, throughput 2.94272K wps
[Epoch 142 Batch 60/173] avg loss 0.000808397, throughput 2.88684K wps
[Epoch 142 Batch 90/173] avg loss 0.000823227, throughput 2.81034K wps
[Epoch 142 Batch 120/173] avg loss 0.000804894, throughput 2.8722K wps
[Epoch 142 Batch 150/173] avg loss 0.000834517, throughput 2.88323K wps
Begin Testing...
[Epoch 142] train avg loss 0.000826148, dev acc 0.7769, dev avg loss 0.577852, throughput 2.87943K wps
[Epoch 143 Batch 30/173] avg loss 0.000806014, throughput 2.90246K wps
[Epoch 143 Batch 60/173] avg loss 0.000836153, throughput 2.8795K wps
[Epoch 143 Batch 90/173] avg loss 0.000781797, throughput 2.85909K wps
[Epoch 143 Batch 120/173] avg loss 0.000820083, throughput 2.88287K wps
[Epoch 143 Batch 150/173] avg loss 0.00076026, throughput 2.85632K wps
Begin Testing...
[Epoch 143] train avg loss 0.000814116, dev acc 0.7810, dev avg loss 0.583411, throughput 2.86936K wps
[Epoch 144 Batch 30/173] avg loss 0.000759672, throughput 2.92968K wps
[Epoch 144 Batch 60/173] avg loss 0.000736036, throughput 2.87681K wps
[Epoch 144 Batch 90/173] avg loss 0.000823863, throughput 2.85699K wps
[Epoch 144 Batch 120/173] avg loss 0.000788972, throughput 2.8702K wps
[Epoch 144 Batch 150/173] avg loss 0.000761518, throughput 2.87327K wps
Begin Testing...
[Epoch 144] train avg loss 0.000781989, dev acc 0.7842, dev avg loss 0.588443, throughput 2.87756K wps
[Epoch 145 Batch 30/173] avg loss 0.000774823, throughput 2.8836K wps
[Epoch 145 Batch 60/173] avg loss 0.000766377, throughput 2.86902K wps
[Epoch 145 Batch 90/173] avg loss 0.000719002, throughput 2.86732K wps
[Epoch 145 Batch 120/173] avg loss 0.000847673, throughput 2.87806K wps
[Epoch 145 Batch 150/173] avg loss 0.000813103, throughput 2.84654K wps
Begin Testing...
[Epoch 145] train avg loss 0.000793304, dev acc 0.7821, dev avg loss 0.587053, throughput 2.86491K wps
[Epoch 146 Batch 30/173] avg loss 0.000717196, throughput 2.89118K wps
[Epoch 146 Batch 60/173] avg loss 0.000694969, throughput 2.88047K wps
[Epoch 146 Batch 90/173] avg loss 0.000750654, throughput 2.88355K wps
[Epoch 146 Batch 120/173] avg loss 0.000832611, throughput 2.85022K wps
[Epoch 146 Batch 150/173] avg loss 0.000806709, throughput 2.80174K wps
Begin Testing...
[Epoch 146] train avg loss 0.000769985, dev acc 0.7769, dev avg loss 0.594724, throughput 2.85861K wps
[Epoch 147 Batch 30/173] avg loss 0.0007275, throughput 2.91947K wps
[Epoch 147 Batch 60/173] avg loss 0.000687823, throughput 2.83955K wps
[Epoch 147 Batch 90/173] avg loss 0.000775823, throughput 2.8425K wps
[Epoch 147 Batch 120/173] avg loss 0.000830569, throughput 2.81305K wps
[Epoch 147 Batch 150/173] avg loss 0.000837995, throughput 2.85327K wps
Begin Testing...
[Epoch 147] train avg loss 0.000759933, dev acc 0.7810, dev avg loss 0.590899, throughput 2.85234K wps
[Epoch 148 Batch 30/173] avg loss 0.000645355, throughput 2.93303K wps
[Epoch 148 Batch 60/173] avg loss 0.000730306, throughput 2.85506K wps
[Epoch 148 Batch 90/173] avg loss 0.000832937, throughput 2.86516K wps
[Epoch 148 Batch 120/173] avg loss 0.000725884, throughput 2.88188K wps
[Epoch 148 Batch 150/173] avg loss 0.000746412, throughput 2.87789K wps
Begin Testing...
[Epoch 148] train avg loss 0.000730057, dev acc 0.7821, dev avg loss 0.595182, throughput 2.87947K wps
[Epoch 149 Batch 30/173] avg loss 0.000658126, throughput 2.93441K wps
[Epoch 149 Batch 60/173] avg loss 0.000693054, throughput 2.81983K wps
[Epoch 149 Batch 90/173] avg loss 0.000684545, throughput 2.82148K wps
[Epoch 149 Batch 120/173] avg loss 0.000824422, throughput 2.82054K wps
[Epoch 149 Batch 150/173] avg loss 0.000750889, throughput 2.88111K wps
Begin Testing...
[Epoch 149] train avg loss 0.000725949, dev acc 0.7769, dev avg loss 0.593966, throughput 2.85796K wps
[Epoch 150 Batch 30/173] avg loss 0.000670909, throughput 2.93603K wps
[Epoch 150 Batch 60/173] avg loss 0.000741414, throughput 2.84939K wps
[Epoch 150 Batch 90/173] avg loss 0.000748092, throughput 2.87806K wps
[Epoch 150 Batch 120/173] avg loss 0.000718619, throughput 2.80407K wps
[Epoch 150 Batch 150/173] avg loss 0.000752049, throughput 2.80278K wps
Begin Testing...
[Epoch 150] train avg loss 0.000720467, dev acc 0.7800, dev avg loss 0.6014, throughput 2.85575K wps
[Epoch 151 Batch 30/173] avg loss 0.000699223, throughput 2.94198K wps
[Epoch 151 Batch 60/173] avg loss 0.000696665, throughput 2.81002K wps
[Epoch 151 Batch 90/173] avg loss 0.000683835, throughput 2.80068K wps
[Epoch 151 Batch 120/173] avg loss 0.000720725, throughput 2.86802K wps
[Epoch 151 Batch 150/173] avg loss 0.000684518, throughput 2.86945K wps
Begin Testing...
[Epoch 151] train avg loss 0.000694776, dev acc 0.7810, dev avg loss 0.604135, throughput 2.85913K wps
[Epoch 152 Batch 30/173] avg loss 0.000705829, throughput 2.86959K wps
[Epoch 152 Batch 60/173] avg loss 0.000587447, throughput 2.87443K wps
[Epoch 152 Batch 90/173] avg loss 0.000685874, throughput 2.82806K wps
[Epoch 152 Batch 120/173] avg loss 0.000676385, throughput 2.87516K wps
[Epoch 152 Batch 150/173] avg loss 0.000678771, throughput 2.85104K wps
Begin Testing...
[Epoch 152] train avg loss 0.00067485, dev acc 0.7789, dev avg loss 0.608059, throughput 2.86086K wps
[Epoch 153 Batch 30/173] avg loss 0.000647351, throughput 2.86307K wps
[Epoch 153 Batch 60/173] avg loss 0.000663956, throughput 2.84025K wps
[Epoch 153 Batch 90/173] avg loss 0.000671463, throughput 2.88069K wps
[Epoch 153 Batch 120/173] avg loss 0.000689065, throughput 2.88076K wps
[Epoch 153 Batch 150/173] avg loss 0.00060498, throughput 2.84897K wps
Begin Testing...
[Epoch 153] train avg loss 0.000663546, dev acc 0.7800, dev avg loss 0.604994, throughput 2.86458K wps
[Epoch 154 Batch 30/173] avg loss 0.00065516, throughput 2.94179K wps
[Epoch 154 Batch 60/173] avg loss 0.000635836, throughput 2.82937K wps
[Epoch 154 Batch 90/173] avg loss 0.000704411, throughput 2.87861K wps
[Epoch 154 Batch 120/173] avg loss 0.000638115, throughput 2.87751K wps
[Epoch 154 Batch 150/173] avg loss 0.000729785, throughput 2.80571K wps
Begin Testing...
[Epoch 154] train avg loss 0.000671688, dev acc 0.7769, dev avg loss 0.611951, throughput 2.85825K wps
[Epoch 155 Batch 30/173] avg loss 0.000633889, throughput 2.86285K wps
[Epoch 155 Batch 60/173] avg loss 0.000571359, throughput 2.82554K wps
[Epoch 155 Batch 90/173] avg loss 0.000647812, throughput 2.884K wps
[Epoch 155 Batch 120/173] avg loss 0.000660479, throughput 2.88255K wps
[Epoch 155 Batch 150/173] avg loss 0.000668301, throughput 2.87293K wps
Begin Testing...
[Epoch 155] train avg loss 0.000633875, dev acc 0.7769, dev avg loss 0.60965, throughput 2.8679K wps
[Epoch 156 Batch 30/173] avg loss 0.000605862, throughput 2.87185K wps
[Epoch 156 Batch 60/173] avg loss 0.000603338, throughput 2.82361K wps
[Epoch 156 Batch 90/173] avg loss 0.000648429, throughput 2.86302K wps
[Epoch 156 Batch 120/173] avg loss 0.000633284, throughput 2.88364K wps
[Epoch 156 Batch 150/173] avg loss 0.000645635, throughput 2.87264K wps
Begin Testing...
[Epoch 156] train avg loss 0.000628327, dev acc 0.7748, dev avg loss 0.611223, throughput 2.85984K wps
[Epoch 157 Batch 30/173] avg loss 0.000630546, throughput 2.93924K wps
[Epoch 157 Batch 60/173] avg loss 0.000623668, throughput 2.85076K wps
[Epoch 157 Batch 90/173] avg loss 0.000576343, throughput 2.87896K wps
[Epoch 157 Batch 120/173] avg loss 0.000662721, throughput 2.88225K wps
[Epoch 157 Batch 150/173] avg loss 0.000633436, throughput 2.84903K wps
Begin Testing...
[Epoch 157] train avg loss 0.000625788, dev acc 0.7789, dev avg loss 0.61028, throughput 2.87053K wps
[Epoch 158 Batch 30/173] avg loss 0.000630183, throughput 2.93431K wps
[Epoch 158 Batch 60/173] avg loss 0.000612684, throughput 2.87581K wps
[Epoch 158 Batch 90/173] avg loss 0.000538746, throughput 2.81188K wps
[Epoch 158 Batch 120/173] avg loss 0.000587141, throughput 2.82694K wps
[Epoch 158 Batch 150/173] avg loss 0.000667646, throughput 2.87875K wps
Begin Testing...
[Epoch 158] train avg loss 0.000610937, dev acc 0.7779, dev avg loss 0.61464, throughput 2.86318K wps
[Epoch 159 Batch 30/173] avg loss 0.000599396, throughput 2.94579K wps
[Epoch 159 Batch 60/173] avg loss 0.000540887, throughput 2.83111K wps
[Epoch 159 Batch 90/173] avg loss 0.000612633, throughput 2.81428K wps
[Epoch 159 Batch 120/173] avg loss 0.000582964, throughput 2.79481K wps
[Epoch 159 Batch 150/173] avg loss 0.000607897, throughput 2.83505K wps
Begin Testing...
[Epoch 159] train avg loss 0.000586066, dev acc 0.7737, dev avg loss 0.616631, throughput 2.84478K wps
[Epoch 160 Batch 30/173] avg loss 0.00055618, throughput 2.92807K wps
[Epoch 160 Batch 60/173] avg loss 0.000563523, throughput 2.87568K wps
[Epoch 160 Batch 90/173] avg loss 0.000613547, throughput 2.87267K wps
[Epoch 160 Batch 120/173] avg loss 0.000577678, throughput 2.84449K wps
[Epoch 160 Batch 150/173] avg loss 0.000673135, throughput 2.83677K wps
Begin Testing...
[Epoch 160] train avg loss 0.000592645, dev acc 0.7737, dev avg loss 0.61822, throughput 2.86679K wps
[Epoch 161 Batch 30/173] avg loss 0.000533079, throughput 2.94306K wps
[Epoch 161 Batch 60/173] avg loss 0.000567703, throughput 2.86587K wps
[Epoch 161 Batch 90/173] avg loss 0.000490051, throughput 2.88061K wps
[Epoch 161 Batch 120/173] avg loss 0.000589334, throughput 2.85666K wps
[Epoch 161 Batch 150/173] avg loss 0.000622206, throughput 2.8695K wps
Begin Testing...
[Epoch 161] train avg loss 0.000568629, dev acc 0.7769, dev avg loss 0.621224, throughput 2.8808K wps
[Epoch 162 Batch 30/173] avg loss 0.000577627, throughput 2.93119K wps
[Epoch 162 Batch 60/173] avg loss 0.000528901, throughput 2.87696K wps
[Epoch 162 Batch 90/173] avg loss 0.000589528, throughput 2.88727K wps
[Epoch 162 Batch 120/173] avg loss 0.000588041, throughput 2.83213K wps
[Epoch 162 Batch 150/173] avg loss 0.000566077, throughput 2.86967K wps
Begin Testing...
[Epoch 162] train avg loss 0.000578948, dev acc 0.7748, dev avg loss 0.624353, throughput 2.87964K wps
[Epoch 163 Batch 30/173] avg loss 0.000550771, throughput 2.91577K wps
[Epoch 163 Batch 60/173] avg loss 0.000544437, throughput 2.829K wps
[Epoch 163 Batch 90/173] avg loss 0.000574821, throughput 2.88116K wps
[Epoch 163 Batch 120/173] avg loss 0.000564647, throughput 2.87245K wps
[Epoch 163 Batch 150/173] avg loss 0.000572643, throughput 2.77458K wps
Begin Testing...
[Epoch 163] train avg loss 0.000562556, dev acc 0.7727, dev avg loss 0.628645, throughput 2.8498K wps
[Epoch 164 Batch 30/173] avg loss 0.000563595, throughput 2.86396K wps
[Epoch 164 Batch 60/173] avg loss 0.000526145, throughput 2.84368K wps
[Epoch 164 Batch 90/173] avg loss 0.000616506, throughput 2.87236K wps
[Epoch 164 Batch 120/173] avg loss 0.00056942, throughput 2.82709K wps
[Epoch 164 Batch 150/173] avg loss 0.000482619, throughput 2.8315K wps
Begin Testing...
[Epoch 164] train avg loss 0.000548473, dev acc 0.7737, dev avg loss 0.629349, throughput 2.85023K wps
[Epoch 165 Batch 30/173] avg loss 0.000523301, throughput 2.90581K wps
[Epoch 165 Batch 60/173] avg loss 0.000526265, throughput 2.88074K wps
[Epoch 165 Batch 90/173] avg loss 0.000497142, throughput 2.87636K wps
[Epoch 165 Batch 120/173] avg loss 0.000525685, throughput 2.87752K wps
[Epoch 165 Batch 150/173] avg loss 0.000529316, throughput 2.8693K wps
Begin Testing...
[Epoch 165] train avg loss 0.000521857, dev acc 0.7789, dev avg loss 0.638603, throughput 2.87544K wps
[Epoch 166 Batch 30/173] avg loss 0.000530698, throughput 2.9442K wps
[Epoch 166 Batch 60/173] avg loss 0.000476896, throughput 2.83657K wps
[Epoch 166 Batch 90/173] avg loss 0.000538558, throughput 2.80167K wps
[Epoch 166 Batch 120/173] avg loss 0.000546277, throughput 2.84959K wps
[Epoch 166 Batch 150/173] avg loss 0.000559716, throughput 2.80683K wps
Begin Testing...
[Epoch 166] train avg loss 0.000539523, dev acc 0.7748, dev avg loss 0.629046, throughput 2.84949K wps
[Epoch 167 Batch 30/173] avg loss 0.000551374, throughput 2.94726K wps
[Epoch 167 Batch 60/173] avg loss 0.000513436, throughput 2.86818K wps
[Epoch 167 Batch 90/173] avg loss 0.000495105, throughput 2.79426K wps
[Epoch 167 Batch 120/173] avg loss 0.000493981, throughput 2.83003K wps
[Epoch 167 Batch 150/173] avg loss 0.000538958, throughput 2.86698K wps
Begin Testing...
[Epoch 167] train avg loss 0.000532252, dev acc 0.7789, dev avg loss 0.62822, throughput 2.8609K wps
[Epoch 168 Batch 30/173] avg loss 0.000452709, throughput 2.85664K wps
[Epoch 168 Batch 60/173] avg loss 0.000459472, throughput 2.81752K wps
[Epoch 168 Batch 90/173] avg loss 0.000491686, throughput 2.88132K wps
[Epoch 168 Batch 120/173] avg loss 0.000492097, throughput 2.8053K wps
[Epoch 168 Batch 150/173] avg loss 0.000516282, throughput 2.87751K wps
Begin Testing...
[Epoch 168] train avg loss 0.000497187, dev acc 0.7821, dev avg loss 0.637109, throughput 2.85062K wps
[Epoch 169 Batch 30/173] avg loss 0.000526141, throughput 2.93768K wps
[Epoch 169 Batch 60/173] avg loss 0.00048836, throughput 2.88098K wps
[Epoch 169 Batch 90/173] avg loss 0.000446662, throughput 2.87525K wps
[Epoch 169 Batch 120/173] avg loss 0.00056032, throughput 2.87328K wps
[Epoch 169 Batch 150/173] avg loss 0.000530639, throughput 2.86389K wps
Begin Testing...
[Epoch 169] train avg loss 0.000505281, dev acc 0.7810, dev avg loss 0.635929, throughput 2.88274K wps
[Epoch 170 Batch 30/173] avg loss 0.000490213, throughput 2.94497K wps
[Epoch 170 Batch 60/173] avg loss 0.00050973, throughput 2.86962K wps
[Epoch 170 Batch 90/173] avg loss 0.000467205, throughput 2.86316K wps
[Epoch 170 Batch 120/173] avg loss 0.000497598, throughput 2.86673K wps
[Epoch 170 Batch 150/173] avg loss 0.000527154, throughput 2.82236K wps
Begin Testing...
[Epoch 170] train avg loss 0.00049504, dev acc 0.7769, dev avg loss 0.642508, throughput 2.87234K wps
[Epoch 171 Batch 30/173] avg loss 0.000508453, throughput 2.92541K wps
[Epoch 171 Batch 60/173] avg loss 0.000455063, throughput 2.84663K wps
[Epoch 171 Batch 90/173] avg loss 0.000441651, throughput 2.88225K wps
[Epoch 171 Batch 120/173] avg loss 0.000516881, throughput 2.87478K wps
[Epoch 171 Batch 150/173] avg loss 0.000477922, throughput 2.80736K wps
Begin Testing...
[Epoch 171] train avg loss 0.000482118, dev acc 0.7779, dev avg loss 0.641283, throughput 2.8573K wps
[Epoch 172 Batch 30/173] avg loss 0.000521026, throughput 2.95206K wps
[Epoch 172 Batch 60/173] avg loss 0.000465849, throughput 2.87596K wps
[Epoch 172 Batch 90/173] avg loss 0.000455652, throughput 2.85238K wps
[Epoch 172 Batch 120/173] avg loss 0.000461499, throughput 2.82888K wps
[Epoch 172 Batch 150/173] avg loss 0.000472578, throughput 2.80131K wps
Begin Testing...
[Epoch 172] train avg loss 0.000479511, dev acc 0.7758, dev avg loss 0.644028, throughput 2.85292K wps
[Epoch 173 Batch 30/173] avg loss 0.000428195, throughput 2.87092K wps
[Epoch 173 Batch 60/173] avg loss 0.000441234, throughput 2.86475K wps
[Epoch 173 Batch 90/173] avg loss 0.000445638, throughput 2.88498K wps
[Epoch 173 Batch 120/173] avg loss 0.000489307, throughput 2.87833K wps
[Epoch 173 Batch 150/173] avg loss 0.000461401, throughput 2.87414K wps
Begin Testing...
[Epoch 173] train avg loss 0.000460363, dev acc 0.7706, dev avg loss 0.646298, throughput 2.87541K wps
[Epoch 174 Batch 30/173] avg loss 0.000416787, throughput 2.87055K wps
[Epoch 174 Batch 60/173] avg loss 0.000466852, throughput 2.85211K wps
[Epoch 174 Batch 90/173] avg loss 0.000480111, throughput 2.88479K wps
[Epoch 174 Batch 120/173] avg loss 0.000490943, throughput 2.85917K wps
[Epoch 174 Batch 150/173] avg loss 0.00046886, throughput 2.88644K wps
Begin Testing...
[Epoch 174] train avg loss 0.00046261, dev acc 0.7779, dev avg loss 0.648196, throughput 2.87172K wps
[Epoch 175 Batch 30/173] avg loss 0.000388499, throughput 2.92033K wps
[Epoch 175 Batch 60/173] avg loss 0.000393056, throughput 2.87102K wps
[Epoch 175 Batch 90/173] avg loss 0.000441537, throughput 2.85129K wps
[Epoch 175 Batch 120/173] avg loss 0.000473135, throughput 2.88041K wps
[Epoch 175 Batch 150/173] avg loss 0.000450481, throughput 2.80487K wps
Begin Testing...
[Epoch 175] train avg loss 0.000429459, dev acc 0.7769, dev avg loss 0.651967, throughput 2.85566K wps
[Epoch 176 Batch 30/173] avg loss 0.000410868, throughput 2.91545K wps
[Epoch 176 Batch 60/173] avg loss 0.000433857, throughput 2.86749K wps
[Epoch 176 Batch 90/173] avg loss 0.000507626, throughput 2.87922K wps
[Epoch 176 Batch 120/173] avg loss 0.000464044, throughput 2.88244K wps
[Epoch 176 Batch 150/173] avg loss 0.000468802, throughput 2.87315K wps
Begin Testing...
[Epoch 176] train avg loss 0.000449443, dev acc 0.7779, dev avg loss 0.650123, throughput 2.87258K wps
[Epoch 177 Batch 30/173] avg loss 0.000365941, throughput 2.90242K wps
[Epoch 177 Batch 60/173] avg loss 0.000454424, throughput 2.83735K wps
[Epoch 177 Batch 90/173] avg loss 0.000403079, throughput 2.86428K wps
[Epoch 177 Batch 120/173] avg loss 0.000498198, throughput 2.8696K wps
[Epoch 177 Batch 150/173] avg loss 0.00044363, throughput 2.84406K wps
Begin Testing...
[Epoch 177] train avg loss 0.00044105, dev acc 0.7821, dev avg loss 0.650821, throughput 2.86091K wps
[Epoch 178 Batch 30/173] avg loss 0.000392623, throughput 2.90419K wps
[Epoch 178 Batch 60/173] avg loss 0.000386614, throughput 2.87819K wps
[Epoch 178 Batch 90/173] avg loss 0.000386295, throughput 2.8395K wps
[Epoch 178 Batch 120/173] avg loss 0.000501098, throughput 2.86584K wps
[Epoch 178 Batch 150/173] avg loss 0.000475572, throughput 2.87976K wps
Begin Testing...
[Epoch 178] train avg loss 0.000429994, dev acc 0.7810, dev avg loss 0.65651, throughput 2.87303K wps
[Epoch 179 Batch 30/173] avg loss 0.000395686, throughput 2.94399K wps
[Epoch 179 Batch 60/173] avg loss 0.000415093, throughput 2.88389K wps
[Epoch 179 Batch 90/173] avg loss 0.000411052, throughput 2.875K wps
[Epoch 179 Batch 120/173] avg loss 0.000406896, throughput 2.86802K wps
[Epoch 179 Batch 150/173] avg loss 0.000431933, throughput 2.86172K wps
Begin Testing...
[Epoch 179] train avg loss 0.000421925, dev acc 0.7748, dev avg loss 0.663103, throughput 2.88515K wps
[Epoch 180 Batch 30/173] avg loss 0.000413728, throughput 2.92814K wps
[Epoch 180 Batch 60/173] avg loss 0.000352758, throughput 2.87225K wps
[Epoch 180 Batch 90/173] avg loss 0.000431656, throughput 2.87662K wps
[Epoch 180 Batch 120/173] avg loss 0.000435211, throughput 2.83664K wps
[Epoch 180 Batch 150/173] avg loss 0.000471047, throughput 2.86389K wps
Begin Testing...
[Epoch 180] train avg loss 0.000423424, dev acc 0.7769, dev avg loss 0.661319, throughput 2.87385K wps
[Epoch 181 Batch 30/173] avg loss 0.000433227, throughput 2.8768K wps
[Epoch 181 Batch 60/173] avg loss 0.000417771, throughput 2.80363K wps
[Epoch 181 Batch 90/173] avg loss 0.00042677, throughput 2.88082K wps
[Epoch 181 Batch 120/173] avg loss 0.000407053, throughput 2.86617K wps
[Epoch 181 Batch 150/173] avg loss 0.000428408, throughput 2.80205K wps
Begin Testing...
[Epoch 181] train avg loss 0.000424804, dev acc 0.7789, dev avg loss 0.663162, throughput 2.84923K wps
[Epoch 182 Batch 30/173] avg loss 0.000426064, throughput 2.94076K wps
[Epoch 182 Batch 60/173] avg loss 0.000372239, throughput 2.87081K wps
[Epoch 182 Batch 90/173] avg loss 0.000390219, throughput 2.81151K wps
[Epoch 182 Batch 120/173] avg loss 0.000387729, throughput 2.82503K wps
[Epoch 182 Batch 150/173] avg loss 0.00041903, throughput 2.82971K wps
Begin Testing...
[Epoch 182] train avg loss 0.000401268, dev acc 0.7789, dev avg loss 0.663951, throughput 2.84721K wps
[Epoch 183 Batch 30/173] avg loss 0.000420418, throughput 2.93462K wps
[Epoch 183 Batch 60/173] avg loss 0.000392781, throughput 2.87505K wps
[Epoch 183 Batch 90/173] avg loss 0.000388287, throughput 2.87354K wps
[Epoch 183 Batch 120/173] avg loss 0.000393968, throughput 2.82614K wps
[Epoch 183 Batch 150/173] avg loss 0.000336396, throughput 2.80471K wps
Begin Testing...
[Epoch 183] train avg loss 0.000388202, dev acc 0.7800, dev avg loss 0.66649, throughput 2.85835K wps
[Epoch 184 Batch 30/173] avg loss 0.000393177, throughput 2.86844K wps
[Epoch 184 Batch 60/173] avg loss 0.00040943, throughput 2.84294K wps
[Epoch 184 Batch 90/173] avg loss 0.000388269, throughput 2.87922K wps
[Epoch 184 Batch 120/173] avg loss 0.000428751, throughput 2.86345K wps
[Epoch 184 Batch 150/173] avg loss 0.000389651, throughput 2.85557K wps
Begin Testing...
[Epoch 184] train avg loss 0.000410358, dev acc 0.7800, dev avg loss 0.666789, throughput 2.86276K wps
[Epoch 185 Batch 30/173] avg loss 0.000365812, throughput 2.8659K wps
[Epoch 185 Batch 60/173] avg loss 0.000430613, throughput 2.81379K wps
[Epoch 185 Batch 90/173] avg loss 0.000376032, throughput 2.86759K wps
[Epoch 185 Batch 120/173] avg loss 0.0004129, throughput 2.87704K wps
[Epoch 185 Batch 150/173] avg loss 0.000360709, throughput 2.87591K wps
Begin Testing...
[Epoch 185] train avg loss 0.000386952, dev acc 0.7789, dev avg loss 0.664839, throughput 2.86224K wps
[Epoch 186 Batch 30/173] avg loss 0.000373454, throughput 2.93224K wps
[Epoch 186 Batch 60/173] avg loss 0.000365208, throughput 2.82153K wps
[Epoch 186 Batch 90/173] avg loss 0.0003873, throughput 2.8439K wps
[Epoch 186 Batch 120/173] avg loss 0.000390106, throughput 2.88268K wps
[Epoch 186 Batch 150/173] avg loss 0.000375081, throughput 2.83899K wps
Begin Testing...
[Epoch 186] train avg loss 0.000376744, dev acc 0.7800, dev avg loss 0.671154, throughput 2.86162K wps
[Epoch 187 Batch 30/173] avg loss 0.00033682, throughput 2.86676K wps
[Epoch 187 Batch 60/173] avg loss 0.0003241, throughput 2.79323K wps
[Epoch 187 Batch 90/173] avg loss 0.000431101, throughput 2.79879K wps
[Epoch 187 Batch 120/173] avg loss 0.00040764, throughput 2.87179K wps
[Epoch 187 Batch 150/173] avg loss 0.000391184, throughput 2.86518K wps
Begin Testing...
[Epoch 187] train avg loss 0.000375536, dev acc 0.7789, dev avg loss 0.668521, throughput 2.84424K wps
[Epoch 188 Batch 30/173] avg loss 0.000342725, throughput 2.91057K wps
[Epoch 188 Batch 60/173] avg loss 0.000362099, throughput 2.82216K wps
[Epoch 188 Batch 90/173] avg loss 0.000412698, throughput 2.82679K wps
[Epoch 188 Batch 120/173] avg loss 0.000364357, throughput 2.85344K wps
[Epoch 188 Batch 150/173] avg loss 0.000363502, throughput 2.82961K wps
Begin Testing...
[Epoch 188] train avg loss 0.000372973, dev acc 0.7769, dev avg loss 0.675063, throughput 2.85266K wps
[Epoch 189 Batch 30/173] avg loss 0.000371425, throughput 2.9407K wps
[Epoch 189 Batch 60/173] avg loss 0.000373008, throughput 2.81072K wps
[Epoch 189 Batch 90/173] avg loss 0.00036384, throughput 2.80849K wps
[Epoch 189 Batch 120/173] avg loss 0.000435765, throughput 2.88019K wps
[Epoch 189 Batch 150/173] avg loss 0.000355961, throughput 2.8654K wps
Begin Testing...
[Epoch 189] train avg loss 0.000378928, dev acc 0.7789, dev avg loss 0.671558, throughput 2.86176K wps
[Epoch 190 Batch 30/173] avg loss 0.000374659, throughput 2.89277K wps
[Epoch 190 Batch 60/173] avg loss 0.000332445, throughput 2.87117K wps
[Epoch 190 Batch 90/173] avg loss 0.00035076, throughput 2.87194K wps
[Epoch 190 Batch 120/173] avg loss 0.000358538, throughput 2.85555K wps
[Epoch 190 Batch 150/173] avg loss 0.000392341, throughput 2.87657K wps
Begin Testing...
[Epoch 190] train avg loss 0.000367358, dev acc 0.7831, dev avg loss 0.672857, throughput 2.86294K wps
[Epoch 191 Batch 30/173] avg loss 0.000351376, throughput 2.86839K wps
[Epoch 191 Batch 60/173] avg loss 0.00032741, throughput 2.79978K wps
[Epoch 191 Batch 90/173] avg loss 0.000359158, throughput 2.83118K wps
[Epoch 191 Batch 120/173] avg loss 0.000386587, throughput 2.84978K wps
[Epoch 191 Batch 150/173] avg loss 0.000352515, throughput 2.87814K wps
Begin Testing...
[Epoch 191] train avg loss 0.00037042, dev acc 0.7789, dev avg loss 0.681482, throughput 2.85007K wps
[Epoch 192 Batch 30/173] avg loss 0.000410141, throughput 2.91006K wps
[Epoch 192 Batch 60/173] avg loss 0.0003805, throughput 2.80575K wps
[Epoch 192 Batch 90/173] avg loss 0.000328641, throughput 2.87237K wps
[Epoch 192 Batch 120/173] avg loss 0.000359236, throughput 2.80982K wps
[Epoch 192 Batch 150/173] avg loss 0.00033093, throughput 2.83827K wps
Begin Testing...
[Epoch 192] train avg loss 0.000364589, dev acc 0.7789, dev avg loss 0.679847, throughput 2.84538K wps
[Epoch 193 Batch 30/173] avg loss 0.000384031, throughput 2.88149K wps
[Epoch 193 Batch 60/173] avg loss 0.000352425, throughput 2.87667K wps
[Epoch 193 Batch 90/173] avg loss 0.000354816, throughput 2.87871K wps
[Epoch 193 Batch 120/173] avg loss 0.000383709, throughput 2.82741K wps
[Epoch 193 Batch 150/173] avg loss 0.000369553, throughput 2.80548K wps
Begin Testing...
[Epoch 193] train avg loss 0.00037176, dev acc 0.7789, dev avg loss 0.67585, throughput 2.85013K wps
[Epoch 194 Batch 30/173] avg loss 0.000332845, throughput 2.89365K wps
[Epoch 194 Batch 60/173] avg loss 0.000352638, throughput 2.84379K wps
[Epoch 194 Batch 90/173] avg loss 0.000351013, throughput 2.79437K wps
[Epoch 194 Batch 120/173] avg loss 0.000327838, throughput 2.83073K wps
[Epoch 194 Batch 150/173] avg loss 0.000341628, throughput 2.84515K wps
Begin Testing...
[Epoch 194] train avg loss 0.000345608, dev acc 0.7810, dev avg loss 0.677043, throughput 2.84609K wps
[Epoch 195 Batch 30/173] avg loss 0.000331222, throughput 2.88495K wps
[Epoch 195 Batch 60/173] avg loss 0.000325444, throughput 2.85989K wps
[Epoch 195 Batch 90/173] avg loss 0.000358637, throughput 2.8127K wps
[Epoch 195 Batch 120/173] avg loss 0.000362761, throughput 2.87353K wps
[Epoch 195 Batch 150/173] avg loss 0.000369311, throughput 2.82819K wps
Begin Testing...
[Epoch 195] train avg loss 0.000343818, dev acc 0.7800, dev avg loss 0.682101, throughput 2.85534K wps
[Epoch 196 Batch 30/173] avg loss 0.000348199, throughput 2.88877K wps
[Epoch 196 Batch 60/173] avg loss 0.000312237, throughput 2.87937K wps
[Epoch 196 Batch 90/173] avg loss 0.000337998, throughput 2.87628K wps
[Epoch 196 Batch 120/173] avg loss 0.000364641, throughput 2.84327K wps
[Epoch 196 Batch 150/173] avg loss 0.000316617, throughput 2.86995K wps
Begin Testing...
[Epoch 196] train avg loss 0.000342785, dev acc 0.7789, dev avg loss 0.679867, throughput 2.86663K wps
[Epoch 197 Batch 30/173] avg loss 0.000292972, throughput 2.93227K wps
[Epoch 197 Batch 60/173] avg loss 0.000328603, throughput 2.87567K wps
[Epoch 197 Batch 90/173] avg loss 0.000334394, throughput 2.86448K wps
[Epoch 197 Batch 120/173] avg loss 0.000368636, throughput 2.86358K wps
[Epoch 197 Batch 150/173] avg loss 0.000360107, throughput 2.87162K wps
Begin Testing...
[Epoch 197] train avg loss 0.000338141, dev acc 0.7769, dev avg loss 0.685619, throughput 2.88039K wps
[Epoch 198 Batch 30/173] avg loss 0.000293964, throughput 2.94257K wps
[Epoch 198 Batch 60/173] avg loss 0.000355301, throughput 2.87337K wps
[Epoch 198 Batch 90/173] avg loss 0.000354299, throughput 2.84713K wps
[Epoch 198 Batch 120/173] avg loss 0.000334706, throughput 2.88072K wps
[Epoch 198 Batch 150/173] avg loss 0.000311833, throughput 2.81015K wps
Begin Testing...
[Epoch 198] train avg loss 0.000328482, dev acc 0.7800, dev avg loss 0.689022, throughput 2.86055K wps
[Epoch 199 Batch 30/173] avg loss 0.000304378, throughput 2.89533K wps
[Epoch 199 Batch 60/173] avg loss 0.000343447, throughput 2.86859K wps
[Epoch 199 Batch 90/173] avg loss 0.000361467, throughput 2.8742K wps
[Epoch 199 Batch 120/173] avg loss 0.000320548, throughput 2.85645K wps
[Epoch 199 Batch 150/173] avg loss 0.0003758, throughput 2.82721K wps
Begin Testing...
[Epoch 199] train avg loss 0.000339035, dev acc 0.7769, dev avg loss 0.688094, throughput 2.85936K wps
Test loss 0.501227, test acc 0.7777
Total time cost 700.12s
[Epoch 0 Batch 30/173] avg loss 0.0138883, throughput 2.43441K wps
[Epoch 0 Batch 60/173] avg loss 0.0138626, throughput 2.8547K wps
[Epoch 0 Batch 90/173] avg loss 0.0138511, throughput 2.83706K wps
[Epoch 0 Batch 120/173] avg loss 0.0138363, throughput 2.86924K wps
[Epoch 0 Batch 150/173] avg loss 0.0138609, throughput 2.87926K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138815, dev acc 0.5433, dev avg loss 0.692649, throughput 2.77872K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138475, throughput 2.93678K wps
[Epoch 1 Batch 60/173] avg loss 0.0138343, throughput 2.87327K wps
[Epoch 1 Batch 90/173] avg loss 0.0138644, throughput 2.84516K wps
[Epoch 1 Batch 120/173] avg loss 0.0138306, throughput 2.88213K wps
[Epoch 1 Batch 150/173] avg loss 0.0138552, throughput 2.81932K wps
Begin Testing...
[Epoch 1] train avg loss 0.0138647, dev acc 0.5454, dev avg loss 0.692165, throughput 2.8622K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0138007, throughput 2.91934K wps
[Epoch 2 Batch 60/173] avg loss 0.0138094, throughput 2.7986K wps
[Epoch 2 Batch 90/173] avg loss 0.0138083, throughput 2.7857K wps
[Epoch 2 Batch 120/173] avg loss 0.0138033, throughput 2.80641K wps
[Epoch 2 Batch 150/173] avg loss 0.0138103, throughput 2.82804K wps
Begin Testing...
[Epoch 2] train avg loss 0.0138281, dev acc 0.5474, dev avg loss 0.6913, throughput 2.83182K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.013796, throughput 2.9246K wps
[Epoch 3 Batch 60/173] avg loss 0.0137924, throughput 2.86953K wps
[Epoch 3 Batch 90/173] avg loss 0.013827, throughput 2.83582K wps
[Epoch 3 Batch 120/173] avg loss 0.0137782, throughput 2.82669K wps
[Epoch 3 Batch 150/173] avg loss 0.0138013, throughput 2.86393K wps
Begin Testing...
[Epoch 3] train avg loss 0.0138129, dev acc 0.5495, dev avg loss 0.690876, throughput 2.866K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0137621, throughput 2.89918K wps
[Epoch 4 Batch 60/173] avg loss 0.0137474, throughput 2.86222K wps
[Epoch 4 Batch 90/173] avg loss 0.0137633, throughput 2.88073K wps
[Epoch 4 Batch 120/173] avg loss 0.013736, throughput 2.87627K wps
[Epoch 4 Batch 150/173] avg loss 0.0137408, throughput 2.86606K wps
Begin Testing...
[Epoch 4] train avg loss 0.013777, dev acc 0.5381, dev avg loss 0.690089, throughput 2.87668K wps
[Epoch 5 Batch 30/173] avg loss 0.0137259, throughput 2.91714K wps
[Epoch 5 Batch 60/173] avg loss 0.0137196, throughput 2.80146K wps
[Epoch 5 Batch 90/173] avg loss 0.0137232, throughput 2.86124K wps
[Epoch 5 Batch 120/173] avg loss 0.0136995, throughput 2.86017K wps
[Epoch 5 Batch 150/173] avg loss 0.0137551, throughput 2.86884K wps
Begin Testing...
[Epoch 5] train avg loss 0.0137421, dev acc 0.5349, dev avg loss 0.689158, throughput 2.86117K wps
[Epoch 6 Batch 30/173] avg loss 0.0136765, throughput 2.91563K wps
[Epoch 6 Batch 60/173] avg loss 0.0136865, throughput 2.83384K wps
[Epoch 6 Batch 90/173] avg loss 0.0137025, throughput 2.88455K wps
[Epoch 6 Batch 120/173] avg loss 0.013735, throughput 2.88242K wps
[Epoch 6 Batch 150/173] avg loss 0.0136727, throughput 2.87684K wps
Begin Testing...
[Epoch 6] train avg loss 0.0137138, dev acc 0.5328, dev avg loss 0.688383, throughput 2.87878K wps
[Epoch 7 Batch 30/173] avg loss 0.0136585, throughput 2.86295K wps
[Epoch 7 Batch 60/173] avg loss 0.013674, throughput 2.86414K wps
[Epoch 7 Batch 90/173] avg loss 0.0137048, throughput 2.86745K wps
[Epoch 7 Batch 120/173] avg loss 0.0136612, throughput 2.88301K wps
[Epoch 7 Batch 150/173] avg loss 0.013634, throughput 2.88146K wps
Begin Testing...
[Epoch 7] train avg loss 0.0136908, dev acc 0.5328, dev avg loss 0.687657, throughput 2.87324K wps
[Epoch 8 Batch 30/173] avg loss 0.0137016, throughput 2.85931K wps
[Epoch 8 Batch 60/173] avg loss 0.013624, throughput 2.86253K wps
[Epoch 8 Batch 90/173] avg loss 0.0136465, throughput 2.88246K wps
[Epoch 8 Batch 120/173] avg loss 0.0136229, throughput 2.80147K wps
[Epoch 8 Batch 150/173] avg loss 0.0136582, throughput 2.88295K wps
Begin Testing...
[Epoch 8] train avg loss 0.0136672, dev acc 0.5454, dev avg loss 0.689297, throughput 2.8605K wps
[Epoch 9 Batch 30/173] avg loss 0.0135783, throughput 2.8929K wps
[Epoch 9 Batch 60/173] avg loss 0.0136034, throughput 2.88765K wps
[Epoch 9 Batch 90/173] avg loss 0.0136354, throughput 2.88425K wps
[Epoch 9 Batch 120/173] avg loss 0.0136353, throughput 2.88535K wps
[Epoch 9 Batch 150/173] avg loss 0.0135562, throughput 2.85171K wps
Begin Testing...
[Epoch 9] train avg loss 0.0136239, dev acc 0.5318, dev avg loss 0.68688, throughput 2.88091K wps
[Epoch 10 Batch 30/173] avg loss 0.013594, throughput 2.93598K wps
[Epoch 10 Batch 60/173] avg loss 0.013633, throughput 2.8407K wps
[Epoch 10 Batch 90/173] avg loss 0.0136068, throughput 2.83323K wps
[Epoch 10 Batch 120/173] avg loss 0.0136053, throughput 2.86612K wps
[Epoch 10 Batch 150/173] avg loss 0.0135887, throughput 2.87524K wps
Begin Testing...
[Epoch 10] train avg loss 0.0136222, dev acc 0.5349, dev avg loss 0.687071, throughput 2.87099K wps
[Epoch 11 Batch 30/173] avg loss 0.0135087, throughput 2.94948K wps
[Epoch 11 Batch 60/173] avg loss 0.0136375, throughput 2.87928K wps
[Epoch 11 Batch 90/173] avg loss 0.0135341, throughput 2.87408K wps
[Epoch 11 Batch 120/173] avg loss 0.0135856, throughput 2.8774K wps
[Epoch 11 Batch 150/173] avg loss 0.0135862, throughput 2.88371K wps
Begin Testing...
[Epoch 11] train avg loss 0.0135799, dev acc 0.5318, dev avg loss 0.685789, throughput 2.88234K wps
[Epoch 12 Batch 30/173] avg loss 0.0135183, throughput 2.93833K wps
[Epoch 12 Batch 60/173] avg loss 0.0135082, throughput 2.85486K wps
[Epoch 12 Batch 90/173] avg loss 0.013575, throughput 2.8764K wps
[Epoch 12 Batch 120/173] avg loss 0.0136095, throughput 2.87687K wps
[Epoch 12 Batch 150/173] avg loss 0.0134964, throughput 2.87729K wps
Begin Testing...
[Epoch 12] train avg loss 0.0135645, dev acc 0.5349, dev avg loss 0.68523, throughput 2.88164K wps
[Epoch 13 Batch 30/173] avg loss 0.0135477, throughput 2.91568K wps
[Epoch 13 Batch 60/173] avg loss 0.0135078, throughput 2.87151K wps
[Epoch 13 Batch 90/173] avg loss 0.0135792, throughput 2.87043K wps
[Epoch 13 Batch 120/173] avg loss 0.013393, throughput 2.84418K wps
[Epoch 13 Batch 150/173] avg loss 0.0135412, throughput 2.80849K wps
Begin Testing...
[Epoch 13] train avg loss 0.0135406, dev acc 0.5370, dev avg loss 0.684646, throughput 2.85429K wps
[Epoch 14 Batch 30/173] avg loss 0.0135032, throughput 2.88054K wps
[Epoch 14 Batch 60/173] avg loss 0.0134356, throughput 2.87114K wps
[Epoch 14 Batch 90/173] avg loss 0.01345, throughput 2.87566K wps
[Epoch 14 Batch 120/173] avg loss 0.0134358, throughput 2.85317K wps
[Epoch 14 Batch 150/173] avg loss 0.0134841, throughput 2.81912K wps
Begin Testing...
[Epoch 14] train avg loss 0.0134999, dev acc 0.5433, dev avg loss 0.683995, throughput 2.86013K wps
[Epoch 15 Batch 30/173] avg loss 0.0134916, throughput 2.90703K wps
[Epoch 15 Batch 60/173] avg loss 0.0134561, throughput 2.85413K wps
[Epoch 15 Batch 90/173] avg loss 0.0134483, throughput 2.85495K wps
[Epoch 15 Batch 120/173] avg loss 0.0134318, throughput 2.80437K wps
[Epoch 15 Batch 150/173] avg loss 0.0133998, throughput 2.78071K wps
Begin Testing...
[Epoch 15] train avg loss 0.0134685, dev acc 0.5506, dev avg loss 0.68347, throughput 2.83688K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0134161, throughput 2.8992K wps
[Epoch 16 Batch 60/173] avg loss 0.0135065, throughput 2.84079K wps
[Epoch 16 Batch 90/173] avg loss 0.0132454, throughput 2.86927K wps
[Epoch 16 Batch 120/173] avg loss 0.0134557, throughput 2.85498K wps
[Epoch 16 Batch 150/173] avg loss 0.0134468, throughput 2.82698K wps
Begin Testing...
[Epoch 16] train avg loss 0.0134463, dev acc 0.5516, dev avg loss 0.681151, throughput 2.85838K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/173] avg loss 0.0133458, throughput 2.89889K wps
[Epoch 17 Batch 60/173] avg loss 0.0134284, throughput 2.87543K wps
[Epoch 17 Batch 90/173] avg loss 0.0134548, throughput 2.8632K wps
[Epoch 17 Batch 120/173] avg loss 0.0134134, throughput 2.86499K wps
[Epoch 17 Batch 150/173] avg loss 0.0132921, throughput 2.83753K wps
Begin Testing...
[Epoch 17] train avg loss 0.0134171, dev acc 0.5506, dev avg loss 0.68029, throughput 2.87054K wps
[Epoch 18 Batch 30/173] avg loss 0.0132479, throughput 2.95381K wps
[Epoch 18 Batch 60/173] avg loss 0.0133318, throughput 2.88107K wps
[Epoch 18 Batch 90/173] avg loss 0.0134286, throughput 2.88623K wps
[Epoch 18 Batch 120/173] avg loss 0.0133345, throughput 2.87169K wps
[Epoch 18 Batch 150/173] avg loss 0.0134128, throughput 2.87953K wps
Begin Testing...
[Epoch 18] train avg loss 0.013366, dev acc 0.5558, dev avg loss 0.679587, throughput 2.89102K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/173] avg loss 0.0133518, throughput 2.86938K wps
[Epoch 19 Batch 60/173] avg loss 0.0133039, throughput 2.86567K wps
[Epoch 19 Batch 90/173] avg loss 0.013304, throughput 2.86515K wps
[Epoch 19 Batch 120/173] avg loss 0.0132703, throughput 2.84722K wps
[Epoch 19 Batch 150/173] avg loss 0.013369, throughput 2.82528K wps
Begin Testing...
[Epoch 19] train avg loss 0.0133439, dev acc 0.5641, dev avg loss 0.677292, throughput 2.85468K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/173] avg loss 0.013157, throughput 2.90696K wps
[Epoch 20 Batch 60/173] avg loss 0.0133137, throughput 2.85347K wps
[Epoch 20 Batch 90/173] avg loss 0.0132206, throughput 2.86843K wps
[Epoch 20 Batch 120/173] avg loss 0.0134114, throughput 2.84797K wps
[Epoch 20 Batch 150/173] avg loss 0.0132432, throughput 2.86799K wps
Begin Testing...
[Epoch 20] train avg loss 0.0132941, dev acc 0.5631, dev avg loss 0.675849, throughput 2.86886K wps
[Epoch 21 Batch 30/173] avg loss 0.0133485, throughput 2.94384K wps
[Epoch 21 Batch 60/173] avg loss 0.0131359, throughput 2.86972K wps
[Epoch 21 Batch 90/173] avg loss 0.013128, throughput 2.88696K wps
[Epoch 21 Batch 120/173] avg loss 0.0132027, throughput 2.86977K wps
[Epoch 21 Batch 150/173] avg loss 0.0132526, throughput 2.86763K wps
Begin Testing...
[Epoch 21] train avg loss 0.0132347, dev acc 0.5652, dev avg loss 0.674205, throughput 2.88271K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.0132236, throughput 2.94542K wps
[Epoch 22 Batch 60/173] avg loss 0.0132661, throughput 2.81125K wps
[Epoch 22 Batch 90/173] avg loss 0.0131579, throughput 2.82597K wps
[Epoch 22 Batch 120/173] avg loss 0.0130993, throughput 2.83987K wps
[Epoch 22 Batch 150/173] avg loss 0.0131767, throughput 2.88068K wps
Begin Testing...
[Epoch 22] train avg loss 0.0132135, dev acc 0.5673, dev avg loss 0.672342, throughput 2.86258K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/173] avg loss 0.013092, throughput 2.88932K wps
[Epoch 23 Batch 60/173] avg loss 0.0131382, throughput 2.87047K wps
[Epoch 23 Batch 90/173] avg loss 0.0131647, throughput 2.87477K wps
[Epoch 23 Batch 120/173] avg loss 0.0131026, throughput 2.88645K wps
[Epoch 23 Batch 150/173] avg loss 0.0130926, throughput 2.87998K wps
Begin Testing...
[Epoch 23] train avg loss 0.0131493, dev acc 0.5798, dev avg loss 0.669522, throughput 2.88018K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/173] avg loss 0.0130223, throughput 2.91978K wps
[Epoch 24 Batch 60/173] avg loss 0.0130003, throughput 2.87296K wps
[Epoch 24 Batch 90/173] avg loss 0.013064, throughput 2.87159K wps
[Epoch 24 Batch 120/173] avg loss 0.013011, throughput 2.84903K wps
[Epoch 24 Batch 150/173] avg loss 0.0130221, throughput 2.80823K wps
Begin Testing...
[Epoch 24] train avg loss 0.0130647, dev acc 0.5808, dev avg loss 0.667074, throughput 2.85784K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.0130921, throughput 2.88257K wps
[Epoch 25 Batch 60/173] avg loss 0.0130489, throughput 2.87835K wps
[Epoch 25 Batch 90/173] avg loss 0.0130425, throughput 2.86651K wps
[Epoch 25 Batch 120/173] avg loss 0.0129078, throughput 2.87736K wps
[Epoch 25 Batch 150/173] avg loss 0.0129667, throughput 2.83824K wps
Begin Testing...
[Epoch 25] train avg loss 0.0130293, dev acc 0.5912, dev avg loss 0.665871, throughput 2.86731K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.0128355, throughput 2.9386K wps
[Epoch 26 Batch 60/173] avg loss 0.0130298, throughput 2.83879K wps
[Epoch 26 Batch 90/173] avg loss 0.0129102, throughput 2.83202K wps
[Epoch 26 Batch 120/173] avg loss 0.0129441, throughput 2.88559K wps
[Epoch 26 Batch 150/173] avg loss 0.0129199, throughput 2.84982K wps
Begin Testing...
[Epoch 26] train avg loss 0.0129478, dev acc 0.5871, dev avg loss 0.661936, throughput 2.87123K wps
[Epoch 27 Batch 30/173] avg loss 0.0128112, throughput 2.89314K wps
[Epoch 27 Batch 60/173] avg loss 0.0127627, throughput 2.79543K wps
[Epoch 27 Batch 90/173] avg loss 0.0128976, throughput 2.82321K wps
[Epoch 27 Batch 120/173] avg loss 0.0129208, throughput 2.86571K wps
[Epoch 27 Batch 150/173] avg loss 0.0129474, throughput 2.852K wps
Begin Testing...
[Epoch 27] train avg loss 0.0128849, dev acc 0.5933, dev avg loss 0.660657, throughput 2.85129K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/173] avg loss 0.0126658, throughput 2.95256K wps
[Epoch 28 Batch 60/173] avg loss 0.0128258, throughput 2.73885K wps
[Epoch 28 Batch 90/173] avg loss 0.0128065, throughput 2.82801K wps
[Epoch 28 Batch 120/173] avg loss 0.0127622, throughput 2.88009K wps
[Epoch 28 Batch 150/173] avg loss 0.0128489, throughput 2.88046K wps
Begin Testing...
[Epoch 28] train avg loss 0.0128092, dev acc 0.5965, dev avg loss 0.656839, throughput 2.85785K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.0126069, throughput 2.92011K wps
[Epoch 29 Batch 60/173] avg loss 0.0126216, throughput 2.79208K wps
[Epoch 29 Batch 90/173] avg loss 0.0127025, throughput 2.80952K wps
[Epoch 29 Batch 120/173] avg loss 0.0127386, throughput 2.81417K wps
[Epoch 29 Batch 150/173] avg loss 0.0128072, throughput 2.87233K wps
Begin Testing...
[Epoch 29] train avg loss 0.0127192, dev acc 0.5975, dev avg loss 0.655365, throughput 2.84515K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/173] avg loss 0.0126844, throughput 2.94398K wps
[Epoch 30 Batch 60/173] avg loss 0.0126304, throughput 2.86995K wps
[Epoch 30 Batch 90/173] avg loss 0.0125274, throughput 2.87407K wps
[Epoch 30 Batch 120/173] avg loss 0.0126379, throughput 2.84526K wps
[Epoch 30 Batch 150/173] avg loss 0.0124947, throughput 2.87276K wps
Begin Testing...
[Epoch 30] train avg loss 0.0126202, dev acc 0.5944, dev avg loss 0.655824, throughput 2.87894K wps
[Epoch 31 Batch 30/173] avg loss 0.0126289, throughput 2.87481K wps
[Epoch 31 Batch 60/173] avg loss 0.0126576, throughput 2.81477K wps
[Epoch 31 Batch 90/173] avg loss 0.0124197, throughput 2.83053K wps
[Epoch 31 Batch 120/173] avg loss 0.0125818, throughput 2.83076K wps
[Epoch 31 Batch 150/173] avg loss 0.0124287, throughput 2.8787K wps
Begin Testing...
[Epoch 31] train avg loss 0.0125505, dev acc 0.6027, dev avg loss 0.650565, throughput 2.84159K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.012478, throughput 2.91931K wps
[Epoch 32 Batch 60/173] avg loss 0.0126127, throughput 2.85056K wps
[Epoch 32 Batch 90/173] avg loss 0.0125125, throughput 2.87882K wps
[Epoch 32 Batch 120/173] avg loss 0.012294, throughput 2.82473K wps
[Epoch 32 Batch 150/173] avg loss 0.0124961, throughput 2.82115K wps
Begin Testing...
[Epoch 32] train avg loss 0.0124818, dev acc 0.6204, dev avg loss 0.645642, throughput 2.86236K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/173] avg loss 0.0124018, throughput 2.90111K wps
[Epoch 33 Batch 60/173] avg loss 0.0124697, throughput 2.88161K wps
[Epoch 33 Batch 90/173] avg loss 0.0123058, throughput 2.85694K wps
[Epoch 33 Batch 120/173] avg loss 0.0123211, throughput 2.8297K wps
[Epoch 33 Batch 150/173] avg loss 0.0123027, throughput 2.83812K wps
Begin Testing...
[Epoch 33] train avg loss 0.0123842, dev acc 0.6225, dev avg loss 0.642737, throughput 2.85461K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.0121465, throughput 2.94428K wps
[Epoch 34 Batch 60/173] avg loss 0.0123829, throughput 2.88049K wps
[Epoch 34 Batch 90/173] avg loss 0.0123399, throughput 2.87944K wps
[Epoch 34 Batch 120/173] avg loss 0.0121679, throughput 2.86788K wps
[Epoch 34 Batch 150/173] avg loss 0.0123407, throughput 2.85242K wps
Begin Testing...
[Epoch 34] train avg loss 0.0123001, dev acc 0.6184, dev avg loss 0.639841, throughput 2.88395K wps
[Epoch 35 Batch 30/173] avg loss 0.0120653, throughput 2.91634K wps
[Epoch 35 Batch 60/173] avg loss 0.0122152, throughput 2.86932K wps
[Epoch 35 Batch 90/173] avg loss 0.0122242, throughput 2.84796K wps
[Epoch 35 Batch 120/173] avg loss 0.0121216, throughput 2.80683K wps
[Epoch 35 Batch 150/173] avg loss 0.012279, throughput 2.86166K wps
Begin Testing...
[Epoch 35] train avg loss 0.0121683, dev acc 0.6246, dev avg loss 0.636967, throughput 2.86203K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/173] avg loss 0.0120188, throughput 2.945K wps
[Epoch 36 Batch 60/173] avg loss 0.0121888, throughput 2.86426K wps
[Epoch 36 Batch 90/173] avg loss 0.0120753, throughput 2.86625K wps
[Epoch 36 Batch 120/173] avg loss 0.0120641, throughput 2.86517K wps
[Epoch 36 Batch 150/173] avg loss 0.0120351, throughput 2.86204K wps
Begin Testing...
[Epoch 36] train avg loss 0.0120771, dev acc 0.6194, dev avg loss 0.634483, throughput 2.86977K wps
[Epoch 37 Batch 30/173] avg loss 0.0120315, throughput 2.94442K wps
[Epoch 37 Batch 60/173] avg loss 0.0119396, throughput 2.87076K wps
[Epoch 37 Batch 90/173] avg loss 0.0119971, throughput 2.82776K wps
[Epoch 37 Batch 120/173] avg loss 0.011869, throughput 2.86888K wps
[Epoch 37 Batch 150/173] avg loss 0.0120881, throughput 2.8761K wps
Begin Testing...
[Epoch 37] train avg loss 0.0119903, dev acc 0.6298, dev avg loss 0.632559, throughput 2.87531K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/173] avg loss 0.0118488, throughput 2.8758K wps
[Epoch 38 Batch 60/173] avg loss 0.0119531, throughput 2.87686K wps
[Epoch 38 Batch 90/173] avg loss 0.0119228, throughput 2.81777K wps
[Epoch 38 Batch 120/173] avg loss 0.0117085, throughput 2.88496K wps
[Epoch 38 Batch 150/173] avg loss 0.0117156, throughput 2.87454K wps
Begin Testing...
[Epoch 38] train avg loss 0.0118321, dev acc 0.6330, dev avg loss 0.626841, throughput 2.86779K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.0118707, throughput 2.90235K wps
[Epoch 39 Batch 60/173] avg loss 0.0117562, throughput 2.87772K wps
[Epoch 39 Batch 90/173] avg loss 0.011539, throughput 2.86705K wps
[Epoch 39 Batch 120/173] avg loss 0.0117801, throughput 2.87747K wps
[Epoch 39 Batch 150/173] avg loss 0.0118021, throughput 2.85639K wps
Begin Testing...
[Epoch 39] train avg loss 0.0117461, dev acc 0.6319, dev avg loss 0.625137, throughput 2.87149K wps
[Epoch 40 Batch 30/173] avg loss 0.011461, throughput 2.90505K wps
[Epoch 40 Batch 60/173] avg loss 0.0117624, throughput 2.85483K wps
[Epoch 40 Batch 90/173] avg loss 0.0118299, throughput 2.85087K wps
[Epoch 40 Batch 120/173] avg loss 0.0116673, throughput 2.85827K wps
[Epoch 40 Batch 150/173] avg loss 0.011392, throughput 2.88769K wps
Begin Testing...
[Epoch 40] train avg loss 0.0116282, dev acc 0.6403, dev avg loss 0.620014, throughput 2.86487K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/173] avg loss 0.0115214, throughput 2.93501K wps
[Epoch 41 Batch 60/173] avg loss 0.011803, throughput 2.87179K wps
[Epoch 41 Batch 90/173] avg loss 0.011355, throughput 2.84247K wps
[Epoch 41 Batch 120/173] avg loss 0.0117182, throughput 2.82797K wps
[Epoch 41 Batch 150/173] avg loss 0.0113087, throughput 2.87546K wps
Begin Testing...
[Epoch 41] train avg loss 0.0115435, dev acc 0.6403, dev avg loss 0.616601, throughput 2.86295K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/173] avg loss 0.0112199, throughput 2.88533K wps
[Epoch 42 Batch 60/173] avg loss 0.0112861, throughput 2.85896K wps
[Epoch 42 Batch 90/173] avg loss 0.0112648, throughput 2.84122K wps
[Epoch 42 Batch 120/173] avg loss 0.0115883, throughput 2.85137K wps
[Epoch 42 Batch 150/173] avg loss 0.0114119, throughput 2.86409K wps
Begin Testing...
[Epoch 42] train avg loss 0.0113681, dev acc 0.6434, dev avg loss 0.614078, throughput 2.8577K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/173] avg loss 0.0111144, throughput 2.89284K wps
[Epoch 43 Batch 60/173] avg loss 0.0111535, throughput 2.82521K wps
[Epoch 43 Batch 90/173] avg loss 0.0112761, throughput 2.85235K wps
[Epoch 43 Batch 120/173] avg loss 0.0113823, throughput 2.80733K wps
[Epoch 43 Batch 150/173] avg loss 0.0112349, throughput 2.84523K wps
Begin Testing...
[Epoch 43] train avg loss 0.011232, dev acc 0.6475, dev avg loss 0.608557, throughput 2.84792K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/173] avg loss 0.0110408, throughput 2.90003K wps
[Epoch 44 Batch 60/173] avg loss 0.0109979, throughput 2.86658K wps
[Epoch 44 Batch 90/173] avg loss 0.0110656, throughput 2.86683K wps
[Epoch 44 Batch 120/173] avg loss 0.0112237, throughput 2.87622K wps
[Epoch 44 Batch 150/173] avg loss 0.0111704, throughput 2.86361K wps
Begin Testing...
[Epoch 44] train avg loss 0.0110968, dev acc 0.6528, dev avg loss 0.604154, throughput 2.8678K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/173] avg loss 0.0107002, throughput 2.91459K wps
[Epoch 45 Batch 60/173] avg loss 0.0111726, throughput 2.84813K wps
[Epoch 45 Batch 90/173] avg loss 0.01097, throughput 2.88023K wps
[Epoch 45 Batch 120/173] avg loss 0.0111117, throughput 2.88071K wps
[Epoch 45 Batch 150/173] avg loss 0.0109519, throughput 2.88133K wps
Begin Testing...
[Epoch 45] train avg loss 0.0109371, dev acc 0.6517, dev avg loss 0.600747, throughput 2.87995K wps
[Epoch 46 Batch 30/173] avg loss 0.0107865, throughput 2.89067K wps
[Epoch 46 Batch 60/173] avg loss 0.0108969, throughput 2.81575K wps
[Epoch 46 Batch 90/173] avg loss 0.010896, throughput 2.80025K wps
[Epoch 46 Batch 120/173] avg loss 0.0107776, throughput 2.84464K wps
[Epoch 46 Batch 150/173] avg loss 0.0107591, throughput 2.87002K wps
Begin Testing...
[Epoch 46] train avg loss 0.010794, dev acc 0.6632, dev avg loss 0.59512, throughput 2.83661K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/173] avg loss 0.0108443, throughput 2.87431K wps
[Epoch 47 Batch 60/173] avg loss 0.0106634, throughput 2.88778K wps
[Epoch 47 Batch 90/173] avg loss 0.0104686, throughput 2.87396K wps
[Epoch 47 Batch 120/173] avg loss 0.0107925, throughput 2.85607K wps
[Epoch 47 Batch 150/173] avg loss 0.0104544, throughput 2.81539K wps
Begin Testing...
[Epoch 47] train avg loss 0.0106577, dev acc 0.6642, dev avg loss 0.590339, throughput 2.86104K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/173] avg loss 0.010392, throughput 2.89926K wps
[Epoch 48 Batch 60/173] avg loss 0.010577, throughput 2.87349K wps
[Epoch 48 Batch 90/173] avg loss 0.0104137, throughput 2.81564K wps
[Epoch 48 Batch 120/173] avg loss 0.0104218, throughput 2.82606K wps
[Epoch 48 Batch 150/173] avg loss 0.0100431, throughput 2.88221K wps
Begin Testing...
[Epoch 48] train avg loss 0.0104215, dev acc 0.6726, dev avg loss 0.585965, throughput 2.8619K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/173] avg loss 0.0101359, throughput 2.86265K wps
[Epoch 49 Batch 60/173] avg loss 0.0104799, throughput 2.87285K wps
[Epoch 49 Batch 90/173] avg loss 0.0100935, throughput 2.88616K wps
[Epoch 49 Batch 120/173] avg loss 0.0105324, throughput 2.87747K wps
[Epoch 49 Batch 150/173] avg loss 0.0102879, throughput 2.89075K wps
Begin Testing...
[Epoch 49] train avg loss 0.0103079, dev acc 0.6747, dev avg loss 0.582394, throughput 2.87757K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/173] avg loss 0.0101556, throughput 2.91221K wps
[Epoch 50 Batch 60/173] avg loss 0.0101969, throughput 2.87258K wps
[Epoch 50 Batch 90/173] avg loss 0.0100127, throughput 2.87813K wps
[Epoch 50 Batch 120/173] avg loss 0.00997229, throughput 2.8827K wps
[Epoch 50 Batch 150/173] avg loss 0.0101745, throughput 2.82546K wps
Begin Testing...
[Epoch 50] train avg loss 0.0100908, dev acc 0.6913, dev avg loss 0.575394, throughput 2.8656K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/173] avg loss 0.0101351, throughput 2.89631K wps
[Epoch 51 Batch 60/173] avg loss 0.00991817, throughput 2.80875K wps
[Epoch 51 Batch 90/173] avg loss 0.00978581, throughput 2.88278K wps
[Epoch 51 Batch 120/173] avg loss 0.0100888, throughput 2.86451K wps
[Epoch 51 Batch 150/173] avg loss 0.0099943, throughput 2.86986K wps
Begin Testing...
[Epoch 51] train avg loss 0.00997682, dev acc 0.6986, dev avg loss 0.570874, throughput 2.86693K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/173] avg loss 0.0096092, throughput 2.91427K wps
[Epoch 52 Batch 60/173] avg loss 0.00952558, throughput 2.88926K wps
[Epoch 52 Batch 90/173] avg loss 0.00979017, throughput 2.88495K wps
[Epoch 52 Batch 120/173] avg loss 0.0095139, throughput 2.81639K wps
[Epoch 52 Batch 150/173] avg loss 0.00970821, throughput 2.88061K wps
Begin Testing...
[Epoch 52] train avg loss 0.00970394, dev acc 0.6934, dev avg loss 0.567119, throughput 2.8773K wps
[Epoch 53 Batch 30/173] avg loss 0.00942303, throughput 2.93816K wps
[Epoch 53 Batch 60/173] avg loss 0.00942579, throughput 2.78269K wps
[Epoch 53 Batch 90/173] avg loss 0.00967889, throughput 2.88196K wps
[Epoch 53 Batch 120/173] avg loss 0.00973974, throughput 2.87369K wps
[Epoch 53 Batch 150/173] avg loss 0.00945181, throughput 2.88598K wps
Begin Testing...
[Epoch 53] train avg loss 0.00954254, dev acc 0.6997, dev avg loss 0.560316, throughput 2.86637K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/173] avg loss 0.00965297, throughput 2.90869K wps
[Epoch 54 Batch 60/173] avg loss 0.00929518, throughput 2.84656K wps
[Epoch 54 Batch 90/173] avg loss 0.00957555, throughput 2.84276K wps
[Epoch 54 Batch 120/173] avg loss 0.00912706, throughput 2.86116K wps
[Epoch 54 Batch 150/173] avg loss 0.0094114, throughput 2.88319K wps
Begin Testing...
[Epoch 54] train avg loss 0.00941122, dev acc 0.7080, dev avg loss 0.55493, throughput 2.87011K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/173] avg loss 0.00928358, throughput 2.90169K wps
[Epoch 55 Batch 60/173] avg loss 0.00919777, throughput 2.84695K wps
[Epoch 55 Batch 90/173] avg loss 0.00904193, throughput 2.88288K wps
[Epoch 55 Batch 120/173] avg loss 0.00913, throughput 2.87602K wps
[Epoch 55 Batch 150/173] avg loss 0.00924613, throughput 2.86837K wps
Begin Testing...
[Epoch 55] train avg loss 0.00919494, dev acc 0.7216, dev avg loss 0.549186, throughput 2.87575K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/173] avg loss 0.00891953, throughput 2.87986K wps
[Epoch 56 Batch 60/173] avg loss 0.00929181, throughput 2.83575K wps
[Epoch 56 Batch 90/173] avg loss 0.00887203, throughput 2.85111K wps
[Epoch 56 Batch 120/173] avg loss 0.00908084, throughput 2.85916K wps
[Epoch 56 Batch 150/173] avg loss 0.00895063, throughput 2.81614K wps
Begin Testing...
[Epoch 56] train avg loss 0.00900487, dev acc 0.7237, dev avg loss 0.544553, throughput 2.85234K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/173] avg loss 0.00867665, throughput 2.92969K wps
[Epoch 57 Batch 60/173] avg loss 0.00896191, throughput 2.8787K wps
[Epoch 57 Batch 90/173] avg loss 0.00849745, throughput 2.88057K wps
[Epoch 57 Batch 120/173] avg loss 0.00897945, throughput 2.88631K wps
[Epoch 57 Batch 150/173] avg loss 0.00901995, throughput 2.88295K wps
Begin Testing...
[Epoch 57] train avg loss 0.00883176, dev acc 0.6945, dev avg loss 0.55275, throughput 2.89044K wps
[Epoch 58 Batch 30/173] avg loss 0.00844493, throughput 2.90424K wps
[Epoch 58 Batch 60/173] avg loss 0.00875562, throughput 2.87456K wps
[Epoch 58 Batch 90/173] avg loss 0.00884243, throughput 2.83853K wps
[Epoch 58 Batch 120/173] avg loss 0.0083998, throughput 2.83816K wps
[Epoch 58 Batch 150/173] avg loss 0.0087306, throughput 2.8787K wps
Begin Testing...
[Epoch 58] train avg loss 0.00865043, dev acc 0.7299, dev avg loss 0.535957, throughput 2.86784K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/173] avg loss 0.00825504, throughput 2.90946K wps
[Epoch 59 Batch 60/173] avg loss 0.00819926, throughput 2.87578K wps
[Epoch 59 Batch 90/173] avg loss 0.00861472, throughput 2.88045K wps
[Epoch 59 Batch 120/173] avg loss 0.008539, throughput 2.86566K wps
[Epoch 59 Batch 150/173] avg loss 0.0086241, throughput 2.88469K wps
Begin Testing...
[Epoch 59] train avg loss 0.00846732, dev acc 0.7310, dev avg loss 0.531985, throughput 2.87403K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/173] avg loss 0.00849995, throughput 2.90265K wps
[Epoch 60 Batch 60/173] avg loss 0.00822222, throughput 2.87735K wps
[Epoch 60 Batch 90/173] avg loss 0.00810666, throughput 2.87464K wps
[Epoch 60 Batch 120/173] avg loss 0.00819635, throughput 2.87137K wps
[Epoch 60 Batch 150/173] avg loss 0.00816801, throughput 2.87442K wps
Begin Testing...
[Epoch 60] train avg loss 0.00826085, dev acc 0.7278, dev avg loss 0.528676, throughput 2.87898K wps
[Epoch 61 Batch 30/173] avg loss 0.00803376, throughput 2.91767K wps
[Epoch 61 Batch 60/173] avg loss 0.00791858, throughput 2.85124K wps
[Epoch 61 Batch 90/173] avg loss 0.00790837, throughput 2.77013K wps
[Epoch 61 Batch 120/173] avg loss 0.00800935, throughput 2.87473K wps
[Epoch 61 Batch 150/173] avg loss 0.00790236, throughput 2.87593K wps
Begin Testing...
[Epoch 61] train avg loss 0.00802272, dev acc 0.7372, dev avg loss 0.522918, throughput 2.85896K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/173] avg loss 0.00784396, throughput 2.86798K wps
[Epoch 62 Batch 60/173] avg loss 0.00799355, throughput 2.85579K wps
[Epoch 62 Batch 90/173] avg loss 0.00764174, throughput 2.87026K wps
[Epoch 62 Batch 120/173] avg loss 0.00797202, throughput 2.87082K wps
[Epoch 62 Batch 150/173] avg loss 0.00782205, throughput 2.84632K wps
Begin Testing...
[Epoch 62] train avg loss 0.00789237, dev acc 0.7404, dev avg loss 0.51904, throughput 2.8622K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/173] avg loss 0.00770516, throughput 2.91873K wps
[Epoch 63 Batch 60/173] avg loss 0.00788327, throughput 2.88253K wps
[Epoch 63 Batch 90/173] avg loss 0.00751733, throughput 2.83708K wps
[Epoch 63 Batch 120/173] avg loss 0.00780671, throughput 2.85058K wps
[Epoch 63 Batch 150/173] avg loss 0.00770335, throughput 2.80368K wps
Begin Testing...
[Epoch 63] train avg loss 0.00771716, dev acc 0.7393, dev avg loss 0.516229, throughput 2.85681K wps
[Epoch 64 Batch 30/173] avg loss 0.00747501, throughput 2.87906K wps
[Epoch 64 Batch 60/173] avg loss 0.00753994, throughput 2.82587K wps
[Epoch 64 Batch 90/173] avg loss 0.00750861, throughput 2.8088K wps
[Epoch 64 Batch 120/173] avg loss 0.00733919, throughput 2.86852K wps
[Epoch 64 Batch 150/173] avg loss 0.00756784, throughput 2.8608K wps
Begin Testing...
[Epoch 64] train avg loss 0.00748739, dev acc 0.7435, dev avg loss 0.510454, throughput 2.85227K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/173] avg loss 0.00740881, throughput 2.88907K wps
[Epoch 65 Batch 60/173] avg loss 0.00744323, throughput 2.85769K wps
[Epoch 65 Batch 90/173] avg loss 0.00731777, throughput 2.87182K wps
[Epoch 65 Batch 120/173] avg loss 0.00725904, throughput 2.83127K wps
[Epoch 65 Batch 150/173] avg loss 0.00729439, throughput 2.86805K wps
Begin Testing...
[Epoch 65] train avg loss 0.00733245, dev acc 0.7414, dev avg loss 0.508336, throughput 2.86351K wps
[Epoch 66 Batch 30/173] avg loss 0.00710662, throughput 2.93206K wps
[Epoch 66 Batch 60/173] avg loss 0.00710269, throughput 2.81175K wps
[Epoch 66 Batch 90/173] avg loss 0.00698602, throughput 2.87484K wps
[Epoch 66 Batch 120/173] avg loss 0.00728269, throughput 2.87753K wps
[Epoch 66 Batch 150/173] avg loss 0.00733662, throughput 2.84061K wps
Begin Testing...
[Epoch 66] train avg loss 0.00716049, dev acc 0.7456, dev avg loss 0.504052, throughput 2.86506K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/173] avg loss 0.00695532, throughput 2.9193K wps
[Epoch 67 Batch 60/173] avg loss 0.00675405, throughput 2.87084K wps
[Epoch 67 Batch 90/173] avg loss 0.00683442, throughput 2.87301K wps
[Epoch 67 Batch 120/173] avg loss 0.00720248, throughput 2.81495K wps
[Epoch 67 Batch 150/173] avg loss 0.00703916, throughput 2.83372K wps
Begin Testing...
[Epoch 67] train avg loss 0.00696221, dev acc 0.7414, dev avg loss 0.50268, throughput 2.86394K wps
[Epoch 68 Batch 30/173] avg loss 0.00683741, throughput 2.93752K wps
[Epoch 68 Batch 60/173] avg loss 0.00640771, throughput 2.88074K wps
[Epoch 68 Batch 90/173] avg loss 0.00701816, throughput 2.81684K wps
[Epoch 68 Batch 120/173] avg loss 0.00675102, throughput 2.84261K wps
[Epoch 68 Batch 150/173] avg loss 0.00665997, throughput 2.85623K wps
Begin Testing...
[Epoch 68] train avg loss 0.00673509, dev acc 0.7362, dev avg loss 0.501929, throughput 2.86339K wps
[Epoch 69 Batch 30/173] avg loss 0.00679123, throughput 2.88111K wps
[Epoch 69 Batch 60/173] avg loss 0.00673239, throughput 2.86179K wps
[Epoch 69 Batch 90/173] avg loss 0.00629142, throughput 2.86424K wps
[Epoch 69 Batch 120/173] avg loss 0.00655149, throughput 2.85589K wps
[Epoch 69 Batch 150/173] avg loss 0.0069244, throughput 2.86435K wps
Begin Testing...
[Epoch 69] train avg loss 0.0066221, dev acc 0.7456, dev avg loss 0.498042, throughput 2.86792K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/173] avg loss 0.00644287, throughput 2.89333K wps
[Epoch 70 Batch 60/173] avg loss 0.006442, throughput 2.85004K wps
[Epoch 70 Batch 90/173] avg loss 0.0063505, throughput 2.82377K wps
[Epoch 70 Batch 120/173] avg loss 0.00653323, throughput 2.80572K wps
[Epoch 70 Batch 150/173] avg loss 0.00640773, throughput 2.8263K wps
Begin Testing...
[Epoch 70] train avg loss 0.00643103, dev acc 0.7435, dev avg loss 0.497196, throughput 2.84103K wps
[Epoch 71 Batch 30/173] avg loss 0.00638052, throughput 2.85884K wps
[Epoch 71 Batch 60/173] avg loss 0.00631361, throughput 2.87743K wps
[Epoch 71 Batch 90/173] avg loss 0.00632076, throughput 2.87776K wps
[Epoch 71 Batch 120/173] avg loss 0.00642196, throughput 2.87617K wps
[Epoch 71 Batch 150/173] avg loss 0.00626693, throughput 2.87828K wps
Begin Testing...
[Epoch 71] train avg loss 0.0063117, dev acc 0.7477, dev avg loss 0.49455, throughput 2.87509K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/173] avg loss 0.00614037, throughput 2.93274K wps
[Epoch 72 Batch 60/173] avg loss 0.00588698, throughput 2.87741K wps
[Epoch 72 Batch 90/173] avg loss 0.00626772, throughput 2.86101K wps
[Epoch 72 Batch 120/173] avg loss 0.00629111, throughput 2.85785K wps
[Epoch 72 Batch 150/173] avg loss 0.00617165, throughput 2.8524K wps
Begin Testing...
[Epoch 72] train avg loss 0.0061201, dev acc 0.7497, dev avg loss 0.494391, throughput 2.87022K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/173] avg loss 0.00585569, throughput 2.90784K wps
[Epoch 73 Batch 60/173] avg loss 0.00585125, throughput 2.87955K wps
[Epoch 73 Batch 90/173] avg loss 0.00610361, throughput 2.87945K wps
[Epoch 73 Batch 120/173] avg loss 0.00588569, throughput 2.86157K wps
[Epoch 73 Batch 150/173] avg loss 0.00584643, throughput 2.86573K wps
Begin Testing...
[Epoch 73] train avg loss 0.00591244, dev acc 0.7487, dev avg loss 0.504468, throughput 2.87862K wps
[Epoch 74 Batch 30/173] avg loss 0.00578311, throughput 2.90578K wps
[Epoch 74 Batch 60/173] avg loss 0.00576951, throughput 2.80464K wps
[Epoch 74 Batch 90/173] avg loss 0.00563105, throughput 2.87903K wps
[Epoch 74 Batch 120/173] avg loss 0.00555974, throughput 2.87913K wps
[Epoch 74 Batch 150/173] avg loss 0.00568556, throughput 2.87516K wps
Begin Testing...
[Epoch 74] train avg loss 0.00570798, dev acc 0.7497, dev avg loss 0.493376, throughput 2.85788K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/173] avg loss 0.00557729, throughput 2.89809K wps
[Epoch 75 Batch 60/173] avg loss 0.00561726, throughput 2.79597K wps
[Epoch 75 Batch 90/173] avg loss 0.00547669, throughput 2.83634K wps
[Epoch 75 Batch 120/173] avg loss 0.00593233, throughput 2.8776K wps
[Epoch 75 Batch 150/173] avg loss 0.00530394, throughput 2.87036K wps
Begin Testing...
[Epoch 75] train avg loss 0.00562803, dev acc 0.7550, dev avg loss 0.490635, throughput 2.85784K wps
Observed Improvement.
Begin Testing...
[Epoch 76 Batch 30/173] avg loss 0.00535459, throughput 2.86242K wps
[Epoch 76 Batch 60/173] avg loss 0.00545989, throughput 2.79921K wps
[Epoch 76 Batch 90/173] avg loss 0.00522732, throughput 2.80769K wps
[Epoch 76 Batch 120/173] avg loss 0.00545933, throughput 2.88185K wps
[Epoch 76 Batch 150/173] avg loss 0.00538997, throughput 2.83296K wps
Begin Testing...
[Epoch 76] train avg loss 0.00538512, dev acc 0.7508, dev avg loss 0.49049, throughput 2.83169K wps
[Epoch 77 Batch 30/173] avg loss 0.00531738, throughput 2.93497K wps
[Epoch 77 Batch 60/173] avg loss 0.00525264, throughput 2.87459K wps
[Epoch 77 Batch 90/173] avg loss 0.00532493, throughput 2.86007K wps
[Epoch 77 Batch 120/173] avg loss 0.00512292, throughput 2.79869K wps
[Epoch 77 Batch 150/173] avg loss 0.00514793, throughput 2.85156K wps
Begin Testing...
[Epoch 77] train avg loss 0.00526159, dev acc 0.7529, dev avg loss 0.490925, throughput 2.86561K wps
[Epoch 78 Batch 30/173] avg loss 0.00502586, throughput 2.87094K wps
[Epoch 78 Batch 60/173] avg loss 0.00513923, throughput 2.79486K wps
[Epoch 78 Batch 90/173] avg loss 0.00491378, throughput 2.82841K wps
[Epoch 78 Batch 120/173] avg loss 0.00505509, throughput 2.83916K wps
[Epoch 78 Batch 150/173] avg loss 0.00522947, throughput 2.80073K wps
Begin Testing...
[Epoch 78] train avg loss 0.00508242, dev acc 0.7560, dev avg loss 0.486825, throughput 2.82427K wps
Observed Improvement.
Begin Testing...
[Epoch 79 Batch 30/173] avg loss 0.0050651, throughput 2.92194K wps
[Epoch 79 Batch 60/173] avg loss 0.0050597, throughput 2.78011K wps
[Epoch 79 Batch 90/173] avg loss 0.00485313, throughput 2.86574K wps
[Epoch 79 Batch 120/173] avg loss 0.00527236, throughput 2.87525K wps
[Epoch 79 Batch 150/173] avg loss 0.00461089, throughput 2.88121K wps
Begin Testing...
[Epoch 79] train avg loss 0.00496876, dev acc 0.7518, dev avg loss 0.497423, throughput 2.86343K wps
[Epoch 80 Batch 30/173] avg loss 0.00448133, throughput 2.92663K wps
[Epoch 80 Batch 60/173] avg loss 0.00489585, throughput 2.8687K wps
[Epoch 80 Batch 90/173] avg loss 0.0048378, throughput 2.85738K wps
[Epoch 80 Batch 120/173] avg loss 0.00489518, throughput 2.83348K wps
[Epoch 80 Batch 150/173] avg loss 0.00489084, throughput 2.83646K wps
Begin Testing...
[Epoch 80] train avg loss 0.00480786, dev acc 0.7602, dev avg loss 0.490124, throughput 2.86625K wps
Observed Improvement.
Begin Testing...
[Epoch 81 Batch 30/173] avg loss 0.00453229, throughput 2.88049K wps
[Epoch 81 Batch 60/173] avg loss 0.00457783, throughput 2.87925K wps
[Epoch 81 Batch 90/173] avg loss 0.00493183, throughput 2.86815K wps
[Epoch 81 Batch 120/173] avg loss 0.00461768, throughput 2.85611K wps
[Epoch 81 Batch 150/173] avg loss 0.00461082, throughput 2.87675K wps
Begin Testing...
[Epoch 81] train avg loss 0.00464601, dev acc 0.7623, dev avg loss 0.487951, throughput 2.87287K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/173] avg loss 0.00447268, throughput 2.94675K wps
[Epoch 82 Batch 60/173] avg loss 0.00441718, throughput 2.85617K wps
[Epoch 82 Batch 90/173] avg loss 0.00452851, throughput 2.85106K wps
[Epoch 82 Batch 120/173] avg loss 0.0043855, throughput 2.82175K wps
[Epoch 82 Batch 150/173] avg loss 0.00469351, throughput 2.82485K wps
Begin Testing...
[Epoch 82] train avg loss 0.00454861, dev acc 0.7591, dev avg loss 0.489244, throughput 2.86057K wps
[Epoch 83 Batch 30/173] avg loss 0.00449502, throughput 2.87649K wps
[Epoch 83 Batch 60/173] avg loss 0.004415, throughput 2.81321K wps
[Epoch 83 Batch 90/173] avg loss 0.00442182, throughput 2.8034K wps
[Epoch 83 Batch 120/173] avg loss 0.00424622, throughput 2.87744K wps
[Epoch 83 Batch 150/173] avg loss 0.0045328, throughput 2.87286K wps
Begin Testing...
[Epoch 83] train avg loss 0.00441578, dev acc 0.7570, dev avg loss 0.493934, throughput 2.84837K wps
[Epoch 84 Batch 30/173] avg loss 0.00403368, throughput 2.91322K wps
[Epoch 84 Batch 60/173] avg loss 0.00427167, throughput 2.88087K wps
[Epoch 84 Batch 90/173] avg loss 0.00408169, throughput 2.85716K wps
[Epoch 84 Batch 120/173] avg loss 0.00429731, throughput 2.85828K wps
[Epoch 84 Batch 150/173] avg loss 0.00438117, throughput 2.87912K wps
Begin Testing...
[Epoch 84] train avg loss 0.0042288, dev acc 0.7560, dev avg loss 0.49293, throughput 2.87069K wps
[Epoch 85 Batch 30/173] avg loss 0.00386197, throughput 2.94473K wps
[Epoch 85 Batch 60/173] avg loss 0.00409888, throughput 2.87897K wps
[Epoch 85 Batch 90/173] avg loss 0.00398158, throughput 2.87506K wps
[Epoch 85 Batch 120/173] avg loss 0.00435916, throughput 2.87244K wps
[Epoch 85 Batch 150/173] avg loss 0.00409988, throughput 2.86222K wps
Begin Testing...
[Epoch 85] train avg loss 0.0041078, dev acc 0.7602, dev avg loss 0.500938, throughput 2.8785K wps
[Epoch 86 Batch 30/173] avg loss 0.00400303, throughput 2.89066K wps
[Epoch 86 Batch 60/173] avg loss 0.00382199, throughput 2.86804K wps
[Epoch 86 Batch 90/173] avg loss 0.00398872, throughput 2.88754K wps
[Epoch 86 Batch 120/173] avg loss 0.00393627, throughput 2.88591K wps
[Epoch 86 Batch 150/173] avg loss 0.00415153, throughput 2.87531K wps
Begin Testing...
[Epoch 86] train avg loss 0.00401406, dev acc 0.7623, dev avg loss 0.493043, throughput 2.87013K wps
Observed Improvement.
Begin Testing...
[Epoch 87 Batch 30/173] avg loss 0.00352063, throughput 2.86508K wps
[Epoch 87 Batch 60/173] avg loss 0.00397551, throughput 2.80094K wps
[Epoch 87 Batch 90/173] avg loss 0.00381131, throughput 2.86154K wps
[Epoch 87 Batch 120/173] avg loss 0.00384394, throughput 2.87878K wps
[Epoch 87 Batch 150/173] avg loss 0.00401496, throughput 2.88442K wps
Begin Testing...
[Epoch 87] train avg loss 0.00388386, dev acc 0.7570, dev avg loss 0.494677, throughput 2.85995K wps
[Epoch 88 Batch 30/173] avg loss 0.00374965, throughput 2.93907K wps
[Epoch 88 Batch 60/173] avg loss 0.00396875, throughput 2.87604K wps
[Epoch 88 Batch 90/173] avg loss 0.00376154, throughput 2.86221K wps
[Epoch 88 Batch 120/173] avg loss 0.00345677, throughput 2.85056K wps
[Epoch 88 Batch 150/173] avg loss 0.0040957, throughput 2.87163K wps
Begin Testing...
[Epoch 88] train avg loss 0.00374833, dev acc 0.7550, dev avg loss 0.496625, throughput 2.87928K wps
[Epoch 89 Batch 30/173] avg loss 0.0037718, throughput 2.93082K wps
[Epoch 89 Batch 60/173] avg loss 0.00368476, throughput 2.83062K wps
[Epoch 89 Batch 90/173] avg loss 0.00395857, throughput 2.86833K wps
[Epoch 89 Batch 120/173] avg loss 0.00337832, throughput 2.88372K wps
[Epoch 89 Batch 150/173] avg loss 0.00340126, throughput 2.7852K wps
Begin Testing...
[Epoch 89] train avg loss 0.00363849, dev acc 0.7560, dev avg loss 0.499128, throughput 2.85837K wps
[Epoch 90 Batch 30/173] avg loss 0.00339759, throughput 2.94314K wps
[Epoch 90 Batch 60/173] avg loss 0.00353068, throughput 2.87594K wps
[Epoch 90 Batch 90/173] avg loss 0.00365641, throughput 2.83639K wps
[Epoch 90 Batch 120/173] avg loss 0.00344512, throughput 2.81056K wps
[Epoch 90 Batch 150/173] avg loss 0.00347095, throughput 2.80556K wps
Begin Testing...
[Epoch 90] train avg loss 0.00352767, dev acc 0.7591, dev avg loss 0.50174, throughput 2.85606K wps
[Epoch 91 Batch 30/173] avg loss 0.00339739, throughput 2.92022K wps
[Epoch 91 Batch 60/173] avg loss 0.00343481, throughput 2.8841K wps
[Epoch 91 Batch 90/173] avg loss 0.00336919, throughput 2.86236K wps
[Epoch 91 Batch 120/173] avg loss 0.00331776, throughput 2.83622K wps
[Epoch 91 Batch 150/173] avg loss 0.00348292, throughput 2.81068K wps
Begin Testing...
[Epoch 91] train avg loss 0.00343652, dev acc 0.7570, dev avg loss 0.501586, throughput 2.86296K wps
[Epoch 92 Batch 30/173] avg loss 0.00327705, throughput 2.87521K wps
[Epoch 92 Batch 60/173] avg loss 0.00323196, throughput 2.79768K wps
[Epoch 92 Batch 90/173] avg loss 0.00332278, throughput 2.78359K wps
[Epoch 92 Batch 120/173] avg loss 0.00323077, throughput 2.80093K wps
[Epoch 92 Batch 150/173] avg loss 0.0032921, throughput 2.88686K wps
Begin Testing...
[Epoch 92] train avg loss 0.0032699, dev acc 0.7612, dev avg loss 0.504356, throughput 2.83401K wps
[Epoch 93 Batch 30/173] avg loss 0.00321798, throughput 2.93975K wps
[Epoch 93 Batch 60/173] avg loss 0.00316229, throughput 2.87535K wps
[Epoch 93 Batch 90/173] avg loss 0.00310736, throughput 2.8682K wps
[Epoch 93 Batch 120/173] avg loss 0.00313933, throughput 2.8414K wps
[Epoch 93 Batch 150/173] avg loss 0.00308277, throughput 2.87911K wps
Begin Testing...
[Epoch 93] train avg loss 0.00315296, dev acc 0.7487, dev avg loss 0.507217, throughput 2.88067K wps
[Epoch 94 Batch 30/173] avg loss 0.00295857, throughput 2.92779K wps
[Epoch 94 Batch 60/173] avg loss 0.00318074, throughput 2.8078K wps
[Epoch 94 Batch 90/173] avg loss 0.0029188, throughput 2.88299K wps
[Epoch 94 Batch 120/173] avg loss 0.00329113, throughput 2.84322K wps
[Epoch 94 Batch 150/173] avg loss 0.00307619, throughput 2.87585K wps
Begin Testing...
[Epoch 94] train avg loss 0.00310172, dev acc 0.7529, dev avg loss 0.510672, throughput 2.86921K wps
[Epoch 95 Batch 30/173] avg loss 0.00303845, throughput 2.94016K wps
[Epoch 95 Batch 60/173] avg loss 0.00277165, throughput 2.83709K wps
[Epoch 95 Batch 90/173] avg loss 0.00280912, throughput 2.86427K wps
[Epoch 95 Batch 120/173] avg loss 0.00306347, throughput 2.87731K wps
[Epoch 95 Batch 150/173] avg loss 0.00312691, throughput 2.88374K wps
Begin Testing...
[Epoch 95] train avg loss 0.00296274, dev acc 0.7570, dev avg loss 0.513577, throughput 2.87738K wps
[Epoch 96 Batch 30/173] avg loss 0.00270743, throughput 2.95346K wps
[Epoch 96 Batch 60/173] avg loss 0.0028361, throughput 2.88858K wps
[Epoch 96 Batch 90/173] avg loss 0.00313378, throughput 2.88853K wps
[Epoch 96 Batch 120/173] avg loss 0.00288241, throughput 2.86979K wps
[Epoch 96 Batch 150/173] avg loss 0.00319978, throughput 2.85906K wps
Begin Testing...
[Epoch 96] train avg loss 0.0029222, dev acc 0.7508, dev avg loss 0.520916, throughput 2.88666K wps
[Epoch 97 Batch 30/173] avg loss 0.00260934, throughput 2.87819K wps
[Epoch 97 Batch 60/173] avg loss 0.00286884, throughput 2.86143K wps
[Epoch 97 Batch 90/173] avg loss 0.00266743, throughput 2.88601K wps
[Epoch 97 Batch 120/173] avg loss 0.0028917, throughput 2.86212K wps
[Epoch 97 Batch 150/173] avg loss 0.00280823, throughput 2.81844K wps
Begin Testing...
[Epoch 97] train avg loss 0.0028012, dev acc 0.7633, dev avg loss 0.512675, throughput 2.85774K wps
Observed Improvement.
Begin Testing...
[Epoch 98 Batch 30/173] avg loss 0.00249849, throughput 2.86179K wps
[Epoch 98 Batch 60/173] avg loss 0.0027659, throughput 2.84578K wps
[Epoch 98 Batch 90/173] avg loss 0.00260137, throughput 2.87382K wps
[Epoch 98 Batch 120/173] avg loss 0.00267845, throughput 2.88156K wps
[Epoch 98 Batch 150/173] avg loss 0.00278836, throughput 2.84405K wps
Begin Testing...
[Epoch 98] train avg loss 0.00267815, dev acc 0.7570, dev avg loss 0.517503, throughput 2.85556K wps
[Epoch 99 Batch 30/173] avg loss 0.00252577, throughput 2.9326K wps
[Epoch 99 Batch 60/173] avg loss 0.00253565, throughput 2.83222K wps
[Epoch 99 Batch 90/173] avg loss 0.00249661, throughput 2.80958K wps
[Epoch 99 Batch 120/173] avg loss 0.0025939, throughput 2.84978K wps
[Epoch 99 Batch 150/173] avg loss 0.0026242, throughput 2.87663K wps
Begin Testing...
[Epoch 99] train avg loss 0.00256638, dev acc 0.7550, dev avg loss 0.520614, throughput 2.85289K wps
[Epoch 100 Batch 30/173] avg loss 0.00238792, throughput 2.90246K wps
[Epoch 100 Batch 60/173] avg loss 0.00242288, throughput 2.87214K wps
[Epoch 100 Batch 90/173] avg loss 0.00270445, throughput 2.85968K wps
[Epoch 100 Batch 120/173] avg loss 0.00241168, throughput 2.80021K wps
[Epoch 100 Batch 150/173] avg loss 0.00250922, throughput 2.83523K wps
Begin Testing...
[Epoch 100] train avg loss 0.00251329, dev acc 0.7560, dev avg loss 0.53083, throughput 2.85788K wps
[Epoch 101 Batch 30/173] avg loss 0.00229674, throughput 2.91088K wps
[Epoch 101 Batch 60/173] avg loss 0.00254901, throughput 2.8029K wps
[Epoch 101 Batch 90/173] avg loss 0.00240971, throughput 2.81548K wps
[Epoch 101 Batch 120/173] avg loss 0.00248558, throughput 2.79593K wps
[Epoch 101 Batch 150/173] avg loss 0.00244489, throughput 2.8092K wps
Begin Testing...
[Epoch 101] train avg loss 0.00244092, dev acc 0.7477, dev avg loss 0.52851, throughput 2.83179K wps
[Epoch 102 Batch 30/173] avg loss 0.00244526, throughput 2.85746K wps
[Epoch 102 Batch 60/173] avg loss 0.00227516, throughput 2.86232K wps
[Epoch 102 Batch 90/173] avg loss 0.00252956, throughput 2.88237K wps
[Epoch 102 Batch 120/173] avg loss 0.00242877, throughput 2.86897K wps
[Epoch 102 Batch 150/173] avg loss 0.00230168, throughput 2.84235K wps
Begin Testing...
[Epoch 102] train avg loss 0.00237863, dev acc 0.7497, dev avg loss 0.533098, throughput 2.86444K wps
[Epoch 103 Batch 30/173] avg loss 0.00233415, throughput 2.88374K wps
[Epoch 103 Batch 60/173] avg loss 0.0021558, throughput 2.84772K wps
[Epoch 103 Batch 90/173] avg loss 0.00242485, throughput 2.86349K wps
[Epoch 103 Batch 120/173] avg loss 0.00239708, throughput 2.86781K wps
[Epoch 103 Batch 150/173] avg loss 0.00230031, throughput 2.88279K wps
Begin Testing...
[Epoch 103] train avg loss 0.00231568, dev acc 0.7508, dev avg loss 0.534938, throughput 2.87127K wps
[Epoch 104 Batch 30/173] avg loss 0.00224327, throughput 2.92614K wps
[Epoch 104 Batch 60/173] avg loss 0.00219092, throughput 2.88224K wps
[Epoch 104 Batch 90/173] avg loss 0.00236989, throughput 2.79159K wps
[Epoch 104 Batch 120/173] avg loss 0.00225755, throughput 2.78622K wps
[Epoch 104 Batch 150/173] avg loss 0.00219266, throughput 2.79143K wps
Begin Testing...
[Epoch 104] train avg loss 0.00223565, dev acc 0.7487, dev avg loss 0.541165, throughput 2.83555K wps
[Epoch 105 Batch 30/173] avg loss 0.00211381, throughput 2.93573K wps
[Epoch 105 Batch 60/173] avg loss 0.00214674, throughput 2.84283K wps
[Epoch 105 Batch 90/173] avg loss 0.00212955, throughput 2.88001K wps