Namespace(batch_size=50, data_name='MR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 56
Done! Tokenizing Time=0.21s, #Sentences=10662
SentimentNet(
  (embedding): Embedding(18768 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/173] avg loss 0.0139592, throughput 0.859971K wps
[Epoch 0 Batch 60/173] avg loss 0.0139546, throughput 5.04742K wps
[Epoch 0 Batch 90/173] avg loss 0.0139594, throughput 4.9288K wps
[Epoch 0 Batch 120/173] avg loss 0.0139678, throughput 5.01641K wps
[Epoch 0 Batch 150/173] avg loss 0.0139971, throughput 4.78459K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139904, dev acc 0.5735, dev avg loss 0.686487, throughput 1.98593K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138722, throughput 5.07491K wps
[Epoch 1 Batch 60/173] avg loss 0.0136868, throughput 4.89608K wps
[Epoch 1 Batch 90/173] avg loss 0.0138665, throughput 5.33593K wps
[Epoch 1 Batch 120/173] avg loss 0.0137415, throughput 5.11124K wps
[Epoch 1 Batch 150/173] avg loss 0.0137569, throughput 5.47948K wps
Begin Testing...
[Epoch 1] train avg loss 0.0137782, dev acc 0.6048, dev avg loss 0.679893, throughput 5.25317K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0136988, throughput 4.71871K wps
[Epoch 2 Batch 60/173] avg loss 0.0137503, throughput 4.71459K wps
[Epoch 2 Batch 90/173] avg loss 0.0135193, throughput 5.19952K wps
[Epoch 2 Batch 120/173] avg loss 0.0136776, throughput 5.13115K wps
[Epoch 2 Batch 150/173] avg loss 0.0134713, throughput 5.44621K wps
Begin Testing...
[Epoch 2] train avg loss 0.0136291, dev acc 0.6621, dev avg loss 0.672345, throughput 4.9837K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0136055, throughput 4.64297K wps
[Epoch 3 Batch 60/173] avg loss 0.0134859, throughput 5.23359K wps
[Epoch 3 Batch 90/173] avg loss 0.0135942, throughput 5.55496K wps
[Epoch 3 Batch 120/173] avg loss 0.0134612, throughput 4.93266K wps
[Epoch 3 Batch 150/173] avg loss 0.0134018, throughput 5.00704K wps
Begin Testing...
[Epoch 3] train avg loss 0.0135104, dev acc 0.6517, dev avg loss 0.666805, throughput 5.00464K wps
[Epoch 4 Batch 30/173] avg loss 0.0133401, throughput 4.93826K wps
[Epoch 4 Batch 60/173] avg loss 0.0133744, throughput 4.81501K wps
[Epoch 4 Batch 90/173] avg loss 0.0133307, throughput 4.71943K wps
[Epoch 4 Batch 120/173] avg loss 0.0133674, throughput 4.98468K wps
[Epoch 4 Batch 150/173] avg loss 0.0131999, throughput 6.2394K wps
Begin Testing...
[Epoch 4] train avg loss 0.0133253, dev acc 0.7059, dev avg loss 0.658292, throughput 5.11466K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0131872, throughput 5.35456K wps
[Epoch 5 Batch 60/173] avg loss 0.0131686, throughput 5.12439K wps
[Epoch 5 Batch 90/173] avg loss 0.0130146, throughput 6.02872K wps
[Epoch 5 Batch 120/173] avg loss 0.013075, throughput 4.63608K wps
[Epoch 5 Batch 150/173] avg loss 0.0132127, throughput 4.86637K wps
Begin Testing...
[Epoch 5] train avg loss 0.013165, dev acc 0.7143, dev avg loss 0.650859, throughput 5.13897K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0129692, throughput 5.48548K wps
[Epoch 6 Batch 60/173] avg loss 0.012978, throughput 4.69927K wps
[Epoch 6 Batch 90/173] avg loss 0.0129739, throughput 4.94778K wps
[Epoch 6 Batch 120/173] avg loss 0.0129743, throughput 4.6578K wps
[Epoch 6 Batch 150/173] avg loss 0.0129116, throughput 5.20263K wps
Begin Testing...
[Epoch 6] train avg loss 0.0129867, dev acc 0.7258, dev avg loss 0.642776, throughput 5.08648K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/173] avg loss 0.0128045, throughput 4.95123K wps
[Epoch 7 Batch 60/173] avg loss 0.0128356, throughput 4.74869K wps
[Epoch 7 Batch 90/173] avg loss 0.0126493, throughput 4.77758K wps
[Epoch 7 Batch 120/173] avg loss 0.012856, throughput 5.0437K wps
[Epoch 7 Batch 150/173] avg loss 0.0127683, throughput 5.12392K wps
Begin Testing...
[Epoch 7] train avg loss 0.0127779, dev acc 0.7247, dev avg loss 0.633977, throughput 4.91574K wps
[Epoch 8 Batch 30/173] avg loss 0.0125263, throughput 4.96601K wps
[Epoch 8 Batch 60/173] avg loss 0.0125682, throughput 4.88474K wps
[Epoch 8 Batch 90/173] avg loss 0.0126192, throughput 4.87346K wps
[Epoch 8 Batch 120/173] avg loss 0.0125661, throughput 5.95468K wps
[Epoch 8 Batch 150/173] avg loss 0.0126211, throughput 5.38037K wps
Begin Testing...
[Epoch 8] train avg loss 0.0125781, dev acc 0.7351, dev avg loss 0.62483, throughput 5.28736K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/173] avg loss 0.0123429, throughput 6.14319K wps
[Epoch 9 Batch 60/173] avg loss 0.0124, throughput 4.7884K wps
[Epoch 9 Batch 90/173] avg loss 0.0124497, throughput 4.98175K wps
[Epoch 9 Batch 120/173] avg loss 0.0124261, throughput 5.81787K wps
[Epoch 9 Batch 150/173] avg loss 0.0123506, throughput 5.6372K wps
Begin Testing...
[Epoch 9] train avg loss 0.0123823, dev acc 0.7445, dev avg loss 0.614885, throughput 5.53249K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0123464, throughput 4.95872K wps
[Epoch 10 Batch 60/173] avg loss 0.012063, throughput 5.03299K wps
[Epoch 10 Batch 90/173] avg loss 0.0125063, throughput 5.4148K wps
[Epoch 10 Batch 120/173] avg loss 0.0119231, throughput 5.9199K wps
[Epoch 10 Batch 150/173] avg loss 0.0122448, throughput 5.00124K wps
Begin Testing...
[Epoch 10] train avg loss 0.0122086, dev acc 0.7424, dev avg loss 0.605797, throughput 5.15897K wps
[Epoch 11 Batch 30/173] avg loss 0.0121341, throughput 4.63378K wps
[Epoch 11 Batch 60/173] avg loss 0.0120307, throughput 5.15655K wps
[Epoch 11 Batch 90/173] avg loss 0.011933, throughput 4.98739K wps
[Epoch 11 Batch 120/173] avg loss 0.0117546, throughput 5.80611K wps
[Epoch 11 Batch 150/173] avg loss 0.0117962, throughput 5.12741K wps
Begin Testing...
[Epoch 11] train avg loss 0.0119294, dev acc 0.7487, dev avg loss 0.594476, throughput 5.06242K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/173] avg loss 0.0118272, throughput 5.32229K wps
[Epoch 12 Batch 60/173] avg loss 0.0116596, throughput 5.27366K wps
[Epoch 12 Batch 90/173] avg loss 0.0116785, throughput 5.00031K wps
[Epoch 12 Batch 120/173] avg loss 0.0116812, throughput 5.23788K wps
[Epoch 12 Batch 150/173] avg loss 0.0115237, throughput 5.01898K wps
Begin Testing...
[Epoch 12] train avg loss 0.0117061, dev acc 0.7591, dev avg loss 0.58382, throughput 5.15718K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0115664, throughput 4.81734K wps
[Epoch 13 Batch 60/173] avg loss 0.0116648, throughput 5.18198K wps
[Epoch 13 Batch 90/173] avg loss 0.0115259, throughput 5.5956K wps
[Epoch 13 Batch 120/173] avg loss 0.011364, throughput 5.40035K wps
[Epoch 13 Batch 150/173] avg loss 0.0115192, throughput 5.86949K wps
Begin Testing...
[Epoch 13] train avg loss 0.0115153, dev acc 0.7685, dev avg loss 0.573575, throughput 5.40833K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.0113618, throughput 4.86458K wps
[Epoch 14 Batch 60/173] avg loss 0.0113505, throughput 5.53976K wps
[Epoch 14 Batch 90/173] avg loss 0.0113544, throughput 5.02748K wps
[Epoch 14 Batch 120/173] avg loss 0.010958, throughput 5.40818K wps
[Epoch 14 Batch 150/173] avg loss 0.0111424, throughput 4.86656K wps
Begin Testing...
[Epoch 14] train avg loss 0.0112567, dev acc 0.7591, dev avg loss 0.563909, throughput 5.09664K wps
[Epoch 15 Batch 30/173] avg loss 0.0109479, throughput 4.75154K wps
[Epoch 15 Batch 60/173] avg loss 0.0112465, throughput 4.97676K wps
[Epoch 15 Batch 90/173] avg loss 0.0111602, throughput 5.34838K wps
[Epoch 15 Batch 120/173] avg loss 0.0109129, throughput 5.12324K wps
[Epoch 15 Batch 150/173] avg loss 0.0108868, throughput 5.0448K wps
Begin Testing...
[Epoch 15] train avg loss 0.011032, dev acc 0.7748, dev avg loss 0.553656, throughput 5.08108K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0110035, throughput 4.58524K wps
[Epoch 16 Batch 60/173] avg loss 0.0108265, throughput 5.32737K wps
[Epoch 16 Batch 90/173] avg loss 0.0110528, throughput 4.90884K wps
[Epoch 16 Batch 120/173] avg loss 0.0107485, throughput 5.2977K wps
[Epoch 16 Batch 150/173] avg loss 0.0107539, throughput 5.02405K wps
Begin Testing...
[Epoch 16] train avg loss 0.0108538, dev acc 0.7675, dev avg loss 0.544854, throughput 4.95744K wps
[Epoch 17 Batch 30/173] avg loss 0.010552, throughput 4.72419K wps
[Epoch 17 Batch 60/173] avg loss 0.0106793, throughput 4.72826K wps
[Epoch 17 Batch 90/173] avg loss 0.0104059, throughput 4.6911K wps
[Epoch 17 Batch 120/173] avg loss 0.0107529, throughput 5.01261K wps
[Epoch 17 Batch 150/173] avg loss 0.0104631, throughput 5.06156K wps
Begin Testing...
[Epoch 17] train avg loss 0.010592, dev acc 0.7748, dev avg loss 0.5362, throughput 4.90262K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.0102878, throughput 4.6255K wps
[Epoch 18 Batch 60/173] avg loss 0.0103798, throughput 5.28978K wps
[Epoch 18 Batch 90/173] avg loss 0.0104866, throughput 5.2801K wps
[Epoch 18 Batch 120/173] avg loss 0.0102997, throughput 5.57359K wps
[Epoch 18 Batch 150/173] avg loss 0.0109386, throughput 4.78262K wps
Begin Testing...
[Epoch 18] train avg loss 0.0104692, dev acc 0.7696, dev avg loss 0.53043, throughput 5.01583K wps
[Epoch 19 Batch 30/173] avg loss 0.0104225, throughput 5.12012K wps
[Epoch 19 Batch 60/173] avg loss 0.0103977, throughput 6.12053K wps
[Epoch 19 Batch 90/173] avg loss 0.010326, throughput 4.70336K wps
[Epoch 19 Batch 120/173] avg loss 0.00971096, throughput 4.7824K wps
[Epoch 19 Batch 150/173] avg loss 0.00994272, throughput 4.98272K wps
Begin Testing...
[Epoch 19] train avg loss 0.0102313, dev acc 0.7696, dev avg loss 0.521531, throughput 5.14026K wps
[Epoch 20 Batch 30/173] avg loss 0.00978417, throughput 6.18268K wps
[Epoch 20 Batch 60/173] avg loss 0.0100251, throughput 5.17301K wps
[Epoch 20 Batch 90/173] avg loss 0.0100488, throughput 4.61047K wps
[Epoch 20 Batch 120/173] avg loss 0.0102194, throughput 5.5908K wps
[Epoch 20 Batch 150/173] avg loss 0.0102783, throughput 5.40576K wps
Begin Testing...
[Epoch 20] train avg loss 0.0100764, dev acc 0.7789, dev avg loss 0.515823, throughput 5.26049K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/173] avg loss 0.00994153, throughput 5.21892K wps
[Epoch 21 Batch 60/173] avg loss 0.0100899, throughput 5.43201K wps
[Epoch 21 Batch 90/173] avg loss 0.00985229, throughput 4.62678K wps
[Epoch 21 Batch 120/173] avg loss 0.00969551, throughput 4.63161K wps
[Epoch 21 Batch 150/173] avg loss 0.0096288, throughput 5.75681K wps
Begin Testing...
[Epoch 21] train avg loss 0.00985166, dev acc 0.7779, dev avg loss 0.509398, throughput 5.18702K wps
[Epoch 22 Batch 30/173] avg loss 0.00929923, throughput 5.25491K wps
[Epoch 22 Batch 60/173] avg loss 0.00977512, throughput 5.23939K wps
[Epoch 22 Batch 90/173] avg loss 0.0097635, throughput 5.40503K wps
[Epoch 22 Batch 120/173] avg loss 0.00977229, throughput 5.46813K wps
[Epoch 22 Batch 150/173] avg loss 0.00965568, throughput 4.8293K wps
Begin Testing...
[Epoch 22] train avg loss 0.00968861, dev acc 0.7727, dev avg loss 0.504416, throughput 5.17457K wps
[Epoch 23 Batch 30/173] avg loss 0.00941939, throughput 4.66248K wps
[Epoch 23 Batch 60/173] avg loss 0.00985802, throughput 5.28058K wps
[Epoch 23 Batch 90/173] avg loss 0.00951905, throughput 5.88526K wps
[Epoch 23 Batch 120/173] avg loss 0.00934025, throughput 5.20608K wps
[Epoch 23 Batch 150/173] avg loss 0.00938806, throughput 5.62994K wps
Begin Testing...
[Epoch 23] train avg loss 0.00958285, dev acc 0.7737, dev avg loss 0.500613, throughput 5.26923K wps
[Epoch 24 Batch 30/173] avg loss 0.00911558, throughput 5.14156K wps
[Epoch 24 Batch 60/173] avg loss 0.00943513, throughput 5.14959K wps
[Epoch 24 Batch 90/173] avg loss 0.00953656, throughput 4.95913K wps
[Epoch 24 Batch 120/173] avg loss 0.00936922, throughput 4.83085K wps
[Epoch 24 Batch 150/173] avg loss 0.00937584, throughput 5.22983K wps
Begin Testing...
[Epoch 24] train avg loss 0.00938743, dev acc 0.7758, dev avg loss 0.495413, throughput 5.15103K wps
[Epoch 25 Batch 30/173] avg loss 0.00918335, throughput 4.81752K wps
[Epoch 25 Batch 60/173] avg loss 0.00934811, throughput 4.88822K wps
[Epoch 25 Batch 90/173] avg loss 0.00918023, throughput 5.92339K wps
[Epoch 25 Batch 120/173] avg loss 0.00927627, throughput 5.30661K wps
[Epoch 25 Batch 150/173] avg loss 0.00935152, throughput 4.82201K wps
Begin Testing...
[Epoch 25] train avg loss 0.00928024, dev acc 0.7821, dev avg loss 0.491728, throughput 5.08358K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/173] avg loss 0.00939426, throughput 5.13251K wps
[Epoch 26 Batch 60/173] avg loss 0.00918322, throughput 5.78684K wps
[Epoch 26 Batch 90/173] avg loss 0.00914092, throughput 5.90043K wps
[Epoch 26 Batch 120/173] avg loss 0.00923014, throughput 4.80164K wps
[Epoch 26 Batch 150/173] avg loss 0.00922097, throughput 4.53339K wps
Begin Testing...
[Epoch 26] train avg loss 0.00921359, dev acc 0.7789, dev avg loss 0.488957, throughput 5.18521K wps
[Epoch 27 Batch 30/173] avg loss 0.0091182, throughput 5.381K wps
[Epoch 27 Batch 60/173] avg loss 0.00914677, throughput 5.0703K wps
[Epoch 27 Batch 90/173] avg loss 0.00896049, throughput 4.95128K wps
[Epoch 27 Batch 120/173] avg loss 0.00899332, throughput 5.25387K wps
[Epoch 27 Batch 150/173] avg loss 0.00899055, throughput 4.60744K wps
Begin Testing...
[Epoch 27] train avg loss 0.00903965, dev acc 0.7800, dev avg loss 0.485846, throughput 5.05451K wps
[Epoch 28 Batch 30/173] avg loss 0.00898214, throughput 6.21178K wps
[Epoch 28 Batch 60/173] avg loss 0.00858564, throughput 5.80557K wps
[Epoch 28 Batch 90/173] avg loss 0.00906, throughput 4.78494K wps
[Epoch 28 Batch 120/173] avg loss 0.00944235, throughput 5.10962K wps
[Epoch 28 Batch 150/173] avg loss 0.0088355, throughput 4.73165K wps
Begin Testing...
[Epoch 28] train avg loss 0.00898336, dev acc 0.7810, dev avg loss 0.483748, throughput 5.26289K wps
[Epoch 29 Batch 30/173] avg loss 0.00891964, throughput 4.64124K wps
[Epoch 29 Batch 60/173] avg loss 0.00877842, throughput 4.58405K wps
[Epoch 29 Batch 90/173] avg loss 0.00853464, throughput 5.12997K wps
[Epoch 29 Batch 120/173] avg loss 0.00871615, throughput 4.99187K wps
[Epoch 29 Batch 150/173] avg loss 0.0088723, throughput 4.57862K wps
Begin Testing...
[Epoch 29] train avg loss 0.00875205, dev acc 0.7789, dev avg loss 0.483071, throughput 4.91634K wps
[Epoch 30 Batch 30/173] avg loss 0.00887342, throughput 5.1861K wps
[Epoch 30 Batch 60/173] avg loss 0.00880271, throughput 4.64688K wps
[Epoch 30 Batch 90/173] avg loss 0.00862907, throughput 5.1282K wps
[Epoch 30 Batch 120/173] avg loss 0.008705, throughput 5.21429K wps
[Epoch 30 Batch 150/173] avg loss 0.00864815, throughput 5.48264K wps
Begin Testing...
[Epoch 30] train avg loss 0.00871618, dev acc 0.7821, dev avg loss 0.48151, throughput 5.12661K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/173] avg loss 0.00831812, throughput 5.22455K wps
[Epoch 31 Batch 60/173] avg loss 0.00857864, throughput 4.62706K wps
[Epoch 31 Batch 90/173] avg loss 0.00849921, throughput 4.89866K wps
[Epoch 31 Batch 120/173] avg loss 0.00882729, throughput 4.8287K wps
[Epoch 31 Batch 150/173] avg loss 0.00901219, throughput 5.17973K wps
Begin Testing...
[Epoch 31] train avg loss 0.00867261, dev acc 0.7821, dev avg loss 0.476626, throughput 4.95117K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.008292, throughput 4.81326K wps
[Epoch 32 Batch 60/173] avg loss 0.00855807, throughput 4.65295K wps
[Epoch 32 Batch 90/173] avg loss 0.00827049, throughput 5.20426K wps
[Epoch 32 Batch 120/173] avg loss 0.00883727, throughput 5.6543K wps
[Epoch 32 Batch 150/173] avg loss 0.00864166, throughput 5.54518K wps
Begin Testing...
[Epoch 32] train avg loss 0.00858754, dev acc 0.7789, dev avg loss 0.475124, throughput 5.07224K wps
[Epoch 33 Batch 30/173] avg loss 0.00829047, throughput 5.19329K wps
[Epoch 33 Batch 60/173] avg loss 0.00826782, throughput 5.28126K wps
[Epoch 33 Batch 90/173] avg loss 0.00840756, throughput 4.96503K wps
[Epoch 33 Batch 120/173] avg loss 0.00858098, throughput 5.33065K wps
[Epoch 33 Batch 150/173] avg loss 0.0084759, throughput 4.91318K wps
Begin Testing...
[Epoch 33] train avg loss 0.00843128, dev acc 0.7800, dev avg loss 0.474768, throughput 5.14732K wps
[Epoch 34 Batch 30/173] avg loss 0.00830726, throughput 5.38701K wps
[Epoch 34 Batch 60/173] avg loss 0.00807689, throughput 4.91649K wps
[Epoch 34 Batch 90/173] avg loss 0.0085019, throughput 5.02826K wps
[Epoch 34 Batch 120/173] avg loss 0.00814369, throughput 5.10564K wps
[Epoch 34 Batch 150/173] avg loss 0.00840029, throughput 4.66324K wps
Begin Testing...
[Epoch 34] train avg loss 0.00832706, dev acc 0.7831, dev avg loss 0.47516, throughput 4.9623K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/173] avg loss 0.00844231, throughput 5.11061K wps
[Epoch 35 Batch 60/173] avg loss 0.0085052, throughput 5.21194K wps
[Epoch 35 Batch 90/173] avg loss 0.00794057, throughput 5.11834K wps
[Epoch 35 Batch 120/173] avg loss 0.00813651, throughput 4.71775K wps
[Epoch 35 Batch 150/173] avg loss 0.00810166, throughput 5.41444K wps
Begin Testing...
[Epoch 35] train avg loss 0.00823513, dev acc 0.7789, dev avg loss 0.469049, throughput 5.10312K wps
[Epoch 36 Batch 30/173] avg loss 0.00852158, throughput 4.75588K wps
[Epoch 36 Batch 60/173] avg loss 0.00819747, throughput 5.50027K wps
[Epoch 36 Batch 90/173] avg loss 0.00815849, throughput 4.71342K wps
[Epoch 36 Batch 120/173] avg loss 0.00820808, throughput 4.84565K wps
[Epoch 36 Batch 150/173] avg loss 0.00786317, throughput 5.11345K wps
Begin Testing...
[Epoch 36] train avg loss 0.00820665, dev acc 0.7769, dev avg loss 0.469086, throughput 4.99711K wps
[Epoch 37 Batch 30/173] avg loss 0.00755765, throughput 4.98489K wps
[Epoch 37 Batch 60/173] avg loss 0.00848475, throughput 4.60147K wps
[Epoch 37 Batch 90/173] avg loss 0.00784368, throughput 4.95572K wps
[Epoch 37 Batch 120/173] avg loss 0.00817641, throughput 5.74884K wps
[Epoch 37 Batch 150/173] avg loss 0.00798635, throughput 4.55348K wps
Begin Testing...
[Epoch 37] train avg loss 0.00800065, dev acc 0.7831, dev avg loss 0.466761, throughput 4.94068K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/173] avg loss 0.00784345, throughput 4.81873K wps
[Epoch 38 Batch 60/173] avg loss 0.00797278, throughput 5.49473K wps
[Epoch 38 Batch 90/173] avg loss 0.00796434, throughput 4.90832K wps
[Epoch 38 Batch 120/173] avg loss 0.00801822, throughput 5.3215K wps
[Epoch 38 Batch 150/173] avg loss 0.00785672, throughput 4.89642K wps
Begin Testing...
[Epoch 38] train avg loss 0.00793462, dev acc 0.7842, dev avg loss 0.465476, throughput 5.03799K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/173] avg loss 0.0080477, throughput 4.62867K wps
[Epoch 39 Batch 60/173] avg loss 0.00813901, throughput 4.70032K wps
[Epoch 39 Batch 90/173] avg loss 0.00783729, throughput 5.00518K wps
[Epoch 39 Batch 120/173] avg loss 0.00789919, throughput 5.23613K wps
[Epoch 39 Batch 150/173] avg loss 0.00766221, throughput 5.27675K wps
Begin Testing...
[Epoch 39] train avg loss 0.00794687, dev acc 0.7831, dev avg loss 0.465416, throughput 4.89699K wps
[Epoch 40 Batch 30/173] avg loss 0.00772384, throughput 4.86979K wps
[Epoch 40 Batch 60/173] avg loss 0.0081205, throughput 5.10069K wps
[Epoch 40 Batch 90/173] avg loss 0.00782274, throughput 5.60892K wps
[Epoch 40 Batch 120/173] avg loss 0.00763747, throughput 4.99782K wps
[Epoch 40 Batch 150/173] avg loss 0.00786925, throughput 4.82022K wps
Begin Testing...
[Epoch 40] train avg loss 0.00784546, dev acc 0.7852, dev avg loss 0.463321, throughput 5.11893K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/173] avg loss 0.00758745, throughput 4.89006K wps
[Epoch 41 Batch 60/173] avg loss 0.00779702, throughput 4.74498K wps
[Epoch 41 Batch 90/173] avg loss 0.00746944, throughput 4.91894K wps
[Epoch 41 Batch 120/173] avg loss 0.00793109, throughput 5.58583K wps
[Epoch 41 Batch 150/173] avg loss 0.00724105, throughput 5.30512K wps
Begin Testing...
[Epoch 41] train avg loss 0.00763703, dev acc 0.7737, dev avg loss 0.465945, throughput 5.07779K wps
[Epoch 42 Batch 30/173] avg loss 0.00746119, throughput 5.45629K wps
[Epoch 42 Batch 60/173] avg loss 0.00727936, throughput 4.57796K wps
[Epoch 42 Batch 90/173] avg loss 0.00768976, throughput 4.92635K wps
[Epoch 42 Batch 120/173] avg loss 0.00763451, throughput 4.56109K wps
[Epoch 42 Batch 150/173] avg loss 0.0078429, throughput 5.34274K wps
Begin Testing...
[Epoch 42] train avg loss 0.00756164, dev acc 0.7831, dev avg loss 0.461554, throughput 4.96438K wps
[Epoch 43 Batch 30/173] avg loss 0.00732077, throughput 4.94662K wps
[Epoch 43 Batch 60/173] avg loss 0.00731227, throughput 4.67574K wps
[Epoch 43 Batch 90/173] avg loss 0.00769109, throughput 5.02558K wps
[Epoch 43 Batch 120/173] avg loss 0.00773248, throughput 5.23311K wps
[Epoch 43 Batch 150/173] avg loss 0.0074817, throughput 4.96237K wps
Begin Testing...
[Epoch 43] train avg loss 0.00755765, dev acc 0.7821, dev avg loss 0.460611, throughput 4.99609K wps
[Epoch 44 Batch 30/173] avg loss 0.00748522, throughput 5.23307K wps
[Epoch 44 Batch 60/173] avg loss 0.00750163, throughput 4.65007K wps
[Epoch 44 Batch 90/173] avg loss 0.00733515, throughput 5.4155K wps
[Epoch 44 Batch 120/173] avg loss 0.00756809, throughput 4.89552K wps
[Epoch 44 Batch 150/173] avg loss 0.00723265, throughput 4.91179K wps
Begin Testing...
[Epoch 44] train avg loss 0.00744704, dev acc 0.7831, dev avg loss 0.466324, throughput 4.99008K wps
[Epoch 45 Batch 30/173] avg loss 0.00731343, throughput 5.94027K wps
[Epoch 45 Batch 60/173] avg loss 0.00751665, throughput 5.13301K wps
[Epoch 45 Batch 90/173] avg loss 0.00725386, throughput 5.05683K wps
[Epoch 45 Batch 120/173] avg loss 0.0074986, throughput 6.13089K wps
[Epoch 45 Batch 150/173] avg loss 0.00736482, throughput 5.35307K wps
Begin Testing...
[Epoch 45] train avg loss 0.0073794, dev acc 0.7831, dev avg loss 0.458927, throughput 5.5092K wps
[Epoch 46 Batch 30/173] avg loss 0.00721025, throughput 5.19021K wps
[Epoch 46 Batch 60/173] avg loss 0.00744209, throughput 5.06601K wps
[Epoch 46 Batch 90/173] avg loss 0.00716298, throughput 4.8686K wps
[Epoch 46 Batch 120/173] avg loss 0.0068921, throughput 4.63886K wps
[Epoch 46 Batch 150/173] avg loss 0.00726175, throughput 5.44511K wps
Begin Testing...
[Epoch 46] train avg loss 0.00724124, dev acc 0.7862, dev avg loss 0.457586, throughput 4.98381K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/173] avg loss 0.00696818, throughput 4.8814K wps
[Epoch 47 Batch 60/173] avg loss 0.00719189, throughput 4.93571K wps
[Epoch 47 Batch 90/173] avg loss 0.00735534, throughput 5.41436K wps
[Epoch 47 Batch 120/173] avg loss 0.0071368, throughput 5.03714K wps
[Epoch 47 Batch 150/173] avg loss 0.00720835, throughput 5.47927K wps
Begin Testing...
[Epoch 47] train avg loss 0.00720955, dev acc 0.7810, dev avg loss 0.456885, throughput 5.07976K wps
[Epoch 48 Batch 30/173] avg loss 0.00735913, throughput 4.98527K wps
[Epoch 48 Batch 60/173] avg loss 0.00705466, throughput 4.80371K wps
[Epoch 48 Batch 90/173] avg loss 0.00722077, throughput 6.20002K wps
[Epoch 48 Batch 120/173] avg loss 0.00722616, throughput 4.58374K wps
[Epoch 48 Batch 150/173] avg loss 0.00724642, throughput 5.50098K wps
Begin Testing...
[Epoch 48] train avg loss 0.00717154, dev acc 0.7810, dev avg loss 0.456272, throughput 5.15573K wps
[Epoch 49 Batch 30/173] avg loss 0.00715053, throughput 6.48065K wps
[Epoch 49 Batch 60/173] avg loss 0.0072213, throughput 5.2772K wps
[Epoch 49 Batch 90/173] avg loss 0.00685093, throughput 5.5664K wps
[Epoch 49 Batch 120/173] avg loss 0.00696004, throughput 4.79116K wps
[Epoch 49 Batch 150/173] avg loss 0.00704694, throughput 5.25789K wps
Begin Testing...
[Epoch 49] train avg loss 0.00706049, dev acc 0.7842, dev avg loss 0.457874, throughput 5.27159K wps
[Epoch 50 Batch 30/173] avg loss 0.00687562, throughput 5.49759K wps
[Epoch 50 Batch 60/173] avg loss 0.00721158, throughput 5.35279K wps
[Epoch 50 Batch 90/173] avg loss 0.0069903, throughput 4.8104K wps
[Epoch 50 Batch 120/173] avg loss 0.00690131, throughput 4.77049K wps
[Epoch 50 Batch 150/173] avg loss 0.00709492, throughput 4.94774K wps
Begin Testing...
[Epoch 50] train avg loss 0.00701432, dev acc 0.7925, dev avg loss 0.457981, throughput 5.05781K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/173] avg loss 0.0066735, throughput 4.62579K wps
[Epoch 51 Batch 60/173] avg loss 0.00694466, throughput 5.14269K wps
[Epoch 51 Batch 90/173] avg loss 0.00699331, throughput 5.27127K wps
[Epoch 51 Batch 120/173] avg loss 0.00655598, throughput 5.03749K wps
[Epoch 51 Batch 150/173] avg loss 0.00710796, throughput 4.67919K wps
Begin Testing...
[Epoch 51] train avg loss 0.00688582, dev acc 0.7852, dev avg loss 0.455275, throughput 4.90446K wps
[Epoch 52 Batch 30/173] avg loss 0.0069085, throughput 4.9208K wps
[Epoch 52 Batch 60/173] avg loss 0.00676454, throughput 4.71744K wps
[Epoch 52 Batch 90/173] avg loss 0.00658705, throughput 4.78604K wps
[Epoch 52 Batch 120/173] avg loss 0.00679894, throughput 4.65474K wps
[Epoch 52 Batch 150/173] avg loss 0.00658586, throughput 4.7807K wps
Begin Testing...
[Epoch 52] train avg loss 0.00674449, dev acc 0.7821, dev avg loss 0.454078, throughput 4.873K wps
[Epoch 53 Batch 30/173] avg loss 0.00678576, throughput 4.65855K wps
[Epoch 53 Batch 60/173] avg loss 0.00666812, throughput 5.26729K wps
[Epoch 53 Batch 90/173] avg loss 0.00673609, throughput 5.40461K wps
[Epoch 53 Batch 120/173] avg loss 0.00631991, throughput 5.30793K wps
[Epoch 53 Batch 150/173] avg loss 0.00676372, throughput 4.73306K wps
Begin Testing...
[Epoch 53] train avg loss 0.00670955, dev acc 0.7769, dev avg loss 0.455671, throughput 5.05298K wps
[Epoch 54 Batch 30/173] avg loss 0.00658493, throughput 4.69749K wps
[Epoch 54 Batch 60/173] avg loss 0.00688222, throughput 4.67672K wps
[Epoch 54 Batch 90/173] avg loss 0.00690957, throughput 5.63059K wps
[Epoch 54 Batch 120/173] avg loss 0.00657253, throughput 4.75145K wps
[Epoch 54 Batch 150/173] avg loss 0.00619436, throughput 4.65874K wps
Begin Testing...
[Epoch 54] train avg loss 0.00666112, dev acc 0.7873, dev avg loss 0.454713, throughput 4.83164K wps
[Epoch 55 Batch 30/173] avg loss 0.00652164, throughput 4.92312K wps
[Epoch 55 Batch 60/173] avg loss 0.00621313, throughput 4.66794K wps
[Epoch 55 Batch 90/173] avg loss 0.0067951, throughput 5.61499K wps
[Epoch 55 Batch 120/173] avg loss 0.00651888, throughput 5.13255K wps
[Epoch 55 Batch 150/173] avg loss 0.00671705, throughput 5.22778K wps
Begin Testing...
[Epoch 55] train avg loss 0.00655812, dev acc 0.7810, dev avg loss 0.452551, throughput 5.07548K wps
[Epoch 56 Batch 30/173] avg loss 0.00656313, throughput 4.75748K wps
[Epoch 56 Batch 60/173] avg loss 0.00650178, throughput 5.08728K wps
[Epoch 56 Batch 90/173] avg loss 0.0066046, throughput 5.44475K wps
[Epoch 56 Batch 120/173] avg loss 0.00613965, throughput 6.08224K wps
[Epoch 56 Batch 150/173] avg loss 0.00648199, throughput 5.02751K wps
Begin Testing...
[Epoch 56] train avg loss 0.00643856, dev acc 0.7821, dev avg loss 0.451522, throughput 5.14702K wps
[Epoch 57 Batch 30/173] avg loss 0.00629536, throughput 4.6547K wps
[Epoch 57 Batch 60/173] avg loss 0.00632944, throughput 5.76671K wps
[Epoch 57 Batch 90/173] avg loss 0.00618945, throughput 5.22342K wps
[Epoch 57 Batch 120/173] avg loss 0.00649701, throughput 4.75332K wps
[Epoch 57 Batch 150/173] avg loss 0.00657343, throughput 4.87755K wps
Begin Testing...
[Epoch 57] train avg loss 0.00639229, dev acc 0.7852, dev avg loss 0.454077, throughput 4.99911K wps
[Epoch 58 Batch 30/173] avg loss 0.00620641, throughput 4.99022K wps
[Epoch 58 Batch 60/173] avg loss 0.00640772, throughput 5.28465K wps
[Epoch 58 Batch 90/173] avg loss 0.00636956, throughput 5.61003K wps
[Epoch 58 Batch 120/173] avg loss 0.00626165, throughput 4.67783K wps
[Epoch 58 Batch 150/173] avg loss 0.00655967, throughput 5.59331K wps
Begin Testing...
[Epoch 58] train avg loss 0.00637025, dev acc 0.7842, dev avg loss 0.451325, throughput 5.2118K wps
[Epoch 59 Batch 30/173] avg loss 0.00643422, throughput 4.90862K wps
[Epoch 59 Batch 60/173] avg loss 0.0062976, throughput 5.23693K wps
[Epoch 59 Batch 90/173] avg loss 0.00610454, throughput 4.99105K wps
[Epoch 59 Batch 120/173] avg loss 0.00640902, throughput 5.02525K wps
[Epoch 59 Batch 150/173] avg loss 0.00591371, throughput 4.62366K wps
Begin Testing...
[Epoch 59] train avg loss 0.00628197, dev acc 0.7800, dev avg loss 0.451742, throughput 4.90543K wps
[Epoch 60 Batch 30/173] avg loss 0.00609979, throughput 4.87355K wps
[Epoch 60 Batch 60/173] avg loss 0.00611119, throughput 5.41554K wps
[Epoch 60 Batch 90/173] avg loss 0.00604506, throughput 4.97009K wps
[Epoch 60 Batch 120/173] avg loss 0.00619272, throughput 4.96679K wps
[Epoch 60 Batch 150/173] avg loss 0.00593627, throughput 5.75131K wps
Begin Testing...
[Epoch 60] train avg loss 0.00610443, dev acc 0.7789, dev avg loss 0.451216, throughput 5.32313K wps
[Epoch 61 Batch 30/173] avg loss 0.00621069, throughput 5.31044K wps
[Epoch 61 Batch 60/173] avg loss 0.00619516, throughput 5.70336K wps
[Epoch 61 Batch 90/173] avg loss 0.00619941, throughput 4.92295K wps
[Epoch 61 Batch 120/173] avg loss 0.00594796, throughput 4.73913K wps
[Epoch 61 Batch 150/173] avg loss 0.00604357, throughput 5.10883K wps
Begin Testing...
[Epoch 61] train avg loss 0.00606387, dev acc 0.7800, dev avg loss 0.450377, throughput 5.06184K wps
[Epoch 62 Batch 30/173] avg loss 0.00590932, throughput 4.72303K wps
[Epoch 62 Batch 60/173] avg loss 0.00608718, throughput 4.89251K wps
[Epoch 62 Batch 90/173] avg loss 0.00610524, throughput 4.60922K wps
[Epoch 62 Batch 120/173] avg loss 0.00564207, throughput 5.14773K wps
[Epoch 62 Batch 150/173] avg loss 0.00632679, throughput 5.00912K wps
Begin Testing...
[Epoch 62] train avg loss 0.00606584, dev acc 0.7810, dev avg loss 0.450293, throughput 4.89403K wps
[Epoch 63 Batch 30/173] avg loss 0.00608683, throughput 5.05761K wps
[Epoch 63 Batch 60/173] avg loss 0.00598142, throughput 5.09281K wps
[Epoch 63 Batch 90/173] avg loss 0.00583615, throughput 5.80538K wps
[Epoch 63 Batch 120/173] avg loss 0.00617235, throughput 5.16485K wps
[Epoch 63 Batch 150/173] avg loss 0.00586596, throughput 5.451K wps
Begin Testing...
[Epoch 63] train avg loss 0.00597303, dev acc 0.7852, dev avg loss 0.450348, throughput 5.35852K wps
[Epoch 64 Batch 30/173] avg loss 0.00547465, throughput 4.90365K wps
[Epoch 64 Batch 60/173] avg loss 0.00591287, throughput 5.31033K wps
[Epoch 64 Batch 90/173] avg loss 0.00577357, throughput 5.5912K wps
[Epoch 64 Batch 120/173] avg loss 0.00566097, throughput 5.54256K wps
[Epoch 64 Batch 150/173] avg loss 0.00615464, throughput 5.64747K wps
Begin Testing...
[Epoch 64] train avg loss 0.00579967, dev acc 0.7810, dev avg loss 0.450527, throughput 5.37963K wps
[Epoch 65 Batch 30/173] avg loss 0.0058986, throughput 4.78051K wps
[Epoch 65 Batch 60/173] avg loss 0.00565023, throughput 4.89644K wps
[Epoch 65 Batch 90/173] avg loss 0.00584556, throughput 4.85649K wps
[Epoch 65 Batch 120/173] avg loss 0.00583358, throughput 4.75762K wps
[Epoch 65 Batch 150/173] avg loss 0.0054663, throughput 4.89838K wps
Begin Testing...
[Epoch 65] train avg loss 0.00576666, dev acc 0.7852, dev avg loss 0.449974, throughput 4.88938K wps
[Epoch 66 Batch 30/173] avg loss 0.00562051, throughput 5.07112K wps
[Epoch 66 Batch 60/173] avg loss 0.00568315, throughput 5.07049K wps
[Epoch 66 Batch 90/173] avg loss 0.00577499, throughput 4.70297K wps
[Epoch 66 Batch 120/173] avg loss 0.00538106, throughput 4.88528K wps
[Epoch 66 Batch 150/173] avg loss 0.0058078, throughput 5.30335K wps
Begin Testing...
[Epoch 66] train avg loss 0.00571187, dev acc 0.7831, dev avg loss 0.453784, throughput 4.98565K wps
[Epoch 67 Batch 30/173] avg loss 0.00574403, throughput 5.08524K wps
[Epoch 67 Batch 60/173] avg loss 0.00540233, throughput 4.80773K wps
[Epoch 67 Batch 90/173] avg loss 0.00547079, throughput 5.37595K wps
[Epoch 67 Batch 120/173] avg loss 0.00539805, throughput 5.38098K wps
[Epoch 67 Batch 150/173] avg loss 0.00568117, throughput 5.15045K wps
Begin Testing...
[Epoch 67] train avg loss 0.00561382, dev acc 0.7852, dev avg loss 0.45011, throughput 5.22384K wps
[Epoch 68 Batch 30/173] avg loss 0.00559211, throughput 5.21787K wps
[Epoch 68 Batch 60/173] avg loss 0.00541885, throughput 5.27269K wps
[Epoch 68 Batch 90/173] avg loss 0.00564019, throughput 5.16632K wps
[Epoch 68 Batch 120/173] avg loss 0.00534886, throughput 5.81017K wps
[Epoch 68 Batch 150/173] avg loss 0.00561747, throughput 5.52397K wps
Begin Testing...
[Epoch 68] train avg loss 0.00559177, dev acc 0.7883, dev avg loss 0.449576, throughput 5.41083K wps
[Epoch 69 Batch 30/173] avg loss 0.00554967, throughput 4.96804K wps
[Epoch 69 Batch 60/173] avg loss 0.00558818, throughput 5.28922K wps
[Epoch 69 Batch 90/173] avg loss 0.00549578, throughput 5.06442K wps
[Epoch 69 Batch 120/173] avg loss 0.00559097, throughput 5.66193K wps
[Epoch 69 Batch 150/173] avg loss 0.00555813, throughput 5.16037K wps
Begin Testing...
[Epoch 69] train avg loss 0.00558867, dev acc 0.7810, dev avg loss 0.449768, throughput 5.19529K wps
[Epoch 70 Batch 30/173] avg loss 0.00563998, throughput 5.73997K wps
[Epoch 70 Batch 60/173] avg loss 0.00523092, throughput 5.54162K wps
[Epoch 70 Batch 90/173] avg loss 0.00521959, throughput 4.79535K wps
[Epoch 70 Batch 120/173] avg loss 0.00539761, throughput 5.13547K wps
[Epoch 70 Batch 150/173] avg loss 0.00551498, throughput 4.98261K wps
Begin Testing...
[Epoch 70] train avg loss 0.00541358, dev acc 0.7810, dev avg loss 0.449438, throughput 5.30971K wps
[Epoch 71 Batch 30/173] avg loss 0.00507258, throughput 4.87521K wps
[Epoch 71 Batch 60/173] avg loss 0.00533254, throughput 5.32723K wps
[Epoch 71 Batch 90/173] avg loss 0.00553293, throughput 5.23175K wps
[Epoch 71 Batch 120/173] avg loss 0.00522636, throughput 5.10651K wps
[Epoch 71 Batch 150/173] avg loss 0.00515043, throughput 5.60753K wps
Begin Testing...
[Epoch 71] train avg loss 0.00532609, dev acc 0.7800, dev avg loss 0.449483, throughput 5.20642K wps
[Epoch 72 Batch 30/173] avg loss 0.00525653, throughput 4.86621K wps
[Epoch 72 Batch 60/173] avg loss 0.00499161, throughput 4.97242K wps
[Epoch 72 Batch 90/173] avg loss 0.00553733, throughput 5.11335K wps
[Epoch 72 Batch 120/173] avg loss 0.00513816, throughput 5.1862K wps
[Epoch 72 Batch 150/173] avg loss 0.00541111, throughput 5.41126K wps
Begin Testing...
[Epoch 72] train avg loss 0.00531867, dev acc 0.7883, dev avg loss 0.44962, throughput 5.12159K wps
[Epoch 73 Batch 30/173] avg loss 0.00517682, throughput 5.05545K wps
[Epoch 73 Batch 60/173] avg loss 0.00504565, throughput 4.92746K wps
[Epoch 73 Batch 90/173] avg loss 0.00542049, throughput 4.75681K wps
[Epoch 73 Batch 120/173] avg loss 0.00519679, throughput 5.65095K wps
[Epoch 73 Batch 150/173] avg loss 0.00530848, throughput 5.68737K wps
Begin Testing...
[Epoch 73] train avg loss 0.00528565, dev acc 0.7862, dev avg loss 0.450621, throughput 5.1727K wps
[Epoch 74 Batch 30/173] avg loss 0.00498418, throughput 5.03366K wps
[Epoch 74 Batch 60/173] avg loss 0.00508972, throughput 5.20943K wps
[Epoch 74 Batch 90/173] avg loss 0.00505202, throughput 5.4472K wps
[Epoch 74 Batch 120/173] avg loss 0.00550171, throughput 5.07218K wps
[Epoch 74 Batch 150/173] avg loss 0.00529191, throughput 5.69724K wps
Begin Testing...
[Epoch 74] train avg loss 0.00517896, dev acc 0.7852, dev avg loss 0.451072, throughput 5.38431K wps
[Epoch 75 Batch 30/173] avg loss 0.00498953, throughput 4.99417K wps
[Epoch 75 Batch 60/173] avg loss 0.00512365, throughput 4.98438K wps
[Epoch 75 Batch 90/173] avg loss 0.0051018, throughput 4.87818K wps
[Epoch 75 Batch 120/173] avg loss 0.0052028, throughput 4.81964K wps
[Epoch 75 Batch 150/173] avg loss 0.00508948, throughput 4.97085K wps
Begin Testing...
[Epoch 75] train avg loss 0.0050817, dev acc 0.7873, dev avg loss 0.44988, throughput 4.97966K wps
[Epoch 76 Batch 30/173] avg loss 0.00493707, throughput 5.54807K wps
[Epoch 76 Batch 60/173] avg loss 0.00473491, throughput 4.97469K wps
[Epoch 76 Batch 90/173] avg loss 0.00502397, throughput 5.00583K wps
[Epoch 76 Batch 120/173] avg loss 0.00492352, throughput 4.86997K wps
[Epoch 76 Batch 150/173] avg loss 0.00530986, throughput 4.96239K wps
Begin Testing...
[Epoch 76] train avg loss 0.00502204, dev acc 0.7852, dev avg loss 0.448766, throughput 5.01191K wps
[Epoch 77 Batch 30/173] avg loss 0.00484569, throughput 4.93132K wps
[Epoch 77 Batch 60/173] avg loss 0.00519175, throughput 5.54081K wps
[Epoch 77 Batch 90/173] avg loss 0.00493742, throughput 5.00063K wps
[Epoch 77 Batch 120/173] avg loss 0.00505344, throughput 5.65602K wps
[Epoch 77 Batch 150/173] avg loss 0.00513794, throughput 5.59691K wps
Begin Testing...
[Epoch 77] train avg loss 0.00504905, dev acc 0.7883, dev avg loss 0.454003, throughput 5.32124K wps
[Epoch 78 Batch 30/173] avg loss 0.00477563, throughput 5.74104K wps
[Epoch 78 Batch 60/173] avg loss 0.00483013, throughput 4.73279K wps
[Epoch 78 Batch 90/173] avg loss 0.00499139, throughput 5.64434K wps
[Epoch 78 Batch 120/173] avg loss 0.00514453, throughput 4.99885K wps
[Epoch 78 Batch 150/173] avg loss 0.00515444, throughput 4.56949K wps
Begin Testing...
[Epoch 78] train avg loss 0.00497735, dev acc 0.7873, dev avg loss 0.449497, throughput 5.05404K wps
[Epoch 79 Batch 30/173] avg loss 0.00488428, throughput 4.64247K wps
[Epoch 79 Batch 60/173] avg loss 0.00465096, throughput 5.2455K wps
[Epoch 79 Batch 90/173] avg loss 0.00473426, throughput 5.76755K wps
[Epoch 79 Batch 120/173] avg loss 0.00498407, throughput 4.76367K wps
[Epoch 79 Batch 150/173] avg loss 0.00490865, throughput 5.14882K wps
Begin Testing...
[Epoch 79] train avg loss 0.00488717, dev acc 0.7842, dev avg loss 0.450095, throughput 5.15693K wps
[Epoch 80 Batch 30/173] avg loss 0.00460828, throughput 4.71925K wps
[Epoch 80 Batch 60/173] avg loss 0.00470408, throughput 4.84081K wps
[Epoch 80 Batch 90/173] avg loss 0.00473801, throughput 4.87458K wps
[Epoch 80 Batch 120/173] avg loss 0.00500825, throughput 4.94336K wps
[Epoch 80 Batch 150/173] avg loss 0.00471116, throughput 5.06299K wps
Begin Testing...
[Epoch 80] train avg loss 0.00473892, dev acc 0.7873, dev avg loss 0.449523, throughput 4.91787K wps
[Epoch 81 Batch 30/173] avg loss 0.00480492, throughput 5.60863K wps
[Epoch 81 Batch 60/173] avg loss 0.00463142, throughput 5.59586K wps
[Epoch 81 Batch 90/173] avg loss 0.00472625, throughput 5.04839K wps
[Epoch 81 Batch 120/173] avg loss 0.00492321, throughput 5.91366K wps
[Epoch 81 Batch 150/173] avg loss 0.00455484, throughput 4.82233K wps
Begin Testing...
[Epoch 81] train avg loss 0.00472983, dev acc 0.7810, dev avg loss 0.45272, throughput 5.25756K wps
[Epoch 82 Batch 30/173] avg loss 0.00467563, throughput 4.71121K wps
[Epoch 82 Batch 60/173] avg loss 0.00472526, throughput 5.29277K wps
[Epoch 82 Batch 90/173] avg loss 0.0047023, throughput 5.11077K wps
[Epoch 82 Batch 120/173] avg loss 0.00456943, throughput 4.88257K wps
[Epoch 82 Batch 150/173] avg loss 0.00460551, throughput 5.2205K wps
Begin Testing...
[Epoch 82] train avg loss 0.0047133, dev acc 0.7894, dev avg loss 0.449964, throughput 5.1464K wps
[Epoch 83 Batch 30/173] avg loss 0.00472788, throughput 5.61189K wps
[Epoch 83 Batch 60/173] avg loss 0.00462966, throughput 4.7916K wps
[Epoch 83 Batch 90/173] avg loss 0.00444684, throughput 4.98073K wps
[Epoch 83 Batch 120/173] avg loss 0.0046071, throughput 5.55282K wps
[Epoch 83 Batch 150/173] avg loss 0.00467045, throughput 5.59249K wps
Begin Testing...
[Epoch 83] train avg loss 0.00462788, dev acc 0.7862, dev avg loss 0.449006, throughput 5.28741K wps
[Epoch 84 Batch 30/173] avg loss 0.00445028, throughput 6.10534K wps
[Epoch 84 Batch 60/173] avg loss 0.00440756, throughput 5.21799K wps
[Epoch 84 Batch 90/173] avg loss 0.00440001, throughput 4.88101K wps
[Epoch 84 Batch 120/173] avg loss 0.00472342, throughput 5.61172K wps
[Epoch 84 Batch 150/173] avg loss 0.00451562, throughput 5.21519K wps
Begin Testing...
[Epoch 84] train avg loss 0.00446628, dev acc 0.7873, dev avg loss 0.449216, throughput 5.25636K wps
[Epoch 85 Batch 30/173] avg loss 0.00468972, throughput 4.83961K wps
[Epoch 85 Batch 60/173] avg loss 0.00458824, throughput 5.01999K wps
[Epoch 85 Batch 90/173] avg loss 0.00447773, throughput 5.2038K wps
[Epoch 85 Batch 120/173] avg loss 0.00433354, throughput 5.99419K wps
[Epoch 85 Batch 150/173] avg loss 0.00460573, throughput 5.70128K wps
Begin Testing...
[Epoch 85] train avg loss 0.00449169, dev acc 0.7831, dev avg loss 0.45037, throughput 5.2238K wps
[Epoch 86 Batch 30/173] avg loss 0.00454186, throughput 5.34928K wps
[Epoch 86 Batch 60/173] avg loss 0.00456338, throughput 5.07968K wps
[Epoch 86 Batch 90/173] avg loss 0.0042848, throughput 4.89314K wps
[Epoch 86 Batch 120/173] avg loss 0.00419287, throughput 5.44678K wps
[Epoch 86 Batch 150/173] avg loss 0.00444493, throughput 5.06363K wps
Begin Testing...
[Epoch 86] train avg loss 0.00443073, dev acc 0.7862, dev avg loss 0.454208, throughput 5.24736K wps
[Epoch 87 Batch 30/173] avg loss 0.00428443, throughput 6.19194K wps
[Epoch 87 Batch 60/173] avg loss 0.00439687, throughput 5.20502K wps
[Epoch 87 Batch 90/173] avg loss 0.00433872, throughput 4.97624K wps
[Epoch 87 Batch 120/173] avg loss 0.00425968, throughput 5.4932K wps
[Epoch 87 Batch 150/173] avg loss 0.00435473, throughput 5.17066K wps
Begin Testing...
[Epoch 87] train avg loss 0.00440124, dev acc 0.7821, dev avg loss 0.454522, throughput 5.41547K wps
[Epoch 88 Batch 30/173] avg loss 0.00447574, throughput 4.9638K wps
[Epoch 88 Batch 60/173] avg loss 0.00424529, throughput 4.99534K wps
[Epoch 88 Batch 90/173] avg loss 0.0044492, throughput 5.43945K wps
[Epoch 88 Batch 120/173] avg loss 0.00417818, throughput 5.86128K wps
[Epoch 88 Batch 150/173] avg loss 0.00431538, throughput 5.01351K wps
Begin Testing...
[Epoch 88] train avg loss 0.00432355, dev acc 0.7873, dev avg loss 0.450542, throughput 5.30142K wps
[Epoch 89 Batch 30/173] avg loss 0.00455407, throughput 4.82519K wps
[Epoch 89 Batch 60/173] avg loss 0.00423377, throughput 4.95294K wps
[Epoch 89 Batch 90/173] avg loss 0.0041945, throughput 5.7345K wps
[Epoch 89 Batch 120/173] avg loss 0.00421861, throughput 5.5745K wps
[Epoch 89 Batch 150/173] avg loss 0.00416633, throughput 5.48833K wps
Begin Testing...
[Epoch 89] train avg loss 0.00426453, dev acc 0.7914, dev avg loss 0.451115, throughput 5.2236K wps
[Epoch 90 Batch 30/173] avg loss 0.00424252, throughput 5.35709K wps
[Epoch 90 Batch 60/173] avg loss 0.00409977, throughput 4.84155K wps
[Epoch 90 Batch 90/173] avg loss 0.00433665, throughput 5.13427K wps
[Epoch 90 Batch 120/173] avg loss 0.00413672, throughput 5.38633K wps
[Epoch 90 Batch 150/173] avg loss 0.00409833, throughput 5.52041K wps
Begin Testing...
[Epoch 90] train avg loss 0.00421755, dev acc 0.7862, dev avg loss 0.451945, throughput 5.25981K wps
[Epoch 91 Batch 30/173] avg loss 0.00432324, throughput 6.20994K wps
[Epoch 91 Batch 60/173] avg loss 0.00441926, throughput 5.03267K wps
[Epoch 91 Batch 90/173] avg loss 0.00384216, throughput 4.63866K wps
[Epoch 91 Batch 120/173] avg loss 0.00421704, throughput 5.02538K wps
[Epoch 91 Batch 150/173] avg loss 0.00403386, throughput 4.94333K wps
Begin Testing...
[Epoch 91] train avg loss 0.00416608, dev acc 0.7904, dev avg loss 0.451226, throughput 5.04354K wps
[Epoch 92 Batch 30/173] avg loss 0.00396194, throughput 4.6996K wps
[Epoch 92 Batch 60/173] avg loss 0.00394259, throughput 4.85744K wps
[Epoch 92 Batch 90/173] avg loss 0.00408728, throughput 4.81136K wps
[Epoch 92 Batch 120/173] avg loss 0.00384698, throughput 5.04302K wps
[Epoch 92 Batch 150/173] avg loss 0.00412483, throughput 5.66761K wps
Begin Testing...
[Epoch 92] train avg loss 0.0040085, dev acc 0.7883, dev avg loss 0.453178, throughput 5.00909K wps
[Epoch 93 Batch 30/173] avg loss 0.0040213, throughput 5.06661K wps
[Epoch 93 Batch 60/173] avg loss 0.00390485, throughput 5.76549K wps
[Epoch 93 Batch 90/173] avg loss 0.00406105, throughput 4.75878K wps
[Epoch 93 Batch 120/173] avg loss 0.00410054, throughput 5.21254K wps
[Epoch 93 Batch 150/173] avg loss 0.00421118, throughput 4.83198K wps
Begin Testing...
[Epoch 93] train avg loss 0.00406281, dev acc 0.7873, dev avg loss 0.454673, throughput 5.03927K wps
[Epoch 94 Batch 30/173] avg loss 0.00401743, throughput 5.0579K wps
[Epoch 94 Batch 60/173] avg loss 0.00400931, throughput 5.55956K wps
[Epoch 94 Batch 90/173] avg loss 0.00393281, throughput 5.80933K wps
[Epoch 94 Batch 120/173] avg loss 0.00425057, throughput 5.31353K wps
[Epoch 94 Batch 150/173] avg loss 0.00386621, throughput 5.54297K wps
Begin Testing...
[Epoch 94] train avg loss 0.00400992, dev acc 0.7904, dev avg loss 0.453526, throughput 5.44677K wps
[Epoch 95 Batch 30/173] avg loss 0.00383435, throughput 5.71618K wps
[Epoch 95 Batch 60/173] avg loss 0.00398703, throughput 4.83181K wps
[Epoch 95 Batch 90/173] avg loss 0.00407045, throughput 4.96817K wps
[Epoch 95 Batch 120/173] avg loss 0.00412855, throughput 4.91094K wps
[Epoch 95 Batch 150/173] avg loss 0.00380654, throughput 4.71088K wps
Begin Testing...
[Epoch 95] train avg loss 0.00397795, dev acc 0.7883, dev avg loss 0.454167, throughput 5.01104K wps
[Epoch 96 Batch 30/173] avg loss 0.003961, throughput 4.80897K wps
[Epoch 96 Batch 60/173] avg loss 0.00403573, throughput 4.63041K wps
[Epoch 96 Batch 90/173] avg loss 0.00377341, throughput 5.39091K wps
[Epoch 96 Batch 120/173] avg loss 0.00382963, throughput 5.60129K wps
[Epoch 96 Batch 150/173] avg loss 0.00401569, throughput 5.03656K wps
Begin Testing...
[Epoch 96] train avg loss 0.00391439, dev acc 0.7894, dev avg loss 0.452545, throughput 5.08393K wps
[Epoch 97 Batch 30/173] avg loss 0.00396288, throughput 4.63555K wps
[Epoch 97 Batch 60/173] avg loss 0.00375187, throughput 4.71666K wps
[Epoch 97 Batch 90/173] avg loss 0.00374924, throughput 5.7539K wps
[Epoch 97 Batch 120/173] avg loss 0.00387192, throughput 5.42161K wps
[Epoch 97 Batch 150/173] avg loss 0.00385474, throughput 4.91628K wps
Begin Testing...
[Epoch 97] train avg loss 0.0038774, dev acc 0.7862, dev avg loss 0.453103, throughput 5.05038K wps
[Epoch 98 Batch 30/173] avg loss 0.00384701, throughput 5.82867K wps
[Epoch 98 Batch 60/173] avg loss 0.00390209, throughput 5.2243K wps
[Epoch 98 Batch 90/173] avg loss 0.00371811, throughput 6.05481K wps
[Epoch 98 Batch 120/173] avg loss 0.00382224, throughput 5.11724K wps
[Epoch 98 Batch 150/173] avg loss 0.00383323, throughput 5.4181K wps
Begin Testing...
[Epoch 98] train avg loss 0.00383618, dev acc 0.7873, dev avg loss 0.454312, throughput 5.46634K wps
[Epoch 99 Batch 30/173] avg loss 0.00364985, throughput 4.71403K wps
[Epoch 99 Batch 60/173] avg loss 0.00373699, throughput 4.84845K wps
[Epoch 99 Batch 90/173] avg loss 0.00384006, throughput 5.12744K wps
[Epoch 99 Batch 120/173] avg loss 0.00361616, throughput 4.80495K wps
[Epoch 99 Batch 150/173] avg loss 0.00401832, throughput 5.15477K wps
Begin Testing...
[Epoch 99] train avg loss 0.00374124, dev acc 0.7873, dev avg loss 0.454567, throughput 4.90439K wps
[Epoch 100 Batch 30/173] avg loss 0.0035969, throughput 4.85374K wps
[Epoch 100 Batch 60/173] avg loss 0.0035032, throughput 4.86685K wps
[Epoch 100 Batch 90/173] avg loss 0.00354996, throughput 4.98039K wps
[Epoch 100 Batch 120/173] avg loss 0.00369747, throughput 4.9325K wps
[Epoch 100 Batch 150/173] avg loss 0.00387944, throughput 4.72287K wps
Begin Testing...
[Epoch 100] train avg loss 0.00370059, dev acc 0.7894, dev avg loss 0.454939, throughput 5.04823K wps
[Epoch 101 Batch 30/173] avg loss 0.00367165, throughput 5.03337K wps
[Epoch 101 Batch 60/173] avg loss 0.0037461, throughput 4.78724K wps
[Epoch 101 Batch 90/173] avg loss 0.0035057, throughput 4.70157K wps
[Epoch 101 Batch 120/173] avg loss 0.00373323, throughput 5.46683K wps
[Epoch 101 Batch 150/173] avg loss 0.00369881, throughput 5.80094K wps
Begin Testing...
[Epoch 101] train avg loss 0.0036658, dev acc 0.7852, dev avg loss 0.45488, throughput 5.12105K wps
[Epoch 102 Batch 30/173] avg loss 0.00353079, throughput 4.86006K wps
[Epoch 102 Batch 60/173] avg loss 0.00359715, throughput 5.15486K wps
[Epoch 102 Batch 90/173] avg loss 0.00367373, throughput 4.70949K wps
[Epoch 102 Batch 120/173] avg loss 0.00366783, throughput 4.66536K wps
[Epoch 102 Batch 150/173] avg loss 0.00358241, throughput 5.141K wps
Begin Testing...
[Epoch 102] train avg loss 0.00362369, dev acc 0.7852, dev avg loss 0.457404, throughput 5.04686K wps
[Epoch 103 Batch 30/173] avg loss 0.00368971, throughput 5.76399K wps
[Epoch 103 Batch 60/173] avg loss 0.00345309, throughput 4.92775K wps
[Epoch 103 Batch 90/173] avg loss 0.00341535, throughput 5.00753K wps
[Epoch 103 Batch 120/173] avg loss 0.00361173, throughput 5.6509K wps
[Epoch 103 Batch 150/173] avg loss 0.00353963, throughput 5.78864K wps
Begin Testing...
[Epoch 103] train avg loss 0.00357394, dev acc 0.7883, dev avg loss 0.455052, throughput 5.42372K wps
[Epoch 104 Batch 30/173] avg loss 0.00322516, throughput 4.99482K wps
[Epoch 104 Batch 60/173] avg loss 0.00351169, throughput 5.16076K wps
[Epoch 104 Batch 90/173] avg loss 0.00351841, throughput 5.54484K wps
[Epoch 104 Batch 120/173] avg loss 0.00371576, throughput 4.68799K wps
[Epoch 104 Batch 150/173] avg loss 0.00381342, throughput 5.03398K wps
Begin Testing...
[Epoch 104] train avg loss 0.0035866, dev acc 0.7914, dev avg loss 0.455506, throughput 5.11163K wps
[Epoch 105 Batch 30/173] avg loss 0.00352458, throughput 5.01073K wps
[Epoch 105 Batch 60/173] avg loss 0.00343286, throughput 5.62017K wps
[Epoch 105 Batch 90/173] avg loss 0.0035398, throughput 4.89308K wps
[Epoch 105 Batch 120/173] avg loss 0.00351128, throughput 5.55938K wps
[Epoch 105 Batch 150/173] avg loss 0.00343354, throughput 5.49224K wps
Begin Testing...
[Epoch 105] train avg loss 0.00349728, dev acc 0.7852, dev avg loss 0.45764, throughput 5.31643K wps
[Epoch 106 Batch 30/173] avg loss 0.00350361, throughput 4.67375K wps
[Epoch 106 Batch 60/173] avg loss 0.00368902, throughput 5.41073K wps
[Epoch 106 Batch 90/173] avg loss 0.0031481, throughput 5.24052K wps
[Epoch 106 Batch 120/173] avg loss 0.0033222, throughput 6.11613K wps
[Epoch 106 Batch 150/173] avg loss 0.00357974, throughput 5.78481K wps
Begin Testing...
[Epoch 106] train avg loss 0.00343035, dev acc 0.7883, dev avg loss 0.456443, throughput 5.46332K wps
[Epoch 107 Batch 30/173] avg loss 0.00332353, throughput 5.6031K wps
[Epoch 107 Batch 60/173] avg loss 0.00346801, throughput 6.13323K wps
[Epoch 107 Batch 90/173] avg loss 0.0033416, throughput 5.34954K wps
[Epoch 107 Batch 120/173] avg loss 0.00316182, throughput 5.30532K wps
[Epoch 107 Batch 150/173] avg loss 0.0035636, throughput 5.47503K wps
Begin Testing...
[Epoch 107] train avg loss 0.00336095, dev acc 0.7842, dev avg loss 0.461098, throughput 5.47591K wps
[Epoch 108 Batch 30/173] avg loss 0.00330352, throughput 4.70464K wps
[Epoch 108 Batch 60/173] avg loss 0.00328997, throughput 4.86846K wps
[Epoch 108 Batch 90/173] avg loss 0.00325961, throughput 5.39994K wps
[Epoch 108 Batch 120/173] avg loss 0.00324668, throughput 4.77664K wps
[Epoch 108 Batch 150/173] avg loss 0.00338481, throughput 5.44651K wps
Begin Testing...
[Epoch 108] train avg loss 0.00329167, dev acc 0.7862, dev avg loss 0.459092, throughput 5.01782K wps
[Epoch 109 Batch 30/173] avg loss 0.00324466, throughput 4.82489K wps
[Epoch 109 Batch 60/173] avg loss 0.00319182, throughput 5.75267K wps
[Epoch 109 Batch 90/173] avg loss 0.00342428, throughput 4.658K wps
[Epoch 109 Batch 120/173] avg loss 0.00326537, throughput 4.97345K wps
[Epoch 109 Batch 150/173] avg loss 0.00332422, throughput 4.94665K wps
Begin Testing...
[Epoch 109] train avg loss 0.00329946, dev acc 0.7873, dev avg loss 0.4599, throughput 5.06284K wps
[Epoch 110 Batch 30/173] avg loss 0.00314529, throughput 4.95852K wps
[Epoch 110 Batch 60/173] avg loss 0.00339547, throughput 4.93819K wps
[Epoch 110 Batch 90/173] avg loss 0.00338003, throughput 4.7238K wps
[Epoch 110 Batch 120/173] avg loss 0.00330916, throughput 5.05252K wps
[Epoch 110 Batch 150/173] avg loss 0.0031276, throughput 5.69718K wps
Begin Testing...
[Epoch 110] train avg loss 0.0032835, dev acc 0.7894, dev avg loss 0.459086, throughput 5.05483K wps
[Epoch 111 Batch 30/173] avg loss 0.00314321, throughput 5.39367K wps
[Epoch 111 Batch 60/173] avg loss 0.0030968, throughput 5.61208K wps
[Epoch 111 Batch 90/173] avg loss 0.00318693, throughput 6.0588K wps
[Epoch 111 Batch 120/173] avg loss 0.00318451, throughput 5.92185K wps
[Epoch 111 Batch 150/173] avg loss 0.00332451, throughput 4.78321K wps
Begin Testing...
[Epoch 111] train avg loss 0.00319311, dev acc 0.7894, dev avg loss 0.459051, throughput 5.57014K wps
[Epoch 112 Batch 30/173] avg loss 0.00313409, throughput 4.67671K wps
[Epoch 112 Batch 60/173] avg loss 0.00308711, throughput 5.38889K wps
[Epoch 112 Batch 90/173] avg loss 0.00333781, throughput 5.505K wps
[Epoch 112 Batch 120/173] avg loss 0.00309845, throughput 4.93722K wps
[Epoch 112 Batch 150/173] avg loss 0.00347637, throughput 5.47268K wps
Begin Testing...
[Epoch 112] train avg loss 0.00322156, dev acc 0.7852, dev avg loss 0.468866, throughput 5.2171K wps
[Epoch 113 Batch 30/173] avg loss 0.0031545, throughput 4.66797K wps
[Epoch 113 Batch 60/173] avg loss 0.00305222, throughput 4.7938K wps
[Epoch 113 Batch 90/173] avg loss 0.00329086, throughput 5.62833K wps
[Epoch 113 Batch 120/173] avg loss 0.00318771, throughput 6.09974K wps
[Epoch 113 Batch 150/173] avg loss 0.00307162, throughput 5.25872K wps
Begin Testing...
[Epoch 113] train avg loss 0.00310428, dev acc 0.7904, dev avg loss 0.460847, throughput 5.17222K wps
[Epoch 114 Batch 30/173] avg loss 0.0031105, throughput 5.93935K wps
[Epoch 114 Batch 60/173] avg loss 0.00314471, throughput 5.1339K wps
[Epoch 114 Batch 90/173] avg loss 0.00318781, throughput 5.46521K wps
[Epoch 114 Batch 120/173] avg loss 0.00308867, throughput 4.66164K wps
[Epoch 114 Batch 150/173] avg loss 0.00306163, throughput 5.11375K wps
Begin Testing...
[Epoch 114] train avg loss 0.00313281, dev acc 0.7883, dev avg loss 0.459744, throughput 5.22089K wps
[Epoch 115 Batch 30/173] avg loss 0.00325467, throughput 5.22617K wps
[Epoch 115 Batch 60/173] avg loss 0.0028557, throughput 5.54931K wps
[Epoch 115 Batch 90/173] avg loss 0.00304974, throughput 5.4498K wps
[Epoch 115 Batch 120/173] avg loss 0.00314306, throughput 6.26373K wps
[Epoch 115 Batch 150/173] avg loss 0.00301353, throughput 4.72759K wps
Begin Testing...
[Epoch 115] train avg loss 0.00307848, dev acc 0.7904, dev avg loss 0.462451, throughput 5.3112K wps
[Epoch 116 Batch 30/173] avg loss 0.00285394, throughput 5.29186K wps
[Epoch 116 Batch 60/173] avg loss 0.00299178, throughput 5.06226K wps
[Epoch 116 Batch 90/173] avg loss 0.00323933, throughput 4.9417K wps
[Epoch 116 Batch 120/173] avg loss 0.00308667, throughput 5.57887K wps
[Epoch 116 Batch 150/173] avg loss 0.00294812, throughput 5.18223K wps
Begin Testing...
[Epoch 116] train avg loss 0.00304233, dev acc 0.7894, dev avg loss 0.461988, throughput 5.24109K wps
[Epoch 117 Batch 30/173] avg loss 0.00297324, throughput 6.59189K wps
[Epoch 117 Batch 60/173] avg loss 0.00320726, throughput 5.28235K wps
[Epoch 117 Batch 90/173] avg loss 0.0028622, throughput 5.3992K wps
[Epoch 117 Batch 120/173] avg loss 0.0029092, throughput 5.26062K wps
[Epoch 117 Batch 150/173] avg loss 0.0029064, throughput 5.54789K wps
Begin Testing...
[Epoch 117] train avg loss 0.00297617, dev acc 0.7883, dev avg loss 0.462362, throughput 5.45793K wps
[Epoch 118 Batch 30/173] avg loss 0.00304478, throughput 4.96629K wps
[Epoch 118 Batch 60/173] avg loss 0.00304126, throughput 5.45251K wps
[Epoch 118 Batch 90/173] avg loss 0.00275211, throughput 4.67882K wps
[Epoch 118 Batch 120/173] avg loss 0.00310447, throughput 5.15454K wps
[Epoch 118 Batch 150/173] avg loss 0.0029866, throughput 4.91188K wps
Begin Testing...
[Epoch 118] train avg loss 0.00299443, dev acc 0.7894, dev avg loss 0.462347, throughput 4.99621K wps
[Epoch 119 Batch 30/173] avg loss 0.00294236, throughput 5.53588K wps
[Epoch 119 Batch 60/173] avg loss 0.00273285, throughput 5.88183K wps
[Epoch 119 Batch 90/173] avg loss 0.00294596, throughput 5.64156K wps
[Epoch 119 Batch 120/173] avg loss 0.00281749, throughput 4.98404K wps
[Epoch 119 Batch 150/173] avg loss 0.00309832, throughput 4.80876K wps
Begin Testing...
[Epoch 119] train avg loss 0.00289897, dev acc 0.7904, dev avg loss 0.462971, throughput 5.45827K wps
[Epoch 120 Batch 30/173] avg loss 0.00290888, throughput 4.80775K wps
[Epoch 120 Batch 60/173] avg loss 0.00290432, throughput 5.18233K wps
[Epoch 120 Batch 90/173] avg loss 0.0027549, throughput 5.34742K wps
[Epoch 120 Batch 120/173] avg loss 0.00313517, throughput 5.08831K wps
[Epoch 120 Batch 150/173] avg loss 0.00292401, throughput 4.70548K wps
Begin Testing...
[Epoch 120] train avg loss 0.00295944, dev acc 0.7852, dev avg loss 0.465702, throughput 5.00287K wps
[Epoch 121 Batch 30/173] avg loss 0.00276284, throughput 4.83195K wps
[Epoch 121 Batch 60/173] avg loss 0.00303029, throughput 4.70618K wps
[Epoch 121 Batch 90/173] avg loss 0.00289006, throughput 5.28336K wps
[Epoch 121 Batch 120/173] avg loss 0.00264874, throughput 5.53807K wps
[Epoch 121 Batch 150/173] avg loss 0.00277358, throughput 4.88931K wps
Begin Testing...
[Epoch 121] train avg loss 0.00282733, dev acc 0.7904, dev avg loss 0.464145, throughput 5.15332K wps
[Epoch 122 Batch 30/173] avg loss 0.00267563, throughput 4.82801K wps
[Epoch 122 Batch 60/173] avg loss 0.00281154, throughput 4.95754K wps
[Epoch 122 Batch 90/173] avg loss 0.00285828, throughput 4.89922K wps
[Epoch 122 Batch 120/173] avg loss 0.00265815, throughput 4.73692K wps
[Epoch 122 Batch 150/173] avg loss 0.00273637, throughput 4.79023K wps
Begin Testing...
[Epoch 122] train avg loss 0.00275484, dev acc 0.7862, dev avg loss 0.465148, throughput 4.82984K wps
[Epoch 123 Batch 30/173] avg loss 0.0027475, throughput 5.25237K wps
[Epoch 123 Batch 60/173] avg loss 0.00257797, throughput 5.21286K wps
[Epoch 123 Batch 90/173] avg loss 0.0028083, throughput 5.42334K wps
[Epoch 123 Batch 120/173] avg loss 0.00255743, throughput 5.98547K wps
[Epoch 123 Batch 150/173] avg loss 0.00279674, throughput 5.73907K wps
Begin Testing...
[Epoch 123] train avg loss 0.00271829, dev acc 0.7894, dev avg loss 0.465596, throughput 5.53756K wps
[Epoch 124 Batch 30/173] avg loss 0.00271765, throughput 5.26679K wps
[Epoch 124 Batch 60/173] avg loss 0.00280633, throughput 5.18235K wps
[Epoch 124 Batch 90/173] avg loss 0.00282551, throughput 4.9989K wps
[Epoch 124 Batch 120/173] avg loss 0.00277635, throughput 5.32346K wps
[Epoch 124 Batch 150/173] avg loss 0.00256121, throughput 5.00497K wps
Begin Testing...
[Epoch 124] train avg loss 0.00275546, dev acc 0.7873, dev avg loss 0.465651, throughput 5.14307K wps
[Epoch 125 Batch 30/173] avg loss 0.00263359, throughput 4.75238K wps
[Epoch 125 Batch 60/173] avg loss 0.00248309, throughput 4.86453K wps
[Epoch 125 Batch 90/173] avg loss 0.00271217, throughput 5.91143K wps
[Epoch 125 Batch 120/173] avg loss 0.00263363, throughput 5.1979K wps
[Epoch 125 Batch 150/173] avg loss 0.0027977, throughput 5.19489K wps
Begin Testing...
[Epoch 125] train avg loss 0.00264163, dev acc 0.7873, dev avg loss 0.466332, throughput 5.21347K wps
[Epoch 126 Batch 30/173] avg loss 0.00277286, throughput 5.63228K wps
[Epoch 126 Batch 60/173] avg loss 0.00242584, throughput 5.75157K wps
[Epoch 126 Batch 90/173] avg loss 0.00267244, throughput 5.69179K wps
[Epoch 126 Batch 120/173] avg loss 0.00260237, throughput 5.00359K wps
[Epoch 126 Batch 150/173] avg loss 0.0027954, throughput 4.96889K wps
Begin Testing...
[Epoch 126] train avg loss 0.00268704, dev acc 0.7852, dev avg loss 0.467278, throughput 5.49081K wps
[Epoch 127 Batch 30/173] avg loss 0.00272146, throughput 5.45984K wps
[Epoch 127 Batch 60/173] avg loss 0.00273003, throughput 4.95659K wps
[Epoch 127 Batch 90/173] avg loss 0.00255637, throughput 4.78046K wps
[Epoch 127 Batch 120/173] avg loss 0.00270311, throughput 4.84494K wps
[Epoch 127 Batch 150/173] avg loss 0.00282563, throughput 5.51805K wps
Begin Testing...
[Epoch 127] train avg loss 0.00267879, dev acc 0.7842, dev avg loss 0.473693, throughput 5.09468K wps
[Epoch 128 Batch 30/173] avg loss 0.00257001, throughput 4.67674K wps
[Epoch 128 Batch 60/173] avg loss 0.00252343, throughput 4.74725K wps
[Epoch 128 Batch 90/173] avg loss 0.00240067, throughput 5.47669K wps
[Epoch 128 Batch 120/173] avg loss 0.00270188, throughput 5.27378K wps
[Epoch 128 Batch 150/173] avg loss 0.00280754, throughput 5.16007K wps
Begin Testing...
[Epoch 128] train avg loss 0.00261297, dev acc 0.7894, dev avg loss 0.468787, throughput 5.12757K wps
[Epoch 129 Batch 30/173] avg loss 0.00282001, throughput 5.27328K wps
[Epoch 129 Batch 60/173] avg loss 0.00260619, throughput 5.41175K wps
[Epoch 129 Batch 90/173] avg loss 0.00240985, throughput 5.95922K wps
[Epoch 129 Batch 120/173] avg loss 0.00248852, throughput 5.13013K wps
[Epoch 129 Batch 150/173] avg loss 0.00248361, throughput 5.52816K wps
Begin Testing...
[Epoch 129] train avg loss 0.00257784, dev acc 0.7894, dev avg loss 0.467749, throughput 5.35442K wps
[Epoch 130 Batch 30/173] avg loss 0.00263095, throughput 5.02408K wps
[Epoch 130 Batch 60/173] avg loss 0.00250885, throughput 5.95457K wps
[Epoch 130 Batch 90/173] avg loss 0.00261711, throughput 5.2308K wps
[Epoch 130 Batch 120/173] avg loss 0.00242492, throughput 5.13153K wps
[Epoch 130 Batch 150/173] avg loss 0.00253704, throughput 5.09646K wps
Begin Testing...
[Epoch 130] train avg loss 0.00256, dev acc 0.7904, dev avg loss 0.46771, throughput 5.34285K wps
[Epoch 131 Batch 30/173] avg loss 0.00260497, throughput 5.29385K wps
[Epoch 131 Batch 60/173] avg loss 0.00250554, throughput 4.6862K wps
[Epoch 131 Batch 90/173] avg loss 0.0025258, throughput 5.27609K wps
[Epoch 131 Batch 120/173] avg loss 0.00240046, throughput 5.20909K wps
[Epoch 131 Batch 150/173] avg loss 0.00276515, throughput 5.35269K wps
Begin Testing...
[Epoch 131] train avg loss 0.00255835, dev acc 0.7914, dev avg loss 0.468667, throughput 5.1903K wps
[Epoch 132 Batch 30/173] avg loss 0.00248278, throughput 5.039K wps
[Epoch 132 Batch 60/173] avg loss 0.00240163, throughput 5.00883K wps
[Epoch 132 Batch 90/173] avg loss 0.00267836, throughput 5.52903K wps
[Epoch 132 Batch 120/173] avg loss 0.00232913, throughput 5.05768K wps
[Epoch 132 Batch 150/173] avg loss 0.00241082, throughput 5.24719K wps
Begin Testing...
[Epoch 132] train avg loss 0.00250302, dev acc 0.7935, dev avg loss 0.467841, throughput 5.21777K wps
Observed Improvement.
Begin Testing...
[Epoch 133 Batch 30/173] avg loss 0.00245825, throughput 4.63211K wps
[Epoch 133 Batch 60/173] avg loss 0.00239485, throughput 5.31206K wps
[Epoch 133 Batch 90/173] avg loss 0.0025382, throughput 5.31526K wps
[Epoch 133 Batch 120/173] avg loss 0.00250072, throughput 5.63917K wps
[Epoch 133 Batch 150/173] avg loss 0.00249388, throughput 5.30632K wps
Begin Testing...
[Epoch 133] train avg loss 0.00246855, dev acc 0.7883, dev avg loss 0.469901, throughput 5.17732K wps
[Epoch 134 Batch 30/173] avg loss 0.00254487, throughput 5.13101K wps
[Epoch 134 Batch 60/173] avg loss 0.00260291, throughput 5.25667K wps
[Epoch 134 Batch 90/173] avg loss 0.00239495, throughput 5.52637K wps
[Epoch 134 Batch 120/173] avg loss 0.00250582, throughput 5.91173K wps
[Epoch 134 Batch 150/173] avg loss 0.00233683, throughput 5.12748K wps
Begin Testing...
[Epoch 134] train avg loss 0.00247047, dev acc 0.7894, dev avg loss 0.473736, throughput 5.33958K wps
[Epoch 135 Batch 30/173] avg loss 0.00241812, throughput 5.49695K wps
[Epoch 135 Batch 60/173] avg loss 0.00257859, throughput 5.15994K wps
[Epoch 135 Batch 90/173] avg loss 0.00248215, throughput 5.3844K wps
[Epoch 135 Batch 120/173] avg loss 0.0023373, throughput 4.85309K wps
[Epoch 135 Batch 150/173] avg loss 0.00250189, throughput 5.17239K wps
Begin Testing...
[Epoch 135] train avg loss 0.00248007, dev acc 0.7873, dev avg loss 0.470395, throughput 5.28404K wps
[Epoch 136 Batch 30/173] avg loss 0.00224515, throughput 5.46864K wps
[Epoch 136 Batch 60/173] avg loss 0.00251766, throughput 5.2461K wps
[Epoch 136 Batch 90/173] avg loss 0.0023009, throughput 5.46254K wps
[Epoch 136 Batch 120/173] avg loss 0.0023857, throughput 5.14807K wps
[Epoch 136 Batch 150/173] avg loss 0.00237551, throughput 5.60101K wps
Begin Testing...
[Epoch 136] train avg loss 0.00234414, dev acc 0.7862, dev avg loss 0.471317, throughput 5.35351K wps
[Epoch 137 Batch 30/173] avg loss 0.00243624, throughput 5.02195K wps
[Epoch 137 Batch 60/173] avg loss 0.00227153, throughput 5.17852K wps
[Epoch 137 Batch 90/173] avg loss 0.00223473, throughput 5.3851K wps
[Epoch 137 Batch 120/173] avg loss 0.0022597, throughput 5.19472K wps
[Epoch 137 Batch 150/173] avg loss 0.0024825, throughput 5.92719K wps
Begin Testing...
[Epoch 137] train avg loss 0.00235144, dev acc 0.7873, dev avg loss 0.472702, throughput 5.25723K wps
[Epoch 138 Batch 30/173] avg loss 0.0023482, throughput 5.11131K wps
[Epoch 138 Batch 60/173] avg loss 0.00227728, throughput 4.72603K wps
[Epoch 138 Batch 90/173] avg loss 0.00221664, throughput 5.69326K wps
[Epoch 138 Batch 120/173] avg loss 0.00264002, throughput 5.27357K wps
[Epoch 138 Batch 150/173] avg loss 0.00232237, throughput 6.20198K wps
Begin Testing...
[Epoch 138] train avg loss 0.00238209, dev acc 0.7873, dev avg loss 0.473069, throughput 5.3692K wps
[Epoch 139 Batch 30/173] avg loss 0.00232198, throughput 5.91668K wps
[Epoch 139 Batch 60/173] avg loss 0.00234279, throughput 5.28433K wps
[Epoch 139 Batch 90/173] avg loss 0.00237299, throughput 6.27735K wps
[Epoch 139 Batch 120/173] avg loss 0.00226923, throughput 5.25725K wps
[Epoch 139 Batch 150/173] avg loss 0.00222156, throughput 5.70479K wps
Begin Testing...
[Epoch 139] train avg loss 0.00233733, dev acc 0.7894, dev avg loss 0.473028, throughput 5.50688K wps
[Epoch 140 Batch 30/173] avg loss 0.00221074, throughput 4.83593K wps
[Epoch 140 Batch 60/173] avg loss 0.00244009, throughput 4.61315K wps
[Epoch 140 Batch 90/173] avg loss 0.00222798, throughput 5.59308K wps
[Epoch 140 Batch 120/173] avg loss 0.00225196, throughput 4.69342K wps
[Epoch 140 Batch 150/173] avg loss 0.0023614, throughput 4.88136K wps
Begin Testing...
[Epoch 140] train avg loss 0.00232221, dev acc 0.7914, dev avg loss 0.471174, throughput 4.88993K wps
[Epoch 141 Batch 30/173] avg loss 0.00225225, throughput 4.8165K wps
[Epoch 141 Batch 60/173] avg loss 0.00220462, throughput 5.36065K wps
[Epoch 141 Batch 90/173] avg loss 0.00220797, throughput 4.98213K wps
[Epoch 141 Batch 120/173] avg loss 0.00226468, throughput 5.33168K wps
[Epoch 141 Batch 150/173] avg loss 0.00209861, throughput 5.38373K wps
Begin Testing...
[Epoch 141] train avg loss 0.00223568, dev acc 0.7883, dev avg loss 0.472974, throughput 5.09782K wps
[Epoch 142 Batch 30/173] avg loss 0.00232476, throughput 4.67109K wps
[Epoch 142 Batch 60/173] avg loss 0.00208886, throughput 5.0887K wps
[Epoch 142 Batch 90/173] avg loss 0.00229479, throughput 4.86433K wps
[Epoch 142 Batch 120/173] avg loss 0.00232753, throughput 4.87156K wps
[Epoch 142 Batch 150/173] avg loss 0.00231888, throughput 5.25386K wps
Begin Testing...
[Epoch 142] train avg loss 0.00229685, dev acc 0.7925, dev avg loss 0.475945, throughput 5.03024K wps
[Epoch 143 Batch 30/173] avg loss 0.00221252, throughput 5.34466K wps
[Epoch 143 Batch 60/173] avg loss 0.00212053, throughput 4.76087K wps
[Epoch 143 Batch 90/173] avg loss 0.00209093, throughput 5.11227K wps
[Epoch 143 Batch 120/173] avg loss 0.00221154, throughput 4.85801K wps
[Epoch 143 Batch 150/173] avg loss 0.00230108, throughput 6.23495K wps
Begin Testing...
[Epoch 143] train avg loss 0.0021883, dev acc 0.7862, dev avg loss 0.475621, throughput 5.16106K wps
[Epoch 144 Batch 30/173] avg loss 0.00215872, throughput 5.18388K wps
[Epoch 144 Batch 60/173] avg loss 0.00220644, throughput 5.54351K wps
[Epoch 144 Batch 90/173] avg loss 0.00221059, throughput 5.08521K wps
[Epoch 144 Batch 120/173] avg loss 0.00221384, throughput 5.33297K wps
[Epoch 144 Batch 150/173] avg loss 0.00212903, throughput 5.92325K wps
Begin Testing...
[Epoch 144] train avg loss 0.00219274, dev acc 0.7842, dev avg loss 0.475354, throughput 5.28942K wps
[Epoch 145 Batch 30/173] avg loss 0.00207882, throughput 4.90272K wps
[Epoch 145 Batch 60/173] avg loss 0.0020996, throughput 5.86748K wps
[Epoch 145 Batch 90/173] avg loss 0.0021307, throughput 5.24288K wps
[Epoch 145 Batch 120/173] avg loss 0.00204276, throughput 5.47064K wps
[Epoch 145 Batch 150/173] avg loss 0.00215407, throughput 5.16965K wps
Begin Testing...
[Epoch 145] train avg loss 0.00208968, dev acc 0.7883, dev avg loss 0.477093, throughput 5.27988K wps
[Epoch 146 Batch 30/173] avg loss 0.00203493, throughput 5.66188K wps
[Epoch 146 Batch 60/173] avg loss 0.00209565, throughput 5.22089K wps
[Epoch 146 Batch 90/173] avg loss 0.00217757, throughput 5.02867K wps
[Epoch 146 Batch 120/173] avg loss 0.00199661, throughput 5.22647K wps
[Epoch 146 Batch 150/173] avg loss 0.0022635, throughput 5.24561K wps
Begin Testing...
[Epoch 146] train avg loss 0.00213321, dev acc 0.7914, dev avg loss 0.475303, throughput 5.30166K wps
[Epoch 147 Batch 30/173] avg loss 0.00184984, throughput 6.03051K wps
[Epoch 147 Batch 60/173] avg loss 0.00209556, throughput 4.68365K wps
[Epoch 147 Batch 90/173] avg loss 0.00214307, throughput 5.14831K wps
[Epoch 147 Batch 120/173] avg loss 0.00226268, throughput 4.92306K wps
[Epoch 147 Batch 150/173] avg loss 0.0020255, throughput 5.45633K wps
Begin Testing...
[Epoch 147] train avg loss 0.00209214, dev acc 0.7873, dev avg loss 0.476169, throughput 5.16144K wps
[Epoch 148 Batch 30/173] avg loss 0.0022459, throughput 5.27059K wps
[Epoch 148 Batch 60/173] avg loss 0.00210925, throughput 4.94294K wps
[Epoch 148 Batch 90/173] avg loss 0.0020891, throughput 4.83481K wps
[Epoch 148 Batch 120/173] avg loss 0.00211836, throughput 5.34961K wps
[Epoch 148 Batch 150/173] avg loss 0.002104, throughput 5.55308K wps
Begin Testing...
[Epoch 148] train avg loss 0.00212737, dev acc 0.7914, dev avg loss 0.476169, throughput 5.20366K wps
[Epoch 149 Batch 30/173] avg loss 0.0021022, throughput 4.88942K wps
[Epoch 149 Batch 60/173] avg loss 0.00219424, throughput 5.16204K wps
[Epoch 149 Batch 90/173] avg loss 0.00196689, throughput 5.40915K wps
[Epoch 149 Batch 120/173] avg loss 0.00198955, throughput 4.8108K wps
[Epoch 149 Batch 150/173] avg loss 0.00205093, throughput 5.57673K wps
Begin Testing...
[Epoch 149] train avg loss 0.0020528, dev acc 0.7883, dev avg loss 0.477704, throughput 5.07521K wps
[Epoch 150 Batch 30/173] avg loss 0.00192618, throughput 4.74081K wps
[Epoch 150 Batch 60/173] avg loss 0.00209697, throughput 5.43393K wps
[Epoch 150 Batch 90/173] avg loss 0.0021888, throughput 5.06008K wps
[Epoch 150 Batch 120/173] avg loss 0.00201361, throughput 5.27779K wps
[Epoch 150 Batch 150/173] avg loss 0.00195934, throughput 6.01988K wps
Begin Testing...
[Epoch 150] train avg loss 0.00205831, dev acc 0.7925, dev avg loss 0.477798, throughput 5.26614K wps
[Epoch 151 Batch 30/173] avg loss 0.00202289, throughput 5.27003K wps
[Epoch 151 Batch 60/173] avg loss 0.00195142, throughput 5.27583K wps
[Epoch 151 Batch 90/173] avg loss 0.00215137, throughput 5.03502K wps
[Epoch 151 Batch 120/173] avg loss 0.00208087, throughput 5.1657K wps
[Epoch 151 Batch 150/173] avg loss 0.00205323, throughput 5.46109K wps
Begin Testing...
[Epoch 151] train avg loss 0.00204408, dev acc 0.7894, dev avg loss 0.479045, throughput 5.24389K wps
[Epoch 152 Batch 30/173] avg loss 0.00180326, throughput 5.32971K wps
[Epoch 152 Batch 60/173] avg loss 0.00196078, throughput 5.58688K wps
[Epoch 152 Batch 90/173] avg loss 0.00194703, throughput 5.43278K wps
[Epoch 152 Batch 120/173] avg loss 0.00203235, throughput 4.96975K wps
[Epoch 152 Batch 150/173] avg loss 0.00213093, throughput 4.68003K wps
Begin Testing...
[Epoch 152] train avg loss 0.00197577, dev acc 0.7883, dev avg loss 0.482997, throughput 5.15609K wps
[Epoch 153 Batch 30/173] avg loss 0.00196785, throughput 4.66722K wps
[Epoch 153 Batch 60/173] avg loss 0.00205959, throughput 4.92853K wps
[Epoch 153 Batch 90/173] avg loss 0.00192804, throughput 5.73648K wps
[Epoch 153 Batch 120/173] avg loss 0.00203784, throughput 6.09648K wps
[Epoch 153 Batch 150/173] avg loss 0.00199345, throughput 5.20884K wps
Begin Testing...
[Epoch 153] train avg loss 0.00201538, dev acc 0.7873, dev avg loss 0.483353, throughput 5.30528K wps
[Epoch 154 Batch 30/173] avg loss 0.00197265, throughput 5.22673K wps
[Epoch 154 Batch 60/173] avg loss 0.00191038, throughput 5.17571K wps
[Epoch 154 Batch 90/173] avg loss 0.00192314, throughput 5.21783K wps
[Epoch 154 Batch 120/173] avg loss 0.00187374, throughput 5.01837K wps
[Epoch 154 Batch 150/173] avg loss 0.00204047, throughput 4.87567K wps
Begin Testing...
[Epoch 154] train avg loss 0.00193144, dev acc 0.7852, dev avg loss 0.481964, throughput 5.11322K wps
[Epoch 155 Batch 30/173] avg loss 0.00176506, throughput 4.70988K wps
[Epoch 155 Batch 60/173] avg loss 0.00198806, throughput 5.21149K wps
[Epoch 155 Batch 90/173] avg loss 0.00192535, throughput 5.78982K wps
[Epoch 155 Batch 120/173] avg loss 0.00188638, throughput 5.49842K wps
[Epoch 155 Batch 150/173] avg loss 0.00216585, throughput 5.19389K wps
Begin Testing...
[Epoch 155] train avg loss 0.00197511, dev acc 0.7873, dev avg loss 0.480786, throughput 5.26346K wps
[Epoch 156 Batch 30/173] avg loss 0.00197341, throughput 4.92608K wps
[Epoch 156 Batch 60/173] avg loss 0.001911, throughput 5.5326K wps
[Epoch 156 Batch 90/173] avg loss 0.00193321, throughput 5.0585K wps
[Epoch 156 Batch 120/173] avg loss 0.00186712, throughput 4.92232K wps
[Epoch 156 Batch 150/173] avg loss 0.00191843, throughput 4.93387K wps
Begin Testing...
[Epoch 156] train avg loss 0.00191944, dev acc 0.7883, dev avg loss 0.485305, throughput 5.20317K wps
[Epoch 157 Batch 30/173] avg loss 0.00195818, throughput 5.4158K wps
[Epoch 157 Batch 60/173] avg loss 0.00170481, throughput 4.92096K wps
[Epoch 157 Batch 90/173] avg loss 0.00187989, throughput 6.38065K wps
[Epoch 157 Batch 120/173] avg loss 0.00200624, throughput 6.14059K wps
[Epoch 157 Batch 150/173] avg loss 0.00196926, throughput 5.05118K wps
Begin Testing...
[Epoch 157] train avg loss 0.0019192, dev acc 0.7842, dev avg loss 0.48178, throughput 5.42515K wps
[Epoch 158 Batch 30/173] avg loss 0.00171224, throughput 6.03355K wps
[Epoch 158 Batch 60/173] avg loss 0.0019209, throughput 5.4144K wps
[Epoch 158 Batch 90/173] avg loss 0.00188654, throughput 5.96587K wps
[Epoch 158 Batch 120/173] avg loss 0.00188298, throughput 5.11515K wps
[Epoch 158 Batch 150/173] avg loss 0.00182712, throughput 4.91799K wps
Begin Testing...
[Epoch 158] train avg loss 0.00185638, dev acc 0.7883, dev avg loss 0.49015, throughput 5.3431K wps
[Epoch 159 Batch 30/173] avg loss 0.00182837, throughput 5.02303K wps
[Epoch 159 Batch 60/173] avg loss 0.00185055, throughput 5.56446K wps
[Epoch 159 Batch 90/173] avg loss 0.0017323, throughput 5.24097K wps
[Epoch 159 Batch 120/173] avg loss 0.00180258, throughput 4.87483K wps
[Epoch 159 Batch 150/173] avg loss 0.00185085, throughput 5.58468K wps
Begin Testing...
[Epoch 159] train avg loss 0.00184667, dev acc 0.7894, dev avg loss 0.487807, throughput 5.30552K wps
[Epoch 160 Batch 30/173] avg loss 0.00183534, throughput 5.19785K wps
[Epoch 160 Batch 60/173] avg loss 0.0018767, throughput 5.56401K wps
[Epoch 160 Batch 90/173] avg loss 0.00171708, throughput 5.54654K wps
[Epoch 160 Batch 120/173] avg loss 0.0018264, throughput 6.24683K wps
[Epoch 160 Batch 150/173] avg loss 0.00196411, throughput 5.97031K wps
Begin Testing...
[Epoch 160] train avg loss 0.0018469, dev acc 0.7904, dev avg loss 0.484956, throughput 5.57668K wps
[Epoch 161 Batch 30/173] avg loss 0.00192447, throughput 4.67003K wps
[Epoch 161 Batch 60/173] avg loss 0.00183003, throughput 5.16289K wps
[Epoch 161 Batch 90/173] avg loss 0.00190901, throughput 5.45645K wps
[Epoch 161 Batch 120/173] avg loss 0.00175938, throughput 4.98017K wps
[Epoch 161 Batch 150/173] avg loss 0.00175546, throughput 4.72531K wps
Begin Testing...
[Epoch 161] train avg loss 0.00184654, dev acc 0.7904, dev avg loss 0.484095, throughput 4.98752K wps
[Epoch 162 Batch 30/173] avg loss 0.00191431, throughput 4.89979K wps
[Epoch 162 Batch 60/173] avg loss 0.00175172, throughput 5.39284K wps
[Epoch 162 Batch 90/173] avg loss 0.00180336, throughput 4.68859K wps
[Epoch 162 Batch 120/173] avg loss 0.00188608, throughput 5.43021K wps
[Epoch 162 Batch 150/173] avg loss 0.00182507, throughput 4.67578K wps
Begin Testing...
[Epoch 162] train avg loss 0.00183398, dev acc 0.7883, dev avg loss 0.485603, throughput 5.02469K wps
[Epoch 163 Batch 30/173] avg loss 0.00174756, throughput 5.1568K wps
[Epoch 163 Batch 60/173] avg loss 0.00162385, throughput 4.73138K wps
[Epoch 163 Batch 90/173] avg loss 0.00183248, throughput 5.23786K wps
[Epoch 163 Batch 120/173] avg loss 0.00189916, throughput 5.97465K wps
[Epoch 163 Batch 150/173] avg loss 0.00185201, throughput 5.57731K wps
Begin Testing...
[Epoch 163] train avg loss 0.0018117, dev acc 0.7842, dev avg loss 0.488249, throughput 5.24473K wps
[Epoch 164 Batch 30/173] avg loss 0.00174683, throughput 5.73342K wps
[Epoch 164 Batch 60/173] avg loss 0.00190898, throughput 4.9982K wps
[Epoch 164 Batch 90/173] avg loss 0.00160992, throughput 4.69181K wps
[Epoch 164 Batch 120/173] avg loss 0.00170474, throughput 4.85792K wps
[Epoch 164 Batch 150/173] avg loss 0.0018749, throughput 6.10746K wps
Begin Testing...
[Epoch 164] train avg loss 0.0017609, dev acc 0.7883, dev avg loss 0.486122, throughput 5.32229K wps
[Epoch 165 Batch 30/173] avg loss 0.00176108, throughput 5.52563K wps
[Epoch 165 Batch 60/173] avg loss 0.00170406, throughput 5.23565K wps
[Epoch 165 Batch 90/173] avg loss 0.0018139, throughput 5.24035K wps
[Epoch 165 Batch 120/173] avg loss 0.00174858, throughput 4.9703K wps
[Epoch 165 Batch 150/173] avg loss 0.00190978, throughput 5.64412K wps
Begin Testing...
[Epoch 165] train avg loss 0.00178682, dev acc 0.7862, dev avg loss 0.486119, throughput 5.40256K wps
[Epoch 166 Batch 30/173] avg loss 0.00159059, throughput 5.42111K wps
[Epoch 166 Batch 60/173] avg loss 0.0017111, throughput 5.14483K wps
[Epoch 166 Batch 90/173] avg loss 0.00182051, throughput 5.01099K wps
[Epoch 166 Batch 120/173] avg loss 0.00177087, throughput 5.7559K wps
[Epoch 166 Batch 150/173] avg loss 0.00176226, throughput 5.1717K wps
Begin Testing...
[Epoch 166] train avg loss 0.00174946, dev acc 0.7852, dev avg loss 0.487779, throughput 5.19708K wps
[Epoch 167 Batch 30/173] avg loss 0.00159249, throughput 4.81064K wps
[Epoch 167 Batch 60/173] avg loss 0.00180998, throughput 5.43023K wps
[Epoch 167 Batch 90/173] avg loss 0.00170608, throughput 4.78839K wps
[Epoch 167 Batch 120/173] avg loss 0.00173488, throughput 4.63909K wps
[Epoch 167 Batch 150/173] avg loss 0.00158417, throughput 4.84661K wps
Begin Testing...
[Epoch 167] train avg loss 0.00171524, dev acc 0.7883, dev avg loss 0.486539, throughput 4.91956K wps
[Epoch 168 Batch 30/173] avg loss 0.00162571, throughput 4.89808K wps
[Epoch 168 Batch 60/173] avg loss 0.0017355, throughput 5.98713K wps
[Epoch 168 Batch 90/173] avg loss 0.00176681, throughput 6.49365K wps
[Epoch 168 Batch 120/173] avg loss 0.00172396, throughput 5.4706K wps
[Epoch 168 Batch 150/173] avg loss 0.00171108, throughput 5.04764K wps
Begin Testing...
[Epoch 168] train avg loss 0.00170648, dev acc 0.7873, dev avg loss 0.488281, throughput 5.43135K wps
[Epoch 169 Batch 30/173] avg loss 0.00177981, throughput 6.05095K wps
[Epoch 169 Batch 60/173] avg loss 0.00175206, throughput 5.76012K wps
[Epoch 169 Batch 90/173] avg loss 0.00155842, throughput 5.314K wps
[Epoch 169 Batch 120/173] avg loss 0.00161497, throughput 4.77607K wps
[Epoch 169 Batch 150/173] avg loss 0.00168298, throughput 4.87247K wps
Begin Testing...
[Epoch 169] train avg loss 0.00170909, dev acc 0.7894, dev avg loss 0.488601, throughput 5.27598K wps
[Epoch 170 Batch 30/173] avg loss 0.00176708, throughput 5.39757K wps
[Epoch 170 Batch 60/173] avg loss 0.00159824, throughput 4.85162K wps
[Epoch 170 Batch 90/173] avg loss 0.00173764, throughput 5.28005K wps
[Epoch 170 Batch 120/173] avg loss 0.00162798, throughput 5.875K wps
[Epoch 170 Batch 150/173] avg loss 0.00162011, throughput 5.2493K wps
Begin Testing...
[Epoch 170] train avg loss 0.0016735, dev acc 0.7904, dev avg loss 0.489149, throughput 5.24873K wps
[Epoch 171 Batch 30/173] avg loss 0.00155835, throughput 5.24203K wps
[Epoch 171 Batch 60/173] avg loss 0.00178218, throughput 5.31703K wps
[Epoch 171 Batch 90/173] avg loss 0.00170736, throughput 5.1746K wps
[Epoch 171 Batch 120/173] avg loss 0.00167264, throughput 5.21391K wps
[Epoch 171 Batch 150/173] avg loss 0.00163691, throughput 5.02784K wps
Begin Testing...
[Epoch 171] train avg loss 0.00168328, dev acc 0.7883, dev avg loss 0.489318, throughput 5.14934K wps
[Epoch 172 Batch 30/173] avg loss 0.00162006, throughput 4.71485K wps
[Epoch 172 Batch 60/173] avg loss 0.00155789, throughput 5.01273K wps
[Epoch 172 Batch 90/173] avg loss 0.00157612, throughput 5.39342K wps
[Epoch 172 Batch 120/173] avg loss 0.00161164, throughput 4.70447K wps
[Epoch 172 Batch 150/173] avg loss 0.0017132, throughput 4.98882K wps
Begin Testing...
[Epoch 172] train avg loss 0.00162133, dev acc 0.7904, dev avg loss 0.490874, throughput 5.04919K wps
[Epoch 173 Batch 30/173] avg loss 0.00166273, throughput 5.97845K wps
[Epoch 173 Batch 60/173] avg loss 0.00158034, throughput 5.06768K wps
[Epoch 173 Batch 90/173] avg loss 0.00176723, throughput 5.9496K wps
[Epoch 173 Batch 120/173] avg loss 0.00170725, throughput 5.0273K wps
[Epoch 173 Batch 150/173] avg loss 0.00167908, throughput 4.98327K wps
Begin Testing...
[Epoch 173] train avg loss 0.00166664, dev acc 0.7883, dev avg loss 0.49031, throughput 5.29668K wps
[Epoch 174 Batch 30/173] avg loss 0.0015423, throughput 4.8155K wps
[Epoch 174 Batch 60/173] avg loss 0.00158604, throughput 6.302K wps
[Epoch 174 Batch 90/173] avg loss 0.00161559, throughput 4.89519K wps
[Epoch 174 Batch 120/173] avg loss 0.00164907, throughput 5.49544K wps
[Epoch 174 Batch 150/173] avg loss 0.00161129, throughput 6.30426K wps
Begin Testing...
[Epoch 174] train avg loss 0.00162229, dev acc 0.7873, dev avg loss 0.491019, throughput 5.55882K wps
[Epoch 175 Batch 30/173] avg loss 0.00158212, throughput 4.83627K wps
[Epoch 175 Batch 60/173] avg loss 0.0016746, throughput 6.21773K wps
[Epoch 175 Batch 90/173] avg loss 0.00171405, throughput 5.32271K wps
[Epoch 175 Batch 120/173] avg loss 0.00182087, throughput 5.03227K wps
[Epoch 175 Batch 150/173] avg loss 0.00152218, throughput 5.70153K wps
Begin Testing...
[Epoch 175] train avg loss 0.00163848, dev acc 0.7894, dev avg loss 0.49121, throughput 5.36639K wps
[Epoch 176 Batch 30/173] avg loss 0.00152877, throughput 5.28515K wps
[Epoch 176 Batch 60/173] avg loss 0.00171552, throughput 5.54124K wps
[Epoch 176 Batch 90/173] avg loss 0.00175184, throughput 5.98724K wps
[Epoch 176 Batch 120/173] avg loss 0.0015259, throughput 5.26568K wps
[Epoch 176 Batch 150/173] avg loss 0.00153838, throughput 4.61031K wps
Begin Testing...
[Epoch 176] train avg loss 0.00160078, dev acc 0.7914, dev avg loss 0.493296, throughput 5.34882K wps
[Epoch 177 Batch 30/173] avg loss 0.00167113, throughput 5.58983K wps
[Epoch 177 Batch 60/173] avg loss 0.00160284, throughput 5.18075K wps
[Epoch 177 Batch 90/173] avg loss 0.0014374, throughput 5.33755K wps
[Epoch 177 Batch 120/173] avg loss 0.00160486, throughput 5.06249K wps
[Epoch 177 Batch 150/173] avg loss 0.00150461, throughput 5.00369K wps
Begin Testing...
[Epoch 177] train avg loss 0.00156489, dev acc 0.7873, dev avg loss 0.492337, throughput 5.36827K wps
[Epoch 178 Batch 30/173] avg loss 0.00154602, throughput 5.90664K wps
[Epoch 178 Batch 60/173] avg loss 0.00160998, throughput 5.13172K wps
[Epoch 178 Batch 90/173] avg loss 0.00146146, throughput 4.87888K wps
[Epoch 178 Batch 120/173] avg loss 0.00164763, throughput 5.97637K wps
[Epoch 178 Batch 150/173] avg loss 0.00153048, throughput 5.05765K wps
Begin Testing...
[Epoch 178] train avg loss 0.00157585, dev acc 0.7904, dev avg loss 0.492487, throughput 5.43289K wps
[Epoch 179 Batch 30/173] avg loss 0.00151171, throughput 5.1835K wps
[Epoch 179 Batch 60/173] avg loss 0.00149752, throughput 5.53638K wps
[Epoch 179 Batch 90/173] avg loss 0.00150915, throughput 5.4334K wps
[Epoch 179 Batch 120/173] avg loss 0.00144409, throughput 4.8411K wps
[Epoch 179 Batch 150/173] avg loss 0.00157101, throughput 5.35457K wps
Begin Testing...
[Epoch 179] train avg loss 0.00152371, dev acc 0.7894, dev avg loss 0.493048, throughput 5.37591K wps
[Epoch 180 Batch 30/173] avg loss 0.00152581, throughput 4.76023K wps
[Epoch 180 Batch 60/173] avg loss 0.00146274, throughput 5.10908K wps
[Epoch 180 Batch 90/173] avg loss 0.00153098, throughput 4.84073K wps
[Epoch 180 Batch 120/173] avg loss 0.00148441, throughput 5.38479K wps
[Epoch 180 Batch 150/173] avg loss 0.00158396, throughput 5.49478K wps
Begin Testing...
[Epoch 180] train avg loss 0.00150685, dev acc 0.7862, dev avg loss 0.493618, throughput 5.10326K wps
[Epoch 181 Batch 30/173] avg loss 0.00164966, throughput 4.62712K wps
[Epoch 181 Batch 60/173] avg loss 0.00138364, throughput 5.02672K wps
[Epoch 181 Batch 90/173] avg loss 0.00143738, throughput 5.42681K wps
[Epoch 181 Batch 120/173] avg loss 0.00147641, throughput 5.67187K wps
[Epoch 181 Batch 150/173] avg loss 0.00151734, throughput 5.24569K wps
Begin Testing...
[Epoch 181] train avg loss 0.00148952, dev acc 0.7904, dev avg loss 0.494234, throughput 5.1317K wps
[Epoch 182 Batch 30/173] avg loss 0.0015096, throughput 5.51181K wps
[Epoch 182 Batch 60/173] avg loss 0.00150754, throughput 5.18069K wps
[Epoch 182 Batch 90/173] avg loss 0.00143901, throughput 5.28653K wps
[Epoch 182 Batch 120/173] avg loss 0.00154556, throughput 5.61009K wps
[Epoch 182 Batch 150/173] avg loss 0.00156642, throughput 4.87668K wps
Begin Testing...
[Epoch 182] train avg loss 0.00152823, dev acc 0.7873, dev avg loss 0.493207, throughput 5.20824K wps
[Epoch 183 Batch 30/173] avg loss 0.00142092, throughput 5.38088K wps
[Epoch 183 Batch 60/173] avg loss 0.00150532, throughput 5.23718K wps
[Epoch 183 Batch 90/173] avg loss 0.00150907, throughput 5.71127K wps
[Epoch 183 Batch 120/173] avg loss 0.00149151, throughput 5.14849K wps
[Epoch 183 Batch 150/173] avg loss 0.00149193, throughput 4.84305K wps
Begin Testing...
[Epoch 183] train avg loss 0.00150314, dev acc 0.7904, dev avg loss 0.49399, throughput 5.25961K wps
[Epoch 184 Batch 30/173] avg loss 0.00147414, throughput 4.86771K wps
[Epoch 184 Batch 60/173] avg loss 0.00128484, throughput 4.56569K wps
[Epoch 184 Batch 90/173] avg loss 0.0015123, throughput 4.9268K wps
[Epoch 184 Batch 120/173] avg loss 0.00146607, throughput 5.09023K wps
[Epoch 184 Batch 150/173] avg loss 0.0015321, throughput 4.79811K wps
Begin Testing...
[Epoch 184] train avg loss 0.00144545, dev acc 0.7904, dev avg loss 0.495258, throughput 4.92499K wps
[Epoch 185 Batch 30/173] avg loss 0.00145842, throughput 4.67663K wps
[Epoch 185 Batch 60/173] avg loss 0.00132742, throughput 4.88336K wps
[Epoch 185 Batch 90/173] avg loss 0.00150623, throughput 4.80758K wps
[Epoch 185 Batch 120/173] avg loss 0.00138298, throughput 5.88592K wps
[Epoch 185 Batch 150/173] avg loss 0.00163385, throughput 5.76765K wps
Begin Testing...
[Epoch 185] train avg loss 0.00148196, dev acc 0.7904, dev avg loss 0.495698, throughput 5.14112K wps
[Epoch 186 Batch 30/173] avg loss 0.00144574, throughput 5.634K wps
[Epoch 186 Batch 60/173] avg loss 0.00141275, throughput 5.13067K wps
[Epoch 186 Batch 90/173] avg loss 0.00138144, throughput 4.8304K wps
[Epoch 186 Batch 120/173] avg loss 0.00121563, throughput 5.32242K wps
[Epoch 186 Batch 150/173] avg loss 0.00147834, throughput 5.92554K wps
Begin Testing...
[Epoch 186] train avg loss 0.00139434, dev acc 0.7904, dev avg loss 0.496578, throughput 5.4692K wps
[Epoch 187 Batch 30/173] avg loss 0.00151333, throughput 4.96013K wps
[Epoch 187 Batch 60/173] avg loss 0.00140384, throughput 5.59162K wps
[Epoch 187 Batch 90/173] avg loss 0.00139392, throughput 4.94709K wps
[Epoch 187 Batch 120/173] avg loss 0.00146163, throughput 5.73993K wps
[Epoch 187 Batch 150/173] avg loss 0.0014525, throughput 6.14537K wps
Begin Testing...
[Epoch 187] train avg loss 0.00146107, dev acc 0.7914, dev avg loss 0.495945, throughput 5.39302K wps
[Epoch 188 Batch 30/173] avg loss 0.00136228, throughput 5.19294K wps
[Epoch 188 Batch 60/173] avg loss 0.00138697, throughput 4.98481K wps
[Epoch 188 Batch 90/173] avg loss 0.00148562, throughput 5.12223K wps
[Epoch 188 Batch 120/173] avg loss 0.00136719, throughput 5.41925K wps
[Epoch 188 Batch 150/173] avg loss 0.00141272, throughput 5.38112K wps
Begin Testing...
[Epoch 188] train avg loss 0.00142552, dev acc 0.7862, dev avg loss 0.501232, throughput 5.29156K wps
[Epoch 189 Batch 30/173] avg loss 0.00157382, throughput 4.98699K wps
[Epoch 189 Batch 60/173] avg loss 0.00135977, throughput 5.09513K wps
[Epoch 189 Batch 90/173] avg loss 0.00141948, throughput 5.42319K wps
[Epoch 189 Batch 120/173] avg loss 0.00147229, throughput 4.71199K wps
[Epoch 189 Batch 150/173] avg loss 0.00149457, throughput 4.73253K wps
Begin Testing...
[Epoch 189] train avg loss 0.00146274, dev acc 0.7894, dev avg loss 0.496211, throughput 5.03777K wps
[Epoch 190 Batch 30/173] avg loss 0.00142664, throughput 5.2255K wps
[Epoch 190 Batch 60/173] avg loss 0.00139587, throughput 4.88795K wps
[Epoch 190 Batch 90/173] avg loss 0.0013957, throughput 4.72308K wps
[Epoch 190 Batch 120/173] avg loss 0.00144984, throughput 6.16207K wps
[Epoch 190 Batch 150/173] avg loss 0.00138465, throughput 6.04145K wps
Begin Testing...
[Epoch 190] train avg loss 0.00142322, dev acc 0.7914, dev avg loss 0.497206, throughput 5.40443K wps
[Epoch 191 Batch 30/173] avg loss 0.00148464, throughput 4.97794K wps
[Epoch 191 Batch 60/173] avg loss 0.00151125, throughput 5.85992K wps
[Epoch 191 Batch 90/173] avg loss 0.00133297, throughput 5.03203K wps
[Epoch 191 Batch 120/173] avg loss 0.00124756, throughput 5.21473K wps
[Epoch 191 Batch 150/173] avg loss 0.00153948, throughput 5.03789K wps
Begin Testing...
[Epoch 191] train avg loss 0.00140768, dev acc 0.7914, dev avg loss 0.499206, throughput 5.1872K wps
[Epoch 192 Batch 30/173] avg loss 0.00127198, throughput 5.64561K wps
[Epoch 192 Batch 60/173] avg loss 0.00139169, throughput 5.55807K wps
[Epoch 192 Batch 90/173] avg loss 0.00141534, throughput 4.94335K wps
[Epoch 192 Batch 120/173] avg loss 0.00146305, throughput 5.08772K wps
[Epoch 192 Batch 150/173] avg loss 0.00154015, throughput 5.47672K wps
Begin Testing...
[Epoch 192] train avg loss 0.00140529, dev acc 0.7873, dev avg loss 0.501424, throughput 5.27371K wps
[Epoch 193 Batch 30/173] avg loss 0.0014091, throughput 4.77013K wps
[Epoch 193 Batch 60/173] avg loss 0.00129251, throughput 5.3874K wps
[Epoch 193 Batch 90/173] avg loss 0.00132752, throughput 5.35082K wps
[Epoch 193 Batch 120/173] avg loss 0.00137584, throughput 5.09521K wps
[Epoch 193 Batch 150/173] avg loss 0.00138068, throughput 4.91148K wps
Begin Testing...
[Epoch 193] train avg loss 0.00136304, dev acc 0.7904, dev avg loss 0.501222, throughput 5.06503K wps
[Epoch 194 Batch 30/173] avg loss 0.00143336, throughput 5.75131K wps
[Epoch 194 Batch 60/173] avg loss 0.00138451, throughput 5.35962K wps
[Epoch 194 Batch 90/173] avg loss 0.00138852, throughput 5.24507K wps
[Epoch 194 Batch 120/173] avg loss 0.00134114, throughput 6.19926K wps
[Epoch 194 Batch 150/173] avg loss 0.00129074, throughput 5.79625K wps
Begin Testing...
[Epoch 194] train avg loss 0.00138028, dev acc 0.7873, dev avg loss 0.500179, throughput 5.52237K wps
[Epoch 195 Batch 30/173] avg loss 0.00141736, throughput 5.38796K wps
[Epoch 195 Batch 60/173] avg loss 0.00120708, throughput 4.8874K wps
[Epoch 195 Batch 90/173] avg loss 0.00129056, throughput 5.50893K wps
[Epoch 195 Batch 120/173] avg loss 0.00130805, throughput 4.74487K wps
[Epoch 195 Batch 150/173] avg loss 0.00140359, throughput 5.2637K wps
Begin Testing...
[Epoch 195] train avg loss 0.0013254, dev acc 0.7852, dev avg loss 0.50248, throughput 5.11045K wps
[Epoch 196 Batch 30/173] avg loss 0.00135597, throughput 5.22139K wps
[Epoch 196 Batch 60/173] avg loss 0.00122284, throughput 5.44118K wps
[Epoch 196 Batch 90/173] avg loss 0.00132517, throughput 5.37391K wps
[Epoch 196 Batch 120/173] avg loss 0.00123162, throughput 4.92886K wps
[Epoch 196 Batch 150/173] avg loss 0.0012398, throughput 5.91455K wps
Begin Testing...
[Epoch 196] train avg loss 0.00129302, dev acc 0.7883, dev avg loss 0.501642, throughput 5.39359K wps
[Epoch 197 Batch 30/173] avg loss 0.00129585, throughput 5.46769K wps
[Epoch 197 Batch 60/173] avg loss 0.00148343, throughput 6.31409K wps
[Epoch 197 Batch 90/173] avg loss 0.00139763, throughput 5.04831K wps
[Epoch 197 Batch 120/173] avg loss 0.00131962, throughput 4.97558K wps
[Epoch 197 Batch 150/173] avg loss 0.00128082, throughput 4.88794K wps
Begin Testing...
[Epoch 197] train avg loss 0.00134883, dev acc 0.7894, dev avg loss 0.501132, throughput 5.34013K wps
[Epoch 198 Batch 30/173] avg loss 0.0014107, throughput 4.6305K wps
[Epoch 198 Batch 60/173] avg loss 0.00133528, throughput 5.50891K wps
[Epoch 198 Batch 90/173] avg loss 0.00140131, throughput 4.88918K wps
[Epoch 198 Batch 120/173] avg loss 0.00135433, throughput 4.97312K wps
[Epoch 198 Batch 150/173] avg loss 0.00137638, throughput 4.9623K wps
Begin Testing...
[Epoch 198] train avg loss 0.00135933, dev acc 0.7883, dev avg loss 0.502221, throughput 5.07803K wps
[Epoch 199 Batch 30/173] avg loss 0.00125453, throughput 4.66949K wps
[Epoch 199 Batch 60/173] avg loss 0.00134418, throughput 5.54K wps
[Epoch 199 Batch 90/173] avg loss 0.00135582, throughput 5.78392K wps
[Epoch 199 Batch 120/173] avg loss 0.0012754, throughput 4.84811K wps
[Epoch 199 Batch 150/173] avg loss 0.00128034, throughput 5.46153K wps
Begin Testing...
[Epoch 199] train avg loss 0.00129678, dev acc 0.7904, dev avg loss 0.503075, throughput 5.20598K wps
Test loss 0.469882, test acc 0.7889
Total time cost 399.80s
[Epoch 0 Batch 30/173] avg loss 0.0140586, throughput 3.88579K wps
[Epoch 0 Batch 60/173] avg loss 0.013934, throughput 5.02787K wps
[Epoch 0 Batch 90/173] avg loss 0.013912, throughput 4.85584K wps
[Epoch 0 Batch 120/173] avg loss 0.0140338, throughput 5.56202K wps
[Epoch 0 Batch 150/173] avg loss 0.0140049, throughput 5.48728K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139966, dev acc 0.6017, dev avg loss 0.684788, throughput 4.93265K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.0138269, throughput 4.71027K wps
[Epoch 1 Batch 60/173] avg loss 0.0136925, throughput 5.70288K wps
[Epoch 1 Batch 90/173] avg loss 0.0137396, throughput 5.12746K wps
[Epoch 1 Batch 120/173] avg loss 0.0138218, throughput 5.3402K wps
[Epoch 1 Batch 150/173] avg loss 0.0137501, throughput 5.6031K wps
Begin Testing...
[Epoch 1] train avg loss 0.0137882, dev acc 0.6621, dev avg loss 0.67851, throughput 5.25247K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.0136063, throughput 5.21762K wps
[Epoch 2 Batch 60/173] avg loss 0.0136196, throughput 5.27488K wps
[Epoch 2 Batch 90/173] avg loss 0.0135832, throughput 5.45832K wps
[Epoch 2 Batch 120/173] avg loss 0.0135932, throughput 5.00123K wps
[Epoch 2 Batch 150/173] avg loss 0.0135684, throughput 4.72994K wps
Begin Testing...
[Epoch 2] train avg loss 0.0136233, dev acc 0.6830, dev avg loss 0.671309, throughput 5.15505K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0135259, throughput 5.17073K wps
[Epoch 3 Batch 60/173] avg loss 0.0136319, throughput 5.60009K wps
[Epoch 3 Batch 90/173] avg loss 0.0136164, throughput 5.43026K wps
[Epoch 3 Batch 120/173] avg loss 0.0135117, throughput 5.55196K wps
[Epoch 3 Batch 150/173] avg loss 0.0135365, throughput 4.88624K wps
Begin Testing...
[Epoch 3] train avg loss 0.0135711, dev acc 0.7080, dev avg loss 0.665264, throughput 5.33303K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0133745, throughput 5.38086K wps
[Epoch 4 Batch 60/173] avg loss 0.0133274, throughput 5.67821K wps
[Epoch 4 Batch 90/173] avg loss 0.0133739, throughput 5.62901K wps
[Epoch 4 Batch 120/173] avg loss 0.0133126, throughput 4.69985K wps
[Epoch 4 Batch 150/173] avg loss 0.0132432, throughput 5.0404K wps
Begin Testing...
[Epoch 4] train avg loss 0.0133545, dev acc 0.7101, dev avg loss 0.657999, throughput 5.25583K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0133285, throughput 5.01555K wps
[Epoch 5 Batch 60/173] avg loss 0.0133324, throughput 5.247K wps
[Epoch 5 Batch 90/173] avg loss 0.0132613, throughput 5.27009K wps
[Epoch 5 Batch 120/173] avg loss 0.0132524, throughput 5.20772K wps
[Epoch 5 Batch 150/173] avg loss 0.01312, throughput 4.90124K wps
Begin Testing...
[Epoch 5] train avg loss 0.0132625, dev acc 0.7299, dev avg loss 0.651195, throughput 5.18278K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0132208, throughput 5.47547K wps
[Epoch 6 Batch 60/173] avg loss 0.0129931, throughput 5.61068K wps
[Epoch 6 Batch 90/173] avg loss 0.0130813, throughput 5.17065K wps
[Epoch 6 Batch 120/173] avg loss 0.0130641, throughput 5.57837K wps
[Epoch 6 Batch 150/173] avg loss 0.0130764, throughput 5.32916K wps
Begin Testing...
[Epoch 6] train avg loss 0.0130866, dev acc 0.7143, dev avg loss 0.641998, throughput 5.31914K wps
[Epoch 7 Batch 30/173] avg loss 0.0129337, throughput 4.72618K wps
[Epoch 7 Batch 60/173] avg loss 0.0129198, throughput 4.72376K wps
[Epoch 7 Batch 90/173] avg loss 0.0129947, throughput 4.7591K wps
[Epoch 7 Batch 120/173] avg loss 0.0127925, throughput 4.88938K wps
[Epoch 7 Batch 150/173] avg loss 0.0128652, throughput 5.38104K wps
Begin Testing...
[Epoch 7] train avg loss 0.0129073, dev acc 0.7205, dev avg loss 0.633089, throughput 4.95043K wps
[Epoch 8 Batch 30/173] avg loss 0.0128759, throughput 4.95275K wps
[Epoch 8 Batch 60/173] avg loss 0.0127281, throughput 5.58483K wps
[Epoch 8 Batch 90/173] avg loss 0.0127839, throughput 4.90097K wps
[Epoch 8 Batch 120/173] avg loss 0.0126579, throughput 5.23735K wps
[Epoch 8 Batch 150/173] avg loss 0.0128227, throughput 5.54647K wps
Begin Testing...
[Epoch 8] train avg loss 0.0127593, dev acc 0.7310, dev avg loss 0.624041, throughput 5.26653K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/173] avg loss 0.0125646, throughput 5.02357K wps
[Epoch 9 Batch 60/173] avg loss 0.0125996, throughput 5.50479K wps
[Epoch 9 Batch 90/173] avg loss 0.0125471, throughput 5.03643K wps
[Epoch 9 Batch 120/173] avg loss 0.0125356, throughput 5.243K wps
[Epoch 9 Batch 150/173] avg loss 0.0123892, throughput 4.86933K wps
Begin Testing...
[Epoch 9] train avg loss 0.0125233, dev acc 0.7320, dev avg loss 0.613541, throughput 5.12792K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.012468, throughput 5.00753K wps
[Epoch 10 Batch 60/173] avg loss 0.0124327, throughput 5.07999K wps
[Epoch 10 Batch 90/173] avg loss 0.012371, throughput 5.85496K wps
[Epoch 10 Batch 120/173] avg loss 0.0122125, throughput 4.94795K wps
[Epoch 10 Batch 150/173] avg loss 0.0122487, throughput 5.79321K wps
Begin Testing...
[Epoch 10] train avg loss 0.0123423, dev acc 0.7487, dev avg loss 0.606149, throughput 5.33231K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/173] avg loss 0.0121596, throughput 5.5059K wps
[Epoch 11 Batch 60/173] avg loss 0.0120974, throughput 5.71734K wps
[Epoch 11 Batch 90/173] avg loss 0.0120786, throughput 5.4912K wps
[Epoch 11 Batch 120/173] avg loss 0.0120925, throughput 4.91071K wps
[Epoch 11 Batch 150/173] avg loss 0.0119163, throughput 5.21886K wps
Begin Testing...
[Epoch 11] train avg loss 0.0120666, dev acc 0.7289, dev avg loss 0.591472, throughput 5.34944K wps
[Epoch 12 Batch 30/173] avg loss 0.0120128, throughput 5.43224K wps
[Epoch 12 Batch 60/173] avg loss 0.011931, throughput 4.85779K wps
[Epoch 12 Batch 90/173] avg loss 0.0118812, throughput 5.17437K wps
[Epoch 12 Batch 120/173] avg loss 0.0119401, throughput 4.87427K wps
[Epoch 12 Batch 150/173] avg loss 0.0117199, throughput 5.03868K wps
Begin Testing...
[Epoch 12] train avg loss 0.0119036, dev acc 0.7383, dev avg loss 0.581288, throughput 5.1018K wps
[Epoch 13 Batch 30/173] avg loss 0.0119869, throughput 4.85395K wps
[Epoch 13 Batch 60/173] avg loss 0.0116182, throughput 5.62799K wps
[Epoch 13 Batch 90/173] avg loss 0.0116256, throughput 5.71336K wps
[Epoch 13 Batch 120/173] avg loss 0.0115817, throughput 4.80955K wps
[Epoch 13 Batch 150/173] avg loss 0.0115558, throughput 5.19092K wps
Begin Testing...
[Epoch 13] train avg loss 0.0116738, dev acc 0.7518, dev avg loss 0.568985, throughput 5.20034K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/173] avg loss 0.0115569, throughput 5.0471K wps
[Epoch 14 Batch 60/173] avg loss 0.0115203, throughput 5.85587K wps
[Epoch 14 Batch 90/173] avg loss 0.0116968, throughput 4.91599K wps
[Epoch 14 Batch 120/173] avg loss 0.0114233, throughput 5.33515K wps
[Epoch 14 Batch 150/173] avg loss 0.0113907, throughput 5.29246K wps
Begin Testing...
[Epoch 14] train avg loss 0.0115056, dev acc 0.7529, dev avg loss 0.558287, throughput 5.17397K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.0114035, throughput 5.3559K wps
[Epoch 15 Batch 60/173] avg loss 0.0113116, throughput 5.50845K wps
[Epoch 15 Batch 90/173] avg loss 0.0112548, throughput 4.75537K wps
[Epoch 15 Batch 120/173] avg loss 0.0111001, throughput 4.80646K wps
[Epoch 15 Batch 150/173] avg loss 0.0111463, throughput 5.26502K wps
Begin Testing...
[Epoch 15] train avg loss 0.0112559, dev acc 0.7602, dev avg loss 0.549788, throughput 5.08191K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0110976, throughput 5.71335K wps
[Epoch 16 Batch 60/173] avg loss 0.0112242, throughput 4.91292K wps
[Epoch 16 Batch 90/173] avg loss 0.0110543, throughput 4.66587K wps
[Epoch 16 Batch 120/173] avg loss 0.0110064, throughput 4.34696K wps
[Epoch 16 Batch 150/173] avg loss 0.0106875, throughput 5.92462K wps
Begin Testing...
[Epoch 16] train avg loss 0.0110081, dev acc 0.7581, dev avg loss 0.537438, throughput 5.0924K wps
[Epoch 17 Batch 30/173] avg loss 0.0108465, throughput 5.38607K wps
[Epoch 17 Batch 60/173] avg loss 0.0106593, throughput 4.88735K wps
[Epoch 17 Batch 90/173] avg loss 0.0109629, throughput 5.53744K wps
[Epoch 17 Batch 120/173] avg loss 0.010815, throughput 5.01463K wps
[Epoch 17 Batch 150/173] avg loss 0.0109066, throughput 4.83227K wps
Begin Testing...
[Epoch 17] train avg loss 0.0108423, dev acc 0.7654, dev avg loss 0.52869, throughput 5.12907K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.0105802, throughput 5.23826K wps
[Epoch 18 Batch 60/173] avg loss 0.0104547, throughput 4.777K wps
[Epoch 18 Batch 90/173] avg loss 0.0108038, throughput 5.08049K wps
[Epoch 18 Batch 120/173] avg loss 0.0105965, throughput 4.91789K wps
[Epoch 18 Batch 150/173] avg loss 0.0103903, throughput 5.91322K wps
Begin Testing...
[Epoch 18] train avg loss 0.010588, dev acc 0.7643, dev avg loss 0.519098, throughput 5.11442K wps
[Epoch 19 Batch 30/173] avg loss 0.0103183, throughput 5.55062K wps
[Epoch 19 Batch 60/173] avg loss 0.010284, throughput 5.75838K wps
[Epoch 19 Batch 90/173] avg loss 0.0106871, throughput 5.67453K wps
[Epoch 19 Batch 120/173] avg loss 0.0102023, throughput 5.22049K wps
[Epoch 19 Batch 150/173] avg loss 0.0104473, throughput 5.92781K wps
Begin Testing...
[Epoch 19] train avg loss 0.0104263, dev acc 0.7737, dev avg loss 0.511827, throughput 5.50009K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/173] avg loss 0.0103162, throughput 5.38494K wps
[Epoch 20 Batch 60/173] avg loss 0.0102106, throughput 4.83226K wps
[Epoch 20 Batch 90/173] avg loss 0.0101995, throughput 5.60723K wps
[Epoch 20 Batch 120/173] avg loss 0.0102053, throughput 4.93551K wps
[Epoch 20 Batch 150/173] avg loss 0.0102358, throughput 4.92966K wps
Begin Testing...
[Epoch 20] train avg loss 0.0102674, dev acc 0.7675, dev avg loss 0.50761, throughput 5.18967K wps
[Epoch 21 Batch 30/173] avg loss 0.0103267, throughput 4.68504K wps
[Epoch 21 Batch 60/173] avg loss 0.010061, throughput 5.1697K wps
[Epoch 21 Batch 90/173] avg loss 0.00996003, throughput 4.91712K wps
[Epoch 21 Batch 120/173] avg loss 0.0103791, throughput 4.68066K wps
[Epoch 21 Batch 150/173] avg loss 0.00996477, throughput 5.48052K wps
Begin Testing...
[Epoch 21] train avg loss 0.0101162, dev acc 0.7716, dev avg loss 0.497124, throughput 5.07989K wps
[Epoch 22 Batch 30/173] avg loss 0.010043, throughput 5.08041K wps
[Epoch 22 Batch 60/173] avg loss 0.00999561, throughput 5.44799K wps
[Epoch 22 Batch 90/173] avg loss 0.0095195, throughput 5.67226K wps
[Epoch 22 Batch 120/173] avg loss 0.0100921, throughput 5.01831K wps
[Epoch 22 Batch 150/173] avg loss 0.00977776, throughput 5.26404K wps
Begin Testing...
[Epoch 22] train avg loss 0.00991936, dev acc 0.7810, dev avg loss 0.490197, throughput 5.29767K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/173] avg loss 0.0099584, throughput 5.20819K wps
[Epoch 23 Batch 60/173] avg loss 0.00981912, throughput 4.94822K wps
[Epoch 23 Batch 90/173] avg loss 0.00963004, throughput 4.76273K wps
[Epoch 23 Batch 120/173] avg loss 0.0100161, throughput 4.91083K wps
[Epoch 23 Batch 150/173] avg loss 0.00971886, throughput 5.11182K wps
Begin Testing...
[Epoch 23] train avg loss 0.0098345, dev acc 0.7758, dev avg loss 0.483945, throughput 4.94103K wps
[Epoch 24 Batch 30/173] avg loss 0.00978519, throughput 5.99617K wps
[Epoch 24 Batch 60/173] avg loss 0.00964579, throughput 5.71102K wps
[Epoch 24 Batch 90/173] avg loss 0.00961474, throughput 5.80258K wps
[Epoch 24 Batch 120/173] avg loss 0.00963247, throughput 5.16691K wps
[Epoch 24 Batch 150/173] avg loss 0.00955558, throughput 4.81122K wps
Begin Testing...
[Epoch 24] train avg loss 0.00967804, dev acc 0.7831, dev avg loss 0.479703, throughput 5.58728K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/173] avg loss 0.00960713, throughput 6.45983K wps
[Epoch 25 Batch 60/173] avg loss 0.00942306, throughput 4.86271K wps
[Epoch 25 Batch 90/173] avg loss 0.00961391, throughput 4.87307K wps
[Epoch 25 Batch 120/173] avg loss 0.00929568, throughput 5.19089K wps
[Epoch 25 Batch 150/173] avg loss 0.00959097, throughput 5.62814K wps
Begin Testing...
[Epoch 25] train avg loss 0.00952938, dev acc 0.7800, dev avg loss 0.474216, throughput 5.27088K wps
[Epoch 26 Batch 30/173] avg loss 0.00909813, throughput 5.1572K wps
[Epoch 26 Batch 60/173] avg loss 0.00919613, throughput 4.80995K wps
[Epoch 26 Batch 90/173] avg loss 0.0096079, throughput 4.79839K wps
[Epoch 26 Batch 120/173] avg loss 0.00948492, throughput 4.66339K wps
[Epoch 26 Batch 150/173] avg loss 0.00931523, throughput 5.43563K wps
Begin Testing...
[Epoch 26] train avg loss 0.00936931, dev acc 0.7862, dev avg loss 0.470148, throughput 4.96199K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/173] avg loss 0.00961964, throughput 5.76802K wps
[Epoch 27 Batch 60/173] avg loss 0.00945257, throughput 4.77338K wps
[Epoch 27 Batch 90/173] avg loss 0.00900848, throughput 5.15047K wps
[Epoch 27 Batch 120/173] avg loss 0.00914924, throughput 5.08804K wps
[Epoch 27 Batch 150/173] avg loss 0.00899217, throughput 5.72755K wps
Begin Testing...
[Epoch 27] train avg loss 0.00926157, dev acc 0.7883, dev avg loss 0.465137, throughput 5.28288K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/173] avg loss 0.00921545, throughput 4.65378K wps
[Epoch 28 Batch 60/173] avg loss 0.00923506, throughput 5.48921K wps
[Epoch 28 Batch 90/173] avg loss 0.00878058, throughput 5.2185K wps
[Epoch 28 Batch 120/173] avg loss 0.00909752, throughput 5.567K wps
[Epoch 28 Batch 150/173] avg loss 0.00923705, throughput 5.3879K wps
Begin Testing...
[Epoch 28] train avg loss 0.00909899, dev acc 0.7904, dev avg loss 0.46193, throughput 5.33483K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/173] avg loss 0.00931424, throughput 4.70776K wps
[Epoch 29 Batch 60/173] avg loss 0.00902111, throughput 5.82462K wps
[Epoch 29 Batch 90/173] avg loss 0.00891693, throughput 5.34533K wps
[Epoch 29 Batch 120/173] avg loss 0.00877207, throughput 5.06748K wps
[Epoch 29 Batch 150/173] avg loss 0.00925939, throughput 5.61989K wps
Begin Testing...
[Epoch 29] train avg loss 0.00906635, dev acc 0.7914, dev avg loss 0.458622, throughput 5.23859K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/173] avg loss 0.0086899, throughput 5.48415K wps
[Epoch 30 Batch 60/173] avg loss 0.00913082, throughput 5.8659K wps
[Epoch 30 Batch 90/173] avg loss 0.00878078, throughput 4.95104K wps
[Epoch 30 Batch 120/173] avg loss 0.00888219, throughput 5.05059K wps
[Epoch 30 Batch 150/173] avg loss 0.0088361, throughput 5.94933K wps
Begin Testing...
[Epoch 30] train avg loss 0.00890191, dev acc 0.7883, dev avg loss 0.455065, throughput 5.4462K wps
[Epoch 31 Batch 30/173] avg loss 0.00894814, throughput 4.74236K wps
[Epoch 31 Batch 60/173] avg loss 0.00883644, throughput 4.88976K wps
[Epoch 31 Batch 90/173] avg loss 0.00894062, throughput 5.3809K wps
[Epoch 31 Batch 120/173] avg loss 0.00857372, throughput 5.50222K wps
[Epoch 31 Batch 150/173] avg loss 0.00878907, throughput 5.22329K wps
Begin Testing...
[Epoch 31] train avg loss 0.00887112, dev acc 0.7967, dev avg loss 0.452221, throughput 5.1794K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/173] avg loss 0.00869843, throughput 5.25769K wps
[Epoch 32 Batch 60/173] avg loss 0.00884448, throughput 4.90817K wps
[Epoch 32 Batch 90/173] avg loss 0.00864945, throughput 4.88068K wps
[Epoch 32 Batch 120/173] avg loss 0.00885973, throughput 5.04655K wps
[Epoch 32 Batch 150/173] avg loss 0.00838932, throughput 5.08049K wps
Begin Testing...
[Epoch 32] train avg loss 0.00871012, dev acc 0.7946, dev avg loss 0.45009, throughput 4.98339K wps
[Epoch 33 Batch 30/173] avg loss 0.00876481, throughput 4.89964K wps
[Epoch 33 Batch 60/173] avg loss 0.00856971, throughput 4.99843K wps
[Epoch 33 Batch 90/173] avg loss 0.00851435, throughput 4.93371K wps
[Epoch 33 Batch 120/173] avg loss 0.00855037, throughput 4.86687K wps
[Epoch 33 Batch 150/173] avg loss 0.00909858, throughput 5.8606K wps
Begin Testing...
[Epoch 33] train avg loss 0.00868561, dev acc 0.7977, dev avg loss 0.448111, throughput 5.04304K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/173] avg loss 0.00864706, throughput 5.18756K wps
[Epoch 34 Batch 60/173] avg loss 0.00874841, throughput 5.01702K wps
[Epoch 34 Batch 90/173] avg loss 0.00841689, throughput 4.67602K wps
[Epoch 34 Batch 120/173] avg loss 0.00809942, throughput 5.02828K wps
[Epoch 34 Batch 150/173] avg loss 0.00862346, throughput 5.50812K wps
Begin Testing...
[Epoch 34] train avg loss 0.00846259, dev acc 0.7987, dev avg loss 0.444365, throughput 5.21288K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/173] avg loss 0.00805275, throughput 5.02854K wps
[Epoch 35 Batch 60/173] avg loss 0.00831751, throughput 5.1232K wps
[Epoch 35 Batch 90/173] avg loss 0.00856819, throughput 4.67403K wps
[Epoch 35 Batch 120/173] avg loss 0.00862743, throughput 4.86709K wps
[Epoch 35 Batch 150/173] avg loss 0.0082775, throughput 5.07115K wps
Begin Testing...
[Epoch 35] train avg loss 0.00836274, dev acc 0.7904, dev avg loss 0.441889, throughput 4.95248K wps
[Epoch 36 Batch 30/173] avg loss 0.00826347, throughput 4.84684K wps
[Epoch 36 Batch 60/173] avg loss 0.00831858, throughput 4.6165K wps
[Epoch 36 Batch 90/173] avg loss 0.00801795, throughput 5.22734K wps
[Epoch 36 Batch 120/173] avg loss 0.00864953, throughput 4.84102K wps
[Epoch 36 Batch 150/173] avg loss 0.00830714, throughput 4.96935K wps
Begin Testing...
[Epoch 36] train avg loss 0.00828664, dev acc 0.7946, dev avg loss 0.439728, throughput 4.8791K wps
[Epoch 37 Batch 30/173] avg loss 0.00801785, throughput 5.34702K wps
[Epoch 37 Batch 60/173] avg loss 0.008303, throughput 5.64541K wps
[Epoch 37 Batch 90/173] avg loss 0.00824211, throughput 4.93028K wps
[Epoch 37 Batch 120/173] avg loss 0.00816144, throughput 5.44315K wps
[Epoch 37 Batch 150/173] avg loss 0.00815955, throughput 5.48249K wps
Begin Testing...
[Epoch 37] train avg loss 0.00821306, dev acc 0.7946, dev avg loss 0.437912, throughput 5.28028K wps
[Epoch 38 Batch 30/173] avg loss 0.00785935, throughput 5.03279K wps
[Epoch 38 Batch 60/173] avg loss 0.00813963, throughput 5.85171K wps
[Epoch 38 Batch 90/173] avg loss 0.00834454, throughput 5.00892K wps
[Epoch 38 Batch 120/173] avg loss 0.0076675, throughput 4.96353K wps
[Epoch 38 Batch 150/173] avg loss 0.00817517, throughput 4.65175K wps
Begin Testing...
[Epoch 38] train avg loss 0.00806916, dev acc 0.7946, dev avg loss 0.435947, throughput 5.09298K wps
[Epoch 39 Batch 30/173] avg loss 0.00823881, throughput 5.43435K wps
[Epoch 39 Batch 60/173] avg loss 0.00811594, throughput 5.16287K wps
[Epoch 39 Batch 90/173] avg loss 0.00790299, throughput 5.32504K wps
[Epoch 39 Batch 120/173] avg loss 0.00782331, throughput 5.82589K wps
[Epoch 39 Batch 150/173] avg loss 0.00805655, throughput 5.24756K wps
Begin Testing...
[Epoch 39] train avg loss 0.00802787, dev acc 0.7977, dev avg loss 0.437388, throughput 5.31852K wps
[Epoch 40 Batch 30/173] avg loss 0.00804481, throughput 6.04841K wps
[Epoch 40 Batch 60/173] avg loss 0.00798445, throughput 5.40849K wps
[Epoch 40 Batch 90/173] avg loss 0.00769797, throughput 5.11187K wps
[Epoch 40 Batch 120/173] avg loss 0.00803823, throughput 5.11961K wps
[Epoch 40 Batch 150/173] avg loss 0.00792089, throughput 4.74709K wps
Begin Testing...
[Epoch 40] train avg loss 0.00800591, dev acc 0.7967, dev avg loss 0.435408, throughput 5.31897K wps
[Epoch 41 Batch 30/173] avg loss 0.00812699, throughput 4.97671K wps
[Epoch 41 Batch 60/173] avg loss 0.00768258, throughput 5.04138K wps
[Epoch 41 Batch 90/173] avg loss 0.00793207, throughput 5.09112K wps
[Epoch 41 Batch 120/173] avg loss 0.00770839, throughput 4.78948K wps
[Epoch 41 Batch 150/173] avg loss 0.0080235, throughput 4.67934K wps
Begin Testing...
[Epoch 41] train avg loss 0.00786058, dev acc 0.7987, dev avg loss 0.432155, throughput 5.01954K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/173] avg loss 0.00765242, throughput 5.27157K wps
[Epoch 42 Batch 60/173] avg loss 0.0078861, throughput 5.99651K wps
[Epoch 42 Batch 90/173] avg loss 0.00759772, throughput 4.99222K wps
[Epoch 42 Batch 120/173] avg loss 0.00768952, throughput 5.83119K wps
[Epoch 42 Batch 150/173] avg loss 0.00811078, throughput 5.54798K wps
Begin Testing...
[Epoch 42] train avg loss 0.00782917, dev acc 0.8040, dev avg loss 0.430345, throughput 5.43508K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/173] avg loss 0.00768593, throughput 4.89655K wps
[Epoch 43 Batch 60/173] avg loss 0.0076127, throughput 5.1767K wps
[Epoch 43 Batch 90/173] avg loss 0.00778898, throughput 5.60574K wps
[Epoch 43 Batch 120/173] avg loss 0.00759313, throughput 4.92114K wps
[Epoch 43 Batch 150/173] avg loss 0.00798824, throughput 5.12003K wps
Begin Testing...
[Epoch 43] train avg loss 0.0077095, dev acc 0.7987, dev avg loss 0.430265, throughput 5.20565K wps
[Epoch 44 Batch 30/173] avg loss 0.00781521, throughput 5.48711K wps
[Epoch 44 Batch 60/173] avg loss 0.00729536, throughput 4.9326K wps
[Epoch 44 Batch 90/173] avg loss 0.00745277, throughput 5.36089K wps
[Epoch 44 Batch 120/173] avg loss 0.00773338, throughput 6.18321K wps
[Epoch 44 Batch 150/173] avg loss 0.00760004, throughput 4.90003K wps
Begin Testing...
[Epoch 44] train avg loss 0.00761826, dev acc 0.7977, dev avg loss 0.428244, throughput 5.25094K wps
[Epoch 45 Batch 30/173] avg loss 0.00725829, throughput 4.89527K wps
[Epoch 45 Batch 60/173] avg loss 0.00732572, throughput 5.17936K wps
[Epoch 45 Batch 90/173] avg loss 0.00748816, throughput 5.36552K wps
[Epoch 45 Batch 120/173] avg loss 0.007618, throughput 4.72289K wps
[Epoch 45 Batch 150/173] avg loss 0.00747405, throughput 5.31882K wps
Begin Testing...
[Epoch 45] train avg loss 0.00749887, dev acc 0.7977, dev avg loss 0.428609, throughput 5.08029K wps
[Epoch 46 Batch 30/173] avg loss 0.00712639, throughput 4.66599K wps
[Epoch 46 Batch 60/173] avg loss 0.00739807, throughput 5.21477K wps
[Epoch 46 Batch 90/173] avg loss 0.00717252, throughput 5.16193K wps
[Epoch 46 Batch 120/173] avg loss 0.00728031, throughput 5.35842K wps
[Epoch 46 Batch 150/173] avg loss 0.00756, throughput 4.89722K wps
Begin Testing...
[Epoch 46] train avg loss 0.00739783, dev acc 0.7977, dev avg loss 0.425772, throughput 5.08991K wps
[Epoch 47 Batch 30/173] avg loss 0.00747688, throughput 5.33395K wps
[Epoch 47 Batch 60/173] avg loss 0.00758194, throughput 4.90823K wps
[Epoch 47 Batch 90/173] avg loss 0.00700422, throughput 4.90424K wps
[Epoch 47 Batch 120/173] avg loss 0.00725569, throughput 5.61665K wps
[Epoch 47 Batch 150/173] avg loss 0.00728224, throughput 4.83905K wps
Begin Testing...
[Epoch 47] train avg loss 0.00738465, dev acc 0.8040, dev avg loss 0.424136, throughput 5.07185K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/173] avg loss 0.00691279, throughput 4.7873K wps
[Epoch 48 Batch 60/173] avg loss 0.00777543, throughput 4.89708K wps
[Epoch 48 Batch 90/173] avg loss 0.00721745, throughput 5.26748K wps
[Epoch 48 Batch 120/173] avg loss 0.00734635, throughput 6.22835K wps
[Epoch 48 Batch 150/173] avg loss 0.00703968, throughput 6.00073K wps
Begin Testing...
[Epoch 48] train avg loss 0.00726761, dev acc 0.8029, dev avg loss 0.424559, throughput 5.33017K wps
[Epoch 49 Batch 30/173] avg loss 0.00742553, throughput 4.88702K wps
[Epoch 49 Batch 60/173] avg loss 0.00730257, throughput 5.07596K wps
[Epoch 49 Batch 90/173] avg loss 0.00685439, throughput 5.02817K wps
[Epoch 49 Batch 120/173] avg loss 0.00718244, throughput 5.79927K wps
[Epoch 49 Batch 150/173] avg loss 0.00715945, throughput 4.98273K wps
Begin Testing...
[Epoch 49] train avg loss 0.00719479, dev acc 0.8040, dev avg loss 0.42282, throughput 5.17337K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/173] avg loss 0.00718468, throughput 5.02347K wps
[Epoch 50 Batch 60/173] avg loss 0.00729273, throughput 5.00881K wps
[Epoch 50 Batch 90/173] avg loss 0.00707002, throughput 4.91295K wps
[Epoch 50 Batch 120/173] avg loss 0.00705581, throughput 4.9649K wps
[Epoch 50 Batch 150/173] avg loss 0.00688658, throughput 5.22326K wps
Begin Testing...
[Epoch 50] train avg loss 0.00711918, dev acc 0.7977, dev avg loss 0.422929, throughput 4.98619K wps
[Epoch 51 Batch 30/173] avg loss 0.00690661, throughput 4.72523K wps
[Epoch 51 Batch 60/173] avg loss 0.00705705, throughput 5.40151K wps
[Epoch 51 Batch 90/173] avg loss 0.00708993, throughput 5.14457K wps
[Epoch 51 Batch 120/173] avg loss 0.00709049, throughput 5.89903K wps
[Epoch 51 Batch 150/173] avg loss 0.00695126, throughput 4.74361K wps
Begin Testing...
[Epoch 51] train avg loss 0.00700396, dev acc 0.8008, dev avg loss 0.421482, throughput 5.1417K wps
[Epoch 52 Batch 30/173] avg loss 0.00696771, throughput 5.21946K wps
[Epoch 52 Batch 60/173] avg loss 0.00643676, throughput 6.13619K wps
[Epoch 52 Batch 90/173] avg loss 0.00712525, throughput 5.26432K wps
[Epoch 52 Batch 120/173] avg loss 0.0072403, throughput 4.671K wps
[Epoch 52 Batch 150/173] avg loss 0.00682497, throughput 5.74812K wps
Begin Testing...
[Epoch 52] train avg loss 0.00695237, dev acc 0.8040, dev avg loss 0.421677, throughput 5.4039K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/173] avg loss 0.00695103, throughput 5.06246K wps
[Epoch 53 Batch 60/173] avg loss 0.00654451, throughput 5.73374K wps
[Epoch 53 Batch 90/173] avg loss 0.00688129, throughput 5.31048K wps
[Epoch 53 Batch 120/173] avg loss 0.00717746, throughput 5.27757K wps
[Epoch 53 Batch 150/173] avg loss 0.00669783, throughput 4.97637K wps
Begin Testing...
[Epoch 53] train avg loss 0.00687815, dev acc 0.8040, dev avg loss 0.419378, throughput 5.24653K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/173] avg loss 0.00683989, throughput 6.00315K wps
[Epoch 54 Batch 60/173] avg loss 0.00684522, throughput 5.0296K wps
[Epoch 54 Batch 90/173] avg loss 0.00699136, throughput 5.5893K wps
[Epoch 54 Batch 120/173] avg loss 0.00674094, throughput 5.29291K wps
[Epoch 54 Batch 150/173] avg loss 0.00674864, throughput 5.207K wps
Begin Testing...
[Epoch 54] train avg loss 0.00681725, dev acc 0.8050, dev avg loss 0.417951, throughput 5.35923K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/173] avg loss 0.00696895, throughput 5.16942K wps
[Epoch 55 Batch 60/173] avg loss 0.00674957, throughput 5.10182K wps
[Epoch 55 Batch 90/173] avg loss 0.00682889, throughput 5.11151K wps
[Epoch 55 Batch 120/173] avg loss 0.006453, throughput 4.90854K wps
[Epoch 55 Batch 150/173] avg loss 0.00624331, throughput 6.00726K wps
Begin Testing...
[Epoch 55] train avg loss 0.0067024, dev acc 0.8081, dev avg loss 0.417621, throughput 5.20298K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/173] avg loss 0.00678637, throughput 5.29902K wps
[Epoch 56 Batch 60/173] avg loss 0.00657832, throughput 5.03868K wps
[Epoch 56 Batch 90/173] avg loss 0.00682695, throughput 5.46845K wps
[Epoch 56 Batch 120/173] avg loss 0.0065267, throughput 6.1072K wps
[Epoch 56 Batch 150/173] avg loss 0.00624914, throughput 6.41536K wps
Begin Testing...
[Epoch 56] train avg loss 0.0065924, dev acc 0.8040, dev avg loss 0.416977, throughput 5.60537K wps
[Epoch 57 Batch 30/173] avg loss 0.00627973, throughput 5.01397K wps
[Epoch 57 Batch 60/173] avg loss 0.00632087, throughput 5.37706K wps
[Epoch 57 Batch 90/173] avg loss 0.00651397, throughput 4.81797K wps
[Epoch 57 Batch 120/173] avg loss 0.00689949, throughput 5.3661K wps
[Epoch 57 Batch 150/173] avg loss 0.00654537, throughput 4.72142K wps
Begin Testing...
[Epoch 57] train avg loss 0.00653768, dev acc 0.8060, dev avg loss 0.418742, throughput 5.04186K wps
[Epoch 58 Batch 30/173] avg loss 0.00608253, throughput 5.07765K wps
[Epoch 58 Batch 60/173] avg loss 0.0063577, throughput 4.91099K wps
[Epoch 58 Batch 90/173] avg loss 0.00655973, throughput 6.1653K wps
[Epoch 58 Batch 120/173] avg loss 0.00697857, throughput 5.04872K wps
[Epoch 58 Batch 150/173] avg loss 0.00610526, throughput 4.89902K wps
Begin Testing...
[Epoch 58] train avg loss 0.00641249, dev acc 0.8040, dev avg loss 0.415866, throughput 5.25089K wps
[Epoch 59 Batch 30/173] avg loss 0.00651703, throughput 5.27393K wps
[Epoch 59 Batch 60/173] avg loss 0.00633858, throughput 5.63207K wps
[Epoch 59 Batch 90/173] avg loss 0.00598514, throughput 5.29045K wps
[Epoch 59 Batch 120/173] avg loss 0.00647595, throughput 5.49868K wps
[Epoch 59 Batch 150/173] avg loss 0.00655071, throughput 4.91272K wps
Begin Testing...
[Epoch 59] train avg loss 0.00638079, dev acc 0.8060, dev avg loss 0.415457, throughput 5.27183K wps
[Epoch 60 Batch 30/173] avg loss 0.00662636, throughput 6.07375K wps
[Epoch 60 Batch 60/173] avg loss 0.00635639, throughput 5.71804K wps
[Epoch 60 Batch 90/173] avg loss 0.00628668, throughput 5.73825K wps
[Epoch 60 Batch 120/173] avg loss 0.00651819, throughput 5.16205K wps
[Epoch 60 Batch 150/173] avg loss 0.00626179, throughput 5.72057K wps
Begin Testing...
[Epoch 60] train avg loss 0.006366, dev acc 0.8019, dev avg loss 0.417121, throughput 5.63171K wps
[Epoch 61 Batch 30/173] avg loss 0.00610167, throughput 5.0233K wps
[Epoch 61 Batch 60/173] avg loss 0.00605063, throughput 5.16409K wps
[Epoch 61 Batch 90/173] avg loss 0.00605293, throughput 5.17352K wps
[Epoch 61 Batch 120/173] avg loss 0.00641463, throughput 5.68269K wps
[Epoch 61 Batch 150/173] avg loss 0.0064075, throughput 4.91204K wps
Begin Testing...
[Epoch 61] train avg loss 0.00619853, dev acc 0.8102, dev avg loss 0.414016, throughput 5.11038K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/173] avg loss 0.0060615, throughput 5.2225K wps
[Epoch 62 Batch 60/173] avg loss 0.00626978, throughput 5.59857K wps
[Epoch 62 Batch 90/173] avg loss 0.00631274, throughput 5.38983K wps
[Epoch 62 Batch 120/173] avg loss 0.00606241, throughput 6.18214K wps
[Epoch 62 Batch 150/173] avg loss 0.00623452, throughput 5.06136K wps
Begin Testing...
[Epoch 62] train avg loss 0.00620362, dev acc 0.8050, dev avg loss 0.413351, throughput 5.44861K wps
[Epoch 63 Batch 30/173] avg loss 0.00627367, throughput 5.69612K wps
[Epoch 63 Batch 60/173] avg loss 0.0060487, throughput 5.25408K wps
[Epoch 63 Batch 90/173] avg loss 0.005848, throughput 5.70561K wps
[Epoch 63 Batch 120/173] avg loss 0.00620492, throughput 5.24815K wps
[Epoch 63 Batch 150/173] avg loss 0.00611082, throughput 5.19783K wps
Begin Testing...
[Epoch 63] train avg loss 0.00611216, dev acc 0.8050, dev avg loss 0.412869, throughput 5.32601K wps
[Epoch 64 Batch 30/173] avg loss 0.0060276, throughput 5.49886K wps
[Epoch 64 Batch 60/173] avg loss 0.00604419, throughput 5.48716K wps
[Epoch 64 Batch 90/173] avg loss 0.00629014, throughput 4.94722K wps
[Epoch 64 Batch 120/173] avg loss 0.00597609, throughput 4.91756K wps
[Epoch 64 Batch 150/173] avg loss 0.00588259, throughput 5.45542K wps
Begin Testing...
[Epoch 64] train avg loss 0.00607042, dev acc 0.8040, dev avg loss 0.412227, throughput 5.29415K wps
[Epoch 65 Batch 30/173] avg loss 0.00590587, throughput 4.66612K wps
[Epoch 65 Batch 60/173] avg loss 0.00570556, throughput 4.9705K wps
[Epoch 65 Batch 90/173] avg loss 0.00593206, throughput 5.33767K wps
[Epoch 65 Batch 120/173] avg loss 0.00585001, throughput 5.48939K wps
[Epoch 65 Batch 150/173] avg loss 0.00592585, throughput 5.21544K wps
Begin Testing...
[Epoch 65] train avg loss 0.00597934, dev acc 0.8071, dev avg loss 0.412568, throughput 5.11116K wps
[Epoch 66 Batch 30/173] avg loss 0.00625043, throughput 5.66536K wps
[Epoch 66 Batch 60/173] avg loss 0.00586426, throughput 5.84678K wps
[Epoch 66 Batch 90/173] avg loss 0.0060716, throughput 5.54746K wps
[Epoch 66 Batch 120/173] avg loss 0.00592871, throughput 6.31644K wps
[Epoch 66 Batch 150/173] avg loss 0.00572352, throughput 5.99455K wps
Begin Testing...
[Epoch 66] train avg loss 0.00594521, dev acc 0.8092, dev avg loss 0.411985, throughput 5.75149K wps
[Epoch 67 Batch 30/173] avg loss 0.00558751, throughput 5.49456K wps
[Epoch 67 Batch 60/173] avg loss 0.00609976, throughput 5.23165K wps
[Epoch 67 Batch 90/173] avg loss 0.0056629, throughput 6.25354K wps
[Epoch 67 Batch 120/173] avg loss 0.00577977, throughput 5.14546K wps
[Epoch 67 Batch 150/173] avg loss 0.00584355, throughput 5.90369K wps
Begin Testing...
[Epoch 67] train avg loss 0.00584854, dev acc 0.8071, dev avg loss 0.413426, throughput 5.54053K wps
[Epoch 68 Batch 30/173] avg loss 0.00564891, throughput 5.18478K wps
[Epoch 68 Batch 60/173] avg loss 0.00566211, throughput 5.93339K wps
[Epoch 68 Batch 90/173] avg loss 0.00602933, throughput 5.64352K wps
[Epoch 68 Batch 120/173] avg loss 0.00561715, throughput 4.83081K wps
[Epoch 68 Batch 150/173] avg loss 0.00577648, throughput 5.23156K wps
Begin Testing...
[Epoch 68] train avg loss 0.00578011, dev acc 0.8123, dev avg loss 0.41011, throughput 5.3361K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/173] avg loss 0.00572546, throughput 4.97537K wps
[Epoch 69 Batch 60/173] avg loss 0.00592064, throughput 5.54887K wps
[Epoch 69 Batch 90/173] avg loss 0.00547971, throughput 5.06712K wps
[Epoch 69 Batch 120/173] avg loss 0.00556441, throughput 5.39304K wps
[Epoch 69 Batch 150/173] avg loss 0.005693, throughput 5.23137K wps
Begin Testing...
[Epoch 69] train avg loss 0.00567845, dev acc 0.8092, dev avg loss 0.41135, throughput 5.3647K wps
[Epoch 70 Batch 30/173] avg loss 0.00586011, throughput 4.7032K wps
[Epoch 70 Batch 60/173] avg loss 0.00559822, throughput 5.08362K wps
[Epoch 70 Batch 90/173] avg loss 0.00547689, throughput 5.16334K wps
[Epoch 70 Batch 120/173] avg loss 0.00539116, throughput 5.74737K wps
[Epoch 70 Batch 150/173] avg loss 0.00557981, throughput 5.46872K wps
Begin Testing...
[Epoch 70] train avg loss 0.00558767, dev acc 0.8102, dev avg loss 0.41007, throughput 5.20747K wps
[Epoch 71 Batch 30/173] avg loss 0.00541335, throughput 5.12743K wps
[Epoch 71 Batch 60/173] avg loss 0.00551224, throughput 5.56491K wps
[Epoch 71 Batch 90/173] avg loss 0.0054485, throughput 6.27995K wps
[Epoch 71 Batch 120/173] avg loss 0.00587319, throughput 5.01473K wps
[Epoch 71 Batch 150/173] avg loss 0.00557463, throughput 4.80892K wps
Begin Testing...
[Epoch 71] train avg loss 0.00554915, dev acc 0.8040, dev avg loss 0.415665, throughput 5.32731K wps
[Epoch 72 Batch 30/173] avg loss 0.00528365, throughput 5.51084K wps
[Epoch 72 Batch 60/173] avg loss 0.00551484, throughput 5.03892K wps
[Epoch 72 Batch 90/173] avg loss 0.00559956, throughput 5.25946K wps
[Epoch 72 Batch 120/173] avg loss 0.00548269, throughput 6.59667K wps
[Epoch 72 Batch 150/173] avg loss 0.00558355, throughput 4.7862K wps
Begin Testing...
[Epoch 72] train avg loss 0.0054957, dev acc 0.8113, dev avg loss 0.40939, throughput 5.30442K wps
[Epoch 73 Batch 30/173] avg loss 0.00531836, throughput 4.9265K wps
[Epoch 73 Batch 60/173] avg loss 0.00564633, throughput 5.82346K wps
[Epoch 73 Batch 90/173] avg loss 0.00543951, throughput 5.69884K wps
[Epoch 73 Batch 120/173] avg loss 0.00536884, throughput 6.18328K wps
[Epoch 73 Batch 150/173] avg loss 0.0053888, throughput 4.80768K wps
Begin Testing...
[Epoch 73] train avg loss 0.00542011, dev acc 0.8102, dev avg loss 0.410673, throughput 5.34212K wps
[Epoch 74 Batch 30/173] avg loss 0.0052765, throughput 4.82448K wps
[Epoch 74 Batch 60/173] avg loss 0.00529176, throughput 5.73861K wps
[Epoch 74 Batch 90/173] avg loss 0.00522601, throughput 4.94464K wps
[Epoch 74 Batch 120/173] avg loss 0.0052106, throughput 5.35819K wps
[Epoch 74 Batch 150/173] avg loss 0.00542078, throughput 4.78432K wps
Begin Testing...
[Epoch 74] train avg loss 0.00531211, dev acc 0.8113, dev avg loss 0.410147, throughput 5.15735K wps
[Epoch 75 Batch 30/173] avg loss 0.00547606, throughput 5.81191K wps
[Epoch 75 Batch 60/173] avg loss 0.00533996, throughput 5.27094K wps
[Epoch 75 Batch 90/173] avg loss 0.0049779, throughput 4.81944K wps
[Epoch 75 Batch 120/173] avg loss 0.00534477, throughput 5.02944K wps
[Epoch 75 Batch 150/173] avg loss 0.0053863, throughput 5.90959K wps
Begin Testing...
[Epoch 75] train avg loss 0.00528984, dev acc 0.8113, dev avg loss 0.408965, throughput 5.235K wps
[Epoch 76 Batch 30/173] avg loss 0.00508673, throughput 5.69982K wps
[Epoch 76 Batch 60/173] avg loss 0.00526618, throughput 6.05851K wps
[Epoch 76 Batch 90/173] avg loss 0.00529446, throughput 5.33505K wps
[Epoch 76 Batch 120/173] avg loss 0.00531814, throughput 6.09382K wps
[Epoch 76 Batch 150/173] avg loss 0.00510117, throughput 5.22023K wps
Begin Testing...
[Epoch 76] train avg loss 0.00519577, dev acc 0.8102, dev avg loss 0.409226, throughput 5.5265K wps
[Epoch 77 Batch 30/173] avg loss 0.00471359, throughput 4.82672K wps
[Epoch 77 Batch 60/173] avg loss 0.00533168, throughput 5.56725K wps
[Epoch 77 Batch 90/173] avg loss 0.00498148, throughput 5.32382K wps
[Epoch 77 Batch 120/173] avg loss 0.00491387, throughput 5.1041K wps
[Epoch 77 Batch 150/173] avg loss 0.00544351, throughput 4.86149K wps
Begin Testing...
[Epoch 77] train avg loss 0.00508321, dev acc 0.8113, dev avg loss 0.408084, throughput 5.12436K wps
[Epoch 78 Batch 30/173] avg loss 0.00489811, throughput 5.60756K wps
[Epoch 78 Batch 60/173] avg loss 0.00532312, throughput 4.81835K wps
[Epoch 78 Batch 90/173] avg loss 0.00507898, throughput 5.25434K wps
[Epoch 78 Batch 120/173] avg loss 0.0050738, throughput 5.41349K wps
[Epoch 78 Batch 150/173] avg loss 0.00478276, throughput 5.09378K wps
Begin Testing...
[Epoch 78] train avg loss 0.00504615, dev acc 0.8133, dev avg loss 0.410617, throughput 5.17245K wps
Observed Improvement.
Begin Testing...
[Epoch 79 Batch 30/173] avg loss 0.00487312, throughput 4.6934K wps
[Epoch 79 Batch 60/173] avg loss 0.00495123, throughput 6.02037K wps
[Epoch 79 Batch 90/173] avg loss 0.00513471, throughput 5.21622K wps
[Epoch 79 Batch 120/173] avg loss 0.00513564, throughput 4.96831K wps
[Epoch 79 Batch 150/173] avg loss 0.00480389, throughput 5.49365K wps
Begin Testing...
[Epoch 79] train avg loss 0.00503141, dev acc 0.8144, dev avg loss 0.407573, throughput 5.24069K wps
Observed Improvement.
Begin Testing...
[Epoch 80 Batch 30/173] avg loss 0.00490564, throughput 4.84675K wps
[Epoch 80 Batch 60/173] avg loss 0.00455958, throughput 4.88763K wps
[Epoch 80 Batch 90/173] avg loss 0.00511413, throughput 5.35213K wps
[Epoch 80 Batch 120/173] avg loss 0.00472177, throughput 4.65483K wps
[Epoch 80 Batch 150/173] avg loss 0.00508293, throughput 5.26883K wps
Begin Testing...
[Epoch 80] train avg loss 0.00491501, dev acc 0.8133, dev avg loss 0.408193, throughput 5.02673K wps
[Epoch 81 Batch 30/173] avg loss 0.00484892, throughput 4.75197K wps
[Epoch 81 Batch 60/173] avg loss 0.00485706, throughput 5.47498K wps
[Epoch 81 Batch 90/173] avg loss 0.0047188, throughput 4.94656K wps
[Epoch 81 Batch 120/173] avg loss 0.00495294, throughput 5.0749K wps
[Epoch 81 Batch 150/173] avg loss 0.00490275, throughput 4.73869K wps
Begin Testing...
[Epoch 81] train avg loss 0.0048546, dev acc 0.8165, dev avg loss 0.407845, throughput 5.01395K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/173] avg loss 0.00469393, throughput 5.2006K wps
[Epoch 82 Batch 60/173] avg loss 0.00493624, throughput 4.97776K wps
[Epoch 82 Batch 90/173] avg loss 0.00460288, throughput 4.9665K wps
[Epoch 82 Batch 120/173] avg loss 0.00485069, throughput 4.86924K wps
[Epoch 82 Batch 150/173] avg loss 0.00494965, throughput 5.76107K wps
Begin Testing...
[Epoch 82] train avg loss 0.00481575, dev acc 0.8133, dev avg loss 0.4074, throughput 5.09836K wps
[Epoch 83 Batch 30/173] avg loss 0.00482491, throughput 5.08599K wps
[Epoch 83 Batch 60/173] avg loss 0.00459263, throughput 5.2345K wps
[Epoch 83 Batch 90/173] avg loss 0.0047123, throughput 5.27144K wps
[Epoch 83 Batch 120/173] avg loss 0.00478355, throughput 5.13586K wps
[Epoch 83 Batch 150/173] avg loss 0.00499494, throughput 5.25717K wps
Begin Testing...
[Epoch 83] train avg loss 0.0048058, dev acc 0.8081, dev avg loss 0.41151, throughput 5.18996K wps
[Epoch 84 Batch 30/173] avg loss 0.00453653, throughput 5.13932K wps
[Epoch 84 Batch 60/173] avg loss 0.0047049, throughput 4.94036K wps
[Epoch 84 Batch 90/173] avg loss 0.00463082, throughput 5K wps
[Epoch 84 Batch 120/173] avg loss 0.00481858, throughput 5.13729K wps
[Epoch 84 Batch 150/173] avg loss 0.00461176, throughput 6.23585K wps
Begin Testing...
[Epoch 84] train avg loss 0.00470317, dev acc 0.8165, dev avg loss 0.406583, throughput 5.20769K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/173] avg loss 0.00466312, throughput 5.40348K wps
[Epoch 85 Batch 60/173] avg loss 0.00462824, throughput 4.72686K wps
[Epoch 85 Batch 90/173] avg loss 0.00454005, throughput 5.7694K wps
[Epoch 85 Batch 120/173] avg loss 0.00441237, throughput 5.42738K wps
[Epoch 85 Batch 150/173] avg loss 0.00460404, throughput 5.72468K wps
Begin Testing...
[Epoch 85] train avg loss 0.00461282, dev acc 0.8123, dev avg loss 0.4086, throughput 5.38387K wps
[Epoch 86 Batch 30/173] avg loss 0.00459769, throughput 5.23148K wps
[Epoch 86 Batch 60/173] avg loss 0.00453515, throughput 4.86062K wps
[Epoch 86 Batch 90/173] avg loss 0.00462426, throughput 4.94391K wps
[Epoch 86 Batch 120/173] avg loss 0.00444497, throughput 5.37911K wps
[Epoch 86 Batch 150/173] avg loss 0.00475323, throughput 5.18243K wps
Begin Testing...
[Epoch 86] train avg loss 0.00459627, dev acc 0.8081, dev avg loss 0.410927, throughput 5.18196K wps
[Epoch 87 Batch 30/173] avg loss 0.00454654, throughput 5.27496K wps
[Epoch 87 Batch 60/173] avg loss 0.00437235, throughput 5.24368K wps
[Epoch 87 Batch 90/173] avg loss 0.00439875, throughput 5.79532K wps
[Epoch 87 Batch 120/173] avg loss 0.00463962, throughput 4.659K wps
[Epoch 87 Batch 150/173] avg loss 0.00444758, throughput 4.8747K wps
Begin Testing...
[Epoch 87] train avg loss 0.00450515, dev acc 0.8144, dev avg loss 0.406789, throughput 5.08473K wps
[Epoch 88 Batch 30/173] avg loss 0.00424128, throughput 5.50839K wps
[Epoch 88 Batch 60/173] avg loss 0.00431102, throughput 5.4146K wps
[Epoch 88 Batch 90/173] avg loss 0.00432729, throughput 5.30673K wps
[Epoch 88 Batch 120/173] avg loss 0.00448522, throughput 5.07817K wps
[Epoch 88 Batch 150/173] avg loss 0.00451346, throughput 4.66058K wps
Begin Testing...
[Epoch 88] train avg loss 0.00442925, dev acc 0.8154, dev avg loss 0.407531, throughput 5.11007K wps
[Epoch 89 Batch 30/173] avg loss 0.00421856, throughput 5.34116K wps
[Epoch 89 Batch 60/173] avg loss 0.00423719, throughput 5.33215K wps
[Epoch 89 Batch 90/173] avg loss 0.00425125, throughput 4.78237K wps
[Epoch 89 Batch 120/173] avg loss 0.00452634, throughput 4.83844K wps
[Epoch 89 Batch 150/173] avg loss 0.00437307, throughput 4.95622K wps
Begin Testing...
[Epoch 89] train avg loss 0.00433832, dev acc 0.8186, dev avg loss 0.406424, throughput 5.12699K wps
Observed Improvement.
Begin Testing...
[Epoch 90 Batch 30/173] avg loss 0.00434227, throughput 5.42001K wps
[Epoch 90 Batch 60/173] avg loss 0.00437363, throughput 4.80263K wps
[Epoch 90 Batch 90/173] avg loss 0.00450886, throughput 5.22169K wps
[Epoch 90 Batch 120/173] avg loss 0.00425152, throughput 5.15371K wps
[Epoch 90 Batch 150/173] avg loss 0.0042403, throughput 5.39053K wps
Begin Testing...
[Epoch 90] train avg loss 0.00434923, dev acc 0.8165, dev avg loss 0.406349, throughput 5.1853K wps
[Epoch 91 Batch 30/173] avg loss 0.004568, throughput 6.16674K wps
[Epoch 91 Batch 60/173] avg loss 0.00453424, throughput 4.66685K wps
[Epoch 91 Batch 90/173] avg loss 0.00418624, throughput 5.57276K wps
[Epoch 91 Batch 120/173] avg loss 0.004286, throughput 5.16973K wps
[Epoch 91 Batch 150/173] avg loss 0.00405804, throughput 5.51186K wps
Begin Testing...
[Epoch 91] train avg loss 0.00432969, dev acc 0.8102, dev avg loss 0.409448, throughput 5.34388K wps
[Epoch 92 Batch 30/173] avg loss 0.0041474, throughput 4.97944K wps
[Epoch 92 Batch 60/173] avg loss 0.00447527, throughput 5.33531K wps
[Epoch 92 Batch 90/173] avg loss 0.00423841, throughput 6.33431K wps
[Epoch 92 Batch 120/173] avg loss 0.00426278, throughput 5.02563K wps
[Epoch 92 Batch 150/173] avg loss 0.00376047, throughput 5.01967K wps
Begin Testing...
[Epoch 92] train avg loss 0.00420525, dev acc 0.8175, dev avg loss 0.407037, throughput 5.28385K wps
[Epoch 93 Batch 30/173] avg loss 0.00421941, throughput 5.28375K wps
[Epoch 93 Batch 60/173] avg loss 0.00411724, throughput 5.49255K wps
[Epoch 93 Batch 90/173] avg loss 0.0040244, throughput 4.75074K wps
[Epoch 93 Batch 120/173] avg loss 0.00393912, throughput 4.82197K wps
[Epoch 93 Batch 150/173] avg loss 0.00431336, throughput 4.83697K wps
Begin Testing...
[Epoch 93] train avg loss 0.0041229, dev acc 0.8175, dev avg loss 0.405901, throughput 5.1302K wps
[Epoch 94 Batch 30/173] avg loss 0.00391813, throughput 6.18469K wps
[Epoch 94 Batch 60/173] avg loss 0.00415996, throughput 5.50593K wps
[Epoch 94 Batch 90/173] avg loss 0.00407417, throughput 4.68277K wps
[Epoch 94 Batch 120/173] avg loss 0.00412256, throughput 4.82618K wps
[Epoch 94 Batch 150/173] avg loss 0.00433368, throughput 5.46927K wps
Begin Testing...
[Epoch 94] train avg loss 0.00413141, dev acc 0.8144, dev avg loss 0.40642, throughput 5.33611K wps
[Epoch 95 Batch 30/173] avg loss 0.00424651, throughput 4.95317K wps
[Epoch 95 Batch 60/173] avg loss 0.00409228, throughput 4.97826K wps
[Epoch 95 Batch 90/173] avg loss 0.00417591, throughput 6.01911K wps
[Epoch 95 Batch 120/173] avg loss 0.00369543, throughput 5.36406K wps
[Epoch 95 Batch 150/173] avg loss 0.00419287, throughput 4.84331K wps
Begin Testing...
[Epoch 95] train avg loss 0.00409436, dev acc 0.8123, dev avg loss 0.40707, throughput 5.32417K wps
[Epoch 96 Batch 30/173] avg loss 0.00396977, throughput 5.88164K wps
[Epoch 96 Batch 60/173] avg loss 0.00407739, throughput 5.21954K wps
[Epoch 96 Batch 90/173] avg loss 0.00424704, throughput 6.16285K wps
[Epoch 96 Batch 120/173] avg loss 0.0037767, throughput 5.40324K wps
[Epoch 96 Batch 150/173] avg loss 0.00411223, throughput 5.36637K wps
Begin Testing...
[Epoch 96] train avg loss 0.00400815, dev acc 0.8186, dev avg loss 0.406738, throughput 5.49528K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/173] avg loss 0.0040058, throughput 4.98367K wps
[Epoch 97 Batch 60/173] avg loss 0.00409636, throughput 5.64766K wps
[Epoch 97 Batch 90/173] avg loss 0.00375502, throughput 5.18908K wps
[Epoch 97 Batch 120/173] avg loss 0.00384471, throughput 5.27346K wps
[Epoch 97 Batch 150/173] avg loss 0.00400165, throughput 4.98962K wps
Begin Testing...
[Epoch 97] train avg loss 0.00397226, dev acc 0.8144, dev avg loss 0.407055, throughput 5.26407K wps
[Epoch 98 Batch 30/173] avg loss 0.00397623, throughput 6.23262K wps
[Epoch 98 Batch 60/173] avg loss 0.00405571, throughput 6.22807K wps
[Epoch 98 Batch 90/173] avg loss 0.00409323, throughput 5.26074K wps
[Epoch 98 Batch 120/173] avg loss 0.00395566, throughput 5.16304K wps
[Epoch 98 Batch 150/173] avg loss 0.00366308, throughput 5.88386K wps
Begin Testing...
[Epoch 98] train avg loss 0.00393577, dev acc 0.8165, dev avg loss 0.406216, throughput 5.61771K wps
[Epoch 99 Batch 30/173] avg loss 0.00393068, throughput 4.61959K wps
[Epoch 99 Batch 60/173] avg loss 0.00382182, throughput 6.1053K wps
[Epoch 99 Batch 90/173] avg loss 0.00396224, throughput 4.83504K wps
[Epoch 99 Batch 120/173] avg loss 0.00355877, throughput 4.91803K wps
[Epoch 99 Batch 150/173] avg loss 0.00413626, throughput 5.52217K wps
Begin Testing...
[Epoch 99] train avg loss 0.00386756, dev acc 0.8175, dev avg loss 0.407202, throughput 5.1165K wps
[Epoch 100 Batch 30/173] avg loss 0.00391665, throughput 5.12106K wps
[Epoch 100 Batch 60/173] avg loss 0.0037919, throughput 4.9705K wps
[Epoch 100 Batch 90/173] avg loss 0.00354187, throughput 5.42989K wps
[Epoch 100 Batch 120/173] avg loss 0.00379076, throughput 5.66972K wps
[Epoch 100 Batch 150/173] avg loss 0.00378568, throughput 5.50028K wps
Begin Testing...
[Epoch 100] train avg loss 0.00379943, dev acc 0.8113, dev avg loss 0.41212, throughput 5.34021K wps
[Epoch 101 Batch 30/173] avg loss 0.00358923, throughput 5.13669K wps
[Epoch 101 Batch 60/173] avg loss 0.00391019, throughput 5.37331K wps
[Epoch 101 Batch 90/173] avg loss 0.00368661, throughput 6.10315K wps
[Epoch 101 Batch 120/173] avg loss 0.00380136, throughput 5.04011K wps
[Epoch 101 Batch 150/173] avg loss 0.00367846, throughput 5.35026K wps
Begin Testing...
[Epoch 101] train avg loss 0.00372457, dev acc 0.8144, dev avg loss 0.407656, throughput 5.29373K wps
[Epoch 102 Batch 30/173] avg loss 0.00365256, throughput 5.23081K wps
[Epoch 102 Batch 60/173] avg loss 0.00359551, throughput 4.83523K wps
[Epoch 102 Batch 90/173] avg loss 0.0036554, throughput 6.58107K wps
[Epoch 102 Batch 120/173] avg loss 0.0038825, throughput 5.23987K wps
[Epoch 102 Batch 150/173] avg loss 0.00362984, throughput 5.05707K wps
Begin Testing...
[Epoch 102] train avg loss 0.0036832, dev acc 0.8081, dev avg loss 0.408933, throughput 5.24455K wps
[Epoch 103 Batch 30/173] avg loss 0.00367633, throughput 4.99129K wps
[Epoch 103 Batch 60/173] avg loss 0.00384195, throughput 5.70216K wps
[Epoch 103 Batch 90/173] avg loss 0.00337943, throughput 5.17593K wps
[Epoch 103 Batch 120/173] avg loss 0.00388539, throughput 5.13644K wps
[Epoch 103 Batch 150/173] avg loss 0.00365406, throughput 6.30853K wps
Begin Testing...
[Epoch 103] train avg loss 0.00366851, dev acc 0.8092, dev avg loss 0.413172, throughput 5.3472K wps
[Epoch 104 Batch 30/173] avg loss 0.00357243, throughput 5.11526K wps
[Epoch 104 Batch 60/173] avg loss 0.00341944, throughput 5.45579K wps
[Epoch 104 Batch 90/173] avg loss 0.00366508, throughput 5.71481K wps
[Epoch 104 Batch 120/173] avg loss 0.00382973, throughput 6.34919K wps
[Epoch 104 Batch 150/173] avg loss 0.00344335, throughput 4.70198K wps
Begin Testing...
[Epoch 104] train avg loss 0.00360709, dev acc 0.8165, dev avg loss 0.408059, throughput 5.30418K wps
[Epoch 105 Batch 30/173] avg loss 0.00364584, throughput 6.2621K wps
[Epoch 105 Batch 60/173] avg loss 0.00359333, throughput 5.19881K wps
[Epoch 105 Batch 90/173] avg loss 0.00361679, throughput 5.06579K wps
[Epoch 105 Batch 120/173] avg loss 0.00335608, throughput 5.03634K wps
[Epoch 105 Batch 150/173] avg loss 0.00353724, throughput 5.3755K wps
Begin Testing...
[Epoch 105] train avg loss 0.00354419, dev acc 0.8165, dev avg loss 0.407006, throughput 5.35609K wps
[Epoch 106 Batch 30/173] avg loss 0.00360021, throughput 5.75659K wps
[Epoch 106 Batch 60/173] avg loss 0.0036069, throughput 5.20094K wps
[Epoch 106 Batch 90/173] avg loss 0.00352448, throughput 4.74236K wps
[Epoch 106 Batch 120/173] avg loss 0.0033638, throughput 4.75162K wps
[Epoch 106 Batch 150/173] avg loss 0.00344131, throughput 5.17628K wps
Begin Testing...
[Epoch 106] train avg loss 0.00350764, dev acc 0.8165, dev avg loss 0.408164, throughput 5.18568K wps
[Epoch 107 Batch 30/173] avg loss 0.00344737, throughput 5.16486K wps
[Epoch 107 Batch 60/173] avg loss 0.00345137, throughput 5.02971K wps
[Epoch 107 Batch 90/173] avg loss 0.00345604, throughput 6.60416K wps
[Epoch 107 Batch 120/173] avg loss 0.00340186, throughput 5.24209K wps
[Epoch 107 Batch 150/173] avg loss 0.00341703, throughput 5.08095K wps
Begin Testing...
[Epoch 107] train avg loss 0.00344802, dev acc 0.8165, dev avg loss 0.406664, throughput 5.3325K wps
[Epoch 108 Batch 30/173] avg loss 0.00344395, throughput 5.06387K wps
[Epoch 108 Batch 60/173] avg loss 0.00330894, throughput 5.86472K wps
[Epoch 108 Batch 90/173] avg loss 0.00361487, throughput 5.06988K wps
[Epoch 108 Batch 120/173] avg loss 0.00339241, throughput 5.38123K wps
[Epoch 108 Batch 150/173] avg loss 0.00348504, throughput 5.78923K wps
Begin Testing...
[Epoch 108] train avg loss 0.00344533, dev acc 0.8165, dev avg loss 0.407182, throughput 5.39899K wps
[Epoch 109 Batch 30/173] avg loss 0.00335493, throughput 4.80621K wps
[Epoch 109 Batch 60/173] avg loss 0.00338884, throughput 4.79085K wps
[Epoch 109 Batch 90/173] avg loss 0.00330234, throughput 4.92185K wps
[Epoch 109 Batch 120/173] avg loss 0.00328047, throughput 5.45026K wps
[Epoch 109 Batch 150/173] avg loss 0.00350065, throughput 5.91744K wps
Begin Testing...
[Epoch 109] train avg loss 0.0033911, dev acc 0.8113, dev avg loss 0.408796, throughput 5.18877K wps
[Epoch 110 Batch 30/173] avg loss 0.0033834, throughput 5.46688K wps
[Epoch 110 Batch 60/173] avg loss 0.00339844, throughput 4.97632K wps
[Epoch 110 Batch 90/173] avg loss 0.00339256, throughput 5.04703K wps
[Epoch 110 Batch 120/173] avg loss 0.00333881, throughput 5.36332K wps
[Epoch 110 Batch 150/173] avg loss 0.00314654, throughput 5.6678K wps
Begin Testing...
[Epoch 110] train avg loss 0.00333933, dev acc 0.8123, dev avg loss 0.412244, throughput 5.32677K wps
[Epoch 111 Batch 30/173] avg loss 0.00339443, throughput 4.73907K wps
[Epoch 111 Batch 60/173] avg loss 0.00303044, throughput 5.08779K wps
[Epoch 111 Batch 90/173] avg loss 0.00335021, throughput 6.14958K wps
[Epoch 111 Batch 120/173] avg loss 0.00314776, throughput 5.30339K wps
[Epoch 111 Batch 150/173] avg loss 0.0034189, throughput 5.45305K wps
Begin Testing...
[Epoch 111] train avg loss 0.00330057, dev acc 0.8175, dev avg loss 0.408177, throughput 5.21334K wps
[Epoch 112 Batch 30/173] avg loss 0.00330182, throughput 5.06209K wps
[Epoch 112 Batch 60/173] avg loss 0.00332958, throughput 5.02813K wps
[Epoch 112 Batch 90/173] avg loss 0.00345846, throughput 4.84541K wps
[Epoch 112 Batch 120/173] avg loss 0.00318718, throughput 4.77848K wps
[Epoch 112 Batch 150/173] avg loss 0.00347798, throughput 5.38352K wps
Begin Testing...
[Epoch 112] train avg loss 0.00331821, dev acc 0.8196, dev avg loss 0.408888, throughput 5.16933K wps
Observed Improvement.
Begin Testing...
[Epoch 113 Batch 30/173] avg loss 0.00322634, throughput 5.23437K wps
[Epoch 113 Batch 60/173] avg loss 0.00334466, throughput 4.91009K wps
[Epoch 113 Batch 90/173] avg loss 0.00308474, throughput 5.17789K wps
[Epoch 113 Batch 120/173] avg loss 0.00321522, throughput 5.40574K wps
[Epoch 113 Batch 150/173] avg loss 0.00331628, throughput 5.14828K wps
Begin Testing...
[Epoch 113] train avg loss 0.00324598, dev acc 0.8123, dev avg loss 0.411295, throughput 5.20957K wps
[Epoch 114 Batch 30/173] avg loss 0.00307645, throughput 5.70418K wps
[Epoch 114 Batch 60/173] avg loss 0.00312445, throughput 5.03566K wps
[Epoch 114 Batch 90/173] avg loss 0.00325404, throughput 5.56503K wps
[Epoch 114 Batch 120/173] avg loss 0.00321538, throughput 5.06576K wps
[Epoch 114 Batch 150/173] avg loss 0.00325118, throughput 4.86402K wps
Begin Testing...
[Epoch 114] train avg loss 0.00320619, dev acc 0.8186, dev avg loss 0.408565, throughput 5.27433K wps
[Epoch 115 Batch 30/173] avg loss 0.00324366, throughput 5.63542K wps
[Epoch 115 Batch 60/173] avg loss 0.00311481, throughput 5.47277K wps
[Epoch 115 Batch 90/173] avg loss 0.00316038, throughput 4.76833K wps
[Epoch 115 Batch 120/173] avg loss 0.00323805, throughput 6.04915K wps
[Epoch 115 Batch 150/173] avg loss 0.00303374, throughput 6.07352K wps
Begin Testing...
[Epoch 115] train avg loss 0.00321963, dev acc 0.8175, dev avg loss 0.408199, throughput 5.63255K wps
[Epoch 116 Batch 30/173] avg loss 0.00304732, throughput 5.26677K wps
[Epoch 116 Batch 60/173] avg loss 0.00305157, throughput 5.32292K wps
[Epoch 116 Batch 90/173] avg loss 0.0030225, throughput 5.30391K wps
[Epoch 116 Batch 120/173] avg loss 0.0030346, throughput 5.79653K wps
[Epoch 116 Batch 150/173] avg loss 0.00304033, throughput 5.09962K wps
Begin Testing...
[Epoch 116] train avg loss 0.00306684, dev acc 0.8113, dev avg loss 0.412401, throughput 5.35484K wps
[Epoch 117 Batch 30/173] avg loss 0.00314285, throughput 4.97192K wps
[Epoch 117 Batch 60/173] avg loss 0.00294186, throughput 5.27503K wps
[Epoch 117 Batch 90/173] avg loss 0.00302387, throughput 5.58878K wps
[Epoch 117 Batch 120/173] avg loss 0.00306613, throughput 5.38536K wps
[Epoch 117 Batch 150/173] avg loss 0.00310627, throughput 5.72373K wps
Begin Testing...
[Epoch 117] train avg loss 0.00305281, dev acc 0.8165, dev avg loss 0.409396, throughput 5.35957K wps
[Epoch 118 Batch 30/173] avg loss 0.00289227, throughput 5.12054K wps
[Epoch 118 Batch 60/173] avg loss 0.00293707, throughput 5.30963K wps
[Epoch 118 Batch 90/173] avg loss 0.00312715, throughput 5.04769K wps
[Epoch 118 Batch 120/173] avg loss 0.0029211, throughput 6.09784K wps
[Epoch 118 Batch 150/173] avg loss 0.00303951, throughput 6.33243K wps
Begin Testing...
[Epoch 118] train avg loss 0.00299368, dev acc 0.8186, dev avg loss 0.409243, throughput 5.52141K wps
[Epoch 119 Batch 30/173] avg loss 0.00301508, throughput 5.38312K wps
[Epoch 119 Batch 60/173] avg loss 0.00309227, throughput 4.78186K wps
[Epoch 119 Batch 90/173] avg loss 0.00293446, throughput 5.46281K wps
[Epoch 119 Batch 120/173] avg loss 0.00301777, throughput 4.95566K wps
[Epoch 119 Batch 150/173] avg loss 0.0029745, throughput 5.37745K wps
Begin Testing...
[Epoch 119] train avg loss 0.00302738, dev acc 0.8186, dev avg loss 0.409709, throughput 5.12676K wps
[Epoch 120 Batch 30/173] avg loss 0.00289634, throughput 4.87622K wps
[Epoch 120 Batch 60/173] avg loss 0.00276056, throughput 5.25428K wps
[Epoch 120 Batch 90/173] avg loss 0.00297783, throughput 5.268K wps
[Epoch 120 Batch 120/173] avg loss 0.00293252, throughput 6.35938K wps
[Epoch 120 Batch 150/173] avg loss 0.0032098, throughput 5.00025K wps
Begin Testing...
[Epoch 120] train avg loss 0.00296547, dev acc 0.8113, dev avg loss 0.411864, throughput 5.31812K wps
[Epoch 121 Batch 30/173] avg loss 0.00308841, throughput 5.12195K wps
[Epoch 121 Batch 60/173] avg loss 0.00296146, throughput 5.27014K wps
[Epoch 121 Batch 90/173] avg loss 0.00279335, throughput 5.85274K wps
[Epoch 121 Batch 120/173] avg loss 0.0028518, throughput 5.46365K wps
[Epoch 121 Batch 150/173] avg loss 0.00294282, throughput 5.62742K wps
Begin Testing...
[Epoch 121] train avg loss 0.00293219, dev acc 0.8196, dev avg loss 0.410963, throughput 5.39704K wps
Observed Improvement.
Begin Testing...
[Epoch 122 Batch 30/173] avg loss 0.00255679, throughput 4.75722K wps
[Epoch 122 Batch 60/173] avg loss 0.00294081, throughput 5.25287K wps
[Epoch 122 Batch 90/173] avg loss 0.00300604, throughput 5.45419K wps
[Epoch 122 Batch 120/173] avg loss 0.00281933, throughput 6.20224K wps
[Epoch 122 Batch 150/173] avg loss 0.00289568, throughput 5.89231K wps
Begin Testing...
[Epoch 122] train avg loss 0.00286793, dev acc 0.8186, dev avg loss 0.409887, throughput 5.46948K wps
[Epoch 123 Batch 30/173] avg loss 0.00286902, throughput 5.28916K wps
[Epoch 123 Batch 60/173] avg loss 0.00269057, throughput 5.52243K wps
[Epoch 123 Batch 90/173] avg loss 0.00280322, throughput 5.05898K wps
[Epoch 123 Batch 120/173] avg loss 0.0030221, throughput 5.54928K wps
[Epoch 123 Batch 150/173] avg loss 0.00285635, throughput 4.84113K wps
Begin Testing...
[Epoch 123] train avg loss 0.00287178, dev acc 0.8123, dev avg loss 0.410271, throughput 5.23362K wps
[Epoch 124 Batch 30/173] avg loss 0.00271678, throughput 5.40145K wps
[Epoch 124 Batch 60/173] avg loss 0.00259696, throughput 6.23966K wps
[Epoch 124 Batch 90/173] avg loss 0.00316402, throughput 5.73239K wps
[Epoch 124 Batch 120/173] avg loss 0.00275303, throughput 6.29341K wps
[Epoch 124 Batch 150/173] avg loss 0.0028185, throughput 4.89101K wps
Begin Testing...
[Epoch 124] train avg loss 0.00280693, dev acc 0.8186, dev avg loss 0.410436, throughput 5.61121K wps
[Epoch 125 Batch 30/173] avg loss 0.00280085, throughput 5.78371K wps
[Epoch 125 Batch 60/173] avg loss 0.00264885, throughput 5.18954K wps
[Epoch 125 Batch 90/173] avg loss 0.00296715, throughput 5.43096K wps
[Epoch 125 Batch 120/173] avg loss 0.00273667, throughput 4.93867K wps
[Epoch 125 Batch 150/173] avg loss 0.00271663, throughput 5.16281K wps
Begin Testing...
[Epoch 125] train avg loss 0.00276232, dev acc 0.8165, dev avg loss 0.410537, throughput 5.38169K wps
[Epoch 126 Batch 30/173] avg loss 0.00279034, throughput 4.8099K wps
[Epoch 126 Batch 60/173] avg loss 0.0026057, throughput 5.18318K wps
[Epoch 126 Batch 90/173] avg loss 0.002737, throughput 5.77938K wps
[Epoch 126 Batch 120/173] avg loss 0.00270085, throughput 5.50642K wps
[Epoch 126 Batch 150/173] avg loss 0.00280488, throughput 5.21713K wps
Begin Testing...
[Epoch 126] train avg loss 0.00273906, dev acc 0.8186, dev avg loss 0.410649, throughput 5.20925K wps
[Epoch 127 Batch 30/173] avg loss 0.00290117, throughput 5.27092K wps
[Epoch 127 Batch 60/173] avg loss 0.00273095, throughput 6.00773K wps
[Epoch 127 Batch 90/173] avg loss 0.00294068, throughput 4.85397K wps
[Epoch 127 Batch 120/173] avg loss 0.00286731, throughput 5.77945K wps
[Epoch 127 Batch 150/173] avg loss 0.00257613, throughput 5.29379K wps
Begin Testing...
[Epoch 127] train avg loss 0.00280225, dev acc 0.8196, dev avg loss 0.41115, throughput 5.38894K wps
Observed Improvement.
Begin Testing...
[Epoch 128 Batch 30/173] avg loss 0.00250217, throughput 5.84995K wps
[Epoch 128 Batch 60/173] avg loss 0.00268243, throughput 5.18773K wps
[Epoch 128 Batch 90/173] avg loss 0.00262136, throughput 5.16479K wps
[Epoch 128 Batch 120/173] avg loss 0.00266855, throughput 5.02075K wps
[Epoch 128 Batch 150/173] avg loss 0.00262578, throughput 5.40457K wps
Begin Testing...
[Epoch 128] train avg loss 0.00264774, dev acc 0.8165, dev avg loss 0.411612, throughput 5.30592K wps
[Epoch 129 Batch 30/173] avg loss 0.00275028, throughput 5.12713K wps
[Epoch 129 Batch 60/173] avg loss 0.00279949, throughput 6.04903K wps
[Epoch 129 Batch 90/173] avg loss 0.00249972, throughput 5.76449K wps
[Epoch 129 Batch 120/173] avg loss 0.00255276, throughput 4.96059K wps
[Epoch 129 Batch 150/173] avg loss 0.00264594, throughput 5.62927K wps
Begin Testing...
[Epoch 129] train avg loss 0.00267361, dev acc 0.8175, dev avg loss 0.412546, throughput 5.48833K wps
[Epoch 130 Batch 30/173] avg loss 0.00247462, throughput 4.77035K wps
[Epoch 130 Batch 60/173] avg loss 0.00275171, throughput 5.63475K wps
[Epoch 130 Batch 90/173] avg loss 0.0025397, throughput 5.47079K wps
[Epoch 130 Batch 120/173] avg loss 0.0024531, throughput 5.1811K wps
[Epoch 130 Batch 150/173] avg loss 0.00286562, throughput 5.04938K wps
Begin Testing...
[Epoch 130] train avg loss 0.00261194, dev acc 0.8196, dev avg loss 0.411725, throughput 5.20843K wps
Observed Improvement.
Begin Testing...
[Epoch 131 Batch 30/173] avg loss 0.00267438, throughput 4.88895K wps
[Epoch 131 Batch 60/173] avg loss 0.00263845, throughput 5.28714K wps
[Epoch 131 Batch 90/173] avg loss 0.00267459, throughput 5.66655K wps
[Epoch 131 Batch 120/173] avg loss 0.00250109, throughput 5.71303K wps
[Epoch 131 Batch 150/173] avg loss 0.00282488, throughput 5.53724K wps
Begin Testing...
[Epoch 131] train avg loss 0.00265528, dev acc 0.8175, dev avg loss 0.412048, throughput 5.49354K wps
[Epoch 132 Batch 30/173] avg loss 0.00257599, throughput 4.74496K wps
[Epoch 132 Batch 60/173] avg loss 0.00271519, throughput 5.09601K wps
[Epoch 132 Batch 90/173] avg loss 0.00254307, throughput 4.96666K wps
[Epoch 132 Batch 120/173] avg loss 0.00242527, throughput 4.97174K wps
[Epoch 132 Batch 150/173] avg loss 0.00256552, throughput 5.94514K wps
Begin Testing...
[Epoch 132] train avg loss 0.00257536, dev acc 0.8123, dev avg loss 0.415236, throughput 5.10726K wps
[Epoch 133 Batch 30/173] avg loss 0.00263745, throughput 6.11003K wps
[Epoch 133 Batch 60/173] avg loss 0.00249664, throughput 4.91829K wps
[Epoch 133 Batch 90/173] avg loss 0.00274002, throughput 5.24056K wps
[Epoch 133 Batch 120/173] avg loss 0.00240384, throughput 5.23422K wps
[Epoch 133 Batch 150/173] avg loss 0.00253782, throughput 4.85533K wps
Begin Testing...
[Epoch 133] train avg loss 0.00254942, dev acc 0.8175, dev avg loss 0.412527, throughput 5.21787K wps
[Epoch 134 Batch 30/173] avg loss 0.00256914, throughput 5.09669K wps
[Epoch 134 Batch 60/173] avg loss 0.00248119, throughput 5.90375K wps
[Epoch 134 Batch 90/173] avg loss 0.00269653, throughput 5.36546K wps
[Epoch 134 Batch 120/173] avg loss 0.00261866, throughput 4.78017K wps
[Epoch 134 Batch 150/173] avg loss 0.00238412, throughput 5.62087K wps
Begin Testing...
[Epoch 134] train avg loss 0.00251142, dev acc 0.8227, dev avg loss 0.412327, throughput 5.35563K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/173] avg loss 0.00251079, throughput 5.01185K wps
[Epoch 135 Batch 60/173] avg loss 0.00241126, throughput 4.92495K wps
[Epoch 135 Batch 90/173] avg loss 0.00244166, throughput 5.06782K wps
[Epoch 135 Batch 120/173] avg loss 0.00239619, throughput 5.55014K wps
[Epoch 135 Batch 150/173] avg loss 0.00241405, throughput 5.78365K wps
Begin Testing...
[Epoch 135] train avg loss 0.00243672, dev acc 0.8186, dev avg loss 0.412401, throughput 5.18702K wps
[Epoch 136 Batch 30/173] avg loss 0.00248051, throughput 5.33882K wps
[Epoch 136 Batch 60/173] avg loss 0.00240887, throughput 5.69404K wps
[Epoch 136 Batch 90/173] avg loss 0.00242615, throughput 5.7359K wps
[Epoch 136 Batch 120/173] avg loss 0.00235885, throughput 5.66502K wps
[Epoch 136 Batch 150/173] avg loss 0.00260808, throughput 4.99602K wps
Begin Testing...
[Epoch 136] train avg loss 0.00244168, dev acc 0.8175, dev avg loss 0.413196, throughput 5.46412K wps
[Epoch 137 Batch 30/173] avg loss 0.00242713, throughput 6.30582K wps
[Epoch 137 Batch 60/173] avg loss 0.00239075, throughput 5.65024K wps
[Epoch 137 Batch 90/173] avg loss 0.00239828, throughput 5.65788K wps
[Epoch 137 Batch 120/173] avg loss 0.00232012, throughput 4.66811K wps
[Epoch 137 Batch 150/173] avg loss 0.00252807, throughput 4.69042K wps
Begin Testing...
[Epoch 137] train avg loss 0.00242234, dev acc 0.8144, dev avg loss 0.414424, throughput 5.26143K wps
[Epoch 138 Batch 30/173] avg loss 0.00228849, throughput 5.31854K wps
[Epoch 138 Batch 60/173] avg loss 0.0023573, throughput 5.46097K wps
[Epoch 138 Batch 90/173] avg loss 0.00239405, throughput 6.09884K wps
[Epoch 138 Batch 120/173] avg loss 0.00255544, throughput 5.36594K wps
[Epoch 138 Batch 150/173] avg loss 0.00229004, throughput 5.69675K wps
Begin Testing...
[Epoch 138] train avg loss 0.00241551, dev acc 0.8196, dev avg loss 0.413858, throughput 5.56736K wps
[Epoch 139 Batch 30/173] avg loss 0.00253767, throughput 5.21369K wps
[Epoch 139 Batch 60/173] avg loss 0.00228083, throughput 5.19928K wps
[Epoch 139 Batch 90/173] avg loss 0.00234984, throughput 5.13577K wps
[Epoch 139 Batch 120/173] avg loss 0.0023187, throughput 5.3611K wps
[Epoch 139 Batch 150/173] avg loss 0.00243953, throughput 5.21504K wps
Begin Testing...
[Epoch 139] train avg loss 0.00240267, dev acc 0.8175, dev avg loss 0.414175, throughput 5.26228K wps
[Epoch 140 Batch 30/173] avg loss 0.00235185, throughput 5.3394K wps
[Epoch 140 Batch 60/173] avg loss 0.00244174, throughput 4.974K wps
[Epoch 140 Batch 90/173] avg loss 0.00230174, throughput 4.94757K wps
[Epoch 140 Batch 120/173] avg loss 0.00242485, throughput 5.60198K wps
[Epoch 140 Batch 150/173] avg loss 0.00228991, throughput 5.25212K wps
Begin Testing...
[Epoch 140] train avg loss 0.00235906, dev acc 0.8175, dev avg loss 0.41492, throughput 5.36926K wps
[Epoch 141 Batch 30/173] avg loss 0.00245053, throughput 4.85331K wps
[Epoch 141 Batch 60/173] avg loss 0.00257733, throughput 5.48273K wps
[Epoch 141 Batch 90/173] avg loss 0.00247683, throughput 5.67883K wps
[Epoch 141 Batch 120/173] avg loss 0.00223833, throughput 4.86272K wps
[Epoch 141 Batch 150/173] avg loss 0.00243404, throughput 6.10005K wps
Begin Testing...
[Epoch 141] train avg loss 0.00240286, dev acc 0.8186, dev avg loss 0.415522, throughput 5.40319K wps
[Epoch 142 Batch 30/173] avg loss 0.00242984, throughput 4.80125K wps
[Epoch 142 Batch 60/173] avg loss 0.00231676, throughput 5.22457K wps
[Epoch 142 Batch 90/173] avg loss 0.00222909, throughput 5.59169K wps
[Epoch 142 Batch 120/173] avg loss 0.00211174, throughput 5.3883K wps
[Epoch 142 Batch 150/173] avg loss 0.00230302, throughput 5.35084K wps
Begin Testing...
[Epoch 142] train avg loss 0.00229403, dev acc 0.8175, dev avg loss 0.41575, throughput 5.2621K wps
[Epoch 143 Batch 30/173] avg loss 0.00221815, throughput 5.17669K wps
[Epoch 143 Batch 60/173] avg loss 0.00234612, throughput 5.10325K wps
[Epoch 143 Batch 90/173] avg loss 0.00233994, throughput 5.02675K wps
[Epoch 143 Batch 120/173] avg loss 0.0021805, throughput 4.98187K wps
[Epoch 143 Batch 150/173] avg loss 0.00232073, throughput 6.09415K wps
Begin Testing...
[Epoch 143] train avg loss 0.00229656, dev acc 0.8102, dev avg loss 0.419361, throughput 5.26711K wps
[Epoch 144 Batch 30/173] avg loss 0.00207996, throughput 5.03951K wps
[Epoch 144 Batch 60/173] avg loss 0.00209327, throughput 5.04938K wps
[Epoch 144 Batch 90/173] avg loss 0.00224103, throughput 5.64506K wps
[Epoch 144 Batch 120/173] avg loss 0.00239744, throughput 5.07715K wps
[Epoch 144 Batch 150/173] avg loss 0.00229125, throughput 5.54549K wps
Begin Testing...
[Epoch 144] train avg loss 0.00221918, dev acc 0.8186, dev avg loss 0.415705, throughput 5.25K wps
[Epoch 145 Batch 30/173] avg loss 0.00224216, throughput 5.18768K wps
[Epoch 145 Batch 60/173] avg loss 0.00235972, throughput 5.09159K wps
[Epoch 145 Batch 90/173] avg loss 0.00225739, throughput 5.03773K wps
[Epoch 145 Batch 120/173] avg loss 0.002207, throughput 5.57742K wps
[Epoch 145 Batch 150/173] avg loss 0.0021871, throughput 5.1017K wps
Begin Testing...
[Epoch 145] train avg loss 0.00224331, dev acc 0.8144, dev avg loss 0.418835, throughput 5.14331K wps
[Epoch 146 Batch 30/173] avg loss 0.00199564, throughput 5.35274K wps
[Epoch 146 Batch 60/173] avg loss 0.00232802, throughput 5.84453K wps
[Epoch 146 Batch 90/173] avg loss 0.00231345, throughput 5.99648K wps
[Epoch 146 Batch 120/173] avg loss 0.00230298, throughput 5.88357K wps
[Epoch 146 Batch 150/173] avg loss 0.00206485, throughput 5.28465K wps
Begin Testing...
[Epoch 146] train avg loss 0.00218286, dev acc 0.8165, dev avg loss 0.417197, throughput 5.66108K wps
[Epoch 147 Batch 30/173] avg loss 0.00216643, throughput 5.27262K wps
[Epoch 147 Batch 60/173] avg loss 0.00216319, throughput 5.03226K wps
[Epoch 147 Batch 90/173] avg loss 0.00207614, throughput 4.98702K wps
[Epoch 147 Batch 120/173] avg loss 0.00234343, throughput 5.20174K wps
[Epoch 147 Batch 150/173] avg loss 0.00208741, throughput 4.73372K wps
Begin Testing...
[Epoch 147] train avg loss 0.00214287, dev acc 0.8186, dev avg loss 0.415833, throughput 5.20893K wps
[Epoch 148 Batch 30/173] avg loss 0.00212655, throughput 5.04631K wps
[Epoch 148 Batch 60/173] avg loss 0.00207901, throughput 5.25064K wps
[Epoch 148 Batch 90/173] avg loss 0.00223966, throughput 5.00686K wps
[Epoch 148 Batch 120/173] avg loss 0.00214694, throughput 5.62061K wps
[Epoch 148 Batch 150/173] avg loss 0.00227956, throughput 4.90948K wps
Begin Testing...
[Epoch 148] train avg loss 0.00217099, dev acc 0.8206, dev avg loss 0.416607, throughput 5.1238K wps
[Epoch 149 Batch 30/173] avg loss 0.0021759, throughput 6.10916K wps
[Epoch 149 Batch 60/173] avg loss 0.00215864, throughput 5.1966K wps
[Epoch 149 Batch 90/173] avg loss 0.00219154, throughput 5.48311K wps
[Epoch 149 Batch 120/173] avg loss 0.00208315, throughput 5.46839K wps
[Epoch 149 Batch 150/173] avg loss 0.00228462, throughput 4.99491K wps
Begin Testing...
[Epoch 149] train avg loss 0.00217007, dev acc 0.8186, dev avg loss 0.417514, throughput 5.38296K wps
[Epoch 150 Batch 30/173] avg loss 0.00204982, throughput 5.93287K wps
[Epoch 150 Batch 60/173] avg loss 0.00206352, throughput 5.23007K wps
[Epoch 150 Batch 90/173] avg loss 0.00216196, throughput 5.67287K wps
[Epoch 150 Batch 120/173] avg loss 0.0020625, throughput 5.81872K wps
[Epoch 150 Batch 150/173] avg loss 0.00228892, throughput 5.2531K wps
Begin Testing...
[Epoch 150] train avg loss 0.00212313, dev acc 0.8186, dev avg loss 0.417343, throughput 5.47058K wps
[Epoch 151 Batch 30/173] avg loss 0.00203446, throughput 5.38763K wps
[Epoch 151 Batch 60/173] avg loss 0.00198258, throughput 5.78004K wps
[Epoch 151 Batch 90/173] avg loss 0.00206041, throughput 5.54917K wps
[Epoch 151 Batch 120/173] avg loss 0.00221349, throughput 4.81984K wps
[Epoch 151 Batch 150/173] avg loss 0.00199878, throughput 4.85327K wps
Begin Testing...
[Epoch 151] train avg loss 0.00206885, dev acc 0.8165, dev avg loss 0.419307, throughput 5.24733K wps
[Epoch 152 Batch 30/173] avg loss 0.00214218, throughput 5.87436K wps
[Epoch 152 Batch 60/173] avg loss 0.00182638, throughput 5.67247K wps
[Epoch 152 Batch 90/173] avg loss 0.00206676, throughput 4.87353K wps
[Epoch 152 Batch 120/173] avg loss 0.00213216, throughput 5.20172K wps
[Epoch 152 Batch 150/173] avg loss 0.0020948, throughput 4.91654K wps
Begin Testing...
[Epoch 152] train avg loss 0.00207645, dev acc 0.8186, dev avg loss 0.418906, throughput 5.26515K wps
[Epoch 153 Batch 30/173] avg loss 0.00211519, throughput 4.92817K wps
[Epoch 153 Batch 60/173] avg loss 0.00196662, throughput 4.71051K wps
[Epoch 153 Batch 90/173] avg loss 0.00205335, throughput 5.25307K wps
[Epoch 153 Batch 120/173] avg loss 0.00205368, throughput 5.45409K wps
[Epoch 153 Batch 150/173] avg loss 0.00202204, throughput 5.74532K wps
Begin Testing...
[Epoch 153] train avg loss 0.00203329, dev acc 0.8071, dev avg loss 0.422433, throughput 5.16206K wps
[Epoch 154 Batch 30/173] avg loss 0.00206422, throughput 4.89887K wps
[Epoch 154 Batch 60/173] avg loss 0.00198542, throughput 4.99915K wps
[Epoch 154 Batch 90/173] avg loss 0.00203401, throughput 5.36177K wps
[Epoch 154 Batch 120/173] avg loss 0.00201777, throughput 5.57473K wps
[Epoch 154 Batch 150/173] avg loss 0.00205254, throughput 5.32027K wps
Begin Testing...
[Epoch 154] train avg loss 0.00201428, dev acc 0.8217, dev avg loss 0.419118, throughput 5.21185K wps
[Epoch 155 Batch 30/173] avg loss 0.00210308, throughput 5.12035K wps
[Epoch 155 Batch 60/173] avg loss 0.00193542, throughput 5.25413K wps
[Epoch 155 Batch 90/173] avg loss 0.00199598, throughput 4.95739K wps
[Epoch 155 Batch 120/173] avg loss 0.00198874, throughput 5.97967K wps
[Epoch 155 Batch 150/173] avg loss 0.0020414, throughput 5.3378K wps
Begin Testing...
[Epoch 155] train avg loss 0.00200446, dev acc 0.8165, dev avg loss 0.419522, throughput 5.2872K wps
[Epoch 156 Batch 30/173] avg loss 0.00196363, throughput 5.83918K wps
[Epoch 156 Batch 60/173] avg loss 0.00195268, throughput 5.97272K wps
[Epoch 156 Batch 90/173] avg loss 0.00199309, throughput 4.9996K wps
[Epoch 156 Batch 120/173] avg loss 0.00195599, throughput 4.87476K wps
[Epoch 156 Batch 150/173] avg loss 0.00193721, throughput 6.03183K wps
Begin Testing...
[Epoch 156] train avg loss 0.00195605, dev acc 0.8186, dev avg loss 0.420511, throughput 5.42535K wps
[Epoch 157 Batch 30/173] avg loss 0.00195586, throughput 4.90614K wps
[Epoch 157 Batch 60/173] avg loss 0.00204385, throughput 5.64874K wps
[Epoch 157 Batch 90/173] avg loss 0.00176398, throughput 5.27214K wps
[Epoch 157 Batch 120/173] avg loss 0.00189288, throughput 5.2663K wps
[Epoch 157 Batch 150/173] avg loss 0.0020151, throughput 5.57492K wps
Begin Testing...
[Epoch 157] train avg loss 0.00192905, dev acc 0.8206, dev avg loss 0.421029, throughput 5.42296K wps
[Epoch 158 Batch 30/173] avg loss 0.0019255, throughput 5.08485K wps
[Epoch 158 Batch 60/173] avg loss 0.00200687, throughput 5.13894K wps
[Epoch 158 Batch 90/173] avg loss 0.00177029, throughput 5.02195K wps
[Epoch 158 Batch 120/173] avg loss 0.001955, throughput 5.83298K wps
[Epoch 158 Batch 150/173] avg loss 0.00189475, throughput 5.1183K wps
Begin Testing...
[Epoch 158] train avg loss 0.00190913, dev acc 0.8227, dev avg loss 0.421573, throughput 5.22689K wps
Observed Improvement.
Begin Testing...
[Epoch 159 Batch 30/173] avg loss 0.00177132, throughput 6.16698K wps
[Epoch 159 Batch 60/173] avg loss 0.00197228, throughput 5.71086K wps
[Epoch 159 Batch 90/173] avg loss 0.001895, throughput 4.97691K wps
[Epoch 159 Batch 120/173] avg loss 0.0019591, throughput 4.78398K wps
[Epoch 159 Batch 150/173] avg loss 0.00188457, throughput 6.19629K wps
Begin Testing...
[Epoch 159] train avg loss 0.00190666, dev acc 0.8186, dev avg loss 0.42277, throughput 5.55893K wps
[Epoch 160 Batch 30/173] avg loss 0.00184861, throughput 5.8549K wps
[Epoch 160 Batch 60/173] avg loss 0.00177837, throughput 5.12101K wps
[Epoch 160 Batch 90/173] avg loss 0.00191258, throughput 5.00739K wps
[Epoch 160 Batch 120/173] avg loss 0.0018356, throughput 5.68949K wps
[Epoch 160 Batch 150/173] avg loss 0.00195082, throughput 6.53715K wps
Begin Testing...
[Epoch 160] train avg loss 0.00186546, dev acc 0.8248, dev avg loss 0.422162, throughput 5.49009K wps
Observed Improvement.
Begin Testing...
[Epoch 161 Batch 30/173] avg loss 0.00184194, throughput 5.23052K wps
[Epoch 161 Batch 60/173] avg loss 0.00183314, throughput 5.22057K wps
[Epoch 161 Batch 90/173] avg loss 0.00171785, throughput 5.61727K wps
[Epoch 161 Batch 120/173] avg loss 0.0018288, throughput 5.16589K wps
[Epoch 161 Batch 150/173] avg loss 0.00197074, throughput 5.76219K wps
Begin Testing...
[Epoch 161] train avg loss 0.00186803, dev acc 0.8206, dev avg loss 0.422245, throughput 5.45499K wps
[Epoch 162 Batch 30/173] avg loss 0.00188431, throughput 4.8273K wps
[Epoch 162 Batch 60/173] avg loss 0.00180014, throughput 5.48442K wps
[Epoch 162 Batch 90/173] avg loss 0.00184045, throughput 5.36918K wps
[Epoch 162 Batch 120/173] avg loss 0.0018453, throughput 5.03548K wps
[Epoch 162 Batch 150/173] avg loss 0.00193565, throughput 5.4878K wps
Begin Testing...
[Epoch 162] train avg loss 0.00187758, dev acc 0.8227, dev avg loss 0.422363, throughput 5.2095K wps
[Epoch 163 Batch 30/173] avg loss 0.00195162, throughput 5.69978K wps
[Epoch 163 Batch 60/173] avg loss 0.00191402, throughput 5.95812K wps
[Epoch 163 Batch 90/173] avg loss 0.00190309, throughput 6.2298K wps
[Epoch 163 Batch 120/173] avg loss 0.00191443, throughput 4.88588K wps
[Epoch 163 Batch 150/173] avg loss 0.00186353, throughput 5.06419K wps
Begin Testing...
[Epoch 163] train avg loss 0.00189401, dev acc 0.8206, dev avg loss 0.424703, throughput 5.43541K wps
[Epoch 164 Batch 30/173] avg loss 0.00186275, throughput 5.14161K wps
[Epoch 164 Batch 60/173] avg loss 0.00189361, throughput 6.58601K wps
[Epoch 164 Batch 90/173] avg loss 0.00184993, throughput 5.39983K wps
[Epoch 164 Batch 120/173] avg loss 0.00172531, throughput 5.75236K wps
[Epoch 164 Batch 150/173] avg loss 0.00178109, throughput 5.05057K wps
Begin Testing...
[Epoch 164] train avg loss 0.00181215, dev acc 0.8144, dev avg loss 0.425047, throughput 5.41744K wps
[Epoch 165 Batch 30/173] avg loss 0.0018915, throughput 5.12369K wps
[Epoch 165 Batch 60/173] avg loss 0.00183983, throughput 5.35542K wps
[Epoch 165 Batch 90/173] avg loss 0.00175324, throughput 5.39232K wps
[Epoch 165 Batch 120/173] avg loss 0.00184208, throughput 5.17633K wps
[Epoch 165 Batch 150/173] avg loss 0.00178403, throughput 5.14171K wps
Begin Testing...
[Epoch 165] train avg loss 0.00182789, dev acc 0.8175, dev avg loss 0.426176, throughput 5.2275K wps
[Epoch 166 Batch 30/173] avg loss 0.00176391, throughput 5.37338K wps
[Epoch 166 Batch 60/173] avg loss 0.00188705, throughput 4.76237K wps
[Epoch 166 Batch 90/173] avg loss 0.00173288, throughput 4.86362K wps
[Epoch 166 Batch 120/173] avg loss 0.00188628, throughput 5.65342K wps
[Epoch 166 Batch 150/173] avg loss 0.00180842, throughput 5.4211K wps
Begin Testing...
[Epoch 166] train avg loss 0.00183991, dev acc 0.8196, dev avg loss 0.426853, throughput 5.17347K wps
[Epoch 167 Batch 30/173] avg loss 0.00179138, throughput 5.3806K wps
[Epoch 167 Batch 60/173] avg loss 0.00180871, throughput 4.9446K wps
[Epoch 167 Batch 90/173] avg loss 0.00180695, throughput 5.69801K wps
[Epoch 167 Batch 120/173] avg loss 0.00178417, throughput 5.31125K wps
[Epoch 167 Batch 150/173] avg loss 0.00173496, throughput 5.597K wps
Begin Testing...
[Epoch 167] train avg loss 0.00180292, dev acc 0.8175, dev avg loss 0.424698, throughput 5.35515K wps
[Epoch 168 Batch 30/173] avg loss 0.00165782, throughput 4.76959K wps
[Epoch 168 Batch 60/173] avg loss 0.00171354, throughput 5.08368K wps
[Epoch 168 Batch 90/173] avg loss 0.00184071, throughput 5.21887K wps
[Epoch 168 Batch 120/173] avg loss 0.00162635, throughput 5.73508K wps
[Epoch 168 Batch 150/173] avg loss 0.00176885, throughput 5.85901K wps
Begin Testing...
[Epoch 168] train avg loss 0.00174535, dev acc 0.8206, dev avg loss 0.424433, throughput 5.23752K wps
[Epoch 169 Batch 30/173] avg loss 0.00179129, throughput 5.07842K wps
[Epoch 169 Batch 60/173] avg loss 0.00167561, throughput 5.18893K wps
[Epoch 169 Batch 90/173] avg loss 0.00180705, throughput 4.85168K wps
[Epoch 169 Batch 120/173] avg loss 0.00183713, throughput 5.84418K wps
[Epoch 169 Batch 150/173] avg loss 0.00193873, throughput 5.19504K wps
Begin Testing...
[Epoch 169] train avg loss 0.00179382, dev acc 0.8186, dev avg loss 0.425278, throughput 5.25608K wps
[Epoch 170 Batch 30/173] avg loss 0.00176746, throughput 4.87597K wps
[Epoch 170 Batch 60/173] avg loss 0.00167597, throughput 5.83559K wps
[Epoch 170 Batch 90/173] avg loss 0.00165742, throughput 5.43837K wps
[Epoch 170 Batch 120/173] avg loss 0.00169287, throughput 5.06847K wps
[Epoch 170 Batch 150/173] avg loss 0.00177534, throughput 4.75772K wps
Begin Testing...
[Epoch 170] train avg loss 0.00170902, dev acc 0.8165, dev avg loss 0.426569, throughput 5.17342K wps
[Epoch 171 Batch 30/173] avg loss 0.00174185, throughput 4.67339K wps
[Epoch 171 Batch 60/173] avg loss 0.00162507, throughput 5.12189K wps
[Epoch 171 Batch 90/173] avg loss 0.00168442, throughput 5.35479K wps
[Epoch 171 Batch 120/173] avg loss 0.00161393, throughput 5.36557K wps
[Epoch 171 Batch 150/173] avg loss 0.00188815, throughput 5.32683K wps
Begin Testing...
[Epoch 171] train avg loss 0.00170033, dev acc 0.8186, dev avg loss 0.426273, throughput 5.28614K wps
[Epoch 172 Batch 30/173] avg loss 0.00166099, throughput 5.26989K wps
[Epoch 172 Batch 60/173] avg loss 0.0016661, throughput 5.20782K wps
[Epoch 172 Batch 90/173] avg loss 0.00169945, throughput 5.06272K wps
[Epoch 172 Batch 120/173] avg loss 0.00175417, throughput 5.33518K wps
[Epoch 172 Batch 150/173] avg loss 0.00170551, throughput 6.27276K wps
Begin Testing...
[Epoch 172] train avg loss 0.00170413, dev acc 0.8217, dev avg loss 0.426461, throughput 5.33712K wps
[Epoch 173 Batch 30/173] avg loss 0.0016078, throughput 5.65784K wps
[Epoch 173 Batch 60/173] avg loss 0.00159724, throughput 4.83793K wps
[Epoch 173 Batch 90/173] avg loss 0.0015354, throughput 5.73489K wps
[Epoch 173 Batch 120/173] avg loss 0.00163994, throughput 5.64124K wps
[Epoch 173 Batch 150/173] avg loss 0.0017089, throughput 5.67751K wps
Begin Testing...
[Epoch 173] train avg loss 0.00162407, dev acc 0.8196, dev avg loss 0.426946, throughput 5.43955K wps
[Epoch 174 Batch 30/173] avg loss 0.00164849, throughput 4.92319K wps
[Epoch 174 Batch 60/173] avg loss 0.00168928, throughput 5.10922K wps
[Epoch 174 Batch 90/173] avg loss 0.00149305, throughput 4.88144K wps
[Epoch 174 Batch 120/173] avg loss 0.00163661, throughput 4.70303K wps
[Epoch 174 Batch 150/173] avg loss 0.00177858, throughput 5.42617K wps
Begin Testing...
[Epoch 174] train avg loss 0.00164836, dev acc 0.8154, dev avg loss 0.428621, throughput 5.02061K wps
[Epoch 175 Batch 30/173] avg loss 0.00153216, throughput 5.39988K wps
[Epoch 175 Batch 60/173] avg loss 0.00151837, throughput 5.42207K wps
[Epoch 175 Batch 90/173] avg loss 0.0017893, throughput 5.26256K wps
[Epoch 175 Batch 120/173] avg loss 0.00163467, throughput 5.67961K wps
[Epoch 175 Batch 150/173] avg loss 0.00161329, throughput 5.08578K wps
Begin Testing...
[Epoch 175] train avg loss 0.00161125, dev acc 0.8154, dev avg loss 0.429167, throughput 5.33471K wps
[Epoch 176 Batch 30/173] avg loss 0.00163271, throughput 4.86319K wps
[Epoch 176 Batch 60/173] avg loss 0.00155576, throughput 4.95259K wps
[Epoch 176 Batch 90/173] avg loss 0.0016951, throughput 5.79245K wps
[Epoch 176 Batch 120/173] avg loss 0.00168882, throughput 5.31559K wps
[Epoch 176 Batch 150/173] avg loss 0.00161833, throughput 4.95639K wps
Begin Testing...
[Epoch 176] train avg loss 0.00161973, dev acc 0.8092, dev avg loss 0.432049, throughput 5.18833K wps
[Epoch 177 Batch 30/173] avg loss 0.00142462, throughput 5.14867K wps
[Epoch 177 Batch 60/173] avg loss 0.00164611, throughput 4.98184K wps
[Epoch 177 Batch 90/173] avg loss 0.00149405, throughput 6.24751K wps
[Epoch 177 Batch 120/173] avg loss 0.00164206, throughput 5.04182K wps
[Epoch 177 Batch 150/173] avg loss 0.00162466, throughput 5.44754K wps
Begin Testing...
[Epoch 177] train avg loss 0.00158773, dev acc 0.8217, dev avg loss 0.428316, throughput 5.30035K wps
[Epoch 178 Batch 30/173] avg loss 0.00158278, throughput 4.85981K wps
[Epoch 178 Batch 60/173] avg loss 0.00157323, throughput 6.31882K wps
[Epoch 178 Batch 90/173] avg loss 0.00144006, throughput 5.15009K wps
[Epoch 178 Batch 120/173] avg loss 0.00157852, throughput 5.1878K wps
[Epoch 178 Batch 150/173] avg loss 0.001679, throughput 5.57962K wps
Begin Testing...
[Epoch 178] train avg loss 0.00158123, dev acc 0.8144, dev avg loss 0.430626, throughput 5.47024K wps
[Epoch 179 Batch 30/173] avg loss 0.00151319, throughput 5.29654K wps
[Epoch 179 Batch 60/173] avg loss 0.00154065, throughput 4.82223K wps
[Epoch 179 Batch 90/173] avg loss 0.00155439, throughput 5.6647K wps
[Epoch 179 Batch 120/173] avg loss 0.0015632, throughput 4.81843K wps
[Epoch 179 Batch 150/173] avg loss 0.00153841, throughput 4.83943K wps
Begin Testing...
[Epoch 179] train avg loss 0.00156139, dev acc 0.8186, dev avg loss 0.427809, throughput 5.19227K wps
[Epoch 180 Batch 30/173] avg loss 0.00150679, throughput 5.14394K wps
[Epoch 180 Batch 60/173] avg loss 0.00142946, throughput 4.80568K wps
[Epoch 180 Batch 90/173] avg loss 0.00161167, throughput 4.8281K wps
[Epoch 180 Batch 120/173] avg loss 0.00157896, throughput 5.89175K wps
[Epoch 180 Batch 150/173] avg loss 0.00164315, throughput 5.62664K wps
Begin Testing...
[Epoch 180] train avg loss 0.00154148, dev acc 0.8196, dev avg loss 0.429059, throughput 5.2971K wps
[Epoch 181 Batch 30/173] avg loss 0.00156556, throughput 5.12521K wps
[Epoch 181 Batch 60/173] avg loss 0.00158436, throughput 5.31555K wps
[Epoch 181 Batch 90/173] avg loss 0.00160316, throughput 5.24087K wps
[Epoch 181 Batch 120/173] avg loss 0.00162248, throughput 5.284K wps
[Epoch 181 Batch 150/173] avg loss 0.00157875, throughput 4.80181K wps
Begin Testing...
[Epoch 181] train avg loss 0.00158725, dev acc 0.8186, dev avg loss 0.43167, throughput 5.16386K wps
[Epoch 182 Batch 30/173] avg loss 0.00154067, throughput 5.30392K wps
[Epoch 182 Batch 60/173] avg loss 0.00143668, throughput 5.4379K wps
[Epoch 182 Batch 90/173] avg loss 0.00144946, throughput 5.08616K wps
[Epoch 182 Batch 120/173] avg loss 0.00171771, throughput 5.50095K wps
[Epoch 182 Batch 150/173] avg loss 0.00155762, throughput 5.7612K wps
Begin Testing...
[Epoch 182] train avg loss 0.00154682, dev acc 0.8196, dev avg loss 0.431755, throughput 5.4138K wps
[Epoch 183 Batch 30/173] avg loss 0.00157571, throughput 4.96114K wps
[Epoch 183 Batch 60/173] avg loss 0.0015078, throughput 5.61247K wps
[Epoch 183 Batch 90/173] avg loss 0.00148211, throughput 5.93961K wps
[Epoch 183 Batch 120/173] avg loss 0.00145066, throughput 5.21153K wps
[Epoch 183 Batch 150/173] avg loss 0.00145693, throughput 5.33228K wps
Begin Testing...
[Epoch 183] train avg loss 0.00150335, dev acc 0.8186, dev avg loss 0.429183, throughput 5.42287K wps
[Epoch 184 Batch 30/173] avg loss 0.00144973, throughput 4.82919K wps
[Epoch 184 Batch 60/173] avg loss 0.0015877, throughput 5.31355K wps
[Epoch 184 Batch 90/173] avg loss 0.0015743, throughput 5.49474K wps
[Epoch 184 Batch 120/173] avg loss 0.0015086, throughput 5.5504K wps
[Epoch 184 Batch 150/173] avg loss 0.00154539, throughput 5.80138K wps
Begin Testing...
[Epoch 184] train avg loss 0.00151894, dev acc 0.8175, dev avg loss 0.431123, throughput 5.33198K wps
[Epoch 185 Batch 30/173] avg loss 0.00148507, throughput 5.05467K wps
[Epoch 185 Batch 60/173] avg loss 0.00153957, throughput 5.15929K wps
[Epoch 185 Batch 90/173] avg loss 0.00152777, throughput 5.04668K wps
[Epoch 185 Batch 120/173] avg loss 0.00150497, throughput 5.43484K wps
[Epoch 185 Batch 150/173] avg loss 0.00151553, throughput 5.253K wps
Begin Testing...
[Epoch 185] train avg loss 0.00149812, dev acc 0.8165, dev avg loss 0.43034, throughput 5.1666K wps
[Epoch 186 Batch 30/173] avg loss 0.00148857, throughput 5.69154K wps
[Epoch 186 Batch 60/173] avg loss 0.0015854, throughput 5.16233K wps
[Epoch 186 Batch 90/173] avg loss 0.00143181, throughput 5.64842K wps
[Epoch 186 Batch 120/173] avg loss 0.00143888, throughput 5.52558K wps
[Epoch 186 Batch 150/173] avg loss 0.00149382, throughput 5.96053K wps
Begin Testing...
[Epoch 186] train avg loss 0.00147478, dev acc 0.8154, dev avg loss 0.434816, throughput 5.58546K wps
[Epoch 187 Batch 30/173] avg loss 0.00151895, throughput 4.78976K wps
[Epoch 187 Batch 60/173] avg loss 0.00136057, throughput 5.44838K wps
[Epoch 187 Batch 90/173] avg loss 0.00147045, throughput 6.00284K wps
[Epoch 187 Batch 120/173] avg loss 0.00156715, throughput 5.07564K wps
[Epoch 187 Batch 150/173] avg loss 0.00146569, throughput 5.52168K wps
Begin Testing...
[Epoch 187] train avg loss 0.00148105, dev acc 0.8175, dev avg loss 0.431071, throughput 5.33162K wps
[Epoch 188 Batch 30/173] avg loss 0.00142677, throughput 5.24957K wps
[Epoch 188 Batch 60/173] avg loss 0.0014973, throughput 4.9387K wps
[Epoch 188 Batch 90/173] avg loss 0.00150614, throughput 4.77238K wps
[Epoch 188 Batch 120/173] avg loss 0.00142809, throughput 6.07722K wps
[Epoch 188 Batch 150/173] avg loss 0.00143483, throughput 5.32699K wps
Begin Testing...
[Epoch 188] train avg loss 0.0014636, dev acc 0.8186, dev avg loss 0.43225, throughput 5.20378K wps
[Epoch 189 Batch 30/173] avg loss 0.00141121, throughput 5.20989K wps
[Epoch 189 Batch 60/173] avg loss 0.00129685, throughput 6.27918K wps
[Epoch 189 Batch 90/173] avg loss 0.00143343, throughput 5.14281K wps
[Epoch 189 Batch 120/173] avg loss 0.00136044, throughput 4.79804K wps
[Epoch 189 Batch 150/173] avg loss 0.00135315, throughput 5.395K wps
Begin Testing...
[Epoch 189] train avg loss 0.00137421, dev acc 0.8186, dev avg loss 0.431103, throughput 5.25318K wps
[Epoch 190 Batch 30/173] avg loss 0.00148326, throughput 5.53767K wps
[Epoch 190 Batch 60/173] avg loss 0.00142786, throughput 5.04726K wps
[Epoch 190 Batch 90/173] avg loss 0.00142802, throughput 5.66594K wps
[Epoch 190 Batch 120/173] avg loss 0.00153942, throughput 5.68994K wps
[Epoch 190 Batch 150/173] avg loss 0.00146919, throughput 5.9914K wps
Begin Testing...
[Epoch 190] train avg loss 0.00147695, dev acc 0.8196, dev avg loss 0.43122, throughput 5.48488K wps
[Epoch 191 Batch 30/173] avg loss 0.00145398, throughput 4.99569K wps
[Epoch 191 Batch 60/173] avg loss 0.00130077, throughput 4.90465K wps
[Epoch 191 Batch 90/173] avg loss 0.00149089, throughput 5.19353K wps
[Epoch 191 Batch 120/173] avg loss 0.00138983, throughput 5.30346K wps
[Epoch 191 Batch 150/173] avg loss 0.00149023, throughput 5.11958K wps
Begin Testing...
[Epoch 191] train avg loss 0.00145121, dev acc 0.8196, dev avg loss 0.430997, throughput 5.17595K wps
[Epoch 192 Batch 30/173] avg loss 0.00158139, throughput 5.35043K wps
[Epoch 192 Batch 60/173] avg loss 0.00128341, throughput 4.91682K wps
[Epoch 192 Batch 90/173] avg loss 0.00142165, throughput 5.18407K wps
[Epoch 192 Batch 120/173] avg loss 0.0013503, throughput 6.19451K wps
[Epoch 192 Batch 150/173] avg loss 0.00126732, throughput 5.89888K wps
Begin Testing...
[Epoch 192] train avg loss 0.00139273, dev acc 0.8175, dev avg loss 0.436912, throughput 5.44308K wps
[Epoch 193 Batch 30/173] avg loss 0.00140024, throughput 4.88259K wps
[Epoch 193 Batch 60/173] avg loss 0.00131633, throughput 4.94976K wps
[Epoch 193 Batch 90/173] avg loss 0.00140594, throughput 4.91566K wps
[Epoch 193 Batch 120/173] avg loss 0.00142917, throughput 5.85485K wps
[Epoch 193 Batch 150/173] avg loss 0.00146111, throughput 5.44953K wps
Begin Testing...
[Epoch 193] train avg loss 0.00141025, dev acc 0.8248, dev avg loss 0.431506, throughput 5.21034K wps
Observed Improvement.
Begin Testing...
[Epoch 194 Batch 30/173] avg loss 0.00132037, throughput 5.70749K wps
[Epoch 194 Batch 60/173] avg loss 0.00136898, throughput 6.43274K wps
[Epoch 194 Batch 90/173] avg loss 0.00139882, throughput 4.73819K wps
[Epoch 194 Batch 120/173] avg loss 0.00134966, throughput 5.70421K wps
[Epoch 194 Batch 150/173] avg loss 0.00155187, throughput 5.76657K wps
Begin Testing...
[Epoch 194] train avg loss 0.00141775, dev acc 0.8217, dev avg loss 0.432804, throughput 5.47634K wps
[Epoch 195 Batch 30/173] avg loss 0.00135958, throughput 4.94375K wps
[Epoch 195 Batch 60/173] avg loss 0.00127981, throughput 4.70978K wps
[Epoch 195 Batch 90/173] avg loss 0.00143102, throughput 4.8958K wps
[Epoch 195 Batch 120/173] avg loss 0.00144705, throughput 5.33907K wps
[Epoch 195 Batch 150/173] avg loss 0.00131927, throughput 5.38698K wps
Begin Testing...
[Epoch 195] train avg loss 0.00136669, dev acc 0.8196, dev avg loss 0.433184, throughput 5.11047K wps
[Epoch 196 Batch 30/173] avg loss 0.00140855, throughput 5.04727K wps
[Epoch 196 Batch 60/173] avg loss 0.00142485, throughput 5.0088K wps
[Epoch 196 Batch 90/173] avg loss 0.0014713, throughput 5.21396K wps
[Epoch 196 Batch 120/173] avg loss 0.00142517, throughput 5.56042K wps
[Epoch 196 Batch 150/173] avg loss 0.00124247, throughput 6.43777K wps
Begin Testing...
[Epoch 196] train avg loss 0.00139429, dev acc 0.8206, dev avg loss 0.433365, throughput 5.33832K wps
[Epoch 197 Batch 30/173] avg loss 0.00126379, throughput 5.807K wps
[Epoch 197 Batch 60/173] avg loss 0.00147138, throughput 5.65975K wps
[Epoch 197 Batch 90/173] avg loss 0.00125821, throughput 5.0921K wps
[Epoch 197 Batch 120/173] avg loss 0.00141466, throughput 5.05657K wps
[Epoch 197 Batch 150/173] avg loss 0.0013624, throughput 4.76973K wps
Begin Testing...
[Epoch 197] train avg loss 0.00134195, dev acc 0.8227, dev avg loss 0.434207, throughput 5.16022K wps
[Epoch 198 Batch 30/173] avg loss 0.00126221, throughput 5.41607K wps
[Epoch 198 Batch 60/173] avg loss 0.00133, throughput 5.50262K wps
[Epoch 198 Batch 90/173] avg loss 0.00132793, throughput 5.52602K wps
[Epoch 198 Batch 120/173] avg loss 0.00141748, throughput 4.94242K wps
[Epoch 198 Batch 150/173] avg loss 0.00131182, throughput 5.72538K wps
Begin Testing...
[Epoch 198] train avg loss 0.00134443, dev acc 0.8144, dev avg loss 0.439177, throughput 5.50393K wps
[Epoch 199 Batch 30/173] avg loss 0.00124944, throughput 4.77966K wps
[Epoch 199 Batch 60/173] avg loss 0.0013882, throughput 5.16342K wps
[Epoch 199 Batch 90/173] avg loss 0.00144397, throughput 4.88892K wps
[Epoch 199 Batch 120/173] avg loss 0.00135948, throughput 5.36629K wps
[Epoch 199 Batch 150/173] avg loss 0.00137719, throughput 5.22674K wps
Begin Testing...
[Epoch 199] train avg loss 0.00134483, dev acc 0.8206, dev avg loss 0.435255, throughput 5.05914K wps
Test loss 0.478494, test acc 0.7908
Total time cost 388.05s
[Epoch 0 Batch 30/173] avg loss 0.0139488, throughput 4.50199K wps
[Epoch 0 Batch 60/173] avg loss 0.014038, throughput 5.01458K wps
[Epoch 0 Batch 90/173] avg loss 0.0139303, throughput 5.05564K wps
[Epoch 0 Batch 120/173] avg loss 0.0138152, throughput 5.27663K wps
[Epoch 0 Batch 150/173] avg loss 0.0139544, throughput 4.83131K wps
Begin Testing...
[Epoch 0] train avg loss 0.0139523, dev acc 0.5985, dev avg loss 0.684474, throughput 4.95361K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/173] avg loss 0.013827, throughput 4.96K wps
[Epoch 1 Batch 60/173] avg loss 0.0138111, throughput 4.99676K wps
[Epoch 1 Batch 90/173] avg loss 0.0138112, throughput 4.95399K wps
[Epoch 1 Batch 120/173] avg loss 0.0138312, throughput 4.98569K wps
[Epoch 1 Batch 150/173] avg loss 0.0137427, throughput 4.88074K wps
Begin Testing...
[Epoch 1] train avg loss 0.0138043, dev acc 0.6496, dev avg loss 0.67907, throughput 4.96265K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/173] avg loss 0.013699, throughput 5.2219K wps
[Epoch 2 Batch 60/173] avg loss 0.0136751, throughput 5.12208K wps
[Epoch 2 Batch 90/173] avg loss 0.0135302, throughput 5.03324K wps
[Epoch 2 Batch 120/173] avg loss 0.013757, throughput 5.33381K wps
[Epoch 2 Batch 150/173] avg loss 0.0135567, throughput 5.73246K wps
Begin Testing...
[Epoch 2] train avg loss 0.013661, dev acc 0.6809, dev avg loss 0.672588, throughput 5.28972K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/173] avg loss 0.0135487, throughput 6.0497K wps
[Epoch 3 Batch 60/173] avg loss 0.0135109, throughput 5.01978K wps
[Epoch 3 Batch 90/173] avg loss 0.0135459, throughput 5.5285K wps
[Epoch 3 Batch 120/173] avg loss 0.0135353, throughput 5.09731K wps
[Epoch 3 Batch 150/173] avg loss 0.0135584, throughput 5.7758K wps
Begin Testing...
[Epoch 3] train avg loss 0.0135525, dev acc 0.6820, dev avg loss 0.667058, throughput 5.58948K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/173] avg loss 0.0133829, throughput 5.15306K wps
[Epoch 4 Batch 60/173] avg loss 0.0133012, throughput 5.55684K wps
[Epoch 4 Batch 90/173] avg loss 0.0133589, throughput 4.95485K wps
[Epoch 4 Batch 120/173] avg loss 0.0132306, throughput 5.39848K wps
[Epoch 4 Batch 150/173] avg loss 0.0133564, throughput 5.3604K wps
Begin Testing...
[Epoch 4] train avg loss 0.0133501, dev acc 0.7059, dev avg loss 0.660163, throughput 5.21373K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/173] avg loss 0.0132081, throughput 5.03646K wps
[Epoch 5 Batch 60/173] avg loss 0.0132422, throughput 4.92589K wps
[Epoch 5 Batch 90/173] avg loss 0.0131788, throughput 5.71602K wps
[Epoch 5 Batch 120/173] avg loss 0.0132677, throughput 5.47651K wps
[Epoch 5 Batch 150/173] avg loss 0.0132439, throughput 5.26048K wps
Begin Testing...
[Epoch 5] train avg loss 0.0132394, dev acc 0.7132, dev avg loss 0.652701, throughput 5.33363K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/173] avg loss 0.0130773, throughput 5.10399K wps
[Epoch 6 Batch 60/173] avg loss 0.012912, throughput 5.15831K wps
[Epoch 6 Batch 90/173] avg loss 0.0130293, throughput 5.18603K wps
[Epoch 6 Batch 120/173] avg loss 0.0130413, throughput 5.28304K wps
[Epoch 6 Batch 150/173] avg loss 0.0130978, throughput 5.78432K wps
Begin Testing...
[Epoch 6] train avg loss 0.0130407, dev acc 0.7164, dev avg loss 0.645005, throughput 5.22618K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/173] avg loss 0.0129852, throughput 5.10464K wps
[Epoch 7 Batch 60/173] avg loss 0.012913, throughput 4.77389K wps
[Epoch 7 Batch 90/173] avg loss 0.012964, throughput 4.86041K wps
[Epoch 7 Batch 120/173] avg loss 0.0127683, throughput 4.96353K wps
[Epoch 7 Batch 150/173] avg loss 0.0127553, throughput 5.04811K wps
Begin Testing...
[Epoch 7] train avg loss 0.0128815, dev acc 0.7205, dev avg loss 0.63656, throughput 5.11051K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/173] avg loss 0.0128467, throughput 6.0153K wps
[Epoch 8 Batch 60/173] avg loss 0.0127821, throughput 4.94573K wps
[Epoch 8 Batch 90/173] avg loss 0.0126555, throughput 5.05327K wps
[Epoch 8 Batch 120/173] avg loss 0.012708, throughput 5.09024K wps
[Epoch 8 Batch 150/173] avg loss 0.0125712, throughput 4.92899K wps
Begin Testing...
[Epoch 8] train avg loss 0.0127201, dev acc 0.6882, dev avg loss 0.631292, throughput 5.31558K wps
[Epoch 9 Batch 30/173] avg loss 0.0124553, throughput 5.19505K wps
[Epoch 9 Batch 60/173] avg loss 0.0123918, throughput 5.12642K wps
[Epoch 9 Batch 90/173] avg loss 0.0123801, throughput 4.86492K wps
[Epoch 9 Batch 120/173] avg loss 0.0123692, throughput 4.83219K wps
[Epoch 9 Batch 150/173] avg loss 0.0125797, throughput 5.96995K wps
Begin Testing...
[Epoch 9] train avg loss 0.0124681, dev acc 0.7331, dev avg loss 0.618614, throughput 5.16746K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/173] avg loss 0.0122592, throughput 4.96895K wps
[Epoch 10 Batch 60/173] avg loss 0.0124876, throughput 5.79697K wps
[Epoch 10 Batch 90/173] avg loss 0.0122955, throughput 5.65538K wps
[Epoch 10 Batch 120/173] avg loss 0.0122788, throughput 5.80171K wps
[Epoch 10 Batch 150/173] avg loss 0.0122934, throughput 5.06951K wps
Begin Testing...
[Epoch 10] train avg loss 0.0123127, dev acc 0.7237, dev avg loss 0.610631, throughput 5.44363K wps
[Epoch 11 Batch 30/173] avg loss 0.0119742, throughput 5.74079K wps
[Epoch 11 Batch 60/173] avg loss 0.0122636, throughput 5.99675K wps
[Epoch 11 Batch 90/173] avg loss 0.0120373, throughput 5.71757K wps
[Epoch 11 Batch 120/173] avg loss 0.0121453, throughput 4.72687K wps
[Epoch 11 Batch 150/173] avg loss 0.0122317, throughput 5.49392K wps
Begin Testing...
[Epoch 11] train avg loss 0.012142, dev acc 0.7351, dev avg loss 0.599124, throughput 5.47316K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/173] avg loss 0.0118735, throughput 5.21881K wps
[Epoch 12 Batch 60/173] avg loss 0.0118361, throughput 5.57977K wps
[Epoch 12 Batch 90/173] avg loss 0.0120742, throughput 5.20087K wps
[Epoch 12 Batch 120/173] avg loss 0.0120281, throughput 4.7761K wps
[Epoch 12 Batch 150/173] avg loss 0.0115741, throughput 5.5261K wps
Begin Testing...
[Epoch 12] train avg loss 0.0118938, dev acc 0.7424, dev avg loss 0.5889, throughput 5.25029K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/173] avg loss 0.0116897, throughput 4.7443K wps
[Epoch 13 Batch 60/173] avg loss 0.0117504, throughput 5.40194K wps
[Epoch 13 Batch 90/173] avg loss 0.0115262, throughput 5.15184K wps
[Epoch 13 Batch 120/173] avg loss 0.011666, throughput 5.19003K wps
[Epoch 13 Batch 150/173] avg loss 0.011552, throughput 5.15274K wps
Begin Testing...
[Epoch 13] train avg loss 0.0116554, dev acc 0.7404, dev avg loss 0.57921, throughput 5.128K wps
[Epoch 14 Batch 30/173] avg loss 0.0114539, throughput 4.73133K wps
[Epoch 14 Batch 60/173] avg loss 0.0113446, throughput 5.20816K wps
[Epoch 14 Batch 90/173] avg loss 0.0114505, throughput 4.92139K wps
[Epoch 14 Batch 120/173] avg loss 0.0114513, throughput 5.18319K wps
[Epoch 14 Batch 150/173] avg loss 0.0114484, throughput 5.56402K wps
Begin Testing...
[Epoch 14] train avg loss 0.0114474, dev acc 0.7466, dev avg loss 0.570405, throughput 5.0768K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/173] avg loss 0.011155, throughput 5.01744K wps
[Epoch 15 Batch 60/173] avg loss 0.0112333, throughput 5.45105K wps
[Epoch 15 Batch 90/173] avg loss 0.011283, throughput 6.00554K wps
[Epoch 15 Batch 120/173] avg loss 0.0112457, throughput 5.02888K wps
[Epoch 15 Batch 150/173] avg loss 0.0110689, throughput 5.47053K wps
Begin Testing...
[Epoch 15] train avg loss 0.0112071, dev acc 0.7602, dev avg loss 0.56042, throughput 5.30012K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/173] avg loss 0.0110899, throughput 5.6532K wps
[Epoch 16 Batch 60/173] avg loss 0.011105, throughput 4.8996K wps
[Epoch 16 Batch 90/173] avg loss 0.0107795, throughput 5.3029K wps
[Epoch 16 Batch 120/173] avg loss 0.0109797, throughput 5.07669K wps
[Epoch 16 Batch 150/173] avg loss 0.01088, throughput 5.09273K wps
Begin Testing...
[Epoch 16] train avg loss 0.0109635, dev acc 0.7633, dev avg loss 0.550517, throughput 5.21669K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/173] avg loss 0.0107839, throughput 5.3357K wps
[Epoch 17 Batch 60/173] avg loss 0.0106729, throughput 4.85392K wps
[Epoch 17 Batch 90/173] avg loss 0.0110781, throughput 5.33027K wps
[Epoch 17 Batch 120/173] avg loss 0.0107736, throughput 5.08985K wps
[Epoch 17 Batch 150/173] avg loss 0.0105851, throughput 5.80893K wps
Begin Testing...
[Epoch 17] train avg loss 0.0107972, dev acc 0.7675, dev avg loss 0.541512, throughput 5.303K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/173] avg loss 0.0106453, throughput 5.13758K wps
[Epoch 18 Batch 60/173] avg loss 0.0106784, throughput 5.39738K wps
[Epoch 18 Batch 90/173] avg loss 0.010721, throughput 5.83097K wps
[Epoch 18 Batch 120/173] avg loss 0.0107505, throughput 5.34443K wps
[Epoch 18 Batch 150/173] avg loss 0.0105653, throughput 4.933K wps
Begin Testing...
[Epoch 18] train avg loss 0.0106411, dev acc 0.7716, dev avg loss 0.534089, throughput 5.23277K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/173] avg loss 0.0104318, throughput 5.54142K wps
[Epoch 19 Batch 60/173] avg loss 0.0102562, throughput 4.8003K wps
[Epoch 19 Batch 90/173] avg loss 0.0105468, throughput 4.97151K wps
[Epoch 19 Batch 120/173] avg loss 0.0105529, throughput 5.04179K wps
[Epoch 19 Batch 150/173] avg loss 0.0105833, throughput 5.48367K wps
Begin Testing...
[Epoch 19] train avg loss 0.0104452, dev acc 0.7706, dev avg loss 0.525916, throughput 5.29102K wps
[Epoch 20 Batch 30/173] avg loss 0.010422, throughput 4.76802K wps
[Epoch 20 Batch 60/173] avg loss 0.0105678, throughput 5.7934K wps
[Epoch 20 Batch 90/173] avg loss 0.0102076, throughput 6.46959K wps
[Epoch 20 Batch 120/173] avg loss 0.0103209, throughput 4.89931K wps
[Epoch 20 Batch 150/173] avg loss 0.0100197, throughput 5.21978K wps
Begin Testing...
[Epoch 20] train avg loss 0.010258, dev acc 0.7664, dev avg loss 0.520301, throughput 5.4045K wps
[Epoch 21 Batch 30/173] avg loss 0.0102739, throughput 5.53888K wps
[Epoch 21 Batch 60/173] avg loss 0.00993842, throughput 5.49072K wps
[Epoch 21 Batch 90/173] avg loss 0.00987659, throughput 4.79493K wps
[Epoch 21 Batch 120/173] avg loss 0.00974987, throughput 5.91372K wps
[Epoch 21 Batch 150/173] avg loss 0.0102141, throughput 5.46109K wps
Begin Testing...
[Epoch 21] train avg loss 0.0100228, dev acc 0.7737, dev avg loss 0.512176, throughput 5.52364K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/173] avg loss 0.00975597, throughput 6.10741K wps
[Epoch 22 Batch 60/173] avg loss 0.0100098, throughput 5.2644K wps
[Epoch 22 Batch 90/173] avg loss 0.00987346, throughput 5.32137K wps
[Epoch 22 Batch 120/173] avg loss 0.00967128, throughput 5.08344K wps
[Epoch 22 Batch 150/173] avg loss 0.00984627, throughput 4.76588K wps
Begin Testing...
[Epoch 22] train avg loss 0.00991235, dev acc 0.7685, dev avg loss 0.50797, throughput 5.20689K wps
[Epoch 23 Batch 30/173] avg loss 0.00975013, throughput 5.22627K wps
[Epoch 23 Batch 60/173] avg loss 0.00970536, throughput 5.27058K wps
[Epoch 23 Batch 90/173] avg loss 0.00985269, throughput 5.27751K wps
[Epoch 23 Batch 120/173] avg loss 0.00946226, throughput 4.96378K wps
[Epoch 23 Batch 150/173] avg loss 0.00943081, throughput 5.1372K wps
Begin Testing...
[Epoch 23] train avg loss 0.00973146, dev acc 0.7779, dev avg loss 0.50206, throughput 5.16003K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/173] avg loss 0.00972942, throughput 4.75505K wps
[Epoch 24 Batch 60/173] avg loss 0.00931387, throughput 5.73554K wps
[Epoch 24 Batch 90/173] avg loss 0.00969304, throughput 5.3327K wps
[Epoch 24 Batch 120/173] avg loss 0.00998443, throughput 4.93327K wps
[Epoch 24 Batch 150/173] avg loss 0.00972738, throughput 6.01222K wps
Begin Testing...
[Epoch 24] train avg loss 0.00968105, dev acc 0.7737, dev avg loss 0.496608, throughput 5.39087K wps
[Epoch 25 Batch 30/173] avg loss 0.00961408, throughput 5.22729K wps
[Epoch 25 Batch 60/173] avg loss 0.00954988, throughput 5.09389K wps
[Epoch 25 Batch 90/173] avg loss 0.00940541, throughput 5.42417K wps
[Epoch 25 Batch 120/173] avg loss 0.00935349, throughput 4.76995K wps
[Epoch 25 Batch 150/173] avg loss 0.00947463, throughput 4.71355K wps
Begin Testing...
[Epoch 25] train avg loss 0.00949775, dev acc 0.7685, dev avg loss 0.494894, throughput 5.15134K wps
[Epoch 26 Batch 30/173] avg loss 0.00926849, throughput 5.43712K wps
[Epoch 26 Batch 60/173] avg loss 0.00953172, throughput 4.88782K wps
[Epoch 26 Batch 90/173] avg loss 0.00938667, throughput 5.35774K wps
[Epoch 26 Batch 120/173] avg loss 0.00924312, throughput 5.02867K wps
[Epoch 26 Batch 150/173] avg loss 0.00930517, throughput 5.76982K wps
Begin Testing...
[Epoch 26] train avg loss 0.00937238, dev acc 0.7810, dev avg loss 0.488326, throughput 5.25054K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/173] avg loss 0.00928284, throughput 5.00727K wps
[Epoch 27 Batch 60/173] avg loss 0.00909392, throughput 4.69654K wps
[Epoch 27 Batch 90/173] avg loss 0.00959026, throughput 5.12199K wps
[Epoch 27 Batch 120/173] avg loss 0.00927087, throughput 5.27361K wps
[Epoch 27 Batch 150/173] avg loss 0.00899555, throughput 4.78795K wps
Begin Testing...
[Epoch 27] train avg loss 0.00923224, dev acc 0.7737, dev avg loss 0.485333, throughput 4.93459K wps
[Epoch 28 Batch 30/173] avg loss 0.00897291, throughput 5.31958K wps
[Epoch 28 Batch 60/173] avg loss 0.00895292, throughput 4.91745K wps
[Epoch 28 Batch 90/173] avg loss 0.00916339, throughput 5.30626K wps
[Epoch 28 Batch 120/173] avg loss 0.00916127, throughput 5.14848K wps
[Epoch 28 Batch 150/173] avg loss 0.00912598, throughput 5.44698K wps
Begin Testing...
[Epoch 28] train avg loss 0.00910489, dev acc 0.7789, dev avg loss 0.481285, throughput 5.24367K wps
[Epoch 29 Batch 30/173] avg loss 0.00880392, throughput 5.33504K wps
[Epoch 29 Batch 60/173] avg loss 0.00904666, throughput 5.60525K wps
[Epoch 29 Batch 90/173] avg loss 0.00851584, throughput 5.53456K wps
[Epoch 29 Batch 120/173] avg loss 0.0093412, throughput 5.61463K wps
[Epoch 29 Batch 150/173] avg loss 0.009117, throughput 5.42915K wps
Begin Testing...
[Epoch 29] train avg loss 0.00903948, dev acc 0.7800, dev avg loss 0.478622, throughput 5.39149K wps
[Epoch 30 Batch 30/173] avg loss 0.00884465, throughput 5.03318K wps
[Epoch 30 Batch 60/173] avg loss 0.00905322, throughput 4.79244K wps
[Epoch 30 Batch 90/173] avg loss 0.00913189, throughput 5.12232K wps
[Epoch 30 Batch 120/173] avg loss 0.00900248, throughput 5.17086K wps
[Epoch 30 Batch 150/173] avg loss 0.00857443, throughput 5.30117K wps
Begin Testing...
[Epoch 30] train avg loss 0.0089683, dev acc 0.7623, dev avg loss 0.478718, throughput 5.02532K wps
[Epoch 31 Batch 30/173] avg loss 0.00882774, throughput 5.4023K wps
[Epoch 31 Batch 60/173] avg loss 0.00887008, throughput 4.97351K wps
[Epoch 31 Batch 90/173] avg loss 0.00864032, throughput 5.54217K wps
[Epoch 31 Batch 120/173] avg loss 0.0091355, throughput 5.5164K wps
[Epoch 31 Batch 150/173] avg loss 0.00841121, throughput 5.73981K wps
Begin Testing...
[Epoch 31] train avg loss 0.00880613, dev acc 0.7758, dev avg loss 0.474595, throughput 5.31798K wps
[Epoch 32 Batch 30/173] avg loss 0.0086492, throughput 4.70581K wps
[Epoch 32 Batch 60/173] avg loss 0.00827034, throughput 5.02099K wps
[Epoch 32 Batch 90/173] avg loss 0.00851501, throughput 5.67329K wps
[Epoch 32 Batch 120/173] avg loss 0.00860103, throughput 5.55106K wps
[Epoch 32 Batch 150/173] avg loss 0.00879249, throughput 5.00583K wps
Begin Testing...
[Epoch 32] train avg loss 0.00860789, dev acc 0.7716, dev avg loss 0.472491, throughput 5.14567K wps
[Epoch 33 Batch 30/173] avg loss 0.00848972, throughput 4.86438K wps
[Epoch 33 Batch 60/173] avg loss 0.00852805, throughput 5.02214K wps
[Epoch 33 Batch 90/173] avg loss 0.00883718, throughput 5.4095K wps
[Epoch 33 Batch 120/173] avg loss 0.00849782, throughput 4.91691K wps
[Epoch 33 Batch 150/173] avg loss 0.00836777, throughput 5.49028K wps
Begin Testing...
[Epoch 33] train avg loss 0.00852656, dev acc 0.7748, dev avg loss 0.46895, throughput 5.21126K wps
[Epoch 34 Batch 30/173] avg loss 0.00844546, throughput 5.25692K wps
[Epoch 34 Batch 60/173] avg loss 0.00856862, throughput 5.20767K wps
[Epoch 34 Batch 90/173] avg loss 0.00860515, throughput 5.0241K wps
[Epoch 34 Batch 120/173] avg loss 0.00816304, throughput 5.44326K wps
[Epoch 34 Batch 150/173] avg loss 0.00812941, throughput 5.32859K wps
Begin Testing...
[Epoch 34] train avg loss 0.00844063, dev acc 0.7779, dev avg loss 0.465843, throughput 5.22726K wps
[Epoch 35 Batch 30/173] avg loss 0.00842177, throughput 4.71028K wps
[Epoch 35 Batch 60/173] avg loss 0.00837883, throughput 5.86435K wps
[Epoch 35 Batch 90/173] avg loss 0.00853048, throughput 6.51554K wps
[Epoch 35 Batch 120/173] avg loss 0.00810275, throughput 5.54021K wps
[Epoch 35 Batch 150/173] avg loss 0.00859221, throughput 5.31864K wps
Begin Testing...
[Epoch 35] train avg loss 0.00835496, dev acc 0.7810, dev avg loss 0.463577, throughput 5.50655K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/173] avg loss 0.00833864, throughput 6.25589K wps
[Epoch 36 Batch 60/173] avg loss 0.00823669, throughput 6.28847K wps
[Epoch 36 Batch 90/173] avg loss 0.00829498, throughput 5.26693K wps
[Epoch 36 Batch 120/173] avg loss 0.00808973, throughput 4.84726K wps
[Epoch 36 Batch 150/173] avg loss 0.00801433, throughput 5.08836K wps
Begin Testing...
[Epoch 36] train avg loss 0.00831507, dev acc 0.7831, dev avg loss 0.462674, throughput 5.46573K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/173] avg loss 0.00803638, throughput 5.33064K wps
[Epoch 37 Batch 60/173] avg loss 0.00792476, throughput 5.1026K wps
[Epoch 37 Batch 90/173] avg loss 0.00790008, throughput 5.01888K wps
[Epoch 37 Batch 120/173] avg loss 0.00812842, throughput 5.40784K wps
[Epoch 37 Batch 150/173] avg loss 0.00845738, throughput 4.75967K wps
Begin Testing...
[Epoch 37] train avg loss 0.00814408, dev acc 0.7789, dev avg loss 0.461185, throughput 5.15085K wps
[Epoch 38 Batch 30/173] avg loss 0.00831746, throughput 4.93704K wps
[Epoch 38 Batch 60/173] avg loss 0.00831661, throughput 5.12182K wps
[Epoch 38 Batch 90/173] avg loss 0.00800559, throughput 5.39462K wps
[Epoch 38 Batch 120/173] avg loss 0.0079098, throughput 5.60445K wps
[Epoch 38 Batch 150/173] avg loss 0.00788043, throughput 4.88707K wps
Begin Testing...
[Epoch 38] train avg loss 0.00815891, dev acc 0.7810, dev avg loss 0.456591, throughput 5.16415K wps
[Epoch 39 Batch 30/173] avg loss 0.00793533, throughput 4.86358K wps
[Epoch 39 Batch 60/173] avg loss 0.00786076, throughput 5.05653K wps
[Epoch 39 Batch 90/173] avg loss 0.00795921, throughput 5.70159K wps
[Epoch 39 Batch 120/173] avg loss 0.0082513, throughput 5.45153K wps
[Epoch 39 Batch 150/173] avg loss 0.0079253, throughput 5.58942K wps
Begin Testing...
[Epoch 39] train avg loss 0.00796423, dev acc 0.7831, dev avg loss 0.45724, throughput 5.24373K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/173] avg loss 0.00810932, throughput 4.98574K wps
[Epoch 40 Batch 60/173] avg loss 0.00786716, throughput 4.69969K wps
[Epoch 40 Batch 90/173] avg loss 0.00801599, throughput 5.22113K wps
[Epoch 40 Batch 120/173] avg loss 0.00791708, throughput 5.16666K wps
[Epoch 40 Batch 150/173] avg loss 0.00792645, throughput 4.93577K wps
Begin Testing...
[Epoch 40] train avg loss 0.00796071, dev acc 0.7842, dev avg loss 0.453627, throughput 5.03288K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/173] avg loss 0.00780769, throughput 5.23271K wps
[Epoch 41 Batch 60/173] avg loss 0.00812624, throughput 5.76368K wps
[Epoch 41 Batch 90/173] avg loss 0.00763975, throughput 5.62821K wps
[Epoch 41 Batch 120/173] avg loss 0.00795194, throughput 5.04681K wps
[Epoch 41 Batch 150/173] avg loss 0.00760103, throughput 5.24532K wps
Begin Testing...
[Epoch 41] train avg loss 0.00784931, dev acc 0.7852, dev avg loss 0.452142, throughput 5.4929K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/173] avg loss 0.0075691, throughput 6.15305K wps
[Epoch 42 Batch 60/173] avg loss 0.0077757, throughput 5.43101K wps
[Epoch 42 Batch 90/173] avg loss 0.00745902, throughput 5.75798K wps
[Epoch 42 Batch 120/173] avg loss 0.00769083, throughput 5.71342K wps
[Epoch 42 Batch 150/173] avg loss 0.00787255, throughput 5.52335K wps
Begin Testing...
[Epoch 42] train avg loss 0.00771142, dev acc 0.7789, dev avg loss 0.454847, throughput 5.67073K wps
[Epoch 43 Batch 30/173] avg loss 0.00782744, throughput 4.72641K wps
[Epoch 43 Batch 60/173] avg loss 0.00760378, throughput 4.67601K wps
[Epoch 43 Batch 90/173] avg loss 0.00763019, throughput 4.85695K wps
[Epoch 43 Batch 120/173] avg loss 0.00760057, throughput 5.64596K wps
[Epoch 43 Batch 150/173] avg loss 0.00770805, throughput 6.00004K wps
Begin Testing...
[Epoch 43] train avg loss 0.00768559, dev acc 0.7862, dev avg loss 0.448711, throughput 5.09916K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/173] avg loss 0.00759643, throughput 5.92579K wps
[Epoch 44 Batch 60/173] avg loss 0.0073983, throughput 4.87506K wps
[Epoch 44 Batch 90/173] avg loss 0.00763372, throughput 4.9287K wps
[Epoch 44 Batch 120/173] avg loss 0.00721404, throughput 5.24304K wps
[Epoch 44 Batch 150/173] avg loss 0.00785114, throughput 5.08794K wps
Begin Testing...
[Epoch 44] train avg loss 0.00753373, dev acc 0.7883, dev avg loss 0.447565, throughput 5.12493K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/173] avg loss 0.00726341, throughput 4.87587K wps
[Epoch 45 Batch 60/173] avg loss 0.0076489, throughput 5.43774K wps
[Epoch 45 Batch 90/173] avg loss 0.00740487, throughput 5.44682K wps
[Epoch 45 Batch 120/173] avg loss 0.00763899, throughput 6.09191K wps
[Epoch 45 Batch 150/173] avg loss 0.00755586, throughput 5.63798K wps
Begin Testing...
[Epoch 45] train avg loss 0.00751363, dev acc 0.7925, dev avg loss 0.446934, throughput 5.5892K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/173] avg loss 0.00724837, throughput 5.39435K wps
[Epoch 46 Batch 60/173] avg loss 0.00743692, throughput 5.03909K wps
[Epoch 46 Batch 90/173] avg loss 0.00731779, throughput 5.39071K wps
[Epoch 46 Batch 120/173] avg loss 0.00735599, throughput 4.85563K wps
[Epoch 46 Batch 150/173] avg loss 0.00730555, throughput 5.10632K wps
Begin Testing...
[Epoch 46] train avg loss 0.00740584, dev acc 0.7894, dev avg loss 0.445736, throughput 5.24633K wps
[Epoch 47 Batch 30/173] avg loss 0.00788621, throughput 6.00378K wps
[Epoch 47 Batch 60/173] avg loss 0.00741148, throughput 5.33377K wps
[Epoch 47 Batch 90/173] avg loss 0.00717844, throughput 5.15794K wps
[Epoch 47 Batch 120/173] avg loss 0.00736097, throughput 5.12047K wps
[Epoch 47 Batch 150/173] avg loss 0.00688404, throughput 4.91012K wps
Begin Testing...
[Epoch 47] train avg loss 0.00734194, dev acc 0.7862, dev avg loss 0.44348, throughput 5.40316K wps
[Epoch 48 Batch 30/173] avg loss 0.00729473, throughput 5.41599K wps
[Epoch 48 Batch 60/173] avg loss 0.00747261, throughput 4.9641K wps
[Epoch 48 Batch 90/173] avg loss 0.00685175, throughput 5.38719K wps
[Epoch 48 Batch 120/173] avg loss 0.00736881, throughput 4.98717K wps
[Epoch 48 Batch 150/173] avg loss 0.007124, throughput 5.53898K wps
Begin Testing...
[Epoch 48] train avg loss 0.00727301, dev acc 0.7894, dev avg loss 0.443372, throughput 5.22792K wps
[Epoch 49 Batch 30/173] avg loss 0.0069245, throughput 4.94762K wps
[Epoch 49 Batch 60/173] avg loss 0.00734754, throughput 4.73399K wps
[Epoch 49 Batch 90/173] avg loss 0.00680082, throughput 5.38252K wps
[Epoch 49 Batch 120/173] avg loss 0.00715002, throughput 5.50217K wps
[Epoch 49 Batch 150/173] avg loss 0.00737552, throughput 5.36275K wps
Begin Testing...
[Epoch 49] train avg loss 0.00715702, dev acc 0.7831, dev avg loss 0.449532, throughput 5.13453K wps
[Epoch 50 Batch 30/173] avg loss 0.00726218, throughput 4.84621K wps
[Epoch 50 Batch 60/173] avg loss 0.00708797, throughput 5.48158K wps
[Epoch 50 Batch 90/173] avg loss 0.00693092, throughput 5.18776K wps
[Epoch 50 Batch 120/173] avg loss 0.00699111, throughput 5.13588K wps
[Epoch 50 Batch 150/173] avg loss 0.00730843, throughput 4.6767K wps
Begin Testing...
[Epoch 50] train avg loss 0.00712456, dev acc 0.7873, dev avg loss 0.443893, throughput 5.1121K wps
[Epoch 51 Batch 30/173] avg loss 0.00736681, throughput 5.49221K wps
[Epoch 51 Batch 60/173] avg loss 0.00696524, throughput 4.89574K wps
[Epoch 51 Batch 90/173] avg loss 0.00696648, throughput 4.87279K wps
[Epoch 51 Batch 120/173] avg loss 0.00695141, throughput 5.47755K wps
[Epoch 51 Batch 150/173] avg loss 0.00696118, throughput 5.18957K wps
Begin Testing...
[Epoch 51] train avg loss 0.00698936, dev acc 0.7873, dev avg loss 0.441301, throughput 5.14088K wps
[Epoch 52 Batch 30/173] avg loss 0.00655026, throughput 5.42189K wps
[Epoch 52 Batch 60/173] avg loss 0.00680079, throughput 5.55822K wps
[Epoch 52 Batch 90/173] avg loss 0.00696271, throughput 4.93135K wps
[Epoch 52 Batch 120/173] avg loss 0.0067788, throughput 5.16963K wps
[Epoch 52 Batch 150/173] avg loss 0.00716318, throughput 5.09207K wps
Begin Testing...
[Epoch 52] train avg loss 0.00693712, dev acc 0.7883, dev avg loss 0.439947, throughput 5.32997K wps
[Epoch 53 Batch 30/173] avg loss 0.00684984, throughput 4.78314K wps
[Epoch 53 Batch 60/173] avg loss 0.00679749, throughput 5.16899K wps
[Epoch 53 Batch 90/173] avg loss 0.00679176, throughput 5.83989K wps
[Epoch 53 Batch 120/173] avg loss 0.00700022, throughput 5.75093K wps
[Epoch 53 Batch 150/173] avg loss 0.00681756, throughput 4.97222K wps
Begin Testing...
[Epoch 53] train avg loss 0.00682849, dev acc 0.7883, dev avg loss 0.438687, throughput 5.27169K wps
[Epoch 54 Batch 30/173] avg loss 0.0070752, throughput 5.2744K wps
[Epoch 54 Batch 60/173] avg loss 0.00654824, throughput 4.84812K wps
[Epoch 54 Batch 90/173] avg loss 0.00701106, throughput 5.01738K wps
[Epoch 54 Batch 120/173] avg loss 0.00663593, throughput 5.565K wps
[Epoch 54 Batch 150/173] avg loss 0.00673563, throughput 5.02507K wps
Begin Testing...
[Epoch 54] train avg loss 0.00677444, dev acc 0.7894, dev avg loss 0.437221, throughput 5.15802K wps
[Epoch 55 Batch 30/173] avg loss 0.00683841, throughput 5.15866K wps
[Epoch 55 Batch 60/173] avg loss 0.00666688, throughput 5.21033K wps
[Epoch 55 Batch 90/173] avg loss 0.00687365, throughput 4.94896K wps
[Epoch 55 Batch 120/173] avg loss 0.00646459, throughput 5.01614K wps
[Epoch 55 Batch 150/173] avg loss 0.00670726, throughput 6.00577K wps
Begin Testing...
[Epoch 55] train avg loss 0.00670042, dev acc 0.7894, dev avg loss 0.436107, throughput 5.28271K wps
[Epoch 56 Batch 30/173] avg loss 0.00604973, throughput 5.25142K wps
[Epoch 56 Batch 60/173] avg loss 0.00692419, throughput 5.06721K wps
[Epoch 56 Batch 90/173] avg loss 0.00647523, throughput 5.06462K wps
[Epoch 56 Batch 120/173] avg loss 0.00670336, throughput 5.35285K wps
[Epoch 56 Batch 150/173] avg loss 0.00666434, throughput 5.22653K wps
Begin Testing...
[Epoch 56] train avg loss 0.00652577, dev acc 0.7883, dev avg loss 0.434546, throughput 5.15196K wps
[Epoch 57 Batch 30/173] avg loss 0.00635412, throughput 5.32018K wps
[Epoch 57 Batch 60/173] avg loss 0.00650541, throughput 5.54456K wps
[Epoch 57 Batch 90/173] avg loss 0.00622627, throughput 4.93984K wps
[Epoch 57 Batch 120/173] avg loss 0.00668848, throughput 5.22578K wps
[Epoch 57 Batch 150/173] avg loss 0.00659858, throughput 5.13789K wps
Begin Testing...
[Epoch 57] train avg loss 0.00652602, dev acc 0.7779, dev avg loss 0.448506, throughput 5.19308K wps
[Epoch 58 Batch 30/173] avg loss 0.00656444, throughput 5.43306K wps
[Epoch 58 Batch 60/173] avg loss 0.00645889, throughput 6.09229K wps
[Epoch 58 Batch 90/173] avg loss 0.00658219, throughput 5.4678K wps
[Epoch 58 Batch 120/173] avg loss 0.00617269, throughput 5.94739K wps
[Epoch 58 Batch 150/173] avg loss 0.00641285, throughput 5.73992K wps
Begin Testing...
[Epoch 58] train avg loss 0.0064533, dev acc 0.7904, dev avg loss 0.433512, throughput 5.62534K wps
[Epoch 59 Batch 30/173] avg loss 0.00594797, throughput 5.57101K wps
[Epoch 59 Batch 60/173] avg loss 0.00601774, throughput 5.00733K wps
[Epoch 59 Batch 90/173] avg loss 0.00660227, throughput 4.81297K wps
[Epoch 59 Batch 120/173] avg loss 0.006434, throughput 4.81892K wps
[Epoch 59 Batch 150/173] avg loss 0.00671271, throughput 5.21608K wps
Begin Testing...
[Epoch 59] train avg loss 0.00636151, dev acc 0.7935, dev avg loss 0.435217, throughput 5.0253K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/173] avg loss 0.00647872, throughput 4.81504K wps
[Epoch 60 Batch 60/173] avg loss 0.00625218, throughput 5.13574K wps
[Epoch 60 Batch 90/173] avg loss 0.00606309, throughput 5.33242K wps
[Epoch 60 Batch 120/173] avg loss 0.00636104, throughput 5.69357K wps
[Epoch 60 Batch 150/173] avg loss 0.00625584, throughput 4.96934K wps
Begin Testing...
[Epoch 60] train avg loss 0.00632084, dev acc 0.7925, dev avg loss 0.434302, throughput 5.2811K wps
[Epoch 61 Batch 30/173] avg loss 0.00612305, throughput 5.10645K wps
[Epoch 61 Batch 60/173] avg loss 0.00603228, throughput 6.28122K wps
[Epoch 61 Batch 90/173] avg loss 0.00609432, throughput 5.3435K wps
[Epoch 61 Batch 120/173] avg loss 0.00622009, throughput 5.20123K wps
[Epoch 61 Batch 150/173] avg loss 0.00598117, throughput 6.22617K wps
Begin Testing...
[Epoch 61] train avg loss 0.00618025, dev acc 0.7914, dev avg loss 0.431613, throughput 5.61168K wps
[Epoch 62 Batch 30/173] avg loss 0.00602596, throughput 5.77097K wps
[Epoch 62 Batch 60/173] avg loss 0.00606351, throughput 5.5464K wps
[Epoch 62 Batch 90/173] avg loss 0.00590579, throughput 5.13671K wps
[Epoch 62 Batch 120/173] avg loss 0.00597772, throughput 5.19243K wps
[Epoch 62 Batch 150/173] avg loss 0.00600756, throughput 5.53245K wps
Begin Testing...
[Epoch 62] train avg loss 0.00609854, dev acc 0.7946, dev avg loss 0.430257, throughput 5.41023K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/173] avg loss 0.00581616, throughput 4.80698K wps
[Epoch 63 Batch 60/173] avg loss 0.00596365, throughput 4.91679K wps
[Epoch 63 Batch 90/173] avg loss 0.00616665, throughput 5.039K wps
[Epoch 63 Batch 120/173] avg loss 0.00608231, throughput 5.05917K wps
[Epoch 63 Batch 150/173] avg loss 0.00596131, throughput 5.73066K wps
Begin Testing...
[Epoch 63] train avg loss 0.00604888, dev acc 0.7914, dev avg loss 0.434314, throughput 5.16192K wps
[Epoch 64 Batch 30/173] avg loss 0.00578611, throughput 4.81108K wps
[Epoch 64 Batch 60/173] avg loss 0.006098, throughput 5.46343K wps
[Epoch 64 Batch 90/173] avg loss 0.0060381, throughput 5.39181K wps
[Epoch 64 Batch 120/173] avg loss 0.00608309, throughput 4.74882K wps
[Epoch 64 Batch 150/173] avg loss 0.00596108, throughput 4.99991K wps
Begin Testing...
[Epoch 64] train avg loss 0.00602001, dev acc 0.7904, dev avg loss 0.428923, throughput 5.10436K wps
[Epoch 65 Batch 30/173] avg loss 0.00608121, throughput 5.29427K wps
[Epoch 65 Batch 60/173] avg loss 0.00618151, throughput 6.50352K wps
[Epoch 65 Batch 90/173] avg loss 0.00567021, throughput 5.31004K wps
[Epoch 65 Batch 120/173] avg loss 0.00589132, throughput 5.08299K wps
[Epoch 65 Batch 150/173] avg loss 0.00582598, throughput 4.91639K wps
Begin Testing...
[Epoch 65] train avg loss 0.00591281, dev acc 0.7925, dev avg loss 0.432201, throughput 5.31209K wps
[Epoch 66 Batch 30/173] avg loss 0.0057937, throughput 5.21536K wps
[Epoch 66 Batch 60/173] avg loss 0.00589745, throughput 5.21429K wps
[Epoch 66 Batch 90/173] avg loss 0.00582096, throughput 5.21313K wps
[Epoch 66 Batch 120/173] avg loss 0.00605274, throughput 5.43292K wps
[Epoch 66 Batch 150/173] avg loss 0.00572514, throughput 5.2029K wps
Begin Testing...
[Epoch 66] train avg loss 0.00584116, dev acc 0.7904, dev avg loss 0.428578, throughput 5.19333K wps
[Epoch 67 Batch 30/173] avg loss 0.00575233, throughput 5.17548K wps
[Epoch 67 Batch 60/173] avg loss 0.00566874, throughput 5.02992K wps
[Epoch 67 Batch 90/173] avg loss 0.00559187, throughput 5.02296K wps
[Epoch 67 Batch 120/173] avg loss 0.00578279, throughput 5.05896K wps
[Epoch 67 Batch 150/173] avg loss 0.00583881, throughput 5.23927K wps
Begin Testing...
[Epoch 67] train avg loss 0.00573731, dev acc 0.7904, dev avg loss 0.429952, throughput 5.12624K wps
[Epoch 68 Batch 30/173] avg loss 0.00557026, throughput 6.22254K wps
[Epoch 68 Batch 60/173] avg loss 0.00553491, throughput 5.98041K wps
[Epoch 68 Batch 90/173] avg loss 0.00580501, throughput 5.57026K wps
[Epoch 68 Batch 120/173] avg loss 0.0055234, throughput 4.92216K wps
[Epoch 68 Batch 150/173] avg loss 0.0057866, throughput 6.55951K wps
Begin Testing...
[Epoch 68] train avg loss 0.00564608, dev acc 0.7904, dev avg loss 0.431789, throughput 5.75131K wps
[Epoch 69 Batch 30/173] avg loss 0.00586285, throughput 5.02275K wps
[Epoch 69 Batch 60/173] avg loss 0.00571534, throughput 4.91563K wps
[Epoch 69 Batch 90/173] avg loss 0.00543957, throughput 4.74878K wps
[Epoch 69 Batch 120/173] avg loss 0.00580688, throughput 4.83409K wps
[Epoch 69 Batch 150/173] avg loss 0.0054443, throughput 5.56945K wps
Begin Testing...
[Epoch 69] train avg loss 0.00564083, dev acc 0.7914, dev avg loss 0.431274, throughput 4.98297K wps
[Epoch 70 Batch 30/173] avg loss 0.0054917, throughput 5.96174K wps
[Epoch 70 Batch 60/173] avg loss 0.00540189, throughput 6.24134K wps
[Epoch 70 Batch 90/173] avg loss 0.00536512, throughput 5.08151K wps
[Epoch 70 Batch 120/173] avg loss 0.00559309, throughput 5.91873K wps
[Epoch 70 Batch 150/173] avg loss 0.00586053, throughput 5.14583K wps
Begin Testing...
[Epoch 70] train avg loss 0.00550357, dev acc 0.7925, dev avg loss 0.430245, throughput 5.48605K wps
[Epoch 71 Batch 30/173] avg loss 0.0052583, throughput 4.8814K wps
[Epoch 71 Batch 60/173] avg loss 0.0054502, throughput 5.11522K wps
[Epoch 71 Batch 90/173] avg loss 0.00564737, throughput 5.52592K wps
[Epoch 71 Batch 120/173] avg loss 0.00572889, throughput 5.4757K wps
[Epoch 71 Batch 150/173] avg loss 0.00560325, throughput 5.8613K wps
Begin Testing...
[Epoch 71] train avg loss 0.0054791, dev acc 0.7904, dev avg loss 0.427184, throughput 5.46521K wps
[Epoch 72 Batch 30/173] avg loss 0.00538019, throughput 5.40416K wps
[Epoch 72 Batch 60/173] avg loss 0.00526158, throughput 5.22846K wps
[Epoch 72 Batch 90/173] avg loss 0.00543858, throughput 5.20646K wps
[Epoch 72 Batch 120/173] avg loss 0.00575364, throughput 5.51417K wps
[Epoch 72 Batch 150/173] avg loss 0.00523645, throughput 6.68966K wps
Begin Testing...
[Epoch 72] train avg loss 0.00542923, dev acc 0.7914, dev avg loss 0.429436, throughput 5.54296K wps
[Epoch 73 Batch 30/173] avg loss 0.00526897, throughput 5.44965K wps
[Epoch 73 Batch 60/173] avg loss 0.00531864, throughput 5.22546K wps
[Epoch 73 Batch 90/173] avg loss 0.00559466, throughput 4.90737K wps
[Epoch 73 Batch 120/173] avg loss 0.0054368, throughput 4.8958K wps
[Epoch 73 Batch 150/173] avg loss 0.00546007, throughput 5.58518K wps
Begin Testing...
[Epoch 73] train avg loss 0.00537957, dev acc 0.7852, dev avg loss 0.436023, throughput 5.19588K wps
[Epoch 74 Batch 30/173] avg loss 0.00516556, throughput 4.94643K wps
[Epoch 74 Batch 60/173] avg loss 0.00523938, throughput 5.17546K wps
[Epoch 74 Batch 90/173] avg loss 0.00543824, throughput 5.1305K wps
[Epoch 74 Batch 120/173] avg loss 0.00485524, throughput 5.73957K wps
[Epoch 74 Batch 150/173] avg loss 0.00552267, throughput 5.92377K wps
Begin Testing...
[Epoch 74] train avg loss 0.00527842, dev acc 0.7925, dev avg loss 0.426865, throughput 5.31381K wps
[Epoch 75 Batch 30/173] avg loss 0.0051801, throughput 4.83678K wps
[Epoch 75 Batch 60/173] avg loss 0.00542937, throughput 5.21921K wps
[Epoch 75 Batch 90/173] avg loss 0.00500201, throughput 6.21548K wps
[Epoch 75 Batch 120/173] avg loss 0.00511398, throughput 5.62921K wps
[Epoch 75 Batch 150/173] avg loss 0.00507856, throughput 4.84519K wps
Begin Testing...
[Epoch 75] train avg loss 0.00521843, dev acc 0.7883, dev avg loss 0.429885, throughput 5.24511K wps
[Epoch 76 Batch 30/173] avg loss 0.005144, throughput 5.00119K wps
[Epoch 76 Batch 60/173] avg loss 0.00505689, throughput 4.87633K wps
[Epoch 76 Batch 90/173] avg loss 0.00490429, throughput 5.67478K wps
[Epoch 76 Batch 120/173] avg loss 0.00510665, throughput 5.00431K wps
[Epoch 76 Batch 150/173] avg loss 0.00506836, throughput 4.99238K wps
Begin Testing...
[Epoch 76] train avg loss 0.00511197, dev acc 0.7914, dev avg loss 0.426836, throughput 5.16428K wps
[Epoch 77 Batch 30/173] avg loss 0.00522338, throughput 5.76713K wps
[Epoch 77 Batch 60/173] avg loss 0.00495614, throughput 5.55082K wps
[Epoch 77 Batch 90/173] avg loss 0.00478081, throughput 5.043K wps
[Epoch 77 Batch 120/173] avg loss 0.00514471, throughput 5.53497K wps
[Epoch 77 Batch 150/173] avg loss 0.00504763, throughput 5.52937K wps
Begin Testing...
[Epoch 77] train avg loss 0.00503127, dev acc 0.7914, dev avg loss 0.428062, throughput 5.49424K wps
[Epoch 78 Batch 30/173] avg loss 0.00493082, throughput 5.18842K wps
[Epoch 78 Batch 60/173] avg loss 0.00503286, throughput 5.68746K wps
[Epoch 78 Batch 90/173] avg loss 0.00499969, throughput 5.66741K wps
[Epoch 78 Batch 120/173] avg loss 0.00485796, throughput 5.20899K wps
[Epoch 78 Batch 150/173] avg loss 0.00493304, throughput 5.96673K wps
Begin Testing...
[Epoch 78] train avg loss 0.00497305, dev acc 0.7883, dev avg loss 0.428116, throughput 5.42474K wps
[Epoch 79 Batch 30/173] avg loss 0.00512717, throughput 5.12004K wps
[Epoch 79 Batch 60/173] avg loss 0.00495075, throughput 5.23954K wps
[Epoch 79 Batch 90/173] avg loss 0.00486652, throughput 4.98239K wps
[Epoch 79 Batch 120/173] avg loss 0.00502641, throughput 5.93437K wps
[Epoch 79 Batch 150/173] avg loss 0.00466589, throughput 5.6735K wps
Begin Testing...
[Epoch 79] train avg loss 0.00492523, dev acc 0.7883, dev avg loss 0.427014, throughput 5.43167K wps
[Epoch 80 Batch 30/173] avg loss 0.0047593, throughput 5.32372K wps
[Epoch 80 Batch 60/173] avg loss 0.00508739, throughput 5.051K wps
[Epoch 80 Batch 90/173] avg loss 0.00500629, throughput 4.8068K wps
[Epoch 80 Batch 120/173] avg loss 0.00480629, throughput 4.85258K wps
[Epoch 80 Batch 150/173] avg loss 0.00482313, throughput 4.98591K wps
Begin Testing...
[Epoch 80] train avg loss 0.00490057, dev acc 0.7967, dev avg loss 0.421785, throughput 5.01428K wps
Observed Improvement.
Begin Testing...
[Epoch 81 Batch 30/173] avg loss 0.00472488, throughput 5.79844K wps
[Epoch 81 Batch 60/173] avg loss 0.00489264, throughput 5.35333K wps
[Epoch 81 Batch 90/173] avg loss 0.00495121, throughput 4.70409K wps
[Epoch 81 Batch 120/173] avg loss 0.00476189, throughput 5.1295K wps
[Epoch 81 Batch 150/173] avg loss 0.00492851, throughput 4.99747K wps
Begin Testing...
[Epoch 81] train avg loss 0.00484975, dev acc 0.7914, dev avg loss 0.423478, throughput 5.13321K wps
[Epoch 82 Batch 30/173] avg loss 0.00464855, throughput 5.04505K wps
[Epoch 82 Batch 60/173] avg loss 0.00479047, throughput 5.23192K wps
[Epoch 82 Batch 90/173] avg loss 0.00487427, throughput 6.09994K wps
[Epoch 82 Batch 120/173] avg loss 0.00464861, throughput 5.72991K wps
[Epoch 82 Batch 150/173] avg loss 0.00482937, throughput 5.17519K wps
Begin Testing...
[Epoch 82] train avg loss 0.00477987, dev acc 0.7914, dev avg loss 0.422708, throughput 5.39496K wps
[Epoch 83 Batch 30/173] avg loss 0.00466006, throughput 5.5994K wps
[Epoch 83 Batch 60/173] avg loss 0.00458072, throughput 5.92448K wps
[Epoch 83 Batch 90/173] avg loss 0.00475008, throughput 4.95327K wps
[Epoch 83 Batch 120/173] avg loss 0.00446252, throughput 4.91881K wps
[Epoch 83 Batch 150/173] avg loss 0.00451092, throughput 5.27884K wps
Begin Testing...
[Epoch 83] train avg loss 0.00461309, dev acc 0.7904, dev avg loss 0.422875, throughput 5.32388K wps
[Epoch 84 Batch 30/173] avg loss 0.00452813, throughput 5.10998K wps
[Epoch 84 Batch 60/173] avg loss 0.00474193, throughput 5.76046K wps
[Epoch 84 Batch 90/173] avg loss 0.0042211, throughput 4.82321K wps
[Epoch 84 Batch 120/173] avg loss 0.00454269, throughput 5.91963K wps
[Epoch 84 Batch 150/173] avg loss 0.00466344, throughput 5.02908K wps
Begin Testing...
[Epoch 84] train avg loss 0.00458073, dev acc 0.7925, dev avg loss 0.420939, throughput 5.26728K wps
[Epoch 85 Batch 30/173] avg loss 0.0044351, throughput 5.19318K wps
[Epoch 85 Batch 60/173] avg loss 0.00453002, throughput 5.05699K wps
[Epoch 85 Batch 90/173] avg loss 0.0043846, throughput 5.24494K wps
[Epoch 85 Batch 120/173] avg loss 0.00482969, throughput 5.30636K wps
[Epoch 85 Batch 150/173] avg loss 0.00454934, throughput 5.59827K wps
Begin Testing...
[Epoch 85] train avg loss 0.00457332, dev acc 0.7873, dev avg loss 0.429732, throughput 5.21323K wps
[Epoch 86 Batch 30/173] avg loss 0.00473201, throughput 5.30124K wps
[Epoch 86 Batch 60/173] avg loss 0.00443206, throughput 5.28002K wps
[Epoch 86 Batch 90/173] avg loss 0.00451085, throughput 5.08373K wps
[Epoch 86 Batch 120/173] avg loss 0.00464012, throughput 5.24911K wps
[Epoch 86 Batch 150/173] avg loss 0.00435188, throughput 5.6532K wps
Begin Testing...
[Epoch 86] train avg loss 0.00451744, dev acc 0.7873, dev avg loss 0.427933, throughput 5.33511K wps
[Epoch 87 Batch 30/173] avg loss 0.00415415, throughput 4.95507K wps
[Epoch 87 Batch 60/173] avg loss 0.00435644, throughput 5.49614K wps
[Epoch 87 Batch 90/173] avg loss 0.00456464, throughput 4.8591K wps
[Epoch 87 Batch 120/173] avg loss 0.00440298, throughput 5.6126K wps
[Epoch 87 Batch 150/173] avg loss 0.00459675, throughput 5.02227K wps
Begin Testing...
[Epoch 87] train avg loss 0.00445354, dev acc 0.7904, dev avg loss 0.423484, throughput 5.23631K wps
[Epoch 88 Batch 30/173] avg loss 0.004662, throughput 5.20685K wps
[Epoch 88 Batch 60/173] avg loss 0.00442013, throughput 4.86416K wps
[Epoch 88 Batch 90/173] avg loss 0.00436812, throughput 6.20255K wps
[Epoch 88 Batch 120/173] avg loss 0.00417069, throughput 4.87978K wps
[Epoch 88 Batch 150/173] avg loss 0.00455946, throughput 5.40351K wps
Begin Testing...
[Epoch 88] train avg loss 0.00443153, dev acc 0.7987, dev avg loss 0.419993, throughput 5.23532K wps
Observed Improvement.
Begin Testing...
[Epoch 89 Batch 30/173] avg loss 0.0045649, throughput 5.01458K wps
[Epoch 89 Batch 60/173] avg loss 0.00443275, throughput 5.79837K wps
[Epoch 89 Batch 90/173] avg loss 0.00466245, throughput 5.43453K wps
[Epoch 89 Batch 120/173] avg loss 0.00416327, throughput 4.98654K wps
[Epoch 89 Batch 150/173] avg loss 0.00414088, throughput 6.55682K wps
Begin Testing...
[Epoch 89] train avg loss 0.00442868, dev acc 0.8029, dev avg loss 0.418858, throughput 5.4992K wps
Observed Improvement.
Begin Testing...
[Epoch 90 Batch 30/173] avg loss 0.00434306, throughput 6.23466K wps
[Epoch 90 Batch 60/173] avg loss 0.00430312, throughput 5.35913K wps
[Epoch 90 Batch 90/173] avg loss 0.00443366, throughput 5.60526K wps
[Epoch 90 Batch 120/173] avg loss 0.00430673, throughput 5.66173K wps
[Epoch 90 Batch 150/173] avg loss 0.00429057, throughput 5.85631K wps
Begin Testing...
[Epoch 90] train avg loss 0.00434797, dev acc 0.7883, dev avg loss 0.428018, throughput 5.572K wps
[Epoch 91 Batch 30/173] avg loss 0.00421943, throughput 5.10618K wps
[Epoch 91 Batch 60/173] avg loss 0.0043218, throughput 5.29536K wps
[Epoch 91 Batch 90/173] avg loss 0.00402173, throughput 6.1098K wps
[Epoch 91 Batch 120/173] avg loss 0.00424647, throughput 5.21686K wps
[Epoch 91 Batch 150/173] avg loss 0.00412891, throughput 5.33246K wps
Begin Testing...
[Epoch 91] train avg loss 0.00420532, dev acc 0.7894, dev avg loss 0.427631, throughput 5.29951K wps
[Epoch 92 Batch 30/173] avg loss 0.00409454, throughput 5.21682K wps
[Epoch 92 Batch 60/173] avg loss 0.00416609, throughput 5.33749K wps
[Epoch 92 Batch 90/173] avg loss 0.00420133, throughput 5.3347K wps
[Epoch 92 Batch 120/173] avg loss 0.0042807, throughput 6.07427K wps
[Epoch 92 Batch 150/173] avg loss 0.00419312, throughput 5.42918K wps
Begin Testing...
[Epoch 92] train avg loss 0.00419049, dev acc 0.7998, dev avg loss 0.4199, throughput 5.38842K wps
[Epoch 93 Batch 30/173] avg loss 0.00422784, throughput 4.64376K wps
[Epoch 93 Batch 60/173] avg loss 0.0039395, throughput 4.75572K wps
[Epoch 93 Batch 90/173] avg loss 0.00409581, throughput 5.67548K wps
[Epoch 93 Batch 120/173] avg loss 0.00427088, throughput 5.42452K wps
[Epoch 93 Batch 150/173] avg loss 0.00396946, throughput 5.31405K wps
Begin Testing...
[Epoch 93] train avg loss 0.00413013, dev acc 0.8029, dev avg loss 0.419107, throughput 5.27168K wps
Observed Improvement.
Begin Testing...
[Epoch 94 Batch 30/173] avg loss 0.00395327, throughput 5.04898K wps
[Epoch 94 Batch 60/173] avg loss 0.00396136, throughput 5.0837K wps
[Epoch 94 Batch 90/173] avg loss 0.00409044, throughput 4.92999K wps
[Epoch 94 Batch 120/173] avg loss 0.00425892, throughput 5.79967K wps
[Epoch 94 Batch 150/173] avg loss 0.00407452, throughput 5.86171K wps
Begin Testing...
[Epoch 94] train avg loss 0.00402829, dev acc 0.7946, dev avg loss 0.421193, throughput 5.31978K wps
[Epoch 95 Batch 30/173] avg loss 0.00400416, throughput 6.18958K wps
[Epoch 95 Batch 60/173] avg loss 0.00398091, throughput 5.16866K wps
[Epoch 95 Batch 90/173] avg loss 0.00409183, throughput 5.1493K wps
[Epoch 95 Batch 120/173] avg loss 0.00413842, throughput 4.95463K wps
[Epoch 95 Batch 150/173] avg loss 0.00412045, throughput 5.54501K wps
Begin Testing...
[Epoch 95] train avg loss 0.00404294, dev acc 0.7925, dev avg loss 0.421944, throughput 5.3027K wps
[Epoch 96 Batch 30/173] avg loss 0.0037847, throughput 4.87826K wps
[Epoch 96 Batch 60/173] avg loss 0.0041666, throughput 5.89819K wps
[Epoch 96 Batch 90/173] avg loss 0.00397166, throughput 5.25484K wps
[Epoch 96 Batch 120/173] avg loss 0.0039952, throughput 5.17089K wps
[Epoch 96 Batch 150/173] avg loss 0.00398519, throughput 5.24753K wps
Begin Testing...
[Epoch 96] train avg loss 0.00397518, dev acc 0.8019, dev avg loss 0.420019, throughput 5.22739K wps
[Epoch 97 Batch 30/173] avg loss 0.00370354, throughput 5.33557K wps
[Epoch 97 Batch 60/173] avg loss 0.00410355, throughput 5.27681K wps
[Epoch 97 Batch 90/173] avg loss 0.00381682, throughput 5.83487K wps
[Epoch 97 Batch 120/173] avg loss 0.00407028, throughput 4.8606K wps
[Epoch 97 Batch 150/173] avg loss 0.00392902, throughput 4.96927K wps
Begin Testing...
[Epoch 97] train avg loss 0.00391166, dev acc 0.8071, dev avg loss 0.418049, throughput 5.23847K wps
Observed Improvement.
Begin Testing...
[Epoch 98 Batch 30/173] avg loss 0.00377055, throughput 5.67227K wps
[Epoch 98 Batch 60/173] avg loss 0.00380966, throughput 5.6445K wps
[Epoch 98 Batch 90/173] avg loss 0.00381864, throughput 5.45545K wps
[Epoch 98 Batch 120/173] avg loss 0.00396713, throughput 5.02828K wps
[Epoch 98 Batch 150/173] avg loss 0.00389078, throughput 5.23094K wps
Begin Testing...
[Epoch 98] train avg loss 0.00385337, dev acc 0.7998, dev avg loss 0.418028, throughput 5.38637K wps
[Epoch 99 Batch 30/173] avg loss 0.00364407, throughput 5.0104K wps
[Epoch 99 Batch 60/173] avg loss 0.00357574, throughput 5.15242K wps
[Epoch 99 Batch 90/173] avg loss 0.00371001, throughput 5.99061K wps
[Epoch 99 Batch 120/173] avg loss 0.00373916, throughput 5.83122K wps
[Epoch 99 Batch 150/173] avg loss 0.0041151, throughput 5.23821K wps
Begin Testing...
[Epoch 99] train avg loss 0.00375117, dev acc 0.7956, dev avg loss 0.422121, throughput 5.40333K wps
[Epoch 100 Batch 30/173] avg loss 0.00363379, throughput 5.12015K wps
[Epoch 100 Batch 60/173] avg loss 0.00354942, throughput 4.99091K wps
[Epoch 100 Batch 90/173] avg loss 0.00414746, throughput 5.15578K wps
[Epoch 100 Batch 120/173] avg loss 0.0037152, throughput 6.34218K wps
[Epoch 100 Batch 150/173] avg loss 0.00376703, throughput 5.10957K wps
Begin Testing...
[Epoch 100] train avg loss 0.00379765, dev acc 0.7925, dev avg loss 0.429123, throughput 5.2554K wps
[Epoch 101 Batch 30/173] avg loss 0.0036731, throughput 5.13807K wps
[Epoch 101 Batch 60/173] avg loss 0.00381862, throughput 4.85902K wps
[Epoch 101 Batch 90/173] avg loss 0.00369471, throughput 5.32319K wps
[Epoch 101 Batch 120/173] avg loss 0.00357681, throughput 5.41045K wps
[Epoch 101 Batch 150/173] avg loss 0.00390315, throughput 6.39287K wps
Begin Testing...
[Epoch 101] train avg loss 0.00372427, dev acc 0.7935, dev avg loss 0.422797, throughput 5.39845K wps
[Epoch 102 Batch 30/173] avg loss 0.00360715, throughput 5.31388K wps
[Epoch 102 Batch 60/173] avg loss 0.00359274, throughput 4.68756K wps
[Epoch 102 Batch 90/173] avg loss 0.00395812, throughput 4.92918K wps
[Epoch 102 Batch 120/173] avg loss 0.00364752, throughput 5.03834K wps
[Epoch 102 Batch 150/173] avg loss 0.00356615, throughput 5.28817K wps
Begin Testing...
[Epoch 102] train avg loss 0.00367055, dev acc 0.7946, dev avg loss 0.425075, throughput 5.08686K wps
[Epoch 103 Batch 30/173] avg loss 0.00353909, throughput 5.28975K wps
[Epoch 103 Batch 60/173] avg loss 0.00352183, throughput 4.66431K wps
[Epoch 103 Batch 90/173] avg loss 0.00354058, throughput 4.80497K wps
[Epoch 103 Batch 120/173] avg loss 0.00379883, throughput 4.97371K wps
[Epoch 103 Batch 150/173] avg loss 0.00363039, throughput 4.85777K wps
Begin Testing...
[Epoch 103] train avg loss 0.00361597, dev acc 0.8019, dev avg loss 0.420174, throughput 4.95385K wps
[Epoch 104 Batch 30/173] avg loss 0.00383058, throughput 4.91696K wps
[Epoch 104 Batch 60/173] avg loss 0.00362132, throughput 4.8706K wps
[Epoch 104 Batch 90/173] avg loss 0.00357624, throughput 4.70033K wps
[Epoch 104 Batch 120/173] avg loss 0.00339429, throughput 4.70658K wps
[Epoch 104 Batch 150/173] avg loss 0.00342837, throughput 5.22788K wps
Begin Testing...
[Epoch 104] train avg loss 0.00356972, dev acc 0.8019, dev avg loss 0.418937, throughput 4.90059K wps
[Epoch 105 Batch 30/173] avg loss 0.00348361, throughput 5.28709K wps
[Epoch 105 Batch 60/173] avg loss 0.00335531, throughput 5.83583K wps
[Epoch 105 Batch 90/173] avg loss 0.00354299, throughput 5.75562K wps
[Epoch 105 Batch 120/173] avg loss 0.00363815, throughput 4.87083K wps
[Epoch 105 Batch 150/173] avg loss 0.00337886, throughput 5.47471K wps
Begin Testing...
[Epoch 105] train avg loss 0.00347397, dev acc 0.7977, dev avg loss 0.418889, throughput 5.52835K wps
[Epoch 106 Batch 30/173] avg loss 0.00336228, throughput 5.68421K wps
[Epoch 106 Batch 60/173] avg loss 0.00365098, throughput 5.6973K wps
[Epoch 106 Batch 90/173] avg loss 0.00353164, throughput 6.25301K wps
[Epoch 106 Batch 120/173] avg loss 0.00357672, throughput 4.8492K wps
[Epoch 106 Batch 150/173] avg loss 0.00326839, throughput 5.5834K wps
Begin Testing...
[Epoch 106] train avg loss 0.00350429, dev acc 0.8040, dev avg loss 0.418212, throughput 5.62731K wps
[Epoch 107 Batch 30/173] avg loss 0.00345488, throughput 5.47488K wps
[Epoch 107 Batch 60/173] avg loss 0.00356985, throughput 5.29157K wps
[Epoch 107 Batch 90/173] avg loss 0.00347996, throughput 4.81676K wps
[Epoch 107 Batch 120/173] avg loss 0.0034415, throughput 5.403K wps
[Epoch 107 Batch 150/173] avg loss 0.00365874, throughput 5.1823K wps
Begin Testing...
[Epoch 107] train avg loss 0.00352144, dev acc 0.7935, dev avg loss 0.441361, throughput 5.32174K wps
[Epoch 108 Batch 30/173] avg loss 0.00350256, throughput 5.00593K wps
[Epoch 108 Batch 60/173] avg loss 0.00329381, throughput 6.04411K wps
[Epoch 108 Batch 90/173] avg loss 0.00322845, throughput 4.86582K wps
[Epoch 108 Batch 120/173] avg loss 0.00350812, throughput 5.93141K wps
[Epoch 108 Batch 150/173] avg loss 0.00321349, throughput 4.75199K wps
Begin Testing...
[Epoch 108] train avg loss 0.00335134, dev acc 0.7956, dev avg loss 0.425901, throughput 5.3507K wps
[Epoch 109 Batch 30/173] avg loss 0.00344228, throughput 6.06168K wps
[Epoch 109 Batch 60/173] avg loss 0.00334204, throughput 5.75688K wps
[Epoch 109 Batch 90/173] avg loss 0.00326939, throughput 5.52983K wps
[Epoch 109 Batch 120/173] avg loss 0.00326689, throughput 5.82332K wps
[Epoch 109 Batch 150/173] avg loss 0.00325975, throughput 5.29846K wps
Begin Testing...
[Epoch 109] train avg loss 0.00333318, dev acc 0.7935, dev avg loss 0.427538, throughput 5.56149K wps
[Epoch 110 Batch 30/173] avg loss 0.0036035, throughput 6.11459K wps
[Epoch 110 Batch 60/173] avg loss 0.00325309, throughput 5.98728K wps
[Epoch 110 Batch 90/173] avg loss 0.00330562, throughput 5.43185K wps
[Epoch 110 Batch 120/173] avg loss 0.0032286, throughput 5.58335K wps
[Epoch 110 Batch 150/173] avg loss 0.0031295, throughput 6.2771K wps
Begin Testing...
[Epoch 110] train avg loss 0.00327337, dev acc 0.7998, dev avg loss 0.420756, throughput 5.8155K wps
[Epoch 111 Batch 30/173] avg loss 0.00322109, throughput 5.10219K wps
[Epoch 111 Batch 60/173] avg loss 0.00323775, throughput 5.07211K wps
[Epoch 111 Batch 90/173] avg loss 0.00328751, throughput 5.41532K wps
[Epoch 111 Batch 120/173] avg loss 0.00328301, throughput 5.42349K wps
[Epoch 111 Batch 150/173] avg loss 0.00318247, throughput 4.88498K wps
Begin Testing...
[Epoch 111] train avg loss 0.00326758, dev acc 0.8050, dev avg loss 0.4194, throughput 5.14376K wps
[Epoch 112 Batch 30/173] avg loss 0.00331083, throughput 4.89519K wps
[Epoch 112 Batch 60/173] avg loss 0.00321322, throughput 5.60681K wps
[Epoch 112 Batch 90/173] avg loss 0.00326052, throughput 5.487K wps
[Epoch 112 Batch 120/173] avg loss 0.00320441, throughput 5.28447K wps
[Epoch 112 Batch 150/173] avg loss 0.00323264, throughput 4.84617K wps
Begin Testing...
[Epoch 112] train avg loss 0.00324577, dev acc 0.7998, dev avg loss 0.422778, throughput 5.22714K wps
[Epoch 113 Batch 30/173] avg loss 0.00309666, throughput 4.82036K wps
[Epoch 113 Batch 60/173] avg loss 0.00317304, throughput 5.37386K wps
[Epoch 113 Batch 90/173] avg loss 0.00312217, throughput 6.07315K wps
[Epoch 113 Batch 120/173] avg loss 0.00348707, throughput 5.79096K wps
[Epoch 113 Batch 150/173] avg loss 0.00326692, throughput 4.89431K wps
Begin Testing...
[Epoch 113] train avg loss 0.0032285, dev acc 0.7977, dev avg loss 0.420566, throughput 5.26485K wps
[Epoch 114 Batch 30/173] avg loss 0.00336039, throughput 4.79564K wps
[Epoch 114 Batch 60/173] avg loss 0.00324588, throughput 5.17015K wps
[Epoch 114 Batch 90/173] avg loss 0.00294946, throughput 5.05604K wps
[Epoch 114 Batch 120/173] avg loss 0.00311598, throughput 4.95843K wps
[Epoch 114 Batch 150/173] avg loss 0.00322938, throughput 5.74324K wps
Begin Testing...
[Epoch 114] train avg loss 0.0032105, dev acc 0.7998, dev avg loss 0.421935, throughput 5.25465K wps
[Epoch 115 Batch 30/173] avg loss 0.00301996, throughput 6.16425K wps
[Epoch 115 Batch 60/173] avg loss 0.00333829, throughput 5.58697K wps
[Epoch 115 Batch 90/173] avg loss 0.0031723, throughput 5.85261K wps
[Epoch 115 Batch 1