Namespace(batch_size=50, data_name='CR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='rand')
Use gpu0
Downloading data/cr/all-0c9633c6.zip from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/cr/all-0c9633c6.zip...
maximum length (in tokens): 105
Done! Tokenizing Time=0.06s, #Sentences=3775
SentimentNet(
(embedding): Embedding(5343 -> 300, float32)
(encoder): ConvolutionalEncoder(
(_convs): HybridConcurrent(
(0): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(1): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
(2): HybridSequential(
(0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
(1): HybridLambda(<lambda>)
(2): Activation(relu)
)
)
)
(output): HybridSequential(
(0): Dropout(p = 0.5, axes=())
(1): Dense(None -> 2, linear)
)
)
[Epoch 0 Batch 30/62] avg loss 0.0133905, throughput 0.472194K wps
[Epoch 0 Batch 60/62] avg loss 0.0130829, throughput 4.87524K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134182, dev acc 0.6372, dev avg loss 0.656585, throughput 0.558766K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0131351, throughput 4.95161K wps
[Epoch 1 Batch 60/62] avg loss 0.0130822, throughput 4.87483K wps
Begin Testing...
[Epoch 1] train avg loss 0.013239, dev acc 0.6372, dev avg loss 0.665106, throughput 4.91774K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130613, throughput 4.98161K wps
[Epoch 2 Batch 60/62] avg loss 0.0131374, throughput 4.88059K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132316, dev acc 0.6372, dev avg loss 0.659264, throughput 4.93854K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0130466, throughput 4.99862K wps
[Epoch 3 Batch 60/62] avg loss 0.0130484, throughput 4.87791K wps
Begin Testing...
[Epoch 3] train avg loss 0.0132133, dev acc 0.6372, dev avg loss 0.653109, throughput 4.94387K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0129382, throughput 4.99605K wps
[Epoch 4 Batch 60/62] avg loss 0.0131058, throughput 4.8778K wps
Begin Testing...
[Epoch 4] train avg loss 0.0131738, dev acc 0.6372, dev avg loss 0.653233, throughput 4.94124K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0130042, throughput 4.97542K wps
[Epoch 5 Batch 60/62] avg loss 0.0128848, throughput 4.85904K wps
Begin Testing...
[Epoch 5] train avg loss 0.0131389, dev acc 0.6372, dev avg loss 0.653097, throughput 4.92471K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0130045, throughput 4.99178K wps
[Epoch 6 Batch 60/62] avg loss 0.0128824, throughput 4.87739K wps
Begin Testing...
[Epoch 6] train avg loss 0.0130944, dev acc 0.6372, dev avg loss 0.652494, throughput 4.94186K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0128367, throughput 4.73519K wps
[Epoch 7 Batch 60/62] avg loss 0.0129795, throughput 4.85869K wps
Begin Testing...
[Epoch 7] train avg loss 0.0130736, dev acc 0.6372, dev avg loss 0.65073, throughput 4.80766K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0128734, throughput 4.9889K wps
[Epoch 8 Batch 60/62] avg loss 0.0128347, throughput 4.86873K wps
Begin Testing...
[Epoch 8] train avg loss 0.0130417, dev acc 0.6372, dev avg loss 0.64987, throughput 4.93556K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0129785, throughput 4.98415K wps
[Epoch 9 Batch 60/62] avg loss 0.0127405, throughput 4.87373K wps
Begin Testing...
[Epoch 9] train avg loss 0.0129986, dev acc 0.6372, dev avg loss 0.649839, throughput 4.93688K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0128087, throughput 4.98923K wps
[Epoch 10 Batch 60/62] avg loss 0.0128734, throughput 4.87631K wps
Begin Testing...
[Epoch 10] train avg loss 0.0130127, dev acc 0.6372, dev avg loss 0.648138, throughput 4.93735K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0128152, throughput 4.98303K wps
[Epoch 11 Batch 60/62] avg loss 0.0128258, throughput 4.86986K wps
Begin Testing...
[Epoch 11] train avg loss 0.0129518, dev acc 0.6372, dev avg loss 0.64841, throughput 4.93328K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0127139, throughput 4.98602K wps
[Epoch 12 Batch 60/62] avg loss 0.0127385, throughput 4.86918K wps
Begin Testing...
[Epoch 12] train avg loss 0.0129118, dev acc 0.6372, dev avg loss 0.646195, throughput 4.93212K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0127027, throughput 4.96636K wps
[Epoch 13 Batch 60/62] avg loss 0.0126665, throughput 4.85456K wps
Begin Testing...
[Epoch 13] train avg loss 0.0127895, dev acc 0.6372, dev avg loss 0.646434, throughput 4.91693K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.01262, throughput 4.99736K wps
[Epoch 14 Batch 60/62] avg loss 0.0125735, throughput 4.84203K wps
Begin Testing...
[Epoch 14] train avg loss 0.0127571, dev acc 0.6372, dev avg loss 0.643843, throughput 4.92569K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0127954, throughput 4.98787K wps
[Epoch 15 Batch 60/62] avg loss 0.0124858, throughput 4.86842K wps
Begin Testing...
[Epoch 15] train avg loss 0.0128513, dev acc 0.6372, dev avg loss 0.642752, throughput 4.93392K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0123931, throughput 4.9829K wps
[Epoch 16 Batch 60/62] avg loss 0.0127885, throughput 4.87327K wps
Begin Testing...
[Epoch 16] train avg loss 0.0127857, dev acc 0.6372, dev avg loss 0.641991, throughput 4.93301K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0124896, throughput 4.97941K wps
[Epoch 17 Batch 60/62] avg loss 0.0125867, throughput 4.86467K wps
Begin Testing...
[Epoch 17] train avg loss 0.0127079, dev acc 0.6372, dev avg loss 0.639374, throughput 4.92872K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0124281, throughput 4.98361K wps
[Epoch 18 Batch 60/62] avg loss 0.0124261, throughput 4.8789K wps
Begin Testing...
[Epoch 18] train avg loss 0.0126385, dev acc 0.6372, dev avg loss 0.638887, throughput 4.93729K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0124331, throughput 4.98781K wps
[Epoch 19 Batch 60/62] avg loss 0.0123196, throughput 4.86657K wps
Begin Testing...
[Epoch 19] train avg loss 0.0125428, dev acc 0.6372, dev avg loss 0.635454, throughput 4.93349K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0122449, throughput 4.97805K wps
[Epoch 20 Batch 60/62] avg loss 0.0124298, throughput 4.87372K wps
Begin Testing...
[Epoch 20] train avg loss 0.0124826, dev acc 0.6372, dev avg loss 0.633442, throughput 4.93216K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0123066, throughput 4.98556K wps
[Epoch 21 Batch 60/62] avg loss 0.0122218, throughput 4.87226K wps
Begin Testing...
[Epoch 21] train avg loss 0.0124387, dev acc 0.6372, dev avg loss 0.630992, throughput 4.93584K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0122759, throughput 4.98K wps
[Epoch 22 Batch 60/62] avg loss 0.0121297, throughput 4.86042K wps
Begin Testing...
[Epoch 22] train avg loss 0.0123473, dev acc 0.6372, dev avg loss 0.628515, throughput 4.92611K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.0122711, throughput 4.98661K wps
[Epoch 23 Batch 60/62] avg loss 0.0120604, throughput 4.8702K wps
Begin Testing...
[Epoch 23] train avg loss 0.012317, dev acc 0.6372, dev avg loss 0.626419, throughput 4.93557K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0121191, throughput 4.98875K wps
[Epoch 24 Batch 60/62] avg loss 0.0119767, throughput 4.87489K wps
Begin Testing...
[Epoch 24] train avg loss 0.0121991, dev acc 0.6431, dev avg loss 0.62307, throughput 4.93857K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0121042, throughput 4.99533K wps
[Epoch 25 Batch 60/62] avg loss 0.0118067, throughput 4.87559K wps
Begin Testing...
[Epoch 25] train avg loss 0.0121599, dev acc 0.6726, dev avg loss 0.61997, throughput 4.94115K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.011794, throughput 5.00154K wps
[Epoch 26 Batch 60/62] avg loss 0.0120944, throughput 4.88105K wps
Begin Testing...
[Epoch 26] train avg loss 0.012073, dev acc 0.6667, dev avg loss 0.617222, throughput 4.94812K wps
[Epoch 27 Batch 30/62] avg loss 0.011878, throughput 4.9853K wps
[Epoch 27 Batch 60/62] avg loss 0.0117998, throughput 4.86806K wps
Begin Testing...
[Epoch 27] train avg loss 0.0119666, dev acc 0.6637, dev avg loss 0.614075, throughput 4.933K wps
[Epoch 28 Batch 30/62] avg loss 0.0116461, throughput 4.98008K wps
[Epoch 28 Batch 60/62] avg loss 0.0118653, throughput 4.86515K wps
Begin Testing...
[Epoch 28] train avg loss 0.0118752, dev acc 0.6667, dev avg loss 0.611289, throughput 4.92852K wps
[Epoch 29 Batch 30/62] avg loss 0.0116757, throughput 4.98974K wps
[Epoch 29 Batch 60/62] avg loss 0.0116421, throughput 4.87659K wps
Begin Testing...
[Epoch 29] train avg loss 0.0118545, dev acc 0.6696, dev avg loss 0.608791, throughput 4.93838K wps
[Epoch 30 Batch 30/62] avg loss 0.0116903, throughput 4.97249K wps
[Epoch 30 Batch 60/62] avg loss 0.0113919, throughput 4.86591K wps
Begin Testing...
[Epoch 30] train avg loss 0.0116602, dev acc 0.6755, dev avg loss 0.605979, throughput 4.92421K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.011273, throughput 4.99097K wps
[Epoch 31 Batch 60/62] avg loss 0.0118061, throughput 4.86376K wps
Begin Testing...
[Epoch 31] train avg loss 0.0117222, dev acc 0.6696, dev avg loss 0.603523, throughput 4.93219K wps
[Epoch 32 Batch 30/62] avg loss 0.0114662, throughput 4.98813K wps
[Epoch 32 Batch 60/62] avg loss 0.0114091, throughput 4.87405K wps
Begin Testing...
[Epoch 32] train avg loss 0.0115846, dev acc 0.6755, dev avg loss 0.601481, throughput 4.93661K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/62] avg loss 0.0113399, throughput 4.99493K wps
[Epoch 33 Batch 60/62] avg loss 0.0115092, throughput 4.88185K wps
Begin Testing...
[Epoch 33] train avg loss 0.0115758, dev acc 0.6755, dev avg loss 0.598395, throughput 4.94479K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.01134, throughput 4.99314K wps
[Epoch 34 Batch 60/62] avg loss 0.0113387, throughput 4.88326K wps
Begin Testing...
[Epoch 34] train avg loss 0.0114474, dev acc 0.6696, dev avg loss 0.59692, throughput 4.94416K wps
[Epoch 35 Batch 30/62] avg loss 0.0111198, throughput 4.99227K wps
[Epoch 35 Batch 60/62] avg loss 0.0113046, throughput 4.87657K wps
Begin Testing...
[Epoch 35] train avg loss 0.0113927, dev acc 0.6814, dev avg loss 0.593648, throughput 4.93898K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.0111934, throughput 4.99201K wps
[Epoch 36 Batch 60/62] avg loss 0.0110726, throughput 4.87426K wps
Begin Testing...
[Epoch 36] train avg loss 0.011224, dev acc 0.6726, dev avg loss 0.593256, throughput 4.94K wps
[Epoch 37 Batch 30/62] avg loss 0.0111858, throughput 4.98591K wps
[Epoch 37 Batch 60/62] avg loss 0.0109826, throughput 4.87678K wps
Begin Testing...
[Epoch 37] train avg loss 0.0111447, dev acc 0.6726, dev avg loss 0.590014, throughput 4.93869K wps
[Epoch 38 Batch 30/62] avg loss 0.010971, throughput 4.9652K wps
[Epoch 38 Batch 60/62] avg loss 0.0110493, throughput 4.86467K wps
Begin Testing...
[Epoch 38] train avg loss 0.01121, dev acc 0.6932, dev avg loss 0.587352, throughput 4.92165K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.0107913, throughput 4.98702K wps
[Epoch 39 Batch 60/62] avg loss 0.010971, throughput 4.86567K wps
Begin Testing...
[Epoch 39] train avg loss 0.0109862, dev acc 0.6903, dev avg loss 0.583805, throughput 4.93272K wps
[Epoch 40 Batch 30/62] avg loss 0.0108706, throughput 5.00169K wps
[Epoch 40 Batch 60/62] avg loss 0.0107895, throughput 4.88127K wps
Begin Testing...
[Epoch 40] train avg loss 0.0110298, dev acc 0.7021, dev avg loss 0.582107, throughput 4.94715K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.0108801, throughput 4.97533K wps
[Epoch 41 Batch 60/62] avg loss 0.010731, throughput 4.87588K wps
Begin Testing...
[Epoch 41] train avg loss 0.0108736, dev acc 0.6873, dev avg loss 0.58066, throughput 4.93084K wps
[Epoch 42 Batch 30/62] avg loss 0.0104693, throughput 4.95398K wps
[Epoch 42 Batch 60/62] avg loss 0.0109293, throughput 4.84211K wps
Begin Testing...
[Epoch 42] train avg loss 0.0108756, dev acc 0.7050, dev avg loss 0.576741, throughput 4.90528K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.0105239, throughput 4.98278K wps
[Epoch 43 Batch 60/62] avg loss 0.0105027, throughput 4.8766K wps
Begin Testing...
[Epoch 43] train avg loss 0.0106218, dev acc 0.6991, dev avg loss 0.574191, throughput 4.93455K wps
[Epoch 44 Batch 30/62] avg loss 0.0104011, throughput 4.98499K wps
[Epoch 44 Batch 60/62] avg loss 0.0106268, throughput 4.86901K wps
Begin Testing...
[Epoch 44] train avg loss 0.010642, dev acc 0.7109, dev avg loss 0.571172, throughput 4.93456K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/62] avg loss 0.010607, throughput 4.98657K wps
[Epoch 45 Batch 60/62] avg loss 0.0103661, throughput 4.88399K wps
Begin Testing...
[Epoch 45] train avg loss 0.0105268, dev acc 0.6932, dev avg loss 0.5701, throughput 4.94144K wps
[Epoch 46 Batch 30/62] avg loss 0.0102234, throughput 4.98858K wps
[Epoch 46 Batch 60/62] avg loss 0.0103285, throughput 4.86518K wps
Begin Testing...
[Epoch 46] train avg loss 0.0104204, dev acc 0.7198, dev avg loss 0.565398, throughput 4.9326K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.0101593, throughput 4.98035K wps
[Epoch 47 Batch 60/62] avg loss 0.0103364, throughput 4.88935K wps
Begin Testing...
[Epoch 47] train avg loss 0.0103412, dev acc 0.7050, dev avg loss 0.563066, throughput 4.94052K wps
[Epoch 48 Batch 30/62] avg loss 0.0102879, throughput 4.98881K wps
[Epoch 48 Batch 60/62] avg loss 0.00986216, throughput 4.8894K wps
Begin Testing...
[Epoch 48] train avg loss 0.0101644, dev acc 0.7080, dev avg loss 0.559407, throughput 4.94585K wps
[Epoch 49 Batch 30/62] avg loss 0.0102228, throughput 4.98374K wps
[Epoch 49 Batch 60/62] avg loss 0.00997518, throughput 4.885K wps
Begin Testing...
[Epoch 49] train avg loss 0.010248, dev acc 0.7168, dev avg loss 0.555284, throughput 4.94003K wps
[Epoch 50 Batch 30/62] avg loss 0.0101014, throughput 4.97341K wps
[Epoch 50 Batch 60/62] avg loss 0.00984257, throughput 4.86345K wps
Begin Testing...
[Epoch 50] train avg loss 0.0101056, dev acc 0.7021, dev avg loss 0.555241, throughput 4.92502K wps
[Epoch 51 Batch 30/62] avg loss 0.00971016, throughput 4.98014K wps
[Epoch 51 Batch 60/62] avg loss 0.0100025, throughput 4.87064K wps
Begin Testing...
[Epoch 51] train avg loss 0.00999286, dev acc 0.7139, dev avg loss 0.5493, throughput 4.93137K wps
[Epoch 52 Batch 30/62] avg loss 0.00996276, throughput 4.99123K wps
[Epoch 52 Batch 60/62] avg loss 0.00972785, throughput 4.88046K wps
Begin Testing...
[Epoch 52] train avg loss 0.00995078, dev acc 0.7286, dev avg loss 0.546399, throughput 4.94285K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00969823, throughput 4.99645K wps
[Epoch 53 Batch 60/62] avg loss 0.00970437, throughput 4.85593K wps
Begin Testing...
[Epoch 53] train avg loss 0.00986473, dev acc 0.7139, dev avg loss 0.54499, throughput 4.93158K wps
[Epoch 54 Batch 30/62] avg loss 0.00962302, throughput 4.98189K wps
[Epoch 54 Batch 60/62] avg loss 0.00953034, throughput 4.87519K wps
Begin Testing...
[Epoch 54] train avg loss 0.00967493, dev acc 0.7286, dev avg loss 0.540467, throughput 4.9344K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/62] avg loss 0.00954384, throughput 5.00905K wps
[Epoch 55 Batch 60/62] avg loss 0.00941977, throughput 4.87821K wps
Begin Testing...
[Epoch 55] train avg loss 0.00965261, dev acc 0.7286, dev avg loss 0.537652, throughput 4.94872K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00940705, throughput 4.98653K wps
[Epoch 56 Batch 60/62] avg loss 0.00938414, throughput 4.86892K wps
Begin Testing...
[Epoch 56] train avg loss 0.00952894, dev acc 0.7168, dev avg loss 0.53606, throughput 4.93439K wps
[Epoch 57 Batch 30/62] avg loss 0.00927174, throughput 4.9904K wps
[Epoch 57 Batch 60/62] avg loss 0.00941953, throughput 4.86312K wps
Begin Testing...
[Epoch 57] train avg loss 0.0094817, dev acc 0.7316, dev avg loss 0.531154, throughput 4.93317K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/62] avg loss 0.00903979, throughput 4.97596K wps
[Epoch 58 Batch 60/62] avg loss 0.00937015, throughput 4.89908K wps
Begin Testing...
[Epoch 58] train avg loss 0.00927062, dev acc 0.7375, dev avg loss 0.528084, throughput 4.94518K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00919687, throughput 4.99733K wps
[Epoch 59 Batch 60/62] avg loss 0.00911305, throughput 4.87372K wps
Begin Testing...
[Epoch 59] train avg loss 0.00935776, dev acc 0.7404, dev avg loss 0.525353, throughput 4.94124K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.00887466, throughput 5.00726K wps
[Epoch 60 Batch 60/62] avg loss 0.00912151, throughput 4.87077K wps
Begin Testing...
[Epoch 60] train avg loss 0.00911702, dev acc 0.7227, dev avg loss 0.523764, throughput 4.94517K wps
[Epoch 61 Batch 30/62] avg loss 0.00908678, throughput 5.00848K wps
[Epoch 61 Batch 60/62] avg loss 0.00885108, throughput 4.89762K wps
Begin Testing...
[Epoch 61] train avg loss 0.00902248, dev acc 0.7375, dev avg loss 0.519744, throughput 4.95858K wps
[Epoch 62 Batch 30/62] avg loss 0.0087036, throughput 4.99784K wps
[Epoch 62 Batch 60/62] avg loss 0.00892041, throughput 4.8552K wps
Begin Testing...
[Epoch 62] train avg loss 0.00900817, dev acc 0.7375, dev avg loss 0.516782, throughput 4.93163K wps
[Epoch 63 Batch 30/62] avg loss 0.00849147, throughput 4.98297K wps
[Epoch 63 Batch 60/62] avg loss 0.00895357, throughput 4.87727K wps
Begin Testing...
[Epoch 63] train avg loss 0.00887675, dev acc 0.7375, dev avg loss 0.514313, throughput 4.93642K wps
[Epoch 64 Batch 30/62] avg loss 0.00869618, throughput 4.99154K wps
[Epoch 64 Batch 60/62] avg loss 0.00858248, throughput 4.86415K wps
Begin Testing...
[Epoch 64] train avg loss 0.00873025, dev acc 0.7286, dev avg loss 0.512941, throughput 4.93473K wps
[Epoch 65 Batch 30/62] avg loss 0.00864271, throughput 4.98993K wps
[Epoch 65 Batch 60/62] avg loss 0.00842546, throughput 4.87267K wps
Begin Testing...
[Epoch 65] train avg loss 0.00870972, dev acc 0.7552, dev avg loss 0.512671, throughput 4.9379K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00835256, throughput 4.99187K wps
[Epoch 66 Batch 60/62] avg loss 0.00841441, throughput 4.85369K wps
Begin Testing...
[Epoch 66] train avg loss 0.00851593, dev acc 0.7375, dev avg loss 0.509502, throughput 4.92843K wps
[Epoch 67 Batch 30/62] avg loss 0.00844047, throughput 5.00311K wps
[Epoch 67 Batch 60/62] avg loss 0.00835551, throughput 4.87572K wps
Begin Testing...
[Epoch 67] train avg loss 0.00847468, dev acc 0.7345, dev avg loss 0.507116, throughput 4.94453K wps
[Epoch 68 Batch 30/62] avg loss 0.00811463, throughput 4.99554K wps
[Epoch 68 Batch 60/62] avg loss 0.00836841, throughput 4.87122K wps
Begin Testing...
[Epoch 68] train avg loss 0.0083764, dev acc 0.7375, dev avg loss 0.502463, throughput 4.939K wps
[Epoch 69 Batch 30/62] avg loss 0.0083343, throughput 4.99073K wps
[Epoch 69 Batch 60/62] avg loss 0.007894, throughput 4.88274K wps
Begin Testing...
[Epoch 69] train avg loss 0.00822631, dev acc 0.7345, dev avg loss 0.500731, throughput 4.94222K wps
[Epoch 70 Batch 30/62] avg loss 0.00787268, throughput 4.9791K wps
[Epoch 70 Batch 60/62] avg loss 0.00819802, throughput 4.87528K wps
Begin Testing...
[Epoch 70] train avg loss 0.0080891, dev acc 0.7345, dev avg loss 0.500854, throughput 4.9335K wps
[Epoch 71 Batch 30/62] avg loss 0.00789593, throughput 4.9909K wps
[Epoch 71 Batch 60/62] avg loss 0.00796548, throughput 4.87373K wps
Begin Testing...
[Epoch 71] train avg loss 0.0080451, dev acc 0.7552, dev avg loss 0.49539, throughput 4.93897K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/62] avg loss 0.00788947, throughput 4.96619K wps
[Epoch 72 Batch 60/62] avg loss 0.00795122, throughput 4.88068K wps
Begin Testing...
[Epoch 72] train avg loss 0.00801634, dev acc 0.7404, dev avg loss 0.494948, throughput 4.93018K wps
[Epoch 73 Batch 30/62] avg loss 0.00801507, throughput 4.98539K wps
[Epoch 73 Batch 60/62] avg loss 0.007683, throughput 4.86876K wps
Begin Testing...
[Epoch 73] train avg loss 0.00795861, dev acc 0.7611, dev avg loss 0.490756, throughput 4.93375K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/62] avg loss 0.00765641, throughput 4.99848K wps
[Epoch 74 Batch 60/62] avg loss 0.00778176, throughput 4.86727K wps
Begin Testing...
[Epoch 74] train avg loss 0.00783342, dev acc 0.7581, dev avg loss 0.488951, throughput 4.93856K wps
[Epoch 75 Batch 30/62] avg loss 0.00751658, throughput 4.9755K wps
[Epoch 75 Batch 60/62] avg loss 0.00794068, throughput 4.87731K wps
Begin Testing...
[Epoch 75] train avg loss 0.00775647, dev acc 0.7581, dev avg loss 0.486795, throughput 4.93387K wps
[Epoch 76 Batch 30/62] avg loss 0.00743522, throughput 5.00332K wps
[Epoch 76 Batch 60/62] avg loss 0.00751113, throughput 4.87404K wps
Begin Testing...
[Epoch 76] train avg loss 0.00758313, dev acc 0.7581, dev avg loss 0.484841, throughput 4.94493K wps
[Epoch 77 Batch 30/62] avg loss 0.00771067, throughput 4.98919K wps
[Epoch 77 Batch 60/62] avg loss 0.00728978, throughput 4.8695K wps
Begin Testing...
[Epoch 77] train avg loss 0.00763143, dev acc 0.7552, dev avg loss 0.483777, throughput 4.93541K wps
[Epoch 78 Batch 30/62] avg loss 0.00734914, throughput 4.99011K wps
[Epoch 78 Batch 60/62] avg loss 0.00747494, throughput 4.87871K wps
Begin Testing...
[Epoch 78] train avg loss 0.00746451, dev acc 0.7581, dev avg loss 0.481486, throughput 4.94113K wps
[Epoch 79 Batch 30/62] avg loss 0.00720154, throughput 4.95609K wps
[Epoch 79 Batch 60/62] avg loss 0.00715121, throughput 4.88249K wps
Begin Testing...
[Epoch 79] train avg loss 0.00728708, dev acc 0.7581, dev avg loss 0.47997, throughput 4.92617K wps
[Epoch 80 Batch 30/62] avg loss 0.00724113, throughput 4.97419K wps
[Epoch 80 Batch 60/62] avg loss 0.00715059, throughput 4.86351K wps
Begin Testing...
[Epoch 80] train avg loss 0.00731415, dev acc 0.7522, dev avg loss 0.478007, throughput 4.92526K wps
[Epoch 81 Batch 30/62] avg loss 0.00696029, throughput 4.97688K wps
[Epoch 81 Batch 60/62] avg loss 0.00723522, throughput 4.87068K wps
Begin Testing...
[Epoch 81] train avg loss 0.00716139, dev acc 0.7463, dev avg loss 0.48394, throughput 4.93043K wps
[Epoch 82 Batch 30/62] avg loss 0.0069584, throughput 4.99598K wps
[Epoch 82 Batch 60/62] avg loss 0.00701424, throughput 4.89133K wps
Begin Testing...
[Epoch 82] train avg loss 0.0070635, dev acc 0.7729, dev avg loss 0.473953, throughput 4.94948K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/62] avg loss 0.00705798, throughput 4.97989K wps
[Epoch 83 Batch 60/62] avg loss 0.00702616, throughput 4.85138K wps
Begin Testing...
[Epoch 83] train avg loss 0.00716273, dev acc 0.7670, dev avg loss 0.472151, throughput 4.92283K wps
[Epoch 84 Batch 30/62] avg loss 0.00677248, throughput 4.97381K wps
[Epoch 84 Batch 60/62] avg loss 0.006934, throughput 4.87198K wps
Begin Testing...
[Epoch 84] train avg loss 0.00688616, dev acc 0.7493, dev avg loss 0.47959, throughput 4.92879K wps
[Epoch 85 Batch 30/62] avg loss 0.00694203, throughput 4.96498K wps
[Epoch 85 Batch 60/62] avg loss 0.00660327, throughput 4.8825K wps
Begin Testing...
[Epoch 85] train avg loss 0.00691688, dev acc 0.7611, dev avg loss 0.471822, throughput 4.92955K wps
[Epoch 86 Batch 30/62] avg loss 0.00654191, throughput 5.00084K wps
[Epoch 86 Batch 60/62] avg loss 0.00681286, throughput 4.86892K wps
Begin Testing...
[Epoch 86] train avg loss 0.00680064, dev acc 0.7552, dev avg loss 0.469517, throughput 4.94004K wps
[Epoch 87 Batch 30/62] avg loss 0.00645826, throughput 4.97464K wps
[Epoch 87 Batch 60/62] avg loss 0.00677302, throughput 4.8663K wps
Begin Testing...
[Epoch 87] train avg loss 0.00678157, dev acc 0.7581, dev avg loss 0.468008, throughput 4.9273K wps
[Epoch 88 Batch 30/62] avg loss 0.00679919, throughput 4.99972K wps
[Epoch 88 Batch 60/62] avg loss 0.00627193, throughput 4.8813K wps
Begin Testing...
[Epoch 88] train avg loss 0.00658959, dev acc 0.7552, dev avg loss 0.472155, throughput 4.94571K wps
[Epoch 89 Batch 30/62] avg loss 0.00645358, throughput 4.98634K wps
[Epoch 89 Batch 60/62] avg loss 0.00636663, throughput 4.86988K wps
Begin Testing...
[Epoch 89] train avg loss 0.00655422, dev acc 0.7611, dev avg loss 0.465006, throughput 4.93501K wps
[Epoch 90 Batch 30/62] avg loss 0.00649103, throughput 4.97439K wps
[Epoch 90 Batch 60/62] avg loss 0.00619102, throughput 4.87721K wps
Begin Testing...
[Epoch 90] train avg loss 0.00640331, dev acc 0.7611, dev avg loss 0.462843, throughput 4.9323K wps
[Epoch 91 Batch 30/62] avg loss 0.00606666, throughput 4.99774K wps
[Epoch 91 Batch 60/62] avg loss 0.00633651, throughput 4.85901K wps
Begin Testing...
[Epoch 91] train avg loss 0.00629777, dev acc 0.7611, dev avg loss 0.460939, throughput 4.93329K wps
[Epoch 92 Batch 30/62] avg loss 0.00598683, throughput 4.99109K wps
[Epoch 92 Batch 60/62] avg loss 0.00617278, throughput 4.84961K wps
Begin Testing...
[Epoch 92] train avg loss 0.00615849, dev acc 0.7552, dev avg loss 0.45895, throughput 4.92556K wps
[Epoch 93 Batch 30/62] avg loss 0.00637674, throughput 4.98362K wps
[Epoch 93 Batch 60/62] avg loss 0.00581987, throughput 4.86135K wps
Begin Testing...
[Epoch 93] train avg loss 0.00615037, dev acc 0.7670, dev avg loss 0.460013, throughput 4.92787K wps
[Epoch 94 Batch 30/62] avg loss 0.00588825, throughput 4.9845K wps
[Epoch 94 Batch 60/62] avg loss 0.00622052, throughput 4.88117K wps
Begin Testing...
[Epoch 94] train avg loss 0.00610124, dev acc 0.7729, dev avg loss 0.461103, throughput 4.93853K wps
Observed Improvement.
Begin Testing...
[Epoch 95 Batch 30/62] avg loss 0.00593012, throughput 4.97252K wps
[Epoch 95 Batch 60/62] avg loss 0.00585796, throughput 4.86668K wps
Begin Testing...
[Epoch 95] train avg loss 0.00595424, dev acc 0.7552, dev avg loss 0.456692, throughput 4.92518K wps
[Epoch 96 Batch 30/62] avg loss 0.00563075, throughput 4.99144K wps
[Epoch 96 Batch 60/62] avg loss 0.00591818, throughput 4.88481K wps
Begin Testing...
[Epoch 96] train avg loss 0.00583146, dev acc 0.7552, dev avg loss 0.456815, throughput 4.94444K wps
[Epoch 97 Batch 30/62] avg loss 0.00581107, throughput 4.97836K wps
[Epoch 97 Batch 60/62] avg loss 0.0058078, throughput 4.84094K wps
Begin Testing...
[Epoch 97] train avg loss 0.00585759, dev acc 0.7552, dev avg loss 0.454347, throughput 4.9156K wps
[Epoch 98 Batch 30/62] avg loss 0.0055884, throughput 4.9872K wps
[Epoch 98 Batch 60/62] avg loss 0.00562028, throughput 4.88444K wps
Begin Testing...
[Epoch 98] train avg loss 0.00568191, dev acc 0.7493, dev avg loss 0.453376, throughput 4.94183K wps
[Epoch 99 Batch 30/62] avg loss 0.00560803, throughput 4.95391K wps
[Epoch 99 Batch 60/62] avg loss 0.00559872, throughput 4.85842K wps
Begin Testing...
[Epoch 99] train avg loss 0.00569257, dev acc 0.7611, dev avg loss 0.453015, throughput 4.91194K wps
[Epoch 100 Batch 30/62] avg loss 0.00558883, throughput 4.95875K wps
[Epoch 100 Batch 60/62] avg loss 0.00532012, throughput 4.87332K wps
Begin Testing...
[Epoch 100] train avg loss 0.00557871, dev acc 0.7581, dev avg loss 0.453062, throughput 4.9235K wps
[Epoch 101 Batch 30/62] avg loss 0.00568472, throughput 4.96694K wps
[Epoch 101 Batch 60/62] avg loss 0.00520809, throughput 4.86443K wps
Begin Testing...
[Epoch 101] train avg loss 0.00554819, dev acc 0.7611, dev avg loss 0.45099, throughput 4.92306K wps
[Epoch 102 Batch 30/62] avg loss 0.00515583, throughput 4.96991K wps
[Epoch 102 Batch 60/62] avg loss 0.00555836, throughput 4.87581K wps
Begin Testing...
[Epoch 102] train avg loss 0.0054633, dev acc 0.7670, dev avg loss 0.455332, throughput 4.92999K wps
[Epoch 103 Batch 30/62] avg loss 0.00529132, throughput 4.97264K wps
[Epoch 103 Batch 60/62] avg loss 0.00536518, throughput 4.87349K wps
Begin Testing...
[Epoch 103] train avg loss 0.0053821, dev acc 0.7640, dev avg loss 0.449851, throughput 4.92824K wps
[Epoch 104 Batch 30/62] avg loss 0.00522946, throughput 4.98241K wps
[Epoch 104 Batch 60/62] avg loss 0.00514189, throughput 4.86445K wps
Begin Testing...
[Epoch 104] train avg loss 0.00531517, dev acc 0.7611, dev avg loss 0.447512, throughput 4.92947K wps
[Epoch 105 Batch 30/62] avg loss 0.00525454, throughput 4.97099K wps
[Epoch 105 Batch 60/62] avg loss 0.00501091, throughput 4.8603K wps
Begin Testing...
[Epoch 105] train avg loss 0.00526198, dev acc 0.7581, dev avg loss 0.447829, throughput 4.92199K wps
[Epoch 106 Batch 30/62] avg loss 0.00510649, throughput 4.9739K wps
[Epoch 106 Batch 60/62] avg loss 0.00505626, throughput 4.87997K wps
Begin Testing...
[Epoch 106] train avg loss 0.00512042, dev acc 0.7670, dev avg loss 0.446775, throughput 4.93433K wps
[Epoch 107 Batch 30/62] avg loss 0.00492939, throughput 4.98703K wps
[Epoch 107 Batch 60/62] avg loss 0.00499616, throughput 4.87344K wps
Begin Testing...
[Epoch 107] train avg loss 0.00507364, dev acc 0.7581, dev avg loss 0.447244, throughput 4.93661K wps
[Epoch 108 Batch 30/62] avg loss 0.00491024, throughput 4.96813K wps
[Epoch 108 Batch 60/62] avg loss 0.00494423, throughput 4.87847K wps
Begin Testing...
[Epoch 108] train avg loss 0.00504143, dev acc 0.7640, dev avg loss 0.447863, throughput 4.93052K wps
[Epoch 109 Batch 30/62] avg loss 0.00480412, throughput 4.99175K wps
[Epoch 109 Batch 60/62] avg loss 0.00496075, throughput 4.86319K wps
Begin Testing...
[Epoch 109] train avg loss 0.00493068, dev acc 0.7611, dev avg loss 0.446919, throughput 4.93477K wps
[Epoch 110 Batch 30/62] avg loss 0.00469006, throughput 4.98045K wps
[Epoch 110 Batch 60/62] avg loss 0.00486587, throughput 4.8709K wps
Begin Testing...
[Epoch 110] train avg loss 0.00486616, dev acc 0.7670, dev avg loss 0.444556, throughput 4.93295K wps
[Epoch 111 Batch 30/62] avg loss 0.00459334, throughput 4.98555K wps
[Epoch 111 Batch 60/62] avg loss 0.00475257, throughput 4.87057K wps
Begin Testing...
[Epoch 111] train avg loss 0.00475524, dev acc 0.7640, dev avg loss 0.442353, throughput 4.93396K wps
[Epoch 112 Batch 30/62] avg loss 0.00458343, throughput 4.99643K wps
[Epoch 112 Batch 60/62] avg loss 0.00471512, throughput 4.88137K wps
Begin Testing...
[Epoch 112] train avg loss 0.00471363, dev acc 0.7640, dev avg loss 0.442972, throughput 4.94548K wps
[Epoch 113 Batch 30/62] avg loss 0.00448531, throughput 4.99027K wps
[Epoch 113 Batch 60/62] avg loss 0.00457034, throughput 4.87536K wps
Begin Testing...
[Epoch 113] train avg loss 0.00462699, dev acc 0.7611, dev avg loss 0.445067, throughput 4.93915K wps
[Epoch 114 Batch 30/62] avg loss 0.00460044, throughput 4.98723K wps
[Epoch 114 Batch 60/62] avg loss 0.00439204, throughput 4.87335K wps
Begin Testing...
[Epoch 114] train avg loss 0.00455263, dev acc 0.7640, dev avg loss 0.443949, throughput 4.9358K wps
[Epoch 115 Batch 30/62] avg loss 0.00452652, throughput 5.00109K wps
[Epoch 115 Batch 60/62] avg loss 0.00429924, throughput 4.88212K wps
Begin Testing...
[Epoch 115] train avg loss 0.00445463, dev acc 0.7699, dev avg loss 0.441227, throughput 4.94773K wps
[Epoch 116 Batch 30/62] avg loss 0.00434655, throughput 4.97622K wps
[Epoch 116 Batch 60/62] avg loss 0.00430241, throughput 4.87821K wps
Begin Testing...
[Epoch 116] train avg loss 0.00442268, dev acc 0.7670, dev avg loss 0.440065, throughput 4.93407K wps
[Epoch 117 Batch 30/62] avg loss 0.00433912, throughput 4.97756K wps
[Epoch 117 Batch 60/62] avg loss 0.00431599, throughput 4.8674K wps
Begin Testing...
[Epoch 117] train avg loss 0.00440241, dev acc 0.7670, dev avg loss 0.440577, throughput 4.93002K wps
[Epoch 118 Batch 30/62] avg loss 0.00432398, throughput 4.98688K wps
[Epoch 118 Batch 60/62] avg loss 0.00419726, throughput 4.87375K wps
Begin Testing...
[Epoch 118] train avg loss 0.0042879, dev acc 0.7699, dev avg loss 0.446426, throughput 4.93746K wps
[Epoch 119 Batch 30/62] avg loss 0.00420532, throughput 4.98818K wps
[Epoch 119 Batch 60/62] avg loss 0.00414429, throughput 4.87633K wps
Begin Testing...
[Epoch 119] train avg loss 0.0042498, dev acc 0.7699, dev avg loss 0.44059, throughput 4.93603K wps
[Epoch 120 Batch 30/62] avg loss 0.00405228, throughput 4.99882K wps
[Epoch 120 Batch 60/62] avg loss 0.00411625, throughput 4.87548K wps
Begin Testing...
[Epoch 120] train avg loss 0.0041471, dev acc 0.7729, dev avg loss 0.439753, throughput 4.94255K wps
Observed Improvement.
Begin Testing...
[Epoch 121 Batch 30/62] avg loss 0.00397403, throughput 4.97787K wps
[Epoch 121 Batch 60/62] avg loss 0.00410428, throughput 4.87556K wps
Begin Testing...
[Epoch 121] train avg loss 0.00409253, dev acc 0.7699, dev avg loss 0.441166, throughput 4.93267K wps
[Epoch 122 Batch 30/62] avg loss 0.00392309, throughput 4.97525K wps
[Epoch 122 Batch 60/62] avg loss 0.0040334, throughput 4.8607K wps
Begin Testing...
[Epoch 122] train avg loss 0.00398909, dev acc 0.7729, dev avg loss 0.438476, throughput 4.92308K wps
Observed Improvement.
Begin Testing...
[Epoch 123 Batch 30/62] avg loss 0.00397936, throughput 5.00321K wps
[Epoch 123 Batch 60/62] avg loss 0.00383255, throughput 4.86028K wps
Begin Testing...
[Epoch 123] train avg loss 0.00390914, dev acc 0.7788, dev avg loss 0.438129, throughput 4.93516K wps
Observed Improvement.
Begin Testing...
[Epoch 124 Batch 30/62] avg loss 0.00397566, throughput 4.9917K wps
[Epoch 124 Batch 60/62] avg loss 0.00379508, throughput 4.8763K wps
Begin Testing...
[Epoch 124] train avg loss 0.00403239, dev acc 0.7699, dev avg loss 0.435848, throughput 4.9411K wps
[Epoch 125 Batch 30/62] avg loss 0.00385347, throughput 4.97897K wps
[Epoch 125 Batch 60/62] avg loss 0.00381999, throughput 4.87605K wps
Begin Testing...
[Epoch 125] train avg loss 0.00387198, dev acc 0.7729, dev avg loss 0.435413, throughput 4.93367K wps
[Epoch 126 Batch 30/62] avg loss 0.00386388, throughput 4.98896K wps
[Epoch 126 Batch 60/62] avg loss 0.00366273, throughput 4.86928K wps
Begin Testing...
[Epoch 126] train avg loss 0.00379648, dev acc 0.7729, dev avg loss 0.440796, throughput 4.93429K wps
[Epoch 127 Batch 30/62] avg loss 0.00374752, throughput 4.9916K wps
[Epoch 127 Batch 60/62] avg loss 0.00370705, throughput 4.8724K wps
Begin Testing...
[Epoch 127] train avg loss 0.00376441, dev acc 0.7699, dev avg loss 0.433839, throughput 4.93812K wps
[Epoch 128 Batch 30/62] avg loss 0.00361662, throughput 5.00293K wps
[Epoch 128 Batch 60/62] avg loss 0.00368948, throughput 4.85938K wps
Begin Testing...
[Epoch 128] train avg loss 0.00375767, dev acc 0.7699, dev avg loss 0.433255, throughput 4.9371K wps
[Epoch 129 Batch 30/62] avg loss 0.00356851, throughput 4.99289K wps
[Epoch 129 Batch 60/62] avg loss 0.00362433, throughput 4.85869K wps
Begin Testing...
[Epoch 129] train avg loss 0.00360391, dev acc 0.7670, dev avg loss 0.434565, throughput 4.9317K wps
[Epoch 130 Batch 30/62] avg loss 0.0034979, throughput 4.97327K wps
[Epoch 130 Batch 60/62] avg loss 0.00353079, throughput 4.86954K wps
Begin Testing...
[Epoch 130] train avg loss 0.00359163, dev acc 0.7758, dev avg loss 0.434444, throughput 4.92803K wps
[Epoch 131 Batch 30/62] avg loss 0.00354536, throughput 4.99305K wps
[Epoch 131 Batch 60/62] avg loss 0.00341357, throughput 4.86396K wps
Begin Testing...
[Epoch 131] train avg loss 0.00358183, dev acc 0.7788, dev avg loss 0.435988, throughput 4.93556K wps
Observed Improvement.
Begin Testing...
[Epoch 132 Batch 30/62] avg loss 0.00366462, throughput 4.99318K wps
[Epoch 132 Batch 60/62] avg loss 0.00320833, throughput 4.85673K wps
Begin Testing...
[Epoch 132] train avg loss 0.00349138, dev acc 0.7788, dev avg loss 0.451171, throughput 4.92993K wps
Observed Improvement.
Begin Testing...
[Epoch 133 Batch 30/62] avg loss 0.00328956, throughput 4.99837K wps
[Epoch 133 Batch 60/62] avg loss 0.00360693, throughput 4.87625K wps
Begin Testing...
[Epoch 133] train avg loss 0.00346899, dev acc 0.7758, dev avg loss 0.432567, throughput 4.94276K wps
[Epoch 134 Batch 30/62] avg loss 0.00342521, throughput 4.98999K wps
[Epoch 134 Batch 60/62] avg loss 0.00329808, throughput 4.87142K wps
Begin Testing...
[Epoch 134] train avg loss 0.00337729, dev acc 0.7817, dev avg loss 0.43673, throughput 4.93688K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/62] avg loss 0.00333015, throughput 4.99668K wps
[Epoch 135 Batch 60/62] avg loss 0.00338167, throughput 4.8799K wps
Begin Testing...
[Epoch 135] train avg loss 0.00335926, dev acc 0.7788, dev avg loss 0.434751, throughput 4.94479K wps
[Epoch 136 Batch 30/62] avg loss 0.00328408, throughput 4.95844K wps
[Epoch 136 Batch 60/62] avg loss 0.00320508, throughput 4.85456K wps
Begin Testing...
[Epoch 136] train avg loss 0.00330101, dev acc 0.7817, dev avg loss 0.434404, throughput 4.91304K wps
Observed Improvement.
Begin Testing...
[Epoch 137 Batch 30/62] avg loss 0.00323012, throughput 4.97921K wps
[Epoch 137 Batch 60/62] avg loss 0.00327604, throughput 4.86836K wps
Begin Testing...
[Epoch 137] train avg loss 0.00327982, dev acc 0.7758, dev avg loss 0.431289, throughput 4.93088K wps
[Epoch 138 Batch 30/62] avg loss 0.00309946, throughput 4.99048K wps
[Epoch 138 Batch 60/62] avg loss 0.00326015, throughput 4.86361K wps
Begin Testing...
[Epoch 138] train avg loss 0.00318895, dev acc 0.7847, dev avg loss 0.436474, throughput 4.93343K wps
Observed Improvement.
Begin Testing...
[Epoch 139 Batch 30/62] avg loss 0.00304281, throughput 4.98675K wps
[Epoch 139 Batch 60/62] avg loss 0.00315632, throughput 4.88115K wps
Begin Testing...
[Epoch 139] train avg loss 0.00311714, dev acc 0.7788, dev avg loss 0.434737, throughput 4.94041K wps
[Epoch 140 Batch 30/62] avg loss 0.00304161, throughput 4.97321K wps
[Epoch 140 Batch 60/62] avg loss 0.00297265, throughput 4.87994K wps
Begin Testing...
[Epoch 140] train avg loss 0.00302055, dev acc 0.7817, dev avg loss 0.432119, throughput 4.93281K wps
[Epoch 141 Batch 30/62] avg loss 0.00284806, throughput 4.98985K wps
[Epoch 141 Batch 60/62] avg loss 0.00312141, throughput 4.8458K wps
Begin Testing...
[Epoch 141] train avg loss 0.00298838, dev acc 0.7817, dev avg loss 0.431659, throughput 4.92448K wps
[Epoch 142 Batch 30/62] avg loss 0.00309086, throughput 4.99006K wps
[Epoch 142 Batch 60/62] avg loss 0.00286492, throughput 4.8681K wps
Begin Testing...
[Epoch 142] train avg loss 0.00299812, dev acc 0.7817, dev avg loss 0.43235, throughput 4.93395K wps
[Epoch 143 Batch 30/62] avg loss 0.00295743, throughput 4.9924K wps
[Epoch 143 Batch 60/62] avg loss 0.00292872, throughput 4.89116K wps
Begin Testing...
[Epoch 143] train avg loss 0.00298811, dev acc 0.7817, dev avg loss 0.429602, throughput 4.94808K wps
[Epoch 144 Batch 30/62] avg loss 0.00300424, throughput 4.99397K wps
[Epoch 144 Batch 60/62] avg loss 0.00276146, throughput 4.88782K wps
Begin Testing...
[Epoch 144] train avg loss 0.00287002, dev acc 0.7876, dev avg loss 0.430819, throughput 4.94806K wps
Observed Improvement.
Begin Testing...
[Epoch 145 Batch 30/62] avg loss 0.00277149, throughput 4.98393K wps
[Epoch 145 Batch 60/62] avg loss 0.00282948, throughput 4.86575K wps
Begin Testing...
[Epoch 145] train avg loss 0.00282475, dev acc 0.7876, dev avg loss 0.431907, throughput 4.93053K wps
Observed Improvement.
Begin Testing...
[Epoch 146 Batch 30/62] avg loss 0.00291886, throughput 4.99185K wps
[Epoch 146 Batch 60/62] avg loss 0.00271028, throughput 4.87183K wps
Begin Testing...
[Epoch 146] train avg loss 0.00284268, dev acc 0.7817, dev avg loss 0.432368, throughput 4.93686K wps
[Epoch 147 Batch 30/62] avg loss 0.00263102, throughput 4.98697K wps
[Epoch 147 Batch 60/62] avg loss 0.00286837, throughput 4.87305K wps
Begin Testing...
[Epoch 147] train avg loss 0.00279792, dev acc 0.7876, dev avg loss 0.435281, throughput 4.93659K wps
Observed Improvement.
Begin Testing...
[Epoch 148 Batch 30/62] avg loss 0.00280795, throughput 4.9902K wps
[Epoch 148 Batch 60/62] avg loss 0.00264812, throughput 4.86586K wps
Begin Testing...
[Epoch 148] train avg loss 0.00275988, dev acc 0.7847, dev avg loss 0.436887, throughput 4.9337K wps
[Epoch 149 Batch 30/62] avg loss 0.00268784, throughput 4.94609K wps
[Epoch 149 Batch 60/62] avg loss 0.00272202, throughput 4.87096K wps
Begin Testing...
[Epoch 149] train avg loss 0.00271751, dev acc 0.7876, dev avg loss 0.436834, throughput 4.9162K wps
Observed Improvement.
Begin Testing...
[Epoch 150 Batch 30/62] avg loss 0.00248601, throughput 4.99767K wps
[Epoch 150 Batch 60/62] avg loss 0.00265041, throughput 4.87694K wps
Begin Testing...
[Epoch 150] train avg loss 0.00259454, dev acc 0.7876, dev avg loss 0.433611, throughput 4.94287K wps
Observed Improvement.
Begin Testing...
[Epoch 151 Batch 30/62] avg loss 0.00257331, throughput 5.00111K wps
[Epoch 151 Batch 60/62] avg loss 0.00262886, throughput 4.87852K wps
Begin Testing...
[Epoch 151] train avg loss 0.00263276, dev acc 0.7906, dev avg loss 0.437367, throughput 4.94437K wps
Observed Improvement.
Begin Testing...
[Epoch 152 Batch 30/62] avg loss 0.00256885, throughput 4.98023K wps
[Epoch 152 Batch 60/62] avg loss 0.00255067, throughput 4.8729K wps
Begin Testing...
[Epoch 152] train avg loss 0.00257627, dev acc 0.7847, dev avg loss 0.433142, throughput 4.93305K wps
[Epoch 153 Batch 30/62] avg loss 0.00253242, throughput 4.97435K wps
[Epoch 153 Batch 60/62] avg loss 0.00260194, throughput 4.87089K wps
Begin Testing...
[Epoch 153] train avg loss 0.00260737, dev acc 0.7906, dev avg loss 0.436262, throughput 4.92829K wps
Observed Improvement.
Begin Testing...
[Epoch 154 Batch 30/62] avg loss 0.00233415, throughput 4.98456K wps
[Epoch 154 Batch 60/62] avg loss 0.00257534, throughput 4.8655K wps
Begin Testing...
[Epoch 154] train avg loss 0.00246574, dev acc 0.7876, dev avg loss 0.433333, throughput 4.92986K wps
[Epoch 155 Batch 30/62] avg loss 0.00240616, throughput 4.98004K wps
[Epoch 155 Batch 60/62] avg loss 0.00252721, throughput 4.88247K wps
Begin Testing...
[Epoch 155] train avg loss 0.00247045, dev acc 0.7847, dev avg loss 0.432489, throughput 4.93654K wps
[Epoch 156 Batch 30/62] avg loss 0.00242703, throughput 4.98933K wps
[Epoch 156 Batch 60/62] avg loss 0.002472, throughput 4.88174K wps
Begin Testing...
[Epoch 156] train avg loss 0.00248575, dev acc 0.7906, dev avg loss 0.432019, throughput 4.94219K wps
Observed Improvement.
Begin Testing...
[Epoch 157 Batch 30/62] avg loss 0.00225503, throughput 4.99296K wps
[Epoch 157 Batch 60/62] avg loss 0.00244285, throughput 4.87697K wps
Begin Testing...
[Epoch 157] train avg loss 0.00245497, dev acc 0.7729, dev avg loss 0.440915, throughput 4.94082K wps
[Epoch 158 Batch 30/62] avg loss 0.00219145, throughput 4.97707K wps
[Epoch 158 Batch 60/62] avg loss 0.00226888, throughput 4.86426K wps
Begin Testing...
[Epoch 158] train avg loss 0.0022541, dev acc 0.7876, dev avg loss 0.433478, throughput 4.92659K wps
[Epoch 159 Batch 30/62] avg loss 0.0022627, throughput 4.97646K wps
[Epoch 159 Batch 60/62] avg loss 0.00238107, throughput 4.87835K wps
Begin Testing...
[Epoch 159] train avg loss 0.00236424, dev acc 0.7847, dev avg loss 0.433152, throughput 4.93407K wps
[Epoch 160 Batch 30/62] avg loss 0.00229904, throughput 4.99413K wps
[Epoch 160 Batch 60/62] avg loss 0.00220446, throughput 4.87221K wps
Begin Testing...
[Epoch 160] train avg loss 0.00233962, dev acc 0.7817, dev avg loss 0.434317, throughput 4.93904K wps
[Epoch 161 Batch 30/62] avg loss 0.00216437, throughput 4.98195K wps
[Epoch 161 Batch 60/62] avg loss 0.00232463, throughput 4.87136K wps
Begin Testing...
[Epoch 161] train avg loss 0.00227927, dev acc 0.7847, dev avg loss 0.434282, throughput 4.9333K wps
[Epoch 162 Batch 30/62] avg loss 0.00226975, throughput 4.98386K wps
[Epoch 162 Batch 60/62] avg loss 0.00218717, throughput 4.86037K wps
Begin Testing...
[Epoch 162] train avg loss 0.00224946, dev acc 0.7906, dev avg loss 0.435174, throughput 4.92758K wps
Observed Improvement.
Begin Testing...
[Epoch 163 Batch 30/62] avg loss 0.00212128, throughput 4.99713K wps
[Epoch 163 Batch 60/62] avg loss 0.0022031, throughput 4.87984K wps
Begin Testing...
[Epoch 163] train avg loss 0.00221158, dev acc 0.7994, dev avg loss 0.440377, throughput 4.94461K wps
Observed Improvement.
Begin Testing...
[Epoch 164 Batch 30/62] avg loss 0.0020782, throughput 4.98457K wps
[Epoch 164 Batch 60/62] avg loss 0.00212193, throughput 4.87077K wps
Begin Testing...
[Epoch 164] train avg loss 0.00211962, dev acc 0.7965, dev avg loss 0.440113, throughput 4.93424K wps
[Epoch 165 Batch 30/62] avg loss 0.00199784, throughput 4.97782K wps
[Epoch 165 Batch 60/62] avg loss 0.00202889, throughput 4.87412K wps
Begin Testing...
[Epoch 165] train avg loss 0.00207082, dev acc 0.7906, dev avg loss 0.439425, throughput 4.93098K wps
[Epoch 166 Batch 30/62] avg loss 0.00207351, throughput 4.97264K wps
[Epoch 166 Batch 60/62] avg loss 0.00205482, throughput 4.85631K wps
Begin Testing...
[Epoch 166] train avg loss 0.00207327, dev acc 0.7847, dev avg loss 0.434935, throughput 4.92141K wps
[Epoch 167 Batch 30/62] avg loss 0.00206678, throughput 4.95672K wps
[Epoch 167 Batch 60/62] avg loss 0.00198667, throughput 4.87133K wps
Begin Testing...
[Epoch 167] train avg loss 0.00206472, dev acc 0.7965, dev avg loss 0.442774, throughput 4.92093K wps
[Epoch 168 Batch 30/62] avg loss 0.00206306, throughput 4.98002K wps
[Epoch 168 Batch 60/62] avg loss 0.00201327, throughput 4.86371K wps
Begin Testing...
[Epoch 168] train avg loss 0.00205631, dev acc 0.7906, dev avg loss 0.434179, throughput 4.92676K wps
[Epoch 169 Batch 30/62] avg loss 0.00202182, throughput 4.98831K wps
[Epoch 169 Batch 60/62] avg loss 0.00185004, throughput 4.86549K wps
Begin Testing...
[Epoch 169] train avg loss 0.00195033, dev acc 0.7876, dev avg loss 0.433618, throughput 4.93261K wps
[Epoch 170 Batch 30/62] avg loss 0.00199851, throughput 4.98547K wps
[Epoch 170 Batch 60/62] avg loss 0.00186945, throughput 4.88768K wps
Begin Testing...
[Epoch 170] train avg loss 0.00194739, dev acc 0.7876, dev avg loss 0.435914, throughput 4.94182K wps
[Epoch 171 Batch 30/62] avg loss 0.00202909, throughput 4.97049K wps
[Epoch 171 Batch 60/62] avg loss 0.00184716, throughput 4.85738K wps
Begin Testing...
[Epoch 171] train avg loss 0.00195626, dev acc 0.7876, dev avg loss 0.434918, throughput 4.9203K wps
[Epoch 172 Batch 30/62] avg loss 0.001786, throughput 4.96328K wps
[Epoch 172 Batch 60/62] avg loss 0.001949, throughput 4.87003K wps
Begin Testing...
[Epoch 172] train avg loss 0.00188153, dev acc 0.7817, dev avg loss 0.435655, throughput 4.92367K wps
[Epoch 173 Batch 30/62] avg loss 0.00184428, throughput 4.98719K wps
[Epoch 173 Batch 60/62] avg loss 0.00190213, throughput 4.86752K wps
Begin Testing...
[Epoch 173] train avg loss 0.00190902, dev acc 0.7935, dev avg loss 0.449468, throughput 4.93423K wps
[Epoch 174 Batch 30/62] avg loss 0.00185835, throughput 4.98006K wps
[Epoch 174 Batch 60/62] avg loss 0.00192053, throughput 4.88453K wps
Begin Testing...
[Epoch 174] train avg loss 0.00188892, dev acc 0.7847, dev avg loss 0.437346, throughput 4.93896K wps
[Epoch 175 Batch 30/62] avg loss 0.00168503, throughput 4.98849K wps
[Epoch 175 Batch 60/62] avg loss 0.00182497, throughput 4.86056K wps
Begin Testing...
[Epoch 175] train avg loss 0.00180453, dev acc 0.7847, dev avg loss 0.434489, throughput 4.93K wps
[Epoch 176 Batch 30/62] avg loss 0.00179166, throughput 4.98155K wps
[Epoch 176 Batch 60/62] avg loss 0.00181008, throughput 4.87753K wps
Begin Testing...
[Epoch 176] train avg loss 0.00182294, dev acc 0.7847, dev avg loss 0.4361, throughput 4.93655K wps
[Epoch 177 Batch 30/62] avg loss 0.00176526, throughput 4.99469K wps
[Epoch 177 Batch 60/62] avg loss 0.00182018, throughput 4.84667K wps
Begin Testing...
[Epoch 177] train avg loss 0.0018139, dev acc 0.7847, dev avg loss 0.434905, throughput 4.92344K wps
[Epoch 178 Batch 30/62] avg loss 0.00167987, throughput 4.96133K wps
[Epoch 178 Batch 60/62] avg loss 0.00169824, throughput 4.88208K wps
Begin Testing...
[Epoch 178] train avg loss 0.00170064, dev acc 0.7876, dev avg loss 0.437739, throughput 4.9272K wps
[Epoch 179 Batch 30/62] avg loss 0.00164053, throughput 4.98693K wps
[Epoch 179 Batch 60/62] avg loss 0.00175694, throughput 4.87066K wps
Begin Testing...
[Epoch 179] train avg loss 0.0017521, dev acc 0.7847, dev avg loss 0.440834, throughput 4.93454K wps
[Epoch 180 Batch 30/62] avg loss 0.00175387, throughput 4.99131K wps
[Epoch 180 Batch 60/62] avg loss 0.00164973, throughput 4.86909K wps
Begin Testing...
[Epoch 180] train avg loss 0.00175169, dev acc 0.7876, dev avg loss 0.436617, throughput 4.93551K wps
[Epoch 181 Batch 30/62] avg loss 0.00164359, throughput 4.985K wps
[Epoch 181 Batch 60/62] avg loss 0.00168362, throughput 4.86709K wps
Begin Testing...
[Epoch 181] train avg loss 0.00167472, dev acc 0.7847, dev avg loss 0.439363, throughput 4.93158K wps
[Epoch 182 Batch 30/62] avg loss 0.00162455, throughput 4.96832K wps
[Epoch 182 Batch 60/62] avg loss 0.00158437, throughput 4.8767K wps
Begin Testing...
[Epoch 182] train avg loss 0.00162117, dev acc 0.7935, dev avg loss 0.444148, throughput 4.92914K wps
[Epoch 183 Batch 30/62] avg loss 0.00165029, throughput 4.99597K wps
[Epoch 183 Batch 60/62] avg loss 0.00154237, throughput 4.877K wps
Begin Testing...
[Epoch 183] train avg loss 0.00161664, dev acc 0.7935, dev avg loss 0.443707, throughput 4.94208K wps
[Epoch 184 Batch 30/62] avg loss 0.00160037, throughput 4.97797K wps
[Epoch 184 Batch 60/62] avg loss 0.00158007, throughput 4.87351K wps
Begin Testing...
[Epoch 184] train avg loss 0.00159606, dev acc 0.7935, dev avg loss 0.442829, throughput 4.93123K wps
[Epoch 185 Batch 30/62] avg loss 0.00168104, throughput 4.99267K wps
[Epoch 185 Batch 60/62] avg loss 0.00143404, throughput 4.87259K wps
Begin Testing...
[Epoch 185] train avg loss 0.00156951, dev acc 0.7817, dev avg loss 0.438768, throughput 4.93966K wps
[Epoch 186 Batch 30/62] avg loss 0.00151531, throughput 4.98845K wps
[Epoch 186 Batch 60/62] avg loss 0.00161105, throughput 4.87129K wps
Begin Testing...
[Epoch 186] train avg loss 0.00157838, dev acc 0.7935, dev avg loss 0.445149, throughput 4.93695K wps
[Epoch 187 Batch 30/62] avg loss 0.00149826, throughput 4.95878K wps
[Epoch 187 Batch 60/62] avg loss 0.00163931, throughput 4.86766K wps
Begin Testing...
[Epoch 187] train avg loss 0.00157664, dev acc 0.7935, dev avg loss 0.440019, throughput 4.9196K wps
[Epoch 188 Batch 30/62] avg loss 0.00159835, throughput 4.98923K wps
[Epoch 188 Batch 60/62] avg loss 0.00149533, throughput 4.8271K wps
Begin Testing...
[Epoch 188] train avg loss 0.00160357, dev acc 0.7994, dev avg loss 0.446269, throughput 4.91424K wps
Observed Improvement.
Begin Testing...
[Epoch 189 Batch 30/62] avg loss 0.00143157, throughput 4.95462K wps
[Epoch 189 Batch 60/62] avg loss 0.00153486, throughput 4.86129K wps
Begin Testing...
[Epoch 189] train avg loss 0.00151943, dev acc 0.7935, dev avg loss 0.444722, throughput 4.91436K wps
[Epoch 190 Batch 30/62] avg loss 0.00146626, throughput 4.96889K wps
[Epoch 190 Batch 60/62] avg loss 0.00155429, throughput 4.88751K wps
Begin Testing...
[Epoch 190] train avg loss 0.0015478, dev acc 0.7876, dev avg loss 0.441799, throughput 4.93548K wps
[Epoch 191 Batch 30/62] avg loss 0.00148287, throughput 5.0019K wps
[Epoch 191 Batch 60/62] avg loss 0.00139606, throughput 4.87212K wps
Begin Testing...
[Epoch 191] train avg loss 0.00143662, dev acc 0.7876, dev avg loss 0.443557, throughput 4.94258K wps
[Epoch 192 Batch 30/62] avg loss 0.00142966, throughput 4.98368K wps
[Epoch 192 Batch 60/62] avg loss 0.00138419, throughput 4.85157K wps
Begin Testing...
[Epoch 192] train avg loss 0.00142808, dev acc 0.7817, dev avg loss 0.440787, throughput 4.9238K wps
[Epoch 193 Batch 30/62] avg loss 0.00136661, throughput 4.97974K wps
[Epoch 193 Batch 60/62] avg loss 0.00143488, throughput 4.88217K wps
Begin Testing...
[Epoch 193] train avg loss 0.0014083, dev acc 0.7935, dev avg loss 0.442045, throughput 4.93799K wps
[Epoch 194 Batch 30/62] avg loss 0.00135344, throughput 4.99235K wps
[Epoch 194 Batch 60/62] avg loss 0.00135318, throughput 4.88639K wps
Begin Testing...
[Epoch 194] train avg loss 0.00135726, dev acc 0.7847, dev avg loss 0.443002, throughput 4.94575K wps
[Epoch 195 Batch 30/62] avg loss 0.00136072, throughput 4.98255K wps
[Epoch 195 Batch 60/62] avg loss 0.00142833, throughput 4.8753K wps
Begin Testing...
[Epoch 195] train avg loss 0.00139519, dev acc 0.7876, dev avg loss 0.442366, throughput 4.93531K wps
[Epoch 196 Batch 30/62] avg loss 0.00129682, throughput 4.94695K wps
[Epoch 196 Batch 60/62] avg loss 0.0013942, throughput 4.87154K wps
Begin Testing...
[Epoch 196] train avg loss 0.00138521, dev acc 0.7906, dev avg loss 0.443274, throughput 4.91607K wps
[Epoch 197 Batch 30/62] avg loss 0.00136607, throughput 4.99256K wps
[Epoch 197 Batch 60/62] avg loss 0.0013052, throughput 4.87887K wps
Begin Testing...
[Epoch 197] train avg loss 0.00137137, dev acc 0.7817, dev avg loss 0.443231, throughput 4.94178K wps
[Epoch 198 Batch 30/62] avg loss 0.0012755, throughput 4.97692K wps
[Epoch 198 Batch 60/62] avg loss 0.00139995, throughput 4.87896K wps
Begin Testing...
[Epoch 198] train avg loss 0.0013594, dev acc 0.7965, dev avg loss 0.44784, throughput 4.93499K wps
[Epoch 199 Batch 30/62] avg loss 0.00129394, throughput 4.97442K wps
[Epoch 199 Batch 60/62] avg loss 0.00133974, throughput 4.87652K wps
Begin Testing...
[Epoch 199] train avg loss 0.00133262, dev acc 0.7906, dev avg loss 0.446073, throughput 4.93211K wps
Test loss 0.418908, test acc 0.8090
Total time cost 299.04s
[Epoch 0 Batch 30/62] avg loss 0.0133704, throughput 4.761K wps
[Epoch 0 Batch 60/62] avg loss 0.013016, throughput 4.85958K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133279, dev acc 0.6519, dev avg loss 0.65069, throughput 4.81853K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0133384, throughput 4.97197K wps
[Epoch 1 Batch 60/62] avg loss 0.0130246, throughput 4.86217K wps
Begin Testing...
[Epoch 1] train avg loss 0.0133492, dev acc 0.6519, dev avg loss 0.645271, throughput 4.92396K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130507, throughput 4.96954K wps
[Epoch 2 Batch 60/62] avg loss 0.0130733, throughput 4.83263K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132008, dev acc 0.6519, dev avg loss 0.643786, throughput 4.90647K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0130718, throughput 4.97995K wps
[Epoch 3 Batch 60/62] avg loss 0.01298, throughput 4.88909K wps
Begin Testing...
[Epoch 3] train avg loss 0.0131903, dev acc 0.6519, dev avg loss 0.642981, throughput 4.94003K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0131204, throughput 4.96239K wps
[Epoch 4 Batch 60/62] avg loss 0.0128313, throughput 4.84412K wps
Begin Testing...
[Epoch 4] train avg loss 0.0131707, dev acc 0.6519, dev avg loss 0.641933, throughput 4.9107K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0129661, throughput 4.98282K wps
[Epoch 5 Batch 60/62] avg loss 0.0128968, throughput 4.88275K wps
Begin Testing...
[Epoch 5] train avg loss 0.0130923, dev acc 0.6519, dev avg loss 0.640974, throughput 4.93908K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0129087, throughput 4.99485K wps
[Epoch 6 Batch 60/62] avg loss 0.0128449, throughput 4.88783K wps
Begin Testing...
[Epoch 6] train avg loss 0.0130558, dev acc 0.6519, dev avg loss 0.639961, throughput 4.94705K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0129646, throughput 4.98042K wps
[Epoch 7 Batch 60/62] avg loss 0.0127548, throughput 4.86735K wps
Begin Testing...
[Epoch 7] train avg loss 0.0130317, dev acc 0.6519, dev avg loss 0.639274, throughput 4.93K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0127365, throughput 4.98141K wps
[Epoch 8 Batch 60/62] avg loss 0.0129385, throughput 4.85241K wps
Begin Testing...
[Epoch 8] train avg loss 0.0129779, dev acc 0.6519, dev avg loss 0.637639, throughput 4.92426K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0127867, throughput 4.99311K wps
[Epoch 9 Batch 60/62] avg loss 0.0128082, throughput 4.88398K wps
Begin Testing...
[Epoch 9] train avg loss 0.0129384, dev acc 0.6519, dev avg loss 0.636443, throughput 4.94454K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0129346, throughput 4.992K wps
[Epoch 10 Batch 60/62] avg loss 0.0126236, throughput 4.85645K wps
Begin Testing...
[Epoch 10] train avg loss 0.0128843, dev acc 0.6519, dev avg loss 0.637017, throughput 4.93072K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0128001, throughput 4.98053K wps
[Epoch 11 Batch 60/62] avg loss 0.0126614, throughput 4.87946K wps
Begin Testing...
[Epoch 11] train avg loss 0.0128699, dev acc 0.6519, dev avg loss 0.633519, throughput 4.93735K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0127792, throughput 4.98875K wps
[Epoch 12 Batch 60/62] avg loss 0.012603, throughput 4.88646K wps
Begin Testing...
[Epoch 12] train avg loss 0.012891, dev acc 0.6519, dev avg loss 0.633398, throughput 4.94145K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0126051, throughput 4.96892K wps
[Epoch 13 Batch 60/62] avg loss 0.0126792, throughput 4.87167K wps
Begin Testing...
[Epoch 13] train avg loss 0.0127957, dev acc 0.6519, dev avg loss 0.630808, throughput 4.92789K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0125401, throughput 4.98423K wps
[Epoch 14 Batch 60/62] avg loss 0.012722, throughput 4.86054K wps
Begin Testing...
[Epoch 14] train avg loss 0.0127621, dev acc 0.6519, dev avg loss 0.628669, throughput 4.92674K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0125617, throughput 4.98528K wps
[Epoch 15 Batch 60/62] avg loss 0.0125327, throughput 4.85615K wps
Begin Testing...
[Epoch 15] train avg loss 0.0127304, dev acc 0.6519, dev avg loss 0.62823, throughput 4.92638K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.012605, throughput 4.95488K wps
[Epoch 16 Batch 60/62] avg loss 0.0124045, throughput 4.87174K wps
Begin Testing...
[Epoch 16] train avg loss 0.0126811, dev acc 0.6519, dev avg loss 0.625029, throughput 4.91931K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0123673, throughput 4.99507K wps
[Epoch 17 Batch 60/62] avg loss 0.0124649, throughput 4.88043K wps
Begin Testing...
[Epoch 17] train avg loss 0.0125662, dev acc 0.6519, dev avg loss 0.622683, throughput 4.94442K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.012324, throughput 4.97671K wps
[Epoch 18 Batch 60/62] avg loss 0.0124135, throughput 4.90586K wps
Begin Testing...
[Epoch 18] train avg loss 0.0125068, dev acc 0.6519, dev avg loss 0.620057, throughput 4.94886K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0121922, throughput 4.98869K wps
[Epoch 19 Batch 60/62] avg loss 0.012408, throughput 4.88949K wps
Begin Testing...
[Epoch 19] train avg loss 0.0124662, dev acc 0.6519, dev avg loss 0.617931, throughput 4.94527K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0123786, throughput 5.00214K wps
[Epoch 20 Batch 60/62] avg loss 0.01214, throughput 4.86689K wps
Begin Testing...
[Epoch 20] train avg loss 0.0123917, dev acc 0.6519, dev avg loss 0.614804, throughput 4.94004K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0122034, throughput 4.99831K wps
[Epoch 21 Batch 60/62] avg loss 0.0122053, throughput 4.90169K wps
Begin Testing...
[Epoch 21] train avg loss 0.0123946, dev acc 0.6608, dev avg loss 0.611826, throughput 4.95658K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0121146, throughput 4.98701K wps
[Epoch 22 Batch 60/62] avg loss 0.0121842, throughput 4.88493K wps
Begin Testing...
[Epoch 22] train avg loss 0.0122923, dev acc 0.6578, dev avg loss 0.607885, throughput 4.94271K wps
[Epoch 23 Batch 30/62] avg loss 0.0121854, throughput 5.00458K wps
[Epoch 23 Batch 60/62] avg loss 0.0119753, throughput 4.88999K wps
Begin Testing...
[Epoch 23] train avg loss 0.0122539, dev acc 0.6637, dev avg loss 0.60504, throughput 4.95328K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0119304, throughput 4.9615K wps
[Epoch 24 Batch 60/62] avg loss 0.0121558, throughput 4.86293K wps
Begin Testing...
[Epoch 24] train avg loss 0.0121404, dev acc 0.6637, dev avg loss 0.601177, throughput 4.91971K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0120917, throughput 4.9846K wps
[Epoch 25 Batch 60/62] avg loss 0.011774, throughput 4.89086K wps
Begin Testing...
[Epoch 25] train avg loss 0.0120575, dev acc 0.6637, dev avg loss 0.597215, throughput 4.94455K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.0118867, throughput 5.00648K wps
[Epoch 26 Batch 60/62] avg loss 0.0118031, throughput 4.88066K wps
Begin Testing...
[Epoch 26] train avg loss 0.0119949, dev acc 0.6667, dev avg loss 0.593545, throughput 4.94722K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.0118073, throughput 5.01861K wps
[Epoch 27 Batch 60/62] avg loss 0.011734, throughput 4.88948K wps
Begin Testing...
[Epoch 27] train avg loss 0.0119367, dev acc 0.6755, dev avg loss 0.590017, throughput 4.96081K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.0117813, throughput 4.97049K wps
[Epoch 28 Batch 60/62] avg loss 0.0115372, throughput 4.86893K wps
Begin Testing...
[Epoch 28] train avg loss 0.0118314, dev acc 0.6755, dev avg loss 0.58607, throughput 4.92565K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.0116474, throughput 4.9979K wps
[Epoch 29 Batch 60/62] avg loss 0.0116314, throughput 4.87245K wps
Begin Testing...
[Epoch 29] train avg loss 0.011778, dev acc 0.6755, dev avg loss 0.582393, throughput 4.94106K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.0117615, throughput 4.99996K wps
[Epoch 30 Batch 60/62] avg loss 0.0112427, throughput 4.87757K wps
Begin Testing...
[Epoch 30] train avg loss 0.0116681, dev acc 0.6785, dev avg loss 0.578777, throughput 4.9453K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.0114965, throughput 4.98767K wps
[Epoch 31 Batch 60/62] avg loss 0.0114283, throughput 4.86534K wps
Begin Testing...
[Epoch 31] train avg loss 0.0115979, dev acc 0.6755, dev avg loss 0.575569, throughput 4.93337K wps
[Epoch 32 Batch 30/62] avg loss 0.0113681, throughput 4.99731K wps
[Epoch 32 Batch 60/62] avg loss 0.011392, throughput 4.88159K wps
Begin Testing...
[Epoch 32] train avg loss 0.0115332, dev acc 0.6785, dev avg loss 0.571531, throughput 4.94303K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/62] avg loss 0.0111974, throughput 5.0048K wps
[Epoch 33 Batch 60/62] avg loss 0.011338, throughput 4.87227K wps
Begin Testing...
[Epoch 33] train avg loss 0.0114477, dev acc 0.7080, dev avg loss 0.568265, throughput 4.94338K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.0111518, throughput 4.98393K wps
[Epoch 34 Batch 60/62] avg loss 0.0112651, throughput 4.86444K wps
Begin Testing...
[Epoch 34] train avg loss 0.011393, dev acc 0.7080, dev avg loss 0.565421, throughput 4.92975K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/62] avg loss 0.0113845, throughput 4.98179K wps
[Epoch 35 Batch 60/62] avg loss 0.0110676, throughput 4.89112K wps
Begin Testing...
[Epoch 35] train avg loss 0.011333, dev acc 0.6932, dev avg loss 0.561356, throughput 4.94261K wps
[Epoch 36 Batch 30/62] avg loss 0.0110769, throughput 4.99916K wps
[Epoch 36 Batch 60/62] avg loss 0.0111311, throughput 4.89467K wps
Begin Testing...
[Epoch 36] train avg loss 0.0111841, dev acc 0.6962, dev avg loss 0.558189, throughput 4.95357K wps
[Epoch 37 Batch 30/62] avg loss 0.0110415, throughput 4.97989K wps
[Epoch 37 Batch 60/62] avg loss 0.010848, throughput 4.88838K wps
Begin Testing...
[Epoch 37] train avg loss 0.0111347, dev acc 0.6991, dev avg loss 0.554752, throughput 4.93865K wps
[Epoch 38 Batch 30/62] avg loss 0.0109215, throughput 4.97695K wps
[Epoch 38 Batch 60/62] avg loss 0.0109085, throughput 4.8889K wps
Begin Testing...
[Epoch 38] train avg loss 0.0110994, dev acc 0.7050, dev avg loss 0.551537, throughput 4.9399K wps
[Epoch 39 Batch 30/62] avg loss 0.0105935, throughput 4.99801K wps
[Epoch 39 Batch 60/62] avg loss 0.0110546, throughput 4.89675K wps
Begin Testing...
[Epoch 39] train avg loss 0.0109705, dev acc 0.7109, dev avg loss 0.548757, throughput 4.95374K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.0109011, throughput 5.01325K wps
[Epoch 40 Batch 60/62] avg loss 0.010559, throughput 4.89094K wps
Begin Testing...
[Epoch 40] train avg loss 0.010856, dev acc 0.6932, dev avg loss 0.54557, throughput 4.95853K wps
[Epoch 41 Batch 30/62] avg loss 0.0106084, throughput 4.98969K wps
[Epoch 41 Batch 60/62] avg loss 0.0106417, throughput 4.88451K wps
Begin Testing...
[Epoch 41] train avg loss 0.0108025, dev acc 0.7109, dev avg loss 0.541566, throughput 4.94267K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/62] avg loss 0.0107142, throughput 4.99208K wps
[Epoch 42 Batch 60/62] avg loss 0.010548, throughput 4.87816K wps
Begin Testing...
[Epoch 42] train avg loss 0.0107852, dev acc 0.7139, dev avg loss 0.538082, throughput 4.94203K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.0105494, throughput 5.00467K wps
[Epoch 43 Batch 60/62] avg loss 0.0103975, throughput 4.87644K wps
Begin Testing...
[Epoch 43] train avg loss 0.010609, dev acc 0.6991, dev avg loss 0.535388, throughput 4.9467K wps
[Epoch 44 Batch 30/62] avg loss 0.0102528, throughput 4.97912K wps
[Epoch 44 Batch 60/62] avg loss 0.0104931, throughput 4.8552K wps
Begin Testing...
[Epoch 44] train avg loss 0.0105012, dev acc 0.7139, dev avg loss 0.531369, throughput 4.9223K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/62] avg loss 0.0103273, throughput 5.00177K wps
[Epoch 45 Batch 60/62] avg loss 0.0102458, throughput 4.90344K wps
Begin Testing...
[Epoch 45] train avg loss 0.0104025, dev acc 0.7139, dev avg loss 0.528294, throughput 4.95885K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.0100659, throughput 4.98136K wps
[Epoch 46 Batch 60/62] avg loss 0.0101808, throughput 4.86882K wps
Begin Testing...
[Epoch 46] train avg loss 0.0102528, dev acc 0.7139, dev avg loss 0.524796, throughput 4.93207K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.0101306, throughput 5.007K wps
[Epoch 47 Batch 60/62] avg loss 0.0100603, throughput 4.87986K wps
Begin Testing...
[Epoch 47] train avg loss 0.0102386, dev acc 0.7257, dev avg loss 0.521068, throughput 4.95008K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.0101362, throughput 5.01045K wps
[Epoch 48 Batch 60/62] avg loss 0.00988432, throughput 4.90065K wps
Begin Testing...
[Epoch 48] train avg loss 0.010153, dev acc 0.7021, dev avg loss 0.519808, throughput 4.96238K wps
[Epoch 49 Batch 30/62] avg loss 0.00999657, throughput 5.01033K wps
[Epoch 49 Batch 60/62] avg loss 0.00991842, throughput 4.8893K wps
Begin Testing...
[Epoch 49] train avg loss 0.0100766, dev acc 0.7021, dev avg loss 0.516637, throughput 4.95623K wps
[Epoch 50 Batch 30/62] avg loss 0.00985856, throughput 5.01077K wps
[Epoch 50 Batch 60/62] avg loss 0.00967868, throughput 4.89789K wps
Begin Testing...
[Epoch 50] train avg loss 0.00984835, dev acc 0.6991, dev avg loss 0.514341, throughput 4.96169K wps
[Epoch 51 Batch 30/62] avg loss 0.00974477, throughput 4.9629K wps
[Epoch 51 Batch 60/62] avg loss 0.0096806, throughput 4.88538K wps
Begin Testing...
[Epoch 51] train avg loss 0.00992209, dev acc 0.7345, dev avg loss 0.509062, throughput 4.93061K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00981604, throughput 4.97618K wps
[Epoch 52 Batch 60/62] avg loss 0.00961418, throughput 4.86528K wps
Begin Testing...
[Epoch 52] train avg loss 0.00981109, dev acc 0.7345, dev avg loss 0.506042, throughput 4.9261K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00998622, throughput 4.97565K wps
[Epoch 53 Batch 60/62] avg loss 0.00923823, throughput 4.891K wps
Begin Testing...
[Epoch 53] train avg loss 0.00974523, dev acc 0.7050, dev avg loss 0.506106, throughput 4.94027K wps
[Epoch 54 Batch 30/62] avg loss 0.009507, throughput 5.00067K wps
[Epoch 54 Batch 60/62] avg loss 0.00933984, throughput 4.8839K wps
Begin Testing...
[Epoch 54] train avg loss 0.00954865, dev acc 0.7463, dev avg loss 0.501565, throughput 4.94874K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/62] avg loss 0.00925514, throughput 4.98158K wps
[Epoch 55 Batch 60/62] avg loss 0.00946845, throughput 4.89365K wps
Begin Testing...
[Epoch 55] train avg loss 0.0094855, dev acc 0.7257, dev avg loss 0.49947, throughput 4.94453K wps
[Epoch 56 Batch 30/62] avg loss 0.0093421, throughput 4.97095K wps
[Epoch 56 Batch 60/62] avg loss 0.00914955, throughput 4.88743K wps
Begin Testing...
[Epoch 56] train avg loss 0.00933269, dev acc 0.7286, dev avg loss 0.497509, throughput 4.93595K wps
[Epoch 57 Batch 30/62] avg loss 0.00917989, throughput 4.99651K wps
[Epoch 57 Batch 60/62] avg loss 0.00903107, throughput 4.89044K wps
Begin Testing...
[Epoch 57] train avg loss 0.00925066, dev acc 0.7463, dev avg loss 0.492762, throughput 4.9493K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/62] avg loss 0.00915247, throughput 5.0023K wps
[Epoch 58 Batch 60/62] avg loss 0.00903949, throughput 4.87884K wps
Begin Testing...
[Epoch 58] train avg loss 0.0092255, dev acc 0.7404, dev avg loss 0.489934, throughput 4.94723K wps
[Epoch 59 Batch 30/62] avg loss 0.00897645, throughput 4.9488K wps
[Epoch 59 Batch 60/62] avg loss 0.00894875, throughput 4.8625K wps
Begin Testing...
[Epoch 59] train avg loss 0.00914493, dev acc 0.7404, dev avg loss 0.488674, throughput 4.91366K wps
[Epoch 60 Batch 30/62] avg loss 0.00886744, throughput 5.00662K wps
[Epoch 60 Batch 60/62] avg loss 0.00876523, throughput 4.86848K wps
Begin Testing...
[Epoch 60] train avg loss 0.00895234, dev acc 0.7434, dev avg loss 0.485184, throughput 4.94324K wps
[Epoch 61 Batch 30/62] avg loss 0.00865778, throughput 5.01117K wps
[Epoch 61 Batch 60/62] avg loss 0.00897156, throughput 4.87101K wps
Begin Testing...
[Epoch 61] train avg loss 0.00893417, dev acc 0.7640, dev avg loss 0.482935, throughput 4.94566K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00881501, throughput 4.99988K wps
[Epoch 62 Batch 60/62] avg loss 0.00843588, throughput 4.88919K wps
Begin Testing...
[Epoch 62] train avg loss 0.00878994, dev acc 0.7375, dev avg loss 0.48286, throughput 4.95101K wps
[Epoch 63 Batch 30/62] avg loss 0.00865954, throughput 4.99286K wps
[Epoch 63 Batch 60/62] avg loss 0.00868941, throughput 4.8874K wps
Begin Testing...
[Epoch 63] train avg loss 0.00877467, dev acc 0.7434, dev avg loss 0.478267, throughput 4.9463K wps
[Epoch 64 Batch 30/62] avg loss 0.00860429, throughput 5.00421K wps
[Epoch 64 Batch 60/62] avg loss 0.00828811, throughput 4.87521K wps
Begin Testing...
[Epoch 64] train avg loss 0.00863766, dev acc 0.7640, dev avg loss 0.476133, throughput 4.94614K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/62] avg loss 0.00837748, throughput 4.98492K wps
[Epoch 65 Batch 60/62] avg loss 0.00851612, throughput 4.89071K wps
Begin Testing...
[Epoch 65] train avg loss 0.00860272, dev acc 0.7611, dev avg loss 0.473862, throughput 4.94312K wps
[Epoch 66 Batch 30/62] avg loss 0.00818455, throughput 4.9895K wps
[Epoch 66 Batch 60/62] avg loss 0.00831309, throughput 4.89446K wps
Begin Testing...
[Epoch 66] train avg loss 0.00834519, dev acc 0.7847, dev avg loss 0.473, throughput 4.94824K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/62] avg loss 0.00809122, throughput 5.00094K wps
[Epoch 67 Batch 60/62] avg loss 0.00843305, throughput 4.89621K wps
Begin Testing...
[Epoch 67] train avg loss 0.00834273, dev acc 0.7463, dev avg loss 0.469999, throughput 4.95456K wps
[Epoch 68 Batch 30/62] avg loss 0.00822488, throughput 4.99229K wps
[Epoch 68 Batch 60/62] avg loss 0.00830307, throughput 4.87274K wps
Begin Testing...
[Epoch 68] train avg loss 0.00830121, dev acc 0.7463, dev avg loss 0.469699, throughput 4.93944K wps
[Epoch 69 Batch 30/62] avg loss 0.00806536, throughput 4.99578K wps
[Epoch 69 Batch 60/62] avg loss 0.00794242, throughput 4.86576K wps
Begin Testing...
[Epoch 69] train avg loss 0.00819, dev acc 0.7699, dev avg loss 0.466071, throughput 4.93752K wps
[Epoch 70 Batch 30/62] avg loss 0.00774308, throughput 5.01756K wps
[Epoch 70 Batch 60/62] avg loss 0.00809664, throughput 4.89542K wps
Begin Testing...
[Epoch 70] train avg loss 0.00808887, dev acc 0.7552, dev avg loss 0.466557, throughput 4.96243K wps
[Epoch 71 Batch 30/62] avg loss 0.00775949, throughput 4.98944K wps
[Epoch 71 Batch 60/62] avg loss 0.00795858, throughput 4.88062K wps
Begin Testing...
[Epoch 71] train avg loss 0.00791206, dev acc 0.7758, dev avg loss 0.463298, throughput 4.94096K wps
[Epoch 72 Batch 30/62] avg loss 0.00782467, throughput 5.00402K wps
[Epoch 72 Batch 60/62] avg loss 0.00771809, throughput 4.8761K wps
Begin Testing...
[Epoch 72] train avg loss 0.00788672, dev acc 0.7611, dev avg loss 0.460954, throughput 4.94636K wps
[Epoch 73 Batch 30/62] avg loss 0.00755409, throughput 5.004K wps
[Epoch 73 Batch 60/62] avg loss 0.00773261, throughput 4.90061K wps
Begin Testing...
[Epoch 73] train avg loss 0.00771329, dev acc 0.7611, dev avg loss 0.459583, throughput 4.95865K wps
[Epoch 74 Batch 30/62] avg loss 0.00793675, throughput 5.00784K wps
[Epoch 74 Batch 60/62] avg loss 0.00733598, throughput 4.86795K wps
Begin Testing...
[Epoch 74] train avg loss 0.00771631, dev acc 0.7729, dev avg loss 0.457584, throughput 4.94456K wps
[Epoch 75 Batch 30/62] avg loss 0.00747356, throughput 4.97602K wps
[Epoch 75 Batch 60/62] avg loss 0.00757958, throughput 4.85451K wps
Begin Testing...
[Epoch 75] train avg loss 0.00757035, dev acc 0.7670, dev avg loss 0.457248, throughput 4.92165K wps
[Epoch 76 Batch 30/62] avg loss 0.00726318, throughput 5.00908K wps
[Epoch 76 Batch 60/62] avg loss 0.00763602, throughput 4.89823K wps
Begin Testing...
[Epoch 76] train avg loss 0.00749894, dev acc 0.7522, dev avg loss 0.460111, throughput 4.95927K wps
[Epoch 77 Batch 30/62] avg loss 0.00739396, throughput 5.01572K wps
[Epoch 77 Batch 60/62] avg loss 0.00722113, throughput 4.88588K wps
Begin Testing...
[Epoch 77] train avg loss 0.00740662, dev acc 0.7611, dev avg loss 0.456015, throughput 4.95697K wps
[Epoch 78 Batch 30/62] avg loss 0.00729084, throughput 4.98672K wps
[Epoch 78 Batch 60/62] avg loss 0.00722301, throughput 4.88829K wps
Begin Testing...
[Epoch 78] train avg loss 0.00732879, dev acc 0.7729, dev avg loss 0.451159, throughput 4.94549K wps
[Epoch 79 Batch 30/62] avg loss 0.00727841, throughput 5.00126K wps
[Epoch 79 Batch 60/62] avg loss 0.00692348, throughput 4.89093K wps
Begin Testing...
[Epoch 79] train avg loss 0.00721057, dev acc 0.7817, dev avg loss 0.450875, throughput 4.95299K wps
[Epoch 80 Batch 30/62] avg loss 0.00724989, throughput 4.99882K wps
[Epoch 80 Batch 60/62] avg loss 0.00686074, throughput 4.87813K wps
Begin Testing...
[Epoch 80] train avg loss 0.00723606, dev acc 0.7581, dev avg loss 0.456859, throughput 4.94448K wps
[Epoch 81 Batch 30/62] avg loss 0.00714007, throughput 4.99685K wps
[Epoch 81 Batch 60/62] avg loss 0.00687076, throughput 4.89153K wps
Begin Testing...
[Epoch 81] train avg loss 0.0070921, dev acc 0.7611, dev avg loss 0.450651, throughput 4.95059K wps
[Epoch 82 Batch 30/62] avg loss 0.00677499, throughput 4.98855K wps
[Epoch 82 Batch 60/62] avg loss 0.00698095, throughput 4.90082K wps
Begin Testing...
[Epoch 82] train avg loss 0.00700473, dev acc 0.7788, dev avg loss 0.447005, throughput 4.95155K wps
[Epoch 83 Batch 30/62] avg loss 0.00664278, throughput 5.0137K wps
[Epoch 83 Batch 60/62] avg loss 0.00673186, throughput 4.85185K wps
Begin Testing...
[Epoch 83] train avg loss 0.00674627, dev acc 0.7729, dev avg loss 0.446387, throughput 4.93795K wps
[Epoch 84 Batch 30/62] avg loss 0.00679539, throughput 4.99503K wps
[Epoch 84 Batch 60/62] avg loss 0.00657153, throughput 4.85008K wps
Begin Testing...
[Epoch 84] train avg loss 0.0068368, dev acc 0.7670, dev avg loss 0.450201, throughput 4.92832K wps
[Epoch 85 Batch 30/62] avg loss 0.00642873, throughput 4.98038K wps
[Epoch 85 Batch 60/62] avg loss 0.00682591, throughput 4.87085K wps
Begin Testing...
[Epoch 85] train avg loss 0.00667685, dev acc 0.7699, dev avg loss 0.445888, throughput 4.93127K wps
[Epoch 86 Batch 30/62] avg loss 0.00642136, throughput 4.99222K wps
[Epoch 86 Batch 60/62] avg loss 0.00663181, throughput 4.87794K wps
Begin Testing...
[Epoch 86] train avg loss 0.00659612, dev acc 0.7699, dev avg loss 0.443136, throughput 4.94018K wps
[Epoch 87 Batch 30/62] avg loss 0.00635369, throughput 4.98218K wps
[Epoch 87 Batch 60/62] avg loss 0.00663491, throughput 4.88812K wps
Begin Testing...
[Epoch 87] train avg loss 0.00653622, dev acc 0.7847, dev avg loss 0.440368, throughput 4.94154K wps
Observed Improvement.
Begin Testing...
[Epoch 88 Batch 30/62] avg loss 0.00646188, throughput 5.00159K wps
[Epoch 88 Batch 60/62] avg loss 0.00623381, throughput 4.8963K wps
Begin Testing...
[Epoch 88] train avg loss 0.00642589, dev acc 0.7788, dev avg loss 0.440111, throughput 4.95526K wps
[Epoch 89 Batch 30/62] avg loss 0.00635422, throughput 5.00718K wps
[Epoch 89 Batch 60/62] avg loss 0.00639032, throughput 4.88818K wps
Begin Testing...
[Epoch 89] train avg loss 0.00642332, dev acc 0.7788, dev avg loss 0.439944, throughput 4.95351K wps
[Epoch 90 Batch 30/62] avg loss 0.00602878, throughput 4.98667K wps
[Epoch 90 Batch 60/62] avg loss 0.00633054, throughput 4.87237K wps
Begin Testing...
[Epoch 90] train avg loss 0.00631449, dev acc 0.7788, dev avg loss 0.438833, throughput 4.93454K wps
[Epoch 91 Batch 30/62] avg loss 0.00617727, throughput 5.01134K wps
[Epoch 91 Batch 60/62] avg loss 0.00628438, throughput 4.87193K wps
Begin Testing...
[Epoch 91] train avg loss 0.00633441, dev acc 0.7817, dev avg loss 0.437867, throughput 4.94698K wps
[Epoch 92 Batch 30/62] avg loss 0.00603458, throughput 4.98328K wps
[Epoch 92 Batch 60/62] avg loss 0.0062494, throughput 4.88991K wps
Begin Testing...
[Epoch 92] train avg loss 0.00630364, dev acc 0.7788, dev avg loss 0.438597, throughput 4.94149K wps
[Epoch 93 Batch 30/62] avg loss 0.00588323, throughput 4.98999K wps
[Epoch 93 Batch 60/62] avg loss 0.00608261, throughput 4.88397K wps
Begin Testing...
[Epoch 93] train avg loss 0.00608529, dev acc 0.7847, dev avg loss 0.435356, throughput 4.9431K wps
Observed Improvement.
Begin Testing...
[Epoch 94 Batch 30/62] avg loss 0.00583351, throughput 4.99663K wps
[Epoch 94 Batch 60/62] avg loss 0.00591848, throughput 4.88587K wps
Begin Testing...
[Epoch 94] train avg loss 0.00591322, dev acc 0.7788, dev avg loss 0.437244, throughput 4.94739K wps
[Epoch 95 Batch 30/62] avg loss 0.00592468, throughput 4.98621K wps
[Epoch 95 Batch 60/62] avg loss 0.00591152, throughput 4.88313K wps
Begin Testing...
[Epoch 95] train avg loss 0.00599086, dev acc 0.7935, dev avg loss 0.433908, throughput 4.94121K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/62] avg loss 0.00575012, throughput 4.97831K wps
[Epoch 96 Batch 60/62] avg loss 0.00574306, throughput 4.89164K wps
Begin Testing...
[Epoch 96] train avg loss 0.00578154, dev acc 0.7729, dev avg loss 0.435589, throughput 4.9415K wps
[Epoch 97 Batch 30/62] avg loss 0.0056507, throughput 4.98184K wps
[Epoch 97 Batch 60/62] avg loss 0.00567897, throughput 4.87903K wps
Begin Testing...
[Epoch 97] train avg loss 0.00577044, dev acc 0.7817, dev avg loss 0.438278, throughput 4.93518K wps
[Epoch 98 Batch 30/62] avg loss 0.0056878, throughput 4.9518K wps
[Epoch 98 Batch 60/62] avg loss 0.00538751, throughput 4.84775K wps
Begin Testing...
[Epoch 98] train avg loss 0.00560692, dev acc 0.7758, dev avg loss 0.434995, throughput 4.90749K wps
[Epoch 99 Batch 30/62] avg loss 0.00536682, throughput 5.00193K wps
[Epoch 99 Batch 60/62] avg loss 0.00558626, throughput 4.86858K wps
Begin Testing...
[Epoch 99] train avg loss 0.00561565, dev acc 0.7906, dev avg loss 0.431505, throughput 4.93977K wps
[Epoch 100 Batch 30/62] avg loss 0.00529724, throughput 4.95958K wps
[Epoch 100 Batch 60/62] avg loss 0.00552867, throughput 4.88255K wps
Begin Testing...
[Epoch 100] train avg loss 0.00544494, dev acc 0.7847, dev avg loss 0.436162, throughput 4.92893K wps
[Epoch 101 Batch 30/62] avg loss 0.00523695, throughput 5.00122K wps
[Epoch 101 Batch 60/62] avg loss 0.00548802, throughput 4.88852K wps
Begin Testing...
[Epoch 101] train avg loss 0.00540599, dev acc 0.7817, dev avg loss 0.431237, throughput 4.9519K wps
[Epoch 102 Batch 30/62] avg loss 0.00511413, throughput 4.98632K wps
[Epoch 102 Batch 60/62] avg loss 0.0051651, throughput 4.88446K wps
Begin Testing...
[Epoch 102] train avg loss 0.00536269, dev acc 0.7906, dev avg loss 0.439738, throughput 4.94241K wps
[Epoch 103 Batch 30/62] avg loss 0.00523178, throughput 5.00248K wps
[Epoch 103 Batch 60/62] avg loss 0.00506889, throughput 4.88517K wps
Begin Testing...
[Epoch 103] train avg loss 0.00522644, dev acc 0.7965, dev avg loss 0.4293, throughput 4.95001K wps
Observed Improvement.
Begin Testing...
[Epoch 104 Batch 30/62] avg loss 0.0052092, throughput 4.98898K wps
[Epoch 104 Batch 60/62] avg loss 0.00498305, throughput 4.89584K wps
Begin Testing...
[Epoch 104] train avg loss 0.00516301, dev acc 0.7906, dev avg loss 0.428605, throughput 4.94868K wps
[Epoch 105 Batch 30/62] avg loss 0.00513421, throughput 5.01046K wps
[Epoch 105 Batch 60/62] avg loss 0.00500458, throughput 4.89405K wps
Begin Testing...
[Epoch 105] train avg loss 0.00506539, dev acc 0.7847, dev avg loss 0.429721, throughput 4.95806K wps
[Epoch 106 Batch 30/62] avg loss 0.00516367, throughput 4.99365K wps
[Epoch 106 Batch 60/62] avg loss 0.0050102, throughput 4.85573K wps
Begin Testing...
[Epoch 106] train avg loss 0.00510045, dev acc 0.7817, dev avg loss 0.428598, throughput 4.9312K wps
[Epoch 107 Batch 30/62] avg loss 0.00486555, throughput 5.00151K wps
[Epoch 107 Batch 60/62] avg loss 0.00490479, throughput 4.89191K wps
Begin Testing...
[Epoch 107] train avg loss 0.00495281, dev acc 0.7935, dev avg loss 0.427157, throughput 4.95206K wps
[Epoch 108 Batch 30/62] avg loss 0.00475919, throughput 4.96103K wps
[Epoch 108 Batch 60/62] avg loss 0.0048931, throughput 4.84932K wps
Begin Testing...
[Epoch 108] train avg loss 0.00484692, dev acc 0.7935, dev avg loss 0.426546, throughput 4.9124K wps
[Epoch 109 Batch 30/62] avg loss 0.00474113, throughput 4.98188K wps
[Epoch 109 Batch 60/62] avg loss 0.00481723, throughput 4.88403K wps
Begin Testing...
[Epoch 109] train avg loss 0.00485814, dev acc 0.7817, dev avg loss 0.429594, throughput 4.93896K wps
[Epoch 110 Batch 30/62] avg loss 0.00452266, throughput 4.99908K wps
[Epoch 110 Batch 60/62] avg loss 0.0048147, throughput 4.88532K wps
Begin Testing...
[Epoch 110] train avg loss 0.00468269, dev acc 0.7729, dev avg loss 0.433511, throughput 4.94882K wps
[Epoch 111 Batch 30/62] avg loss 0.00444342, throughput 4.98832K wps
[Epoch 111 Batch 60/62] avg loss 0.00484623, throughput 4.8769K wps
Begin Testing...
[Epoch 111] train avg loss 0.00469324, dev acc 0.7788, dev avg loss 0.429432, throughput 4.9392K wps
[Epoch 112 Batch 30/62] avg loss 0.00465227, throughput 5.00024K wps
[Epoch 112 Batch 60/62] avg loss 0.00452059, throughput 4.85753K wps
Begin Testing...
[Epoch 112] train avg loss 0.00461688, dev acc 0.7906, dev avg loss 0.428067, throughput 4.93433K wps
[Epoch 113 Batch 30/62] avg loss 0.00451808, throughput 5.00689K wps
[Epoch 113 Batch 60/62] avg loss 0.00454168, throughput 4.88419K wps
Begin Testing...
[Epoch 113] train avg loss 0.00458092, dev acc 0.7965, dev avg loss 0.426442, throughput 4.95202K wps
Observed Improvement.
Begin Testing...
[Epoch 114 Batch 30/62] avg loss 0.00436518, throughput 4.98361K wps
[Epoch 114 Batch 60/62] avg loss 0.00431415, throughput 4.89362K wps
Begin Testing...
[Epoch 114] train avg loss 0.00443745, dev acc 0.7817, dev avg loss 0.427947, throughput 4.94593K wps
[Epoch 115 Batch 30/62] avg loss 0.0042708, throughput 4.99368K wps
[Epoch 115 Batch 60/62] avg loss 0.00432812, throughput 4.89262K wps
Begin Testing...
[Epoch 115] train avg loss 0.00439986, dev acc 0.7699, dev avg loss 0.433719, throughput 4.94939K wps
[Epoch 116 Batch 30/62] avg loss 0.0040065, throughput 4.98479K wps
[Epoch 116 Batch 60/62] avg loss 0.00454627, throughput 4.86588K wps
Begin Testing...
[Epoch 116] train avg loss 0.00434371, dev acc 0.7965, dev avg loss 0.42499, throughput 4.93244K wps
Observed Improvement.
Begin Testing...
[Epoch 117 Batch 30/62] avg loss 0.00433468, throughput 5.00102K wps
[Epoch 117 Batch 60/62] avg loss 0.00419189, throughput 4.86846K wps
Begin Testing...
[Epoch 117] train avg loss 0.00434111, dev acc 0.7935, dev avg loss 0.42626, throughput 4.93944K wps
[Epoch 118 Batch 30/62] avg loss 0.00438524, throughput 5.01397K wps
[Epoch 118 Batch 60/62] avg loss 0.00418325, throughput 4.89229K wps
Begin Testing...
[Epoch 118] train avg loss 0.00431072, dev acc 0.7729, dev avg loss 0.430848, throughput 4.95835K wps
[Epoch 119 Batch 30/62] avg loss 0.00411002, throughput 4.95866K wps
[Epoch 119 Batch 60/62] avg loss 0.00402882, throughput 4.87819K wps
Begin Testing...
[Epoch 119] train avg loss 0.00414202, dev acc 0.7965, dev avg loss 0.425274, throughput 4.92475K wps
Observed Improvement.
Begin Testing...
[Epoch 120 Batch 30/62] avg loss 0.00409223, throughput 4.98567K wps
[Epoch 120 Batch 60/62] avg loss 0.00407348, throughput 4.88664K wps
Begin Testing...
[Epoch 120] train avg loss 0.00416148, dev acc 0.7876, dev avg loss 0.424989, throughput 4.9423K wps
[Epoch 121 Batch 30/62] avg loss 0.00406671, throughput 5.0065K wps
[Epoch 121 Batch 60/62] avg loss 0.00381582, throughput 4.88807K wps
Begin Testing...
[Epoch 121] train avg loss 0.00398082, dev acc 0.7817, dev avg loss 0.436575, throughput 4.95285K wps
[Epoch 122 Batch 30/62] avg loss 0.00391431, throughput 5.0066K wps
[Epoch 122 Batch 60/62] avg loss 0.00387475, throughput 4.89199K wps
Begin Testing...
[Epoch 122] train avg loss 0.00390834, dev acc 0.7994, dev avg loss 0.42577, throughput 4.9535K wps
Observed Improvement.
Begin Testing...
[Epoch 123 Batch 30/62] avg loss 0.00391042, throughput 4.99685K wps
[Epoch 123 Batch 60/62] avg loss 0.00362638, throughput 4.88284K wps
Begin Testing...
[Epoch 123] train avg loss 0.0038326, dev acc 0.7847, dev avg loss 0.427786, throughput 4.94631K wps
[Epoch 124 Batch 30/62] avg loss 0.0038841, throughput 4.9956K wps
[Epoch 124 Batch 60/62] avg loss 0.00371192, throughput 4.88742K wps
Begin Testing...
[Epoch 124] train avg loss 0.00380038, dev acc 0.7729, dev avg loss 0.431063, throughput 4.94771K wps
[Epoch 125 Batch 30/62] avg loss 0.00381772, throughput 4.99929K wps
[Epoch 125 Batch 60/62] avg loss 0.00371156, throughput 4.89972K wps
Begin Testing...
[Epoch 125] train avg loss 0.00375672, dev acc 0.7788, dev avg loss 0.434844, throughput 4.9556K wps
[Epoch 126 Batch 30/62] avg loss 0.00378403, throughput 5.00206K wps
[Epoch 126 Batch 60/62] avg loss 0.00361364, throughput 4.87568K wps
Begin Testing...
[Epoch 126] train avg loss 0.00369842, dev acc 0.7788, dev avg loss 0.434274, throughput 4.94425K wps
[Epoch 127 Batch 30/62] avg loss 0.00374129, throughput 4.99301K wps
[Epoch 127 Batch 60/62] avg loss 0.00344708, throughput 4.89449K wps
Begin Testing...
[Epoch 127] train avg loss 0.00362797, dev acc 0.7758, dev avg loss 0.432637, throughput 4.9495K wps
[Epoch 128 Batch 30/62] avg loss 0.00358675, throughput 5.01225K wps
[Epoch 128 Batch 60/62] avg loss 0.003593, throughput 4.8685K wps
Begin Testing...
[Epoch 128] train avg loss 0.00365403, dev acc 0.7847, dev avg loss 0.428405, throughput 4.94573K wps
[Epoch 129 Batch 30/62] avg loss 0.00349761, throughput 4.99738K wps
[Epoch 129 Batch 60/62] avg loss 0.0035553, throughput 4.86863K wps
Begin Testing...
[Epoch 129] train avg loss 0.00361733, dev acc 0.7906, dev avg loss 0.426911, throughput 4.93961K wps
[Epoch 130 Batch 30/62] avg loss 0.00342151, throughput 4.99097K wps
[Epoch 130 Batch 60/62] avg loss 0.00350828, throughput 4.89442K wps
Begin Testing...
[Epoch 130] train avg loss 0.00354572, dev acc 0.7847, dev avg loss 0.431667, throughput 4.94845K wps
[Epoch 131 Batch 30/62] avg loss 0.00319451, throughput 4.97865K wps
[Epoch 131 Batch 60/62] avg loss 0.00359176, throughput 4.88062K wps
Begin Testing...
[Epoch 131] train avg loss 0.00347348, dev acc 0.7906, dev avg loss 0.4272, throughput 4.93733K wps
[Epoch 132 Batch 30/62] avg loss 0.00348495, throughput 4.98896K wps
[Epoch 132 Batch 60/62] avg loss 0.00336228, throughput 4.89595K wps
Begin Testing...
[Epoch 132] train avg loss 0.00347762, dev acc 0.7847, dev avg loss 0.432911, throughput 4.94919K wps
[Epoch 133 Batch 30/62] avg loss 0.00327517, throughput 4.98895K wps
[Epoch 133 Batch 60/62] avg loss 0.00336581, throughput 4.85224K wps
Begin Testing...
[Epoch 133] train avg loss 0.00334451, dev acc 0.7847, dev avg loss 0.430933, throughput 4.92783K wps
[Epoch 134 Batch 30/62] avg loss 0.00322567, throughput 4.9694K wps
[Epoch 134 Batch 60/62] avg loss 0.00295227, throughput 4.88682K wps
Begin Testing...
[Epoch 134] train avg loss 0.00311873, dev acc 0.7935, dev avg loss 0.430909, throughput 4.93413K wps
[Epoch 135 Batch 30/62] avg loss 0.00321544, throughput 4.98165K wps
[Epoch 135 Batch 60/62] avg loss 0.00313719, throughput 4.88162K wps
Begin Testing...
[Epoch 135] train avg loss 0.00331754, dev acc 0.7935, dev avg loss 0.428908, throughput 4.93871K wps
[Epoch 136 Batch 30/62] avg loss 0.00316794, throughput 5.00094K wps
[Epoch 136 Batch 60/62] avg loss 0.00313267, throughput 4.89907K wps
Begin Testing...
[Epoch 136] train avg loss 0.00318507, dev acc 0.7965, dev avg loss 0.429308, throughput 4.9562K wps
[Epoch 137 Batch 30/62] avg loss 0.00307963, throughput 5.0066K wps
[Epoch 137 Batch 60/62] avg loss 0.00306585, throughput 4.86368K wps
Begin Testing...
[Epoch 137] train avg loss 0.0031214, dev acc 0.7965, dev avg loss 0.428615, throughput 4.93997K wps
[Epoch 138 Batch 30/62] avg loss 0.00315953, throughput 4.97149K wps
[Epoch 138 Batch 60/62] avg loss 0.00307243, throughput 4.89269K wps
Begin Testing...
[Epoch 138] train avg loss 0.0031061, dev acc 0.7847, dev avg loss 0.435403, throughput 4.93952K wps
[Epoch 139 Batch 30/62] avg loss 0.00305073, throughput 4.97059K wps
[Epoch 139 Batch 60/62] avg loss 0.00283976, throughput 4.87126K wps
Begin Testing...
[Epoch 139] train avg loss 0.00300516, dev acc 0.7965, dev avg loss 0.428573, throughput 4.92846K wps
[Epoch 140 Batch 30/62] avg loss 0.00304923, throughput 4.96452K wps
[Epoch 140 Batch 60/62] avg loss 0.00291364, throughput 4.87251K wps
Begin Testing...
[Epoch 140] train avg loss 0.00299479, dev acc 0.7935, dev avg loss 0.432723, throughput 4.9253K wps
[Epoch 141 Batch 30/62] avg loss 0.00296524, throughput 5.00158K wps
[Epoch 141 Batch 60/62] avg loss 0.00290003, throughput 4.8889K wps
Begin Testing...
[Epoch 141] train avg loss 0.0029615, dev acc 0.7994, dev avg loss 0.43101, throughput 4.95205K wps
Observed Improvement.
Begin Testing...
[Epoch 142 Batch 30/62] avg loss 0.00289672, throughput 4.98582K wps
[Epoch 142 Batch 60/62] avg loss 0.00292054, throughput 4.86261K wps
Begin Testing...
[Epoch 142] train avg loss 0.00292163, dev acc 0.7906, dev avg loss 0.439256, throughput 4.93125K wps
[Epoch 143 Batch 30/62] avg loss 0.00288352, throughput 4.99523K wps
[Epoch 143 Batch 60/62] avg loss 0.00280932, throughput 4.88136K wps
Begin Testing...
[Epoch 143] train avg loss 0.00285872, dev acc 0.7906, dev avg loss 0.438267, throughput 4.94395K wps
[Epoch 144 Batch 30/62] avg loss 0.00273644, throughput 4.99227K wps
[Epoch 144 Batch 60/62] avg loss 0.00273081, throughput 4.88648K wps
Begin Testing...
[Epoch 144] train avg loss 0.00276441, dev acc 0.7935, dev avg loss 0.436161, throughput 4.9429K wps
[Epoch 145 Batch 30/62] avg loss 0.00266657, throughput 4.97635K wps
[Epoch 145 Batch 60/62] avg loss 0.00276426, throughput 4.88056K wps
Begin Testing...
[Epoch 145] train avg loss 0.00273984, dev acc 0.7994, dev avg loss 0.436686, throughput 4.93518K wps
Observed Improvement.
Begin Testing...
[Epoch 146 Batch 30/62] avg loss 0.00257937, throughput 4.98256K wps
[Epoch 146 Batch 60/62] avg loss 0.00266052, throughput 4.90076K wps
Begin Testing...
[Epoch 146] train avg loss 0.00267786, dev acc 0.7935, dev avg loss 0.441054, throughput 4.94802K wps
[Epoch 147 Batch 30/62] avg loss 0.0027609, throughput 4.99118K wps
[Epoch 147 Batch 60/62] avg loss 0.00250303, throughput 4.89134K wps
Begin Testing...
[Epoch 147] train avg loss 0.00266885, dev acc 0.7935, dev avg loss 0.440406, throughput 4.9484K wps
[Epoch 148 Batch 30/62] avg loss 0.00265774, throughput 4.99525K wps
[Epoch 148 Batch 60/62] avg loss 0.00252629, throughput 4.89118K wps
Begin Testing...
[Epoch 148] train avg loss 0.00266804, dev acc 0.7906, dev avg loss 0.436129, throughput 4.9493K wps
[Epoch 149 Batch 30/62] avg loss 0.00246447, throughput 4.99692K wps
[Epoch 149 Batch 60/62] avg loss 0.00263111, throughput 4.87106K wps
Begin Testing...
[Epoch 149] train avg loss 0.00262994, dev acc 0.8024, dev avg loss 0.434292, throughput 4.93907K wps
Observed Improvement.
Begin Testing...
[Epoch 150 Batch 30/62] avg loss 0.00262036, throughput 4.98341K wps
[Epoch 150 Batch 60/62] avg loss 0.00250465, throughput 4.87076K wps
Begin Testing...
[Epoch 150] train avg loss 0.00266443, dev acc 0.7906, dev avg loss 0.438514, throughput 4.93399K wps
[Epoch 151 Batch 30/62] avg loss 0.00249845, throughput 5.01839K wps
[Epoch 151 Batch 60/62] avg loss 0.00239424, throughput 4.89115K wps
Begin Testing...
[Epoch 151] train avg loss 0.00245733, dev acc 0.7965, dev avg loss 0.449208, throughput 4.96039K wps
[Epoch 152 Batch 30/62] avg loss 0.00241999, throughput 4.99003K wps
[Epoch 152 Batch 60/62] avg loss 0.0024792, throughput 4.89907K wps
Begin Testing...
[Epoch 152] train avg loss 0.00250121, dev acc 0.8053, dev avg loss 0.437639, throughput 4.94925K wps
Observed Improvement.
Begin Testing...
[Epoch 153 Batch 30/62] avg loss 0.00249641, throughput 4.99128K wps
[Epoch 153 Batch 60/62] avg loss 0.00228991, throughput 4.87184K wps
Begin Testing...
[Epoch 153] train avg loss 0.00242078, dev acc 0.7935, dev avg loss 0.439381, throughput 4.93814K wps
[Epoch 154 Batch 30/62] avg loss 0.00237552, throughput 4.96923K wps
[Epoch 154 Batch 60/62] avg loss 0.00246422, throughput 4.87606K wps
Begin Testing...
[Epoch 154] train avg loss 0.00242696, dev acc 0.7935, dev avg loss 0.445636, throughput 4.93049K wps
[Epoch 155 Batch 30/62] avg loss 0.00229478, throughput 4.98607K wps
[Epoch 155 Batch 60/62] avg loss 0.00237719, throughput 4.90252K wps
Begin Testing...
[Epoch 155] train avg loss 0.00239359, dev acc 0.7935, dev avg loss 0.444432, throughput 4.9507K wps
[Epoch 156 Batch 30/62] avg loss 0.00228462, throughput 4.99881K wps
[Epoch 156 Batch 60/62] avg loss 0.00221225, throughput 4.86379K wps
Begin Testing...
[Epoch 156] train avg loss 0.00228974, dev acc 0.7994, dev avg loss 0.439363, throughput 4.93747K wps
[Epoch 157 Batch 30/62] avg loss 0.00226117, throughput 5.00589K wps
[Epoch 157 Batch 60/62] avg loss 0.00244477, throughput 4.89674K wps
Begin Testing...
[Epoch 157] train avg loss 0.00235315, dev acc 0.7935, dev avg loss 0.442933, throughput 4.95811K wps
[Epoch 158 Batch 30/62] avg loss 0.00217627, throughput 5.00788K wps
[Epoch 158 Batch 60/62] avg loss 0.00223292, throughput 4.87273K wps
Begin Testing...
[Epoch 158] train avg loss 0.00224153, dev acc 0.7876, dev avg loss 0.447005, throughput 4.94629K wps
[Epoch 159 Batch 30/62] avg loss 0.00206815, throughput 5.01148K wps
[Epoch 159 Batch 60/62] avg loss 0.00228309, throughput 4.88591K wps
Begin Testing...
[Epoch 159] train avg loss 0.00220615, dev acc 0.7965, dev avg loss 0.443964, throughput 4.9548K wps
[Epoch 160 Batch 30/62] avg loss 0.00219781, throughput 5.00279K wps
[Epoch 160 Batch 60/62] avg loss 0.00219103, throughput 4.8801K wps
Begin Testing...
[Epoch 160] train avg loss 0.00219792, dev acc 0.7906, dev avg loss 0.450105, throughput 4.94633K wps
[Epoch 161 Batch 30/62] avg loss 0.00198454, throughput 4.98986K wps
[Epoch 161 Batch 60/62] avg loss 0.00222035, throughput 4.89064K wps
Begin Testing...
[Epoch 161] train avg loss 0.00214524, dev acc 0.7935, dev avg loss 0.448811, throughput 4.94518K wps
[Epoch 162 Batch 30/62] avg loss 0.00208052, throughput 4.95533K wps
[Epoch 162 Batch 60/62] avg loss 0.00204449, throughput 4.88139K wps
Begin Testing...
[Epoch 162] train avg loss 0.00209017, dev acc 0.7935, dev avg loss 0.445571, throughput 4.92539K wps
[Epoch 163 Batch 30/62] avg loss 0.00211651, throughput 4.99555K wps
[Epoch 163 Batch 60/62] avg loss 0.00206379, throughput 4.88851K wps
Begin Testing...
[Epoch 163] train avg loss 0.00211646, dev acc 0.7906, dev avg loss 0.451337, throughput 4.94873K wps
[Epoch 164 Batch 30/62] avg loss 0.00189513, throughput 4.97206K wps
[Epoch 164 Batch 60/62] avg loss 0.00210225, throughput 4.884K wps
Begin Testing...
[Epoch 164] train avg loss 0.00202398, dev acc 0.7994, dev avg loss 0.449291, throughput 4.93485K wps
[Epoch 165 Batch 30/62] avg loss 0.00200731, throughput 5.00779K wps
[Epoch 165 Batch 60/62] avg loss 0.00208437, throughput 4.88622K wps
Begin Testing...
[Epoch 165] train avg loss 0.00209143, dev acc 0.7994, dev avg loss 0.460474, throughput 4.95178K wps
[Epoch 166 Batch 30/62] avg loss 0.00199969, throughput 5.00007K wps
[Epoch 166 Batch 60/62] avg loss 0.00195264, throughput 4.87498K wps
Begin Testing...
[Epoch 166] train avg loss 0.00200049, dev acc 0.7994, dev avg loss 0.449173, throughput 4.94439K wps
[Epoch 167 Batch 30/62] avg loss 0.00192904, throughput 5.00483K wps
[Epoch 167 Batch 60/62] avg loss 0.00194342, throughput 4.87768K wps
Begin Testing...
[Epoch 167] train avg loss 0.00195816, dev acc 0.7965, dev avg loss 0.451914, throughput 4.94698K wps
[Epoch 168 Batch 30/62] avg loss 0.00186391, throughput 5.0114K wps
[Epoch 168 Batch 60/62] avg loss 0.00186593, throughput 4.8717K wps
Begin Testing...
[Epoch 168] train avg loss 0.00191298, dev acc 0.7965, dev avg loss 0.461588, throughput 4.94753K wps
[Epoch 169 Batch 30/62] avg loss 0.00193885, throughput 5.01983K wps
[Epoch 169 Batch 60/62] avg loss 0.00189423, throughput 4.88808K wps
Begin Testing...
[Epoch 169] train avg loss 0.00193174, dev acc 0.7906, dev avg loss 0.456104, throughput 4.9567K wps
[Epoch 170 Batch 30/62] avg loss 0.00186203, throughput 5.00124K wps
[Epoch 170 Batch 60/62] avg loss 0.00181876, throughput 4.88522K wps
Begin Testing...
[Epoch 170] train avg loss 0.00185752, dev acc 0.7935, dev avg loss 0.464245, throughput 4.95019K wps
[Epoch 171 Batch 30/62] avg loss 0.00186511, throughput 5.00916K wps
[Epoch 171 Batch 60/62] avg loss 0.00185214, throughput 4.88096K wps
Begin Testing...
[Epoch 171] train avg loss 0.00189494, dev acc 0.7994, dev avg loss 0.452686, throughput 4.94966K wps
[Epoch 172 Batch 30/62] avg loss 0.0017726, throughput 4.98764K wps
[Epoch 172 Batch 60/62] avg loss 0.0018135, throughput 4.86751K wps
Begin Testing...
[Epoch 172] train avg loss 0.00180478, dev acc 0.7965, dev avg loss 0.453737, throughput 4.93497K wps
[Epoch 173 Batch 30/62] avg loss 0.00191173, throughput 5.00932K wps
[Epoch 173 Batch 60/62] avg loss 0.00168726, throughput 4.88576K wps
Begin Testing...
[Epoch 173] train avg loss 0.00181792, dev acc 0.7906, dev avg loss 0.458569, throughput 4.95351K wps
[Epoch 174 Batch 30/62] avg loss 0.00167831, throughput 4.99954K wps
[Epoch 174 Batch 60/62] avg loss 0.0017837, throughput 4.87917K wps
Begin Testing...
[Epoch 174] train avg loss 0.00174055, dev acc 0.7876, dev avg loss 0.458028, throughput 4.9448K wps
[Epoch 175 Batch 30/62] avg loss 0.0016953, throughput 4.98772K wps
[Epoch 175 Batch 60/62] avg loss 0.00168432, throughput 4.86443K wps
Begin Testing...
[Epoch 175] train avg loss 0.00171348, dev acc 0.7965, dev avg loss 0.459113, throughput 4.93235K wps
[Epoch 176 Batch 30/62] avg loss 0.00180993, throughput 5.00706K wps
[Epoch 176 Batch 60/62] avg loss 0.00164758, throughput 4.8873K wps
Begin Testing...
[Epoch 176] train avg loss 0.0017344, dev acc 0.7906, dev avg loss 0.459872, throughput 4.95322K wps
[Epoch 177 Batch 30/62] avg loss 0.00169377, throughput 4.99202K wps
[Epoch 177 Batch 60/62] avg loss 0.00151606, throughput 4.87299K wps
Begin Testing...
[Epoch 177] train avg loss 0.0016296, dev acc 0.7994, dev avg loss 0.456894, throughput 4.93879K wps
[Epoch 178 Batch 30/62] avg loss 0.00162314, throughput 5.00657K wps
[Epoch 178 Batch 60/62] avg loss 0.00177061, throughput 4.87189K wps
Begin Testing...
[Epoch 178] train avg loss 0.00173091, dev acc 0.7965, dev avg loss 0.48655, throughput 4.94302K wps
[Epoch 179 Batch 30/62] avg loss 0.00172437, throughput 4.97487K wps
[Epoch 179 Batch 60/62] avg loss 0.00166357, throughput 4.88013K wps
Begin Testing...
[Epoch 179] train avg loss 0.0017113, dev acc 0.7965, dev avg loss 0.46051, throughput 4.93446K wps
[Epoch 180 Batch 30/62] avg loss 0.00157879, throughput 5.00163K wps
[Epoch 180 Batch 60/62] avg loss 0.00165249, throughput 4.88889K wps
Begin Testing...
[Epoch 180] train avg loss 0.00162894, dev acc 0.7965, dev avg loss 0.469928, throughput 4.95172K wps
[Epoch 181 Batch 30/62] avg loss 0.00166614, throughput 4.94953K wps
[Epoch 181 Batch 60/62] avg loss 0.0015914, throughput 4.89093K wps
Begin Testing...
[Epoch 181] train avg loss 0.00163429, dev acc 0.7965, dev avg loss 0.46565, throughput 4.92777K wps
[Epoch 182 Batch 30/62] avg loss 0.00155495, throughput 4.99044K wps
[Epoch 182 Batch 60/62] avg loss 0.00162113, throughput 4.86548K wps
Begin Testing...
[Epoch 182] train avg loss 0.00158485, dev acc 0.7935, dev avg loss 0.464216, throughput 4.93302K wps
[Epoch 183 Batch 30/62] avg loss 0.00156825, throughput 4.94797K wps
[Epoch 183 Batch 60/62] avg loss 0.00141993, throughput 4.86629K wps
Begin Testing...
[Epoch 183] train avg loss 0.00152798, dev acc 0.7906, dev avg loss 0.465198, throughput 4.91456K wps
[Epoch 184 Batch 30/62] avg loss 0.00154637, throughput 5.01503K wps
[Epoch 184 Batch 60/62] avg loss 0.0014339, throughput 4.87841K wps
Begin Testing...
[Epoch 184] train avg loss 0.00150075, dev acc 0.7876, dev avg loss 0.469227, throughput 4.95137K wps
[Epoch 185 Batch 30/62] avg loss 0.00149321, throughput 4.99538K wps
[Epoch 185 Batch 60/62] avg loss 0.00165066, throughput 4.86878K wps
Begin Testing...
[Epoch 185] train avg loss 0.00159881, dev acc 0.7935, dev avg loss 0.47099, throughput 4.93803K wps
[Epoch 186 Batch 30/62] avg loss 0.00150037, throughput 5.01707K wps
[Epoch 186 Batch 60/62] avg loss 0.00144252, throughput 4.88049K wps
Begin Testing...
[Epoch 186] train avg loss 0.00150328, dev acc 0.7935, dev avg loss 0.476579, throughput 4.95543K wps
[Epoch 187 Batch 30/62] avg loss 0.00138191, throughput 4.94416K wps
[Epoch 187 Batch 60/62] avg loss 0.00156237, throughput 4.8943K wps
Begin Testing...
[Epoch 187] train avg loss 0.00147506, dev acc 0.7876, dev avg loss 0.469997, throughput 4.92717K wps
[Epoch 188 Batch 30/62] avg loss 0.00150678, throughput 5.00436K wps
[Epoch 188 Batch 60/62] avg loss 0.00148776, throughput 4.89677K wps
Begin Testing...
[Epoch 188] train avg loss 0.00152468, dev acc 0.7935, dev avg loss 0.48125, throughput 4.95704K wps
[Epoch 189 Batch 30/62] avg loss 0.00139935, throughput 4.98482K wps
[Epoch 189 Batch 60/62] avg loss 0.00143346, throughput 4.86293K wps
Begin Testing...
[Epoch 189] train avg loss 0.00142856, dev acc 0.7906, dev avg loss 0.473011, throughput 4.93012K wps
[Epoch 190 Batch 30/62] avg loss 0.00130089, throughput 4.96932K wps
[Epoch 190 Batch 60/62] avg loss 0.00146526, throughput 4.8967K wps
Begin Testing...
[Epoch 190] train avg loss 0.00141028, dev acc 0.7994, dev avg loss 0.471841, throughput 4.93961K wps
[Epoch 191 Batch 30/62] avg loss 0.00133867, throughput 5.01137K wps
[Epoch 191 Batch 60/62] avg loss 0.00141727, throughput 4.8673K wps
Begin Testing...
[Epoch 191] train avg loss 0.00138581, dev acc 0.7935, dev avg loss 0.478036, throughput 4.94583K wps
[Epoch 192 Batch 30/62] avg loss 0.00130679, throughput 5.01012K wps
[Epoch 192 Batch 60/62] avg loss 0.00141536, throughput 4.88764K wps
Begin Testing...
[Epoch 192] train avg loss 0.00136947, dev acc 0.7935, dev avg loss 0.482811, throughput 4.95333K wps
[Epoch 193 Batch 30/62] avg loss 0.0012943, throughput 4.97121K wps
[Epoch 193 Batch 60/62] avg loss 0.00138323, throughput 4.8786K wps
Begin Testing...
[Epoch 193] train avg loss 0.00135575, dev acc 0.7965, dev avg loss 0.478545, throughput 4.93235K wps
[Epoch 194 Batch 30/62] avg loss 0.00129181, throughput 4.98552K wps
[Epoch 194 Batch 60/62] avg loss 0.00137723, throughput 4.8826K wps
Begin Testing...
[Epoch 194] train avg loss 0.00135996, dev acc 0.7965, dev avg loss 0.480137, throughput 4.94074K wps
[Epoch 195 Batch 30/62] avg loss 0.00132983, throughput 4.99977K wps
[Epoch 195 Batch 60/62] avg loss 0.0013115, throughput 4.88023K wps
Begin Testing...
[Epoch 195] train avg loss 0.00134294, dev acc 0.7906, dev avg loss 0.485707, throughput 4.9458K wps
[Epoch 196 Batch 30/62] avg loss 0.00138437, throughput 5.0096K wps
[Epoch 196 Batch 60/62] avg loss 0.00120601, throughput 4.89138K wps
Begin Testing...
[Epoch 196] train avg loss 0.00131134, dev acc 0.7876, dev avg loss 0.483947, throughput 4.95706K wps
[Epoch 197 Batch 30/62] avg loss 0.00126485, throughput 4.9604K wps
[Epoch 197 Batch 60/62] avg loss 0.00131873, throughput 4.86996K wps
Begin Testing...
[Epoch 197] train avg loss 0.00131914, dev acc 0.7935, dev avg loss 0.50322, throughput 4.92273K wps
[Epoch 198 Batch 30/62] avg loss 0.00122482, throughput 4.99199K wps
[Epoch 198 Batch 60/62] avg loss 0.00127981, throughput 4.87955K wps
Begin Testing...
[Epoch 198] train avg loss 0.00126032, dev acc 0.7876, dev avg loss 0.488851, throughput 4.94124K wps
[Epoch 199 Batch 30/62] avg loss 0.00127977, throughput 4.96295K wps
[Epoch 199 Batch 60/62] avg loss 0.00123091, throughput 4.86316K wps
Begin Testing...
[Epoch 199] train avg loss 0.0013259, dev acc 0.7906, dev avg loss 0.488399, throughput 4.92055K wps
Test loss 0.421023, test acc 0.8090
Total time cost 275.45s
[Epoch 0 Batch 30/62] avg loss 0.0134954, throughput 4.766K wps
[Epoch 0 Batch 60/62] avg loss 0.0130864, throughput 4.87955K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134388, dev acc 0.6254, dev avg loss 0.663686, throughput 4.83277K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0129946, throughput 5.00974K wps
[Epoch 1 Batch 60/62] avg loss 0.0133344, throughput 4.89238K wps
Begin Testing...
[Epoch 1] train avg loss 0.0133348, dev acc 0.6254, dev avg loss 0.659844, throughput 4.95697K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0131521, throughput 4.97939K wps
[Epoch 2 Batch 60/62] avg loss 0.0131283, throughput 4.8879K wps
Begin Testing...
[Epoch 2] train avg loss 0.0133496, dev acc 0.6254, dev avg loss 0.660169, throughput 4.94068K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0129069, throughput 4.99466K wps
[Epoch 3 Batch 60/62] avg loss 0.0132084, throughput 4.86931K wps
Begin Testing...
[Epoch 3] train avg loss 0.0132423, dev acc 0.6254, dev avg loss 0.658871, throughput 4.93673K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0131805, throughput 4.96375K wps
[Epoch 4 Batch 60/62] avg loss 0.0129307, throughput 4.86882K wps
Begin Testing...
[Epoch 4] train avg loss 0.0132206, dev acc 0.6254, dev avg loss 0.657504, throughput 4.92431K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.013103, throughput 4.99709K wps
[Epoch 5 Batch 60/62] avg loss 0.0129323, throughput 4.85025K wps
Begin Testing...
[Epoch 5] train avg loss 0.013209, dev acc 0.6254, dev avg loss 0.656846, throughput 4.92824K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0128172, throughput 4.99562K wps
[Epoch 6 Batch 60/62] avg loss 0.0131298, throughput 4.8431K wps
Begin Testing...
[Epoch 6] train avg loss 0.0131335, dev acc 0.6254, dev avg loss 0.655793, throughput 4.92473K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0129737, throughput 4.94266K wps
[Epoch 7 Batch 60/62] avg loss 0.0129405, throughput 4.85275K wps
Begin Testing...
[Epoch 7] train avg loss 0.0131111, dev acc 0.6254, dev avg loss 0.656349, throughput 4.90458K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0129739, throughput 4.95825K wps
[Epoch 8 Batch 60/62] avg loss 0.0129688, throughput 4.88022K wps
Begin Testing...
[Epoch 8] train avg loss 0.0131457, dev acc 0.6254, dev avg loss 0.653628, throughput 4.92756K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0128108, throughput 4.99408K wps
[Epoch 9 Batch 60/62] avg loss 0.0130134, throughput 4.88643K wps
Begin Testing...
[Epoch 9] train avg loss 0.0130913, dev acc 0.6254, dev avg loss 0.652698, throughput 4.94652K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0127561, throughput 5.01541K wps
[Epoch 10 Batch 60/62] avg loss 0.0130491, throughput 4.88891K wps
Begin Testing...
[Epoch 10] train avg loss 0.0130335, dev acc 0.6254, dev avg loss 0.651722, throughput 4.95736K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0129098, throughput 4.98275K wps
[Epoch 11 Batch 60/62] avg loss 0.0127482, throughput 4.87348K wps
Begin Testing...
[Epoch 11] train avg loss 0.0129554, dev acc 0.6254, dev avg loss 0.651588, throughput 4.93436K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0127684, throughput 4.96079K wps
[Epoch 12 Batch 60/62] avg loss 0.0128488, throughput 4.88034K wps
Begin Testing...
[Epoch 12] train avg loss 0.0129821, dev acc 0.6254, dev avg loss 0.649174, throughput 4.92692K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0129259, throughput 4.97678K wps
[Epoch 13 Batch 60/62] avg loss 0.0126506, throughput 4.86506K wps
Begin Testing...
[Epoch 13] train avg loss 0.0129973, dev acc 0.6254, dev avg loss 0.647746, throughput 4.92635K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0128556, throughput 5.00756K wps
[Epoch 14 Batch 60/62] avg loss 0.0126041, throughput 4.8925K wps
Begin Testing...
[Epoch 14] train avg loss 0.0129106, dev acc 0.6254, dev avg loss 0.646447, throughput 4.9568K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0128586, throughput 4.99584K wps
[Epoch 15 Batch 60/62] avg loss 0.0125085, throughput 4.8915K wps
Begin Testing...
[Epoch 15] train avg loss 0.0128125, dev acc 0.6254, dev avg loss 0.64661, throughput 4.95067K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0127046, throughput 4.9867K wps
[Epoch 16 Batch 60/62] avg loss 0.0125131, throughput 4.8818K wps
Begin Testing...
[Epoch 16] train avg loss 0.0128253, dev acc 0.6254, dev avg loss 0.642549, throughput 4.94069K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0125542, throughput 4.99855K wps
[Epoch 17 Batch 60/62] avg loss 0.0126273, throughput 4.88672K wps
Begin Testing...
[Epoch 17] train avg loss 0.0127422, dev acc 0.6254, dev avg loss 0.64119, throughput 4.94804K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0126754, throughput 4.96323K wps
[Epoch 18 Batch 60/62] avg loss 0.0124948, throughput 4.858K wps
Begin Testing...
[Epoch 18] train avg loss 0.0127549, dev acc 0.6254, dev avg loss 0.639077, throughput 4.91682K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.012581, throughput 5.00177K wps
[Epoch 19 Batch 60/62] avg loss 0.0124653, throughput 4.88566K wps
Begin Testing...
[Epoch 19] train avg loss 0.0126584, dev acc 0.6254, dev avg loss 0.637537, throughput 4.94947K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0124072, throughput 4.96502K wps
[Epoch 20 Batch 60/62] avg loss 0.0124991, throughput 4.85032K wps
Begin Testing...
[Epoch 20] train avg loss 0.0126043, dev acc 0.6254, dev avg loss 0.634402, throughput 4.9142K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0124745, throughput 4.98627K wps
[Epoch 21 Batch 60/62] avg loss 0.0124389, throughput 4.88205K wps
Begin Testing...
[Epoch 21] train avg loss 0.0126198, dev acc 0.6254, dev avg loss 0.63221, throughput 4.94138K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0122639, throughput 4.98413K wps
[Epoch 22 Batch 60/62] avg loss 0.012391, throughput 4.87299K wps
Begin Testing...
[Epoch 22] train avg loss 0.0125069, dev acc 0.6283, dev avg loss 0.629716, throughput 4.93541K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.0122649, throughput 5.00061K wps
[Epoch 23 Batch 60/62] avg loss 0.0123051, throughput 4.88008K wps
Begin Testing...
[Epoch 23] train avg loss 0.0124403, dev acc 0.6283, dev avg loss 0.627399, throughput 4.94513K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0123219, throughput 5.00355K wps
[Epoch 24 Batch 60/62] avg loss 0.0122042, throughput 4.86916K wps
Begin Testing...
[Epoch 24] train avg loss 0.0123928, dev acc 0.6283, dev avg loss 0.626209, throughput 4.94246K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0121162, throughput 5.00389K wps
[Epoch 25 Batch 60/62] avg loss 0.0121114, throughput 4.86164K wps
Begin Testing...
[Epoch 25] train avg loss 0.0122781, dev acc 0.6549, dev avg loss 0.622886, throughput 4.93759K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.0120151, throughput 4.98667K wps
[Epoch 26 Batch 60/62] avg loss 0.0121628, throughput 4.8594K wps
Begin Testing...
[Epoch 26] train avg loss 0.0122001, dev acc 0.6401, dev avg loss 0.620071, throughput 4.92739K wps
[Epoch 27 Batch 30/62] avg loss 0.0120309, throughput 4.9738K wps
[Epoch 27 Batch 60/62] avg loss 0.0119747, throughput 4.84965K wps
Begin Testing...
[Epoch 27] train avg loss 0.0121267, dev acc 0.6519, dev avg loss 0.617438, throughput 4.91549K wps
[Epoch 28 Batch 30/62] avg loss 0.011942, throughput 4.9874K wps
[Epoch 28 Batch 60/62] avg loss 0.0118425, throughput 4.87041K wps
Begin Testing...
[Epoch 28] train avg loss 0.0120767, dev acc 0.6490, dev avg loss 0.613513, throughput 4.93573K wps
[Epoch 29 Batch 30/62] avg loss 0.0119389, throughput 4.97742K wps
[Epoch 29 Batch 60/62] avg loss 0.0116891, throughput 4.86715K wps
Begin Testing...
[Epoch 29] train avg loss 0.0119732, dev acc 0.6490, dev avg loss 0.611214, throughput 4.92773K wps
[Epoch 30 Batch 30/62] avg loss 0.0118905, throughput 4.96731K wps
[Epoch 30 Batch 60/62] avg loss 0.0116304, throughput 4.87681K wps
Begin Testing...
[Epoch 30] train avg loss 0.0119709, dev acc 0.6431, dev avg loss 0.60753, throughput 4.92901K wps
[Epoch 31 Batch 30/62] avg loss 0.0116049, throughput 4.98432K wps
[Epoch 31 Batch 60/62] avg loss 0.0117621, throughput 4.88403K wps
Begin Testing...
[Epoch 31] train avg loss 0.011817, dev acc 0.6490, dev avg loss 0.604623, throughput 4.93875K wps
[Epoch 32 Batch 30/62] avg loss 0.0119126, throughput 5.00671K wps
[Epoch 32 Batch 60/62] avg loss 0.0113482, throughput 4.89547K wps
Begin Testing...
[Epoch 32] train avg loss 0.0118197, dev acc 0.6460, dev avg loss 0.601891, throughput 4.95813K wps
[Epoch 33 Batch 30/62] avg loss 0.0117107, throughput 4.98777K wps
[Epoch 33 Batch 60/62] avg loss 0.0112882, throughput 4.88129K wps
Begin Testing...
[Epoch 33] train avg loss 0.0116102, dev acc 0.6490, dev avg loss 0.601033, throughput 4.94135K wps
[Epoch 34 Batch 30/62] avg loss 0.0116016, throughput 4.99904K wps
[Epoch 34 Batch 60/62] avg loss 0.0115445, throughput 4.85077K wps
Begin Testing...
[Epoch 34] train avg loss 0.0116996, dev acc 0.6490, dev avg loss 0.596438, throughput 4.93093K wps
[Epoch 35 Batch 30/62] avg loss 0.0113433, throughput 5.00492K wps
[Epoch 35 Batch 60/62] avg loss 0.0114302, throughput 4.86964K wps
Begin Testing...
[Epoch 35] train avg loss 0.0115464, dev acc 0.6431, dev avg loss 0.59305, throughput 4.94339K wps
[Epoch 36 Batch 30/62] avg loss 0.0113802, throughput 4.97949K wps
[Epoch 36 Batch 60/62] avg loss 0.0113789, throughput 4.88918K wps
Begin Testing...
[Epoch 36] train avg loss 0.0115083, dev acc 0.6460, dev avg loss 0.590536, throughput 4.94107K wps
[Epoch 37 Batch 30/62] avg loss 0.0111006, throughput 4.99965K wps
[Epoch 37 Batch 60/62] avg loss 0.0114846, throughput 4.89607K wps
Begin Testing...
[Epoch 37] train avg loss 0.0113993, dev acc 0.6490, dev avg loss 0.587571, throughput 4.95407K wps
[Epoch 38 Batch 30/62] avg loss 0.0111974, throughput 4.99177K wps
[Epoch 38 Batch 60/62] avg loss 0.0111051, throughput 4.87773K wps
Begin Testing...
[Epoch 38] train avg loss 0.0112686, dev acc 0.6460, dev avg loss 0.583343, throughput 4.94065K wps
[Epoch 39 Batch 30/62] avg loss 0.0110787, throughput 5.00632K wps
[Epoch 39 Batch 60/62] avg loss 0.011131, throughput 4.88233K wps
Begin Testing...
[Epoch 39] train avg loss 0.0113079, dev acc 0.6549, dev avg loss 0.579508, throughput 4.9507K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.0110727, throughput 5.00086K wps
[Epoch 40 Batch 60/62] avg loss 0.0110566, throughput 4.87968K wps
Begin Testing...
[Epoch 40] train avg loss 0.0112712, dev acc 0.6637, dev avg loss 0.576571, throughput 4.94545K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.0109519, throughput 4.95676K wps
[Epoch 41 Batch 60/62] avg loss 0.0109304, throughput 4.89867K wps
Begin Testing...
[Epoch 41] train avg loss 0.0111582, dev acc 0.6549, dev avg loss 0.572195, throughput 4.93605K wps
[Epoch 42 Batch 30/62] avg loss 0.0107936, throughput 4.97093K wps
[Epoch 42 Batch 60/62] avg loss 0.0109227, throughput 4.88006K wps
Begin Testing...
[Epoch 42] train avg loss 0.0109461, dev acc 0.6578, dev avg loss 0.568175, throughput 4.93233K wps
[Epoch 43 Batch 30/62] avg loss 0.0106585, throughput 5.01256K wps
[Epoch 43 Batch 60/62] avg loss 0.0110164, throughput 4.87083K wps
Begin Testing...
[Epoch 43] train avg loss 0.0109639, dev acc 0.6785, dev avg loss 0.563866, throughput 4.94707K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/62] avg loss 0.0106011, throughput 4.98009K wps
[Epoch 44 Batch 60/62] avg loss 0.0107649, throughput 4.88907K wps
Begin Testing...
[Epoch 44] train avg loss 0.0108211, dev acc 0.6755, dev avg loss 0.559668, throughput 4.94186K wps
[Epoch 45 Batch 30/62] avg loss 0.0105374, throughput 4.96972K wps
[Epoch 45 Batch 60/62] avg loss 0.0107791, throughput 4.86884K wps
Begin Testing...
[Epoch 45] train avg loss 0.0107376, dev acc 0.6991, dev avg loss 0.555401, throughput 4.92571K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.0105118, throughput 4.98389K wps
[Epoch 46 Batch 60/62] avg loss 0.0105465, throughput 4.89183K wps
Begin Testing...
[Epoch 46] train avg loss 0.0106262, dev acc 0.7109, dev avg loss 0.551411, throughput 4.94351K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.0104437, throughput 5.00915K wps
[Epoch 47 Batch 60/62] avg loss 0.0104187, throughput 4.89414K wps
Begin Testing...
[Epoch 47] train avg loss 0.0105645, dev acc 0.7080, dev avg loss 0.547473, throughput 4.95733K wps
[Epoch 48 Batch 30/62] avg loss 0.0102947, throughput 4.98653K wps
[Epoch 48 Batch 60/62] avg loss 0.0104695, throughput 4.87149K wps
Begin Testing...
[Epoch 48] train avg loss 0.0105537, dev acc 0.7286, dev avg loss 0.542695, throughput 4.93599K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/62] avg loss 0.0103725, throughput 4.99696K wps
[Epoch 49 Batch 60/62] avg loss 0.0102292, throughput 4.89265K wps
Begin Testing...
[Epoch 49] train avg loss 0.0104304, dev acc 0.7227, dev avg loss 0.539002, throughput 4.9499K wps
[Epoch 50 Batch 30/62] avg loss 0.0101712, throughput 4.99684K wps
[Epoch 50 Batch 60/62] avg loss 0.0101458, throughput 4.88877K wps
Begin Testing...
[Epoch 50] train avg loss 0.0103358, dev acc 0.7227, dev avg loss 0.535146, throughput 4.94961K wps
[Epoch 51 Batch 30/62] avg loss 0.0103208, throughput 5.00035K wps
[Epoch 51 Batch 60/62] avg loss 0.00989436, throughput 4.89605K wps
Begin Testing...
[Epoch 51] train avg loss 0.010214, dev acc 0.7227, dev avg loss 0.531894, throughput 4.95448K wps
[Epoch 52 Batch 30/62] avg loss 0.00996556, throughput 5.01231K wps
[Epoch 52 Batch 60/62] avg loss 0.0099996, throughput 4.86194K wps
Begin Testing...
[Epoch 52] train avg loss 0.010115, dev acc 0.7404, dev avg loss 0.526468, throughput 4.9424K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.010006, throughput 5.00083K wps
[Epoch 53 Batch 60/62] avg loss 0.00983162, throughput 4.89529K wps
Begin Testing...
[Epoch 53] train avg loss 0.0101124, dev acc 0.7581, dev avg loss 0.522062, throughput 4.95505K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/62] avg loss 0.0099325, throughput 5.00404K wps
[Epoch 54 Batch 60/62] avg loss 0.00968199, throughput 4.85935K wps
Begin Testing...
[Epoch 54] train avg loss 0.0099427, dev acc 0.7434, dev avg loss 0.518498, throughput 4.93687K wps
[Epoch 55 Batch 30/62] avg loss 0.00953829, throughput 4.9939K wps
[Epoch 55 Batch 60/62] avg loss 0.00997361, throughput 4.86703K wps
Begin Testing...
[Epoch 55] train avg loss 0.00982198, dev acc 0.7670, dev avg loss 0.513177, throughput 4.93734K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00973484, throughput 4.99138K wps
[Epoch 56 Batch 60/62] avg loss 0.00958291, throughput 4.88826K wps
Begin Testing...
[Epoch 56] train avg loss 0.0097487, dev acc 0.7581, dev avg loss 0.509883, throughput 4.94558K wps
[Epoch 57 Batch 30/62] avg loss 0.00940227, throughput 5.00332K wps
[Epoch 57 Batch 60/62] avg loss 0.0094805, throughput 4.89327K wps
Begin Testing...
[Epoch 57] train avg loss 0.00951674, dev acc 0.7552, dev avg loss 0.506545, throughput 4.95475K wps
[Epoch 58 Batch 30/62] avg loss 0.00944146, throughput 4.99838K wps
[Epoch 58 Batch 60/62] avg loss 0.00946566, throughput 4.88188K wps
Begin Testing...
[Epoch 58] train avg loss 0.0095731, dev acc 0.7699, dev avg loss 0.501247, throughput 4.94545K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00931551, throughput 5.0002K wps
[Epoch 59 Batch 60/62] avg loss 0.009287, throughput 4.88311K wps
Begin Testing...
[Epoch 59] train avg loss 0.00943473, dev acc 0.7788, dev avg loss 0.498767, throughput 4.94857K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.00926412, throughput 5.00747K wps
[Epoch 60 Batch 60/62] avg loss 0.00935486, throughput 4.86794K wps
Begin Testing...
[Epoch 60] train avg loss 0.00938085, dev acc 0.7640, dev avg loss 0.496119, throughput 4.94404K wps
[Epoch 61 Batch 30/62] avg loss 0.00916037, throughput 5.00195K wps
[Epoch 61 Batch 60/62] avg loss 0.00911139, throughput 4.8929K wps
Begin Testing...
[Epoch 61] train avg loss 0.00926146, dev acc 0.7699, dev avg loss 0.4918, throughput 4.95457K wps
[Epoch 62 Batch 30/62] avg loss 0.00919029, throughput 5.01023K wps
[Epoch 62 Batch 60/62] avg loss 0.00897728, throughput 4.8973K wps
Begin Testing...
[Epoch 62] train avg loss 0.00922044, dev acc 0.7729, dev avg loss 0.488502, throughput 4.95941K wps
[Epoch 63 Batch 30/62] avg loss 0.00910627, throughput 4.99606K wps
[Epoch 63 Batch 60/62] avg loss 0.00886873, throughput 4.87022K wps
Begin Testing...
[Epoch 63] train avg loss 0.00915694, dev acc 0.7935, dev avg loss 0.485796, throughput 4.93824K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/62] avg loss 0.00883765, throughput 4.98333K wps
[Epoch 64 Batch 60/62] avg loss 0.00893765, throughput 4.86928K wps
Begin Testing...
[Epoch 64] train avg loss 0.00904353, dev acc 0.7758, dev avg loss 0.481588, throughput 4.93242K wps
[Epoch 65 Batch 30/62] avg loss 0.00861209, throughput 4.95195K wps
[Epoch 65 Batch 60/62] avg loss 0.00875459, throughput 4.86097K wps
Begin Testing...
[Epoch 65] train avg loss 0.00880162, dev acc 0.7906, dev avg loss 0.478738, throughput 4.91378K wps
[Epoch 66 Batch 30/62] avg loss 0.00886568, throughput 4.97672K wps
[Epoch 66 Batch 60/62] avg loss 0.00872408, throughput 4.84995K wps
Begin Testing...
[Epoch 66] train avg loss 0.00896984, dev acc 0.7729, dev avg loss 0.476227, throughput 4.91787K wps
[Epoch 67 Batch 30/62] avg loss 0.00858484, throughput 4.96162K wps
[Epoch 67 Batch 60/62] avg loss 0.00853269, throughput 4.88834K wps
Begin Testing...
[Epoch 67] train avg loss 0.00863293, dev acc 0.7729, dev avg loss 0.473867, throughput 4.93089K wps
[Epoch 68 Batch 30/62] avg loss 0.0084728, throughput 4.97184K wps
[Epoch 68 Batch 60/62] avg loss 0.00847919, throughput 4.88619K wps
Begin Testing...
[Epoch 68] train avg loss 0.00863407, dev acc 0.7906, dev avg loss 0.469291, throughput 4.93572K wps
[Epoch 69 Batch 30/62] avg loss 0.00846242, throughput 4.99059K wps
[Epoch 69 Batch 60/62] avg loss 0.00839111, throughput 4.892K wps
Begin Testing...
[Epoch 69] train avg loss 0.00857048, dev acc 0.7994, dev avg loss 0.466457, throughput 4.94865K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/62] avg loss 0.0081279, throughput 4.97012K wps
[Epoch 70 Batch 60/62] avg loss 0.00855083, throughput 4.89595K wps
Begin Testing...
[Epoch 70] train avg loss 0.00843555, dev acc 0.7994, dev avg loss 0.463568, throughput 4.93985K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00806603, throughput 4.98991K wps
[Epoch 71 Batch 60/62] avg loss 0.00845358, throughput 4.89345K wps
Begin Testing...
[Epoch 71] train avg loss 0.00837403, dev acc 0.7935, dev avg loss 0.460824, throughput 4.94751K wps
[Epoch 72 Batch 30/62] avg loss 0.0079617, throughput 4.92198K wps
[Epoch 72 Batch 60/62] avg loss 0.00842711, throughput 4.84963K wps
Begin Testing...
[Epoch 72] train avg loss 0.00830046, dev acc 0.7906, dev avg loss 0.459486, throughput 4.89355K wps
[Epoch 73 Batch 30/62] avg loss 0.00792412, throughput 4.96054K wps
[Epoch 73 Batch 60/62] avg loss 0.0080576, throughput 4.88547K wps
Begin Testing...
[Epoch 73] train avg loss 0.00812712, dev acc 0.7965, dev avg loss 0.45873, throughput 4.93143K wps
[Epoch 74 Batch 30/62] avg loss 0.00803316, throughput 5.01464K wps
[Epoch 74 Batch 60/62] avg loss 0.00795298, throughput 4.87608K wps
Begin Testing...
[Epoch 74] train avg loss 0.00806708, dev acc 0.8024, dev avg loss 0.454096, throughput 4.95K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/62] avg loss 0.00790843, throughput 4.98961K wps
[Epoch 75 Batch 60/62] avg loss 0.00774496, throughput 4.89513K wps
Begin Testing...
[Epoch 75] train avg loss 0.00793364, dev acc 0.7994, dev avg loss 0.451054, throughput 4.94914K wps
[Epoch 76 Batch 30/62] avg loss 0.00754299, throughput 4.97553K wps
[Epoch 76 Batch 60/62] avg loss 0.00807805, throughput 4.86K wps
Begin Testing...
[Epoch 76] train avg loss 0.00794375, dev acc 0.8083, dev avg loss 0.448568, throughput 4.9236K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/62] avg loss 0.00775624, throughput 4.97739K wps
[Epoch 77 Batch 60/62] avg loss 0.00765467, throughput 4.86505K wps
Begin Testing...
[Epoch 77] train avg loss 0.00782195, dev acc 0.8024, dev avg loss 0.446675, throughput 4.92825K wps
[Epoch 78 Batch 30/62] avg loss 0.00756226, throughput 5.01999K wps
[Epoch 78 Batch 60/62] avg loss 0.00768449, throughput 4.8876K wps
Begin Testing...
[Epoch 78] train avg loss 0.00768195, dev acc 0.8083, dev avg loss 0.444533, throughput 4.96029K wps
Observed Improvement.
Begin Testing...
[Epoch 79 Batch 30/62] avg loss 0.00765177, throughput 4.98346K wps
[Epoch 79 Batch 60/62] avg loss 0.00721486, throughput 4.90306K wps
Begin Testing...
[Epoch 79] train avg loss 0.0075875, dev acc 0.7994, dev avg loss 0.442774, throughput 4.95039K wps
[Epoch 80 Batch 30/62] avg loss 0.00755937, throughput 4.98826K wps
[Epoch 80 Batch 60/62] avg loss 0.00713902, throughput 4.86374K wps
Begin Testing...
[Epoch 80] train avg loss 0.00750025, dev acc 0.7965, dev avg loss 0.441074, throughput 4.93354K wps
[Epoch 81 Batch 30/62] avg loss 0.00725786, throughput 5.01284K wps
[Epoch 81 Batch 60/62] avg loss 0.00750864, throughput 4.87857K wps
Begin Testing...
[Epoch 81] train avg loss 0.00747609, dev acc 0.8083, dev avg loss 0.438473, throughput 4.95129K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/62] avg loss 0.00713341, throughput 5.00204K wps
[Epoch 82 Batch 60/62] avg loss 0.00742431, throughput 4.88397K wps
Begin Testing...
[Epoch 82] train avg loss 0.0073493, dev acc 0.8024, dev avg loss 0.437497, throughput 4.94745K wps
[Epoch 83 Batch 30/62] avg loss 0.00709258, throughput 4.9945K wps
[Epoch 83 Batch 60/62] avg loss 0.00721117, throughput 4.89321K wps
Begin Testing...
[Epoch 83] train avg loss 0.00723948, dev acc 0.8171, dev avg loss 0.434979, throughput 4.95053K wps
Observed Improvement.
Begin Testing...
[Epoch 84 Batch 30/62] avg loss 0.00723458, throughput 4.9713K wps
[Epoch 84 Batch 60/62] avg loss 0.00687246, throughput 4.8782K wps
Begin Testing...
[Epoch 84] train avg loss 0.00717579, dev acc 0.8171, dev avg loss 0.433703, throughput 4.93171K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/62] avg loss 0.00687586, throughput 4.99805K wps
[Epoch 85 Batch 60/62] avg loss 0.00719714, throughput 4.85428K wps
Begin Testing...
[Epoch 85] train avg loss 0.00711708, dev acc 0.8112, dev avg loss 0.432102, throughput 4.93068K wps
[Epoch 86 Batch 30/62] avg loss 0.00672594, throughput 4.99382K wps
[Epoch 86 Batch 60/62] avg loss 0.00711157, throughput 4.86669K wps
Begin Testing...
[Epoch 86] train avg loss 0.00703686, dev acc 0.8142, dev avg loss 0.429531, throughput 4.93537K wps
[Epoch 87 Batch 30/62] avg loss 0.00706635, throughput 4.98174K wps
[Epoch 87 Batch 60/62] avg loss 0.00675852, throughput 4.8796K wps
Begin Testing...
[Epoch 87] train avg loss 0.0070572, dev acc 0.8083, dev avg loss 0.430246, throughput 4.93685K wps
[Epoch 88 Batch 30/62] avg loss 0.00676493, throughput 5.00354K wps
[Epoch 88 Batch 60/62] avg loss 0.00661462, throughput 4.88799K wps
Begin Testing...
[Epoch 88] train avg loss 0.00680715, dev acc 0.8053, dev avg loss 0.426318, throughput 4.95128K wps
[Epoch 89 Batch 30/62] avg loss 0.00667544, throughput 4.97969K wps
[Epoch 89 Batch 60/62] avg loss 0.00683585, throughput 4.886K wps
Begin Testing...
[Epoch 89] train avg loss 0.00683678, dev acc 0.8112, dev avg loss 0.427126, throughput 4.94015K wps
[Epoch 90 Batch 30/62] avg loss 0.00647212, throughput 4.97271K wps
[Epoch 90 Batch 60/62] avg loss 0.00660879, throughput 4.90255K wps
Begin Testing...
[Epoch 90] train avg loss 0.00664936, dev acc 0.7994, dev avg loss 0.424984, throughput 4.94462K wps
[Epoch 91 Batch 30/62] avg loss 0.00646042, throughput 4.99321K wps
[Epoch 91 Batch 60/62] avg loss 0.00671788, throughput 4.89164K wps
Begin Testing...
[Epoch 91] train avg loss 0.00662674, dev acc 0.8142, dev avg loss 0.421669, throughput 4.94937K wps
[Epoch 92 Batch 30/62] avg loss 0.00634274, throughput 5.01119K wps
[Epoch 92 Batch 60/62] avg loss 0.00646805, throughput 4.88645K wps
Begin Testing...
[Epoch 92] train avg loss 0.00653617, dev acc 0.8053, dev avg loss 0.42067, throughput 4.95465K wps
[Epoch 93 Batch 30/62] avg loss 0.00621266, throughput 4.95941K wps
[Epoch 93 Batch 60/62] avg loss 0.00649163, throughput 4.85572K wps
Begin Testing...
[Epoch 93] train avg loss 0.00649627, dev acc 0.8053, dev avg loss 0.420653, throughput 4.91534K wps
[Epoch 94 Batch 30/62] avg loss 0.00628002, throughput 4.98496K wps
[Epoch 94 Batch 60/62] avg loss 0.00614116, throughput 4.88733K wps
Begin Testing...
[Epoch 94] train avg loss 0.00624281, dev acc 0.8142, dev avg loss 0.419348, throughput 4.94313K wps
[Epoch 95 Batch 30/62] avg loss 0.00588921, throughput 5.0042K wps
[Epoch 95 Batch 60/62] avg loss 0.0063986, throughput 4.89987K wps
Begin Testing...
[Epoch 95] train avg loss 0.00619151, dev acc 0.8083, dev avg loss 0.417414, throughput 4.95797K wps
[Epoch 96 Batch 30/62] avg loss 0.00596891, throughput 5.01323K wps
[Epoch 96 Batch 60/62] avg loss 0.00620377, throughput 4.86359K wps
Begin Testing...
[Epoch 96] train avg loss 0.00622456, dev acc 0.8053, dev avg loss 0.417599, throughput 4.94508K wps
[Epoch 97 Batch 30/62] avg loss 0.00595152, throughput 4.98892K wps
[Epoch 97 Batch 60/62] avg loss 0.00596714, throughput 4.8796K wps
Begin Testing...
[Epoch 97] train avg loss 0.00602024, dev acc 0.8112, dev avg loss 0.414545, throughput 4.94124K wps
[Epoch 98 Batch 30/62] avg loss 0.00609269, throughput 5.00223K wps
[Epoch 98 Batch 60/62] avg loss 0.00567675, throughput 4.87067K wps
Begin Testing...
[Epoch 98] train avg loss 0.00602176, dev acc 0.8083, dev avg loss 0.413421, throughput 4.94275K wps
[Epoch 99 Batch 30/62] avg loss 0.00591682, throughput 5.01873K wps
[Epoch 99 Batch 60/62] avg loss 0.0058202, throughput 4.88141K wps
Begin Testing...
[Epoch 99] train avg loss 0.00592611, dev acc 0.8142, dev avg loss 0.412163, throughput 4.95318K wps
[Epoch 100 Batch 30/62] avg loss 0.00580426, throughput 4.98969K wps
[Epoch 100 Batch 60/62] avg loss 0.00567298, throughput 4.86276K wps
Begin Testing...
[Epoch 100] train avg loss 0.00588892, dev acc 0.8201, dev avg loss 0.411808, throughput 4.93324K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00592193, throughput 4.97486K wps
[Epoch 101 Batch 60/62] avg loss 0.00541965, throughput 4.88767K wps
Begin Testing...
[Epoch 101] train avg loss 0.00577297, dev acc 0.8171, dev avg loss 0.411136, throughput 4.93808K wps
[Epoch 102 Batch 30/62] avg loss 0.00556368, throughput 4.96302K wps
[Epoch 102 Batch 60/62] avg loss 0.00568164, throughput 4.90212K wps
Begin Testing...
[Epoch 102] train avg loss 0.00567381, dev acc 0.8171, dev avg loss 0.409418, throughput 4.94073K wps
[Epoch 103 Batch 30/62] avg loss 0.00549157, throughput 5.01933K wps
[Epoch 103 Batch 60/62] avg loss 0.00567389, throughput 4.87891K wps
Begin Testing...
[Epoch 103] train avg loss 0.00565422, dev acc 0.8142, dev avg loss 0.408802, throughput 4.95426K wps
[Epoch 104 Batch 30/62] avg loss 0.0053532, throughput 5.00381K wps
[Epoch 104 Batch 60/62] avg loss 0.00554019, throughput 4.87637K wps
Begin Testing...
[Epoch 104] train avg loss 0.00552535, dev acc 0.8053, dev avg loss 0.412178, throughput 4.94639K wps
[Epoch 105 Batch 30/62] avg loss 0.00536746, throughput 4.97018K wps
[Epoch 105 Batch 60/62] avg loss 0.00533367, throughput 4.86445K wps
Begin Testing...
[Epoch 105] train avg loss 0.00537782, dev acc 0.8083, dev avg loss 0.407198, throughput 4.92203K wps
[Epoch 106 Batch 30/62] avg loss 0.00519134, throughput 4.98921K wps
[Epoch 106 Batch 60/62] avg loss 0.00548282, throughput 4.88688K wps
Begin Testing...
[Epoch 106] train avg loss 0.0053945, dev acc 0.8171, dev avg loss 0.406946, throughput 4.945K wps
[Epoch 107 Batch 30/62] avg loss 0.00530005, throughput 5.00297K wps
[Epoch 107 Batch 60/62] avg loss 0.00520748, throughput 4.87061K wps
Begin Testing...
[Epoch 107] train avg loss 0.00525581, dev acc 0.8112, dev avg loss 0.406437, throughput 4.94141K wps
[Epoch 108 Batch 30/62] avg loss 0.00494094, throughput 4.97343K wps
[Epoch 108 Batch 60/62] avg loss 0.00542475, throughput 4.89386K wps
Begin Testing...
[Epoch 108] train avg loss 0.00522475, dev acc 0.8053, dev avg loss 0.405326, throughput 4.94062K wps
[Epoch 109 Batch 30/62] avg loss 0.00516603, throughput 5.00035K wps
[Epoch 109 Batch 60/62] avg loss 0.0049407, throughput 4.867K wps
Begin Testing...
[Epoch 109] train avg loss 0.00512706, dev acc 0.8142, dev avg loss 0.404264, throughput 4.94128K wps
[Epoch 110 Batch 30/62] avg loss 0.0047494, throughput 4.94841K wps
[Epoch 110 Batch 60/62] avg loss 0.00531377, throughput 4.85173K wps
Begin Testing...
[Epoch 110] train avg loss 0.00505187, dev acc 0.8171, dev avg loss 0.405118, throughput 4.90669K wps
[Epoch 111 Batch 30/62] avg loss 0.00508878, throughput 4.96976K wps
[Epoch 111 Batch 60/62] avg loss 0.00488501, throughput 4.85871K wps
Begin Testing...
[Epoch 111] train avg loss 0.00500374, dev acc 0.8171, dev avg loss 0.404327, throughput 4.92047K wps
[Epoch 112 Batch 30/62] avg loss 0.00473991, throughput 4.97082K wps
[Epoch 112 Batch 60/62] avg loss 0.00500262, throughput 4.86461K wps
Begin Testing...
[Epoch 112] train avg loss 0.00492248, dev acc 0.8112, dev avg loss 0.403156, throughput 4.92501K wps
[Epoch 113 Batch 30/62] avg loss 0.00491104, throughput 5.01377K wps
[Epoch 113 Batch 60/62] avg loss 0.00480439, throughput 4.88598K wps
Begin Testing...
[Epoch 113] train avg loss 0.00491282, dev acc 0.8112, dev avg loss 0.402838, throughput 4.95602K wps
[Epoch 114 Batch 30/62] avg loss 0.00461223, throughput 4.99989K wps
[Epoch 114 Batch 60/62] avg loss 0.00483936, throughput 4.89975K wps
Begin Testing...
[Epoch 114] train avg loss 0.00477465, dev acc 0.8142, dev avg loss 0.401661, throughput 4.95454K wps
[Epoch 115 Batch 30/62] avg loss 0.00462075, throughput 4.98758K wps
[Epoch 115 Batch 60/62] avg loss 0.00464503, throughput 4.87568K wps
Begin Testing...
[Epoch 115] train avg loss 0.00468663, dev acc 0.8201, dev avg loss 0.401712, throughput 4.93692K wps
Observed Improvement.
Begin Testing...
[Epoch 116 Batch 30/62] avg loss 0.00459241, throughput 4.98472K wps
[Epoch 116 Batch 60/62] avg loss 0.00451267, throughput 4.89385K wps
Begin Testing...
[Epoch 116] train avg loss 0.00459787, dev acc 0.8053, dev avg loss 0.403249, throughput 4.94519K wps
[Epoch 117 Batch 30/62] avg loss 0.00458384, throughput 4.9751K wps
[Epoch 117 Batch 60/62] avg loss 0.00442056, throughput 4.90057K wps
Begin Testing...
[Epoch 117] train avg loss 0.00457793, dev acc 0.8142, dev avg loss 0.40214, throughput 4.94519K wps
[Epoch 118 Batch 30/62] avg loss 0.00457084, throughput 4.99625K wps
[Epoch 118 Batch 60/62] avg loss 0.00451157, throughput 4.8913K wps
Begin Testing...
[Epoch 118] train avg loss 0.00457918, dev acc 0.8112, dev avg loss 0.400409, throughput 4.95005K wps
[Epoch 119 Batch 30/62] avg loss 0.00440054, throughput 4.9836K wps
[Epoch 119 Batch 60/62] avg loss 0.0045432, throughput 4.90104K wps
Begin Testing...
[Epoch 119] train avg loss 0.00457796, dev acc 0.8112, dev avg loss 0.400831, throughput 4.94894K wps
[Epoch 120 Batch 30/62] avg loss 0.00416409, throughput 5.00896K wps
[Epoch 120 Batch 60/62] avg loss 0.00459471, throughput 4.90092K wps
Begin Testing...
[Epoch 120] train avg loss 0.00446139, dev acc 0.8171, dev avg loss 0.400506, throughput 4.96141K wps
[Epoch 121 Batch 30/62] avg loss 0.00408494, throughput 4.98512K wps
[Epoch 121 Batch 60/62] avg loss 0.00438568, throughput 4.84929K wps
Begin Testing...
[Epoch 121] train avg loss 0.00429511, dev acc 0.8083, dev avg loss 0.400093, throughput 4.92469K wps
[Epoch 122 Batch 30/62] avg loss 0.00398997, throughput 4.99707K wps
[Epoch 122 Batch 60/62] avg loss 0.00441619, throughput 4.84384K wps
Begin Testing...
[Epoch 122] train avg loss 0.00423967, dev acc 0.8142, dev avg loss 0.398521, throughput 4.92609K wps
[Epoch 123 Batch 30/62] avg loss 0.00412872, throughput 5.01673K wps
[Epoch 123 Batch 60/62] avg loss 0.00423701, throughput 4.87369K wps
Begin Testing...
[Epoch 123] train avg loss 0.00421139, dev acc 0.8201, dev avg loss 0.398261, throughput 4.95115K wps
Observed Improvement.
Begin Testing...
[Epoch 124 Batch 30/62] avg loss 0.00406476, throughput 4.9894K wps
[Epoch 124 Batch 60/62] avg loss 0.00402057, throughput 4.87923K wps
Begin Testing...
[Epoch 124] train avg loss 0.00414123, dev acc 0.8083, dev avg loss 0.400721, throughput 4.93957K wps
[Epoch 125 Batch 30/62] avg loss 0.00391953, throughput 5.0128K wps
[Epoch 125 Batch 60/62] avg loss 0.00401701, throughput 4.9014K wps
Begin Testing...
[Epoch 125] train avg loss 0.00405808, dev acc 0.8142, dev avg loss 0.400583, throughput 4.96364K wps
[Epoch 126 Batch 30/62] avg loss 0.00419437, throughput 4.98776K wps
[Epoch 126 Batch 60/62] avg loss 0.00377724, throughput 4.87334K wps
Begin Testing...
[Epoch 126] train avg loss 0.00403367, dev acc 0.8142, dev avg loss 0.403011, throughput 4.93726K wps
[Epoch 127 Batch 30/62] avg loss 0.00386194, throughput 5.01692K wps
[Epoch 127 Batch 60/62] avg loss 0.00386026, throughput 4.89836K wps
Begin Testing...
[Epoch 127] train avg loss 0.00396286, dev acc 0.8053, dev avg loss 0.407035, throughput 4.96302K wps
[Epoch 128 Batch 30/62] avg loss 0.00387231, throughput 5.01926K wps
[Epoch 128 Batch 60/62] avg loss 0.00372534, throughput 4.87219K wps
Begin Testing...
[Epoch 128] train avg loss 0.00388084, dev acc 0.8112, dev avg loss 0.399709, throughput 4.95151K wps
[Epoch 129 Batch 30/62] avg loss 0.00380903, throughput 4.97509K wps
[Epoch 129 Batch 60/62] avg loss 0.00382588, throughput 4.8895K wps
Begin Testing...
[Epoch 129] train avg loss 0.00387062, dev acc 0.8171, dev avg loss 0.399632, throughput 4.93981K wps
[Epoch 130 Batch 30/62] avg loss 0.00381874, throughput 4.97589K wps
[Epoch 130 Batch 60/62] avg loss 0.0035677, throughput 4.86594K wps
Begin Testing...
[Epoch 130] train avg loss 0.00372278, dev acc 0.8171, dev avg loss 0.398883, throughput 4.92576K wps
[Epoch 131 Batch 30/62] avg loss 0.00366663, throughput 4.98631K wps
[Epoch 131 Batch 60/62] avg loss 0.00374895, throughput 4.85119K wps
Begin Testing...
[Epoch 131] train avg loss 0.0037542, dev acc 0.8142, dev avg loss 0.399258, throughput 4.92487K wps
[Epoch 132 Batch 30/62] avg loss 0.00363234, throughput 5.01581K wps
[Epoch 132 Batch 60/62] avg loss 0.00358603, throughput 4.85655K wps
Begin Testing...
[Epoch 132] train avg loss 0.00363987, dev acc 0.8083, dev avg loss 0.39606, throughput 4.94069K wps
[Epoch 133 Batch 30/62] avg loss 0.00348607, throughput 5.01395K wps
[Epoch 133 Batch 60/62] avg loss 0.00352088, throughput 4.89314K wps
Begin Testing...
[Epoch 133] train avg loss 0.00354232, dev acc 0.8053, dev avg loss 0.401704, throughput 4.95922K wps
[Epoch 134 Batch 30/62] avg loss 0.00353344, throughput 4.97484K wps
[Epoch 134 Batch 60/62] avg loss 0.00348346, throughput 4.89198K wps
Begin Testing...
[Epoch 134] train avg loss 0.00354476, dev acc 0.8112, dev avg loss 0.397273, throughput 4.94023K wps
[Epoch 135 Batch 30/62] avg loss 0.00337718, throughput 5.00814K wps
[Epoch 135 Batch 60/62] avg loss 0.00356653, throughput 4.88008K wps
Begin Testing...
[Epoch 135] train avg loss 0.00354593, dev acc 0.8142, dev avg loss 0.397858, throughput 4.94959K wps
[Epoch 136 Batch 30/62] avg loss 0.00335008, throughput 5.00735K wps
[Epoch 136 Batch 60/62] avg loss 0.00341643, throughput 4.8625K wps
Begin Testing...
[Epoch 136] train avg loss 0.00341351, dev acc 0.8083, dev avg loss 0.397166, throughput 4.94007K wps
[Epoch 137 Batch 30/62] avg loss 0.00336967, throughput 4.96694K wps
[Epoch 137 Batch 60/62] avg loss 0.00341309, throughput 4.89152K wps
Begin Testing...
[Epoch 137] train avg loss 0.00351219, dev acc 0.8112, dev avg loss 0.395637, throughput 4.93631K wps
[Epoch 138 Batch 30/62] avg loss 0.00314585, throughput 4.99443K wps
[Epoch 138 Batch 60/62] avg loss 0.00348973, throughput 4.88603K wps
Begin Testing...
[Epoch 138] train avg loss 0.0033734, dev acc 0.8142, dev avg loss 0.39633, throughput 4.94665K wps
[Epoch 139 Batch 30/62] avg loss 0.0033408, throughput 4.99143K wps
[Epoch 139 Batch 60/62] avg loss 0.00323387, throughput 4.87067K wps
Begin Testing...
[Epoch 139] train avg loss 0.00327344, dev acc 0.8201, dev avg loss 0.397462, throughput 4.93874K wps
Observed Improvement.
Begin Testing...
[Epoch 140 Batch 30/62] avg loss 0.00328611, throughput 4.96074K wps
[Epoch 140 Batch 60/62] avg loss 0.00318417, throughput 4.87477K wps
Begin Testing...
[Epoch 140] train avg loss 0.00329497, dev acc 0.7994, dev avg loss 0.41176, throughput 4.92497K wps
[Epoch 141 Batch 30/62] avg loss 0.00313226, throughput 4.96566K wps
[Epoch 141 Batch 60/62] avg loss 0.0031407, throughput 4.8737K wps
Begin Testing...
[Epoch 141] train avg loss 0.00321423, dev acc 0.8142, dev avg loss 0.398168, throughput 4.92528K wps
[Epoch 142 Batch 30/62] avg loss 0.00313292, throughput 4.97965K wps
[Epoch 142 Batch 60/62] avg loss 0.00310089, throughput 4.87832K wps
Begin Testing...
[Epoch 142] train avg loss 0.00317283, dev acc 0.8171, dev avg loss 0.397213, throughput 4.93495K wps
[Epoch 143 Batch 30/62] avg loss 0.00299545, throughput 4.96569K wps
[Epoch 143 Batch 60/62] avg loss 0.0030076, throughput 4.86855K wps
Begin Testing...
[Epoch 143] train avg loss 0.00303072, dev acc 0.8230, dev avg loss 0.399896, throughput 4.92523K wps
Observed Improvement.
Begin Testing...
[Epoch 144 Batch 30/62] avg loss 0.00285563, throughput 4.98668K wps
[Epoch 144 Batch 60/62] avg loss 0.00304991, throughput 4.8585K wps
Begin Testing...
[Epoch 144] train avg loss 0.00301927, dev acc 0.8083, dev avg loss 0.397333, throughput 4.92796K wps
[Epoch 145 Batch 30/62] avg loss 0.00290147, throughput 5.0091K wps
[Epoch 145 Batch 60/62] avg loss 0.00305334, throughput 4.89367K wps
Begin Testing...
[Epoch 145] train avg loss 0.00298586, dev acc 0.8142, dev avg loss 0.398464, throughput 4.95798K wps
[Epoch 146 Batch 30/62] avg loss 0.00295165, throughput 5.01127K wps
[Epoch 146 Batch 60/62] avg loss 0.00299386, throughput 4.90124K wps
Begin Testing...
[Epoch 146] train avg loss 0.00297956, dev acc 0.8112, dev avg loss 0.398124, throughput 4.96292K wps
[Epoch 147 Batch 30/62] avg loss 0.00275932, throughput 5.00549K wps
[Epoch 147 Batch 60/62] avg loss 0.00305894, throughput 4.86725K wps
Begin Testing...
[Epoch 147] train avg loss 0.00293748, dev acc 0.8112, dev avg loss 0.396447, throughput 4.94132K wps
[Epoch 148 Batch 30/62] avg loss 0.00285507, throughput 5.00097K wps
[Epoch 148 Batch 60/62] avg loss 0.00293618, throughput 4.88035K wps
Begin Testing...
[Epoch 148] train avg loss 0.00291936, dev acc 0.8112, dev avg loss 0.397962, throughput 4.9444K wps
[Epoch 149 Batch 30/62] avg loss 0.00267932, throughput 4.9898K wps
[Epoch 149 Batch 60/62] avg loss 0.00287841, throughput 4.88163K wps
Begin Testing...
[Epoch 149] train avg loss 0.00284804, dev acc 0.8142, dev avg loss 0.397273, throughput 4.94258K wps
[Epoch 150 Batch 30/62] avg loss 0.00286801, throughput 4.97749K wps
[Epoch 150 Batch 60/62] avg loss 0.00270199, throughput 4.90232K wps
Begin Testing...
[Epoch 150] train avg loss 0.00281214, dev acc 0.8142, dev avg loss 0.398139, throughput 4.94655K wps
[Epoch 151 Batch 30/62] avg loss 0.00262114, throughput 5.01095K wps
[Epoch 151 Batch 60/62] avg loss 0.00282819, throughput 4.89239K wps
Begin Testing...
[Epoch 151] train avg loss 0.00273828, dev acc 0.8260, dev avg loss 0.399309, throughput 4.95741K wps
Observed Improvement.
Begin Testing...
[Epoch 152 Batch 30/62] avg loss 0.00281123, throughput 5.02109K wps
[Epoch 152 Batch 60/62] avg loss 0.00252288, throughput 4.891K wps
Begin Testing...
[Epoch 152] train avg loss 0.00267737, dev acc 0.8142, dev avg loss 0.396559, throughput 4.96159K wps
[Epoch 153 Batch 30/62] avg loss 0.0025135, throughput 4.98859K wps
[Epoch 153 Batch 60/62] avg loss 0.00276365, throughput 4.88618K wps
Begin Testing...
[Epoch 153] train avg loss 0.00266042, dev acc 0.8112, dev avg loss 0.396869, throughput 4.94329K wps
[Epoch 154 Batch 30/62] avg loss 0.00255069, throughput 5.00374K wps
[Epoch 154 Batch 60/62] avg loss 0.0026628, throughput 4.88846K wps
Begin Testing...
[Epoch 154] train avg loss 0.00265011, dev acc 0.8112, dev avg loss 0.397253, throughput 4.95198K wps
[Epoch 155 Batch 30/62] avg loss 0.00261979, throughput 4.96448K wps
[Epoch 155 Batch 60/62] avg loss 0.00255332, throughput 4.89308K wps
Begin Testing...
[Epoch 155] train avg loss 0.00261368, dev acc 0.8171, dev avg loss 0.398591, throughput 4.93594K wps
[Epoch 156 Batch 30/62] avg loss 0.00248802, throughput 5.00985K wps
[Epoch 156 Batch 60/62] avg loss 0.0024854, throughput 4.89788K wps
Begin Testing...
[Epoch 156] train avg loss 0.00250149, dev acc 0.8083, dev avg loss 0.398966, throughput 4.96031K wps
[Epoch 157 Batch 30/62] avg loss 0.00239879, throughput 5.00825K wps
[Epoch 157 Batch 60/62] avg loss 0.00260478, throughput 4.85392K wps
Begin Testing...
[Epoch 157] train avg loss 0.00250776, dev acc 0.8112, dev avg loss 0.398804, throughput 4.93383K wps
[Epoch 158 Batch 30/62] avg loss 0.00256124, throughput 4.98854K wps
[Epoch 158 Batch 60/62] avg loss 0.00233099, throughput 4.8911K wps
Begin Testing...
[Epoch 158] train avg loss 0.00248584, dev acc 0.8142, dev avg loss 0.401199, throughput 4.94649K wps
[Epoch 159 Batch 30/62] avg loss 0.00241518, throughput 5.01184K wps
[Epoch 159 Batch 60/62] avg loss 0.00236705, throughput 4.8855K wps
Begin Testing...
[Epoch 159] train avg loss 0.00242948, dev acc 0.8083, dev avg loss 0.400834, throughput 4.95506K wps
[Epoch 160 Batch 30/62] avg loss 0.00244001, throughput 5.0175K wps
[Epoch 160 Batch 60/62] avg loss 0.00244074, throughput 4.87584K wps
Begin Testing...
[Epoch 160] train avg loss 0.00245747, dev acc 0.8201, dev avg loss 0.400028, throughput 4.95215K wps
[Epoch 161 Batch 30/62] avg loss 0.00220409, throughput 4.9666K wps
[Epoch 161 Batch 60/62] avg loss 0.00245011, throughput 4.8804K wps
Begin Testing...
[Epoch 161] train avg loss 0.00234388, dev acc 0.8112, dev avg loss 0.399888, throughput 4.92975K wps
[Epoch 162 Batch 30/62] avg loss 0.00224905, throughput 5.00333K wps
[Epoch 162 Batch 60/62] avg loss 0.0022178, throughput 4.89977K wps
Begin Testing...
[Epoch 162] train avg loss 0.00227079, dev acc 0.8230, dev avg loss 0.401002, throughput 4.95765K wps
[Epoch 163 Batch 30/62] avg loss 0.00226319, throughput 5.00074K wps
[Epoch 163 Batch 60/62] avg loss 0.00236743, throughput 4.8714K wps
Begin Testing...
[Epoch 163] train avg loss 0.00236983, dev acc 0.8201, dev avg loss 0.400885, throughput 4.94277K wps
[Epoch 164 Batch 30/62] avg loss 0.00217552, throughput 4.97549K wps
[Epoch 164 Batch 60/62] avg loss 0.00229377, throughput 4.85907K wps
Begin Testing...
[Epoch 164] train avg loss 0.00223806, dev acc 0.8201, dev avg loss 0.401614, throughput 4.92292K wps
[Epoch 165 Batch 30/62] avg loss 0.00225529, throughput 4.99192K wps
[Epoch 165 Batch 60/62] avg loss 0.0022205, throughput 4.89296K wps
Begin Testing...
[Epoch 165] train avg loss 0.00227506, dev acc 0.8142, dev avg loss 0.401753, throughput 4.94923K wps
[Epoch 166 Batch 30/62] avg loss 0.0022739, throughput 5.00261K wps
[Epoch 166 Batch 60/62] avg loss 0.00221394, throughput 4.8935K wps
Begin Testing...
[Epoch 166] train avg loss 0.00226342, dev acc 0.8171, dev avg loss 0.401246, throughput 4.95579K wps
[Epoch 167 Batch 30/62] avg loss 0.00223499, throughput 4.99525K wps
[Epoch 167 Batch 60/62] avg loss 0.00212573, throughput 4.87736K wps
Begin Testing...
[Epoch 167] train avg loss 0.00218885, dev acc 0.8112, dev avg loss 0.402461, throughput 4.94318K wps
[Epoch 168 Batch 30/62] avg loss 0.00198386, throughput 5.018K wps
[Epoch 168 Batch 60/62] avg loss 0.00214895, throughput 4.8954K wps
Begin Testing...
[Epoch 168] train avg loss 0.00208891, dev acc 0.8142, dev avg loss 0.402146, throughput 4.96307K wps
[Epoch 169 Batch 30/62] avg loss 0.00211389, throughput 4.98016K wps
[Epoch 169 Batch 60/62] avg loss 0.00201934, throughput 4.86099K wps
Begin Testing...
[Epoch 169] train avg loss 0.00208799, dev acc 0.8171, dev avg loss 0.403458, throughput 4.9262K wps
[Epoch 170 Batch 30/62] avg loss 0.0019463, throughput 4.99461K wps
[Epoch 170 Batch 60/62] avg loss 0.00211457, throughput 4.87878K wps
Begin Testing...
[Epoch 170] train avg loss 0.00205436, dev acc 0.8112, dev avg loss 0.403088, throughput 4.94168K wps
[Epoch 171 Batch 30/62] avg loss 0.00204154, throughput 5.00563K wps
[Epoch 171 Batch 60/62] avg loss 0.00205664, throughput 4.89938K wps
Begin Testing...
[Epoch 171] train avg loss 0.00207175, dev acc 0.8142, dev avg loss 0.405375, throughput 4.95736K wps
[Epoch 172 Batch 30/62] avg loss 0.00189364, throughput 4.9756K wps
[Epoch 172 Batch 60/62] avg loss 0.00199724, throughput 4.88742K wps
Begin Testing...
[Epoch 172] train avg loss 0.00197563, dev acc 0.8201, dev avg loss 0.404532, throughput 4.93857K wps
[Epoch 173 Batch 30/62] avg loss 0.00194901, throughput 4.99853K wps
[Epoch 173 Batch 60/62] avg loss 0.00202814, throughput 4.88121K wps
Begin Testing...
[Epoch 173] train avg loss 0.00199878, dev acc 0.8201, dev avg loss 0.406, throughput 4.94642K wps
[Epoch 174 Batch 30/62] avg loss 0.00202867, throughput 4.95865K wps
[Epoch 174 Batch 60/62] avg loss 0.00181437, throughput 4.89634K wps
Begin Testing...
[Epoch 174] train avg loss 0.00195303, dev acc 0.8201, dev avg loss 0.407552, throughput 4.9348K wps
[Epoch 175 Batch 30/62] avg loss 0.00187418, throughput 5.00887K wps
[Epoch 175 Batch 60/62] avg loss 0.00188367, throughput 4.86281K wps
Begin Testing...
[Epoch 175] train avg loss 0.0019022, dev acc 0.8230, dev avg loss 0.406726, throughput 4.94271K wps
[Epoch 176 Batch 30/62] avg loss 0.0018285, throughput 4.99139K wps
[Epoch 176 Batch 60/62] avg loss 0.00187191, throughput 4.87056K wps
Begin Testing...
[Epoch 176] train avg loss 0.00189611, dev acc 0.8201, dev avg loss 0.40845, throughput 4.93922K wps
[Epoch 177 Batch 30/62] avg loss 0.00173818, throughput 5.01301K wps
[Epoch 177 Batch 60/62] avg loss 0.00192563, throughput 4.89864K wps
Begin Testing...
[Epoch 177] train avg loss 0.00183954, dev acc 0.8201, dev avg loss 0.40791, throughput 4.96318K wps
[Epoch 178 Batch 30/62] avg loss 0.00187412, throughput 4.95693K wps
[Epoch 178 Batch 60/62] avg loss 0.00171962, throughput 4.87119K wps
Begin Testing...
[Epoch 178] train avg loss 0.00181021, dev acc 0.8201, dev avg loss 0.410303, throughput 4.92192K wps
[Epoch 179 Batch 30/62] avg loss 0.00179264, throughput 4.99177K wps
[Epoch 179 Batch 60/62] avg loss 0.00176433, throughput 4.85264K wps
Begin Testing...
[Epoch 179] train avg loss 0.00180997, dev acc 0.8053, dev avg loss 0.408519, throughput 4.92724K wps
[Epoch 180 Batch 30/62] avg loss 0.00183738, throughput 4.96708K wps
[Epoch 180 Batch 60/62] avg loss 0.00180889, throughput 4.89119K wps
Begin Testing...
[Epoch 180] train avg loss 0.00184676, dev acc 0.8201, dev avg loss 0.407433, throughput 4.93634K wps
[Epoch 181 Batch 30/62] avg loss 0.00176125, throughput 4.98087K wps
[Epoch 181 Batch 60/62] avg loss 0.00174442, throughput 4.75883K wps
Begin Testing...
[Epoch 181] train avg loss 0.00176237, dev acc 0.8142, dev avg loss 0.408446, throughput 4.87687K wps
[Epoch 182 Batch 30/62] avg loss 0.00173475, throughput 5.00376K wps
[Epoch 182 Batch 60/62] avg loss 0.00170764, throughput 4.88676K wps
Begin Testing...
[Epoch 182] train avg loss 0.00172236, dev acc 0.8171, dev avg loss 0.409293, throughput 4.95161K wps
[Epoch 183 Batch 30/62] avg loss 0.00172587, throughput 4.98969K wps
[Epoch 183 Batch 60/62] avg loss 0.00161945, throughput 4.86387K wps
Begin Testing...
[Epoch 183] train avg loss 0.00168153, dev acc 0.8112, dev avg loss 0.410228, throughput 4.93218K wps
[Epoch 184 Batch 30/62] avg loss 0.00177841, throughput 4.97816K wps
[Epoch 184 Batch 60/62] avg loss 0.00163825, throughput 4.89703K wps
Begin Testing...
[Epoch 184] train avg loss 0.00171127, dev acc 0.8171, dev avg loss 0.408181, throughput 4.94441K wps
[Epoch 185 Batch 30/62] avg loss 0.00165895, throughput 4.98872K wps
[Epoch 185 Batch 60/62] avg loss 0.0016511, throughput 4.89498K wps
Begin Testing...
[Epoch 185] train avg loss 0.0016692, dev acc 0.8201, dev avg loss 0.411168, throughput 4.94736K wps
[Epoch 186 Batch 30/62] avg loss 0.00157103, throughput 5.00143K wps
[Epoch 186 Batch 60/62] avg loss 0.00171769, throughput 4.8969K wps
Begin Testing...
[Epoch 186] train avg loss 0.00167188, dev acc 0.8230, dev avg loss 0.410541, throughput 4.95503K wps
[Epoch 187 Batch 30/62] avg loss 0.00157546, throughput 4.97892K wps
[Epoch 187 Batch 60/62] avg loss 0.00166328, throughput 4.89348K wps
Begin Testing...
[Epoch 187] train avg loss 0.00164436, dev acc 0.8171, dev avg loss 0.410177, throughput 4.94362K wps
[Epoch 188 Batch 30/62] avg loss 0.00166966, throughput 4.96335K wps
[Epoch 188 Batch 60/62] avg loss 0.00154988, throughput 4.88964K wps
Begin Testing...
[Epoch 188] train avg loss 0.00162208, dev acc 0.8260, dev avg loss 0.413799, throughput 4.93448K wps
Observed Improvement.
Begin Testing...
[Epoch 189 Batch 30/62] avg loss 0.00172076, throughput 4.99462K wps
[Epoch 189 Batch 60/62] avg loss 0.00157257, throughput 4.88717K wps
Begin Testing...
[Epoch 189] train avg loss 0.00165139, dev acc 0.8053, dev avg loss 0.410926, throughput 4.94738K wps
[Epoch 190 Batch 30/62] avg loss 0.00167905, throughput 5.0111K wps
[Epoch 190 Batch 60/62] avg loss 0.00149098, throughput 4.87627K wps
Begin Testing...
[Epoch 190] train avg loss 0.00158796, dev acc 0.8083, dev avg loss 0.413613, throughput 4.94966K wps
[Epoch 191 Batch 30/62] avg loss 0.00158393, throughput 4.97772K wps
[Epoch 191 Batch 60/62] avg loss 0.00149837, throughput 4.87654K wps
Begin Testing...
[Epoch 191] train avg loss 0.00155146, dev acc 0.8024, dev avg loss 0.414342, throughput 4.93434K wps
[Epoch 192 Batch 30/62] avg loss 0.00150733, throughput 4.96596K wps
[Epoch 192 Batch 60/62] avg loss 0.00152656, throughput 4.86307K wps
Begin Testing...
[Epoch 192] train avg loss 0.00152695, dev acc 0.8083, dev avg loss 0.413189, throughput 4.92163K wps
[Epoch 193 Batch 30/62] avg loss 0.0013303, throughput 5.01096K wps
[Epoch 193 Batch 60/62] avg loss 0.00157726, throughput 4.89195K wps
Begin Testing...
[Epoch 193] train avg loss 0.00150322, dev acc 0.8083, dev avg loss 0.414943, throughput 4.95872K wps
[Epoch 194 Batch 30/62] avg loss 0.00141741, throughput 4.98136K wps
[Epoch 194 Batch 60/62] avg loss 0.00154265, throughput 4.89512K wps
Begin Testing...
[Epoch 194] train avg loss 0.00149011, dev acc 0.8142, dev avg loss 0.411856, throughput 4.94528K wps
[Epoch 195 Batch 30/62] avg loss 0.00147193, throughput 5.01212K wps
[Epoch 195 Batch 60/62] avg loss 0.00144127, throughput 4.88601K wps
Begin Testing...
[Epoch 195] train avg loss 0.00148359, dev acc 0.8083, dev avg loss 0.419439, throughput 4.95459K wps
[Epoch 196 Batch 30/62] avg loss 0.00147718, throughput 5.01302K wps
[Epoch 196 Batch 60/62] avg loss 0.00149831, throughput 4.88786K wps
Begin Testing...
[Epoch 196] train avg loss 0.00153879, dev acc 0.8083, dev avg loss 0.414712, throughput 4.9562K wps
[Epoch 197 Batch 30/62] avg loss 0.00156929, throughput 5.01631K wps
[Epoch 197 Batch 60/62] avg loss 0.00139293, throughput 4.86968K wps
Begin Testing...
[Epoch 197] train avg loss 0.0014966, dev acc 0.8230, dev avg loss 0.417825, throughput 4.94868K wps
[Epoch 198 Batch 30/62] avg loss 0.00143492, throughput 4.98741K wps
[Epoch 198 Batch 60/62] avg loss 0.00150613, throughput 4.86907K wps
Begin Testing...
[Epoch 198] train avg loss 0.00148059, dev acc 0.8142, dev avg loss 0.413648, throughput 4.93416K wps
[Epoch 199 Batch 30/62] avg loss 0.00132303, throughput 5.00942K wps
[Epoch 199 Batch 60/62] avg loss 0.00143382, throughput 4.8875K wps
Begin Testing...
[Epoch 199] train avg loss 0.0013847, dev acc 0.8112, dev avg loss 0.414258, throughput 4.95265K wps
Test loss 0.365241, test acc 0.8408
Total time cost 275.22s
[Epoch 0 Batch 30/62] avg loss 0.0133611, throughput 4.77981K wps
[Epoch 0 Batch 60/62] avg loss 0.0130122, throughput 4.89362K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133924, dev acc 0.6578, dev avg loss 0.641837, throughput 4.84637K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0131616, throughput 5.00434K wps
[Epoch 1 Batch 60/62] avg loss 0.0130379, throughput 4.89329K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132696, dev acc 0.6578, dev avg loss 0.641253, throughput 4.95413K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0132605, throughput 5.00522K wps
[Epoch 2 Batch 60/62] avg loss 0.0129118, throughput 4.86744K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132755, dev acc 0.6578, dev avg loss 0.641682, throughput 4.9421K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0131844, throughput 5.00835K wps
[Epoch 3 Batch 60/62] avg loss 0.0129033, throughput 4.89307K wps
Begin Testing...
[Epoch 3] train avg loss 0.0132305, dev acc 0.6578, dev avg loss 0.640665, throughput 4.95737K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0128625, throughput 4.99668K wps
[Epoch 4 Batch 60/62] avg loss 0.0131847, throughput 4.88575K wps
Begin Testing...
[Epoch 4] train avg loss 0.0131797, dev acc 0.6578, dev avg loss 0.640977, throughput 4.94836K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0130246, throughput 5.00083K wps
[Epoch 5 Batch 60/62] avg loss 0.0128076, throughput 4.89991K wps
Begin Testing...
[Epoch 5] train avg loss 0.0131699, dev acc 0.6578, dev avg loss 0.640972, throughput 4.9565K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0128677, throughput 4.99886K wps
[Epoch 6 Batch 60/62] avg loss 0.0130044, throughput 4.89249K wps
Begin Testing...
[Epoch 6] train avg loss 0.0130846, dev acc 0.6578, dev avg loss 0.638309, throughput 4.95291K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0127685, throughput 5.02225K wps
[Epoch 7 Batch 60/62] avg loss 0.0129997, throughput 4.89438K wps
Begin Testing...
[Epoch 7] train avg loss 0.013069, dev acc 0.6578, dev avg loss 0.637633, throughput 4.96292K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0127752, throughput 4.98655K wps
[Epoch 8 Batch 60/62] avg loss 0.0130338, throughput 4.874K wps
Begin Testing...
[Epoch 8] train avg loss 0.0130558, dev acc 0.6578, dev avg loss 0.635758, throughput 4.93671K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0128249, throughput 5.00631K wps
[Epoch 9 Batch 60/62] avg loss 0.0128662, throughput 4.89267K wps
Begin Testing...
[Epoch 9] train avg loss 0.0130107, dev acc 0.6578, dev avg loss 0.635692, throughput 4.95379K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0128023, throughput 5.01875K wps
[Epoch 10 Batch 60/62] avg loss 0.0128124, throughput 4.87967K wps
Begin Testing...
[Epoch 10] train avg loss 0.0129668, dev acc 0.6578, dev avg loss 0.633814, throughput 4.95548K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.01282, throughput 4.99366K wps
[Epoch 11 Batch 60/62] avg loss 0.0127249, throughput 4.89153K wps
Begin Testing...
[Epoch 11] train avg loss 0.0129478, dev acc 0.6578, dev avg loss 0.634126, throughput 4.94955K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0127942, throughput 5.00752K wps
[Epoch 12 Batch 60/62] avg loss 0.0127586, throughput 4.89212K wps
Begin Testing...
[Epoch 12] train avg loss 0.0129529, dev acc 0.6578, dev avg loss 0.632312, throughput 4.95548K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0127005, throughput 5.02455K wps
[Epoch 13 Batch 60/62] avg loss 0.0126707, throughput 4.89721K wps
Begin Testing...
[Epoch 13] train avg loss 0.012878, dev acc 0.6578, dev avg loss 0.631307, throughput 4.96694K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0126141, throughput 5.00566K wps
[Epoch 14 Batch 60/62] avg loss 0.0127392, throughput 4.86698K wps
Begin Testing...
[Epoch 14] train avg loss 0.0128304, dev acc 0.6578, dev avg loss 0.629437, throughput 4.94222K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0125789, throughput 5.01044K wps
[Epoch 15 Batch 60/62] avg loss 0.0127103, throughput 4.91138K wps
Begin Testing...
[Epoch 15] train avg loss 0.0128312, dev acc 0.6578, dev avg loss 0.62905, throughput 4.96675K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0124923, throughput 4.99048K wps
[Epoch 16 Batch 60/62] avg loss 0.0127458, throughput 4.8993K wps
Begin Testing...
[Epoch 16] train avg loss 0.0127675, dev acc 0.6578, dev avg loss 0.626766, throughput 4.95228K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0126583, throughput 4.96033K wps
[Epoch 17 Batch 60/62] avg loss 0.0124968, throughput 4.86348K wps
Begin Testing...
[Epoch 17] train avg loss 0.0127409, dev acc 0.6578, dev avg loss 0.625005, throughput 4.91919K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0125261, throughput 5.00501K wps
[Epoch 18 Batch 60/62] avg loss 0.0124647, throughput 4.89679K wps
Begin Testing...
[Epoch 18] train avg loss 0.0126953, dev acc 0.6578, dev avg loss 0.625307, throughput 4.95654K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0125063, throughput 5.01566K wps
[Epoch 19 Batch 60/62] avg loss 0.0124286, throughput 4.90617K wps
Begin Testing...
[Epoch 19] train avg loss 0.0126258, dev acc 0.6578, dev avg loss 0.621636, throughput 4.96678K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0125415, throughput 4.98452K wps
[Epoch 20 Batch 60/62] avg loss 0.0123593, throughput 4.87225K wps
Begin Testing...
[Epoch 20] train avg loss 0.0125823, dev acc 0.6578, dev avg loss 0.619692, throughput 4.93372K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0123318, throughput 4.98516K wps
[Epoch 21 Batch 60/62] avg loss 0.0123703, throughput 4.86657K wps
Begin Testing...
[Epoch 21] train avg loss 0.0124753, dev acc 0.6578, dev avg loss 0.61745, throughput 4.93242K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0122998, throughput 5.01292K wps
[Epoch 22 Batch 60/62] avg loss 0.0123009, throughput 4.86488K wps
Begin Testing...
[Epoch 22] train avg loss 0.0124584, dev acc 0.6578, dev avg loss 0.615744, throughput 4.94485K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.0123342, throughput 4.97616K wps
[Epoch 23 Batch 60/62] avg loss 0.0122022, throughput 4.89314K wps
Begin Testing...
[Epoch 23] train avg loss 0.0123804, dev acc 0.6578, dev avg loss 0.612872, throughput 4.94042K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0122524, throughput 4.97044K wps
[Epoch 24 Batch 60/62] avg loss 0.0120382, throughput 4.87261K wps
Begin Testing...
[Epoch 24] train avg loss 0.012352, dev acc 0.6608, dev avg loss 0.611497, throughput 4.92865K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0121925, throughput 4.99051K wps
[Epoch 25 Batch 60/62] avg loss 0.0119701, throughput 4.89364K wps
Begin Testing...
[Epoch 25] train avg loss 0.0122708, dev acc 0.6696, dev avg loss 0.609258, throughput 4.94867K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.0119737, throughput 5.01132K wps
[Epoch 26 Batch 60/62] avg loss 0.0119688, throughput 4.88109K wps
Begin Testing...
[Epoch 26] train avg loss 0.0121242, dev acc 0.6785, dev avg loss 0.606025, throughput 4.95231K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.0121151, throughput 5.00475K wps
[Epoch 27 Batch 60/62] avg loss 0.0117996, throughput 4.8878K wps
Begin Testing...
[Epoch 27] train avg loss 0.012108, dev acc 0.6814, dev avg loss 0.602686, throughput 4.95294K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.0118608, throughput 5.00814K wps
[Epoch 28 Batch 60/62] avg loss 0.0119236, throughput 4.88663K wps
Begin Testing...
[Epoch 28] train avg loss 0.0120801, dev acc 0.6844, dev avg loss 0.600888, throughput 4.95247K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.0118192, throughput 5.01687K wps
[Epoch 29 Batch 60/62] avg loss 0.0117804, throughput 4.89852K wps
Begin Testing...
[Epoch 29] train avg loss 0.0119268, dev acc 0.6844, dev avg loss 0.597508, throughput 4.96344K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.0117528, throughput 5.01512K wps
[Epoch 30 Batch 60/62] avg loss 0.0117283, throughput 4.86442K wps
Begin Testing...
[Epoch 30] train avg loss 0.0118876, dev acc 0.6814, dev avg loss 0.595428, throughput 4.94486K wps
[Epoch 31 Batch 30/62] avg loss 0.0117121, throughput 5.00403K wps
[Epoch 31 Batch 60/62] avg loss 0.0115714, throughput 4.89138K wps
Begin Testing...
[Epoch 31] train avg loss 0.0117868, dev acc 0.6844, dev avg loss 0.593349, throughput 4.95384K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.0116966, throughput 4.99677K wps
[Epoch 32 Batch 60/62] avg loss 0.0114912, throughput 4.88117K wps
Begin Testing...
[Epoch 32] train avg loss 0.0117011, dev acc 0.6814, dev avg loss 0.589942, throughput 4.94632K wps
[Epoch 33 Batch 30/62] avg loss 0.0114806, throughput 5.01109K wps
[Epoch 33 Batch 60/62] avg loss 0.0113713, throughput 4.88672K wps
Begin Testing...
[Epoch 33] train avg loss 0.0115463, dev acc 0.6873, dev avg loss 0.587401, throughput 4.95502K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.0114888, throughput 5.01944K wps
[Epoch 34 Batch 60/62] avg loss 0.011351, throughput 4.90316K wps
Begin Testing...
[Epoch 34] train avg loss 0.0115352, dev acc 0.6814, dev avg loss 0.585018, throughput 4.96685K wps
[Epoch 35 Batch 30/62] avg loss 0.0114408, throughput 5.00768K wps
[Epoch 35 Batch 60/62] avg loss 0.0113929, throughput 4.87844K wps
Begin Testing...
[Epoch 35] train avg loss 0.0115264, dev acc 0.6755, dev avg loss 0.582714, throughput 4.94955K wps
[Epoch 36 Batch 30/62] avg loss 0.0111469, throughput 5.00866K wps
[Epoch 36 Batch 60/62] avg loss 0.0113154, throughput 4.89397K wps
Begin Testing...
[Epoch 36] train avg loss 0.0113919, dev acc 0.6785, dev avg loss 0.580543, throughput 4.94931K wps
[Epoch 37 Batch 30/62] avg loss 0.0110912, throughput 5.00841K wps
[Epoch 37 Batch 60/62] avg loss 0.0111578, throughput 4.85044K wps
Begin Testing...
[Epoch 37] train avg loss 0.011333, dev acc 0.6932, dev avg loss 0.582425, throughput 4.93469K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.0112139, throughput 4.98224K wps
[Epoch 38 Batch 60/62] avg loss 0.0110955, throughput 4.89594K wps
Begin Testing...
[Epoch 38] train avg loss 0.0112843, dev acc 0.6785, dev avg loss 0.575696, throughput 4.9444K wps
[Epoch 39 Batch 30/62] avg loss 0.0109317, throughput 5.01516K wps
[Epoch 39 Batch 60/62] avg loss 0.0110479, throughput 4.87763K wps
Begin Testing...
[Epoch 39] train avg loss 0.011092, dev acc 0.6814, dev avg loss 0.572953, throughput 4.95233K wps
[Epoch 40 Batch 30/62] avg loss 0.0109944, throughput 5.00407K wps
[Epoch 40 Batch 60/62] avg loss 0.0109665, throughput 4.89611K wps
Begin Testing...
[Epoch 40] train avg loss 0.0110922, dev acc 0.6814, dev avg loss 0.57014, throughput 4.95595K wps
[Epoch 41 Batch 30/62] avg loss 0.0109753, throughput 4.98779K wps
[Epoch 41 Batch 60/62] avg loss 0.0107275, throughput 4.89792K wps
Begin Testing...
[Epoch 41] train avg loss 0.011041, dev acc 0.6814, dev avg loss 0.5675, throughput 4.94932K wps
[Epoch 42 Batch 30/62] avg loss 0.0108771, throughput 4.99682K wps
[Epoch 42 Batch 60/62] avg loss 0.0107638, throughput 4.89811K wps
Begin Testing...
[Epoch 42] train avg loss 0.0109206, dev acc 0.6903, dev avg loss 0.564973, throughput 4.95418K wps
[Epoch 43 Batch 30/62] avg loss 0.0108106, throughput 4.97817K wps
[Epoch 43 Batch 60/62] avg loss 0.010671, throughput 4.86059K wps
Begin Testing...
[Epoch 43] train avg loss 0.0108819, dev acc 0.6962, dev avg loss 0.562238, throughput 4.92708K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/62] avg loss 0.0105256, throughput 5.00136K wps
[Epoch 44 Batch 60/62] avg loss 0.0108431, throughput 4.89368K wps
Begin Testing...
[Epoch 44] train avg loss 0.0107369, dev acc 0.6814, dev avg loss 0.559485, throughput 4.95326K wps
[Epoch 45 Batch 30/62] avg loss 0.0106783, throughput 5.0074K wps
[Epoch 45 Batch 60/62] avg loss 0.0105513, throughput 4.86985K wps
Begin Testing...
[Epoch 45] train avg loss 0.010701, dev acc 0.7139, dev avg loss 0.557041, throughput 4.94441K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.0103263, throughput 5.0114K wps
[Epoch 46 Batch 60/62] avg loss 0.0105096, throughput 4.85462K wps
Begin Testing...
[Epoch 46] train avg loss 0.0105315, dev acc 0.7139, dev avg loss 0.553381, throughput 4.93836K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.0104543, throughput 4.97726K wps
[Epoch 47 Batch 60/62] avg loss 0.0102651, throughput 4.88626K wps
Begin Testing...
[Epoch 47] train avg loss 0.0104324, dev acc 0.7080, dev avg loss 0.550481, throughput 4.93877K wps
[Epoch 48 Batch 30/62] avg loss 0.01057, throughput 5.00085K wps
[Epoch 48 Batch 60/62] avg loss 0.00998717, throughput 4.90179K wps
Begin Testing...
[Epoch 48] train avg loss 0.0103398, dev acc 0.7109, dev avg loss 0.547523, throughput 4.95802K wps
[Epoch 49 Batch 30/62] avg loss 0.0101936, throughput 5.01438K wps
[Epoch 49 Batch 60/62] avg loss 0.0101213, throughput 4.87742K wps
Begin Testing...
[Epoch 49] train avg loss 0.0102843, dev acc 0.7316, dev avg loss 0.546009, throughput 4.95185K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.0103237, throughput 5.01231K wps
[Epoch 50 Batch 60/62] avg loss 0.00990112, throughput 4.90697K wps
Begin Testing...
[Epoch 50] train avg loss 0.0102605, dev acc 0.7316, dev avg loss 0.542347, throughput 4.96582K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/62] avg loss 0.00999401, throughput 5.0096K wps
[Epoch 51 Batch 60/62] avg loss 0.00995486, throughput 4.88161K wps
Begin Testing...
[Epoch 51] train avg loss 0.0100703, dev acc 0.7109, dev avg loss 0.538107, throughput 4.94906K wps
[Epoch 52 Batch 30/62] avg loss 0.00984665, throughput 5.00202K wps
[Epoch 52 Batch 60/62] avg loss 0.00994802, throughput 4.8935K wps
Begin Testing...
[Epoch 52] train avg loss 0.0100216, dev acc 0.7522, dev avg loss 0.537075, throughput 4.95474K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.0099796, throughput 5.00978K wps
[Epoch 53 Batch 60/62] avg loss 0.0096686, throughput 4.88133K wps
Begin Testing...
[Epoch 53] train avg loss 0.00992491, dev acc 0.7227, dev avg loss 0.531903, throughput 4.95036K wps
[Epoch 54 Batch 30/62] avg loss 0.00963323, throughput 5.01347K wps
[Epoch 54 Batch 60/62] avg loss 0.0098227, throughput 4.89403K wps
Begin Testing...
[Epoch 54] train avg loss 0.00984204, dev acc 0.7375, dev avg loss 0.52867, throughput 4.95931K wps
[Epoch 55 Batch 30/62] avg loss 0.00965027, throughput 5.00912K wps
[Epoch 55 Batch 60/62] avg loss 0.00949441, throughput 4.87521K wps
Begin Testing...
[Epoch 55] train avg loss 0.00969738, dev acc 0.7552, dev avg loss 0.526168, throughput 4.94914K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00942513, throughput 5.01342K wps
[Epoch 56 Batch 60/62] avg loss 0.00963383, throughput 4.89736K wps
Begin Testing...
[Epoch 56] train avg loss 0.00967909, dev acc 0.7522, dev avg loss 0.522381, throughput 4.96212K wps
[Epoch 57 Batch 30/62] avg loss 0.00936905, throughput 4.9964K wps
[Epoch 57 Batch 60/62] avg loss 0.00937751, throughput 4.87328K wps
Begin Testing...
[Epoch 57] train avg loss 0.00945086, dev acc 0.7552, dev avg loss 0.519273, throughput 4.94011K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/62] avg loss 0.00943446, throughput 4.98545K wps
[Epoch 58 Batch 60/62] avg loss 0.00918732, throughput 4.87993K wps
Begin Testing...
[Epoch 58] train avg loss 0.00944332, dev acc 0.7670, dev avg loss 0.516366, throughput 4.93896K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00927891, throughput 5.01159K wps
[Epoch 59 Batch 60/62] avg loss 0.00902793, throughput 4.89112K wps
Begin Testing...
[Epoch 59] train avg loss 0.00922799, dev acc 0.7611, dev avg loss 0.513303, throughput 4.95746K wps
[Epoch 60 Batch 30/62] avg loss 0.00925144, throughput 4.99254K wps
[Epoch 60 Batch 60/62] avg loss 0.00909732, throughput 4.8798K wps
Begin Testing...
[Epoch 60] train avg loss 0.00930429, dev acc 0.7699, dev avg loss 0.511216, throughput 4.94189K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00915313, throughput 4.99453K wps
[Epoch 61 Batch 60/62] avg loss 0.00900497, throughput 4.89318K wps
Begin Testing...
[Epoch 61] train avg loss 0.00914236, dev acc 0.7699, dev avg loss 0.507794, throughput 4.95001K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00894382, throughput 4.97176K wps
[Epoch 62 Batch 60/62] avg loss 0.00895704, throughput 4.86692K wps
Begin Testing...
[Epoch 62] train avg loss 0.00907425, dev acc 0.7729, dev avg loss 0.505124, throughput 4.92691K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/62] avg loss 0.00873092, throughput 4.99871K wps
[Epoch 63 Batch 60/62] avg loss 0.00900003, throughput 4.89404K wps
Begin Testing...
[Epoch 63] train avg loss 0.00893784, dev acc 0.7729, dev avg loss 0.502519, throughput 4.9523K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/62] avg loss 0.00858921, throughput 5.00744K wps
[Epoch 64 Batch 60/62] avg loss 0.00901062, throughput 4.89373K wps
Begin Testing...
[Epoch 64] train avg loss 0.00889335, dev acc 0.7758, dev avg loss 0.499679, throughput 4.95644K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/62] avg loss 0.00864595, throughput 5.00364K wps
[Epoch 65 Batch 60/62] avg loss 0.00881093, throughput 4.87648K wps
Begin Testing...
[Epoch 65] train avg loss 0.00882342, dev acc 0.7581, dev avg loss 0.498159, throughput 4.94556K wps
[Epoch 66 Batch 30/62] avg loss 0.0087033, throughput 4.98893K wps
[Epoch 66 Batch 60/62] avg loss 0.00839975, throughput 4.87753K wps
Begin Testing...
[Epoch 66] train avg loss 0.00871374, dev acc 0.7699, dev avg loss 0.493878, throughput 4.94107K wps
[Epoch 67 Batch 30/62] avg loss 0.00860407, throughput 4.99051K wps
[Epoch 67 Batch 60/62] avg loss 0.00857797, throughput 4.86142K wps
Begin Testing...
[Epoch 67] train avg loss 0.00866699, dev acc 0.7817, dev avg loss 0.491506, throughput 4.93191K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/62] avg loss 0.00850154, throughput 4.98526K wps
[Epoch 68 Batch 60/62] avg loss 0.00814877, throughput 4.90477K wps
Begin Testing...
[Epoch 68] train avg loss 0.0084705, dev acc 0.7758, dev avg loss 0.4904, throughput 4.9521K wps
[Epoch 69 Batch 30/62] avg loss 0.00846194, throughput 5.02059K wps
[Epoch 69 Batch 60/62] avg loss 0.00820782, throughput 4.88732K wps
Begin Testing...
[Epoch 69] train avg loss 0.00851975, dev acc 0.7640, dev avg loss 0.486266, throughput 4.95829K wps
[Epoch 70 Batch 30/62] avg loss 0.00815533, throughput 4.9949K wps
[Epoch 70 Batch 60/62] avg loss 0.00832884, throughput 4.8837K wps
Begin Testing...
[Epoch 70] train avg loss 0.00833658, dev acc 0.7817, dev avg loss 0.484654, throughput 4.94563K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00824766, throughput 4.99201K wps
[Epoch 71 Batch 60/62] avg loss 0.0081487, throughput 4.8829K wps
Begin Testing...
[Epoch 71] train avg loss 0.008323, dev acc 0.7699, dev avg loss 0.481321, throughput 4.94378K wps
[Epoch 72 Batch 30/62] avg loss 0.00807546, throughput 5.0068K wps
[Epoch 72 Batch 60/62] avg loss 0.00809303, throughput 4.87459K wps
Begin Testing...
[Epoch 72] train avg loss 0.00816853, dev acc 0.7640, dev avg loss 0.480218, throughput 4.94745K wps
[Epoch 73 Batch 30/62] avg loss 0.00792697, throughput 5.00221K wps
[Epoch 73 Batch 60/62] avg loss 0.0080061, throughput 4.90023K wps
Begin Testing...
[Epoch 73] train avg loss 0.00814069, dev acc 0.7581, dev avg loss 0.477759, throughput 4.95773K wps
[Epoch 74 Batch 30/62] avg loss 0.00797069, throughput 5.02416K wps
[Epoch 74 Batch 60/62] avg loss 0.00783641, throughput 4.90191K wps
Begin Testing...
[Epoch 74] train avg loss 0.0080165, dev acc 0.7758, dev avg loss 0.474281, throughput 4.96925K wps
[Epoch 75 Batch 30/62] avg loss 0.00778918, throughput 4.98832K wps
[Epoch 75 Batch 60/62] avg loss 0.00772562, throughput 4.89779K wps
Begin Testing...
[Epoch 75] train avg loss 0.00780909, dev acc 0.7611, dev avg loss 0.473068, throughput 4.9489K wps
[Epoch 76 Batch 30/62] avg loss 0.00791502, throughput 5.00117K wps
[Epoch 76 Batch 60/62] avg loss 0.00751799, throughput 4.88864K wps
Begin Testing...
[Epoch 76] train avg loss 0.0078345, dev acc 0.7699, dev avg loss 0.471156, throughput 4.95039K wps
[Epoch 77 Batch 30/62] avg loss 0.0073071, throughput 4.98346K wps
[Epoch 77 Batch 60/62] avg loss 0.00788781, throughput 4.89572K wps
Begin Testing...
[Epoch 77] train avg loss 0.00769625, dev acc 0.7729, dev avg loss 0.469829, throughput 4.94628K wps
[Epoch 78 Batch 30/62] avg loss 0.00746391, throughput 4.99463K wps
[Epoch 78 Batch 60/62] avg loss 0.00757715, throughput 4.88737K wps
Begin Testing...
[Epoch 78] train avg loss 0.007599, dev acc 0.7670, dev avg loss 0.464875, throughput 4.94706K wps
[Epoch 79 Batch 30/62] avg loss 0.00755942, throughput 5.02291K wps
[Epoch 79 Batch 60/62] avg loss 0.00724247, throughput 4.8761K wps
Begin Testing...
[Epoch 79] train avg loss 0.00761484, dev acc 0.7729, dev avg loss 0.463245, throughput 4.9538K wps
[Epoch 80 Batch 30/62] avg loss 0.00733277, throughput 4.9897K wps
[Epoch 80 Batch 60/62] avg loss 0.00733443, throughput 4.88367K wps
Begin Testing...
[Epoch 80] train avg loss 0.00742085, dev acc 0.7788, dev avg loss 0.460312, throughput 4.94254K wps
[Epoch 81 Batch 30/62] avg loss 0.00709966, throughput 4.99738K wps
[Epoch 81 Batch 60/62] avg loss 0.0072864, throughput 4.87352K wps
Begin Testing...
[Epoch 81] train avg loss 0.00732653, dev acc 0.7729, dev avg loss 0.458013, throughput 4.94245K wps
[Epoch 82 Batch 30/62] avg loss 0.00741705, throughput 5.00481K wps
[Epoch 82 Batch 60/62] avg loss 0.00709068, throughput 4.88112K wps
Begin Testing...
[Epoch 82] train avg loss 0.00737267, dev acc 0.7876, dev avg loss 0.455916, throughput 4.94964K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/62] avg loss 0.00698419, throughput 4.99934K wps
[Epoch 83 Batch 60/62] avg loss 0.00716818, throughput 4.8942K wps
Begin Testing...
[Epoch 83] train avg loss 0.00712892, dev acc 0.7847, dev avg loss 0.453948, throughput 4.95283K wps
[Epoch 84 Batch 30/62] avg loss 0.00702155, throughput 5.01192K wps
[Epoch 84 Batch 60/62] avg loss 0.00698184, throughput 4.89367K wps
Begin Testing...
[Epoch 84] train avg loss 0.0070216, dev acc 0.7906, dev avg loss 0.451836, throughput 4.95842K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/62] avg loss 0.00713321, throughput 5.00985K wps
[Epoch 85 Batch 60/62] avg loss 0.00674195, throughput 4.88891K wps
Begin Testing...
[Epoch 85] train avg loss 0.00697865, dev acc 0.7847, dev avg loss 0.448568, throughput 4.95468K wps
[Epoch 86 Batch 30/62] avg loss 0.00678239, throughput 4.9755K wps
[Epoch 86 Batch 60/62] avg loss 0.00675844, throughput 4.89241K wps
Begin Testing...
[Epoch 86] train avg loss 0.00693283, dev acc 0.7729, dev avg loss 0.448211, throughput 4.93901K wps
[Epoch 87 Batch 30/62] avg loss 0.00665185, throughput 5.00553K wps
[Epoch 87 Batch 60/62] avg loss 0.00663356, throughput 4.9017K wps
Begin Testing...
[Epoch 87] train avg loss 0.00673533, dev acc 0.7876, dev avg loss 0.44687, throughput 4.95994K wps
[Epoch 88 Batch 30/62] avg loss 0.00663566, throughput 5.00615K wps
[Epoch 88 Batch 60/62] avg loss 0.00662716, throughput 4.89407K wps
Begin Testing...
[Epoch 88] train avg loss 0.00671225, dev acc 0.7788, dev avg loss 0.442358, throughput 4.9569K wps
[Epoch 89 Batch 30/62] avg loss 0.00672472, throughput 5.00725K wps
[Epoch 89 Batch 60/62] avg loss 0.00629986, throughput 4.89989K wps
Begin Testing...
[Epoch 89] train avg loss 0.00661013, dev acc 0.7817, dev avg loss 0.440254, throughput 4.95963K wps
[Epoch 90 Batch 30/62] avg loss 0.00661717, throughput 4.99299K wps
[Epoch 90 Batch 60/62] avg loss 0.00643411, throughput 4.88722K wps
Begin Testing...
[Epoch 90] train avg loss 0.00661823, dev acc 0.7906, dev avg loss 0.440044, throughput 4.94685K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/62] avg loss 0.00643801, throughput 4.99511K wps
[Epoch 91 Batch 60/62] avg loss 0.00610881, throughput 4.88964K wps
Begin Testing...
[Epoch 91] train avg loss 0.00634661, dev acc 0.7935, dev avg loss 0.436211, throughput 4.94786K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/62] avg loss 0.00628645, throughput 5.00239K wps
[Epoch 92 Batch 60/62] avg loss 0.00632543, throughput 4.8747K wps
Begin Testing...
[Epoch 92] train avg loss 0.0063409, dev acc 0.7876, dev avg loss 0.435986, throughput 4.94324K wps
[Epoch 93 Batch 30/62] avg loss 0.00616583, throughput 4.97986K wps
[Epoch 93 Batch 60/62] avg loss 0.00638364, throughput 4.89653K wps
Begin Testing...
[Epoch 93] train avg loss 0.00631909, dev acc 0.7876, dev avg loss 0.432146, throughput 4.94631K wps
[Epoch 94 Batch 30/62] avg loss 0.0060057, throughput 5.0217K wps
[Epoch 94 Batch 60/62] avg loss 0.00609241, throughput 4.8738K wps
Begin Testing...
[Epoch 94] train avg loss 0.00609663, dev acc 0.7906, dev avg loss 0.43221, throughput 4.9524K wps
[Epoch 95 Batch 30/62] avg loss 0.00584864, throughput 5.00282K wps
[Epoch 95 Batch 60/62] avg loss 0.00622352, throughput 4.89944K wps
Begin Testing...
[Epoch 95] train avg loss 0.00605885, dev acc 0.7935, dev avg loss 0.428555, throughput 4.95777K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/62] avg loss 0.00592738, throughput 5.00104K wps
[Epoch 96 Batch 60/62] avg loss 0.00593166, throughput 4.88365K wps
Begin Testing...
[Epoch 96] train avg loss 0.00606255, dev acc 0.7994, dev avg loss 0.427503, throughput 4.94858K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/62] avg loss 0.00585452, throughput 5.01067K wps
[Epoch 97 Batch 60/62] avg loss 0.00606258, throughput 4.89022K wps
Begin Testing...
[Epoch 97] train avg loss 0.0060665, dev acc 0.7965, dev avg loss 0.425871, throughput 4.95789K wps
[Epoch 98 Batch 30/62] avg loss 0.00568828, throughput 4.99892K wps
[Epoch 98 Batch 60/62] avg loss 0.00575835, throughput 4.90002K wps
Begin Testing...
[Epoch 98] train avg loss 0.00578371, dev acc 0.7965, dev avg loss 0.423694, throughput 4.95629K wps
[Epoch 99 Batch 30/62] avg loss 0.0056949, throughput 4.9679K wps
[Epoch 99 Batch 60/62] avg loss 0.00558937, throughput 4.90003K wps
Begin Testing...
[Epoch 99] train avg loss 0.00571232, dev acc 0.7965, dev avg loss 0.423306, throughput 4.94101K wps
[Epoch 100 Batch 30/62] avg loss 0.00586708, throughput 5.00961K wps
[Epoch 100 Batch 60/62] avg loss 0.00545464, throughput 4.89386K wps
Begin Testing...
[Epoch 100] train avg loss 0.00571631, dev acc 0.7994, dev avg loss 0.420462, throughput 4.95732K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00561734, throughput 4.99075K wps
[Epoch 101 Batch 60/62] avg loss 0.00532742, throughput 4.90074K wps
Begin Testing...
[Epoch 101] train avg loss 0.00554026, dev acc 0.8083, dev avg loss 0.418015, throughput 4.95084K wps
Observed Improvement.
Begin Testing...
[Epoch 102 Batch 30/62] avg loss 0.00514344, throughput 4.98978K wps
[Epoch 102 Batch 60/62] avg loss 0.00567538, throughput 4.89907K wps
Begin Testing...
[Epoch 102] train avg loss 0.00544715, dev acc 0.7994, dev avg loss 0.417366, throughput 4.95141K wps
[Epoch 103 Batch 30/62] avg loss 0.00555231, throughput 4.99674K wps
[Epoch 103 Batch 60/62] avg loss 0.00508602, throughput 4.86068K wps
Begin Testing...
[Epoch 103] train avg loss 0.00541379, dev acc 0.7994, dev avg loss 0.417561, throughput 4.93438K wps
[Epoch 104 Batch 30/62] avg loss 0.00529039, throughput 5.00387K wps
[Epoch 104 Batch 60/62] avg loss 0.00564614, throughput 4.88322K wps
Begin Testing...
[Epoch 104] train avg loss 0.00549211, dev acc 0.8053, dev avg loss 0.414629, throughput 4.95095K wps
[Epoch 105 Batch 30/62] avg loss 0.00512966, throughput 5.01352K wps
[Epoch 105 Batch 60/62] avg loss 0.00516111, throughput 4.89921K wps
Begin Testing...
[Epoch 105] train avg loss 0.00518604, dev acc 0.8053, dev avg loss 0.416317, throughput 4.96099K wps
[Epoch 106 Batch 30/62] avg loss 0.00516285, throughput 5.01633K wps
[Epoch 106 Batch 60/62] avg loss 0.00525298, throughput 4.89312K wps
Begin Testing...
[Epoch 106] train avg loss 0.00526918, dev acc 0.8083, dev avg loss 0.411805, throughput 4.96152K wps
Observed Improvement.
Begin Testing...
[Epoch 107 Batch 30/62] avg loss 0.00503418, throughput 4.98749K wps
[Epoch 107 Batch 60/62] avg loss 0.00506521, throughput 4.86912K wps
Begin Testing...
[Epoch 107] train avg loss 0.00507507, dev acc 0.8053, dev avg loss 0.410277, throughput 4.93445K wps
[Epoch 108 Batch 30/62] avg loss 0.00504835, throughput 4.97482K wps
[Epoch 108 Batch 60/62] avg loss 0.00501197, throughput 4.89992K wps
Begin Testing...
[Epoch 108] train avg loss 0.00515046, dev acc 0.8053, dev avg loss 0.409299, throughput 4.9445K wps
[Epoch 109 Batch 30/62] avg loss 0.0048541, throughput 4.99345K wps
[Epoch 109 Batch 60/62] avg loss 0.00488713, throughput 4.8788K wps
Begin Testing...
[Epoch 109] train avg loss 0.00493263, dev acc 0.8142, dev avg loss 0.408392, throughput 4.94304K wps
Observed Improvement.
Begin Testing...
[Epoch 110 Batch 30/62] avg loss 0.0047682, throughput 4.98052K wps
[Epoch 110 Batch 60/62] avg loss 0.00515197, throughput 4.87179K wps
Begin Testing...
[Epoch 110] train avg loss 0.00497922, dev acc 0.8083, dev avg loss 0.408228, throughput 4.93347K wps
[Epoch 111 Batch 30/62] avg loss 0.00485431, throughput 4.97844K wps
[Epoch 111 Batch 60/62] avg loss 0.00477778, throughput 4.87227K wps
Begin Testing...
[Epoch 111] train avg loss 0.0048746, dev acc 0.8053, dev avg loss 0.408279, throughput 4.93175K wps
[Epoch 112 Batch 30/62] avg loss 0.00494746, throughput 5.00089K wps
[Epoch 112 Batch 60/62] avg loss 0.00462313, throughput 4.87477K wps
Begin Testing...
[Epoch 112] train avg loss 0.00483375, dev acc 0.8112, dev avg loss 0.405471, throughput 4.94327K wps
[Epoch 113 Batch 30/62] avg loss 0.00437937, throughput 4.97828K wps
[Epoch 113 Batch 60/62] avg loss 0.00503839, throughput 4.84398K wps
Begin Testing...
[Epoch 113] train avg loss 0.00475335, dev acc 0.8083, dev avg loss 0.405984, throughput 4.91815K wps
[Epoch 114 Batch 30/62] avg loss 0.00461633, throughput 4.9423K wps
[Epoch 114 Batch 60/62] avg loss 0.00462598, throughput 4.86557K wps
Begin Testing...
[Epoch 114] train avg loss 0.00463784, dev acc 0.8083, dev avg loss 0.404653, throughput 4.91125K wps
[Epoch 115 Batch 30/62] avg loss 0.00454531, throughput 4.96552K wps
[Epoch 115 Batch 60/62] avg loss 0.00457257, throughput 4.86132K wps
Begin Testing...
[Epoch 115] train avg loss 0.00461155, dev acc 0.8112, dev avg loss 0.400969, throughput 4.91991K wps
[Epoch 116 Batch 30/62] avg loss 0.00460119, throughput 4.98231K wps
[Epoch 116 Batch 60/62] avg loss 0.00446085, throughput 4.84663K wps
Begin Testing...
[Epoch 116] train avg loss 0.00461197, dev acc 0.8112, dev avg loss 0.401439, throughput 4.91972K wps
[Epoch 117 Batch 30/62] avg loss 0.00432471, throughput 4.98274K wps
[Epoch 117 Batch 60/62] avg loss 0.0043631, throughput 4.88198K wps
Begin Testing...
[Epoch 117] train avg loss 0.00438512, dev acc 0.8112, dev avg loss 0.400184, throughput 4.93942K wps
[Epoch 118 Batch 30/62] avg loss 0.00410015, throughput 5.00597K wps
[Epoch 118 Batch 60/62] avg loss 0.00432029, throughput 4.88441K wps
Begin Testing...
[Epoch 118] train avg loss 0.00426066, dev acc 0.8112, dev avg loss 0.398243, throughput 4.95141K wps
[Epoch 119 Batch 30/62] avg loss 0.00423387, throughput 5.01413K wps
[Epoch 119 Batch 60/62] avg loss 0.00448238, throughput 4.89397K wps
Begin Testing...
[Epoch 119] train avg loss 0.00443853, dev acc 0.8230, dev avg loss 0.398343, throughput 4.95942K wps
Observed Improvement.
Begin Testing...
[Epoch 120 Batch 30/62] avg loss 0.00412305, throughput 4.99838K wps
[Epoch 120 Batch 60/62] avg loss 0.0042248, throughput 4.87717K wps
Begin Testing...
[Epoch 120] train avg loss 0.0041814, dev acc 0.8201, dev avg loss 0.397306, throughput 4.94321K wps
[Epoch 121 Batch 30/62] avg loss 0.00427507, throughput 5.00505K wps
[Epoch 121 Batch 60/62] avg loss 0.0040725, throughput 4.88497K wps
Begin Testing...
[Epoch 121] train avg loss 0.00426242, dev acc 0.8201, dev avg loss 0.396684, throughput 4.95159K wps
[Epoch 122 Batch 30/62] avg loss 0.00411248, throughput 5.00243K wps
[Epoch 122 Batch 60/62] avg loss 0.00380812, throughput 4.89467K wps
Begin Testing...
[Epoch 122] train avg loss 0.0039918, dev acc 0.8083, dev avg loss 0.398209, throughput 4.95447K wps
[Epoch 123 Batch 30/62] avg loss 0.00369193, throughput 5.01162K wps
[Epoch 123 Batch 60/62] avg loss 0.00419269, throughput 4.8946K wps
Begin Testing...
[Epoch 123] train avg loss 0.00406223, dev acc 0.8171, dev avg loss 0.397375, throughput 4.95907K wps
[Epoch 124 Batch 30/62] avg loss 0.00395723, throughput 5.01162K wps
[Epoch 124 Batch 60/62] avg loss 0.00388826, throughput 4.88889K wps
Begin Testing...
[Epoch 124] train avg loss 0.00394468, dev acc 0.8201, dev avg loss 0.400801, throughput 4.95618K wps
[Epoch 125 Batch 30/62] avg loss 0.00394504, throughput 4.99505K wps
[Epoch 125 Batch 60/62] avg loss 0.00390329, throughput 4.88939K wps
Begin Testing...
[Epoch 125] train avg loss 0.00394189, dev acc 0.8142, dev avg loss 0.394962, throughput 4.94781K wps
[Epoch 126 Batch 30/62] avg loss 0.00400412, throughput 4.99615K wps
[Epoch 126 Batch 60/62] avg loss 0.00382507, throughput 4.89678K wps
Begin Testing...
[Epoch 126] train avg loss 0.00398273, dev acc 0.8171, dev avg loss 0.395907, throughput 4.95296K wps
[Epoch 127 Batch 30/62] avg loss 0.00381561, throughput 5.01282K wps
[Epoch 127 Batch 60/62] avg loss 0.00370809, throughput 4.89368K wps
Begin Testing...
[Epoch 127] train avg loss 0.00381523, dev acc 0.8260, dev avg loss 0.394029, throughput 4.95909K wps
Observed Improvement.
Begin Testing...
[Epoch 128 Batch 30/62] avg loss 0.00369838, throughput 4.99698K wps
[Epoch 128 Batch 60/62] avg loss 0.00360442, throughput 4.88184K wps
Begin Testing...
[Epoch 128] train avg loss 0.00375246, dev acc 0.8319, dev avg loss 0.392407, throughput 4.9461K wps
Observed Improvement.
Begin Testing...
[Epoch 129 Batch 30/62] avg loss 0.00372975, throughput 5.01462K wps
[Epoch 129 Batch 60/62] avg loss 0.00353732, throughput 4.88615K wps
Begin Testing...
[Epoch 129] train avg loss 0.00366075, dev acc 0.8230, dev avg loss 0.392692, throughput 4.95657K wps
[Epoch 130 Batch 30/62] avg loss 0.00356365, throughput 5.02111K wps
[Epoch 130 Batch 60/62] avg loss 0.00364061, throughput 4.89891K wps
Begin Testing...
[Epoch 130] train avg loss 0.00364356, dev acc 0.8260, dev avg loss 0.392345, throughput 4.96575K wps
[Epoch 131 Batch 30/62] avg loss 0.00332744, throughput 5.00954K wps
[Epoch 131 Batch 60/62] avg loss 0.00362329, throughput 4.87613K wps
Begin Testing...
[Epoch 131] train avg loss 0.00353071, dev acc 0.8260, dev avg loss 0.393209, throughput 4.94712K wps
[Epoch 132 Batch 30/62] avg loss 0.00341627, throughput 5.00056K wps
[Epoch 132 Batch 60/62] avg loss 0.00346592, throughput 4.88168K wps
Begin Testing...
[Epoch 132] train avg loss 0.00349313, dev acc 0.8289, dev avg loss 0.391997, throughput 4.94742K wps
[Epoch 133 Batch 30/62] avg loss 0.00334691, throughput 4.98409K wps
[Epoch 133 Batch 60/62] avg loss 0.00346485, throughput 4.89011K wps
Begin Testing...
[Epoch 133] train avg loss 0.00348029, dev acc 0.8260, dev avg loss 0.3934, throughput 4.94352K wps
[Epoch 134 Batch 30/62] avg loss 0.00327312, throughput 5.00038K wps
[Epoch 134 Batch 60/62] avg loss 0.00340439, throughput 4.88571K wps
Begin Testing...
[Epoch 134] train avg loss 0.00336217, dev acc 0.8348, dev avg loss 0.390599, throughput 4.94897K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/62] avg loss 0.00336867, throughput 5.00549K wps
[Epoch 135 Batch 60/62] avg loss 0.00316326, throughput 4.89976K wps
Begin Testing...
[Epoch 135] train avg loss 0.00330443, dev acc 0.8260, dev avg loss 0.392059, throughput 4.95841K wps
[Epoch 136 Batch 30/62] avg loss 0.00305728, throughput 5.00407K wps
[Epoch 136 Batch 60/62] avg loss 0.00327548, throughput 4.90331K wps
Begin Testing...
[Epoch 136] train avg loss 0.00340145, dev acc 0.8260, dev avg loss 0.392222, throughput 4.95944K wps
[Epoch 137 Batch 30/62] avg loss 0.00322319, throughput 5.01747K wps
[Epoch 137 Batch 60/62] avg loss 0.00310644, throughput 4.9028K wps
Begin Testing...
[Epoch 137] train avg loss 0.00319906, dev acc 0.8201, dev avg loss 0.395854, throughput 4.96456K wps
[Epoch 138 Batch 30/62] avg loss 0.00321034, throughput 4.99435K wps
[Epoch 138 Batch 60/62] avg loss 0.00310047, throughput 4.86351K wps
Begin Testing...
[Epoch 138] train avg loss 0.00318858, dev acc 0.8289, dev avg loss 0.391427, throughput 4.93568K wps
[Epoch 139 Batch 30/62] avg loss 0.00305889, throughput 5.00825K wps
[Epoch 139 Batch 60/62] avg loss 0.0031137, throughput 4.88442K wps
Begin Testing...
[Epoch 139] train avg loss 0.00309278, dev acc 0.8348, dev avg loss 0.391686, throughput 4.95279K wps
Observed Improvement.
Begin Testing...
[Epoch 140 Batch 30/62] avg loss 0.00320205, throughput 4.98979K wps
[Epoch 140 Batch 60/62] avg loss 0.0030314, throughput 4.88314K wps
Begin Testing...
[Epoch 140] train avg loss 0.00313124, dev acc 0.8260, dev avg loss 0.393503, throughput 4.94212K wps
[Epoch 141 Batch 30/62] avg loss 0.00292347, throughput 4.98221K wps
[Epoch 141 Batch 60/62] avg loss 0.00307604, throughput 4.885K wps
Begin Testing...
[Epoch 141] train avg loss 0.00300402, dev acc 0.8466, dev avg loss 0.389414, throughput 4.94029K wps
Observed Improvement.
Begin Testing...
[Epoch 142 Batch 30/62] avg loss 0.00297664, throughput 4.98029K wps
[Epoch 142 Batch 60/62] avg loss 0.00310465, throughput 4.89347K wps
Begin Testing...
[Epoch 142] train avg loss 0.0030854, dev acc 0.8348, dev avg loss 0.389911, throughput 4.94426K wps
[Epoch 143 Batch 30/62] avg loss 0.00282909, throughput 4.97571K wps
[Epoch 143 Batch 60/62] avg loss 0.00300755, throughput 4.86271K wps
Begin Testing...
[Epoch 143] train avg loss 0.00294181, dev acc 0.8378, dev avg loss 0.389868, throughput 4.92659K wps
[Epoch 144 Batch 30/62] avg loss 0.00288377, throughput 5.00044K wps
[Epoch 144 Batch 60/62] avg loss 0.00284245, throughput 4.85805K wps
Begin Testing...
[Epoch 144] train avg loss 0.00290265, dev acc 0.8348, dev avg loss 0.389859, throughput 4.93593K wps
[Epoch 145 Batch 30/62] avg loss 0.00281506, throughput 4.98679K wps
[Epoch 145 Batch 60/62] avg loss 0.00284996, throughput 4.88379K wps
Begin Testing...
[Epoch 145] train avg loss 0.00287047, dev acc 0.8289, dev avg loss 0.392505, throughput 4.94141K wps
[Epoch 146 Batch 30/62] avg loss 0.00285922, throughput 5.00126K wps
[Epoch 146 Batch 60/62] avg loss 0.00286816, throughput 4.89176K wps
Begin Testing...
[Epoch 146] train avg loss 0.00291425, dev acc 0.8348, dev avg loss 0.389311, throughput 4.95119K wps
[Epoch 147 Batch 30/62] avg loss 0.00288321, throughput 5.00748K wps
[Epoch 147 Batch 60/62] avg loss 0.00256623, throughput 4.88769K wps
Begin Testing...
[Epoch 147] train avg loss 0.00274891, dev acc 0.8289, dev avg loss 0.392328, throughput 4.95301K wps
[Epoch 148 Batch 30/62] avg loss 0.00266411, throughput 4.9943K wps
[Epoch 148 Batch 60/62] avg loss 0.002625, throughput 4.8587K wps
Begin Testing...
[Epoch 148] train avg loss 0.00266611, dev acc 0.8348, dev avg loss 0.389587, throughput 4.93232K wps
[Epoch 149 Batch 30/62] avg loss 0.00267331, throughput 4.97129K wps
[Epoch 149 Batch 60/62] avg loss 0.00269532, throughput 4.89745K wps
Begin Testing...
[Epoch 149] train avg loss 0.00268477, dev acc 0.8407, dev avg loss 0.387098, throughput 4.94101K wps
[Epoch 150 Batch 30/62] avg loss 0.00260969, throughput 4.98049K wps
[Epoch 150 Batch 60/62] avg loss 0.00270488, throughput 4.88643K wps
Begin Testing...
[Epoch 150] train avg loss 0.00266968, dev acc 0.8466, dev avg loss 0.386203, throughput 4.94042K wps
Observed Improvement.
Begin Testing...
[Epoch 151 Batch 30/62] avg loss 0.00254801, throughput 4.99487K wps
[Epoch 151 Batch 60/62] avg loss 0.00252551, throughput 4.89113K wps
Begin Testing...
[Epoch 151] train avg loss 0.00256716, dev acc 0.8437, dev avg loss 0.386807, throughput 4.9495K wps
[Epoch 152 Batch 30/62] avg loss 0.00251921, throughput 5.01953K wps
[Epoch 152 Batch 60/62] avg loss 0.00250403, throughput 4.8825K wps
Begin Testing...
[Epoch 152] train avg loss 0.00251003, dev acc 0.8378, dev avg loss 0.387781, throughput 4.95572K wps
[Epoch 153 Batch 30/62] avg loss 0.00246901, throughput 5.0034K wps
[Epoch 153 Batch 60/62] avg loss 0.00260424, throughput 4.89395K wps
Begin Testing...
[Epoch 153] train avg loss 0.00256248, dev acc 0.8466, dev avg loss 0.389264, throughput 4.95519K wps
Observed Improvement.
Begin Testing...
[Epoch 154 Batch 30/62] avg loss 0.00251246, throughput 4.99964K wps
[Epoch 154 Batch 60/62] avg loss 0.002526, throughput 4.88769K wps
Begin Testing...
[Epoch 154] train avg loss 0.00257194, dev acc 0.8407, dev avg loss 0.389412, throughput 4.94925K wps
[Epoch 155 Batch 30/62] avg loss 0.00250752, throughput 4.98152K wps
[Epoch 155 Batch 60/62] avg loss 0.00228889, throughput 4.88493K wps
Begin Testing...
[Epoch 155] train avg loss 0.00242911, dev acc 0.8496, dev avg loss 0.388342, throughput 4.93934K wps
Observed Improvement.
Begin Testing...
[Epoch 156 Batch 30/62] avg loss 0.00237974, throughput 5.00219K wps
[Epoch 156 Batch 60/62] avg loss 0.00239359, throughput 4.90092K wps
Begin Testing...
[Epoch 156] train avg loss 0.00246279, dev acc 0.8348, dev avg loss 0.391191, throughput 4.95745K wps
[Epoch 157 Batch 30/62] avg loss 0.0023596, throughput 4.99483K wps
[Epoch 157 Batch 60/62] avg loss 0.00237647, throughput 4.8898K wps
Begin Testing...
[Epoch 157] train avg loss 0.00239496, dev acc 0.8407, dev avg loss 0.389377, throughput 4.9493K wps
[Epoch 158 Batch 30/62] avg loss 0.00230865, throughput 5.00345K wps
[Epoch 158 Batch 60/62] avg loss 0.00232417, throughput 4.89326K wps
Begin Testing...
[Epoch 158] train avg loss 0.00233662, dev acc 0.8348, dev avg loss 0.388202, throughput 4.95492K wps
[Epoch 159 Batch 30/62] avg loss 0.00241989, throughput 4.9993K wps
[Epoch 159 Batch 60/62] avg loss 0.00223782, throughput 4.90187K wps
Begin Testing...
[Epoch 159] train avg loss 0.00236605, dev acc 0.8378, dev avg loss 0.390386, throughput 4.95617K wps
[Epoch 160 Batch 30/62] avg loss 0.00219394, throughput 5.01845K wps
[Epoch 160 Batch 60/62] avg loss 0.00227561, throughput 4.88454K wps
Begin Testing...
[Epoch 160] train avg loss 0.00225527, dev acc 0.8466, dev avg loss 0.387901, throughput 4.95748K wps
[Epoch 161 Batch 30/62] avg loss 0.00235434, throughput 4.98708K wps
[Epoch 161 Batch 60/62] avg loss 0.00217358, throughput 4.90204K wps
Begin Testing...
[Epoch 161] train avg loss 0.00229326, dev acc 0.8466, dev avg loss 0.388289, throughput 4.95085K wps
[Epoch 162 Batch 30/62] avg loss 0.00228666, throughput 5.01166K wps
[Epoch 162 Batch 60/62] avg loss 0.00216453, throughput 4.89837K wps
Begin Testing...
[Epoch 162] train avg loss 0.00224456, dev acc 0.8437, dev avg loss 0.388144, throughput 4.96094K wps
[Epoch 163 Batch 30/62] avg loss 0.00206885, throughput 4.99345K wps
[Epoch 163 Batch 60/62] avg loss 0.00214789, throughput 4.88862K wps
Begin Testing...
[Epoch 163] train avg loss 0.00212332, dev acc 0.8407, dev avg loss 0.390766, throughput 4.94748K wps
[Epoch 164 Batch 30/62] avg loss 0.00218455, throughput 4.99032K wps
[Epoch 164 Batch 60/62] avg loss 0.00208375, throughput 4.88835K wps
Begin Testing...
[Epoch 164] train avg loss 0.00218471, dev acc 0.8496, dev avg loss 0.389455, throughput 4.94649K wps
Observed Improvement.
Begin Testing...
[Epoch 165 Batch 30/62] avg loss 0.00216253, throughput 4.99527K wps
[Epoch 165 Batch 60/62] avg loss 0.00204132, throughput 4.89562K wps
Begin Testing...
[Epoch 165] train avg loss 0.00212259, dev acc 0.8496, dev avg loss 0.38979, throughput 4.9517K wps
Observed Improvement.
Begin Testing...
[Epoch 166 Batch 30/62] avg loss 0.00203395, throughput 5.00008K wps
[Epoch 166 Batch 60/62] avg loss 0.00215672, throughput 4.8922K wps
Begin Testing...
[Epoch 166] train avg loss 0.00213342, dev acc 0.8378, dev avg loss 0.394106, throughput 4.95092K wps
[Epoch 167 Batch 30/62] avg loss 0.00196413, throughput 4.99202K wps
[Epoch 167 Batch 60/62] avg loss 0.00202052, throughput 4.90054K wps
Begin Testing...
[Epoch 167] train avg loss 0.00200187, dev acc 0.8348, dev avg loss 0.397078, throughput 4.95348K wps
[Epoch 168 Batch 30/62] avg loss 0.00217602, throughput 5.01513K wps
[Epoch 168 Batch 60/62] avg loss 0.00183116, throughput 4.89778K wps
Begin Testing...
[Epoch 168] train avg loss 0.00204192, dev acc 0.8437, dev avg loss 0.391497, throughput 4.96168K wps
[Epoch 169 Batch 30/62] avg loss 0.00193875, throughput 4.98429K wps
[Epoch 169 Batch 60/62] avg loss 0.00189081, throughput 4.88427K wps
Begin Testing...
[Epoch 169] train avg loss 0.00195472, dev acc 0.8407, dev avg loss 0.39084, throughput 4.9403K wps
[Epoch 170 Batch 30/62] avg loss 0.0020793, throughput 5.0056K wps
[Epoch 170 Batch 60/62] avg loss 0.00194151, throughput 4.89419K wps
Begin Testing...
[Epoch 170] train avg loss 0.00202359, dev acc 0.8466, dev avg loss 0.391919, throughput 4.95694K wps
[Epoch 171 Batch 30/62] avg loss 0.00200306, throughput 4.98059K wps
[Epoch 171 Batch 60/62] avg loss 0.00183403, throughput 4.89449K wps
Begin Testing...
[Epoch 171] train avg loss 0.00196644, dev acc 0.8525, dev avg loss 0.393024, throughput 4.94436K wps
Observed Improvement.
Begin Testing...
[Epoch 172 Batch 30/62] avg loss 0.00183184, throughput 5.01795K wps
[Epoch 172 Batch 60/62] avg loss 0.00192851, throughput 4.89204K wps
Begin Testing...
[Epoch 172] train avg loss 0.00188732, dev acc 0.8437, dev avg loss 0.394564, throughput 4.95988K wps
[Epoch 173 Batch 30/62] avg loss 0.00189503, throughput 5.00964K wps
[Epoch 173 Batch 60/62] avg loss 0.00186316, throughput 4.89847K wps
Begin Testing...
[Epoch 173] train avg loss 0.00190485, dev acc 0.8437, dev avg loss 0.392641, throughput 4.96039K wps
[Epoch 174 Batch 30/62] avg loss 0.00181097, throughput 5.01078K wps
[Epoch 174 Batch 60/62] avg loss 0.00173642, throughput 4.87741K wps
Begin Testing...
[Epoch 174] train avg loss 0.00183047, dev acc 0.8348, dev avg loss 0.400039, throughput 4.94966K wps
[Epoch 175 Batch 30/62] avg loss 0.00183683, throughput 5.00256K wps
[Epoch 175 Batch 60/62] avg loss 0.00180998, throughput 4.86913K wps
Begin Testing...
[Epoch 175] train avg loss 0.00186447, dev acc 0.8348, dev avg loss 0.398433, throughput 4.94034K wps
[Epoch 176 Batch 30/62] avg loss 0.00175337, throughput 4.99299K wps
[Epoch 176 Batch 60/62] avg loss 0.00172103, throughput 4.88025K wps
Begin Testing...
[Epoch 176] train avg loss 0.00176142, dev acc 0.8437, dev avg loss 0.39435, throughput 4.94357K wps
[Epoch 177 Batch 30/62] avg loss 0.0017632, throughput 4.98947K wps
[Epoch 177 Batch 60/62] avg loss 0.00174228, throughput 4.88975K wps
Begin Testing...
[Epoch 177] train avg loss 0.00179981, dev acc 0.8348, dev avg loss 0.399768, throughput 4.94622K wps
[Epoch 178 Batch 30/62] avg loss 0.00166199, throughput 4.97921K wps
[Epoch 178 Batch 60/62] avg loss 0.00171901, throughput 4.9K wps
Begin Testing...
[Epoch 178] train avg loss 0.00170795, dev acc 0.8378, dev avg loss 0.394962, throughput 4.94654K wps
[Epoch 179 Batch 30/62] avg loss 0.00168047, throughput 4.97356K wps
[Epoch 179 Batch 60/62] avg loss 0.00174728, throughput 4.88946K wps
Begin Testing...
[Epoch 179] train avg loss 0.00173277, dev acc 0.8289, dev avg loss 0.403049, throughput 4.93886K wps
[Epoch 180 Batch 30/62] avg loss 0.0017634, throughput 5.02058K wps
[Epoch 180 Batch 60/62] avg loss 0.00164898, throughput 4.90136K wps
Begin Testing...
[Epoch 180] train avg loss 0.00173766, dev acc 0.8437, dev avg loss 0.394609, throughput 4.96664K wps
[Epoch 181 Batch 30/62] avg loss 0.00167945, throughput 4.97607K wps
[Epoch 181 Batch 60/62] avg loss 0.00163758, throughput 4.86472K wps
Begin Testing...
[Epoch 181] train avg loss 0.00166178, dev acc 0.8407, dev avg loss 0.395844, throughput 4.92685K wps
[Epoch 182 Batch 30/62] avg loss 0.00164031, throughput 4.97803K wps
[Epoch 182 Batch 60/62] avg loss 0.0016849, throughput 4.89858K wps
Begin Testing...
[Epoch 182] train avg loss 0.0016695, dev acc 0.8437, dev avg loss 0.39837, throughput 4.94603K wps
[Epoch 183 Batch 30/62] avg loss 0.00155892, throughput 5.01126K wps
[Epoch 183 Batch 60/62] avg loss 0.00159496, throughput 4.89132K wps
Begin Testing...
[Epoch 183] train avg loss 0.00158763, dev acc 0.8378, dev avg loss 0.396052, throughput 4.95672K wps
[Epoch 184 Batch 30/62] avg loss 0.00170481, throughput 4.9897K wps
[Epoch 184 Batch 60/62] avg loss 0.00152573, throughput 4.88294K wps
Begin Testing...
[Epoch 184] train avg loss 0.00162041, dev acc 0.8407, dev avg loss 0.396436, throughput 4.94162K wps
[Epoch 185 Batch 30/62] avg loss 0.00153482, throughput 4.97747K wps
[Epoch 185 Batch 60/62] avg loss 0.0015688, throughput 4.90239K wps
Begin Testing...
[Epoch 185] train avg loss 0.00158424, dev acc 0.8378, dev avg loss 0.396656, throughput 4.94656K wps
[Epoch 186 Batch 30/62] avg loss 0.00157413, throughput 4.98949K wps
[Epoch 186 Batch 60/62] avg loss 0.0016053, throughput 4.90406K wps
Begin Testing...
[Epoch 186] train avg loss 0.00161825, dev acc 0.8378, dev avg loss 0.395942, throughput 4.95387K wps
[Epoch 187 Batch 30/62] avg loss 0.0014587, throughput 4.97419K wps
[Epoch 187 Batch 60/62] avg loss 0.00154983, throughput 4.88097K wps
Begin Testing...
[Epoch 187] train avg loss 0.00150659, dev acc 0.8407, dev avg loss 0.397669, throughput 4.93377K wps
[Epoch 188 Batch 30/62] avg loss 0.00146536, throughput 4.98463K wps
[Epoch 188 Batch 60/62] avg loss 0.00156024, throughput 4.89204K wps
Begin Testing...
[Epoch 188] train avg loss 0.00156372, dev acc 0.8348, dev avg loss 0.397769, throughput 4.94615K wps
[Epoch 189 Batch 30/62] avg loss 0.00149309, throughput 5.00601K wps
[Epoch 189 Batch 60/62] avg loss 0.00135916, throughput 4.89527K wps
Begin Testing...
[Epoch 189] train avg loss 0.0014439, dev acc 0.8378, dev avg loss 0.396965, throughput 4.95661K wps
[Epoch 190 Batch 30/62] avg loss 0.00153881, throughput 4.97761K wps
[Epoch 190 Batch 60/62] avg loss 0.00147426, throughput 4.87008K wps
Begin Testing...
[Epoch 190] train avg loss 0.001519, dev acc 0.8407, dev avg loss 0.399442, throughput 4.93097K wps
[Epoch 191 Batch 30/62] avg loss 0.00141634, throughput 4.98515K wps
[Epoch 191 Batch 60/62] avg loss 0.00136018, throughput 4.87652K wps
Begin Testing...
[Epoch 191] train avg loss 0.00141442, dev acc 0.8319, dev avg loss 0.409597, throughput 4.93606K wps
[Epoch 192 Batch 30/62] avg loss 0.00129663, throughput 4.99307K wps
[Epoch 192 Batch 60/62] avg loss 0.00143448, throughput 4.90158K wps
Begin Testing...
[Epoch 192] train avg loss 0.00137822, dev acc 0.8437, dev avg loss 0.402346, throughput 4.95387K wps
[Epoch 193 Batch 30/62] avg loss 0.00142262, throughput 5.01469K wps
[Epoch 193 Batch 60/62] avg loss 0.00143319, throughput 4.88703K wps
Begin Testing...
[Epoch 193] train avg loss 0.0014601, dev acc 0.8407, dev avg loss 0.400699, throughput 4.95684K wps
[Epoch 194 Batch 30/62] avg loss 0.00132635, throughput 4.99758K wps
[Epoch 194 Batch 60/62] avg loss 0.00135918, throughput 4.90245K wps
Begin Testing...
[Epoch 194] train avg loss 0.00138583, dev acc 0.8319, dev avg loss 0.400234, throughput 4.95664K wps
[Epoch 195 Batch 30/62] avg loss 0.0013658, throughput 5.00364K wps
[Epoch 195 Batch 60/62] avg loss 0.00132236, throughput 4.90493K wps
Begin Testing...
[Epoch 195] train avg loss 0.00136816, dev acc 0.8407, dev avg loss 0.400685, throughput 4.95899K wps
[Epoch 196 Batch 30/62] avg loss 0.00132178, throughput 4.99326K wps
[Epoch 196 Batch 60/62] avg loss 0.00136811, throughput 4.89325K wps
Begin Testing...
[Epoch 196] train avg loss 0.00140072, dev acc 0.8378, dev avg loss 0.401682, throughput 4.95003K wps
[Epoch 197 Batch 30/62] avg loss 0.00129307, throughput 5.00724K wps
[Epoch 197 Batch 60/62] avg loss 0.00137391, throughput 4.89468K wps
Begin Testing...
[Epoch 197] train avg loss 0.00135494, dev acc 0.8378, dev avg loss 0.40162, throughput 4.95788K wps
[Epoch 198 Batch 30/62] avg loss 0.00133391, throughput 4.99246K wps
[Epoch 198 Batch 60/62] avg loss 0.00127008, throughput 4.8834K wps
Begin Testing...
[Epoch 198] train avg loss 0.00130283, dev acc 0.8378, dev avg loss 0.401653, throughput 4.94213K wps
[Epoch 199 Batch 30/62] avg loss 0.00123552, throughput 5.01571K wps
[Epoch 199 Batch 60/62] avg loss 0.00124931, throughput 4.89792K wps
Begin Testing...
[Epoch 199] train avg loss 0.00124015, dev acc 0.8407, dev avg loss 0.403469, throughput 4.96396K wps
Test loss 0.531078, test acc 0.7692
Total time cost 275.62s
[Epoch 0 Batch 30/62] avg loss 0.01336, throughput 4.77957K wps
[Epoch 0 Batch 60/62] avg loss 0.012912, throughput 4.84508K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133091, dev acc 0.6254, dev avg loss 0.673061, throughput 4.82225K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0132583, throughput 4.99959K wps
[Epoch 1 Batch 60/62] avg loss 0.0128917, throughput 4.88904K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132824, dev acc 0.6254, dev avg loss 0.660356, throughput 4.95066K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130497, throughput 4.98164K wps
[Epoch 2 Batch 60/62] avg loss 0.0130315, throughput 4.87871K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132269, dev acc 0.6254, dev avg loss 0.660829, throughput 4.93736K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0130462, throughput 4.99796K wps
[Epoch 3 Batch 60/62] avg loss 0.0129747, throughput 4.87818K wps
Begin Testing...
[Epoch 3] train avg loss 0.0131704, dev acc 0.6254, dev avg loss 0.659977, throughput 4.94321K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0130373, throughput 4.98633K wps
[Epoch 4 Batch 60/62] avg loss 0.0129944, throughput 4.87126K wps
Begin Testing...
[Epoch 4] train avg loss 0.0131688, dev acc 0.6254, dev avg loss 0.659044, throughput 4.93449K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0128627, throughput 4.99666K wps
[Epoch 5 Batch 60/62] avg loss 0.0130078, throughput 4.89367K wps
Begin Testing...
[Epoch 5] train avg loss 0.0131188, dev acc 0.6254, dev avg loss 0.657983, throughput 4.95296K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0128716, throughput 4.99632K wps
[Epoch 6 Batch 60/62] avg loss 0.0130058, throughput 4.87773K wps
Begin Testing...
[Epoch 6] train avg loss 0.0131065, dev acc 0.6254, dev avg loss 0.656541, throughput 4.94338K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0129542, throughput 5.01077K wps
[Epoch 7 Batch 60/62] avg loss 0.0128552, throughput 4.89569K wps
Begin Testing...
[Epoch 7] train avg loss 0.0131122, dev acc 0.6254, dev avg loss 0.655797, throughput 4.95952K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0127238, throughput 4.99259K wps
[Epoch 8 Batch 60/62] avg loss 0.013017, throughput 4.88653K wps
Begin Testing...
[Epoch 8] train avg loss 0.0130435, dev acc 0.6254, dev avg loss 0.655123, throughput 4.94498K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0129714, throughput 5.0111K wps
[Epoch 9 Batch 60/62] avg loss 0.0126669, throughput 4.89108K wps
Begin Testing...
[Epoch 9] train avg loss 0.0129916, dev acc 0.6254, dev avg loss 0.656239, throughput 4.95699K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.012883, throughput 5.00724K wps
[Epoch 10 Batch 60/62] avg loss 0.0127933, throughput 4.90422K wps
Begin Testing...
[Epoch 10] train avg loss 0.0130416, dev acc 0.6254, dev avg loss 0.653093, throughput 4.9625K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0125835, throughput 5.00696K wps
[Epoch 11 Batch 60/62] avg loss 0.0129153, throughput 4.87636K wps
Begin Testing...
[Epoch 11] train avg loss 0.0129216, dev acc 0.6254, dev avg loss 0.652259, throughput 4.94672K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0126428, throughput 4.9894K wps
[Epoch 12 Batch 60/62] avg loss 0.0128271, throughput 4.88609K wps
Begin Testing...
[Epoch 12] train avg loss 0.012944, dev acc 0.6254, dev avg loss 0.651065, throughput 4.94425K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0125921, throughput 4.99878K wps
[Epoch 13 Batch 60/62] avg loss 0.0128013, throughput 4.88514K wps
Begin Testing...
[Epoch 13] train avg loss 0.0128709, dev acc 0.6254, dev avg loss 0.649993, throughput 4.94907K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0126463, throughput 4.99673K wps
[Epoch 14 Batch 60/62] avg loss 0.0127647, throughput 4.89614K wps
Begin Testing...
[Epoch 14] train avg loss 0.0128497, dev acc 0.6254, dev avg loss 0.649536, throughput 4.95345K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.012616, throughput 4.99562K wps
[Epoch 15 Batch 60/62] avg loss 0.0126239, throughput 4.88848K wps
Begin Testing...
[Epoch 15] train avg loss 0.0127912, dev acc 0.6254, dev avg loss 0.647391, throughput 4.94933K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0128558, throughput 5.01412K wps
[Epoch 16 Batch 60/62] avg loss 0.0122782, throughput 4.90396K wps
Begin Testing...
[Epoch 16] train avg loss 0.0127719, dev acc 0.6254, dev avg loss 0.646192, throughput 4.96399K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0124311, throughput 5.0018K wps
[Epoch 17 Batch 60/62] avg loss 0.0126237, throughput 4.88755K wps
Begin Testing...
[Epoch 17] train avg loss 0.0127006, dev acc 0.6254, dev avg loss 0.644399, throughput 4.95038K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.012422, throughput 4.98795K wps
[Epoch 18 Batch 60/62] avg loss 0.0125258, throughput 4.88375K wps
Begin Testing...
[Epoch 18] train avg loss 0.0126057, dev acc 0.6254, dev avg loss 0.643861, throughput 4.94307K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0124628, throughput 5.01651K wps
[Epoch 19 Batch 60/62] avg loss 0.012419, throughput 4.88442K wps
Begin Testing...
[Epoch 19] train avg loss 0.0125569, dev acc 0.6254, dev avg loss 0.641771, throughput 4.95612K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0125238, throughput 4.99015K wps
[Epoch 20 Batch 60/62] avg loss 0.0122902, throughput 4.89961K wps
Begin Testing...
[Epoch 20] train avg loss 0.0125672, dev acc 0.6254, dev avg loss 0.639592, throughput 4.95129K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.012437, throughput 4.99885K wps
[Epoch 21 Batch 60/62] avg loss 0.012196, throughput 4.89169K wps
Begin Testing...
[Epoch 21] train avg loss 0.0124417, dev acc 0.6254, dev avg loss 0.638808, throughput 4.95205K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.012295, throughput 5.00374K wps
[Epoch 22 Batch 60/62] avg loss 0.0122443, throughput 4.88165K wps
Begin Testing...
[Epoch 22] train avg loss 0.0124384, dev acc 0.6254, dev avg loss 0.63493, throughput 4.94897K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.0121658, throughput 4.9969K wps
[Epoch 23 Batch 60/62] avg loss 0.0121789, throughput 4.89707K wps
Begin Testing...
[Epoch 23] train avg loss 0.0123361, dev acc 0.6254, dev avg loss 0.632693, throughput 4.95194K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0118641, throughput 5.0137K wps
[Epoch 24 Batch 60/62] avg loss 0.0123017, throughput 4.88586K wps
Begin Testing...
[Epoch 24] train avg loss 0.0122889, dev acc 0.6283, dev avg loss 0.630126, throughput 4.95627K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0120022, throughput 5.00152K wps
[Epoch 25 Batch 60/62] avg loss 0.0120649, throughput 4.88462K wps
Begin Testing...
[Epoch 25] train avg loss 0.0122283, dev acc 0.6313, dev avg loss 0.627481, throughput 4.94869K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.0119961, throughput 4.98453K wps
[Epoch 26 Batch 60/62] avg loss 0.0119565, throughput 4.88069K wps
Begin Testing...
[Epoch 26] train avg loss 0.0121113, dev acc 0.6342, dev avg loss 0.624985, throughput 4.93948K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.0118694, throughput 4.9834K wps
[Epoch 27 Batch 60/62] avg loss 0.0120658, throughput 4.89537K wps
Begin Testing...
[Epoch 27] train avg loss 0.0120894, dev acc 0.6431, dev avg loss 0.622075, throughput 4.9456K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.0117464, throughput 5.00668K wps
[Epoch 28 Batch 60/62] avg loss 0.0118607, throughput 4.89068K wps
Begin Testing...
[Epoch 28] train avg loss 0.0119716, dev acc 0.6460, dev avg loss 0.619284, throughput 4.95423K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.0117651, throughput 4.98991K wps
[Epoch 29 Batch 60/62] avg loss 0.011629, throughput 4.89468K wps
Begin Testing...
[Epoch 29] train avg loss 0.0118735, dev acc 0.6490, dev avg loss 0.616616, throughput 4.94971K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.0116179, throughput 4.9999K wps
[Epoch 30 Batch 60/62] avg loss 0.0117194, throughput 4.86637K wps
Begin Testing...
[Epoch 30] train avg loss 0.0117576, dev acc 0.6490, dev avg loss 0.613807, throughput 4.94003K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.0115263, throughput 4.99238K wps
[Epoch 31 Batch 60/62] avg loss 0.01158, throughput 4.89477K wps
Begin Testing...
[Epoch 31] train avg loss 0.0117476, dev acc 0.6578, dev avg loss 0.610611, throughput 4.94955K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.0114707, throughput 5.0017K wps
[Epoch 32 Batch 60/62] avg loss 0.0113835, throughput 4.89604K wps
Begin Testing...
[Epoch 32] train avg loss 0.0116396, dev acc 0.6637, dev avg loss 0.607555, throughput 4.95461K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/62] avg loss 0.0113533, throughput 4.99655K wps
[Epoch 33 Batch 60/62] avg loss 0.0114418, throughput 4.89778K wps
Begin Testing...
[Epoch 33] train avg loss 0.0115406, dev acc 0.6578, dev avg loss 0.604623, throughput 4.95423K wps
[Epoch 34 Batch 30/62] avg loss 0.011194, throughput 5.00811K wps
[Epoch 34 Batch 60/62] avg loss 0.0115173, throughput 4.89766K wps
Begin Testing...
[Epoch 34] train avg loss 0.0115361, dev acc 0.6726, dev avg loss 0.601864, throughput 4.95872K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/62] avg loss 0.0112595, throughput 5.00096K wps
[Epoch 35 Batch 60/62] avg loss 0.01122, throughput 4.88993K wps
Begin Testing...
[Epoch 35] train avg loss 0.0114359, dev acc 0.6755, dev avg loss 0.598966, throughput 4.95215K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.011255, throughput 5.00467K wps
[Epoch 36 Batch 60/62] avg loss 0.0111212, throughput 4.86854K wps
Begin Testing...
[Epoch 36] train avg loss 0.0113927, dev acc 0.6696, dev avg loss 0.596744, throughput 4.94265K wps
[Epoch 37 Batch 30/62] avg loss 0.0110734, throughput 5.00749K wps
[Epoch 37 Batch 60/62] avg loss 0.0110559, throughput 4.89344K wps
Begin Testing...
[Epoch 37] train avg loss 0.0111513, dev acc 0.6696, dev avg loss 0.594387, throughput 4.95684K wps
[Epoch 38 Batch 30/62] avg loss 0.0110944, throughput 4.96903K wps
[Epoch 38 Batch 60/62] avg loss 0.0109426, throughput 4.88869K wps
Begin Testing...
[Epoch 38] train avg loss 0.0111739, dev acc 0.6667, dev avg loss 0.591974, throughput 4.93552K wps
[Epoch 39 Batch 30/62] avg loss 0.011036, throughput 5.00039K wps
[Epoch 39 Batch 60/62] avg loss 0.0107794, throughput 4.88119K wps
Begin Testing...
[Epoch 39] train avg loss 0.0110442, dev acc 0.6873, dev avg loss 0.588274, throughput 4.94519K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.0108052, throughput 5.00133K wps
[Epoch 40 Batch 60/62] avg loss 0.0109448, throughput 4.87907K wps
Begin Testing...
[Epoch 40] train avg loss 0.0109738, dev acc 0.6903, dev avg loss 0.585455, throughput 4.94688K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.0105102, throughput 5.0065K wps
[Epoch 41 Batch 60/62] avg loss 0.0109877, throughput 4.87359K wps
Begin Testing...
[Epoch 41] train avg loss 0.0109305, dev acc 0.6932, dev avg loss 0.583081, throughput 4.94589K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/62] avg loss 0.0107308, throughput 4.99251K wps
[Epoch 42 Batch 60/62] avg loss 0.0104996, throughput 4.86091K wps
Begin Testing...
[Epoch 42] train avg loss 0.0107378, dev acc 0.6932, dev avg loss 0.58012, throughput 4.93224K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.0106254, throughput 4.99229K wps
[Epoch 43 Batch 60/62] avg loss 0.0105415, throughput 4.89875K wps
Begin Testing...
[Epoch 43] train avg loss 0.0106859, dev acc 0.6844, dev avg loss 0.57904, throughput 4.9515K wps
[Epoch 44 Batch 30/62] avg loss 0.010366, throughput 4.99834K wps
[Epoch 44 Batch 60/62] avg loss 0.010589, throughput 4.87908K wps
Begin Testing...
[Epoch 44] train avg loss 0.0106727, dev acc 0.6873, dev avg loss 0.574342, throughput 4.9442K wps
[Epoch 45 Batch 30/62] avg loss 0.0104926, throughput 4.99556K wps
[Epoch 45 Batch 60/62] avg loss 0.0102926, throughput 4.89173K wps
Begin Testing...
[Epoch 45] train avg loss 0.0104922, dev acc 0.6932, dev avg loss 0.571351, throughput 4.94999K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.0102199, throughput 4.99168K wps
[Epoch 46 Batch 60/62] avg loss 0.0104261, throughput 4.89015K wps
Begin Testing...
[Epoch 46] train avg loss 0.0104384, dev acc 0.6903, dev avg loss 0.568717, throughput 4.9483K wps
[Epoch 47 Batch 30/62] avg loss 0.00996136, throughput 5.00742K wps
[Epoch 47 Batch 60/62] avg loss 0.0104811, throughput 4.88882K wps
Begin Testing...
[Epoch 47] train avg loss 0.010344, dev acc 0.6844, dev avg loss 0.565502, throughput 4.95316K wps
[Epoch 48 Batch 30/62] avg loss 0.0100671, throughput 5.01022K wps
[Epoch 48 Batch 60/62] avg loss 0.0102907, throughput 4.8777K wps
Begin Testing...
[Epoch 48] train avg loss 0.0102874, dev acc 0.6844, dev avg loss 0.562558, throughput 4.95067K wps
[Epoch 49 Batch 30/62] avg loss 0.00995127, throughput 4.995K wps
[Epoch 49 Batch 60/62] avg loss 0.0102187, throughput 4.87594K wps
Begin Testing...
[Epoch 49] train avg loss 0.0101989, dev acc 0.6873, dev avg loss 0.559735, throughput 4.9421K wps
[Epoch 50 Batch 30/62] avg loss 0.00988469, throughput 5.01075K wps
[Epoch 50 Batch 60/62] avg loss 0.0100785, throughput 4.88166K wps
Begin Testing...
[Epoch 50] train avg loss 0.0100822, dev acc 0.7080, dev avg loss 0.557218, throughput 4.95125K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/62] avg loss 0.0100537, throughput 5.00992K wps
[Epoch 51 Batch 60/62] avg loss 0.00975152, throughput 4.87915K wps
Begin Testing...
[Epoch 51] train avg loss 0.0100153, dev acc 0.6991, dev avg loss 0.554347, throughput 4.95035K wps
[Epoch 52 Batch 30/62] avg loss 0.00962117, throughput 4.99097K wps
[Epoch 52 Batch 60/62] avg loss 0.00988847, throughput 4.89082K wps
Begin Testing...
[Epoch 52] train avg loss 0.00989617, dev acc 0.7021, dev avg loss 0.552154, throughput 4.94788K wps
[Epoch 53 Batch 30/62] avg loss 0.00974638, throughput 4.99187K wps
[Epoch 53 Batch 60/62] avg loss 0.00975773, throughput 4.88332K wps
Begin Testing...
[Epoch 53] train avg loss 0.00990403, dev acc 0.7168, dev avg loss 0.551742, throughput 4.94509K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/62] avg loss 0.00972323, throughput 5.00515K wps
[Epoch 54 Batch 60/62] avg loss 0.00945459, throughput 4.8972K wps
Begin Testing...
[Epoch 54] train avg loss 0.00975277, dev acc 0.7050, dev avg loss 0.547333, throughput 4.95747K wps
[Epoch 55 Batch 30/62] avg loss 0.00958361, throughput 4.98715K wps
[Epoch 55 Batch 60/62] avg loss 0.00951923, throughput 4.89434K wps
Begin Testing...
[Epoch 55] train avg loss 0.00969154, dev acc 0.7198, dev avg loss 0.544959, throughput 4.9469K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00940593, throughput 4.99707K wps
[Epoch 56 Batch 60/62] avg loss 0.00956216, throughput 4.8843K wps
Begin Testing...
[Epoch 56] train avg loss 0.00969287, dev acc 0.7463, dev avg loss 0.54622, throughput 4.94629K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/62] avg loss 0.00933603, throughput 5.00488K wps
[Epoch 57 Batch 60/62] avg loss 0.00939402, throughput 4.89262K wps
Begin Testing...
[Epoch 57] train avg loss 0.00949146, dev acc 0.7227, dev avg loss 0.53817, throughput 4.95666K wps
[Epoch 58 Batch 30/62] avg loss 0.00928123, throughput 4.99157K wps
[Epoch 58 Batch 60/62] avg loss 0.00926794, throughput 4.89175K wps
Begin Testing...
[Epoch 58] train avg loss 0.0093687, dev acc 0.7404, dev avg loss 0.538278, throughput 4.94822K wps
[Epoch 59 Batch 30/62] avg loss 0.00908648, throughput 5.00515K wps
[Epoch 59 Batch 60/62] avg loss 0.00929124, throughput 4.87703K wps
Begin Testing...
[Epoch 59] train avg loss 0.00926156, dev acc 0.7316, dev avg loss 0.533048, throughput 4.94698K wps
[Epoch 60 Batch 30/62] avg loss 0.00911348, throughput 5.00516K wps
[Epoch 60 Batch 60/62] avg loss 0.00918411, throughput 4.87078K wps
Begin Testing...
[Epoch 60] train avg loss 0.00922895, dev acc 0.7168, dev avg loss 0.532113, throughput 4.94422K wps
[Epoch 61 Batch 30/62] avg loss 0.00884016, throughput 4.99462K wps
[Epoch 61 Batch 60/62] avg loss 0.00915611, throughput 4.89293K wps
Begin Testing...
[Epoch 61] train avg loss 0.00915456, dev acc 0.7404, dev avg loss 0.526894, throughput 4.95003K wps
[Epoch 62 Batch 30/62] avg loss 0.00884101, throughput 5.00872K wps
[Epoch 62 Batch 60/62] avg loss 0.00889268, throughput 4.89862K wps
Begin Testing...
[Epoch 62] train avg loss 0.00899941, dev acc 0.7493, dev avg loss 0.524649, throughput 4.96019K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/62] avg loss 0.00895809, throughput 4.99631K wps
[Epoch 63 Batch 60/62] avg loss 0.00877116, throughput 4.89485K wps
Begin Testing...
[Epoch 63] train avg loss 0.00897465, dev acc 0.7522, dev avg loss 0.522511, throughput 4.95277K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/62] avg loss 0.00855772, throughput 4.9936K wps
[Epoch 64 Batch 60/62] avg loss 0.00895303, throughput 4.89493K wps
Begin Testing...
[Epoch 64] train avg loss 0.00887007, dev acc 0.7493, dev avg loss 0.525065, throughput 4.95022K wps
[Epoch 65 Batch 30/62] avg loss 0.00858041, throughput 5.01171K wps
[Epoch 65 Batch 60/62] avg loss 0.00883766, throughput 4.89152K wps
Begin Testing...
[Epoch 65] train avg loss 0.00885289, dev acc 0.7581, dev avg loss 0.519925, throughput 4.95794K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00883271, throughput 5.00869K wps
[Epoch 66 Batch 60/62] avg loss 0.00833547, throughput 4.88232K wps
Begin Testing...
[Epoch 66] train avg loss 0.00878755, dev acc 0.7463, dev avg loss 0.520337, throughput 4.95116K wps
[Epoch 67 Batch 30/62] avg loss 0.00845651, throughput 4.99282K wps
[Epoch 67 Batch 60/62] avg loss 0.00848544, throughput 4.88185K wps
Begin Testing...
[Epoch 67] train avg loss 0.00865492, dev acc 0.7581, dev avg loss 0.515601, throughput 4.94352K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/62] avg loss 0.00830116, throughput 4.99033K wps
[Epoch 68 Batch 60/62] avg loss 0.00841104, throughput 4.8808K wps
Begin Testing...
[Epoch 68] train avg loss 0.00847301, dev acc 0.7581, dev avg loss 0.509285, throughput 4.94221K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/62] avg loss 0.00835026, throughput 4.99634K wps
[Epoch 69 Batch 60/62] avg loss 0.00826854, throughput 4.88326K wps
Begin Testing...
[Epoch 69] train avg loss 0.00844205, dev acc 0.7699, dev avg loss 0.507929, throughput 4.94651K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/62] avg loss 0.00843616, throughput 4.99453K wps
[Epoch 70 Batch 60/62] avg loss 0.00801055, throughput 4.89137K wps
Begin Testing...
[Epoch 70] train avg loss 0.00826564, dev acc 0.7788, dev avg loss 0.505031, throughput 4.9495K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00791569, throughput 4.98997K wps
[Epoch 71 Batch 60/62] avg loss 0.00821511, throughput 4.88846K wps
Begin Testing...
[Epoch 71] train avg loss 0.00818423, dev acc 0.7640, dev avg loss 0.501741, throughput 4.94623K wps
[Epoch 72 Batch 30/62] avg loss 0.00796619, throughput 5.00562K wps
[Epoch 72 Batch 60/62] avg loss 0.00810078, throughput 4.88827K wps
Begin Testing...
[Epoch 72] train avg loss 0.00818017, dev acc 0.7670, dev avg loss 0.499391, throughput 4.95154K wps
[Epoch 73 Batch 30/62] avg loss 0.0079406, throughput 4.98791K wps
[Epoch 73 Batch 60/62] avg loss 0.00801937, throughput 4.88391K wps
Begin Testing...
[Epoch 73] train avg loss 0.00811745, dev acc 0.7640, dev avg loss 0.499393, throughput 4.94227K wps
[Epoch 74 Batch 30/62] avg loss 0.00769631, throughput 5.00366K wps
[Epoch 74 Batch 60/62] avg loss 0.00800732, throughput 4.88173K wps
Begin Testing...
[Epoch 74] train avg loss 0.00794211, dev acc 0.7788, dev avg loss 0.494791, throughput 4.95067K wps
Observed Improvement.