Namespace(batch_size=50, data_name='MPQA', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 36
Done! Tokenizing Time=0.05s, #Sentences=10606
SentimentNet(
  (embedding): Embedding(6250 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/172] avg loss 0.0127233, throughput 0.568037K wps
[Epoch 0 Batch 60/172] avg loss 0.0121942, throughput 2.98018K wps
[Epoch 0 Batch 90/172] avg loss 0.0123264, throughput 2.95468K wps
[Epoch 0 Batch 120/172] avg loss 0.0124115, throughput 2.99725K wps
[Epoch 0 Batch 150/172] avg loss 0.0125816, throughput 3.46619K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124621, dev acc 0.7013, dev avg loss 0.596264, throughput 1.26098K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0121442, throughput 3.26167K wps
[Epoch 1 Batch 60/172] avg loss 0.012264, throughput 2.69865K wps
[Epoch 1 Batch 90/172] avg loss 0.0121966, throughput 3.11976K wps
[Epoch 1 Batch 120/172] avg loss 0.0118879, throughput 3.532K wps
[Epoch 1 Batch 150/172] avg loss 0.0119789, throughput 3.37953K wps
Begin Testing...
[Epoch 1] train avg loss 0.0121223, dev acc 0.7013, dev avg loss 0.584246, throughput 3.22058K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0120123, throughput 3.4221K wps
[Epoch 2 Batch 60/172] avg loss 0.0119558, throughput 3.24615K wps
[Epoch 2 Batch 90/172] avg loss 0.0116812, throughput 3.34826K wps
[Epoch 2 Batch 120/172] avg loss 0.0118478, throughput 3.29778K wps
[Epoch 2 Batch 150/172] avg loss 0.0117791, throughput 3.27964K wps
Begin Testing...
[Epoch 2] train avg loss 0.0118672, dev acc 0.7013, dev avg loss 0.572381, throughput 3.30119K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0114487, throughput 3.71262K wps
[Epoch 3 Batch 60/172] avg loss 0.0118789, throughput 2.97893K wps
[Epoch 3 Batch 90/172] avg loss 0.0114672, throughput 3.22584K wps
[Epoch 3 Batch 120/172] avg loss 0.0112549, throughput 3.04536K wps
[Epoch 3 Batch 150/172] avg loss 0.0117584, throughput 4.03722K wps
Begin Testing...
[Epoch 3] train avg loss 0.0115544, dev acc 0.7013, dev avg loss 0.558716, throughput 3.42615K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0114478, throughput 4.16634K wps
[Epoch 4 Batch 60/172] avg loss 0.011327, throughput 3.13915K wps
[Epoch 4 Batch 90/172] avg loss 0.0111751, throughput 3.45599K wps
[Epoch 4 Batch 120/172] avg loss 0.0109726, throughput 3.40918K wps
[Epoch 4 Batch 150/172] avg loss 0.0112832, throughput 3.06019K wps
Begin Testing...
[Epoch 4] train avg loss 0.0112317, dev acc 0.7044, dev avg loss 0.541566, throughput 3.3932K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0107178, throughput 3.03481K wps
[Epoch 5 Batch 60/172] avg loss 0.0110696, throughput 3.33332K wps
[Epoch 5 Batch 90/172] avg loss 0.0106704, throughput 3.70407K wps
[Epoch 5 Batch 120/172] avg loss 0.0108208, throughput 3.08342K wps
[Epoch 5 Batch 150/172] avg loss 0.0107542, throughput 3.15044K wps
Begin Testing...
[Epoch 5] train avg loss 0.010817, dev acc 0.7275, dev avg loss 0.523032, throughput 3.27562K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0108376, throughput 3.22603K wps
[Epoch 6 Batch 60/172] avg loss 0.0103796, throughput 3.03746K wps
[Epoch 6 Batch 90/172] avg loss 0.0104502, throughput 3.28056K wps
[Epoch 6 Batch 120/172] avg loss 0.0103761, throughput 3.83081K wps
[Epoch 6 Batch 150/172] avg loss 0.0102663, throughput 3.35017K wps
Begin Testing...
[Epoch 6] train avg loss 0.0104146, dev acc 0.7505, dev avg loss 0.501728, throughput 3.36225K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0101223, throughput 3.2793K wps
[Epoch 7 Batch 60/172] avg loss 0.0103292, throughput 2.95991K wps
[Epoch 7 Batch 90/172] avg loss 0.00996908, throughput 3.91696K wps
[Epoch 7 Batch 120/172] avg loss 0.010045, throughput 3.23376K wps
[Epoch 7 Batch 150/172] avg loss 0.0095717, throughput 3.23186K wps
Begin Testing...
[Epoch 7] train avg loss 0.00996753, dev acc 0.7778, dev avg loss 0.479924, throughput 3.36584K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00963984, throughput 3.28301K wps
[Epoch 8 Batch 60/172] avg loss 0.00939315, throughput 3.40667K wps
[Epoch 8 Batch 90/172] avg loss 0.00955535, throughput 3.0485K wps
[Epoch 8 Batch 120/172] avg loss 0.00934517, throughput 3.03151K wps
[Epoch 8 Batch 150/172] avg loss 0.00935906, throughput 3.14406K wps
Begin Testing...
[Epoch 8] train avg loss 0.00946479, dev acc 0.8082, dev avg loss 0.457284, throughput 3.23204K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00901025, throughput 3.32432K wps
[Epoch 9 Batch 60/172] avg loss 0.00913873, throughput 3.87368K wps
[Epoch 9 Batch 90/172] avg loss 0.00887666, throughput 3.24743K wps
[Epoch 9 Batch 120/172] avg loss 0.00885105, throughput 3.43844K wps
[Epoch 9 Batch 150/172] avg loss 0.00897269, throughput 3.24354K wps
Begin Testing...
[Epoch 9] train avg loss 0.00898854, dev acc 0.8281, dev avg loss 0.436572, throughput 3.3963K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00858826, throughput 3.79686K wps
[Epoch 10 Batch 60/172] avg loss 0.00855101, throughput 3.32147K wps
[Epoch 10 Batch 90/172] avg loss 0.0084636, throughput 3.32589K wps
[Epoch 10 Batch 120/172] avg loss 0.00856535, throughput 3.37608K wps
[Epoch 10 Batch 150/172] avg loss 0.00849448, throughput 3.75786K wps
Begin Testing...
[Epoch 10] train avg loss 0.00854714, dev acc 0.8407, dev avg loss 0.416918, throughput 3.46747K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00854495, throughput 3.6082K wps
[Epoch 11 Batch 60/172] avg loss 0.00825646, throughput 3.35882K wps
[Epoch 11 Batch 90/172] avg loss 0.00789342, throughput 3.08015K wps
[Epoch 11 Batch 120/172] avg loss 0.00827371, throughput 3.23117K wps
[Epoch 11 Batch 150/172] avg loss 0.00801883, throughput 3.15959K wps
Begin Testing...
[Epoch 11] train avg loss 0.00814192, dev acc 0.8428, dev avg loss 0.399477, throughput 3.29884K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00772041, throughput 2.91767K wps
[Epoch 12 Batch 60/172] avg loss 0.00782335, throughput 3.14053K wps
[Epoch 12 Batch 90/172] avg loss 0.00795845, throughput 3.18763K wps
[Epoch 12 Batch 120/172] avg loss 0.0076794, throughput 3.47449K wps
[Epoch 12 Batch 150/172] avg loss 0.00779173, throughput 3.07137K wps
Begin Testing...
[Epoch 12] train avg loss 0.00781458, dev acc 0.8532, dev avg loss 0.385584, throughput 3.14814K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.0073061, throughput 2.91638K wps
[Epoch 13 Batch 60/172] avg loss 0.00759952, throughput 3.15092K wps
[Epoch 13 Batch 90/172] avg loss 0.00752232, throughput 3.07213K wps
[Epoch 13 Batch 120/172] avg loss 0.00746355, throughput 3.1516K wps
[Epoch 13 Batch 150/172] avg loss 0.00745688, throughput 3.49559K wps
Begin Testing...
[Epoch 13] train avg loss 0.00749335, dev acc 0.8564, dev avg loss 0.372432, throughput 3.11167K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00725985, throughput 3.05221K wps
[Epoch 14 Batch 60/172] avg loss 0.00751878, throughput 2.86987K wps
[Epoch 14 Batch 90/172] avg loss 0.00732755, throughput 3.10717K wps
[Epoch 14 Batch 120/172] avg loss 0.00720006, throughput 2.97359K wps
[Epoch 14 Batch 150/172] avg loss 0.00719098, throughput 2.91866K wps
Begin Testing...
[Epoch 14] train avg loss 0.00728736, dev acc 0.8532, dev avg loss 0.361486, throughput 2.9818K wps
[Epoch 15 Batch 30/172] avg loss 0.00717271, throughput 3.14917K wps
[Epoch 15 Batch 60/172] avg loss 0.00708512, throughput 3.30843K wps
[Epoch 15 Batch 90/172] avg loss 0.00707693, throughput 3.10045K wps
[Epoch 15 Batch 120/172] avg loss 0.00705599, throughput 3.09707K wps
[Epoch 15 Batch 150/172] avg loss 0.00694422, throughput 3.22122K wps
Begin Testing...
[Epoch 15] train avg loss 0.00706927, dev acc 0.8616, dev avg loss 0.352945, throughput 3.14684K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.0073563, throughput 3.28204K wps
[Epoch 16 Batch 60/172] avg loss 0.00702837, throughput 3.33905K wps
[Epoch 16 Batch 90/172] avg loss 0.00677221, throughput 3.30064K wps
[Epoch 16 Batch 120/172] avg loss 0.00667035, throughput 3.20914K wps
[Epoch 16 Batch 150/172] avg loss 0.00669526, throughput 3.80875K wps
Begin Testing...
[Epoch 16] train avg loss 0.0069187, dev acc 0.8606, dev avg loss 0.346213, throughput 3.34159K wps
[Epoch 17 Batch 30/172] avg loss 0.00701887, throughput 3.49612K wps
[Epoch 17 Batch 60/172] avg loss 0.00664853, throughput 3.04102K wps
[Epoch 17 Batch 90/172] avg loss 0.00655937, throughput 3.15781K wps
[Epoch 17 Batch 120/172] avg loss 0.00673822, throughput 3.35695K wps
[Epoch 17 Batch 150/172] avg loss 0.00703093, throughput 3.74515K wps
Begin Testing...
[Epoch 17] train avg loss 0.00673067, dev acc 0.8658, dev avg loss 0.33979, throughput 3.35854K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00668021, throughput 3.57295K wps
[Epoch 18 Batch 60/172] avg loss 0.00647215, throughput 2.99785K wps
[Epoch 18 Batch 90/172] avg loss 0.00637071, throughput 3.55826K wps
[Epoch 18 Batch 120/172] avg loss 0.00660224, throughput 3.21775K wps
[Epoch 18 Batch 150/172] avg loss 0.00641824, throughput 3.01983K wps
Begin Testing...
[Epoch 18] train avg loss 0.00655544, dev acc 0.8690, dev avg loss 0.334831, throughput 3.32123K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00653397, throughput 3.37539K wps
[Epoch 19 Batch 60/172] avg loss 0.00623574, throughput 4.00235K wps
[Epoch 19 Batch 90/172] avg loss 0.00662951, throughput 3.14147K wps
[Epoch 19 Batch 120/172] avg loss 0.00641958, throughput 3.21954K wps
[Epoch 19 Batch 150/172] avg loss 0.00660851, throughput 3.06596K wps
Begin Testing...
[Epoch 19] train avg loss 0.00646775, dev acc 0.8690, dev avg loss 0.33043, throughput 3.28406K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00628485, throughput 2.9268K wps
[Epoch 20 Batch 60/172] avg loss 0.00624464, throughput 3.37136K wps
[Epoch 20 Batch 90/172] avg loss 0.00658069, throughput 3.16108K wps
[Epoch 20 Batch 120/172] avg loss 0.00647636, throughput 2.99797K wps
[Epoch 20 Batch 150/172] avg loss 0.00615344, throughput 2.94709K wps
Begin Testing...
[Epoch 20] train avg loss 0.00632612, dev acc 0.8669, dev avg loss 0.327034, throughput 3.07544K wps
[Epoch 21 Batch 30/172] avg loss 0.00590377, throughput 3.39111K wps
[Epoch 21 Batch 60/172] avg loss 0.00649392, throughput 3.23956K wps
[Epoch 21 Batch 90/172] avg loss 0.00659017, throughput 3.09683K wps
[Epoch 21 Batch 120/172] avg loss 0.00608674, throughput 3.06675K wps
[Epoch 21 Batch 150/172] avg loss 0.00628865, throughput 3.36194K wps
Begin Testing...
[Epoch 21] train avg loss 0.00630336, dev acc 0.8732, dev avg loss 0.324129, throughput 3.2359K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00583023, throughput 3.40699K wps
[Epoch 22 Batch 60/172] avg loss 0.00620989, throughput 3.17302K wps
[Epoch 22 Batch 90/172] avg loss 0.00630623, throughput 3.55451K wps
[Epoch 22 Batch 120/172] avg loss 0.00648819, throughput 3.22982K wps
[Epoch 22 Batch 150/172] avg loss 0.00643698, throughput 3.29356K wps
Begin Testing...
[Epoch 22] train avg loss 0.00624489, dev acc 0.8711, dev avg loss 0.320877, throughput 3.39095K wps
[Epoch 23 Batch 30/172] avg loss 0.00601718, throughput 3.28153K wps
[Epoch 23 Batch 60/172] avg loss 0.00616177, throughput 3.10622K wps
[Epoch 23 Batch 90/172] avg loss 0.00635835, throughput 2.97047K wps
[Epoch 23 Batch 120/172] avg loss 0.00616207, throughput 3.17878K wps
[Epoch 23 Batch 150/172] avg loss 0.0058397, throughput 3.63953K wps
Begin Testing...
[Epoch 23] train avg loss 0.00615918, dev acc 0.8763, dev avg loss 0.319247, throughput 3.23414K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00601478, throughput 3.14921K wps
[Epoch 24 Batch 60/172] avg loss 0.00611081, throughput 3.74394K wps
[Epoch 24 Batch 90/172] avg loss 0.00608245, throughput 3.33662K wps
[Epoch 24 Batch 120/172] avg loss 0.00578948, throughput 3.48627K wps
[Epoch 24 Batch 150/172] avg loss 0.0061752, throughput 2.90924K wps
Begin Testing...
[Epoch 24] train avg loss 0.0060603, dev acc 0.8711, dev avg loss 0.31629, throughput 3.24154K wps
[Epoch 25 Batch 30/172] avg loss 0.00613641, throughput 3.13829K wps
[Epoch 25 Batch 60/172] avg loss 0.00597926, throughput 3.12878K wps
[Epoch 25 Batch 90/172] avg loss 0.00594114, throughput 3.08405K wps
[Epoch 25 Batch 120/172] avg loss 0.00578393, throughput 3.41234K wps
[Epoch 25 Batch 150/172] avg loss 0.00638086, throughput 3.45158K wps
Begin Testing...
[Epoch 25] train avg loss 0.00595594, dev acc 0.8753, dev avg loss 0.314189, throughput 3.20689K wps
[Epoch 26 Batch 30/172] avg loss 0.00558838, throughput 3.20089K wps
[Epoch 26 Batch 60/172] avg loss 0.00618518, throughput 2.95103K wps
[Epoch 26 Batch 90/172] avg loss 0.00588484, throughput 3.56916K wps
[Epoch 26 Batch 120/172] avg loss 0.00589363, throughput 3.19684K wps
[Epoch 26 Batch 150/172] avg loss 0.00612226, throughput 3.49191K wps
Begin Testing...
[Epoch 26] train avg loss 0.0059725, dev acc 0.8774, dev avg loss 0.312784, throughput 3.31793K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.00588888, throughput 2.93619K wps
[Epoch 27 Batch 60/172] avg loss 0.0061345, throughput 2.96873K wps
[Epoch 27 Batch 90/172] avg loss 0.00566765, throughput 3.16559K wps
[Epoch 27 Batch 120/172] avg loss 0.00562864, throughput 3.14564K wps
[Epoch 27 Batch 150/172] avg loss 0.00625058, throughput 3.32899K wps
Begin Testing...
[Epoch 27] train avg loss 0.00585134, dev acc 0.8805, dev avg loss 0.310932, throughput 3.12738K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/172] avg loss 0.00597622, throughput 3.16849K wps
[Epoch 28 Batch 60/172] avg loss 0.00554544, throughput 3.25166K wps
[Epoch 28 Batch 90/172] avg loss 0.0057941, throughput 3.51676K wps
[Epoch 28 Batch 120/172] avg loss 0.00598347, throughput 2.96205K wps
[Epoch 28 Batch 150/172] avg loss 0.00566894, throughput 3.06385K wps
Begin Testing...
[Epoch 28] train avg loss 0.00580601, dev acc 0.8795, dev avg loss 0.309405, throughput 3.21466K wps
[Epoch 29 Batch 30/172] avg loss 0.00570336, throughput 3.23856K wps
[Epoch 29 Batch 60/172] avg loss 0.00526966, throughput 3.54845K wps
[Epoch 29 Batch 90/172] avg loss 0.00631996, throughput 3.01974K wps
[Epoch 29 Batch 120/172] avg loss 0.00588606, throughput 3.32608K wps
[Epoch 29 Batch 150/172] avg loss 0.00575599, throughput 3.08267K wps
Begin Testing...
[Epoch 29] train avg loss 0.00577465, dev acc 0.8774, dev avg loss 0.309248, throughput 3.23398K wps
[Epoch 30 Batch 30/172] avg loss 0.00639365, throughput 3.08892K wps
[Epoch 30 Batch 60/172] avg loss 0.00511487, throughput 3.47182K wps
[Epoch 30 Batch 90/172] avg loss 0.005301, throughput 3.36181K wps
[Epoch 30 Batch 120/172] avg loss 0.00584884, throughput 3.20703K wps
[Epoch 30 Batch 150/172] avg loss 0.00586244, throughput 2.91843K wps
Begin Testing...
[Epoch 30] train avg loss 0.005688, dev acc 0.8774, dev avg loss 0.30662, throughput 3.1546K wps
[Epoch 31 Batch 30/172] avg loss 0.00589364, throughput 3.2569K wps
[Epoch 31 Batch 60/172] avg loss 0.00540048, throughput 3.22285K wps
[Epoch 31 Batch 90/172] avg loss 0.00580304, throughput 2.82822K wps
[Epoch 31 Batch 120/172] avg loss 0.00558404, throughput 2.87527K wps
[Epoch 31 Batch 150/172] avg loss 0.00554986, throughput 3.4867K wps
Begin Testing...
[Epoch 31] train avg loss 0.00574556, dev acc 0.8784, dev avg loss 0.306724, throughput 3.08224K wps
[Epoch 32 Batch 30/172] avg loss 0.00511018, throughput 3.13811K wps
[Epoch 32 Batch 60/172] avg loss 0.00568895, throughput 3.68366K wps
[Epoch 32 Batch 90/172] avg loss 0.00583557, throughput 4.04503K wps
[Epoch 32 Batch 120/172] avg loss 0.00575624, throughput 2.9532K wps
[Epoch 32 Batch 150/172] avg loss 0.00588778, throughput 3.31346K wps
Begin Testing...
[Epoch 32] train avg loss 0.00559614, dev acc 0.8774, dev avg loss 0.306169, throughput 3.43838K wps
[Epoch 33 Batch 30/172] avg loss 0.00527906, throughput 3.13354K wps
[Epoch 33 Batch 60/172] avg loss 0.00543822, throughput 2.9945K wps
[Epoch 33 Batch 90/172] avg loss 0.00558288, throughput 3.49092K wps
[Epoch 33 Batch 120/172] avg loss 0.00522591, throughput 3.16752K wps
[Epoch 33 Batch 150/172] avg loss 0.00596157, throughput 2.99157K wps
Begin Testing...
[Epoch 33] train avg loss 0.00553968, dev acc 0.8805, dev avg loss 0.303638, throughput 3.13875K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/172] avg loss 0.00582964, throughput 3.38732K wps
[Epoch 34 Batch 60/172] avg loss 0.00584805, throughput 3.47832K wps
[Epoch 34 Batch 90/172] avg loss 0.00538751, throughput 3.11784K wps
[Epoch 34 Batch 120/172] avg loss 0.00521102, throughput 3.33727K wps
[Epoch 34 Batch 150/172] avg loss 0.00556222, throughput 3.47843K wps
Begin Testing...
[Epoch 34] train avg loss 0.00550054, dev acc 0.8784, dev avg loss 0.303678, throughput 3.37165K wps
[Epoch 35 Batch 30/172] avg loss 0.00525401, throughput 3.09261K wps
[Epoch 35 Batch 60/172] avg loss 0.0053626, throughput 3.05743K wps
[Epoch 35 Batch 90/172] avg loss 0.00552438, throughput 3.28116K wps
[Epoch 35 Batch 120/172] avg loss 0.00592613, throughput 3.27532K wps
[Epoch 35 Batch 150/172] avg loss 0.00517675, throughput 3.20705K wps
Begin Testing...
[Epoch 35] train avg loss 0.0054771, dev acc 0.8826, dev avg loss 0.301958, throughput 3.15174K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/172] avg loss 0.00531488, throughput 3.35658K wps
[Epoch 36 Batch 60/172] avg loss 0.00538417, throughput 3.2213K wps
[Epoch 36 Batch 90/172] avg loss 0.0053467, throughput 3.20527K wps
[Epoch 36 Batch 120/172] avg loss 0.00557561, throughput 3.52871K wps
[Epoch 36 Batch 150/172] avg loss 0.00556207, throughput 2.98268K wps
Begin Testing...
[Epoch 36] train avg loss 0.00545444, dev acc 0.8826, dev avg loss 0.300951, throughput 3.26492K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/172] avg loss 0.00530603, throughput 3.14862K wps
[Epoch 37 Batch 60/172] avg loss 0.0052495, throughput 3.44951K wps
[Epoch 37 Batch 90/172] avg loss 0.00539426, throughput 2.97872K wps
[Epoch 37 Batch 120/172] avg loss 0.00537585, throughput 3.26534K wps
[Epoch 37 Batch 150/172] avg loss 0.00530836, throughput 3.3975K wps
Begin Testing...
[Epoch 37] train avg loss 0.00540317, dev acc 0.8826, dev avg loss 0.300212, throughput 3.26054K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/172] avg loss 0.00573187, throughput 3.42024K wps
[Epoch 38 Batch 60/172] avg loss 0.0051257, throughput 3.82002K wps
[Epoch 38 Batch 90/172] avg loss 0.00545522, throughput 3.01604K wps
[Epoch 38 Batch 120/172] avg loss 0.00514336, throughput 3.04805K wps
[Epoch 38 Batch 150/172] avg loss 0.00562819, throughput 3.44746K wps
Begin Testing...
[Epoch 38] train avg loss 0.00539816, dev acc 0.8847, dev avg loss 0.29965, throughput 3.32341K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.00549327, throughput 3.07794K wps
[Epoch 39 Batch 60/172] avg loss 0.00512159, throughput 2.89641K wps
[Epoch 39 Batch 90/172] avg loss 0.00558526, throughput 2.96524K wps
[Epoch 39 Batch 120/172] avg loss 0.00545365, throughput 3.23778K wps
[Epoch 39 Batch 150/172] avg loss 0.00551086, throughput 3.39843K wps
Begin Testing...
[Epoch 39] train avg loss 0.00540512, dev acc 0.8847, dev avg loss 0.299635, throughput 3.11265K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/172] avg loss 0.00577654, throughput 3.10277K wps
[Epoch 40 Batch 60/172] avg loss 0.00520822, throughput 3.07245K wps
[Epoch 40 Batch 90/172] avg loss 0.00527059, throughput 3.45958K wps
[Epoch 40 Batch 120/172] avg loss 0.0053301, throughput 3.66469K wps
[Epoch 40 Batch 150/172] avg loss 0.00468038, throughput 3.24265K wps
Begin Testing...
[Epoch 40] train avg loss 0.00529221, dev acc 0.8816, dev avg loss 0.298998, throughput 3.24453K wps
[Epoch 41 Batch 30/172] avg loss 0.00554737, throughput 3.2319K wps
[Epoch 41 Batch 60/172] avg loss 0.00516385, throughput 3.04078K wps
[Epoch 41 Batch 90/172] avg loss 0.0053642, throughput 3.05962K wps
[Epoch 41 Batch 120/172] avg loss 0.00534818, throughput 3.10069K wps
[Epoch 41 Batch 150/172] avg loss 0.0055899, throughput 3.78193K wps
Begin Testing...
[Epoch 41] train avg loss 0.00531936, dev acc 0.8868, dev avg loss 0.297648, throughput 3.23141K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.004872, throughput 2.95751K wps
[Epoch 42 Batch 60/172] avg loss 0.00528704, throughput 2.92965K wps
[Epoch 42 Batch 90/172] avg loss 0.00550098, throughput 3.20139K wps
[Epoch 42 Batch 120/172] avg loss 0.00525231, throughput 3.51374K wps
[Epoch 42 Batch 150/172] avg loss 0.00525616, throughput 3.20584K wps
Begin Testing...
[Epoch 42] train avg loss 0.00523914, dev acc 0.8857, dev avg loss 0.297303, throughput 3.17303K wps
[Epoch 43 Batch 30/172] avg loss 0.00516612, throughput 3.1028K wps
[Epoch 43 Batch 60/172] avg loss 0.00515936, throughput 3.25575K wps
[Epoch 43 Batch 90/172] avg loss 0.0050249, throughput 3.46283K wps
[Epoch 43 Batch 120/172] avg loss 0.00502948, throughput 3.54589K wps
[Epoch 43 Batch 150/172] avg loss 0.00544026, throughput 2.87783K wps
Begin Testing...
[Epoch 43] train avg loss 0.00522373, dev acc 0.8857, dev avg loss 0.296956, throughput 3.24518K wps
[Epoch 44 Batch 30/172] avg loss 0.00535413, throughput 3.58512K wps
[Epoch 44 Batch 60/172] avg loss 0.0053574, throughput 3.42889K wps
[Epoch 44 Batch 90/172] avg loss 0.00519385, throughput 3.49666K wps
[Epoch 44 Batch 120/172] avg loss 0.00518586, throughput 3.14486K wps
[Epoch 44 Batch 150/172] avg loss 0.00509441, throughput 3.55502K wps
Begin Testing...
[Epoch 44] train avg loss 0.00521169, dev acc 0.8868, dev avg loss 0.296235, throughput 3.35357K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00526172, throughput 3.15473K wps
[Epoch 45 Batch 60/172] avg loss 0.00467022, throughput 3.43089K wps
[Epoch 45 Batch 90/172] avg loss 0.00565718, throughput 3.64352K wps
[Epoch 45 Batch 120/172] avg loss 0.00498134, throughput 2.95058K wps
[Epoch 45 Batch 150/172] avg loss 0.00494585, throughput 3.29887K wps
Begin Testing...
[Epoch 45] train avg loss 0.00518641, dev acc 0.8868, dev avg loss 0.295661, throughput 3.24048K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/172] avg loss 0.00503835, throughput 3.65936K wps
[Epoch 46 Batch 60/172] avg loss 0.00531913, throughput 2.99864K wps
[Epoch 46 Batch 90/172] avg loss 0.0049571, throughput 3.21921K wps
[Epoch 46 Batch 120/172] avg loss 0.00554236, throughput 2.87964K wps
[Epoch 46 Batch 150/172] avg loss 0.00511143, throughput 3.04718K wps
Begin Testing...
[Epoch 46] train avg loss 0.00512866, dev acc 0.8868, dev avg loss 0.29544, throughput 3.13827K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/172] avg loss 0.00495965, throughput 3.29231K wps
[Epoch 47 Batch 60/172] avg loss 0.00525356, throughput 3.02094K wps
[Epoch 47 Batch 90/172] avg loss 0.00476325, throughput 3.1279K wps
[Epoch 47 Batch 120/172] avg loss 0.00516842, throughput 2.97751K wps
[Epoch 47 Batch 150/172] avg loss 0.00520985, throughput 2.98171K wps
Begin Testing...
[Epoch 47] train avg loss 0.00507658, dev acc 0.8889, dev avg loss 0.295646, throughput 3.08209K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/172] avg loss 0.00514, throughput 3.24551K wps
[Epoch 48 Batch 60/172] avg loss 0.00493684, throughput 3.35148K wps
[Epoch 48 Batch 90/172] avg loss 0.00551604, throughput 3.13105K wps
[Epoch 48 Batch 120/172] avg loss 0.00512679, throughput 2.96864K wps
[Epoch 48 Batch 150/172] avg loss 0.00525897, throughput 3.02643K wps
Begin Testing...
[Epoch 48] train avg loss 0.00507714, dev acc 0.8889, dev avg loss 0.294503, throughput 3.13643K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/172] avg loss 0.00517628, throughput 3.06589K wps
[Epoch 49 Batch 60/172] avg loss 0.00508764, throughput 3.08049K wps
[Epoch 49 Batch 90/172] avg loss 0.00494481, throughput 3.30501K wps
[Epoch 49 Batch 120/172] avg loss 0.00520737, throughput 3.33165K wps
[Epoch 49 Batch 150/172] avg loss 0.00502998, throughput 3.24977K wps
Begin Testing...
[Epoch 49] train avg loss 0.00506518, dev acc 0.8910, dev avg loss 0.293874, throughput 3.24475K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/172] avg loss 0.00494742, throughput 3.08114K wps
[Epoch 50 Batch 60/172] avg loss 0.00475817, throughput 3.12007K wps
[Epoch 50 Batch 90/172] avg loss 0.00480093, throughput 3.31216K wps
[Epoch 50 Batch 120/172] avg loss 0.00533553, throughput 3.47328K wps
[Epoch 50 Batch 150/172] avg loss 0.00486297, throughput 3.05581K wps
Begin Testing...
[Epoch 50] train avg loss 0.00496461, dev acc 0.8899, dev avg loss 0.29404, throughput 3.27918K wps
[Epoch 51 Batch 30/172] avg loss 0.00470829, throughput 3.21309K wps
[Epoch 51 Batch 60/172] avg loss 0.00503592, throughput 3.18584K wps
[Epoch 51 Batch 90/172] avg loss 0.00516877, throughput 3.41846K wps
[Epoch 51 Batch 120/172] avg loss 0.00467271, throughput 3.54006K wps
[Epoch 51 Batch 150/172] avg loss 0.00499618, throughput 2.97423K wps
Begin Testing...
[Epoch 51] train avg loss 0.00496639, dev acc 0.8910, dev avg loss 0.293587, throughput 3.21372K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/172] avg loss 0.00484746, throughput 2.93354K wps
[Epoch 52 Batch 60/172] avg loss 0.00507291, throughput 3.21386K wps
[Epoch 52 Batch 90/172] avg loss 0.00494545, throughput 3.04767K wps
[Epoch 52 Batch 120/172] avg loss 0.00465334, throughput 3.78097K wps
[Epoch 52 Batch 150/172] avg loss 0.00518025, throughput 3.6499K wps
Begin Testing...
[Epoch 52] train avg loss 0.00492187, dev acc 0.8910, dev avg loss 0.293009, throughput 3.32365K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/172] avg loss 0.00509023, throughput 3.23183K wps
[Epoch 53 Batch 60/172] avg loss 0.00483509, throughput 2.8955K wps
[Epoch 53 Batch 90/172] avg loss 0.00480007, throughput 3.28125K wps
[Epoch 53 Batch 120/172] avg loss 0.0049985, throughput 2.86994K wps
[Epoch 53 Batch 150/172] avg loss 0.00478899, throughput 3.07782K wps
Begin Testing...
[Epoch 53] train avg loss 0.00490441, dev acc 0.8910, dev avg loss 0.29256, throughput 3.04124K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/172] avg loss 0.00517714, throughput 3.20786K wps
[Epoch 54 Batch 60/172] avg loss 0.00515157, throughput 3.27236K wps
[Epoch 54 Batch 90/172] avg loss 0.00517511, throughput 3.23121K wps
[Epoch 54 Batch 120/172] avg loss 0.00457853, throughput 4.00046K wps
[Epoch 54 Batch 150/172] avg loss 0.00454756, throughput 3.04633K wps
Begin Testing...
[Epoch 54] train avg loss 0.00489287, dev acc 0.8910, dev avg loss 0.292693, throughput 3.32072K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00439213, throughput 3.31615K wps
[Epoch 55 Batch 60/172] avg loss 0.00512817, throughput 3.07884K wps
[Epoch 55 Batch 90/172] avg loss 0.00527859, throughput 3.35551K wps
[Epoch 55 Batch 120/172] avg loss 0.00479906, throughput 3.27775K wps
[Epoch 55 Batch 150/172] avg loss 0.00476784, throughput 3.29372K wps
Begin Testing...
[Epoch 55] train avg loss 0.00489331, dev acc 0.8910, dev avg loss 0.291785, throughput 3.31206K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/172] avg loss 0.00542508, throughput 3.06577K wps
[Epoch 56 Batch 60/172] avg loss 0.00495694, throughput 2.9792K wps
[Epoch 56 Batch 90/172] avg loss 0.00434699, throughput 3.34748K wps
[Epoch 56 Batch 120/172] avg loss 0.00487334, throughput 3.2014K wps
[Epoch 56 Batch 150/172] avg loss 0.00481519, throughput 3.02683K wps
Begin Testing...
[Epoch 56] train avg loss 0.0048543, dev acc 0.8910, dev avg loss 0.291747, throughput 3.1554K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/172] avg loss 0.00469274, throughput 3.2535K wps
[Epoch 57 Batch 60/172] avg loss 0.00476396, throughput 3.62452K wps
[Epoch 57 Batch 90/172] avg loss 0.00489662, throughput 2.95566K wps
[Epoch 57 Batch 120/172] avg loss 0.00452628, throughput 2.98472K wps
[Epoch 57 Batch 150/172] avg loss 0.00491915, throughput 3.56336K wps
Begin Testing...
[Epoch 57] train avg loss 0.00476486, dev acc 0.8899, dev avg loss 0.291172, throughput 3.22231K wps
[Epoch 58 Batch 30/172] avg loss 0.00464837, throughput 3.15644K wps
[Epoch 58 Batch 60/172] avg loss 0.00486665, throughput 2.89863K wps
[Epoch 58 Batch 90/172] avg loss 0.00473555, throughput 3.24299K wps
[Epoch 58 Batch 120/172] avg loss 0.00464807, throughput 3.36522K wps
[Epoch 58 Batch 150/172] avg loss 0.00466565, throughput 3.44119K wps
Begin Testing...
[Epoch 58] train avg loss 0.00479479, dev acc 0.8952, dev avg loss 0.290878, throughput 3.24017K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/172] avg loss 0.00502225, throughput 3.21676K wps
[Epoch 59 Batch 60/172] avg loss 0.00460453, throughput 3.14082K wps
[Epoch 59 Batch 90/172] avg loss 0.00458289, throughput 3.061K wps
[Epoch 59 Batch 120/172] avg loss 0.00476421, throughput 3.26185K wps
[Epoch 59 Batch 150/172] avg loss 0.00481015, throughput 3.08927K wps
Begin Testing...
[Epoch 59] train avg loss 0.00473086, dev acc 0.8920, dev avg loss 0.290838, throughput 3.19663K wps
[Epoch 60 Batch 30/172] avg loss 0.00492577, throughput 3.52464K wps
[Epoch 60 Batch 60/172] avg loss 0.00467811, throughput 3.40876K wps
[Epoch 60 Batch 90/172] avg loss 0.00475258, throughput 3.63952K wps
[Epoch 60 Batch 120/172] avg loss 0.00426243, throughput 3.57549K wps
[Epoch 60 Batch 150/172] avg loss 0.00491706, throughput 3.11457K wps
Begin Testing...
[Epoch 60] train avg loss 0.00474271, dev acc 0.8920, dev avg loss 0.290655, throughput 3.47814K wps
[Epoch 61 Batch 30/172] avg loss 0.00465941, throughput 3.04142K wps
[Epoch 61 Batch 60/172] avg loss 0.00456225, throughput 2.9139K wps
[Epoch 61 Batch 90/172] avg loss 0.00472556, throughput 3.42994K wps
[Epoch 61 Batch 120/172] avg loss 0.00480213, throughput 3.25182K wps
[Epoch 61 Batch 150/172] avg loss 0.00491107, throughput 3.12986K wps
Begin Testing...
[Epoch 61] train avg loss 0.00471998, dev acc 0.8920, dev avg loss 0.291079, throughput 3.18471K wps
[Epoch 62 Batch 30/172] avg loss 0.00475273, throughput 2.96109K wps
[Epoch 62 Batch 60/172] avg loss 0.00454693, throughput 3.17198K wps
[Epoch 62 Batch 90/172] avg loss 0.00505831, throughput 3.38328K wps
[Epoch 62 Batch 120/172] avg loss 0.00446228, throughput 3.59412K wps
[Epoch 62 Batch 150/172] avg loss 0.00499186, throughput 3.55837K wps
Begin Testing...
[Epoch 62] train avg loss 0.00473727, dev acc 0.8920, dev avg loss 0.290198, throughput 3.30777K wps
[Epoch 63 Batch 30/172] avg loss 0.00444945, throughput 3.53873K wps
[Epoch 63 Batch 60/172] avg loss 0.00516302, throughput 3.69598K wps
[Epoch 63 Batch 90/172] avg loss 0.00446296, throughput 3.27949K wps
[Epoch 63 Batch 120/172] avg loss 0.0048082, throughput 3.40425K wps
[Epoch 63 Batch 150/172] avg loss 0.00470815, throughput 3.30544K wps
Begin Testing...
[Epoch 63] train avg loss 0.0046694, dev acc 0.8941, dev avg loss 0.289753, throughput 3.3596K wps
[Epoch 64 Batch 30/172] avg loss 0.00498284, throughput 3.10942K wps
[Epoch 64 Batch 60/172] avg loss 0.00449775, throughput 3.4226K wps
[Epoch 64 Batch 90/172] avg loss 0.00453805, throughput 2.97246K wps
[Epoch 64 Batch 120/172] avg loss 0.00445673, throughput 3.26335K wps
[Epoch 64 Batch 150/172] avg loss 0.0046603, throughput 3.1816K wps
Begin Testing...
[Epoch 64] train avg loss 0.0046254, dev acc 0.8920, dev avg loss 0.290628, throughput 3.23306K wps
[Epoch 65 Batch 30/172] avg loss 0.00436757, throughput 3.18026K wps
[Epoch 65 Batch 60/172] avg loss 0.00490062, throughput 3.1408K wps
[Epoch 65 Batch 90/172] avg loss 0.0049002, throughput 3.29139K wps
[Epoch 65 Batch 120/172] avg loss 0.0044319, throughput 2.98287K wps
[Epoch 65 Batch 150/172] avg loss 0.00441191, throughput 2.94407K wps
Begin Testing...
[Epoch 65] train avg loss 0.00464899, dev acc 0.8931, dev avg loss 0.290854, throughput 3.09185K wps
[Epoch 66 Batch 30/172] avg loss 0.00491189, throughput 3.46765K wps
[Epoch 66 Batch 60/172] avg loss 0.00440344, throughput 3.12516K wps
[Epoch 66 Batch 90/172] avg loss 0.00410055, throughput 3.79524K wps
[Epoch 66 Batch 120/172] avg loss 0.00475601, throughput 3.08849K wps
[Epoch 66 Batch 150/172] avg loss 0.00447231, throughput 3.29487K wps
Begin Testing...
[Epoch 66] train avg loss 0.00455857, dev acc 0.8920, dev avg loss 0.289625, throughput 3.38715K wps
[Epoch 67 Batch 30/172] avg loss 0.0044584, throughput 3.42793K wps
[Epoch 67 Batch 60/172] avg loss 0.00481578, throughput 3.14264K wps
[Epoch 67 Batch 90/172] avg loss 0.00466946, throughput 3.06517K wps
[Epoch 67 Batch 120/172] avg loss 0.0046749, throughput 3.01872K wps
[Epoch 67 Batch 150/172] avg loss 0.00426306, throughput 3.203K wps
Begin Testing...
[Epoch 67] train avg loss 0.00456029, dev acc 0.8931, dev avg loss 0.289565, throughput 3.12777K wps
[Epoch 68 Batch 30/172] avg loss 0.00481366, throughput 3.08636K wps
[Epoch 68 Batch 60/172] avg loss 0.00417077, throughput 3.66319K wps
[Epoch 68 Batch 90/172] avg loss 0.00452732, throughput 3.18965K wps
[Epoch 68 Batch 120/172] avg loss 0.00461212, throughput 3.26057K wps
[Epoch 68 Batch 150/172] avg loss 0.00451984, throughput 3.01051K wps
Begin Testing...
[Epoch 68] train avg loss 0.00454344, dev acc 0.8931, dev avg loss 0.290251, throughput 3.2355K wps
[Epoch 69 Batch 30/172] avg loss 0.00463247, throughput 3.52891K wps
[Epoch 69 Batch 60/172] avg loss 0.00473696, throughput 2.92278K wps
[Epoch 69 Batch 90/172] avg loss 0.00444612, throughput 3.2718K wps
[Epoch 69 Batch 120/172] avg loss 0.00430056, throughput 3.5018K wps
[Epoch 69 Batch 150/172] avg loss 0.00422796, throughput 3.00383K wps
Begin Testing...
[Epoch 69] train avg loss 0.00449458, dev acc 0.8931, dev avg loss 0.289535, throughput 3.26599K wps
[Epoch 70 Batch 30/172] avg loss 0.00477549, throughput 3.14251K wps
[Epoch 70 Batch 60/172] avg loss 0.00446934, throughput 3.17667K wps
[Epoch 70 Batch 90/172] avg loss 0.00432902, throughput 3.22294K wps
[Epoch 70 Batch 120/172] avg loss 0.00443986, throughput 2.9685K wps
[Epoch 70 Batch 150/172] avg loss 0.00468949, throughput 3.57108K wps
Begin Testing...
[Epoch 70] train avg loss 0.00453949, dev acc 0.8941, dev avg loss 0.28906, throughput 3.23889K wps
[Epoch 71 Batch 30/172] avg loss 0.00419962, throughput 3.11957K wps
[Epoch 71 Batch 60/172] avg loss 0.00437959, throughput 3.58624K wps
[Epoch 71 Batch 90/172] avg loss 0.00434645, throughput 3.13405K wps
[Epoch 71 Batch 120/172] avg loss 0.0050608, throughput 4.00807K wps
[Epoch 71 Batch 150/172] avg loss 0.00439632, throughput 3.73364K wps
Begin Testing...
[Epoch 71] train avg loss 0.00452136, dev acc 0.8910, dev avg loss 0.288529, throughput 3.45298K wps
[Epoch 72 Batch 30/172] avg loss 0.00416834, throughput 2.90564K wps
[Epoch 72 Batch 60/172] avg loss 0.00464691, throughput 3.15045K wps
[Epoch 72 Batch 90/172] avg loss 0.0045682, throughput 2.91078K wps
[Epoch 72 Batch 120/172] avg loss 0.00434568, throughput 3.14972K wps
[Epoch 72 Batch 150/172] avg loss 0.00458366, throughput 4.12707K wps
Begin Testing...
[Epoch 72] train avg loss 0.00441303, dev acc 0.8941, dev avg loss 0.289258, throughput 3.22225K wps
[Epoch 73 Batch 30/172] avg loss 0.00431959, throughput 3.24994K wps
[Epoch 73 Batch 60/172] avg loss 0.00466357, throughput 3.2344K wps
[Epoch 73 Batch 90/172] avg loss 0.00439424, throughput 3.09936K wps
[Epoch 73 Batch 120/172] avg loss 0.00433194, throughput 3.5559K wps
[Epoch 73 Batch 150/172] avg loss 0.00424773, throughput 2.91119K wps
Begin Testing...
[Epoch 73] train avg loss 0.00442798, dev acc 0.8931, dev avg loss 0.288607, throughput 3.21707K wps
[Epoch 74 Batch 30/172] avg loss 0.00466745, throughput 3.32035K wps
[Epoch 74 Batch 60/172] avg loss 0.00473098, throughput 3.20546K wps
[Epoch 74 Batch 90/172] avg loss 0.00422228, throughput 3.11554K wps
[Epoch 74 Batch 120/172] avg loss 0.00440042, throughput 3.17228K wps
[Epoch 74 Batch 150/172] avg loss 0.00429774, throughput 3.38104K wps
Begin Testing...
[Epoch 74] train avg loss 0.00443787, dev acc 0.8920, dev avg loss 0.290401, throughput 3.33477K wps
[Epoch 75 Batch 30/172] avg loss 0.00421157, throughput 3.24751K wps
[Epoch 75 Batch 60/172] avg loss 0.00413942, throughput 3.38262K wps
[Epoch 75 Batch 90/172] avg loss 0.00462275, throughput 3.80819K wps
[Epoch 75 Batch 120/172] avg loss 0.00474709, throughput 4.12804K wps
[Epoch 75 Batch 150/172] avg loss 0.00425725, throughput 3.45016K wps
Begin Testing...
[Epoch 75] train avg loss 0.00435231, dev acc 0.8931, dev avg loss 0.288779, throughput 3.54438K wps
[Epoch 76 Batch 30/172] avg loss 0.00447376, throughput 3.47886K wps
[Epoch 76 Batch 60/172] avg loss 0.00396477, throughput 3.5373K wps
[Epoch 76 Batch 90/172] avg loss 0.00488424, throughput 3.16726K wps
[Epoch 76 Batch 120/172] avg loss 0.00443685, throughput 3.74525K wps
[Epoch 76 Batch 150/172] avg loss 0.00429369, throughput 3.2395K wps
Begin Testing...
[Epoch 76] train avg loss 0.00434737, dev acc 0.8920, dev avg loss 0.288468, throughput 3.40167K wps
[Epoch 77 Batch 30/172] avg loss 0.00424375, throughput 3.30537K wps
[Epoch 77 Batch 60/172] avg loss 0.00434253, throughput 3.10215K wps
[Epoch 77 Batch 90/172] avg loss 0.00446783, throughput 3.0596K wps
[Epoch 77 Batch 120/172] avg loss 0.00421779, throughput 2.97094K wps
[Epoch 77 Batch 150/172] avg loss 0.00411392, throughput 3.04692K wps
Begin Testing...
[Epoch 77] train avg loss 0.00433333, dev acc 0.8952, dev avg loss 0.288326, throughput 3.14077K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/172] avg loss 0.00414387, throughput 2.90797K wps
[Epoch 78 Batch 60/172] avg loss 0.00461166, throughput 3.06603K wps
[Epoch 78 Batch 90/172] avg loss 0.0037979, throughput 3.2315K wps
[Epoch 78 Batch 120/172] avg loss 0.00450695, throughput 3.05258K wps
[Epoch 78 Batch 150/172] avg loss 0.00467417, throughput 3.157K wps
Begin Testing...
[Epoch 78] train avg loss 0.00428996, dev acc 0.8931, dev avg loss 0.288615, throughput 3.09958K wps
[Epoch 79 Batch 30/172] avg loss 0.00431482, throughput 2.94395K wps
[Epoch 79 Batch 60/172] avg loss 0.00457198, throughput 3.01226K wps
[Epoch 79 Batch 90/172] avg loss 0.00430128, throughput 2.94367K wps
[Epoch 79 Batch 120/172] avg loss 0.00436329, throughput 3.49268K wps
[Epoch 79 Batch 150/172] avg loss 0.00406549, throughput 3.43147K wps
Begin Testing...
[Epoch 79] train avg loss 0.00431558, dev acc 0.8931, dev avg loss 0.288707, throughput 3.14521K wps
[Epoch 80 Batch 30/172] avg loss 0.00405893, throughput 3.21048K wps
[Epoch 80 Batch 60/172] avg loss 0.00441195, throughput 3.09887K wps
[Epoch 80 Batch 90/172] avg loss 0.0041375, throughput 3.63138K wps
[Epoch 80 Batch 120/172] avg loss 0.00443323, throughput 3.73838K wps
[Epoch 80 Batch 150/172] avg loss 0.00421422, throughput 3.43341K wps
Begin Testing...
[Epoch 80] train avg loss 0.0042667, dev acc 0.8920, dev avg loss 0.288482, throughput 3.41657K wps
[Epoch 81 Batch 30/172] avg loss 0.00403282, throughput 2.91481K wps
[Epoch 81 Batch 60/172] avg loss 0.0043716, throughput 3.10163K wps
[Epoch 81 Batch 90/172] avg loss 0.00436788, throughput 3.67712K wps
[Epoch 81 Batch 120/172] avg loss 0.00399947, throughput 3.05896K wps
[Epoch 81 Batch 150/172] avg loss 0.00460142, throughput 3.8275K wps
Begin Testing...
[Epoch 81] train avg loss 0.00429336, dev acc 0.8952, dev avg loss 0.288049, throughput 3.23988K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/172] avg loss 0.00428541, throughput 3.28753K wps
[Epoch 82 Batch 60/172] avg loss 0.00462358, throughput 2.86829K wps
[Epoch 82 Batch 90/172] avg loss 0.00382982, throughput 3.06504K wps
[Epoch 82 Batch 120/172] avg loss 0.00417744, throughput 3.65696K wps
[Epoch 82 Batch 150/172] avg loss 0.00432791, throughput 3.75582K wps
Begin Testing...
[Epoch 82] train avg loss 0.00419879, dev acc 0.8973, dev avg loss 0.288152, throughput 3.27802K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/172] avg loss 0.00400673, throughput 2.9557K wps
[Epoch 83 Batch 60/172] avg loss 0.00402086, throughput 3.93179K wps
[Epoch 83 Batch 90/172] avg loss 0.00384908, throughput 3.24148K wps
[Epoch 83 Batch 120/172] avg loss 0.00441908, throughput 3.1995K wps
[Epoch 83 Batch 150/172] avg loss 0.00449433, throughput 3.70562K wps
Begin Testing...
[Epoch 83] train avg loss 0.00415648, dev acc 0.8952, dev avg loss 0.288176, throughput 3.35512K wps
[Epoch 84 Batch 30/172] avg loss 0.00411717, throughput 3.2803K wps
[Epoch 84 Batch 60/172] avg loss 0.00411664, throughput 3.34154K wps
[Epoch 84 Batch 90/172] avg loss 0.00386669, throughput 2.98958K wps
[Epoch 84 Batch 120/172] avg loss 0.00402667, throughput 3.16976K wps
[Epoch 84 Batch 150/172] avg loss 0.00422101, throughput 2.99731K wps
Begin Testing...
[Epoch 84] train avg loss 0.00412546, dev acc 0.8962, dev avg loss 0.288258, throughput 3.13195K wps
[Epoch 85 Batch 30/172] avg loss 0.00411886, throughput 3.40285K wps
[Epoch 85 Batch 60/172] avg loss 0.00408344, throughput 3.68083K wps
[Epoch 85 Batch 90/172] avg loss 0.0039377, throughput 3.3341K wps
[Epoch 85 Batch 120/172] avg loss 0.00491194, throughput 2.99192K wps
[Epoch 85 Batch 150/172] avg loss 0.0037347, throughput 3.66302K wps
Begin Testing...
[Epoch 85] train avg loss 0.00418198, dev acc 0.8952, dev avg loss 0.288093, throughput 3.43861K wps
[Epoch 86 Batch 30/172] avg loss 0.00397635, throughput 3.27426K wps
[Epoch 86 Batch 60/172] avg loss 0.00393046, throughput 3.45287K wps
[Epoch 86 Batch 90/172] avg loss 0.00414686, throughput 3.47978K wps
[Epoch 86 Batch 120/172] avg loss 0.0043437, throughput 3.67024K wps
[Epoch 86 Batch 150/172] avg loss 0.00440783, throughput 2.95035K wps
Begin Testing...
[Epoch 86] train avg loss 0.00413906, dev acc 0.8920, dev avg loss 0.288615, throughput 3.28263K wps
[Epoch 87 Batch 30/172] avg loss 0.00421413, throughput 3.05942K wps
[Epoch 87 Batch 60/172] avg loss 0.00399844, throughput 2.99122K wps
[Epoch 87 Batch 90/172] avg loss 0.00413841, throughput 3.01179K wps
[Epoch 87 Batch 120/172] avg loss 0.00369471, throughput 3.36717K wps
[Epoch 87 Batch 150/172] avg loss 0.00428959, throughput 3.23648K wps
Begin Testing...
[Epoch 87] train avg loss 0.00411718, dev acc 0.8931, dev avg loss 0.287475, throughput 3.11131K wps
[Epoch 88 Batch 30/172] avg loss 0.00404079, throughput 3.00683K wps
[Epoch 88 Batch 60/172] avg loss 0.00441427, throughput 3.49523K wps
[Epoch 88 Batch 90/172] avg loss 0.00422544, throughput 3.84714K wps
[Epoch 88 Batch 120/172] avg loss 0.00386855, throughput 3.3264K wps
[Epoch 88 Batch 150/172] avg loss 0.00380513, throughput 3.18545K wps
Begin Testing...
[Epoch 88] train avg loss 0.00408393, dev acc 0.8920, dev avg loss 0.287587, throughput 3.32741K wps
[Epoch 89 Batch 30/172] avg loss 0.00418543, throughput 3.26363K wps
[Epoch 89 Batch 60/172] avg loss 0.00425141, throughput 3.01589K wps
[Epoch 89 Batch 90/172] avg loss 0.00412781, throughput 3.17198K wps
[Epoch 89 Batch 120/172] avg loss 0.0038558, throughput 3.35134K wps
[Epoch 89 Batch 150/172] avg loss 0.00415142, throughput 3.3745K wps
Begin Testing...
[Epoch 89] train avg loss 0.00406459, dev acc 0.8952, dev avg loss 0.287745, throughput 3.18103K wps
[Epoch 90 Batch 30/172] avg loss 0.00371218, throughput 3.04618K wps
[Epoch 90 Batch 60/172] avg loss 0.00400038, throughput 2.88861K wps
[Epoch 90 Batch 90/172] avg loss 0.00414436, throughput 2.89744K wps
[Epoch 90 Batch 120/172] avg loss 0.00435015, throughput 2.954K wps
[Epoch 90 Batch 150/172] avg loss 0.00372285, throughput 3.226K wps
Begin Testing...
[Epoch 90] train avg loss 0.00399751, dev acc 0.8910, dev avg loss 0.28774, throughput 2.98784K wps
[Epoch 91 Batch 30/172] avg loss 0.00398294, throughput 3.30853K wps
[Epoch 91 Batch 60/172] avg loss 0.00390085, throughput 3.476K wps
[Epoch 91 Batch 90/172] avg loss 0.00379705, throughput 3.72158K wps
[Epoch 91 Batch 120/172] avg loss 0.00434308, throughput 3.53962K wps
[Epoch 91 Batch 150/172] avg loss 0.00393563, throughput 3.36419K wps
Begin Testing...
[Epoch 91] train avg loss 0.00399626, dev acc 0.8952, dev avg loss 0.288282, throughput 3.41452K wps
[Epoch 92 Batch 30/172] avg loss 0.00392608, throughput 3.11706K wps
[Epoch 92 Batch 60/172] avg loss 0.00369887, throughput 3.11966K wps
[Epoch 92 Batch 90/172] avg loss 0.00412449, throughput 3.26468K wps
[Epoch 92 Batch 120/172] avg loss 0.00425181, throughput 3.07975K wps
[Epoch 92 Batch 150/172] avg loss 0.00373091, throughput 3.37615K wps
Begin Testing...
[Epoch 92] train avg loss 0.00397902, dev acc 0.8952, dev avg loss 0.287774, throughput 3.18978K wps
[Epoch 93 Batch 30/172] avg loss 0.00387646, throughput 3.1538K wps
[Epoch 93 Batch 60/172] avg loss 0.00417917, throughput 3.434K wps
[Epoch 93 Batch 90/172] avg loss 0.00385847, throughput 3.30807K wps
[Epoch 93 Batch 120/172] avg loss 0.00411273, throughput 3.77804K wps
[Epoch 93 Batch 150/172] avg loss 0.00385832, throughput 3.22445K wps
Begin Testing...
[Epoch 93] train avg loss 0.00397853, dev acc 0.8952, dev avg loss 0.287914, throughput 3.34648K wps
[Epoch 94 Batch 30/172] avg loss 0.00380235, throughput 3.65197K wps
[Epoch 94 Batch 60/172] avg loss 0.00405506, throughput 3.11579K wps
[Epoch 94 Batch 90/172] avg loss 0.00417256, throughput 2.98588K wps
[Epoch 94 Batch 120/172] avg loss 0.00410221, throughput 3.03931K wps
[Epoch 94 Batch 150/172] avg loss 0.00381083, throughput 3.36603K wps
Begin Testing...
[Epoch 94] train avg loss 0.00399225, dev acc 0.8931, dev avg loss 0.288042, throughput 3.2931K wps
[Epoch 95 Batch 30/172] avg loss 0.00394238, throughput 3.0063K wps
[Epoch 95 Batch 60/172] avg loss 0.00400763, throughput 2.98182K wps
[Epoch 95 Batch 90/172] avg loss 0.00385179, throughput 2.95225K wps
[Epoch 95 Batch 120/172] avg loss 0.00378873, throughput 3.11958K wps
[Epoch 95 Batch 150/172] avg loss 0.00421701, throughput 3.0378K wps
Begin Testing...
[Epoch 95] train avg loss 0.0039637, dev acc 0.8920, dev avg loss 0.288038, throughput 3.13466K wps
[Epoch 96 Batch 30/172] avg loss 0.00384911, throughput 3.1105K wps
[Epoch 96 Batch 60/172] avg loss 0.00381155, throughput 3.71711K wps
[Epoch 96 Batch 90/172] avg loss 0.00410465, throughput 2.94272K wps
[Epoch 96 Batch 120/172] avg loss 0.00400315, throughput 3.05825K wps
[Epoch 96 Batch 150/172] avg loss 0.0042097, throughput 3.2173K wps
Begin Testing...
[Epoch 96] train avg loss 0.00397182, dev acc 0.8910, dev avg loss 0.288858, throughput 3.1942K wps
[Epoch 97 Batch 30/172] avg loss 0.00361455, throughput 3.64933K wps
[Epoch 97 Batch 60/172] avg loss 0.00383386, throughput 3.51009K wps
[Epoch 97 Batch 90/172] avg loss 0.00409922, throughput 3.01971K wps
[Epoch 97 Batch 120/172] avg loss 0.00380838, throughput 3.42968K wps
[Epoch 97 Batch 150/172] avg loss 0.00383468, throughput 2.97574K wps
Begin Testing...
[Epoch 97] train avg loss 0.00393394, dev acc 0.8931, dev avg loss 0.287914, throughput 3.27456K wps
[Epoch 98 Batch 30/172] avg loss 0.00358753, throughput 3.01419K wps
[Epoch 98 Batch 60/172] avg loss 0.00403582, throughput 2.95831K wps
[Epoch 98 Batch 90/172] avg loss 0.00388808, throughput 3.03635K wps
[Epoch 98 Batch 120/172] avg loss 0.00393978, throughput 3.15597K wps
[Epoch 98 Batch 150/172] avg loss 0.00406146, throughput 3.10263K wps
Begin Testing...
[Epoch 98] train avg loss 0.00389784, dev acc 0.8920, dev avg loss 0.288498, throughput 3.04023K wps
[Epoch 99 Batch 30/172] avg loss 0.00403256, throughput 3.26709K wps
[Epoch 99 Batch 60/172] avg loss 0.00374261, throughput 2.96317K wps
[Epoch 99 Batch 90/172] avg loss 0.00387819, throughput 3.12512K wps
[Epoch 99 Batch 120/172] avg loss 0.00387987, throughput 3.21153K wps
[Epoch 99 Batch 150/172] avg loss 0.00369141, throughput 3.32984K wps
Begin Testing...
[Epoch 99] train avg loss 0.00388359, dev acc 0.8910, dev avg loss 0.287578, throughput 3.13545K wps
[Epoch 100 Batch 30/172] avg loss 0.00370454, throughput 2.90353K wps
[Epoch 100 Batch 60/172] avg loss 0.00416681, throughput 3.01627K wps
[Epoch 100 Batch 90/172] avg loss 0.00367229, throughput 3.42452K wps
[Epoch 100 Batch 120/172] avg loss 0.00378642, throughput 3.56651K wps
[Epoch 100 Batch 150/172] avg loss 0.00404067, throughput 3.06262K wps
Begin Testing...
[Epoch 100] train avg loss 0.00385497, dev acc 0.8931, dev avg loss 0.290798, throughput 3.13473K wps
[Epoch 101 Batch 30/172] avg loss 0.00362722, throughput 2.91582K wps
[Epoch 101 Batch 60/172] avg loss 0.00403561, throughput 3.39096K wps
[Epoch 101 Batch 90/172] avg loss 0.00363619, throughput 2.92734K wps
[Epoch 101 Batch 120/172] avg loss 0.00380461, throughput 3.557K wps
[Epoch 101 Batch 150/172] avg loss 0.0040166, throughput 2.99773K wps
Begin Testing...
[Epoch 101] train avg loss 0.00384281, dev acc 0.8920, dev avg loss 0.287958, throughput 3.09803K wps
[Epoch 102 Batch 30/172] avg loss 0.00398713, throughput 2.99923K wps
[Epoch 102 Batch 60/172] avg loss 0.00350298, throughput 3.09381K wps
[Epoch 102 Batch 90/172] avg loss 0.00412249, throughput 3.28542K wps
[Epoch 102 Batch 120/172] avg loss 0.00382524, throughput 3.5407K wps
[Epoch 102 Batch 150/172] avg loss 0.00355246, throughput 3.3317K wps
Begin Testing...
[Epoch 102] train avg loss 0.00379039, dev acc 0.8910, dev avg loss 0.288314, throughput 3.24981K wps
[Epoch 103 Batch 30/172] avg loss 0.00373488, throughput 2.92819K wps
[Epoch 103 Batch 60/172] avg loss 0.00405807, throughput 3.18809K wps
[Epoch 103 Batch 90/172] avg loss 0.00334666, throughput 3.1621K wps
[Epoch 103 Batch 120/172] avg loss 0.00372488, throughput 3.53295K wps
[Epoch 103 Batch 150/172] avg loss 0.00412858, throughput 3.18838K wps
Begin Testing...
[Epoch 103] train avg loss 0.00381553, dev acc 0.8910, dev avg loss 0.287991, throughput 3.23149K wps
[Epoch 104 Batch 30/172] avg loss 0.00368573, throughput 3.01219K wps
[Epoch 104 Batch 60/172] avg loss 0.00375296, throughput 3.38797K wps
[Epoch 104 Batch 90/172] avg loss 0.00403113, throughput 3.55876K wps
[Epoch 104 Batch 120/172] avg loss 0.00411926, throughput 3.60693K wps
[Epoch 104 Batch 150/172] avg loss 0.0036734, throughput 3.43545K wps
Begin Testing...
[Epoch 104] train avg loss 0.00378831, dev acc 0.8920, dev avg loss 0.287951, throughput 3.40667K wps
[Epoch 105 Batch 30/172] avg loss 0.00390658, throughput 3.27872K wps
[Epoch 105 Batch 60/172] avg loss 0.00355568, throughput 3.56094K wps
[Epoch 105 Batch 90/172] avg loss 0.00353505, throughput 3.27469K wps
[Epoch 105 Batch 120/172] avg loss 0.00419389, throughput 3.3084K wps
[Epoch 105 Batch 150/172] avg loss 0.00370728, throughput 3.03696K wps
Begin Testing...
[Epoch 105] train avg loss 0.00378041, dev acc 0.8920, dev avg loss 0.288546, throughput 3.34332K wps
[Epoch 106 Batch 30/172] avg loss 0.00368186, throughput 3.21573K wps
[Epoch 106 Batch 60/172] avg loss 0.00374365, throughput 3.20121K wps
[Epoch 106 Batch 90/172] avg loss 0.00351513, throughput 3.02725K wps
[Epoch 106 Batch 120/172] avg loss 0.00389985, throughput 3.37159K wps
[Epoch 106 Batch 150/172] avg loss 0.003857, throughput 3.37846K wps
Begin Testing...
[Epoch 106] train avg loss 0.00376355, dev acc 0.8910, dev avg loss 0.287754, throughput 3.23983K wps
[Epoch 107 Batch 30/172] avg loss 0.00381799, throughput 3.04331K wps
[Epoch 107 Batch 60/172] avg loss 0.00334893, throughput 3.50903K wps
[Epoch 107 Batch 90/172] avg loss 0.00374035, throughput 2.95416K wps
[Epoch 107 Batch 120/172] avg loss 0.00359001, throughput 3.53852K wps
[Epoch 107 Batch 150/172] avg loss 0.00350323, throughput 3.18043K wps
Begin Testing...
[Epoch 107] train avg loss 0.00367484, dev acc 0.8920, dev avg loss 0.288003, throughput 3.19834K wps
[Epoch 108 Batch 30/172] avg loss 0.0039671, throughput 3.22647K wps
[Epoch 108 Batch 60/172] avg loss 0.00401323, throughput 3.25915K wps
[Epoch 108 Batch 90/172] avg loss 0.0034315, throughput 3.74596K wps
[Epoch 108 Batch 120/172] avg loss 0.00359487, throughput 3.33034K wps
[Epoch 108 Batch 150/172] avg loss 0.00359979, throughput 3.15318K wps
Begin Testing...
[Epoch 108] train avg loss 0.00371377, dev acc 0.8899, dev avg loss 0.290571, throughput 3.39583K wps
[Epoch 109 Batch 30/172] avg loss 0.00371488, throughput 3.19003K wps
[Epoch 109 Batch 60/172] avg loss 0.00381742, throughput 3.12779K wps
[Epoch 109 Batch 90/172] avg loss 0.00340356, throughput 3.36941K wps
[Epoch 109 Batch 120/172] avg loss 0.00411103, throughput 3.49436K wps
[Epoch 109 Batch 150/172] avg loss 0.00318881, throughput 3.21301K wps
Begin Testing...
[Epoch 109] train avg loss 0.0036969, dev acc 0.8889, dev avg loss 0.288644, throughput 3.34243K wps
[Epoch 110 Batch 30/172] avg loss 0.0039055, throughput 3.26736K wps
[Epoch 110 Batch 60/172] avg loss 0.00335964, throughput 3.01068K wps
[Epoch 110 Batch 90/172] avg loss 0.00350881, throughput 3.88968K wps
[Epoch 110 Batch 120/172] avg loss 0.00368449, throughput 3.4556K wps
[Epoch 110 Batch 150/172] avg loss 0.00375382, throughput 3.06619K wps
Begin Testing...
[Epoch 110] train avg loss 0.00364829, dev acc 0.8889, dev avg loss 0.288521, throughput 3.34157K wps
[Epoch 111 Batch 30/172] avg loss 0.00362977, throughput 3.44708K wps
[Epoch 111 Batch 60/172] avg loss 0.00340146, throughput 4.02245K wps
[Epoch 111 Batch 90/172] avg loss 0.00370019, throughput 3.24166K wps
[Epoch 111 Batch 120/172] avg loss 0.00382928, throughput 3.11659K wps
[Epoch 111 Batch 150/172] avg loss 0.00364969, throughput 3.07401K wps
Begin Testing...
[Epoch 111] train avg loss 0.00364102, dev acc 0.8899, dev avg loss 0.288465, throughput 3.36795K wps
[Epoch 112 Batch 30/172] avg loss 0.00336788, throughput 3.42981K wps
[Epoch 112 Batch 60/172] avg loss 0.00364739, throughput 3.16287K wps
[Epoch 112 Batch 90/172] avg loss 0.00370827, throughput 3.02449K wps
[Epoch 112 Batch 120/172] avg loss 0.00362433, throughput 3.50521K wps
[Epoch 112 Batch 150/172] avg loss 0.00353151, throughput 3.20493K wps
Begin Testing...
[Epoch 112] train avg loss 0.00364205, dev acc 0.8899, dev avg loss 0.287979, throughput 3.25123K wps
[Epoch 113 Batch 30/172] avg loss 0.00323935, throughput 3.21964K wps
[Epoch 113 Batch 60/172] avg loss 0.00354657, throughput 3.57375K wps
[Epoch 113 Batch 90/172] avg loss 0.00379021, throughput 4.05031K wps
[Epoch 113 Batch 120/172] avg loss 0.00369283, throughput 3.3575K wps
[Epoch 113 Batch 150/172] avg loss 0.00401921, throughput 2.89071K wps
Begin Testing...
[Epoch 113] train avg loss 0.00362545, dev acc 0.8899, dev avg loss 0.288391, throughput 3.35871K wps
[Epoch 114 Batch 30/172] avg loss 0.00369446, throughput 3.48367K wps
[Epoch 114 Batch 60/172] avg loss 0.00361194, throughput 3.14104K wps
[Epoch 114 Batch 90/172] avg loss 0.00354948, throughput 3.28056K wps
[Epoch 114 Batch 120/172] avg loss 0.00410023, throughput 3.29867K wps
[Epoch 114 Batch 150/172] avg loss 0.00334949, throughput 2.96624K wps
Begin Testing...
[Epoch 114] train avg loss 0.00361279, dev acc 0.8910, dev avg loss 0.288528, throughput 3.22021K wps
[Epoch 115 Batch 30/172] avg loss 0.00350096, throughput 2.95262K wps
[Epoch 115 Batch 60/172] avg loss 0.00391426, throughput 3.09736K wps
[Epoch 115 Batch 90/172] avg loss 0.00324918, throughput 3.08761K wps
[Epoch 115 Batch 120/172] avg loss 0.0034039, throughput 3.1995K wps
[Epoch 115 Batch 150/172] avg loss 0.00340083, throughput 3.8345K wps
Begin Testing...
[Epoch 115] train avg loss 0.00359198, dev acc 0.8931, dev avg loss 0.288945, throughput 3.22192K wps
[Epoch 116 Batch 30/172] avg loss 0.00413534, throughput 3.13214K wps
[Epoch 116 Batch 60/172] avg loss 0.00328207, throughput 2.92455K wps
[Epoch 116 Batch 90/172] avg loss 0.00345271, throughput 3.06657K wps
[Epoch 116 Batch 120/172] avg loss 0.00340152, throughput 3.41482K wps
[Epoch 116 Batch 150/172] avg loss 0.00362597, throughput 3.19526K wps
Begin Testing...
[Epoch 116] train avg loss 0.00356315, dev acc 0.8910, dev avg loss 0.288131, throughput 3.17298K wps
[Epoch 117 Batch 30/172] avg loss 0.00353572, throughput 3.10083K wps
[Epoch 117 Batch 60/172] avg loss 0.00379434, throughput 3.55754K wps
[Epoch 117 Batch 90/172] avg loss 0.00344982, throughput 3.08722K wps
[Epoch 117 Batch 120/172] avg loss 0.00362188, throughput 3.33945K wps
[Epoch 117 Batch 150/172] avg loss 0.00287993, throughput 3.01372K wps
Begin Testing...
[Epoch 117] train avg loss 0.00352194, dev acc 0.8878, dev avg loss 0.288529, throughput 3.26475K wps
[Epoch 118 Batch 30/172] avg loss 0.00345636, throughput 3.65626K wps
[Epoch 118 Batch 60/172] avg loss 0.00338755, throughput 3.2669K wps
[Epoch 118 Batch 90/172] avg loss 0.0036132, throughput 3.08489K wps
[Epoch 118 Batch 120/172] avg loss 0.00384619, throughput 3.03343K wps
[Epoch 118 Batch 150/172] avg loss 0.00314906, throughput 2.88528K wps
Begin Testing...
[Epoch 118] train avg loss 0.00352576, dev acc 0.8931, dev avg loss 0.29037, throughput 3.13988K wps
[Epoch 119 Batch 30/172] avg loss 0.00318678, throughput 3.28382K wps
[Epoch 119 Batch 60/172] avg loss 0.0034678, throughput 3.49979K wps
[Epoch 119 Batch 90/172] avg loss 0.00373737, throughput 2.98617K wps
[Epoch 119 Batch 120/172] avg loss 0.00349494, throughput 2.9393K wps
[Epoch 119 Batch 150/172] avg loss 0.00367504, throughput 3.65145K wps
Begin Testing...
[Epoch 119] train avg loss 0.00345679, dev acc 0.8931, dev avg loss 0.291279, throughput 3.33682K wps
[Epoch 120 Batch 30/172] avg loss 0.00363508, throughput 3.19681K wps
[Epoch 120 Batch 60/172] avg loss 0.00346493, throughput 3.35846K wps
[Epoch 120 Batch 90/172] avg loss 0.00338688, throughput 3.03022K wps
[Epoch 120 Batch 120/172] avg loss 0.00316071, throughput 3.01822K wps
[Epoch 120 Batch 150/172] avg loss 0.00368931, throughput 3.63485K wps
Begin Testing...
[Epoch 120] train avg loss 0.00345079, dev acc 0.8910, dev avg loss 0.288804, throughput 3.18743K wps
[Epoch 121 Batch 30/172] avg loss 0.00368637, throughput 3.06876K wps
[Epoch 121 Batch 60/172] avg loss 0.0031764, throughput 3.20016K wps
[Epoch 121 Batch 90/172] avg loss 0.00318263, throughput 3.50598K wps
[Epoch 121 Batch 120/172] avg loss 0.00365394, throughput 4.21413K wps
[Epoch 121 Batch 150/172] avg loss 0.00373214, throughput 3.55319K wps
Begin Testing...
[Epoch 121] train avg loss 0.00345573, dev acc 0.8910, dev avg loss 0.288325, throughput 3.42645K wps
[Epoch 122 Batch 30/172] avg loss 0.00371073, throughput 3.1133K wps
[Epoch 122 Batch 60/172] avg loss 0.00327035, throughput 3.60627K wps
[Epoch 122 Batch 90/172] avg loss 0.00350882, throughput 3.43681K wps
[Epoch 122 Batch 120/172] avg loss 0.00356241, throughput 3.4574K wps
[Epoch 122 Batch 150/172] avg loss 0.00346496, throughput 3.38154K wps
Begin Testing...
[Epoch 122] train avg loss 0.00348763, dev acc 0.8910, dev avg loss 0.28849, throughput 3.44934K wps
[Epoch 123 Batch 30/172] avg loss 0.00363081, throughput 3.32759K wps
[Epoch 123 Batch 60/172] avg loss 0.00342188, throughput 3.15476K wps
[Epoch 123 Batch 90/172] avg loss 0.00372906, throughput 3.06609K wps
[Epoch 123 Batch 120/172] avg loss 0.00331765, throughput 3.42579K wps
[Epoch 123 Batch 150/172] avg loss 0.00334027, throughput 3.0951K wps
Begin Testing...
[Epoch 123] train avg loss 0.00344587, dev acc 0.8910, dev avg loss 0.290751, throughput 3.2586K wps
[Epoch 124 Batch 30/172] avg loss 0.00331652, throughput 3.57231K wps
[Epoch 124 Batch 60/172] avg loss 0.00335617, throughput 3.94628K wps
[Epoch 124 Batch 90/172] avg loss 0.00309092, throughput 3.66223K wps
[Epoch 124 Batch 120/172] avg loss 0.00345078, throughput 3.22796K wps
[Epoch 124 Batch 150/172] avg loss 0.0038707, throughput 2.99663K wps
Begin Testing...
[Epoch 124] train avg loss 0.00339564, dev acc 0.8910, dev avg loss 0.289, throughput 3.41561K wps
[Epoch 125 Batch 30/172] avg loss 0.00318407, throughput 3.186K wps
[Epoch 125 Batch 60/172] avg loss 0.00335453, throughput 3.38117K wps
[Epoch 125 Batch 90/172] avg loss 0.00372024, throughput 3.48733K wps
[Epoch 125 Batch 120/172] avg loss 0.00314013, throughput 3.25505K wps
[Epoch 125 Batch 150/172] avg loss 0.00348052, throughput 3.03714K wps
Begin Testing...
[Epoch 125] train avg loss 0.00339816, dev acc 0.8899, dev avg loss 0.288233, throughput 3.2977K wps
[Epoch 126 Batch 30/172] avg loss 0.0036362, throughput 3.52574K wps
[Epoch 126 Batch 60/172] avg loss 0.00321073, throughput 3.29878K wps
[Epoch 126 Batch 90/172] avg loss 0.00313675, throughput 3.06332K wps
[Epoch 126 Batch 120/172] avg loss 0.00306088, throughput 3.0899K wps
[Epoch 126 Batch 150/172] avg loss 0.00379439, throughput 3.53645K wps
Begin Testing...
[Epoch 126] train avg loss 0.00335858, dev acc 0.8910, dev avg loss 0.291297, throughput 3.36137K wps
[Epoch 127 Batch 30/172] avg loss 0.00330359, throughput 3.10251K wps
[Epoch 127 Batch 60/172] avg loss 0.00300024, throughput 3.26549K wps
[Epoch 127 Batch 90/172] avg loss 0.00359538, throughput 3.29858K wps
[Epoch 127 Batch 120/172] avg loss 0.00373544, throughput 3.23356K wps
[Epoch 127 Batch 150/172] avg loss 0.00343883, throughput 2.96389K wps
Begin Testing...
[Epoch 127] train avg loss 0.00338745, dev acc 0.8889, dev avg loss 0.289583, throughput 3.16572K wps
[Epoch 128 Batch 30/172] avg loss 0.00341072, throughput 3.13504K wps
[Epoch 128 Batch 60/172] avg loss 0.00351354, throughput 3.03528K wps
[Epoch 128 Batch 90/172] avg loss 0.0030596, throughput 3.35169K wps
[Epoch 128 Batch 120/172] avg loss 0.00374479, throughput 3.07997K wps
[Epoch 128 Batch 150/172] avg loss 0.00308436, throughput 3.78789K wps
Begin Testing...
[Epoch 128] train avg loss 0.00334085, dev acc 0.8899, dev avg loss 0.289658, throughput 3.2519K wps
[Epoch 129 Batch 30/172] avg loss 0.00319927, throughput 2.98377K wps
[Epoch 129 Batch 60/172] avg loss 0.00349544, throughput 2.96087K wps
[Epoch 129 Batch 90/172] avg loss 0.00315246, throughput 3.01574K wps
[Epoch 129 Batch 120/172] avg loss 0.00358489, throughput 3.13963K wps
[Epoch 129 Batch 150/172] avg loss 0.00325204, throughput 3.53344K wps
Begin Testing...
[Epoch 129] train avg loss 0.00337009, dev acc 0.8899, dev avg loss 0.289057, throughput 3.09026K wps
[Epoch 130 Batch 30/172] avg loss 0.00316335, throughput 3.71622K wps
[Epoch 130 Batch 60/172] avg loss 0.00325024, throughput 3.2368K wps
[Epoch 130 Batch 90/172] avg loss 0.00359476, throughput 3.16143K wps
[Epoch 130 Batch 120/172] avg loss 0.00310851, throughput 3.31506K wps
[Epoch 130 Batch 150/172] avg loss 0.00372226, throughput 3.54482K wps
Begin Testing...
[Epoch 130] train avg loss 0.00332556, dev acc 0.8889, dev avg loss 0.29112, throughput 3.33427K wps
[Epoch 131 Batch 30/172] avg loss 0.00314437, throughput 3.21832K wps
[Epoch 131 Batch 60/172] avg loss 0.00357296, throughput 3.20054K wps
[Epoch 131 Batch 90/172] avg loss 0.00326554, throughput 3.12621K wps
[Epoch 131 Batch 120/172] avg loss 0.00348722, throughput 3.03359K wps
[Epoch 131 Batch 150/172] avg loss 0.00308078, throughput 3.54484K wps
Begin Testing...
[Epoch 131] train avg loss 0.00331733, dev acc 0.8910, dev avg loss 0.289277, throughput 3.18599K wps
[Epoch 132 Batch 30/172] avg loss 0.00325831, throughput 3.58427K wps
[Epoch 132 Batch 60/172] avg loss 0.00309546, throughput 2.89914K wps
[Epoch 132 Batch 90/172] avg loss 0.00328082, throughput 2.91233K wps
[Epoch 132 Batch 120/172] avg loss 0.00297116, throughput 3.34741K wps
[Epoch 132 Batch 150/172] avg loss 0.00325304, throughput 2.9202K wps
Begin Testing...
[Epoch 132] train avg loss 0.00325156, dev acc 0.8910, dev avg loss 0.289464, throughput 3.11762K wps
[Epoch 133 Batch 30/172] avg loss 0.00331296, throughput 2.87439K wps
[Epoch 133 Batch 60/172] avg loss 0.00326705, throughput 3.50076K wps
[Epoch 133 Batch 90/172] avg loss 0.00313224, throughput 2.90576K wps
[Epoch 133 Batch 120/172] avg loss 0.00311789, throughput 3.20963K wps
[Epoch 133 Batch 150/172] avg loss 0.00345387, throughput 3.05074K wps
Begin Testing...
[Epoch 133] train avg loss 0.00327178, dev acc 0.8889, dev avg loss 0.290226, throughput 3.12125K wps
[Epoch 134 Batch 30/172] avg loss 0.00358193, throughput 3.48106K wps
[Epoch 134 Batch 60/172] avg loss 0.00318872, throughput 3.25122K wps
[Epoch 134 Batch 90/172] avg loss 0.00326773, throughput 3.30863K wps
[Epoch 134 Batch 120/172] avg loss 0.00282143, throughput 3.41311K wps
[Epoch 134 Batch 150/172] avg loss 0.00341337, throughput 3.40114K wps
Begin Testing...
[Epoch 134] train avg loss 0.00326303, dev acc 0.8920, dev avg loss 0.289317, throughput 3.298K wps
[Epoch 135 Batch 30/172] avg loss 0.00312977, throughput 2.95348K wps
[Epoch 135 Batch 60/172] avg loss 0.0037889, throughput 3.54462K wps
[Epoch 135 Batch 90/172] avg loss 0.00296941, throughput 3.25542K wps
[Epoch 135 Batch 120/172] avg loss 0.00326183, throughput 3.05083K wps
[Epoch 135 Batch 150/172] avg loss 0.00319186, throughput 3.35889K wps
Begin Testing...
[Epoch 135] train avg loss 0.00327173, dev acc 0.8899, dev avg loss 0.291775, throughput 3.21249K wps
[Epoch 136 Batch 30/172] avg loss 0.00323464, throughput 2.86758K wps
[Epoch 136 Batch 60/172] avg loss 0.00344211, throughput 3.27257K wps
[Epoch 136 Batch 90/172] avg loss 0.0032668, throughput 3.70081K wps
[Epoch 136 Batch 120/172] avg loss 0.00324199, throughput 3.24047K wps
[Epoch 136 Batch 150/172] avg loss 0.00315421, throughput 2.91751K wps
Begin Testing...
[Epoch 136] train avg loss 0.00327376, dev acc 0.8920, dev avg loss 0.289259, throughput 3.21715K wps
[Epoch 137 Batch 30/172] avg loss 0.00326354, throughput 3.29439K wps
[Epoch 137 Batch 60/172] avg loss 0.00326909, throughput 3.35054K wps
[Epoch 137 Batch 90/172] avg loss 0.00303402, throughput 3.03863K wps
[Epoch 137 Batch 120/172] avg loss 0.00352298, throughput 2.90155K wps
[Epoch 137 Batch 150/172] avg loss 0.00299489, throughput 3.53647K wps
Begin Testing...
[Epoch 137] train avg loss 0.00319292, dev acc 0.8910, dev avg loss 0.290211, throughput 3.24248K wps
[Epoch 138 Batch 30/172] avg loss 0.00284735, throughput 3.66363K wps
[Epoch 138 Batch 60/172] avg loss 0.00297179, throughput 3.16532K wps
[Epoch 138 Batch 90/172] avg loss 0.00313311, throughput 3.0215K wps
[Epoch 138 Batch 120/172] avg loss 0.00351036, throughput 3.53252K wps
[Epoch 138 Batch 150/172] avg loss 0.00316743, throughput 3.29205K wps
Begin Testing...
[Epoch 138] train avg loss 0.00316724, dev acc 0.8910, dev avg loss 0.290615, throughput 3.37584K wps
[Epoch 139 Batch 30/172] avg loss 0.00309648, throughput 3.42262K wps
[Epoch 139 Batch 60/172] avg loss 0.00319797, throughput 3.24744K wps
[Epoch 139 Batch 90/172] avg loss 0.00312413, throughput 3.67668K wps
[Epoch 139 Batch 120/172] avg loss 0.00331347, throughput 3.14446K wps
[Epoch 139 Batch 150/172] avg loss 0.00343506, throughput 2.95798K wps
Begin Testing...
[Epoch 139] train avg loss 0.0032283, dev acc 0.8931, dev avg loss 0.289442, throughput 3.27815K wps
[Epoch 140 Batch 30/172] avg loss 0.00325824, throughput 3.68911K wps
[Epoch 140 Batch 60/172] avg loss 0.00297454, throughput 3.30197K wps
[Epoch 140 Batch 90/172] avg loss 0.00321118, throughput 3.37165K wps
[Epoch 140 Batch 120/172] avg loss 0.0032978, throughput 3.47555K wps
[Epoch 140 Batch 150/172] avg loss 0.00307837, throughput 3.26646K wps
Begin Testing...
[Epoch 140] train avg loss 0.00312713, dev acc 0.8920, dev avg loss 0.290355, throughput 3.39538K wps
[Epoch 141 Batch 30/172] avg loss 0.00320625, throughput 3.50111K wps
[Epoch 141 Batch 60/172] avg loss 0.00310705, throughput 3.2781K wps
[Epoch 141 Batch 90/172] avg loss 0.00309474, throughput 3.61909K wps
[Epoch 141 Batch 120/172] avg loss 0.00311048, throughput 3.25327K wps
[Epoch 141 Batch 150/172] avg loss 0.00319666, throughput 3.32949K wps
Begin Testing...
[Epoch 141] train avg loss 0.00314825, dev acc 0.8910, dev avg loss 0.290299, throughput 3.36668K wps
[Epoch 142 Batch 30/172] avg loss 0.0034395, throughput 3.22168K wps
[Epoch 142 Batch 60/172] avg loss 0.00283715, throughput 3.41605K wps
[Epoch 142 Batch 90/172] avg loss 0.00323308, throughput 2.97043K wps
[Epoch 142 Batch 120/172] avg loss 0.00310377, throughput 3.12897K wps
[Epoch 142 Batch 150/172] avg loss 0.00279501, throughput 3.38525K wps
Begin Testing...
[Epoch 142] train avg loss 0.00308358, dev acc 0.8899, dev avg loss 0.291152, throughput 3.1778K wps
[Epoch 143 Batch 30/172] avg loss 0.00337836, throughput 2.94977K wps
[Epoch 143 Batch 60/172] avg loss 0.00277336, throughput 2.94651K wps
[Epoch 143 Batch 90/172] avg loss 0.00295286, throughput 3.419K wps
[Epoch 143 Batch 120/172] avg loss 0.0031003, throughput 2.92932K wps
[Epoch 143 Batch 150/172] avg loss 0.00325049, throughput 2.90163K wps
Begin Testing...
[Epoch 143] train avg loss 0.00312015, dev acc 0.8889, dev avg loss 0.291236, throughput 3.0131K wps
[Epoch 144 Batch 30/172] avg loss 0.00322647, throughput 3.60347K wps
[Epoch 144 Batch 60/172] avg loss 0.00310547, throughput 3.23616K wps
[Epoch 144 Batch 90/172] avg loss 0.00286696, throughput 3.2633K wps
[Epoch 144 Batch 120/172] avg loss 0.00330534, throughput 3.35433K wps
[Epoch 144 Batch 150/172] avg loss 0.00293739, throughput 3.11345K wps
Begin Testing...
[Epoch 144] train avg loss 0.0030973, dev acc 0.8910, dev avg loss 0.290433, throughput 3.26425K wps
[Epoch 145 Batch 30/172] avg loss 0.00311704, throughput 2.98154K wps
[Epoch 145 Batch 60/172] avg loss 0.0030117, throughput 3.24934K wps
[Epoch 145 Batch 90/172] avg loss 0.00266867, throughput 3.67798K wps
[Epoch 145 Batch 120/172] avg loss 0.00307894, throughput 3.07808K wps
[Epoch 145 Batch 150/172] avg loss 0.00320724, throughput 3.50954K wps
Begin Testing...
[Epoch 145] train avg loss 0.00306991, dev acc 0.8889, dev avg loss 0.292296, throughput 3.2384K wps
[Epoch 146 Batch 30/172] avg loss 0.00299226, throughput 3.39626K wps
[Epoch 146 Batch 60/172] avg loss 0.00326786, throughput 3.67792K wps
[Epoch 146 Batch 90/172] avg loss 0.00256652, throughput 3.07139K wps
[Epoch 146 Batch 120/172] avg loss 0.00330459, throughput 3.32565K wps
[Epoch 146 Batch 150/172] avg loss 0.00301008, throughput 3.6268K wps
Begin Testing...
[Epoch 146] train avg loss 0.00303765, dev acc 0.8899, dev avg loss 0.291706, throughput 3.43142K wps
[Epoch 147 Batch 30/172] avg loss 0.00309433, throughput 3.31896K wps
[Epoch 147 Batch 60/172] avg loss 0.00314632, throughput 3.29368K wps
[Epoch 147 Batch 90/172] avg loss 0.00299306, throughput 3.72183K wps
[Epoch 147 Batch 120/172] avg loss 0.00267962, throughput 3.13067K wps
[Epoch 147 Batch 150/172] avg loss 0.00316444, throughput 3.14283K wps
Begin Testing...
[Epoch 147] train avg loss 0.0030093, dev acc 0.8899, dev avg loss 0.291461, throughput 3.35201K wps
[Epoch 148 Batch 30/172] avg loss 0.0032827, throughput 3.3544K wps
[Epoch 148 Batch 60/172] avg loss 0.003123, throughput 3.28163K wps
[Epoch 148 Batch 90/172] avg loss 0.00283055, throughput 3.40732K wps
[Epoch 148 Batch 120/172] avg loss 0.0028578, throughput 3.49467K wps
[Epoch 148 Batch 150/172] avg loss 0.00308969, throughput 3.6377K wps
Begin Testing...
[Epoch 148] train avg loss 0.00305443, dev acc 0.8889, dev avg loss 0.292699, throughput 3.46823K wps
[Epoch 149 Batch 30/172] avg loss 0.00321015, throughput 3.53737K wps
[Epoch 149 Batch 60/172] avg loss 0.00284152, throughput 3.22352K wps
[Epoch 149 Batch 90/172] avg loss 0.00287635, throughput 3.32353K wps
[Epoch 149 Batch 120/172] avg loss 0.00293728, throughput 3.4522K wps
[Epoch 149 Batch 150/172] avg loss 0.00317717, throughput 3.07516K wps
Begin Testing...
[Epoch 149] train avg loss 0.00303832, dev acc 0.8899, dev avg loss 0.291466, throughput 3.35932K wps
[Epoch 150 Batch 30/172] avg loss 0.00317808, throughput 2.96678K wps
[Epoch 150 Batch 60/172] avg loss 0.00307452, throughput 3.89948K wps
[Epoch 150 Batch 90/172] avg loss 0.00291867, throughput 4.06657K wps
[Epoch 150 Batch 120/172] avg loss 0.00302669, throughput 3.24278K wps
[Epoch 150 Batch 150/172] avg loss 0.00274182, throughput 3.53051K wps
Begin Testing...
[Epoch 150] train avg loss 0.00301111, dev acc 0.8920, dev avg loss 0.291981, throughput 3.52294K wps
[Epoch 151 Batch 30/172] avg loss 0.00339581, throughput 4.04078K wps
[Epoch 151 Batch 60/172] avg loss 0.00285335, throughput 3.14361K wps
[Epoch 151 Batch 90/172] avg loss 0.00268217, throughput 3.07896K wps
[Epoch 151 Batch 120/172] avg loss 0.0027933, throughput 2.9889K wps
[Epoch 151 Batch 150/172] avg loss 0.00295101, throughput 3.53228K wps
Begin Testing...
[Epoch 151] train avg loss 0.00295571, dev acc 0.8920, dev avg loss 0.292217, throughput 3.28825K wps
[Epoch 152 Batch 30/172] avg loss 0.00295356, throughput 3.14323K wps
[Epoch 152 Batch 60/172] avg loss 0.00302608, throughput 3.23592K wps
[Epoch 152 Batch 90/172] avg loss 0.00301718, throughput 3.35823K wps
[Epoch 152 Batch 120/172] avg loss 0.00295168, throughput 3.1201K wps
[Epoch 152 Batch 150/172] avg loss 0.00334856, throughput 3.14499K wps
Begin Testing...
[Epoch 152] train avg loss 0.00304861, dev acc 0.8899, dev avg loss 0.292437, throughput 3.2887K wps
[Epoch 153 Batch 30/172] avg loss 0.0030581, throughput 3.64842K wps
[Epoch 153 Batch 60/172] avg loss 0.00289675, throughput 3.01395K wps
[Epoch 153 Batch 90/172] avg loss 0.00298986, throughput 3.28081K wps
[Epoch 153 Batch 120/172] avg loss 0.00308858, throughput 3.72225K wps
[Epoch 153 Batch 150/172] avg loss 0.00327312, throughput 3.36157K wps
Begin Testing...
[Epoch 153] train avg loss 0.00301314, dev acc 0.8931, dev avg loss 0.293471, throughput 3.35797K wps
[Epoch 154 Batch 30/172] avg loss 0.00266955, throughput 3.51002K wps
[Epoch 154 Batch 60/172] avg loss 0.00285542, throughput 3.17907K wps
[Epoch 154 Batch 90/172] avg loss 0.00319862, throughput 2.92501K wps
[Epoch 154 Batch 120/172] avg loss 0.00276641, throughput 3.12804K wps
[Epoch 154 Batch 150/172] avg loss 0.00325079, throughput 2.93737K wps
Begin Testing...
[Epoch 154] train avg loss 0.00299972, dev acc 0.8899, dev avg loss 0.292084, throughput 3.12061K wps
[Epoch 155 Batch 30/172] avg loss 0.00308295, throughput 2.88997K wps
[Epoch 155 Batch 60/172] avg loss 0.00287574, throughput 3.00438K wps
[Epoch 155 Batch 90/172] avg loss 0.00300968, throughput 3.28822K wps
[Epoch 155 Batch 120/172] avg loss 0.00261484, throughput 3.06788K wps
[Epoch 155 Batch 150/172] avg loss 0.00326494, throughput 2.96123K wps
Begin Testing...
[Epoch 155] train avg loss 0.00294681, dev acc 0.8899, dev avg loss 0.292648, throughput 3.07936K wps
[Epoch 156 Batch 30/172] avg loss 0.00318373, throughput 3.55552K wps
[Epoch 156 Batch 60/172] avg loss 0.00283278, throughput 2.9157K wps
[Epoch 156 Batch 90/172] avg loss 0.00315793, throughput 3.55764K wps
[Epoch 156 Batch 120/172] avg loss 0.00285444, throughput 3.06547K wps
[Epoch 156 Batch 150/172] avg loss 0.00266357, throughput 2.9022K wps
Begin Testing...
[Epoch 156] train avg loss 0.00295725, dev acc 0.8899, dev avg loss 0.29216, throughput 3.21929K wps
[Epoch 157 Batch 30/172] avg loss 0.00283183, throughput 3.10399K wps
[Epoch 157 Batch 60/172] avg loss 0.00312352, throughput 4.23185K wps
[Epoch 157 Batch 90/172] avg loss 0.00272667, throughput 2.89914K wps
[Epoch 157 Batch 120/172] avg loss 0.00297088, throughput 3.12218K wps
[Epoch 157 Batch 150/172] avg loss 0.0028345, throughput 3.59308K wps
Begin Testing...
[Epoch 157] train avg loss 0.00295476, dev acc 0.8910, dev avg loss 0.292349, throughput 3.27247K wps
[Epoch 158 Batch 30/172] avg loss 0.00312136, throughput 2.85871K wps
[Epoch 158 Batch 60/172] avg loss 0.0026299, throughput 3.41261K wps
[Epoch 158 Batch 90/172] avg loss 0.00284862, throughput 3.11157K wps
[Epoch 158 Batch 120/172] avg loss 0.00285369, throughput 3.26537K wps
[Epoch 158 Batch 150/172] avg loss 0.00314286, throughput 3.21123K wps
Begin Testing...
[Epoch 158] train avg loss 0.0029354, dev acc 0.8899, dev avg loss 0.293355, throughput 3.12449K wps
[Epoch 159 Batch 30/172] avg loss 0.00271107, throughput 3.31808K wps
[Epoch 159 Batch 60/172] avg loss 0.00274376, throughput 3.3659K wps
[Epoch 159 Batch 90/172] avg loss 0.00311421, throughput 2.91556K wps
[Epoch 159 Batch 120/172] avg loss 0.00299871, throughput 2.91621K wps
[Epoch 159 Batch 150/172] avg loss 0.00253182, throughput 3.75089K wps
Begin Testing...
[Epoch 159] train avg loss 0.00286866, dev acc 0.8920, dev avg loss 0.293604, throughput 3.24303K wps
[Epoch 160 Batch 30/172] avg loss 0.00308377, throughput 3.28133K wps
[Epoch 160 Batch 60/172] avg loss 0.00279056, throughput 3.18663K wps
[Epoch 160 Batch 90/172] avg loss 0.00251737, throughput 3.54788K wps
[Epoch 160 Batch 120/172] avg loss 0.00347043, throughput 3.15142K wps
[Epoch 160 Batch 150/172] avg loss 0.00288413, throughput 3.49062K wps
Begin Testing...
[Epoch 160] train avg loss 0.00292504, dev acc 0.8910, dev avg loss 0.294365, throughput 3.29597K wps
[Epoch 161 Batch 30/172] avg loss 0.00263732, throughput 3.1819K wps
[Epoch 161 Batch 60/172] avg loss 0.00263023, throughput 2.93758K wps
[Epoch 161 Batch 90/172] avg loss 0.00285946, throughput 3.12224K wps
[Epoch 161 Batch 120/172] avg loss 0.00329394, throughput 3.37832K wps
[Epoch 161 Batch 150/172] avg loss 0.00294131, throughput 3.24567K wps
Begin Testing...
[Epoch 161] train avg loss 0.00283032, dev acc 0.8899, dev avg loss 0.293539, throughput 3.12957K wps
[Epoch 162 Batch 30/172] avg loss 0.00288744, throughput 3.65233K wps
[Epoch 162 Batch 60/172] avg loss 0.00293193, throughput 2.99006K wps
[Epoch 162 Batch 90/172] avg loss 0.00319725, throughput 4.29917K wps
[Epoch 162 Batch 120/172] avg loss 0.00266984, throughput 3.27365K wps
[Epoch 162 Batch 150/172] avg loss 0.00279628, throughput 2.89689K wps
Begin Testing...
[Epoch 162] train avg loss 0.00288238, dev acc 0.8910, dev avg loss 0.293805, throughput 3.28543K wps
[Epoch 163 Batch 30/172] avg loss 0.00263141, throughput 3.19723K wps
[Epoch 163 Batch 60/172] avg loss 0.00290293, throughput 2.95713K wps
[Epoch 163 Batch 90/172] avg loss 0.00245708, throughput 3.58796K wps
[Epoch 163 Batch 120/172] avg loss 0.00318131, throughput 3.48106K wps
[Epoch 163 Batch 150/172] avg loss 0.00290785, throughput 3.27492K wps
Begin Testing...
[Epoch 163] train avg loss 0.00281367, dev acc 0.8899, dev avg loss 0.293394, throughput 3.28621K wps
[Epoch 164 Batch 30/172] avg loss 0.00269554, throughput 2.88326K wps
[Epoch 164 Batch 60/172] avg loss 0.00269107, throughput 3.39199K wps
[Epoch 164 Batch 90/172] avg loss 0.0025846, throughput 2.93301K wps
[Epoch 164 Batch 120/172] avg loss 0.00328373, throughput 3.27829K wps
[Epoch 164 Batch 150/172] avg loss 0.00288613, throughput 3.49334K wps
Begin Testing...
[Epoch 164] train avg loss 0.00281133, dev acc 0.8910, dev avg loss 0.293899, throughput 3.19717K wps
[Epoch 165 Batch 30/172] avg loss 0.00257807, throughput 3.12216K wps
[Epoch 165 Batch 60/172] avg loss 0.00290161, throughput 3.91743K wps
[Epoch 165 Batch 90/172] avg loss 0.00261715, throughput 3.04305K wps
[Epoch 165 Batch 120/172] avg loss 0.00271715, throughput 2.98927K wps
[Epoch 165 Batch 150/172] avg loss 0.00305601, throughput 3.10768K wps
Begin Testing...
[Epoch 165] train avg loss 0.00281403, dev acc 0.8910, dev avg loss 0.293813, throughput 3.19845K wps
[Epoch 166 Batch 30/172] avg loss 0.00282884, throughput 3.18938K wps
[Epoch 166 Batch 60/172] avg loss 0.00307284, throughput 3.04847K wps
[Epoch 166 Batch 90/172] avg loss 0.00253494, throughput 2.97244K wps
[Epoch 166 Batch 120/172] avg loss 0.00290433, throughput 3.2249K wps
[Epoch 166 Batch 150/172] avg loss 0.00278593, throughput 3.6331K wps
Begin Testing...
[Epoch 166] train avg loss 0.00278765, dev acc 0.8910, dev avg loss 0.294325, throughput 3.21325K wps
[Epoch 167 Batch 30/172] avg loss 0.00278197, throughput 3.97212K wps
[Epoch 167 Batch 60/172] avg loss 0.0030124, throughput 2.95656K wps
[Epoch 167 Batch 90/172] avg loss 0.00272877, throughput 2.89303K wps
[Epoch 167 Batch 120/172] avg loss 0.00287364, throughput 3.57502K wps
[Epoch 167 Batch 150/172] avg loss 0.0028844, throughput 3.50488K wps
Begin Testing...
[Epoch 167] train avg loss 0.00280889, dev acc 0.8920, dev avg loss 0.297759, throughput 3.3733K wps
[Epoch 168 Batch 30/172] avg loss 0.00292751, throughput 2.99969K wps
[Epoch 168 Batch 60/172] avg loss 0.00243488, throughput 3.43207K wps
[Epoch 168 Batch 90/172] avg loss 0.00267188, throughput 3.46389K wps
[Epoch 168 Batch 120/172] avg loss 0.003017, throughput 3.84243K wps
[Epoch 168 Batch 150/172] avg loss 0.00313536, throughput 3.19116K wps
Begin Testing...
[Epoch 168] train avg loss 0.00278391, dev acc 0.8899, dev avg loss 0.294544, throughput 3.33381K wps
[Epoch 169 Batch 30/172] avg loss 0.00290737, throughput 3.13127K wps
[Epoch 169 Batch 60/172] avg loss 0.00295048, throughput 3.53691K wps
[Epoch 169 Batch 90/172] avg loss 0.00282558, throughput 3.61441K wps
[Epoch 169 Batch 120/172] avg loss 0.00267688, throughput 3.36952K wps
[Epoch 169 Batch 150/172] avg loss 0.00275968, throughput 3.28972K wps
Begin Testing...
[Epoch 169] train avg loss 0.00278626, dev acc 0.8920, dev avg loss 0.294422, throughput 3.34673K wps
[Epoch 170 Batch 30/172] avg loss 0.00251881, throughput 3.21053K wps
[Epoch 170 Batch 60/172] avg loss 0.00275228, throughput 2.97344K wps
[Epoch 170 Batch 90/172] avg loss 0.00282915, throughput 3.46879K wps
[Epoch 170 Batch 120/172] avg loss 0.00314482, throughput 4.00031K wps
[Epoch 170 Batch 150/172] avg loss 0.00282164, throughput 2.94442K wps
Begin Testing...
[Epoch 170] train avg loss 0.0028182, dev acc 0.8910, dev avg loss 0.295253, throughput 3.21801K wps
[Epoch 171 Batch 30/172] avg loss 0.00284667, throughput 3.02759K wps
[Epoch 171 Batch 60/172] avg loss 0.00248564, throughput 2.94641K wps
[Epoch 171 Batch 90/172] avg loss 0.002959, throughput 3.49458K wps
[Epoch 171 Batch 120/172] avg loss 0.00297266, throughput 3.06553K wps
[Epoch 171 Batch 150/172] avg loss 0.00284402, throughput 2.91724K wps
Begin Testing...
[Epoch 171] train avg loss 0.00283821, dev acc 0.8910, dev avg loss 0.293925, throughput 3.06676K wps
[Epoch 172 Batch 30/172] avg loss 0.00307228, throughput 3.63801K wps
[Epoch 172 Batch 60/172] avg loss 0.00314168, throughput 3.9016K wps
[Epoch 172 Batch 90/172] avg loss 0.00268749, throughput 2.9646K wps
[Epoch 172 Batch 120/172] avg loss 0.00240147, throughput 3.39904K wps
[Epoch 172 Batch 150/172] avg loss 0.00258137, throughput 3.23385K wps
Begin Testing...
[Epoch 172] train avg loss 0.0027318, dev acc 0.8920, dev avg loss 0.298744, throughput 3.37246K wps
[Epoch 173 Batch 30/172] avg loss 0.00302416, throughput 3.16207K wps
[Epoch 173 Batch 60/172] avg loss 0.00272155, throughput 3.06335K wps
[Epoch 173 Batch 90/172] avg loss 0.00263447, throughput 3.16561K wps
[Epoch 173 Batch 120/172] avg loss 0.00260407, throughput 3.84861K wps
[Epoch 173 Batch 150/172] avg loss 0.00267569, throughput 3.94722K wps
Begin Testing...
[Epoch 173] train avg loss 0.00275574, dev acc 0.8920, dev avg loss 0.298766, throughput 3.3651K wps
[Epoch 174 Batch 30/172] avg loss 0.00301485, throughput 3.30169K wps
[Epoch 174 Batch 60/172] avg loss 0.00286091, throughput 3.00391K wps
[Epoch 174 Batch 90/172] avg loss 0.00239318, throughput 2.95098K wps
[Epoch 174 Batch 120/172] avg loss 0.00266045, throughput 2.96664K wps
[Epoch 174 Batch 150/172] avg loss 0.0028554, throughput 3.16739K wps
Begin Testing...
[Epoch 174] train avg loss 0.00270822, dev acc 0.8910, dev avg loss 0.295883, throughput 3.09305K wps
[Epoch 175 Batch 30/172] avg loss 0.00289208, throughput 3.10115K wps
[Epoch 175 Batch 60/172] avg loss 0.00250703, throughput 3.35115K wps
[Epoch 175 Batch 90/172] avg loss 0.00296, throughput 3.03237K wps
[Epoch 175 Batch 120/172] avg loss 0.00252593, throughput 3.23748K wps
[Epoch 175 Batch 150/172] avg loss 0.00273408, throughput 2.92486K wps
Begin Testing...
[Epoch 175] train avg loss 0.00269957, dev acc 0.8899, dev avg loss 0.300304, throughput 3.11121K wps
[Epoch 176 Batch 30/172] avg loss 0.00260725, throughput 3.25791K wps
[Epoch 176 Batch 60/172] avg loss 0.00253681, throughput 3.59621K wps
[Epoch 176 Batch 90/172] avg loss 0.00263771, throughput 2.93935K wps
[Epoch 176 Batch 120/172] avg loss 0.00290226, throughput 3.04311K wps
[Epoch 176 Batch 150/172] avg loss 0.00256718, throughput 3.07245K wps
Begin Testing...
[Epoch 176] train avg loss 0.00265451, dev acc 0.8910, dev avg loss 0.296531, throughput 3.22928K wps
[Epoch 177 Batch 30/172] avg loss 0.00257561, throughput 3.15605K wps
[Epoch 177 Batch 60/172] avg loss 0.00240491, throughput 3.08911K wps
[Epoch 177 Batch 90/172] avg loss 0.00250677, throughput 3.1446K wps
[Epoch 177 Batch 120/172] avg loss 0.00309183, throughput 3.25228K wps
[Epoch 177 Batch 150/172] avg loss 0.00285328, throughput 3.26311K wps
Begin Testing...
[Epoch 177] train avg loss 0.00267813, dev acc 0.8899, dev avg loss 0.297486, throughput 3.20061K wps
[Epoch 178 Batch 30/172] avg loss 0.00256015, throughput 3.86886K wps
[Epoch 178 Batch 60/172] avg loss 0.00271976, throughput 3.3895K wps
[Epoch 178 Batch 90/172] avg loss 0.00269212, throughput 3.79714K wps
[Epoch 178 Batch 120/172] avg loss 0.00287677, throughput 4.0523K wps
[Epoch 178 Batch 150/172] avg loss 0.00276961, throughput 3.22995K wps
Begin Testing...
[Epoch 178] train avg loss 0.00270548, dev acc 0.8899, dev avg loss 0.295625, throughput 3.57745K wps
[Epoch 179 Batch 30/172] avg loss 0.00269704, throughput 2.95797K wps
[Epoch 179 Batch 60/172] avg loss 0.00259405, throughput 3.04341K wps
[Epoch 179 Batch 90/172] avg loss 0.002871, throughput 2.95686K wps
[Epoch 179 Batch 120/172] avg loss 0.00278483, throughput 3.51891K wps
[Epoch 179 Batch 150/172] avg loss 0.00283577, throughput 3.83605K wps
Begin Testing...
[Epoch 179] train avg loss 0.00269691, dev acc 0.8910, dev avg loss 0.296544, throughput 3.32679K wps
[Epoch 180 Batch 30/172] avg loss 0.00243765, throughput 3.92461K wps
[Epoch 180 Batch 60/172] avg loss 0.00271557, throughput 4.09433K wps
[Epoch 180 Batch 90/172] avg loss 0.00292907, throughput 3.24393K wps
[Epoch 180 Batch 120/172] avg loss 0.0026773, throughput 3.48367K wps
[Epoch 180 Batch 150/172] avg loss 0.00230558, throughput 3.14822K wps
Begin Testing...
[Epoch 180] train avg loss 0.00262899, dev acc 0.8889, dev avg loss 0.296201, throughput 3.50672K wps
[Epoch 181 Batch 30/172] avg loss 0.00248577, throughput 3.17318K wps
[Epoch 181 Batch 60/172] avg loss 0.0025218, throughput 3.16469K wps
[Epoch 181 Batch 90/172] avg loss 0.00258134, throughput 3.07396K wps
[Epoch 181 Batch 120/172] avg loss 0.00266613, throughput 3.06677K wps
[Epoch 181 Batch 150/172] avg loss 0.00268424, throughput 3.61277K wps
Begin Testing...
[Epoch 181] train avg loss 0.00262404, dev acc 0.8910, dev avg loss 0.296323, throughput 3.27211K wps
[Epoch 182 Batch 30/172] avg loss 0.0027117, throughput 3.24283K wps
[Epoch 182 Batch 60/172] avg loss 0.00247698, throughput 3.47542K wps
[Epoch 182 Batch 90/172] avg loss 0.00249488, throughput 3.29062K wps
[Epoch 182 Batch 120/172] avg loss 0.00288238, throughput 3.00876K wps
[Epoch 182 Batch 150/172] avg loss 0.00264854, throughput 3.00668K wps
Begin Testing...
[Epoch 182] train avg loss 0.00262658, dev acc 0.8899, dev avg loss 0.29691, throughput 3.23026K wps
[Epoch 183 Batch 30/172] avg loss 0.00248164, throughput 3.49233K wps
[Epoch 183 Batch 60/172] avg loss 0.0024414, throughput 3.06965K wps
[Epoch 183 Batch 90/172] avg loss 0.00281739, throughput 2.91924K wps
[Epoch 183 Batch 120/172] avg loss 0.00257618, throughput 2.91144K wps
[Epoch 183 Batch 150/172] avg loss 0.00227971, throughput 3.01605K wps
Begin Testing...
[Epoch 183] train avg loss 0.00260519, dev acc 0.8931, dev avg loss 0.298876, throughput 3.08597K wps
[Epoch 184 Batch 30/172] avg loss 0.00249239, throughput 3.06213K wps
[Epoch 184 Batch 60/172] avg loss 0.00263471, throughput 3.31727K wps
[Epoch 184 Batch 90/172] avg loss 0.00244707, throughput 3.08105K wps
[Epoch 184 Batch 120/172] avg loss 0.00266697, throughput 3.6955K wps
[Epoch 184 Batch 150/172] avg loss 0.00269057, throughput 3.20447K wps
Begin Testing...
[Epoch 184] train avg loss 0.00260685, dev acc 0.8941, dev avg loss 0.296353, throughput 3.21807K wps
[Epoch 185 Batch 30/172] avg loss 0.00270711, throughput 2.9666K wps
[Epoch 185 Batch 60/172] avg loss 0.00245071, throughput 3.02164K wps
[Epoch 185 Batch 90/172] avg loss 0.00258655, throughput 3.23519K wps
[Epoch 185 Batch 120/172] avg loss 0.00265815, throughput 3.26707K wps
[Epoch 185 Batch 150/172] avg loss 0.00272763, throughput 3.24821K wps
Begin Testing...
[Epoch 185] train avg loss 0.00260199, dev acc 0.8941, dev avg loss 0.296939, throughput 3.18433K wps
[Epoch 186 Batch 30/172] avg loss 0.00282391, throughput 3.02103K wps
[Epoch 186 Batch 60/172] avg loss 0.00298241, throughput 3.18725K wps
[Epoch 186 Batch 90/172] avg loss 0.00245575, throughput 3.26565K wps
[Epoch 186 Batch 120/172] avg loss 0.00239998, throughput 3.00842K wps
[Epoch 186 Batch 150/172] avg loss 0.00195919, throughput 3.52521K wps
Begin Testing...
[Epoch 186] train avg loss 0.00252108, dev acc 0.8931, dev avg loss 0.300512, throughput 3.20313K wps
[Epoch 187 Batch 30/172] avg loss 0.00242485, throughput 3.27591K wps
[Epoch 187 Batch 60/172] avg loss 0.00267333, throughput 3.69036K wps
[Epoch 187 Batch 90/172] avg loss 0.0027352, throughput 2.98192K wps
[Epoch 187 Batch 120/172] avg loss 0.00248438, throughput 3.25284K wps
[Epoch 187 Batch 150/172] avg loss 0.0028435, throughput 3.46846K wps
Begin Testing...
[Epoch 187] train avg loss 0.00260065, dev acc 0.8931, dev avg loss 0.297826, throughput 3.29969K wps
[Epoch 188 Batch 30/172] avg loss 0.00237086, throughput 3.6485K wps
[Epoch 188 Batch 60/172] avg loss 0.00264989, throughput 3.27579K wps
[Epoch 188 Batch 90/172] avg loss 0.0024878, throughput 3.32364K wps
[Epoch 188 Batch 120/172] avg loss 0.00245124, throughput 3.52155K wps
[Epoch 188 Batch 150/172] avg loss 0.00234912, throughput 3.05039K wps
Begin Testing...
[Epoch 188] train avg loss 0.00251808, dev acc 0.8910, dev avg loss 0.303431, throughput 3.30852K wps
[Epoch 189 Batch 30/172] avg loss 0.00234412, throughput 3.52755K wps
[Epoch 189 Batch 60/172] avg loss 0.0024844, throughput 3.12029K wps
[Epoch 189 Batch 90/172] avg loss 0.00247368, throughput 3.31362K wps
[Epoch 189 Batch 120/172] avg loss 0.00278242, throughput 2.85745K wps
[Epoch 189 Batch 150/172] avg loss 0.00252584, throughput 3.34615K wps
Begin Testing...
[Epoch 189] train avg loss 0.00251581, dev acc 0.8941, dev avg loss 0.300438, throughput 3.17877K wps
[Epoch 190 Batch 30/172] avg loss 0.00240038, throughput 2.89513K wps
[Epoch 190 Batch 60/172] avg loss 0.00238822, throughput 3.09009K wps
[Epoch 190 Batch 90/172] avg loss 0.00233777, throughput 2.90945K wps
[Epoch 190 Batch 120/172] avg loss 0.0029507, throughput 3.18464K wps
[Epoch 190 Batch 150/172] avg loss 0.00248547, throughput 2.94032K wps
Begin Testing...
[Epoch 190] train avg loss 0.0024976, dev acc 0.8889, dev avg loss 0.297395, throughput 3.07082K wps
[Epoch 191 Batch 30/172] avg loss 0.00249262, throughput 3.50568K wps
[Epoch 191 Batch 60/172] avg loss 0.0023261, throughput 2.99533K wps
[Epoch 191 Batch 90/172] avg loss 0.0022611, throughput 3.27538K wps
[Epoch 191 Batch 120/172] avg loss 0.00249736, throughput 3.00266K wps
[Epoch 191 Batch 150/172] avg loss 0.00279966, throughput 3.62739K wps
Begin Testing...
[Epoch 191] train avg loss 0.0024803, dev acc 0.8920, dev avg loss 0.299272, throughput 3.26663K wps
[Epoch 192 Batch 30/172] avg loss 0.00231962, throughput 3.46287K wps
[Epoch 192 Batch 60/172] avg loss 0.00274206, throughput 3.17431K wps
[Epoch 192 Batch 90/172] avg loss 0.00221863, throughput 3.66846K wps
[Epoch 192 Batch 120/172] avg loss 0.00250718, throughput 3.16223K wps
[Epoch 192 Batch 150/172] avg loss 0.00264402, throughput 3.89041K wps
Begin Testing...
[Epoch 192] train avg loss 0.00249112, dev acc 0.8920, dev avg loss 0.300423, throughput 3.39837K wps
[Epoch 193 Batch 30/172] avg loss 0.00246641, throughput 2.96577K wps
[Epoch 193 Batch 60/172] avg loss 0.00246093, throughput 3.35355K wps
[Epoch 193 Batch 90/172] avg loss 0.00245235, throughput 3.27772K wps
[Epoch 193 Batch 120/172] avg loss 0.00248942, throughput 3.03733K wps
[Epoch 193 Batch 150/172] avg loss 0.00266336, throughput 3.6594K wps
Begin Testing...
[Epoch 193] train avg loss 0.00249367, dev acc 0.8910, dev avg loss 0.297774, throughput 3.211K wps
[Epoch 194 Batch 30/172] avg loss 0.00217138, throughput 3.22101K wps
[Epoch 194 Batch 60/172] avg loss 0.0026463, throughput 3.36893K wps
[Epoch 194 Batch 90/172] avg loss 0.00257419, throughput 3.18644K wps
[Epoch 194 Batch 120/172] avg loss 0.00248401, throughput 3.25369K wps
[Epoch 194 Batch 150/172] avg loss 0.00255222, throughput 3.94875K wps
Begin Testing...
[Epoch 194] train avg loss 0.00250226, dev acc 0.8910, dev avg loss 0.301828, throughput 3.38423K wps
[Epoch 195 Batch 30/172] avg loss 0.00293057, throughput 2.89603K wps
[Epoch 195 Batch 60/172] avg loss 0.00246967, throughput 3.25986K wps
[Epoch 195 Batch 90/172] avg loss 0.00242639, throughput 3.07963K wps
[Epoch 195 Batch 120/172] avg loss 0.00230677, throughput 3.10687K wps
[Epoch 195 Batch 150/172] avg loss 0.0023421, throughput 3.00596K wps
Begin Testing...
[Epoch 195] train avg loss 0.00248857, dev acc 0.8910, dev avg loss 0.299809, throughput 3.03916K wps
[Epoch 196 Batch 30/172] avg loss 0.00230136, throughput 3.21603K wps
[Epoch 196 Batch 60/172] avg loss 0.00281014, throughput 3.5887K wps
[Epoch 196 Batch 90/172] avg loss 0.00252801, throughput 3.10233K wps
[Epoch 196 Batch 120/172] avg loss 0.00248179, throughput 3.10339K wps
[Epoch 196 Batch 150/172] avg loss 0.00211283, throughput 2.96246K wps
Begin Testing...
[Epoch 196] train avg loss 0.00242902, dev acc 0.8931, dev avg loss 0.299584, throughput 3.19454K wps
[Epoch 197 Batch 30/172] avg loss 0.00262413, throughput 3.48726K wps
[Epoch 197 Batch 60/172] avg loss 0.00228802, throughput 3.30937K wps
[Epoch 197 Batch 90/172] avg loss 0.00251026, throughput 2.96941K wps
[Epoch 197 Batch 120/172] avg loss 0.00248218, throughput 3.01242K wps
[Epoch 197 Batch 150/172] avg loss 0.0024172, throughput 3.12017K wps
Begin Testing...
[Epoch 197] train avg loss 0.00243685, dev acc 0.8931, dev avg loss 0.300463, throughput 3.21417K wps
[Epoch 198 Batch 30/172] avg loss 0.00262292, throughput 3.0474K wps
[Epoch 198 Batch 60/172] avg loss 0.00261099, throughput 3.15333K wps
[Epoch 198 Batch 90/172] avg loss 0.00218366, throughput 3.30387K wps
[Epoch 198 Batch 120/172] avg loss 0.00252439, throughput 3.47563K wps
[Epoch 198 Batch 150/172] avg loss 0.00246666, throughput 3.00278K wps
Begin Testing...
[Epoch 198] train avg loss 0.00247386, dev acc 0.8931, dev avg loss 0.301131, throughput 3.17499K wps
[Epoch 199 Batch 30/172] avg loss 0.00261662, throughput 3.09712K wps
[Epoch 199 Batch 60/172] avg loss 0.00252633, throughput 3.41778K wps
[Epoch 199 Batch 90/172] avg loss 0.00211397, throughput 3.70489K wps
[Epoch 199 Batch 120/172] avg loss 0.00277627, throughput 3.22294K wps
[Epoch 199 Batch 150/172] avg loss 0.00215792, throughput 3.11013K wps
Begin Testing...
[Epoch 199] train avg loss 0.00243091, dev acc 0.8878, dev avg loss 0.299206, throughput 3.28614K wps
Test loss 0.248095, test acc 0.8981
Total time cost 407.97s
[Epoch 0 Batch 30/172] avg loss 0.0131428, throughput 3.08623K wps
[Epoch 0 Batch 60/172] avg loss 0.0124763, throughput 3.49416K wps
[Epoch 0 Batch 90/172] avg loss 0.0123138, throughput 3.24341K wps
[Epoch 0 Batch 120/172] avg loss 0.0120415, throughput 2.98226K wps
[Epoch 0 Batch 150/172] avg loss 0.0118927, throughput 3.21373K wps
Begin Testing...
[Epoch 0] train avg loss 0.0123915, dev acc 0.7044, dev avg loss 0.590419, throughput 3.22265K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.011997, throughput 3.36457K wps
[Epoch 1 Batch 60/172] avg loss 0.0119977, throughput 2.89655K wps
[Epoch 1 Batch 90/172] avg loss 0.0119651, throughput 2.96021K wps
[Epoch 1 Batch 120/172] avg loss 0.0118177, throughput 3.3736K wps
[Epoch 1 Batch 150/172] avg loss 0.0120887, throughput 3.35866K wps
Begin Testing...
[Epoch 1] train avg loss 0.012029, dev acc 0.7044, dev avg loss 0.580617, throughput 3.16863K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0118603, throughput 3.23734K wps
[Epoch 2 Batch 60/172] avg loss 0.0118252, throughput 3.11717K wps
[Epoch 2 Batch 90/172] avg loss 0.0115846, throughput 3.49462K wps
[Epoch 2 Batch 120/172] avg loss 0.0116833, throughput 2.91054K wps
[Epoch 2 Batch 150/172] avg loss 0.0116544, throughput 3.24747K wps
Begin Testing...
[Epoch 2] train avg loss 0.0117584, dev acc 0.7044, dev avg loss 0.565489, throughput 3.18143K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0118641, throughput 3.01258K wps
[Epoch 3 Batch 60/172] avg loss 0.0113308, throughput 3.07072K wps
[Epoch 3 Batch 90/172] avg loss 0.0114321, throughput 3.17722K wps
[Epoch 3 Batch 120/172] avg loss 0.0112772, throughput 3.20628K wps
[Epoch 3 Batch 150/172] avg loss 0.0111246, throughput 3.47298K wps
Begin Testing...
[Epoch 3] train avg loss 0.011431, dev acc 0.7044, dev avg loss 0.548799, throughput 3.18827K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0114485, throughput 3.03509K wps
[Epoch 4 Batch 60/172] avg loss 0.011088, throughput 3.16863K wps
[Epoch 4 Batch 90/172] avg loss 0.0109684, throughput 3.34506K wps
[Epoch 4 Batch 120/172] avg loss 0.0109612, throughput 3.02195K wps
[Epoch 4 Batch 150/172] avg loss 0.011054, throughput 3.4119K wps
Begin Testing...
[Epoch 4] train avg loss 0.0110775, dev acc 0.7149, dev avg loss 0.532043, throughput 3.15726K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0109185, throughput 2.98112K wps
[Epoch 5 Batch 60/172] avg loss 0.0108649, throughput 3.49155K wps
[Epoch 5 Batch 90/172] avg loss 0.0109532, throughput 3.03645K wps
[Epoch 5 Batch 120/172] avg loss 0.0105741, throughput 3.0677K wps
[Epoch 5 Batch 150/172] avg loss 0.0103332, throughput 3.22942K wps
Begin Testing...
[Epoch 5] train avg loss 0.0107121, dev acc 0.7338, dev avg loss 0.512116, throughput 3.1897K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0103806, throughput 2.98687K wps
[Epoch 6 Batch 60/172] avg loss 0.0105165, throughput 3.05742K wps
[Epoch 6 Batch 90/172] avg loss 0.0102628, throughput 3.06382K wps
[Epoch 6 Batch 120/172] avg loss 0.0100482, throughput 3.46841K wps
[Epoch 6 Batch 150/172] avg loss 0.0101476, throughput 3.39929K wps
Begin Testing...
[Epoch 6] train avg loss 0.0102398, dev acc 0.7715, dev avg loss 0.490342, throughput 3.2003K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00979837, throughput 3.42385K wps
[Epoch 7 Batch 60/172] avg loss 0.00984624, throughput 3.56092K wps
[Epoch 7 Batch 90/172] avg loss 0.0099196, throughput 3.57677K wps
[Epoch 7 Batch 120/172] avg loss 0.00977563, throughput 2.9121K wps
[Epoch 7 Batch 150/172] avg loss 0.00950556, throughput 2.92458K wps
Begin Testing...
[Epoch 7] train avg loss 0.0097611, dev acc 0.8029, dev avg loss 0.469245, throughput 3.31135K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00928619, throughput 3.08066K wps
[Epoch 8 Batch 60/172] avg loss 0.00960423, throughput 3.24201K wps
[Epoch 8 Batch 90/172] avg loss 0.00935321, throughput 2.95578K wps
[Epoch 8 Batch 120/172] avg loss 0.00892197, throughput 3.01886K wps
[Epoch 8 Batch 150/172] avg loss 0.00920249, throughput 3.12961K wps
Begin Testing...
[Epoch 8] train avg loss 0.00926204, dev acc 0.8312, dev avg loss 0.446926, throughput 3.11742K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00892272, throughput 3.30656K wps
[Epoch 9 Batch 60/172] avg loss 0.00888036, throughput 3.03134K wps
[Epoch 9 Batch 90/172] avg loss 0.00902597, throughput 3.21198K wps
[Epoch 9 Batch 120/172] avg loss 0.00864465, throughput 3.76765K wps
[Epoch 9 Batch 150/172] avg loss 0.00858364, throughput 3.08886K wps
Begin Testing...
[Epoch 9] train avg loss 0.00880719, dev acc 0.8312, dev avg loss 0.423132, throughput 3.34708K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00841892, throughput 3.03964K wps
[Epoch 10 Batch 60/172] avg loss 0.00795692, throughput 3.93454K wps
[Epoch 10 Batch 90/172] avg loss 0.00848305, throughput 3.4744K wps
[Epoch 10 Batch 120/172] avg loss 0.00832743, throughput 3.20656K wps
[Epoch 10 Batch 150/172] avg loss 0.0083117, throughput 3.29918K wps
Begin Testing...
[Epoch 10] train avg loss 0.00831999, dev acc 0.8585, dev avg loss 0.405509, throughput 3.39361K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00796917, throughput 3.05132K wps
[Epoch 11 Batch 60/172] avg loss 0.00798281, throughput 3.71944K wps
[Epoch 11 Batch 90/172] avg loss 0.00780722, throughput 3.41255K wps
[Epoch 11 Batch 120/172] avg loss 0.00803876, throughput 3.18598K wps
[Epoch 11 Batch 150/172] avg loss 0.00776195, throughput 3.70404K wps
Begin Testing...
[Epoch 11] train avg loss 0.00793903, dev acc 0.8564, dev avg loss 0.387468, throughput 3.42871K wps
[Epoch 12 Batch 30/172] avg loss 0.00788529, throughput 3.57681K wps
[Epoch 12 Batch 60/172] avg loss 0.00793195, throughput 3.22148K wps
[Epoch 12 Batch 90/172] avg loss 0.00749271, throughput 3.06132K wps
[Epoch 12 Batch 120/172] avg loss 0.00736485, throughput 3.51087K wps
[Epoch 12 Batch 150/172] avg loss 0.00731105, throughput 3.4161K wps
Begin Testing...
[Epoch 12] train avg loss 0.00760636, dev acc 0.8564, dev avg loss 0.372565, throughput 3.36253K wps
[Epoch 13 Batch 30/172] avg loss 0.00760388, throughput 3.35374K wps
[Epoch 13 Batch 60/172] avg loss 0.00757828, throughput 3.11085K wps
[Epoch 13 Batch 90/172] avg loss 0.00755788, throughput 3.09853K wps
[Epoch 13 Batch 120/172] avg loss 0.00720234, throughput 3.73009K wps
[Epoch 13 Batch 150/172] avg loss 0.00711487, throughput 3.55949K wps
Begin Testing...
[Epoch 13] train avg loss 0.00732971, dev acc 0.8553, dev avg loss 0.360516, throughput 3.32896K wps
[Epoch 14 Batch 30/172] avg loss 0.00707279, throughput 2.97505K wps
[Epoch 14 Batch 60/172] avg loss 0.00700882, throughput 3.2881K wps
[Epoch 14 Batch 90/172] avg loss 0.0073682, throughput 3.15659K wps
[Epoch 14 Batch 120/172] avg loss 0.00704836, throughput 3.18022K wps
[Epoch 14 Batch 150/172] avg loss 0.00736453, throughput 3.2515K wps
Begin Testing...
[Epoch 14] train avg loss 0.00708505, dev acc 0.8595, dev avg loss 0.350207, throughput 3.23826K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00682662, throughput 3.44971K wps
[Epoch 15 Batch 60/172] avg loss 0.00684694, throughput 3.43471K wps
[Epoch 15 Batch 90/172] avg loss 0.0065451, throughput 3.09646K wps
[Epoch 15 Batch 120/172] avg loss 0.0068602, throughput 3.1101K wps
[Epoch 15 Batch 150/172] avg loss 0.00698996, throughput 3.21791K wps
Begin Testing...
[Epoch 15] train avg loss 0.00686035, dev acc 0.8669, dev avg loss 0.342465, throughput 3.22181K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00676052, throughput 3.01729K wps
[Epoch 16 Batch 60/172] avg loss 0.00686354, throughput 3.08951K wps
[Epoch 16 Batch 90/172] avg loss 0.00687649, throughput 2.99535K wps
[Epoch 16 Batch 120/172] avg loss 0.00642024, throughput 3.14086K wps
[Epoch 16 Batch 150/172] avg loss 0.00659092, throughput 3.09836K wps
Begin Testing...
[Epoch 16] train avg loss 0.00672168, dev acc 0.8637, dev avg loss 0.335935, throughput 3.08642K wps
[Epoch 17 Batch 30/172] avg loss 0.00667567, throughput 3.02573K wps
[Epoch 17 Batch 60/172] avg loss 0.00616392, throughput 3.5164K wps
[Epoch 17 Batch 90/172] avg loss 0.00635932, throughput 3.07298K wps
[Epoch 17 Batch 120/172] avg loss 0.00699217, throughput 3.46514K wps
[Epoch 17 Batch 150/172] avg loss 0.00640697, throughput 3.01268K wps
Begin Testing...
[Epoch 17] train avg loss 0.0065133, dev acc 0.8690, dev avg loss 0.330056, throughput 3.18015K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00631902, throughput 3.95233K wps
[Epoch 18 Batch 60/172] avg loss 0.00616001, throughput 3.01857K wps
[Epoch 18 Batch 90/172] avg loss 0.006356, throughput 3.39652K wps
[Epoch 18 Batch 120/172] avg loss 0.0065353, throughput 3.10165K wps
[Epoch 18 Batch 150/172] avg loss 0.00628508, throughput 4.09832K wps
Begin Testing...
[Epoch 18] train avg loss 0.00636359, dev acc 0.8721, dev avg loss 0.325227, throughput 3.46411K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00639169, throughput 3.22216K wps
[Epoch 19 Batch 60/172] avg loss 0.00659259, throughput 3.0542K wps
[Epoch 19 Batch 90/172] avg loss 0.00631485, throughput 2.97687K wps
[Epoch 19 Batch 120/172] avg loss 0.00611719, throughput 3.19338K wps
[Epoch 19 Batch 150/172] avg loss 0.00604318, throughput 4.22228K wps
Begin Testing...
[Epoch 19] train avg loss 0.00626941, dev acc 0.8732, dev avg loss 0.321052, throughput 3.27204K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00664816, throughput 2.98912K wps
[Epoch 20 Batch 60/172] avg loss 0.00626823, throughput 3.48725K wps
[Epoch 20 Batch 90/172] avg loss 0.00593203, throughput 3.09329K wps
[Epoch 20 Batch 120/172] avg loss 0.00595847, throughput 3.39882K wps
[Epoch 20 Batch 150/172] avg loss 0.00589022, throughput 3.40212K wps
Begin Testing...
[Epoch 20] train avg loss 0.00617125, dev acc 0.8721, dev avg loss 0.318, throughput 3.27877K wps
[Epoch 21 Batch 30/172] avg loss 0.00624515, throughput 3.44787K wps
[Epoch 21 Batch 60/172] avg loss 0.00591656, throughput 3.25031K wps
[Epoch 21 Batch 90/172] avg loss 0.00612125, throughput 3.4471K wps
[Epoch 21 Batch 120/172] avg loss 0.00626221, throughput 3.65029K wps
[Epoch 21 Batch 150/172] avg loss 0.00555277, throughput 3.13261K wps
Begin Testing...
[Epoch 21] train avg loss 0.00606315, dev acc 0.8732, dev avg loss 0.315248, throughput 3.32812K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00556931, throughput 3.47417K wps
[Epoch 22 Batch 60/172] avg loss 0.00603174, throughput 3.05792K wps
[Epoch 22 Batch 90/172] avg loss 0.00607749, throughput 4.0281K wps
[Epoch 22 Batch 120/172] avg loss 0.00597692, throughput 3.51523K wps
[Epoch 22 Batch 150/172] avg loss 0.0062357, throughput 4.24822K wps
Begin Testing...
[Epoch 22] train avg loss 0.00596221, dev acc 0.8721, dev avg loss 0.312624, throughput 3.57944K wps
[Epoch 23 Batch 30/172] avg loss 0.00580337, throughput 3.79411K wps
[Epoch 23 Batch 60/172] avg loss 0.00597773, throughput 3.29444K wps
[Epoch 23 Batch 90/172] avg loss 0.00600664, throughput 2.936K wps
[Epoch 23 Batch 120/172] avg loss 0.00566232, throughput 3.38409K wps
[Epoch 23 Batch 150/172] avg loss 0.00586245, throughput 2.91931K wps
Begin Testing...
[Epoch 23] train avg loss 0.00590697, dev acc 0.8816, dev avg loss 0.311065, throughput 3.2401K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00585091, throughput 2.96852K wps
[Epoch 24 Batch 60/172] avg loss 0.00557172, throughput 3.48747K wps
[Epoch 24 Batch 90/172] avg loss 0.00583849, throughput 3.3087K wps
[Epoch 24 Batch 120/172] avg loss 0.00580619, throughput 2.9189K wps
[Epoch 24 Batch 150/172] avg loss 0.00606055, throughput 2.89702K wps
Begin Testing...
[Epoch 24] train avg loss 0.00580276, dev acc 0.8721, dev avg loss 0.308507, throughput 3.10154K wps
[Epoch 25 Batch 30/172] avg loss 0.00571676, throughput 3.11787K wps
[Epoch 25 Batch 60/172] avg loss 0.0060013, throughput 3.21756K wps
[Epoch 25 Batch 90/172] avg loss 0.0057224, throughput 2.9227K wps
[Epoch 25 Batch 120/172] avg loss 0.00603684, throughput 3.28413K wps
[Epoch 25 Batch 150/172] avg loss 0.00576968, throughput 3.53357K wps
Begin Testing...
[Epoch 25] train avg loss 0.00583841, dev acc 0.8805, dev avg loss 0.307514, throughput 3.22357K wps
[Epoch 26 Batch 30/172] avg loss 0.00548602, throughput 3.39374K wps
[Epoch 26 Batch 60/172] avg loss 0.00566034, throughput 3.8717K wps
[Epoch 26 Batch 90/172] avg loss 0.00607051, throughput 3.60073K wps
[Epoch 26 Batch 120/172] avg loss 0.00572695, throughput 3.5218K wps
[Epoch 26 Batch 150/172] avg loss 0.00585646, throughput 3.0226K wps
Begin Testing...
[Epoch 26] train avg loss 0.00572291, dev acc 0.8742, dev avg loss 0.305419, throughput 3.4062K wps
[Epoch 27 Batch 30/172] avg loss 0.00558961, throughput 3.08409K wps
[Epoch 27 Batch 60/172] avg loss 0.00545599, throughput 2.88663K wps
[Epoch 27 Batch 90/172] avg loss 0.00575598, throughput 3.09317K wps
[Epoch 27 Batch 120/172] avg loss 0.00601216, throughput 3.42905K wps
[Epoch 27 Batch 150/172] avg loss 0.00594678, throughput 3.24861K wps
Begin Testing...
[Epoch 27] train avg loss 0.00573342, dev acc 0.8763, dev avg loss 0.304006, throughput 3.18806K wps
[Epoch 28 Batch 30/172] avg loss 0.00545333, throughput 3.62108K wps
[Epoch 28 Batch 60/172] avg loss 0.00539429, throughput 3.18532K wps
[Epoch 28 Batch 90/172] avg loss 0.00580237, throughput 3.48029K wps
[Epoch 28 Batch 120/172] avg loss 0.00550063, throughput 3.50019K wps
[Epoch 28 Batch 150/172] avg loss 0.00532259, throughput 3.27096K wps
Begin Testing...
[Epoch 28] train avg loss 0.00556419, dev acc 0.8774, dev avg loss 0.302803, throughput 3.42546K wps
[Epoch 29 Batch 30/172] avg loss 0.0057608, throughput 3.50018K wps
[Epoch 29 Batch 60/172] avg loss 0.00559955, throughput 4.24249K wps
[Epoch 29 Batch 90/172] avg loss 0.00595788, throughput 3.91293K wps
[Epoch 29 Batch 120/172] avg loss 0.00547571, throughput 3.10704K wps
[Epoch 29 Batch 150/172] avg loss 0.0051236, throughput 3.41107K wps
Begin Testing...
[Epoch 29] train avg loss 0.0055558, dev acc 0.8774, dev avg loss 0.302494, throughput 3.61627K wps
[Epoch 30 Batch 30/172] avg loss 0.0055119, throughput 2.98212K wps
[Epoch 30 Batch 60/172] avg loss 0.00534082, throughput 3.2207K wps
[Epoch 30 Batch 90/172] avg loss 0.00555905, throughput 3.33367K wps
[Epoch 30 Batch 120/172] avg loss 0.00571938, throughput 3.7341K wps
[Epoch 30 Batch 150/172] avg loss 0.00515788, throughput 3.28717K wps
Begin Testing...
[Epoch 30] train avg loss 0.00546949, dev acc 0.8784, dev avg loss 0.30061, throughput 3.32388K wps
[Epoch 31 Batch 30/172] avg loss 0.00553872, throughput 3.59312K wps
[Epoch 31 Batch 60/172] avg loss 0.00557428, throughput 3.51035K wps
[Epoch 31 Batch 90/172] avg loss 0.00555359, throughput 3.33741K wps
[Epoch 31 Batch 120/172] avg loss 0.00564407, throughput 3.34971K wps
[Epoch 31 Batch 150/172] avg loss 0.00509429, throughput 4.01769K wps
Begin Testing...
[Epoch 31] train avg loss 0.00546533, dev acc 0.8774, dev avg loss 0.299732, throughput 3.55278K wps
[Epoch 32 Batch 30/172] avg loss 0.00547374, throughput 3.16025K wps
[Epoch 32 Batch 60/172] avg loss 0.00532957, throughput 3.09057K wps
[Epoch 32 Batch 90/172] avg loss 0.00492862, throughput 3.52351K wps
[Epoch 32 Batch 120/172] avg loss 0.00559236, throughput 3.18707K wps
[Epoch 32 Batch 150/172] avg loss 0.00553083, throughput 3.00865K wps
Begin Testing...
[Epoch 32] train avg loss 0.0053687, dev acc 0.8784, dev avg loss 0.298407, throughput 3.2207K wps
[Epoch 33 Batch 30/172] avg loss 0.00555, throughput 3.62253K wps
[Epoch 33 Batch 60/172] avg loss 0.00556953, throughput 4.04651K wps
[Epoch 33 Batch 90/172] avg loss 0.00543006, throughput 3.86433K wps
[Epoch 33 Batch 120/172] avg loss 0.00527154, throughput 2.96583K wps
[Epoch 33 Batch 150/172] avg loss 0.00516715, throughput 3.5523K wps
Begin Testing...
[Epoch 33] train avg loss 0.00536285, dev acc 0.8805, dev avg loss 0.297693, throughput 3.50813K wps
[Epoch 34 Batch 30/172] avg loss 0.00526053, throughput 3.17758K wps
[Epoch 34 Batch 60/172] avg loss 0.00520314, throughput 3.05954K wps
[Epoch 34 Batch 90/172] avg loss 0.00546861, throughput 3.1875K wps
[Epoch 34 Batch 120/172] avg loss 0.00503365, throughput 3.31835K wps
[Epoch 34 Batch 150/172] avg loss 0.00601532, throughput 2.98863K wps
Begin Testing...
[Epoch 34] train avg loss 0.00539363, dev acc 0.8795, dev avg loss 0.297099, throughput 3.12835K wps
[Epoch 35 Batch 30/172] avg loss 0.00510753, throughput 3.43117K wps
[Epoch 35 Batch 60/172] avg loss 0.00540222, throughput 3.36444K wps
[Epoch 35 Batch 90/172] avg loss 0.00488898, throughput 3.43397K wps
[Epoch 35 Batch 120/172] avg loss 0.00517528, throughput 3.18893K wps
[Epoch 35 Batch 150/172] avg loss 0.00586079, throughput 3.02555K wps
Begin Testing...
[Epoch 35] train avg loss 0.00529257, dev acc 0.8795, dev avg loss 0.296031, throughput 3.27027K wps
[Epoch 36 Batch 30/172] avg loss 0.00529313, throughput 3.15315K wps
[Epoch 36 Batch 60/172] avg loss 0.0049977, throughput 3.39286K wps
[Epoch 36 Batch 90/172] avg loss 0.00533152, throughput 3.27511K wps
[Epoch 36 Batch 120/172] avg loss 0.00534461, throughput 3.81829K wps
[Epoch 36 Batch 150/172] avg loss 0.00524369, throughput 3.64221K wps
Begin Testing...
[Epoch 36] train avg loss 0.00526089, dev acc 0.8795, dev avg loss 0.295163, throughput 3.41713K wps
[Epoch 37 Batch 30/172] avg loss 0.005212, throughput 3.21398K wps
[Epoch 37 Batch 60/172] avg loss 0.00544735, throughput 3.18166K wps
[Epoch 37 Batch 90/172] avg loss 0.00504174, throughput 3.1867K wps
[Epoch 37 Batch 120/172] avg loss 0.00551583, throughput 3.81137K wps
[Epoch 37 Batch 150/172] avg loss 0.0050265, throughput 3.19809K wps
Begin Testing...
[Epoch 37] train avg loss 0.00523593, dev acc 0.8795, dev avg loss 0.294341, throughput 3.35522K wps
[Epoch 38 Batch 30/172] avg loss 0.00542355, throughput 3.00739K wps
[Epoch 38 Batch 60/172] avg loss 0.00550027, throughput 3.57412K wps
[Epoch 38 Batch 90/172] avg loss 0.0048208, throughput 3.14785K wps
[Epoch 38 Batch 120/172] avg loss 0.005171, throughput 3.01692K wps
[Epoch 38 Batch 150/172] avg loss 0.00503678, throughput 3.55784K wps
Begin Testing...
[Epoch 38] train avg loss 0.0051631, dev acc 0.8836, dev avg loss 0.294597, throughput 3.3082K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.0057343, throughput 3.28946K wps
[Epoch 39 Batch 60/172] avg loss 0.00492689, throughput 3.60808K wps
[Epoch 39 Batch 90/172] avg loss 0.00543972, throughput 3.23677K wps
[Epoch 39 Batch 120/172] avg loss 0.00494994, throughput 3.02348K wps
[Epoch 39 Batch 150/172] avg loss 0.00494345, throughput 3.177K wps
Begin Testing...
[Epoch 39] train avg loss 0.00517902, dev acc 0.8816, dev avg loss 0.293961, throughput 3.21011K wps
[Epoch 40 Batch 30/172] avg loss 0.00492781, throughput 3.33364K wps
[Epoch 40 Batch 60/172] avg loss 0.00512029, throughput 2.96162K wps
[Epoch 40 Batch 90/172] avg loss 0.00504844, throughput 3.07995K wps
[Epoch 40 Batch 120/172] avg loss 0.00533818, throughput 3.51491K wps
[Epoch 40 Batch 150/172] avg loss 0.00546359, throughput 2.93561K wps
Begin Testing...
[Epoch 40] train avg loss 0.00515929, dev acc 0.8795, dev avg loss 0.29239, throughput 3.25041K wps
[Epoch 41 Batch 30/172] avg loss 0.00522227, throughput 3.24211K wps
[Epoch 41 Batch 60/172] avg loss 0.00484336, throughput 3.23519K wps
[Epoch 41 Batch 90/172] avg loss 0.00523415, throughput 3.2438K wps
[Epoch 41 Batch 120/172] avg loss 0.00496838, throughput 3.54019K wps
[Epoch 41 Batch 150/172] avg loss 0.00506426, throughput 3.54847K wps
Begin Testing...
[Epoch 41] train avg loss 0.00511139, dev acc 0.8847, dev avg loss 0.292865, throughput 3.36665K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.00513925, throughput 3.34877K wps
[Epoch 42 Batch 60/172] avg loss 0.00529085, throughput 3.08077K wps
[Epoch 42 Batch 90/172] avg loss 0.00484439, throughput 3.67205K wps
[Epoch 42 Batch 120/172] avg loss 0.00486904, throughput 3.64733K wps
[Epoch 42 Batch 150/172] avg loss 0.00506044, throughput 3.23373K wps
Begin Testing...
[Epoch 42] train avg loss 0.00505498, dev acc 0.8795, dev avg loss 0.291147, throughput 3.44874K wps
[Epoch 43 Batch 30/172] avg loss 0.00488998, throughput 3.04662K wps
[Epoch 43 Batch 60/172] avg loss 0.00506584, throughput 3.48435K wps
[Epoch 43 Batch 90/172] avg loss 0.00522186, throughput 3.19816K wps
[Epoch 43 Batch 120/172] avg loss 0.00480696, throughput 3.39627K wps
[Epoch 43 Batch 150/172] avg loss 0.00476445, throughput 3.10453K wps
Begin Testing...
[Epoch 43] train avg loss 0.00497367, dev acc 0.8826, dev avg loss 0.29133, throughput 3.21503K wps
[Epoch 44 Batch 30/172] avg loss 0.00525093, throughput 3.053K wps
[Epoch 44 Batch 60/172] avg loss 0.00520206, throughput 3.39963K wps
[Epoch 44 Batch 90/172] avg loss 0.00475528, throughput 2.99886K wps
[Epoch 44 Batch 120/172] avg loss 0.00497008, throughput 3.10235K wps
[Epoch 44 Batch 150/172] avg loss 0.00503202, throughput 3.26104K wps
Begin Testing...
[Epoch 44] train avg loss 0.00496903, dev acc 0.8805, dev avg loss 0.290148, throughput 3.13522K wps
[Epoch 45 Batch 30/172] avg loss 0.00482634, throughput 3.46497K wps
[Epoch 45 Batch 60/172] avg loss 0.00497375, throughput 3.29703K wps
[Epoch 45 Batch 90/172] avg loss 0.00482753, throughput 3.76765K wps
[Epoch 45 Batch 120/172] avg loss 0.00506363, throughput 3.4942K wps
[Epoch 45 Batch 150/172] avg loss 0.00479402, throughput 3.3315K wps
Begin Testing...
[Epoch 45] train avg loss 0.00494388, dev acc 0.8784, dev avg loss 0.28994, throughput 3.42603K wps
[Epoch 46 Batch 30/172] avg loss 0.00484425, throughput 2.96828K wps
[Epoch 46 Batch 60/172] avg loss 0.00492215, throughput 3.05133K wps
[Epoch 46 Batch 90/172] avg loss 0.00444153, throughput 3.71502K wps
[Epoch 46 Batch 120/172] avg loss 0.00549308, throughput 4.09216K wps
[Epoch 46 Batch 150/172] avg loss 0.00497785, throughput 3.23395K wps
Begin Testing...
[Epoch 46] train avg loss 0.00494577, dev acc 0.8784, dev avg loss 0.289233, throughput 3.36349K wps
[Epoch 47 Batch 30/172] avg loss 0.00479453, throughput 3.08195K wps
[Epoch 47 Batch 60/172] avg loss 0.00478986, throughput 3.40739K wps
[Epoch 47 Batch 90/172] avg loss 0.00500868, throughput 3.69668K wps
[Epoch 47 Batch 120/172] avg loss 0.00498388, throughput 3.82687K wps
[Epoch 47 Batch 150/172] avg loss 0.00477108, throughput 3.28051K wps
Begin Testing...
[Epoch 47] train avg loss 0.00487523, dev acc 0.8795, dev avg loss 0.288726, throughput 3.47772K wps
[Epoch 48 Batch 30/172] avg loss 0.00531149, throughput 3.04966K wps
[Epoch 48 Batch 60/172] avg loss 0.00466469, throughput 3.13525K wps
[Epoch 48 Batch 90/172] avg loss 0.00481445, throughput 3.17829K wps
[Epoch 48 Batch 120/172] avg loss 0.00463933, throughput 3.05871K wps
[Epoch 48 Batch 150/172] avg loss 0.0048756, throughput 4.15575K wps
Begin Testing...
[Epoch 48] train avg loss 0.00488377, dev acc 0.8805, dev avg loss 0.288499, throughput 3.34984K wps
[Epoch 49 Batch 30/172] avg loss 0.00502528, throughput 3.5202K wps
[Epoch 49 Batch 60/172] avg loss 0.00461825, throughput 3.01523K wps
[Epoch 49 Batch 90/172] avg loss 0.00471816, throughput 3.11168K wps
[Epoch 49 Batch 120/172] avg loss 0.00497361, throughput 3.23421K wps
[Epoch 49 Batch 150/172] avg loss 0.00470138, throughput 3.67086K wps
Begin Testing...
[Epoch 49] train avg loss 0.00483275, dev acc 0.8816, dev avg loss 0.288125, throughput 3.32712K wps
[Epoch 50 Batch 30/172] avg loss 0.00442276, throughput 3.72115K wps
[Epoch 50 Batch 60/172] avg loss 0.00475515, throughput 3.15721K wps
[Epoch 50 Batch 90/172] avg loss 0.00510736, throughput 3.77859K wps
[Epoch 50 Batch 120/172] avg loss 0.00455048, throughput 3.87398K wps
[Epoch 50 Batch 150/172] avg loss 0.00506537, throughput 3.26238K wps
Begin Testing...
[Epoch 50] train avg loss 0.00479813, dev acc 0.8836, dev avg loss 0.289106, throughput 3.48312K wps
[Epoch 51 Batch 30/172] avg loss 0.00480438, throughput 3.07155K wps
[Epoch 51 Batch 60/172] avg loss 0.00441351, throughput 3.57297K wps
[Epoch 51 Batch 90/172] avg loss 0.00484475, throughput 3.14506K wps
[Epoch 51 Batch 120/172] avg loss 0.00487658, throughput 2.99044K wps
[Epoch 51 Batch 150/172] avg loss 0.00468974, throughput 3.21191K wps
Begin Testing...
[Epoch 51] train avg loss 0.00473156, dev acc 0.8836, dev avg loss 0.287207, throughput 3.19279K wps
[Epoch 52 Batch 30/172] avg loss 0.00469471, throughput 3.14501K wps
[Epoch 52 Batch 60/172] avg loss 0.00482115, throughput 3.11263K wps
[Epoch 52 Batch 90/172] avg loss 0.00448857, throughput 3.03256K wps
[Epoch 52 Batch 120/172] avg loss 0.00504254, throughput 3.27689K wps
[Epoch 52 Batch 150/172] avg loss 0.00467892, throughput 3.27835K wps
Begin Testing...
[Epoch 52] train avg loss 0.00477531, dev acc 0.8805, dev avg loss 0.286852, throughput 3.15656K wps
[Epoch 53 Batch 30/172] avg loss 0.00450337, throughput 3.16433K wps
[Epoch 53 Batch 60/172] avg loss 0.00470892, throughput 3.46051K wps
[Epoch 53 Batch 90/172] avg loss 0.00449731, throughput 3.12309K wps
[Epoch 53 Batch 120/172] avg loss 0.00463329, throughput 3.35188K wps
[Epoch 53 Batch 150/172] avg loss 0.00471547, throughput 2.97468K wps
Begin Testing...
[Epoch 53] train avg loss 0.00467456, dev acc 0.8784, dev avg loss 0.286238, throughput 3.21812K wps
[Epoch 54 Batch 30/172] avg loss 0.00479818, throughput 3.36328K wps
[Epoch 54 Batch 60/172] avg loss 0.00468568, throughput 3.22385K wps
[Epoch 54 Batch 90/172] avg loss 0.00467522, throughput 3.37165K wps
[Epoch 54 Batch 120/172] avg loss 0.0044997, throughput 3.38474K wps
[Epoch 54 Batch 150/172] avg loss 0.00487044, throughput 3.89054K wps
Begin Testing...
[Epoch 54] train avg loss 0.00474818, dev acc 0.8847, dev avg loss 0.28655, throughput 3.40146K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00441837, throughput 3.71396K wps
[Epoch 55 Batch 60/172] avg loss 0.00527056, throughput 3.51411K wps
[Epoch 55 Batch 90/172] avg loss 0.00469949, throughput 3.44638K wps
[Epoch 55 Batch 120/172] avg loss 0.00469657, throughput 2.97036K wps
[Epoch 55 Batch 150/172] avg loss 0.00434336, throughput 3.22504K wps
Begin Testing...
[Epoch 55] train avg loss 0.00468128, dev acc 0.8805, dev avg loss 0.285543, throughput 3.33549K wps
[Epoch 56 Batch 30/172] avg loss 0.00461578, throughput 3.01823K wps
[Epoch 56 Batch 60/172] avg loss 0.00472691, throughput 3.12643K wps
[Epoch 56 Batch 90/172] avg loss 0.00458255, throughput 3.43675K wps
[Epoch 56 Batch 120/172] avg loss 0.00471845, throughput 3.74814K wps
[Epoch 56 Batch 150/172] avg loss 0.00489266, throughput 3.10419K wps
Begin Testing...
[Epoch 56] train avg loss 0.00469635, dev acc 0.8847, dev avg loss 0.286583, throughput 3.2244K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/172] avg loss 0.00459841, throughput 3.10014K wps
[Epoch 57 Batch 60/172] avg loss 0.00494553, throughput 3.3558K wps
[Epoch 57 Batch 90/172] avg loss 0.004604, throughput 3.47055K wps
[Epoch 57 Batch 120/172] avg loss 0.00437, throughput 3.69925K wps
[Epoch 57 Batch 150/172] avg loss 0.00434093, throughput 3.50402K wps
Begin Testing...
[Epoch 57] train avg loss 0.00461482, dev acc 0.8816, dev avg loss 0.28495, throughput 3.40444K wps
[Epoch 58 Batch 30/172] avg loss 0.0047812, throughput 3.43891K wps
[Epoch 58 Batch 60/172] avg loss 0.00461843, throughput 3.58904K wps
[Epoch 58 Batch 90/172] avg loss 0.00485089, throughput 3.44384K wps
[Epoch 58 Batch 120/172] avg loss 0.00447516, throughput 2.97852K wps
[Epoch 58 Batch 150/172] avg loss 0.00443977, throughput 3.86221K wps
Begin Testing...
[Epoch 58] train avg loss 0.00461682, dev acc 0.8847, dev avg loss 0.284838, throughput 3.38971K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/172] avg loss 0.00453377, throughput 3.29001K wps
[Epoch 59 Batch 60/172] avg loss 0.00434059, throughput 3.53247K wps
[Epoch 59 Batch 90/172] avg loss 0.00488701, throughput 3.19126K wps
[Epoch 59 Batch 120/172] avg loss 0.00435179, throughput 3.43976K wps
[Epoch 59 Batch 150/172] avg loss 0.00443042, throughput 3.68069K wps
Begin Testing...
[Epoch 59] train avg loss 0.00461413, dev acc 0.8826, dev avg loss 0.284052, throughput 3.46056K wps
[Epoch 60 Batch 30/172] avg loss 0.0047641, throughput 3.35586K wps
[Epoch 60 Batch 60/172] avg loss 0.00443196, throughput 3.73947K wps
[Epoch 60 Batch 90/172] avg loss 0.00414429, throughput 3.43129K wps
[Epoch 60 Batch 120/172] avg loss 0.00453895, throughput 3.47117K wps
[Epoch 60 Batch 150/172] avg loss 0.00455807, throughput 3.79046K wps
Begin Testing...
[Epoch 60] train avg loss 0.00456181, dev acc 0.8826, dev avg loss 0.283664, throughput 3.47353K wps
[Epoch 61 Batch 30/172] avg loss 0.00444975, throughput 3.61772K wps
[Epoch 61 Batch 60/172] avg loss 0.00471465, throughput 2.99444K wps
[Epoch 61 Batch 90/172] avg loss 0.00419933, throughput 3.19542K wps
[Epoch 61 Batch 120/172] avg loss 0.0042954, throughput 3.24957K wps
[Epoch 61 Batch 150/172] avg loss 0.00514085, throughput 3.65771K wps
Begin Testing...
[Epoch 61] train avg loss 0.00456407, dev acc 0.8836, dev avg loss 0.284019, throughput 3.37273K wps
[Epoch 62 Batch 30/172] avg loss 0.00444916, throughput 3.46509K wps
[Epoch 62 Batch 60/172] avg loss 0.00414047, throughput 3.14121K wps
[Epoch 62 Batch 90/172] avg loss 0.00462263, throughput 3.3793K wps
[Epoch 62 Batch 120/172] avg loss 0.00494112, throughput 3.09571K wps
[Epoch 62 Batch 150/172] avg loss 0.00406777, throughput 3.3643K wps
Begin Testing...
[Epoch 62] train avg loss 0.0044694, dev acc 0.8868, dev avg loss 0.28341, throughput 3.26945K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/172] avg loss 0.00389023, throughput 3.23715K wps
[Epoch 63 Batch 60/172] avg loss 0.00467879, throughput 3.60877K wps
[Epoch 63 Batch 90/172] avg loss 0.00464587, throughput 3.34548K wps
[Epoch 63 Batch 120/172] avg loss 0.00462986, throughput 3.00736K wps
[Epoch 63 Batch 150/172] avg loss 0.00503671, throughput 3.27236K wps
Begin Testing...
[Epoch 63] train avg loss 0.00450768, dev acc 0.8857, dev avg loss 0.28455, throughput 3.31475K wps
[Epoch 64 Batch 30/172] avg loss 0.0041293, throughput 3.05992K wps
[Epoch 64 Batch 60/172] avg loss 0.00451612, throughput 3.21435K wps
[Epoch 64 Batch 90/172] avg loss 0.00472832, throughput 3.45304K wps
[Epoch 64 Batch 120/172] avg loss 0.00409826, throughput 3.80903K wps
[Epoch 64 Batch 150/172] avg loss 0.00448952, throughput 3.26626K wps
Begin Testing...
[Epoch 64] train avg loss 0.0044663, dev acc 0.8868, dev avg loss 0.282444, throughput 3.31708K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/172] avg loss 0.00461461, throughput 2.93763K wps
[Epoch 65 Batch 60/172] avg loss 0.00436807, throughput 3.19259K wps
[Epoch 65 Batch 90/172] avg loss 0.0043955, throughput 3.52404K wps
[Epoch 65 Batch 120/172] avg loss 0.00436453, throughput 3.44809K wps
[Epoch 65 Batch 150/172] avg loss 0.00429289, throughput 3.45904K wps
Begin Testing...
[Epoch 65] train avg loss 0.00448808, dev acc 0.8836, dev avg loss 0.281631, throughput 3.24903K wps
[Epoch 66 Batch 30/172] avg loss 0.00433271, throughput 3.3545K wps
[Epoch 66 Batch 60/172] avg loss 0.00408638, throughput 3.00286K wps
[Epoch 66 Batch 90/172] avg loss 0.0040224, throughput 3.07793K wps
[Epoch 66 Batch 120/172] avg loss 0.0049125, throughput 3.38082K wps
[Epoch 66 Batch 150/172] avg loss 0.00419865, throughput 3.51326K wps
Begin Testing...
[Epoch 66] train avg loss 0.00435981, dev acc 0.8857, dev avg loss 0.281663, throughput 3.26248K wps
[Epoch 67 Batch 30/172] avg loss 0.00454827, throughput 3.46872K wps
[Epoch 67 Batch 60/172] avg loss 0.00444444, throughput 4.02106K wps
[Epoch 67 Batch 90/172] avg loss 0.00439243, throughput 3.0433K wps
[Epoch 67 Batch 120/172] avg loss 0.00485216, throughput 3.81309K wps
[Epoch 67 Batch 150/172] avg loss 0.00403251, throughput 3.38261K wps
Begin Testing...
[Epoch 67] train avg loss 0.00440184, dev acc 0.8857, dev avg loss 0.28274, throughput 3.44568K wps
[Epoch 68 Batch 30/172] avg loss 0.00466269, throughput 3.53226K wps
[Epoch 68 Batch 60/172] avg loss 0.00459553, throughput 3.36822K wps
[Epoch 68 Batch 90/172] avg loss 0.00435113, throughput 3.47124K wps
[Epoch 68 Batch 120/172] avg loss 0.0042454, throughput 3.52408K wps
[Epoch 68 Batch 150/172] avg loss 0.00403074, throughput 3.47718K wps
Begin Testing...
[Epoch 68] train avg loss 0.00435562, dev acc 0.8889, dev avg loss 0.281407, throughput 3.49717K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/172] avg loss 0.00425382, throughput 3.08082K wps
[Epoch 69 Batch 60/172] avg loss 0.00417371, throughput 3.21301K wps
[Epoch 69 Batch 90/172] avg loss 0.00456112, throughput 4.27745K wps
[Epoch 69 Batch 120/172] avg loss 0.00423657, throughput 3.85959K wps
[Epoch 69 Batch 150/172] avg loss 0.00392505, throughput 3.32324K wps
Begin Testing...
[Epoch 69] train avg loss 0.00424514, dev acc 0.8868, dev avg loss 0.281648, throughput 3.537K wps
[Epoch 70 Batch 30/172] avg loss 0.00436762, throughput 3.36468K wps
[Epoch 70 Batch 60/172] avg loss 0.00441651, throughput 2.77235K wps
[Epoch 70 Batch 90/172] avg loss 0.00395608, throughput 3.27336K wps
[Epoch 70 Batch 120/172] avg loss 0.0041644, throughput 3.10471K wps
[Epoch 70 Batch 150/172] avg loss 0.00457142, throughput 3.19395K wps
Begin Testing...
[Epoch 70] train avg loss 0.00428123, dev acc 0.8857, dev avg loss 0.281351, throughput 3.17888K wps
[Epoch 71 Batch 30/172] avg loss 0.00436852, throughput 3.43889K wps
[Epoch 71 Batch 60/172] avg loss 0.00408873, throughput 4.16318K wps
[Epoch 71 Batch 90/172] avg loss 0.00452093, throughput 3.23003K wps
[Epoch 71 Batch 120/172] avg loss 0.00417764, throughput 3.44126K wps
[Epoch 71 Batch 150/172] avg loss 0.00401949, throughput 3.04783K wps
Begin Testing...
[Epoch 71] train avg loss 0.00422389, dev acc 0.8868, dev avg loss 0.28071, throughput 3.39138K wps
[Epoch 72 Batch 30/172] avg loss 0.00418589, throughput 3.50104K wps
[Epoch 72 Batch 60/172] avg loss 0.00470252, throughput 3.87811K wps
[Epoch 72 Batch 90/172] avg loss 0.00427977, throughput 3.7061K wps
[Epoch 72 Batch 120/172] avg loss 0.00427206, throughput 3.74283K wps
[Epoch 72 Batch 150/172] avg loss 0.00408354, throughput 3.76306K wps
Begin Testing...
[Epoch 72] train avg loss 0.00428689, dev acc 0.8868, dev avg loss 0.281165, throughput 3.63347K wps
[Epoch 73 Batch 30/172] avg loss 0.00401954, throughput 4.04419K wps
[Epoch 73 Batch 60/172] avg loss 0.00432072, throughput 3.51523K wps
[Epoch 73 Batch 90/172] avg loss 0.00434148, throughput 3.89229K wps
[Epoch 73 Batch 120/172] avg loss 0.00459893, throughput 3.65436K wps
[Epoch 73 Batch 150/172] avg loss 0.00397273, throughput 3.21155K wps
Begin Testing...
[Epoch 73] train avg loss 0.00422612, dev acc 0.8868, dev avg loss 0.280574, throughput 3.53209K wps
[Epoch 74 Batch 30/172] avg loss 0.00406624, throughput 3.09365K wps
[Epoch 74 Batch 60/172] avg loss 0.00442225, throughput 3.06149K wps
[Epoch 74 Batch 90/172] avg loss 0.00412022, throughput 3.22825K wps
[Epoch 74 Batch 120/172] avg loss 0.00427684, throughput 3.46205K wps
[Epoch 74 Batch 150/172] avg loss 0.00438411, throughput 3.17551K wps
Begin Testing...
[Epoch 74] train avg loss 0.00418256, dev acc 0.8910, dev avg loss 0.28037, throughput 3.26471K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/172] avg loss 0.00437333, throughput 3.32771K wps
[Epoch 75 Batch 60/172] avg loss 0.00402819, throughput 3.27717K wps
[Epoch 75 Batch 90/172] avg loss 0.00452457, throughput 3.08406K wps
[Epoch 75 Batch 120/172] avg loss 0.00358944, throughput 3.97017K wps
[Epoch 75 Batch 150/172] avg loss 0.00445712, throughput 3.72361K wps
Begin Testing...
[Epoch 75] train avg loss 0.00418604, dev acc 0.8857, dev avg loss 0.28053, throughput 3.36366K wps
[Epoch 76 Batch 30/172] avg loss 0.00441328, throughput 2.9271K wps
[Epoch 76 Batch 60/172] avg loss 0.00410473, throughput 3.24913K wps
[Epoch 76 Batch 90/172] avg loss 0.00429022, throughput 3.25653K wps
[Epoch 76 Batch 120/172] avg loss 0.00404992, throughput 3.55644K wps
[Epoch 76 Batch 150/172] avg loss 0.00407375, throughput 3.00042K wps
Begin Testing...
[Epoch 76] train avg loss 0.00416146, dev acc 0.8857, dev avg loss 0.280466, throughput 3.18233K wps
[Epoch 77 Batch 30/172] avg loss 0.00421762, throughput 3.22068K wps
[Epoch 77 Batch 60/172] avg loss 0.00412786, throughput 3.74417K wps
[Epoch 77 Batch 90/172] avg loss 0.00429749, throughput 2.96825K wps
[Epoch 77 Batch 120/172] avg loss 0.00382481, throughput 3.30488K wps
[Epoch 77 Batch 150/172] avg loss 0.00417611, throughput 3.44427K wps
Begin Testing...
[Epoch 77] train avg loss 0.00411514, dev acc 0.8889, dev avg loss 0.279708, throughput 3.29749K wps
[Epoch 78 Batch 30/172] avg loss 0.00414808, throughput 3.33744K wps
[Epoch 78 Batch 60/172] avg loss 0.00441851, throughput 3.57344K wps
[Epoch 78 Batch 90/172] avg loss 0.00366444, throughput 3.14177K wps
[Epoch 78 Batch 120/172] avg loss 0.00378521, throughput 3.66103K wps
[Epoch 78 Batch 150/172] avg loss 0.00492269, throughput 2.92261K wps
Begin Testing...
[Epoch 78] train avg loss 0.0041537, dev acc 0.8889, dev avg loss 0.281141, throughput 3.31527K wps
[Epoch 79 Batch 30/172] avg loss 0.00450323, throughput 3.58307K wps
[Epoch 79 Batch 60/172] avg loss 0.00435038, throughput 3.49427K wps
[Epoch 79 Batch 90/172] avg loss 0.00384822, throughput 3.274K wps
[Epoch 79 Batch 120/172] avg loss 0.00424755, throughput 3.40301K wps
[Epoch 79 Batch 150/172] avg loss 0.00369694, throughput 3.0049K wps
Begin Testing...
[Epoch 79] train avg loss 0.00412405, dev acc 0.8889, dev avg loss 0.27942, throughput 3.35314K wps
[Epoch 80 Batch 30/172] avg loss 0.00400289, throughput 3.01966K wps
[Epoch 80 Batch 60/172] avg loss 0.00390975, throughput 3.32268K wps
[Epoch 80 Batch 90/172] avg loss 0.00393861, throughput 3.30875K wps
[Epoch 80 Batch 120/172] avg loss 0.00402324, throughput 2.99598K wps
[Epoch 80 Batch 150/172] avg loss 0.00417184, throughput 2.98136K wps
Begin Testing...
[Epoch 80] train avg loss 0.00400457, dev acc 0.8889, dev avg loss 0.27966, throughput 3.19113K wps
[Epoch 81 Batch 30/172] avg loss 0.00396391, throughput 3.29837K wps
[Epoch 81 Batch 60/172] avg loss 0.00406774, throughput 3.05365K wps
[Epoch 81 Batch 90/172] avg loss 0.00410077, throughput 3.45374K wps
[Epoch 81 Batch 120/172] avg loss 0.00430607, throughput 3.71403K wps
[Epoch 81 Batch 150/172] avg loss 0.00383329, throughput 3.03727K wps
Begin Testing...
[Epoch 81] train avg loss 0.00407102, dev acc 0.8899, dev avg loss 0.278782, throughput 3.24289K wps
[Epoch 82 Batch 30/172] avg loss 0.00429455, throughput 3.30245K wps
[Epoch 82 Batch 60/172] avg loss 0.00368941, throughput 2.99639K wps
[Epoch 82 Batch 90/172] avg loss 0.00373897, throughput 3.01074K wps
[Epoch 82 Batch 120/172] avg loss 0.0042125, throughput 3.64871K wps
[Epoch 82 Batch 150/172] avg loss 0.00418339, throughput 3.24718K wps
Begin Testing...
[Epoch 82] train avg loss 0.00406578, dev acc 0.8878, dev avg loss 0.279686, throughput 3.19963K wps
[Epoch 83 Batch 30/172] avg loss 0.00417269, throughput 3.26344K wps
[Epoch 83 Batch 60/172] avg loss 0.00377289, throughput 3.16746K wps
[Epoch 83 Batch 90/172] avg loss 0.0038959, throughput 3.56031K wps
[Epoch 83 Batch 120/172] avg loss 0.00370589, throughput 4.16334K wps
[Epoch 83 Batch 150/172] avg loss 0.0044088, throughput 3.58958K wps
Begin Testing...
[Epoch 83] train avg loss 0.00403114, dev acc 0.8868, dev avg loss 0.279361, throughput 3.60006K wps
[Epoch 84 Batch 30/172] avg loss 0.00412873, throughput 3.1011K wps
[Epoch 84 Batch 60/172] avg loss 0.00403629, throughput 4.05538K wps
[Epoch 84 Batch 90/172] avg loss 0.00390664, throughput 3.12933K wps
[Epoch 84 Batch 120/172] avg loss 0.00376898, throughput 3.49461K wps
[Epoch 84 Batch 150/172] avg loss 0.00392972, throughput 3.05652K wps
Begin Testing...
[Epoch 84] train avg loss 0.00402295, dev acc 0.8910, dev avg loss 0.278725, throughput 3.30343K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/172] avg loss 0.00432166, throughput 3.04663K wps
[Epoch 85 Batch 60/172] avg loss 0.00374501, throughput 2.99662K wps
[Epoch 85 Batch 90/172] avg loss 0.00381592, throughput 3.57541K wps
[Epoch 85 Batch 120/172] avg loss 0.00423991, throughput 3.63545K wps
[Epoch 85 Batch 150/172] avg loss 0.00402347, throughput 2.94721K wps
Begin Testing...
[Epoch 85] train avg loss 0.00403704, dev acc 0.8910, dev avg loss 0.278511, throughput 3.22872K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/172] avg loss 0.00418457, throughput 3.23029K wps
[Epoch 86 Batch 60/172] avg loss 0.00391872, throughput 3.18042K wps
[Epoch 86 Batch 90/172] avg loss 0.00396597, throughput 3.41773K wps
[Epoch 86 Batch 120/172] avg loss 0.00393423, throughput 3.10088K wps
[Epoch 86 Batch 150/172] avg loss 0.00371034, throughput 3.43034K wps
Begin Testing...
[Epoch 86] train avg loss 0.00396308, dev acc 0.8899, dev avg loss 0.278186, throughput 3.21936K wps
[Epoch 87 Batch 30/172] avg loss 0.00398116, throughput 3.09639K wps
[Epoch 87 Batch 60/172] avg loss 0.00412303, throughput 3.19563K wps
[Epoch 87 Batch 90/172] avg loss 0.00388607, throughput 2.90615K wps
[Epoch 87 Batch 120/172] avg loss 0.00411095, throughput 3.03555K wps
[Epoch 87 Batch 150/172] avg loss 0.00401089, throughput 3.31806K wps
Begin Testing...
[Epoch 87] train avg loss 0.00395859, dev acc 0.8868, dev avg loss 0.278668, throughput 3.15631K wps
[Epoch 88 Batch 30/172] avg loss 0.00398452, throughput 3.62046K wps
[Epoch 88 Batch 60/172] avg loss 0.0039767, throughput 3.59299K wps
[Epoch 88 Batch 90/172] avg loss 0.00377471, throughput 3.11695K wps
[Epoch 88 Batch 120/172] avg loss 0.00399656, throughput 3.55743K wps
[Epoch 88 Batch 150/172] avg loss 0.00385819, throughput 3.32651K wps
Begin Testing...
[Epoch 88] train avg loss 0.00390175, dev acc 0.8899, dev avg loss 0.278029, throughput 3.38137K wps
[Epoch 89 Batch 30/172] avg loss 0.00371944, throughput 3.02073K wps
[Epoch 89 Batch 60/172] avg loss 0.00388222, throughput 3.63753K wps
[Epoch 89 Batch 90/172] avg loss 0.00411245, throughput 3.42675K wps
[Epoch 89 Batch 120/172] avg loss 0.00399009, throughput 3.2018K wps
[Epoch 89 Batch 150/172] avg loss 0.00396928, throughput 3.8658K wps
Begin Testing...
[Epoch 89] train avg loss 0.00392216, dev acc 0.8910, dev avg loss 0.27952, throughput 3.43229K wps
Observed Improvement.
Begin Testing...
[Epoch 90 Batch 30/172] avg loss 0.00409131, throughput 3.58936K wps
[Epoch 90 Batch 60/172] avg loss 0.00400971, throughput 3.77509K wps
[Epoch 90 Batch 90/172] avg loss 0.00350037, throughput 2.97398K wps
[Epoch 90 Batch 120/172] avg loss 0.00408356, throughput 3.38664K wps
[Epoch 90 Batch 150/172] avg loss 0.00367744, throughput 3.37785K wps
Begin Testing...
[Epoch 90] train avg loss 0.00386445, dev acc 0.8920, dev avg loss 0.277647, throughput 3.37197K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/172] avg loss 0.00367631, throughput 3.32381K wps
[Epoch 91 Batch 60/172] avg loss 0.00379681, throughput 3.00924K wps
[Epoch 91 Batch 90/172] avg loss 0.00386261, throughput 3.75745K wps
[Epoch 91 Batch 120/172] avg loss 0.0037002, throughput 3.45203K wps
[Epoch 91 Batch 150/172] avg loss 0.00389789, throughput 2.96665K wps
Begin Testing...
[Epoch 91] train avg loss 0.00380053, dev acc 0.8889, dev avg loss 0.27766, throughput 3.21791K wps
[Epoch 92 Batch 30/172] avg loss 0.00367419, throughput 3.12097K wps
[Epoch 92 Batch 60/172] avg loss 0.00379815, throughput 3.15775K wps
[Epoch 92 Batch 90/172] avg loss 0.00359334, throughput 3.24704K wps
[Epoch 92 Batch 120/172] avg loss 0.0041705, throughput 3.55305K wps
[Epoch 92 Batch 150/172] avg loss 0.00395872, throughput 4.07856K wps
Begin Testing...
[Epoch 92] train avg loss 0.00386967, dev acc 0.8899, dev avg loss 0.277437, throughput 3.35948K wps
[Epoch 93 Batch 30/172] avg loss 0.00352248, throughput 2.91294K wps
[Epoch 93 Batch 60/172] avg loss 0.00429712, throughput 3.63661K wps
[Epoch 93 Batch 90/172] avg loss 0.00358206, throughput 3.18477K wps
[Epoch 93 Batch 120/172] avg loss 0.00394753, throughput 3.32536K wps
[Epoch 93 Batch 150/172] avg loss 0.00392456, throughput 3.67478K wps
Begin Testing...
[Epoch 93] train avg loss 0.00381577, dev acc 0.8878, dev avg loss 0.277227, throughput 3.31786K wps
[Epoch 94 Batch 30/172] avg loss 0.00375188, throughput 2.98179K wps
[Epoch 94 Batch 60/172] avg loss 0.00368815, throughput 3.50744K wps
[Epoch 94 Batch 90/172] avg loss 0.00403554, throughput 3.46276K wps
[Epoch 94 Batch 120/172] avg loss 0.00342268, throughput 2.87844K wps
[Epoch 94 Batch 150/172] avg loss 0.00384908, throughput 3.45102K wps
Begin Testing...
[Epoch 94] train avg loss 0.0037812, dev acc 0.8878, dev avg loss 0.277569, throughput 3.30542K wps
[Epoch 95 Batch 30/172] avg loss 0.00363005, throughput 3.04755K wps
[Epoch 95 Batch 60/172] avg loss 0.00371595, throughput 3.05246K wps
[Epoch 95 Batch 90/172] avg loss 0.00378336, throughput 4.13512K wps
[Epoch 95 Batch 120/172] avg loss 0.00414931, throughput 3.39295K wps
[Epoch 95 Batch 150/172] avg loss 0.0034622, throughput 2.89358K wps
Begin Testing...
[Epoch 95] train avg loss 0.00380067, dev acc 0.8878, dev avg loss 0.277253, throughput 3.2279K wps
[Epoch 96 Batch 30/172] avg loss 0.00369254, throughput 3.59347K wps
[Epoch 96 Batch 60/172] avg loss 0.00401555, throughput 3.47621K wps
[Epoch 96 Batch 90/172] avg loss 0.00333056, throughput 3.48554K wps
[Epoch 96 Batch 120/172] avg loss 0.00392512, throughput 3.1694K wps
[Epoch 96 Batch 150/172] avg loss 0.00383753, throughput 3.02699K wps
Begin Testing...
[Epoch 96] train avg loss 0.00378059, dev acc 0.8920, dev avg loss 0.277483, throughput 3.29042K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/172] avg loss 0.00387836, throughput 3.41429K wps
[Epoch 97 Batch 60/172] avg loss 0.0037647, throughput 3.24204K wps
[Epoch 97 Batch 90/172] avg loss 0.00364134, throughput 3.45601K wps
[Epoch 97 Batch 120/172] avg loss 0.00375252, throughput 2.92888K wps
[Epoch 97 Batch 150/172] avg loss 0.0035291, throughput 3.09517K wps
Begin Testing...
[Epoch 97] train avg loss 0.00373276, dev acc 0.8899, dev avg loss 0.277271, throughput 3.22839K wps
[Epoch 98 Batch 30/172] avg loss 0.0032855, throughput 3.85013K wps
[Epoch 98 Batch 60/172] avg loss 0.00363829, throughput 3.02748K wps
[Epoch 98 Batch 90/172] avg loss 0.00366183, throughput 3.11002K wps
[Epoch 98 Batch 120/172] avg loss 0.004002, throughput 3.73739K wps
[Epoch 98 Batch 150/172] avg loss 0.00378364, throughput 3.23859K wps
Begin Testing...
[Epoch 98] train avg loss 0.00368021, dev acc 0.8899, dev avg loss 0.276856, throughput 3.35271K wps
[Epoch 99 Batch 30/172] avg loss 0.00354095, throughput 3.01145K wps
[Epoch 99 Batch 60/172] avg loss 0.00365216, throughput 3.06716K wps
[Epoch 99 Batch 90/172] avg loss 0.0033452, throughput 3.39549K wps
[Epoch 99 Batch 120/172] avg loss 0.00373508, throughput 3.1846K wps
[Epoch 99 Batch 150/172] avg loss 0.00400466, throughput 3.16972K wps
Begin Testing...
[Epoch 99] train avg loss 0.00376961, dev acc 0.8910, dev avg loss 0.276333, throughput 3.14765K wps
[Epoch 100 Batch 30/172] avg loss 0.00359199, throughput 3.05371K wps
[Epoch 100 Batch 60/172] avg loss 0.00355463, throughput 3.59673K wps
[Epoch 100 Batch 90/172] avg loss 0.00353953, throughput 3.30055K wps
[Epoch 100 Batch 120/172] avg loss 0.00365711, throughput 3.22626K wps
[Epoch 100 Batch 150/172] avg loss 0.00411808, throughput 3.29826K wps
Begin Testing...
[Epoch 100] train avg loss 0.00371298, dev acc 0.8899, dev avg loss 0.276438, throughput 3.2379K wps
[Epoch 101 Batch 30/172] avg loss 0.00374575, throughput 3.13432K wps
[Epoch 101 Batch 60/172] avg loss 0.00358356, throughput 3.38525K wps
[Epoch 101 Batch 90/172] avg loss 0.00356979, throughput 3.79655K wps
[Epoch 101 Batch 120/172] avg loss 0.00351344, throughput 3.32557K wps
[Epoch 101 Batch 150/172] avg loss 0.00359606, throughput 3.20224K wps
Begin Testing...
[Epoch 101] train avg loss 0.00365718, dev acc 0.8920, dev avg loss 0.277149, throughput 3.36894K wps
Observed Improvement.
Begin Testing...
[Epoch 102 Batch 30/172] avg loss 0.00390119, throughput 2.96009K wps
[Epoch 102 Batch 60/172] avg loss 0.00353319, throughput 4.0259K wps
[Epoch 102 Batch 90/172] avg loss 0.00359621, throughput 3.31893K wps
[Epoch 102 Batch 120/172] avg loss 0.00355162, throughput 3.64269K wps
[Epoch 102 Batch 150/172] avg loss 0.00356408, throughput 3.11705K wps
Begin Testing...
[Epoch 102] train avg loss 0.00364146, dev acc 0.8857, dev avg loss 0.276392, throughput 3.38961K wps
[Epoch 103 Batch 30/172] avg loss 0.00390164, throughput 3.55344K wps
[Epoch 103 Batch 60/172] avg loss 0.00357607, throughput 3.19032K wps
[Epoch 103 Batch 90/172] avg loss 0.00360015, throughput 3.20852K wps
[Epoch 103 Batch 120/172] avg loss 0.00343091, throughput 3.16013K wps
[Epoch 103 Batch 150/172] avg loss 0.00337528, throughput 3.04676K wps
Begin Testing...
[Epoch 103] train avg loss 0.00356614, dev acc 0.8910, dev avg loss 0.276964, throughput 3.26149K wps
[Epoch 104 Batch 30/172] avg loss 0.00365504, throughput 3.03491K wps
[Epoch 104 Batch 60/172] avg loss 0.00343891, throughput 3.62361K wps
[Epoch 104 Batch 90/172] avg loss 0.00341067, throughput 3.74156K wps
[Epoch 104 Batch 120/172] avg loss 0.00376392, throughput 3.24225K wps
[Epoch 104 Batch 150/172] avg loss 0.00342916, throughput 3.31982K wps
Begin Testing...
[Epoch 104] train avg loss 0.00358877, dev acc 0.8899, dev avg loss 0.276731, throughput 3.36981K wps
[Epoch 105 Batch 30/172] avg loss 0.00369423, throughput 3.07326K wps
[Epoch 105 Batch 60/172] avg loss 0.00359537, throughput 3.53609K wps
[Epoch 105 Batch 90/172] avg loss 0.00336356, throughput 3.86283K wps
[Epoch 105 Batch 120/172] avg loss 0.00376326, throughput 3.5374K wps
[Epoch 105 Batch 150/172] avg loss 0.00327505, throughput 3.84304K wps
Begin Testing...
[Epoch 105] train avg loss 0.00357967, dev acc 0.8899, dev avg loss 0.276892, throughput 3.5103K wps
[Epoch 106 Batch 30/172] avg loss 0.00359394, throughput 3.42467K wps
[Epoch 106 Batch 60/172] avg loss 0.00337836, throughput 3.41205K wps
[Epoch 106 Batch 90/172] avg loss 0.00351386, throughput 3.28216K wps
[Epoch 106 Batch 120/172] avg loss 0.00358193, throughput 3.31518K wps
[Epoch 106 Batch 150/172] avg loss 0.0038797, throughput 2.91612K wps
Begin Testing...
[Epoch 106] train avg loss 0.00356209, dev acc 0.8899, dev avg loss 0.277604, throughput 3.23905K wps
[Epoch 107 Batch 30/172] avg loss 0.00340163, throughput 3.33364K wps
[Epoch 107 Batch 60/172] avg loss 0.0038163, throughput 3.36512K wps
[Epoch 107 Batch 90/172] avg loss 0.00352505, throughput 3.08216K wps
[Epoch 107 Batch 120/172] avg loss 0.00324485, throughput 3.21454K wps
[Epoch 107 Batch 150/172] avg loss 0.00358916, throughput 3.49287K wps
Begin Testing...
[Epoch 107] train avg loss 0.00348506, dev acc 0.8889, dev avg loss 0.277417, throughput 3.35914K wps
[Epoch 108 Batch 30/172] avg loss 0.00354588, throughput 3.47884K wps
[Epoch 108 Batch 60/172] avg loss 0.00350264, throughput 2.99631K wps
[Epoch 108 Batch 90/172] avg loss 0.00402344, throughput 3.83631K wps
[Epoch 108 Batch 120/172] avg loss 0.00327664, throughput 3.52952K wps
[Epoch 108 Batch 150/172] avg loss 0.00377588, throughput 3.11333K wps
Begin Testing...
[Epoch 108] train avg loss 0.00357987, dev acc 0.8910, dev avg loss 0.277045, throughput 3.37434K wps
[Epoch 109 Batch 30/172] avg loss 0.00352028, throughput 3.11459K wps
[Epoch 109 Batch 60/172] avg loss 0.00362574, throughput 3.28064K wps
[Epoch 109 Batch 90/172] avg loss 0.00345018, throughput 2.99798K wps
[Epoch 109 Batch 120/172] avg loss 0.00352997, throughput 3.22046K wps
[Epoch 109 Batch 150/172] avg loss 0.00358234, throughput 3.28364K wps
Begin Testing...
[Epoch 109] train avg loss 0.00356041, dev acc 0.8899, dev avg loss 0.276597, throughput 3.18225K wps
[Epoch 110 Batch 30/172] avg loss 0.003582, throughput 2.9147K wps
[Epoch 110 Batch 60/172] avg loss 0.00320542, throughput 3.03275K wps
[Epoch 110 Batch 90/172] avg loss 0.00332913, throughput 3.07471K wps
[Epoch 110 Batch 120/172] avg loss 0.00354208, throughput 3.65882K wps
[Epoch 110 Batch 150/172] avg loss 0.0035522, throughput 3.46851K wps
Begin Testing...
[Epoch 110] train avg loss 0.0034713, dev acc 0.8920, dev avg loss 0.277102, throughput 3.17859K wps
Observed Improvement.
Begin Testing...
[Epoch 111 Batch 30/172] avg loss 0.00322986, throughput 2.97171K wps
[Epoch 111 Batch 60/172] avg loss 0.00346928, throughput 3.19574K wps
[Epoch 111 Batch 90/172] avg loss 0.00347722, throughput 3.54808K wps
[Epoch 111 Batch 120/172] avg loss 0.00323896, throughput 3.53445K wps
[Epoch 111 Batch 150/172] avg loss 0.00350857, throughput 3.0021K wps
Begin Testing...
[Epoch 111] train avg loss 0.00343696, dev acc 0.8878, dev avg loss 0.277881, throughput 3.21447K wps
[Epoch 112 Batch 30/172] avg loss 0.00360378, throughput 3.54612K wps
[Epoch 112 Batch 60/172] avg loss 0.00319304, throughput 3.00503K wps
[Epoch 112 Batch 90/172] avg loss 0.00378094, throughput 3.32628K wps
[Epoch 112 Batch 120/172] avg loss 0.00310354, throughput 3.6228K wps
[Epoch 112 Batch 150/172] avg loss 0.00335044, throughput 3.10312K wps
Begin Testing...
[Epoch 112] train avg loss 0.00343579, dev acc 0.8899, dev avg loss 0.276374, throughput 3.27346K wps
[Epoch 113 Batch 30/172] avg loss 0.00348685, throughput 3.18386K wps
[Epoch 113 Batch 60/172] avg loss 0.00312595, throughput 2.97907K wps
[Epoch 113 Batch 90/172] avg loss 0.00391076, throughput 3.08874K wps
[Epoch 113 Batch 120/172] avg loss 0.00312372, throughput 3.04349K wps
[Epoch 113 Batch 150/172] avg loss 0.00337226, throughput 3.1942K wps
Begin Testing...
[Epoch 113] train avg loss 0.00336848, dev acc 0.8889, dev avg loss 0.276767, throughput 3.09842K wps
[Epoch 114 Batch 30/172] avg loss 0.00325445, throughput 3.94073K wps
[Epoch 114 Batch 60/172] avg loss 0.00384816, throughput 3.02241K wps
[Epoch 114 Batch 90/172] avg loss 0.00338682, throughput 3.22717K wps
[Epoch 114 Batch 120/172] avg loss 0.00324994, throughput 3.07452K wps
[Epoch 114 Batch 150/172] avg loss 0.00355836, throughput 3.43757K wps
Begin Testing...
[Epoch 114] train avg loss 0.00343827, dev acc 0.8910, dev avg loss 0.276542, throughput 3.27076K wps
[Epoch 115 Batch 30/172] avg loss 0.00290444, throughput 3.22811K wps
[Epoch 115 Batch 60/172] avg loss 0.00366241, throughput 3.74551K wps
[Epoch 115 Batch 90/172] avg loss 0.00363667, throughput 3.6754K wps
[Epoch 115 Batch 120/172] avg loss 0.00323663, throughput 3.67369K wps
[Epoch 115 Batch 150/172] avg loss 0.00344813, throughput 3.112K wps
Begin Testing...
[Epoch 115] train avg loss 0.00340938, dev acc 0.8910, dev avg loss 0.276943, throughput 3.48192K wps
[Epoch 116 Batch 30/172] avg loss 0.00330804, throughput 3.50416K wps
[Epoch 116 Batch 60/172] avg loss 0.00309343, throughput 3.13875K wps
[Epoch 116 Batch 90/172] avg loss 0.00326401, throughput 3.05559K wps
[Epoch 116 Batch 120/172] avg loss 0.00331142, throughput 3.2212K wps
[Epoch 116 Batch 150/172] avg loss 0.0037606, throughput 3.45988K wps
Begin Testing...
[Epoch 116] train avg loss 0.00336936, dev acc 0.8889, dev avg loss 0.27635, throughput 3.27818K wps
[Epoch 117 Batch 30/172] avg loss 0.00362315, throughput 3.28107K wps
[Epoch 117 Batch 60/172] avg loss 0.0030854, throughput 3.67978K wps
[Epoch 117 Batch 90/172] avg loss 0.00321052, throughput 3.2638K wps
[Epoch 117 Batch 120/172] avg loss 0.00312067, throughput 3.30156K wps
[Epoch 117 Batch 150/172] avg loss 0.00362087, throughput 3.16038K wps
Begin Testing...
[Epoch 117] train avg loss 0.00339507, dev acc 0.8910, dev avg loss 0.276413, throughput 3.28973K wps
[Epoch 118 Batch 30/172] avg loss 0.00306637, throughput 3.63468K wps
[Epoch 118 Batch 60/172] avg loss 0.00367963, throughput 3.35049K wps
[Epoch 118 Batch 90/172] avg loss 0.00328632, throughput 3.50435K wps
[Epoch 118 Batch 120/172] avg loss 0.00351553, throughput 3.3384K wps
[Epoch 118 Batch 150/172] avg loss 0.00336098, throughput 3.50835K wps
Begin Testing...
[Epoch 118] train avg loss 0.00334986, dev acc 0.8910, dev avg loss 0.27639, throughput 3.45774K wps
[Epoch 119 Batch 30/172] avg loss 0.00316503, throughput 3.02332K wps
[Epoch 119 Batch 60/172] avg loss 0.0034894, throughput 3.54697K wps
[Epoch 119 Batch 90/172] avg loss 0.00343435, throughput 4.21758K wps
[Epoch 119 Batch 120/172] avg loss 0.00300375, throughput 3.93712K wps
[Epoch 119 Batch 150/172] avg loss 0.00336864, throughput 3.38601K wps
Begin Testing...
[Epoch 119] train avg loss 0.00331404, dev acc 0.8878, dev avg loss 0.276632, throughput 3.48711K wps
[Epoch 120 Batch 30/172] avg loss 0.00320779, throughput 3.10841K wps
[Epoch 120 Batch 60/172] avg loss 0.00315765, throughput 3.09899K wps
[Epoch 120 Batch 90/172] avg loss 0.00325685, throughput 3.8693K wps
[Epoch 120 Batch 120/172] avg loss 0.00357846, throughput 3.20267K wps
[Epoch 120 Batch 150/172] avg loss 0.00305545, throughput 3.17484K wps
Begin Testing...
[Epoch 120] train avg loss 0.00327534, dev acc 0.8910, dev avg loss 0.277406, throughput 3.28077K wps
[Epoch 121 Batch 30/172] avg loss 0.00356067, throughput 3.18297K wps
[Epoch 121 Batch 60/172] avg loss 0.00281527, throughput 3.37766K wps
[Epoch 121 Batch 90/172] avg loss 0.00332337, throughput 3.49412K wps
[Epoch 121 Batch 120/172] avg loss 0.00359174, throughput 3.19786K wps
[Epoch 121 Batch 150/172] avg loss 0.00335963, throughput 3.37158K wps
Begin Testing...
[Epoch 121] train avg loss 0.00332246, dev acc 0.8920, dev avg loss 0.277831, throughput 3.35476K wps
Observed Improvement.
Begin Testing...
[Epoch 122 Batch 30/172] avg loss 0.00307352, throughput 3.32072K wps
[Epoch 122 Batch 60/172] avg loss 0.00320259, throughput 3.18189K wps
[Epoch 122 Batch 90/172] avg loss 0.00340958, throughput 3.9267K wps
[Epoch 122 Batch 120/172] avg loss 0.00331796, throughput 3.5264K wps
[Epoch 122 Batch 150/172] avg loss 0.00332677, throughput 3.26784K wps
Begin Testing...
[Epoch 122] train avg loss 0.00325231, dev acc 0.8910, dev avg loss 0.278055, throughput 3.51658K wps
[Epoch 123 Batch 30/172] avg loss 0.00370421, throughput 3.32484K wps
[Epoch 123 Batch 60/172] avg loss 0.00319491, throughput 3.12936K wps
[Epoch 123 Batch 90/172] avg loss 0.00314357, throughput 3.0626K wps
[Epoch 123 Batch 120/172] avg loss 0.00310039, throughput 3.01648K wps
[Epoch 123 Batch 150/172] avg loss 0.00315428, throughput 3.05584K wps
Begin Testing...
[Epoch 123] train avg loss 0.00325519, dev acc 0.8899, dev avg loss 0.277461, throughput 3.16898K wps
[Epoch 124 Batch 30/172] avg loss 0.00315844, throughput 3.39124K wps
[Epoch 124 Batch 60/172] avg loss 0.00313456, throughput 3.37494K wps
[Epoch 124 Batch 90/172] avg loss 0.00326648, throughput 3.32715K wps
[Epoch 124 Batch 120/172] avg loss 0.00348169, throughput 3.10439K wps
[Epoch 124 Batch 150/172] avg loss 0.00289383, throughput 3.29389K wps
Begin Testing...
[Epoch 124] train avg loss 0.00323363, dev acc 0.8878, dev avg loss 0.276466, throughput 3.30643K wps
[Epoch 125 Batch 30/172] avg loss 0.00320597, throughput 3.26637K wps
[Epoch 125 Batch 60/172] avg loss 0.00344298, throughput 3.07473K wps
[Epoch 125 Batch 90/172] avg loss 0.00328838, throughput 3.12539K wps
[Epoch 125 Batch 120/172] avg loss 0.00308841, throughput 3.75207K wps
[Epoch 125 Batch 150/172] avg loss 0.00332836, throughput 3.14521K wps
Begin Testing...
[Epoch 125] train avg loss 0.00328585, dev acc 0.8910, dev avg loss 0.27657, throughput 3.30877K wps
[Epoch 126 Batch 30/172] avg loss 0.00295084, throughput 3.36467K wps
[Epoch 126 Batch 60/172] avg loss 0.00359573, throughput 3.80071K wps
[Epoch 126 Batch 90/172] avg loss 0.00334053, throughput 3.39681K wps
[Epoch 126 Batch 120/172] avg loss 0.00329454, throughput 3.10364K wps
[Epoch 126 Batch 150/172] avg loss 0.00324109, throughput 3.40935K wps
Begin Testing...
[Epoch 126] train avg loss 0.00325753, dev acc 0.8910, dev avg loss 0.276937, throughput 3.35399K wps
[Epoch 127 Batch 30/172] avg loss 0.00302202, throughput 3.53266K wps
[Epoch 127 Batch 60/172] avg loss 0.00339878, throughput 3.28729K wps
[Epoch 127 Batch 90/172] avg loss 0.00304856, throughput 3.86631K wps
[Epoch 127 Batch 120/172] avg loss 0.00317195, throughput 3.38145K wps
[Epoch 127 Batch 150/172] avg loss 0.00305337, throughput 3.18107K wps
Begin Testing...
[Epoch 127] train avg loss 0.00313518, dev acc 0.8899, dev avg loss 0.27775, throughput 3.40968K wps
[Epoch 128 Batch 30/172] avg loss 0.00327587, throughput 3.43607K wps
[Epoch 128 Batch 60/172] avg loss 0.00283793, throughput 3.04095K wps
[Epoch 128 Batch 90/172] avg loss 0.00293052, throughput 3.33252K wps
[Epoch 128 Batch 120/172] avg loss 0.00330912, throughput 3.35921K wps
[Epoch 128 Batch 150/172] avg loss 0.00344206, throughput 3.51953K wps
Begin Testing...
[Epoch 128] train avg loss 0.00315656, dev acc 0.8889, dev avg loss 0.277392, throughput 3.34479K wps
[Epoch 129 Batch 30/172] avg loss 0.00346548, throughput 3.79033K wps
[Epoch 129 Batch 60/172] avg loss 0.00280072, throughput 3.4818K wps
[Epoch 129 Batch 90/172] avg loss 0.00298693, throughput 3.08892K wps
[Epoch 129 Batch 120/172] avg loss 0.00321451, throughput 3.38518K wps
[Epoch 129 Batch 150/172] avg loss 0.00322955, throughput 2.98746K wps
Begin Testing...
[Epoch 129] train avg loss 0.00318524, dev acc 0.8878, dev avg loss 0.279545, throughput 3.32983K wps
[Epoch 130 Batch 30/172] avg loss 0.00300358, throughput 3.05778K wps
[Epoch 130 Batch 60/172] avg loss 0.00328329, throughput 3.93686K wps
[Epoch 130 Batch 90/172] avg loss 0.00312892, throughput 3.10326K wps
[Epoch 130 Batch 120/172] avg loss 0.00335097, throughput 3.08164K wps
[Epoch 130 Batch 150/172] avg loss 0.00317889, throughput 3.27515K wps
Begin Testing...
[Epoch 130] train avg loss 0.00313768, dev acc 0.8878, dev avg loss 0.277466, throughput 3.2794K wps
[Epoch 131 Batch 30/172] avg loss 0.00326265, throughput 3.00743K wps
[Epoch 131 Batch 60/172] avg loss 0.00319834, throughput 3.07632K wps
[Epoch 131 Batch 90/172] avg loss 0.00302265, throughput 3.57231K wps
[Epoch 131 Batch 120/172] avg loss 0.00305621, throughput 3.34676K wps
[Epoch 131 Batch 150/172] avg loss 0.00311009, throughput 3.17913K wps
Begin Testing...
[Epoch 131] train avg loss 0.0031008, dev acc 0.8910, dev avg loss 0.276893, throughput 3.22511K wps
[Epoch 132 Batch 30/172] avg loss 0.00306003, throughput 3.70379K wps
[Epoch 132 Batch 60/172] avg loss 0.00338903, throughput 3.1356K wps
[Epoch 132 Batch 90/172] avg loss 0.00324836, throughput 3.3374K wps
[Epoch 132 Batch 120/172] avg loss 0.00275874, throughput 3.07318K wps
[Epoch 132 Batch 150/172] avg loss 0.00324411, throughput 3.23599K wps
Begin Testing...
[Epoch 132] train avg loss 0.00311365, dev acc 0.8889, dev avg loss 0.277181, throughput 3.35129K wps
[Epoch 133 Batch 30/172] avg loss 0.00289378, throughput 3.20284K wps
[Epoch 133 Batch 60/172] avg loss 0.00333548, throughput 3.37544K wps
[Epoch 133 Batch 90/172] avg loss 0.00310943, throughput 3.16985K wps
[Epoch 133 Batch 120/172] avg loss 0.00316822, throughput 3.44402K wps
[Epoch 133 Batch 150/172] avg loss 0.00333425, throughput 3.05404K wps
Begin Testing...
[Epoch 133] train avg loss 0.0031096, dev acc 0.8889, dev avg loss 0.279096, throughput 3.21375K wps
[Epoch 134 Batch 30/172] avg loss 0.00298641, throughput 3.63299K wps
[Epoch 134 Batch 60/172] avg loss 0.00322491, throughput 3.29211K wps
[Epoch 134 Batch 90/172] avg loss 0.00306252, throughput 2.94085K wps
[Epoch 134 Batch 120/172] avg loss 0.00306062, throughput 3.44329K wps
[Epoch 134 Batch 150/172] avg loss 0.00333637, throughput 3.45295K wps
Begin Testing...
[Epoch 134] train avg loss 0.00309701, dev acc 0.8920, dev avg loss 0.277948, throughput 3.36494K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/172] avg loss 0.0032603, throughput 3.26535K wps
[Epoch 135 Batch 60/172] avg loss 0.00319637, throughput 3.31819K wps
[Epoch 135 Batch 90/172] avg loss 0.00290782, throughput 3.38641K wps
[Epoch 135 Batch 120/172] avg loss 0.00323739, throughput 3.23567K wps
[Epoch 135 Batch 150/172] avg loss 0.00322334, throughput 2.99011K wps
Begin Testing...
[Epoch 135] train avg loss 0.00313473, dev acc 0.8889, dev avg loss 0.278925, throughput 3.22918K wps
[Epoch 136 Batch 30/172] avg loss 0.00281186, throughput 3.54261K wps
[Epoch 136 Batch 60/172] avg loss 0.0033496, throughput 2.97958K wps
[Epoch 136 Batch 90/172] avg loss 0.00285661, throughput 3.04628K wps
[Epoch 136 Batch 120/172] avg loss 0.00341762, throughput 3.2992K wps
[Epoch 136 Batch 150/172] avg loss 0.00313327, throughput 3.56255K wps
Begin Testing...
[Epoch 136] train avg loss 0.00309769, dev acc 0.8899, dev avg loss 0.278323, throughput 3.31834K wps
[Epoch 137 Batch 30/172] avg loss 0.00327871, throughput 2.97944K wps
[Epoch 137 Batch 60/172] avg loss 0.00274762, throughput 3.0414K wps
[Epoch 137 Batch 90/172] avg loss 0.00309741, throughput 3.46366K wps
[Epoch 137 Batch 120/172] avg loss 0.00330511, throughput 3.15218K wps
[Epoch 137 Batch 150/172] avg loss 0.0027607, throughput 3.24714K wps
Begin Testing...
[Epoch 137] train avg loss 0.00305263, dev acc 0.8878, dev avg loss 0.277704, throughput 3.19868K wps
[Epoch 138 Batch 30/172] avg loss 0.00303972, throughput 3.47031K wps
[Epoch 138 Batch 60/172] avg loss 0.00305273, throughput 3.39599K wps
[Epoch 138 Batch 90/172] avg loss 0.003315, throughput 2.997K wps
[Epoch 138 Batch 120/172] avg loss 0.00309019, throughput 3.7125K wps
[Epoch 138 Batch 150/172] avg loss 0.00315034, throughput 3.24721K wps
Begin Testing...
[Epoch 138] train avg loss 0.00306582, dev acc 0.8899, dev avg loss 0.277519, throughput 3.37276K wps
[Epoch 139 Batch 30/172] avg loss 0.00314938, throughput 3.16309K wps
[Epoch 139 Batch 60/172] avg loss 0.0032401, throughput 2.99688K wps
[Epoch 139 Batch 90/172] avg loss 0.00273629, throughput 3.21079K wps
[Epoch 139 Batch 120/172] avg loss 0.00290705, throughput 3.38864K wps
[Epoch 139 Batch 150/172] avg loss 0.00295109, throughput 3.1899K wps
Begin Testing...
[Epoch 139] train avg loss 0.00298063, dev acc 0.8910, dev avg loss 0.280767, throughput 3.20504K wps
[Epoch 140 Batch 30/172] avg loss 0.00276114, throughput 2.88291K wps
[Epoch 140 Batch 60/172] avg loss 0.00323748, throughput 3.10714K wps
[Epoch 140 Batch 90/172] avg loss 0.00297264, throughput 3.31337K wps
[Epoch 140 Batch 120/172] avg loss 0.0027403, throughput 3.12976K wps
[Epoch 140 Batch 150/172] avg loss 0.00338579, throughput 3.60527K wps
Begin Testing...
[Epoch 140] train avg loss 0.00304656, dev acc 0.8899, dev avg loss 0.278127, throughput 3.16936K wps
[Epoch 141 Batch 30/172] avg loss 0.00296777, throughput 3.01173K wps
[Epoch 141 Batch 60/172] avg loss 0.00305277, throughput 3.86956K wps
[Epoch 141 Batch 90/172] avg loss 0.00305597, throughput 3.4662K wps
[Epoch 141 Batch 120/172] avg loss 0.00333804, throughput 3.70826K wps
[Epoch 141 Batch 150/172] avg loss 0.00299174, throughput 3.24334K wps
Begin Testing...
[Epoch 141] train avg loss 0.00305189, dev acc 0.8899, dev avg loss 0.277524, throughput 3.44825K wps
[Epoch 142 Batch 30/172] avg loss 0.00300221, throughput 3.01169K wps
[Epoch 142 Batch 60/172] avg loss 0.00281884, throughput 3.55336K wps
[Epoch 142 Batch 90/172] avg loss 0.0028921, throughput 3.0875K wps
[Epoch 142 Batch 120/172] avg loss 0.00305298, throughput 3.32543K wps
[Epoch 142 Batch 150/172] avg loss 0.00273914, throughput 3.36248K wps
Begin Testing...
[Epoch 142] train avg loss 0.00294757, dev acc 0.8878, dev avg loss 0.277676, throughput 3.31225K wps
[Epoch 143 Batch 30/172] avg loss 0.00304314, throughput 3.25714K wps
[Epoch 143 Batch 60/172] avg loss 0.0029343, throughput 2.99084K wps
[Epoch 143 Batch 90/172] avg loss 0.00294066, throughput 3.32974K wps
[Epoch 143 Batch 120/172] avg loss 0.00318291, throughput 3.60052K wps
[Epoch 143 Batch 150/172] avg loss 0.00302319, throughput 3.18917K wps
Begin Testing...
[Epoch 143] train avg loss 0.00296844, dev acc 0.8910, dev avg loss 0.278824, throughput 3.32363K wps
[Epoch 144 Batch 30/172] avg loss 0.00282801, throughput 3.06209K wps
[Epoch 144 Batch 60/172] avg loss 0.00298921, throughput 3.14182K wps
[Epoch 144 Batch 90/172] avg loss 0.00310851, throughput 3.23752K wps
[Epoch 144 Batch 120/172] avg loss 0.00283815, throughput 3.18733K wps
[Epoch 144 Batch 150/172] avg loss 0.00311052, throughput 3.37944K wps
Begin Testing...
[Epoch 144] train avg loss 0.00296192, dev acc 0.8931, dev avg loss 0.278305, throughput 3.2555K wps
Observed Improvement.
Begin Testing...
[Epoch 145 Batch 30/172] avg loss 0.00301673, throughput 3.57271K wps
[Epoch 145 Batch 60/172] avg loss 0.00296442, throughput 3.1071K wps
[Epoch 145 Batch 90/172] avg loss 0.0027907, throughput 3.16286K wps
[Epoch 145 Batch 120/172] avg loss 0.00307003, throughput 3.15354K wps
[Epoch 145 Batch 150/172] avg loss 0.00292861, throughput 2.89822K wps
Begin Testing...
[Epoch 145] train avg loss 0.00297576, dev acc 0.8920, dev avg loss 0.278923, throughput 3.17223K wps
[Epoch 146 Batch 30/172] avg loss 0.00293942, throughput 3.04251K wps
[Epoch 146 Batch 60/172] avg loss 0.00304141, throughput 3.25649K wps
[Epoch 146 Batch 90/172] avg loss 0.00312538, throughput 2.96375K wps
[Epoch 146 Batch 120/172] avg loss 0.00264554, throughput 3.71819K wps
[Epoch 146 Batch 150/172] avg loss 0.00303922, throughput 3.77342K wps
Begin Testing...
[Epoch 146] train avg loss 0.00294323, dev acc 0.8899, dev avg loss 0.27888, throughput 3.3443K wps
[Epoch 147 Batch 30/172] avg loss 0.00282217, throughput 3.64919K wps
[Epoch 147 Batch 60/172] avg loss 0.00295119, throughput 3.06215K wps
[Epoch 147 Batch 90/172] avg loss 0.00282613, throughput 3.39725K wps
[Epoch 147 Batch 120/172] avg loss 0.00300864, throughput 3.59175K wps
[Epoch 147 Batch 150/172] avg loss 0.00321743, throughput 3.20913K wps
Begin Testing...
[Epoch 147] train avg loss 0.00291231, dev acc 0.8889, dev avg loss 0.278996, throughput 3.3751K wps
[Epoch 148 Batch 30/172] avg loss 0.0028822, throughput 3.34748K wps
[Epoch 148 Batch 60/172] avg loss 0.00241727, throughput 3.34075K wps
[Epoch 148 Batch 90/172] avg loss 0.00317472, throughput 3.39789K wps
[Epoch 148 Batch 120/172] avg loss 0.00268865, throughput 3.32488K wps
[Epoch 148 Batch 150/172] avg loss 0.00354325, throughput 3.14293K wps
Begin Testing...
[Epoch 148] train avg loss 0.00293361, dev acc 0.8899, dev avg loss 0.27888, throughput 3.27983K wps
[Epoch 149 Batch 30/172] avg loss 0.00296922, throughput 3.02694K wps
[Epoch 149 Batch 60/172] avg loss 0.00295303, throughput 3.30672K wps
[Epoch 149 Batch 90/172] avg loss 0.00291433, throughput 3.44588K wps
[Epoch 149 Batch 120/172] avg loss 0.00280301, throughput 3.47423K wps
[Epoch 149 Batch 150/172] avg loss 0.00329091, throughput 3.85034K wps
Begin Testing...
[Epoch 149] train avg loss 0.00290667, dev acc 0.8889, dev avg loss 0.279431, throughput 3.40837K wps
[Epoch 150 Batch 30/172] avg loss 0.00317958, throughput 3.34887K wps
[Epoch 150 Batch 60/172] avg loss 0.00270124, throughput 3.1015K wps
[Epoch 150 Batch 90/172] avg loss 0.00312651, throughput 3.53101K wps
[Epoch 150 Batch 120/172] avg loss 0.00259828, throughput 3.592K wps
[Epoch 150 Batch 150/172] avg loss 0.00285849, throughput 3.48091K wps
Begin Testing...
[Epoch 150] train avg loss 0.00290364, dev acc 0.8920, dev avg loss 0.277768, throughput 3.39417K wps
[Epoch 151 Batch 30/172] avg loss 0.00286052, throughput 3.88483K wps
[Epoch 151 Batch 60/172] avg loss 0.00294991, throughput 3.34977K wps
[Epoch 151 Batch 90/172] avg loss 0.00301047, throughput 3.82159K wps
[Epoch 151 Batch 120/172] avg loss 0.0027187, throughput 3.01701K wps
[Epoch 151 Batch 150/172] avg loss 0.00271784, throughput 2.99124K wps
Begin Testing...
[Epoch 151] train avg loss 0.00283613, dev acc 0.8899, dev avg loss 0.278126, throughput 3.43251K wps
[Epoch 152 Batch 30/172] avg loss 0.00298008, throughput 3.26012K wps
[Epoch 152 Batch 60/172] avg loss 0.0027687, throughput 3.20387K wps
[Epoch 152 Batch 90/172] avg loss 0.00330191, throughput 3.13181K wps
[Epoch 152 Batch 120/172] avg loss 0.00251393, throughput 3.60892K wps
[Epoch 152 Batch 150/172] avg loss 0.00277943, throughput 3.40022K wps
Begin Testing...
[Epoch 152] train avg loss 0.00284892, dev acc 0.8889, dev avg loss 0.279846, throughput 3.27756K wps
[Epoch 153 Batch 30/172] avg loss 0.00273686, throughput 3.13124K wps
[Epoch 153 Batch 60/172] avg loss 0.00294789, throughput 3.22904K wps
[Epoch 153 Batch 90/172] avg loss 0.00258614, throughput 3.03399K wps
[Epoch 153 Batch 120/172] avg loss 0.00282931, throughput 3.15852K wps
[Epoch 153 Batch 150/172] avg loss 0.00287712, throughput 3.44775K wps
Begin Testing...
[Epoch 153] train avg loss 0.00282467, dev acc 0.8931, dev avg loss 0.278852, throughput 3.20233K wps
Observed Improvement.
Begin Testing...
[Epoch 154 Batch 30/172] avg loss 0.00269321, throughput 3.67662K wps
[Epoch 154 Batch 60/172] avg loss 0.00311812, throughput 3.16706K wps
[Epoch 154 Batch 90/172] avg loss 0.00275974, throughput 3.69184K wps
[Epoch 154 Batch 120/172] avg loss 0.00284183, throughput 4.02033K wps
[Epoch 154 Batch 150/172] avg loss 0.00245562, throughput 3.31949K wps
Begin Testing...
[Epoch 154] train avg loss 0.0028235, dev acc 0.8878, dev avg loss 0.28116, throughput 3.64218K wps
[Epoch 155 Batch 30/172] avg loss 0.0027471, throughput 2.97741K wps
[Epoch 155 Batch 60/172] avg loss 0.0029672, throughput 3.80372K wps
[Epoch 155 Batch 90/172] avg loss 0.00308721, throughput 3.09185K wps
[Epoch 155 Batch 120/172] avg loss 0.00271897, throughput 3.3335K wps
[Epoch 155 Batch 150/172] avg loss 0.00259356, throughput 3.13126K wps
Begin Testing...
[Epoch 155] train avg loss 0.00283854, dev acc 0.8931, dev avg loss 0.278466, throughput 3.24678K wps
Observed Improvement.
Begin Testing...
[Epoch 156 Batch 30/172] avg loss 0.00266719, throughput 3.05707K wps
[Epoch 156 Batch 60/172] avg loss 0.00322382, throughput 3.31719K wps
[Epoch 156 Batch 90/172] avg loss 0.00300469, throughput 3.36298K wps
[Epoch 156 Batch 120/172] avg loss 0.00233125, throughput 3.1328K wps
[Epoch 156 Batch 150/172] avg loss 0.00277836, throughput 3.46458K wps
Begin Testing...
[Epoch 156] train avg loss 0.00278988, dev acc 0.8889, dev avg loss 0.279758, throughput 3.26757K wps
[Epoch 157 Batch 30/172] avg loss 0.00294831, throughput 3.32798K wps
[Epoch 157 Batch 60/172] avg loss 0.00270951, throughput 3.38886K wps
[Epoch 157 Batch 90/172] avg loss 0.00301718, throughput 3.18903K wps
[Epoch 157 Batch 120/172] avg loss 0.00284424, throughput 3.60229K wps
[Epoch 157 Batch 150/172] avg loss 0.00277717, throughput 2.99621K wps
Begin Testing...
[Epoch 157] train avg loss 0.00282253, dev acc 0.8899, dev avg loss 0.279913, throughput 3.2605K wps
[Epoch 158 Batch 30/172] avg loss 0.00248953, throughput 3.7656K wps
[Epoch 158 Batch 60/172] avg loss 0.00285212, throughput 3.66804K wps
[Epoch 158 Batch 90/172] avg loss 0.00275086, throughput 3.25189K wps
[Epoch 158 Batch 120/172] avg loss 0.00285105, throughput 3.47965K wps
[Epoch 158 Batch 150/172] avg loss 0.00280776, throughput 3.05296K wps
Begin Testing...
[Epoch 158] train avg loss 0.00275432, dev acc 0.8899, dev avg loss 0.279574, throughput 3.40857K wps
[Epoch 159 Batch 30/172] avg loss 0.0029164, throughput 3.34078K wps
[Epoch 159 Batch 60/172] avg loss 0.00280881, throughput 3.16633K wps
[Epoch 159 Batch 90/172] avg loss 0.00254729, throughput 3.28242K wps
[Epoch 159 Batch 120/172] avg loss 0.00274711, throughput 3.45354K wps
[Epoch 159 Batch 150/172] avg loss 0.00282208, throughput 3.19955K wps
Begin Testing...
[Epoch 159] train avg loss 0.002729, dev acc 0.8899, dev avg loss 0.281506, throughput 3.26664K wps
[Epoch 160 Batch 30/172] avg loss 0.0025287, throughput 3.32963K wps
[Epoch 160 Batch 60/172] avg loss 0.0025207, throughput 3.19334K wps
[Epoch 160 Batch 90/172] avg loss 0.00273296, throughput 3.30833K wps
[Epoch 160 Batch 120/172] avg loss 0.003145, throughput 2.94866K wps
[Epoch 160 Batch 150/172] avg loss 0.00252908, throughput 2.92958K wps
Begin Testing...
[Epoch 160] train avg loss 0.00269092, dev acc 0.8941, dev avg loss 0.280914, throughput 3.11147K wps
Observed Improvement.
Begin Testing...
[Epoch 161 Batch 30/172] avg loss 0.00283396, throughput 3.34245K wps
[Epoch 161 Batch 60/172] avg loss 0.00278245, throughput 3.43207K wps
[Epoch 161 Batch 90/172] avg loss 0.00287293, throughput 3.2796K wps
[Epoch 161 Batch 120/172] avg loss 0.00264421, throughput 3.03664K wps
[Epoch 161 Batch 150/172] avg loss 0.0028018, throughput 2.97656K wps
Begin Testing...
[Epoch 161] train avg loss 0.00275042, dev acc 0.8899, dev avg loss 0.280575, throughput 3.25753K wps
[Epoch 162 Batch 30/172] avg loss 0.00290521, throughput 3.25212K wps
[Epoch 162 Batch 60/172] avg loss 0.00273229, throughput 3.65541K wps
[Epoch 162 Batch 90/172] avg loss 0.00300675, throughput 3.7582K wps
[Epoch 162 Batch 120/172] avg loss 0.00247052, throughput 3.5255K wps
[Epoch 162 Batch 150/172] avg loss 0.00261358, throughput 3.04268K wps
Begin Testing...
[Epoch 162] train avg loss 0.0027193, dev acc 0.8920, dev avg loss 0.28126, throughput 3.42039K wps
[Epoch 163 Batch 30/172] avg loss 0.00286341, throughput 3.35397K wps
[Epoch 163 Batch 60/172] avg loss 0.00284015, throughput 3.10513K wps
[Epoch 163 Batch 90/172] avg loss 0.00243016, throughput 3.44589K wps
[Epoch 163 Batch 120/172] avg loss 0.00231289, throughput 3.10299K wps
[Epoch 163 Batch 150/172] avg loss 0.0027376, throughput 4.20069K wps
Begin Testing...
[Epoch 163] train avg loss 0.00266759, dev acc 0.8889, dev avg loss 0.280364, throughput 3.3996K wps
[Epoch 164 Batch 30/172] avg loss 0.00239015, throughput 3.48661K wps
[Epoch 164 Batch 60/172] avg loss 0.00243963, throughput 3.0176K wps
[Epoch 164 Batch 90/172] avg loss 0.00256284, throughput 3.43075K wps
[Epoch 164 Batch 120/172] avg loss 0.00299651, throughput 3.32051K wps
[Epoch 164 Batch 150/172] avg loss 0.00263353, throughput 3.42527K wps
Begin Testing...
[Epoch 164] train avg loss 0.00268654, dev acc 0.8889, dev avg loss 0.280341, throughput 3.41248K wps
[Epoch 165 Batch 30/172] avg loss 0.00241143, throughput 3.21602K wps
[Epoch 165 Batch 60/172] avg loss 0.00266355, throughput 3.56452K wps
[Epoch 165 Batch 90/172] avg loss 0.00291658, throughput 3.5261K wps
[Epoch 165 Batch 120/172] avg loss 0.00298617, throughput 4.01779K wps
[Epoch 165 Batch 150/172] avg loss 0.00247586, throughput 3.43026K wps
Begin Testing...
[Epoch 165] train avg loss 0.0027011, dev acc 0.8899, dev avg loss 0.280477, throughput 3.44721K wps
[Epoch 166 Batch 30/172] avg loss 0.00269765, throughput 3.16534K wps
[Epoch 166 Batch 60/172] avg loss 0.00283265, throughput 3.01576K wps
[Epoch 166 Batch 90/172] avg loss 0.0025097, throughput 3.28839K wps
[Epoch 166 Batch 120/172] avg loss 0.00242149, throughput 3.66987K wps
[Epoch 166 Batch 150/172] avg loss 0.00286828, throughput 3.84245K wps
Begin Testing...
[Epoch 166] train avg loss 0.0026799, dev acc 0.8910, dev avg loss 0.281207, throughput 3.3149K wps
[Epoch 167 Batch 30/172] avg loss 0.00263158, throughput 3.02147K wps
[Epoch 167 Batch 60/172] avg loss 0.00258804, throughput 4.01814K wps
[Epoch 167 Batch 90/172] avg loss 0.00268757, throughput 3.35692K wps
[Epoch 167 Batch 120/172] avg loss 0.00248384, throughput 3.10867K wps
[Epoch 167 Batch 150/172] avg loss 0.00258453, throughput 3.38622K wps
Begin Testing...
[Epoch 167] train avg loss 0.00260743, dev acc 0.8899, dev avg loss 0.280826, throughput 3.31939K wps
[Epoch 168 Batch 30/172] avg loss 0.00276852, throughput 3.42892K wps
[Epoch 168 Batch 60/172] avg loss 0.00238684, throughput 4.32249K wps
[Epoch 168 Batch 90/172] avg loss 0.00248425, throughput 3.16303K wps
[Epoch 168 Batch 120/172] avg loss 0.00264794, throughput 3.2279K wps
[Epoch 168 Batch 150/172] avg loss 0.00267576, throughput 3.48999K wps
Begin Testing...
[Epoch 168] train avg loss 0.00264768, dev acc 0.8899, dev avg loss 0.280595, throughput 3.53748K wps
[Epoch 169 Batch 30/172] avg loss 0.00258595, throughput 3.04484K wps
[Epoch 169 Batch 60/172] avg loss 0.00273659, throughput 3.32567K wps
[Epoch 169 Batch 90/172] avg loss 0.0024953, throughput 3.32729K wps
[Epoch 169 Batch 120/172] avg loss 0.00274037, throughput 3.14171K wps
[Epoch 169 Batch 150/172] avg loss 0.00268777, throughput 3.44916K wps
Begin Testing...
[Epoch 169] train avg loss 0.00266018, dev acc 0.8899, dev avg loss 0.280704, throughput 3.29762K wps
[Epoch 170 Batch 30/172] avg loss 0.00259988, throughput 3.20988K wps
[Epoch 170 Batch 60/172] avg loss 0.00257089, throughput 2.98167K wps
[Epoch 170 Batch 90/172] avg loss 0.00240085, throughput 3.14274K wps
[Epoch 170 Batch 120/172] avg loss 0.0027927, throughput 3.11304K wps
[Epoch 170 Batch 150/172] avg loss 0.0026419, throughput 3.54806K wps
Begin Testing...
[Epoch 170] train avg loss 0.00262273, dev acc 0.8952, dev avg loss 0.281188, throughput 3.24816K wps
Observed Improvement.
Begin Testing...
[Epoch 171 Batch 30/172] avg loss 0.00262244, throughput 2.93547K wps
[Epoch 171 Batch 60/172] avg loss 0.00236531, throughput 3.05243K wps
[Epoch 171 Batch 90/172] avg loss 0.00268793, throughput 2.99279K wps
[Epoch 171 Batch 120/172] avg loss 0.00243763, throughput 3.36414K wps
[Epoch 171 Batch 150/172] avg loss 0.00265205, throughput 3.26667K wps
Begin Testing...
[Epoch 171] train avg loss 0.00257602, dev acc 0.8910, dev avg loss 0.280599, throughput 3.08041K wps
[Epoch 172 Batch 30/172] avg loss 0.00266974, throughput 3.40355K wps
[Epoch 172 Batch 60/172] avg loss 0.00260155, throughput 3.18438K wps
[Epoch 172 Batch 90/172] avg loss 0.00267849, throughput 3.16386K wps
[Epoch 172 Batch 120/172] avg loss 0.00257381, throughput 3.38464K wps
[Epoch 172 Batch 150/172] avg loss 0.0028005, throughput 2.96026K wps
Begin Testing...
[Epoch 172] train avg loss 0.00264413, dev acc 0.8920, dev avg loss 0.280643, throughput 3.23626K wps
[Epoch 173 Batch 30/172] avg loss 0.00238833, throughput 3.03558K wps
[Epoch 173 Batch 60/172] avg loss 0.00263187, throughput 3.13093K wps
[Epoch 173 Batch 90/172] avg loss 0.00251103, throughput 3.50273K wps
[Epoch 173 Batch 120/172] avg loss 0.00225129, throughput 3.51935K wps
[Epoch 173 Batch 150/172] avg loss 0.00239982, throughput 2.91887K wps
Begin Testing...
[Epoch 173] train avg loss 0.00250495, dev acc 0.8899, dev avg loss 0.281501, throughput 3.16837K wps
[Epoch 174 Batch 30/172] avg loss 0.00258337, throughput 3.0211K wps
[Epoch 174 Batch 60/172] avg loss 0.00278747, throughput 3.29235K wps
[Epoch 174 Batch 90/172] avg loss 0.00228651, throughput 3.03707K wps
[Epoch 174 Batch 120/172] avg loss 0.00244399, throughput 3.30091K wps
[Epoch 174 Batch 150/172] avg loss 0.00263828, throughput 3.39094K wps
Begin Testing...
[Epoch 174] train avg loss 0.00251843, dev acc 0.8962, dev avg loss 0.28281, throughput 3.1634K wps
Observed Improvement.
Begin Testing...
[Epoch 175 Batch 30/172] avg loss 0.00252911, throughput 3.69438K wps
[Epoch 175 Batch 60/172] avg loss 0.00231658, throughput 3.58969K wps
[Epoch 175 Batch 90/172] avg loss 0.00268937, throughput 3.26771K wps
[Epoch 175 Batch 120/172] avg loss 0.0024961, throughput 3.93397K wps
[Epoch 175 Batch 150/172] avg loss 0.00264076, throughput 3.52309K wps
Begin Testing...
[Epoch 175] train avg loss 0.00254546, dev acc 0.8910, dev avg loss 0.280856, throughput 3.51746K wps
[Epoch 176 Batch 30/172] avg loss 0.00278778, throughput 3.33092K wps
[Epoch 176 Batch 60/172] avg loss 0.00240737, throughput 3.58022K wps
[Epoch 176 Batch 90/172] avg loss 0.00255056, throughput 3.18306K wps
[Epoch 176 Batch 120/172] avg loss 0.00253932, throughput 3.5578K wps
[Epoch 176 Batch 150/172] avg loss 0.00268557, throughput 3.09532K wps
Begin Testing...
[Epoch 176] train avg loss 0.0026079, dev acc 0.8941, dev avg loss 0.28116, throughput 3.34264K wps
[Epoch 177 Batch 30/172] avg loss 0.00284485, throughput 3.66634K wps
[Epoch 177 Batch 60/172] avg loss 0.00251524, throughput 3.74541K wps
[Epoch 177 Batch 90/172] avg loss 0.00244328, throughput 2.99907K wps
[Epoch 177 Batch 120/172] avg loss 0.00254146, throughput 3.60771K wps
[Epoch 177 Batch 150/172] avg loss 0.00228719, throughput 3.89113K wps
Begin Testing...
[Epoch 177] train avg loss 0.00251377, dev acc 0.8910, dev avg loss 0.282119, throughput 3.54029K wps
[Epoch 178 Batch 30/172] avg loss 0.0028397, throughput 3.70619K wps
[Epoch 178 Batch 60/172] avg loss 0.00271361, throughput 3.71659K wps
[Epoch 178 Batch 90/172] avg loss 0.00245628, throughput 3.39535K wps
[Epoch 178 Batch 120/172] avg loss 0.0022827, throughput 3.41439K wps
[Epoch 178 Batch 150/172] avg loss 0.00212726, throughput 3.91117K wps
Begin Testing...
[Epoch 178] train avg loss 0.00252009, dev acc 0.8952, dev avg loss 0.283216, throughput 3.61012K wps
[Epoch 179 Batch 30/172] avg loss 0.00245624, throughput 3.56843K wps
[Epoch 179 Batch 60/172] avg loss 0.00253692, throughput 3.3619K wps
[Epoch 179 Batch 90/172] avg loss 0.00277841, throughput 3.61074K wps
[Epoch 179 Batch 120/172] avg loss 0.00229037, throughput 3.27653K wps
[Epoch 179 Batch 150/172] avg loss 0.00265663, throughput 3.48221K wps
Begin Testing...
[Epoch 179] train avg loss 0.00253169, dev acc 0.8952, dev avg loss 0.283247, throughput 3.3921K wps
[Epoch 180 Batch 30/172] avg loss 0.00273778, throughput 3.16293K wps
[Epoch 180 Batch 60/172] avg loss 0.00247357, throughput 3.27746K wps
[Epoch 180 Batch 90/172] avg loss 0.00247186, throughput 2.97889K wps
[Epoch 180 Batch 120/172] avg loss 0.00262401, throughput 3.51265K wps
[Epoch 180 Batch 150/172] avg loss 0.00254592, throughput 3.4692K wps
Begin Testing...
[Epoch 180] train avg loss 0.00258112, dev acc 0.8931, dev avg loss 0.281999, throughput 3.25432K wps
[Epoch 181 Batch 30/172] avg loss 0.00252076, throughput 3.06189K wps
[Epoch 181 Batch 60/172] avg loss 0.00292357, throughput 3.23137K wps
[Epoch 181 Batch 90/172] avg loss 0.0023574, throughput 4.02419K wps
[Epoch 181 Batch 120/172] avg loss 0.00256229, throughput 3.15537K wps
[Epoch 181 Batch 150/172] avg loss 0.00237735, throughput 3.10915K wps
Begin Testing...
[Epoch 181] train avg loss 0.00250052, dev acc 0.8931, dev avg loss 0.283716, throughput 3.29466K wps
[Epoch 182 Batch 30/172] avg loss 0.00241751, throughput 3.63679K wps
[Epoch 182 Batch 60/172] avg loss 0.00257537, throughput 3.67774K wps
[Epoch 182 Batch 90/172] avg loss 0.00244783, throughput 3.34554K wps
[Epoch 182 Batch 120/172] avg loss 0.00246414, throughput 3.18462K wps
[Epoch 182 Batch 150/172] avg loss 0.00278141, throughput 3.96144K wps
Begin Testing...
[Epoch 182] train avg loss 0.00250309, dev acc 0.8941, dev avg loss 0.284514, throughput 3.53667K wps
[Epoch 183 Batch 30/172] avg loss 0.00234797, throughput 2.91468K wps
[Epoch 183 Batch 60/172] avg loss 0.00225202, throughput 3.45058K wps
[Epoch 183 Batch 90/172] avg loss 0.00236574, throughput 3.77046K wps
[Epoch 183 Batch 120/172] avg loss 0.00253194, throughput 3.32534K wps
[Epoch 183 Batch 150/172] avg loss 0.00248997, throughput 3.39537K wps
Begin Testing...
[Epoch 183] train avg loss 0.00247939, dev acc 0.8941, dev avg loss 0.282341, throughput 3.30442K wps
[Epoch 184 Batch 30/172] avg loss 0.00246775, throughput 2.9727K wps
[Epoch 184 Batch 60/172] avg loss 0.00245199, throughput 3.26819K wps
[Epoch 184 Batch 90/172] avg loss 0.00244056, throughput 3.44567K wps
[Epoch 184 Batch 120/172] avg loss 0.00251379, throughput 3.42317K wps
[Epoch 184 Batch 150/172] avg loss 0.00260134, throughput 3.04423K wps
Begin Testing...
[Epoch 184] train avg loss 0.00247334, dev acc 0.8931, dev avg loss 0.283391, throughput 3.22656K wps
[Epoch 185 Batch 30/172] avg loss 0.00220184, throughput 4.02567K wps
[Epoch 185 Batch 60/172] avg loss 0.00241329, throughput 3.09006K wps
[Epoch 185 Batch 90/172] avg loss 0.00262132, throughput 3.15146K wps
[Epoch 185 Batch 120/172] avg loss 0.00247202, throughput 3.21017K wps
[Epoch 185 Batch 150/172] avg loss 0.00269754, throughput 3.30579K wps
Begin Testing...
[Epoch 185] train avg loss 0.00247584, dev acc 0.8931, dev avg loss 0.284213, throughput 3.40287K wps
[Epoch 186 Batch 30/172] avg loss 0.00246289, throughput 2.8991K wps
[Epoch 186 Batch 60/172] avg loss 0.00227671, throughput 3.42165K wps
[Epoch 186 Batch 90/172] avg loss 0.00276661, throughput 2.90987K wps
[Epoch 186 Batch 120/172] avg loss 0.00229223, throughput 3.4179K wps
[Epoch 186 Batch 150/172] avg loss 0.00232292, throughput 3.61086K wps
Begin Testing...
[Epoch 186] train avg loss 0.00243692, dev acc 0.8931, dev avg loss 0.286263, throughput 3.27248K wps
[Epoch 187 Batch 30/172] avg loss 0.00253785, throughput 3.52103K wps
[Epoch 187 Batch 60/172] avg loss 0.00251104, throughput 3.26163K wps
[Epoch 187 Batch 90/172] avg loss 0.0024139, throughput 3.20299K wps
[Epoch 187 Batch 120/172] avg loss 0.00235769, throughput 4.15384K wps
[Epoch 187 Batch 150/172] avg loss 0.0024562, throughput 2.91144K wps
Begin Testing...
[Epoch 187] train avg loss 0.00242759, dev acc 0.8941, dev avg loss 0.285408, throughput 3.29629K wps
[Epoch 188 Batch 30/172] avg loss 0.00252798, throughput 3.03227K wps
[Epoch 188 Batch 60/172] avg loss 0.00222952, throughput 3.13588K wps
[Epoch 188 Batch 90/172] avg loss 0.00246211, throughput 3.35631K wps
[Epoch 188 Batch 120/172] avg loss 0.0025379, throughput 2.97874K wps
[Epoch 188 Batch 150/172] avg loss 0.00222779, throughput 3.88777K wps
Begin Testing...
[Epoch 188] train avg loss 0.0024282, dev acc 0.8931, dev avg loss 0.286344, throughput 3.29823K wps
[Epoch 189 Batch 30/172] avg loss 0.00235652, throughput 3.0706K wps
[Epoch 189 Batch 60/172] avg loss 0.00267806, throughput 3.02954K wps
[Epoch 189 Batch 90/172] avg loss 0.0023768, throughput 3.70717K wps
[Epoch 189 Batch 120/172] avg loss 0.00256359, throughput 3.31355K wps
[Epoch 189 Batch 150/172] avg loss 0.00247078, throughput 3.83043K wps
Begin Testing...
[Epoch 189] train avg loss 0.00246254, dev acc 0.8920, dev avg loss 0.284268, throughput 3.40058K wps
[Epoch 190 Batch 30/172] avg loss 0.00223818, throughput 3.22936K wps
[Epoch 190 Batch 60/172] avg loss 0.00226288, throughput 3.27219K wps
[Epoch 190 Batch 90/172] avg loss 0.00243406, throughput 3.11767K wps
[Epoch 190 Batch 120/172] avg loss 0.0021844, throughput 3.40072K wps
[Epoch 190 Batch 150/172] avg loss 0.00259907, throughput 3.11513K wps
Begin Testing...
[Epoch 190] train avg loss 0.00242466, dev acc 0.8941, dev avg loss 0.284901, throughput 3.22184K wps
[Epoch 191 Batch 30/172] avg loss 0.00230373, throughput 2.94363K wps
[Epoch 191 Batch 60/172] avg loss 0.00223094, throughput 3.26063K wps
[Epoch 191 Batch 90/172] avg loss 0.00275348, throughput 3.06669K wps
[Epoch 191 Batch 120/172] avg loss 0.00226104, throughput 3.3103K wps
[Epoch 191 Batch 150/172] avg loss 0.00225658, throughput 3.21287K wps
Begin Testing...
[Epoch 191] train avg loss 0.00239008, dev acc 0.8931, dev avg loss 0.28463, throughput 3.12457K wps
[Epoch 192 Batch 30/172] avg loss 0.00248118, throughput 3.12484K wps
[Epoch 192 Batch 60/172] avg loss 0.00206958, throughput 3.38714K wps
[Epoch 192 Batch 90/172] avg loss 0.00220192, throughput 3.03731K wps
[Epoch 192 Batch 120/172] avg loss 0.00263104, throughput 2.98863K wps
[Epoch 192 Batch 150/172] avg loss 0.00234211, throughput 3.27986K wps
Begin Testing...
[Epoch 192] train avg loss 0.00237972, dev acc 0.8941, dev avg loss 0.28372, throughput 3.15586K wps
[Epoch 193 Batch 30/172] avg loss 0.00206233, throughput 3.29811K wps
[Epoch 193 Batch 60/172] avg loss 0.00242768, throughput 3.39539K wps
[Epoch 193 Batch 90/172] avg loss 0.00268395, throughput 3.66632K wps
[Epoch 193 Batch 120/172] avg loss 0.00245871, throughput 3.40279K wps
[Epoch 193 Batch 150/172] avg loss 0.00229776, throughput 3.53583K wps
Begin Testing...
[Epoch 193] train avg loss 0.00233438, dev acc 0.8962, dev avg loss 0.285691, throughput 3.52961K wps
Observed Improvement.
Begin Testing...
[Epoch 194 Batch 30/172] avg loss 0.00236413, throughput 3.3521K wps
[Epoch 194 Batch 60/172] avg loss 0.00260091, throughput 3.62613K wps
[Epoch 194 Batch 90/172] avg loss 0.00207431, throughput 3.1509K wps
[Epoch 194 Batch 120/172] avg loss 0.00214959, throughput 3.01168K wps
[Epoch 194 Batch 150/172] avg loss 0.00250389, throughput 3.05701K wps
Begin Testing...
[Epoch 194] train avg loss 0.00236181, dev acc 0.8920, dev avg loss 0.284639, throughput 3.28548K wps
[Epoch 195 Batch 30/172] avg loss 0.00247083, throughput 3.24843K wps
[Epoch 195 Batch 60/172] avg loss 0.00245976, throughput 3.33652K wps
[Epoch 195 Batch 90/172] avg loss 0.0020646, throughput 3.62878K wps
[Epoch 195 Batch 120/172] avg loss 0.00221032, throughput 3.57559K wps
[Epoch 195 Batch 150/172] avg loss 0.00241011, throughput 3.05077K wps
Begin Testing...
[Epoch 195] train avg loss 0.00230389, dev acc 0.8962, dev avg loss 0.287509, throughput 3.34482K wps
Observed Improvement.
Begin Testing...
[Epoch 196 Batch 30/172] avg loss 0.00248652, throughput 3.58384K wps
[Epoch 196 Batch 60/172] avg loss 0.00219506, throughput 3.20866K wps
[Epoch 196 Batch 90/172] avg loss 0.0024206, throughput 3.39194K wps
[Epoch 196 Batch 120/172] avg loss 0.00239569, throughput 3.12231K wps
[Epoch 196 Batch 150/172] avg loss 0.00224583, throughput 3.02342K wps
Begin Testing...
[Epoch 196] train avg loss 0.00237805, dev acc 0.8920, dev avg loss 0.285135, throughput 3.21899K wps
[Epoch 197 Batch 30/172] avg loss 0.00235188, throughput 3.2063K wps
[Epoch 197 Batch 60/172] avg loss 0.00267772, throughput 3.05603K wps
[Epoch 197 Batch 90/172] avg loss 0.00235128, throughput 3.61579K wps
[Epoch 197 Batch 120/172] avg loss 0.00222862, throughput 3.41021K wps
[Epoch 197 Batch 150/172] avg loss 0.00238831, throughput 3.23304K wps
Begin Testing...
[Epoch 197] train avg loss 0.00237807, dev acc 0.8941, dev avg loss 0.284904, throughput 3.34303K wps
[Epoch 198 Batch 30/172] avg loss 0.00222304, throughput 3.04445K wps
[Epoch 198 Batch 60/172] avg loss 0.00211649, throughput 3.4547K wps
[Epoch 198 Batch 90/172] avg loss 0.00213125, throughput 2.97708K wps
[Epoch 198 Batch 120/172] avg loss 0.00237114, throughput 3.24595K wps
[Epoch 198 Batch 150/172] avg loss 0.00255637, throughput 3.5486K wps
Begin Testing...
[Epoch 198] train avg loss 0.0023204, dev acc 0.8952, dev avg loss 0.285244, throughput 3.23275K wps
[Epoch 199 Batch 30/172] avg loss 0.00245105, throughput 3.12498K wps
[Epoch 199 Batch 60/172] avg loss 0.00245788, throughput 3.18999K wps
[Epoch 199 Batch 90/172] avg loss 0.0021858, throughput 3.37972K wps
[Epoch 199 Batch 120/172] avg loss 0.00222385, throughput 3.41883K wps
[Epoch 199 Batch 150/172] avg loss 0.00219016, throughput 3.38529K wps
Begin Testing...
[Epoch 199] train avg loss 0.00231718, dev acc 0.8941, dev avg loss 0.285919, throughput 3.30118K wps
Test loss 0.341564, test acc 0.8849
Total time cost 393.44s
[Epoch 0 Batch 30/172] avg loss 0.0127982, throughput 2.78465K wps
[Epoch 0 Batch 60/172] avg loss 0.01229, throughput 2.9348K wps
[Epoch 0 Batch 90/172] avg loss 0.0124257, throughput 3.47807K wps
[Epoch 0 Batch 120/172] avg loss 0.0123104, throughput 3.36287K wps
[Epoch 0 Batch 150/172] avg loss 0.0122975, throughput 3.18255K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124122, dev acc 0.6771, dev avg loss 0.617801, throughput 3.10396K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0119565, throughput 3.04726K wps
[Epoch 1 Batch 60/172] avg loss 0.0120373, throughput 3.68889K wps
[Epoch 1 Batch 90/172] avg loss 0.0123518, throughput 4.15369K wps
[Epoch 1 Batch 120/172] avg loss 0.0120913, throughput 3.08442K wps
[Epoch 1 Batch 150/172] avg loss 0.0120308, throughput 3.70299K wps
Begin Testing...
[Epoch 1] train avg loss 0.0120763, dev acc 0.6771, dev avg loss 0.604518, throughput 3.49589K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0118358, throughput 3.32892K wps
[Epoch 2 Batch 60/172] avg loss 0.0118452, throughput 3.76431K wps
[Epoch 2 Batch 90/172] avg loss 0.0119833, throughput 3.38022K wps
[Epoch 2 Batch 120/172] avg loss 0.0118246, throughput 2.94091K wps
[Epoch 2 Batch 150/172] avg loss 0.0117414, throughput 3.64197K wps
Begin Testing...
[Epoch 2] train avg loss 0.0117974, dev acc 0.6771, dev avg loss 0.591004, throughput 3.43577K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.011726, throughput 3.49885K wps
[Epoch 3 Batch 60/172] avg loss 0.0114146, throughput 3.42523K wps
[Epoch 3 Batch 90/172] avg loss 0.0114951, throughput 3.49032K wps
[Epoch 3 Batch 120/172] avg loss 0.0115168, throughput 2.98526K wps
[Epoch 3 Batch 150/172] avg loss 0.0113317, throughput 2.97224K wps
Begin Testing...
[Epoch 3] train avg loss 0.0114842, dev acc 0.6771, dev avg loss 0.574553, throughput 3.33967K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0111187, throughput 3.26444K wps
[Epoch 4 Batch 60/172] avg loss 0.0115738, throughput 2.99966K wps
[Epoch 4 Batch 90/172] avg loss 0.0112798, throughput 3.4921K wps
[Epoch 4 Batch 120/172] avg loss 0.0108782, throughput 3.75763K wps
[Epoch 4 Batch 150/172] avg loss 0.0110157, throughput 3.1421K wps
Begin Testing...
[Epoch 4] train avg loss 0.0111255, dev acc 0.6813, dev avg loss 0.556844, throughput 3.2698K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0108732, throughput 3.16582K wps
[Epoch 5 Batch 60/172] avg loss 0.0106675, throughput 3.00606K wps
[Epoch 5 Batch 90/172] avg loss 0.0108672, throughput 3.47363K wps
[Epoch 5 Batch 120/172] avg loss 0.0104335, throughput 3.34745K wps
[Epoch 5 Batch 150/172] avg loss 0.0108129, throughput 3.77888K wps
Begin Testing...
[Epoch 5] train avg loss 0.0107475, dev acc 0.7138, dev avg loss 0.535051, throughput 3.3419K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0103151, throughput 3.13981K wps
[Epoch 6 Batch 60/172] avg loss 0.0103266, throughput 3.04226K wps
[Epoch 6 Batch 90/172] avg loss 0.0101016, throughput 2.97133K wps
[Epoch 6 Batch 120/172] avg loss 0.0104626, throughput 3.32751K wps
[Epoch 6 Batch 150/172] avg loss 0.0100681, throughput 3.38334K wps
Begin Testing...
[Epoch 6] train avg loss 0.0102777, dev acc 0.7589, dev avg loss 0.510968, throughput 3.24689K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0101584, throughput 3.20643K wps
[Epoch 7 Batch 60/172] avg loss 0.00985858, throughput 3.75392K wps
[Epoch 7 Batch 90/172] avg loss 0.00972202, throughput 3.50284K wps
[Epoch 7 Batch 120/172] avg loss 0.00997329, throughput 3.44332K wps
[Epoch 7 Batch 150/172] avg loss 0.00965308, throughput 3.32642K wps
Begin Testing...
[Epoch 7] train avg loss 0.00984861, dev acc 0.7642, dev avg loss 0.486076, throughput 3.48442K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00930513, throughput 3.06309K wps
[Epoch 8 Batch 60/172] avg loss 0.00944635, throughput 3.19551K wps
[Epoch 8 Batch 90/172] avg loss 0.00952802, throughput 3.56081K wps
[Epoch 8 Batch 120/172] avg loss 0.00931012, throughput 3.38447K wps
[Epoch 8 Batch 150/172] avg loss 0.00925462, throughput 3.78478K wps
Begin Testing...
[Epoch 8] train avg loss 0.00934376, dev acc 0.7883, dev avg loss 0.461294, throughput 3.35622K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00908691, throughput 3.64108K wps
[Epoch 9 Batch 60/172] avg loss 0.00908601, throughput 3.07746K wps
[Epoch 9 Batch 90/172] avg loss 0.00876498, throughput 3.48292K wps
[Epoch 9 Batch 120/172] avg loss 0.00898411, throughput 3.37758K wps
[Epoch 9 Batch 150/172] avg loss 0.00859089, throughput 3.26255K wps
Begin Testing...
[Epoch 9] train avg loss 0.00886546, dev acc 0.8071, dev avg loss 0.438025, throughput 3.37878K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00835651, throughput 3.02703K wps
[Epoch 10 Batch 60/172] avg loss 0.00872133, throughput 3.13929K wps
[Epoch 10 Batch 90/172] avg loss 0.00846815, throughput 3.73766K wps
[Epoch 10 Batch 120/172] avg loss 0.00846549, throughput 3.26847K wps
[Epoch 10 Batch 150/172] avg loss 0.00832669, throughput 3.32793K wps
Begin Testing...
[Epoch 10] train avg loss 0.00841942, dev acc 0.8218, dev avg loss 0.415605, throughput 3.28204K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00793525, throughput 3.2853K wps
[Epoch 11 Batch 60/172] avg loss 0.00820808, throughput 3.20521K wps
[Epoch 11 Batch 90/172] avg loss 0.0080386, throughput 3.94433K wps
[Epoch 11 Batch 120/172] avg loss 0.00796055, throughput 3.21547K wps
[Epoch 11 Batch 150/172] avg loss 0.00788415, throughput 3.44239K wps
Begin Testing...
[Epoch 11] train avg loss 0.00802932, dev acc 0.8470, dev avg loss 0.394025, throughput 3.3651K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00759623, throughput 2.97352K wps
[Epoch 12 Batch 60/172] avg loss 0.00758202, throughput 3.45238K wps
[Epoch 12 Batch 90/172] avg loss 0.00793183, throughput 3.26314K wps
[Epoch 12 Batch 120/172] avg loss 0.00767516, throughput 3.36551K wps
[Epoch 12 Batch 150/172] avg loss 0.00760781, throughput 2.98049K wps
Begin Testing...
[Epoch 12] train avg loss 0.00765905, dev acc 0.8491, dev avg loss 0.377887, throughput 3.17565K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.00725746, throughput 3.40061K wps
[Epoch 13 Batch 60/172] avg loss 0.00762335, throughput 3.14914K wps
[Epoch 13 Batch 90/172] avg loss 0.00750104, throughput 3.19361K wps
[Epoch 13 Batch 120/172] avg loss 0.00747123, throughput 3.24362K wps
[Epoch 13 Batch 150/172] avg loss 0.00713497, throughput 3.09195K wps
Begin Testing...
[Epoch 13] train avg loss 0.00736467, dev acc 0.8553, dev avg loss 0.363179, throughput 3.18582K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.0074028, throughput 2.89832K wps
[Epoch 14 Batch 60/172] avg loss 0.00730026, throughput 3.15706K wps
[Epoch 14 Batch 90/172] avg loss 0.00706134, throughput 3.5341K wps
[Epoch 14 Batch 120/172] avg loss 0.00727455, throughput 3.27343K wps
[Epoch 14 Batch 150/172] avg loss 0.00646804, throughput 3.33825K wps
Begin Testing...
[Epoch 14] train avg loss 0.00713862, dev acc 0.8753, dev avg loss 0.351141, throughput 3.25116K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00697023, throughput 3.37453K wps
[Epoch 15 Batch 60/172] avg loss 0.0068267, throughput 3.41661K wps
[Epoch 15 Batch 90/172] avg loss 0.00644966, throughput 3.30453K wps
[Epoch 15 Batch 120/172] avg loss 0.00724827, throughput 3.30594K wps
[Epoch 15 Batch 150/172] avg loss 0.00707762, throughput 3.53815K wps
Begin Testing...
[Epoch 15] train avg loss 0.00688733, dev acc 0.8742, dev avg loss 0.341051, throughput 3.42101K wps
[Epoch 16 Batch 30/172] avg loss 0.00648742, throughput 3.14819K wps
[Epoch 16 Batch 60/172] avg loss 0.00736078, throughput 3.2772K wps
[Epoch 16 Batch 90/172] avg loss 0.00657236, throughput 3.49287K wps
[Epoch 16 Batch 120/172] avg loss 0.00684128, throughput 3.42337K wps
[Epoch 16 Batch 150/172] avg loss 0.00685311, throughput 3.28975K wps
Begin Testing...
[Epoch 16] train avg loss 0.00676837, dev acc 0.8805, dev avg loss 0.333179, throughput 3.36936K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.00657836, throughput 3.50982K wps
[Epoch 17 Batch 60/172] avg loss 0.00613476, throughput 3.63736K wps
[Epoch 17 Batch 90/172] avg loss 0.00642265, throughput 2.90592K wps
[Epoch 17 Batch 120/172] avg loss 0.00672222, throughput 2.98929K wps
[Epoch 17 Batch 150/172] avg loss 0.00712704, throughput 3.72736K wps
Begin Testing...
[Epoch 17] train avg loss 0.0065956, dev acc 0.8836, dev avg loss 0.326493, throughput 3.34471K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00678602, throughput 3.53284K wps
[Epoch 18 Batch 60/172] avg loss 0.00630158, throughput 3.34801K wps
[Epoch 18 Batch 90/172] avg loss 0.0064707, throughput 3.18011K wps
[Epoch 18 Batch 120/172] avg loss 0.00601532, throughput 3.02687K wps
[Epoch 18 Batch 150/172] avg loss 0.00659838, throughput 3.17772K wps
Begin Testing...
[Epoch 18] train avg loss 0.00646796, dev acc 0.8857, dev avg loss 0.320802, throughput 3.24819K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00627247, throughput 3.59021K wps
[Epoch 19 Batch 60/172] avg loss 0.00607763, throughput 3.81152K wps
[Epoch 19 Batch 90/172] avg loss 0.00647683, throughput 3.74142K wps
[Epoch 19 Batch 120/172] avg loss 0.00646932, throughput 2.99458K wps
[Epoch 19 Batch 150/172] avg loss 0.00630457, throughput 2.96611K wps
Begin Testing...
[Epoch 19] train avg loss 0.00633189, dev acc 0.8857, dev avg loss 0.31618, throughput 3.41457K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.0065614, throughput 3.89443K wps
[Epoch 20 Batch 60/172] avg loss 0.00597925, throughput 3.51235K wps
[Epoch 20 Batch 90/172] avg loss 0.00612721, throughput 3.56807K wps
[Epoch 20 Batch 120/172] avg loss 0.00599726, throughput 3.28856K wps
[Epoch 20 Batch 150/172] avg loss 0.00648109, throughput 3.35098K wps
Begin Testing...
[Epoch 20] train avg loss 0.00622466, dev acc 0.8889, dev avg loss 0.31227, throughput 3.48293K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.00562228, throughput 3.02953K wps
[Epoch 21 Batch 60/172] avg loss 0.00609999, throughput 3.16553K wps
[Epoch 21 Batch 90/172] avg loss 0.00688194, throughput 3.20462K wps
[Epoch 21 Batch 120/172] avg loss 0.00629079, throughput 3.05826K wps
[Epoch 21 Batch 150/172] avg loss 0.00554651, throughput 3.17253K wps
Begin Testing...
[Epoch 21] train avg loss 0.00616149, dev acc 0.8899, dev avg loss 0.308605, throughput 3.14591K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00639972, throughput 3.01552K wps
[Epoch 22 Batch 60/172] avg loss 0.00592813, throughput 3.60322K wps
[Epoch 22 Batch 90/172] avg loss 0.00621364, throughput 3.21763K wps
[Epoch 22 Batch 120/172] avg loss 0.00603247, throughput 3.9898K wps
[Epoch 22 Batch 150/172] avg loss 0.00594659, throughput 3.61167K wps
Begin Testing...
[Epoch 22] train avg loss 0.00608913, dev acc 0.8920, dev avg loss 0.305753, throughput 3.44161K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.00587407, throughput 3.20209K wps
[Epoch 23 Batch 60/172] avg loss 0.0059014, throughput 3.46018K wps
[Epoch 23 Batch 90/172] avg loss 0.00613681, throughput 3.47136K wps
[Epoch 23 Batch 120/172] avg loss 0.00622571, throughput 3.00362K wps
[Epoch 23 Batch 150/172] avg loss 0.00603694, throughput 3.64637K wps
Begin Testing...
[Epoch 23] train avg loss 0.0059857, dev acc 0.8920, dev avg loss 0.304315, throughput 3.31721K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00570895, throughput 3.23601K wps
[Epoch 24 Batch 60/172] avg loss 0.00589211, throughput 3.67368K wps
[Epoch 24 Batch 90/172] avg loss 0.00600316, throughput 3.72772K wps
[Epoch 24 Batch 120/172] avg loss 0.00591926, throughput 3.15882K wps
[Epoch 24 Batch 150/172] avg loss 0.0060188, throughput 3.22944K wps
Begin Testing...
[Epoch 24] train avg loss 0.00586868, dev acc 0.8910, dev avg loss 0.301408, throughput 3.31687K wps
[Epoch 25 Batch 30/172] avg loss 0.0063269, throughput 3.43345K wps
[Epoch 25 Batch 60/172] avg loss 0.00591282, throughput 3.57502K wps
[Epoch 25 Batch 90/172] avg loss 0.00580811, throughput 3.21349K wps
[Epoch 25 Batch 120/172] avg loss 0.00577394, throughput 3.65581K wps
[Epoch 25 Batch 150/172] avg loss 0.00598485, throughput 3.64547K wps
Begin Testing...