Namespace(batch_size=50, data_name='MPQA', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 36
Done! Tokenizing Time=0.05s, #Sentences=10606
SentimentNet(
  (embedding): Embedding(6250 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/172] avg loss 0.0127233, throughput 0.568037K wps
[Epoch 0 Batch 60/172] avg loss 0.0121942, throughput 2.98018K wps
[Epoch 0 Batch 90/172] avg loss 0.0123264, throughput 2.95468K wps
[Epoch 0 Batch 120/172] avg loss 0.0124115, throughput 2.99725K wps
[Epoch 0 Batch 150/172] avg loss 0.0125816, throughput 3.46619K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124621, dev acc 0.7013, dev avg loss 0.596264, throughput 1.26098K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0121442, throughput 3.26167K wps
[Epoch 1 Batch 60/172] avg loss 0.012264, throughput 2.69865K wps
[Epoch 1 Batch 90/172] avg loss 0.0121966, throughput 3.11976K wps
[Epoch 1 Batch 120/172] avg loss 0.0118879, throughput 3.532K wps
[Epoch 1 Batch 150/172] avg loss 0.0119789, throughput 3.37953K wps
Begin Testing...
[Epoch 1] train avg loss 0.0121223, dev acc 0.7013, dev avg loss 0.584246, throughput 3.22058K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0120123, throughput 3.4221K wps
[Epoch 2 Batch 60/172] avg loss 0.0119558, throughput 3.24615K wps
[Epoch 2 Batch 90/172] avg loss 0.0116812, throughput 3.34826K wps
[Epoch 2 Batch 120/172] avg loss 0.0118478, throughput 3.29778K wps
[Epoch 2 Batch 150/172] avg loss 0.0117791, throughput 3.27964K wps
Begin Testing...
[Epoch 2] train avg loss 0.0118672, dev acc 0.7013, dev avg loss 0.572381, throughput 3.30119K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0114487, throughput 3.71262K wps
[Epoch 3 Batch 60/172] avg loss 0.0118789, throughput 2.97893K wps
[Epoch 3 Batch 90/172] avg loss 0.0114672, throughput 3.22584K wps
[Epoch 3 Batch 120/172] avg loss 0.0112549, throughput 3.04536K wps
[Epoch 3 Batch 150/172] avg loss 0.0117584, throughput 4.03722K wps
Begin Testing...
[Epoch 3] train avg loss 0.0115544, dev acc 0.7013, dev avg loss 0.558716, throughput 3.42615K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0114478, throughput 4.16634K wps
[Epoch 4 Batch 60/172] avg loss 0.011327, throughput 3.13915K wps
[Epoch 4 Batch 90/172] avg loss 0.0111751, throughput 3.45599K wps
[Epoch 4 Batch 120/172] avg loss 0.0109726, throughput 3.40918K wps
[Epoch 4 Batch 150/172] avg loss 0.0112832, throughput 3.06019K wps
Begin Testing...
[Epoch 4] train avg loss 0.0112317, dev acc 0.7044, dev avg loss 0.541566, throughput 3.3932K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0107178, throughput 3.03481K wps
[Epoch 5 Batch 60/172] avg loss 0.0110696, throughput 3.33332K wps
[Epoch 5 Batch 90/172] avg loss 0.0106704, throughput 3.70407K wps
[Epoch 5 Batch 120/172] avg loss 0.0108208, throughput 3.08342K wps
[Epoch 5 Batch 150/172] avg loss 0.0107542, throughput 3.15044K wps
Begin Testing...
[Epoch 5] train avg loss 0.010817, dev acc 0.7275, dev avg loss 0.523032, throughput 3.27562K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0108376, throughput 3.22603K wps
[Epoch 6 Batch 60/172] avg loss 0.0103796, throughput 3.03746K wps
[Epoch 6 Batch 90/172] avg loss 0.0104502, throughput 3.28056K wps
[Epoch 6 Batch 120/172] avg loss 0.0103761, throughput 3.83081K wps
[Epoch 6 Batch 150/172] avg loss 0.0102663, throughput 3.35017K wps
Begin Testing...
[Epoch 6] train avg loss 0.0104146, dev acc 0.7505, dev avg loss 0.501728, throughput 3.36225K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0101223, throughput 3.2793K wps
[Epoch 7 Batch 60/172] avg loss 0.0103292, throughput 2.95991K wps
[Epoch 7 Batch 90/172] avg loss 0.00996908, throughput 3.91696K wps
[Epoch 7 Batch 120/172] avg loss 0.010045, throughput 3.23376K wps
[Epoch 7 Batch 150/172] avg loss 0.0095717, throughput 3.23186K wps
Begin Testing...
[Epoch 7] train avg loss 0.00996753, dev acc 0.7778, dev avg loss 0.479924, throughput 3.36584K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00963984, throughput 3.28301K wps
[Epoch 8 Batch 60/172] avg loss 0.00939315, throughput 3.40667K wps
[Epoch 8 Batch 90/172] avg loss 0.00955535, throughput 3.0485K wps
[Epoch 8 Batch 120/172] avg loss 0.00934517, throughput 3.03151K wps
[Epoch 8 Batch 150/172] avg loss 0.00935906, throughput 3.14406K wps
Begin Testing...
[Epoch 8] train avg loss 0.00946479, dev acc 0.8082, dev avg loss 0.457284, throughput 3.23204K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00901025, throughput 3.32432K wps
[Epoch 9 Batch 60/172] avg loss 0.00913873, throughput 3.87368K wps
[Epoch 9 Batch 90/172] avg loss 0.00887666, throughput 3.24743K wps
[Epoch 9 Batch 120/172] avg loss 0.00885105, throughput 3.43844K wps
[Epoch 9 Batch 150/172] avg loss 0.00897269, throughput 3.24354K wps
Begin Testing...
[Epoch 9] train avg loss 0.00898854, dev acc 0.8281, dev avg loss 0.436572, throughput 3.3963K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00858826, throughput 3.79686K wps
[Epoch 10 Batch 60/172] avg loss 0.00855101, throughput 3.32147K wps
[Epoch 10 Batch 90/172] avg loss 0.0084636, throughput 3.32589K wps
[Epoch 10 Batch 120/172] avg loss 0.00856535, throughput 3.37608K wps
[Epoch 10 Batch 150/172] avg loss 0.00849448, throughput 3.75786K wps
Begin Testing...
[Epoch 10] train avg loss 0.00854714, dev acc 0.8407, dev avg loss 0.416918, throughput 3.46747K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00854495, throughput 3.6082K wps
[Epoch 11 Batch 60/172] avg loss 0.00825646, throughput 3.35882K wps
[Epoch 11 Batch 90/172] avg loss 0.00789342, throughput 3.08015K wps
[Epoch 11 Batch 120/172] avg loss 0.00827371, throughput 3.23117K wps
[Epoch 11 Batch 150/172] avg loss 0.00801883, throughput 3.15959K wps
Begin Testing...
[Epoch 11] train avg loss 0.00814192, dev acc 0.8428, dev avg loss 0.399477, throughput 3.29884K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00772041, throughput 2.91767K wps
[Epoch 12 Batch 60/172] avg loss 0.00782335, throughput 3.14053K wps
[Epoch 12 Batch 90/172] avg loss 0.00795845, throughput 3.18763K wps
[Epoch 12 Batch 120/172] avg loss 0.0076794, throughput 3.47449K wps
[Epoch 12 Batch 150/172] avg loss 0.00779173, throughput 3.07137K wps
Begin Testing...
[Epoch 12] train avg loss 0.00781458, dev acc 0.8532, dev avg loss 0.385584, throughput 3.14814K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.0073061, throughput 2.91638K wps
[Epoch 13 Batch 60/172] avg loss 0.00759952, throughput 3.15092K wps
[Epoch 13 Batch 90/172] avg loss 0.00752232, throughput 3.07213K wps
[Epoch 13 Batch 120/172] avg loss 0.00746355, throughput 3.1516K wps
[Epoch 13 Batch 150/172] avg loss 0.00745688, throughput 3.49559K wps
Begin Testing...
[Epoch 13] train avg loss 0.00749335, dev acc 0.8564, dev avg loss 0.372432, throughput 3.11167K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00725985, throughput 3.05221K wps
[Epoch 14 Batch 60/172] avg loss 0.00751878, throughput 2.86987K wps
[Epoch 14 Batch 90/172] avg loss 0.00732755, throughput 3.10717K wps
[Epoch 14 Batch 120/172] avg loss 0.00720006, throughput 2.97359K wps
[Epoch 14 Batch 150/172] avg loss 0.00719098, throughput 2.91866K wps
Begin Testing...
[Epoch 14] train avg loss 0.00728736, dev acc 0.8532, dev avg loss 0.361486, throughput 2.9818K wps
[Epoch 15 Batch 30/172] avg loss 0.00717271, throughput 3.14917K wps
[Epoch 15 Batch 60/172] avg loss 0.00708512, throughput 3.30843K wps
[Epoch 15 Batch 90/172] avg loss 0.00707693, throughput 3.10045K wps
[Epoch 15 Batch 120/172] avg loss 0.00705599, throughput 3.09707K wps
[Epoch 15 Batch 150/172] avg loss 0.00694422, throughput 3.22122K wps
Begin Testing...
[Epoch 15] train avg loss 0.00706927, dev acc 0.8616, dev avg loss 0.352945, throughput 3.14684K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.0073563, throughput 3.28204K wps
[Epoch 16 Batch 60/172] avg loss 0.00702837, throughput 3.33905K wps
[Epoch 16 Batch 90/172] avg loss 0.00677221, throughput 3.30064K wps
[Epoch 16 Batch 120/172] avg loss 0.00667035, throughput 3.20914K wps
[Epoch 16 Batch 150/172] avg loss 0.00669526, throughput 3.80875K wps
Begin Testing...
[Epoch 16] train avg loss 0.0069187, dev acc 0.8606, dev avg loss 0.346213, throughput 3.34159K wps
[Epoch 17 Batch 30/172] avg loss 0.00701887, throughput 3.49612K wps
[Epoch 17 Batch 60/172] avg loss 0.00664853, throughput 3.04102K wps
[Epoch 17 Batch 90/172] avg loss 0.00655937, throughput 3.15781K wps
[Epoch 17 Batch 120/172] avg loss 0.00673822, throughput 3.35695K wps
[Epoch 17 Batch 150/172] avg loss 0.00703093, throughput 3.74515K wps
Begin Testing...
[Epoch 17] train avg loss 0.00673067, dev acc 0.8658, dev avg loss 0.33979, throughput 3.35854K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00668021, throughput 3.57295K wps
[Epoch 18 Batch 60/172] avg loss 0.00647215, throughput 2.99785K wps
[Epoch 18 Batch 90/172] avg loss 0.00637071, throughput 3.55826K wps
[Epoch 18 Batch 120/172] avg loss 0.00660224, throughput 3.21775K wps
[Epoch 18 Batch 150/172] avg loss 0.00641824, throughput 3.01983K wps
Begin Testing...
[Epoch 18] train avg loss 0.00655544, dev acc 0.8690, dev avg loss 0.334831, throughput 3.32123K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00653397, throughput 3.37539K wps
[Epoch 19 Batch 60/172] avg loss 0.00623574, throughput 4.00235K wps
[Epoch 19 Batch 90/172] avg loss 0.00662951, throughput 3.14147K wps
[Epoch 19 Batch 120/172] avg loss 0.00641958, throughput 3.21954K wps
[Epoch 19 Batch 150/172] avg loss 0.00660851, throughput 3.06596K wps
Begin Testing...
[Epoch 19] train avg loss 0.00646775, dev acc 0.8690, dev avg loss 0.33043, throughput 3.28406K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00628485, throughput 2.9268K wps
[Epoch 20 Batch 60/172] avg loss 0.00624464, throughput 3.37136K wps
[Epoch 20 Batch 90/172] avg loss 0.00658069, throughput 3.16108K wps
[Epoch 20 Batch 120/172] avg loss 0.00647636, throughput 2.99797K wps
[Epoch 20 Batch 150/172] avg loss 0.00615344, throughput 2.94709K wps
Begin Testing...
[Epoch 20] train avg loss 0.00632612, dev acc 0.8669, dev avg loss 0.327034, throughput 3.07544K wps
[Epoch 21 Batch 30/172] avg loss 0.00590377, throughput 3.39111K wps
[Epoch 21 Batch 60/172] avg loss 0.00649392, throughput 3.23956K wps
[Epoch 21 Batch 90/172] avg loss 0.00659017, throughput 3.09683K wps
[Epoch 21 Batch 120/172] avg loss 0.00608674, throughput 3.06675K wps
[Epoch 21 Batch 150/172] avg loss 0.00628865, throughput 3.36194K wps
Begin Testing...
[Epoch 21] train avg loss 0.00630336, dev acc 0.8732, dev avg loss 0.324129, throughput 3.2359K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00583023, throughput 3.40699K wps
[Epoch 22 Batch 60/172] avg loss 0.00620989, throughput 3.17302K wps
[Epoch 22 Batch 90/172] avg loss 0.00630623, throughput 3.55451K wps
[Epoch 22 Batch 120/172] avg loss 0.00648819, throughput 3.22982K wps
[Epoch 22 Batch 150/172] avg loss 0.00643698, throughput 3.29356K wps
Begin Testing...
[Epoch 22] train avg loss 0.00624489, dev acc 0.8711, dev avg loss 0.320877, throughput 3.39095K wps
[Epoch 23 Batch 30/172] avg loss 0.00601718, throughput 3.28153K wps
[Epoch 23 Batch 60/172] avg loss 0.00616177, throughput 3.10622K wps
[Epoch 23 Batch 90/172] avg loss 0.00635835, throughput 2.97047K wps
[Epoch 23 Batch 120/172] avg loss 0.00616207, throughput 3.17878K wps
[Epoch 23 Batch 150/172] avg loss 0.0058397, throughput 3.63953K wps
Begin Testing...
[Epoch 23] train avg loss 0.00615918, dev acc 0.8763, dev avg loss 0.319247, throughput 3.23414K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00601478, throughput 3.14921K wps
[Epoch 24 Batch 60/172] avg loss 0.00611081, throughput 3.74394K wps
[Epoch 24 Batch 90/172] avg loss 0.00608245, throughput 3.33662K wps
[Epoch 24 Batch 120/172] avg loss 0.00578948, throughput 3.48627K wps
[Epoch 24 Batch 150/172] avg loss 0.0061752, throughput 2.90924K wps
Begin Testing...
[Epoch 24] train avg loss 0.0060603, dev acc 0.8711, dev avg loss 0.31629, throughput 3.24154K wps
[Epoch 25 Batch 30/172] avg loss 0.00613641, throughput 3.13829K wps
[Epoch 25 Batch 60/172] avg loss 0.00597926, throughput 3.12878K wps
[Epoch 25 Batch 90/172] avg loss 0.00594114, throughput 3.08405K wps
[Epoch 25 Batch 120/172] avg loss 0.00578393, throughput 3.41234K wps
[Epoch 25 Batch 150/172] avg loss 0.00638086, throughput 3.45158K wps
Begin Testing...
[Epoch 25] train avg loss 0.00595594, dev acc 0.8753, dev avg loss 0.314189, throughput 3.20689K wps
[Epoch 26 Batch 30/172] avg loss 0.00558838, throughput 3.20089K wps
[Epoch 26 Batch 60/172] avg loss 0.00618518, throughput 2.95103K wps
[Epoch 26 Batch 90/172] avg loss 0.00588484, throughput 3.56916K wps
[Epoch 26 Batch 120/172] avg loss 0.00589363, throughput 3.19684K wps
[Epoch 26 Batch 150/172] avg loss 0.00612226, throughput 3.49191K wps
Begin Testing...
[Epoch 26] train avg loss 0.0059725, dev acc 0.8774, dev avg loss 0.312784, throughput 3.31793K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/172] avg loss 0.00588888, throughput 2.93619K wps
[Epoch 27 Batch 60/172] avg loss 0.0061345, throughput 2.96873K wps
[Epoch 27 Batch 90/172] avg loss 0.00566765, throughput 3.16559K wps
[Epoch 27 Batch 120/172] avg loss 0.00562864, throughput 3.14564K wps
[Epoch 27 Batch 150/172] avg loss 0.00625058, throughput 3.32899K wps
Begin Testing...
[Epoch 27] train avg loss 0.00585134, dev acc 0.8805, dev avg loss 0.310932, throughput 3.12738K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/172] avg loss 0.00597622, throughput 3.16849K wps
[Epoch 28 Batch 60/172] avg loss 0.00554544, throughput 3.25166K wps
[Epoch 28 Batch 90/172] avg loss 0.0057941, throughput 3.51676K wps
[Epoch 28 Batch 120/172] avg loss 0.00598347, throughput 2.96205K wps
[Epoch 28 Batch 150/172] avg loss 0.00566894, throughput 3.06385K wps
Begin Testing...
[Epoch 28] train avg loss 0.00580601, dev acc 0.8795, dev avg loss 0.309405, throughput 3.21466K wps
[Epoch 29 Batch 30/172] avg loss 0.00570336, throughput 3.23856K wps
[Epoch 29 Batch 60/172] avg loss 0.00526966, throughput 3.54845K wps
[Epoch 29 Batch 90/172] avg loss 0.00631996, throughput 3.01974K wps
[Epoch 29 Batch 120/172] avg loss 0.00588606, throughput 3.32608K wps
[Epoch 29 Batch 150/172] avg loss 0.00575599, throughput 3.08267K wps
Begin Testing...
[Epoch 29] train avg loss 0.00577465, dev acc 0.8774, dev avg loss 0.309248, throughput 3.23398K wps
[Epoch 30 Batch 30/172] avg loss 0.00639365, throughput 3.08892K wps
[Epoch 30 Batch 60/172] avg loss 0.00511487, throughput 3.47182K wps
[Epoch 30 Batch 90/172] avg loss 0.005301, throughput 3.36181K wps
[Epoch 30 Batch 120/172] avg loss 0.00584884, throughput 3.20703K wps
[Epoch 30 Batch 150/172] avg loss 0.00586244, throughput 2.91843K wps
Begin Testing...
[Epoch 30] train avg loss 0.005688, dev acc 0.8774, dev avg loss 0.30662, throughput 3.1546K wps
[Epoch 31 Batch 30/172] avg loss 0.00589364, throughput 3.2569K wps
[Epoch 31 Batch 60/172] avg loss 0.00540048, throughput 3.22285K wps
[Epoch 31 Batch 90/172] avg loss 0.00580304, throughput 2.82822K wps
[Epoch 31 Batch 120/172] avg loss 0.00558404, throughput 2.87527K wps
[Epoch 31 Batch 150/172] avg loss 0.00554986, throughput 3.4867K wps
Begin Testing...
[Epoch 31] train avg loss 0.00574556, dev acc 0.8784, dev avg loss 0.306724, throughput 3.08224K wps
[Epoch 32 Batch 30/172] avg loss 0.00511018, throughput 3.13811K wps
[Epoch 32 Batch 60/172] avg loss 0.00568895, throughput 3.68366K wps
[Epoch 32 Batch 90/172] avg loss 0.00583557, throughput 4.04503K wps
[Epoch 32 Batch 120/172] avg loss 0.00575624, throughput 2.9532K wps
[Epoch 32 Batch 150/172] avg loss 0.00588778, throughput 3.31346K wps
Begin Testing...
[Epoch 32] train avg loss 0.00559614, dev acc 0.8774, dev avg loss 0.306169, throughput 3.43838K wps
[Epoch 33 Batch 30/172] avg loss 0.00527906, throughput 3.13354K wps
[Epoch 33 Batch 60/172] avg loss 0.00543822, throughput 2.9945K wps
[Epoch 33 Batch 90/172] avg loss 0.00558288, throughput 3.49092K wps
[Epoch 33 Batch 120/172] avg loss 0.00522591, throughput 3.16752K wps
[Epoch 33 Batch 150/172] avg loss 0.00596157, throughput 2.99157K wps
Begin Testing...
[Epoch 33] train avg loss 0.00553968, dev acc 0.8805, dev avg loss 0.303638, throughput 3.13875K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/172] avg loss 0.00582964, throughput 3.38732K wps
[Epoch 34 Batch 60/172] avg loss 0.00584805, throughput 3.47832K wps
[Epoch 34 Batch 90/172] avg loss 0.00538751, throughput 3.11784K wps
[Epoch 34 Batch 120/172] avg loss 0.00521102, throughput 3.33727K wps
[Epoch 34 Batch 150/172] avg loss 0.00556222, throughput 3.47843K wps
Begin Testing...
[Epoch 34] train avg loss 0.00550054, dev acc 0.8784, dev avg loss 0.303678, throughput 3.37165K wps
[Epoch 35 Batch 30/172] avg loss 0.00525401, throughput 3.09261K wps
[Epoch 35 Batch 60/172] avg loss 0.0053626, throughput 3.05743K wps
[Epoch 35 Batch 90/172] avg loss 0.00552438, throughput 3.28116K wps
[Epoch 35 Batch 120/172] avg loss 0.00592613, throughput 3.27532K wps
[Epoch 35 Batch 150/172] avg loss 0.00517675, throughput 3.20705K wps
Begin Testing...
[Epoch 35] train avg loss 0.0054771, dev acc 0.8826, dev avg loss 0.301958, throughput 3.15174K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/172] avg loss 0.00531488, throughput 3.35658K wps
[Epoch 36 Batch 60/172] avg loss 0.00538417, throughput 3.2213K wps
[Epoch 36 Batch 90/172] avg loss 0.0053467, throughput 3.20527K wps
[Epoch 36 Batch 120/172] avg loss 0.00557561, throughput 3.52871K wps
[Epoch 36 Batch 150/172] avg loss 0.00556207, throughput 2.98268K wps
Begin Testing...
[Epoch 36] train avg loss 0.00545444, dev acc 0.8826, dev avg loss 0.300951, throughput 3.26492K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/172] avg loss 0.00530603, throughput 3.14862K wps
[Epoch 37 Batch 60/172] avg loss 0.0052495, throughput 3.44951K wps
[Epoch 37 Batch 90/172] avg loss 0.00539426, throughput 2.97872K wps
[Epoch 37 Batch 120/172] avg loss 0.00537585, throughput 3.26534K wps
[Epoch 37 Batch 150/172] avg loss 0.00530836, throughput 3.3975K wps
Begin Testing...
[Epoch 37] train avg loss 0.00540317, dev acc 0.8826, dev avg loss 0.300212, throughput 3.26054K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/172] avg loss 0.00573187, throughput 3.42024K wps
[Epoch 38 Batch 60/172] avg loss 0.0051257, throughput 3.82002K wps
[Epoch 38 Batch 90/172] avg loss 0.00545522, throughput 3.01604K wps
[Epoch 38 Batch 120/172] avg loss 0.00514336, throughput 3.04805K wps
[Epoch 38 Batch 150/172] avg loss 0.00562819, throughput 3.44746K wps
Begin Testing...
[Epoch 38] train avg loss 0.00539816, dev acc 0.8847, dev avg loss 0.29965, throughput 3.32341K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.00549327, throughput 3.07794K wps
[Epoch 39 Batch 60/172] avg loss 0.00512159, throughput 2.89641K wps
[Epoch 39 Batch 90/172] avg loss 0.00558526, throughput 2.96524K wps
[Epoch 39 Batch 120/172] avg loss 0.00545365, throughput 3.23778K wps
[Epoch 39 Batch 150/172] avg loss 0.00551086, throughput 3.39843K wps
Begin Testing...
[Epoch 39] train avg loss 0.00540512, dev acc 0.8847, dev avg loss 0.299635, throughput 3.11265K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/172] avg loss 0.00577654, throughput 3.10277K wps
[Epoch 40 Batch 60/172] avg loss 0.00520822, throughput 3.07245K wps
[Epoch 40 Batch 90/172] avg loss 0.00527059, throughput 3.45958K wps
[Epoch 40 Batch 120/172] avg loss 0.0053301, throughput 3.66469K wps
[Epoch 40 Batch 150/172] avg loss 0.00468038, throughput 3.24265K wps
Begin Testing...
[Epoch 40] train avg loss 0.00529221, dev acc 0.8816, dev avg loss 0.298998, throughput 3.24453K wps
[Epoch 41 Batch 30/172] avg loss 0.00554737, throughput 3.2319K wps
[Epoch 41 Batch 60/172] avg loss 0.00516385, throughput 3.04078K wps
[Epoch 41 Batch 90/172] avg loss 0.0053642, throughput 3.05962K wps
[Epoch 41 Batch 120/172] avg loss 0.00534818, throughput 3.10069K wps
[Epoch 41 Batch 150/172] avg loss 0.0055899, throughput 3.78193K wps
Begin Testing...
[Epoch 41] train avg loss 0.00531936, dev acc 0.8868, dev avg loss 0.297648, throughput 3.23141K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.004872, throughput 2.95751K wps
[Epoch 42 Batch 60/172] avg loss 0.00528704, throughput 2.92965K wps
[Epoch 42 Batch 90/172] avg loss 0.00550098, throughput 3.20139K wps
[Epoch 42 Batch 120/172] avg loss 0.00525231, throughput 3.51374K wps
[Epoch 42 Batch 150/172] avg loss 0.00525616, throughput 3.20584K wps
Begin Testing...
[Epoch 42] train avg loss 0.00523914, dev acc 0.8857, dev avg loss 0.297303, throughput 3.17303K wps
[Epoch 43 Batch 30/172] avg loss 0.00516612, throughput 3.1028K wps
[Epoch 43 Batch 60/172] avg loss 0.00515936, throughput 3.25575K wps
[Epoch 43 Batch 90/172] avg loss 0.0050249, throughput 3.46283K wps
[Epoch 43 Batch 120/172] avg loss 0.00502948, throughput 3.54589K wps
[Epoch 43 Batch 150/172] avg loss 0.00544026, throughput 2.87783K wps
Begin Testing...
[Epoch 43] train avg loss 0.00522373, dev acc 0.8857, dev avg loss 0.296956, throughput 3.24518K wps
[Epoch 44 Batch 30/172] avg loss 0.00535413, throughput 3.58512K wps
[Epoch 44 Batch 60/172] avg loss 0.0053574, throughput 3.42889K wps
[Epoch 44 Batch 90/172] avg loss 0.00519385, throughput 3.49666K wps
[Epoch 44 Batch 120/172] avg loss 0.00518586, throughput 3.14486K wps
[Epoch 44 Batch 150/172] avg loss 0.00509441, throughput 3.55502K wps
Begin Testing...
[Epoch 44] train avg loss 0.00521169, dev acc 0.8868, dev avg loss 0.296235, throughput 3.35357K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/172] avg loss 0.00526172, throughput 3.15473K wps
[Epoch 45 Batch 60/172] avg loss 0.00467022, throughput 3.43089K wps
[Epoch 45 Batch 90/172] avg loss 0.00565718, throughput 3.64352K wps
[Epoch 45 Batch 120/172] avg loss 0.00498134, throughput 2.95058K wps
[Epoch 45 Batch 150/172] avg loss 0.00494585, throughput 3.29887K wps
Begin Testing...
[Epoch 45] train avg loss 0.00518641, dev acc 0.8868, dev avg loss 0.295661, throughput 3.24048K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/172] avg loss 0.00503835, throughput 3.65936K wps
[Epoch 46 Batch 60/172] avg loss 0.00531913, throughput 2.99864K wps
[Epoch 46 Batch 90/172] avg loss 0.0049571, throughput 3.21921K wps
[Epoch 46 Batch 120/172] avg loss 0.00554236, throughput 2.87964K wps
[Epoch 46 Batch 150/172] avg loss 0.00511143, throughput 3.04718K wps
Begin Testing...
[Epoch 46] train avg loss 0.00512866, dev acc 0.8868, dev avg loss 0.29544, throughput 3.13827K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/172] avg loss 0.00495965, throughput 3.29231K wps
[Epoch 47 Batch 60/172] avg loss 0.00525356, throughput 3.02094K wps
[Epoch 47 Batch 90/172] avg loss 0.00476325, throughput 3.1279K wps
[Epoch 47 Batch 120/172] avg loss 0.00516842, throughput 2.97751K wps
[Epoch 47 Batch 150/172] avg loss 0.00520985, throughput 2.98171K wps
Begin Testing...
[Epoch 47] train avg loss 0.00507658, dev acc 0.8889, dev avg loss 0.295646, throughput 3.08209K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/172] avg loss 0.00514, throughput 3.24551K wps
[Epoch 48 Batch 60/172] avg loss 0.00493684, throughput 3.35148K wps
[Epoch 48 Batch 90/172] avg loss 0.00551604, throughput 3.13105K wps
[Epoch 48 Batch 120/172] avg loss 0.00512679, throughput 2.96864K wps
[Epoch 48 Batch 150/172] avg loss 0.00525897, throughput 3.02643K wps
Begin Testing...
[Epoch 48] train avg loss 0.00507714, dev acc 0.8889, dev avg loss 0.294503, throughput 3.13643K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/172] avg loss 0.00517628, throughput 3.06589K wps
[Epoch 49 Batch 60/172] avg loss 0.00508764, throughput 3.08049K wps
[Epoch 49 Batch 90/172] avg loss 0.00494481, throughput 3.30501K wps
[Epoch 49 Batch 120/172] avg loss 0.00520737, throughput 3.33165K wps
[Epoch 49 Batch 150/172] avg loss 0.00502998, throughput 3.24977K wps
Begin Testing...
[Epoch 49] train avg loss 0.00506518, dev acc 0.8910, dev avg loss 0.293874, throughput 3.24475K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/172] avg loss 0.00494742, throughput 3.08114K wps
[Epoch 50 Batch 60/172] avg loss 0.00475817, throughput 3.12007K wps
[Epoch 50 Batch 90/172] avg loss 0.00480093, throughput 3.31216K wps
[Epoch 50 Batch 120/172] avg loss 0.00533553, throughput 3.47328K wps
[Epoch 50 Batch 150/172] avg loss 0.00486297, throughput 3.05581K wps
Begin Testing...
[Epoch 50] train avg loss 0.00496461, dev acc 0.8899, dev avg loss 0.29404, throughput 3.27918K wps
[Epoch 51 Batch 30/172] avg loss 0.00470829, throughput 3.21309K wps
[Epoch 51 Batch 60/172] avg loss 0.00503592, throughput 3.18584K wps
[Epoch 51 Batch 90/172] avg loss 0.00516877, throughput 3.41846K wps
[Epoch 51 Batch 120/172] avg loss 0.00467271, throughput 3.54006K wps
[Epoch 51 Batch 150/172] avg loss 0.00499618, throughput 2.97423K wps
Begin Testing...
[Epoch 51] train avg loss 0.00496639, dev acc 0.8910, dev avg loss 0.293587, throughput 3.21372K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/172] avg loss 0.00484746, throughput 2.93354K wps
[Epoch 52 Batch 60/172] avg loss 0.00507291, throughput 3.21386K wps
[Epoch 52 Batch 90/172] avg loss 0.00494545, throughput 3.04767K wps
[Epoch 52 Batch 120/172] avg loss 0.00465334, throughput 3.78097K wps
[Epoch 52 Batch 150/172] avg loss 0.00518025, throughput 3.6499K wps
Begin Testing...
[Epoch 52] train avg loss 0.00492187, dev acc 0.8910, dev avg loss 0.293009, throughput 3.32365K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/172] avg loss 0.00509023, throughput 3.23183K wps
[Epoch 53 Batch 60/172] avg loss 0.00483509, throughput 2.8955K wps
[Epoch 53 Batch 90/172] avg loss 0.00480007, throughput 3.28125K wps
[Epoch 53 Batch 120/172] avg loss 0.0049985, throughput 2.86994K wps
[Epoch 53 Batch 150/172] avg loss 0.00478899, throughput 3.07782K wps
Begin Testing...
[Epoch 53] train avg loss 0.00490441, dev acc 0.8910, dev avg loss 0.29256, throughput 3.04124K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/172] avg loss 0.00517714, throughput 3.20786K wps
[Epoch 54 Batch 60/172] avg loss 0.00515157, throughput 3.27236K wps
[Epoch 54 Batch 90/172] avg loss 0.00517511, throughput 3.23121K wps
[Epoch 54 Batch 120/172] avg loss 0.00457853, throughput 4.00046K wps
[Epoch 54 Batch 150/172] avg loss 0.00454756, throughput 3.04633K wps
Begin Testing...
[Epoch 54] train avg loss 0.00489287, dev acc 0.8910, dev avg loss 0.292693, throughput 3.32072K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00439213, throughput 3.31615K wps
[Epoch 55 Batch 60/172] avg loss 0.00512817, throughput 3.07884K wps
[Epoch 55 Batch 90/172] avg loss 0.00527859, throughput 3.35551K wps
[Epoch 55 Batch 120/172] avg loss 0.00479906, throughput 3.27775K wps
[Epoch 55 Batch 150/172] avg loss 0.00476784, throughput 3.29372K wps
Begin Testing...
[Epoch 55] train avg loss 0.00489331, dev acc 0.8910, dev avg loss 0.291785, throughput 3.31206K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/172] avg loss 0.00542508, throughput 3.06577K wps
[Epoch 56 Batch 60/172] avg loss 0.00495694, throughput 2.9792K wps
[Epoch 56 Batch 90/172] avg loss 0.00434699, throughput 3.34748K wps
[Epoch 56 Batch 120/172] avg loss 0.00487334, throughput 3.2014K wps
[Epoch 56 Batch 150/172] avg loss 0.00481519, throughput 3.02683K wps
Begin Testing...
[Epoch 56] train avg loss 0.0048543, dev acc 0.8910, dev avg loss 0.291747, throughput 3.1554K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/172] avg loss 0.00469274, throughput 3.2535K wps
[Epoch 57 Batch 60/172] avg loss 0.00476396, throughput 3.62452K wps
[Epoch 57 Batch 90/172] avg loss 0.00489662, throughput 2.95566K wps
[Epoch 57 Batch 120/172] avg loss 0.00452628, throughput 2.98472K wps
[Epoch 57 Batch 150/172] avg loss 0.00491915, throughput 3.56336K wps
Begin Testing...
[Epoch 57] train avg loss 0.00476486, dev acc 0.8899, dev avg loss 0.291172, throughput 3.22231K wps
[Epoch 58 Batch 30/172] avg loss 0.00464837, throughput 3.15644K wps
[Epoch 58 Batch 60/172] avg loss 0.00486665, throughput 2.89863K wps
[Epoch 58 Batch 90/172] avg loss 0.00473555, throughput 3.24299K wps
[Epoch 58 Batch 120/172] avg loss 0.00464807, throughput 3.36522K wps
[Epoch 58 Batch 150/172] avg loss 0.00466565, throughput 3.44119K wps
Begin Testing...
[Epoch 58] train avg loss 0.00479479, dev acc 0.8952, dev avg loss 0.290878, throughput 3.24017K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/172] avg loss 0.00502225, throughput 3.21676K wps
[Epoch 59 Batch 60/172] avg loss 0.00460453, throughput 3.14082K wps
[Epoch 59 Batch 90/172] avg loss 0.00458289, throughput 3.061K wps
[Epoch 59 Batch 120/172] avg loss 0.00476421, throughput 3.26185K wps
[Epoch 59 Batch 150/172] avg loss 0.00481015, throughput 3.08927K wps
Begin Testing...
[Epoch 59] train avg loss 0.00473086, dev acc 0.8920, dev avg loss 0.290838, throughput 3.19663K wps
[Epoch 60 Batch 30/172] avg loss 0.00492577, throughput 3.52464K wps
[Epoch 60 Batch 60/172] avg loss 0.00467811, throughput 3.40876K wps
[Epoch 60 Batch 90/172] avg loss 0.00475258, throughput 3.63952K wps
[Epoch 60 Batch 120/172] avg loss 0.00426243, throughput 3.57549K wps
[Epoch 60 Batch 150/172] avg loss 0.00491706, throughput 3.11457K wps
Begin Testing...
[Epoch 60] train avg loss 0.00474271, dev acc 0.8920, dev avg loss 0.290655, throughput 3.47814K wps
[Epoch 61 Batch 30/172] avg loss 0.00465941, throughput 3.04142K wps
[Epoch 61 Batch 60/172] avg loss 0.00456225, throughput 2.9139K wps
[Epoch 61 Batch 90/172] avg loss 0.00472556, throughput 3.42994K wps
[Epoch 61 Batch 120/172] avg loss 0.00480213, throughput 3.25182K wps
[Epoch 61 Batch 150/172] avg loss 0.00491107, throughput 3.12986K wps
Begin Testing...
[Epoch 61] train avg loss 0.00471998, dev acc 0.8920, dev avg loss 0.291079, throughput 3.18471K wps
[Epoch 62 Batch 30/172] avg loss 0.00475273, throughput 2.96109K wps
[Epoch 62 Batch 60/172] avg loss 0.00454693, throughput 3.17198K wps
[Epoch 62 Batch 90/172] avg loss 0.00505831, throughput 3.38328K wps
[Epoch 62 Batch 120/172] avg loss 0.00446228, throughput 3.59412K wps
[Epoch 62 Batch 150/172] avg loss 0.00499186, throughput 3.55837K wps
Begin Testing...
[Epoch 62] train avg loss 0.00473727, dev acc 0.8920, dev avg loss 0.290198, throughput 3.30777K wps
[Epoch 63 Batch 30/172] avg loss 0.00444945, throughput 3.53873K wps
[Epoch 63 Batch 60/172] avg loss 0.00516302, throughput 3.69598K wps
[Epoch 63 Batch 90/172] avg loss 0.00446296, throughput 3.27949K wps
[Epoch 63 Batch 120/172] avg loss 0.0048082, throughput 3.40425K wps
[Epoch 63 Batch 150/172] avg loss 0.00470815, throughput 3.30544K wps
Begin Testing...
[Epoch 63] train avg loss 0.0046694, dev acc 0.8941, dev avg loss 0.289753, throughput 3.3596K wps
[Epoch 64 Batch 30/172] avg loss 0.00498284, throughput 3.10942K wps
[Epoch 64 Batch 60/172] avg loss 0.00449775, throughput 3.4226K wps
[Epoch 64 Batch 90/172] avg loss 0.00453805, throughput 2.97246K wps
[Epoch 64 Batch 120/172] avg loss 0.00445673, throughput 3.26335K wps
[Epoch 64 Batch 150/172] avg loss 0.0046603, throughput 3.1816K wps
Begin Testing...
[Epoch 64] train avg loss 0.0046254, dev acc 0.8920, dev avg loss 0.290628, throughput 3.23306K wps
[Epoch 65 Batch 30/172] avg loss 0.00436757, throughput 3.18026K wps
[Epoch 65 Batch 60/172] avg loss 0.00490062, throughput 3.1408K wps
[Epoch 65 Batch 90/172] avg loss 0.0049002, throughput 3.29139K wps
[Epoch 65 Batch 120/172] avg loss 0.0044319, throughput 2.98287K wps
[Epoch 65 Batch 150/172] avg loss 0.00441191, throughput 2.94407K wps
Begin Testing...
[Epoch 65] train avg loss 0.00464899, dev acc 0.8931, dev avg loss 0.290854, throughput 3.09185K wps
[Epoch 66 Batch 30/172] avg loss 0.00491189, throughput 3.46765K wps
[Epoch 66 Batch 60/172] avg loss 0.00440344, throughput 3.12516K wps
[Epoch 66 Batch 90/172] avg loss 0.00410055, throughput 3.79524K wps
[Epoch 66 Batch 120/172] avg loss 0.00475601, throughput 3.08849K wps
[Epoch 66 Batch 150/172] avg loss 0.00447231, throughput 3.29487K wps
Begin Testing...
[Epoch 66] train avg loss 0.00455857, dev acc 0.8920, dev avg loss 0.289625, throughput 3.38715K wps
[Epoch 67 Batch 30/172] avg loss 0.0044584, throughput 3.42793K wps
[Epoch 67 Batch 60/172] avg loss 0.00481578, throughput 3.14264K wps
[Epoch 67 Batch 90/172] avg loss 0.00466946, throughput 3.06517K wps
[Epoch 67 Batch 120/172] avg loss 0.0046749, throughput 3.01872K wps
[Epoch 67 Batch 150/172] avg loss 0.00426306, throughput 3.203K wps
Begin Testing...
[Epoch 67] train avg loss 0.00456029, dev acc 0.8931, dev avg loss 0.289565, throughput 3.12777K wps
[Epoch 68 Batch 30/172] avg loss 0.00481366, throughput 3.08636K wps
[Epoch 68 Batch 60/172] avg loss 0.00417077, throughput 3.66319K wps
[Epoch 68 Batch 90/172] avg loss 0.00452732, throughput 3.18965K wps
[Epoch 68 Batch 120/172] avg loss 0.00461212, throughput 3.26057K wps
[Epoch 68 Batch 150/172] avg loss 0.00451984, throughput 3.01051K wps
Begin Testing...
[Epoch 68] train avg loss 0.00454344, dev acc 0.8931, dev avg loss 0.290251, throughput 3.2355K wps
[Epoch 69 Batch 30/172] avg loss 0.00463247, throughput 3.52891K wps
[Epoch 69 Batch 60/172] avg loss 0.00473696, throughput 2.92278K wps
[Epoch 69 Batch 90/172] avg loss 0.00444612, throughput 3.2718K wps
[Epoch 69 Batch 120/172] avg loss 0.00430056, throughput 3.5018K wps
[Epoch 69 Batch 150/172] avg loss 0.00422796, throughput 3.00383K wps
Begin Testing...
[Epoch 69] train avg loss 0.00449458, dev acc 0.8931, dev avg loss 0.289535, throughput 3.26599K wps
[Epoch 70 Batch 30/172] avg loss 0.00477549, throughput 3.14251K wps
[Epoch 70 Batch 60/172] avg loss 0.00446934, throughput 3.17667K wps
[Epoch 70 Batch 90/172] avg loss 0.00432902, throughput 3.22294K wps
[Epoch 70 Batch 120/172] avg loss 0.00443986, throughput 2.9685K wps
[Epoch 70 Batch 150/172] avg loss 0.00468949, throughput 3.57108K wps
Begin Testing...
[Epoch 70] train avg loss 0.00453949, dev acc 0.8941, dev avg loss 0.28906, throughput 3.23889K wps
[Epoch 71 Batch 30/172] avg loss 0.00419962, throughput 3.11957K wps
[Epoch 71 Batch 60/172] avg loss 0.00437959, throughput 3.58624K wps
[Epoch 71 Batch 90/172] avg loss 0.00434645, throughput 3.13405K wps
[Epoch 71 Batch 120/172] avg loss 0.0050608, throughput 4.00807K wps
[Epoch 71 Batch 150/172] avg loss 0.00439632, throughput 3.73364K wps
Begin Testing...
[Epoch 71] train avg loss 0.00452136, dev acc 0.8910, dev avg loss 0.288529, throughput 3.45298K wps
[Epoch 72 Batch 30/172] avg loss 0.00416834, throughput 2.90564K wps
[Epoch 72 Batch 60/172] avg loss 0.00464691, throughput 3.15045K wps
[Epoch 72 Batch 90/172] avg loss 0.0045682, throughput 2.91078K wps
[Epoch 72 Batch 120/172] avg loss 0.00434568, throughput 3.14972K wps
[Epoch 72 Batch 150/172] avg loss 0.00458366, throughput 4.12707K wps
Begin Testing...
[Epoch 72] train avg loss 0.00441303, dev acc 0.8941, dev avg loss 0.289258, throughput 3.22225K wps
[Epoch 73 Batch 30/172] avg loss 0.00431959, throughput 3.24994K wps
[Epoch 73 Batch 60/172] avg loss 0.00466357, throughput 3.2344K wps
[Epoch 73 Batch 90/172] avg loss 0.00439424, throughput 3.09936K wps
[Epoch 73 Batch 120/172] avg loss 0.00433194, throughput 3.5559K wps
[Epoch 73 Batch 150/172] avg loss 0.00424773, throughput 2.91119K wps
Begin Testing...
[Epoch 73] train avg loss 0.00442798, dev acc 0.8931, dev avg loss 0.288607, throughput 3.21707K wps
[Epoch 74 Batch 30/172] avg loss 0.00466745, throughput 3.32035K wps
[Epoch 74 Batch 60/172] avg loss 0.00473098, throughput 3.20546K wps
[Epoch 74 Batch 90/172] avg loss 0.00422228, throughput 3.11554K wps
[Epoch 74 Batch 120/172] avg loss 0.00440042, throughput 3.17228K wps
[Epoch 74 Batch 150/172] avg loss 0.00429774, throughput 3.38104K wps
Begin Testing...
[Epoch 74] train avg loss 0.00443787, dev acc 0.8920, dev avg loss 0.290401, throughput 3.33477K wps
[Epoch 75 Batch 30/172] avg loss 0.00421157, throughput 3.24751K wps
[Epoch 75 Batch 60/172] avg loss 0.00413942, throughput 3.38262K wps
[Epoch 75 Batch 90/172] avg loss 0.00462275, throughput 3.80819K wps
[Epoch 75 Batch 120/172] avg loss 0.00474709, throughput 4.12804K wps
[Epoch 75 Batch 150/172] avg loss 0.00425725, throughput 3.45016K wps
Begin Testing...
[Epoch 75] train avg loss 0.00435231, dev acc 0.8931, dev avg loss 0.288779, throughput 3.54438K wps
[Epoch 76 Batch 30/172] avg loss 0.00447376, throughput 3.47886K wps
[Epoch 76 Batch 60/172] avg loss 0.00396477, throughput 3.5373K wps
[Epoch 76 Batch 90/172] avg loss 0.00488424, throughput 3.16726K wps
[Epoch 76 Batch 120/172] avg loss 0.00443685, throughput 3.74525K wps
[Epoch 76 Batch 150/172] avg loss 0.00429369, throughput 3.2395K wps
Begin Testing...
[Epoch 76] train avg loss 0.00434737, dev acc 0.8920, dev avg loss 0.288468, throughput 3.40167K wps
[Epoch 77 Batch 30/172] avg loss 0.00424375, throughput 3.30537K wps
[Epoch 77 Batch 60/172] avg loss 0.00434253, throughput 3.10215K wps
[Epoch 77 Batch 90/172] avg loss 0.00446783, throughput 3.0596K wps
[Epoch 77 Batch 120/172] avg loss 0.00421779, throughput 2.97094K wps
[Epoch 77 Batch 150/172] avg loss 0.00411392, throughput 3.04692K wps
Begin Testing...
[Epoch 77] train avg loss 0.00433333, dev acc 0.8952, dev avg loss 0.288326, throughput 3.14077K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/172] avg loss 0.00414387, throughput 2.90797K wps
[Epoch 78 Batch 60/172] avg loss 0.00461166, throughput 3.06603K wps
[Epoch 78 Batch 90/172] avg loss 0.0037979, throughput 3.2315K wps
[Epoch 78 Batch 120/172] avg loss 0.00450695, throughput 3.05258K wps
[Epoch 78 Batch 150/172] avg loss 0.00467417, throughput 3.157K wps
Begin Testing...
[Epoch 78] train avg loss 0.00428996, dev acc 0.8931, dev avg loss 0.288615, throughput 3.09958K wps
[Epoch 79 Batch 30/172] avg loss 0.00431482, throughput 2.94395K wps
[Epoch 79 Batch 60/172] avg loss 0.00457198, throughput 3.01226K wps
[Epoch 79 Batch 90/172] avg loss 0.00430128, throughput 2.94367K wps
[Epoch 79 Batch 120/172] avg loss 0.00436329, throughput 3.49268K wps
[Epoch 79 Batch 150/172] avg loss 0.00406549, throughput 3.43147K wps
Begin Testing...
[Epoch 79] train avg loss 0.00431558, dev acc 0.8931, dev avg loss 0.288707, throughput 3.14521K wps
[Epoch 80 Batch 30/172] avg loss 0.00405893, throughput 3.21048K wps
[Epoch 80 Batch 60/172] avg loss 0.00441195, throughput 3.09887K wps
[Epoch 80 Batch 90/172] avg loss 0.0041375, throughput 3.63138K wps
[Epoch 80 Batch 120/172] avg loss 0.00443323, throughput 3.73838K wps
[Epoch 80 Batch 150/172] avg loss 0.00421422, throughput 3.43341K wps
Begin Testing...
[Epoch 80] train avg loss 0.0042667, dev acc 0.8920, dev avg loss 0.288482, throughput 3.41657K wps
[Epoch 81 Batch 30/172] avg loss 0.00403282, throughput 2.91481K wps
[Epoch 81 Batch 60/172] avg loss 0.0043716, throughput 3.10163K wps
[Epoch 81 Batch 90/172] avg loss 0.00436788, throughput 3.67712K wps
[Epoch 81 Batch 120/172] avg loss 0.00399947, throughput 3.05896K wps
[Epoch 81 Batch 150/172] avg loss 0.00460142, throughput 3.8275K wps
Begin Testing...
[Epoch 81] train avg loss 0.00429336, dev acc 0.8952, dev avg loss 0.288049, throughput 3.23988K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/172] avg loss 0.00428541, throughput 3.28753K wps
[Epoch 82 Batch 60/172] avg loss 0.00462358, throughput 2.86829K wps
[Epoch 82 Batch 90/172] avg loss 0.00382982, throughput 3.06504K wps
[Epoch 82 Batch 120/172] avg loss 0.00417744, throughput 3.65696K wps
[Epoch 82 Batch 150/172] avg loss 0.00432791, throughput 3.75582K wps
Begin Testing...
[Epoch 82] train avg loss 0.00419879, dev acc 0.8973, dev avg loss 0.288152, throughput 3.27802K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/172] avg loss 0.00400673, throughput 2.9557K wps
[Epoch 83 Batch 60/172] avg loss 0.00402086, throughput 3.93179K wps
[Epoch 83 Batch 90/172] avg loss 0.00384908, throughput 3.24148K wps
[Epoch 83 Batch 120/172] avg loss 0.00441908, throughput 3.1995K wps
[Epoch 83 Batch 150/172] avg loss 0.00449433, throughput 3.70562K wps
Begin Testing...
[Epoch 83] train avg loss 0.00415648, dev acc 0.8952, dev avg loss 0.288176, throughput 3.35512K wps
[Epoch 84 Batch 30/172] avg loss 0.00411717, throughput 3.2803K wps
[Epoch 84 Batch 60/172] avg loss 0.00411664, throughput 3.34154K wps
[Epoch 84 Batch 90/172] avg loss 0.00386669, throughput 2.98958K wps
[Epoch 84 Batch 120/172] avg loss 0.00402667, throughput 3.16976K wps
[Epoch 84 Batch 150/172] avg loss 0.00422101, throughput 2.99731K wps
Begin Testing...
[Epoch 84] train avg loss 0.00412546, dev acc 0.8962, dev avg loss 0.288258, throughput 3.13195K wps
[Epoch 85 Batch 30/172] avg loss 0.00411886, throughput 3.40285K wps
[Epoch 85 Batch 60/172] avg loss 0.00408344, throughput 3.68083K wps
[Epoch 85 Batch 90/172] avg loss 0.0039377, throughput 3.3341K wps
[Epoch 85 Batch 120/172] avg loss 0.00491194, throughput 2.99192K wps
[Epoch 85 Batch 150/172] avg loss 0.0037347, throughput 3.66302K wps
Begin Testing...
[Epoch 85] train avg loss 0.00418198, dev acc 0.8952, dev avg loss 0.288093, throughput 3.43861K wps
[Epoch 86 Batch 30/172] avg loss 0.00397635, throughput 3.27426K wps
[Epoch 86 Batch 60/172] avg loss 0.00393046, throughput 3.45287K wps
[Epoch 86 Batch 90/172] avg loss 0.00414686, throughput 3.47978K wps
[Epoch 86 Batch 120/172] avg loss 0.0043437, throughput 3.67024K wps
[Epoch 86 Batch 150/172] avg loss 0.00440783, throughput 2.95035K wps
Begin Testing...
[Epoch 86] train avg loss 0.00413906, dev acc 0.8920, dev avg loss 0.288615, throughput 3.28263K wps
[Epoch 87 Batch 30/172] avg loss 0.00421413, throughput 3.05942K wps
[Epoch 87 Batch 60/172] avg loss 0.00399844, throughput 2.99122K wps
[Epoch 87 Batch 90/172] avg loss 0.00413841, throughput 3.01179K wps
[Epoch 87 Batch 120/172] avg loss 0.00369471, throughput 3.36717K wps
[Epoch 87 Batch 150/172] avg loss 0.00428959, throughput 3.23648K wps
Begin Testing...
[Epoch 87] train avg loss 0.00411718, dev acc 0.8931, dev avg loss 0.287475, throughput 3.11131K wps
[Epoch 88 Batch 30/172] avg loss 0.00404079, throughput 3.00683K wps
[Epoch 88 Batch 60/172] avg loss 0.00441427, throughput 3.49523K wps
[Epoch 88 Batch 90/172] avg loss 0.00422544, throughput 3.84714K wps
[Epoch 88 Batch 120/172] avg loss 0.00386855, throughput 3.3264K wps
[Epoch 88 Batch 150/172] avg loss 0.00380513, throughput 3.18545K wps
Begin Testing...
[Epoch 88] train avg loss 0.00408393, dev acc 0.8920, dev avg loss 0.287587, throughput 3.32741K wps
[Epoch 89 Batch 30/172] avg loss 0.00418543, throughput 3.26363K wps
[Epoch 89 Batch 60/172] avg loss 0.00425141, throughput 3.01589K wps
[Epoch 89 Batch 90/172] avg loss 0.00412781, throughput 3.17198K wps
[Epoch 89 Batch 120/172] avg loss 0.0038558, throughput 3.35134K wps
[Epoch 89 Batch 150/172] avg loss 0.00415142, throughput 3.3745K wps
Begin Testing...
[Epoch 89] train avg loss 0.00406459, dev acc 0.8952, dev avg loss 0.287745, throughput 3.18103K wps
[Epoch 90 Batch 30/172] avg loss 0.00371218, throughput 3.04618K wps
[Epoch 90 Batch 60/172] avg loss 0.00400038, throughput 2.88861K wps
[Epoch 90 Batch 90/172] avg loss 0.00414436, throughput 2.89744K wps
[Epoch 90 Batch 120/172] avg loss 0.00435015, throughput 2.954K wps
[Epoch 90 Batch 150/172] avg loss 0.00372285, throughput 3.226K wps
Begin Testing...
[Epoch 90] train avg loss 0.00399751, dev acc 0.8910, dev avg loss 0.28774, throughput 2.98784K wps
[Epoch 91 Batch 30/172] avg loss 0.00398294, throughput 3.30853K wps
[Epoch 91 Batch 60/172] avg loss 0.00390085, throughput 3.476K wps
[Epoch 91 Batch 90/172] avg loss 0.00379705, throughput 3.72158K wps
[Epoch 91 Batch 120/172] avg loss 0.00434308, throughput 3.53962K wps
[Epoch 91 Batch 150/172] avg loss 0.00393563, throughput 3.36419K wps
Begin Testing...
[Epoch 91] train avg loss 0.00399626, dev acc 0.8952, dev avg loss 0.288282, throughput 3.41452K wps
[Epoch 92 Batch 30/172] avg loss 0.00392608, throughput 3.11706K wps
[Epoch 92 Batch 60/172] avg loss 0.00369887, throughput 3.11966K wps
[Epoch 92 Batch 90/172] avg loss 0.00412449, throughput 3.26468K wps
[Epoch 92 Batch 120/172] avg loss 0.00425181, throughput 3.07975K wps
[Epoch 92 Batch 150/172] avg loss 0.00373091, throughput 3.37615K wps
Begin Testing...
[Epoch 92] train avg loss 0.00397902, dev acc 0.8952, dev avg loss 0.287774, throughput 3.18978K wps
[Epoch 93 Batch 30/172] avg loss 0.00387646, throughput 3.1538K wps
[Epoch 93 Batch 60/172] avg loss 0.00417917, throughput 3.434K wps
[Epoch 93 Batch 90/172] avg loss 0.00385847, throughput 3.30807K wps
[Epoch 93 Batch 120/172] avg loss 0.00411273, throughput 3.77804K wps
[Epoch 93 Batch 150/172] avg loss 0.00385832, throughput 3.22445K wps
Begin Testing...
[Epoch 93] train avg loss 0.00397853, dev acc 0.8952, dev avg loss 0.287914, throughput 3.34648K wps
[Epoch 94 Batch 30/172] avg loss 0.00380235, throughput 3.65197K wps
[Epoch 94 Batch 60/172] avg loss 0.00405506, throughput 3.11579K wps
[Epoch 94 Batch 90/172] avg loss 0.00417256, throughput 2.98588K wps
[Epoch 94 Batch 120/172] avg loss 0.00410221, throughput 3.03931K wps
[Epoch 94 Batch 150/172] avg loss 0.00381083, throughput 3.36603K wps
Begin Testing...
[Epoch 94] train avg loss 0.00399225, dev acc 0.8931, dev avg loss 0.288042, throughput 3.2931K wps
[Epoch 95 Batch 30/172] avg loss 0.00394238, throughput 3.0063K wps
[Epoch 95 Batch 60/172] avg loss 0.00400763, throughput 2.98182K wps
[Epoch 95 Batch 90/172] avg loss 0.00385179, throughput 2.95225K wps
[Epoch 95 Batch 120/172] avg loss 0.00378873, throughput 3.11958K wps
[Epoch 95 Batch 150/172] avg loss 0.00421701, throughput 3.0378K wps
Begin Testing...
[Epoch 95] train avg loss 0.0039637, dev acc 0.8920, dev avg loss 0.288038, throughput 3.13466K wps
[Epoch 96 Batch 30/172] avg loss 0.00384911, throughput 3.1105K wps
[Epoch 96 Batch 60/172] avg loss 0.00381155, throughput 3.71711K wps
[Epoch 96 Batch 90/172] avg loss 0.00410465, throughput 2.94272K wps
[Epoch 96 Batch 120/172] avg loss 0.00400315, throughput 3.05825K wps
[Epoch 96 Batch 150/172] avg loss 0.0042097, throughput 3.2173K wps
Begin Testing...
[Epoch 96] train avg loss 0.00397182, dev acc 0.8910, dev avg loss 0.288858, throughput 3.1942K wps
[Epoch 97 Batch 30/172] avg loss 0.00361455, throughput 3.64933K wps
[Epoch 97 Batch 60/172] avg loss 0.00383386, throughput 3.51009K wps
[Epoch 97 Batch 90/172] avg loss 0.00409922, throughput 3.01971K wps
[Epoch 97 Batch 120/172] avg loss 0.00380838, throughput 3.42968K wps
[Epoch 97 Batch 150/172] avg loss 0.00383468, throughput 2.97574K wps
Begin Testing...
[Epoch 97] train avg loss 0.00393394, dev acc 0.8931, dev avg loss 0.287914, throughput 3.27456K wps
[Epoch 98 Batch 30/172] avg loss 0.00358753, throughput 3.01419K wps
[Epoch 98 Batch 60/172] avg loss 0.00403582, throughput 2.95831K wps
[Epoch 98 Batch 90/172] avg loss 0.00388808, throughput 3.03635K wps
[Epoch 98 Batch 120/172] avg loss 0.00393978, throughput 3.15597K wps
[Epoch 98 Batch 150/172] avg loss 0.00406146, throughput 3.10263K wps
Begin Testing...
[Epoch 98] train avg loss 0.00389784, dev acc 0.8920, dev avg loss 0.288498, throughput 3.04023K wps
[Epoch 99 Batch 30/172] avg loss 0.00403256, throughput 3.26709K wps
[Epoch 99 Batch 60/172] avg loss 0.00374261, throughput 2.96317K wps
[Epoch 99 Batch 90/172] avg loss 0.00387819, throughput 3.12512K wps
[Epoch 99 Batch 120/172] avg loss 0.00387987, throughput 3.21153K wps
[Epoch 99 Batch 150/172] avg loss 0.00369141, throughput 3.32984K wps
Begin Testing...
[Epoch 99] train avg loss 0.00388359, dev acc 0.8910, dev avg loss 0.287578, throughput 3.13545K wps
[Epoch 100 Batch 30/172] avg loss 0.00370454, throughput 2.90353K wps
[Epoch 100 Batch 60/172] avg loss 0.00416681, throughput 3.01627K wps
[Epoch 100 Batch 90/172] avg loss 0.00367229, throughput 3.42452K wps
[Epoch 100 Batch 120/172] avg loss 0.00378642, throughput 3.56651K wps
[Epoch 100 Batch 150/172] avg loss 0.00404067, throughput 3.06262K wps
Begin Testing...
[Epoch 100] train avg loss 0.00385497, dev acc 0.8931, dev avg loss 0.290798, throughput 3.13473K wps
[Epoch 101 Batch 30/172] avg loss 0.00362722, throughput 2.91582K wps
[Epoch 101 Batch 60/172] avg loss 0.00403561, throughput 3.39096K wps
[Epoch 101 Batch 90/172] avg loss 0.00363619, throughput 2.92734K wps
[Epoch 101 Batch 120/172] avg loss 0.00380461, throughput 3.557K wps
[Epoch 101 Batch 150/172] avg loss 0.0040166, throughput 2.99773K wps
Begin Testing...
[Epoch 101] train avg loss 0.00384281, dev acc 0.8920, dev avg loss 0.287958, throughput 3.09803K wps
[Epoch 102 Batch 30/172] avg loss 0.00398713, throughput 2.99923K wps
[Epoch 102 Batch 60/172] avg loss 0.00350298, throughput 3.09381K wps
[Epoch 102 Batch 90/172] avg loss 0.00412249, throughput 3.28542K wps
[Epoch 102 Batch 120/172] avg loss 0.00382524, throughput 3.5407K wps
[Epoch 102 Batch 150/172] avg loss 0.00355246, throughput 3.3317K wps
Begin Testing...
[Epoch 102] train avg loss 0.00379039, dev acc 0.8910, dev avg loss 0.288314, throughput 3.24981K wps
[Epoch 103 Batch 30/172] avg loss 0.00373488, throughput 2.92819K wps
[Epoch 103 Batch 60/172] avg loss 0.00405807, throughput 3.18809K wps
[Epoch 103 Batch 90/172] avg loss 0.00334666, throughput 3.1621K wps
[Epoch 103 Batch 120/172] avg loss 0.00372488, throughput 3.53295K wps
[Epoch 103 Batch 150/172] avg loss 0.00412858, throughput 3.18838K wps
Begin Testing...
[Epoch 103] train avg loss 0.00381553, dev acc 0.8910, dev avg loss 0.287991, throughput 3.23149K wps
[Epoch 104 Batch 30/172] avg loss 0.00368573, throughput 3.01219K wps
[Epoch 104 Batch 60/172] avg loss 0.00375296, throughput 3.38797K wps
[Epoch 104 Batch 90/172] avg loss 0.00403113, throughput 3.55876K wps
[Epoch 104 Batch 120/172] avg loss 0.00411926, throughput 3.60693K wps
[Epoch 104 Batch 150/172] avg loss 0.0036734, throughput 3.43545K wps
Begin Testing...
[Epoch 104] train avg loss 0.00378831, dev acc 0.8920, dev avg loss 0.287951, throughput 3.40667K wps
[Epoch 105 Batch 30/172] avg loss 0.00390658, throughput 3.27872K wps
[Epoch 105 Batch 60/172] avg loss 0.00355568, throughput 3.56094K wps
[Epoch 105 Batch 90/172] avg loss 0.00353505, throughput 3.27469K wps
[Epoch 105 Batch 120/172] avg loss 0.00419389, throughput 3.3084K wps
[Epoch 105 Batch 150/172] avg loss 0.00370728, throughput 3.03696K wps
Begin Testing...
[Epoch 105] train avg loss 0.00378041, dev acc 0.8920, dev avg loss 0.288546, throughput 3.34332K wps
[Epoch 106 Batch 30/172] avg loss 0.00368186, throughput 3.21573K wps
[Epoch 106 Batch 60/172] avg loss 0.00374365, throughput 3.20121K wps
[Epoch 106 Batch 90/172] avg loss 0.00351513, throughput 3.02725K wps
[Epoch 106 Batch 120/172] avg loss 0.00389985, throughput 3.37159K wps
[Epoch 106 Batch 150/172] avg loss 0.003857, throughput 3.37846K wps
Begin Testing...
[Epoch 106] train avg loss 0.00376355, dev acc 0.8910, dev avg loss 0.287754, throughput 3.23983K wps
[Epoch 107 Batch 30/172] avg loss 0.00381799, throughput 3.04331K wps
[Epoch 107 Batch 60/172] avg loss 0.00334893, throughput 3.50903K wps
[Epoch 107 Batch 90/172] avg loss 0.00374035, throughput 2.95416K wps
[Epoch 107 Batch 120/172] avg loss 0.00359001, throughput 3.53852K wps
[Epoch 107 Batch 150/172] avg loss 0.00350323, throughput 3.18043K wps
Begin Testing...
[Epoch 107] train avg loss 0.00367484, dev acc 0.8920, dev avg loss 0.288003, throughput 3.19834K wps
[Epoch 108 Batch 30/172] avg loss 0.0039671, throughput 3.22647K wps
[Epoch 108 Batch 60/172] avg loss 0.00401323, throughput 3.25915K wps
[Epoch 108 Batch 90/172] avg loss 0.0034315, throughput 3.74596K wps
[Epoch 108 Batch 120/172] avg loss 0.00359487, throughput 3.33034K wps
[Epoch 108 Batch 150/172] avg loss 0.00359979, throughput 3.15318K wps
Begin Testing...
[Epoch 108] train avg loss 0.00371377, dev acc 0.8899, dev avg loss 0.290571, throughput 3.39583K wps
[Epoch 109 Batch 30/172] avg loss 0.00371488, throughput 3.19003K wps
[Epoch 109 Batch 60/172] avg loss 0.00381742, throughput 3.12779K wps
[Epoch 109 Batch 90/172] avg loss 0.00340356, throughput 3.36941K wps
[Epoch 109 Batch 120/172] avg loss 0.00411103, throughput 3.49436K wps
[Epoch 109 Batch 150/172] avg loss 0.00318881, throughput 3.21301K wps
Begin Testing...
[Epoch 109] train avg loss 0.0036969, dev acc 0.8889, dev avg loss 0.288644, throughput 3.34243K wps
[Epoch 110 Batch 30/172] avg loss 0.0039055, throughput 3.26736K wps
[Epoch 110 Batch 60/172] avg loss 0.00335964, throughput 3.01068K wps
[Epoch 110 Batch 90/172] avg loss 0.00350881, throughput 3.88968K wps
[Epoch 110 Batch 120/172] avg loss 0.00368449, throughput 3.4556K wps
[Epoch 110 Batch 150/172] avg loss 0.00375382, throughput 3.06619K wps
Begin Testing...
[Epoch 110] train avg loss 0.00364829, dev acc 0.8889, dev avg loss 0.288521, throughput 3.34157K wps
[Epoch 111 Batch 30/172] avg loss 0.00362977, throughput 3.44708K wps
[Epoch 111 Batch 60/172] avg loss 0.00340146, throughput 4.02245K wps
[Epoch 111 Batch 90/172] avg loss 0.00370019, throughput 3.24166K wps
[Epoch 111 Batch 120/172] avg loss 0.00382928, throughput 3.11659K wps
[Epoch 111 Batch 150/172] avg loss 0.00364969, throughput 3.07401K wps
Begin Testing...
[Epoch 111] train avg loss 0.00364102, dev acc 0.8899, dev avg loss 0.288465, throughput 3.36795K wps
[Epoch 112 Batch 30/172] avg loss 0.00336788, throughput 3.42981K wps
[Epoch 112 Batch 60/172] avg loss 0.00364739, throughput 3.16287K wps
[Epoch 112 Batch 90/172] avg loss 0.00370827, throughput 3.02449K wps
[Epoch 112 Batch 120/172] avg loss 0.00362433, throughput 3.50521K wps
[Epoch 112 Batch 150/172] avg loss 0.00353151, throughput 3.20493K wps
Begin Testing...
[Epoch 112] train avg loss 0.00364205, dev acc 0.8899, dev avg loss 0.287979, throughput 3.25123K wps
[Epoch 113 Batch 30/172] avg loss 0.00323935, throughput 3.21964K wps
[Epoch 113 Batch 60/172] avg loss 0.00354657, throughput 3.57375K wps
[Epoch 113 Batch 90/172] avg loss 0.00379021, throughput 4.05031K wps
[Epoch 113 Batch 120/172] avg loss 0.00369283, throughput 3.3575K wps
[Epoch 113 Batch 150/172] avg loss 0.00401921, throughput 2.89071K wps
Begin Testing...
[Epoch 113] train avg loss 0.00362545, dev acc 0.8899, dev avg loss 0.288391, throughput 3.35871K wps
[Epoch 114 Batch 30/172] avg loss 0.00369446, throughput 3.48367K wps
[Epoch 114 Batch 60/172] avg loss 0.00361194, throughput 3.14104K wps
[Epoch 114 Batch 90/172] avg loss 0.00354948, throughput 3.28056K wps
[Epoch 114 Batch 120/172] avg loss 0.00410023, throughput 3.29867K wps
[Epoch 114 Batch 150/172] avg loss 0.00334949, throughput 2.96624K wps
Begin Testing...
[Epoch 114] train avg loss 0.00361279, dev acc 0.8910, dev avg loss 0.288528, throughput 3.22021K wps
[Epoch 115 Batch 30/172] avg loss 0.00350096, throughput 2.95262K wps
[Epoch 115 Batch 60/172] avg loss 0.00391426, throughput 3.09736K wps
[Epoch 115 Batch 90/172] avg loss 0.00324918, throughput 3.08761K wps
[Epoch 115 Batch 120/172] avg loss 0.0034039, throughput 3.1995K wps
[Epoch 115 Batch 150/172] avg loss 0.00340083, throughput 3.8345K wps
Begin Testing...
[Epoch 115] train avg loss 0.00359198, dev acc 0.8931, dev avg loss 0.288945, throughput 3.22192K wps
[Epoch 116 Batch 30/172] avg loss 0.00413534, throughput 3.13214K wps
[Epoch 116 Batch 60/172] avg loss 0.00328207, throughput 2.92455K wps
[Epoch 116 Batch 90/172] avg loss 0.00345271, throughput 3.06657K wps
[Epoch 116 Batch 120/172] avg loss 0.00340152, throughput 3.41482K wps
[Epoch 116 Batch 150/172] avg loss 0.00362597, throughput 3.19526K wps
Begin Testing...
[Epoch 116] train avg loss 0.00356315, dev acc 0.8910, dev avg loss 0.288131, throughput 3.17298K wps
[Epoch 117 Batch 30/172] avg loss 0.00353572, throughput 3.10083K wps
[Epoch 117 Batch 60/172] avg loss 0.00379434, throughput 3.55754K wps
[Epoch 117 Batch 90/172] avg loss 0.00344982, throughput 3.08722K wps
[Epoch 117 Batch 120/172] avg loss 0.00362188, throughput 3.33945K wps
[Epoch 117 Batch 150/172] avg loss 0.00287993, throughput 3.01372K wps
Begin Testing...
[Epoch 117] train avg loss 0.00352194, dev acc 0.8878, dev avg loss 0.288529, throughput 3.26475K wps
[Epoch 118 Batch 30/172] avg loss 0.00345636, throughput 3.65626K wps
[Epoch 118 Batch 60/172] avg loss 0.00338755, throughput 3.2669K wps
[Epoch 118 Batch 90/172] avg loss 0.0036132, throughput 3.08489K wps
[Epoch 118 Batch 120/172] avg loss 0.00384619, throughput 3.03343K wps
[Epoch 118 Batch 150/172] avg loss 0.00314906, throughput 2.88528K wps
Begin Testing...
[Epoch 118] train avg loss 0.00352576, dev acc 0.8931, dev avg loss 0.29037, throughput 3.13988K wps
[Epoch 119 Batch 30/172] avg loss 0.00318678, throughput 3.28382K wps
[Epoch 119 Batch 60/172] avg loss 0.0034678, throughput 3.49979K wps
[Epoch 119 Batch 90/172] avg loss 0.00373737, throughput 2.98617K wps
[Epoch 119 Batch 120/172] avg loss 0.00349494, throughput 2.9393K wps
[Epoch 119 Batch 150/172] avg loss 0.00367504, throughput 3.65145K wps
Begin Testing...
[Epoch 119] train avg loss 0.00345679, dev acc 0.8931, dev avg loss 0.291279, throughput 3.33682K wps
[Epoch 120 Batch 30/172] avg loss 0.00363508, throughput 3.19681K wps
[Epoch 120 Batch 60/172] avg loss 0.00346493, throughput 3.35846K wps
[Epoch 120 Batch 90/172] avg loss 0.00338688, throughput 3.03022K wps
[Epoch 120 Batch 120/172] avg loss 0.00316071, throughput 3.01822K wps
[Epoch 120 Batch 150/172] avg loss 0.00368931, throughput 3.63485K wps
Begin Testing...
[Epoch 120] train avg loss 0.00345079, dev acc 0.8910, dev avg loss 0.288804, throughput 3.18743K wps
[Epoch 121 Batch 30/172] avg loss 0.00368637, throughput 3.06876K wps
[Epoch 121 Batch 60/172] avg loss 0.0031764, throughput 3.20016K wps
[Epoch 121 Batch 90/172] avg loss 0.00318263, throughput 3.50598K wps
[Epoch 121 Batch 120/172] avg loss 0.00365394, throughput 4.21413K wps
[Epoch 121 Batch 150/172] avg loss 0.00373214, throughput 3.55319K wps
Begin Testing...
[Epoch 121] train avg loss 0.00345573, dev acc 0.8910, dev avg loss 0.288325, throughput 3.42645K wps
[Epoch 122 Batch 30/172] avg loss 0.00371073, throughput 3.1133K wps
[Epoch 122 Batch 60/172] avg loss 0.00327035, throughput 3.60627K wps
[Epoch 122 Batch 90/172] avg loss 0.00350882, throughput 3.43681K wps
[Epoch 122 Batch 120/172] avg loss 0.00356241, throughput 3.4574K wps
[Epoch 122 Batch 150/172] avg loss 0.00346496, throughput 3.38154K wps
Begin Testing...
[Epoch 122] train avg loss 0.00348763, dev acc 0.8910, dev avg loss 0.28849, throughput 3.44934K wps
[Epoch 123 Batch 30/172] avg loss 0.00363081, throughput 3.32759K wps
[Epoch 123 Batch 60/172] avg loss 0.00342188, throughput 3.15476K wps
[Epoch 123 Batch 90/172] avg loss 0.00372906, throughput 3.06609K wps
[Epoch 123 Batch 120/172] avg loss 0.00331765, throughput 3.42579K wps
[Epoch 123 Batch 150/172] avg loss 0.00334027, throughput 3.0951K wps
Begin Testing...
[Epoch 123] train avg loss 0.00344587, dev acc 0.8910, dev avg loss 0.290751, throughput 3.2586K wps
[Epoch 124 Batch 30/172] avg loss 0.00331652, throughput 3.57231K wps
[Epoch 124 Batch 60/172] avg loss 0.00335617, throughput 3.94628K wps
[Epoch 124 Batch 90/172] avg loss 0.00309092, throughput 3.66223K wps
[Epoch 124 Batch 120/172] avg loss 0.00345078, throughput 3.22796K wps
[Epoch 124 Batch 150/172] avg loss 0.0038707, throughput 2.99663K wps
Begin Testing...
[Epoch 124] train avg loss 0.00339564, dev acc 0.8910, dev avg loss 0.289, throughput 3.41561K wps
[Epoch 125 Batch 30/172] avg loss 0.00318407, throughput 3.186K wps
[Epoch 125 Batch 60/172] avg loss 0.00335453, throughput 3.38117K wps
[Epoch 125 Batch 90/172] avg loss 0.00372024, throughput 3.48733K wps
[Epoch 125 Batch 120/172] avg loss 0.00314013, throughput 3.25505K wps
[Epoch 125 Batch 150/172] avg loss 0.00348052, throughput 3.03714K wps
Begin Testing...
[Epoch 125] train avg loss 0.00339816, dev acc 0.8899, dev avg loss 0.288233, throughput 3.2977K wps
[Epoch 126 Batch 30/172] avg loss 0.0036362, throughput 3.52574K wps
[Epoch 126 Batch 60/172] avg loss 0.00321073, throughput 3.29878K wps
[Epoch 126 Batch 90/172] avg loss 0.00313675, throughput 3.06332K wps
[Epoch 126 Batch 120/172] avg loss 0.00306088, throughput 3.0899K wps
[Epoch 126 Batch 150/172] avg loss 0.00379439, throughput 3.53645K wps
Begin Testing...
[Epoch 126] train avg loss 0.00335858, dev acc 0.8910, dev avg loss 0.291297, throughput 3.36137K wps
[Epoch 127 Batch 30/172] avg loss 0.00330359, throughput 3.10251K wps
[Epoch 127 Batch 60/172] avg loss 0.00300024, throughput 3.26549K wps
[Epoch 127 Batch 90/172] avg loss 0.00359538, throughput 3.29858K wps
[Epoch 127 Batch 120/172] avg loss 0.00373544, throughput 3.23356K wps
[Epoch 127 Batch 150/172] avg loss 0.00343883, throughput 2.96389K wps
Begin Testing...
[Epoch 127] train avg loss 0.00338745, dev acc 0.8889, dev avg loss 0.289583, throughput 3.16572K wps
[Epoch 128 Batch 30/172] avg loss 0.00341072, throughput 3.13504K wps
[Epoch 128 Batch 60/172] avg loss 0.00351354, throughput 3.03528K wps
[Epoch 128 Batch 90/172] avg loss 0.0030596, throughput 3.35169K wps
[Epoch 128 Batch 120/172] avg loss 0.00374479, throughput 3.07997K wps
[Epoch 128 Batch 150/172] avg loss 0.00308436, throughput 3.78789K wps
Begin Testing...
[Epoch 128] train avg loss 0.00334085, dev acc 0.8899, dev avg loss 0.289658, throughput 3.2519K wps
[Epoch 129 Batch 30/172] avg loss 0.00319927, throughput 2.98377K wps
[Epoch 129 Batch 60/172] avg loss 0.00349544, throughput 2.96087K wps
[Epoch 129 Batch 90/172] avg loss 0.00315246, throughput 3.01574K wps
[Epoch 129 Batch 120/172] avg loss 0.00358489, throughput 3.13963K wps
[Epoch 129 Batch 150/172] avg loss 0.00325204, throughput 3.53344K wps
Begin Testing...
[Epoch 129] train avg loss 0.00337009, dev acc 0.8899, dev avg loss 0.289057, throughput 3.09026K wps
[Epoch 130 Batch 30/172] avg loss 0.00316335, throughput 3.71622K wps
[Epoch 130 Batch 60/172] avg loss 0.00325024, throughput 3.2368K wps
[Epoch 130 Batch 90/172] avg loss 0.00359476, throughput 3.16143K wps
[Epoch 130 Batch 120/172] avg loss 0.00310851, throughput 3.31506K wps
[Epoch 130 Batch 150/172] avg loss 0.00372226, throughput 3.54482K wps
Begin Testing...
[Epoch 130] train avg loss 0.00332556, dev acc 0.8889, dev avg loss 0.29112, throughput 3.33427K wps
[Epoch 131 Batch 30/172] avg loss 0.00314437, throughput 3.21832K wps
[Epoch 131 Batch 60/172] avg loss 0.00357296, throughput 3.20054K wps
[Epoch 131 Batch 90/172] avg loss 0.00326554, throughput 3.12621K wps
[Epoch 131 Batch 120/172] avg loss 0.00348722, throughput 3.03359K wps
[Epoch 131 Batch 150/172] avg loss 0.00308078, throughput 3.54484K wps
Begin Testing...
[Epoch 131] train avg loss 0.00331733, dev acc 0.8910, dev avg loss 0.289277, throughput 3.18599K wps
[Epoch 132 Batch 30/172] avg loss 0.00325831, throughput 3.58427K wps
[Epoch 132 Batch 60/172] avg loss 0.00309546, throughput 2.89914K wps
[Epoch 132 Batch 90/172] avg loss 0.00328082, throughput 2.91233K wps
[Epoch 132 Batch 120/172] avg loss 0.00297116, throughput 3.34741K wps
[Epoch 132 Batch 150/172] avg loss 0.00325304, throughput 2.9202K wps
Begin Testing...
[Epoch 132] train avg loss 0.00325156, dev acc 0.8910, dev avg loss 0.289464, throughput 3.11762K wps
[Epoch 133 Batch 30/172] avg loss 0.00331296, throughput 2.87439K wps
[Epoch 133 Batch 60/172] avg loss 0.00326705, throughput 3.50076K wps
[Epoch 133 Batch 90/172] avg loss 0.00313224, throughput 2.90576K wps
[Epoch 133 Batch 120/172] avg loss 0.00311789, throughput 3.20963K wps
[Epoch 133 Batch 150/172] avg loss 0.00345387, throughput 3.05074K wps
Begin Testing...
[Epoch 133] train avg loss 0.00327178, dev acc 0.8889, dev avg loss 0.290226, throughput 3.12125K wps
[Epoch 134 Batch 30/172] avg loss 0.00358193, throughput 3.48106K wps
[Epoch 134 Batch 60/172] avg loss 0.00318872, throughput 3.25122K wps
[Epoch 134 Batch 90/172] avg loss 0.00326773, throughput 3.30863K wps
[Epoch 134 Batch 120/172] avg loss 0.00282143, throughput 3.41311K wps
[Epoch 134 Batch 150/172] avg loss 0.00341337, throughput 3.40114K wps
Begin Testing...
[Epoch 134] train avg loss 0.00326303, dev acc 0.8920, dev avg loss 0.289317, throughput 3.298K wps
[Epoch 135 Batch 30/172] avg loss 0.00312977, throughput 2.95348K wps
[Epoch 135 Batch 60/172] avg loss 0.0037889, throughput 3.54462K wps
[Epoch 135 Batch 90/172] avg loss 0.00296941, throughput 3.25542K wps
[Epoch 135 Batch 120/172] avg loss 0.00326183, throughput 3.05083K wps
[Epoch 135 Batch 150/172] avg loss 0.00319186, throughput 3.35889K wps
Begin Testing...
[Epoch 135] train avg loss 0.00327173, dev acc 0.8899, dev avg loss 0.291775, throughput 3.21249K wps
[Epoch 136 Batch 30/172] avg loss 0.00323464, throughput 2.86758K wps
[Epoch 136 Batch 60/172] avg loss 0.00344211, throughput 3.27257K wps
[Epoch 136 Batch 90/172] avg loss 0.0032668, throughput 3.70081K wps
[Epoch 136 Batch 120/172] avg loss 0.00324199, throughput 3.24047K wps
[Epoch 136 Batch 150/172] avg loss 0.00315421, throughput 2.91751K wps
Begin Testing...
[Epoch 136] train avg loss 0.00327376, dev acc 0.8920, dev avg loss 0.289259, throughput 3.21715K wps
[Epoch 137 Batch 30/172] avg loss 0.00326354, throughput 3.29439K wps
[Epoch 137 Batch 60/172] avg loss 0.00326909, throughput 3.35054K wps
[Epoch 137 Batch 90/172] avg loss 0.00303402, throughput 3.03863K wps
[Epoch 137 Batch 120/172] avg loss 0.00352298, throughput 2.90155K wps
[Epoch 137 Batch 150/172] avg loss 0.00299489, throughput 3.53647K wps
Begin Testing...
[Epoch 137] train avg loss 0.00319292, dev acc 0.8910, dev avg loss 0.290211, throughput 3.24248K wps
[Epoch 138 Batch 30/172] avg loss 0.00284735, throughput 3.66363K wps
[Epoch 138 Batch 60/172] avg loss 0.00297179, throughput 3.16532K wps
[Epoch 138 Batch 90/172] avg loss 0.00313311, throughput 3.0215K wps
[Epoch 138 Batch 120/172] avg loss 0.00351036, throughput 3.53252K wps
[Epoch 138 Batch 150/172] avg loss 0.00316743, throughput 3.29205K wps
Begin Testing...
[Epoch 138] train avg loss 0.00316724, dev acc 0.8910, dev avg loss 0.290615, throughput 3.37584K wps
[Epoch 139 Batch 30/172] avg loss 0.00309648, throughput 3.42262K wps
[Epoch 139 Batch 60/172] avg loss 0.00319797, throughput 3.24744K wps
[Epoch 139 Batch 90/172] avg loss 0.00312413, throughput 3.67668K wps
[Epoch 139 Batch 120/172] avg loss 0.00331347, throughput 3.14446K wps
[Epoch 139 Batch 150/172] avg loss 0.00343506, throughput 2.95798K wps
Begin Testing...
[Epoch 139] train avg loss 0.0032283, dev acc 0.8931, dev avg loss 0.289442, throughput 3.27815K wps
[Epoch 140 Batch 30/172] avg loss 0.00325824, throughput 3.68911K wps
[Epoch 140 Batch 60/172] avg loss 0.00297454, throughput 3.30197K wps
[Epoch 140 Batch 90/172] avg loss 0.00321118, throughput 3.37165K wps
[Epoch 140 Batch 120/172] avg loss 0.0032978, throughput 3.47555K wps
[Epoch 140 Batch 150/172] avg loss 0.00307837, throughput 3.26646K wps
Begin Testing...
[Epoch 140] train avg loss 0.00312713, dev acc 0.8920, dev avg loss 0.290355, throughput 3.39538K wps
[Epoch 141 Batch 30/172] avg loss 0.00320625, throughput 3.50111K wps
[Epoch 141 Batch 60/172] avg loss 0.00310705, throughput 3.2781K wps
[Epoch 141 Batch 90/172] avg loss 0.00309474, throughput 3.61909K wps
[Epoch 141 Batch 120/172] avg loss 0.00311048, throughput 3.25327K wps
[Epoch 141 Batch 150/172] avg loss 0.00319666, throughput 3.32949K wps
Begin Testing...
[Epoch 141] train avg loss 0.00314825, dev acc 0.8910, dev avg loss 0.290299, throughput 3.36668K wps
[Epoch 142 Batch 30/172] avg loss 0.0034395, throughput 3.22168K wps
[Epoch 142 Batch 60/172] avg loss 0.00283715, throughput 3.41605K wps
[Epoch 142 Batch 90/172] avg loss 0.00323308, throughput 2.97043K wps
[Epoch 142 Batch 120/172] avg loss 0.00310377, throughput 3.12897K wps
[Epoch 142 Batch 150/172] avg loss 0.00279501, throughput 3.38525K wps
Begin Testing...
[Epoch 142] train avg loss 0.00308358, dev acc 0.8899, dev avg loss 0.291152, throughput 3.1778K wps
[Epoch 143 Batch 30/172] avg loss 0.00337836, throughput 2.94977K wps
[Epoch 143 Batch 60/172] avg loss 0.00277336, throughput 2.94651K wps
[Epoch 143 Batch 90/172] avg loss 0.00295286, throughput 3.419K wps
[Epoch 143 Batch 120/172] avg loss 0.0031003, throughput 2.92932K wps
[Epoch 143 Batch 150/172] avg loss 0.00325049, throughput 2.90163K wps
Begin Testing...
[Epoch 143] train avg loss 0.00312015, dev acc 0.8889, dev avg loss 0.291236, throughput 3.0131K wps
[Epoch 144 Batch 30/172] avg loss 0.00322647, throughput 3.60347K wps
[Epoch 144 Batch 60/172] avg loss 0.00310547, throughput 3.23616K wps
[Epoch 144 Batch 90/172] avg loss 0.00286696, throughput 3.2633K wps
[Epoch 144 Batch 120/172] avg loss 0.00330534, throughput 3.35433K wps
[Epoch 144 Batch 150/172] avg loss 0.00293739, throughput 3.11345K wps
Begin Testing...
[Epoch 144] train avg loss 0.0030973, dev acc 0.8910, dev avg loss 0.290433, throughput 3.26425K wps
[Epoch 145 Batch 30/172] avg loss 0.00311704, throughput 2.98154K wps
[Epoch 145 Batch 60/172] avg loss 0.0030117, throughput 3.24934K wps
[Epoch 145 Batch 90/172] avg loss 0.00266867, throughput 3.67798K wps
[Epoch 145 Batch 120/172] avg loss 0.00307894, throughput 3.07808K wps
[Epoch 145 Batch 150/172] avg loss 0.00320724, throughput 3.50954K wps
Begin Testing...
[Epoch 145] train avg loss 0.00306991, dev acc 0.8889, dev avg loss 0.292296, throughput 3.2384K wps
[Epoch 146 Batch 30/172] avg loss 0.00299226, throughput 3.39626K wps
[Epoch 146 Batch 60/172] avg loss 0.00326786, throughput 3.67792K wps
[Epoch 146 Batch 90/172] avg loss 0.00256652, throughput 3.07139K wps
[Epoch 146 Batch 120/172] avg loss 0.00330459, throughput 3.32565K wps
[Epoch 146 Batch 150/172] avg loss 0.00301008, throughput 3.6268K wps
Begin Testing...
[Epoch 146] train avg loss 0.00303765, dev acc 0.8899, dev avg loss 0.291706, throughput 3.43142K wps
[Epoch 147 Batch 30/172] avg loss 0.00309433, throughput 3.31896K wps
[Epoch 147 Batch 60/172] avg loss 0.00314632, throughput 3.29368K wps
[Epoch 147 Batch 90/172] avg loss 0.00299306, throughput 3.72183K wps
[Epoch 147 Batch 120/172] avg loss 0.00267962, throughput 3.13067K wps
[Epoch 147 Batch 150/172] avg loss 0.00316444, throughput 3.14283K wps
Begin Testing...
[Epoch 147] train avg loss 0.0030093, dev acc 0.8899, dev avg loss 0.291461, throughput 3.35201K wps
[Epoch 148 Batch 30/172] avg loss 0.0032827, throughput 3.3544K wps
[Epoch 148 Batch 60/172] avg loss 0.003123, throughput 3.28163K wps
[Epoch 148 Batch 90/172] avg loss 0.00283055, throughput 3.40732K wps
[Epoch 148 Batch 120/172] avg loss 0.0028578, throughput 3.49467K wps
[Epoch 148 Batch 150/172] avg loss 0.00308969, throughput 3.6377K wps
Begin Testing...
[Epoch 148] train avg loss 0.00305443, dev acc 0.8889, dev avg loss 0.292699, throughput 3.46823K wps
[Epoch 149 Batch 30/172] avg loss 0.00321015, throughput 3.53737K wps
[Epoch 149 Batch 60/172] avg loss 0.00284152, throughput 3.22352K wps
[Epoch 149 Batch 90/172] avg loss 0.00287635, throughput 3.32353K wps
[Epoch 149 Batch 120/172] avg loss 0.00293728, throughput 3.4522K wps
[Epoch 149 Batch 150/172] avg loss 0.00317717, throughput 3.07516K wps
Begin Testing...
[Epoch 149] train avg loss 0.00303832, dev acc 0.8899, dev avg loss 0.291466, throughput 3.35932K wps
[Epoch 150 Batch 30/172] avg loss 0.00317808, throughput 2.96678K wps
[Epoch 150 Batch 60/172] avg loss 0.00307452, throughput 3.89948K wps
[Epoch 150 Batch 90/172] avg loss 0.00291867, throughput 4.06657K wps
[Epoch 150 Batch 120/172] avg loss 0.00302669, throughput 3.24278K wps
[Epoch 150 Batch 150/172] avg loss 0.00274182, throughput 3.53051K wps
Begin Testing...
[Epoch 150] train avg loss 0.00301111, dev acc 0.8920, dev avg loss 0.291981, throughput 3.52294K wps
[Epoch 151 Batch 30/172] avg loss 0.00339581, throughput 4.04078K wps
[Epoch 151 Batch 60/172] avg loss 0.00285335, throughput 3.14361K wps
[Epoch 151 Batch 90/172] avg loss 0.00268217, throughput 3.07896K wps
[Epoch 151 Batch 120/172] avg loss 0.0027933, throughput 2.9889K wps
[Epoch 151 Batch 150/172] avg loss 0.00295101, throughput 3.53228K wps
Begin Testing...
[Epoch 151] train avg loss 0.00295571, dev acc 0.8920, dev avg loss 0.292217, throughput 3.28825K wps
[Epoch 152 Batch 30/172] avg loss 0.00295356, throughput 3.14323K wps
[Epoch 152 Batch 60/172] avg loss 0.00302608, throughput 3.23592K wps
[Epoch 152 Batch 90/172] avg loss 0.00301718, throughput 3.35823K wps
[Epoch 152 Batch 120/172] avg loss 0.00295168, throughput 3.1201K wps
[Epoch 152 Batch 150/172] avg loss 0.00334856, throughput 3.14499K wps
Begin Testing...
[Epoch 152] train avg loss 0.00304861, dev acc 0.8899, dev avg loss 0.292437, throughput 3.2887K wps
[Epoch 153 Batch 30/172] avg loss 0.0030581, throughput 3.64842K wps
[Epoch 153 Batch 60/172] avg loss 0.00289675, throughput 3.01395K wps
[Epoch 153 Batch 90/172] avg loss 0.00298986, throughput 3.28081K wps
[Epoch 153 Batch 120/172] avg loss 0.00308858, throughput 3.72225K wps
[Epoch 153 Batch 150/172] avg loss 0.00327312, throughput 3.36157K wps
Begin Testing...
[Epoch 153] train avg loss 0.00301314, dev acc 0.8931, dev avg loss 0.293471, throughput 3.35797K wps
[Epoch 154 Batch 30/172] avg loss 0.00266955, throughput 3.51002K wps
[Epoch 154 Batch 60/172] avg loss 0.00285542, throughput 3.17907K wps
[Epoch 154 Batch 90/172] avg loss 0.00319862, throughput 2.92501K wps
[Epoch 154 Batch 120/172] avg loss 0.00276641, throughput 3.12804K wps
[Epoch 154 Batch 150/172] avg loss 0.00325079, throughput 2.93737K wps
Begin Testing...
[Epoch 154] train avg loss 0.00299972, dev acc 0.8899, dev avg loss 0.292084, throughput 3.12061K wps
[Epoch 155 Batch 30/172] avg loss 0.00308295, throughput 2.88997K wps
[Epoch 155 Batch 60/172] avg loss 0.00287574, throughput 3.00438K wps
[Epoch 155 Batch 90/172] avg loss 0.00300968, throughput 3.28822K wps
[Epoch 155 Batch 120/172] avg loss 0.00261484, throughput 3.06788K wps
[Epoch 155 Batch 150/172] avg loss 0.00326494, throughput 2.96123K wps
Begin Testing...
[Epoch 155] train avg loss 0.00294681, dev acc 0.8899, dev avg loss 0.292648, throughput 3.07936K wps
[Epoch 156 Batch 30/172] avg loss 0.00318373, throughput 3.55552K wps
[Epoch 156 Batch 60/172] avg loss 0.00283278, throughput 2.9157K wps
[Epoch 156 Batch 90/172] avg loss 0.00315793, throughput 3.55764K wps
[Epoch 156 Batch 120/172] avg loss 0.00285444, throughput 3.06547K wps
[Epoch 156 Batch 150/172] avg loss 0.00266357, throughput 2.9022K wps
Begin Testing...
[Epoch 156] train avg loss 0.00295725, dev acc 0.8899, dev avg loss 0.29216, throughput 3.21929K wps
[Epoch 157 Batch 30/172] avg loss 0.00283183, throughput 3.10399K wps
[Epoch 157 Batch 60/172] avg loss 0.00312352, throughput 4.23185K wps
[Epoch 157 Batch 90/172] avg loss 0.00272667, throughput 2.89914K wps
[Epoch 157 Batch 120/172] avg loss 0.00297088, throughput 3.12218K wps
[Epoch 157 Batch 150/172] avg loss 0.0028345, throughput 3.59308K wps
Begin Testing...
[Epoch 157] train avg loss 0.00295476, dev acc 0.8910, dev avg loss 0.292349, throughput 3.27247K wps
[Epoch 158 Batch 30/172] avg loss 0.00312136, throughput 2.85871K wps
[Epoch 158 Batch 60/172] avg loss 0.0026299, throughput 3.41261K wps
[Epoch 158 Batch 90/172] avg loss 0.00284862, throughput 3.11157K wps
[Epoch 158 Batch 120/172] avg loss 0.00285369, throughput 3.26537K wps
[Epoch 158 Batch 150/172] avg loss 0.00314286, throughput 3.21123K wps
Begin Testing...
[Epoch 158] train avg loss 0.0029354, dev acc 0.8899, dev avg loss 0.293355, throughput 3.12449K wps
[Epoch 159 Batch 30/172] avg loss 0.00271107, throughput 3.31808K wps
[Epoch 159 Batch 60/172] avg loss 0.00274376, throughput 3.3659K wps
[Epoch 159 Batch 90/172] avg loss 0.00311421, throughput 2.91556K wps
[Epoch 159 Batch 120/172] avg loss 0.00299871, throughput 2.91621K wps
[Epoch 159 Batch 150/172] avg loss 0.00253182, throughput 3.75089K wps
Begin Testing...
[Epoch 159] train avg loss 0.00286866, dev acc 0.8920, dev avg loss 0.293604, throughput 3.24303K wps
[Epoch 160 Batch 30/172] avg loss 0.00308377, throughput 3.28133K wps
[Epoch 160 Batch 60/172] avg loss 0.00279056, throughput 3.18663K wps
[Epoch 160 Batch 90/172] avg loss 0.00251737, throughput 3.54788K wps
[Epoch 160 Batch 120/172] avg loss 0.00347043, throughput 3.15142K wps
[Epoch 160 Batch 150/172] avg loss 0.00288413, throughput 3.49062K wps
Begin Testing...
[Epoch 160] train avg loss 0.00292504, dev acc 0.8910, dev avg loss 0.294365, throughput 3.29597K wps
[Epoch 161 Batch 30/172] avg loss 0.00263732, throughput 3.1819K wps
[Epoch 161 Batch 60/172] avg loss 0.00263023, throughput 2.93758K wps
[Epoch 161 Batch 90/172] avg loss 0.00285946, throughput 3.12224K wps
[Epoch 161 Batch 120/172] avg loss 0.00329394, throughput 3.37832K wps
[Epoch 161 Batch 150/172] avg loss 0.00294131, throughput 3.24567K wps
Begin Testing...
[Epoch 161] train avg loss 0.00283032, dev acc 0.8899, dev avg loss 0.293539, throughput 3.12957K wps
[Epoch 162 Batch 30/172] avg loss 0.00288744, throughput 3.65233K wps
[Epoch 162 Batch 60/172] avg loss 0.00293193, throughput 2.99006K wps
[Epoch 162 Batch 90/172] avg loss 0.00319725, throughput 4.29917K wps
[Epoch 162 Batch 120/172] avg loss 0.00266984, throughput 3.27365K wps
[Epoch 162 Batch 150/172] avg loss 0.00279628, throughput 2.89689K wps
Begin Testing...
[Epoch 162] train avg loss 0.00288238, dev acc 0.8910, dev avg loss 0.293805, throughput 3.28543K wps
[Epoch 163 Batch 30/172] avg loss 0.00263141, throughput 3.19723K wps
[Epoch 163 Batch 60/172] avg loss 0.00290293, throughput 2.95713K wps
[Epoch 163 Batch 90/172] avg loss 0.00245708, throughput 3.58796K wps
[Epoch 163 Batch 120/172] avg loss 0.00318131, throughput 3.48106K wps
[Epoch 163 Batch 150/172] avg loss 0.00290785, throughput 3.27492K wps
Begin Testing...
[Epoch 163] train avg loss 0.00281367, dev acc 0.8899, dev avg loss 0.293394, throughput 3.28621K wps
[Epoch 164 Batch 30/172] avg loss 0.00269554, throughput 2.88326K wps
[Epoch 164 Batch 60/172] avg loss 0.00269107, throughput 3.39199K wps
[Epoch 164 Batch 90/172] avg loss 0.0025846, throughput 2.93301K wps
[Epoch 164 Batch 120/172] avg loss 0.00328373, throughput 3.27829K wps
[Epoch 164 Batch 150/172] avg loss 0.00288613, throughput 3.49334K wps
Begin Testing...
[Epoch 164] train avg loss 0.00281133, dev acc 0.8910, dev avg loss 0.293899, throughput 3.19717K wps
[Epoch 165 Batch 30/172] avg loss 0.00257807, throughput 3.12216K wps
[Epoch 165 Batch 60/172] avg loss 0.00290161, throughput 3.91743K wps
[Epoch 165 Batch 90/172] avg loss 0.00261715, throughput 3.04305K wps
[Epoch 165 Batch 120/172] avg loss 0.00271715, throughput 2.98927K wps
[Epoch 165 Batch 150/172] avg loss 0.00305601, throughput 3.10768K wps
Begin Testing...
[Epoch 165] train avg loss 0.00281403, dev acc 0.8910, dev avg loss 0.293813, throughput 3.19845K wps
[Epoch 166 Batch 30/172] avg loss 0.00282884, throughput 3.18938K wps
[Epoch 166 Batch 60/172] avg loss 0.00307284, throughput 3.04847K wps
[Epoch 166 Batch 90/172] avg loss 0.00253494, throughput 2.97244K wps
[Epoch 166 Batch 120/172] avg loss 0.00290433, throughput 3.2249K wps
[Epoch 166 Batch 150/172] avg loss 0.00278593, throughput 3.6331K wps
Begin Testing...
[Epoch 166] train avg loss 0.00278765, dev acc 0.8910, dev avg loss 0.294325, throughput 3.21325K wps
[Epoch 167 Batch 30/172] avg loss 0.00278197, throughput 3.97212K wps
[Epoch 167 Batch 60/172] avg loss 0.0030124, throughput 2.95656K wps
[Epoch 167 Batch 90/172] avg loss 0.00272877, throughput 2.89303K wps
[Epoch 167 Batch 120/172] avg loss 0.00287364, throughput 3.57502K wps
[Epoch 167 Batch 150/172] avg loss 0.0028844, throughput 3.50488K wps
Begin Testing...
[Epoch 167] train avg loss 0.00280889, dev acc 0.8920, dev avg loss 0.297759, throughput 3.3733K wps
[Epoch 168 Batch 30/172] avg loss 0.00292751, throughput 2.99969K wps
[Epoch 168 Batch 60/172] avg loss 0.00243488, throughput 3.43207K wps
[Epoch 168 Batch 90/172] avg loss 0.00267188, throughput 3.46389K wps
[Epoch 168 Batch 120/172] avg loss 0.003017, throughput 3.84243K wps
[Epoch 168 Batch 150/172] avg loss 0.00313536, throughput 3.19116K wps
Begin Testing...
[Epoch 168] train avg loss 0.00278391, dev acc 0.8899, dev avg loss 0.294544, throughput 3.33381K wps
[Epoch 169 Batch 30/172] avg loss 0.00290737, throughput 3.13127K wps
[Epoch 169 Batch 60/172] avg loss 0.00295048, throughput 3.53691K wps
[Epoch 169 Batch 90/172] avg loss 0.00282558, throughput 3.61441K wps
[Epoch 169 Batch 120/172] avg loss 0.00267688, throughput 3.36952K wps
[Epoch 169 Batch 150/172] avg loss 0.00275968, throughput 3.28972K wps
Begin Testing...
[Epoch 169] train avg loss 0.00278626, dev acc 0.8920, dev avg loss 0.294422, throughput 3.34673K wps
[Epoch 170 Batch 30/172] avg loss 0.00251881, throughput 3.21053K wps
[Epoch 170 Batch 60/172] avg loss 0.00275228, throughput 2.97344K wps
[Epoch 170 Batch 90/172] avg loss 0.00282915, throughput 3.46879K wps
[Epoch 170 Batch 120/172] avg loss 0.00314482, throughput 4.00031K wps
[Epoch 170 Batch 150/172] avg loss 0.00282164, throughput 2.94442K wps
Begin Testing...
[Epoch 170] train avg loss 0.0028182, dev acc 0.8910, dev avg loss 0.295253, throughput 3.21801K wps
[Epoch 171 Batch 30/172] avg loss 0.00284667, throughput 3.02759K wps
[Epoch 171 Batch 60/172] avg loss 0.00248564, throughput 2.94641K wps
[Epoch 171 Batch 90/172] avg loss 0.002959, throughput 3.49458K wps
[Epoch 171 Batch 120/172] avg loss 0.00297266, throughput 3.06553K wps
[Epoch 171 Batch 150/172] avg loss 0.00284402, throughput 2.91724K wps
Begin Testing...
[Epoch 171] train avg loss 0.00283821, dev acc 0.8910, dev avg loss 0.293925, throughput 3.06676K wps
[Epoch 172 Batch 30/172] avg loss 0.00307228, throughput 3.63801K wps
[Epoch 172 Batch 60/172] avg loss 0.00314168, throughput 3.9016K wps
[Epoch 172 Batch 90/172] avg loss 0.00268749, throughput 2.9646K wps
[Epoch 172 Batch 120/172] avg loss 0.00240147, throughput 3.39904K wps
[Epoch 172 Batch 150/172] avg loss 0.00258137, throughput 3.23385K wps
Begin Testing...
[Epoch 172] train avg loss 0.0027318, dev acc 0.8920, dev avg loss 0.298744, throughput 3.37246K wps
[Epoch 173 Batch 30/172] avg loss 0.00302416, throughput 3.16207K wps
[Epoch 173 Batch 60/172] avg loss 0.00272155, throughput 3.06335K wps
[Epoch 173 Batch 90/172] avg loss 0.00263447, throughput 3.16561K wps
[Epoch 173 Batch 120/172] avg loss 0.00260407, throughput 3.84861K wps
[Epoch 173 Batch 150/172] avg loss 0.00267569, throughput 3.94722K wps
Begin Testing...
[Epoch 173] train avg loss 0.00275574, dev acc 0.8920, dev avg loss 0.298766, throughput 3.3651K wps
[Epoch 174 Batch 30/172] avg loss 0.00301485, throughput 3.30169K wps
[Epoch 174 Batch 60/172] avg loss 0.00286091, throughput 3.00391K wps
[Epoch 174 Batch 90/172] avg loss 0.00239318, throughput 2.95098K wps
[Epoch 174 Batch 120/172] avg loss 0.00266045, throughput 2.96664K wps
[Epoch 174 Batch 150/172] avg loss 0.0028554, throughput 3.16739K wps
Begin Testing...
[Epoch 174] train avg loss 0.00270822, dev acc 0.8910, dev avg loss 0.295883, throughput 3.09305K wps
[Epoch 175 Batch 30/172] avg loss 0.00289208, throughput 3.10115K wps
[Epoch 175 Batch 60/172] avg loss 0.00250703, throughput 3.35115K wps
[Epoch 175 Batch 90/172] avg loss 0.00296, throughput 3.03237K wps
[Epoch 175 Batch 120/172] avg loss 0.00252593, throughput 3.23748K wps
[Epoch 175 Batch 150/172] avg loss 0.00273408, throughput 2.92486K wps
Begin Testing...
[Epoch 175] train avg loss 0.00269957, dev acc 0.8899, dev avg loss 0.300304, throughput 3.11121K wps
[Epoch 176 Batch 30/172] avg loss 0.00260725, throughput 3.25791K wps
[Epoch 176 Batch 60/172] avg loss 0.00253681, throughput 3.59621K wps
[Epoch 176 Batch 90/172] avg loss 0.00263771, throughput 2.93935K wps
[Epoch 176 Batch 120/172] avg loss 0.00290226, throughput 3.04311K wps
[Epoch 176 Batch 150/172] avg loss 0.00256718, throughput 3.07245K wps
Begin Testing...
[Epoch 176] train avg loss 0.00265451, dev acc 0.8910, dev avg loss 0.296531, throughput 3.22928K wps
[Epoch 177 Batch 30/172] avg loss 0.00257561, throughput 3.15605K wps
[Epoch 177 Batch 60/172] avg loss 0.00240491, throughput 3.08911K wps
[Epoch 177 Batch 90/172] avg loss 0.00250677, throughput 3.1446K wps
[Epoch 177 Batch 120/172] avg loss 0.00309183, throughput 3.25228K wps
[Epoch 177 Batch 150/172] avg loss 0.00285328, throughput 3.26311K wps
Begin Testing...
[Epoch 177] train avg loss 0.00267813, dev acc 0.8899, dev avg loss 0.297486, throughput 3.20061K wps
[Epoch 178 Batch 30/172] avg loss 0.00256015, throughput 3.86886K wps
[Epoch 178 Batch 60/172] avg loss 0.00271976, throughput 3.3895K wps
[Epoch 178 Batch 90/172] avg loss 0.00269212, throughput 3.79714K wps
[Epoch 178 Batch 120/172] avg loss 0.00287677, throughput 4.0523K wps
[Epoch 178 Batch 150/172] avg loss 0.00276961, throughput 3.22995K wps
Begin Testing...
[Epoch 178] train avg loss 0.00270548, dev acc 0.8899, dev avg loss 0.295625, throughput 3.57745K wps
[Epoch 179 Batch 30/172] avg loss 0.00269704, throughput 2.95797K wps
[Epoch 179 Batch 60/172] avg loss 0.00259405, throughput 3.04341K wps
[Epoch 179 Batch 90/172] avg loss 0.002871, throughput 2.95686K wps
[Epoch 179 Batch 120/172] avg loss 0.00278483, throughput 3.51891K wps
[Epoch 179 Batch 150/172] avg loss 0.00283577, throughput 3.83605K wps
Begin Testing...
[Epoch 179] train avg loss 0.00269691, dev acc 0.8910, dev avg loss 0.296544, throughput 3.32679K wps
[Epoch 180 Batch 30/172] avg loss 0.00243765, throughput 3.92461K wps
[Epoch 180 Batch 60/172] avg loss 0.00271557, throughput 4.09433K wps
[Epoch 180 Batch 90/172] avg loss 0.00292907, throughput 3.24393K wps
[Epoch 180 Batch 120/172] avg loss 0.0026773, throughput 3.48367K wps
[Epoch 180 Batch 150/172] avg loss 0.00230558, throughput 3.14822K wps
Begin Testing...
[Epoch 180] train avg loss 0.00262899, dev acc 0.8889, dev avg loss 0.296201, throughput 3.50672K wps
[Epoch 181 Batch 30/172] avg loss 0.00248577, throughput 3.17318K wps
[Epoch 181 Batch 60/172] avg loss 0.0025218, throughput 3.16469K wps
[Epoch 181 Batch 90/172] avg loss 0.00258134, throughput 3.07396K wps
[Epoch 181 Batch 120/172] avg loss 0.00266613, throughput 3.06677K wps
[Epoch 181 Batch 150/172] avg loss 0.00268424, throughput 3.61277K wps
Begin Testing...
[Epoch 181] train avg loss 0.00262404, dev acc 0.8910, dev avg loss 0.296323, throughput 3.27211K wps
[Epoch 182 Batch 30/172] avg loss 0.0027117, throughput 3.24283K wps
[Epoch 182 Batch 60/172] avg loss 0.00247698, throughput 3.47542K wps
[Epoch 182 Batch 90/172] avg loss 0.00249488, throughput 3.29062K wps
[Epoch 182 Batch 120/172] avg loss 0.00288238, throughput 3.00876K wps
[Epoch 182 Batch 150/172] avg loss 0.00264854, throughput 3.00668K wps
Begin Testing...
[Epoch 182] train avg loss 0.00262658, dev acc 0.8899, dev avg loss 0.29691, throughput 3.23026K wps
[Epoch 183 Batch 30/172] avg loss 0.00248164, throughput 3.49233K wps
[Epoch 183 Batch 60/172] avg loss 0.0024414, throughput 3.06965K wps
[Epoch 183 Batch 90/172] avg loss 0.00281739, throughput 2.91924K wps
[Epoch 183 Batch 120/172] avg loss 0.00257618, throughput 2.91144K wps
[Epoch 183 Batch 150/172] avg loss 0.00227971, throughput 3.01605K wps
Begin Testing...
[Epoch 183] train avg loss 0.00260519, dev acc 0.8931, dev avg loss 0.298876, throughput 3.08597K wps
[Epoch 184 Batch 30/172] avg loss 0.00249239, throughput 3.06213K wps
[Epoch 184 Batch 60/172] avg loss 0.00263471, throughput 3.31727K wps
[Epoch 184 Batch 90/172] avg loss 0.00244707, throughput 3.08105K wps
[Epoch 184 Batch 120/172] avg loss 0.00266697, throughput 3.6955K wps
[Epoch 184 Batch 150/172] avg loss 0.00269057, throughput 3.20447K wps
Begin Testing...
[Epoch 184] train avg loss 0.00260685, dev acc 0.8941, dev avg loss 0.296353, throughput 3.21807K wps
[Epoch 185 Batch 30/172] avg loss 0.00270711, throughput 2.9666K wps
[Epoch 185 Batch 60/172] avg loss 0.00245071, throughput 3.02164K wps
[Epoch 185 Batch 90/172] avg loss 0.00258655, throughput 3.23519K wps
[Epoch 185 Batch 120/172] avg loss 0.00265815, throughput 3.26707K wps
[Epoch 185 Batch 150/172] avg loss 0.00272763, throughput 3.24821K wps
Begin Testing...
[Epoch 185] train avg loss 0.00260199, dev acc 0.8941, dev avg loss 0.296939, throughput 3.18433K wps
[Epoch 186 Batch 30/172] avg loss 0.00282391, throughput 3.02103K wps
[Epoch 186 Batch 60/172] avg loss 0.00298241, throughput 3.18725K wps
[Epoch 186 Batch 90/172] avg loss 0.00245575, throughput 3.26565K wps
[Epoch 186 Batch 120/172] avg loss 0.00239998, throughput 3.00842K wps
[Epoch 186 Batch 150/172] avg loss 0.00195919, throughput 3.52521K wps
Begin Testing...
[Epoch 186] train avg loss 0.00252108, dev acc 0.8931, dev avg loss 0.300512, throughput 3.20313K wps
[Epoch 187 Batch 30/172] avg loss 0.00242485, throughput 3.27591K wps
[Epoch 187 Batch 60/172] avg loss 0.00267333, throughput 3.69036K wps
[Epoch 187 Batch 90/172] avg loss 0.0027352, throughput 2.98192K wps
[Epoch 187 Batch 120/172] avg loss 0.00248438, throughput 3.25284K wps
[Epoch 187 Batch 150/172] avg loss 0.0028435, throughput 3.46846K wps
Begin Testing...
[Epoch 187] train avg loss 0.00260065, dev acc 0.8931, dev avg loss 0.297826, throughput 3.29969K wps
[Epoch 188 Batch 30/172] avg loss 0.00237086, throughput 3.6485K wps
[Epoch 188 Batch 60/172] avg loss 0.00264989, throughput 3.27579K wps
[Epoch 188 Batch 90/172] avg loss 0.0024878, throughput 3.32364K wps
[Epoch 188 Batch 120/172] avg loss 0.00245124, throughput 3.52155K wps
[Epoch 188 Batch 150/172] avg loss 0.00234912, throughput 3.05039K wps
Begin Testing...
[Epoch 188] train avg loss 0.00251808, dev acc 0.8910, dev avg loss 0.303431, throughput 3.30852K wps
[Epoch 189 Batch 30/172] avg loss 0.00234412, throughput 3.52755K wps
[Epoch 189 Batch 60/172] avg loss 0.0024844, throughput 3.12029K wps
[Epoch 189 Batch 90/172] avg loss 0.00247368, throughput 3.31362K wps
[Epoch 189 Batch 120/172] avg loss 0.00278242, throughput 2.85745K wps
[Epoch 189 Batch 150/172] avg loss 0.00252584, throughput 3.34615K wps
Begin Testing...
[Epoch 189] train avg loss 0.00251581, dev acc 0.8941, dev avg loss 0.300438, throughput 3.17877K wps
[Epoch 190 Batch 30/172] avg loss 0.00240038, throughput 2.89513K wps
[Epoch 190 Batch 60/172] avg loss 0.00238822, throughput 3.09009K wps
[Epoch 190 Batch 90/172] avg loss 0.00233777, throughput 2.90945K wps
[Epoch 190 Batch 120/172] avg loss 0.0029507, throughput 3.18464K wps
[Epoch 190 Batch 150/172] avg loss 0.00248547, throughput 2.94032K wps
Begin Testing...
[Epoch 190] train avg loss 0.0024976, dev acc 0.8889, dev avg loss 0.297395, throughput 3.07082K wps
[Epoch 191 Batch 30/172] avg loss 0.00249262, throughput 3.50568K wps
[Epoch 191 Batch 60/172] avg loss 0.0023261, throughput 2.99533K wps
[Epoch 191 Batch 90/172] avg loss 0.0022611, throughput 3.27538K wps
[Epoch 191 Batch 120/172] avg loss 0.00249736, throughput 3.00266K wps
[Epoch 191 Batch 150/172] avg loss 0.00279966, throughput 3.62739K wps
Begin Testing...
[Epoch 191] train avg loss 0.0024803, dev acc 0.8920, dev avg loss 0.299272, throughput 3.26663K wps
[Epoch 192 Batch 30/172] avg loss 0.00231962, throughput 3.46287K wps
[Epoch 192 Batch 60/172] avg loss 0.00274206, throughput 3.17431K wps
[Epoch 192 Batch 90/172] avg loss 0.00221863, throughput 3.66846K wps
[Epoch 192 Batch 120/172] avg loss 0.00250718, throughput 3.16223K wps
[Epoch 192 Batch 150/172] avg loss 0.00264402, throughput 3.89041K wps
Begin Testing...
[Epoch 192] train avg loss 0.00249112, dev acc 0.8920, dev avg loss 0.300423, throughput 3.39837K wps
[Epoch 193 Batch 30/172] avg loss 0.00246641, throughput 2.96577K wps
[Epoch 193 Batch 60/172] avg loss 0.00246093, throughput 3.35355K wps
[Epoch 193 Batch 90/172] avg loss 0.00245235, throughput 3.27772K wps
[Epoch 193 Batch 120/172] avg loss 0.00248942, throughput 3.03733K wps
[Epoch 193 Batch 150/172] avg loss 0.00266336, throughput 3.6594K wps
Begin Testing...
[Epoch 193] train avg loss 0.00249367, dev acc 0.8910, dev avg loss 0.297774, throughput 3.211K wps
[Epoch 194 Batch 30/172] avg loss 0.00217138, throughput 3.22101K wps
[Epoch 194 Batch 60/172] avg loss 0.0026463, throughput 3.36893K wps
[Epoch 194 Batch 90/172] avg loss 0.00257419, throughput 3.18644K wps
[Epoch 194 Batch 120/172] avg loss 0.00248401, throughput 3.25369K wps
[Epoch 194 Batch 150/172] avg loss 0.00255222, throughput 3.94875K wps
Begin Testing...
[Epoch 194] train avg loss 0.00250226, dev acc 0.8910, dev avg loss 0.301828, throughput 3.38423K wps
[Epoch 195 Batch 30/172] avg loss 0.00293057, throughput 2.89603K wps
[Epoch 195 Batch 60/172] avg loss 0.00246967, throughput 3.25986K wps
[Epoch 195 Batch 90/172] avg loss 0.00242639, throughput 3.07963K wps
[Epoch 195 Batch 120/172] avg loss 0.00230677, throughput 3.10687K wps
[Epoch 195 Batch 150/172] avg loss 0.0023421, throughput 3.00596K wps
Begin Testing...
[Epoch 195] train avg loss 0.00248857, dev acc 0.8910, dev avg loss 0.299809, throughput 3.03916K wps
[Epoch 196 Batch 30/172] avg loss 0.00230136, throughput 3.21603K wps
[Epoch 196 Batch 60/172] avg loss 0.00281014, throughput 3.5887K wps
[Epoch 196 Batch 90/172] avg loss 0.00252801, throughput 3.10233K wps
[Epoch 196 Batch 120/172] avg loss 0.00248179, throughput 3.10339K wps
[Epoch 196 Batch 150/172] avg loss 0.00211283, throughput 2.96246K wps
Begin Testing...
[Epoch 196] train avg loss 0.00242902, dev acc 0.8931, dev avg loss 0.299584, throughput 3.19454K wps
[Epoch 197 Batch 30/172] avg loss 0.00262413, throughput 3.48726K wps
[Epoch 197 Batch 60/172] avg loss 0.00228802, throughput 3.30937K wps
[Epoch 197 Batch 90/172] avg loss 0.00251026, throughput 2.96941K wps
[Epoch 197 Batch 120/172] avg loss 0.00248218, throughput 3.01242K wps
[Epoch 197 Batch 150/172] avg loss 0.0024172, throughput 3.12017K wps
Begin Testing...
[Epoch 197] train avg loss 0.00243685, dev acc 0.8931, dev avg loss 0.300463, throughput 3.21417K wps
[Epoch 198 Batch 30/172] avg loss 0.00262292, throughput 3.0474K wps
[Epoch 198 Batch 60/172] avg loss 0.00261099, throughput 3.15333K wps
[Epoch 198 Batch 90/172] avg loss 0.00218366, throughput 3.30387K wps
[Epoch 198 Batch 120/172] avg loss 0.00252439, throughput 3.47563K wps
[Epoch 198 Batch 150/172] avg loss 0.00246666, throughput 3.00278K wps
Begin Testing...
[Epoch 198] train avg loss 0.00247386, dev acc 0.8931, dev avg loss 0.301131, throughput 3.17499K wps
[Epoch 199 Batch 30/172] avg loss 0.00261662, throughput 3.09712K wps
[Epoch 199 Batch 60/172] avg loss 0.00252633, throughput 3.41778K wps
[Epoch 199 Batch 90/172] avg loss 0.00211397, throughput 3.70489K wps
[Epoch 199 Batch 120/172] avg loss 0.00277627, throughput 3.22294K wps
[Epoch 199 Batch 150/172] avg loss 0.00215792, throughput 3.11013K wps
Begin Testing...
[Epoch 199] train avg loss 0.00243091, dev acc 0.8878, dev avg loss 0.299206, throughput 3.28614K wps
Test loss 0.248095, test acc 0.8981
Total time cost 407.97s
[Epoch 0 Batch 30/172] avg loss 0.0131428, throughput 3.08623K wps
[Epoch 0 Batch 60/172] avg loss 0.0124763, throughput 3.49416K wps
[Epoch 0 Batch 90/172] avg loss 0.0123138, throughput 3.24341K wps
[Epoch 0 Batch 120/172] avg loss 0.0120415, throughput 2.98226K wps
[Epoch 0 Batch 150/172] avg loss 0.0118927, throughput 3.21373K wps
Begin Testing...
[Epoch 0] train avg loss 0.0123915, dev acc 0.7044, dev avg loss 0.590419, throughput 3.22265K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.011997, throughput 3.36457K wps
[Epoch 1 Batch 60/172] avg loss 0.0119977, throughput 2.89655K wps
[Epoch 1 Batch 90/172] avg loss 0.0119651, throughput 2.96021K wps
[Epoch 1 Batch 120/172] avg loss 0.0118177, throughput 3.3736K wps
[Epoch 1 Batch 150/172] avg loss 0.0120887, throughput 3.35866K wps
Begin Testing...
[Epoch 1] train avg loss 0.012029, dev acc 0.7044, dev avg loss 0.580617, throughput 3.16863K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0118603, throughput 3.23734K wps
[Epoch 2 Batch 60/172] avg loss 0.0118252, throughput 3.11717K wps
[Epoch 2 Batch 90/172] avg loss 0.0115846, throughput 3.49462K wps
[Epoch 2 Batch 120/172] avg loss 0.0116833, throughput 2.91054K wps
[Epoch 2 Batch 150/172] avg loss 0.0116544, throughput 3.24747K wps
Begin Testing...
[Epoch 2] train avg loss 0.0117584, dev acc 0.7044, dev avg loss 0.565489, throughput 3.18143K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0118641, throughput 3.01258K wps
[Epoch 3 Batch 60/172] avg loss 0.0113308, throughput 3.07072K wps
[Epoch 3 Batch 90/172] avg loss 0.0114321, throughput 3.17722K wps
[Epoch 3 Batch 120/172] avg loss 0.0112772, throughput 3.20628K wps
[Epoch 3 Batch 150/172] avg loss 0.0111246, throughput 3.47298K wps
Begin Testing...
[Epoch 3] train avg loss 0.011431, dev acc 0.7044, dev avg loss 0.548799, throughput 3.18827K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0114485, throughput 3.03509K wps
[Epoch 4 Batch 60/172] avg loss 0.011088, throughput 3.16863K wps
[Epoch 4 Batch 90/172] avg loss 0.0109684, throughput 3.34506K wps
[Epoch 4 Batch 120/172] avg loss 0.0109612, throughput 3.02195K wps
[Epoch 4 Batch 150/172] avg loss 0.011054, throughput 3.4119K wps
Begin Testing...
[Epoch 4] train avg loss 0.0110775, dev acc 0.7149, dev avg loss 0.532043, throughput 3.15726K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0109185, throughput 2.98112K wps
[Epoch 5 Batch 60/172] avg loss 0.0108649, throughput 3.49155K wps
[Epoch 5 Batch 90/172] avg loss 0.0109532, throughput 3.03645K wps
[Epoch 5 Batch 120/172] avg loss 0.0105741, throughput 3.0677K wps
[Epoch 5 Batch 150/172] avg loss 0.0103332, throughput 3.22942K wps
Begin Testing...
[Epoch 5] train avg loss 0.0107121, dev acc 0.7338, dev avg loss 0.512116, throughput 3.1897K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0103806, throughput 2.98687K wps
[Epoch 6 Batch 60/172] avg loss 0.0105165, throughput 3.05742K wps
[Epoch 6 Batch 90/172] avg loss 0.0102628, throughput 3.06382K wps
[Epoch 6 Batch 120/172] avg loss 0.0100482, throughput 3.46841K wps
[Epoch 6 Batch 150/172] avg loss 0.0101476, throughput 3.39929K wps
Begin Testing...
[Epoch 6] train avg loss 0.0102398, dev acc 0.7715, dev avg loss 0.490342, throughput 3.2003K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00979837, throughput 3.42385K wps
[Epoch 7 Batch 60/172] avg loss 0.00984624, throughput 3.56092K wps
[Epoch 7 Batch 90/172] avg loss 0.0099196, throughput 3.57677K wps
[Epoch 7 Batch 120/172] avg loss 0.00977563, throughput 2.9121K wps
[Epoch 7 Batch 150/172] avg loss 0.00950556, throughput 2.92458K wps
Begin Testing...
[Epoch 7] train avg loss 0.0097611, dev acc 0.8029, dev avg loss 0.469245, throughput 3.31135K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00928619, throughput 3.08066K wps
[Epoch 8 Batch 60/172] avg loss 0.00960423, throughput 3.24201K wps
[Epoch 8 Batch 90/172] avg loss 0.00935321, throughput 2.95578K wps
[Epoch 8 Batch 120/172] avg loss 0.00892197, throughput 3.01886K wps
[Epoch 8 Batch 150/172] avg loss 0.00920249, throughput 3.12961K wps
Begin Testing...
[Epoch 8] train avg loss 0.00926204, dev acc 0.8312, dev avg loss 0.446926, throughput 3.11742K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00892272, throughput 3.30656K wps
[Epoch 9 Batch 60/172] avg loss 0.00888036, throughput 3.03134K wps
[Epoch 9 Batch 90/172] avg loss 0.00902597, throughput 3.21198K wps
[Epoch 9 Batch 120/172] avg loss 0.00864465, throughput 3.76765K wps
[Epoch 9 Batch 150/172] avg loss 0.00858364, throughput 3.08886K wps
Begin Testing...
[Epoch 9] train avg loss 0.00880719, dev acc 0.8312, dev avg loss 0.423132, throughput 3.34708K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00841892, throughput 3.03964K wps
[Epoch 10 Batch 60/172] avg loss 0.00795692, throughput 3.93454K wps
[Epoch 10 Batch 90/172] avg loss 0.00848305, throughput 3.4744K wps
[Epoch 10 Batch 120/172] avg loss 0.00832743, throughput 3.20656K wps
[Epoch 10 Batch 150/172] avg loss 0.0083117, throughput 3.29918K wps
Begin Testing...
[Epoch 10] train avg loss 0.00831999, dev acc 0.8585, dev avg loss 0.405509, throughput 3.39361K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00796917, throughput 3.05132K wps
[Epoch 11 Batch 60/172] avg loss 0.00798281, throughput 3.71944K wps
[Epoch 11 Batch 90/172] avg loss 0.00780722, throughput 3.41255K wps
[Epoch 11 Batch 120/172] avg loss 0.00803876, throughput 3.18598K wps
[Epoch 11 Batch 150/172] avg loss 0.00776195, throughput 3.70404K wps
Begin Testing...
[Epoch 11] train avg loss 0.00793903, dev acc 0.8564, dev avg loss 0.387468, throughput 3.42871K wps
[Epoch 12 Batch 30/172] avg loss 0.00788529, throughput 3.57681K wps
[Epoch 12 Batch 60/172] avg loss 0.00793195, throughput 3.22148K wps
[Epoch 12 Batch 90/172] avg loss 0.00749271, throughput 3.06132K wps
[Epoch 12 Batch 120/172] avg loss 0.00736485, throughput 3.51087K wps
[Epoch 12 Batch 150/172] avg loss 0.00731105, throughput 3.4161K wps
Begin Testing...
[Epoch 12] train avg loss 0.00760636, dev acc 0.8564, dev avg loss 0.372565, throughput 3.36253K wps
[Epoch 13 Batch 30/172] avg loss 0.00760388, throughput 3.35374K wps
[Epoch 13 Batch 60/172] avg loss 0.00757828, throughput 3.11085K wps
[Epoch 13 Batch 90/172] avg loss 0.00755788, throughput 3.09853K wps
[Epoch 13 Batch 120/172] avg loss 0.00720234, throughput 3.73009K wps
[Epoch 13 Batch 150/172] avg loss 0.00711487, throughput 3.55949K wps
Begin Testing...
[Epoch 13] train avg loss 0.00732971, dev acc 0.8553, dev avg loss 0.360516, throughput 3.32896K wps
[Epoch 14 Batch 30/172] avg loss 0.00707279, throughput 2.97505K wps
[Epoch 14 Batch 60/172] avg loss 0.00700882, throughput 3.2881K wps
[Epoch 14 Batch 90/172] avg loss 0.0073682, throughput 3.15659K wps
[Epoch 14 Batch 120/172] avg loss 0.00704836, throughput 3.18022K wps
[Epoch 14 Batch 150/172] avg loss 0.00736453, throughput 3.2515K wps
Begin Testing...
[Epoch 14] train avg loss 0.00708505, dev acc 0.8595, dev avg loss 0.350207, throughput 3.23826K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00682662, throughput 3.44971K wps
[Epoch 15 Batch 60/172] avg loss 0.00684694, throughput 3.43471K wps
[Epoch 15 Batch 90/172] avg loss 0.0065451, throughput 3.09646K wps
[Epoch 15 Batch 120/172] avg loss 0.0068602, throughput 3.1101K wps
[Epoch 15 Batch 150/172] avg loss 0.00698996, throughput 3.21791K wps
Begin Testing...
[Epoch 15] train avg loss 0.00686035, dev acc 0.8669, dev avg loss 0.342465, throughput 3.22181K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00676052, throughput 3.01729K wps
[Epoch 16 Batch 60/172] avg loss 0.00686354, throughput 3.08951K wps
[Epoch 16 Batch 90/172] avg loss 0.00687649, throughput 2.99535K wps
[Epoch 16 Batch 120/172] avg loss 0.00642024, throughput 3.14086K wps
[Epoch 16 Batch 150/172] avg loss 0.00659092, throughput 3.09836K wps
Begin Testing...
[Epoch 16] train avg loss 0.00672168, dev acc 0.8637, dev avg loss 0.335935, throughput 3.08642K wps
[Epoch 17 Batch 30/172] avg loss 0.00667567, throughput 3.02573K wps
[Epoch 17 Batch 60/172] avg loss 0.00616392, throughput 3.5164K wps
[Epoch 17 Batch 90/172] avg loss 0.00635932, throughput 3.07298K wps
[Epoch 17 Batch 120/172] avg loss 0.00699217, throughput 3.46514K wps
[Epoch 17 Batch 150/172] avg loss 0.00640697, throughput 3.01268K wps
Begin Testing...
[Epoch 17] train avg loss 0.0065133, dev acc 0.8690, dev avg loss 0.330056, throughput 3.18015K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00631902, throughput 3.95233K wps
[Epoch 18 Batch 60/172] avg loss 0.00616001, throughput 3.01857K wps
[Epoch 18 Batch 90/172] avg loss 0.006356, throughput 3.39652K wps
[Epoch 18 Batch 120/172] avg loss 0.0065353, throughput 3.10165K wps
[Epoch 18 Batch 150/172] avg loss 0.00628508, throughput 4.09832K wps
Begin Testing...
[Epoch 18] train avg loss 0.00636359, dev acc 0.8721, dev avg loss 0.325227, throughput 3.46411K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00639169, throughput 3.22216K wps
[Epoch 19 Batch 60/172] avg loss 0.00659259, throughput 3.0542K wps
[Epoch 19 Batch 90/172] avg loss 0.00631485, throughput 2.97687K wps
[Epoch 19 Batch 120/172] avg loss 0.00611719, throughput 3.19338K wps
[Epoch 19 Batch 150/172] avg loss 0.00604318, throughput 4.22228K wps
Begin Testing...
[Epoch 19] train avg loss 0.00626941, dev acc 0.8732, dev avg loss 0.321052, throughput 3.27204K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00664816, throughput 2.98912K wps
[Epoch 20 Batch 60/172] avg loss 0.00626823, throughput 3.48725K wps
[Epoch 20 Batch 90/172] avg loss 0.00593203, throughput 3.09329K wps
[Epoch 20 Batch 120/172] avg loss 0.00595847, throughput 3.39882K wps
[Epoch 20 Batch 150/172] avg loss 0.00589022, throughput 3.40212K wps
Begin Testing...
[Epoch 20] train avg loss 0.00617125, dev acc 0.8721, dev avg loss 0.318, throughput 3.27877K wps
[Epoch 21 Batch 30/172] avg loss 0.00624515, throughput 3.44787K wps
[Epoch 21 Batch 60/172] avg loss 0.00591656, throughput 3.25031K wps
[Epoch 21 Batch 90/172] avg loss 0.00612125, throughput 3.4471K wps
[Epoch 21 Batch 120/172] avg loss 0.00626221, throughput 3.65029K wps
[Epoch 21 Batch 150/172] avg loss 0.00555277, throughput 3.13261K wps
Begin Testing...
[Epoch 21] train avg loss 0.00606315, dev acc 0.8732, dev avg loss 0.315248, throughput 3.32812K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00556931, throughput 3.47417K wps
[Epoch 22 Batch 60/172] avg loss 0.00603174, throughput 3.05792K wps
[Epoch 22 Batch 90/172] avg loss 0.00607749, throughput 4.0281K wps
[Epoch 22 Batch 120/172] avg loss 0.00597692, throughput 3.51523K wps
[Epoch 22 Batch 150/172] avg loss 0.0062357, throughput 4.24822K wps
Begin Testing...
[Epoch 22] train avg loss 0.00596221, dev acc 0.8721, dev avg loss 0.312624, throughput 3.57944K wps
[Epoch 23 Batch 30/172] avg loss 0.00580337, throughput 3.79411K wps
[Epoch 23 Batch 60/172] avg loss 0.00597773, throughput 3.29444K wps
[Epoch 23 Batch 90/172] avg loss 0.00600664, throughput 2.936K wps
[Epoch 23 Batch 120/172] avg loss 0.00566232, throughput 3.38409K wps
[Epoch 23 Batch 150/172] avg loss 0.00586245, throughput 2.91931K wps
Begin Testing...
[Epoch 23] train avg loss 0.00590697, dev acc 0.8816, dev avg loss 0.311065, throughput 3.2401K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00585091, throughput 2.96852K wps
[Epoch 24 Batch 60/172] avg loss 0.00557172, throughput 3.48747K wps
[Epoch 24 Batch 90/172] avg loss 0.00583849, throughput 3.3087K wps
[Epoch 24 Batch 120/172] avg loss 0.00580619, throughput 2.9189K wps
[Epoch 24 Batch 150/172] avg loss 0.00606055, throughput 2.89702K wps
Begin Testing...
[Epoch 24] train avg loss 0.00580276, dev acc 0.8721, dev avg loss 0.308507, throughput 3.10154K wps
[Epoch 25 Batch 30/172] avg loss 0.00571676, throughput 3.11787K wps
[Epoch 25 Batch 60/172] avg loss 0.0060013, throughput 3.21756K wps
[Epoch 25 Batch 90/172] avg loss 0.0057224, throughput 2.9227K wps
[Epoch 25 Batch 120/172] avg loss 0.00603684, throughput 3.28413K wps
[Epoch 25 Batch 150/172] avg loss 0.00576968, throughput 3.53357K wps
Begin Testing...
[Epoch 25] train avg loss 0.00583841, dev acc 0.8805, dev avg loss 0.307514, throughput 3.22357K wps
[Epoch 26 Batch 30/172] avg loss 0.00548602, throughput 3.39374K wps
[Epoch 26 Batch 60/172] avg loss 0.00566034, throughput 3.8717K wps
[Epoch 26 Batch 90/172] avg loss 0.00607051, throughput 3.60073K wps
[Epoch 26 Batch 120/172] avg loss 0.00572695, throughput 3.5218K wps
[Epoch 26 Batch 150/172] avg loss 0.00585646, throughput 3.0226K wps
Begin Testing...
[Epoch 26] train avg loss 0.00572291, dev acc 0.8742, dev avg loss 0.305419, throughput 3.4062K wps
[Epoch 27 Batch 30/172] avg loss 0.00558961, throughput 3.08409K wps
[Epoch 27 Batch 60/172] avg loss 0.00545599, throughput 2.88663K wps
[Epoch 27 Batch 90/172] avg loss 0.00575598, throughput 3.09317K wps
[Epoch 27 Batch 120/172] avg loss 0.00601216, throughput 3.42905K wps
[Epoch 27 Batch 150/172] avg loss 0.00594678, throughput 3.24861K wps
Begin Testing...
[Epoch 27] train avg loss 0.00573342, dev acc 0.8763, dev avg loss 0.304006, throughput 3.18806K wps
[Epoch 28 Batch 30/172] avg loss 0.00545333, throughput 3.62108K wps
[Epoch 28 Batch 60/172] avg loss 0.00539429, throughput 3.18532K wps
[Epoch 28 Batch 90/172] avg loss 0.00580237, throughput 3.48029K wps
[Epoch 28 Batch 120/172] avg loss 0.00550063, throughput 3.50019K wps
[Epoch 28 Batch 150/172] avg loss 0.00532259, throughput 3.27096K wps
Begin Testing...
[Epoch 28] train avg loss 0.00556419, dev acc 0.8774, dev avg loss 0.302803, throughput 3.42546K wps
[Epoch 29 Batch 30/172] avg loss 0.0057608, throughput 3.50018K wps
[Epoch 29 Batch 60/172] avg loss 0.00559955, throughput 4.24249K wps
[Epoch 29 Batch 90/172] avg loss 0.00595788, throughput 3.91293K wps
[Epoch 29 Batch 120/172] avg loss 0.00547571, throughput 3.10704K wps
[Epoch 29 Batch 150/172] avg loss 0.0051236, throughput 3.41107K wps
Begin Testing...
[Epoch 29] train avg loss 0.0055558, dev acc 0.8774, dev avg loss 0.302494, throughput 3.61627K wps
[Epoch 30 Batch 30/172] avg loss 0.0055119, throughput 2.98212K wps
[Epoch 30 Batch 60/172] avg loss 0.00534082, throughput 3.2207K wps
[Epoch 30 Batch 90/172] avg loss 0.00555905, throughput 3.33367K wps
[Epoch 30 Batch 120/172] avg loss 0.00571938, throughput 3.7341K wps
[Epoch 30 Batch 150/172] avg loss 0.00515788, throughput 3.28717K wps
Begin Testing...
[Epoch 30] train avg loss 0.00546949, dev acc 0.8784, dev avg loss 0.30061, throughput 3.32388K wps
[Epoch 31 Batch 30/172] avg loss 0.00553872, throughput 3.59312K wps
[Epoch 31 Batch 60/172] avg loss 0.00557428, throughput 3.51035K wps
[Epoch 31 Batch 90/172] avg loss 0.00555359, throughput 3.33741K wps
[Epoch 31 Batch 120/172] avg loss 0.00564407, throughput 3.34971K wps
[Epoch 31 Batch 150/172] avg loss 0.00509429, throughput 4.01769K wps
Begin Testing...
[Epoch 31] train avg loss 0.00546533, dev acc 0.8774, dev avg loss 0.299732, throughput 3.55278K wps
[Epoch 32 Batch 30/172] avg loss 0.00547374, throughput 3.16025K wps
[Epoch 32 Batch 60/172] avg loss 0.00532957, throughput 3.09057K wps
[Epoch 32 Batch 90/172] avg loss 0.00492862, throughput 3.52351K wps
[Epoch 32 Batch 120/172] avg loss 0.00559236, throughput 3.18707K wps
[Epoch 32 Batch 150/172] avg loss 0.00553083, throughput 3.00865K wps
Begin Testing...
[Epoch 32] train avg loss 0.0053687, dev acc 0.8784, dev avg loss 0.298407, throughput 3.2207K wps
[Epoch 33 Batch 30/172] avg loss 0.00555, throughput 3.62253K wps
[Epoch 33 Batch 60/172] avg loss 0.00556953, throughput 4.04651K wps
[Epoch 33 Batch 90/172] avg loss 0.00543006, throughput 3.86433K wps
[Epoch 33 Batch 120/172] avg loss 0.00527154, throughput 2.96583K wps
[Epoch 33 Batch 150/172] avg loss 0.00516715, throughput 3.5523K wps
Begin Testing...
[Epoch 33] train avg loss 0.00536285, dev acc 0.8805, dev avg loss 0.297693, throughput 3.50813K wps
[Epoch 34 Batch 30/172] avg loss 0.00526053, throughput 3.17758K wps
[Epoch 34 Batch 60/172] avg loss 0.00520314, throughput 3.05954K wps
[Epoch 34 Batch 90/172] avg loss 0.00546861, throughput 3.1875K wps
[Epoch 34 Batch 120/172] avg loss 0.00503365, throughput 3.31835K wps
[Epoch 34 Batch 150/172] avg loss 0.00601532, throughput 2.98863K wps
Begin Testing...
[Epoch 34] train avg loss 0.00539363, dev acc 0.8795, dev avg loss 0.297099, throughput 3.12835K wps
[Epoch 35 Batch 30/172] avg loss 0.00510753, throughput 3.43117K wps
[Epoch 35 Batch 60/172] avg loss 0.00540222, throughput 3.36444K wps
[Epoch 35 Batch 90/172] avg loss 0.00488898, throughput 3.43397K wps
[Epoch 35 Batch 120/172] avg loss 0.00517528, throughput 3.18893K wps
[Epoch 35 Batch 150/172] avg loss 0.00586079, throughput 3.02555K wps
Begin Testing...
[Epoch 35] train avg loss 0.00529257, dev acc 0.8795, dev avg loss 0.296031, throughput 3.27027K wps
[Epoch 36 Batch 30/172] avg loss 0.00529313, throughput 3.15315K wps
[Epoch 36 Batch 60/172] avg loss 0.0049977, throughput 3.39286K wps
[Epoch 36 Batch 90/172] avg loss 0.00533152, throughput 3.27511K wps
[Epoch 36 Batch 120/172] avg loss 0.00534461, throughput 3.81829K wps
[Epoch 36 Batch 150/172] avg loss 0.00524369, throughput 3.64221K wps
Begin Testing...
[Epoch 36] train avg loss 0.00526089, dev acc 0.8795, dev avg loss 0.295163, throughput 3.41713K wps
[Epoch 37 Batch 30/172] avg loss 0.005212, throughput 3.21398K wps
[Epoch 37 Batch 60/172] avg loss 0.00544735, throughput 3.18166K wps
[Epoch 37 Batch 90/172] avg loss 0.00504174, throughput 3.1867K wps
[Epoch 37 Batch 120/172] avg loss 0.00551583, throughput 3.81137K wps
[Epoch 37 Batch 150/172] avg loss 0.0050265, throughput 3.19809K wps
Begin Testing...
[Epoch 37] train avg loss 0.00523593, dev acc 0.8795, dev avg loss 0.294341, throughput 3.35522K wps
[Epoch 38 Batch 30/172] avg loss 0.00542355, throughput 3.00739K wps
[Epoch 38 Batch 60/172] avg loss 0.00550027, throughput 3.57412K wps
[Epoch 38 Batch 90/172] avg loss 0.0048208, throughput 3.14785K wps
[Epoch 38 Batch 120/172] avg loss 0.005171, throughput 3.01692K wps
[Epoch 38 Batch 150/172] avg loss 0.00503678, throughput 3.55784K wps
Begin Testing...
[Epoch 38] train avg loss 0.0051631, dev acc 0.8836, dev avg loss 0.294597, throughput 3.3082K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/172] avg loss 0.0057343, throughput 3.28946K wps
[Epoch 39 Batch 60/172] avg loss 0.00492689, throughput 3.60808K wps
[Epoch 39 Batch 90/172] avg loss 0.00543972, throughput 3.23677K wps
[Epoch 39 Batch 120/172] avg loss 0.00494994, throughput 3.02348K wps
[Epoch 39 Batch 150/172] avg loss 0.00494345, throughput 3.177K wps
Begin Testing...
[Epoch 39] train avg loss 0.00517902, dev acc 0.8816, dev avg loss 0.293961, throughput 3.21011K wps
[Epoch 40 Batch 30/172] avg loss 0.00492781, throughput 3.33364K wps
[Epoch 40 Batch 60/172] avg loss 0.00512029, throughput 2.96162K wps
[Epoch 40 Batch 90/172] avg loss 0.00504844, throughput 3.07995K wps
[Epoch 40 Batch 120/172] avg loss 0.00533818, throughput 3.51491K wps
[Epoch 40 Batch 150/172] avg loss 0.00546359, throughput 2.93561K wps
Begin Testing...
[Epoch 40] train avg loss 0.00515929, dev acc 0.8795, dev avg loss 0.29239, throughput 3.25041K wps
[Epoch 41 Batch 30/172] avg loss 0.00522227, throughput 3.24211K wps
[Epoch 41 Batch 60/172] avg loss 0.00484336, throughput 3.23519K wps
[Epoch 41 Batch 90/172] avg loss 0.00523415, throughput 3.2438K wps
[Epoch 41 Batch 120/172] avg loss 0.00496838, throughput 3.54019K wps
[Epoch 41 Batch 150/172] avg loss 0.00506426, throughput 3.54847K wps
Begin Testing...
[Epoch 41] train avg loss 0.00511139, dev acc 0.8847, dev avg loss 0.292865, throughput 3.36665K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.00513925, throughput 3.34877K wps
[Epoch 42 Batch 60/172] avg loss 0.00529085, throughput 3.08077K wps
[Epoch 42 Batch 90/172] avg loss 0.00484439, throughput 3.67205K wps
[Epoch 42 Batch 120/172] avg loss 0.00486904, throughput 3.64733K wps
[Epoch 42 Batch 150/172] avg loss 0.00506044, throughput 3.23373K wps
Begin Testing...
[Epoch 42] train avg loss 0.00505498, dev acc 0.8795, dev avg loss 0.291147, throughput 3.44874K wps
[Epoch 43 Batch 30/172] avg loss 0.00488998, throughput 3.04662K wps
[Epoch 43 Batch 60/172] avg loss 0.00506584, throughput 3.48435K wps
[Epoch 43 Batch 90/172] avg loss 0.00522186, throughput 3.19816K wps
[Epoch 43 Batch 120/172] avg loss 0.00480696, throughput 3.39627K wps
[Epoch 43 Batch 150/172] avg loss 0.00476445, throughput 3.10453K wps
Begin Testing...
[Epoch 43] train avg loss 0.00497367, dev acc 0.8826, dev avg loss 0.29133, throughput 3.21503K wps
[Epoch 44 Batch 30/172] avg loss 0.00525093, throughput 3.053K wps
[Epoch 44 Batch 60/172] avg loss 0.00520206, throughput 3.39963K wps
[Epoch 44 Batch 90/172] avg loss 0.00475528, throughput 2.99886K wps
[Epoch 44 Batch 120/172] avg loss 0.00497008, throughput 3.10235K wps
[Epoch 44 Batch 150/172] avg loss 0.00503202, throughput 3.26104K wps
Begin Testing...
[Epoch 44] train avg loss 0.00496903, dev acc 0.8805, dev avg loss 0.290148, throughput 3.13522K wps
[Epoch 45 Batch 30/172] avg loss 0.00482634, throughput 3.46497K wps
[Epoch 45 Batch 60/172] avg loss 0.00497375, throughput 3.29703K wps
[Epoch 45 Batch 90/172] avg loss 0.00482753, throughput 3.76765K wps
[Epoch 45 Batch 120/172] avg loss 0.00506363, throughput 3.4942K wps
[Epoch 45 Batch 150/172] avg loss 0.00479402, throughput 3.3315K wps
Begin Testing...
[Epoch 45] train avg loss 0.00494388, dev acc 0.8784, dev avg loss 0.28994, throughput 3.42603K wps
[Epoch 46 Batch 30/172] avg loss 0.00484425, throughput 2.96828K wps
[Epoch 46 Batch 60/172] avg loss 0.00492215, throughput 3.05133K wps
[Epoch 46 Batch 90/172] avg loss 0.00444153, throughput 3.71502K wps
[Epoch 46 Batch 120/172] avg loss 0.00549308, throughput 4.09216K wps
[Epoch 46 Batch 150/172] avg loss 0.00497785, throughput 3.23395K wps
Begin Testing...
[Epoch 46] train avg loss 0.00494577, dev acc 0.8784, dev avg loss 0.289233, throughput 3.36349K wps
[Epoch 47 Batch 30/172] avg loss 0.00479453, throughput 3.08195K wps
[Epoch 47 Batch 60/172] avg loss 0.00478986, throughput 3.40739K wps
[Epoch 47 Batch 90/172] avg loss 0.00500868, throughput 3.69668K wps
[Epoch 47 Batch 120/172] avg loss 0.00498388, throughput 3.82687K wps
[Epoch 47 Batch 150/172] avg loss 0.00477108, throughput 3.28051K wps
Begin Testing...
[Epoch 47] train avg loss 0.00487523, dev acc 0.8795, dev avg loss 0.288726, throughput 3.47772K wps
[Epoch 48 Batch 30/172] avg loss 0.00531149, throughput 3.04966K wps
[Epoch 48 Batch 60/172] avg loss 0.00466469, throughput 3.13525K wps
[Epoch 48 Batch 90/172] avg loss 0.00481445, throughput 3.17829K wps
[Epoch 48 Batch 120/172] avg loss 0.00463933, throughput 3.05871K wps
[Epoch 48 Batch 150/172] avg loss 0.0048756, throughput 4.15575K wps
Begin Testing...
[Epoch 48] train avg loss 0.00488377, dev acc 0.8805, dev avg loss 0.288499, throughput 3.34984K wps
[Epoch 49 Batch 30/172] avg loss 0.00502528, throughput 3.5202K wps
[Epoch 49 Batch 60/172] avg loss 0.00461825, throughput 3.01523K wps
[Epoch 49 Batch 90/172] avg loss 0.00471816, throughput 3.11168K wps
[Epoch 49 Batch 120/172] avg loss 0.00497361, throughput 3.23421K wps
[Epoch 49 Batch 150/172] avg loss 0.00470138, throughput 3.67086K wps
Begin Testing...
[Epoch 49] train avg loss 0.00483275, dev acc 0.8816, dev avg loss 0.288125, throughput 3.32712K wps
[Epoch 50 Batch 30/172] avg loss 0.00442276, throughput 3.72115K wps
[Epoch 50 Batch 60/172] avg loss 0.00475515, throughput 3.15721K wps
[Epoch 50 Batch 90/172] avg loss 0.00510736, throughput 3.77859K wps
[Epoch 50 Batch 120/172] avg loss 0.00455048, throughput 3.87398K wps
[Epoch 50 Batch 150/172] avg loss 0.00506537, throughput 3.26238K wps
Begin Testing...
[Epoch 50] train avg loss 0.00479813, dev acc 0.8836, dev avg loss 0.289106, throughput 3.48312K wps
[Epoch 51 Batch 30/172] avg loss 0.00480438, throughput 3.07155K wps
[Epoch 51 Batch 60/172] avg loss 0.00441351, throughput 3.57297K wps
[Epoch 51 Batch 90/172] avg loss 0.00484475, throughput 3.14506K wps
[Epoch 51 Batch 120/172] avg loss 0.00487658, throughput 2.99044K wps
[Epoch 51 Batch 150/172] avg loss 0.00468974, throughput 3.21191K wps
Begin Testing...
[Epoch 51] train avg loss 0.00473156, dev acc 0.8836, dev avg loss 0.287207, throughput 3.19279K wps
[Epoch 52 Batch 30/172] avg loss 0.00469471, throughput 3.14501K wps
[Epoch 52 Batch 60/172] avg loss 0.00482115, throughput 3.11263K wps
[Epoch 52 Batch 90/172] avg loss 0.00448857, throughput 3.03256K wps
[Epoch 52 Batch 120/172] avg loss 0.00504254, throughput 3.27689K wps
[Epoch 52 Batch 150/172] avg loss 0.00467892, throughput 3.27835K wps
Begin Testing...
[Epoch 52] train avg loss 0.00477531, dev acc 0.8805, dev avg loss 0.286852, throughput 3.15656K wps
[Epoch 53 Batch 30/172] avg loss 0.00450337, throughput 3.16433K wps
[Epoch 53 Batch 60/172] avg loss 0.00470892, throughput 3.46051K wps
[Epoch 53 Batch 90/172] avg loss 0.00449731, throughput 3.12309K wps
[Epoch 53 Batch 120/172] avg loss 0.00463329, throughput 3.35188K wps
[Epoch 53 Batch 150/172] avg loss 0.00471547, throughput 2.97468K wps
Begin Testing...
[Epoch 53] train avg loss 0.00467456, dev acc 0.8784, dev avg loss 0.286238, throughput 3.21812K wps
[Epoch 54 Batch 30/172] avg loss 0.00479818, throughput 3.36328K wps
[Epoch 54 Batch 60/172] avg loss 0.00468568, throughput 3.22385K wps
[Epoch 54 Batch 90/172] avg loss 0.00467522, throughput 3.37165K wps
[Epoch 54 Batch 120/172] avg loss 0.0044997, throughput 3.38474K wps
[Epoch 54 Batch 150/172] avg loss 0.00487044, throughput 3.89054K wps
Begin Testing...
[Epoch 54] train avg loss 0.00474818, dev acc 0.8847, dev avg loss 0.28655, throughput 3.40146K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/172] avg loss 0.00441837, throughput 3.71396K wps
[Epoch 55 Batch 60/172] avg loss 0.00527056, throughput 3.51411K wps
[Epoch 55 Batch 90/172] avg loss 0.00469949, throughput 3.44638K wps
[Epoch 55 Batch 120/172] avg loss 0.00469657, throughput 2.97036K wps
[Epoch 55 Batch 150/172] avg loss 0.00434336, throughput 3.22504K wps
Begin Testing...
[Epoch 55] train avg loss 0.00468128, dev acc 0.8805, dev avg loss 0.285543, throughput 3.33549K wps
[Epoch 56 Batch 30/172] avg loss 0.00461578, throughput 3.01823K wps
[Epoch 56 Batch 60/172] avg loss 0.00472691, throughput 3.12643K wps
[Epoch 56 Batch 90/172] avg loss 0.00458255, throughput 3.43675K wps
[Epoch 56 Batch 120/172] avg loss 0.00471845, throughput 3.74814K wps
[Epoch 56 Batch 150/172] avg loss 0.00489266, throughput 3.10419K wps
Begin Testing...
[Epoch 56] train avg loss 0.00469635, dev acc 0.8847, dev avg loss 0.286583, throughput 3.2244K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/172] avg loss 0.00459841, throughput 3.10014K wps
[Epoch 57 Batch 60/172] avg loss 0.00494553, throughput 3.3558K wps
[Epoch 57 Batch 90/172] avg loss 0.004604, throughput 3.47055K wps
[Epoch 57 Batch 120/172] avg loss 0.00437, throughput 3.69925K wps
[Epoch 57 Batch 150/172] avg loss 0.00434093, throughput 3.50402K wps
Begin Testing...
[Epoch 57] train avg loss 0.00461482, dev acc 0.8816, dev avg loss 0.28495, throughput 3.40444K wps
[Epoch 58 Batch 30/172] avg loss 0.0047812, throughput 3.43891K wps
[Epoch 58 Batch 60/172] avg loss 0.00461843, throughput 3.58904K wps
[Epoch 58 Batch 90/172] avg loss 0.00485089, throughput 3.44384K wps
[Epoch 58 Batch 120/172] avg loss 0.00447516, throughput 2.97852K wps
[Epoch 58 Batch 150/172] avg loss 0.00443977, throughput 3.86221K wps
Begin Testing...
[Epoch 58] train avg loss 0.00461682, dev acc 0.8847, dev avg loss 0.284838, throughput 3.38971K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/172] avg loss 0.00453377, throughput 3.29001K wps
[Epoch 59 Batch 60/172] avg loss 0.00434059, throughput 3.53247K wps
[Epoch 59 Batch 90/172] avg loss 0.00488701, throughput 3.19126K wps
[Epoch 59 Batch 120/172] avg loss 0.00435179, throughput 3.43976K wps
[Epoch 59 Batch 150/172] avg loss 0.00443042, throughput 3.68069K wps
Begin Testing...
[Epoch 59] train avg loss 0.00461413, dev acc 0.8826, dev avg loss 0.284052, throughput 3.46056K wps
[Epoch 60 Batch 30/172] avg loss 0.0047641, throughput 3.35586K wps
[Epoch 60 Batch 60/172] avg loss 0.00443196, throughput 3.73947K wps
[Epoch 60 Batch 90/172] avg loss 0.00414429, throughput 3.43129K wps
[Epoch 60 Batch 120/172] avg loss 0.00453895, throughput 3.47117K wps
[Epoch 60 Batch 150/172] avg loss 0.00455807, throughput 3.79046K wps
Begin Testing...
[Epoch 60] train avg loss 0.00456181, dev acc 0.8826, dev avg loss 0.283664, throughput 3.47353K wps
[Epoch 61 Batch 30/172] avg loss 0.00444975, throughput 3.61772K wps
[Epoch 61 Batch 60/172] avg loss 0.00471465, throughput 2.99444K wps
[Epoch 61 Batch 90/172] avg loss 0.00419933, throughput 3.19542K wps
[Epoch 61 Batch 120/172] avg loss 0.0042954, throughput 3.24957K wps
[Epoch 61 Batch 150/172] avg loss 0.00514085, throughput 3.65771K wps
Begin Testing...
[Epoch 61] train avg loss 0.00456407, dev acc 0.8836, dev avg loss 0.284019, throughput 3.37273K wps
[Epoch 62 Batch 30/172] avg loss 0.00444916, throughput 3.46509K wps
[Epoch 62 Batch 60/172] avg loss 0.00414047, throughput 3.14121K wps
[Epoch 62 Batch 90/172] avg loss 0.00462263, throughput 3.3793K wps
[Epoch 62 Batch 120/172] avg loss 0.00494112, throughput 3.09571K wps
[Epoch 62 Batch 150/172] avg loss 0.00406777, throughput 3.3643K wps
Begin Testing...
[Epoch 62] train avg loss 0.0044694, dev acc 0.8868, dev avg loss 0.28341, throughput 3.26945K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/172] avg loss 0.00389023, throughput 3.23715K wps
[Epoch 63 Batch 60/172] avg loss 0.00467879, throughput 3.60877K wps
[Epoch 63 Batch 90/172] avg loss 0.00464587, throughput 3.34548K wps
[Epoch 63 Batch 120/172] avg loss 0.00462986, throughput 3.00736K wps
[Epoch 63 Batch 150/172] avg loss 0.00503671, throughput 3.27236K wps
Begin Testing...
[Epoch 63] train avg loss 0.00450768, dev acc 0.8857, dev avg loss 0.28455, throughput 3.31475K wps
[Epoch 64 Batch 30/172] avg loss 0.0041293, throughput 3.05992K wps
[Epoch 64 Batch 60/172] avg loss 0.00451612, throughput 3.21435K wps
[Epoch 64 Batch 90/172] avg loss 0.00472832, throughput 3.45304K wps
[Epoch 64 Batch 120/172] avg loss 0.00409826, throughput 3.80903K wps
[Epoch 64 Batch 150/172] avg loss 0.00448952, throughput 3.26626K wps
Begin Testing...
[Epoch 64] train avg loss 0.0044663, dev acc 0.8868, dev avg loss 0.282444, throughput 3.31708K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/172] avg loss 0.00461461, throughput 2.93763K wps
[Epoch 65 Batch 60/172] avg loss 0.00436807, throughput 3.19259K wps
[Epoch 65 Batch 90/172] avg loss 0.0043955, throughput 3.52404K wps
[Epoch 65 Batch 120/172] avg loss 0.00436453, throughput 3.44809K wps
[Epoch 65 Batch 150/172] avg loss 0.00429289, throughput 3.45904K wps
Begin Testing...
[Epoch 65] train avg loss 0.00448808, dev acc 0.8836, dev avg loss 0.281631, throughput 3.24903K wps
[Epoch 66 Batch 30/172] avg loss 0.00433271, throughput 3.3545K wps
[Epoch 66 Batch 60/172] avg loss 0.00408638, throughput 3.00286K wps
[Epoch 66 Batch 90/172] avg loss 0.0040224, throughput 3.07793K wps
[Epoch 66 Batch 120/172] avg loss 0.0049125, throughput 3.38082K wps
[Epoch 66 Batch 150/172] avg loss 0.00419865, throughput 3.51326K wps
Begin Testing...
[Epoch 66] train avg loss 0.00435981, dev acc 0.8857, dev avg loss 0.281663, throughput 3.26248K wps
[Epoch 67 Batch 30/172] avg loss 0.00454827, throughput 3.46872K wps
[Epoch 67 Batch 60/172] avg loss 0.00444444, throughput 4.02106K wps
[Epoch 67 Batch 90/172] avg loss 0.00439243, throughput 3.0433K wps
[Epoch 67 Batch 120/172] avg loss 0.00485216, throughput 3.81309K wps
[Epoch 67 Batch 150/172] avg loss 0.00403251, throughput 3.38261K wps
Begin Testing...
[Epoch 67] train avg loss 0.00440184, dev acc 0.8857, dev avg loss 0.28274, throughput 3.44568K wps
[Epoch 68 Batch 30/172] avg loss 0.00466269, throughput 3.53226K wps
[Epoch 68 Batch 60/172] avg loss 0.00459553, throughput 3.36822K wps
[Epoch 68 Batch 90/172] avg loss 0.00435113, throughput 3.47124K wps
[Epoch 68 Batch 120/172] avg loss 0.0042454, throughput 3.52408K wps
[Epoch 68 Batch 150/172] avg loss 0.00403074, throughput 3.47718K wps
Begin Testing...
[Epoch 68] train avg loss 0.00435562, dev acc 0.8889, dev avg loss 0.281407, throughput 3.49717K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/172] avg loss 0.00425382, throughput 3.08082K wps
[Epoch 69 Batch 60/172] avg loss 0.00417371, throughput 3.21301K wps
[Epoch 69 Batch 90/172] avg loss 0.00456112, throughput 4.27745K wps
[Epoch 69 Batch 120/172] avg loss 0.00423657, throughput 3.85959K wps
[Epoch 69 Batch 150/172] avg loss 0.00392505, throughput 3.32324K wps
Begin Testing...
[Epoch 69] train avg loss 0.00424514, dev acc 0.8868, dev avg loss 0.281648, throughput 3.537K wps
[Epoch 70 Batch 30/172] avg loss 0.00436762, throughput 3.36468K wps
[Epoch 70 Batch 60/172] avg loss 0.00441651, throughput 2.77235K wps
[Epoch 70 Batch 90/172] avg loss 0.00395608, throughput 3.27336K wps
[Epoch 70 Batch 120/172] avg loss 0.0041644, throughput 3.10471K wps
[Epoch 70 Batch 150/172] avg loss 0.00457142, throughput 3.19395K wps
Begin Testing...
[Epoch 70] train avg loss 0.00428123, dev acc 0.8857, dev avg loss 0.281351, throughput 3.17888K wps
[Epoch 71 Batch 30/172] avg loss 0.00436852, throughput 3.43889K wps
[Epoch 71 Batch 60/172] avg loss 0.00408873, throughput 4.16318K wps
[Epoch 71 Batch 90/172] avg loss 0.00452093, throughput 3.23003K wps
[Epoch 71 Batch 120/172] avg loss 0.00417764, throughput 3.44126K wps
[Epoch 71 Batch 150/172] avg loss 0.00401949, throughput 3.04783K wps
Begin Testing...
[Epoch 71] train avg loss 0.00422389, dev acc 0.8868, dev avg loss 0.28071, throughput 3.39138K wps
[Epoch 72 Batch 30/172] avg loss 0.00418589, throughput 3.50104K wps
[Epoch 72 Batch 60/172] avg loss 0.00470252, throughput 3.87811K wps
[Epoch 72 Batch 90/172] avg loss 0.00427977, throughput 3.7061K wps
[Epoch 72 Batch 120/172] avg loss 0.00427206, throughput 3.74283K wps
[Epoch 72 Batch 150/172] avg loss 0.00408354, throughput 3.76306K wps
Begin Testing...
[Epoch 72] train avg loss 0.00428689, dev acc 0.8868, dev avg loss 0.281165, throughput 3.63347K wps
[Epoch 73 Batch 30/172] avg loss 0.00401954, throughput 4.04419K wps
[Epoch 73 Batch 60/172] avg loss 0.00432072, throughput 3.51523K wps
[Epoch 73 Batch 90/172] avg loss 0.00434148, throughput 3.89229K wps
[Epoch 73 Batch 120/172] avg loss 0.00459893, throughput 3.65436K wps
[Epoch 73 Batch 150/172] avg loss 0.00397273, throughput 3.21155K wps
Begin Testing...
[Epoch 73] train avg loss 0.00422612, dev acc 0.8868, dev avg loss 0.280574, throughput 3.53209K wps
[Epoch 74 Batch 30/172] avg loss 0.00406624, throughput 3.09365K wps
[Epoch 74 Batch 60/172] avg loss 0.00442225, throughput 3.06149K wps
[Epoch 74 Batch 90/172] avg loss 0.00412022, throughput 3.22825K wps
[Epoch 74 Batch 120/172] avg loss 0.00427684, throughput 3.46205K wps
[Epoch 74 Batch 150/172] avg loss 0.00438411, throughput 3.17551K wps
Begin Testing...
[Epoch 74] train avg loss 0.00418256, dev acc 0.8910, dev avg loss 0.28037, throughput 3.26471K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/172] avg loss 0.00437333, throughput 3.32771K wps
[Epoch 75 Batch 60/172] avg loss 0.00402819, throughput 3.27717K wps
[Epoch 75 Batch 90/172] avg loss 0.00452457, throughput 3.08406K wps
[Epoch 75 Batch 120/172] avg loss 0.00358944, throughput 3.97017K wps
[Epoch 75 Batch 150/172] avg loss 0.00445712, throughput 3.72361K wps
Begin Testing...
[Epoch 75] train avg loss 0.00418604, dev acc 0.8857, dev avg loss 0.28053, throughput 3.36366K wps
[Epoch 76 Batch 30/172] avg loss 0.00441328, throughput 2.9271K wps
[Epoch 76 Batch 60/172] avg loss 0.00410473, throughput 3.24913K wps
[Epoch 76 Batch 90/172] avg loss 0.00429022, throughput 3.25653K wps
[Epoch 76 Batch 120/172] avg loss 0.00404992, throughput 3.55644K wps
[Epoch 76 Batch 150/172] avg loss 0.00407375, throughput 3.00042K wps
Begin Testing...
[Epoch 76] train avg loss 0.00416146, dev acc 0.8857, dev avg loss 0.280466, throughput 3.18233K wps
[Epoch 77 Batch 30/172] avg loss 0.00421762, throughput 3.22068K wps
[Epoch 77 Batch 60/172] avg loss 0.00412786, throughput 3.74417K wps
[Epoch 77 Batch 90/172] avg loss 0.00429749, throughput 2.96825K wps
[Epoch 77 Batch 120/172] avg loss 0.00382481, throughput 3.30488K wps
[Epoch 77 Batch 150/172] avg loss 0.00417611, throughput 3.44427K wps
Begin Testing...
[Epoch 77] train avg loss 0.00411514, dev acc 0.8889, dev avg loss 0.279708, throughput 3.29749K wps
[Epoch 78 Batch 30/172] avg loss 0.00414808, throughput 3.33744K wps
[Epoch 78 Batch 60/172] avg loss 0.00441851, throughput 3.57344K wps
[Epoch 78 Batch 90/172] avg loss 0.00366444, throughput 3.14177K wps
[Epoch 78 Batch 120/172] avg loss 0.00378521, throughput 3.66103K wps
[Epoch 78 Batch 150/172] avg loss 0.00492269, throughput 2.92261K wps
Begin Testing...
[Epoch 78] train avg loss 0.0041537, dev acc 0.8889, dev avg loss 0.281141, throughput 3.31527K wps
[Epoch 79 Batch 30/172] avg loss 0.00450323, throughput 3.58307K wps
[Epoch 79 Batch 60/172] avg loss 0.00435038, throughput 3.49427K wps
[Epoch 79 Batch 90/172] avg loss 0.00384822, throughput 3.274K wps
[Epoch 79 Batch 120/172] avg loss 0.00424755, throughput 3.40301K wps
[Epoch 79 Batch 150/172] avg loss 0.00369694, throughput 3.0049K wps
Begin Testing...
[Epoch 79] train avg loss 0.00412405, dev acc 0.8889, dev avg loss 0.27942, throughput 3.35314K wps
[Epoch 80 Batch 30/172] avg loss 0.00400289, throughput 3.01966K wps
[Epoch 80 Batch 60/172] avg loss 0.00390975, throughput 3.32268K wps
[Epoch 80 Batch 90/172] avg loss 0.00393861, throughput 3.30875K wps
[Epoch 80 Batch 120/172] avg loss 0.00402324, throughput 2.99598K wps
[Epoch 80 Batch 150/172] avg loss 0.00417184, throughput 2.98136K wps
Begin Testing...
[Epoch 80] train avg loss 0.00400457, dev acc 0.8889, dev avg loss 0.27966, throughput 3.19113K wps
[Epoch 81 Batch 30/172] avg loss 0.00396391, throughput 3.29837K wps
[Epoch 81 Batch 60/172] avg loss 0.00406774, throughput 3.05365K wps
[Epoch 81 Batch 90/172] avg loss 0.00410077, throughput 3.45374K wps
[Epoch 81 Batch 120/172] avg loss 0.00430607, throughput 3.71403K wps
[Epoch 81 Batch 150/172] avg loss 0.00383329, throughput 3.03727K wps
Begin Testing...
[Epoch 81] train avg loss 0.00407102, dev acc 0.8899, dev avg loss 0.278782, throughput 3.24289K wps
[Epoch 82 Batch 30/172] avg loss 0.00429455, throughput 3.30245K wps
[Epoch 82 Batch 60/172] avg loss 0.00368941, throughput 2.99639K wps
[Epoch 82 Batch 90/172] avg loss 0.00373897, throughput 3.01074K wps
[Epoch 82 Batch 120/172] avg loss 0.0042125, throughput 3.64871K wps
[Epoch 82 Batch 150/172] avg loss 0.00418339, throughput 3.24718K wps
Begin Testing...
[Epoch 82] train avg loss 0.00406578, dev acc 0.8878, dev avg loss 0.279686, throughput 3.19963K wps
[Epoch 83 Batch 30/172] avg loss 0.00417269, throughput 3.26344K wps
[Epoch 83 Batch 60/172] avg loss 0.00377289, throughput 3.16746K wps
[Epoch 83 Batch 90/172] avg loss 0.0038959, throughput 3.56031K wps
[Epoch 83 Batch 120/172] avg loss 0.00370589, throughput 4.16334K wps
[Epoch 83 Batch 150/172] avg loss 0.0044088, throughput 3.58958K wps
Begin Testing...
[Epoch 83] train avg loss 0.00403114, dev acc 0.8868, dev avg loss 0.279361, throughput 3.60006K wps
[Epoch 84 Batch 30/172] avg loss 0.00412873, throughput 3.1011K wps
[Epoch 84 Batch 60/172] avg loss 0.00403629, throughput 4.05538K wps
[Epoch 84 Batch 90/172] avg loss 0.00390664, throughput 3.12933K wps
[Epoch 84 Batch 120/172] avg loss 0.00376898, throughput 3.49461K wps
[Epoch 84 Batch 150/172] avg loss 0.00392972, throughput 3.05652K wps
Begin Testing...
[Epoch 84] train avg loss 0.00402295, dev acc 0.8910, dev avg loss 0.278725, throughput 3.30343K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/172] avg loss 0.00432166, throughput 3.04663K wps
[Epoch 85 Batch 60/172] avg loss 0.00374501, throughput 2.99662K wps
[Epoch 85 Batch 90/172] avg loss 0.00381592, throughput 3.57541K wps
[Epoch 85 Batch 120/172] avg loss 0.00423991, throughput 3.63545K wps
[Epoch 85 Batch 150/172] avg loss 0.00402347, throughput 2.94721K wps
Begin Testing...
[Epoch 85] train avg loss 0.00403704, dev acc 0.8910, dev avg loss 0.278511, throughput 3.22872K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/172] avg loss 0.00418457, throughput 3.23029K wps
[Epoch 86 Batch 60/172] avg loss 0.00391872, throughput 3.18042K wps
[Epoch 86 Batch 90/172] avg loss 0.00396597, throughput 3.41773K wps
[Epoch 86 Batch 120/172] avg loss 0.00393423, throughput 3.10088K wps
[Epoch 86 Batch 150/172] avg loss 0.00371034, throughput 3.43034K wps
Begin Testing...
[Epoch 86] train avg loss 0.00396308, dev acc 0.8899, dev avg loss 0.278186, throughput 3.21936K wps
[Epoch 87 Batch 30/172] avg loss 0.00398116, throughput 3.09639K wps
[Epoch 87 Batch 60/172] avg loss 0.00412303, throughput 3.19563K wps
[Epoch 87 Batch 90/172] avg loss 0.00388607, throughput 2.90615K wps
[Epoch 87 Batch 120/172] avg loss 0.00411095, throughput 3.03555K wps
[Epoch 87 Batch 150/172] avg loss 0.00401089, throughput 3.31806K wps
Begin Testing...
[Epoch 87] train avg loss 0.00395859, dev acc 0.8868, dev avg loss 0.278668, throughput 3.15631K wps
[Epoch 88 Batch 30/172] avg loss 0.00398452, throughput 3.62046K wps
[Epoch 88 Batch 60/172] avg loss 0.0039767, throughput 3.59299K wps
[Epoch 88 Batch 90/172] avg loss 0.00377471, throughput 3.11695K wps
[Epoch 88 Batch 120/172] avg loss 0.00399656, throughput 3.55743K wps
[Epoch 88 Batch 150/172] avg loss 0.00385819, throughput 3.32651K wps
Begin Testing...
[Epoch 88] train avg loss 0.00390175, dev acc 0.8899, dev avg loss 0.278029, throughput 3.38137K wps
[Epoch 89 Batch 30/172] avg loss 0.00371944, throughput 3.02073K wps
[Epoch 89 Batch 60/172] avg loss 0.00388222, throughput 3.63753K wps
[Epoch 89 Batch 90/172] avg loss 0.00411245, throughput 3.42675K wps
[Epoch 89 Batch 120/172] avg loss 0.00399009, throughput 3.2018K wps
[Epoch 89 Batch 150/172] avg loss 0.00396928, throughput 3.8658K wps
Begin Testing...
[Epoch 89] train avg loss 0.00392216, dev acc 0.8910, dev avg loss 0.27952, throughput 3.43229K wps
Observed Improvement.
Begin Testing...
[Epoch 90 Batch 30/172] avg loss 0.00409131, throughput 3.58936K wps
[Epoch 90 Batch 60/172] avg loss 0.00400971, throughput 3.77509K wps
[Epoch 90 Batch 90/172] avg loss 0.00350037, throughput 2.97398K wps
[Epoch 90 Batch 120/172] avg loss 0.00408356, throughput 3.38664K wps
[Epoch 90 Batch 150/172] avg loss 0.00367744, throughput 3.37785K wps
Begin Testing...
[Epoch 90] train avg loss 0.00386445, dev acc 0.8920, dev avg loss 0.277647, throughput 3.37197K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/172] avg loss 0.00367631, throughput 3.32381K wps
[Epoch 91 Batch 60/172] avg loss 0.00379681, throughput 3.00924K wps
[Epoch 91 Batch 90/172] avg loss 0.00386261, throughput 3.75745K wps
[Epoch 91 Batch 120/172] avg loss 0.0037002, throughput 3.45203K wps
[Epoch 91 Batch 150/172] avg loss 0.00389789, throughput 2.96665K wps
Begin Testing...
[Epoch 91] train avg loss 0.00380053, dev acc 0.8889, dev avg loss 0.27766, throughput 3.21791K wps
[Epoch 92 Batch 30/172] avg loss 0.00367419, throughput 3.12097K wps
[Epoch 92 Batch 60/172] avg loss 0.00379815, throughput 3.15775K wps
[Epoch 92 Batch 90/172] avg loss 0.00359334, throughput 3.24704K wps
[Epoch 92 Batch 120/172] avg loss 0.0041705, throughput 3.55305K wps
[Epoch 92 Batch 150/172] avg loss 0.00395872, throughput 4.07856K wps
Begin Testing...
[Epoch 92] train avg loss 0.00386967, dev acc 0.8899, dev avg loss 0.277437, throughput 3.35948K wps
[Epoch 93 Batch 30/172] avg loss 0.00352248, throughput 2.91294K wps
[Epoch 93 Batch 60/172] avg loss 0.00429712, throughput 3.63661K wps
[Epoch 93 Batch 90/172] avg loss 0.00358206, throughput 3.18477K wps
[Epoch 93 Batch 120/172] avg loss 0.00394753, throughput 3.32536K wps
[Epoch 93 Batch 150/172] avg loss 0.00392456, throughput 3.67478K wps
Begin Testing...
[Epoch 93] train avg loss 0.00381577, dev acc 0.8878, dev avg loss 0.277227, throughput 3.31786K wps
[Epoch 94 Batch 30/172] avg loss 0.00375188, throughput 2.98179K wps
[Epoch 94 Batch 60/172] avg loss 0.00368815, throughput 3.50744K wps
[Epoch 94 Batch 90/172] avg loss 0.00403554, throughput 3.46276K wps
[Epoch 94 Batch 120/172] avg loss 0.00342268, throughput 2.87844K wps
[Epoch 94 Batch 150/172] avg loss 0.00384908, throughput 3.45102K wps
Begin Testing...
[Epoch 94] train avg loss 0.0037812, dev acc 0.8878, dev avg loss 0.277569, throughput 3.30542K wps
[Epoch 95 Batch 30/172] avg loss 0.00363005, throughput 3.04755K wps
[Epoch 95 Batch 60/172] avg loss 0.00371595, throughput 3.05246K wps
[Epoch 95 Batch 90/172] avg loss 0.00378336, throughput 4.13512K wps
[Epoch 95 Batch 120/172] avg loss 0.00414931, throughput 3.39295K wps
[Epoch 95 Batch 150/172] avg loss 0.0034622, throughput 2.89358K wps
Begin Testing...
[Epoch 95] train avg loss 0.00380067, dev acc 0.8878, dev avg loss 0.277253, throughput 3.2279K wps
[Epoch 96 Batch 30/172] avg loss 0.00369254, throughput 3.59347K wps
[Epoch 96 Batch 60/172] avg loss 0.00401555, throughput 3.47621K wps
[Epoch 96 Batch 90/172] avg loss 0.00333056, throughput 3.48554K wps
[Epoch 96 Batch 120/172] avg loss 0.00392512, throughput 3.1694K wps
[Epoch 96 Batch 150/172] avg loss 0.00383753, throughput 3.02699K wps
Begin Testing...
[Epoch 96] train avg loss 0.00378059, dev acc 0.8920, dev avg loss 0.277483, throughput 3.29042K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/172] avg loss 0.00387836, throughput 3.41429K wps
[Epoch 97 Batch 60/172] avg loss 0.0037647, throughput 3.24204K wps
[Epoch 97 Batch 90/172] avg loss 0.00364134, throughput 3.45601K wps
[Epoch 97 Batch 120/172] avg loss 0.00375252, throughput 2.92888K wps
[Epoch 97 Batch 150/172] avg loss 0.0035291, throughput 3.09517K wps
Begin Testing...
[Epoch 97] train avg loss 0.00373276, dev acc 0.8899, dev avg loss 0.277271, throughput 3.22839K wps
[Epoch 98 Batch 30/172] avg loss 0.0032855, throughput 3.85013K wps
[Epoch 98 Batch 60/172] avg loss 0.00363829, throughput 3.02748K wps
[Epoch 98 Batch 90/172] avg loss 0.00366183, throughput 3.11002K wps
[Epoch 98 Batch 120/172] avg loss 0.004002, throughput 3.73739K wps
[Epoch 98 Batch 150/172] avg loss 0.00378364, throughput 3.23859K wps
Begin Testing...
[Epoch 98] train avg loss 0.00368021, dev acc 0.8899, dev avg loss 0.276856, throughput 3.35271K wps
[Epoch 99 Batch 30/172] avg loss 0.00354095, throughput 3.01145K wps
[Epoch 99 Batch 60/172] avg loss 0.00365216, throughput 3.06716K wps
[Epoch 99 Batch 90/172] avg loss 0.0033452, throughput 3.39549K wps
[Epoch 99 Batch 120/172] avg loss 0.00373508, throughput 3.1846K wps
[Epoch 99 Batch 150/172] avg loss 0.00400466, throughput 3.16972K wps
Begin Testing...
[Epoch 99] train avg loss 0.00376961, dev acc 0.8910, dev avg loss 0.276333, throughput 3.14765K wps
[Epoch 100 Batch 30/172] avg loss 0.00359199, throughput 3.05371K wps
[Epoch 100 Batch 60/172] avg loss 0.00355463, throughput 3.59673K wps
[Epoch 100 Batch 90/172] avg loss 0.00353953, throughput 3.30055K wps
[Epoch 100 Batch 120/172] avg loss 0.00365711, throughput 3.22626K wps
[Epoch 100 Batch 150/172] avg loss 0.00411808, throughput 3.29826K wps
Begin Testing...
[Epoch 100] train avg loss 0.00371298, dev acc 0.8899, dev avg loss 0.276438, throughput 3.2379K wps
[Epoch 101 Batch 30/172] avg loss 0.00374575, throughput 3.13432K wps
[Epoch 101 Batch 60/172] avg loss 0.00358356, throughput 3.38525K wps
[Epoch 101 Batch 90/172] avg loss 0.00356979, throughput 3.79655K wps
[Epoch 101 Batch 120/172] avg loss 0.00351344, throughput 3.32557K wps
[Epoch 101 Batch 150/172] avg loss 0.00359606, throughput 3.20224K wps
Begin Testing...
[Epoch 101] train avg loss 0.00365718, dev acc 0.8920, dev avg loss 0.277149, throughput 3.36894K wps
Observed Improvement.
Begin Testing...
[Epoch 102 Batch 30/172] avg loss 0.00390119, throughput 2.96009K wps
[Epoch 102 Batch 60/172] avg loss 0.00353319, throughput 4.0259K wps
[Epoch 102 Batch 90/172] avg loss 0.00359621, throughput 3.31893K wps
[Epoch 102 Batch 120/172] avg loss 0.00355162, throughput 3.64269K wps
[Epoch 102 Batch 150/172] avg loss 0.00356408, throughput 3.11705K wps
Begin Testing...
[Epoch 102] train avg loss 0.00364146, dev acc 0.8857, dev avg loss 0.276392, throughput 3.38961K wps
[Epoch 103 Batch 30/172] avg loss 0.00390164, throughput 3.55344K wps
[Epoch 103 Batch 60/172] avg loss 0.00357607, throughput 3.19032K wps
[Epoch 103 Batch 90/172] avg loss 0.00360015, throughput 3.20852K wps
[Epoch 103 Batch 120/172] avg loss 0.00343091, throughput 3.16013K wps
[Epoch 103 Batch 150/172] avg loss 0.00337528, throughput 3.04676K wps
Begin Testing...
[Epoch 103] train avg loss 0.00356614, dev acc 0.8910, dev avg loss 0.276964, throughput 3.26149K wps
[Epoch 104 Batch 30/172] avg loss 0.00365504, throughput 3.03491K wps
[Epoch 104 Batch 60/172] avg loss 0.00343891, throughput 3.62361K wps
[Epoch 104 Batch 90/172] avg loss 0.00341067, throughput 3.74156K wps
[Epoch 104 Batch 120/172] avg loss 0.00376392, throughput 3.24225K wps
[Epoch 104 Batch 150/172] avg loss 0.00342916, throughput 3.31982K wps
Begin Testing...
[Epoch 104] train avg loss 0.00358877, dev acc 0.8899, dev avg loss 0.276731, throughput 3.36981K wps
[Epoch 105 Batch 30/172] avg loss 0.00369423, throughput 3.07326K wps
[Epoch 105 Batch 60/172] avg loss 0.00359537, throughput 3.53609K wps
[Epoch 105 Batch 90/172] avg loss 0.00336356, throughput 3.86283K wps
[Epoch 105 Batch 120/172] avg loss 0.00376326, throughput 3.5374K wps
[Epoch 105 Batch 150/172] avg loss 0.00327505, throughput 3.84304K wps
Begin Testing...
[Epoch 105] train avg loss 0.00357967, dev acc 0.8899, dev avg loss 0.276892, throughput 3.5103K wps
[Epoch 106 Batch 30/172] avg loss 0.00359394, throughput 3.42467K wps
[Epoch 106 Batch 60/172] avg loss 0.00337836, throughput 3.41205K wps
[Epoch 106 Batch 90/172] avg loss 0.00351386, throughput 3.28216K wps
[Epoch 106 Batch 120/172] avg loss 0.00358193, throughput 3.31518K wps
[Epoch 106 Batch 150/172] avg loss 0.0038797, throughput 2.91612K wps
Begin Testing...
[Epoch 106] train avg loss 0.00356209, dev acc 0.8899, dev avg loss 0.277604, throughput 3.23905K wps
[Epoch 107 Batch 30/172] avg loss 0.00340163, throughput 3.33364K wps
[Epoch 107 Batch 60/172] avg loss 0.0038163, throughput 3.36512K wps
[Epoch 107 Batch 90/172] avg loss 0.00352505, throughput 3.08216K wps
[Epoch 107 Batch 120/172] avg loss 0.00324485, throughput 3.21454K wps
[Epoch 107 Batch 150/172] avg loss 0.00358916, throughput 3.49287K wps
Begin Testing...
[Epoch 107] train avg loss 0.00348506, dev acc 0.8889, dev avg loss 0.277417, throughput 3.35914K wps
[Epoch 108 Batch 30/172] avg loss 0.00354588, throughput 3.47884K wps
[Epoch 108 Batch 60/172] avg loss 0.00350264, throughput 2.99631K wps
[Epoch 108 Batch 90/172] avg loss 0.00402344, throughput 3.83631K wps
[Epoch 108 Batch 120/172] avg loss 0.00327664, throughput 3.52952K wps
[Epoch 108 Batch 150/172] avg loss 0.00377588, throughput 3.11333K wps
Begin Testing...
[Epoch 108] train avg loss 0.00357987, dev acc 0.8910, dev avg loss 0.277045, throughput 3.37434K wps
[Epoch 109 Batch 30/172] avg loss 0.00352028, throughput 3.11459K wps
[Epoch 109 Batch 60/172] avg loss 0.00362574, throughput 3.28064K wps
[Epoch 109 Batch 90/172] avg loss 0.00345018, throughput 2.99798K wps
[Epoch 109 Batch 120/172] avg loss 0.00352997, throughput 3.22046K wps
[Epoch 109 Batch 150/172] avg loss 0.00358234, throughput 3.28364K wps
Begin Testing...
[Epoch 109] train avg loss 0.00356041, dev acc 0.8899, dev avg loss 0.276597, throughput 3.18225K wps
[Epoch 110 Batch 30/172] avg loss 0.003582, throughput 2.9147K wps
[Epoch 110 Batch 60/172] avg loss 0.00320542, throughput 3.03275K wps
[Epoch 110 Batch 90/172] avg loss 0.00332913, throughput 3.07471K wps
[Epoch 110 Batch 120/172] avg loss 0.00354208, throughput 3.65882K wps
[Epoch 110 Batch 150/172] avg loss 0.0035522, throughput 3.46851K wps
Begin Testing...
[Epoch 110] train avg loss 0.0034713, dev acc 0.8920, dev avg loss 0.277102, throughput 3.17859K wps
Observed Improvement.
Begin Testing...
[Epoch 111 Batch 30/172] avg loss 0.00322986, throughput 2.97171K wps
[Epoch 111 Batch 60/172] avg loss 0.00346928, throughput 3.19574K wps
[Epoch 111 Batch 90/172] avg loss 0.00347722, throughput 3.54808K wps
[Epoch 111 Batch 120/172] avg loss 0.00323896, throughput 3.53445K wps
[Epoch 111 Batch 150/172] avg loss 0.00350857, throughput 3.0021K wps
Begin Testing...
[Epoch 111] train avg loss 0.00343696, dev acc 0.8878, dev avg loss 0.277881, throughput 3.21447K wps
[Epoch 112 Batch 30/172] avg loss 0.00360378, throughput 3.54612K wps
[Epoch 112 Batch 60/172] avg loss 0.00319304, throughput 3.00503K wps
[Epoch 112 Batch 90/172] avg loss 0.00378094, throughput 3.32628K wps
[Epoch 112 Batch 120/172] avg loss 0.00310354, throughput 3.6228K wps
[Epoch 112 Batch 150/172] avg loss 0.00335044, throughput 3.10312K wps
Begin Testing...
[Epoch 112] train avg loss 0.00343579, dev acc 0.8899, dev avg loss 0.276374, throughput 3.27346K wps
[Epoch 113 Batch 30/172] avg loss 0.00348685, throughput 3.18386K wps
[Epoch 113 Batch 60/172] avg loss 0.00312595, throughput 2.97907K wps
[Epoch 113 Batch 90/172] avg loss 0.00391076, throughput 3.08874K wps
[Epoch 113 Batch 120/172] avg loss 0.00312372, throughput 3.04349K wps
[Epoch 113 Batch 150/172] avg loss 0.00337226, throughput 3.1942K wps
Begin Testing...
[Epoch 113] train avg loss 0.00336848, dev acc 0.8889, dev avg loss 0.276767, throughput 3.09842K wps
[Epoch 114 Batch 30/172] avg loss 0.00325445, throughput 3.94073K wps
[Epoch 114 Batch 60/172] avg loss 0.00384816, throughput 3.02241K wps
[Epoch 114 Batch 90/172] avg loss 0.00338682, throughput 3.22717K wps
[Epoch 114 Batch 120/172] avg loss 0.00324994, throughput 3.07452K wps
[Epoch 114 Batch 150/172] avg loss 0.00355836, throughput 3.43757K wps
Begin Testing...
[Epoch 114] train avg loss 0.00343827, dev acc 0.8910, dev avg loss 0.276542, throughput 3.27076K wps
[Epoch 115 Batch 30/172] avg loss 0.00290444, throughput 3.22811K wps
[Epoch 115 Batch 60/172] avg loss 0.00366241, throughput 3.74551K wps
[Epoch 115 Batch 90/172] avg loss 0.00363667, throughput 3.6754K wps
[Epoch 115 Batch 120/172] avg loss 0.00323663, throughput 3.67369K wps
[Epoch 115 Batch 150/172] avg loss 0.00344813, throughput 3.112K wps
Begin Testing...
[Epoch 115] train avg loss 0.00340938, dev acc 0.8910, dev avg loss 0.276943, throughput 3.48192K wps
[Epoch 116 Batch 30/172] avg loss 0.00330804, throughput 3.50416K wps
[Epoch 116 Batch 60/172] avg loss 0.00309343, throughput 3.13875K wps
[Epoch 116 Batch 90/172] avg loss 0.00326401, throughput 3.05559K wps
[Epoch 116 Batch 120/172] avg loss 0.00331142, throughput 3.2212K wps
[Epoch 116 Batch 150/172] avg loss 0.0037606, throughput 3.45988K wps
Begin Testing...
[Epoch 116] train avg loss 0.00336936, dev acc 0.8889, dev avg loss 0.27635, throughput 3.27818K wps
[Epoch 117 Batch 30/172] avg loss 0.00362315, throughput 3.28107K wps
[Epoch 117 Batch 60/172] avg loss 0.0030854, throughput 3.67978K wps
[Epoch 117 Batch 90/172] avg loss 0.00321052, throughput 3.2638K wps
[Epoch 117 Batch 120/172] avg loss 0.00312067, throughput 3.30156K wps
[Epoch 117 Batch 150/172] avg loss 0.00362087, throughput 3.16038K wps
Begin Testing...
[Epoch 117] train avg loss 0.00339507, dev acc 0.8910, dev avg loss 0.276413, throughput 3.28973K wps
[Epoch 118 Batch 30/172] avg loss 0.00306637, throughput 3.63468K wps
[Epoch 118 Batch 60/172] avg loss 0.00367963, throughput 3.35049K wps
[Epoch 118 Batch 90/172] avg loss 0.00328632, throughput 3.50435K wps
[Epoch 118 Batch 120/172] avg loss 0.00351553, throughput 3.3384K wps
[Epoch 118 Batch 150/172] avg loss 0.00336098, throughput 3.50835K wps
Begin Testing...
[Epoch 118] train avg loss 0.00334986, dev acc 0.8910, dev avg loss 0.27639, throughput 3.45774K wps
[Epoch 119 Batch 30/172] avg loss 0.00316503, throughput 3.02332K wps
[Epoch 119 Batch 60/172] avg loss 0.0034894, throughput 3.54697K wps
[Epoch 119 Batch 90/172] avg loss 0.00343435, throughput 4.21758K wps
[Epoch 119 Batch 120/172] avg loss 0.00300375, throughput 3.93712K wps
[Epoch 119 Batch 150/172] avg loss 0.00336864, throughput 3.38601K wps
Begin Testing...
[Epoch 119] train avg loss 0.00331404, dev acc 0.8878, dev avg loss 0.276632, throughput 3.48711K wps
[Epoch 120 Batch 30/172] avg loss 0.00320779, throughput 3.10841K wps
[Epoch 120 Batch 60/172] avg loss 0.00315765, throughput 3.09899K wps
[Epoch 120 Batch 90/172] avg loss 0.00325685, throughput 3.8693K wps
[Epoch 120 Batch 120/172] avg loss 0.00357846, throughput 3.20267K wps
[Epoch 120 Batch 150/172] avg loss 0.00305545, throughput 3.17484K wps
Begin Testing...
[Epoch 120] train avg loss 0.00327534, dev acc 0.8910, dev avg loss 0.277406, throughput 3.28077K wps
[Epoch 121 Batch 30/172] avg loss 0.00356067, throughput 3.18297K wps
[Epoch 121 Batch 60/172] avg loss 0.00281527, throughput 3.37766K wps
[Epoch 121 Batch 90/172] avg loss 0.00332337, throughput 3.49412K wps
[Epoch 121 Batch 120/172] avg loss 0.00359174, throughput 3.19786K wps
[Epoch 121 Batch 150/172] avg loss 0.00335963, throughput 3.37158K wps
Begin Testing...
[Epoch 121] train avg loss 0.00332246, dev acc 0.8920, dev avg loss 0.277831, throughput 3.35476K wps
Observed Improvement.
Begin Testing...
[Epoch 122 Batch 30/172] avg loss 0.00307352, throughput 3.32072K wps
[Epoch 122 Batch 60/172] avg loss 0.00320259, throughput 3.18189K wps
[Epoch 122 Batch 90/172] avg loss 0.00340958, throughput 3.9267K wps
[Epoch 122 Batch 120/172] avg loss 0.00331796, throughput 3.5264K wps
[Epoch 122 Batch 150/172] avg loss 0.00332677, throughput 3.26784K wps
Begin Testing...
[Epoch 122] train avg loss 0.00325231, dev acc 0.8910, dev avg loss 0.278055, throughput 3.51658K wps
[Epoch 123 Batch 30/172] avg loss 0.00370421, throughput 3.32484K wps
[Epoch 123 Batch 60/172] avg loss 0.00319491, throughput 3.12936K wps
[Epoch 123 Batch 90/172] avg loss 0.00314357, throughput 3.0626K wps
[Epoch 123 Batch 120/172] avg loss 0.00310039, throughput 3.01648K wps
[Epoch 123 Batch 150/172] avg loss 0.00315428, throughput 3.05584K wps
Begin Testing...
[Epoch 123] train avg loss 0.00325519, dev acc 0.8899, dev avg loss 0.277461, throughput 3.16898K wps
[Epoch 124 Batch 30/172] avg loss 0.00315844, throughput 3.39124K wps
[Epoch 124 Batch 60/172] avg loss 0.00313456, throughput 3.37494K wps
[Epoch 124 Batch 90/172] avg loss 0.00326648, throughput 3.32715K wps
[Epoch 124 Batch 120/172] avg loss 0.00348169, throughput 3.10439K wps
[Epoch 124 Batch 150/172] avg loss 0.00289383, throughput 3.29389K wps
Begin Testing...
[Epoch 124] train avg loss 0.00323363, dev acc 0.8878, dev avg loss 0.276466, throughput 3.30643K wps
[Epoch 125 Batch 30/172] avg loss 0.00320597, throughput 3.26637K wps
[Epoch 125 Batch 60/172] avg loss 0.00344298, throughput 3.07473K wps
[Epoch 125 Batch 90/172] avg loss 0.00328838, throughput 3.12539K wps
[Epoch 125 Batch 120/172] avg loss 0.00308841, throughput 3.75207K wps
[Epoch 125 Batch 150/172] avg loss 0.00332836, throughput 3.14521K wps
Begin Testing...
[Epoch 125] train avg loss 0.00328585, dev acc 0.8910, dev avg loss 0.27657, throughput 3.30877K wps
[Epoch 126 Batch 30/172] avg loss 0.00295084, throughput 3.36467K wps
[Epoch 126 Batch 60/172] avg loss 0.00359573, throughput 3.80071K wps
[Epoch 126 Batch 90/172] avg loss 0.00334053, throughput 3.39681K wps
[Epoch 126 Batch 120/172] avg loss 0.00329454, throughput 3.10364K wps
[Epoch 126 Batch 150/172] avg loss 0.00324109, throughput 3.40935K wps
Begin Testing...
[Epoch 126] train avg loss 0.00325753, dev acc 0.8910, dev avg loss 0.276937, throughput 3.35399K wps
[Epoch 127 Batch 30/172] avg loss 0.00302202, throughput 3.53266K wps
[Epoch 127 Batch 60/172] avg loss 0.00339878, throughput 3.28729K wps
[Epoch 127 Batch 90/172] avg loss 0.00304856, throughput 3.86631K wps
[Epoch 127 Batch 120/172] avg loss 0.00317195, throughput 3.38145K wps
[Epoch 127 Batch 150/172] avg loss 0.00305337, throughput 3.18107K wps
Begin Testing...
[Epoch 127] train avg loss 0.00313518, dev acc 0.8899, dev avg loss 0.27775, throughput 3.40968K wps
[Epoch 128 Batch 30/172] avg loss 0.00327587, throughput 3.43607K wps
[Epoch 128 Batch 60/172] avg loss 0.00283793, throughput 3.04095K wps
[Epoch 128 Batch 90/172] avg loss 0.00293052, throughput 3.33252K wps
[Epoch 128 Batch 120/172] avg loss 0.00330912, throughput 3.35921K wps
[Epoch 128 Batch 150/172] avg loss 0.00344206, throughput 3.51953K wps
Begin Testing...
[Epoch 128] train avg loss 0.00315656, dev acc 0.8889, dev avg loss 0.277392, throughput 3.34479K wps
[Epoch 129 Batch 30/172] avg loss 0.00346548, throughput 3.79033K wps
[Epoch 129 Batch 60/172] avg loss 0.00280072, throughput 3.4818K wps
[Epoch 129 Batch 90/172] avg loss 0.00298693, throughput 3.08892K wps
[Epoch 129 Batch 120/172] avg loss 0.00321451, throughput 3.38518K wps
[Epoch 129 Batch 150/172] avg loss 0.00322955, throughput 2.98746K wps
Begin Testing...
[Epoch 129] train avg loss 0.00318524, dev acc 0.8878, dev avg loss 0.279545, throughput 3.32983K wps
[Epoch 130 Batch 30/172] avg loss 0.00300358, throughput 3.05778K wps
[Epoch 130 Batch 60/172] avg loss 0.00328329, throughput 3.93686K wps
[Epoch 130 Batch 90/172] avg loss 0.00312892, throughput 3.10326K wps
[Epoch 130 Batch 120/172] avg loss 0.00335097, throughput 3.08164K wps
[Epoch 130 Batch 150/172] avg loss 0.00317889, throughput 3.27515K wps
Begin Testing...
[Epoch 130] train avg loss 0.00313768, dev acc 0.8878, dev avg loss 0.277466, throughput 3.2794K wps
[Epoch 131 Batch 30/172] avg loss 0.00326265, throughput 3.00743K wps
[Epoch 131 Batch 60/172] avg loss 0.00319834, throughput 3.07632K wps
[Epoch 131 Batch 90/172] avg loss 0.00302265, throughput 3.57231K wps
[Epoch 131 Batch 120/172] avg loss 0.00305621, throughput 3.34676K wps
[Epoch 131 Batch 150/172] avg loss 0.00311009, throughput 3.17913K wps
Begin Testing...
[Epoch 131] train avg loss 0.0031008, dev acc 0.8910, dev avg loss 0.276893, throughput 3.22511K wps
[Epoch 132 Batch 30/172] avg loss 0.00306003, throughput 3.70379K wps
[Epoch 132 Batch 60/172] avg loss 0.00338903, throughput 3.1356K wps
[Epoch 132 Batch 90/172] avg loss 0.00324836, throughput 3.3374K wps
[Epoch 132 Batch 120/172] avg loss 0.00275874, throughput 3.07318K wps
[Epoch 132 Batch 150/172] avg loss 0.00324411, throughput 3.23599K wps
Begin Testing...
[Epoch 132] train avg loss 0.00311365, dev acc 0.8889, dev avg loss 0.277181, throughput 3.35129K wps
[Epoch 133 Batch 30/172] avg loss 0.00289378, throughput 3.20284K wps
[Epoch 133 Batch 60/172] avg loss 0.00333548, throughput 3.37544K wps
[Epoch 133 Batch 90/172] avg loss 0.00310943, throughput 3.16985K wps
[Epoch 133 Batch 120/172] avg loss 0.00316822, throughput 3.44402K wps
[Epoch 133 Batch 150/172] avg loss 0.00333425, throughput 3.05404K wps
Begin Testing...
[Epoch 133] train avg loss 0.0031096, dev acc 0.8889, dev avg loss 0.279096, throughput 3.21375K wps
[Epoch 134 Batch 30/172] avg loss 0.00298641, throughput 3.63299K wps
[Epoch 134 Batch 60/172] avg loss 0.00322491, throughput 3.29211K wps
[Epoch 134 Batch 90/172] avg loss 0.00306252, throughput 2.94085K wps
[Epoch 134 Batch 120/172] avg loss 0.00306062, throughput 3.44329K wps
[Epoch 134 Batch 150/172] avg loss 0.00333637, throughput 3.45295K wps
Begin Testing...
[Epoch 134] train avg loss 0.00309701, dev acc 0.8920, dev avg loss 0.277948, throughput 3.36494K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/172] avg loss 0.0032603, throughput 3.26535K wps
[Epoch 135 Batch 60/172] avg loss 0.00319637, throughput 3.31819K wps
[Epoch 135 Batch 90/172] avg loss 0.00290782, throughput 3.38641K wps
[Epoch 135 Batch 120/172] avg loss 0.00323739, throughput 3.23567K wps
[Epoch 135 Batch 150/172] avg loss 0.00322334, throughput 2.99011K wps
Begin Testing...
[Epoch 135] train avg loss 0.00313473, dev acc 0.8889, dev avg loss 0.278925, throughput 3.22918K wps
[Epoch 136 Batch 30/172] avg loss 0.00281186, throughput 3.54261K wps
[Epoch 136 Batch 60/172] avg loss 0.0033496, throughput 2.97958K wps
[Epoch 136 Batch 90/172] avg loss 0.00285661, throughput 3.04628K wps
[Epoch 136 Batch 120/172] avg loss 0.00341762, throughput 3.2992K wps
[Epoch 136 Batch 150/172] avg loss 0.00313327, throughput 3.56255K wps
Begin Testing...
[Epoch 136] train avg loss 0.00309769, dev acc 0.8899, dev avg loss 0.278323, throughput 3.31834K wps
[Epoch 137 Batch 30/172] avg loss 0.00327871, throughput 2.97944K wps
[Epoch 137 Batch 60/172] avg loss 0.00274762, throughput 3.0414K wps
[Epoch 137 Batch 90/172] avg loss 0.00309741, throughput 3.46366K wps
[Epoch 137 Batch 120/172] avg loss 0.00330511, throughput 3.15218K wps
[Epoch 137 Batch 150/172] avg loss 0.0027607, throughput 3.24714K wps
Begin Testing...
[Epoch 137] train avg loss 0.00305263, dev acc 0.8878, dev avg loss 0.277704, throughput 3.19868K wps
[Epoch 138 Batch 30/172] avg loss 0.00303972, throughput 3.47031K wps
[Epoch 138 Batch 60/172] avg loss 0.00305273, throughput 3.39599K wps
[Epoch 138 Batch 90/172] avg loss 0.003315, throughput 2.997K wps
[Epoch 138 Batch 120/172] avg loss 0.00309019, throughput 3.7125K wps
[Epoch 138 Batch 150/172] avg loss 0.00315034, throughput 3.24721K wps
Begin Testing...
[Epoch 138] train avg loss 0.00306582, dev acc 0.8899, dev avg loss 0.277519, throughput 3.37276K wps
[Epoch 139 Batch 30/172] avg loss 0.00314938, throughput 3.16309K wps
[Epoch 139 Batch 60/172] avg loss 0.0032401, throughput 2.99688K wps
[Epoch 139 Batch 90/172] avg loss 0.00273629, throughput 3.21079K wps
[Epoch 139 Batch 120/172] avg loss 0.00290705, throughput 3.38864K wps
[Epoch 139 Batch 150/172] avg loss 0.00295109, throughput 3.1899K wps
Begin Testing...
[Epoch 139] train avg loss 0.00298063, dev acc 0.8910, dev avg loss 0.280767, throughput 3.20504K wps
[Epoch 140 Batch 30/172] avg loss 0.00276114, throughput 2.88291K wps
[Epoch 140 Batch 60/172] avg loss 0.00323748, throughput 3.10714K wps
[Epoch 140 Batch 90/172] avg loss 0.00297264, throughput 3.31337K wps
[Epoch 140 Batch 120/172] avg loss 0.0027403, throughput 3.12976K wps
[Epoch 140 Batch 150/172] avg loss 0.00338579, throughput 3.60527K wps
Begin Testing...
[Epoch 140] train avg loss 0.00304656, dev acc 0.8899, dev avg loss 0.278127, throughput 3.16936K wps
[Epoch 141 Batch 30/172] avg loss 0.00296777, throughput 3.01173K wps
[Epoch 141 Batch 60/172] avg loss 0.00305277, throughput 3.86956K wps
[Epoch 141 Batch 90/172] avg loss 0.00305597, throughput 3.4662K wps
[Epoch 141 Batch 120/172] avg loss 0.00333804, throughput 3.70826K wps
[Epoch 141 Batch 150/172] avg loss 0.00299174, throughput 3.24334K wps
Begin Testing...
[Epoch 141] train avg loss 0.00305189, dev acc 0.8899, dev avg loss 0.277524, throughput 3.44825K wps
[Epoch 142 Batch 30/172] avg loss 0.00300221, throughput 3.01169K wps
[Epoch 142 Batch 60/172] avg loss 0.00281884, throughput 3.55336K wps
[Epoch 142 Batch 90/172] avg loss 0.0028921, throughput 3.0875K wps
[Epoch 142 Batch 120/172] avg loss 0.00305298, throughput 3.32543K wps
[Epoch 142 Batch 150/172] avg loss 0.00273914, throughput 3.36248K wps
Begin Testing...
[Epoch 142] train avg loss 0.00294757, dev acc 0.8878, dev avg loss 0.277676, throughput 3.31225K wps
[Epoch 143 Batch 30/172] avg loss 0.00304314, throughput 3.25714K wps
[Epoch 143 Batch 60/172] avg loss 0.0029343, throughput 2.99084K wps
[Epoch 143 Batch 90/172] avg loss 0.00294066, throughput 3.32974K wps
[Epoch 143 Batch 120/172] avg loss 0.00318291, throughput 3.60052K wps
[Epoch 143 Batch 150/172] avg loss 0.00302319, throughput 3.18917K wps
Begin Testing...
[Epoch 143] train avg loss 0.00296844, dev acc 0.8910, dev avg loss 0.278824, throughput 3.32363K wps
[Epoch 144 Batch 30/172] avg loss 0.00282801, throughput 3.06209K wps
[Epoch 144 Batch 60/172] avg loss 0.00298921, throughput 3.14182K wps
[Epoch 144 Batch 90/172] avg loss 0.00310851, throughput 3.23752K wps
[Epoch 144 Batch 120/172] avg loss 0.00283815, throughput 3.18733K wps
[Epoch 144 Batch 150/172] avg loss 0.00311052, throughput 3.37944K wps
Begin Testing...
[Epoch 144] train avg loss 0.00296192, dev acc 0.8931, dev avg loss 0.278305, throughput 3.2555K wps
Observed Improvement.
Begin Testing...
[Epoch 145 Batch 30/172] avg loss 0.00301673, throughput 3.57271K wps
[Epoch 145 Batch 60/172] avg loss 0.00296442, throughput 3.1071K wps
[Epoch 145 Batch 90/172] avg loss 0.0027907, throughput 3.16286K wps
[Epoch 145 Batch 120/172] avg loss 0.00307003, throughput 3.15354K wps
[Epoch 145 Batch 150/172] avg loss 0.00292861, throughput 2.89822K wps
Begin Testing...
[Epoch 145] train avg loss 0.00297576, dev acc 0.8920, dev avg loss 0.278923, throughput 3.17223K wps
[Epoch 146 Batch 30/172] avg loss 0.00293942, throughput 3.04251K wps
[Epoch 146 Batch 60/172] avg loss 0.00304141, throughput 3.25649K wps
[Epoch 146 Batch 90/172] avg loss 0.00312538, throughput 2.96375K wps
[Epoch 146 Batch 120/172] avg loss 0.00264554, throughput 3.71819K wps
[Epoch 146 Batch 150/172] avg loss 0.00303922, throughput 3.77342K wps
Begin Testing...
[Epoch 146] train avg loss 0.00294323, dev acc 0.8899, dev avg loss 0.27888, throughput 3.3443K wps
[Epoch 147 Batch 30/172] avg loss 0.00282217, throughput 3.64919K wps
[Epoch 147 Batch 60/172] avg loss 0.00295119, throughput 3.06215K wps
[Epoch 147 Batch 90/172] avg loss 0.00282613, throughput 3.39725K wps
[Epoch 147 Batch 120/172] avg loss 0.00300864, throughput 3.59175K wps
[Epoch 147 Batch 150/172] avg loss 0.00321743, throughput 3.20913K wps
Begin Testing...
[Epoch 147] train avg loss 0.00291231, dev acc 0.8889, dev avg loss 0.278996, throughput 3.3751K wps
[Epoch 148 Batch 30/172] avg loss 0.0028822, throughput 3.34748K wps
[Epoch 148 Batch 60/172] avg loss 0.00241727, throughput 3.34075K wps
[Epoch 148 Batch 90/172] avg loss 0.00317472, throughput 3.39789K wps
[Epoch 148 Batch 120/172] avg loss 0.00268865, throughput 3.32488K wps
[Epoch 148 Batch 150/172] avg loss 0.00354325, throughput 3.14293K wps
Begin Testing...
[Epoch 148] train avg loss 0.00293361, dev acc 0.8899, dev avg loss 0.27888, throughput 3.27983K wps
[Epoch 149 Batch 30/172] avg loss 0.00296922, throughput 3.02694K wps
[Epoch 149 Batch 60/172] avg loss 0.00295303, throughput 3.30672K wps
[Epoch 149 Batch 90/172] avg loss 0.00291433, throughput 3.44588K wps
[Epoch 149 Batch 120/172] avg loss 0.00280301, throughput 3.47423K wps
[Epoch 149 Batch 150/172] avg loss 0.00329091, throughput 3.85034K wps
Begin Testing...
[Epoch 149] train avg loss 0.00290667, dev acc 0.8889, dev avg loss 0.279431, throughput 3.40837K wps
[Epoch 150 Batch 30/172] avg loss 0.00317958, throughput 3.34887K wps
[Epoch 150 Batch 60/172] avg loss 0.00270124, throughput 3.1015K wps
[Epoch 150 Batch 90/172] avg loss 0.00312651, throughput 3.53101K wps
[Epoch 150 Batch 120/172] avg loss 0.00259828, throughput 3.592K wps
[Epoch 150 Batch 150/172] avg loss 0.00285849, throughput 3.48091K wps
Begin Testing...
[Epoch 150] train avg loss 0.00290364, dev acc 0.8920, dev avg loss 0.277768, throughput 3.39417K wps
[Epoch 151 Batch 30/172] avg loss 0.00286052, throughput 3.88483K wps
[Epoch 151 Batch 60/172] avg loss 0.00294991, throughput 3.34977K wps
[Epoch 151 Batch 90/172] avg loss 0.00301047, throughput 3.82159K wps
[Epoch 151 Batch 120/172] avg loss 0.0027187, throughput 3.01701K wps
[Epoch 151 Batch 150/172] avg loss 0.00271784, throughput 2.99124K wps
Begin Testing...
[Epoch 151] train avg loss 0.00283613, dev acc 0.8899, dev avg loss 0.278126, throughput 3.43251K wps
[Epoch 152 Batch 30/172] avg loss 0.00298008, throughput 3.26012K wps
[Epoch 152 Batch 60/172] avg loss 0.0027687, throughput 3.20387K wps
[Epoch 152 Batch 90/172] avg loss 0.00330191, throughput 3.13181K wps
[Epoch 152 Batch 120/172] avg loss 0.00251393, throughput 3.60892K wps
[Epoch 152 Batch 150/172] avg loss 0.00277943, throughput 3.40022K wps
Begin Testing...
[Epoch 152] train avg loss 0.00284892, dev acc 0.8889, dev avg loss 0.279846, throughput 3.27756K wps
[Epoch 153 Batch 30/172] avg loss 0.00273686, throughput 3.13124K wps
[Epoch 153 Batch 60/172] avg loss 0.00294789, throughput 3.22904K wps
[Epoch 153 Batch 90/172] avg loss 0.00258614, throughput 3.03399K wps
[Epoch 153 Batch 120/172] avg loss 0.00282931, throughput 3.15852K wps
[Epoch 153 Batch 150/172] avg loss 0.00287712, throughput 3.44775K wps
Begin Testing...
[Epoch 153] train avg loss 0.00282467, dev acc 0.8931, dev avg loss 0.278852, throughput 3.20233K wps
Observed Improvement.
Begin Testing...
[Epoch 154 Batch 30/172] avg loss 0.00269321, throughput 3.67662K wps
[Epoch 154 Batch 60/172] avg loss 0.00311812, throughput 3.16706K wps
[Epoch 154 Batch 90/172] avg loss 0.00275974, throughput 3.69184K wps
[Epoch 154 Batch 120/172] avg loss 0.00284183, throughput 4.02033K wps
[Epoch 154 Batch 150/172] avg loss 0.00245562, throughput 3.31949K wps
Begin Testing...
[Epoch 154] train avg loss 0.0028235, dev acc 0.8878, dev avg loss 0.28116, throughput 3.64218K wps
[Epoch 155 Batch 30/172] avg loss 0.0027471, throughput 2.97741K wps
[Epoch 155 Batch 60/172] avg loss 0.0029672, throughput 3.80372K wps
[Epoch 155 Batch 90/172] avg loss 0.00308721, throughput 3.09185K wps
[Epoch 155 Batch 120/172] avg loss 0.00271897, throughput 3.3335K wps
[Epoch 155 Batch 150/172] avg loss 0.00259356, throughput 3.13126K wps
Begin Testing...
[Epoch 155] train avg loss 0.00283854, dev acc 0.8931, dev avg loss 0.278466, throughput 3.24678K wps
Observed Improvement.
Begin Testing...
[Epoch 156 Batch 30/172] avg loss 0.00266719, throughput 3.05707K wps
[Epoch 156 Batch 60/172] avg loss 0.00322382, throughput 3.31719K wps
[Epoch 156 Batch 90/172] avg loss 0.00300469, throughput 3.36298K wps
[Epoch 156 Batch 120/172] avg loss 0.00233125, throughput 3.1328K wps
[Epoch 156 Batch 150/172] avg loss 0.00277836, throughput 3.46458K wps
Begin Testing...
[Epoch 156] train avg loss 0.00278988, dev acc 0.8889, dev avg loss 0.279758, throughput 3.26757K wps
[Epoch 157 Batch 30/172] avg loss 0.00294831, throughput 3.32798K wps
[Epoch 157 Batch 60/172] avg loss 0.00270951, throughput 3.38886K wps
[Epoch 157 Batch 90/172] avg loss 0.00301718, throughput 3.18903K wps
[Epoch 157 Batch 120/172] avg loss 0.00284424, throughput 3.60229K wps
[Epoch 157 Batch 150/172] avg loss 0.00277717, throughput 2.99621K wps
Begin Testing...
[Epoch 157] train avg loss 0.00282253, dev acc 0.8899, dev avg loss 0.279913, throughput 3.2605K wps
[Epoch 158 Batch 30/172] avg loss 0.00248953, throughput 3.7656K wps
[Epoch 158 Batch 60/172] avg loss 0.00285212, throughput 3.66804K wps
[Epoch 158 Batch 90/172] avg loss 0.00275086, throughput 3.25189K wps
[Epoch 158 Batch 120/172] avg loss 0.00285105, throughput 3.47965K wps
[Epoch 158 Batch 150/172] avg loss 0.00280776, throughput 3.05296K wps
Begin Testing...
[Epoch 158] train avg loss 0.00275432, dev acc 0.8899, dev avg loss 0.279574, throughput 3.40857K wps
[Epoch 159 Batch 30/172] avg loss 0.0029164, throughput 3.34078K wps
[Epoch 159 Batch 60/172] avg loss 0.00280881, throughput 3.16633K wps
[Epoch 159 Batch 90/172] avg loss 0.00254729, throughput 3.28242K wps
[Epoch 159 Batch 120/172] avg loss 0.00274711, throughput 3.45354K wps
[Epoch 159 Batch 150/172] avg loss 0.00282208, throughput 3.19955K wps
Begin Testing...
[Epoch 159] train avg loss 0.002729, dev acc 0.8899, dev avg loss 0.281506, throughput 3.26664K wps
[Epoch 160 Batch 30/172] avg loss 0.0025287, throughput 3.32963K wps
[Epoch 160 Batch 60/172] avg loss 0.0025207, throughput 3.19334K wps
[Epoch 160 Batch 90/172] avg loss 0.00273296, throughput 3.30833K wps
[Epoch 160 Batch 120/172] avg loss 0.003145, throughput 2.94866K wps
[Epoch 160 Batch 150/172] avg loss 0.00252908, throughput 2.92958K wps
Begin Testing...
[Epoch 160] train avg loss 0.00269092, dev acc 0.8941, dev avg loss 0.280914, throughput 3.11147K wps
Observed Improvement.
Begin Testing...
[Epoch 161 Batch 30/172] avg loss 0.00283396, throughput 3.34245K wps
[Epoch 161 Batch 60/172] avg loss 0.00278245, throughput 3.43207K wps
[Epoch 161 Batch 90/172] avg loss 0.00287293, throughput 3.2796K wps
[Epoch 161 Batch 120/172] avg loss 0.00264421, throughput 3.03664K wps
[Epoch 161 Batch 150/172] avg loss 0.0028018, throughput 2.97656K wps
Begin Testing...
[Epoch 161] train avg loss 0.00275042, dev acc 0.8899, dev avg loss 0.280575, throughput 3.25753K wps
[Epoch 162 Batch 30/172] avg loss 0.00290521, throughput 3.25212K wps
[Epoch 162 Batch 60/172] avg loss 0.00273229, throughput 3.65541K wps
[Epoch 162 Batch 90/172] avg loss 0.00300675, throughput 3.7582K wps
[Epoch 162 Batch 120/172] avg loss 0.00247052, throughput 3.5255K wps
[Epoch 162 Batch 150/172] avg loss 0.00261358, throughput 3.04268K wps
Begin Testing...
[Epoch 162] train avg loss 0.0027193, dev acc 0.8920, dev avg loss 0.28126, throughput 3.42039K wps
[Epoch 163 Batch 30/172] avg loss 0.00286341, throughput 3.35397K wps
[Epoch 163 Batch 60/172] avg loss 0.00284015, throughput 3.10513K wps
[Epoch 163 Batch 90/172] avg loss 0.00243016, throughput 3.44589K wps
[Epoch 163 Batch 120/172] avg loss 0.00231289, throughput 3.10299K wps
[Epoch 163 Batch 150/172] avg loss 0.0027376, throughput 4.20069K wps
Begin Testing...
[Epoch 163] train avg loss 0.00266759, dev acc 0.8889, dev avg loss 0.280364, throughput 3.3996K wps
[Epoch 164 Batch 30/172] avg loss 0.00239015, throughput 3.48661K wps
[Epoch 164 Batch 60/172] avg loss 0.00243963, throughput 3.0176K wps
[Epoch 164 Batch 90/172] avg loss 0.00256284, throughput 3.43075K wps
[Epoch 164 Batch 120/172] avg loss 0.00299651, throughput 3.32051K wps
[Epoch 164 Batch 150/172] avg loss 0.00263353, throughput 3.42527K wps
Begin Testing...
[Epoch 164] train avg loss 0.00268654, dev acc 0.8889, dev avg loss 0.280341, throughput 3.41248K wps
[Epoch 165 Batch 30/172] avg loss 0.00241143, throughput 3.21602K wps
[Epoch 165 Batch 60/172] avg loss 0.00266355, throughput 3.56452K wps
[Epoch 165 Batch 90/172] avg loss 0.00291658, throughput 3.5261K wps
[Epoch 165 Batch 120/172] avg loss 0.00298617, throughput 4.01779K wps
[Epoch 165 Batch 150/172] avg loss 0.00247586, throughput 3.43026K wps
Begin Testing...
[Epoch 165] train avg loss 0.0027011, dev acc 0.8899, dev avg loss 0.280477, throughput 3.44721K wps
[Epoch 166 Batch 30/172] avg loss 0.00269765, throughput 3.16534K wps
[Epoch 166 Batch 60/172] avg loss 0.00283265, throughput 3.01576K wps
[Epoch 166 Batch 90/172] avg loss 0.0025097, throughput 3.28839K wps
[Epoch 166 Batch 120/172] avg loss 0.00242149, throughput 3.66987K wps
[Epoch 166 Batch 150/172] avg loss 0.00286828, throughput 3.84245K wps
Begin Testing...
[Epoch 166] train avg loss 0.0026799, dev acc 0.8910, dev avg loss 0.281207, throughput 3.3149K wps
[Epoch 167 Batch 30/172] avg loss 0.00263158, throughput 3.02147K wps
[Epoch 167 Batch 60/172] avg loss 0.00258804, throughput 4.01814K wps
[Epoch 167 Batch 90/172] avg loss 0.00268757, throughput 3.35692K wps
[Epoch 167 Batch 120/172] avg loss 0.00248384, throughput 3.10867K wps
[Epoch 167 Batch 150/172] avg loss 0.00258453, throughput 3.38622K wps
Begin Testing...
[Epoch 167] train avg loss 0.00260743, dev acc 0.8899, dev avg loss 0.280826, throughput 3.31939K wps
[Epoch 168 Batch 30/172] avg loss 0.00276852, throughput 3.42892K wps
[Epoch 168 Batch 60/172] avg loss 0.00238684, throughput 4.32249K wps
[Epoch 168 Batch 90/172] avg loss 0.00248425, throughput 3.16303K wps
[Epoch 168 Batch 120/172] avg loss 0.00264794, throughput 3.2279K wps
[Epoch 168 Batch 150/172] avg loss 0.00267576, throughput 3.48999K wps
Begin Testing...
[Epoch 168] train avg loss 0.00264768, dev acc 0.8899, dev avg loss 0.280595, throughput 3.53748K wps
[Epoch 169 Batch 30/172] avg loss 0.00258595, throughput 3.04484K wps
[Epoch 169 Batch 60/172] avg loss 0.00273659, throughput 3.32567K wps
[Epoch 169 Batch 90/172] avg loss 0.0024953, throughput 3.32729K wps
[Epoch 169 Batch 120/172] avg loss 0.00274037, throughput 3.14171K wps
[Epoch 169 Batch 150/172] avg loss 0.00268777, throughput 3.44916K wps
Begin Testing...
[Epoch 169] train avg loss 0.00266018, dev acc 0.8899, dev avg loss 0.280704, throughput 3.29762K wps
[Epoch 170 Batch 30/172] avg loss 0.00259988, throughput 3.20988K wps
[Epoch 170 Batch 60/172] avg loss 0.00257089, throughput 2.98167K wps
[Epoch 170 Batch 90/172] avg loss 0.00240085, throughput 3.14274K wps
[Epoch 170 Batch 120/172] avg loss 0.0027927, throughput 3.11304K wps
[Epoch 170 Batch 150/172] avg loss 0.0026419, throughput 3.54806K wps
Begin Testing...
[Epoch 170] train avg loss 0.00262273, dev acc 0.8952, dev avg loss 0.281188, throughput 3.24816K wps
Observed Improvement.
Begin Testing...
[Epoch 171 Batch 30/172] avg loss 0.00262244, throughput 2.93547K wps
[Epoch 171 Batch 60/172] avg loss 0.00236531, throughput 3.05243K wps
[Epoch 171 Batch 90/172] avg loss 0.00268793, throughput 2.99279K wps
[Epoch 171 Batch 120/172] avg loss 0.00243763, throughput 3.36414K wps
[Epoch 171 Batch 150/172] avg loss 0.00265205, throughput 3.26667K wps
Begin Testing...
[Epoch 171] train avg loss 0.00257602, dev acc 0.8910, dev avg loss 0.280599, throughput 3.08041K wps
[Epoch 172 Batch 30/172] avg loss 0.00266974, throughput 3.40355K wps
[Epoch 172 Batch 60/172] avg loss 0.00260155, throughput 3.18438K wps
[Epoch 172 Batch 90/172] avg loss 0.00267849, throughput 3.16386K wps
[Epoch 172 Batch 120/172] avg loss 0.00257381, throughput 3.38464K wps
[Epoch 172 Batch 150/172] avg loss 0.0028005, throughput 2.96026K wps
Begin Testing...
[Epoch 172] train avg loss 0.00264413, dev acc 0.8920, dev avg loss 0.280643, throughput 3.23626K wps
[Epoch 173 Batch 30/172] avg loss 0.00238833, throughput 3.03558K wps
[Epoch 173 Batch 60/172] avg loss 0.00263187, throughput 3.13093K wps
[Epoch 173 Batch 90/172] avg loss 0.00251103, throughput 3.50273K wps
[Epoch 173 Batch 120/172] avg loss 0.00225129, throughput 3.51935K wps
[Epoch 173 Batch 150/172] avg loss 0.00239982, throughput 2.91887K wps
Begin Testing...
[Epoch 173] train avg loss 0.00250495, dev acc 0.8899, dev avg loss 0.281501, throughput 3.16837K wps
[Epoch 174 Batch 30/172] avg loss 0.00258337, throughput 3.0211K wps
[Epoch 174 Batch 60/172] avg loss 0.00278747, throughput 3.29235K wps
[Epoch 174 Batch 90/172] avg loss 0.00228651, throughput 3.03707K wps
[Epoch 174 Batch 120/172] avg loss 0.00244399, throughput 3.30091K wps
[Epoch 174 Batch 150/172] avg loss 0.00263828, throughput 3.39094K wps
Begin Testing...
[Epoch 174] train avg loss 0.00251843, dev acc 0.8962, dev avg loss 0.28281, throughput 3.1634K wps
Observed Improvement.
Begin Testing...
[Epoch 175 Batch 30/172] avg loss 0.00252911, throughput 3.69438K wps
[Epoch 175 Batch 60/172] avg loss 0.00231658, throughput 3.58969K wps
[Epoch 175 Batch 90/172] avg loss 0.00268937, throughput 3.26771K wps
[Epoch 175 Batch 120/172] avg loss 0.0024961, throughput 3.93397K wps
[Epoch 175 Batch 150/172] avg loss 0.00264076, throughput 3.52309K wps
Begin Testing...
[Epoch 175] train avg loss 0.00254546, dev acc 0.8910, dev avg loss 0.280856, throughput 3.51746K wps
[Epoch 176 Batch 30/172] avg loss 0.00278778, throughput 3.33092K wps
[Epoch 176 Batch 60/172] avg loss 0.00240737, throughput 3.58022K wps
[Epoch 176 Batch 90/172] avg loss 0.00255056, throughput 3.18306K wps
[Epoch 176 Batch 120/172] avg loss 0.00253932, throughput 3.5578K wps
[Epoch 176 Batch 150/172] avg loss 0.00268557, throughput 3.09532K wps
Begin Testing...
[Epoch 176] train avg loss 0.0026079, dev acc 0.8941, dev avg loss 0.28116, throughput 3.34264K wps
[Epoch 177 Batch 30/172] avg loss 0.00284485, throughput 3.66634K wps
[Epoch 177 Batch 60/172] avg loss 0.00251524, throughput 3.74541K wps
[Epoch 177 Batch 90/172] avg loss 0.00244328, throughput 2.99907K wps
[Epoch 177 Batch 120/172] avg loss 0.00254146, throughput 3.60771K wps
[Epoch 177 Batch 150/172] avg loss 0.00228719, throughput 3.89113K wps
Begin Testing...
[Epoch 177] train avg loss 0.00251377, dev acc 0.8910, dev avg loss 0.282119, throughput 3.54029K wps
[Epoch 178 Batch 30/172] avg loss 0.0028397, throughput 3.70619K wps
[Epoch 178 Batch 60/172] avg loss 0.00271361, throughput 3.71659K wps
[Epoch 178 Batch 90/172] avg loss 0.00245628, throughput 3.39535K wps
[Epoch 178 Batch 120/172] avg loss 0.0022827, throughput 3.41439K wps
[Epoch 178 Batch 150/172] avg loss 0.00212726, throughput 3.91117K wps
Begin Testing...
[Epoch 178] train avg loss 0.00252009, dev acc 0.8952, dev avg loss 0.283216, throughput 3.61012K wps
[Epoch 179 Batch 30/172] avg loss 0.00245624, throughput 3.56843K wps
[Epoch 179 Batch 60/172] avg loss 0.00253692, throughput 3.3619K wps
[Epoch 179 Batch 90/172] avg loss 0.00277841, throughput 3.61074K wps
[Epoch 179 Batch 120/172] avg loss 0.00229037, throughput 3.27653K wps
[Epoch 179 Batch 150/172] avg loss 0.00265663, throughput 3.48221K wps
Begin Testing...
[Epoch 179] train avg loss 0.00253169, dev acc 0.8952, dev avg loss 0.283247, throughput 3.3921K wps
[Epoch 180 Batch 30/172] avg loss 0.00273778, throughput 3.16293K wps
[Epoch 180 Batch 60/172] avg loss 0.00247357, throughput 3.27746K wps
[Epoch 180 Batch 90/172] avg loss 0.00247186, throughput 2.97889K wps
[Epoch 180 Batch 120/172] avg loss 0.00262401, throughput 3.51265K wps
[Epoch 180 Batch 150/172] avg loss 0.00254592, throughput 3.4692K wps
Begin Testing...
[Epoch 180] train avg loss 0.00258112, dev acc 0.8931, dev avg loss 0.281999, throughput 3.25432K wps
[Epoch 181 Batch 30/172] avg loss 0.00252076, throughput 3.06189K wps
[Epoch 181 Batch 60/172] avg loss 0.00292357, throughput 3.23137K wps
[Epoch 181 Batch 90/172] avg loss 0.0023574, throughput 4.02419K wps
[Epoch 181 Batch 120/172] avg loss 0.00256229, throughput 3.15537K wps
[Epoch 181 Batch 150/172] avg loss 0.00237735, throughput 3.10915K wps
Begin Testing...
[Epoch 181] train avg loss 0.00250052, dev acc 0.8931, dev avg loss 0.283716, throughput 3.29466K wps
[Epoch 182 Batch 30/172] avg loss 0.00241751, throughput 3.63679K wps
[Epoch 182 Batch 60/172] avg loss 0.00257537, throughput 3.67774K wps
[Epoch 182 Batch 90/172] avg loss 0.00244783, throughput 3.34554K wps
[Epoch 182 Batch 120/172] avg loss 0.00246414, throughput 3.18462K wps
[Epoch 182 Batch 150/172] avg loss 0.00278141, throughput 3.96144K wps
Begin Testing...
[Epoch 182] train avg loss 0.00250309, dev acc 0.8941, dev avg loss 0.284514, throughput 3.53667K wps
[Epoch 183 Batch 30/172] avg loss 0.00234797, throughput 2.91468K wps
[Epoch 183 Batch 60/172] avg loss 0.00225202, throughput 3.45058K wps
[Epoch 183 Batch 90/172] avg loss 0.00236574, throughput 3.77046K wps
[Epoch 183 Batch 120/172] avg loss 0.00253194, throughput 3.32534K wps
[Epoch 183 Batch 150/172] avg loss 0.00248997, throughput 3.39537K wps
Begin Testing...
[Epoch 183] train avg loss 0.00247939, dev acc 0.8941, dev avg loss 0.282341, throughput 3.30442K wps
[Epoch 184 Batch 30/172] avg loss 0.00246775, throughput 2.9727K wps
[Epoch 184 Batch 60/172] avg loss 0.00245199, throughput 3.26819K wps
[Epoch 184 Batch 90/172] avg loss 0.00244056, throughput 3.44567K wps
[Epoch 184 Batch 120/172] avg loss 0.00251379, throughput 3.42317K wps
[Epoch 184 Batch 150/172] avg loss 0.00260134, throughput 3.04423K wps
Begin Testing...
[Epoch 184] train avg loss 0.00247334, dev acc 0.8931, dev avg loss 0.283391, throughput 3.22656K wps
[Epoch 185 Batch 30/172] avg loss 0.00220184, throughput 4.02567K wps
[Epoch 185 Batch 60/172] avg loss 0.00241329, throughput 3.09006K wps
[Epoch 185 Batch 90/172] avg loss 0.00262132, throughput 3.15146K wps
[Epoch 185 Batch 120/172] avg loss 0.00247202, throughput 3.21017K wps
[Epoch 185 Batch 150/172] avg loss 0.00269754, throughput 3.30579K wps
Begin Testing...
[Epoch 185] train avg loss 0.00247584, dev acc 0.8931, dev avg loss 0.284213, throughput 3.40287K wps
[Epoch 186 Batch 30/172] avg loss 0.00246289, throughput 2.8991K wps
[Epoch 186 Batch 60/172] avg loss 0.00227671, throughput 3.42165K wps
[Epoch 186 Batch 90/172] avg loss 0.00276661, throughput 2.90987K wps
[Epoch 186 Batch 120/172] avg loss 0.00229223, throughput 3.4179K wps
[Epoch 186 Batch 150/172] avg loss 0.00232292, throughput 3.61086K wps
Begin Testing...
[Epoch 186] train avg loss 0.00243692, dev acc 0.8931, dev avg loss 0.286263, throughput 3.27248K wps
[Epoch 187 Batch 30/172] avg loss 0.00253785, throughput 3.52103K wps
[Epoch 187 Batch 60/172] avg loss 0.00251104, throughput 3.26163K wps
[Epoch 187 Batch 90/172] avg loss 0.0024139, throughput 3.20299K wps
[Epoch 187 Batch 120/172] avg loss 0.00235769, throughput 4.15384K wps
[Epoch 187 Batch 150/172] avg loss 0.0024562, throughput 2.91144K wps
Begin Testing...
[Epoch 187] train avg loss 0.00242759, dev acc 0.8941, dev avg loss 0.285408, throughput 3.29629K wps
[Epoch 188 Batch 30/172] avg loss 0.00252798, throughput 3.03227K wps
[Epoch 188 Batch 60/172] avg loss 0.00222952, throughput 3.13588K wps
[Epoch 188 Batch 90/172] avg loss 0.00246211, throughput 3.35631K wps
[Epoch 188 Batch 120/172] avg loss 0.0025379, throughput 2.97874K wps
[Epoch 188 Batch 150/172] avg loss 0.00222779, throughput 3.88777K wps
Begin Testing...
[Epoch 188] train avg loss 0.0024282, dev acc 0.8931, dev avg loss 0.286344, throughput 3.29823K wps
[Epoch 189 Batch 30/172] avg loss 0.00235652, throughput 3.0706K wps
[Epoch 189 Batch 60/172] avg loss 0.00267806, throughput 3.02954K wps
[Epoch 189 Batch 90/172] avg loss 0.0023768, throughput 3.70717K wps
[Epoch 189 Batch 120/172] avg loss 0.00256359, throughput 3.31355K wps
[Epoch 189 Batch 150/172] avg loss 0.00247078, throughput 3.83043K wps
Begin Testing...
[Epoch 189] train avg loss 0.00246254, dev acc 0.8920, dev avg loss 0.284268, throughput 3.40058K wps
[Epoch 190 Batch 30/172] avg loss 0.00223818, throughput 3.22936K wps
[Epoch 190 Batch 60/172] avg loss 0.00226288, throughput 3.27219K wps
[Epoch 190 Batch 90/172] avg loss 0.00243406, throughput 3.11767K wps
[Epoch 190 Batch 120/172] avg loss 0.0021844, throughput 3.40072K wps
[Epoch 190 Batch 150/172] avg loss 0.00259907, throughput 3.11513K wps
Begin Testing...
[Epoch 190] train avg loss 0.00242466, dev acc 0.8941, dev avg loss 0.284901, throughput 3.22184K wps
[Epoch 191 Batch 30/172] avg loss 0.00230373, throughput 2.94363K wps
[Epoch 191 Batch 60/172] avg loss 0.00223094, throughput 3.26063K wps
[Epoch 191 Batch 90/172] avg loss 0.00275348, throughput 3.06669K wps
[Epoch 191 Batch 120/172] avg loss 0.00226104, throughput 3.3103K wps
[Epoch 191 Batch 150/172] avg loss 0.00225658, throughput 3.21287K wps
Begin Testing...
[Epoch 191] train avg loss 0.00239008, dev acc 0.8931, dev avg loss 0.28463, throughput 3.12457K wps
[Epoch 192 Batch 30/172] avg loss 0.00248118, throughput 3.12484K wps
[Epoch 192 Batch 60/172] avg loss 0.00206958, throughput 3.38714K wps
[Epoch 192 Batch 90/172] avg loss 0.00220192, throughput 3.03731K wps
[Epoch 192 Batch 120/172] avg loss 0.00263104, throughput 2.98863K wps
[Epoch 192 Batch 150/172] avg loss 0.00234211, throughput 3.27986K wps
Begin Testing...
[Epoch 192] train avg loss 0.00237972, dev acc 0.8941, dev avg loss 0.28372, throughput 3.15586K wps
[Epoch 193 Batch 30/172] avg loss 0.00206233, throughput 3.29811K wps
[Epoch 193 Batch 60/172] avg loss 0.00242768, throughput 3.39539K wps
[Epoch 193 Batch 90/172] avg loss 0.00268395, throughput 3.66632K wps
[Epoch 193 Batch 120/172] avg loss 0.00245871, throughput 3.40279K wps
[Epoch 193 Batch 150/172] avg loss 0.00229776, throughput 3.53583K wps
Begin Testing...
[Epoch 193] train avg loss 0.00233438, dev acc 0.8962, dev avg loss 0.285691, throughput 3.52961K wps
Observed Improvement.
Begin Testing...
[Epoch 194 Batch 30/172] avg loss 0.00236413, throughput 3.3521K wps
[Epoch 194 Batch 60/172] avg loss 0.00260091, throughput 3.62613K wps
[Epoch 194 Batch 90/172] avg loss 0.00207431, throughput 3.1509K wps
[Epoch 194 Batch 120/172] avg loss 0.00214959, throughput 3.01168K wps
[Epoch 194 Batch 150/172] avg loss 0.00250389, throughput 3.05701K wps
Begin Testing...
[Epoch 194] train avg loss 0.00236181, dev acc 0.8920, dev avg loss 0.284639, throughput 3.28548K wps
[Epoch 195 Batch 30/172] avg loss 0.00247083, throughput 3.24843K wps
[Epoch 195 Batch 60/172] avg loss 0.00245976, throughput 3.33652K wps
[Epoch 195 Batch 90/172] avg loss 0.0020646, throughput 3.62878K wps
[Epoch 195 Batch 120/172] avg loss 0.00221032, throughput 3.57559K wps
[Epoch 195 Batch 150/172] avg loss 0.00241011, throughput 3.05077K wps
Begin Testing...
[Epoch 195] train avg loss 0.00230389, dev acc 0.8962, dev avg loss 0.287509, throughput 3.34482K wps
Observed Improvement.
Begin Testing...
[Epoch 196 Batch 30/172] avg loss 0.00248652, throughput 3.58384K wps
[Epoch 196 Batch 60/172] avg loss 0.00219506, throughput 3.20866K wps
[Epoch 196 Batch 90/172] avg loss 0.0024206, throughput 3.39194K wps
[Epoch 196 Batch 120/172] avg loss 0.00239569, throughput 3.12231K wps
[Epoch 196 Batch 150/172] avg loss 0.00224583, throughput 3.02342K wps
Begin Testing...
[Epoch 196] train avg loss 0.00237805, dev acc 0.8920, dev avg loss 0.285135, throughput 3.21899K wps
[Epoch 197 Batch 30/172] avg loss 0.00235188, throughput 3.2063K wps
[Epoch 197 Batch 60/172] avg loss 0.00267772, throughput 3.05603K wps
[Epoch 197 Batch 90/172] avg loss 0.00235128, throughput 3.61579K wps
[Epoch 197 Batch 120/172] avg loss 0.00222862, throughput 3.41021K wps
[Epoch 197 Batch 150/172] avg loss 0.00238831, throughput 3.23304K wps
Begin Testing...
[Epoch 197] train avg loss 0.00237807, dev acc 0.8941, dev avg loss 0.284904, throughput 3.34303K wps
[Epoch 198 Batch 30/172] avg loss 0.00222304, throughput 3.04445K wps
[Epoch 198 Batch 60/172] avg loss 0.00211649, throughput 3.4547K wps
[Epoch 198 Batch 90/172] avg loss 0.00213125, throughput 2.97708K wps
[Epoch 198 Batch 120/172] avg loss 0.00237114, throughput 3.24595K wps
[Epoch 198 Batch 150/172] avg loss 0.00255637, throughput 3.5486K wps
Begin Testing...
[Epoch 198] train avg loss 0.0023204, dev acc 0.8952, dev avg loss 0.285244, throughput 3.23275K wps
[Epoch 199 Batch 30/172] avg loss 0.00245105, throughput 3.12498K wps
[Epoch 199 Batch 60/172] avg loss 0.00245788, throughput 3.18999K wps
[Epoch 199 Batch 90/172] avg loss 0.0021858, throughput 3.37972K wps
[Epoch 199 Batch 120/172] avg loss 0.00222385, throughput 3.41883K wps
[Epoch 199 Batch 150/172] avg loss 0.00219016, throughput 3.38529K wps
Begin Testing...
[Epoch 199] train avg loss 0.00231718, dev acc 0.8941, dev avg loss 0.285919, throughput 3.30118K wps
Test loss 0.341564, test acc 0.8849
Total time cost 393.44s
[Epoch 0 Batch 30/172] avg loss 0.0127982, throughput 2.78465K wps
[Epoch 0 Batch 60/172] avg loss 0.01229, throughput 2.9348K wps
[Epoch 0 Batch 90/172] avg loss 0.0124257, throughput 3.47807K wps
[Epoch 0 Batch 120/172] avg loss 0.0123104, throughput 3.36287K wps
[Epoch 0 Batch 150/172] avg loss 0.0122975, throughput 3.18255K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124122, dev acc 0.6771, dev avg loss 0.617801, throughput 3.10396K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0119565, throughput 3.04726K wps
[Epoch 1 Batch 60/172] avg loss 0.0120373, throughput 3.68889K wps
[Epoch 1 Batch 90/172] avg loss 0.0123518, throughput 4.15369K wps
[Epoch 1 Batch 120/172] avg loss 0.0120913, throughput 3.08442K wps
[Epoch 1 Batch 150/172] avg loss 0.0120308, throughput 3.70299K wps
Begin Testing...
[Epoch 1] train avg loss 0.0120763, dev acc 0.6771, dev avg loss 0.604518, throughput 3.49589K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0118358, throughput 3.32892K wps
[Epoch 2 Batch 60/172] avg loss 0.0118452, throughput 3.76431K wps
[Epoch 2 Batch 90/172] avg loss 0.0119833, throughput 3.38022K wps
[Epoch 2 Batch 120/172] avg loss 0.0118246, throughput 2.94091K wps
[Epoch 2 Batch 150/172] avg loss 0.0117414, throughput 3.64197K wps
Begin Testing...
[Epoch 2] train avg loss 0.0117974, dev acc 0.6771, dev avg loss 0.591004, throughput 3.43577K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.011726, throughput 3.49885K wps
[Epoch 3 Batch 60/172] avg loss 0.0114146, throughput 3.42523K wps
[Epoch 3 Batch 90/172] avg loss 0.0114951, throughput 3.49032K wps
[Epoch 3 Batch 120/172] avg loss 0.0115168, throughput 2.98526K wps
[Epoch 3 Batch 150/172] avg loss 0.0113317, throughput 2.97224K wps
Begin Testing...
[Epoch 3] train avg loss 0.0114842, dev acc 0.6771, dev avg loss 0.574553, throughput 3.33967K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0111187, throughput 3.26444K wps
[Epoch 4 Batch 60/172] avg loss 0.0115738, throughput 2.99966K wps
[Epoch 4 Batch 90/172] avg loss 0.0112798, throughput 3.4921K wps
[Epoch 4 Batch 120/172] avg loss 0.0108782, throughput 3.75763K wps
[Epoch 4 Batch 150/172] avg loss 0.0110157, throughput 3.1421K wps
Begin Testing...
[Epoch 4] train avg loss 0.0111255, dev acc 0.6813, dev avg loss 0.556844, throughput 3.2698K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0108732, throughput 3.16582K wps
[Epoch 5 Batch 60/172] avg loss 0.0106675, throughput 3.00606K wps
[Epoch 5 Batch 90/172] avg loss 0.0108672, throughput 3.47363K wps
[Epoch 5 Batch 120/172] avg loss 0.0104335, throughput 3.34745K wps
[Epoch 5 Batch 150/172] avg loss 0.0108129, throughput 3.77888K wps
Begin Testing...
[Epoch 5] train avg loss 0.0107475, dev acc 0.7138, dev avg loss 0.535051, throughput 3.3419K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0103151, throughput 3.13981K wps
[Epoch 6 Batch 60/172] avg loss 0.0103266, throughput 3.04226K wps
[Epoch 6 Batch 90/172] avg loss 0.0101016, throughput 2.97133K wps
[Epoch 6 Batch 120/172] avg loss 0.0104626, throughput 3.32751K wps
[Epoch 6 Batch 150/172] avg loss 0.0100681, throughput 3.38334K wps
Begin Testing...
[Epoch 6] train avg loss 0.0102777, dev acc 0.7589, dev avg loss 0.510968, throughput 3.24689K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.0101584, throughput 3.20643K wps
[Epoch 7 Batch 60/172] avg loss 0.00985858, throughput 3.75392K wps
[Epoch 7 Batch 90/172] avg loss 0.00972202, throughput 3.50284K wps
[Epoch 7 Batch 120/172] avg loss 0.00997329, throughput 3.44332K wps
[Epoch 7 Batch 150/172] avg loss 0.00965308, throughput 3.32642K wps
Begin Testing...
[Epoch 7] train avg loss 0.00984861, dev acc 0.7642, dev avg loss 0.486076, throughput 3.48442K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00930513, throughput 3.06309K wps
[Epoch 8 Batch 60/172] avg loss 0.00944635, throughput 3.19551K wps
[Epoch 8 Batch 90/172] avg loss 0.00952802, throughput 3.56081K wps
[Epoch 8 Batch 120/172] avg loss 0.00931012, throughput 3.38447K wps
[Epoch 8 Batch 150/172] avg loss 0.00925462, throughput 3.78478K wps
Begin Testing...
[Epoch 8] train avg loss 0.00934376, dev acc 0.7883, dev avg loss 0.461294, throughput 3.35622K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00908691, throughput 3.64108K wps
[Epoch 9 Batch 60/172] avg loss 0.00908601, throughput 3.07746K wps
[Epoch 9 Batch 90/172] avg loss 0.00876498, throughput 3.48292K wps
[Epoch 9 Batch 120/172] avg loss 0.00898411, throughput 3.37758K wps
[Epoch 9 Batch 150/172] avg loss 0.00859089, throughput 3.26255K wps
Begin Testing...
[Epoch 9] train avg loss 0.00886546, dev acc 0.8071, dev avg loss 0.438025, throughput 3.37878K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00835651, throughput 3.02703K wps
[Epoch 10 Batch 60/172] avg loss 0.00872133, throughput 3.13929K wps
[Epoch 10 Batch 90/172] avg loss 0.00846815, throughput 3.73766K wps
[Epoch 10 Batch 120/172] avg loss 0.00846549, throughput 3.26847K wps
[Epoch 10 Batch 150/172] avg loss 0.00832669, throughput 3.32793K wps
Begin Testing...
[Epoch 10] train avg loss 0.00841942, dev acc 0.8218, dev avg loss 0.415605, throughput 3.28204K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00793525, throughput 3.2853K wps
[Epoch 11 Batch 60/172] avg loss 0.00820808, throughput 3.20521K wps
[Epoch 11 Batch 90/172] avg loss 0.0080386, throughput 3.94433K wps
[Epoch 11 Batch 120/172] avg loss 0.00796055, throughput 3.21547K wps
[Epoch 11 Batch 150/172] avg loss 0.00788415, throughput 3.44239K wps
Begin Testing...
[Epoch 11] train avg loss 0.00802932, dev acc 0.8470, dev avg loss 0.394025, throughput 3.3651K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00759623, throughput 2.97352K wps
[Epoch 12 Batch 60/172] avg loss 0.00758202, throughput 3.45238K wps
[Epoch 12 Batch 90/172] avg loss 0.00793183, throughput 3.26314K wps
[Epoch 12 Batch 120/172] avg loss 0.00767516, throughput 3.36551K wps
[Epoch 12 Batch 150/172] avg loss 0.00760781, throughput 2.98049K wps
Begin Testing...
[Epoch 12] train avg loss 0.00765905, dev acc 0.8491, dev avg loss 0.377887, throughput 3.17565K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.00725746, throughput 3.40061K wps
[Epoch 13 Batch 60/172] avg loss 0.00762335, throughput 3.14914K wps
[Epoch 13 Batch 90/172] avg loss 0.00750104, throughput 3.19361K wps
[Epoch 13 Batch 120/172] avg loss 0.00747123, throughput 3.24362K wps
[Epoch 13 Batch 150/172] avg loss 0.00713497, throughput 3.09195K wps
Begin Testing...
[Epoch 13] train avg loss 0.00736467, dev acc 0.8553, dev avg loss 0.363179, throughput 3.18582K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.0074028, throughput 2.89832K wps
[Epoch 14 Batch 60/172] avg loss 0.00730026, throughput 3.15706K wps
[Epoch 14 Batch 90/172] avg loss 0.00706134, throughput 3.5341K wps
[Epoch 14 Batch 120/172] avg loss 0.00727455, throughput 3.27343K wps
[Epoch 14 Batch 150/172] avg loss 0.00646804, throughput 3.33825K wps
Begin Testing...
[Epoch 14] train avg loss 0.00713862, dev acc 0.8753, dev avg loss 0.351141, throughput 3.25116K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00697023, throughput 3.37453K wps
[Epoch 15 Batch 60/172] avg loss 0.0068267, throughput 3.41661K wps
[Epoch 15 Batch 90/172] avg loss 0.00644966, throughput 3.30453K wps
[Epoch 15 Batch 120/172] avg loss 0.00724827, throughput 3.30594K wps
[Epoch 15 Batch 150/172] avg loss 0.00707762, throughput 3.53815K wps
Begin Testing...
[Epoch 15] train avg loss 0.00688733, dev acc 0.8742, dev avg loss 0.341051, throughput 3.42101K wps
[Epoch 16 Batch 30/172] avg loss 0.00648742, throughput 3.14819K wps
[Epoch 16 Batch 60/172] avg loss 0.00736078, throughput 3.2772K wps
[Epoch 16 Batch 90/172] avg loss 0.00657236, throughput 3.49287K wps
[Epoch 16 Batch 120/172] avg loss 0.00684128, throughput 3.42337K wps
[Epoch 16 Batch 150/172] avg loss 0.00685311, throughput 3.28975K wps
Begin Testing...
[Epoch 16] train avg loss 0.00676837, dev acc 0.8805, dev avg loss 0.333179, throughput 3.36936K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.00657836, throughput 3.50982K wps
[Epoch 17 Batch 60/172] avg loss 0.00613476, throughput 3.63736K wps
[Epoch 17 Batch 90/172] avg loss 0.00642265, throughput 2.90592K wps
[Epoch 17 Batch 120/172] avg loss 0.00672222, throughput 2.98929K wps
[Epoch 17 Batch 150/172] avg loss 0.00712704, throughput 3.72736K wps
Begin Testing...
[Epoch 17] train avg loss 0.0065956, dev acc 0.8836, dev avg loss 0.326493, throughput 3.34471K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00678602, throughput 3.53284K wps
[Epoch 18 Batch 60/172] avg loss 0.00630158, throughput 3.34801K wps
[Epoch 18 Batch 90/172] avg loss 0.0064707, throughput 3.18011K wps
[Epoch 18 Batch 120/172] avg loss 0.00601532, throughput 3.02687K wps
[Epoch 18 Batch 150/172] avg loss 0.00659838, throughput 3.17772K wps
Begin Testing...
[Epoch 18] train avg loss 0.00646796, dev acc 0.8857, dev avg loss 0.320802, throughput 3.24819K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00627247, throughput 3.59021K wps
[Epoch 19 Batch 60/172] avg loss 0.00607763, throughput 3.81152K wps
[Epoch 19 Batch 90/172] avg loss 0.00647683, throughput 3.74142K wps
[Epoch 19 Batch 120/172] avg loss 0.00646932, throughput 2.99458K wps
[Epoch 19 Batch 150/172] avg loss 0.00630457, throughput 2.96611K wps
Begin Testing...
[Epoch 19] train avg loss 0.00633189, dev acc 0.8857, dev avg loss 0.31618, throughput 3.41457K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.0065614, throughput 3.89443K wps
[Epoch 20 Batch 60/172] avg loss 0.00597925, throughput 3.51235K wps
[Epoch 20 Batch 90/172] avg loss 0.00612721, throughput 3.56807K wps
[Epoch 20 Batch 120/172] avg loss 0.00599726, throughput 3.28856K wps
[Epoch 20 Batch 150/172] avg loss 0.00648109, throughput 3.35098K wps
Begin Testing...
[Epoch 20] train avg loss 0.00622466, dev acc 0.8889, dev avg loss 0.31227, throughput 3.48293K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.00562228, throughput 3.02953K wps
[Epoch 21 Batch 60/172] avg loss 0.00609999, throughput 3.16553K wps
[Epoch 21 Batch 90/172] avg loss 0.00688194, throughput 3.20462K wps
[Epoch 21 Batch 120/172] avg loss 0.00629079, throughput 3.05826K wps
[Epoch 21 Batch 150/172] avg loss 0.00554651, throughput 3.17253K wps
Begin Testing...
[Epoch 21] train avg loss 0.00616149, dev acc 0.8899, dev avg loss 0.308605, throughput 3.14591K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00639972, throughput 3.01552K wps
[Epoch 22 Batch 60/172] avg loss 0.00592813, throughput 3.60322K wps
[Epoch 22 Batch 90/172] avg loss 0.00621364, throughput 3.21763K wps
[Epoch 22 Batch 120/172] avg loss 0.00603247, throughput 3.9898K wps
[Epoch 22 Batch 150/172] avg loss 0.00594659, throughput 3.61167K wps
Begin Testing...
[Epoch 22] train avg loss 0.00608913, dev acc 0.8920, dev avg loss 0.305753, throughput 3.44161K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.00587407, throughput 3.20209K wps
[Epoch 23 Batch 60/172] avg loss 0.0059014, throughput 3.46018K wps
[Epoch 23 Batch 90/172] avg loss 0.00613681, throughput 3.47136K wps
[Epoch 23 Batch 120/172] avg loss 0.00622571, throughput 3.00362K wps
[Epoch 23 Batch 150/172] avg loss 0.00603694, throughput 3.64637K wps
Begin Testing...
[Epoch 23] train avg loss 0.0059857, dev acc 0.8920, dev avg loss 0.304315, throughput 3.31721K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00570895, throughput 3.23601K wps
[Epoch 24 Batch 60/172] avg loss 0.00589211, throughput 3.67368K wps
[Epoch 24 Batch 90/172] avg loss 0.00600316, throughput 3.72772K wps
[Epoch 24 Batch 120/172] avg loss 0.00591926, throughput 3.15882K wps
[Epoch 24 Batch 150/172] avg loss 0.0060188, throughput 3.22944K wps
Begin Testing...
[Epoch 24] train avg loss 0.00586868, dev acc 0.8910, dev avg loss 0.301408, throughput 3.31687K wps
[Epoch 25 Batch 30/172] avg loss 0.0063269, throughput 3.43345K wps
[Epoch 25 Batch 60/172] avg loss 0.00591282, throughput 3.57502K wps
[Epoch 25 Batch 90/172] avg loss 0.00580811, throughput 3.21349K wps
[Epoch 25 Batch 120/172] avg loss 0.00577394, throughput 3.65581K wps
[Epoch 25 Batch 150/172] avg loss 0.00598485, throughput 3.64547K wps
Begin Testing...
[Epoch 25] train avg loss 0.00590113, dev acc 0.8931, dev avg loss 0.299564, throughput 3.48887K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/172] avg loss 0.00603116, throughput 2.97689K wps
[Epoch 26 Batch 60/172] avg loss 0.00576161, throughput 3.64211K wps
[Epoch 26 Batch 90/172] avg loss 0.00553021, throughput 3.53367K wps
[Epoch 26 Batch 120/172] avg loss 0.00606279, throughput 3.1364K wps
[Epoch 26 Batch 150/172] avg loss 0.00605493, throughput 3.15281K wps
Begin Testing...
[Epoch 26] train avg loss 0.00579666, dev acc 0.8899, dev avg loss 0.297214, throughput 3.24437K wps
[Epoch 27 Batch 30/172] avg loss 0.00611505, throughput 3.34335K wps
[Epoch 27 Batch 60/172] avg loss 0.0056016, throughput 3.70474K wps
[Epoch 27 Batch 90/172] avg loss 0.00510629, throughput 3.67326K wps
[Epoch 27 Batch 120/172] avg loss 0.00571327, throughput 3.0384K wps
[Epoch 27 Batch 150/172] avg loss 0.00596378, throughput 3.27404K wps
Begin Testing...
[Epoch 27] train avg loss 0.00575504, dev acc 0.8941, dev avg loss 0.295338, throughput 3.32986K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/172] avg loss 0.00622277, throughput 3.56473K wps
[Epoch 28 Batch 60/172] avg loss 0.00574544, throughput 3.55972K wps
[Epoch 28 Batch 90/172] avg loss 0.00556336, throughput 3.63121K wps
[Epoch 28 Batch 120/172] avg loss 0.00560993, throughput 3.35005K wps
[Epoch 28 Batch 150/172] avg loss 0.00551379, throughput 3.39006K wps
Begin Testing...
[Epoch 28] train avg loss 0.00571819, dev acc 0.8931, dev avg loss 0.29433, throughput 3.46538K wps
[Epoch 29 Batch 30/172] avg loss 0.00558681, throughput 3.06362K wps
[Epoch 29 Batch 60/172] avg loss 0.00544373, throughput 3.02757K wps
[Epoch 29 Batch 90/172] avg loss 0.00605611, throughput 3.03936K wps
[Epoch 29 Batch 120/172] avg loss 0.005678, throughput 2.98043K wps
[Epoch 29 Batch 150/172] avg loss 0.00614458, throughput 3.15587K wps
Begin Testing...
[Epoch 29] train avg loss 0.00571061, dev acc 0.8931, dev avg loss 0.292794, throughput 3.08372K wps
[Epoch 30 Batch 30/172] avg loss 0.00550298, throughput 3.10372K wps
[Epoch 30 Batch 60/172] avg loss 0.00555565, throughput 3.17196K wps
[Epoch 30 Batch 90/172] avg loss 0.00532579, throughput 3.26883K wps
[Epoch 30 Batch 120/172] avg loss 0.00602697, throughput 3.03941K wps
[Epoch 30 Batch 150/172] avg loss 0.00569152, throughput 4.04788K wps
Begin Testing...
[Epoch 30] train avg loss 0.00564914, dev acc 0.8931, dev avg loss 0.291421, throughput 3.31151K wps
[Epoch 31 Batch 30/172] avg loss 0.00514577, throughput 3.21046K wps
[Epoch 31 Batch 60/172] avg loss 0.00557991, throughput 3.00451K wps
[Epoch 31 Batch 90/172] avg loss 0.00508484, throughput 3.31458K wps
[Epoch 31 Batch 120/172] avg loss 0.00556178, throughput 3.02015K wps
[Epoch 31 Batch 150/172] avg loss 0.00575864, throughput 3.01189K wps
Begin Testing...
[Epoch 31] train avg loss 0.00553703, dev acc 0.8983, dev avg loss 0.290356, throughput 3.09293K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/172] avg loss 0.0054737, throughput 3.07115K wps
[Epoch 32 Batch 60/172] avg loss 0.00555472, throughput 3.86421K wps
[Epoch 32 Batch 90/172] avg loss 0.00544748, throughput 3.46085K wps
[Epoch 32 Batch 120/172] avg loss 0.00561007, throughput 3.96211K wps
[Epoch 32 Batch 150/172] avg loss 0.00562267, throughput 3.29976K wps
Begin Testing...
[Epoch 32] train avg loss 0.00552142, dev acc 0.8941, dev avg loss 0.289265, throughput 3.47669K wps
[Epoch 33 Batch 30/172] avg loss 0.00525618, throughput 3.75304K wps
[Epoch 33 Batch 60/172] avg loss 0.00581399, throughput 3.19146K wps
[Epoch 33 Batch 90/172] avg loss 0.00578987, throughput 3.08739K wps
[Epoch 33 Batch 120/172] avg loss 0.0049016, throughput 3.55408K wps
[Epoch 33 Batch 150/172] avg loss 0.00553172, throughput 3.76999K wps
Begin Testing...
[Epoch 33] train avg loss 0.00548066, dev acc 0.8994, dev avg loss 0.288308, throughput 3.38968K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/172] avg loss 0.00520953, throughput 3.11003K wps
[Epoch 34 Batch 60/172] avg loss 0.00540849, throughput 3.22163K wps
[Epoch 34 Batch 90/172] avg loss 0.00630699, throughput 3.20187K wps
[Epoch 34 Batch 120/172] avg loss 0.00533808, throughput 3.03114K wps
[Epoch 34 Batch 150/172] avg loss 0.00521373, throughput 3.00355K wps
Begin Testing...
[Epoch 34] train avg loss 0.00545424, dev acc 0.8973, dev avg loss 0.287714, throughput 3.17729K wps
[Epoch 35 Batch 30/172] avg loss 0.00515304, throughput 3.36409K wps
[Epoch 35 Batch 60/172] avg loss 0.00545778, throughput 3.11201K wps
[Epoch 35 Batch 90/172] avg loss 0.00549705, throughput 3.43476K wps
[Epoch 35 Batch 120/172] avg loss 0.00562472, throughput 3.11999K wps
[Epoch 35 Batch 150/172] avg loss 0.00549986, throughput 3.23832K wps
Begin Testing...
[Epoch 35] train avg loss 0.00538916, dev acc 0.8983, dev avg loss 0.287681, throughput 3.23616K wps
[Epoch 36 Batch 30/172] avg loss 0.00564579, throughput 3.56026K wps
[Epoch 36 Batch 60/172] avg loss 0.0054558, throughput 3.41658K wps
[Epoch 36 Batch 90/172] avg loss 0.00522283, throughput 3.5136K wps
[Epoch 36 Batch 120/172] avg loss 0.00544419, throughput 3.02239K wps
[Epoch 36 Batch 150/172] avg loss 0.00497312, throughput 3.44846K wps
Begin Testing...
[Epoch 36] train avg loss 0.00544124, dev acc 0.8973, dev avg loss 0.286339, throughput 3.43812K wps
[Epoch 37 Batch 30/172] avg loss 0.00520778, throughput 2.98671K wps
[Epoch 37 Batch 60/172] avg loss 0.00527388, throughput 3.4742K wps
[Epoch 37 Batch 90/172] avg loss 0.00585047, throughput 3.07632K wps
[Epoch 37 Batch 120/172] avg loss 0.00533093, throughput 3.44589K wps
[Epoch 37 Batch 150/172] avg loss 0.00521331, throughput 2.99545K wps
Begin Testing...
[Epoch 37] train avg loss 0.00533951, dev acc 0.8962, dev avg loss 0.285288, throughput 3.15864K wps
[Epoch 38 Batch 30/172] avg loss 0.00523277, throughput 3.75801K wps
[Epoch 38 Batch 60/172] avg loss 0.00571166, throughput 3.63885K wps
[Epoch 38 Batch 90/172] avg loss 0.00536461, throughput 3.08745K wps
[Epoch 38 Batch 120/172] avg loss 0.00517326, throughput 3.08035K wps
[Epoch 38 Batch 150/172] avg loss 0.0050693, throughput 2.93451K wps
Begin Testing...
[Epoch 38] train avg loss 0.00526517, dev acc 0.8973, dev avg loss 0.284731, throughput 3.22768K wps
[Epoch 39 Batch 30/172] avg loss 0.00529398, throughput 3.18039K wps
[Epoch 39 Batch 60/172] avg loss 0.00479319, throughput 2.8845K wps
[Epoch 39 Batch 90/172] avg loss 0.00557481, throughput 3.54063K wps
[Epoch 39 Batch 120/172] avg loss 0.00532162, throughput 3.09932K wps
[Epoch 39 Batch 150/172] avg loss 0.00541419, throughput 3.25622K wps
Begin Testing...
[Epoch 39] train avg loss 0.00528059, dev acc 0.8983, dev avg loss 0.28402, throughput 3.17297K wps
[Epoch 40 Batch 30/172] avg loss 0.004964, throughput 3.01591K wps
[Epoch 40 Batch 60/172] avg loss 0.00478986, throughput 3.81126K wps
[Epoch 40 Batch 90/172] avg loss 0.0056067, throughput 3.24244K wps
[Epoch 40 Batch 120/172] avg loss 0.00507842, throughput 3.08096K wps
[Epoch 40 Batch 150/172] avg loss 0.0055434, throughput 3.15243K wps
Begin Testing...
[Epoch 40] train avg loss 0.0052378, dev acc 0.8973, dev avg loss 0.283286, throughput 3.28141K wps
[Epoch 41 Batch 30/172] avg loss 0.00530673, throughput 3.24001K wps
[Epoch 41 Batch 60/172] avg loss 0.00490352, throughput 3.29554K wps
[Epoch 41 Batch 90/172] avg loss 0.0053384, throughput 2.96943K wps
[Epoch 41 Batch 120/172] avg loss 0.00555994, throughput 3.56537K wps
[Epoch 41 Batch 150/172] avg loss 0.00498874, throughput 3.08027K wps
Begin Testing...
[Epoch 41] train avg loss 0.00521128, dev acc 0.9015, dev avg loss 0.283055, throughput 3.19172K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/172] avg loss 0.00513326, throughput 3.27386K wps
[Epoch 42 Batch 60/172] avg loss 0.00516212, throughput 3.73545K wps
[Epoch 42 Batch 90/172] avg loss 0.00518252, throughput 3.16148K wps
[Epoch 42 Batch 120/172] avg loss 0.0053098, throughput 3.82772K wps
[Epoch 42 Batch 150/172] avg loss 0.00492764, throughput 3.69591K wps
Begin Testing...
[Epoch 42] train avg loss 0.00515201, dev acc 0.8983, dev avg loss 0.281838, throughput 3.51211K wps
[Epoch 43 Batch 30/172] avg loss 0.00499269, throughput 3.09746K wps
[Epoch 43 Batch 60/172] avg loss 0.00520379, throughput 3.1402K wps
[Epoch 43 Batch 90/172] avg loss 0.00549911, throughput 3.48334K wps
[Epoch 43 Batch 120/172] avg loss 0.00519899, throughput 4.00399K wps
[Epoch 43 Batch 150/172] avg loss 0.00468773, throughput 3.31973K wps
Begin Testing...
[Epoch 43] train avg loss 0.00514741, dev acc 0.8973, dev avg loss 0.281361, throughput 3.33899K wps
[Epoch 44 Batch 30/172] avg loss 0.00505977, throughput 3.21386K wps
[Epoch 44 Batch 60/172] avg loss 0.00511335, throughput 3.75458K wps
[Epoch 44 Batch 90/172] avg loss 0.00519309, throughput 3.70459K wps
[Epoch 44 Batch 120/172] avg loss 0.00491723, throughput 3.57189K wps
[Epoch 44 Batch 150/172] avg loss 0.00488669, throughput 3.65566K wps
Begin Testing...
[Epoch 44] train avg loss 0.00511566, dev acc 0.8962, dev avg loss 0.280959, throughput 3.49205K wps
[Epoch 45 Batch 30/172] avg loss 0.00526741, throughput 3.01628K wps
[Epoch 45 Batch 60/172] avg loss 0.00477238, throughput 3.47119K wps
[Epoch 45 Batch 90/172] avg loss 0.00486069, throughput 3.15496K wps
[Epoch 45 Batch 120/172] avg loss 0.00520067, throughput 3.43344K wps
[Epoch 45 Batch 150/172] avg loss 0.00518719, throughput 3.37706K wps
Begin Testing...
[Epoch 45] train avg loss 0.00507889, dev acc 0.8994, dev avg loss 0.280197, throughput 3.28565K wps
[Epoch 46 Batch 30/172] avg loss 0.00467982, throughput 3.34841K wps
[Epoch 46 Batch 60/172] avg loss 0.00504622, throughput 3.01212K wps
[Epoch 46 Batch 90/172] avg loss 0.00528426, throughput 3.16453K wps
[Epoch 46 Batch 120/172] avg loss 0.00541596, throughput 3.58147K wps
[Epoch 46 Batch 150/172] avg loss 0.00492249, throughput 3.58857K wps
Begin Testing...
[Epoch 46] train avg loss 0.00506603, dev acc 0.9015, dev avg loss 0.280344, throughput 3.28962K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/172] avg loss 0.00504652, throughput 3.24147K wps
[Epoch 47 Batch 60/172] avg loss 0.00513317, throughput 3.39314K wps
[Epoch 47 Batch 90/172] avg loss 0.0053662, throughput 3.08422K wps
[Epoch 47 Batch 120/172] avg loss 0.00499246, throughput 3.60344K wps
[Epoch 47 Batch 150/172] avg loss 0.00509776, throughput 3.31078K wps
Begin Testing...
[Epoch 47] train avg loss 0.00504974, dev acc 0.9015, dev avg loss 0.280111, throughput 3.2691K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/172] avg loss 0.00520832, throughput 3.71959K wps
[Epoch 48 Batch 60/172] avg loss 0.00497805, throughput 3.22978K wps
[Epoch 48 Batch 90/172] avg loss 0.0047699, throughput 3.42274K wps
[Epoch 48 Batch 120/172] avg loss 0.00526881, throughput 3.74157K wps
[Epoch 48 Batch 150/172] avg loss 0.00452233, throughput 3.75805K wps
Begin Testing...
[Epoch 48] train avg loss 0.00500622, dev acc 0.8983, dev avg loss 0.278958, throughput 3.57803K wps
[Epoch 49 Batch 30/172] avg loss 0.0050531, throughput 3.1311K wps
[Epoch 49 Batch 60/172] avg loss 0.00447771, throughput 3.12866K wps
[Epoch 49 Batch 90/172] avg loss 0.00513292, throughput 3.279K wps
[Epoch 49 Batch 120/172] avg loss 0.00469577, throughput 3.09676K wps
[Epoch 49 Batch 150/172] avg loss 0.0054333, throughput 3.39567K wps
Begin Testing...
[Epoch 49] train avg loss 0.00491711, dev acc 0.9004, dev avg loss 0.281373, throughput 3.25618K wps
[Epoch 50 Batch 30/172] avg loss 0.00503142, throughput 2.88799K wps
[Epoch 50 Batch 60/172] avg loss 0.00494135, throughput 3.5839K wps
[Epoch 50 Batch 90/172] avg loss 0.00475804, throughput 3.64239K wps
[Epoch 50 Batch 120/172] avg loss 0.00520745, throughput 3.47933K wps
[Epoch 50 Batch 150/172] avg loss 0.00493214, throughput 3.13378K wps
Begin Testing...
[Epoch 50] train avg loss 0.00498527, dev acc 0.8994, dev avg loss 0.278221, throughput 3.25961K wps
[Epoch 51 Batch 30/172] avg loss 0.00470687, throughput 3.22593K wps
[Epoch 51 Batch 60/172] avg loss 0.00453721, throughput 3.91667K wps
[Epoch 51 Batch 90/172] avg loss 0.00488762, throughput 3.1613K wps
[Epoch 51 Batch 120/172] avg loss 0.00537221, throughput 3.23131K wps
[Epoch 51 Batch 150/172] avg loss 0.00524161, throughput 3.51864K wps
Begin Testing...
[Epoch 51] train avg loss 0.00492184, dev acc 0.9015, dev avg loss 0.278007, throughput 3.33463K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/172] avg loss 0.00475031, throughput 3.1474K wps
[Epoch 52 Batch 60/172] avg loss 0.00451594, throughput 3.75738K wps
[Epoch 52 Batch 90/172] avg loss 0.00535038, throughput 2.89751K wps
[Epoch 52 Batch 120/172] avg loss 0.00514894, throughput 3.00369K wps
[Epoch 52 Batch 150/172] avg loss 0.00436197, throughput 3.56248K wps
Begin Testing...
[Epoch 52] train avg loss 0.00488662, dev acc 0.9036, dev avg loss 0.277754, throughput 3.26262K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/172] avg loss 0.00493143, throughput 3.19856K wps
[Epoch 53 Batch 60/172] avg loss 0.00469149, throughput 3.47142K wps
[Epoch 53 Batch 90/172] avg loss 0.00447562, throughput 3.32274K wps
[Epoch 53 Batch 120/172] avg loss 0.00471089, throughput 3.51678K wps
[Epoch 53 Batch 150/172] avg loss 0.00523027, throughput 3.56865K wps
Begin Testing...
[Epoch 53] train avg loss 0.00486046, dev acc 0.9015, dev avg loss 0.276527, throughput 3.41825K wps
[Epoch 54 Batch 30/172] avg loss 0.00486478, throughput 3.00135K wps
[Epoch 54 Batch 60/172] avg loss 0.00443831, throughput 3.34327K wps
[Epoch 54 Batch 90/172] avg loss 0.005294, throughput 3.51421K wps
[Epoch 54 Batch 120/172] avg loss 0.00503384, throughput 3.43663K wps
[Epoch 54 Batch 150/172] avg loss 0.00480233, throughput 3.21208K wps
Begin Testing...
[Epoch 54] train avg loss 0.00484104, dev acc 0.9025, dev avg loss 0.276552, throughput 3.28125K wps
[Epoch 55 Batch 30/172] avg loss 0.00506931, throughput 3.05275K wps
[Epoch 55 Batch 60/172] avg loss 0.00462282, throughput 3.40393K wps
[Epoch 55 Batch 90/172] avg loss 0.00453719, throughput 3.15114K wps
[Epoch 55 Batch 120/172] avg loss 0.00461675, throughput 3.47989K wps
[Epoch 55 Batch 150/172] avg loss 0.00519266, throughput 3.68835K wps
Begin Testing...
[Epoch 55] train avg loss 0.00481306, dev acc 0.9036, dev avg loss 0.275718, throughput 3.30445K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/172] avg loss 0.00483887, throughput 3.17435K wps
[Epoch 56 Batch 60/172] avg loss 0.00474217, throughput 3.96298K wps
[Epoch 56 Batch 90/172] avg loss 0.00458859, throughput 3.39785K wps
[Epoch 56 Batch 120/172] avg loss 0.00508376, throughput 3.91414K wps
[Epoch 56 Batch 150/172] avg loss 0.00441207, throughput 3.2988K wps
Begin Testing...
[Epoch 56] train avg loss 0.00474882, dev acc 0.9004, dev avg loss 0.275401, throughput 3.56139K wps
[Epoch 57 Batch 30/172] avg loss 0.00500956, throughput 3.51816K wps
[Epoch 57 Batch 60/172] avg loss 0.00436793, throughput 3.3235K wps
[Epoch 57 Batch 90/172] avg loss 0.00573365, throughput 3.36605K wps
[Epoch 57 Batch 120/172] avg loss 0.00452325, throughput 3.86883K wps
[Epoch 57 Batch 150/172] avg loss 0.00458094, throughput 3.25476K wps
Begin Testing...
[Epoch 57] train avg loss 0.00478203, dev acc 0.9025, dev avg loss 0.277034, throughput 3.3865K wps
[Epoch 58 Batch 30/172] avg loss 0.00476269, throughput 3.21799K wps
[Epoch 58 Batch 60/172] avg loss 0.00474581, throughput 3.00465K wps
[Epoch 58 Batch 90/172] avg loss 0.00451577, throughput 3.07569K wps
[Epoch 58 Batch 120/172] avg loss 0.00494537, throughput 3.60002K wps
[Epoch 58 Batch 150/172] avg loss 0.00466577, throughput 3.3681K wps
Begin Testing...
[Epoch 58] train avg loss 0.00471365, dev acc 0.9015, dev avg loss 0.27478, throughput 3.26695K wps
[Epoch 59 Batch 30/172] avg loss 0.00503751, throughput 3.0469K wps
[Epoch 59 Batch 60/172] avg loss 0.00471413, throughput 3.10554K wps
[Epoch 59 Batch 90/172] avg loss 0.00481769, throughput 3.43846K wps
[Epoch 59 Batch 120/172] avg loss 0.00442973, throughput 3.44498K wps
[Epoch 59 Batch 150/172] avg loss 0.00452096, throughput 3.45574K wps
Begin Testing...
[Epoch 59] train avg loss 0.00474618, dev acc 0.9004, dev avg loss 0.274544, throughput 3.3556K wps
[Epoch 60 Batch 30/172] avg loss 0.00446012, throughput 3.0387K wps
[Epoch 60 Batch 60/172] avg loss 0.00420046, throughput 2.99487K wps
[Epoch 60 Batch 90/172] avg loss 0.00439904, throughput 3.07888K wps
[Epoch 60 Batch 120/172] avg loss 0.00506302, throughput 2.9863K wps
[Epoch 60 Batch 150/172] avg loss 0.00476582, throughput 3.71551K wps
Begin Testing...
[Epoch 60] train avg loss 0.00465536, dev acc 0.8994, dev avg loss 0.274213, throughput 3.11445K wps
[Epoch 61 Batch 30/172] avg loss 0.00415061, throughput 3.14604K wps
[Epoch 61 Batch 60/172] avg loss 0.0049604, throughput 3.24154K wps
[Epoch 61 Batch 90/172] avg loss 0.00446826, throughput 3.22382K wps
[Epoch 61 Batch 120/172] avg loss 0.00452696, throughput 3.88778K wps
[Epoch 61 Batch 150/172] avg loss 0.00472762, throughput 3.01293K wps
Begin Testing...
[Epoch 61] train avg loss 0.00457438, dev acc 0.9015, dev avg loss 0.274235, throughput 3.31431K wps
[Epoch 62 Batch 30/172] avg loss 0.0043492, throughput 3.27454K wps
[Epoch 62 Batch 60/172] avg loss 0.00461129, throughput 3.7905K wps
[Epoch 62 Batch 90/172] avg loss 0.00488859, throughput 3.29141K wps
[Epoch 62 Batch 120/172] avg loss 0.00443988, throughput 3.24441K wps
[Epoch 62 Batch 150/172] avg loss 0.00464301, throughput 3.26885K wps
Begin Testing...
[Epoch 62] train avg loss 0.00461329, dev acc 0.9004, dev avg loss 0.273422, throughput 3.3364K wps
[Epoch 63 Batch 30/172] avg loss 0.00461023, throughput 3.18613K wps
[Epoch 63 Batch 60/172] avg loss 0.0046206, throughput 3.02792K wps
[Epoch 63 Batch 90/172] avg loss 0.00443315, throughput 3.48545K wps
[Epoch 63 Batch 120/172] avg loss 0.00484585, throughput 3.20765K wps
[Epoch 63 Batch 150/172] avg loss 0.00495113, throughput 3.77402K wps
Begin Testing...
[Epoch 63] train avg loss 0.0046452, dev acc 0.9004, dev avg loss 0.273423, throughput 3.2583K wps
[Epoch 64 Batch 30/172] avg loss 0.00421539, throughput 3.39844K wps
[Epoch 64 Batch 60/172] avg loss 0.00515374, throughput 4.01922K wps
[Epoch 64 Batch 90/172] avg loss 0.00433426, throughput 3.36014K wps
[Epoch 64 Batch 120/172] avg loss 0.00466524, throughput 3.221K wps
[Epoch 64 Batch 150/172] avg loss 0.0047047, throughput 3.26778K wps
Begin Testing...
[Epoch 64] train avg loss 0.00459584, dev acc 0.9004, dev avg loss 0.273668, throughput 3.40224K wps
[Epoch 65 Batch 30/172] avg loss 0.00439133, throughput 3.13862K wps
[Epoch 65 Batch 60/172] avg loss 0.00435657, throughput 3.06407K wps
[Epoch 65 Batch 90/172] avg loss 0.00439302, throughput 3.01502K wps
[Epoch 65 Batch 120/172] avg loss 0.00462933, throughput 3.34246K wps
[Epoch 65 Batch 150/172] avg loss 0.00489872, throughput 4.028K wps
Begin Testing...
[Epoch 65] train avg loss 0.00453529, dev acc 0.9015, dev avg loss 0.273214, throughput 3.25431K wps
[Epoch 66 Batch 30/172] avg loss 0.00448313, throughput 2.99758K wps
[Epoch 66 Batch 60/172] avg loss 0.00449517, throughput 3.2008K wps
[Epoch 66 Batch 90/172] avg loss 0.00460733, throughput 3.12848K wps
[Epoch 66 Batch 120/172] avg loss 0.00448735, throughput 3.77674K wps
[Epoch 66 Batch 150/172] avg loss 0.0044213, throughput 3.5122K wps
Begin Testing...
[Epoch 66] train avg loss 0.00453096, dev acc 0.9015, dev avg loss 0.272677, throughput 3.27709K wps
[Epoch 67 Batch 30/172] avg loss 0.00462693, throughput 2.9847K wps
[Epoch 67 Batch 60/172] avg loss 0.00402518, throughput 3.5304K wps
[Epoch 67 Batch 90/172] avg loss 0.00461667, throughput 3.45076K wps
[Epoch 67 Batch 120/172] avg loss 0.00426923, throughput 3.22772K wps
[Epoch 67 Batch 150/172] avg loss 0.00459233, throughput 3.48881K wps
Begin Testing...
[Epoch 67] train avg loss 0.00443184, dev acc 0.9057, dev avg loss 0.275122, throughput 3.37331K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/172] avg loss 0.00462199, throughput 3.34887K wps
[Epoch 68 Batch 60/172] avg loss 0.00472331, throughput 3.46984K wps
[Epoch 68 Batch 90/172] avg loss 0.00418975, throughput 3.21401K wps
[Epoch 68 Batch 120/172] avg loss 0.00437117, throughput 3.19733K wps
[Epoch 68 Batch 150/172] avg loss 0.00465282, throughput 3.149K wps
Begin Testing...
[Epoch 68] train avg loss 0.0045098, dev acc 0.9036, dev avg loss 0.273379, throughput 3.24903K wps
[Epoch 69 Batch 30/172] avg loss 0.00433414, throughput 3.73261K wps
[Epoch 69 Batch 60/172] avg loss 0.00516008, throughput 3.26296K wps
[Epoch 69 Batch 90/172] avg loss 0.00421044, throughput 3.33024K wps
[Epoch 69 Batch 120/172] avg loss 0.00461055, throughput 3.15497K wps
[Epoch 69 Batch 150/172] avg loss 0.00437252, throughput 3.04485K wps
Begin Testing...
[Epoch 69] train avg loss 0.00450004, dev acc 0.9004, dev avg loss 0.271661, throughput 3.26311K wps
[Epoch 70 Batch 30/172] avg loss 0.00450702, throughput 3.2929K wps
[Epoch 70 Batch 60/172] avg loss 0.00426507, throughput 3.63524K wps
[Epoch 70 Batch 90/172] avg loss 0.00430201, throughput 3.59996K wps
[Epoch 70 Batch 120/172] avg loss 0.00435223, throughput 3.62245K wps
[Epoch 70 Batch 150/172] avg loss 0.00463313, throughput 3.39337K wps
Begin Testing...
[Epoch 70] train avg loss 0.00445979, dev acc 0.8994, dev avg loss 0.271379, throughput 3.49726K wps
[Epoch 71 Batch 30/172] avg loss 0.00473222, throughput 3.24188K wps
[Epoch 71 Batch 60/172] avg loss 0.00402116, throughput 3.09987K wps
[Epoch 71 Batch 90/172] avg loss 0.00429519, throughput 3.36089K wps
[Epoch 71 Batch 120/172] avg loss 0.00430398, throughput 3.13437K wps
[Epoch 71 Batch 150/172] avg loss 0.00476537, throughput 3.05998K wps
Begin Testing...
[Epoch 71] train avg loss 0.00443224, dev acc 0.9004, dev avg loss 0.271528, throughput 3.18829K wps
[Epoch 72 Batch 30/172] avg loss 0.00455991, throughput 3.25456K wps
[Epoch 72 Batch 60/172] avg loss 0.00449733, throughput 3.03777K wps
[Epoch 72 Batch 90/172] avg loss 0.00451996, throughput 3.97822K wps
[Epoch 72 Batch 120/172] avg loss 0.00459526, throughput 3.66758K wps
[Epoch 72 Batch 150/172] avg loss 0.00414273, throughput 3.20368K wps
Begin Testing...
[Epoch 72] train avg loss 0.00436714, dev acc 0.9025, dev avg loss 0.272166, throughput 3.41274K wps
[Epoch 73 Batch 30/172] avg loss 0.00439817, throughput 2.98435K wps
[Epoch 73 Batch 60/172] avg loss 0.00435611, throughput 3.2149K wps
[Epoch 73 Batch 90/172] avg loss 0.00451907, throughput 3.22883K wps
[Epoch 73 Batch 120/172] avg loss 0.00419102, throughput 3.16795K wps
[Epoch 73 Batch 150/172] avg loss 0.00424984, throughput 3.96603K wps
Begin Testing...
[Epoch 73] train avg loss 0.00433775, dev acc 0.9004, dev avg loss 0.271022, throughput 3.23508K wps
[Epoch 74 Batch 30/172] avg loss 0.00417152, throughput 3.23762K wps
[Epoch 74 Batch 60/172] avg loss 0.00423575, throughput 3.48873K wps
[Epoch 74 Batch 90/172] avg loss 0.00467189, throughput 3.05011K wps
[Epoch 74 Batch 120/172] avg loss 0.00455028, throughput 2.92747K wps
[Epoch 74 Batch 150/172] avg loss 0.00388144, throughput 2.89775K wps
Begin Testing...
[Epoch 74] train avg loss 0.00432434, dev acc 0.9015, dev avg loss 0.27071, throughput 3.10602K wps
[Epoch 75 Batch 30/172] avg loss 0.00461136, throughput 3.20157K wps
[Epoch 75 Batch 60/172] avg loss 0.00446231, throughput 3.00591K wps
[Epoch 75 Batch 90/172] avg loss 0.00418863, throughput 3.18456K wps
[Epoch 75 Batch 120/172] avg loss 0.00376047, throughput 3.16469K wps
[Epoch 75 Batch 150/172] avg loss 0.00451228, throughput 3.06897K wps
Begin Testing...
[Epoch 75] train avg loss 0.0043326, dev acc 0.8994, dev avg loss 0.271537, throughput 3.13314K wps
[Epoch 76 Batch 30/172] avg loss 0.00425507, throughput 3.04753K wps
[Epoch 76 Batch 60/172] avg loss 0.00426706, throughput 3.36722K wps
[Epoch 76 Batch 90/172] avg loss 0.0040157, throughput 3.20554K wps
[Epoch 76 Batch 120/172] avg loss 0.00474264, throughput 3.09243K wps
[Epoch 76 Batch 150/172] avg loss 0.00439345, throughput 3.54495K wps
Begin Testing...
[Epoch 76] train avg loss 0.00433132, dev acc 0.9015, dev avg loss 0.270568, throughput 3.18787K wps
[Epoch 77 Batch 30/172] avg loss 0.00438544, throughput 3.17262K wps
[Epoch 77 Batch 60/172] avg loss 0.00442032, throughput 3.87126K wps
[Epoch 77 Batch 90/172] avg loss 0.00447105, throughput 3.46729K wps
[Epoch 77 Batch 120/172] avg loss 0.00427309, throughput 3.01638K wps
[Epoch 77 Batch 150/172] avg loss 0.0041485, throughput 3.02983K wps
Begin Testing...
[Epoch 77] train avg loss 0.00425203, dev acc 0.8983, dev avg loss 0.270323, throughput 3.36788K wps
[Epoch 78 Batch 30/172] avg loss 0.00418694, throughput 3.23565K wps
[Epoch 78 Batch 60/172] avg loss 0.00407764, throughput 2.99002K wps
[Epoch 78 Batch 90/172] avg loss 0.00489657, throughput 3.03927K wps
[Epoch 78 Batch 120/172] avg loss 0.00441839, throughput 3.1842K wps
[Epoch 78 Batch 150/172] avg loss 0.00440857, throughput 3.2279K wps
Begin Testing...
[Epoch 78] train avg loss 0.00431021, dev acc 0.9004, dev avg loss 0.27081, throughput 3.10072K wps
[Epoch 79 Batch 30/172] avg loss 0.0038957, throughput 3.28491K wps
[Epoch 79 Batch 60/172] avg loss 0.00429544, throughput 3.63184K wps
[Epoch 79 Batch 90/172] avg loss 0.00423006, throughput 3.25059K wps
[Epoch 79 Batch 120/172] avg loss 0.0046141, throughput 2.98884K wps
[Epoch 79 Batch 150/172] avg loss 0.00447914, throughput 3.02242K wps
Begin Testing...
[Epoch 79] train avg loss 0.00429022, dev acc 0.9046, dev avg loss 0.272252, throughput 3.21014K wps
[Epoch 80 Batch 30/172] avg loss 0.004855, throughput 3.23922K wps
[Epoch 80 Batch 60/172] avg loss 0.00409554, throughput 3.26413K wps
[Epoch 80 Batch 90/172] avg loss 0.00400377, throughput 3.31062K wps
[Epoch 80 Batch 120/172] avg loss 0.00388542, throughput 3.15715K wps
[Epoch 80 Batch 150/172] avg loss 0.00434301, throughput 3.07309K wps
Begin Testing...
[Epoch 80] train avg loss 0.00423647, dev acc 0.9004, dev avg loss 0.270322, throughput 3.23424K wps
[Epoch 81 Batch 30/172] avg loss 0.00450778, throughput 3.10345K wps
[Epoch 81 Batch 60/172] avg loss 0.00414082, throughput 3.74174K wps
[Epoch 81 Batch 90/172] avg loss 0.00417283, throughput 2.98803K wps
[Epoch 81 Batch 120/172] avg loss 0.00395974, throughput 3.62254K wps
[Epoch 81 Batch 150/172] avg loss 0.00400581, throughput 3.13342K wps
Begin Testing...
[Epoch 81] train avg loss 0.00413171, dev acc 0.8983, dev avg loss 0.27037, throughput 3.28682K wps
[Epoch 82 Batch 30/172] avg loss 0.00419146, throughput 3.19438K wps
[Epoch 82 Batch 60/172] avg loss 0.00395215, throughput 3.13284K wps
[Epoch 82 Batch 90/172] avg loss 0.00429956, throughput 2.9988K wps
[Epoch 82 Batch 120/172] avg loss 0.00451957, throughput 2.97271K wps
[Epoch 82 Batch 150/172] avg loss 0.00379551, throughput 3.70612K wps
Begin Testing...
[Epoch 82] train avg loss 0.0042124, dev acc 0.9004, dev avg loss 0.269768, throughput 3.24214K wps
[Epoch 83 Batch 30/172] avg loss 0.00398277, throughput 3.14449K wps
[Epoch 83 Batch 60/172] avg loss 0.00453005, throughput 3.53948K wps
[Epoch 83 Batch 90/172] avg loss 0.00411547, throughput 2.98393K wps
[Epoch 83 Batch 120/172] avg loss 0.00408247, throughput 3.30955K wps
[Epoch 83 Batch 150/172] avg loss 0.00404682, throughput 3.78948K wps
Begin Testing...
[Epoch 83] train avg loss 0.00417423, dev acc 0.9015, dev avg loss 0.269913, throughput 3.35357K wps
[Epoch 84 Batch 30/172] avg loss 0.00413476, throughput 3.14459K wps
[Epoch 84 Batch 60/172] avg loss 0.00435787, throughput 3.36883K wps
[Epoch 84 Batch 90/172] avg loss 0.00376966, throughput 3.47202K wps
[Epoch 84 Batch 120/172] avg loss 0.0046486, throughput 3.24132K wps
[Epoch 84 Batch 150/172] avg loss 0.00375093, throughput 4.04628K wps
Begin Testing...
[Epoch 84] train avg loss 0.00415342, dev acc 0.9025, dev avg loss 0.269396, throughput 3.38766K wps
[Epoch 85 Batch 30/172] avg loss 0.00412949, throughput 3.34839K wps
[Epoch 85 Batch 60/172] avg loss 0.00421991, throughput 3.09889K wps
[Epoch 85 Batch 90/172] avg loss 0.00405075, throughput 3.07375K wps
[Epoch 85 Batch 120/172] avg loss 0.00397672, throughput 3.15437K wps
[Epoch 85 Batch 150/172] avg loss 0.004059, throughput 3.22585K wps
Begin Testing...
[Epoch 85] train avg loss 0.00412468, dev acc 0.9046, dev avg loss 0.271152, throughput 3.1993K wps
[Epoch 86 Batch 30/172] avg loss 0.00440537, throughput 3.50621K wps
[Epoch 86 Batch 60/172] avg loss 0.00360179, throughput 3.42526K wps
[Epoch 86 Batch 90/172] avg loss 0.00389161, throughput 3.33668K wps
[Epoch 86 Batch 120/172] avg loss 0.00424539, throughput 3.41563K wps
[Epoch 86 Batch 150/172] avg loss 0.00447996, throughput 3.10498K wps
Begin Testing...
[Epoch 86] train avg loss 0.00411189, dev acc 0.9015, dev avg loss 0.268929, throughput 3.33727K wps
[Epoch 87 Batch 30/172] avg loss 0.00398517, throughput 3.04951K wps
[Epoch 87 Batch 60/172] avg loss 0.0045184, throughput 3.18439K wps
[Epoch 87 Batch 90/172] avg loss 0.0037172, throughput 3.46846K wps
[Epoch 87 Batch 120/172] avg loss 0.00391926, throughput 3.52273K wps
[Epoch 87 Batch 150/172] avg loss 0.00402327, throughput 3.65293K wps
Begin Testing...
[Epoch 87] train avg loss 0.00408503, dev acc 0.9025, dev avg loss 0.268952, throughput 3.30933K wps
[Epoch 88 Batch 30/172] avg loss 0.00390875, throughput 3.42106K wps
[Epoch 88 Batch 60/172] avg loss 0.00364177, throughput 3.27303K wps
[Epoch 88 Batch 90/172] avg loss 0.00397743, throughput 3.51893K wps
[Epoch 88 Batch 120/172] avg loss 0.0041597, throughput 3.14648K wps
[Epoch 88 Batch 150/172] avg loss 0.00446393, throughput 3.17725K wps
Begin Testing...
[Epoch 88] train avg loss 0.00404939, dev acc 0.9036, dev avg loss 0.268927, throughput 3.24514K wps
[Epoch 89 Batch 30/172] avg loss 0.00384886, throughput 3.11723K wps
[Epoch 89 Batch 60/172] avg loss 0.00425446, throughput 3.30634K wps
[Epoch 89 Batch 90/172] avg loss 0.0044165, throughput 3.49551K wps
[Epoch 89 Batch 120/172] avg loss 0.00394451, throughput 3.87434K wps
[Epoch 89 Batch 150/172] avg loss 0.00388073, throughput 3.17287K wps
Begin Testing...
[Epoch 89] train avg loss 0.00405574, dev acc 0.9036, dev avg loss 0.26817, throughput 3.38867K wps
[Epoch 90 Batch 30/172] avg loss 0.0037614, throughput 3.16816K wps
[Epoch 90 Batch 60/172] avg loss 0.00411573, throughput 3.28731K wps
[Epoch 90 Batch 90/172] avg loss 0.00434161, throughput 3.26793K wps
[Epoch 90 Batch 120/172] avg loss 0.00379984, throughput 3.25159K wps
[Epoch 90 Batch 150/172] avg loss 0.00415933, throughput 3.17202K wps
Begin Testing...
[Epoch 90] train avg loss 0.00406363, dev acc 0.9025, dev avg loss 0.268884, throughput 3.26401K wps
[Epoch 91 Batch 30/172] avg loss 0.00419205, throughput 3.36576K wps
[Epoch 91 Batch 60/172] avg loss 0.00393803, throughput 3.13011K wps
[Epoch 91 Batch 90/172] avg loss 0.00379544, throughput 3.4711K wps
[Epoch 91 Batch 120/172] avg loss 0.00418216, throughput 3.27338K wps
[Epoch 91 Batch 150/172] avg loss 0.00400338, throughput 3.41944K wps
Begin Testing...
[Epoch 91] train avg loss 0.00397841, dev acc 0.9036, dev avg loss 0.26891, throughput 3.35085K wps
[Epoch 92 Batch 30/172] avg loss 0.00444049, throughput 2.99514K wps
[Epoch 92 Batch 60/172] avg loss 0.00376398, throughput 3.20791K wps
[Epoch 92 Batch 90/172] avg loss 0.00388337, throughput 3.29582K wps
[Epoch 92 Batch 120/172] avg loss 0.00366444, throughput 3.03947K wps
[Epoch 92 Batch 150/172] avg loss 0.00397049, throughput 3.15608K wps
Begin Testing...
[Epoch 92] train avg loss 0.00399473, dev acc 0.9036, dev avg loss 0.267641, throughput 3.15474K wps
[Epoch 93 Batch 30/172] avg loss 0.00450875, throughput 3.76245K wps
[Epoch 93 Batch 60/172] avg loss 0.00387159, throughput 3.22837K wps
[Epoch 93 Batch 90/172] avg loss 0.0036655, throughput 3.25807K wps
[Epoch 93 Batch 120/172] avg loss 0.00387175, throughput 3.18837K wps
[Epoch 93 Batch 150/172] avg loss 0.00398513, throughput 2.92854K wps
Begin Testing...
[Epoch 93] train avg loss 0.00397832, dev acc 0.9025, dev avg loss 0.267731, throughput 3.21414K wps
[Epoch 94 Batch 30/172] avg loss 0.00415742, throughput 3.69393K wps
[Epoch 94 Batch 60/172] avg loss 0.00398062, throughput 3.67254K wps
[Epoch 94 Batch 90/172] avg loss 0.00362703, throughput 3.38849K wps
[Epoch 94 Batch 120/172] avg loss 0.00430693, throughput 3.23194K wps
[Epoch 94 Batch 150/172] avg loss 0.00376664, throughput 3.23174K wps
Begin Testing...
[Epoch 94] train avg loss 0.00395118, dev acc 0.9015, dev avg loss 0.267699, throughput 3.4088K wps
[Epoch 95 Batch 30/172] avg loss 0.00370323, throughput 3.17682K wps
[Epoch 95 Batch 60/172] avg loss 0.00390967, throughput 3.06295K wps
[Epoch 95 Batch 90/172] avg loss 0.00388734, throughput 3.10568K wps
[Epoch 95 Batch 120/172] avg loss 0.00438635, throughput 3.20369K wps
[Epoch 95 Batch 150/172] avg loss 0.00389627, throughput 3.16816K wps
Begin Testing...
[Epoch 95] train avg loss 0.00394784, dev acc 0.9067, dev avg loss 0.267816, throughput 3.18484K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/172] avg loss 0.00415929, throughput 3.62231K wps
[Epoch 96 Batch 60/172] avg loss 0.00369667, throughput 3.44594K wps
[Epoch 96 Batch 90/172] avg loss 0.00411374, throughput 3.03085K wps
[Epoch 96 Batch 120/172] avg loss 0.00355449, throughput 3.05813K wps
[Epoch 96 Batch 150/172] avg loss 0.00404241, throughput 3.50069K wps
Begin Testing...
[Epoch 96] train avg loss 0.00392342, dev acc 0.9046, dev avg loss 0.267846, throughput 3.30546K wps
[Epoch 97 Batch 30/172] avg loss 0.00389706, throughput 3.75783K wps
[Epoch 97 Batch 60/172] avg loss 0.00372553, throughput 3.11623K wps
[Epoch 97 Batch 90/172] avg loss 0.0039851, throughput 3.97062K wps
[Epoch 97 Batch 120/172] avg loss 0.00401143, throughput 3.70095K wps
[Epoch 97 Batch 150/172] avg loss 0.00390636, throughput 2.98592K wps
Begin Testing...
[Epoch 97] train avg loss 0.00388682, dev acc 0.9036, dev avg loss 0.269454, throughput 3.4708K wps
[Epoch 98 Batch 30/172] avg loss 0.00358283, throughput 3.36243K wps
[Epoch 98 Batch 60/172] avg loss 0.00394968, throughput 3.25561K wps
[Epoch 98 Batch 90/172] avg loss 0.00393694, throughput 3.69019K wps
[Epoch 98 Batch 120/172] avg loss 0.00358979, throughput 3.02618K wps
[Epoch 98 Batch 150/172] avg loss 0.00414226, throughput 3.45893K wps
Begin Testing...
[Epoch 98] train avg loss 0.00384526, dev acc 0.9046, dev avg loss 0.267895, throughput 3.30464K wps
[Epoch 99 Batch 30/172] avg loss 0.00384511, throughput 3.5241K wps
[Epoch 99 Batch 60/172] avg loss 0.00389137, throughput 3.69993K wps
[Epoch 99 Batch 90/172] avg loss 0.00382598, throughput 3.88573K wps
[Epoch 99 Batch 120/172] avg loss 0.00394029, throughput 3.45513K wps
[Epoch 99 Batch 150/172] avg loss 0.0038986, throughput 3.2287K wps
Begin Testing...
[Epoch 99] train avg loss 0.00381946, dev acc 0.9046, dev avg loss 0.26862, throughput 3.5284K wps
[Epoch 100 Batch 30/172] avg loss 0.00396993, throughput 3.75487K wps
[Epoch 100 Batch 60/172] avg loss 0.00374519, throughput 3.59422K wps
[Epoch 100 Batch 90/172] avg loss 0.00394298, throughput 4.07707K wps
[Epoch 100 Batch 120/172] avg loss 0.00379911, throughput 3.66274K wps
[Epoch 100 Batch 150/172] avg loss 0.00383138, throughput 3.81433K wps
Begin Testing...
[Epoch 100] train avg loss 0.00383497, dev acc 0.9057, dev avg loss 0.268379, throughput 3.66592K wps
[Epoch 101 Batch 30/172] avg loss 0.00345545, throughput 3.39831K wps
[Epoch 101 Batch 60/172] avg loss 0.00409011, throughput 3.57851K wps
[Epoch 101 Batch 90/172] avg loss 0.00389847, throughput 2.96622K wps
[Epoch 101 Batch 120/172] avg loss 0.00363759, throughput 3.31897K wps
[Epoch 101 Batch 150/172] avg loss 0.00390828, throughput 4.01299K wps
Begin Testing...
[Epoch 101] train avg loss 0.00381298, dev acc 0.9036, dev avg loss 0.267675, throughput 3.46329K wps
[Epoch 102 Batch 30/172] avg loss 0.00347066, throughput 3.06435K wps
[Epoch 102 Batch 60/172] avg loss 0.00376038, throughput 3.24004K wps
[Epoch 102 Batch 90/172] avg loss 0.00376462, throughput 3.27893K wps
[Epoch 102 Batch 120/172] avg loss 0.00368079, throughput 3.3753K wps
[Epoch 102 Batch 150/172] avg loss 0.00426537, throughput 3.09306K wps
Begin Testing...
[Epoch 102] train avg loss 0.0037625, dev acc 0.9057, dev avg loss 0.267887, throughput 3.1861K wps
[Epoch 103 Batch 30/172] avg loss 0.00345668, throughput 3.19361K wps
[Epoch 103 Batch 60/172] avg loss 0.00383649, throughput 3.21555K wps
[Epoch 103 Batch 90/172] avg loss 0.00383095, throughput 3.79755K wps
[Epoch 103 Batch 120/172] avg loss 0.00364363, throughput 3.40438K wps
[Epoch 103 Batch 150/172] avg loss 0.00360521, throughput 3.0577K wps
Begin Testing...
[Epoch 103] train avg loss 0.00380189, dev acc 0.9004, dev avg loss 0.26686, throughput 3.28067K wps
[Epoch 104 Batch 30/172] avg loss 0.00378652, throughput 2.94826K wps
[Epoch 104 Batch 60/172] avg loss 0.00379015, throughput 2.96687K wps
[Epoch 104 Batch 90/172] avg loss 0.00386747, throughput 3.34819K wps
[Epoch 104 Batch 120/172] avg loss 0.00363812, throughput 3.35954K wps
[Epoch 104 Batch 150/172] avg loss 0.00379331, throughput 3.43301K wps
Begin Testing...
[Epoch 104] train avg loss 0.00374911, dev acc 0.9067, dev avg loss 0.268673, throughput 3.22765K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/172] avg loss 0.00367307, throughput 3.0226K wps
[Epoch 105 Batch 60/172] avg loss 0.0037602, throughput 3.28814K wps
[Epoch 105 Batch 90/172] avg loss 0.00379353, throughput 3.0723K wps
[Epoch 105 Batch 120/172] avg loss 0.0037443, throughput 3.67789K wps
[Epoch 105 Batch 150/172] avg loss 0.00382226, throughput 3.11081K wps
Begin Testing...
[Epoch 105] train avg loss 0.00373243, dev acc 0.9057, dev avg loss 0.2679, throughput 3.25105K wps
[Epoch 106 Batch 30/172] avg loss 0.00379679, throughput 3.36798K wps
[Epoch 106 Batch 60/172] avg loss 0.00378679, throughput 3.23515K wps
[Epoch 106 Batch 90/172] avg loss 0.00383339, throughput 3.37686K wps
[Epoch 106 Batch 120/172] avg loss 0.00384445, throughput 2.95516K wps
[Epoch 106 Batch 150/172] avg loss 0.00328119, throughput 3.25233K wps
Begin Testing...
[Epoch 106] train avg loss 0.00373028, dev acc 0.9046, dev avg loss 0.267348, throughput 3.24386K wps
[Epoch 107 Batch 30/172] avg loss 0.00376769, throughput 3.23999K wps
[Epoch 107 Batch 60/172] avg loss 0.00338565, throughput 3.14677K wps
[Epoch 107 Batch 90/172] avg loss 0.00380547, throughput 3.11933K wps
[Epoch 107 Batch 120/172] avg loss 0.00384298, throughput 3.62887K wps
[Epoch 107 Batch 150/172] avg loss 0.00355571, throughput 3.06119K wps
Begin Testing...
[Epoch 107] train avg loss 0.00375, dev acc 0.9025, dev avg loss 0.266792, throughput 3.21553K wps
[Epoch 108 Batch 30/172] avg loss 0.00343564, throughput 3.54288K wps
[Epoch 108 Batch 60/172] avg loss 0.00391751, throughput 3.33935K wps
[Epoch 108 Batch 90/172] avg loss 0.00363017, throughput 3.47237K wps
[Epoch 108 Batch 120/172] avg loss 0.00415608, throughput 3.8382K wps
[Epoch 108 Batch 150/172] avg loss 0.00363511, throughput 3.32614K wps
Begin Testing...
[Epoch 108] train avg loss 0.00371222, dev acc 0.9067, dev avg loss 0.268578, throughput 3.46702K wps
Observed Improvement.
Begin Testing...
[Epoch 109 Batch 30/172] avg loss 0.00378663, throughput 3.34387K wps
[Epoch 109 Batch 60/172] avg loss 0.0034192, throughput 3.85237K wps
[Epoch 109 Batch 90/172] avg loss 0.00337625, throughput 3.27333K wps
[Epoch 109 Batch 120/172] avg loss 0.00366111, throughput 3.02985K wps
[Epoch 109 Batch 150/172] avg loss 0.00378955, throughput 3.09262K wps
Begin Testing...
[Epoch 109] train avg loss 0.00366679, dev acc 0.9057, dev avg loss 0.267738, throughput 3.27336K wps
[Epoch 110 Batch 30/172] avg loss 0.0038681, throughput 3.15976K wps
[Epoch 110 Batch 60/172] avg loss 0.0033793, throughput 3.81261K wps
[Epoch 110 Batch 90/172] avg loss 0.00356989, throughput 3.40093K wps
[Epoch 110 Batch 120/172] avg loss 0.00342997, throughput 2.96111K wps
[Epoch 110 Batch 150/172] avg loss 0.00372084, throughput 3.52646K wps
Begin Testing...
[Epoch 110] train avg loss 0.00365034, dev acc 0.9015, dev avg loss 0.267127, throughput 3.32279K wps
[Epoch 111 Batch 30/172] avg loss 0.00371742, throughput 4.02277K wps
[Epoch 111 Batch 60/172] avg loss 0.00350182, throughput 3.86533K wps
[Epoch 111 Batch 90/172] avg loss 0.00391583, throughput 3.20494K wps
[Epoch 111 Batch 120/172] avg loss 0.00363972, throughput 3.6821K wps
[Epoch 111 Batch 150/172] avg loss 0.00388209, throughput 3.22547K wps
Begin Testing...
[Epoch 111] train avg loss 0.00367113, dev acc 0.9067, dev avg loss 0.267741, throughput 3.58888K wps
Observed Improvement.
Begin Testing...
[Epoch 112 Batch 30/172] avg loss 0.00338155, throughput 2.99728K wps
[Epoch 112 Batch 60/172] avg loss 0.00394581, throughput 3.44691K wps
[Epoch 112 Batch 90/172] avg loss 0.00369764, throughput 3.1841K wps
[Epoch 112 Batch 120/172] avg loss 0.00362209, throughput 3.19407K wps
[Epoch 112 Batch 150/172] avg loss 0.00365354, throughput 3.18926K wps
Begin Testing...
[Epoch 112] train avg loss 0.00366228, dev acc 0.9067, dev avg loss 0.266751, throughput 3.179K wps
Observed Improvement.
Begin Testing...
[Epoch 113 Batch 30/172] avg loss 0.0039046, throughput 3.16275K wps
[Epoch 113 Batch 60/172] avg loss 0.00353934, throughput 3.5151K wps
[Epoch 113 Batch 90/172] avg loss 0.00361599, throughput 3.05088K wps
[Epoch 113 Batch 120/172] avg loss 0.00363606, throughput 3.11887K wps
[Epoch 113 Batch 150/172] avg loss 0.00368184, throughput 3.32733K wps
Begin Testing...
[Epoch 113] train avg loss 0.00362691, dev acc 0.9046, dev avg loss 0.272412, throughput 3.26732K wps
[Epoch 114 Batch 30/172] avg loss 0.00379586, throughput 3.48842K wps
[Epoch 114 Batch 60/172] avg loss 0.00349172, throughput 3.29999K wps
[Epoch 114 Batch 90/172] avg loss 0.00374521, throughput 3.1525K wps
[Epoch 114 Batch 120/172] avg loss 0.00372678, throughput 3.14751K wps
[Epoch 114 Batch 150/172] avg loss 0.00371331, throughput 3.46894K wps
Begin Testing...
[Epoch 114] train avg loss 0.00364184, dev acc 0.9067, dev avg loss 0.268729, throughput 3.24905K wps
Observed Improvement.
Begin Testing...
[Epoch 115 Batch 30/172] avg loss 0.00307132, throughput 2.95063K wps
[Epoch 115 Batch 60/172] avg loss 0.00379551, throughput 3.32927K wps
[Epoch 115 Batch 90/172] avg loss 0.00411375, throughput 3.23811K wps
[Epoch 115 Batch 120/172] avg loss 0.00337057, throughput 3.61047K wps
[Epoch 115 Batch 150/172] avg loss 0.00334951, throughput 3.66407K wps
Begin Testing...
[Epoch 115] train avg loss 0.00358579, dev acc 0.9025, dev avg loss 0.266371, throughput 3.3371K wps
[Epoch 116 Batch 30/172] avg loss 0.00353532, throughput 3.31337K wps
[Epoch 116 Batch 60/172] avg loss 0.00374523, throughput 3.21898K wps
[Epoch 116 Batch 90/172] avg loss 0.00350875, throughput 3.33533K wps
[Epoch 116 Batch 120/172] avg loss 0.00341363, throughput 4.07811K wps
[Epoch 116 Batch 150/172] avg loss 0.003545, throughput 3.5605K wps
Begin Testing...
[Epoch 116] train avg loss 0.00355432, dev acc 0.9046, dev avg loss 0.267623, throughput 3.48207K wps
[Epoch 117 Batch 30/172] avg loss 0.00367452, throughput 3.64946K wps
[Epoch 117 Batch 60/172] avg loss 0.00348317, throughput 3.40628K wps
[Epoch 117 Batch 90/172] avg loss 0.00359248, throughput 2.99262K wps
[Epoch 117 Batch 120/172] avg loss 0.00350173, throughput 3.15419K wps
[Epoch 117 Batch 150/172] avg loss 0.00359545, throughput 3.62008K wps
Begin Testing...
[Epoch 117] train avg loss 0.00354201, dev acc 0.9046, dev avg loss 0.267385, throughput 3.33966K wps
[Epoch 118 Batch 30/172] avg loss 0.00375466, throughput 3.33212K wps
[Epoch 118 Batch 60/172] avg loss 0.00346833, throughput 3.56237K wps
[Epoch 118 Batch 90/172] avg loss 0.00366665, throughput 3.74329K wps
[Epoch 118 Batch 120/172] avg loss 0.00333497, throughput 3.48717K wps
[Epoch 118 Batch 150/172] avg loss 0.00326378, throughput 3.19036K wps
Begin Testing...
[Epoch 118] train avg loss 0.00351391, dev acc 0.9067, dev avg loss 0.266997, throughput 3.38859K wps
Observed Improvement.
Begin Testing...
[Epoch 119 Batch 30/172] avg loss 0.00336151, throughput 3.19375K wps
[Epoch 119 Batch 60/172] avg loss 0.0032277, throughput 3.6154K wps
[Epoch 119 Batch 90/172] avg loss 0.0037465, throughput 3.2708K wps
[Epoch 119 Batch 120/172] avg loss 0.0034324, throughput 3.43898K wps
[Epoch 119 Batch 150/172] avg loss 0.00365812, throughput 3.48888K wps
Begin Testing...
[Epoch 119] train avg loss 0.00352796, dev acc 0.9036, dev avg loss 0.266228, throughput 3.3241K wps
[Epoch 120 Batch 30/172] avg loss 0.00312929, throughput 2.97113K wps
[Epoch 120 Batch 60/172] avg loss 0.00344016, throughput 3.2288K wps
[Epoch 120 Batch 90/172] avg loss 0.00366283, throughput 3.9739K wps
[Epoch 120 Batch 120/172] avg loss 0.00374817, throughput 3.30426K wps
[Epoch 120 Batch 150/172] avg loss 0.00367052, throughput 3.42124K wps
Begin Testing...
[Epoch 120] train avg loss 0.00347063, dev acc 0.9088, dev avg loss 0.269402, throughput 3.30801K wps
Observed Improvement.
Begin Testing...
[Epoch 121 Batch 30/172] avg loss 0.00384457, throughput 3.25925K wps
[Epoch 121 Batch 60/172] avg loss 0.0034459, throughput 3.23685K wps
[Epoch 121 Batch 90/172] avg loss 0.00320032, throughput 3.43767K wps
[Epoch 121 Batch 120/172] avg loss 0.00327347, throughput 2.93792K wps
[Epoch 121 Batch 150/172] avg loss 0.00385581, throughput 3.1994K wps
Begin Testing...
[Epoch 121] train avg loss 0.00350275, dev acc 0.9036, dev avg loss 0.266785, throughput 3.15905K wps
[Epoch 122 Batch 30/172] avg loss 0.00350146, throughput 3.15222K wps
[Epoch 122 Batch 60/172] avg loss 0.00336693, throughput 3.38153K wps
[Epoch 122 Batch 90/172] avg loss 0.00308776, throughput 3.54945K wps
[Epoch 122 Batch 120/172] avg loss 0.00368702, throughput 3.40619K wps
[Epoch 122 Batch 150/172] avg loss 0.00361614, throughput 3.1107K wps
Begin Testing...
[Epoch 122] train avg loss 0.00345298, dev acc 0.9036, dev avg loss 0.266605, throughput 3.31271K wps
[Epoch 123 Batch 30/172] avg loss 0.00358662, throughput 3.06734K wps
[Epoch 123 Batch 60/172] avg loss 0.00340359, throughput 3.24955K wps
[Epoch 123 Batch 90/172] avg loss 0.00322332, throughput 3.15094K wps
[Epoch 123 Batch 120/172] avg loss 0.00318646, throughput 3.46963K wps
[Epoch 123 Batch 150/172] avg loss 0.0034203, throughput 2.96962K wps
Begin Testing...
[Epoch 123] train avg loss 0.00338586, dev acc 0.9046, dev avg loss 0.266897, throughput 3.15394K wps
[Epoch 124 Batch 30/172] avg loss 0.00334557, throughput 3.03414K wps
[Epoch 124 Batch 60/172] avg loss 0.00376098, throughput 3.65082K wps
[Epoch 124 Batch 90/172] avg loss 0.00335704, throughput 3.5964K wps
[Epoch 124 Batch 120/172] avg loss 0.00339052, throughput 3.11883K wps
[Epoch 124 Batch 150/172] avg loss 0.00336247, throughput 3.06845K wps
Begin Testing...
[Epoch 124] train avg loss 0.00340251, dev acc 0.9067, dev avg loss 0.268211, throughput 3.3003K wps
[Epoch 125 Batch 30/172] avg loss 0.00360422, throughput 3.02757K wps
[Epoch 125 Batch 60/172] avg loss 0.0035743, throughput 3.26891K wps
[Epoch 125 Batch 90/172] avg loss 0.00342397, throughput 3.36046K wps
[Epoch 125 Batch 120/172] avg loss 0.0030457, throughput 3.23807K wps
[Epoch 125 Batch 150/172] avg loss 0.00333339, throughput 3.19958K wps
Begin Testing...
[Epoch 125] train avg loss 0.00341493, dev acc 0.9036, dev avg loss 0.266202, throughput 3.18097K wps
[Epoch 126 Batch 30/172] avg loss 0.00364774, throughput 3.06875K wps
[Epoch 126 Batch 60/172] avg loss 0.00309933, throughput 3.12443K wps
[Epoch 126 Batch 90/172] avg loss 0.00336293, throughput 3.34697K wps
[Epoch 126 Batch 120/172] avg loss 0.0035736, throughput 3.7839K wps
[Epoch 126 Batch 150/172] avg loss 0.0033863, throughput 3.45787K wps
Begin Testing...
[Epoch 126] train avg loss 0.00337085, dev acc 0.9015, dev avg loss 0.266446, throughput 3.37621K wps
[Epoch 127 Batch 30/172] avg loss 0.003267, throughput 3.14938K wps
[Epoch 127 Batch 60/172] avg loss 0.00316827, throughput 3.73046K wps
[Epoch 127 Batch 90/172] avg loss 0.00324756, throughput 3.69006K wps
[Epoch 127 Batch 120/172] avg loss 0.00325625, throughput 3.62052K wps
[Epoch 127 Batch 150/172] avg loss 0.00366165, throughput 3.54981K wps
Begin Testing...
[Epoch 127] train avg loss 0.0033419, dev acc 0.9067, dev avg loss 0.267995, throughput 3.46811K wps
[Epoch 128 Batch 30/172] avg loss 0.00345978, throughput 2.98166K wps
[Epoch 128 Batch 60/172] avg loss 0.00342762, throughput 3.01032K wps
[Epoch 128 Batch 90/172] avg loss 0.00304822, throughput 3.4143K wps
[Epoch 128 Batch 120/172] avg loss 0.00322616, throughput 3.51677K wps
[Epoch 128 Batch 150/172] avg loss 0.00330243, throughput 3.51046K wps
Begin Testing...
[Epoch 128] train avg loss 0.00333559, dev acc 0.9057, dev avg loss 0.268299, throughput 3.26151K wps
[Epoch 129 Batch 30/172] avg loss 0.00340123, throughput 3.08204K wps
[Epoch 129 Batch 60/172] avg loss 0.00325574, throughput 2.91951K wps
[Epoch 129 Batch 90/172] avg loss 0.00337173, throughput 3.11543K wps
[Epoch 129 Batch 120/172] avg loss 0.00351712, throughput 3.68957K wps
[Epoch 129 Batch 150/172] avg loss 0.00303067, throughput 3.10544K wps
Begin Testing...
[Epoch 129] train avg loss 0.00333018, dev acc 0.9078, dev avg loss 0.268023, throughput 3.16358K wps
[Epoch 130 Batch 30/172] avg loss 0.00333051, throughput 2.9251K wps
[Epoch 130 Batch 60/172] avg loss 0.00329581, throughput 2.99965K wps
[Epoch 130 Batch 90/172] avg loss 0.00343016, throughput 3.69146K wps
[Epoch 130 Batch 120/172] avg loss 0.00346142, throughput 3.1552K wps
[Epoch 130 Batch 150/172] avg loss 0.00332541, throughput 3.51569K wps
Begin Testing...
[Epoch 130] train avg loss 0.0033562, dev acc 0.9036, dev avg loss 0.266551, throughput 3.20917K wps
[Epoch 131 Batch 30/172] avg loss 0.00341697, throughput 3.02312K wps
[Epoch 131 Batch 60/172] avg loss 0.00319652, throughput 2.96396K wps
[Epoch 131 Batch 90/172] avg loss 0.0033995, throughput 3.23249K wps
[Epoch 131 Batch 120/172] avg loss 0.00333478, throughput 3.06436K wps
[Epoch 131 Batch 150/172] avg loss 0.00340934, throughput 3.00236K wps
Begin Testing...
[Epoch 131] train avg loss 0.00336692, dev acc 0.9046, dev avg loss 0.267143, throughput 3.08677K wps
[Epoch 132 Batch 30/172] avg loss 0.00359163, throughput 2.9783K wps
[Epoch 132 Batch 60/172] avg loss 0.00325159, throughput 3.52347K wps
[Epoch 132 Batch 90/172] avg loss 0.00308402, throughput 3.06403K wps
[Epoch 132 Batch 120/172] avg loss 0.00312941, throughput 3.09844K wps
[Epoch 132 Batch 150/172] avg loss 0.003125, throughput 3.44059K wps
Begin Testing...
[Epoch 132] train avg loss 0.00326778, dev acc 0.9025, dev avg loss 0.267538, throughput 3.16673K wps
[Epoch 133 Batch 30/172] avg loss 0.00326305, throughput 3.08391K wps
[Epoch 133 Batch 60/172] avg loss 0.0032998, throughput 3.56071K wps
[Epoch 133 Batch 90/172] avg loss 0.00324385, throughput 3.0562K wps
[Epoch 133 Batch 120/172] avg loss 0.00347172, throughput 3.45238K wps