Namespace(batch_size=50, data_name='MPQA', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='non-static')
Use gpu0
maximum length (in tokens): 36
Done! Tokenizing Time=0.05s, #Sentences=10606
SentimentNet(
  (embedding): Embedding(6250 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/172] avg loss 0.0126928, throughput 0.549151K wps
[Epoch 0 Batch 60/172] avg loss 0.012231, throughput 2.76918K wps
[Epoch 0 Batch 90/172] avg loss 0.0123579, throughput 2.76777K wps
[Epoch 0 Batch 120/172] avg loss 0.012406, throughput 2.77952K wps
[Epoch 0 Batch 150/172] avg loss 0.0125239, throughput 2.79206K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124507, dev acc 0.7013, dev avg loss 0.596259, throughput 1.20274K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0120705, throughput 2.53308K wps
[Epoch 1 Batch 60/172] avg loss 0.0121902, throughput 2.74741K wps
[Epoch 1 Batch 90/172] avg loss 0.0121087, throughput 2.7083K wps
[Epoch 1 Batch 120/172] avg loss 0.0117511, throughput 2.73531K wps
[Epoch 1 Batch 150/172] avg loss 0.0118742, throughput 2.69609K wps
Begin Testing...
[Epoch 1] train avg loss 0.012036, dev acc 0.7013, dev avg loss 0.579904, throughput 2.70109K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0118831, throughput 2.68901K wps
[Epoch 2 Batch 60/172] avg loss 0.0119082, throughput 2.75887K wps
[Epoch 2 Batch 90/172] avg loss 0.0115443, throughput 2.64538K wps
[Epoch 2 Batch 120/172] avg loss 0.0117426, throughput 2.80056K wps
[Epoch 2 Batch 150/172] avg loss 0.0116032, throughput 2.82189K wps
Begin Testing...
[Epoch 2] train avg loss 0.0117496, dev acc 0.7013, dev avg loss 0.566141, throughput 2.7434K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0112746, throughput 2.7854K wps
[Epoch 3 Batch 60/172] avg loss 0.0117582, throughput 2.80729K wps
[Epoch 3 Batch 90/172] avg loss 0.0112166, throughput 2.80576K wps
[Epoch 3 Batch 120/172] avg loss 0.0111017, throughput 2.76712K wps
[Epoch 3 Batch 150/172] avg loss 0.011465, throughput 2.83712K wps
Begin Testing...
[Epoch 3] train avg loss 0.0113436, dev acc 0.7075, dev avg loss 0.547682, throughput 2.80459K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0111585, throughput 2.87166K wps
[Epoch 4 Batch 60/172] avg loss 0.0110493, throughput 2.81711K wps
[Epoch 4 Batch 90/172] avg loss 0.0108553, throughput 2.65157K wps
[Epoch 4 Batch 120/172] avg loss 0.0106288, throughput 2.85442K wps
[Epoch 4 Batch 150/172] avg loss 0.0109444, throughput 2.69169K wps
Begin Testing...
[Epoch 4] train avg loss 0.0109081, dev acc 0.7212, dev avg loss 0.523879, throughput 2.78546K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.010359, throughput 2.80897K wps
[Epoch 5 Batch 60/172] avg loss 0.0106236, throughput 2.77397K wps
[Epoch 5 Batch 90/172] avg loss 0.0101856, throughput 2.82405K wps
[Epoch 5 Batch 120/172] avg loss 0.0103365, throughput 2.80893K wps
[Epoch 5 Batch 150/172] avg loss 0.0102536, throughput 2.78665K wps
Begin Testing...
[Epoch 5] train avg loss 0.0103578, dev acc 0.7579, dev avg loss 0.497988, throughput 2.80261K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.0102423, throughput 2.73289K wps
[Epoch 6 Batch 60/172] avg loss 0.00982508, throughput 2.81688K wps
[Epoch 6 Batch 90/172] avg loss 0.0098585, throughput 2.82827K wps
[Epoch 6 Batch 120/172] avg loss 0.00972761, throughput 2.76261K wps
[Epoch 6 Batch 150/172] avg loss 0.00960476, throughput 2.75232K wps
Begin Testing...
[Epoch 6] train avg loss 0.00979844, dev acc 0.7767, dev avg loss 0.469011, throughput 2.77753K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00936999, throughput 2.82554K wps
[Epoch 7 Batch 60/172] avg loss 0.00955766, throughput 2.81343K wps
[Epoch 7 Batch 90/172] avg loss 0.00927683, throughput 2.80962K wps
[Epoch 7 Batch 120/172] avg loss 0.00924034, throughput 2.80729K wps
[Epoch 7 Batch 150/172] avg loss 0.00872233, throughput 2.69823K wps
Begin Testing...
[Epoch 7] train avg loss 0.00919154, dev acc 0.8082, dev avg loss 0.440199, throughput 2.79486K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00875735, throughput 2.71455K wps
[Epoch 8 Batch 60/172] avg loss 0.00849978, throughput 2.79018K wps
[Epoch 8 Batch 90/172] avg loss 0.00873235, throughput 2.78302K wps
[Epoch 8 Batch 120/172] avg loss 0.00851956, throughput 2.74658K wps
[Epoch 8 Batch 150/172] avg loss 0.00837778, throughput 2.76537K wps
Begin Testing...
[Epoch 8] train avg loss 0.00857382, dev acc 0.8218, dev avg loss 0.413312, throughput 2.74961K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00807877, throughput 2.65893K wps
[Epoch 9 Batch 60/172] avg loss 0.00815975, throughput 2.77991K wps
[Epoch 9 Batch 90/172] avg loss 0.00789, throughput 2.8007K wps
[Epoch 9 Batch 120/172] avg loss 0.00795063, throughput 2.63606K wps
[Epoch 9 Batch 150/172] avg loss 0.00794494, throughput 2.67481K wps
Begin Testing...
[Epoch 9] train avg loss 0.00801713, dev acc 0.8480, dev avg loss 0.388897, throughput 2.72095K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00770858, throughput 2.75928K wps
[Epoch 10 Batch 60/172] avg loss 0.00756434, throughput 2.6411K wps
[Epoch 10 Batch 90/172] avg loss 0.00737881, throughput 2.7498K wps
[Epoch 10 Batch 120/172] avg loss 0.00754697, throughput 2.77648K wps
[Epoch 10 Batch 150/172] avg loss 0.00752909, throughput 2.74128K wps
Begin Testing...
[Epoch 10] train avg loss 0.00754081, dev acc 0.8595, dev avg loss 0.36841, throughput 2.72798K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00750522, throughput 2.69673K wps
[Epoch 11 Batch 60/172] avg loss 0.00717641, throughput 2.76179K wps
[Epoch 11 Batch 90/172] avg loss 0.00683584, throughput 2.73812K wps
[Epoch 11 Batch 120/172] avg loss 0.0072524, throughput 2.71136K wps
[Epoch 11 Batch 150/172] avg loss 0.00701619, throughput 2.81096K wps
Begin Testing...
[Epoch 11] train avg loss 0.00711113, dev acc 0.8616, dev avg loss 0.352199, throughput 2.74283K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00666099, throughput 2.79356K wps
[Epoch 12 Batch 60/172] avg loss 0.00681539, throughput 2.73663K wps
[Epoch 12 Batch 90/172] avg loss 0.00684493, throughput 2.74126K wps
[Epoch 12 Batch 120/172] avg loss 0.00669911, throughput 2.80014K wps
[Epoch 12 Batch 150/172] avg loss 0.00691324, throughput 2.71282K wps
Begin Testing...
[Epoch 12] train avg loss 0.00679373, dev acc 0.8700, dev avg loss 0.341799, throughput 2.76714K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.00624144, throughput 2.82438K wps
[Epoch 13 Batch 60/172] avg loss 0.00652241, throughput 2.80297K wps
[Epoch 13 Batch 90/172] avg loss 0.00648729, throughput 2.62413K wps
[Epoch 13 Batch 120/172] avg loss 0.0063995, throughput 2.83022K wps
[Epoch 13 Batch 150/172] avg loss 0.00648764, throughput 2.818K wps
Begin Testing...
[Epoch 13] train avg loss 0.00644848, dev acc 0.8774, dev avg loss 0.328401, throughput 2.76912K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00629639, throughput 2.8784K wps
[Epoch 14 Batch 60/172] avg loss 0.00646935, throughput 2.64764K wps
[Epoch 14 Batch 90/172] avg loss 0.00625741, throughput 2.85571K wps
[Epoch 14 Batch 120/172] avg loss 0.00623011, throughput 2.79461K wps
[Epoch 14 Batch 150/172] avg loss 0.00597936, throughput 2.696K wps
Begin Testing...
[Epoch 14] train avg loss 0.00623304, dev acc 0.8774, dev avg loss 0.319339, throughput 2.76481K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00580444, throughput 2.84483K wps
[Epoch 15 Batch 60/172] avg loss 0.00613956, throughput 2.76785K wps
[Epoch 15 Batch 90/172] avg loss 0.00586006, throughput 2.78592K wps
[Epoch 15 Batch 120/172] avg loss 0.00611138, throughput 2.61996K wps
[Epoch 15 Batch 150/172] avg loss 0.00584744, throughput 2.83849K wps
Begin Testing...
[Epoch 15] train avg loss 0.00596948, dev acc 0.8795, dev avg loss 0.311664, throughput 2.77453K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00639232, throughput 2.78965K wps
[Epoch 16 Batch 60/172] avg loss 0.00593197, throughput 2.8377K wps
[Epoch 16 Batch 90/172] avg loss 0.00564518, throughput 2.83117K wps
[Epoch 16 Batch 120/172] avg loss 0.00545925, throughput 2.76259K wps
[Epoch 16 Batch 150/172] avg loss 0.0055351, throughput 2.72063K wps
Begin Testing...
[Epoch 16] train avg loss 0.00577258, dev acc 0.8784, dev avg loss 0.307991, throughput 2.79258K wps
[Epoch 17 Batch 30/172] avg loss 0.00592104, throughput 2.81877K wps
[Epoch 17 Batch 60/172] avg loss 0.00536967, throughput 2.81504K wps
[Epoch 17 Batch 90/172] avg loss 0.0052447, throughput 2.65802K wps
[Epoch 17 Batch 120/172] avg loss 0.00554106, throughput 2.79737K wps
[Epoch 17 Batch 150/172] avg loss 0.00581569, throughput 2.81675K wps
Begin Testing...
[Epoch 17] train avg loss 0.00551693, dev acc 0.8805, dev avg loss 0.300026, throughput 2.78597K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00542136, throughput 2.60337K wps
[Epoch 18 Batch 60/172] avg loss 0.00521506, throughput 2.87761K wps
[Epoch 18 Batch 90/172] avg loss 0.00519219, throughput 2.81217K wps
[Epoch 18 Batch 120/172] avg loss 0.00521328, throughput 2.78959K wps
[Epoch 18 Batch 150/172] avg loss 0.00537268, throughput 2.81389K wps
Begin Testing...
[Epoch 18] train avg loss 0.00535075, dev acc 0.8826, dev avg loss 0.296532, throughput 2.75361K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.0051287, throughput 2.83711K wps
[Epoch 19 Batch 60/172] avg loss 0.00487743, throughput 2.79971K wps
[Epoch 19 Batch 90/172] avg loss 0.00515894, throughput 2.65962K wps
[Epoch 19 Batch 120/172] avg loss 0.00541589, throughput 2.83387K wps
[Epoch 19 Batch 150/172] avg loss 0.00522594, throughput 2.71673K wps
Begin Testing...
[Epoch 19] train avg loss 0.00517025, dev acc 0.8836, dev avg loss 0.292086, throughput 2.77047K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00492628, throughput 2.78831K wps
[Epoch 20 Batch 60/172] avg loss 0.00520696, throughput 2.8507K wps
[Epoch 20 Batch 90/172] avg loss 0.00526214, throughput 2.7129K wps
[Epoch 20 Batch 120/172] avg loss 0.0050315, throughput 2.65881K wps
[Epoch 20 Batch 150/172] avg loss 0.00494871, throughput 2.8055K wps
Begin Testing...
[Epoch 20] train avg loss 0.00504039, dev acc 0.8878, dev avg loss 0.288653, throughput 2.76297K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.00447922, throughput 2.70889K wps
[Epoch 21 Batch 60/172] avg loss 0.00520544, throughput 2.75546K wps
[Epoch 21 Batch 90/172] avg loss 0.005262, throughput 2.80143K wps
[Epoch 21 Batch 120/172] avg loss 0.00477517, throughput 2.64998K wps
[Epoch 21 Batch 150/172] avg loss 0.00489142, throughput 2.79294K wps
Begin Testing...
[Epoch 21] train avg loss 0.00494309, dev acc 0.8910, dev avg loss 0.287141, throughput 2.74174K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00445861, throughput 2.84708K wps
[Epoch 22 Batch 60/172] avg loss 0.00485505, throughput 2.81213K wps
[Epoch 22 Batch 90/172] avg loss 0.00486406, throughput 2.71247K wps
[Epoch 22 Batch 120/172] avg loss 0.00501402, throughput 2.78587K wps
[Epoch 22 Batch 150/172] avg loss 0.00511238, throughput 2.8162K wps
Begin Testing...
[Epoch 22] train avg loss 0.00484205, dev acc 0.8910, dev avg loss 0.284605, throughput 2.79786K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/172] avg loss 0.00435729, throughput 2.88521K wps
[Epoch 23 Batch 60/172] avg loss 0.00476568, throughput 2.81594K wps
[Epoch 23 Batch 90/172] avg loss 0.004856, throughput 2.66943K wps
[Epoch 23 Batch 120/172] avg loss 0.00456167, throughput 2.84887K wps
[Epoch 23 Batch 150/172] avg loss 0.00450433, throughput 2.77982K wps
Begin Testing...
[Epoch 23] train avg loss 0.00464757, dev acc 0.8952, dev avg loss 0.282458, throughput 2.78768K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00450792, throughput 2.8146K wps
[Epoch 24 Batch 60/172] avg loss 0.00459212, throughput 2.78086K wps
[Epoch 24 Batch 90/172] avg loss 0.00467437, throughput 2.65854K wps
[Epoch 24 Batch 120/172] avg loss 0.00441824, throughput 2.77543K wps
[Epoch 24 Batch 150/172] avg loss 0.00449693, throughput 2.79258K wps
Begin Testing...
[Epoch 24] train avg loss 0.0045382, dev acc 0.8973, dev avg loss 0.280796, throughput 2.76429K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/172] avg loss 0.00455045, throughput 2.79126K wps
[Epoch 25 Batch 60/172] avg loss 0.00445533, throughput 2.76256K wps
[Epoch 25 Batch 90/172] avg loss 0.00440132, throughput 2.7739K wps
[Epoch 25 Batch 120/172] avg loss 0.00422465, throughput 2.83433K wps
[Epoch 25 Batch 150/172] avg loss 0.00470133, throughput 2.79073K wps
Begin Testing...
[Epoch 25] train avg loss 0.00439043, dev acc 0.8941, dev avg loss 0.279175, throughput 2.78628K wps
[Epoch 26 Batch 30/172] avg loss 0.00412574, throughput 2.84644K wps
[Epoch 26 Batch 60/172] avg loss 0.00445612, throughput 2.75971K wps
[Epoch 26 Batch 90/172] avg loss 0.00424822, throughput 2.76305K wps
[Epoch 26 Batch 120/172] avg loss 0.00432147, throughput 2.74999K wps
[Epoch 26 Batch 150/172] avg loss 0.0045786, throughput 2.82449K wps
Begin Testing...
[Epoch 26] train avg loss 0.00435441, dev acc 0.8941, dev avg loss 0.278974, throughput 2.79005K wps
[Epoch 27 Batch 30/172] avg loss 0.0041455, throughput 2.68975K wps
[Epoch 27 Batch 60/172] avg loss 0.00440346, throughput 2.75746K wps
[Epoch 27 Batch 90/172] avg loss 0.00408933, throughput 2.7772K wps
[Epoch 27 Batch 120/172] avg loss 0.00413214, throughput 2.62172K wps
[Epoch 27 Batch 150/172] avg loss 0.00443016, throughput 2.84191K wps
Begin Testing...
[Epoch 27] train avg loss 0.00415631, dev acc 0.8941, dev avg loss 0.278469, throughput 2.73886K wps
[Epoch 28 Batch 30/172] avg loss 0.0041587, throughput 2.80894K wps
[Epoch 28 Batch 60/172] avg loss 0.0038126, throughput 2.7239K wps
[Epoch 28 Batch 90/172] avg loss 0.00411748, throughput 2.78902K wps
[Epoch 28 Batch 120/172] avg loss 0.0041327, throughput 2.79498K wps
[Epoch 28 Batch 150/172] avg loss 0.00394642, throughput 2.78546K wps
Begin Testing...
[Epoch 28] train avg loss 0.00405998, dev acc 0.8952, dev avg loss 0.277544, throughput 2.74981K wps
[Epoch 29 Batch 30/172] avg loss 0.00393652, throughput 2.81068K wps
[Epoch 29 Batch 60/172] avg loss 0.00359452, throughput 2.7499K wps
[Epoch 29 Batch 90/172] avg loss 0.00432227, throughput 2.67878K wps
[Epoch 29 Batch 120/172] avg loss 0.00404087, throughput 2.7936K wps
[Epoch 29 Batch 150/172] avg loss 0.00397155, throughput 2.78072K wps
Begin Testing...
[Epoch 29] train avg loss 0.00397084, dev acc 0.8889, dev avg loss 0.278634, throughput 2.76768K wps
[Epoch 30 Batch 30/172] avg loss 0.00439413, throughput 2.79636K wps
[Epoch 30 Batch 60/172] avg loss 0.0034501, throughput 2.86943K wps
[Epoch 30 Batch 90/172] avg loss 0.00357668, throughput 2.69254K wps
[Epoch 30 Batch 120/172] avg loss 0.0040402, throughput 2.82213K wps
[Epoch 30 Batch 150/172] avg loss 0.00397061, throughput 2.81858K wps
Begin Testing...
[Epoch 30] train avg loss 0.00385057, dev acc 0.8983, dev avg loss 0.276176, throughput 2.79982K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/172] avg loss 0.00376895, throughput 2.8215K wps
[Epoch 31 Batch 60/172] avg loss 0.00348545, throughput 2.73788K wps
[Epoch 31 Batch 90/172] avg loss 0.00396721, throughput 2.83236K wps
[Epoch 31 Batch 120/172] avg loss 0.00349228, throughput 2.69519K wps
[Epoch 31 Batch 150/172] avg loss 0.00372122, throughput 2.64592K wps
Begin Testing...
[Epoch 31] train avg loss 0.00378444, dev acc 0.8920, dev avg loss 0.278367, throughput 2.76087K wps
[Epoch 32 Batch 30/172] avg loss 0.0033693, throughput 2.71567K wps
[Epoch 32 Batch 60/172] avg loss 0.00360079, throughput 2.81814K wps
[Epoch 32 Batch 90/172] avg loss 0.00392364, throughput 2.81057K wps
[Epoch 32 Batch 120/172] avg loss 0.0037627, throughput 2.65563K wps
[Epoch 32 Batch 150/172] avg loss 0.00384535, throughput 2.69141K wps
Begin Testing...
[Epoch 32] train avg loss 0.00364001, dev acc 0.8931, dev avg loss 0.279928, throughput 2.72152K wps
[Epoch 33 Batch 30/172] avg loss 0.003437, throughput 2.73528K wps
[Epoch 33 Batch 60/172] avg loss 0.00344368, throughput 2.81409K wps
[Epoch 33 Batch 90/172] avg loss 0.0035358, throughput 2.79265K wps
[Epoch 33 Batch 120/172] avg loss 0.00354323, throughput 2.79033K wps
[Epoch 33 Batch 150/172] avg loss 0.0040242, throughput 2.80683K wps
Begin Testing...
[Epoch 33] train avg loss 0.00362748, dev acc 0.8962, dev avg loss 0.276617, throughput 2.78899K wps
[Epoch 34 Batch 30/172] avg loss 0.00376741, throughput 2.88599K wps
[Epoch 34 Batch 60/172] avg loss 0.00368413, throughput 2.82455K wps
[Epoch 34 Batch 90/172] avg loss 0.00340793, throughput 2.79339K wps
[Epoch 34 Batch 120/172] avg loss 0.00329242, throughput 2.7858K wps
[Epoch 34 Batch 150/172] avg loss 0.00340328, throughput 2.63561K wps
Begin Testing...
[Epoch 34] train avg loss 0.0034752, dev acc 0.8962, dev avg loss 0.27965, throughput 2.78519K wps
[Epoch 35 Batch 30/172] avg loss 0.00322258, throughput 2.7636K wps
[Epoch 35 Batch 60/172] avg loss 0.00297573, throughput 2.81934K wps
[Epoch 35 Batch 90/172] avg loss 0.00365879, throughput 2.74061K wps
[Epoch 35 Batch 120/172] avg loss 0.00366207, throughput 2.7885K wps
[Epoch 35 Batch 150/172] avg loss 0.00335564, throughput 2.79709K wps
Begin Testing...
[Epoch 35] train avg loss 0.00341104, dev acc 0.8952, dev avg loss 0.277381, throughput 2.77287K wps
[Epoch 36 Batch 30/172] avg loss 0.00314231, throughput 2.87722K wps
[Epoch 36 Batch 60/172] avg loss 0.00298133, throughput 2.83128K wps
[Epoch 36 Batch 90/172] avg loss 0.00323707, throughput 2.81057K wps
[Epoch 36 Batch 120/172] avg loss 0.00358186, throughput 2.73372K wps
[Epoch 36 Batch 150/172] avg loss 0.00339945, throughput 2.85284K wps
Begin Testing...
[Epoch 36] train avg loss 0.00330062, dev acc 0.8952, dev avg loss 0.278956, throughput 2.80659K wps
[Epoch 37 Batch 30/172] avg loss 0.00333857, throughput 2.77663K wps
[Epoch 37 Batch 60/172] avg loss 0.00316806, throughput 2.75203K wps
[Epoch 37 Batch 90/172] avg loss 0.00310215, throughput 2.75664K wps
[Epoch 37 Batch 120/172] avg loss 0.00317976, throughput 2.68331K wps
[Epoch 37 Batch 150/172] avg loss 0.00313798, throughput 2.74949K wps
Begin Testing...
[Epoch 37] train avg loss 0.00323274, dev acc 0.8962, dev avg loss 0.279641, throughput 2.75097K wps
[Epoch 38 Batch 30/172] avg loss 0.00319156, throughput 2.65306K wps
[Epoch 38 Batch 60/172] avg loss 0.00310578, throughput 2.76549K wps
[Epoch 38 Batch 90/172] avg loss 0.00304612, throughput 2.78152K wps
[Epoch 38 Batch 120/172] avg loss 0.00285324, throughput 2.70722K wps
[Epoch 38 Batch 150/172] avg loss 0.00358348, throughput 2.87022K wps
Begin Testing...
[Epoch 38] train avg loss 0.00315856, dev acc 0.8952, dev avg loss 0.280872, throughput 2.74141K wps
[Epoch 39 Batch 30/172] avg loss 0.00319208, throughput 2.82559K wps
[Epoch 39 Batch 60/172] avg loss 0.00306346, throughput 2.76587K wps
[Epoch 39 Batch 90/172] avg loss 0.00320639, throughput 2.78409K wps
[Epoch 39 Batch 120/172] avg loss 0.00328215, throughput 2.80901K wps
[Epoch 39 Batch 150/172] avg loss 0.00322311, throughput 2.7585K wps
Begin Testing...
[Epoch 39] train avg loss 0.003196, dev acc 0.8962, dev avg loss 0.285186, throughput 2.78384K wps
[Epoch 40 Batch 30/172] avg loss 0.00334597, throughput 2.72434K wps
[Epoch 40 Batch 60/172] avg loss 0.00280189, throughput 2.79351K wps
[Epoch 40 Batch 90/172] avg loss 0.00297214, throughput 2.78978K wps
[Epoch 40 Batch 120/172] avg loss 0.00298594, throughput 2.75389K wps
[Epoch 40 Batch 150/172] avg loss 0.00299749, throughput 2.79287K wps
Begin Testing...
[Epoch 40] train avg loss 0.0030466, dev acc 0.8899, dev avg loss 0.282349, throughput 2.75108K wps
[Epoch 41 Batch 30/172] avg loss 0.00301094, throughput 2.73441K wps
[Epoch 41 Batch 60/172] avg loss 0.00285609, throughput 2.78946K wps
[Epoch 41 Batch 90/172] avg loss 0.00309734, throughput 2.62449K wps
[Epoch 41 Batch 120/172] avg loss 0.00274158, throughput 2.68475K wps
[Epoch 41 Batch 150/172] avg loss 0.00324973, throughput 2.74577K wps
Begin Testing...
[Epoch 41] train avg loss 0.00296032, dev acc 0.8941, dev avg loss 0.28399, throughput 2.70913K wps
[Epoch 42 Batch 30/172] avg loss 0.00241018, throughput 2.82906K wps
[Epoch 42 Batch 60/172] avg loss 0.00293656, throughput 2.72186K wps
[Epoch 42 Batch 90/172] avg loss 0.00305046, throughput 2.70589K wps
[Epoch 42 Batch 120/172] avg loss 0.00296457, throughput 2.74911K wps
[Epoch 42 Batch 150/172] avg loss 0.00308894, throughput 2.77614K wps
Begin Testing...
[Epoch 42] train avg loss 0.00289822, dev acc 0.8941, dev avg loss 0.284448, throughput 2.75966K wps
[Epoch 43 Batch 30/172] avg loss 0.00261785, throughput 2.81734K wps
[Epoch 43 Batch 60/172] avg loss 0.00291687, throughput 2.81571K wps
[Epoch 43 Batch 90/172] avg loss 0.00278176, throughput 2.74914K wps
[Epoch 43 Batch 120/172] avg loss 0.0027469, throughput 2.74057K wps
[Epoch 43 Batch 150/172] avg loss 0.00303175, throughput 2.75331K wps
Begin Testing...
[Epoch 43] train avg loss 0.00284281, dev acc 0.8941, dev avg loss 0.287387, throughput 2.78453K wps
[Epoch 44 Batch 30/172] avg loss 0.00293644, throughput 2.81014K wps
[Epoch 44 Batch 60/172] avg loss 0.00285801, throughput 2.7901K wps
[Epoch 44 Batch 90/172] avg loss 0.00293109, throughput 2.79596K wps
[Epoch 44 Batch 120/172] avg loss 0.00283418, throughput 2.77129K wps
[Epoch 44 Batch 150/172] avg loss 0.00257462, throughput 2.78875K wps
Begin Testing...
[Epoch 44] train avg loss 0.0028174, dev acc 0.8952, dev avg loss 0.285683, throughput 2.78244K wps
[Epoch 45 Batch 30/172] avg loss 0.00281347, throughput 2.81499K wps
[Epoch 45 Batch 60/172] avg loss 0.00229716, throughput 2.75821K wps
[Epoch 45 Batch 90/172] avg loss 0.00293045, throughput 2.77543K wps
[Epoch 45 Batch 120/172] avg loss 0.00273788, throughput 2.59766K wps
[Epoch 45 Batch 150/172] avg loss 0.0027125, throughput 2.7556K wps
Begin Testing...
[Epoch 45] train avg loss 0.00275313, dev acc 0.8941, dev avg loss 0.289451, throughput 2.74378K wps
[Epoch 46 Batch 30/172] avg loss 0.0025544, throughput 2.76705K wps
[Epoch 46 Batch 60/172] avg loss 0.00271811, throughput 2.77272K wps
[Epoch 46 Batch 90/172] avg loss 0.00267758, throughput 2.67841K wps
[Epoch 46 Batch 120/172] avg loss 0.00294413, throughput 2.71711K wps
[Epoch 46 Batch 150/172] avg loss 0.00266982, throughput 2.7459K wps
Begin Testing...
[Epoch 46] train avg loss 0.00267556, dev acc 0.8910, dev avg loss 0.288273, throughput 2.72858K wps
[Epoch 47 Batch 30/172] avg loss 0.00247773, throughput 2.74087K wps
[Epoch 47 Batch 60/172] avg loss 0.00280285, throughput 2.75626K wps
[Epoch 47 Batch 90/172] avg loss 0.00261683, throughput 2.62957K wps
[Epoch 47 Batch 120/172] avg loss 0.00272366, throughput 2.77739K wps
[Epoch 47 Batch 150/172] avg loss 0.00260107, throughput 2.7826K wps
Begin Testing...
[Epoch 47] train avg loss 0.00265971, dev acc 0.8941, dev avg loss 0.290363, throughput 2.73896K wps
[Epoch 48 Batch 30/172] avg loss 0.00272002, throughput 2.73869K wps
[Epoch 48 Batch 60/172] avg loss 0.00226009, throughput 2.77691K wps
[Epoch 48 Batch 90/172] avg loss 0.00289962, throughput 2.76296K wps
[Epoch 48 Batch 120/172] avg loss 0.00259383, throughput 2.68611K wps
[Epoch 48 Batch 150/172] avg loss 0.00272842, throughput 2.80168K wps
Begin Testing...
[Epoch 48] train avg loss 0.0025953, dev acc 0.8910, dev avg loss 0.290595, throughput 2.74343K wps
[Epoch 49 Batch 30/172] avg loss 0.00268858, throughput 2.70349K wps
[Epoch 49 Batch 60/172] avg loss 0.00261042, throughput 2.80612K wps
[Epoch 49 Batch 90/172] avg loss 0.0023963, throughput 2.80105K wps
[Epoch 49 Batch 120/172] avg loss 0.00243235, throughput 2.7318K wps
[Epoch 49 Batch 150/172] avg loss 0.0026498, throughput 2.80898K wps
Begin Testing...
[Epoch 49] train avg loss 0.00257455, dev acc 0.8910, dev avg loss 0.292198, throughput 2.75854K wps
[Epoch 50 Batch 30/172] avg loss 0.00245119, throughput 2.81845K wps
[Epoch 50 Batch 60/172] avg loss 0.00231437, throughput 2.83749K wps
[Epoch 50 Batch 90/172] avg loss 0.00246802, throughput 2.81591K wps
[Epoch 50 Batch 120/172] avg loss 0.00260552, throughput 2.69555K wps
[Epoch 50 Batch 150/172] avg loss 0.0024443, throughput 2.70613K wps
Begin Testing...
[Epoch 50] train avg loss 0.00245249, dev acc 0.8899, dev avg loss 0.296872, throughput 2.78662K wps
[Epoch 51 Batch 30/172] avg loss 0.00223579, throughput 2.79622K wps
[Epoch 51 Batch 60/172] avg loss 0.00233213, throughput 2.86014K wps
[Epoch 51 Batch 90/172] avg loss 0.00248576, throughput 2.8119K wps
[Epoch 51 Batch 120/172] avg loss 0.00230759, throughput 2.79693K wps
[Epoch 51 Batch 150/172] avg loss 0.00239625, throughput 2.74147K wps
Begin Testing...
[Epoch 51] train avg loss 0.00244313, dev acc 0.8910, dev avg loss 0.29573, throughput 2.80234K wps
[Epoch 52 Batch 30/172] avg loss 0.00214654, throughput 2.88428K wps
[Epoch 52 Batch 60/172] avg loss 0.00245422, throughput 2.77485K wps
[Epoch 52 Batch 90/172] avg loss 0.00234328, throughput 2.72806K wps
[Epoch 52 Batch 120/172] avg loss 0.00241707, throughput 2.81997K wps
[Epoch 52 Batch 150/172] avg loss 0.00254939, throughput 2.81962K wps
Begin Testing...
[Epoch 52] train avg loss 0.00235845, dev acc 0.8910, dev avg loss 0.299475, throughput 2.80621K wps
[Epoch 53 Batch 30/172] avg loss 0.00237353, throughput 2.73664K wps
[Epoch 53 Batch 60/172] avg loss 0.00233047, throughput 2.71074K wps
[Epoch 53 Batch 90/172] avg loss 0.00209178, throughput 2.78743K wps
[Epoch 53 Batch 120/172] avg loss 0.00242813, throughput 2.78615K wps
[Epoch 53 Batch 150/172] avg loss 0.00236686, throughput 2.67861K wps
Begin Testing...
[Epoch 53] train avg loss 0.00231734, dev acc 0.8910, dev avg loss 0.302144, throughput 2.74899K wps
[Epoch 54 Batch 30/172] avg loss 0.00237059, throughput 2.72146K wps
[Epoch 54 Batch 60/172] avg loss 0.00233748, throughput 2.72253K wps
[Epoch 54 Batch 90/172] avg loss 0.0024156, throughput 2.79206K wps
[Epoch 54 Batch 120/172] avg loss 0.0021466, throughput 2.81487K wps
[Epoch 54 Batch 150/172] avg loss 0.00237059, throughput 2.77084K wps
Begin Testing...
[Epoch 54] train avg loss 0.00230328, dev acc 0.8920, dev avg loss 0.301351, throughput 2.77122K wps
[Epoch 55 Batch 30/172] avg loss 0.00173806, throughput 2.6924K wps
[Epoch 55 Batch 60/172] avg loss 0.0024559, throughput 2.82645K wps
[Epoch 55 Batch 90/172] avg loss 0.0023964, throughput 2.82486K wps
[Epoch 55 Batch 120/172] avg loss 0.00229216, throughput 2.77729K wps
[Epoch 55 Batch 150/172] avg loss 0.0021489, throughput 2.74921K wps
Begin Testing...
[Epoch 55] train avg loss 0.00221972, dev acc 0.8910, dev avg loss 0.305315, throughput 2.77844K wps
[Epoch 56 Batch 30/172] avg loss 0.00240169, throughput 2.73054K wps
[Epoch 56 Batch 60/172] avg loss 0.00209009, throughput 2.74335K wps
[Epoch 56 Batch 90/172] avg loss 0.00208562, throughput 2.79904K wps
[Epoch 56 Batch 120/172] avg loss 0.00251054, throughput 2.78129K wps
[Epoch 56 Batch 150/172] avg loss 0.00219171, throughput 2.78728K wps
Begin Testing...
[Epoch 56] train avg loss 0.00221885, dev acc 0.8889, dev avg loss 0.306618, throughput 2.77251K wps
[Epoch 57 Batch 30/172] avg loss 0.00208706, throughput 2.83618K wps
[Epoch 57 Batch 60/172] avg loss 0.00213893, throughput 2.74805K wps
[Epoch 57 Batch 90/172] avg loss 0.00224802, throughput 2.71589K wps
[Epoch 57 Batch 120/172] avg loss 0.00202984, throughput 2.75469K wps
[Epoch 57 Batch 150/172] avg loss 0.00211068, throughput 2.6849K wps
Begin Testing...
[Epoch 57] train avg loss 0.00214076, dev acc 0.8910, dev avg loss 0.305469, throughput 2.74817K wps
[Epoch 58 Batch 30/172] avg loss 0.00208664, throughput 2.77204K wps
[Epoch 58 Batch 60/172] avg loss 0.00209262, throughput 2.78994K wps
[Epoch 58 Batch 90/172] avg loss 0.0020565, throughput 2.78226K wps
[Epoch 58 Batch 120/172] avg loss 0.00207067, throughput 2.74781K wps
[Epoch 58 Batch 150/172] avg loss 0.00216872, throughput 2.71107K wps
Begin Testing...
[Epoch 58] train avg loss 0.00215029, dev acc 0.8899, dev avg loss 0.308237, throughput 2.76827K wps
[Epoch 59 Batch 30/172] avg loss 0.00210979, throughput 2.66593K wps
[Epoch 59 Batch 60/172] avg loss 0.00188111, throughput 2.76032K wps
[Epoch 59 Batch 90/172] avg loss 0.00232613, throughput 2.74998K wps
[Epoch 59 Batch 120/172] avg loss 0.00209185, throughput 2.78882K wps
[Epoch 59 Batch 150/172] avg loss 0.00237364, throughput 2.77307K wps
Begin Testing...
[Epoch 59] train avg loss 0.00216, dev acc 0.8889, dev avg loss 0.30784, throughput 2.7474K wps
[Epoch 60 Batch 30/172] avg loss 0.0020038, throughput 2.81292K wps
[Epoch 60 Batch 60/172] avg loss 0.00204772, throughput 2.72218K wps
[Epoch 60 Batch 90/172] avg loss 0.00205819, throughput 2.75757K wps
[Epoch 60 Batch 120/172] avg loss 0.00184914, throughput 2.80147K wps
[Epoch 60 Batch 150/172] avg loss 0.00224187, throughput 2.76711K wps
Begin Testing...
[Epoch 60] train avg loss 0.00204576, dev acc 0.8868, dev avg loss 0.314357, throughput 2.75405K wps
[Epoch 61 Batch 30/172] avg loss 0.00195325, throughput 2.82318K wps
[Epoch 61 Batch 60/172] avg loss 0.00178514, throughput 2.70719K wps
[Epoch 61 Batch 90/172] avg loss 0.00199736, throughput 2.66736K wps
[Epoch 61 Batch 120/172] avg loss 0.00211323, throughput 2.79071K wps
[Epoch 61 Batch 150/172] avg loss 0.00222862, throughput 2.77008K wps
Begin Testing...
[Epoch 61] train avg loss 0.00200701, dev acc 0.8847, dev avg loss 0.318029, throughput 2.74143K wps
[Epoch 62 Batch 30/172] avg loss 0.0018834, throughput 2.66589K wps
[Epoch 62 Batch 60/172] avg loss 0.00190324, throughput 2.78838K wps
[Epoch 62 Batch 90/172] avg loss 0.00227852, throughput 2.74804K wps
[Epoch 62 Batch 120/172] avg loss 0.00200629, throughput 2.69534K wps
[Epoch 62 Batch 150/172] avg loss 0.00208284, throughput 2.77215K wps
Begin Testing...
[Epoch 62] train avg loss 0.00200923, dev acc 0.8889, dev avg loss 0.315823, throughput 2.72773K wps
[Epoch 63 Batch 30/172] avg loss 0.00195473, throughput 2.79378K wps
[Epoch 63 Batch 60/172] avg loss 0.00220943, throughput 2.79439K wps
[Epoch 63 Batch 90/172] avg loss 0.00174484, throughput 2.77739K wps
[Epoch 63 Batch 120/172] avg loss 0.00216727, throughput 2.75716K wps
[Epoch 63 Batch 150/172] avg loss 0.00208204, throughput 2.76571K wps
Begin Testing...
[Epoch 63] train avg loss 0.00200024, dev acc 0.8857, dev avg loss 0.319318, throughput 2.77725K wps
[Epoch 64 Batch 30/172] avg loss 0.00209908, throughput 2.64135K wps
[Epoch 64 Batch 60/172] avg loss 0.00188704, throughput 2.7338K wps
[Epoch 64 Batch 90/172] avg loss 0.00200499, throughput 2.77004K wps
[Epoch 64 Batch 120/172] avg loss 0.00188061, throughput 2.61814K wps
[Epoch 64 Batch 150/172] avg loss 0.00212567, throughput 2.67681K wps
Begin Testing...
[Epoch 64] train avg loss 0.0019758, dev acc 0.8899, dev avg loss 0.318016, throughput 2.69734K wps
[Epoch 65 Batch 30/172] avg loss 0.00161278, throughput 2.80866K wps
[Epoch 65 Batch 60/172] avg loss 0.00210408, throughput 2.76381K wps
[Epoch 65 Batch 90/172] avg loss 0.00208129, throughput 2.77042K wps
[Epoch 65 Batch 120/172] avg loss 0.0017421, throughput 2.79746K wps
[Epoch 65 Batch 150/172] avg loss 0.00191488, throughput 2.7778K wps
Begin Testing...
[Epoch 65] train avg loss 0.00190682, dev acc 0.8868, dev avg loss 0.32114, throughput 2.78488K wps
[Epoch 66 Batch 30/172] avg loss 0.00181698, throughput 2.77665K wps
[Epoch 66 Batch 60/172] avg loss 0.00187917, throughput 2.82464K wps
[Epoch 66 Batch 90/172] avg loss 0.00169005, throughput 2.82066K wps
[Epoch 66 Batch 120/172] avg loss 0.0021372, throughput 2.80865K wps
[Epoch 66 Batch 150/172] avg loss 0.00185511, throughput 2.74154K wps
Begin Testing...
[Epoch 66] train avg loss 0.00188618, dev acc 0.8889, dev avg loss 0.325612, throughput 2.78296K wps
[Epoch 67 Batch 30/172] avg loss 0.00157105, throughput 2.8081K wps
[Epoch 67 Batch 60/172] avg loss 0.0019729, throughput 2.81097K wps
[Epoch 67 Batch 90/172] avg loss 0.0019844, throughput 2.72873K wps
[Epoch 67 Batch 120/172] avg loss 0.00200766, throughput 2.79892K wps
[Epoch 67 Batch 150/172] avg loss 0.00170668, throughput 2.7575K wps
Begin Testing...
[Epoch 67] train avg loss 0.00187866, dev acc 0.8878, dev avg loss 0.31917, throughput 2.77172K wps
[Epoch 68 Batch 30/172] avg loss 0.0017842, throughput 2.70354K wps
[Epoch 68 Batch 60/172] avg loss 0.00173205, throughput 2.76425K wps
[Epoch 68 Batch 90/172] avg loss 0.00179924, throughput 2.58273K wps
[Epoch 68 Batch 120/172] avg loss 0.00200403, throughput 2.68458K wps
[Epoch 68 Batch 150/172] avg loss 0.00181533, throughput 2.81224K wps
Begin Testing...
[Epoch 68] train avg loss 0.00182778, dev acc 0.8878, dev avg loss 0.324547, throughput 2.72126K wps
[Epoch 69 Batch 30/172] avg loss 0.00177616, throughput 2.83588K wps
[Epoch 69 Batch 60/172] avg loss 0.00188674, throughput 2.71146K wps
[Epoch 69 Batch 90/172] avg loss 0.00191715, throughput 2.81855K wps
[Epoch 69 Batch 120/172] avg loss 0.00155582, throughput 2.82203K wps
[Epoch 69 Batch 150/172] avg loss 0.00179792, throughput 2.81654K wps
Begin Testing...
[Epoch 69] train avg loss 0.00178722, dev acc 0.8868, dev avg loss 0.330206, throughput 2.79762K wps
[Epoch 70 Batch 30/172] avg loss 0.0018801, throughput 2.89284K wps
[Epoch 70 Batch 60/172] avg loss 0.00178383, throughput 2.7481K wps
[Epoch 70 Batch 90/172] avg loss 0.00162137, throughput 2.75567K wps
[Epoch 70 Batch 120/172] avg loss 0.00178986, throughput 2.74037K wps
[Epoch 70 Batch 150/172] avg loss 0.0017684, throughput 2.85264K wps
Begin Testing...
[Epoch 70] train avg loss 0.00180814, dev acc 0.8857, dev avg loss 0.330938, throughput 2.78557K wps
[Epoch 71 Batch 30/172] avg loss 0.00165113, throughput 2.72884K wps
[Epoch 71 Batch 60/172] avg loss 0.00154354, throughput 2.81326K wps
[Epoch 71 Batch 90/172] avg loss 0.00157637, throughput 2.75441K wps
[Epoch 71 Batch 120/172] avg loss 0.00195269, throughput 2.78935K wps
[Epoch 71 Batch 150/172] avg loss 0.00183392, throughput 2.83369K wps
Begin Testing...
[Epoch 71] train avg loss 0.00177064, dev acc 0.8868, dev avg loss 0.330598, throughput 2.78876K wps
[Epoch 72 Batch 30/172] avg loss 0.00155844, throughput 2.83072K wps
[Epoch 72 Batch 60/172] avg loss 0.00168929, throughput 2.86907K wps
[Epoch 72 Batch 90/172] avg loss 0.00182746, throughput 2.75475K wps
[Epoch 72 Batch 120/172] avg loss 0.00184831, throughput 2.7766K wps
[Epoch 72 Batch 150/172] avg loss 0.00184277, throughput 2.82721K wps
Begin Testing...
[Epoch 72] train avg loss 0.00173557, dev acc 0.8868, dev avg loss 0.332911, throughput 2.8115K wps
[Epoch 73 Batch 30/172] avg loss 0.00161733, throughput 2.81589K wps
[Epoch 73 Batch 60/172] avg loss 0.001727, throughput 2.82439K wps
[Epoch 73 Batch 90/172] avg loss 0.00167194, throughput 2.73676K wps
[Epoch 73 Batch 120/172] avg loss 0.00188426, throughput 2.74028K wps
[Epoch 73 Batch 150/172] avg loss 0.00175101, throughput 2.66625K wps
Begin Testing...
[Epoch 73] train avg loss 0.00173154, dev acc 0.8857, dev avg loss 0.333203, throughput 2.75857K wps
[Epoch 74 Batch 30/172] avg loss 0.00172058, throughput 2.80186K wps
[Epoch 74 Batch 60/172] avg loss 0.00166125, throughput 2.8308K wps
[Epoch 74 Batch 90/172] avg loss 0.00164898, throughput 2.81343K wps
[Epoch 74 Batch 120/172] avg loss 0.00179801, throughput 2.78501K wps
[Epoch 74 Batch 150/172] avg loss 0.00173265, throughput 2.64462K wps
Begin Testing...
[Epoch 74] train avg loss 0.00173191, dev acc 0.8857, dev avg loss 0.347223, throughput 2.77892K wps
[Epoch 75 Batch 30/172] avg loss 0.00150831, throughput 2.62712K wps
[Epoch 75 Batch 60/172] avg loss 0.00176183, throughput 2.57157K wps
[Epoch 75 Batch 90/172] avg loss 0.00179401, throughput 2.53689K wps
[Epoch 75 Batch 120/172] avg loss 0.00172102, throughput 2.77198K wps
[Epoch 75 Batch 150/172] avg loss 0.00157941, throughput 2.7438K wps
Begin Testing...
[Epoch 75] train avg loss 0.00168289, dev acc 0.8868, dev avg loss 0.338494, throughput 2.66228K wps
[Epoch 76 Batch 30/172] avg loss 0.00144126, throughput 2.73544K wps
[Epoch 76 Batch 60/172] avg loss 0.001641, throughput 2.73352K wps
[Epoch 76 Batch 90/172] avg loss 0.00207923, throughput 2.78574K wps
[Epoch 76 Batch 120/172] avg loss 0.00176922, throughput 2.73655K wps
[Epoch 76 Batch 150/172] avg loss 0.00177001, throughput 2.7846K wps
Begin Testing...
[Epoch 76] train avg loss 0.00169135, dev acc 0.8857, dev avg loss 0.348807, throughput 2.7496K wps
[Epoch 77 Batch 30/172] avg loss 0.0016465, throughput 2.80028K wps
[Epoch 77 Batch 60/172] avg loss 0.0016201, throughput 2.78283K wps
[Epoch 77 Batch 90/172] avg loss 0.00165577, throughput 2.6781K wps
[Epoch 77 Batch 120/172] avg loss 0.00140208, throughput 2.79543K wps
[Epoch 77 Batch 150/172] avg loss 0.00190054, throughput 2.79796K wps
Begin Testing...
[Epoch 77] train avg loss 0.00166067, dev acc 0.8816, dev avg loss 0.339006, throughput 2.77385K wps
[Epoch 78 Batch 30/172] avg loss 0.00153052, throughput 2.85089K wps
[Epoch 78 Batch 60/172] avg loss 0.00169569, throughput 2.80119K wps
[Epoch 78 Batch 90/172] avg loss 0.0013032, throughput 2.67655K wps
[Epoch 78 Batch 120/172] avg loss 0.00159728, throughput 2.80974K wps
[Epoch 78 Batch 150/172] avg loss 0.00185166, throughput 2.8185K wps
Begin Testing...
[Epoch 78] train avg loss 0.00157366, dev acc 0.8847, dev avg loss 0.353385, throughput 2.79059K wps
[Epoch 79 Batch 30/172] avg loss 0.00165853, throughput 2.73326K wps
[Epoch 79 Batch 60/172] avg loss 0.00177558, throughput 2.84831K wps
[Epoch 79 Batch 90/172] avg loss 0.00148779, throughput 2.72322K wps
[Epoch 79 Batch 120/172] avg loss 0.00159862, throughput 2.66846K wps
[Epoch 79 Batch 150/172] avg loss 0.00166202, throughput 2.78648K wps
Begin Testing...
[Epoch 79] train avg loss 0.00165862, dev acc 0.8857, dev avg loss 0.346793, throughput 2.75659K wps
[Epoch 80 Batch 30/172] avg loss 0.0014278, throughput 2.85089K wps
[Epoch 80 Batch 60/172] avg loss 0.00162588, throughput 2.7457K wps
[Epoch 80 Batch 90/172] avg loss 0.00146205, throughput 2.82465K wps
[Epoch 80 Batch 120/172] avg loss 0.00161393, throughput 2.76645K wps
[Epoch 80 Batch 150/172] avg loss 0.00153597, throughput 2.77412K wps
Begin Testing...
[Epoch 80] train avg loss 0.00155417, dev acc 0.8805, dev avg loss 0.34539, throughput 2.79324K wps
[Epoch 81 Batch 30/172] avg loss 0.00135982, throughput 2.85489K wps
[Epoch 81 Batch 60/172] avg loss 0.00177747, throughput 2.78063K wps
[Epoch 81 Batch 90/172] avg loss 0.00147059, throughput 2.6416K wps
[Epoch 81 Batch 120/172] avg loss 0.00147718, throughput 2.87082K wps
[Epoch 81 Batch 150/172] avg loss 0.00173423, throughput 2.82662K wps
Begin Testing...
[Epoch 81] train avg loss 0.00156963, dev acc 0.8836, dev avg loss 0.349941, throughput 2.79501K wps
[Epoch 82 Batch 30/172] avg loss 0.00172422, throughput 2.83615K wps
[Epoch 82 Batch 60/172] avg loss 0.00164427, throughput 2.80438K wps
[Epoch 82 Batch 90/172] avg loss 0.00157658, throughput 2.82432K wps
[Epoch 82 Batch 120/172] avg loss 0.00136438, throughput 2.81134K wps
[Epoch 82 Batch 150/172] avg loss 0.0016074, throughput 2.81346K wps
Begin Testing...
[Epoch 82] train avg loss 0.00156028, dev acc 0.8784, dev avg loss 0.35216, throughput 2.81823K wps
[Epoch 83 Batch 30/172] avg loss 0.00149176, throughput 2.76393K wps
[Epoch 83 Batch 60/172] avg loss 0.00147162, throughput 2.70302K wps
[Epoch 83 Batch 90/172] avg loss 0.00137855, throughput 2.66927K wps
[Epoch 83 Batch 120/172] avg loss 0.00164464, throughput 2.7558K wps
[Epoch 83 Batch 150/172] avg loss 0.00169072, throughput 2.77353K wps
Begin Testing...
[Epoch 83] train avg loss 0.00152669, dev acc 0.8857, dev avg loss 0.353714, throughput 2.73792K wps
[Epoch 84 Batch 30/172] avg loss 0.00163609, throughput 2.80053K wps
[Epoch 84 Batch 60/172] avg loss 0.00135959, throughput 2.67912K wps
[Epoch 84 Batch 90/172] avg loss 0.00156341, throughput 2.79895K wps
[Epoch 84 Batch 120/172] avg loss 0.00132367, throughput 2.80828K wps
[Epoch 84 Batch 150/172] avg loss 0.00143879, throughput 2.81889K wps
Begin Testing...
[Epoch 84] train avg loss 0.00148002, dev acc 0.8857, dev avg loss 0.358406, throughput 2.7787K wps
[Epoch 85 Batch 30/172] avg loss 0.00145745, throughput 2.85942K wps
[Epoch 85 Batch 60/172] avg loss 0.00134181, throughput 2.60491K wps
[Epoch 85 Batch 90/172] avg loss 0.00153363, throughput 2.8667K wps
[Epoch 85 Batch 120/172] avg loss 0.00188838, throughput 2.67901K wps
[Epoch 85 Batch 150/172] avg loss 0.00147216, throughput 2.81503K wps
Begin Testing...
[Epoch 85] train avg loss 0.0015088, dev acc 0.8857, dev avg loss 0.359537, throughput 2.76382K wps
[Epoch 86 Batch 30/172] avg loss 0.00128487, throughput 2.84766K wps
[Epoch 86 Batch 60/172] avg loss 0.00154641, throughput 2.72689K wps
[Epoch 86 Batch 90/172] avg loss 0.00132645, throughput 2.69712K wps
[Epoch 86 Batch 120/172] avg loss 0.00174567, throughput 2.71564K wps
[Epoch 86 Batch 150/172] avg loss 0.00158331, throughput 2.71753K wps
Begin Testing...
[Epoch 86] train avg loss 0.00152183, dev acc 0.8816, dev avg loss 0.356918, throughput 2.73856K wps
[Epoch 87 Batch 30/172] avg loss 0.00149061, throughput 2.76425K wps
[Epoch 87 Batch 60/172] avg loss 0.00144733, throughput 2.78829K wps
[Epoch 87 Batch 90/172] avg loss 0.00149273, throughput 2.73616K wps
[Epoch 87 Batch 120/172] avg loss 0.00130997, throughput 2.73057K wps
[Epoch 87 Batch 150/172] avg loss 0.00159867, throughput 2.73814K wps
Begin Testing...
[Epoch 87] train avg loss 0.00147904, dev acc 0.8868, dev avg loss 0.364397, throughput 2.75418K wps
[Epoch 88 Batch 30/172] avg loss 0.00144808, throughput 2.7329K wps
[Epoch 88 Batch 60/172] avg loss 0.0015261, throughput 2.68869K wps
[Epoch 88 Batch 90/172] avg loss 0.00160711, throughput 2.81581K wps
[Epoch 88 Batch 120/172] avg loss 0.00134493, throughput 2.76683K wps
[Epoch 88 Batch 150/172] avg loss 0.00147698, throughput 2.77858K wps
Begin Testing...
[Epoch 88] train avg loss 0.00146596, dev acc 0.8836, dev avg loss 0.358323, throughput 2.75944K wps
[Epoch 89 Batch 30/172] avg loss 0.00139321, throughput 2.7099K wps
[Epoch 89 Batch 60/172] avg loss 0.0014177, throughput 2.779K wps
[Epoch 89 Batch 90/172] avg loss 0.00167616, throughput 2.66383K wps
[Epoch 89 Batch 120/172] avg loss 0.00133774, throughput 2.71841K wps
[Epoch 89 Batch 150/172] avg loss 0.00149181, throughput 2.74423K wps
Begin Testing...
[Epoch 89] train avg loss 0.00143547, dev acc 0.8836, dev avg loss 0.361525, throughput 2.71448K wps
[Epoch 90 Batch 30/172] avg loss 0.001308, throughput 2.71562K wps
[Epoch 90 Batch 60/172] avg loss 0.00120969, throughput 2.75585K wps
[Epoch 90 Batch 90/172] avg loss 0.00152813, throughput 2.79558K wps
[Epoch 90 Batch 120/172] avg loss 0.00150173, throughput 2.78069K wps
[Epoch 90 Batch 150/172] avg loss 0.0013234, throughput 2.76776K wps
Begin Testing...
[Epoch 90] train avg loss 0.00138453, dev acc 0.8847, dev avg loss 0.366482, throughput 2.76655K wps
[Epoch 91 Batch 30/172] avg loss 0.00135162, throughput 2.8068K wps
[Epoch 91 Batch 60/172] avg loss 0.00147274, throughput 2.78038K wps
[Epoch 91 Batch 90/172] avg loss 0.00117917, throughput 2.66801K wps
[Epoch 91 Batch 120/172] avg loss 0.00164703, throughput 2.64495K wps
[Epoch 91 Batch 150/172] avg loss 0.00132307, throughput 2.59627K wps
Begin Testing...
[Epoch 91] train avg loss 0.00142008, dev acc 0.8836, dev avg loss 0.366514, throughput 2.69154K wps
[Epoch 92 Batch 30/172] avg loss 0.00141428, throughput 2.73978K wps
[Epoch 92 Batch 60/172] avg loss 0.00115986, throughput 2.78673K wps
[Epoch 92 Batch 90/172] avg loss 0.00148471, throughput 2.68072K wps
[Epoch 92 Batch 120/172] avg loss 0.00150825, throughput 2.62263K wps
[Epoch 92 Batch 150/172] avg loss 0.00134274, throughput 2.71137K wps
Begin Testing...
[Epoch 92] train avg loss 0.00138712, dev acc 0.8763, dev avg loss 0.362103, throughput 2.71361K wps
[Epoch 93 Batch 30/172] avg loss 0.00144244, throughput 2.76761K wps
[Epoch 93 Batch 60/172] avg loss 0.00130748, throughput 2.698K wps
[Epoch 93 Batch 90/172] avg loss 0.00137945, throughput 2.71566K wps
[Epoch 93 Batch 120/172] avg loss 0.00133854, throughput 2.72896K wps
[Epoch 93 Batch 150/172] avg loss 0.00137892, throughput 2.77777K wps
Begin Testing...
[Epoch 93] train avg loss 0.00140499, dev acc 0.8816, dev avg loss 0.370645, throughput 2.74439K wps
[Epoch 94 Batch 30/172] avg loss 0.0012216, throughput 2.81141K wps
[Epoch 94 Batch 60/172] avg loss 0.00138639, throughput 2.81154K wps
[Epoch 94 Batch 90/172] avg loss 0.00157057, throughput 2.69576K wps
[Epoch 94 Batch 120/172] avg loss 0.00138366, throughput 2.79724K wps
[Epoch 94 Batch 150/172] avg loss 0.00144457, throughput 2.74426K wps
Begin Testing...
[Epoch 94] train avg loss 0.00137991, dev acc 0.8784, dev avg loss 0.365997, throughput 2.77885K wps
[Epoch 95 Batch 30/172] avg loss 0.00128633, throughput 2.88354K wps
[Epoch 95 Batch 60/172] avg loss 0.00149179, throughput 2.8185K wps
[Epoch 95 Batch 90/172] avg loss 0.00136911, throughput 2.81299K wps
[Epoch 95 Batch 120/172] avg loss 0.00125868, throughput 2.73382K wps
[Epoch 95 Batch 150/172] avg loss 0.00152256, throughput 2.65383K wps
Begin Testing...
[Epoch 95] train avg loss 0.00136463, dev acc 0.8847, dev avg loss 0.375225, throughput 2.77789K wps
[Epoch 96 Batch 30/172] avg loss 0.00127297, throughput 2.86701K wps
[Epoch 96 Batch 60/172] avg loss 0.0012415, throughput 2.76333K wps
[Epoch 96 Batch 90/172] avg loss 0.00140106, throughput 2.74154K wps
[Epoch 96 Batch 120/172] avg loss 0.00153056, throughput 2.78667K wps
[Epoch 96 Batch 150/172] avg loss 0.00147084, throughput 2.81839K wps
Begin Testing...
[Epoch 96] train avg loss 0.00138546, dev acc 0.8836, dev avg loss 0.381039, throughput 2.79143K wps
[Epoch 97 Batch 30/172] avg loss 0.00122598, throughput 2.80912K wps
[Epoch 97 Batch 60/172] avg loss 0.00129342, throughput 2.76547K wps
[Epoch 97 Batch 90/172] avg loss 0.00155185, throughput 2.82409K wps
[Epoch 97 Batch 120/172] avg loss 0.00119645, throughput 2.826K wps
[Epoch 97 Batch 150/172] avg loss 0.00138605, throughput 2.8169K wps
Begin Testing...
[Epoch 97] train avg loss 0.0013515, dev acc 0.8847, dev avg loss 0.377122, throughput 2.80693K wps
[Epoch 98 Batch 30/172] avg loss 0.00125917, throughput 2.87008K wps
[Epoch 98 Batch 60/172] avg loss 0.00139154, throughput 2.8054K wps
[Epoch 98 Batch 90/172] avg loss 0.00123782, throughput 2.77732K wps
[Epoch 98 Batch 120/172] avg loss 0.00133369, throughput 2.68978K wps
[Epoch 98 Batch 150/172] avg loss 0.00123166, throughput 2.79804K wps
Begin Testing...
[Epoch 98] train avg loss 0.00130718, dev acc 0.8836, dev avg loss 0.385561, throughput 2.77877K wps
[Epoch 99 Batch 30/172] avg loss 0.00130718, throughput 2.8384K wps
[Epoch 99 Batch 60/172] avg loss 0.00135356, throughput 2.82779K wps
[Epoch 99 Batch 90/172] avg loss 0.00143748, throughput 2.73879K wps
[Epoch 99 Batch 120/172] avg loss 0.00124602, throughput 2.81733K wps
[Epoch 99 Batch 150/172] avg loss 0.001143, throughput 2.71553K wps
Begin Testing...
[Epoch 99] train avg loss 0.00130833, dev acc 0.8795, dev avg loss 0.381089, throughput 2.7714K wps
[Epoch 100 Batch 30/172] avg loss 0.00139035, throughput 2.71791K wps
[Epoch 100 Batch 60/172] avg loss 0.00129047, throughput 2.7054K wps
[Epoch 100 Batch 90/172] avg loss 0.00128067, throughput 2.76936K wps
[Epoch 100 Batch 120/172] avg loss 0.00120748, throughput 2.80795K wps
[Epoch 100 Batch 150/172] avg loss 0.00133554, throughput 2.81988K wps
Begin Testing...
[Epoch 100] train avg loss 0.00132652, dev acc 0.8836, dev avg loss 0.386272, throughput 2.7642K wps
[Epoch 101 Batch 30/172] avg loss 0.00128642, throughput 2.86651K wps
[Epoch 101 Batch 60/172] avg loss 0.00125655, throughput 2.82602K wps
[Epoch 101 Batch 90/172] avg loss 0.00107329, throughput 2.81375K wps
[Epoch 101 Batch 120/172] avg loss 0.00135166, throughput 2.71277K wps
[Epoch 101 Batch 150/172] avg loss 0.00145364, throughput 2.68403K wps
Begin Testing...
[Epoch 101] train avg loss 0.00127788, dev acc 0.8836, dev avg loss 0.391514, throughput 2.78034K wps
[Epoch 102 Batch 30/172] avg loss 0.00124856, throughput 2.76884K wps
[Epoch 102 Batch 60/172] avg loss 0.00121874, throughput 2.70177K wps
[Epoch 102 Batch 90/172] avg loss 0.00131657, throughput 2.82995K wps
[Epoch 102 Batch 120/172] avg loss 0.0012616, throughput 2.80322K wps
[Epoch 102 Batch 150/172] avg loss 0.00124682, throughput 2.77702K wps
Begin Testing...
[Epoch 102] train avg loss 0.00126056, dev acc 0.8826, dev avg loss 0.395905, throughput 2.77278K wps
[Epoch 103 Batch 30/172] avg loss 0.00117484, throughput 2.81621K wps
[Epoch 103 Batch 60/172] avg loss 0.00138272, throughput 2.77017K wps
[Epoch 103 Batch 90/172] avg loss 0.00118179, throughput 2.70804K wps
[Epoch 103 Batch 120/172] avg loss 0.00125369, throughput 2.76492K wps
[Epoch 103 Batch 150/172] avg loss 0.00130781, throughput 2.69906K wps
Begin Testing...
[Epoch 103] train avg loss 0.00125341, dev acc 0.8774, dev avg loss 0.386605, throughput 2.7527K wps
[Epoch 104 Batch 30/172] avg loss 0.00105678, throughput 2.86183K wps
[Epoch 104 Batch 60/172] avg loss 0.00118208, throughput 2.7161K wps
[Epoch 104 Batch 90/172] avg loss 0.00155007, throughput 2.7006K wps
[Epoch 104 Batch 120/172] avg loss 0.00130326, throughput 2.74599K wps
[Epoch 104 Batch 150/172] avg loss 0.00121954, throughput 2.8085K wps
Begin Testing...
[Epoch 104] train avg loss 0.00124248, dev acc 0.8784, dev avg loss 0.386033, throughput 2.76911K wps
[Epoch 105 Batch 30/172] avg loss 0.00115801, throughput 2.81745K wps
[Epoch 105 Batch 60/172] avg loss 0.00121295, throughput 2.71448K wps
[Epoch 105 Batch 90/172] avg loss 0.00122893, throughput 2.76511K wps
[Epoch 105 Batch 120/172] avg loss 0.00149382, throughput 2.55807K wps
[Epoch 105 Batch 150/172] avg loss 0.00108476, throughput 2.77742K wps
Begin Testing...
[Epoch 105] train avg loss 0.00126495, dev acc 0.8795, dev avg loss 0.392979, throughput 2.72938K wps
[Epoch 106 Batch 30/172] avg loss 0.00113196, throughput 2.70069K wps
[Epoch 106 Batch 60/172] avg loss 0.00116276, throughput 2.7935K wps
[Epoch 106 Batch 90/172] avg loss 0.00118754, throughput 2.75603K wps
[Epoch 106 Batch 120/172] avg loss 0.00130417, throughput 2.71778K wps
[Epoch 106 Batch 150/172] avg loss 0.00143454, throughput 2.76284K wps
Begin Testing...
[Epoch 106] train avg loss 0.00126019, dev acc 0.8774, dev avg loss 0.389534, throughput 2.74726K wps
[Epoch 107 Batch 30/172] avg loss 0.00121251, throughput 2.80312K wps
[Epoch 107 Batch 60/172] avg loss 0.00107181, throughput 2.75629K wps
[Epoch 107 Batch 90/172] avg loss 0.00122687, throughput 2.72619K wps
[Epoch 107 Batch 120/172] avg loss 0.00120954, throughput 2.7472K wps
[Epoch 107 Batch 150/172] avg loss 0.0011288, throughput 2.75047K wps
Begin Testing...
[Epoch 107] train avg loss 0.00120756, dev acc 0.8742, dev avg loss 0.386645, throughput 2.7551K wps
[Epoch 108 Batch 30/172] avg loss 0.00116764, throughput 2.79232K wps
[Epoch 108 Batch 60/172] avg loss 0.00126542, throughput 2.76108K wps
[Epoch 108 Batch 90/172] avg loss 0.00113832, throughput 2.75275K wps
[Epoch 108 Batch 120/172] avg loss 0.00117084, throughput 2.78073K wps
[Epoch 108 Batch 150/172] avg loss 0.00124701, throughput 2.72794K wps
Begin Testing...
[Epoch 108] train avg loss 0.00121456, dev acc 0.8795, dev avg loss 0.39218, throughput 2.77347K wps
[Epoch 109 Batch 30/172] avg loss 0.0012272, throughput 2.81954K wps
[Epoch 109 Batch 60/172] avg loss 0.00130019, throughput 2.80819K wps
[Epoch 109 Batch 90/172] avg loss 0.00116882, throughput 2.82337K wps
[Epoch 109 Batch 120/172] avg loss 0.00146559, throughput 2.81968K wps
[Epoch 109 Batch 150/172] avg loss 0.000909688, throughput 2.77622K wps
Begin Testing...
[Epoch 109] train avg loss 0.00125098, dev acc 0.8763, dev avg loss 0.393064, throughput 2.79065K wps
[Epoch 110 Batch 30/172] avg loss 0.00106254, throughput 2.8523K wps
[Epoch 110 Batch 60/172] avg loss 0.00116698, throughput 2.80215K wps
[Epoch 110 Batch 90/172] avg loss 0.00123249, throughput 2.77909K wps
[Epoch 110 Batch 120/172] avg loss 0.00125302, throughput 2.70045K wps
[Epoch 110 Batch 150/172] avg loss 0.00122551, throughput 2.70199K wps
Begin Testing...
[Epoch 110] train avg loss 0.00118992, dev acc 0.8784, dev avg loss 0.403369, throughput 2.77512K wps
[Epoch 111 Batch 30/172] avg loss 0.00102229, throughput 2.84806K wps
[Epoch 111 Batch 60/172] avg loss 0.00115309, throughput 2.77264K wps
[Epoch 111 Batch 90/172] avg loss 0.00102898, throughput 2.80214K wps
[Epoch 111 Batch 120/172] avg loss 0.00131528, throughput 2.79054K wps
[Epoch 111 Batch 150/172] avg loss 0.00118723, throughput 2.76944K wps
Begin Testing...
[Epoch 111] train avg loss 0.00116482, dev acc 0.8763, dev avg loss 0.392626, throughput 2.79328K wps
[Epoch 112 Batch 30/172] avg loss 0.00120851, throughput 2.6963K wps
[Epoch 112 Batch 60/172] avg loss 0.00129749, throughput 2.77419K wps
[Epoch 112 Batch 90/172] avg loss 0.00110445, throughput 2.76591K wps
[Epoch 112 Batch 120/172] avg loss 0.00106701, throughput 2.7282K wps
[Epoch 112 Batch 150/172] avg loss 0.00106633, throughput 2.60752K wps
Begin Testing...
[Epoch 112] train avg loss 0.00120059, dev acc 0.8742, dev avg loss 0.3901, throughput 2.71629K wps
[Epoch 113 Batch 30/172] avg loss 0.00100532, throughput 2.80276K wps
[Epoch 113 Batch 60/172] avg loss 0.00105896, throughput 2.69059K wps
[Epoch 113 Batch 90/172] avg loss 0.00135811, throughput 2.82794K wps
[Epoch 113 Batch 120/172] avg loss 0.00109355, throughput 2.6934K wps
[Epoch 113 Batch 150/172] avg loss 0.00153393, throughput 2.81702K wps
Begin Testing...
[Epoch 113] train avg loss 0.00117668, dev acc 0.8816, dev avg loss 0.405281, throughput 2.77494K wps
[Epoch 114 Batch 30/172] avg loss 0.00123457, throughput 2.86718K wps
[Epoch 114 Batch 60/172] avg loss 0.00116172, throughput 2.66259K wps
[Epoch 114 Batch 90/172] avg loss 0.00100368, throughput 2.82161K wps
[Epoch 114 Batch 120/172] avg loss 0.00135368, throughput 2.84744K wps
[Epoch 114 Batch 150/172] avg loss 0.00119027, throughput 2.80733K wps
Begin Testing...
[Epoch 114] train avg loss 0.00116912, dev acc 0.8753, dev avg loss 0.397811, throughput 2.80303K wps
[Epoch 115 Batch 30/172] avg loss 0.00116833, throughput 2.87813K wps
[Epoch 115 Batch 60/172] avg loss 0.00120844, throughput 2.70539K wps
[Epoch 115 Batch 90/172] avg loss 0.00107383, throughput 2.80326K wps
[Epoch 115 Batch 120/172] avg loss 0.00115432, throughput 2.76731K wps
[Epoch 115 Batch 150/172] avg loss 0.00108699, throughput 2.80629K wps
Begin Testing...
[Epoch 115] train avg loss 0.00118092, dev acc 0.8732, dev avg loss 0.398761, throughput 2.7869K wps
[Epoch 116 Batch 30/172] avg loss 0.00116465, throughput 2.78044K wps
[Epoch 116 Batch 60/172] avg loss 0.00110007, throughput 2.78352K wps
[Epoch 116 Batch 90/172] avg loss 0.00098856, throughput 2.79159K wps
[Epoch 116 Batch 120/172] avg loss 0.0011331, throughput 2.79427K wps
[Epoch 116 Batch 150/172] avg loss 0.00130854, throughput 2.74725K wps
Begin Testing...
[Epoch 116] train avg loss 0.00114303, dev acc 0.8763, dev avg loss 0.400664, throughput 2.74802K wps
[Epoch 117 Batch 30/172] avg loss 0.00111233, throughput 2.79333K wps
[Epoch 117 Batch 60/172] avg loss 0.00114745, throughput 2.7269K wps
[Epoch 117 Batch 90/172] avg loss 0.00104815, throughput 2.75143K wps
[Epoch 117 Batch 120/172] avg loss 0.00126141, throughput 2.72925K wps
[Epoch 117 Batch 150/172] avg loss 0.000963707, throughput 2.7567K wps
Begin Testing...
[Epoch 117] train avg loss 0.00112186, dev acc 0.8816, dev avg loss 0.414897, throughput 2.75309K wps
[Epoch 118 Batch 30/172] avg loss 0.00110368, throughput 2.81341K wps
[Epoch 118 Batch 60/172] avg loss 0.00115515, throughput 2.73616K wps
[Epoch 118 Batch 90/172] avg loss 0.00121758, throughput 2.78271K wps
[Epoch 118 Batch 120/172] avg loss 0.00113137, throughput 2.78133K wps
[Epoch 118 Batch 150/172] avg loss 0.00097112, throughput 2.62494K wps
Begin Testing...
[Epoch 118] train avg loss 0.00112864, dev acc 0.8836, dev avg loss 0.416375, throughput 2.74827K wps
[Epoch 119 Batch 30/172] avg loss 0.000979859, throughput 2.65669K wps
[Epoch 119 Batch 60/172] avg loss 0.000982254, throughput 2.78552K wps
[Epoch 119 Batch 90/172] avg loss 0.00119937, throughput 2.76005K wps
[Epoch 119 Batch 120/172] avg loss 0.00131748, throughput 2.74607K wps
[Epoch 119 Batch 150/172] avg loss 0.00118537, throughput 2.73266K wps
Begin Testing...
[Epoch 119] train avg loss 0.00111257, dev acc 0.8847, dev avg loss 0.418709, throughput 2.74194K wps
[Epoch 120 Batch 30/172] avg loss 0.00113309, throughput 2.68079K wps
[Epoch 120 Batch 60/172] avg loss 0.00105472, throughput 2.83112K wps
[Epoch 120 Batch 90/172] avg loss 0.00111334, throughput 2.75081K wps
[Epoch 120 Batch 120/172] avg loss 0.000862267, throughput 2.77089K wps
[Epoch 120 Batch 150/172] avg loss 0.00116842, throughput 2.84587K wps
Begin Testing...
[Epoch 120] train avg loss 0.00108227, dev acc 0.8753, dev avg loss 0.411315, throughput 2.78211K wps
[Epoch 121 Batch 30/172] avg loss 0.00112838, throughput 2.84007K wps
[Epoch 121 Batch 60/172] avg loss 0.00110921, throughput 2.81598K wps
[Epoch 121 Batch 90/172] avg loss 0.00115535, throughput 2.82322K wps
[Epoch 121 Batch 120/172] avg loss 0.00116351, throughput 2.7801K wps
[Epoch 121 Batch 150/172] avg loss 0.00114652, throughput 2.63595K wps
Begin Testing...
[Epoch 121] train avg loss 0.00112317, dev acc 0.8711, dev avg loss 0.413547, throughput 2.77724K wps
[Epoch 122 Batch 30/172] avg loss 0.00123213, throughput 2.76342K wps
[Epoch 122 Batch 60/172] avg loss 0.00106079, throughput 2.75552K wps
[Epoch 122 Batch 90/172] avg loss 0.00110019, throughput 2.78838K wps
[Epoch 122 Batch 120/172] avg loss 0.00108071, throughput 2.59867K wps
[Epoch 122 Batch 150/172] avg loss 0.00122793, throughput 2.70021K wps
Begin Testing...
[Epoch 122] train avg loss 0.00113785, dev acc 0.8732, dev avg loss 0.409658, throughput 2.7134K wps
[Epoch 123 Batch 30/172] avg loss 0.00120986, throughput 2.73902K wps
[Epoch 123 Batch 60/172] avg loss 0.000963434, throughput 2.76941K wps
[Epoch 123 Batch 90/172] avg loss 0.00123545, throughput 2.72724K wps
[Epoch 123 Batch 120/172] avg loss 0.00109523, throughput 2.83031K wps
[Epoch 123 Batch 150/172] avg loss 0.00114176, throughput 2.79333K wps
Begin Testing...
[Epoch 123] train avg loss 0.00110525, dev acc 0.8742, dev avg loss 0.410695, throughput 2.75873K wps
[Epoch 124 Batch 30/172] avg loss 0.000844326, throughput 2.80826K wps
[Epoch 124 Batch 60/172] avg loss 0.00107982, throughput 2.75298K wps
[Epoch 124 Batch 90/172] avg loss 0.000995569, throughput 2.80317K wps
[Epoch 124 Batch 120/172] avg loss 0.00135346, throughput 2.81511K wps
[Epoch 124 Batch 150/172] avg loss 0.00123074, throughput 2.74349K wps
Begin Testing...
[Epoch 124] train avg loss 0.00108729, dev acc 0.8690, dev avg loss 0.409354, throughput 2.77845K wps
[Epoch 125 Batch 30/172] avg loss 0.000875502, throughput 2.81173K wps
[Epoch 125 Batch 60/172] avg loss 0.000965245, throughput 2.77058K wps
[Epoch 125 Batch 90/172] avg loss 0.00126172, throughput 2.66982K wps
[Epoch 125 Batch 120/172] avg loss 0.000971165, throughput 2.78182K wps
[Epoch 125 Batch 150/172] avg loss 0.00122199, throughput 2.75008K wps
Begin Testing...
[Epoch 125] train avg loss 0.00109779, dev acc 0.8690, dev avg loss 0.406987, throughput 2.76361K wps
[Epoch 126 Batch 30/172] avg loss 0.00107545, throughput 2.76945K wps
[Epoch 126 Batch 60/172] avg loss 0.000903218, throughput 2.81582K wps
[Epoch 126 Batch 90/172] avg loss 0.00109262, throughput 2.83362K wps
[Epoch 126 Batch 120/172] avg loss 0.00100467, throughput 2.8154K wps
[Epoch 126 Batch 150/172] avg loss 0.00132938, throughput 2.81326K wps
Begin Testing...
[Epoch 126] train avg loss 0.00108607, dev acc 0.8774, dev avg loss 0.422488, throughput 2.80391K wps
[Epoch 127 Batch 30/172] avg loss 0.00101127, throughput 2.75022K wps
[Epoch 127 Batch 60/172] avg loss 0.000989318, throughput 2.85533K wps
[Epoch 127 Batch 90/172] avg loss 0.00115565, throughput 2.73675K wps
[Epoch 127 Batch 120/172] avg loss 0.00114603, throughput 2.69861K wps
[Epoch 127 Batch 150/172] avg loss 0.00123293, throughput 2.78251K wps
Begin Testing...
[Epoch 127] train avg loss 0.00108391, dev acc 0.8742, dev avg loss 0.416631, throughput 2.74942K wps
[Epoch 128 Batch 30/172] avg loss 0.00103738, throughput 2.84434K wps
[Epoch 128 Batch 60/172] avg loss 0.00110538, throughput 2.70608K wps
[Epoch 128 Batch 90/172] avg loss 0.000932196, throughput 2.74652K wps
[Epoch 128 Batch 120/172] avg loss 0.00126013, throughput 2.75047K wps
[Epoch 128 Batch 150/172] avg loss 0.00102552, throughput 2.6805K wps
Begin Testing...
[Epoch 128] train avg loss 0.00107551, dev acc 0.8763, dev avg loss 0.424315, throughput 2.76143K wps
[Epoch 129 Batch 30/172] avg loss 0.00107484, throughput 2.80356K wps
[Epoch 129 Batch 60/172] avg loss 0.00102244, throughput 2.68651K wps
[Epoch 129 Batch 90/172] avg loss 0.000905811, throughput 2.82171K wps
[Epoch 129 Batch 120/172] avg loss 0.00130147, throughput 2.83255K wps
[Epoch 129 Batch 150/172] avg loss 0.000874757, throughput 2.81472K wps
Begin Testing...
[Epoch 129] train avg loss 0.00105752, dev acc 0.8711, dev avg loss 0.420764, throughput 2.79282K wps
[Epoch 130 Batch 30/172] avg loss 0.00106957, throughput 2.87569K wps
[Epoch 130 Batch 60/172] avg loss 0.000995615, throughput 2.73588K wps
[Epoch 130 Batch 90/172] avg loss 0.000952297, throughput 2.72247K wps
[Epoch 130 Batch 120/172] avg loss 0.000980015, throughput 2.80817K wps
[Epoch 130 Batch 150/172] avg loss 0.00121599, throughput 2.69254K wps
Begin Testing...
[Epoch 130] train avg loss 0.0010567, dev acc 0.8753, dev avg loss 0.426588, throughput 2.73721K wps
[Epoch 131 Batch 30/172] avg loss 0.000933362, throughput 2.7796K wps
[Epoch 131 Batch 60/172] avg loss 0.00110347, throughput 2.64242K wps
[Epoch 131 Batch 90/172] avg loss 0.0010496, throughput 2.76014K wps
[Epoch 131 Batch 120/172] avg loss 0.00109522, throughput 2.75154K wps
[Epoch 131 Batch 150/172] avg loss 0.00119175, throughput 2.68425K wps
Begin Testing...
[Epoch 131] train avg loss 0.00104719, dev acc 0.8732, dev avg loss 0.425656, throughput 2.72775K wps
[Epoch 132 Batch 30/172] avg loss 0.00102949, throughput 2.85009K wps
[Epoch 132 Batch 60/172] avg loss 0.000841538, throughput 2.69736K wps
[Epoch 132 Batch 90/172] avg loss 0.00104198, throughput 2.83625K wps
[Epoch 132 Batch 120/172] avg loss 0.000930769, throughput 2.80567K wps
[Epoch 132 Batch 150/172] avg loss 0.00118039, throughput 2.681K wps
Begin Testing...
[Epoch 132] train avg loss 0.00102427, dev acc 0.8721, dev avg loss 0.424918, throughput 2.78689K wps
[Epoch 133 Batch 30/172] avg loss 0.000929047, throughput 2.8536K wps
[Epoch 133 Batch 60/172] avg loss 0.00100095, throughput 2.82753K wps
[Epoch 133 Batch 90/172] avg loss 0.00109208, throughput 2.77692K wps
[Epoch 133 Batch 120/172] avg loss 0.000937357, throughput 2.75203K wps
[Epoch 133 Batch 150/172] avg loss 0.00124781, throughput 2.77254K wps
Begin Testing...
[Epoch 133] train avg loss 0.00104708, dev acc 0.8721, dev avg loss 0.424054, throughput 2.79174K wps
[Epoch 134 Batch 30/172] avg loss 0.00123395, throughput 2.6517K wps
[Epoch 134 Batch 60/172] avg loss 0.00105461, throughput 2.78322K wps
[Epoch 134 Batch 90/172] avg loss 0.00105144, throughput 2.76796K wps
[Epoch 134 Batch 120/172] avg loss 0.00100036, throughput 2.77604K wps
[Epoch 134 Batch 150/172] avg loss 0.001003, throughput 2.7887K wps
Begin Testing...
[Epoch 134] train avg loss 0.00103455, dev acc 0.8700, dev avg loss 0.425837, throughput 2.75912K wps
[Epoch 135 Batch 30/172] avg loss 0.000824522, throughput 2.84377K wps
[Epoch 135 Batch 60/172] avg loss 0.0010946, throughput 2.76465K wps
[Epoch 135 Batch 90/172] avg loss 0.00102055, throughput 2.72504K wps
[Epoch 135 Batch 120/172] avg loss 0.00111425, throughput 2.61698K wps
[Epoch 135 Batch 150/172] avg loss 0.00100462, throughput 2.73853K wps
Begin Testing...
[Epoch 135] train avg loss 0.00103359, dev acc 0.8742, dev avg loss 0.432483, throughput 2.74508K wps
[Epoch 136 Batch 30/172] avg loss 0.000999315, throughput 2.81561K wps
[Epoch 136 Batch 60/172] avg loss 0.00105131, throughput 2.74719K wps
[Epoch 136 Batch 90/172] avg loss 0.00107401, throughput 2.85326K wps
[Epoch 136 Batch 120/172] avg loss 0.00117571, throughput 2.75307K wps
[Epoch 136 Batch 150/172] avg loss 0.00112495, throughput 2.85981K wps
Begin Testing...
[Epoch 136] train avg loss 0.00105125, dev acc 0.8732, dev avg loss 0.431173, throughput 2.80049K wps
[Epoch 137 Batch 30/172] avg loss 0.00100453, throughput 2.87131K wps
[Epoch 137 Batch 60/172] avg loss 0.000993876, throughput 2.80228K wps
[Epoch 137 Batch 90/172] avg loss 0.000978457, throughput 2.79391K wps
[Epoch 137 Batch 120/172] avg loss 0.00108968, throughput 2.66218K wps
[Epoch 137 Batch 150/172] avg loss 0.0010033, throughput 2.87437K wps
Begin Testing...
[Epoch 137] train avg loss 0.00100585, dev acc 0.8679, dev avg loss 0.426087, throughput 2.79927K wps
[Epoch 138 Batch 30/172] avg loss 0.000876025, throughput 2.81277K wps
[Epoch 138 Batch 60/172] avg loss 0.000959048, throughput 2.75955K wps
[Epoch 138 Batch 90/172] avg loss 0.000976369, throughput 2.7591K wps
[Epoch 138 Batch 120/172] avg loss 0.00101248, throughput 2.84439K wps
[Epoch 138 Batch 150/172] avg loss 0.000919826, throughput 2.78952K wps
Begin Testing...
[Epoch 138] train avg loss 0.000978871, dev acc 0.8774, dev avg loss 0.441284, throughput 2.789K wps
[Epoch 139 Batch 30/172] avg loss 0.000767959, throughput 2.86893K wps
[Epoch 139 Batch 60/172] avg loss 0.000991633, throughput 2.80355K wps
[Epoch 139 Batch 90/172] avg loss 0.00101656, throughput 2.82285K wps
[Epoch 139 Batch 120/172] avg loss 0.00105502, throughput 2.8122K wps
[Epoch 139 Batch 150/172] avg loss 0.00126999, throughput 2.82468K wps
Begin Testing...
[Epoch 139] train avg loss 0.00102984, dev acc 0.8690, dev avg loss 0.426065, throughput 2.82398K wps
[Epoch 140 Batch 30/172] avg loss 0.000882707, throughput 2.70597K wps
[Epoch 140 Batch 60/172] avg loss 0.000961593, throughput 2.85371K wps
[Epoch 140 Batch 90/172] avg loss 0.000894222, throughput 2.79837K wps
[Epoch 140 Batch 120/172] avg loss 0.00105962, throughput 2.81274K wps
[Epoch 140 Batch 150/172] avg loss 0.00100695, throughput 2.83068K wps
Begin Testing...
[Epoch 140] train avg loss 0.000959867, dev acc 0.8721, dev avg loss 0.435595, throughput 2.78778K wps
[Epoch 141 Batch 30/172] avg loss 0.000952414, throughput 2.79491K wps
[Epoch 141 Batch 60/172] avg loss 0.000882785, throughput 2.81161K wps
[Epoch 141 Batch 90/172] avg loss 0.001073, throughput 2.81527K wps
[Epoch 141 Batch 120/172] avg loss 0.000958858, throughput 2.80229K wps
[Epoch 141 Batch 150/172] avg loss 0.00100191, throughput 2.83338K wps
Begin Testing...
[Epoch 141] train avg loss 0.000989184, dev acc 0.8742, dev avg loss 0.43356, throughput 2.81233K wps
[Epoch 142 Batch 30/172] avg loss 0.000981376, throughput 2.88155K wps
[Epoch 142 Batch 60/172] avg loss 0.00090554, throughput 2.82815K wps
[Epoch 142 Batch 90/172] avg loss 0.00111199, throughput 2.81692K wps
[Epoch 142 Batch 120/172] avg loss 0.000848424, throughput 2.81862K wps
[Epoch 142 Batch 150/172] avg loss 0.000924198, throughput 2.8229K wps
Begin Testing...
[Epoch 142] train avg loss 0.000960787, dev acc 0.8805, dev avg loss 0.454374, throughput 2.83322K wps
[Epoch 143 Batch 30/172] avg loss 0.00095291, throughput 2.83067K wps
[Epoch 143 Batch 60/172] avg loss 0.000837984, throughput 2.77829K wps
[Epoch 143 Batch 90/172] avg loss 0.000910345, throughput 2.76152K wps
[Epoch 143 Batch 120/172] avg loss 0.00100837, throughput 2.76758K wps
[Epoch 143 Batch 150/172] avg loss 0.000967633, throughput 2.77282K wps
Begin Testing...
[Epoch 143] train avg loss 0.000957311, dev acc 0.8679, dev avg loss 0.42816, throughput 2.76516K wps
[Epoch 144 Batch 30/172] avg loss 0.00100786, throughput 2.69139K wps
[Epoch 144 Batch 60/172] avg loss 0.000996819, throughput 2.69457K wps
[Epoch 144 Batch 90/172] avg loss 0.000869517, throughput 2.81952K wps
[Epoch 144 Batch 120/172] avg loss 0.0011267, throughput 2.74262K wps
[Epoch 144 Batch 150/172] avg loss 0.000959323, throughput 2.62955K wps
Begin Testing...
[Epoch 144] train avg loss 0.00098729, dev acc 0.8732, dev avg loss 0.437153, throughput 2.72289K wps
[Epoch 145 Batch 30/172] avg loss 0.000986424, throughput 2.73706K wps
[Epoch 145 Batch 60/172] avg loss 0.000911007, throughput 2.78555K wps
[Epoch 145 Batch 90/172] avg loss 0.000895209, throughput 2.63821K wps
[Epoch 145 Batch 120/172] avg loss 0.00102123, throughput 2.76128K wps
[Epoch 145 Batch 150/172] avg loss 0.0010134, throughput 2.81115K wps
Begin Testing...
[Epoch 145] train avg loss 0.0009715, dev acc 0.8763, dev avg loss 0.444836, throughput 2.75168K wps
[Epoch 146 Batch 30/172] avg loss 0.000761066, throughput 2.85317K wps
[Epoch 146 Batch 60/172] avg loss 0.00107652, throughput 2.75903K wps
[Epoch 146 Batch 90/172] avg loss 0.000924407, throughput 2.75572K wps
[Epoch 146 Batch 120/172] avg loss 0.00100184, throughput 2.67393K wps
[Epoch 146 Batch 150/172] avg loss 0.00097837, throughput 2.80585K wps
Begin Testing...
[Epoch 146] train avg loss 0.000948075, dev acc 0.8690, dev avg loss 0.440923, throughput 2.77497K wps
[Epoch 147 Batch 30/172] avg loss 0.000971536, throughput 2.81299K wps
[Epoch 147 Batch 60/172] avg loss 0.000967195, throughput 2.74351K wps
[Epoch 147 Batch 90/172] avg loss 0.00102204, throughput 2.72141K wps
[Epoch 147 Batch 120/172] avg loss 0.000702788, throughput 2.69433K wps
[Epoch 147 Batch 150/172] avg loss 0.000912299, throughput 2.74553K wps
Begin Testing...
[Epoch 147] train avg loss 0.000933812, dev acc 0.8679, dev avg loss 0.439954, throughput 2.75408K wps
[Epoch 148 Batch 30/172] avg loss 0.000893502, throughput 2.82932K wps
[Epoch 148 Batch 60/172] avg loss 0.000775621, throughput 2.77259K wps
[Epoch 148 Batch 90/172] avg loss 0.00102155, throughput 2.74655K wps
[Epoch 148 Batch 120/172] avg loss 0.00103192, throughput 2.71194K wps
[Epoch 148 Batch 150/172] avg loss 0.00100367, throughput 2.74906K wps
Begin Testing...
[Epoch 148] train avg loss 0.000941322, dev acc 0.8742, dev avg loss 0.448747, throughput 2.76589K wps
[Epoch 149 Batch 30/172] avg loss 0.000966773, throughput 2.80993K wps
[Epoch 149 Batch 60/172] avg loss 0.000807861, throughput 2.65924K wps
[Epoch 149 Batch 90/172] avg loss 0.000789827, throughput 2.72216K wps
[Epoch 149 Batch 120/172] avg loss 0.00105477, throughput 2.72702K wps
[Epoch 149 Batch 150/172] avg loss 0.000998336, throughput 2.75707K wps
Begin Testing...
[Epoch 149] train avg loss 0.000955392, dev acc 0.8690, dev avg loss 0.437546, throughput 2.71172K wps
[Epoch 150 Batch 30/172] avg loss 0.000836748, throughput 2.81064K wps
[Epoch 150 Batch 60/172] avg loss 0.00107913, throughput 2.7704K wps
[Epoch 150 Batch 90/172] avg loss 0.000850911, throughput 2.66571K wps
[Epoch 150 Batch 120/172] avg loss 0.000913107, throughput 2.78836K wps
[Epoch 150 Batch 150/172] avg loss 0.000856594, throughput 2.79089K wps
Begin Testing...
[Epoch 150] train avg loss 0.000915303, dev acc 0.8753, dev avg loss 0.44612, throughput 2.76878K wps
[Epoch 151 Batch 30/172] avg loss 0.00105555, throughput 2.86273K wps
[Epoch 151 Batch 60/172] avg loss 0.00082092, throughput 2.8162K wps
[Epoch 151 Batch 90/172] avg loss 0.000744916, throughput 2.81153K wps
[Epoch 151 Batch 120/172] avg loss 0.00102123, throughput 2.80011K wps
[Epoch 151 Batch 150/172] avg loss 0.000930778, throughput 2.75008K wps
Begin Testing...
[Epoch 151] train avg loss 0.000956602, dev acc 0.8679, dev avg loss 0.439644, throughput 2.802K wps
[Epoch 152 Batch 30/172] avg loss 0.000906524, throughput 2.80118K wps
[Epoch 152 Batch 60/172] avg loss 0.000881338, throughput 2.8768K wps
[Epoch 152 Batch 90/172] avg loss 0.000865341, throughput 2.69961K wps
[Epoch 152 Batch 120/172] avg loss 0.000894784, throughput 2.74549K wps
[Epoch 152 Batch 150/172] avg loss 0.000998565, throughput 2.85456K wps
Begin Testing...
[Epoch 152] train avg loss 0.000926281, dev acc 0.8732, dev avg loss 0.456753, throughput 2.79879K wps
[Epoch 153 Batch 30/172] avg loss 0.00097709, throughput 2.82044K wps
[Epoch 153 Batch 60/172] avg loss 0.000939512, throughput 2.84995K wps
[Epoch 153 Batch 90/172] avg loss 0.000920233, throughput 2.64119K wps
[Epoch 153 Batch 120/172] avg loss 0.00102152, throughput 2.70676K wps
[Epoch 153 Batch 150/172] avg loss 0.00105635, throughput 2.79837K wps
Begin Testing...
[Epoch 153] train avg loss 0.000935561, dev acc 0.8816, dev avg loss 0.469615, throughput 2.76848K wps
[Epoch 154 Batch 30/172] avg loss 0.000788035, throughput 2.84606K wps
[Epoch 154 Batch 60/172] avg loss 0.000763566, throughput 2.69976K wps
[Epoch 154 Batch 90/172] avg loss 0.00102442, throughput 2.80315K wps
[Epoch 154 Batch 120/172] avg loss 0.000891118, throughput 2.80507K wps
[Epoch 154 Batch 150/172] avg loss 0.00101041, throughput 2.72151K wps
Begin Testing...
[Epoch 154] train avg loss 0.000917129, dev acc 0.8721, dev avg loss 0.459207, throughput 2.78609K wps
[Epoch 155 Batch 30/172] avg loss 0.000969169, throughput 2.86056K wps
[Epoch 155 Batch 60/172] avg loss 0.000853018, throughput 2.7669K wps
[Epoch 155 Batch 90/172] avg loss 0.00102049, throughput 2.82966K wps
[Epoch 155 Batch 120/172] avg loss 0.00080413, throughput 2.83002K wps
[Epoch 155 Batch 150/172] avg loss 0.000961169, throughput 2.81789K wps
Begin Testing...
[Epoch 155] train avg loss 0.000922946, dev acc 0.8700, dev avg loss 0.458052, throughput 2.80915K wps
[Epoch 156 Batch 30/172] avg loss 0.00075716, throughput 2.74948K wps
[Epoch 156 Batch 60/172] avg loss 0.000929719, throughput 2.80497K wps
[Epoch 156 Batch 90/172] avg loss 0.000970563, throughput 2.6811K wps
[Epoch 156 Batch 120/172] avg loss 0.000901816, throughput 2.79723K wps
[Epoch 156 Batch 150/172] avg loss 0.00087811, throughput 2.80412K wps
Begin Testing...
[Epoch 156] train avg loss 0.00089541, dev acc 0.8700, dev avg loss 0.45094, throughput 2.7728K wps
[Epoch 157 Batch 30/172] avg loss 0.000805608, throughput 2.82951K wps
[Epoch 157 Batch 60/172] avg loss 0.00109476, throughput 2.73596K wps
[Epoch 157 Batch 90/172] avg loss 0.000877401, throughput 2.66114K wps
[Epoch 157 Batch 120/172] avg loss 0.000905128, throughput 2.76928K wps
[Epoch 157 Batch 150/172] avg loss 0.000839213, throughput 2.76277K wps
Begin Testing...
[Epoch 157] train avg loss 0.000906489, dev acc 0.8742, dev avg loss 0.450035, throughput 2.75519K wps
[Epoch 158 Batch 30/172] avg loss 0.000852656, throughput 2.67633K wps
[Epoch 158 Batch 60/172] avg loss 0.000732638, throughput 2.77543K wps
[Epoch 158 Batch 90/172] avg loss 0.000953941, throughput 2.71452K wps
[Epoch 158 Batch 120/172] avg loss 0.000888382, throughput 2.79859K wps
[Epoch 158 Batch 150/172] avg loss 0.00101701, throughput 2.80919K wps
Begin Testing...
[Epoch 158] train avg loss 0.000888475, dev acc 0.8732, dev avg loss 0.461406, throughput 2.75554K wps
[Epoch 159 Batch 30/172] avg loss 0.000806251, throughput 2.84297K wps
[Epoch 159 Batch 60/172] avg loss 0.000693552, throughput 2.72828K wps
[Epoch 159 Batch 90/172] avg loss 0.000828848, throughput 2.79708K wps
[Epoch 159 Batch 120/172] avg loss 0.00104224, throughput 2.78386K wps
[Epoch 159 Batch 150/172] avg loss 0.000743625, throughput 2.82011K wps
Begin Testing...
[Epoch 159] train avg loss 0.000852234, dev acc 0.8732, dev avg loss 0.465084, throughput 2.803K wps
[Epoch 160 Batch 30/172] avg loss 0.000710399, throughput 2.87097K wps
[Epoch 160 Batch 60/172] avg loss 0.000898374, throughput 2.8017K wps
[Epoch 160 Batch 90/172] avg loss 0.000746714, throughput 2.64611K wps
[Epoch 160 Batch 120/172] avg loss 0.00113817, throughput 2.67242K wps
[Epoch 160 Batch 150/172] avg loss 0.000900968, throughput 2.7981K wps
Begin Testing...
[Epoch 160] train avg loss 0.000904836, dev acc 0.8711, dev avg loss 0.458376, throughput 2.76415K wps
[Epoch 161 Batch 30/172] avg loss 0.000748004, throughput 2.87136K wps
[Epoch 161 Batch 60/172] avg loss 0.000807448, throughput 2.77484K wps
[Epoch 161 Batch 90/172] avg loss 0.000978786, throughput 2.75562K wps
[Epoch 161 Batch 120/172] avg loss 0.00101258, throughput 2.82837K wps
[Epoch 161 Batch 150/172] avg loss 0.000985551, throughput 2.7773K wps
Begin Testing...
[Epoch 161] train avg loss 0.000878549, dev acc 0.8711, dev avg loss 0.466883, throughput 2.8033K wps
[Epoch 162 Batch 30/172] avg loss 0.000888088, throughput 2.72632K wps
[Epoch 162 Batch 60/172] avg loss 0.000827426, throughput 2.77759K wps
[Epoch 162 Batch 90/172] avg loss 0.000996832, throughput 2.74002K wps
[Epoch 162 Batch 120/172] avg loss 0.000849034, throughput 2.69677K wps
[Epoch 162 Batch 150/172] avg loss 0.000912488, throughput 2.76247K wps
Begin Testing...
[Epoch 162] train avg loss 0.000902757, dev acc 0.8742, dev avg loss 0.46085, throughput 2.722K wps
[Epoch 163 Batch 30/172] avg loss 0.000728069, throughput 2.68864K wps
[Epoch 163 Batch 60/172] avg loss 0.000915858, throughput 2.71947K wps
[Epoch 163 Batch 90/172] avg loss 0.000623204, throughput 2.66887K wps
[Epoch 163 Batch 120/172] avg loss 0.00105412, throughput 2.65497K wps
[Epoch 163 Batch 150/172] avg loss 0.00104965, throughput 2.66186K wps
Begin Testing...
[Epoch 163] train avg loss 0.000877815, dev acc 0.8679, dev avg loss 0.45596, throughput 2.69022K wps
[Epoch 164 Batch 30/172] avg loss 0.000805169, throughput 2.73691K wps
[Epoch 164 Batch 60/172] avg loss 0.000887354, throughput 2.76093K wps
[Epoch 164 Batch 90/172] avg loss 0.000764448, throughput 2.78446K wps
[Epoch 164 Batch 120/172] avg loss 0.00102809, throughput 2.78605K wps
[Epoch 164 Batch 150/172] avg loss 0.000908322, throughput 2.76914K wps
Begin Testing...
[Epoch 164] train avg loss 0.000879445, dev acc 0.8711, dev avg loss 0.462263, throughput 2.76805K wps
[Epoch 165 Batch 30/172] avg loss 0.000689537, throughput 2.72483K wps
[Epoch 165 Batch 60/172] avg loss 0.00096713, throughput 2.80351K wps
[Epoch 165 Batch 90/172] avg loss 0.000737198, throughput 2.79395K wps
[Epoch 165 Batch 120/172] avg loss 0.000702553, throughput 2.7925K wps
[Epoch 165 Batch 150/172] avg loss 0.000874117, throughput 2.8185K wps
Begin Testing...
[Epoch 165] train avg loss 0.000853938, dev acc 0.8700, dev avg loss 0.468094, throughput 2.78995K wps
[Epoch 166 Batch 30/172] avg loss 0.000808709, throughput 2.83517K wps
[Epoch 166 Batch 60/172] avg loss 0.00106704, throughput 2.81343K wps
[Epoch 166 Batch 90/172] avg loss 0.000853551, throughput 2.83407K wps
[Epoch 166 Batch 120/172] avg loss 0.00081182, throughput 2.83081K wps
[Epoch 166 Batch 150/172] avg loss 0.000855458, throughput 2.75815K wps
Begin Testing...
[Epoch 166] train avg loss 0.000863159, dev acc 0.8711, dev avg loss 0.46052, throughput 2.81286K wps
[Epoch 167 Batch 30/172] avg loss 0.000800157, throughput 2.79022K wps
[Epoch 167 Batch 60/172] avg loss 0.00102637, throughput 2.80391K wps
[Epoch 167 Batch 90/172] avg loss 0.00088012, throughput 2.81655K wps
[Epoch 167 Batch 120/172] avg loss 0.000708802, throughput 2.80342K wps
[Epoch 167 Batch 150/172] avg loss 0.000961534, throughput 2.8342K wps
Begin Testing...
[Epoch 167] train avg loss 0.000877277, dev acc 0.8700, dev avg loss 0.477739, throughput 2.80845K wps
[Epoch 168 Batch 30/172] avg loss 0.000781547, throughput 2.66917K wps
[Epoch 168 Batch 60/172] avg loss 0.000741335, throughput 2.86826K wps
[Epoch 168 Batch 90/172] avg loss 0.000917062, throughput 2.81066K wps
[Epoch 168 Batch 120/172] avg loss 0.000990093, throughput 2.72512K wps
[Epoch 168 Batch 150/172] avg loss 0.000891019, throughput 2.84183K wps
Begin Testing...
[Epoch 168] train avg loss 0.000853354, dev acc 0.8732, dev avg loss 0.47104, throughput 2.78039K wps
[Epoch 169 Batch 30/172] avg loss 0.00067317, throughput 2.86458K wps
[Epoch 169 Batch 60/172] avg loss 0.00107855, throughput 2.80588K wps
[Epoch 169 Batch 90/172] avg loss 0.000953784, throughput 2.73608K wps
[Epoch 169 Batch 120/172] avg loss 0.000790842, throughput 2.7589K wps
[Epoch 169 Batch 150/172] avg loss 0.000959324, throughput 2.74793K wps
Begin Testing...
[Epoch 169] train avg loss 0.000881351, dev acc 0.8721, dev avg loss 0.466206, throughput 2.7837K wps
[Epoch 170 Batch 30/172] avg loss 0.000797064, throughput 2.77138K wps
[Epoch 170 Batch 60/172] avg loss 0.00073895, throughput 2.66972K wps
[Epoch 170 Batch 90/172] avg loss 0.000846745, throughput 2.83634K wps
[Epoch 170 Batch 120/172] avg loss 0.000931336, throughput 2.73625K wps
[Epoch 170 Batch 150/172] avg loss 0.000827159, throughput 2.71255K wps
Begin Testing...
[Epoch 170] train avg loss 0.000848916, dev acc 0.8700, dev avg loss 0.466908, throughput 2.74865K wps
[Epoch 171 Batch 30/172] avg loss 0.000808316, throughput 2.70488K wps
[Epoch 171 Batch 60/172] avg loss 0.00072929, throughput 2.77604K wps
[Epoch 171 Batch 90/172] avg loss 0.000818188, throughput 2.76497K wps
[Epoch 171 Batch 120/172] avg loss 0.00105473, throughput 2.75695K wps
[Epoch 171 Batch 150/172] avg loss 0.000971618, throughput 2.76594K wps
Begin Testing...
[Epoch 171] train avg loss 0.00086169, dev acc 0.8721, dev avg loss 0.471018, throughput 2.7452K wps
[Epoch 172 Batch 30/172] avg loss 0.000984327, throughput 2.70639K wps
[Epoch 172 Batch 60/172] avg loss 0.00103375, throughput 2.81583K wps
[Epoch 172 Batch 90/172] avg loss 0.000795655, throughput 2.63978K wps
[Epoch 172 Batch 120/172] avg loss 0.000669184, throughput 2.72211K wps
[Epoch 172 Batch 150/172] avg loss 0.000766531, throughput 2.69835K wps
Begin Testing...
[Epoch 172] train avg loss 0.000867683, dev acc 0.8732, dev avg loss 0.481165, throughput 2.71879K wps
[Epoch 173 Batch 30/172] avg loss 0.000837171, throughput 2.82314K wps
[Epoch 173 Batch 60/172] avg loss 0.000679151, throughput 2.79209K wps
[Epoch 173 Batch 90/172] avg loss 0.000796199, throughput 2.81527K wps
[Epoch 173 Batch 120/172] avg loss 0.000823653, throughput 2.74245K wps
[Epoch 173 Batch 150/172] avg loss 0.0010244, throughput 2.76724K wps
Begin Testing...
[Epoch 173] train avg loss 0.00085182, dev acc 0.8732, dev avg loss 0.48022, throughput 2.78762K wps
[Epoch 174 Batch 30/172] avg loss 0.00100329, throughput 2.844K wps
[Epoch 174 Batch 60/172] avg loss 0.000776342, throughput 2.78338K wps
[Epoch 174 Batch 90/172] avg loss 0.000766287, throughput 2.80494K wps
[Epoch 174 Batch 120/172] avg loss 0.000796132, throughput 2.80531K wps
[Epoch 174 Batch 150/172] avg loss 0.000939006, throughput 2.79857K wps
Begin Testing...
[Epoch 174] train avg loss 0.000835074, dev acc 0.8711, dev avg loss 0.472496, throughput 2.79552K wps
[Epoch 175 Batch 30/172] avg loss 0.000732412, throughput 2.80426K wps
[Epoch 175 Batch 60/172] avg loss 0.000895973, throughput 2.75005K wps
[Epoch 175 Batch 90/172] avg loss 0.00084425, throughput 2.71432K wps
[Epoch 175 Batch 120/172] avg loss 0.000845784, throughput 2.77134K wps
[Epoch 175 Batch 150/172] avg loss 0.000832473, throughput 2.60072K wps
Begin Testing...
[Epoch 175] train avg loss 0.000841181, dev acc 0.8732, dev avg loss 0.482117, throughput 2.73975K wps
[Epoch 176 Batch 30/172] avg loss 0.000796619, throughput 2.81805K wps
[Epoch 176 Batch 60/172] avg loss 0.000778929, throughput 2.65453K wps
[Epoch 176 Batch 90/172] avg loss 0.000858341, throughput 2.70235K wps
[Epoch 176 Batch 120/172] avg loss 0.00103015, throughput 2.78527K wps
[Epoch 176 Batch 150/172] avg loss 0.000704105, throughput 2.72927K wps
Begin Testing...
[Epoch 176] train avg loss 0.000823036, dev acc 0.8679, dev avg loss 0.480329, throughput 2.73952K wps
[Epoch 177 Batch 30/172] avg loss 0.000729645, throughput 2.81682K wps
[Epoch 177 Batch 60/172] avg loss 0.000741156, throughput 2.76086K wps
[Epoch 177 Batch 90/172] avg loss 0.000796939, throughput 2.76704K wps
[Epoch 177 Batch 120/172] avg loss 0.00107227, throughput 2.76358K wps
[Epoch 177 Batch 150/172] avg loss 0.000916067, throughput 2.64974K wps
Begin Testing...
[Epoch 177] train avg loss 0.000844412, dev acc 0.8742, dev avg loss 0.475579, throughput 2.73115K wps
[Epoch 178 Batch 30/172] avg loss 0.000731754, throughput 2.56172K wps
[Epoch 178 Batch 60/172] avg loss 0.000797616, throughput 2.7906K wps
[Epoch 178 Batch 90/172] avg loss 0.00091756, throughput 2.65775K wps
[Epoch 178 Batch 120/172] avg loss 0.000858203, throughput 2.72241K wps
[Epoch 178 Batch 150/172] avg loss 0.000862688, throughput 2.74775K wps
Begin Testing...
[Epoch 178] train avg loss 0.00082476, dev acc 0.8700, dev avg loss 0.479895, throughput 2.70574K wps
[Epoch 179 Batch 30/172] avg loss 0.000834111, throughput 2.74225K wps
[Epoch 179 Batch 60/172] avg loss 0.000801919, throughput 2.6714K wps
[Epoch 179 Batch 90/172] avg loss 0.000917084, throughput 2.77848K wps
[Epoch 179 Batch 120/172] avg loss 0.00074404, throughput 2.68947K wps
[Epoch 179 Batch 150/172] avg loss 0.000979546, throughput 2.73789K wps
Begin Testing...
[Epoch 179] train avg loss 0.000825009, dev acc 0.8700, dev avg loss 0.473855, throughput 2.69903K wps
[Epoch 180 Batch 30/172] avg loss 0.000739747, throughput 2.81938K wps
[Epoch 180 Batch 60/172] avg loss 0.000817118, throughput 2.82609K wps
[Epoch 180 Batch 90/172] avg loss 0.000791916, throughput 2.8138K wps
[Epoch 180 Batch 120/172] avg loss 0.000884572, throughput 2.73272K wps
[Epoch 180 Batch 150/172] avg loss 0.000800232, throughput 2.89169K wps
Begin Testing...
[Epoch 180] train avg loss 0.000818281, dev acc 0.8690, dev avg loss 0.478644, throughput 2.80363K wps
[Epoch 181 Batch 30/172] avg loss 0.000726339, throughput 2.72669K wps
[Epoch 181 Batch 60/172] avg loss 0.000713861, throughput 2.71406K wps
[Epoch 181 Batch 90/172] avg loss 0.000885447, throughput 2.78721K wps
[Epoch 181 Batch 120/172] avg loss 0.000909579, throughput 2.69606K wps
[Epoch 181 Batch 150/172] avg loss 0.000799729, throughput 2.77579K wps
Begin Testing...
[Epoch 181] train avg loss 0.000825923, dev acc 0.8711, dev avg loss 0.480638, throughput 2.74551K wps
[Epoch 182 Batch 30/172] avg loss 0.000807491, throughput 2.84331K wps
[Epoch 182 Batch 60/172] avg loss 0.000734963, throughput 2.66337K wps
[Epoch 182 Batch 90/172] avg loss 0.00072024, throughput 2.78664K wps
[Epoch 182 Batch 120/172] avg loss 0.00080832, throughput 2.67837K wps
[Epoch 182 Batch 150/172] avg loss 0.000818118, throughput 2.78709K wps
Begin Testing...
[Epoch 182] train avg loss 0.00079498, dev acc 0.8711, dev avg loss 0.480558, throughput 2.75465K wps
[Epoch 183 Batch 30/172] avg loss 0.000693203, throughput 2.6893K wps
[Epoch 183 Batch 60/172] avg loss 0.000625236, throughput 2.75401K wps
[Epoch 183 Batch 90/172] avg loss 0.000858011, throughput 2.79267K wps
[Epoch 183 Batch 120/172] avg loss 0.000969461, throughput 2.79473K wps
[Epoch 183 Batch 150/172] avg loss 0.000717532, throughput 2.79439K wps
Begin Testing...
[Epoch 183] train avg loss 0.000812295, dev acc 0.8711, dev avg loss 0.474945, throughput 2.75984K wps
[Epoch 184 Batch 30/172] avg loss 0.000733458, throughput 2.78315K wps
[Epoch 184 Batch 60/172] avg loss 0.000860489, throughput 2.74456K wps
[Epoch 184 Batch 90/172] avg loss 0.000799189, throughput 2.80287K wps
[Epoch 184 Batch 120/172] avg loss 0.000837319, throughput 2.73709K wps
[Epoch 184 Batch 150/172] avg loss 0.000802658, throughput 2.79895K wps
Begin Testing...
[Epoch 184] train avg loss 0.000821396, dev acc 0.8700, dev avg loss 0.477323, throughput 2.77745K wps
[Epoch 185 Batch 30/172] avg loss 0.00088537, throughput 2.79457K wps
[Epoch 185 Batch 60/172] avg loss 0.000660466, throughput 2.78399K wps
[Epoch 185 Batch 90/172] avg loss 0.000800615, throughput 2.85829K wps
[Epoch 185 Batch 120/172] avg loss 0.000776312, throughput 2.81484K wps
[Epoch 185 Batch 150/172] avg loss 0.000845127, throughput 2.83254K wps
Begin Testing...
[Epoch 185] train avg loss 0.000811309, dev acc 0.8753, dev avg loss 0.481912, throughput 2.81356K wps
[Epoch 186 Batch 30/172] avg loss 0.000940065, throughput 2.80436K wps
[Epoch 186 Batch 60/172] avg loss 0.000955417, throughput 2.78027K wps
[Epoch 186 Batch 90/172] avg loss 0.000768652, throughput 2.81709K wps
[Epoch 186 Batch 120/172] avg loss 0.000768773, throughput 2.66522K wps
[Epoch 186 Batch 150/172] avg loss 0.000775231, throughput 2.58268K wps
Begin Testing...
[Epoch 186] train avg loss 0.000828164, dev acc 0.8826, dev avg loss 0.509061, throughput 2.74268K wps
[Epoch 187 Batch 30/172] avg loss 0.000680193, throughput 2.7373K wps
[Epoch 187 Batch 60/172] avg loss 0.000953492, throughput 2.71598K wps
[Epoch 187 Batch 90/172] avg loss 0.000939858, throughput 2.80725K wps
[Epoch 187 Batch 120/172] avg loss 0.000736879, throughput 2.78069K wps
[Epoch 187 Batch 150/172] avg loss 0.000779854, throughput 2.78161K wps
Begin Testing...
[Epoch 187] train avg loss 0.000808063, dev acc 0.8711, dev avg loss 0.48065, throughput 2.75653K wps
[Epoch 188 Batch 30/172] avg loss 0.000805495, throughput 2.84482K wps
[Epoch 188 Batch 60/172] avg loss 0.000800908, throughput 2.62924K wps
[Epoch 188 Batch 90/172] avg loss 0.000813489, throughput 2.76418K wps
[Epoch 188 Batch 120/172] avg loss 0.000771188, throughput 2.73234K wps
[Epoch 188 Batch 150/172] avg loss 0.000723598, throughput 2.79468K wps
Begin Testing...
[Epoch 188] train avg loss 0.00080685, dev acc 0.8784, dev avg loss 0.487294, throughput 2.75886K wps
[Epoch 189 Batch 30/172] avg loss 0.000726026, throughput 2.82849K wps
[Epoch 189 Batch 60/172] avg loss 0.000711272, throughput 2.78844K wps
[Epoch 189 Batch 90/172] avg loss 0.000691993, throughput 2.79629K wps
[Epoch 189 Batch 120/172] avg loss 0.000833766, throughput 2.62555K wps
[Epoch 189 Batch 150/172] avg loss 0.000959169, throughput 2.62461K wps
Begin Testing...
[Epoch 189] train avg loss 0.000766061, dev acc 0.8753, dev avg loss 0.494581, throughput 2.7335K wps
[Epoch 190 Batch 30/172] avg loss 0.000685618, throughput 2.7984K wps
[Epoch 190 Batch 60/172] avg loss 0.000595965, throughput 2.80056K wps
[Epoch 190 Batch 90/172] avg loss 0.000695524, throughput 2.78629K wps
[Epoch 190 Batch 120/172] avg loss 0.000882943, throughput 2.76543K wps
[Epoch 190 Batch 150/172] avg loss 0.000853942, throughput 2.76827K wps
Begin Testing...
[Epoch 190] train avg loss 0.000752419, dev acc 0.8711, dev avg loss 0.483019, throughput 2.78021K wps
[Epoch 191 Batch 30/172] avg loss 0.000610477, throughput 2.81658K wps
[Epoch 191 Batch 60/172] avg loss 0.00074989, throughput 2.75136K wps
[Epoch 191 Batch 90/172] avg loss 0.000655798, throughput 2.71864K wps
[Epoch 191 Batch 120/172] avg loss 0.000936517, throughput 2.69867K wps
[Epoch 191 Batch 150/172] avg loss 0.000903703, throughput 2.75975K wps
Begin Testing...
[Epoch 191] train avg loss 0.000781181, dev acc 0.8711, dev avg loss 0.489476, throughput 2.75669K wps
[Epoch 192 Batch 30/172] avg loss 0.000697644, throughput 2.86576K wps
[Epoch 192 Batch 60/172] avg loss 0.000908484, throughput 2.79065K wps
[Epoch 192 Batch 90/172] avg loss 0.000629207, throughput 2.65305K wps
[Epoch 192 Batch 120/172] avg loss 0.000887577, throughput 2.60227K wps
[Epoch 192 Batch 150/172] avg loss 0.000703827, throughput 2.82593K wps
Begin Testing...
[Epoch 192] train avg loss 0.000777025, dev acc 0.8753, dev avg loss 0.504626, throughput 2.72318K wps
[Epoch 193 Batch 30/172] avg loss 0.000864716, throughput 2.84091K wps
[Epoch 193 Batch 60/172] avg loss 0.000708461, throughput 2.78489K wps
[Epoch 193 Batch 90/172] avg loss 0.00083726, throughput 2.71475K wps
[Epoch 193 Batch 120/172] avg loss 0.000701726, throughput 2.6984K wps
[Epoch 193 Batch 150/172] avg loss 0.000851984, throughput 2.80764K wps
Begin Testing...
[Epoch 193] train avg loss 0.000781147, dev acc 0.8711, dev avg loss 0.494324, throughput 2.77298K wps
[Epoch 194 Batch 30/172] avg loss 0.000709147, throughput 2.70957K wps
[Epoch 194 Batch 60/172] avg loss 0.000953305, throughput 2.75718K wps
[Epoch 194 Batch 90/172] avg loss 0.000705847, throughput 2.75398K wps
[Epoch 194 Batch 120/172] avg loss 0.000746758, throughput 2.80841K wps
[Epoch 194 Batch 150/172] avg loss 0.000757778, throughput 2.74521K wps
Begin Testing...
[Epoch 194] train avg loss 0.000788853, dev acc 0.8742, dev avg loss 0.498788, throughput 2.76176K wps
[Epoch 195 Batch 30/172] avg loss 0.000864406, throughput 2.85474K wps
[Epoch 195 Batch 60/172] avg loss 0.000758295, throughput 2.79391K wps
[Epoch 195 Batch 90/172] avg loss 0.000845076, throughput 2.77176K wps
[Epoch 195 Batch 120/172] avg loss 0.000690255, throughput 2.74676K wps
[Epoch 195 Batch 150/172] avg loss 0.000687588, throughput 2.79396K wps
Begin Testing...
[Epoch 195] train avg loss 0.00078932, dev acc 0.8721, dev avg loss 0.486198, throughput 2.78833K wps
[Epoch 196 Batch 30/172] avg loss 0.000676361, throughput 2.76132K wps
[Epoch 196 Batch 60/172] avg loss 0.000963646, throughput 2.80529K wps
[Epoch 196 Batch 90/172] avg loss 0.000748857, throughput 2.79966K wps
[Epoch 196 Batch 120/172] avg loss 0.000796571, throughput 2.70252K wps
[Epoch 196 Batch 150/172] avg loss 0.000746429, throughput 2.72945K wps
Begin Testing...
[Epoch 196] train avg loss 0.000779163, dev acc 0.8774, dev avg loss 0.498773, throughput 2.76354K wps
[Epoch 197 Batch 30/172] avg loss 0.000599231, throughput 2.75941K wps
[Epoch 197 Batch 60/172] avg loss 0.000751369, throughput 2.69319K wps
[Epoch 197 Batch 90/172] avg loss 0.000993334, throughput 2.8069K wps
[Epoch 197 Batch 120/172] avg loss 0.000925902, throughput 2.77191K wps
[Epoch 197 Batch 150/172] avg loss 0.000735124, throughput 2.7517K wps
Begin Testing...
[Epoch 197] train avg loss 0.000769379, dev acc 0.8711, dev avg loss 0.490714, throughput 2.75154K wps
[Epoch 198 Batch 30/172] avg loss 0.000789001, throughput 2.81343K wps
[Epoch 198 Batch 60/172] avg loss 0.000964735, throughput 2.75129K wps
[Epoch 198 Batch 90/172] avg loss 0.000536206, throughput 2.69931K wps
[Epoch 198 Batch 120/172] avg loss 0.000843158, throughput 2.64836K wps
[Epoch 198 Batch 150/172] avg loss 0.000829274, throughput 2.75596K wps
Begin Testing...
[Epoch 198] train avg loss 0.000786078, dev acc 0.8774, dev avg loss 0.498968, throughput 2.73493K wps
[Epoch 199 Batch 30/172] avg loss 0.000920825, throughput 2.7975K wps
[Epoch 199 Batch 60/172] avg loss 0.000663322, throughput 2.69563K wps
[Epoch 199 Batch 90/172] avg loss 0.000623232, throughput 2.77122K wps
[Epoch 199 Batch 120/172] avg loss 0.000822723, throughput 2.70779K wps
[Epoch 199 Batch 150/172] avg loss 0.000799291, throughput 2.75252K wps
Begin Testing...
[Epoch 199] train avg loss 0.000775892, dev acc 0.8711, dev avg loss 0.497628, throughput 2.73623K wps
Test loss 0.256892, test acc 0.9057
Total time cost 473.53s
[Epoch 0 Batch 30/172] avg loss 0.0131373, throughput 2.59146K wps
[Epoch 0 Batch 60/172] avg loss 0.0125239, throughput 2.75564K wps
[Epoch 0 Batch 90/172] avg loss 0.0123566, throughput 2.76097K wps
[Epoch 0 Batch 120/172] avg loss 0.0120741, throughput 2.7402K wps
[Epoch 0 Batch 150/172] avg loss 0.0118945, throughput 2.75203K wps
Begin Testing...
[Epoch 0] train avg loss 0.0124149, dev acc 0.7044, dev avg loss 0.589785, throughput 2.69997K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0119504, throughput 2.74214K wps
[Epoch 1 Batch 60/172] avg loss 0.0119846, throughput 2.79442K wps
[Epoch 1 Batch 90/172] avg loss 0.0119617, throughput 2.59029K wps
[Epoch 1 Batch 120/172] avg loss 0.01178, throughput 2.68375K wps
[Epoch 1 Batch 150/172] avg loss 0.012055, throughput 2.86189K wps
Begin Testing...
[Epoch 1] train avg loss 0.0119936, dev acc 0.7044, dev avg loss 0.576995, throughput 2.74235K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0118324, throughput 2.76974K wps
[Epoch 2 Batch 60/172] avg loss 0.0117365, throughput 2.71017K wps
[Epoch 2 Batch 90/172] avg loss 0.0114759, throughput 2.73746K wps
[Epoch 2 Batch 120/172] avg loss 0.0115758, throughput 2.7068K wps
[Epoch 2 Batch 150/172] avg loss 0.0115457, throughput 2.65469K wps
Begin Testing...
[Epoch 2] train avg loss 0.0116645, dev acc 0.7044, dev avg loss 0.55857, throughput 2.72481K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.0117162, throughput 2.77739K wps
[Epoch 3 Batch 60/172] avg loss 0.0111587, throughput 2.84164K wps
[Epoch 3 Batch 90/172] avg loss 0.0112501, throughput 2.73665K wps
[Epoch 3 Batch 120/172] avg loss 0.0110153, throughput 2.81157K wps
[Epoch 3 Batch 150/172] avg loss 0.0109102, throughput 2.77419K wps
Begin Testing...
[Epoch 3] train avg loss 0.0112334, dev acc 0.7128, dev avg loss 0.537125, throughput 2.78628K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0111619, throughput 2.81634K wps
[Epoch 4 Batch 60/172] avg loss 0.010741, throughput 2.76584K wps
[Epoch 4 Batch 90/172] avg loss 0.0106798, throughput 2.82143K wps
[Epoch 4 Batch 120/172] avg loss 0.010655, throughput 2.77283K wps
[Epoch 4 Batch 150/172] avg loss 0.0106989, throughput 2.79104K wps
Begin Testing...
[Epoch 4] train avg loss 0.0107446, dev acc 0.7212, dev avg loss 0.512518, throughput 2.79663K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0105371, throughput 2.62461K wps
[Epoch 5 Batch 60/172] avg loss 0.0104867, throughput 2.83768K wps
[Epoch 5 Batch 90/172] avg loss 0.0105452, throughput 2.84269K wps
[Epoch 5 Batch 120/172] avg loss 0.0100985, throughput 2.81725K wps
[Epoch 5 Batch 150/172] avg loss 0.00986894, throughput 2.80219K wps
Begin Testing...
[Epoch 5] train avg loss 0.01027, dev acc 0.7788, dev avg loss 0.485723, throughput 2.78622K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.00980251, throughput 2.88007K wps
[Epoch 6 Batch 60/172] avg loss 0.00992466, throughput 2.66977K wps
[Epoch 6 Batch 90/172] avg loss 0.00967799, throughput 2.68856K wps
[Epoch 6 Batch 120/172] avg loss 0.00939038, throughput 2.84697K wps
[Epoch 6 Batch 150/172] avg loss 0.00951754, throughput 2.80308K wps
Begin Testing...
[Epoch 6] train avg loss 0.00961438, dev acc 0.7925, dev avg loss 0.455996, throughput 2.77978K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00905505, throughput 2.84292K wps
[Epoch 7 Batch 60/172] avg loss 0.00916517, throughput 2.80059K wps
[Epoch 7 Batch 90/172] avg loss 0.00907488, throughput 2.75589K wps
[Epoch 7 Batch 120/172] avg loss 0.00892741, throughput 2.77736K wps
[Epoch 7 Batch 150/172] avg loss 0.00863546, throughput 2.76786K wps
Begin Testing...
[Epoch 7] train avg loss 0.0089573, dev acc 0.8501, dev avg loss 0.429316, throughput 2.78406K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00846913, throughput 2.72986K wps
[Epoch 8 Batch 60/172] avg loss 0.0087132, throughput 2.82137K wps
[Epoch 8 Batch 90/172] avg loss 0.00836626, throughput 2.77357K wps
[Epoch 8 Batch 120/172] avg loss 0.00800821, throughput 2.78392K wps
[Epoch 8 Batch 150/172] avg loss 0.00825895, throughput 2.73022K wps
Begin Testing...
[Epoch 8] train avg loss 0.00833136, dev acc 0.8585, dev avg loss 0.401211, throughput 2.74652K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00797761, throughput 2.8004K wps
[Epoch 9 Batch 60/172] avg loss 0.00785522, throughput 2.79872K wps
[Epoch 9 Batch 90/172] avg loss 0.00807243, throughput 2.80331K wps
[Epoch 9 Batch 120/172] avg loss 0.00760072, throughput 2.79562K wps
[Epoch 9 Batch 150/172] avg loss 0.00765766, throughput 2.76342K wps
Begin Testing...
[Epoch 9] train avg loss 0.00783154, dev acc 0.8606, dev avg loss 0.37628, throughput 2.79427K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00740254, throughput 2.73746K wps
[Epoch 10 Batch 60/172] avg loss 0.00688458, throughput 2.76341K wps
[Epoch 10 Batch 90/172] avg loss 0.00748682, throughput 2.71771K wps
[Epoch 10 Batch 120/172] avg loss 0.0071925, throughput 2.75132K wps
[Epoch 10 Batch 150/172] avg loss 0.00727412, throughput 2.80261K wps
Begin Testing...
[Epoch 10] train avg loss 0.00727203, dev acc 0.8690, dev avg loss 0.358048, throughput 2.74969K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00684173, throughput 2.81585K wps
[Epoch 11 Batch 60/172] avg loss 0.00688315, throughput 2.70208K wps
[Epoch 11 Batch 90/172] avg loss 0.00670464, throughput 2.77176K wps
[Epoch 11 Batch 120/172] avg loss 0.00704115, throughput 2.78991K wps
[Epoch 11 Batch 150/172] avg loss 0.0066915, throughput 2.79906K wps
Begin Testing...
[Epoch 11] train avg loss 0.00684602, dev acc 0.8732, dev avg loss 0.342765, throughput 2.7771K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.00671542, throughput 2.67047K wps
[Epoch 12 Batch 60/172] avg loss 0.00703948, throughput 2.82343K wps
[Epoch 12 Batch 90/172] avg loss 0.00624867, throughput 2.74556K wps
[Epoch 12 Batch 120/172] avg loss 0.00639954, throughput 2.76306K wps
[Epoch 12 Batch 150/172] avg loss 0.00610893, throughput 2.74937K wps
Begin Testing...
[Epoch 12] train avg loss 0.00653125, dev acc 0.8721, dev avg loss 0.330847, throughput 2.74554K wps
[Epoch 13 Batch 30/172] avg loss 0.00642252, throughput 2.80828K wps
[Epoch 13 Batch 60/172] avg loss 0.00668808, throughput 2.79033K wps
[Epoch 13 Batch 90/172] avg loss 0.00646491, throughput 2.68775K wps
[Epoch 13 Batch 120/172] avg loss 0.00596648, throughput 2.799K wps
[Epoch 13 Batch 150/172] avg loss 0.00616181, throughput 2.7821K wps
Begin Testing...
[Epoch 13] train avg loss 0.0062618, dev acc 0.8774, dev avg loss 0.32058, throughput 2.77378K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00596216, throughput 2.80959K wps
[Epoch 14 Batch 60/172] avg loss 0.00582804, throughput 2.8361K wps
[Epoch 14 Batch 90/172] avg loss 0.00648938, throughput 2.79623K wps
[Epoch 14 Batch 120/172] avg loss 0.00589467, throughput 2.7834K wps
[Epoch 14 Batch 150/172] avg loss 0.00631527, throughput 2.58092K wps
Begin Testing...
[Epoch 14] train avg loss 0.00599615, dev acc 0.8784, dev avg loss 0.313223, throughput 2.75972K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/172] avg loss 0.00582781, throughput 2.82663K wps
[Epoch 15 Batch 60/172] avg loss 0.00568534, throughput 2.78817K wps
[Epoch 15 Batch 90/172] avg loss 0.00539156, throughput 2.76894K wps
[Epoch 15 Batch 120/172] avg loss 0.00579352, throughput 2.79339K wps
[Epoch 15 Batch 150/172] avg loss 0.00582054, throughput 2.77162K wps
Begin Testing...
[Epoch 15] train avg loss 0.00576198, dev acc 0.8805, dev avg loss 0.306953, throughput 2.79187K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00562024, throughput 2.84345K wps
[Epoch 16 Batch 60/172] avg loss 0.00569581, throughput 2.78221K wps
[Epoch 16 Batch 90/172] avg loss 0.00570161, throughput 2.80012K wps
[Epoch 16 Batch 120/172] avg loss 0.00533464, throughput 2.80906K wps
[Epoch 16 Batch 150/172] avg loss 0.00534984, throughput 2.8162K wps
Begin Testing...
[Epoch 16] train avg loss 0.00556377, dev acc 0.8784, dev avg loss 0.303259, throughput 2.8069K wps
[Epoch 17 Batch 30/172] avg loss 0.00551758, throughput 2.749K wps
[Epoch 17 Batch 60/172] avg loss 0.00501987, throughput 2.79731K wps
[Epoch 17 Batch 90/172] avg loss 0.00514349, throughput 2.80436K wps
[Epoch 17 Batch 120/172] avg loss 0.00577059, throughput 2.81798K wps
[Epoch 17 Batch 150/172] avg loss 0.005249, throughput 2.76372K wps
Begin Testing...
[Epoch 17] train avg loss 0.00533308, dev acc 0.8816, dev avg loss 0.298073, throughput 2.78943K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00508914, throughput 2.87955K wps
[Epoch 18 Batch 60/172] avg loss 0.00501305, throughput 2.77745K wps
[Epoch 18 Batch 90/172] avg loss 0.00510213, throughput 2.74843K wps
[Epoch 18 Batch 120/172] avg loss 0.00527482, throughput 2.78208K wps
[Epoch 18 Batch 150/172] avg loss 0.00508101, throughput 2.77825K wps
Begin Testing...
[Epoch 18] train avg loss 0.00515111, dev acc 0.8836, dev avg loss 0.294454, throughput 2.79239K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00501815, throughput 2.80747K wps
[Epoch 19 Batch 60/172] avg loss 0.00541131, throughput 2.69783K wps
[Epoch 19 Batch 90/172] avg loss 0.00509474, throughput 2.85641K wps
[Epoch 19 Batch 120/172] avg loss 0.00494675, throughput 2.76536K wps
[Epoch 19 Batch 150/172] avg loss 0.00494046, throughput 2.78402K wps
Begin Testing...
[Epoch 19] train avg loss 0.00504375, dev acc 0.8857, dev avg loss 0.291996, throughput 2.78027K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00533523, throughput 2.85594K wps
[Epoch 20 Batch 60/172] avg loss 0.0049337, throughput 2.80199K wps
[Epoch 20 Batch 90/172] avg loss 0.0045058, throughput 2.80697K wps
[Epoch 20 Batch 120/172] avg loss 0.00469795, throughput 2.81911K wps
[Epoch 20 Batch 150/172] avg loss 0.00456382, throughput 2.65671K wps
Begin Testing...
[Epoch 20] train avg loss 0.00486129, dev acc 0.8857, dev avg loss 0.29004, throughput 2.79187K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.00483244, throughput 2.6361K wps
[Epoch 21 Batch 60/172] avg loss 0.0045837, throughput 2.78376K wps
[Epoch 21 Batch 90/172] avg loss 0.00487844, throughput 2.82675K wps
[Epoch 21 Batch 120/172] avg loss 0.00477775, throughput 2.82858K wps
[Epoch 21 Batch 150/172] avg loss 0.00419469, throughput 2.70861K wps
Begin Testing...
[Epoch 21] train avg loss 0.00469381, dev acc 0.8847, dev avg loss 0.28809, throughput 2.77011K wps
[Epoch 22 Batch 30/172] avg loss 0.00428657, throughput 2.66389K wps
[Epoch 22 Batch 60/172] avg loss 0.00459004, throughput 2.80323K wps
[Epoch 22 Batch 90/172] avg loss 0.00456077, throughput 2.80372K wps
[Epoch 22 Batch 120/172] avg loss 0.00450043, throughput 2.766K wps
[Epoch 22 Batch 150/172] avg loss 0.00465466, throughput 2.80495K wps
Begin Testing...
[Epoch 22] train avg loss 0.00455339, dev acc 0.8826, dev avg loss 0.287089, throughput 2.76153K wps
[Epoch 23 Batch 30/172] avg loss 0.00442038, throughput 2.83225K wps
[Epoch 23 Batch 60/172] avg loss 0.00447073, throughput 2.79008K wps
[Epoch 23 Batch 90/172] avg loss 0.00455786, throughput 2.84557K wps
[Epoch 23 Batch 120/172] avg loss 0.00430694, throughput 2.82226K wps
[Epoch 23 Batch 150/172] avg loss 0.00440529, throughput 2.74353K wps
Begin Testing...
[Epoch 23] train avg loss 0.00444871, dev acc 0.8878, dev avg loss 0.2873, throughput 2.81855K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/172] avg loss 0.00417104, throughput 2.84032K wps
[Epoch 24 Batch 60/172] avg loss 0.00402329, throughput 2.79827K wps
[Epoch 24 Batch 90/172] avg loss 0.00417842, throughput 2.78614K wps
[Epoch 24 Batch 120/172] avg loss 0.00428873, throughput 2.78448K wps
[Epoch 24 Batch 150/172] avg loss 0.00466145, throughput 2.75579K wps
Begin Testing...
[Epoch 24] train avg loss 0.00426421, dev acc 0.8816, dev avg loss 0.285206, throughput 2.7917K wps
[Epoch 25 Batch 30/172] avg loss 0.00413204, throughput 2.6048K wps
[Epoch 25 Batch 60/172] avg loss 0.00446766, throughput 2.81928K wps
[Epoch 25 Batch 90/172] avg loss 0.00420557, throughput 2.81608K wps
[Epoch 25 Batch 120/172] avg loss 0.00440776, throughput 2.81893K wps
[Epoch 25 Batch 150/172] avg loss 0.00408532, throughput 2.78851K wps
Begin Testing...
[Epoch 25] train avg loss 0.00424522, dev acc 0.8868, dev avg loss 0.285771, throughput 2.76364K wps
[Epoch 26 Batch 30/172] avg loss 0.00380794, throughput 2.83977K wps
[Epoch 26 Batch 60/172] avg loss 0.00395309, throughput 2.80281K wps
[Epoch 26 Batch 90/172] avg loss 0.0045133, throughput 2.81034K wps
[Epoch 26 Batch 120/172] avg loss 0.00420981, throughput 2.6871K wps
[Epoch 26 Batch 150/172] avg loss 0.00406385, throughput 2.75836K wps
Begin Testing...
[Epoch 26] train avg loss 0.00409114, dev acc 0.8857, dev avg loss 0.285865, throughput 2.77118K wps
[Epoch 27 Batch 30/172] avg loss 0.00384225, throughput 2.7485K wps
[Epoch 27 Batch 60/172] avg loss 0.00377498, throughput 2.78499K wps
[Epoch 27 Batch 90/172] avg loss 0.00406797, throughput 2.79289K wps
[Epoch 27 Batch 120/172] avg loss 0.00432475, throughput 2.72669K wps
[Epoch 27 Batch 150/172] avg loss 0.00419415, throughput 2.7839K wps
Begin Testing...
[Epoch 27] train avg loss 0.00403179, dev acc 0.8857, dev avg loss 0.285466, throughput 2.76955K wps
[Epoch 28 Batch 30/172] avg loss 0.00372246, throughput 2.76099K wps
[Epoch 28 Batch 60/172] avg loss 0.0037963, throughput 2.70492K wps
[Epoch 28 Batch 90/172] avg loss 0.00393428, throughput 2.75635K wps
[Epoch 28 Batch 120/172] avg loss 0.00370553, throughput 2.59788K wps
[Epoch 28 Batch 150/172] avg loss 0.00359081, throughput 2.76361K wps
Begin Testing...
[Epoch 28] train avg loss 0.00380253, dev acc 0.8836, dev avg loss 0.285861, throughput 2.70988K wps
[Epoch 29 Batch 30/172] avg loss 0.00403105, throughput 2.82754K wps
[Epoch 29 Batch 60/172] avg loss 0.00368373, throughput 2.6307K wps
[Epoch 29 Batch 90/172] avg loss 0.00389419, throughput 2.81487K wps
[Epoch 29 Batch 120/172] avg loss 0.00374385, throughput 2.78248K wps
[Epoch 29 Batch 150/172] avg loss 0.00335343, throughput 2.68198K wps
Begin Testing...
[Epoch 29] train avg loss 0.0037556, dev acc 0.8847, dev avg loss 0.287041, throughput 2.75167K wps
[Epoch 30 Batch 30/172] avg loss 0.00338948, throughput 2.78043K wps
[Epoch 30 Batch 60/172] avg loss 0.00354425, throughput 2.78384K wps
[Epoch 30 Batch 90/172] avg loss 0.00374193, throughput 2.64477K wps
[Epoch 30 Batch 120/172] avg loss 0.00399542, throughput 2.76179K wps
[Epoch 30 Batch 150/172] avg loss 0.00330784, throughput 2.77727K wps
Begin Testing...
[Epoch 30] train avg loss 0.00361748, dev acc 0.8857, dev avg loss 0.287302, throughput 2.74444K wps
[Epoch 31 Batch 30/172] avg loss 0.00354084, throughput 2.80993K wps
[Epoch 31 Batch 60/172] avg loss 0.00341181, throughput 2.71232K wps
[Epoch 31 Batch 90/172] avg loss 0.00352322, throughput 2.78062K wps
[Epoch 31 Batch 120/172] avg loss 0.00375761, throughput 2.79029K wps
[Epoch 31 Batch 150/172] avg loss 0.00345147, throughput 2.75587K wps
Begin Testing...
[Epoch 31] train avg loss 0.00355962, dev acc 0.8899, dev avg loss 0.289246, throughput 2.76018K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/172] avg loss 0.00346269, throughput 2.83882K wps
[Epoch 32 Batch 60/172] avg loss 0.00348074, throughput 2.72887K wps
[Epoch 32 Batch 90/172] avg loss 0.00302633, throughput 2.71927K wps
[Epoch 32 Batch 120/172] avg loss 0.00365067, throughput 2.76305K wps
[Epoch 32 Batch 150/172] avg loss 0.00346055, throughput 2.80531K wps
Begin Testing...
[Epoch 32] train avg loss 0.00344491, dev acc 0.8878, dev avg loss 0.289336, throughput 2.76791K wps
[Epoch 33 Batch 30/172] avg loss 0.00352471, throughput 2.8684K wps
[Epoch 33 Batch 60/172] avg loss 0.00332482, throughput 2.74885K wps
[Epoch 33 Batch 90/172] avg loss 0.00338412, throughput 2.68743K wps
[Epoch 33 Batch 120/172] avg loss 0.00328894, throughput 2.79434K wps
[Epoch 33 Batch 150/172] avg loss 0.00325721, throughput 2.82908K wps
Begin Testing...
[Epoch 33] train avg loss 0.00336819, dev acc 0.8878, dev avg loss 0.289663, throughput 2.78882K wps
[Epoch 34 Batch 30/172] avg loss 0.00340226, throughput 2.86818K wps
[Epoch 34 Batch 60/172] avg loss 0.00302793, throughput 2.75649K wps
[Epoch 34 Batch 90/172] avg loss 0.00348946, throughput 2.80679K wps
[Epoch 34 Batch 120/172] avg loss 0.00300768, throughput 2.787K wps
[Epoch 34 Batch 150/172] avg loss 0.00358474, throughput 2.80022K wps
Begin Testing...
[Epoch 34] train avg loss 0.00330351, dev acc 0.8910, dev avg loss 0.291094, throughput 2.79489K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/172] avg loss 0.00307887, throughput 2.86321K wps
[Epoch 35 Batch 60/172] avg loss 0.00317199, throughput 2.73838K wps
[Epoch 35 Batch 90/172] avg loss 0.00309814, throughput 2.81437K wps
[Epoch 35 Batch 120/172] avg loss 0.00325504, throughput 2.69138K wps
[Epoch 35 Batch 150/172] avg loss 0.00348061, throughput 2.83814K wps
Begin Testing...
[Epoch 35] train avg loss 0.00321661, dev acc 0.8868, dev avg loss 0.292122, throughput 2.77851K wps
[Epoch 36 Batch 30/172] avg loss 0.00297538, throughput 2.85192K wps
[Epoch 36 Batch 60/172] avg loss 0.00302271, throughput 2.80312K wps
[Epoch 36 Batch 90/172] avg loss 0.00325619, throughput 2.68365K wps
[Epoch 36 Batch 120/172] avg loss 0.00310201, throughput 2.75903K wps
[Epoch 36 Batch 150/172] avg loss 0.00312082, throughput 2.79929K wps
Begin Testing...
[Epoch 36] train avg loss 0.00312176, dev acc 0.8868, dev avg loss 0.293055, throughput 2.76079K wps
[Epoch 37 Batch 30/172] avg loss 0.00301643, throughput 2.81374K wps
[Epoch 37 Batch 60/172] avg loss 0.00310134, throughput 2.74539K wps
[Epoch 37 Batch 90/172] avg loss 0.00298316, throughput 2.77802K wps
[Epoch 37 Batch 120/172] avg loss 0.003219, throughput 2.70511K wps
[Epoch 37 Batch 150/172] avg loss 0.00295469, throughput 2.77525K wps
Begin Testing...
[Epoch 37] train avg loss 0.00304965, dev acc 0.8878, dev avg loss 0.294511, throughput 2.76545K wps
[Epoch 38 Batch 30/172] avg loss 0.00300678, throughput 2.6665K wps
[Epoch 38 Batch 60/172] avg loss 0.00324008, throughput 2.77282K wps
[Epoch 38 Batch 90/172] avg loss 0.00269616, throughput 2.82294K wps
[Epoch 38 Batch 120/172] avg loss 0.00312137, throughput 2.81185K wps
[Epoch 38 Batch 150/172] avg loss 0.00293254, throughput 2.75212K wps
Begin Testing...
[Epoch 38] train avg loss 0.002956, dev acc 0.8899, dev avg loss 0.297442, throughput 2.77014K wps
[Epoch 39 Batch 30/172] avg loss 0.00337141, throughput 2.79658K wps
[Epoch 39 Batch 60/172] avg loss 0.00287595, throughput 2.82398K wps
[Epoch 39 Batch 90/172] avg loss 0.00297794, throughput 2.81921K wps
[Epoch 39 Batch 120/172] avg loss 0.00282803, throughput 2.78412K wps
[Epoch 39 Batch 150/172] avg loss 0.00297045, throughput 2.71631K wps
Begin Testing...
[Epoch 39] train avg loss 0.00294828, dev acc 0.8868, dev avg loss 0.297721, throughput 2.78225K wps
[Epoch 40 Batch 30/172] avg loss 0.00272704, throughput 2.80053K wps
[Epoch 40 Batch 60/172] avg loss 0.00283186, throughput 2.41429K wps
[Epoch 40 Batch 90/172] avg loss 0.00277574, throughput 2.74898K wps
[Epoch 40 Batch 120/172] avg loss 0.00285949, throughput 2.77055K wps
[Epoch 40 Batch 150/172] avg loss 0.00301297, throughput 2.74763K wps
Begin Testing...
[Epoch 40] train avg loss 0.00283616, dev acc 0.8889, dev avg loss 0.301893, throughput 2.67882K wps
[Epoch 41 Batch 30/172] avg loss 0.00282323, throughput 2.82689K wps
[Epoch 41 Batch 60/172] avg loss 0.00255391, throughput 2.70005K wps
[Epoch 41 Batch 90/172] avg loss 0.00275034, throughput 2.74827K wps
[Epoch 41 Batch 120/172] avg loss 0.00279355, throughput 2.77757K wps
[Epoch 41 Batch 150/172] avg loss 0.00281572, throughput 2.7267K wps
Begin Testing...
[Epoch 41] train avg loss 0.00276629, dev acc 0.8878, dev avg loss 0.301584, throughput 2.76912K wps
[Epoch 42 Batch 30/172] avg loss 0.00270426, throughput 2.87745K wps
[Epoch 42 Batch 60/172] avg loss 0.00290312, throughput 2.80901K wps
[Epoch 42 Batch 90/172] avg loss 0.00241155, throughput 2.80089K wps
[Epoch 42 Batch 120/172] avg loss 0.00261054, throughput 2.79045K wps
[Epoch 42 Batch 150/172] avg loss 0.00280836, throughput 2.79236K wps
Begin Testing...
[Epoch 42] train avg loss 0.00273146, dev acc 0.8857, dev avg loss 0.304376, throughput 2.80872K wps
[Epoch 43 Batch 30/172] avg loss 0.00262602, throughput 2.88542K wps
[Epoch 43 Batch 60/172] avg loss 0.00247619, throughput 2.77138K wps
[Epoch 43 Batch 90/172] avg loss 0.00263983, throughput 2.71967K wps
[Epoch 43 Batch 120/172] avg loss 0.00259343, throughput 2.83217K wps
[Epoch 43 Batch 150/172] avg loss 0.00269249, throughput 2.77095K wps
Begin Testing...
[Epoch 43] train avg loss 0.00262, dev acc 0.8878, dev avg loss 0.306375, throughput 2.79226K wps
[Epoch 44 Batch 30/172] avg loss 0.00272898, throughput 2.87852K wps
[Epoch 44 Batch 60/172] avg loss 0.00285822, throughput 2.81088K wps
[Epoch 44 Batch 90/172] avg loss 0.00233492, throughput 2.75189K wps
[Epoch 44 Batch 120/172] avg loss 0.00266598, throughput 2.78382K wps
[Epoch 44 Batch 150/172] avg loss 0.00275345, throughput 2.68034K wps
Begin Testing...
[Epoch 44] train avg loss 0.00259829, dev acc 0.8847, dev avg loss 0.309162, throughput 2.74494K wps
[Epoch 45 Batch 30/172] avg loss 0.00234337, throughput 2.69056K wps
[Epoch 45 Batch 60/172] avg loss 0.00263534, throughput 2.69922K wps
[Epoch 45 Batch 90/172] avg loss 0.00232578, throughput 2.83042K wps
[Epoch 45 Batch 120/172] avg loss 0.00265411, throughput 2.76984K wps
[Epoch 45 Batch 150/172] avg loss 0.00264653, throughput 2.75543K wps
Begin Testing...
[Epoch 45] train avg loss 0.00250816, dev acc 0.8889, dev avg loss 0.312424, throughput 2.74399K wps
[Epoch 46 Batch 30/172] avg loss 0.00251427, throughput 2.72116K wps
[Epoch 46 Batch 60/172] avg loss 0.00231817, throughput 2.79971K wps
[Epoch 46 Batch 90/172] avg loss 0.00234816, throughput 2.80045K wps
[Epoch 46 Batch 120/172] avg loss 0.00283449, throughput 2.63549K wps
[Epoch 46 Batch 150/172] avg loss 0.00260049, throughput 2.80434K wps
Begin Testing...
[Epoch 46] train avg loss 0.0025144, dev acc 0.8878, dev avg loss 0.314558, throughput 2.75412K wps
[Epoch 47 Batch 30/172] avg loss 0.00233328, throughput 2.82845K wps
[Epoch 47 Batch 60/172] avg loss 0.00228134, throughput 2.77382K wps
[Epoch 47 Batch 90/172] avg loss 0.00243461, throughput 2.81205K wps
[Epoch 47 Batch 120/172] avg loss 0.0025058, throughput 2.76934K wps
[Epoch 47 Batch 150/172] avg loss 0.00238009, throughput 2.8324K wps
Begin Testing...
[Epoch 47] train avg loss 0.00241689, dev acc 0.8878, dev avg loss 0.315957, throughput 2.80579K wps
[Epoch 48 Batch 30/172] avg loss 0.0025597, throughput 2.86205K wps
[Epoch 48 Batch 60/172] avg loss 0.00222537, throughput 2.80033K wps
[Epoch 48 Batch 90/172] avg loss 0.00237711, throughput 2.80123K wps
[Epoch 48 Batch 120/172] avg loss 0.0023988, throughput 2.79148K wps
[Epoch 48 Batch 150/172] avg loss 0.00231713, throughput 2.70141K wps
Begin Testing...
[Epoch 48] train avg loss 0.00239604, dev acc 0.8847, dev avg loss 0.317765, throughput 2.80265K wps
[Epoch 49 Batch 30/172] avg loss 0.00240755, throughput 2.71464K wps
[Epoch 49 Batch 60/172] avg loss 0.00237502, throughput 2.75536K wps
[Epoch 49 Batch 90/172] avg loss 0.00221101, throughput 2.78613K wps
[Epoch 49 Batch 120/172] avg loss 0.0022662, throughput 2.82011K wps
[Epoch 49 Batch 150/172] avg loss 0.00238405, throughput 2.77972K wps
Begin Testing...
[Epoch 49] train avg loss 0.00235862, dev acc 0.8826, dev avg loss 0.320095, throughput 2.77368K wps
[Epoch 50 Batch 30/172] avg loss 0.0020914, throughput 2.84626K wps
[Epoch 50 Batch 60/172] avg loss 0.00216831, throughput 2.71859K wps
[Epoch 50 Batch 90/172] avg loss 0.00240941, throughput 2.66802K wps
[Epoch 50 Batch 120/172] avg loss 0.00216794, throughput 2.77582K wps
[Epoch 50 Batch 150/172] avg loss 0.00252444, throughput 2.6373K wps
Begin Testing...
[Epoch 50] train avg loss 0.00226402, dev acc 0.8836, dev avg loss 0.322792, throughput 2.73527K wps
[Epoch 51 Batch 30/172] avg loss 0.00228373, throughput 2.7301K wps
[Epoch 51 Batch 60/172] avg loss 0.00202529, throughput 2.72353K wps
[Epoch 51 Batch 90/172] avg loss 0.00233075, throughput 2.79138K wps
[Epoch 51 Batch 120/172] avg loss 0.00237932, throughput 2.68659K wps
[Epoch 51 Batch 150/172] avg loss 0.00223769, throughput 2.59501K wps
Begin Testing...
[Epoch 51] train avg loss 0.00224698, dev acc 0.8889, dev avg loss 0.325097, throughput 2.72364K wps
[Epoch 52 Batch 30/172] avg loss 0.0023273, throughput 2.76248K wps
[Epoch 52 Batch 60/172] avg loss 0.00233013, throughput 2.7808K wps
[Epoch 52 Batch 90/172] avg loss 0.00190164, throughput 2.75365K wps
[Epoch 52 Batch 120/172] avg loss 0.00231076, throughput 2.65542K wps
[Epoch 52 Batch 150/172] avg loss 0.00230719, throughput 2.68311K wps
Begin Testing...
[Epoch 52] train avg loss 0.00223711, dev acc 0.8857, dev avg loss 0.326405, throughput 2.73005K wps
[Epoch 53 Batch 30/172] avg loss 0.00196733, throughput 2.65663K wps
[Epoch 53 Batch 60/172] avg loss 0.0022085, throughput 2.72832K wps
[Epoch 53 Batch 90/172] avg loss 0.00191535, throughput 2.78345K wps
[Epoch 53 Batch 120/172] avg loss 0.00213799, throughput 2.77631K wps
[Epoch 53 Batch 150/172] avg loss 0.00230772, throughput 2.78972K wps
Begin Testing...
[Epoch 53] train avg loss 0.00214429, dev acc 0.8878, dev avg loss 0.328913, throughput 2.75303K wps
[Epoch 54 Batch 30/172] avg loss 0.00223129, throughput 2.6102K wps
[Epoch 54 Batch 60/172] avg loss 0.00214438, throughput 2.88127K wps
[Epoch 54 Batch 90/172] avg loss 0.00207556, throughput 2.776K wps
[Epoch 54 Batch 120/172] avg loss 0.00210279, throughput 2.82603K wps
[Epoch 54 Batch 150/172] avg loss 0.00215984, throughput 2.82911K wps
Begin Testing...
[Epoch 54] train avg loss 0.00217739, dev acc 0.8878, dev avg loss 0.330366, throughput 2.78366K wps
[Epoch 55 Batch 30/172] avg loss 0.00201583, throughput 2.79883K wps
[Epoch 55 Batch 60/172] avg loss 0.00244946, throughput 2.79511K wps
[Epoch 55 Batch 90/172] avg loss 0.00198656, throughput 2.77294K wps
[Epoch 55 Batch 120/172] avg loss 0.00206872, throughput 2.71116K wps
[Epoch 55 Batch 150/172] avg loss 0.002114, throughput 2.67626K wps
Begin Testing...
[Epoch 55] train avg loss 0.00212801, dev acc 0.8878, dev avg loss 0.333886, throughput 2.75451K wps
[Epoch 56 Batch 30/172] avg loss 0.00198471, throughput 2.79266K wps
[Epoch 56 Batch 60/172] avg loss 0.00199314, throughput 2.64361K wps
[Epoch 56 Batch 90/172] avg loss 0.00205182, throughput 2.79232K wps
[Epoch 56 Batch 120/172] avg loss 0.0022627, throughput 2.7004K wps
[Epoch 56 Batch 150/172] avg loss 0.00221778, throughput 2.71228K wps
Begin Testing...
[Epoch 56] train avg loss 0.00210091, dev acc 0.8847, dev avg loss 0.334075, throughput 2.73977K wps
[Epoch 57 Batch 30/172] avg loss 0.00194093, throughput 2.84624K wps
[Epoch 57 Batch 60/172] avg loss 0.00220541, throughput 2.66809K wps
[Epoch 57 Batch 90/172] avg loss 0.00208045, throughput 2.86357K wps
[Epoch 57 Batch 120/172] avg loss 0.00180625, throughput 2.81407K wps
[Epoch 57 Batch 150/172] avg loss 0.00192389, throughput 2.73364K wps
Begin Testing...
[Epoch 57] train avg loss 0.00201998, dev acc 0.8857, dev avg loss 0.336855, throughput 2.78213K wps
[Epoch 58 Batch 30/172] avg loss 0.00187562, throughput 2.83075K wps
[Epoch 58 Batch 60/172] avg loss 0.00192588, throughput 2.80616K wps
[Epoch 58 Batch 90/172] avg loss 0.0020178, throughput 2.8225K wps
[Epoch 58 Batch 120/172] avg loss 0.00218222, throughput 2.72703K wps
[Epoch 58 Batch 150/172] avg loss 0.00202891, throughput 2.82809K wps
Begin Testing...
[Epoch 58] train avg loss 0.00202061, dev acc 0.8889, dev avg loss 0.339204, throughput 2.7872K wps
[Epoch 59 Batch 30/172] avg loss 0.00172511, throughput 2.8307K wps
[Epoch 59 Batch 60/172] avg loss 0.00190794, throughput 2.76886K wps
[Epoch 59 Batch 90/172] avg loss 0.00210442, throughput 2.748K wps
[Epoch 59 Batch 120/172] avg loss 0.00189828, throughput 2.8252K wps
[Epoch 59 Batch 150/172] avg loss 0.00188649, throughput 2.78142K wps
Begin Testing...
[Epoch 59] train avg loss 0.0019449, dev acc 0.8878, dev avg loss 0.341289, throughput 2.79233K wps
[Epoch 60 Batch 30/172] avg loss 0.0019332, throughput 2.87484K wps
[Epoch 60 Batch 60/172] avg loss 0.00184633, throughput 2.80786K wps
[Epoch 60 Batch 90/172] avg loss 0.0018978, throughput 2.80252K wps
[Epoch 60 Batch 120/172] avg loss 0.00184081, throughput 2.70492K wps
[Epoch 60 Batch 150/172] avg loss 0.00199578, throughput 2.87108K wps
Begin Testing...
[Epoch 60] train avg loss 0.00197439, dev acc 0.8868, dev avg loss 0.343335, throughput 2.81231K wps
[Epoch 61 Batch 30/172] avg loss 0.00174525, throughput 2.87777K wps
[Epoch 61 Batch 60/172] avg loss 0.00207963, throughput 2.79173K wps
[Epoch 61 Batch 90/172] avg loss 0.00180767, throughput 2.74363K wps
[Epoch 61 Batch 120/172] avg loss 0.00189769, throughput 2.80331K wps
[Epoch 61 Batch 150/172] avg loss 0.00198439, throughput 2.75917K wps
Begin Testing...
[Epoch 61] train avg loss 0.00191, dev acc 0.8805, dev avg loss 0.346627, throughput 2.76262K wps
[Epoch 62 Batch 30/172] avg loss 0.00201194, throughput 2.7396K wps
[Epoch 62 Batch 60/172] avg loss 0.00161729, throughput 2.80286K wps
[Epoch 62 Batch 90/172] avg loss 0.00183233, throughput 2.75034K wps
[Epoch 62 Batch 120/172] avg loss 0.00203997, throughput 2.70335K wps
[Epoch 62 Batch 150/172] avg loss 0.00171397, throughput 2.83167K wps
Begin Testing...
[Epoch 62] train avg loss 0.00187702, dev acc 0.8847, dev avg loss 0.346649, throughput 2.76154K wps
[Epoch 63 Batch 30/172] avg loss 0.0016041, throughput 2.82457K wps
[Epoch 63 Batch 60/172] avg loss 0.00177763, throughput 2.75028K wps
[Epoch 63 Batch 90/172] avg loss 0.00176549, throughput 2.6743K wps
[Epoch 63 Batch 120/172] avg loss 0.00205491, throughput 2.72378K wps
[Epoch 63 Batch 150/172] avg loss 0.00221258, throughput 2.74526K wps
Begin Testing...
[Epoch 63] train avg loss 0.0018734, dev acc 0.8795, dev avg loss 0.349192, throughput 2.74163K wps
[Epoch 64 Batch 30/172] avg loss 0.00165305, throughput 2.76873K wps
[Epoch 64 Batch 60/172] avg loss 0.00194648, throughput 2.615K wps
[Epoch 64 Batch 90/172] avg loss 0.0021055, throughput 2.77173K wps
[Epoch 64 Batch 120/172] avg loss 0.001753, throughput 2.74363K wps
[Epoch 64 Batch 150/172] avg loss 0.00183708, throughput 2.75051K wps
Begin Testing...
[Epoch 64] train avg loss 0.00189143, dev acc 0.8816, dev avg loss 0.351609, throughput 2.71822K wps
[Epoch 65 Batch 30/172] avg loss 0.00173569, throughput 2.75281K wps
[Epoch 65 Batch 60/172] avg loss 0.00181206, throughput 2.77107K wps
[Epoch 65 Batch 90/172] avg loss 0.00168031, throughput 2.77888K wps
[Epoch 65 Batch 120/172] avg loss 0.00154967, throughput 2.68479K wps
[Epoch 65 Batch 150/172] avg loss 0.00192339, throughput 2.74525K wps
Begin Testing...
[Epoch 65] train avg loss 0.00177968, dev acc 0.8847, dev avg loss 0.353055, throughput 2.74048K wps
[Epoch 66 Batch 30/172] avg loss 0.00176374, throughput 2.83295K wps
[Epoch 66 Batch 60/172] avg loss 0.00177616, throughput 2.80027K wps
[Epoch 66 Batch 90/172] avg loss 0.00164947, throughput 2.78517K wps
[Epoch 66 Batch 120/172] avg loss 0.00195168, throughput 2.78624K wps
[Epoch 66 Batch 150/172] avg loss 0.00161282, throughput 2.69722K wps
Begin Testing...
[Epoch 66] train avg loss 0.00178511, dev acc 0.8836, dev avg loss 0.35544, throughput 2.77795K wps
[Epoch 67 Batch 30/172] avg loss 0.00182345, throughput 2.78566K wps
[Epoch 67 Batch 60/172] avg loss 0.00160521, throughput 2.75694K wps
[Epoch 67 Batch 90/172] avg loss 0.00168405, throughput 2.79481K wps
[Epoch 67 Batch 120/172] avg loss 0.00204475, throughput 2.78228K wps
[Epoch 67 Batch 150/172] avg loss 0.00146018, throughput 2.80532K wps
Begin Testing...
[Epoch 67] train avg loss 0.00173022, dev acc 0.8836, dev avg loss 0.358349, throughput 2.78381K wps
[Epoch 68 Batch 30/172] avg loss 0.00195811, throughput 2.70627K wps
[Epoch 68 Batch 60/172] avg loss 0.00177741, throughput 2.84265K wps
[Epoch 68 Batch 90/172] avg loss 0.00175818, throughput 2.74887K wps
[Epoch 68 Batch 120/172] avg loss 0.00185755, throughput 2.81784K wps
[Epoch 68 Batch 150/172] avg loss 0.00162128, throughput 2.73968K wps
Begin Testing...
[Epoch 68] train avg loss 0.00176675, dev acc 0.8836, dev avg loss 0.359965, throughput 2.77102K wps
[Epoch 69 Batch 30/172] avg loss 0.00182288, throughput 2.74403K wps
[Epoch 69 Batch 60/172] avg loss 0.00157037, throughput 2.7914K wps
[Epoch 69 Batch 90/172] avg loss 0.00187809, throughput 2.82685K wps
[Epoch 69 Batch 120/172] avg loss 0.00164584, throughput 2.83549K wps
[Epoch 69 Batch 150/172] avg loss 0.00151764, throughput 2.79997K wps
Begin Testing...
[Epoch 69] train avg loss 0.00168048, dev acc 0.8816, dev avg loss 0.363552, throughput 2.78835K wps
[Epoch 70 Batch 30/172] avg loss 0.0015062, throughput 2.81092K wps
[Epoch 70 Batch 60/172] avg loss 0.00192474, throughput 2.75162K wps
[Epoch 70 Batch 90/172] avg loss 0.00134532, throughput 2.87455K wps
[Epoch 70 Batch 120/172] avg loss 0.00168231, throughput 2.75502K wps
[Epoch 70 Batch 150/172] avg loss 0.00175764, throughput 2.68687K wps
Begin Testing...
[Epoch 70] train avg loss 0.00165188, dev acc 0.8847, dev avg loss 0.363835, throughput 2.78616K wps
[Epoch 71 Batch 30/172] avg loss 0.00154343, throughput 2.86091K wps
[Epoch 71 Batch 60/172] avg loss 0.00166052, throughput 2.71625K wps
[Epoch 71 Batch 90/172] avg loss 0.00189672, throughput 2.81628K wps
[Epoch 71 Batch 120/172] avg loss 0.00170179, throughput 2.76177K wps
[Epoch 71 Batch 150/172] avg loss 0.00152478, throughput 2.74977K wps
Begin Testing...
[Epoch 71] train avg loss 0.00165504, dev acc 0.8899, dev avg loss 0.364081, throughput 2.78071K wps
[Epoch 72 Batch 30/172] avg loss 0.00171491, throughput 2.6979K wps
[Epoch 72 Batch 60/172] avg loss 0.00194141, throughput 2.78021K wps
[Epoch 72 Batch 90/172] avg loss 0.00152173, throughput 2.73395K wps
[Epoch 72 Batch 120/172] avg loss 0.00178248, throughput 2.63168K wps
[Epoch 72 Batch 150/172] avg loss 0.00144634, throughput 2.74438K wps
Begin Testing...
[Epoch 72] train avg loss 0.00167458, dev acc 0.8847, dev avg loss 0.366038, throughput 2.6959K wps
[Epoch 73 Batch 30/172] avg loss 0.00140024, throughput 2.80371K wps
[Epoch 73 Batch 60/172] avg loss 0.00178828, throughput 2.75498K wps
[Epoch 73 Batch 90/172] avg loss 0.00158017, throughput 2.72071K wps
[Epoch 73 Batch 120/172] avg loss 0.00159004, throughput 2.64244K wps
[Epoch 73 Batch 150/172] avg loss 0.00160307, throughput 2.71432K wps
Begin Testing...
[Epoch 73] train avg loss 0.00160948, dev acc 0.8868, dev avg loss 0.368089, throughput 2.73081K wps
[Epoch 74 Batch 30/172] avg loss 0.00146578, throughput 2.7332K wps
[Epoch 74 Batch 60/172] avg loss 0.00167035, throughput 2.73844K wps
[Epoch 74 Batch 90/172] avg loss 0.00151642, throughput 2.67683K wps
[Epoch 74 Batch 120/172] avg loss 0.00154968, throughput 2.85081K wps
[Epoch 74 Batch 150/172] avg loss 0.00150836, throughput 2.75939K wps
Begin Testing...
[Epoch 74] train avg loss 0.00153531, dev acc 0.8889, dev avg loss 0.372499, throughput 2.75437K wps
[Epoch 75 Batch 30/172] avg loss 0.00159506, throughput 2.81391K wps
[Epoch 75 Batch 60/172] avg loss 0.0014907, throughput 2.78789K wps
[Epoch 75 Batch 90/172] avg loss 0.00157091, throughput 2.72464K wps
[Epoch 75 Batch 120/172] avg loss 0.00146915, throughput 2.79557K wps
[Epoch 75 Batch 150/172] avg loss 0.00184657, throughput 2.77732K wps
Begin Testing...
[Epoch 75] train avg loss 0.00160126, dev acc 0.8847, dev avg loss 0.372596, throughput 2.78069K wps
[Epoch 76 Batch 30/172] avg loss 0.00150833, throughput 2.84709K wps
[Epoch 76 Batch 60/172] avg loss 0.00147829, throughput 2.7906K wps
[Epoch 76 Batch 90/172] avg loss 0.00144718, throughput 2.81529K wps
[Epoch 76 Batch 120/172] avg loss 0.00165165, throughput 2.74264K wps
[Epoch 76 Batch 150/172] avg loss 0.00142619, throughput 2.81121K wps
Begin Testing...
[Epoch 76] train avg loss 0.00151376, dev acc 0.8931, dev avg loss 0.376585, throughput 2.80246K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/172] avg loss 0.00167212, throughput 2.8353K wps
[Epoch 77 Batch 60/172] avg loss 0.00148518, throughput 2.75646K wps
[Epoch 77 Batch 90/172] avg loss 0.00168175, throughput 2.75555K wps
[Epoch 77 Batch 120/172] avg loss 0.00141058, throughput 2.83146K wps
[Epoch 77 Batch 150/172] avg loss 0.00152436, throughput 2.79839K wps
Begin Testing...
[Epoch 77] train avg loss 0.00154654, dev acc 0.8910, dev avg loss 0.37712, throughput 2.79959K wps
[Epoch 78 Batch 30/172] avg loss 0.00160286, throughput 2.81727K wps
[Epoch 78 Batch 60/172] avg loss 0.00168369, throughput 2.71411K wps
[Epoch 78 Batch 90/172] avg loss 0.00129293, throughput 2.75944K wps
[Epoch 78 Batch 120/172] avg loss 0.0013664, throughput 2.77066K wps
[Epoch 78 Batch 150/172] avg loss 0.00202232, throughput 2.6604K wps
Begin Testing...
[Epoch 78] train avg loss 0.00157785, dev acc 0.8816, dev avg loss 0.378549, throughput 2.74892K wps
[Epoch 79 Batch 30/172] avg loss 0.00172332, throughput 2.80372K wps
[Epoch 79 Batch 60/172] avg loss 0.00160626, throughput 2.76267K wps
[Epoch 79 Batch 90/172] avg loss 0.00134456, throughput 2.66884K wps
[Epoch 79 Batch 120/172] avg loss 0.00141271, throughput 2.77944K wps
[Epoch 79 Batch 150/172] avg loss 0.00145475, throughput 2.67466K wps
Begin Testing...
[Epoch 79] train avg loss 0.00152844, dev acc 0.8899, dev avg loss 0.380771, throughput 2.73889K wps
[Epoch 80 Batch 30/172] avg loss 0.00140728, throughput 2.79975K wps
[Epoch 80 Batch 60/172] avg loss 0.00163825, throughput 2.7806K wps
[Epoch 80 Batch 90/172] avg loss 0.00141132, throughput 2.74934K wps
[Epoch 80 Batch 120/172] avg loss 0.00132126, throughput 2.74172K wps
[Epoch 80 Batch 150/172] avg loss 0.0014711, throughput 2.68319K wps
Begin Testing...
[Epoch 80] train avg loss 0.00145593, dev acc 0.8857, dev avg loss 0.38232, throughput 2.75035K wps
[Epoch 81 Batch 30/172] avg loss 0.00139713, throughput 2.71312K wps
[Epoch 81 Batch 60/172] avg loss 0.00143137, throughput 2.74738K wps
[Epoch 81 Batch 90/172] avg loss 0.00156855, throughput 2.66578K wps
[Epoch 81 Batch 120/172] avg loss 0.00176013, throughput 2.80094K wps
[Epoch 81 Batch 150/172] avg loss 0.00140543, throughput 2.76093K wps
Begin Testing...
[Epoch 81] train avg loss 0.00149879, dev acc 0.8889, dev avg loss 0.383855, throughput 2.74471K wps
[Epoch 82 Batch 30/172] avg loss 0.00178724, throughput 2.8581K wps
[Epoch 82 Batch 60/172] avg loss 0.00116447, throughput 2.7925K wps
[Epoch 82 Batch 90/172] avg loss 0.00145176, throughput 2.7502K wps
[Epoch 82 Batch 120/172] avg loss 0.00138429, throughput 2.78562K wps
[Epoch 82 Batch 150/172] avg loss 0.00160479, throughput 2.79207K wps
Begin Testing...
[Epoch 82] train avg loss 0.00149629, dev acc 0.8878, dev avg loss 0.38652, throughput 2.79514K wps
[Epoch 83 Batch 30/172] avg loss 0.00147272, throughput 2.7998K wps
[Epoch 83 Batch 60/172] avg loss 0.00127546, throughput 2.80657K wps
[Epoch 83 Batch 90/172] avg loss 0.00163282, throughput 2.82016K wps
[Epoch 83 Batch 120/172] avg loss 0.00130277, throughput 2.69503K wps
[Epoch 83 Batch 150/172] avg loss 0.0015161, throughput 2.84661K wps
Begin Testing...
[Epoch 83] train avg loss 0.00148586, dev acc 0.8889, dev avg loss 0.385975, throughput 2.78338K wps
[Epoch 84 Batch 30/172] avg loss 0.00152489, throughput 2.68253K wps
[Epoch 84 Batch 60/172] avg loss 0.00146131, throughput 2.76671K wps
[Epoch 84 Batch 90/172] avg loss 0.00147142, throughput 2.71169K wps
[Epoch 84 Batch 120/172] avg loss 0.00121891, throughput 2.7162K wps
[Epoch 84 Batch 150/172] avg loss 0.00153248, throughput 2.84115K wps
Begin Testing...
[Epoch 84] train avg loss 0.00147863, dev acc 0.8889, dev avg loss 0.388373, throughput 2.74867K wps
[Epoch 85 Batch 30/172] avg loss 0.00134648, throughput 2.69762K wps
[Epoch 85 Batch 60/172] avg loss 0.00151125, throughput 2.69521K wps
[Epoch 85 Batch 90/172] avg loss 0.00125255, throughput 2.81267K wps
[Epoch 85 Batch 120/172] avg loss 0.00145843, throughput 2.78408K wps
[Epoch 85 Batch 150/172] avg loss 0.00129704, throughput 2.74021K wps
Begin Testing...
[Epoch 85] train avg loss 0.00140255, dev acc 0.8878, dev avg loss 0.390092, throughput 2.7474K wps
[Epoch 86 Batch 30/172] avg loss 0.00152227, throughput 2.65153K wps
[Epoch 86 Batch 60/172] avg loss 0.00127751, throughput 2.59108K wps
[Epoch 86 Batch 90/172] avg loss 0.00132623, throughput 2.71961K wps
[Epoch 86 Batch 120/172] avg loss 0.00140014, throughput 2.58063K wps
[Epoch 86 Batch 150/172] avg loss 0.00135505, throughput 2.76536K wps
Begin Testing...
[Epoch 86] train avg loss 0.00143449, dev acc 0.8878, dev avg loss 0.391, throughput 2.67344K wps
[Epoch 87 Batch 30/172] avg loss 0.00138645, throughput 2.80526K wps
[Epoch 87 Batch 60/172] avg loss 0.00169254, throughput 2.65932K wps
[Epoch 87 Batch 90/172] avg loss 0.00142567, throughput 2.6455K wps
[Epoch 87 Batch 120/172] avg loss 0.00134061, throughput 2.74837K wps
[Epoch 87 Batch 150/172] avg loss 0.00127094, throughput 2.80733K wps
Begin Testing...
[Epoch 87] train avg loss 0.00143281, dev acc 0.8868, dev avg loss 0.394145, throughput 2.73375K wps
[Epoch 88 Batch 30/172] avg loss 0.00136094, throughput 2.78689K wps
[Epoch 88 Batch 60/172] avg loss 0.00139783, throughput 2.60208K wps
[Epoch 88 Batch 90/172] avg loss 0.00137687, throughput 2.75042K wps
[Epoch 88 Batch 120/172] avg loss 0.00133316, throughput 2.82088K wps
[Epoch 88 Batch 150/172] avg loss 0.00126656, throughput 2.75497K wps
Begin Testing...
[Epoch 88] train avg loss 0.00137979, dev acc 0.8847, dev avg loss 0.395278, throughput 2.74862K wps
[Epoch 89 Batch 30/172] avg loss 0.00132648, throughput 2.81464K wps
[Epoch 89 Batch 60/172] avg loss 0.00126876, throughput 2.81938K wps
[Epoch 89 Batch 90/172] avg loss 0.00136917, throughput 2.81812K wps
[Epoch 89 Batch 120/172] avg loss 0.00155383, throughput 2.79495K wps
[Epoch 89 Batch 150/172] avg loss 0.00130261, throughput 2.7907K wps
Begin Testing...
[Epoch 89] train avg loss 0.00138093, dev acc 0.8878, dev avg loss 0.397339, throughput 2.80719K wps
[Epoch 90 Batch 30/172] avg loss 0.00149267, throughput 2.85461K wps
[Epoch 90 Batch 60/172] avg loss 0.00146995, throughput 2.77765K wps
[Epoch 90 Batch 90/172] avg loss 0.00115688, throughput 2.66601K wps
[Epoch 90 Batch 120/172] avg loss 0.00155898, throughput 2.66724K wps
[Epoch 90 Batch 150/172] avg loss 0.00125824, throughput 2.82507K wps
Begin Testing...
[Epoch 90] train avg loss 0.00136615, dev acc 0.8889, dev avg loss 0.40105, throughput 2.75756K wps
[Epoch 91 Batch 30/172] avg loss 0.00136178, throughput 2.78823K wps
[Epoch 91 Batch 60/172] avg loss 0.00118929, throughput 2.7953K wps
[Epoch 91 Batch 90/172] avg loss 0.00123316, throughput 2.66445K wps
[Epoch 91 Batch 120/172] avg loss 0.00149226, throughput 2.68886K wps
[Epoch 91 Batch 150/172] avg loss 0.00131568, throughput 2.7637K wps
Begin Testing...
[Epoch 91] train avg loss 0.00132791, dev acc 0.8889, dev avg loss 0.402772, throughput 2.74741K wps
[Epoch 92 Batch 30/172] avg loss 0.0012085, throughput 2.79495K wps
[Epoch 92 Batch 60/172] avg loss 0.0013056, throughput 2.80562K wps
[Epoch 92 Batch 90/172] avg loss 0.00122642, throughput 2.65696K wps
[Epoch 92 Batch 120/172] avg loss 0.00152714, throughput 2.79695K wps
[Epoch 92 Batch 150/172] avg loss 0.00163498, throughput 2.76607K wps
Begin Testing...
[Epoch 92] train avg loss 0.00137779, dev acc 0.8847, dev avg loss 0.402908, throughput 2.76459K wps
[Epoch 93 Batch 30/172] avg loss 0.00119893, throughput 2.64748K wps
[Epoch 93 Batch 60/172] avg loss 0.00143772, throughput 2.69861K wps
[Epoch 93 Batch 90/172] avg loss 0.00137164, throughput 2.67241K wps
[Epoch 93 Batch 120/172] avg loss 0.00133198, throughput 2.79898K wps
[Epoch 93 Batch 150/172] avg loss 0.00117524, throughput 2.76719K wps
Begin Testing...
[Epoch 93] train avg loss 0.00131453, dev acc 0.8836, dev avg loss 0.403464, throughput 2.69823K wps
[Epoch 94 Batch 30/172] avg loss 0.00134164, throughput 2.82016K wps
[Epoch 94 Batch 60/172] avg loss 0.00123784, throughput 2.57852K wps
[Epoch 94 Batch 90/172] avg loss 0.00139599, throughput 2.66095K wps
[Epoch 94 Batch 120/172] avg loss 0.00120821, throughput 2.70206K wps
[Epoch 94 Batch 150/172] avg loss 0.00130064, throughput 2.76139K wps
Begin Testing...
[Epoch 94] train avg loss 0.00132449, dev acc 0.8836, dev avg loss 0.405721, throughput 2.7107K wps
[Epoch 95 Batch 30/172] avg loss 0.00116219, throughput 2.70638K wps
[Epoch 95 Batch 60/172] avg loss 0.00118925, throughput 2.79135K wps
[Epoch 95 Batch 90/172] avg loss 0.00120481, throughput 2.77384K wps
[Epoch 95 Batch 120/172] avg loss 0.00150928, throughput 2.76399K wps
[Epoch 95 Batch 150/172] avg loss 0.00116408, throughput 2.65901K wps
Begin Testing...
[Epoch 95] train avg loss 0.00127912, dev acc 0.8784, dev avg loss 0.413517, throughput 2.7514K wps
[Epoch 96 Batch 30/172] avg loss 0.00125707, throughput 2.78496K wps
[Epoch 96 Batch 60/172] avg loss 0.00130964, throughput 2.78677K wps
[Epoch 96 Batch 90/172] avg loss 0.00117618, throughput 2.57006K wps
[Epoch 96 Batch 120/172] avg loss 0.0014007, throughput 2.74603K wps
[Epoch 96 Batch 150/172] avg loss 0.0012577, throughput 2.77478K wps
Begin Testing...
[Epoch 96] train avg loss 0.00128513, dev acc 0.8889, dev avg loss 0.407459, throughput 2.73848K wps
[Epoch 97 Batch 30/172] avg loss 0.00116609, throughput 2.76735K wps
[Epoch 97 Batch 60/172] avg loss 0.0011987, throughput 2.72487K wps
[Epoch 97 Batch 90/172] avg loss 0.0011872, throughput 2.69717K wps
[Epoch 97 Batch 120/172] avg loss 0.00135051, throughput 2.76252K wps
[Epoch 97 Batch 150/172] avg loss 0.00117055, throughput 2.69211K wps
Begin Testing...
[Epoch 97] train avg loss 0.00125284, dev acc 0.8857, dev avg loss 0.410138, throughput 2.7342K wps
[Epoch 98 Batch 30/172] avg loss 0.00120379, throughput 2.75841K wps
[Epoch 98 Batch 60/172] avg loss 0.0011362, throughput 2.66527K wps
[Epoch 98 Batch 90/172] avg loss 0.00112493, throughput 2.73737K wps
[Epoch 98 Batch 120/172] avg loss 0.00145564, throughput 2.78349K wps
[Epoch 98 Batch 150/172] avg loss 0.00128279, throughput 2.5878K wps
Begin Testing...
[Epoch 98] train avg loss 0.00125167, dev acc 0.8878, dev avg loss 0.415865, throughput 2.72729K wps
[Epoch 99 Batch 30/172] avg loss 0.00110864, throughput 2.73371K wps
[Epoch 99 Batch 60/172] avg loss 0.00116646, throughput 2.67473K wps
[Epoch 99 Batch 90/172] avg loss 0.00117021, throughput 2.69669K wps
[Epoch 99 Batch 120/172] avg loss 0.00136263, throughput 2.78739K wps
[Epoch 99 Batch 150/172] avg loss 0.00146384, throughput 2.75129K wps
Begin Testing...
[Epoch 99] train avg loss 0.0012918, dev acc 0.8826, dev avg loss 0.412683, throughput 2.71032K wps
[Epoch 100 Batch 30/172] avg loss 0.00105371, throughput 2.75041K wps
[Epoch 100 Batch 60/172] avg loss 0.00125296, throughput 2.64612K wps
[Epoch 100 Batch 90/172] avg loss 0.00139293, throughput 2.73207K wps
[Epoch 100 Batch 120/172] avg loss 0.00124981, throughput 2.78104K wps
[Epoch 100 Batch 150/172] avg loss 0.00156855, throughput 2.6924K wps
Begin Testing...
[Epoch 100] train avg loss 0.00127694, dev acc 0.8878, dev avg loss 0.414616, throughput 2.72221K wps
[Epoch 101 Batch 30/172] avg loss 0.00134468, throughput 2.64622K wps
[Epoch 101 Batch 60/172] avg loss 0.00146366, throughput 2.77027K wps
[Epoch 101 Batch 90/172] avg loss 0.00101175, throughput 2.75748K wps
[Epoch 101 Batch 120/172] avg loss 0.00112191, throughput 2.76683K wps
[Epoch 101 Batch 150/172] avg loss 0.00132602, throughput 2.74673K wps
Begin Testing...
[Epoch 101] train avg loss 0.00125725, dev acc 0.8857, dev avg loss 0.419204, throughput 2.74359K wps
[Epoch 102 Batch 30/172] avg loss 0.00128895, throughput 2.79627K wps
[Epoch 102 Batch 60/172] avg loss 0.0010466, throughput 2.75053K wps
[Epoch 102 Batch 90/172] avg loss 0.00129874, throughput 2.66764K wps
[Epoch 102 Batch 120/172] avg loss 0.00112945, throughput 2.77702K wps
[Epoch 102 Batch 150/172] avg loss 0.00123435, throughput 2.76437K wps
Begin Testing...
[Epoch 102] train avg loss 0.00121117, dev acc 0.8826, dev avg loss 0.417505, throughput 2.73933K wps
[Epoch 103 Batch 30/172] avg loss 0.00112081, throughput 2.80294K wps
[Epoch 103 Batch 60/172] avg loss 0.00134898, throughput 2.75349K wps
[Epoch 103 Batch 90/172] avg loss 0.00134, throughput 2.77276K wps
[Epoch 103 Batch 120/172] avg loss 0.001027, throughput 2.79351K wps
[Epoch 103 Batch 150/172] avg loss 0.00126039, throughput 2.76712K wps
Begin Testing...
[Epoch 103] train avg loss 0.00121287, dev acc 0.8836, dev avg loss 0.420279, throughput 2.7469K wps
[Epoch 104 Batch 30/172] avg loss 0.00122225, throughput 2.78141K wps
[Epoch 104 Batch 60/172] avg loss 0.000921789, throughput 2.78028K wps
[Epoch 104 Batch 90/172] avg loss 0.00121397, throughput 2.77262K wps
[Epoch 104 Batch 120/172] avg loss 0.00125493, throughput 2.7938K wps
[Epoch 104 Batch 150/172] avg loss 0.00105928, throughput 2.76759K wps
Begin Testing...
[Epoch 104] train avg loss 0.00118983, dev acc 0.8847, dev avg loss 0.42143, throughput 2.78252K wps
[Epoch 105 Batch 30/172] avg loss 0.0011242, throughput 2.77465K wps
[Epoch 105 Batch 60/172] avg loss 0.00114018, throughput 2.87808K wps
[Epoch 105 Batch 90/172] avg loss 0.000999832, throughput 2.81599K wps
[Epoch 105 Batch 120/172] avg loss 0.0013565, throughput 2.79874K wps
[Epoch 105 Batch 150/172] avg loss 0.00110884, throughput 2.78802K wps
Begin Testing...
[Epoch 105] train avg loss 0.00117838, dev acc 0.8816, dev avg loss 0.421509, throughput 2.80818K wps
[Epoch 106 Batch 30/172] avg loss 0.000880461, throughput 2.7837K wps
[Epoch 106 Batch 60/172] avg loss 0.00115323, throughput 2.80995K wps
[Epoch 106 Batch 90/172] avg loss 0.00116588, throughput 2.80143K wps
[Epoch 106 Batch 120/172] avg loss 0.00148702, throughput 2.6647K wps
[Epoch 106 Batch 150/172] avg loss 0.00133946, throughput 2.82044K wps
Begin Testing...
[Epoch 106] train avg loss 0.00120909, dev acc 0.8836, dev avg loss 0.423026, throughput 2.74741K wps
[Epoch 107 Batch 30/172] avg loss 0.00106259, throughput 2.75884K wps
[Epoch 107 Batch 60/172] avg loss 0.00116906, throughput 2.64393K wps
[Epoch 107 Batch 90/172] avg loss 0.00125096, throughput 2.76966K wps
[Epoch 107 Batch 120/172] avg loss 0.00101352, throughput 2.72293K wps
[Epoch 107 Batch 150/172] avg loss 0.00122006, throughput 2.79295K wps
Begin Testing...
[Epoch 107] train avg loss 0.00112614, dev acc 0.8816, dev avg loss 0.427937, throughput 2.74204K wps
[Epoch 108 Batch 30/172] avg loss 0.00111463, throughput 2.76829K wps
[Epoch 108 Batch 60/172] avg loss 0.00126729, throughput 2.72491K wps
[Epoch 108 Batch 90/172] avg loss 0.00123233, throughput 2.69558K wps
[Epoch 108 Batch 120/172] avg loss 0.00114289, throughput 2.6278K wps
[Epoch 108 Batch 150/172] avg loss 0.00131753, throughput 2.728K wps
Begin Testing...
[Epoch 108] train avg loss 0.00120139, dev acc 0.8857, dev avg loss 0.427391, throughput 2.71765K wps
[Epoch 109 Batch 30/172] avg loss 0.000998304, throughput 2.71191K wps
[Epoch 109 Batch 60/172] avg loss 0.00123769, throughput 2.7555K wps
[Epoch 109 Batch 90/172] avg loss 0.00102676, throughput 2.58913K wps
[Epoch 109 Batch 120/172] avg loss 0.00117533, throughput 2.75882K wps
[Epoch 109 Batch 150/172] avg loss 0.00124365, throughput 2.76221K wps
Begin Testing...
[Epoch 109] train avg loss 0.00118748, dev acc 0.8836, dev avg loss 0.425325, throughput 2.71743K wps
[Epoch 110 Batch 30/172] avg loss 0.00115613, throughput 2.82318K wps
[Epoch 110 Batch 60/172] avg loss 0.00105401, throughput 2.76596K wps
[Epoch 110 Batch 90/172] avg loss 0.00116892, throughput 2.76878K wps
[Epoch 110 Batch 120/172] avg loss 0.00104365, throughput 2.5747K wps
[Epoch 110 Batch 150/172] avg loss 0.0012486, throughput 2.77212K wps
Begin Testing...
[Epoch 110] train avg loss 0.0011373, dev acc 0.8847, dev avg loss 0.427761, throughput 2.74294K wps
[Epoch 111 Batch 30/172] avg loss 0.0012986, throughput 2.67521K wps
[Epoch 111 Batch 60/172] avg loss 0.00112542, throughput 2.60384K wps
[Epoch 111 Batch 90/172] avg loss 0.00124321, throughput 2.62221K wps
[Epoch 111 Batch 120/172] avg loss 0.000972065, throughput 2.86092K wps
[Epoch 111 Batch 150/172] avg loss 0.00121746, throughput 2.66232K wps
Begin Testing...
[Epoch 111] train avg loss 0.00117123, dev acc 0.8868, dev avg loss 0.432847, throughput 2.69788K wps
[Epoch 112 Batch 30/172] avg loss 0.00124339, throughput 2.8507K wps
[Epoch 112 Batch 60/172] avg loss 0.00107696, throughput 2.78715K wps
[Epoch 112 Batch 90/172] avg loss 0.00122267, throughput 2.8135K wps
[Epoch 112 Batch 120/172] avg loss 0.00103359, throughput 2.68734K wps
[Epoch 112 Batch 150/172] avg loss 0.00107585, throughput 2.85322K wps
Begin Testing...
[Epoch 112] train avg loss 0.00114292, dev acc 0.8836, dev avg loss 0.433799, throughput 2.79601K wps
[Epoch 113 Batch 30/172] avg loss 0.00115405, throughput 2.85752K wps
[Epoch 113 Batch 60/172] avg loss 0.0010621, throughput 2.79814K wps
[Epoch 113 Batch 90/172] avg loss 0.00100035, throughput 2.78199K wps
[Epoch 113 Batch 120/172] avg loss 0.00112438, throughput 2.77122K wps
[Epoch 113 Batch 150/172] avg loss 0.00135592, throughput 2.7704K wps
Begin Testing...
[Epoch 113] train avg loss 0.00112883, dev acc 0.8816, dev avg loss 0.434024, throughput 2.79347K wps
[Epoch 114 Batch 30/172] avg loss 0.00104896, throughput 2.85246K wps
[Epoch 114 Batch 60/172] avg loss 0.00135103, throughput 2.69402K wps
[Epoch 114 Batch 90/172] avg loss 0.00120952, throughput 2.78426K wps
[Epoch 114 Batch 120/172] avg loss 0.000987132, throughput 2.72058K wps
[Epoch 114 Batch 150/172] avg loss 0.000964843, throughput 2.69157K wps
Begin Testing...
[Epoch 114] train avg loss 0.00109675, dev acc 0.8857, dev avg loss 0.438308, throughput 2.74931K wps
[Epoch 115 Batch 30/172] avg loss 0.000861865, throughput 2.76539K wps
[Epoch 115 Batch 60/172] avg loss 0.00120876, throughput 2.60399K wps
[Epoch 115 Batch 90/172] avg loss 0.0010964, throughput 2.79818K wps
[Epoch 115 Batch 120/172] avg loss 0.00117921, throughput 2.77613K wps
[Epoch 115 Batch 150/172] avg loss 0.0011393, throughput 2.76917K wps
Begin Testing...
[Epoch 115] train avg loss 0.00112597, dev acc 0.8836, dev avg loss 0.442855, throughput 2.72917K wps
[Epoch 116 Batch 30/172] avg loss 0.0010742, throughput 2.70341K wps
[Epoch 116 Batch 60/172] avg loss 0.00101319, throughput 2.66742K wps
[Epoch 116 Batch 90/172] avg loss 0.00109939, throughput 2.7689K wps
[Epoch 116 Batch 120/172] avg loss 0.00114469, throughput 2.76832K wps
[Epoch 116 Batch 150/172] avg loss 0.0011247, throughput 2.74898K wps
Begin Testing...
[Epoch 116] train avg loss 0.00110836, dev acc 0.8847, dev avg loss 0.438489, throughput 2.71991K wps
[Epoch 117 Batch 30/172] avg loss 0.000998791, throughput 2.78113K wps
[Epoch 117 Batch 60/172] avg loss 0.00112701, throughput 2.68036K wps
[Epoch 117 Batch 90/172] avg loss 0.00110589, throughput 2.82794K wps
[Epoch 117 Batch 120/172] avg loss 0.000855219, throughput 2.69829K wps
[Epoch 117 Batch 150/172] avg loss 0.0012236, throughput 2.79867K wps
Begin Testing...
[Epoch 117] train avg loss 0.00108941, dev acc 0.8826, dev avg loss 0.438082, throughput 2.75987K wps
[Epoch 118 Batch 30/172] avg loss 0.000870535, throughput 2.81998K wps
[Epoch 118 Batch 60/172] avg loss 0.00133311, throughput 2.73297K wps
[Epoch 118 Batch 90/172] avg loss 0.00110249, throughput 2.76308K wps
[Epoch 118 Batch 120/172] avg loss 0.00117949, throughput 2.7193K wps
[Epoch 118 Batch 150/172] avg loss 0.00109752, throughput 2.74567K wps
Begin Testing...
[Epoch 118] train avg loss 0.00111789, dev acc 0.8868, dev avg loss 0.439404, throughput 2.7507K wps
[Epoch 119 Batch 30/172] avg loss 0.0010363, throughput 2.83719K wps
[Epoch 119 Batch 60/172] avg loss 0.00105124, throughput 2.75529K wps
[Epoch 119 Batch 90/172] avg loss 0.00127443, throughput 2.76874K wps
[Epoch 119 Batch 120/172] avg loss 0.000942164, throughput 2.79149K wps
[Epoch 119 Batch 150/172] avg loss 0.00103449, throughput 2.81169K wps
Begin Testing...
[Epoch 119] train avg loss 0.00107917, dev acc 0.8836, dev avg loss 0.440398, throughput 2.77997K wps
[Epoch 120 Batch 30/172] avg loss 0.00121603, throughput 2.87687K wps
[Epoch 120 Batch 60/172] avg loss 0.00104474, throughput 2.81646K wps
[Epoch 120 Batch 90/172] avg loss 0.0011351, throughput 2.73207K wps
[Epoch 120 Batch 120/172] avg loss 0.00122508, throughput 2.861K wps
[Epoch 120 Batch 150/172] avg loss 0.000881773, throughput 2.79823K wps
Begin Testing...
[Epoch 120] train avg loss 0.00110243, dev acc 0.8857, dev avg loss 0.442955, throughput 2.80691K wps
[Epoch 121 Batch 30/172] avg loss 0.000881542, throughput 2.74167K wps
[Epoch 121 Batch 60/172] avg loss 0.000942257, throughput 2.72376K wps
[Epoch 121 Batch 90/172] avg loss 0.00113184, throughput 2.72361K wps
[Epoch 121 Batch 120/172] avg loss 0.0010436, throughput 2.75599K wps
[Epoch 121 Batch 150/172] avg loss 0.00116825, throughput 2.7015K wps
Begin Testing...
[Epoch 121] train avg loss 0.00105709, dev acc 0.8836, dev avg loss 0.445979, throughput 2.71392K wps
[Epoch 122 Batch 30/172] avg loss 0.000909478, throughput 2.70764K wps
[Epoch 122 Batch 60/172] avg loss 0.00103233, throughput 2.75003K wps
[Epoch 122 Batch 90/172] avg loss 0.00115168, throughput 2.74741K wps
[Epoch 122 Batch 120/172] avg loss 0.00119612, throughput 2.75051K wps
[Epoch 122 Batch 150/172] avg loss 0.00115365, throughput 2.70338K wps
Begin Testing...
[Epoch 122] train avg loss 0.00107505, dev acc 0.8847, dev avg loss 0.444439, throughput 2.74003K wps
[Epoch 123 Batch 30/172] avg loss 0.00107769, throughput 2.73227K wps
[Epoch 123 Batch 60/172] avg loss 0.000917169, throughput 2.62321K wps
[Epoch 123 Batch 90/172] avg loss 0.00100183, throughput 2.81579K wps
[Epoch 123 Batch 120/172] avg loss 0.000955776, throughput 2.66974K wps
[Epoch 123 Batch 150/172] avg loss 0.00116053, throughput 2.77833K wps
Begin Testing...
[Epoch 123] train avg loss 0.00103896, dev acc 0.8816, dev avg loss 0.442714, throughput 2.72656K wps
[Epoch 124 Batch 30/172] avg loss 0.00098926, throughput 2.73054K wps
[Epoch 124 Batch 60/172] avg loss 0.000939163, throughput 2.796K wps
[Epoch 124 Batch 90/172] avg loss 0.000876691, throughput 2.65526K wps
[Epoch 124 Batch 120/172] avg loss 0.00104535, throughput 2.64459K wps
[Epoch 124 Batch 150/172] avg loss 0.00111444, throughput 2.80654K wps
Begin Testing...
[Epoch 124] train avg loss 0.0010338, dev acc 0.8784, dev avg loss 0.44693, throughput 2.73325K wps
[Epoch 125 Batch 30/172] avg loss 0.000917573, throughput 2.81556K wps
[Epoch 125 Batch 60/172] avg loss 0.000955751, throughput 2.76976K wps
[Epoch 125 Batch 90/172] avg loss 0.00106708, throughput 2.63394K wps
[Epoch 125 Batch 120/172] avg loss 0.000899873, throughput 2.75181K wps
[Epoch 125 Batch 150/172] avg loss 0.00119467, throughput 2.69629K wps
Begin Testing...
[Epoch 125] train avg loss 0.00104429, dev acc 0.8805, dev avg loss 0.445, throughput 2.74081K wps
[Epoch 126 Batch 30/172] avg loss 0.00098032, throughput 2.86803K wps
[Epoch 126 Batch 60/172] avg loss 0.00111292, throughput 2.69909K wps
[Epoch 126 Batch 90/172] avg loss 0.00118745, throughput 2.85311K wps
[Epoch 126 Batch 120/172] avg loss 0.00107802, throughput 2.81564K wps
[Epoch 126 Batch 150/172] avg loss 0.000993639, throughput 2.78385K wps
Begin Testing...
[Epoch 126] train avg loss 0.00105046, dev acc 0.8836, dev avg loss 0.455582, throughput 2.80259K wps
[Epoch 127 Batch 30/172] avg loss 0.00101941, throughput 2.82085K wps
[Epoch 127 Batch 60/172] avg loss 0.00113657, throughput 2.80285K wps
[Epoch 127 Batch 90/172] avg loss 0.000962575, throughput 2.70992K wps
[Epoch 127 Batch 120/172] avg loss 0.000990427, throughput 2.80547K wps
[Epoch 127 Batch 150/172] avg loss 0.000861138, throughput 2.7749K wps
Begin Testing...
[Epoch 127] train avg loss 0.00100373, dev acc 0.8847, dev avg loss 0.452127, throughput 2.77677K wps
[Epoch 128 Batch 30/172] avg loss 0.000872104, throughput 2.75222K wps
[Epoch 128 Batch 60/172] avg loss 0.000972903, throughput 2.77322K wps
[Epoch 128 Batch 90/172] avg loss 0.00105288, throughput 2.66875K wps
[Epoch 128 Batch 120/172] avg loss 0.001082, throughput 2.76496K wps
[Epoch 128 Batch 150/172] avg loss 0.00116236, throughput 2.70468K wps
Begin Testing...
[Epoch 128] train avg loss 0.00102142, dev acc 0.8784, dev avg loss 0.453501, throughput 2.7381K wps
[Epoch 129 Batch 30/172] avg loss 0.000956641, throughput 2.82653K wps
[Epoch 129 Batch 60/172] avg loss 0.00101564, throughput 2.69174K wps
[Epoch 129 Batch 90/172] avg loss 0.000899015, throughput 2.76945K wps
[Epoch 129 Batch 120/172] avg loss 0.00101695, throughput 2.75625K wps
[Epoch 129 Batch 150/172] avg loss 0.00102828, throughput 2.77754K wps
Begin Testing...
[Epoch 129] train avg loss 0.00100789, dev acc 0.8816, dev avg loss 0.464231, throughput 2.76097K wps
[Epoch 130 Batch 30/172] avg loss 0.0009192, throughput 2.82276K wps
[Epoch 130 Batch 60/172] avg loss 0.000983284, throughput 2.77633K wps
[Epoch 130 Batch 90/172] avg loss 0.00092545, throughput 2.64329K wps
[Epoch 130 Batch 120/172] avg loss 0.00129318, throughput 2.81601K wps
[Epoch 130 Batch 150/172] avg loss 0.00109455, throughput 2.75225K wps
Begin Testing...
[Epoch 130] train avg loss 0.00101593, dev acc 0.8805, dev avg loss 0.457687, throughput 2.75416K wps
[Epoch 131 Batch 30/172] avg loss 0.00105034, throughput 2.837K wps
[Epoch 131 Batch 60/172] avg loss 0.000926347, throughput 2.61692K wps
[Epoch 131 Batch 90/172] avg loss 0.00099325, throughput 2.73157K wps
[Epoch 131 Batch 120/172] avg loss 0.00102529, throughput 2.62537K wps
[Epoch 131 Batch 150/172] avg loss 0.00103984, throughput 2.74427K wps
Begin Testing...
[Epoch 131] train avg loss 0.000997616, dev acc 0.8816, dev avg loss 0.457287, throughput 2.71168K wps
[Epoch 132 Batch 30/172] avg loss 0.000995141, throughput 2.58666K wps
[Epoch 132 Batch 60/172] avg loss 0.00114295, throughput 2.75388K wps
[Epoch 132 Batch 90/172] avg loss 0.00108546, throughput 2.62589K wps
[Epoch 132 Batch 120/172] avg loss 0.000862614, throughput 2.73794K wps
[Epoch 132 Batch 150/172] avg loss 0.000885448, throughput 2.59139K wps
Begin Testing...
[Epoch 132] train avg loss 0.000996749, dev acc 0.8826, dev avg loss 0.459406, throughput 2.65787K wps
[Epoch 133 Batch 30/172] avg loss 0.00087455, throughput 2.69752K wps
[Epoch 133 Batch 60/172] avg loss 0.00102721, throughput 2.76821K wps
[Epoch 133 Batch 90/172] avg loss 0.000896684, throughput 2.75177K wps
[Epoch 133 Batch 120/172] avg loss 0.0010813, throughput 2.77961K wps
[Epoch 133 Batch 150/172] avg loss 0.00106936, throughput 2.72472K wps
Begin Testing...
[Epoch 133] train avg loss 0.000995765, dev acc 0.8847, dev avg loss 0.45914, throughput 2.74718K wps
[Epoch 134 Batch 30/172] avg loss 0.000899801, throughput 2.77878K wps
[Epoch 134 Batch 60/172] avg loss 0.000868813, throughput 2.73887K wps
[Epoch 134 Batch 90/172] avg loss 0.000993468, throughput 2.86012K wps
[Epoch 134 Batch 120/172] avg loss 0.000988829, throughput 2.74887K wps
[Epoch 134 Batch 150/172] avg loss 0.00127771, throughput 2.69453K wps
Begin Testing...
[Epoch 134] train avg loss 0.000977262, dev acc 0.8836, dev avg loss 0.463009, throughput 2.75999K wps
[Epoch 135 Batch 30/172] avg loss 0.000935538, throughput 2.84846K wps
[Epoch 135 Batch 60/172] avg loss 0.00101134, throughput 2.74877K wps
[Epoch 135 Batch 90/172] avg loss 0.000840814, throughput 2.76406K wps
[Epoch 135 Batch 120/172] avg loss 0.000928509, throughput 2.79688K wps
[Epoch 135 Batch 150/172] avg loss 0.00127695, throughput 2.76939K wps
Begin Testing...
[Epoch 135] train avg loss 0.000996772, dev acc 0.8816, dev avg loss 0.463709, throughput 2.77983K wps
[Epoch 136 Batch 30/172] avg loss 0.000839854, throughput 2.78731K wps
[Epoch 136 Batch 60/172] avg loss 0.00106683, throughput 2.80668K wps
[Epoch 136 Batch 90/172] avg loss 0.000938322, throughput 2.78253K wps
[Epoch 136 Batch 120/172] avg loss 0.000936419, throughput 2.67418K wps
[Epoch 136 Batch 150/172] avg loss 0.00113248, throughput 2.67542K wps
Begin Testing...
[Epoch 136] train avg loss 0.000977835, dev acc 0.8816, dev avg loss 0.463821, throughput 2.72755K wps
[Epoch 137 Batch 30/172] avg loss 0.000980282, throughput 2.67149K wps
[Epoch 137 Batch 60/172] avg loss 0.000873493, throughput 2.63024K wps
[Epoch 137 Batch 90/172] avg loss 0.000913335, throughput 2.75567K wps
[Epoch 137 Batch 120/172] avg loss 0.00117557, throughput 2.74933K wps
[Epoch 137 Batch 150/172] avg loss 0.000875929, throughput 2.71322K wps
Begin Testing...
[Epoch 137] train avg loss 0.000987623, dev acc 0.8805, dev avg loss 0.461357, throughput 2.68799K wps
[Epoch 138 Batch 30/172] avg loss 0.000757785, throughput 2.74316K wps
[Epoch 138 Batch 60/172] avg loss 0.000921189, throughput 2.7392K wps
[Epoch 138 Batch 90/172] avg loss 0.000916035, throughput 2.74839K wps
[Epoch 138 Batch 120/172] avg loss 0.00118665, throughput 2.66987K wps
[Epoch 138 Batch 150/172] avg loss 0.00106325, throughput 2.73408K wps
Begin Testing...
[Epoch 138] train avg loss 0.000966843, dev acc 0.8836, dev avg loss 0.464229, throughput 2.69897K wps
[Epoch 139 Batch 30/172] avg loss 0.0010155, throughput 2.80219K wps
[Epoch 139 Batch 60/172] avg loss 0.00105763, throughput 2.74188K wps
[Epoch 139 Batch 90/172] avg loss 0.000894631, throughput 2.70619K wps
[Epoch 139 Batch 120/172] avg loss 0.000866374, throughput 2.69887K wps
[Epoch 139 Batch 150/172] avg loss 0.00100509, throughput 2.76051K wps
Begin Testing...
[Epoch 139] train avg loss 0.000990661, dev acc 0.8784, dev avg loss 0.474769, throughput 2.73754K wps
[Epoch 140 Batch 30/172] avg loss 0.000729789, throughput 2.68503K wps
[Epoch 140 Batch 60/172] avg loss 0.00108121, throughput 2.68976K wps
[Epoch 140 Batch 90/172] avg loss 0.0008635, throughput 2.7622K wps
[Epoch 140 Batch 120/172] avg loss 0.000996771, throughput 2.73689K wps
[Epoch 140 Batch 150/172] avg loss 0.00110223, throughput 2.6441K wps
Begin Testing...
[Epoch 140] train avg loss 0.000963988, dev acc 0.8836, dev avg loss 0.463248, throughput 2.71948K wps
[Epoch 141 Batch 30/172] avg loss 0.000917509, throughput 2.78471K wps
[Epoch 141 Batch 60/172] avg loss 0.000941781, throughput 2.67285K wps
[Epoch 141 Batch 90/172] avg loss 0.000866924, throughput 2.68311K wps
[Epoch 141 Batch 120/172] avg loss 0.00107627, throughput 2.72308K wps
[Epoch 141 Batch 150/172] avg loss 0.0010928, throughput 2.72763K wps
Begin Testing...
[Epoch 141] train avg loss 0.000966701, dev acc 0.8795, dev avg loss 0.466666, throughput 2.72201K wps
[Epoch 142 Batch 30/172] avg loss 0.00100998, throughput 2.79843K wps
[Epoch 142 Batch 60/172] avg loss 0.000828522, throughput 2.71587K wps
[Epoch 142 Batch 90/172] avg loss 0.000927091, throughput 2.81838K wps
[Epoch 142 Batch 120/172] avg loss 0.000993486, throughput 2.75636K wps
[Epoch 142 Batch 150/172] avg loss 0.000793366, throughput 2.81713K wps
Begin Testing...
[Epoch 142] train avg loss 0.000977016, dev acc 0.8784, dev avg loss 0.467605, throughput 2.78675K wps
[Epoch 143 Batch 30/172] avg loss 0.0011566, throughput 2.80945K wps
[Epoch 143 Batch 60/172] avg loss 0.000867805, throughput 2.79413K wps
[Epoch 143 Batch 90/172] avg loss 0.000888303, throughput 2.84253K wps
[Epoch 143 Batch 120/172] avg loss 0.000921825, throughput 2.79428K wps
[Epoch 143 Batch 150/172] avg loss 0.000995727, throughput 2.78852K wps
Begin Testing...
[Epoch 143] train avg loss 0.000945557, dev acc 0.8826, dev avg loss 0.47204, throughput 2.7977K wps
[Epoch 144 Batch 30/172] avg loss 0.000958767, throughput 2.85132K wps
[Epoch 144 Batch 60/172] avg loss 0.000875519, throughput 2.76693K wps
[Epoch 144 Batch 90/172] avg loss 0.000946093, throughput 2.77127K wps
[Epoch 144 Batch 120/172] avg loss 0.000928899, throughput 2.66327K wps
[Epoch 144 Batch 150/172] avg loss 0.00104807, throughput 2.79627K wps
Begin Testing...
[Epoch 144] train avg loss 0.000962725, dev acc 0.8826, dev avg loss 0.476391, throughput 2.77025K wps
[Epoch 145 Batch 30/172] avg loss 0.000745263, throughput 2.67354K wps
[Epoch 145 Batch 60/172] avg loss 0.000964415, throughput 2.59481K wps
[Epoch 145 Batch 90/172] avg loss 0.000915074, throughput 2.79415K wps
[Epoch 145 Batch 120/172] avg loss 0.000945413, throughput 2.75227K wps
[Epoch 145 Batch 150/172] avg loss 0.0012348, throughput 2.77347K wps
Begin Testing...
[Epoch 145] train avg loss 0.000981904, dev acc 0.8795, dev avg loss 0.47143, throughput 2.72206K wps
[Epoch 146 Batch 30/172] avg loss 0.000868591, throughput 2.62138K wps
[Epoch 146 Batch 60/172] avg loss 0.000816665, throughput 2.81961K wps
[Epoch 146 Batch 90/172] avg loss 0.00100817, throughput 2.72037K wps
[Epoch 146 Batch 120/172] avg loss 0.000984949, throughput 2.71276K wps
[Epoch 146 Batch 150/172] avg loss 0.00103947, throughput 2.65182K wps
Begin Testing...
[Epoch 146] train avg loss 0.00094229, dev acc 0.8826, dev avg loss 0.471276, throughput 2.71939K wps
[Epoch 147 Batch 30/172] avg loss 0.000902361, throughput 2.76053K wps
[Epoch 147 Batch 60/172] avg loss 0.00095171, throughput 2.74718K wps
[Epoch 147 Batch 90/172] avg loss 0.000881304, throughput 2.7567K wps
[Epoch 147 Batch 120/172] avg loss 0.000876134, throughput 2.70702K wps
[Epoch 147 Batch 150/172] avg loss 0.00104379, throughput 2.7976K wps
Begin Testing...
[Epoch 147] train avg loss 0.000936384, dev acc 0.8805, dev avg loss 0.474815, throughput 2.75423K wps
[Epoch 148 Batch 30/172] avg loss 0.000815595, throughput 2.71243K wps
[Epoch 148 Batch 60/172] avg loss 0.000946289, throughput 2.63408K wps
[Epoch 148 Batch 90/172] avg loss 0.00108259, throughput 2.82033K wps
[Epoch 148 Batch 120/172] avg loss 0.000929928, throughput 2.77434K wps
[Epoch 148 Batch 150/172] avg loss 0.00100389, throughput 2.74625K wps
Begin Testing...
[Epoch 148] train avg loss 0.000945459, dev acc 0.8816, dev avg loss 0.481224, throughput 2.73906K wps
[Epoch 149 Batch 30/172] avg loss 0.00088144, throughput 2.78224K wps
[Epoch 149 Batch 60/172] avg loss 0.000901452, throughput 2.66342K wps
[Epoch 149 Batch 90/172] avg loss 0.00111937, throughput 2.77593K wps
[Epoch 149 Batch 120/172] avg loss 0.000877335, throughput 2.80223K wps
[Epoch 149 Batch 150/172] avg loss 0.000951954, throughput 2.75722K wps
Begin Testing...
[Epoch 149] train avg loss 0.000922932, dev acc 0.8816, dev avg loss 0.480142, throughput 2.75638K wps
[Epoch 150 Batch 30/172] avg loss 0.000910792, throughput 2.76487K wps
[Epoch 150 Batch 60/172] avg loss 0.000900058, throughput 2.75917K wps
[Epoch 150 Batch 90/172] avg loss 0.00107089, throughput 2.74717K wps
[Epoch 150 Batch 120/172] avg loss 0.000783581, throughput 2.59998K wps
[Epoch 150 Batch 150/172] avg loss 0.000795855, throughput 2.75497K wps
Begin Testing...
[Epoch 150] train avg loss 0.000910933, dev acc 0.8805, dev avg loss 0.480106, throughput 2.72348K wps
[Epoch 151 Batch 30/172] avg loss 0.00101437, throughput 2.67481K wps
[Epoch 151 Batch 60/172] avg loss 0.000723759, throughput 2.76367K wps
[Epoch 151 Batch 90/172] avg loss 0.000865984, throughput 2.66953K wps
[Epoch 151 Batch 120/172] avg loss 0.000729494, throughput 2.6918K wps
[Epoch 151 Batch 150/172] avg loss 0.00103898, throughput 2.75841K wps
Begin Testing...
[Epoch 151] train avg loss 0.000880716, dev acc 0.8774, dev avg loss 0.485057, throughput 2.72087K wps
[Epoch 152 Batch 30/172] avg loss 0.000889395, throughput 2.5484K wps
[Epoch 152 Batch 60/172] avg loss 0.000927335, throughput 2.77067K wps
[Epoch 152 Batch 90/172] avg loss 0.000948715, throughput 2.75432K wps
[Epoch 152 Batch 120/172] avg loss 0.000793356, throughput 2.72555K wps
[Epoch 152 Batch 150/172] avg loss 0.00101083, throughput 2.75162K wps
Begin Testing...
[Epoch 152] train avg loss 0.000944624, dev acc 0.8805, dev avg loss 0.477009, throughput 2.71228K wps
[Epoch 153 Batch 30/172] avg loss 0.000730505, throughput 2.66045K wps
[Epoch 153 Batch 60/172] avg loss 0.000906849, throughput 2.74171K wps
[Epoch 153 Batch 90/172] avg loss 0.000760334, throughput 2.73365K wps
[Epoch 153 Batch 120/172] avg loss 0.000813377, throughput 2.79466K wps
[Epoch 153 Batch 150/172] avg loss 0.00116105, throughput 2.79393K wps
Begin Testing...
[Epoch 153] train avg loss 0.000877959, dev acc 0.8836, dev avg loss 0.480097, throughput 2.7484K wps
[Epoch 154 Batch 30/172] avg loss 0.000881874, throughput 2.80065K wps
[Epoch 154 Batch 60/172] avg loss 0.000762928, throughput 2.67084K wps
[Epoch 154 Batch 90/172] avg loss 0.000853585, throughput 2.72639K wps
[Epoch 154 Batch 120/172] avg loss 0.00103556, throughput 2.61009K wps
[Epoch 154 Batch 150/172] avg loss 0.000919976, throughput 2.83063K wps
Begin Testing...
[Epoch 154] train avg loss 0.000922225, dev acc 0.8836, dev avg loss 0.481221, throughput 2.73087K wps
[Epoch 155 Batch 30/172] avg loss 0.000913694, throughput 2.69626K wps
[Epoch 155 Batch 60/172] avg loss 0.00080115, throughput 2.75626K wps
[Epoch 155 Batch 90/172] avg loss 0.00108223, throughput 2.68814K wps
[Epoch 155 Batch 120/172] avg loss 0.0007407, throughput 2.73268K wps
[Epoch 155 Batch 150/172] avg loss 0.000752809, throughput 2.64717K wps
Begin Testing...
[Epoch 155] train avg loss 0.000884884, dev acc 0.8836, dev avg loss 0.487371, throughput 2.71959K wps
[Epoch 156 Batch 30/172] avg loss 0.000838297, throughput 2.73767K wps
[Epoch 156 Batch 60/172] avg loss 0.00103441, throughput 2.61116K wps
[Epoch 156 Batch 90/172] avg loss 0.000922214, throughput 2.66003K wps
[Epoch 156 Batch 120/172] avg loss 0.000754375, throughput 2.76244K wps
[Epoch 156 Batch 150/172] avg loss 0.000918554, throughput 2.78912K wps
Begin Testing...
[Epoch 156] train avg loss 0.000882222, dev acc 0.8826, dev avg loss 0.490947, throughput 2.71931K wps
[Epoch 157 Batch 30/172] avg loss 0.000855929, throughput 2.76517K wps
[Epoch 157 Batch 60/172] avg loss 0.00082963, throughput 2.69443K wps
[Epoch 157 Batch 90/172] avg loss 0.0010367, throughput 2.71559K wps
[Epoch 157 Batch 120/172] avg loss 0.000800977, throughput 2.75679K wps
[Epoch 157 Batch 150/172] avg loss 0.000971094, throughput 2.63389K wps
Begin Testing...
[Epoch 157] train avg loss 0.00090748, dev acc 0.8805, dev avg loss 0.486811, throughput 2.71873K wps
[Epoch 158 Batch 30/172] avg loss 0.000834621, throughput 2.71796K wps
[Epoch 158 Batch 60/172] avg loss 0.000831272, throughput 2.77157K wps
[Epoch 158 Batch 90/172] avg loss 0.00102237, throughput 2.61379K wps
[Epoch 158 Batch 120/172] avg loss 0.000796644, throughput 2.83088K wps
[Epoch 158 Batch 150/172] avg loss 0.00090318, throughput 2.77296K wps
Begin Testing...
[Epoch 158] train avg loss 0.000880977, dev acc 0.8816, dev avg loss 0.48678, throughput 2.73667K wps
[Epoch 159 Batch 30/172] avg loss 0.000902976, throughput 2.78407K wps
[Epoch 159 Batch 60/172] avg loss 0.00104734, throughput 2.76554K wps
[Epoch 159 Batch 90/172] avg loss 0.000860002, throughput 2.67038K wps
[Epoch 159 Batch 120/172] avg loss 0.000957241, throughput 2.71309K wps
[Epoch 159 Batch 150/172] avg loss 0.000754149, throughput 2.79935K wps
Begin Testing...
[Epoch 159] train avg loss 0.000878047, dev acc 0.8836, dev avg loss 0.489765, throughput 2.7497K wps
[Epoch 160 Batch 30/172] avg loss 0.000751274, throughput 2.74092K wps
[Epoch 160 Batch 60/172] avg loss 0.000783081, throughput 2.69901K wps
[Epoch 160 Batch 90/172] avg loss 0.000798579, throughput 2.74987K wps
[Epoch 160 Batch 120/172] avg loss 0.00110633, throughput 2.77946K wps
[Epoch 160 Batch 150/172] avg loss 0.000721737, throughput 2.77139K wps
Begin Testing...
[Epoch 160] train avg loss 0.000850522, dev acc 0.8826, dev avg loss 0.492644, throughput 2.75155K wps
[Epoch 161 Batch 30/172] avg loss 0.000742588, throughput 2.79011K wps
[Epoch 161 Batch 60/172] avg loss 0.000917609, throughput 2.76964K wps
[Epoch 161 Batch 90/172] avg loss 0.000941758, throughput 2.68991K wps
[Epoch 161 Batch 120/172] avg loss 0.00086886, throughput 2.79291K wps
[Epoch 161 Batch 150/172] avg loss 0.00101312, throughput 2.68612K wps
Begin Testing...
[Epoch 161] train avg loss 0.000873412, dev acc 0.8784, dev avg loss 0.49121, throughput 2.74699K wps
[Epoch 162 Batch 30/172] avg loss 0.000767141, throughput 2.72015K wps
[Epoch 162 Batch 60/172] avg loss 0.000902945, throughput 2.74756K wps
[Epoch 162 Batch 90/172] avg loss 0.000924791, throughput 2.68374K wps
[Epoch 162 Batch 120/172] avg loss 0.000942971, throughput 2.66393K wps
[Epoch 162 Batch 150/172] avg loss 0.00087459, throughput 2.7467K wps
Begin Testing...
[Epoch 162] train avg loss 0.000864269, dev acc 0.8805, dev avg loss 0.491975, throughput 2.71933K wps
[Epoch 163 Batch 30/172] avg loss 0.000912824, throughput 2.74821K wps
[Epoch 163 Batch 60/172] avg loss 0.000872909, throughput 2.78948K wps
[Epoch 163 Batch 90/172] avg loss 0.000835264, throughput 2.73079K wps
[Epoch 163 Batch 120/172] avg loss 0.000679365, throughput 2.79082K wps
[Epoch 163 Batch 150/172] avg loss 0.000911729, throughput 2.74556K wps
Begin Testing...
[Epoch 163] train avg loss 0.000848803, dev acc 0.8816, dev avg loss 0.492266, throughput 2.76082K wps
[Epoch 164 Batch 30/172] avg loss 0.00060535, throughput 2.77087K wps
[Epoch 164 Batch 60/172] avg loss 0.000918689, throughput 2.78382K wps
[Epoch 164 Batch 90/172] avg loss 0.000974365, throughput 2.75924K wps
[Epoch 164 Batch 120/172] avg loss 0.00107656, throughput 2.68348K wps
[Epoch 164 Batch 150/172] avg loss 0.00078506, throughput 2.78142K wps
Begin Testing...
[Epoch 164] train avg loss 0.000886633, dev acc 0.8826, dev avg loss 0.493958, throughput 2.75874K wps
[Epoch 165 Batch 30/172] avg loss 0.000805851, throughput 2.81754K wps
[Epoch 165 Batch 60/172] avg loss 0.000756253, throughput 2.78308K wps
[Epoch 165 Batch 90/172] avg loss 0.00094154, throughput 2.76011K wps
[Epoch 165 Batch 120/172] avg loss 0.00103275, throughput 2.71965K wps
[Epoch 165 Batch 150/172] avg loss 0.00076917, throughput 2.65218K wps
Begin Testing...
[Epoch 165] train avg loss 0.000876977, dev acc 0.8805, dev avg loss 0.493445, throughput 2.73949K wps
[Epoch 166 Batch 30/172] avg loss 0.000943579, throughput 2.83237K wps
[Epoch 166 Batch 60/172] avg loss 0.00089639, throughput 2.77023K wps
[Epoch 166 Batch 90/172] avg loss 0.000800683, throughput 2.75561K wps
[Epoch 166 Batch 120/172] avg loss 0.000698472, throughput 2.75938K wps
[Epoch 166 Batch 150/172] avg loss 0.000884183, throughput 2.77275K wps
Begin Testing...
[Epoch 166] train avg loss 0.000846019, dev acc 0.8805, dev avg loss 0.502092, throughput 2.75623K wps
[Epoch 167 Batch 30/172] avg loss 0.000717016, throughput 2.84622K wps
[Epoch 167 Batch 60/172] avg loss 0.000827157, throughput 2.69195K wps
[Epoch 167 Batch 90/172] avg loss 0.000752611, throughput 2.7606K wps
[Epoch 167 Batch 120/172] avg loss 0.000981126, throughput 2.72085K wps
[Epoch 167 Batch 150/172] avg loss 0.000830166, throughput 2.82448K wps
Begin Testing...
[Epoch 167] train avg loss 0.000841103, dev acc 0.8826, dev avg loss 0.495295, throughput 2.76869K wps
[Epoch 168 Batch 30/172] avg loss 0.000897153, throughput 2.84974K wps
[Epoch 168 Batch 60/172] avg loss 0.000641675, throughput 2.73041K wps
[Epoch 168 Batch 90/172] avg loss 0.000829222, throughput 2.82508K wps
[Epoch 168 Batch 120/172] avg loss 0.000949478, throughput 2.77703K wps
[Epoch 168 Batch 150/172] avg loss 0.000787532, throughput 2.7083K wps
Begin Testing...
[Epoch 168] train avg loss 0.00083918, dev acc 0.8805, dev avg loss 0.498216, throughput 2.78157K wps
[Epoch 169 Batch 30/172] avg loss 0.000962621, throughput 2.79838K wps
[Epoch 169 Batch 60/172] avg loss 0.000779156, throughput 2.77652K wps
[Epoch 169 Batch 90/172] avg loss 0.000825717, throughput 2.82066K wps
[Epoch 169 Batch 120/172] avg loss 0.000860643, throughput 2.81051K wps
[Epoch 169 Batch 150/172] avg loss 0.000798194, throughput 2.67104K wps
Begin Testing...
[Epoch 169] train avg loss 0.000842852, dev acc 0.8836, dev avg loss 0.501813, throughput 2.78711K wps
[Epoch 170 Batch 30/172] avg loss 0.00079104, throughput 2.81363K wps
[Epoch 170 Batch 60/172] avg loss 0.000732077, throughput 2.76976K wps
[Epoch 170 Batch 90/172] avg loss 0.000669385, throughput 2.7638K wps
[Epoch 170 Batch 120/172] avg loss 0.000966155, throughput 2.79609K wps
[Epoch 170 Batch 150/172] avg loss 0.000903827, throughput 2.78139K wps
Begin Testing...
[Epoch 170] train avg loss 0.000837639, dev acc 0.8805, dev avg loss 0.49944, throughput 2.78385K wps
[Epoch 171 Batch 30/172] avg loss 0.000799354, throughput 2.64349K wps
[Epoch 171 Batch 60/172] avg loss 0.000732472, throughput 2.7514K wps
[Epoch 171 Batch 90/172] avg loss 0.000753717, throughput 2.74014K wps
[Epoch 171 Batch 120/172] avg loss 0.00101886, throughput 2.73732K wps
[Epoch 171 Batch 150/172] avg loss 0.000786563, throughput 2.73924K wps
Begin Testing...
[Epoch 171] train avg loss 0.000829677, dev acc 0.8805, dev avg loss 0.503012, throughput 2.7141K wps
[Epoch 172 Batch 30/172] avg loss 0.000782264, throughput 2.73821K wps
[Epoch 172 Batch 60/172] avg loss 0.000821224, throughput 2.66969K wps
[Epoch 172 Batch 90/172] avg loss 0.000997733, throughput 2.7349K wps
[Epoch 172 Batch 120/172] avg loss 0.000751802, throughput 2.66676K wps
[Epoch 172 Batch 150/172] avg loss 0.000908178, throughput 2.7623K wps
Begin Testing...
[Epoch 172] train avg loss 0.000844521, dev acc 0.8826, dev avg loss 0.506029, throughput 2.70684K wps
[Epoch 173 Batch 30/172] avg loss 0.000752746, throughput 2.7836K wps
[Epoch 173 Batch 60/172] avg loss 0.000803637, throughput 2.74232K wps
[Epoch 173 Batch 90/172] avg loss 0.000835174, throughput 2.75189K wps
[Epoch 173 Batch 120/172] avg loss 0.000732063, throughput 2.73531K wps
[Epoch 173 Batch 150/172] avg loss 0.000919066, throughput 2.74832K wps
Begin Testing...
[Epoch 173] train avg loss 0.000830415, dev acc 0.8795, dev avg loss 0.507847, throughput 2.74952K wps
[Epoch 174 Batch 30/172] avg loss 0.000715814, throughput 2.8015K wps
[Epoch 174 Batch 60/172] avg loss 0.000775641, throughput 2.76763K wps
[Epoch 174 Batch 90/172] avg loss 0.000788667, throughput 2.69463K wps
[Epoch 174 Batch 120/172] avg loss 0.000875091, throughput 2.6013K wps
[Epoch 174 Batch 150/172] avg loss 0.000960051, throughput 2.74831K wps
Begin Testing...
[Epoch 174] train avg loss 0.000809684, dev acc 0.8805, dev avg loss 0.506295, throughput 2.70325K wps
[Epoch 175 Batch 30/172] avg loss 0.000900068, throughput 2.78317K wps
[Epoch 175 Batch 60/172] avg loss 0.000835169, throughput 2.7887K wps
[Epoch 175 Batch 90/172] avg loss 0.000778441, throughput 2.75226K wps
[Epoch 175 Batch 120/172] avg loss 0.000976709, throughput 2.7701K wps
[Epoch 175 Batch 150/172] avg loss 0.00086215, throughput 2.65163K wps
Begin Testing...
[Epoch 175] train avg loss 0.000858631, dev acc 0.8795, dev avg loss 0.504212, throughput 2.76419K wps
[Epoch 176 Batch 30/172] avg loss 0.000883148, throughput 2.83306K wps
[Epoch 176 Batch 60/172] avg loss 0.000776314, throughput 2.7919K wps
[Epoch 176 Batch 90/172] avg loss 0.000700888, throughput 2.68082K wps
[Epoch 176 Batch 120/172] avg loss 0.000886362, throughput 2.79568K wps
[Epoch 176 Batch 150/172] avg loss 0.000844084, throughput 2.79047K wps
Begin Testing...
[Epoch 176] train avg loss 0.000807773, dev acc 0.8805, dev avg loss 0.50521, throughput 2.77201K wps
[Epoch 177 Batch 30/172] avg loss 0.000908844, throughput 2.78272K wps
[Epoch 177 Batch 60/172] avg loss 0.000688398, throughput 2.79445K wps
[Epoch 177 Batch 90/172] avg loss 0.000772481, throughput 2.64424K wps
[Epoch 177 Batch 120/172] avg loss 0.000904445, throughput 2.71478K wps
[Epoch 177 Batch 150/172] avg loss 0.000757693, throughput 2.67523K wps
Begin Testing...
[Epoch 177] train avg loss 0.000816219, dev acc 0.8774, dev avg loss 0.513817, throughput 2.73341K wps
[Epoch 178 Batch 30/172] avg loss 0.000857834, throughput 2.72923K wps
[Epoch 178 Batch 60/172] avg loss 0.000744231, throughput 2.757K wps
[Epoch 178 Batch 90/172] avg loss 0.000980061, throughput 2.73466K wps
[Epoch 178 Batch 120/172] avg loss 0.000674989, throughput 2.80472K wps
[Epoch 178 Batch 150/172] avg loss 0.000671365, throughput 2.74524K wps
Begin Testing...
[Epoch 178] train avg loss 0.000805757, dev acc 0.8836, dev avg loss 0.508016, throughput 2.74597K wps
[Epoch 179 Batch 30/172] avg loss 0.000711134, throughput 2.78594K wps
[Epoch 179 Batch 60/172] avg loss 0.000991503, throughput 2.75398K wps
[Epoch 179 Batch 90/172] avg loss 0.000864205, throughput 2.71065K wps
[Epoch 179 Batch 120/172] avg loss 0.000831556, throughput 2.71672K wps
[Epoch 179 Batch 150/172] avg loss 0.000854456, throughput 2.68375K wps
Begin Testing...
[Epoch 179] train avg loss 0.000841755, dev acc 0.8784, dev avg loss 0.512375, throughput 2.73476K wps
[Epoch 180 Batch 30/172] avg loss 0.000758554, throughput 2.7188K wps
[Epoch 180 Batch 60/172] avg loss 0.000748487, throughput 2.54099K wps
[Epoch 180 Batch 90/172] avg loss 0.00081645, throughput 2.78824K wps
[Epoch 180 Batch 120/172] avg loss 0.00088454, throughput 2.7679K wps
[Epoch 180 Batch 150/172] avg loss 0.000712311, throughput 2.70608K wps
Begin Testing...
[Epoch 180] train avg loss 0.000815589, dev acc 0.8826, dev avg loss 0.51163, throughput 2.70659K wps
[Epoch 181 Batch 30/172] avg loss 0.000683871, throughput 2.80252K wps
[Epoch 181 Batch 60/172] avg loss 0.000920173, throughput 2.69484K wps
[Epoch 181 Batch 90/172] avg loss 0.000790321, throughput 2.73161K wps
[Epoch 181 Batch 120/172] avg loss 0.000879124, throughput 2.6593K wps
[Epoch 181 Batch 150/172] avg loss 0.000864441, throughput 2.69783K wps
Begin Testing...
[Epoch 181] train avg loss 0.000829125, dev acc 0.8805, dev avg loss 0.511343, throughput 2.72165K wps
[Epoch 182 Batch 30/172] avg loss 0.000854921, throughput 2.66776K wps
[Epoch 182 Batch 60/172] avg loss 0.000890179, throughput 2.76054K wps
[Epoch 182 Batch 90/172] avg loss 0.000737673, throughput 2.74338K wps
[Epoch 182 Batch 120/172] avg loss 0.000807739, throughput 2.75963K wps
[Epoch 182 Batch 150/172] avg loss 0.000900371, throughput 2.70349K wps
Begin Testing...
[Epoch 182] train avg loss 0.00081264, dev acc 0.8805, dev avg loss 0.513566, throughput 2.73665K wps
[Epoch 183 Batch 30/172] avg loss 0.000726739, throughput 2.77138K wps
[Epoch 183 Batch 60/172] avg loss 0.000825717, throughput 2.69576K wps
[Epoch 183 Batch 90/172] avg loss 0.00076022, throughput 2.8171K wps
[Epoch 183 Batch 120/172] avg loss 0.000935737, throughput 2.79386K wps
[Epoch 183 Batch 150/172] avg loss 0.000687785, throughput 2.80149K wps
Begin Testing...
[Epoch 183] train avg loss 0.000806861, dev acc 0.8826, dev avg loss 0.51174, throughput 2.77961K wps
[Epoch 184 Batch 30/172] avg loss 0.000917351, throughput 2.78641K wps
[Epoch 184 Batch 60/172] avg loss 0.000730989, throughput 2.77336K wps
[Epoch 184 Batch 90/172] avg loss 0.000765249, throughput 2.77152K wps
[Epoch 184 Batch 120/172] avg loss 0.000781856, throughput 2.81042K wps
[Epoch 184 Batch 150/172] avg loss 0.000697268, throughput 2.76712K wps
Begin Testing...
[Epoch 184] train avg loss 0.000788817, dev acc 0.8826, dev avg loss 0.510684, throughput 2.77613K wps
[Epoch 185 Batch 30/172] avg loss 0.000677687, throughput 2.79569K wps
[Epoch 185 Batch 60/172] avg loss 0.000725582, throughput 2.64335K wps
[Epoch 185 Batch 90/172] avg loss 0.000671022, throughput 2.71295K wps
[Epoch 185 Batch 120/172] avg loss 0.000795624, throughput 2.8504K wps
[Epoch 185 Batch 150/172] avg loss 0.000969466, throughput 2.80959K wps
Begin Testing...
[Epoch 185] train avg loss 0.000785651, dev acc 0.8805, dev avg loss 0.512237, throughput 2.75514K wps
[Epoch 186 Batch 30/172] avg loss 0.000708741, throughput 2.80449K wps
[Epoch 186 Batch 60/172] avg loss 0.000680795, throughput 2.6716K wps
[Epoch 186 Batch 90/172] avg loss 0.0010825, throughput 2.70475K wps
[Epoch 186 Batch 120/172] avg loss 0.000690544, throughput 2.70957K wps
[Epoch 186 Batch 150/172] avg loss 0.000819221, throughput 2.81789K wps
Begin Testing...
[Epoch 186] train avg loss 0.000795689, dev acc 0.8774, dev avg loss 0.527923, throughput 2.74403K wps
[Epoch 187 Batch 30/172] avg loss 0.000832037, throughput 2.81647K wps
[Epoch 187 Batch 60/172] avg loss 0.00076423, throughput 2.66883K wps
[Epoch 187 Batch 90/172] avg loss 0.000742903, throughput 2.82424K wps
[Epoch 187 Batch 120/172] avg loss 0.000728803, throughput 2.73719K wps
[Epoch 187 Batch 150/172] avg loss 0.000877031, throughput 2.74487K wps
Begin Testing...
[Epoch 187] train avg loss 0.000790994, dev acc 0.8795, dev avg loss 0.521996, throughput 2.75535K wps
[Epoch 188 Batch 30/172] avg loss 0.000921589, throughput 2.6691K wps
[Epoch 188 Batch 60/172] avg loss 0.000730195, throughput 2.72791K wps
[Epoch 188 Batch 90/172] avg loss 0.000760872, throughput 2.74097K wps
[Epoch 188 Batch 120/172] avg loss 0.000725075, throughput 2.75189K wps
[Epoch 188 Batch 150/172] avg loss 0.000686427, throughput 2.739K wps
Begin Testing...
[Epoch 188] train avg loss 0.000781698, dev acc 0.8836, dev avg loss 0.516236, throughput 2.72916K wps
[Epoch 189 Batch 30/172] avg loss 0.000768229, throughput 2.75812K wps
[Epoch 189 Batch 60/172] avg loss 0.00084211, throughput 2.65205K wps
[Epoch 189 Batch 90/172] avg loss 0.000695547, throughput 2.76453K wps
[Epoch 189 Batch 120/172] avg loss 0.000812706, throughput 2.74078K wps
[Epoch 189 Batch 150/172] avg loss 0.000939919, throughput 2.71565K wps
Begin Testing...
[Epoch 189] train avg loss 0.000793804, dev acc 0.8774, dev avg loss 0.522994, throughput 2.72357K wps
[Epoch 190 Batch 30/172] avg loss 0.000706315, throughput 2.79559K wps
[Epoch 190 Batch 60/172] avg loss 0.000902893, throughput 2.76679K wps
[Epoch 190 Batch 90/172] avg loss 0.000867617, throughput 2.78633K wps
[Epoch 190 Batch 120/172] avg loss 0.000834234, throughput 2.79132K wps
[Epoch 190 Batch 150/172] avg loss 0.000704301, throughput 2.81543K wps
Begin Testing...
[Epoch 190] train avg loss 0.000815156, dev acc 0.8816, dev avg loss 0.51463, throughput 2.79329K wps
[Epoch 191 Batch 30/172] avg loss 0.000673312, throughput 2.75995K wps
[Epoch 191 Batch 60/172] avg loss 0.000754415, throughput 2.81671K wps
[Epoch 191 Batch 90/172] avg loss 0.000942849, throughput 2.8059K wps
[Epoch 191 Batch 120/172] avg loss 0.000724117, throughput 2.79278K wps
[Epoch 191 Batch 150/172] avg loss 0.000839356, throughput 2.83948K wps
Begin Testing...
[Epoch 191] train avg loss 0.000794769, dev acc 0.8795, dev avg loss 0.521961, throughput 2.8004K wps
[Epoch 192 Batch 30/172] avg loss 0.000680415, throughput 2.86543K wps
[Epoch 192 Batch 60/172] avg loss 0.000541642, throughput 2.82223K wps
[Epoch 192 Batch 90/172] avg loss 0.000803169, throughput 2.73707K wps
[Epoch 192 Batch 120/172] avg loss 0.00089396, throughput 2.87651K wps
[Epoch 192 Batch 150/172] avg loss 0.00106629, throughput 2.80584K wps
Begin Testing...
[Epoch 192] train avg loss 0.000783022, dev acc 0.8784, dev avg loss 0.517898, throughput 2.81311K wps
[Epoch 193 Batch 30/172] avg loss 0.000681166, throughput 2.82648K wps
[Epoch 193 Batch 60/172] avg loss 0.000851911, throughput 2.76093K wps
[Epoch 193 Batch 90/172] avg loss 0.000903668, throughput 2.77093K wps
[Epoch 193 Batch 120/172] avg loss 0.000727067, throughput 2.66361K wps
[Epoch 193 Batch 150/172] avg loss 0.000752951, throughput 2.73947K wps
Begin Testing...
[Epoch 193] train avg loss 0.000774337, dev acc 0.8795, dev avg loss 0.525503, throughput 2.74809K wps
[Epoch 194 Batch 30/172] avg loss 0.000712265, throughput 2.55029K wps
[Epoch 194 Batch 60/172] avg loss 0.000919048, throughput 2.72863K wps
[Epoch 194 Batch 90/172] avg loss 0.000608845, throughput 2.72805K wps
[Epoch 194 Batch 120/172] avg loss 0.000797669, throughput 2.74422K wps
[Epoch 194 Batch 150/172] avg loss 0.000822338, throughput 2.5788K wps
Begin Testing...
[Epoch 194] train avg loss 0.000772406, dev acc 0.8816, dev avg loss 0.520595, throughput 2.68054K wps
[Epoch 195 Batch 30/172] avg loss 0.000954505, throughput 2.68714K wps
[Epoch 195 Batch 60/172] avg loss 0.000684932, throughput 2.65935K wps
[Epoch 195 Batch 90/172] avg loss 0.000625747, throughput 2.62703K wps
[Epoch 195 Batch 120/172] avg loss 0.000854033, throughput 2.79285K wps
[Epoch 195 Batch 150/172] avg loss 0.000779062, throughput 2.74457K wps
Begin Testing...
[Epoch 195] train avg loss 0.00076017, dev acc 0.8774, dev avg loss 0.533805, throughput 2.70539K wps
[Epoch 196 Batch 30/172] avg loss 0.00071123, throughput 2.5916K wps
[Epoch 196 Batch 60/172] avg loss 0.000661665, throughput 2.7477K wps
[Epoch 196 Batch 90/172] avg loss 0.000778948, throughput 2.76931K wps
[Epoch 196 Batch 120/172] avg loss 0.000856108, throughput 2.67219K wps
[Epoch 196 Batch 150/172] avg loss 0.000772566, throughput 2.80148K wps
Begin Testing...
[Epoch 196] train avg loss 0.000792866, dev acc 0.8774, dev avg loss 0.520917, throughput 2.72339K wps
[Epoch 197 Batch 30/172] avg loss 0.000790628, throughput 2.67766K wps
[Epoch 197 Batch 60/172] avg loss 0.000860607, throughput 2.76961K wps
[Epoch 197 Batch 90/172] avg loss 0.000877589, throughput 2.64383K wps
[Epoch 197 Batch 120/172] avg loss 0.000611034, throughput 2.8247K wps
[Epoch 197 Batch 150/172] avg loss 0.00080183, throughput 2.80543K wps
Begin Testing...
[Epoch 197] train avg loss 0.000775703, dev acc 0.8795, dev avg loss 0.525543, throughput 2.75078K wps
[Epoch 198 Batch 30/172] avg loss 0.000665315, throughput 2.80016K wps
[Epoch 198 Batch 60/172] avg loss 0.00082959, throughput 2.70923K wps
[Epoch 198 Batch 90/172] avg loss 0.000594266, throughput 2.82224K wps
[Epoch 198 Batch 120/172] avg loss 0.000810716, throughput 2.809K wps
[Epoch 198 Batch 150/172] avg loss 0.000950606, throughput 2.70471K wps
Begin Testing...
[Epoch 198] train avg loss 0.000787609, dev acc 0.8795, dev avg loss 0.524343, throughput 2.76671K wps
[Epoch 199 Batch 30/172] avg loss 0.000745704, throughput 2.80686K wps
[Epoch 199 Batch 60/172] avg loss 0.000858091, throughput 2.62173K wps
[Epoch 199 Batch 90/172] avg loss 0.000751961, throughput 2.774K wps
[Epoch 199 Batch 120/172] avg loss 0.000689031, throughput 2.72785K wps
[Epoch 199 Batch 150/172] avg loss 0.000828458, throughput 2.72501K wps
Begin Testing...
[Epoch 199] train avg loss 0.000783533, dev acc 0.8774, dev avg loss 0.523398, throughput 2.73478K wps
Test loss 0.360084, test acc 0.8764
Total time cost 468.15s
[Epoch 0 Batch 30/172] avg loss 0.0127795, throughput 2.5716K wps
[Epoch 0 Batch 60/172] avg loss 0.0123127, throughput 2.74142K wps
[Epoch 0 Batch 90/172] avg loss 0.0123193, throughput 2.79129K wps
[Epoch 0 Batch 120/172] avg loss 0.0123357, throughput 2.79338K wps
[Epoch 0 Batch 150/172] avg loss 0.0122514, throughput 2.76842K wps
Begin Testing...
[Epoch 0] train avg loss 0.0123865, dev acc 0.6771, dev avg loss 0.61606, throughput 2.73223K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/172] avg loss 0.0118825, throughput 2.75666K wps
[Epoch 1 Batch 60/172] avg loss 0.0118096, throughput 2.74845K wps
[Epoch 1 Batch 90/172] avg loss 0.0122284, throughput 2.74244K wps
[Epoch 1 Batch 120/172] avg loss 0.0120171, throughput 2.74943K wps
[Epoch 1 Batch 150/172] avg loss 0.0118781, throughput 2.75557K wps
Begin Testing...
[Epoch 1] train avg loss 0.0119531, dev acc 0.6771, dev avg loss 0.601406, throughput 2.75281K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/172] avg loss 0.0117396, throughput 2.78909K wps
[Epoch 2 Batch 60/172] avg loss 0.0117339, throughput 2.76862K wps
[Epoch 2 Batch 90/172] avg loss 0.0118101, throughput 2.73259K wps
[Epoch 2 Batch 120/172] avg loss 0.011793, throughput 2.76525K wps
[Epoch 2 Batch 150/172] avg loss 0.0116368, throughput 2.72186K wps
Begin Testing...
[Epoch 2] train avg loss 0.0116804, dev acc 0.6771, dev avg loss 0.585678, throughput 2.75848K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/172] avg loss 0.011514, throughput 2.74626K wps
[Epoch 3 Batch 60/172] avg loss 0.011195, throughput 2.70189K wps
[Epoch 3 Batch 90/172] avg loss 0.0113321, throughput 2.66463K wps
[Epoch 3 Batch 120/172] avg loss 0.0113185, throughput 2.75261K wps
[Epoch 3 Batch 150/172] avg loss 0.0110403, throughput 2.79107K wps
Begin Testing...
[Epoch 3] train avg loss 0.0112701, dev acc 0.6834, dev avg loss 0.564125, throughput 2.7345K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/172] avg loss 0.0108465, throughput 2.7352K wps
[Epoch 4 Batch 60/172] avg loss 0.0113698, throughput 2.81393K wps
[Epoch 4 Batch 90/172] avg loss 0.0110772, throughput 2.70193K wps
[Epoch 4 Batch 120/172] avg loss 0.0104724, throughput 2.82285K wps
[Epoch 4 Batch 150/172] avg loss 0.0105704, throughput 2.7392K wps
Begin Testing...
[Epoch 4] train avg loss 0.0108232, dev acc 0.7065, dev avg loss 0.538941, throughput 2.76814K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/172] avg loss 0.0104012, throughput 2.65386K wps
[Epoch 5 Batch 60/172] avg loss 0.0101306, throughput 2.85951K wps
[Epoch 5 Batch 90/172] avg loss 0.0104339, throughput 2.80699K wps
[Epoch 5 Batch 120/172] avg loss 0.0100064, throughput 2.68758K wps
[Epoch 5 Batch 150/172] avg loss 0.0102722, throughput 2.72051K wps
Begin Testing...
[Epoch 5] train avg loss 0.0102571, dev acc 0.7589, dev avg loss 0.50724, throughput 2.75842K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/172] avg loss 0.00968505, throughput 2.7744K wps
[Epoch 6 Batch 60/172] avg loss 0.00973779, throughput 2.7858K wps
[Epoch 6 Batch 90/172] avg loss 0.00958833, throughput 2.74786K wps
[Epoch 6 Batch 120/172] avg loss 0.00981572, throughput 2.80189K wps
[Epoch 6 Batch 150/172] avg loss 0.00937121, throughput 2.62724K wps
Begin Testing...
[Epoch 6] train avg loss 0.00964205, dev acc 0.7883, dev avg loss 0.474024, throughput 2.73091K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/172] avg loss 0.00935314, throughput 2.7714K wps
[Epoch 7 Batch 60/172] avg loss 0.0090669, throughput 2.77539K wps
[Epoch 7 Batch 90/172] avg loss 0.00891175, throughput 2.65035K wps
[Epoch 7 Batch 120/172] avg loss 0.00920498, throughput 2.77165K wps
[Epoch 7 Batch 150/172] avg loss 0.00875514, throughput 2.71395K wps
Begin Testing...
[Epoch 7] train avg loss 0.00902372, dev acc 0.7925, dev avg loss 0.441802, throughput 2.74042K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/172] avg loss 0.00840035, throughput 2.81255K wps
[Epoch 8 Batch 60/172] avg loss 0.0085087, throughput 2.73359K wps
[Epoch 8 Batch 90/172] avg loss 0.00865519, throughput 2.76755K wps
[Epoch 8 Batch 120/172] avg loss 0.00842124, throughput 2.76297K wps
[Epoch 8 Batch 150/172] avg loss 0.00828181, throughput 2.72606K wps
Begin Testing...
[Epoch 8] train avg loss 0.00839458, dev acc 0.8218, dev avg loss 0.410387, throughput 2.75965K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/172] avg loss 0.00814128, throughput 2.82352K wps
[Epoch 9 Batch 60/172] avg loss 0.00817808, throughput 2.76995K wps
[Epoch 9 Batch 90/172] avg loss 0.00778083, throughput 2.63186K wps
[Epoch 9 Batch 120/172] avg loss 0.0077185, throughput 2.70336K wps
[Epoch 9 Batch 150/172] avg loss 0.00786287, throughput 2.75017K wps
Begin Testing...
[Epoch 9] train avg loss 0.00788518, dev acc 0.8323, dev avg loss 0.386395, throughput 2.74159K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/172] avg loss 0.00727284, throughput 2.75137K wps
[Epoch 10 Batch 60/172] avg loss 0.0077665, throughput 2.75451K wps
[Epoch 10 Batch 90/172] avg loss 0.00729863, throughput 2.77134K wps
[Epoch 10 Batch 120/172] avg loss 0.00767402, throughput 2.70586K wps
[Epoch 10 Batch 150/172] avg loss 0.00717885, throughput 2.79456K wps
Begin Testing...
[Epoch 10] train avg loss 0.00740074, dev acc 0.8417, dev avg loss 0.367184, throughput 2.76393K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/172] avg loss 0.00688197, throughput 2.69193K wps
[Epoch 11 Batch 60/172] avg loss 0.00712591, throughput 2.71183K wps
[Epoch 11 Batch 90/172] avg loss 0.0069986, throughput 2.86255K wps
[Epoch 11 Batch 120/172] avg loss 0.00680128, throughput 2.78459K wps
[Epoch 11 Batch 150/172] avg loss 0.00696962, throughput 2.78238K wps
Begin Testing...
[Epoch 11] train avg loss 0.00697421, dev acc 0.8700, dev avg loss 0.344394, throughput 2.7605K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/172] avg loss 0.0065241, throughput 2.86048K wps
[Epoch 12 Batch 60/172] avg loss 0.00662805, throughput 2.76115K wps
[Epoch 12 Batch 90/172] avg loss 0.00678966, throughput 2.71008K wps
[Epoch 12 Batch 120/172] avg loss 0.00670952, throughput 2.81958K wps
[Epoch 12 Batch 150/172] avg loss 0.00669365, throughput 2.63089K wps
Begin Testing...
[Epoch 12] train avg loss 0.00663949, dev acc 0.8711, dev avg loss 0.33092, throughput 2.74907K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/172] avg loss 0.00623887, throughput 2.79954K wps
[Epoch 13 Batch 60/172] avg loss 0.00665627, throughput 2.7596K wps
[Epoch 13 Batch 90/172] avg loss 0.00643556, throughput 2.77652K wps
[Epoch 13 Batch 120/172] avg loss 0.00642379, throughput 2.74961K wps
[Epoch 13 Batch 150/172] avg loss 0.00604633, throughput 2.80318K wps
Begin Testing...
[Epoch 13] train avg loss 0.0063191, dev acc 0.8889, dev avg loss 0.319008, throughput 2.77683K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/172] avg loss 0.00636208, throughput 2.7764K wps
[Epoch 14 Batch 60/172] avg loss 0.00607271, throughput 2.61765K wps
[Epoch 14 Batch 90/172] avg loss 0.00605953, throughput 2.69076K wps
[Epoch 14 Batch 120/172] avg loss 0.0063374, throughput 2.75365K wps
[Epoch 14 Batch 150/172] avg loss 0.00540225, throughput 2.64865K wps
Begin Testing...
[Epoch 14] train avg loss 0.00605731, dev acc 0.8868, dev avg loss 0.311268, throughput 2.70969K wps
[Epoch 15 Batch 30/172] avg loss 0.00585507, throughput 2.81343K wps
[Epoch 15 Batch 60/172] avg loss 0.00567421, throughput 2.73717K wps
[Epoch 15 Batch 90/172] avg loss 0.00537615, throughput 2.75537K wps
[Epoch 15 Batch 120/172] avg loss 0.00616933, throughput 2.70967K wps
[Epoch 15 Batch 150/172] avg loss 0.00601172, throughput 2.56268K wps
Begin Testing...
[Epoch 15] train avg loss 0.00579281, dev acc 0.8899, dev avg loss 0.301544, throughput 2.72472K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/172] avg loss 0.00541363, throughput 2.75178K wps
[Epoch 16 Batch 60/172] avg loss 0.0060939, throughput 2.52727K wps
[Epoch 16 Batch 90/172] avg loss 0.00529784, throughput 2.79125K wps
[Epoch 16 Batch 120/172] avg loss 0.00556229, throughput 2.75697K wps
[Epoch 16 Batch 150/172] avg loss 0.00569516, throughput 2.6016K wps
Begin Testing...
[Epoch 16] train avg loss 0.00557021, dev acc 0.8931, dev avg loss 0.295785, throughput 2.70007K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/172] avg loss 0.005526, throughput 2.81046K wps
[Epoch 17 Batch 60/172] avg loss 0.00511834, throughput 2.67226K wps
[Epoch 17 Batch 90/172] avg loss 0.00543857, throughput 2.69808K wps
[Epoch 17 Batch 120/172] avg loss 0.0053515, throughput 2.74756K wps
[Epoch 17 Batch 150/172] avg loss 0.00587533, throughput 2.78313K wps
Begin Testing...
[Epoch 17] train avg loss 0.00544643, dev acc 0.8931, dev avg loss 0.289748, throughput 2.74511K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/172] avg loss 0.00541547, throughput 2.71987K wps
[Epoch 18 Batch 60/172] avg loss 0.00499411, throughput 2.80532K wps
[Epoch 18 Batch 90/172] avg loss 0.00538614, throughput 2.72296K wps
[Epoch 18 Batch 120/172] avg loss 0.00488918, throughput 2.79249K wps
[Epoch 18 Batch 150/172] avg loss 0.00563513, throughput 2.80135K wps
Begin Testing...
[Epoch 18] train avg loss 0.00531556, dev acc 0.9025, dev avg loss 0.285484, throughput 2.76509K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/172] avg loss 0.00508111, throughput 2.7522K wps
[Epoch 19 Batch 60/172] avg loss 0.00478034, throughput 2.85972K wps
[Epoch 19 Batch 90/172] avg loss 0.0051471, throughput 2.70322K wps
[Epoch 19 Batch 120/172] avg loss 0.00519238, throughput 2.78885K wps
[Epoch 19 Batch 150/172] avg loss 0.00523214, throughput 2.72213K wps
Begin Testing...
[Epoch 19] train avg loss 0.00513526, dev acc 0.9036, dev avg loss 0.281429, throughput 2.78129K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/172] avg loss 0.00529872, throughput 2.76577K wps
[Epoch 20 Batch 60/172] avg loss 0.00462951, throughput 2.80129K wps
[Epoch 20 Batch 90/172] avg loss 0.00496299, throughput 2.80563K wps
[Epoch 20 Batch 120/172] avg loss 0.00481824, throughput 2.65143K wps
[Epoch 20 Batch 150/172] avg loss 0.00540663, throughput 2.82557K wps
Begin Testing...
[Epoch 20] train avg loss 0.00499096, dev acc 0.9046, dev avg loss 0.278695, throughput 2.77049K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/172] avg loss 0.00447379, throughput 2.71962K wps
[Epoch 21 Batch 60/172] avg loss 0.00470366, throughput 2.71495K wps
[Epoch 21 Batch 90/172] avg loss 0.00528733, throughput 2.71832K wps
[Epoch 21 Batch 120/172] avg loss 0.00477801, throughput 2.75608K wps
[Epoch 21 Batch 150/172] avg loss 0.00431154, throughput 2.78929K wps
Begin Testing...
[Epoch 21] train avg loss 0.00477094, dev acc 0.9057, dev avg loss 0.275344, throughput 2.72358K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/172] avg loss 0.00488276, throughput 2.84189K wps
[Epoch 22 Batch 60/172] avg loss 0.00479411, throughput 2.64481K wps
[Epoch 22 Batch 90/172] avg loss 0.00473587, throughput 2.81503K wps
[Epoch 22 Batch 120/172] avg loss 0.00444735, throughput 2.75153K wps
[Epoch 22 Batch 150/172] avg loss 0.00450501, throughput 2.72965K wps
Begin Testing...
[Epoch 22] train avg loss 0.00465124, dev acc 0.9036, dev avg loss 0.272579, throughput 2.73806K wps
[Epoch 23 Batch 30/172] avg loss 0.00429775, throughput 2.67256K wps
[Epoch 23 Batch 60/172] avg loss 0.0046306, throughput 2.71925K wps
[Epoch 23 Batch 90/172] avg loss 0.00481733, throughput 2.67793K wps
[Epoch 23 Batch 120/172] avg loss 0.00474854, throughput 2.7699K wps
[Epoch 23 Batch 150/172] avg loss 0.00466242, throughput 2.71647K wps
Begin Testing...
[Epoch 23] train avg loss 0.00457313, dev acc 0.9046, dev avg loss 0.271762, throughput 2.71874K wps
[Epoch 24 Batch 30/172] avg loss 0.00443744, throughput 2.70198K wps
[Epoch 24 Batch 60/172] avg loss 0.00437595, throughput 2.68199K wps
[Epoch 24 Batch 90/172] avg loss 0.00453134, throughput 2.7308K wps
[Epoch 24 Batch 120/172] avg loss 0.00475098, throughput 2.74799K wps
[Epoch 24 Batch 150/172] avg loss 0.00440335, throughput 2.77286K wps
Begin Testing...
[Epoch 24] train avg loss 0.00448695, dev acc 0.9036, dev avg loss 0.269277, throughput 2.73307K wps
[Epoch 25 Batch 30/172] avg loss 0.00455458, throughput 2.7911K wps
[Epoch 25 Batch 60/172] avg loss 0.00432152, throughput 2.74334K wps
[Epoch 25 Batch 90/172] avg loss 0.00407983, throughput 2.79171K wps
[Epoch 25 Batch 120/172] avg loss 0.00429959, throughput 2.80872K wps
[Epoch 25 Batch 150/172] avg loss 0.00449776, throughput 2.81235K wps
Begin Testing...
[Epoch 25] train avg loss 0.00433306, dev acc 0.9046, dev avg loss 0.268205, throughput 2.79326K wps
[Epoch 26 Batch 30/172] avg loss 0.00436407, throughput 2.74452K wps
[Epoch 26 Batch 60/172] avg loss 0.00416434, throughput 2.79966K wps
[Epoch 26 Batch 90/172] avg loss 0.00399204, throughput 2.81757K wps
[Epoch 26 Batch 120/172] avg loss 0.00449747, throughput 2.81405K wps
[Epoch 26 Batch 150/172] avg loss 0.00430512, throughput 2.82768K wps
Begin Testing...
[Epoch 26] train avg loss 0.00416449, dev acc 0.9025, dev avg loss 0.266376, throughput 2.80036K wps
[Epoch 27 Batch 30/172] avg loss 0.00441707, throughput 2.86713K wps
[Epoch 27 Batch 60/172] avg loss 0.00380548, throughput 2.82538K wps
[Epoch 27 Batch 90/172] avg loss 0.00362349, throughput 2.83328K wps
[Epoch 27 Batch 120/172] avg loss 0.00398941, throughput 2.63407K wps
[Epoch 27 Batch 150/172] avg loss 0.0044293, throughput 2.76467K wps
Begin Testing...
[Epoch 27] train avg loss 0.00408727, dev acc 0.9036, dev avg loss 0.264681, throughput 2.79033K wps
[Epoch 28 Batch 30/172] avg loss 0.00441718, throughput 2.81066K wps
[Epoch 28 Batch 60/172] avg loss 0.00392989, throughput 2.75496K wps
[Epoch 28 Batch 90/172] avg loss 0.00394157, throughput 2.71868K wps
[Epoch 28 Batch 120/172] avg loss 0.00394876, throughput 2.80697K wps
[Epoch 28 Batch 150/172] avg loss 0.00375923, throughput 2.7647K wps
Begin Testing...
[Epoch 28] train avg loss 0.00398904, dev acc 0.9046, dev avg loss 0.264213, throughput 2.77247K wps
[Epoch 29 Batch 30/172] avg loss 0.00387621, throughput 2.74486K wps
[Epoch 29 Batch 60/172] avg loss 0.00358242, throughput 2.76775K wps
[Epoch 29 Batch 90/172] avg loss 0.00405507, throughput 2.7872K wps
[Epoch 29 Batch 120/172] avg loss 0.00384752, throughput 2.78169K wps
[Epoch 29 Batch 150/172] avg loss 0.00447601, throughput 2.77922K wps
Begin Testing...
[Epoch 29] train avg loss 0.0039094, dev acc 0.9057, dev avg loss 0.263609, throughput 2.76854K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/172] avg loss 0.00370844, throughput 2.75731K wps
[Epoch 30 Batch 60/172] avg loss 0.0038816, throughput 2.80821K wps
[Epoch 30 Batch 90/172] avg loss 0.00347126, throughput 2.81587K wps
[Epoch 30 Batch 120/172] avg loss 0.0041809, throughput 2.77516K wps
[Epoch 30 Batch 150/172] avg loss 0.00382309, throughput 2.79448K wps
Begin Testing...
[Epoch 30] train avg loss 0.00382401, dev acc 0.9057, dev avg loss 0.265864, throughput 2.78596K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/172] avg loss 0.00353641, throughput 2.69485K wps
[Epoch 31 Batch 60/172] avg loss 0.0037109, throughput 2.84283K wps
[Epoch 31 Batch 90/172] avg loss 0.00328132, throughput 2.81398K wps
[Epoch 31 Batch 120/172] avg loss 0.00361807, throughput 2.81617K wps
[Epoch 31 Batch 150/172] avg loss 0.00379687, throughput 2.78872K wps
Begin Testing...
[Epoch 31] train avg loss 0.00368652, dev acc 0.9036, dev avg loss 0.26353, throughput 2.78712K wps
[Epoch 32 Batch 30/172] avg loss 0.00361423, throughput 2.65106K wps
[Epoch 32 Batch 60/172] avg loss 0.00376457, throughput 2.67278K wps
[Epoch 32 Batch 90/172] avg loss 0.00360959, throughput 2.64786K wps
[Epoch 32 Batch 120/172] avg loss 0.00359716, throughput 2.75905K wps
[Epoch 32 Batch 150/172] avg loss 0.00350042, throughput 2.75835K wps
Begin Testing...
[Epoch 32] train avg loss 0.00358072, dev acc 0.9088, dev avg loss 0.265034, throughput 2.69463K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/172] avg loss 0.00340457, throughput 2.70734K wps
[Epoch 33 Batch 60/172] avg loss 0.00373396, throughput 2.77412K wps
[Epoch 33 Batch 90/172] avg loss 0.00383247, throughput 2.7032K wps
[Epoch 33 Batch 120/172] avg loss 0.00306928, throughput 2.77709K wps
[Epoch 33 Batch 150/172] avg loss 0.00353988, throughput 2.76359K wps
Begin Testing...
[Epoch 33] train avg loss 0.00355067, dev acc 0.9057, dev avg loss 0.262977, throughput 2.75788K wps
[Epoch 34 Batch 30/172] avg loss 0.00308901, throughput 2.87222K wps
[Epoch 34 Batch 60/172] avg loss 0.00346568, throughput 2.77101K wps
[Epoch 34 Batch 90/172] avg loss 0.00417276, throughput 2.8836K wps
[Epoch 34 Batch 120/172] avg loss 0.00319952, throughput 2.82893K wps
[Epoch 34 Batch 150/172] avg loss 0.00337853, throughput 2.70498K wps
Begin Testing...
[Epoch 34] train avg loss 0.00342672, dev acc 0.9067, dev avg loss 0.264448, throughput 2.81002K wps
[Epoch 35 Batch 30/172] avg loss 0.00315986, throughput 2.80169K wps
[Epoch 35 Batch 60/172] avg loss 0.00337496, throughput 2.81076K wps
[Epoch 35 Batch 90/172] avg loss 0.00339235, throughput 2.68968K wps
[Epoch 35 Batch 120/172] avg loss 0.00332144, throughput 2.71044K wps
[Epoch 35 Batch 150/172] avg loss 0.00345005, throughput 2.73919K wps
Begin Testing...
[Epoch 35] train avg loss 0.00331177, dev acc 0.9036, dev avg loss 0.26406, throughput 2.74482K wps
[Epoch 36 Batch 30/172] avg loss 0.003636, throughput 2.79101K wps
[Epoch 36 Batch 60/172] avg loss 0.00334931, throughput 2.79278K wps
[Epoch 36 Batch 90/172] avg loss 0.0032314, throughput 2.80693K wps
[Epoch 36 Batch 120/172] avg loss 0.00308854, throughput 2.80817K wps
[Epoch 36 Batch 150/172] avg loss 0.00310889, throughput 2.78297K wps
Begin Testing...
[Epoch 36] train avg loss 0.00332888, dev acc 0.9036, dev avg loss 0.263433, throughput 2.79824K wps
[Epoch 37 Batch 30/172] avg loss 0.00314409, throughput 2.82292K wps
[Epoch 37 Batch 60/172] avg loss 0.0030412, throughput 2.80159K wps
[Epoch 37 Batch 90/172] avg loss 0.00369915, throughput 2.64694K wps
[Epoch 37 Batch 120/172] avg loss 0.00321145, throughput 2.76759K wps
[Epoch 37 Batch 150/172] avg loss 0.00315473, throughput 2.69643K wps
Begin Testing...
[Epoch 37] train avg loss 0.00322177, dev acc 0.9057, dev avg loss 0.264007, throughput 2.74848K wps
[Epoch 38 Batch 30/172] avg loss 0.00314741, throughput 2.52805K wps
[Epoch 38 Batch 60/172] avg loss 0.00339283, throughput 2.73228K wps
[Epoch 38 Batch 90/172] avg loss 0.00310925, throughput 2.81438K wps
[Epoch 38 Batch 120/172] avg loss 0.00308695, throughput 2.67877K wps
[Epoch 38 Batch 150/172] avg loss 0.00294351, throughput 2.71484K wps
Begin Testing...
[Epoch 38] train avg loss 0.00309006, dev acc 0.9046, dev avg loss 0.264599, throughput 2.70196K wps
[Epoch 39 Batch 30/172] avg loss 0.00322894, throughput 2.62811K wps
[Epoch 39 Batch 60/172] avg loss 0.00267922, throughput 2.74422K wps
[Epoch 39 Batch 90/172] avg loss 0.00322318, throughput 2.62113K wps
[Epoch 39 Batch 120/172] avg loss 0.00301272, throughput 2.59377K wps
[Epoch 39 Batch 150/172] avg loss 0.00317426, throughput 2.72559K wps
Begin Testing...
[Epoch 39] train avg loss 0.00307978, dev acc 0.9057, dev avg loss 0.264835, throughput 2.67424K wps
[Epoch 40 Batch 30/172] avg loss 0.00290167, throughput 2.82048K wps
[Epoch 40 Batch 60/172] avg loss 0.00278887, throughput 2.81574K wps
[Epoch 40 Batch 90/172] avg loss 0.00328781, throughput 2.71109K wps
[Epoch 40 Batch 120/172] avg loss 0.00291149, throughput 2.79448K wps
[Epoch 40 Batch 150/172] avg loss 0.00317167, throughput 2.80061K wps
Begin Testing...
[Epoch 40] train avg loss 0.00299508, dev acc 0.9046, dev avg loss 0.265575, throughput 2.78685K wps
[Epoch 41 Batch 30/172] avg loss 0.00286077, throughput 2.80072K wps
[Epoch 41 Batch 60/172] avg loss 0.00276694, throughput 2.75125K wps
[Epoch 41 Batch 90/172] avg loss 0.00302439, throughput 2.8681K wps
[Epoch 41 Batch 120/172] avg loss 0.00333506, throughput 2.80626K wps
[Epoch 41 Batch 150/172] avg loss 0.00272849, throughput 2.84206K wps
Begin Testing...
[Epoch 41] train avg loss 0.00294863, dev acc 0.9036, dev avg loss 0.267208, throughput 2.81517K wps
[Epoch 42 Batch 30/172] avg loss 0.00262943, throughput 2.84854K wps
[Epoch 42 Batch 60/172] avg loss 0.00280067, throughput 2.83835K wps
[Epoch 42 Batch 90/172] avg loss 0.00286341, throughput 2.68887K wps
[Epoch 42 Batch 120/172] avg loss 0.00306341, throughput 2.69329K wps
[Epoch 42 Batch 150/172] avg loss 0.0026229, throughput 2.84265K wps
Begin Testing...
[Epoch 42] train avg loss 0.0028143, dev acc 0.9057, dev avg loss 0.268717, throughput 2.77255K wps
[Epoch 43 Batch 30/172] avg loss 0.0025706, throughput 2.82766K wps
[Epoch 43 Batch 60/172] avg loss 0.00291418, throughput 2.73454K wps
[Epoch 43 Batch 90/172] avg loss 0.00305374, throughput 2.64469K wps
[Epoch 43 Batch 120/172] avg loss 0.00282442, throughput 2.75188K wps
[Epoch 43 Batch 150/172] avg loss 0.00279495, throughput 2.77406K wps
Begin Testing...
[Epoch 43] train avg loss 0.00284216, dev acc 0.9057, dev avg loss 0.269092, throughput 2.74254K wps
[Epoch 44 Batch 30/172] avg loss 0.002811, throughput 2.83958K wps
[Epoch 44 Batch 60/172] avg loss 0.00274636, throughput 2.79691K wps
[Epoch 44 Batch 90/172] avg loss 0.00279992, throughput 2.73986K wps
[Epoch 44 Batch 120/172] avg loss 0.00267293, throughput 2.73379K wps
[Epoch 44 Batch 150/172] avg loss 0.00250427, throughput 2.77836K wps
Begin Testing...
[Epoch 44] train avg loss 0.00276498, dev acc 0.9078, dev avg loss 0.270241, throughput 2.76575K wps
[Epoch 45 Batch 30/172] avg loss 0.00281346, throughput 2.82649K wps
[Epoch 45 Batch 60/172] avg loss 0.00255719, throughput 2.81376K wps
[Epoch 45 Batch 90/172] avg loss 0.00266051, throughput 2.80689K wps
[Epoch 45 Batch 120/172] avg loss 0.00258473, throughput 2.80261K wps
[Epoch 45 Batch 150/172] avg loss 0.00291455, throughput 2.66248K wps
Begin Testing...
[Epoch 45] train avg loss 0.00269922, dev acc 0.9046, dev avg loss 0.2722, throughput 2.78791K wps
[Epoch 46 Batch 30/172] avg loss 0.00235498, throughput 2.82111K wps
[Epoch 46 Batch 60/172] avg loss 0.00266457, throughput 2.75046K wps
[Epoch 46 Batch 90/172] avg loss 0.00278193, throughput 2.82329K wps
[Epoch 46 Batch 120/172] avg loss 0.00267742, throughput 2.81174K wps
[Epoch 46 Batch 150/172] avg loss 0.00267233, throughput 2.82634K wps
Begin Testing...
[Epoch 46] train avg loss 0.00263864, dev acc 0.9057, dev avg loss 0.271827, throughput 2.80476K wps
[Epoch 47 Batch 30/172] avg loss 0.00250553, throughput 2.85438K wps
[Epoch 47 Batch 60/172] avg loss 0.00273805, throughput 2.77701K wps
[Epoch 47 Batch 90/172] avg loss 0.00271192, throughput 2.8113K wps
[Epoch 47 Batch 120/172] avg loss 0.00273649, throughput 2.80691K wps
[Epoch 47 Batch 150/172] avg loss 0.00260036, throughput 2.7994K wps
Begin Testing...
[Epoch 47] train avg loss 0.00263336, dev acc 0.9046, dev avg loss 0.274937, throughput 2.81355K wps
[Epoch 48 Batch 30/172] avg loss 0.00244428, throughput 2.7034K wps
[Epoch 48 Batch 60/172] avg loss 0.00254771, throughput 2.75495K wps
[Epoch 48 Batch 90/172] avg loss 0.00240388, throughput 2.78163K wps
[Epoch 48 Batch 120/172] avg loss 0.0027475, throughput 2.81663K wps
[Epoch 48 Batch 150/172] avg loss 0.00229827, throughput 2.806K wps
Begin Testing...
[Epoch 48] train avg loss 0.00251894, dev acc 0.9057, dev avg loss 0.276239, throughput 2.77791K wps
[Epoch 49 Batch 30/172] avg loss 0.00254875, throughput 2.72651K wps
[Epoch 49 Batch 60/172] avg loss 0.00220279, throughput 2.72522K wps
[Epoch 49 Batch 90/172] avg loss 0.00247574, throughput 2.79285K wps
[Epoch 49 Batch 120/172] avg loss 0.00230078, throughput 2.79056K wps
[Epoch 49 Batch 150/172] avg loss 0.00268042, throughput 2.78603K wps
Begin Testing...
[Epoch 49] train avg loss 0.00247326, dev acc 0.9046, dev avg loss 0.277066, throughput 2.76928K wps
[Epoch 50 Batch 30/172] avg loss 0.00240862, throughput 2.88308K wps
[Epoch 50 Batch 60/172] avg loss 0.00233322, throughput 2.80719K wps
[Epoch 50 Batch 90/172] avg loss 0.00225388, throughput 2.82373K wps
[Epoch 50 Batch 120/172] avg loss 0.00252883, throughput 2.78365K wps
[Epoch 50 Batch 150/172] avg loss 0.00254408, throughput 2.78392K wps
Begin Testing...
[Epoch 50] train avg loss 0.00242632, dev acc 0.9025, dev avg loss 0.278143, throughput 2.80982K wps
[Epoch 51 Batch 30/172] avg loss 0.00222934, throughput 2.80697K wps
[Epoch 51 Batch 60/172] avg loss 0.00206564, throughput 2.79367K wps
[Epoch 51 Batch 90/172] avg loss 0.00230698, throughput 2.80814K wps
[Epoch 51 Batch 120/172] avg loss 0.0028302, throughput 2.74915K wps
[Epoch 51 Batch 150/172] avg loss 0.00251086, throughput 2.75594K wps
Begin Testing...
[Epoch 51] train avg loss 0.00238064, dev acc 0.9036, dev avg loss 0.278673, throughput 2.78286K wps
[Epoch 52 Batch 30/172] avg loss 0.00214192, throughput 2.81038K wps
[Epoch 52 Batch 60/172] avg loss 0.0021763, throughput 2.80168K wps
[Epoch 52 Batch 90/172] avg loss 0.00264943, throughput 2.7747K wps
[Epoch 52 Batch 120/172] avg loss 0.00243499, throughput 2.81289K wps
[Epoch 52 Batch 150/172] avg loss 0.00210524, throughput 2.8043K wps
Begin Testing...
[Epoch 52] train avg loss 0.00234565, dev acc 0.9046, dev avg loss 0.280635, throughput 2.79581K wps
[Epoch 53 Batch 30/172] avg loss 0.00240381, throughput 2.86457K wps
[Epoch 53 Batch 60/172] avg loss 0.0023306, throughput 2.7919K wps
[Epoch 53 Batch 90/172] avg loss 0.00214725, throughput 2.77833K wps
[Epoch 53 Batch 120/172] avg loss 0.00222067, throughput 2.76282K wps
[Epoch 53 Batch 150/172] avg loss 0.00233441, throughput 2.77221K wps
Begin Testing...
[Epoch 53] train avg loss 0.00236446, dev acc 0.9057, dev avg loss 0.280222, throughput 2.7854K wps
[Epoch 54 Batch 30/172] avg loss 0.00234644, throughput 2.74968K wps
[Epoch 54 Batch 60/172] avg loss 0.00197758, throughput 2.77884K wps
[Epoch 54 Batch 90/172] avg loss 0.00240886, throughput 2.76617K wps
[Epoch 54 Batch 120/172] avg loss 0.00252331, throughput 2.80048K wps
[Epoch 54 Batch 150/172] avg loss 0.00235976, throughput 2.78159K wps
Begin Testing...
[Epoch 54] train avg loss 0.00232294, dev acc 0.9036, dev avg loss 0.282011, throughput 2.77961K wps
[Epoch 55 Batch 30/172] avg loss 0.002152, throughput 2.74672K wps
[Epoch 55 Batch 60/172] avg loss 0.00204487, throughput 2.78947K wps
[Epoch 55 Batch 90/172] avg loss 0.00213601, throughput 2.78784K wps
[Epoch 55 Batch 120/172] avg loss 0.00238714, throughput 2.75165K wps
[Epoch 55 Batch 150/172] avg loss 0.00261464, throughput 2.79107K wps
Begin Testing...
[Epoch 55] train avg loss 0.00223195, dev acc 0.9036, dev avg loss 0.285378, throughput 2.77495K wps
[Epoch 56 Batch 30/172] avg loss 0.00221494, throughput 2.67552K wps
[Epoch 56 Batch 60/172] avg loss 0.00212406, throughput 2.75021K wps
[Epoch 56 Batch 90/172] avg loss 0.00199088, throughput 2.74521K wps
[Epoch 56 Batch 120/172] avg loss 0.0022598, throughput 2.73868K wps
[Epoch 56 Batch 150/172] avg loss 0.00214739, throughput 2.7963K wps
Begin Testing...
[Epoch 56] train avg loss 0.00217264, dev acc 0.9025, dev avg loss 0.287439, throughput 2.74942K wps
[Epoch 57 Batch 30/172] avg loss 0.00243202, throughput 2.78897K wps
[Epoch 57 Batch 60/172] avg loss 0.00180112, throughput 2.79178K wps
[Epoch 57 Batch 90/172] avg loss 0.00272897, throughput 2.76446K wps
[Epoch 57 Batch 120/172] avg loss 0.00207487, throughput 2.82093K wps
[Epoch 57 Batch 150/172] avg loss 0.0020794, throughput 2.80903K wps
Begin Testing...
[Epoch 57] train avg loss 0.00217564, dev acc 0.9036, dev avg loss 0.287979, throughput 2.7944K wps
[Epoch 58 Batch 30/172] avg loss 0.00214439, throughput 2.83547K wps
[Epoch 58 Batch 60/172] avg loss 0.00199426, throughput 2.75471K wps
[Epoch 58 Batch 90/172] avg loss 0.00203042, throughput 2.58252K wps
[Epoch 58 Batch 120/172] avg loss 0.00234829, throughput 2.68728K wps
[Epoch 58 Batch 150/172] avg loss 0.00222029, throughput 2.78625K wps
Begin Testing...
[Epoch 58] train avg loss 0.00212555, dev acc 0.9036, dev avg loss 0.290131, throughput 2.74101K wps
[Epoch 59 Batch 30/172] avg loss 0.00227097, throughput 2.84508K wps
[Epoch 59 Batch 60/172] avg loss 0.00201781, throughput 2.77868K wps
[Epoch 59 Batch 90/172] avg loss 0.00202506, throughput 2.80913K wps
[Epoch 59 Batch 120/172] avg loss 0.00191659, throughput 2.72738K wps
[Epoch 59 Batch 150/172] avg loss 0.00200407, throughput 2.62541K wps
Begin Testing...
[Epoch 59] train avg loss 0.00207569, dev acc 0.9015, dev avg loss 0.292112, throughput 2.74538K wps
[Epoch 60 Batch 30/172] avg loss 0.00196139, throughput 2.76841K wps
[Epoch 60 Batch 60/172] avg loss 0.00178849, throughput 2.80889K wps
[Epoch 60 Batch 90/172] avg loss 0.00195759, throughput 2.77405K wps
[Epoch 60 Batch 120/172] avg loss 0.00217681, throughput 2.605K wps
[Epoch 60 Batch 150/172] avg loss 0.00222112, throughput 2.77312K wps
Begin Testing...
[Epoch 60] train avg loss 0.00204355, dev acc 0.8994, dev avg loss 0.293896, throughput 2.7536K wps
[Epoch 61 Batch 30/172] avg loss 0.00170665, throughput 2.74035K wps
[Epoch 61 Batch 60/172] avg loss 0.00226072, throughput 2.82717K wps
[Epoch 61 Batch 90/172] avg loss 0.00188166, throughput 2.6482K wps
[Epoch 61 Batch 120/172] avg loss 0.00196511, throughput 2.6272K wps
[Epoch 61 Batch 150/172] avg loss 0.00203363, throughput 2.78719K wps
Begin Testing...
[Epoch 61] train avg loss 0.00195346, dev acc 0.9004, dev avg loss 0.296244, throughput 2.74465K wps
[Epoch 62 Batch 30/172] avg loss 0.00185371, throughput 2.85551K wps
[Epoch 62 Batch 60/172] avg loss 0.00203998, throughput 2.74407K wps
[Epoch 62 Batch 90/172] avg loss 0.00206101, throughput 2.79767K wps
[Epoch 62 Batch 120/172] avg loss 0.00196719, throughput 2.83363K wps
[Epoch 62 Batch 150/172] avg loss 0.00187186, throughput 2.81196K wps
Begin Testing...
[Epoch 62] train avg loss 0.00198764, dev acc 0.8994, dev avg loss 0.295373, throughput 2.80969K wps
[Epoch 63 Batch 30/172] avg loss 0.00202665, throughput 2.84505K wps
[Epoch 63 Batch 60/172] avg loss 0.00211393, throughput 2.80929K wps
[Epoch 63 Batch 90/172] avg loss 0.00177642, throughput 2.75936K wps
[Epoch 63 Batch 120/172] avg loss 0.002029, throughput 2.71764K wps
[Epoch 63 Batch 150/172] avg loss 0.00220173, throughput 2.77208K wps
Begin Testing...
[Epoch 63] train avg loss 0.00200428, dev acc 0.8983, dev avg loss 0.296471, throughput 2.78607K wps
[Epoch 64 Batch 30/172] avg loss 0.00170336, throughput 2.84441K wps
[Epoch 64 Batch 60/172] avg loss 0.00222511, throughput 2.80138K wps
[Epoch 64 Batch 90/172] avg loss 0.00173025, throughput 2.77786K wps
[Epoch 64 Batch 120/172] avg loss 0.00202521, throughput 2.79915K wps
[Epoch 64 Batch 150/172] avg loss 0.00201181, throughput 2.6475K wps
Begin Testing...
[Epoch 64] train avg loss 0.00195172, dev acc 0.9015, dev avg loss 0.297599, throughput 2.75615K wps
[Epoch 65 Batch 30/172] avg loss 0.00180622, throughput 2.79143K wps
[Epoch 65 Batch 60/172] avg loss 0.0018458, throughput 2.62029K wps
[Epoch 65 Batch 90/172] avg loss 0.00178117, throughput 2.75325K wps
[Epoch 65 Batch 120/172] avg loss 0.00192537, throughput 2.77439K wps
[Epoch 65 Batch 150/172] avg loss 0.00214027, throughput 2.74767K wps
Begin Testing...
[Epoch 65] train avg loss 0.00189144, dev acc 0.9057, dev avg loss 0.301816, throughput 2.72802K wps
[Epoch 66 Batch 30/172] avg loss 0.00179919, throughput 2.83922K wps
[Epoch 66 Batch 60/172] avg loss 0.00189875, throughput 2.73625K wps
[Epoch 66 Batch 90/172] avg loss 0.00186935, throughput 2.689K wps
[Epoch 66 Batch 120/172] avg loss 0.00196376, throughput 2.69582K wps
[Epoch 66 Batch 150/172] avg loss 0.00187701, throughput 2.72149K wps
Begin Testing...
[Epoch 66] train avg loss 0.00189121, dev acc 0.9036, dev avg loss 0.302487, throughput 2.72911K wps
[Epoch 67 Batch 30/172] avg loss 0.00191143, throughput 2.84981K wps
[Epoch 67 Batch 60/172] avg loss 0.00159821, throughput 2.66093K wps
[Epoch 67 Batch 90/172] avg loss 0.00180079, throughput 2.84193K wps
[Epoch 67 Batch 120/172] avg loss 0.00167104, throughput 2.77396K wps
[Epoch 67 Batch 150/172] avg loss 0.00191922, throughput 2.68205K wps
Begin Testing...
[Epoch 67] train avg loss 0.00180333, dev acc 0.9078, dev avg loss 0.306032, throughput 2.75403K wps
[Epoch 68 Batch 30/172] avg loss 0.00167945, throughput 2.83388K wps
[Epoch 68 Batch 60/172] avg loss 0.00194352, throughput 2.79819K wps
[Epoch 68 Batch 90/172] avg loss 0.00176791, throughput 2.77371K wps
[Epoch 68 Batch 120/172] avg loss 0.00180475, throughput 2.78567K wps
[Epoch 68 Batch 150/172] avg loss 0.00186778, throughput 2.71449K wps
Begin Testing...
[Epoch 68] train avg loss 0.00181936, dev acc 0.8994, dev avg loss 0.305415, throughput 2.76008K wps
[Epoch 69 Batch 30/172] avg loss 0.00175159, throughput 2.82613K wps
[Epoch 69 Batch 60/172] avg loss 0.00198517, throughput 2.75853K wps
[Epoch 69 Batch 90/172] avg loss 0.00166019, throughput 2.69731K wps
[Epoch 69 Batch 120/172] avg loss 0.00194847, throughput 2.77592K wps
[Epoch 69 Batch 150/172] avg loss 0.00174213, throughput 2.69792K wps
Begin Testing...
[Epoch 69] train avg loss 0.00181118, dev acc 0.8983, dev avg loss 0.305522, throughput 2.75485K wps
[Epoch 70 Batch 30/172] avg loss 0.00165232, throughput 2.81637K wps
[Epoch 70 Batch 60/172] avg loss 0.00193797, throughput 2.76187K wps
[Epoch 70 Batch 90/172] avg loss 0.00153811, throughput 2.69727K wps
[Epoch 70 Batch 120/172] avg loss 0.00160436, throughput 2.80244K wps
[Epoch 70 Batch 150/172] avg loss 0.00192409, throughput 2.76064K wps
Begin Testing...
[Epoch 70] train avg loss 0.00177261, dev acc 0.9015, dev avg loss 0.307278, throughput 2.76844K wps
[Epoch 71 Batch 30/172] avg loss 0.00203799, throughput 2.71357K wps
[Epoch 71 Batch 60/172] avg loss 0.00150024, throughput 2.80206K wps
[Epoch 71 Batch 90/172] avg loss 0.00160273, throughput 2.6244K wps
[Epoch 71 Batch 120/172] avg loss 0.00176503, throughput 2.68194K wps
[Epoch 71 Batch 150/172] avg loss 0.00196859, throughput 2.80111K wps
Begin Testing...
[Epoch 71] train avg loss 0.00175071, dev acc 0.8973, dev avg loss 0.308977, throughput 2.72631K wps
[Epoch 72 Batch 30/172] avg loss 0.00210167, throughput 2.74323K wps
[Epoch 72 Batch 60/172] avg loss 0.0016807, throughput 2.75652K wps
[Epoch 72 Batch 90/172] avg loss 0.00170877, throughput 2.78767K wps
[Epoch 72 Batch 120/172] avg loss 0.00181962, throughput 2.78395K wps
[Epoch 72 Batch 150/172] avg loss 0.00177436, throughput 2.71449K wps
Begin Testing...
[Epoch 72] train avg loss 0.00177213, dev acc 0.9004, dev avg loss 0.310289, throughput 2.76145K wps
[Epoch 73 Batch 30/172] avg loss 0.00193533, throughput 2.76766K wps
[Epoch 73 Batch 60/172] avg loss 0.00158955, throughput 2.73176K wps
[Epoch 73 Batch 90/172] avg loss 0.00185931, throughput 2.75609K wps
[Epoch 73 Batch 120/172] avg loss 0.00157181, throughput 2.78724K wps
[Epoch 73 Batch 150/172] avg loss 0.00164582, throughput 2.78064K wps
Begin Testing...
[Epoch 73] train avg loss 0.00176598, dev acc 0.9036, dev avg loss 0.311709, throughput 2.75767K wps
[Epoch 74 Batch 30/172] avg loss 0.00169887, throughput 2.76712K wps
[Epoch 74 Batch 60/172] avg loss 0.00153168, throughput 2.78138K wps
[Epoch 74 Batch 90/172] avg loss 0.00176495, throughput 2.78767K wps
[Epoch 74 Batch 120/172] avg loss 0.00171033, throughput 2.78737K wps
[Epoch 74 Batch 150/172] avg loss 0.00154069, throughput 2.76986K wps
Begin Testing...
[Epoch 74] train avg loss 0.00164942, dev acc 0.8973, dev avg loss 0.314482, throughput 2.75971K wps
[Epoch 75 Batch 30/172] avg loss 0.00169773, throughput 2.81027K wps
[Epoch 75 Batch 60/172] avg loss 0.00169227, throughput 2.78267K wps
[Epoch 75 Batch 90/172] avg loss 0.0017815, throughput 2.70263K wps
[Epoch 75 Batch 120/172] avg loss 0.00150199, throughput 2.7896K wps
[Epoch 75 Batch 150/172] avg loss 0.00167896, throughput 2.74806K wps
Begin Testing...
[Epoch 75] train avg loss 0.00168386, dev acc 0.8973, dev avg loss 0.316263, throughput 2.77066K wps
[Epoch 76 Batch 30/172] avg loss 0.00154213, throughput 2.79206K wps
[Epoch 76 Batch 60/172] avg loss 0.00157918, throughput 2.86007K wps
[Epoch 76 Batch 90/172] avg loss 0.00156143, throughput 2.81508K wps
[Epoch 76 Batch 120/172] avg loss 0.00154422, throughput 2.81092K wps
[Epoch 76 Batch 150/172] avg loss 0.00184421, throughput 2.80693K wps
Begin Testing...
[Epoch 76] train avg loss 0.00159822, dev acc 0.8994, dev avg loss 0.315898, throughput 2.80618K wps
[Epoch 77 Batch 30/172] avg loss 0.00157921, throughput 2.74301K wps
[Epoch 77 Batch 60/172] avg loss 0.00168263, throughput 2.64369K wps
[Epoch 77 Batch 90/172] avg loss 0.00165648, throughput 2.80459K wps
[Epoch 77 Batch 120/172] avg loss 0.00166781, throughput 2.79464K wps
[Epoch 77 Batch 150/172] avg loss 0.00168643, throughput 2.82358K wps
Begin Testing...
[Epoch 77] train avg loss 0.0016306, dev acc 0.9004, dev avg loss 0.31706, throughput 2.76608K wps
[Epoch 78 Batch 30/172] avg loss 0.00164253, throughput 2.87756K wps
[Epoch 78 Batch 60/172] avg loss 0.001529, throughput 2.75101K wps
[Epoch 78 Batch 90/172] avg loss 0.00176147, throughput 2.76468K wps
[Epoch 78 Batch 120/172] avg loss 0.00159041, throughput 2.76299K wps
[Epoch 78 Batch 150/172] avg loss 0.00171046, throughput 2.71394K wps
Begin Testing...
[Epoch 78] train avg loss 0.00160713, dev acc 0.8994, dev avg loss 0.320417, throughput 2.74893K wps
[Epoch 79 Batch 30/172] avg loss 0.00149419, throughput 2.79392K wps
[Epoch 79 Batch 60/172] avg loss 0.00157202, throughput 2.75404K wps
[Epoch 79 Batch 90/172] avg loss 0.00159975, throughput 2.7908K wps
[Epoch 79 Batch 120/172] avg loss 0.00160457, throughput 2.77063K wps
[Epoch 79 Batch 150/172] avg loss 0.001787, throughput 2.7803K wps
Begin Testing...
[Epoch 79] train avg loss 0.00163722, dev acc 0.9036, dev avg loss 0.320833, throughput 2.77185K wps
[Epoch 80 Batch 30/172] avg loss 0.00166432, throughput 2.84382K wps
[Epoch 80 Batch 60/172] avg loss 0.0015801, throughput 2.80576K wps
[Epoch 80 Batch 90/172] avg loss 0.00156703, throughput 2.77366K wps
[Epoch 80 Batch 120/172] avg loss 0.0015458, throughput 2.79407K wps
[Epoch 80 Batch 150/172] avg loss 0.00180126, throughput 2.74862K wps
Begin Testing...
[Epoch 80] train avg loss 0.0016085, dev acc 0.8994, dev avg loss 0.322074, throughput 2.79303K wps
[Epoch 81 Batch 30/172] avg loss 0.00159754, throughput 2.75642K wps
[Epoch 81 Batch 60/172] avg loss 0.00142924, throughput 2.78279K wps
[Epoch 81 Batch 90/172] avg loss 0.00162352, throughput 2.70263K wps
[Epoch 81 Batch 120/172] avg loss 0.00147838, throughput 2.81315K wps
[Epoch 81 Batch 150/172] avg loss 0.00159367, throughput 2.71272K wps
Begin Testing...
[Epoch 81] train avg loss 0.00155559, dev acc 0.8983, dev avg loss 0.325617, throughput 2.75822K wps
[Epoch 82 Batch 30/172] avg loss 0.00157733, throughput 2.82925K wps
[Epoch 82 Batch 60/172] avg loss 0.00135355, throughput 2.72769K wps
[Epoch 82 Batch 90/172] avg loss 0.00151327, throughput 2.76016K wps
[Epoch 82 Batch 120/172] avg loss 0.00177958, throughput 2.76168K wps
[Epoch 82 Batch 150/172] avg loss 0.0013133, throughput 2.76014K wps
Begin Testing...
[Epoch 82] train avg loss 0.00154468, dev acc 0.9004, dev avg loss 0.32695, throughput 2.75335K wps
[Epoch 83 Batch 30/172] avg loss 0.00128388, throughput 2.76265K wps
[Epoch 83 Batch 60/172] avg loss 0.00150826, throughput 2.76897K wps
[Epoch 83 Batch 90/172] avg loss 0.00157674, throughput 2.78327K wps
[Epoch 83 Batch 120/172] avg loss 0.00145701, throughput 2.80297K wps
[Epoch 83 Batch 150/172] avg loss 0.00147009, throughput 2.79856K wps
Begin Testing...
[Epoch 83] train avg loss 0.00150842, dev acc 0.9004, dev avg loss 0.32635, throughput 2.7859K wps
[Epoch 84 Batch 30/172] avg loss 0.00165504, throughput 2.87543K wps
[Epoch 84 Batch 60/172] avg loss 0.00142256, throughput 2.81422K wps
[Epoch 84 Batch 90/172] avg loss 0.00150631, throughput 2.8011K wps
[Epoch 84 Batch 120/172] avg loss 0.00163703, throughput 2.78701K wps
[Epoch 84 Batch 150/172] avg loss 0.00131296, throughput 2.796K wps
Begin Testing...
[Epoch 84] train avg loss 0.00151585, dev acc 0.8994, dev avg loss 0.328782, throughput 2.81031K wps
[Epoch 85 Batch 30/172] avg loss 0.00139268, throughput 2.70688K wps
[Epoch 85 Batch 60/172] avg loss 0.00131528, throughput 2.59058K wps
[Epoch 85 Batch 90/172] avg loss 0.00165268, throughput 2.82877K wps
[Epoch 85 Batch 120/172] avg loss 0.00150357, throughput 2.78116K wps
[Epoch 85 Batch 150/172] avg loss 0.00153531, throughput 2.662K wps
Begin Testing...
[Epoch 85] train avg loss 0.0014955, dev acc 0.9036, dev avg loss 0.330789, throughput 2.71283K wps
[Epoch 86 Batch 30/172] avg loss 0.00161542, throughput 2.84398K wps
[Epoch 86 Batch 60/172] avg loss 0.00125768, throughput 2.78975K wps
[Epoch 86 Batch 90/172] avg loss 0.00124049, throughput 2.72594K wps
[Epoch 86 Batch 120/172] avg loss 0.00151118, throughput 2.78404K wps
[Epoch 86 Batch 150/172] avg loss 0.00159971, throughput 2.72677K wps
Begin Testing...
[Epoch 86] train avg loss 0.00144403, dev acc 0.9004, dev avg loss 0.331599, throughput 2.77582K wps
[Epoch 87 Batch 30/172] avg loss 0.00148745, throughput 2.52611K wps
[Epoch 87 Batch 60/172] avg loss 0.0016074, throughput 2.80325K wps
[Epoch 87 Batch 90/172] avg loss 0.00135834, throughput 2.74777K wps
[Epoch 87 Batch 120/172] avg loss 0.00147305, throughput 2.77584K wps
[Epoch 87 Batch 150/172] avg loss 0.00142181, throughput 2.76659K wps
Begin Testing...
[Epoch 87] train avg loss 0.00147735, dev acc 0.8983, dev avg loss 0.332977, throughput 2.7254K wps
[Epoch 88 Batch 30/172] avg loss 0.00144598, throughput 2.75318K wps
[Epoch 88 Batch 60/172] avg loss 0.00119113, throughput 2.58988K wps
[Epoch 88 Batch 90/172] avg loss 0.00145346, throughput 2.84507K wps
[Epoch 88 Batch 120/172] avg loss 0.00140931, throughput 2.78568K wps
[Epoch 88 Batch 150/172] avg loss 0.00178739, throughput 2.80983K wps
Begin Testing...
[Epoch 88] train avg loss 0.00145924, dev acc 0.9004, dev avg loss 0.335665, throughput 2.75899K wps
[Epoch 89 Batch 30/172] avg loss 0.00117196, throughput 2.78875K wps
[Epoch 89 Batch 60/172] avg loss 0.00149973, throughput 2.77117K wps
[Epoch 89 Batch 90/172] avg loss 0.00154143, throughput 2.65556K wps
[Epoch 89 Batch 120/172] avg loss 0.00160459, throughput 2.70215K wps
[Epoch 89 Batch 150/172] avg loss 0.0013891, throughput 2.83333K wps
Begin Testing...
[Epoch 89] train avg loss 0.00145174, dev acc 0.8973, dev avg loss 0.334484, throughput 2.74392K wps
[Epoch 90 Batch 30/172] avg loss 0.00100085, throughput 2.79266K wps
[Epoch 90 Batch 60/172] avg loss 0.00150214, throughput 2.77138K wps
[Epoch 90 Batch 90/172] avg loss 0.00153067, throughput 2.77167K wps
[Epoch 90 Batch 120/172] avg loss 0.00142222, throughput 2.78091K wps
[Epoch 90 Batch 150/172] avg loss 0.00163752, throughput 2.799K wps
Begin Testing...
[Epoch 90] train avg loss 0.00145276, dev acc 0.8983, dev avg loss 0.336652, throughput 2.78265K wps
[Epoch 91 Batch 30/172] avg loss 0.00151747, throughput 2.84556K wps
[Epoch 91 Batch 60/172] avg loss 0.00136013, throughput 2.78438K wps
[Epoch 91 Batch 90/172] avg loss 0.0013978, throughput 2.78263K wps
[Epoch 91 Batch 120/172] avg loss 0.00147021, throughput 2.78528K wps
[Epoch 91 Batch 150/172] avg loss 0.0014788, throughput 2.68454K wps
Begin Testing...
[Epoch 91] train avg loss 0.00140944, dev acc 0.9046, dev avg loss 0.340347, throughput 2.76223K wps
[Epoch 92 Batch 30/172] avg loss 0.00165664, throughput 2.76754K wps
[Epoch 92 Batch 60/172] avg loss 0.00135093, throughput 2.72852K wps
[Epoch 92 Batch 90/172] avg loss 0.00127985, throughput 2.8048K wps
[Epoch 92 Batch 120/172] avg loss 0.00127347, throughput 2.77962K wps
[Epoch 92 Batch 150/172] avg loss 0.0012834, throughput 2.77489K wps
Begin Testing...
[Epoch 92] train avg loss 0.00139343, dev acc 0.9015, dev avg loss 0.340235, throughput 2.77267K wps
[Epoch 93 Batch 30/172] avg loss 0.00158671, throughput 2.8053K wps
[Epoch 93 Batch 60/172] avg loss 0.00115001, throughput 2.77808K wps
[Epoch 93 Batch 90/172] avg loss 0.00116651, throughput 2.79765K wps
[Epoch 93 Batch 120/172] avg loss 0.0013947, throughput 2.78289K wps
[Epoch 93 Batch 150/172] avg loss 0.00152417, throughput 2.80527K wps
Begin Testing...
[Epoch 93] train avg loss 0.0013554, dev acc 0.9004, dev avg loss 0.341519, throughput 2.79345K wps
[Epoch 94 Batch 30/172] avg loss 0.00131996, throughput 2.71707K wps
[Epoch 94 Batch 60/172] avg loss 0.00137402, throughput 2.83994K wps
[Epoch 94 Batch 90/172] avg loss 0.00123777, throughput 2.79209K wps
[Epoch 94 Batch 120/172] avg loss 0.00132271, throughput 2.82051K wps
[Epoch 94 Batch 150/172] avg loss 0.00145417, throughput 2.73964K wps
Begin Testing...
[Epoch 94] train avg loss 0.00134425, dev acc 0.9015, dev avg loss 0.342593, throughput 2.7927K wps
[Epoch 95 Batch 30/172] avg loss 0.00114034, throughput 2.76263K wps
[Epoch 95 Batch 60/172] avg loss 0.00134255, throughput 2.80741K wps
[Epoch 95 Batch 90/172] avg loss 0.0012563, throughput 2.68244K wps
[Epoch 95 Batch 120/172] avg loss 0.00148403, throughput 2.72862K wps
[Epoch 95 Batch 150/172] avg loss 0.00149108, throughput 2.82512K wps
Begin Testing...
[Epoch 95] train avg loss 0.00133551, dev acc 0.8983, dev avg loss 0.34404, throughput 2.76506K wps
[Epoch 96 Batch 30/172] avg loss 0.00131555, throughput 2.82183K wps
[Epoch 96 Batch 60/172] avg loss 0.00113121, throughput 2.68089K wps
[Epoch 96 Batch 90/172] avg loss 0.00123819, throughput 2.76558K wps
[Epoch 96 Batch 120/172] avg loss 0.00140801, throughput 2.78047K wps
[Epoch 96 Batch 150/172] avg loss 0.00129519, throughput 2.85306K wps
Begin Testing...
[Epoch 96] train avg loss 0.00133068, dev acc 0.8952, dev avg loss 0.343658, throughput 2.78255K wps
[Epoch 97 Batch 30/172] avg loss 0.00132583, throughput 2.72329K wps
[Epoch 97 Batch 60/172] avg loss 0.00135835, throughput 2.85276K wps
[Epoch 97 Batch 90/172] avg loss 0.00130945, throughput 2.75467K wps
[Epoch 97 Batch 120/172] avg loss 0.00134846, throughput 2.72001K wps
[Epoch 97 Batch 150/172] avg loss 0.0013877, throughput 2.74936K wps
Begin Testing...
[Epoch 97] train avg loss 0.00133499, dev acc 0.9078, dev avg loss 0.353434, throughput 2.76779K wps
[Epoch 98 Batch 30/172] avg loss 0.00114354, throughput 2.77435K wps
[Epoch 98 Batch 60/172] avg loss 0.00126706, throughput 2.67939K wps
[Epoch 98 Batch 90/172] avg loss 0.00127163, throughput 2.7345K wps
[Epoch 98 Batch 120/172] avg loss 0.00128577, throughput 2.67156K wps
[Epoch 98 Batch 150/172] avg loss 0.00137904, throughput 2.77581K wps
Begin Testing...
[Epoch 98] train avg loss 0.00129585, dev acc 0.9057, dev avg loss 0.348703, throughput 2.72746K wps
[Epoch 99 Batch 30/172] avg loss 0.00111137, throughput 2.81744K wps
[Epoch 99 Batch 60/172] avg loss 0.00137012, throughput 2.73909K wps
[Epoch 99 Batch 90/172] avg loss 0.00126898, throughput 2.7904K wps
[Epoch 99 Batch 120/172] avg loss 0.00140261, throughput 2.77124K wps
[Epoch 99 Batch 150/172] avg loss 0.00139271, throughput 2.7637K wps
Begin Testing...
[Epoch 99] train avg loss 0.00129706, dev acc 0.9015, dev avg loss 0.352028, throughput 2.75252K wps
[Epoch 100 Batch 30/172] avg loss 0.00112136, throughput 2.84287K wps
[Epoch 100 Batch 60/172] avg loss 0.00136665, throughput 2.76284K wps
[Epoch 100 Batch 90/172] avg loss 0.00144848, throughput 2.78275K wps
[Epoch 100 Batch 120/172] avg loss 0.00133076, throughput 2.68096K wps
[Epoch 100 Batch 150/172] avg loss 0.00138621, throughput 2.77582K wps
Begin Testing...
[Epoch 100] train avg loss 0.00131462, dev acc 0.9025, dev avg loss 0.349188, throughput 2.76713K wps
[Epoch 101 Batch 30/172] avg loss 0.00117898, throughput 2.84176K wps
[Epoch 101 Batch 60/172] avg loss 0.00131767, throughput 2.80137K wps
[Epoch 101 Batch 90/172] avg loss 0.00121451, throughput 2.80392K wps
[Epoch 101 Batch 120/172] avg loss 0.00136886, throughput 2.75612K wps
[Epoch 101 Batch 150/172] avg loss 0.00140726, throughput 2.81392K wps
Begin Testing...
[Epoch 101] train avg loss 0.00129712, dev acc 0.9015, dev avg loss 0.350467, throughput 2.80348K wps
[Epoch 102 Batch 30/172] avg loss 0.000966767, throughput 2.78093K wps
[Epoch 102 Batch 60/172] avg loss 0.0011978, throughput 2.71101K wps
[Epoch 102 Batch 90/172] avg loss 0.00134842, throughput 2.7704K wps
[Epoch 102 Batch 120/172] avg loss 0.00126762, throughput 2.71036K wps
[Epoch 102 Batch 150/172] avg loss 0.0015851, throughput 2.88731K wps
Begin Testing...
[Epoch 102] train avg loss 0.00127394, dev acc 0.9046, dev avg loss 0.353711, throughput 2.76423K wps
[Epoch 103 Batch 30/172] avg loss 0.00120302, throughput 2.79113K wps
[Epoch 103 Batch 60/172] avg loss 0.0010678, throughput 2.8155K wps
[Epoch 103 Batch 90/172] avg loss 0.00132974, throughput 2.81216K wps
[Epoch 103 Batch 120/172] avg loss 0.00113728, throughput 2.81229K wps
[Epoch 103 Batch 150/172] avg loss 0.00128824, throughput 2.80614K wps
Begin Testing...
[Epoch 103] train avg loss 0.00128056, dev acc 0.9015, dev avg loss 0.351343, throughput 2.80483K wps
[Epoch 104 Batch 30/172] avg loss 0.00117136, throughput 2.82101K wps
[Epoch 104 Batch 60/172] avg loss 0.00119437, throughput 2.80834K wps
[Epoch 104 Batch 90/172] avg loss 0.00124924, throughput 2.7639K wps
[Epoch 104 Batch 120/172] avg loss 0.00127479, throughput 2.75175K wps
[Epoch 104 Batch 150/172] avg loss 0.00119834, throughput 2.78911K wps
Begin Testing...
[Epoch 104] train avg loss 0.00122216, dev acc 0.9046, dev avg loss 0.363178, throughput 2.78843K wps
[Epoch 105 Batch 30/172] avg loss 0.00119739, throughput 2.76129K wps
[Epoch 105 Batch 60/172] avg loss 0.00124063, throughput 2.70636K wps
[Epoch 105 Batch 90/172] avg loss 0.00118034, throughput 2.78513K wps
[Epoch 105 Batch 120/172] avg loss 0.00141202, throughput 2.75331K wps
[Epoch 105 Batch 150/172] avg loss 0.00120959, throughput 2.79218K wps
Begin Testing...
[Epoch 105] train avg loss 0.00124096, dev acc 0.9004, dev avg loss 0.357448, throughput 2.75975K wps
[Epoch 106 Batch 30/172] avg loss 0.000971687, throughput 2.80432K wps
[Epoch 106 Batch 60/172] avg loss 0.00113631, throughput 2.81708K wps
[Epoch 106 Batch 90/172] avg loss 0.0013475, throughput 2.63227K wps
[Epoch 106 Batch 120/172] avg loss 0.00126064, throughput 2.81728K wps
[Epoch 106 Batch 150/172] avg loss 0.00131834, throughput 2.77701K wps
Begin Testing...
[Epoch 106] train avg loss 0.00124048, dev acc 0.8983, dev avg loss 0.359426, throughput 2.76929K wps
[Epoch 107 Batch 30/172] avg loss 0.00113534, throughput 2.75134K wps
[Epoch 107 Batch 60/172] avg loss 0.00118058, throughput 2.80524K wps
[Epoch 107 Batch 90/172] avg loss 0.00123078, throughput 2.7638K wps
[Epoch 107 Batch 120/172] avg loss 0.00122547, throughput 2.69671K wps
[Epoch 107 Batch 150/172] avg loss 0.0012884, throughput 2.77682K wps
Begin Testing...
[Epoch 107] train avg loss 0.0012656, dev acc 0.8983, dev avg loss 0.360102, throughput 2.75956K wps
[Epoch 108 Batch 30/172] avg loss 0.00110852, throughput 2.77425K wps
[Epoch 108 Batch 60/172] avg loss 0.00121161, throughput 2.77998K wps
[Epoch 108 Batch 90/172] avg loss 0.00125547, throughput 2.70414K wps
[Epoch 108 Batch 120/172] avg loss 0.00133709, throughput 2.6487K wps
[Epoch 108 Batch 150/172] avg loss 0.0012255, throughput 2.8113K wps
Begin Testing...
[Epoch 108] train avg loss 0.0012157, dev acc 0.9036, dev avg loss 0.3642, throughput 2.75188K wps
[Epoch 109 Batch 30/172] avg loss 0.0012193, throughput 2.79436K wps
[Epoch 109 Batch 60/172] avg loss 0.00104759, throughput 2.70291K wps
[Epoch 109 Batch 90/172] avg loss 0.00127654, throughput 2.70279K wps
[Epoch 109 Batch 120/172] avg loss 0.0010844, throughput 2.8171K wps
[Epoch 109 Batch 150/172] avg loss 0.00114311, throughput 2.79924K wps
Begin Testing...
[Epoch 109] train avg loss 0.00121073, dev acc 0.9025, dev avg loss 0.363802, throughput 2.75756K wps
[Epoch 110 Batch 30/172] avg loss 0.00133113, throughput 2.79956K wps
[Epoch 110 Batch 60/172] avg loss 0.00102222, throughput 2.77416K wps
[Epoch 110 Batch 90/172] avg loss 0.00119166, throughput 2.76575K wps
[Epoch 110 Batch 120/172] avg loss 0.00111882, throughput 2.68972K wps
[Epoch 110 Batch 150/172] avg loss 0.00141948, throughput 2.7953K wps
Begin Testing...
[Epoch 110] train avg loss 0.00120535, dev acc 0.8962, dev avg loss 0.363688, throughput 2.74424K wps
[Epoch 111 Batch 30/172] avg loss 0.00121991, throughput 2.87149K wps
[Epoch 111 Batch 60/172] avg loss 0.00116039, throughput 2.81973K wps
[Epoch 111 Batch 90/172] avg loss 0.00133899, throughput 2.81083K wps
[Epoch 111 Batch 120/172] avg loss 0.00109971, throughput 2.80761K wps
[Epoch 111 Batch 150/172] avg loss 0.00127909, throughput 2.74776K wps
Begin Testing...
[Epoch 111] train avg loss 0.00119923, dev acc 0.9036, dev avg loss 0.364561, throughput 2.80282K wps
[Epoch 112 Batch 30/172] avg loss 0.00113891, throughput 2.76513K wps
[Epoch 112 Batch 60/172] avg loss 0.00120059, throughput 2.7914K wps
[Epoch 112 Batch 90/172] avg loss 0.00108213, throughput 2.73555K wps
[Epoch 112 Batch 120/172] avg loss 0.00129038, throughput 2.75858K wps
[Epoch 112 Batch 150/172] avg loss 0.00120274, throughput 2.74193K wps
Begin Testing...
[Epoch 112] train avg loss 0.00118921, dev acc 0.9015, dev avg loss 0.365609, throughput 2.77342K wps
[Epoch 113 Batch 30/172] avg loss 0.00127423, throughput 2.84609K wps
[Epoch 113 Batch 60/172] avg loss 0.00111049, throughput 2.69775K wps
[Epoch 113 Batch 90/172] avg loss 0.00125935, throughput 2.81432K wps
[Epoch 113 Batch 120/172] avg loss 0.00120297, throughput 2.76768K wps
[Epoch 113 Batch 150/172] avg loss 0.00123375, throughput 2.73172K wps
Begin Testing...
[Epoch 113] train avg loss 0.00118563, dev acc 0.9036, dev avg loss 0.371547, throughput 2.77446K wps
[Epoch 114 Batch 30/172] avg loss 0.00124561, throughput 2.76808K wps
[Epoch 114 Batch 60/172] avg loss 0.00106659, throughput 2.83979K wps
[Epoch 114 Batch 90/172] avg loss 0.00120347, throughput 2.67451K wps
[Epoch 114 Batch 120/172] avg loss 0.00127099, throughput 2.85981K wps
[Epoch 114 Batch 150/172] avg loss 0.00124064, throughput 2.81591K wps
Begin Testing...
[Epoch 114] train avg loss 0.00119254, dev acc 0.9036, dev avg loss 0.368933, throughput 2.79078K wps
[Epoch 115 Batch 30/172] avg loss 0.00103052, throughput 2.82181K wps
[Epoch 115 Batch 60/172] avg loss 0.00124988, throughput 2.68544K wps
[Epoch 115 Batch 90/172] avg loss 0.00129752, throughput 2.80959K wps
[Epoch 115 Batch 120/172] avg loss 0.00111174, throughput 2.80528K wps
[Epoch 115 Batch 150/172] avg loss 0.000915635, throughput 2.71092K wps
Begin Testing...
[Epoch 115] train avg loss 0.00113794, dev acc 0.9025, dev avg loss 0.369703, throughput 2.75874K wps
[Epoch 116 Batch 30/172] avg loss 0.000996287, throughput 2.79579K wps
[Epoch 116 Batch 60/172] avg loss 0.00121162, throughput 2.70429K wps
[Epoch 116 Batch 90/172] avg loss 0.00121709, throughput 2.74183K wps
[Epoch 116 Batch 120/172] avg loss 0.00111487, throughput 2.7562K wps
[Epoch 116 Batch 150/172] avg loss 0.00117376, throughput 2.74911K wps
Begin Testing...
[Epoch 116] train avg loss 0.00115616, dev acc 0.9036, dev avg loss 0.370699, throughput 2.73655K wps
[Epoch 117 Batch 30/172] avg loss 0.001314, throughput 2.72799K wps
[Epoch 117 Batch 60/172] avg loss 0.00093822, throughput 2.77159K wps
[Epoch 117 Batch 90/172] avg loss 0.001145, throughput 2.79626K wps
[Epoch 117 Batch 120/172] avg loss 0.00116118, throughput 2.76852K wps
[Epoch 117 Batch 150/172] avg loss 0.00109345, throughput 2.72217K wps
Begin Testing...
[Epoch 117] train avg loss 0.00113246, dev acc 0.9046, dev avg loss 0.37397, throughput 2.75742K wps
[Epoch 118 Batch 30/172] avg loss 0.00113779, throughput 2.69781K wps
[Epoch 118 Batch 60/172] avg loss 0.00113114, throughput 2.78806K wps
[Epoch 118 Batch 90/172] avg loss 0.00130307, throughput 2.77316K wps
[Epoch 118 Batch 120/172] avg loss 0.00104701, throughput 2.78023K wps
[Epoch 118 Batch 150/172] avg loss 0.0010252, throughput 2.76969K wps
Begin Testing...
[Epoch 118] train avg loss 0.00115986, dev acc 0.9025, dev avg loss 0.370958, throughput 2.76063K wps
[Epoch 119 Batch 30/172] avg loss 0.00106173, throughput 2.79334K wps
[Epoch 119 Batch 60/172] avg loss 0.00101063, throughput 2.637K wps
[Epoch 119 Batch 90/172] avg loss 0.00126903, throughput 2.8063K wps
[Epoch 119 Batch 120/172] avg loss 0.00110486, throughput 2.77166K wps
[Epoch 119 Batch 150/172] avg loss 0.00123621, throughput 2.75131K wps
Begin Testing...
[Epoch 119] train avg loss 0.00115775, dev acc 0.9015, dev avg loss 0.371161, throughput 2.74863K wps
[Epoch 120 Batch 30/172] avg loss 0.000965332, throughput 2.8036K wps
[Epoch 120 Batch 60/172] avg loss 0.00109746, throughput 2.77253K wps
[Epoch 120 Batch 90/172] avg loss 0.00109115, throughput 2.74489K wps
[Epoch 120 Batch 120/172] avg loss 0.00122575, throughput 2.65847K wps
[Epoch 120 Batch 150/172] avg loss 0.00129062, throughput 2.65602K wps
Begin Testing...
[Epoch 120] train avg loss 0.00112202, dev acc 0.9004, dev avg loss 0.384004, throughput 2.73217K wps
[Epoch 121 Batch 30/172] avg loss 0.00116012, throughput 2.80362K wps
[Epoch 121 Batch 60/172] avg loss 0.0011893, throughput 2.76003K wps
[Epoch 121 Batch 90/172] avg loss 0.00099164, throughput 2.78026K wps
[Epoch 121 Batch 120/172] avg loss 0.00113147, throughput 2.78461K wps
[Epoch 121 Batch 150/172] avg loss 0.00126744, throughput 2.80993K wps
Begin Testing...
[Epoch 121] train avg loss 0.00115301, dev acc 0.9025, dev avg loss 0.375597, throughput 2.78491K wps
[Epoch 122 Batch 30/172] avg loss 0.000936276, throughput 2.86151K wps
[Epoch 122 Batch 60/172] avg loss 0.00112411, throughput 2.8268K wps
[Epoch 122 Batch 90/172] avg loss 0.000932508, throughput 2.76885K wps
[Epoch 122 Batch 120/172] avg loss 0.00128069, throughput 2.78863K wps
[Epoch 122 Batch 150/172] avg loss 0.00112296, throughput 2.76996K wps
Begin Testing...
[Epoch 122] train avg loss 0.00107954, dev acc 0.9025, dev avg loss 0.378099, throughput 2.80521K wps
[Epoch 123 Batch 30/172] avg loss 0.00118797, throughput 2.83029K wps
[Epoch 123 Batch 60/172] avg loss 0.00107413, throughput 2.75846K wps
[Epoch 123 Batch 90/172] avg loss 0.00105813, throughput 2.84442K wps
[Epoch 123 Batch 120/172] avg loss 0.00119254, throughput 2.8169K wps
[Epoch 123 Batch 150/172] avg loss 0.000989026, throughput 2.6994K wps
Begin Testing...
[Epoch 123] train avg loss 0.00111623, dev acc 0.9025, dev avg loss 0.378653, throughput 2.79591K wps
[Epoch 124 Batch 30/172] avg loss 0.000958638, throughput 2.83145K wps
[Epoch 124 Batch 60/172] avg loss 0.00120998, throughput 2.77348K wps
[Epoch 124 Batch 90/172] avg loss 0.00102621, throughput 2.77375K wps
[Epoch 124 Batch 120/172] avg loss 0.00104242, throughput 2.67398K wps
[Epoch 124 Batch 150/172] avg loss 0.00121638, throughput 2.66544K wps
Begin Testing...
[Epoch 124] train avg loss 0.00107947, dev acc 0.9025, dev avg loss 0.381829, throughput 2.7516K wps
[Epoch 125 Batch 30/172] avg loss 0.00110677, throughput 2.77173K wps
[Epoch 125 Batch 60/172] avg loss 0.00123339, throughput 2.72214K wps
[Epoch 125 Batch 90/172] avg loss 0.00124096, throughput 2.60573K wps
[Epoch 125 Batch 120/172] avg loss 0.000871062, throughput 2.71079K wps
[Epoch 125 Batch 150/172] avg loss 0.00110643, throughput 2.79715K wps
Begin Testing...
[Epoch 125] train avg loss 0.00111568, dev acc 0.9025, dev avg loss 0.379348, throughput 2.7042K wps
[Epoch 126 Batch 30/172] avg loss 0.00100912, throughput 2.80111K wps
[Epoch 126 Batch 60/172] avg loss 0.000938236, throughput 2.79224K wps
[Epoch 126 Batch 90/172] avg loss 0.00107586, throughput 2.72599K wps
[Epoch 126 Batch 120/172] avg loss 0.00125129, throughput 2.81217K wps
[Epoch 126 Batch 150/172] avg loss 0.00101525, throughput 2.78081K wps
Begin Testing...
[Epoch 126] train avg loss 0.00103347, dev acc 0.9025, dev avg loss 0.383787, throughput 2.7813K wps
[Epoch 127 Batch 30/172] avg loss 0.000926853, throughput 2.81105K wps
[Epoch 127 Batch 60/172] avg loss 0.00102428, throughput 2.78811K wps
[Epoch 127 Batch 90/172] avg loss 0.00100806, throughput 2.81077K wps
[Epoch 127 Batch 120/172] avg loss 0.000972051, throughput 2.80765K wps
[Epoch 127 Batch 150/172] avg loss 0.00130292, throughput 2.79789K wps
Begin Testing...
[Epoch 127] train avg loss 0.00108876, dev acc 0.9025, dev avg loss 0.379777, throughput 2.80195K wps
[Epoch 128 Batch 30/172] avg loss 0.00105057, throughput 2.85796K wps
[Epoch 128 Batch 60/172] avg loss 0.00109647, throughput 2.77636K wps
[Epoch 128 Batch 90/172] avg loss 0.0010008, throughput 2.7695K wps
[Epoch 128 Batch 120/172] avg loss 0.00099322, throughput 2.71511K wps
[Epoch 128 Batch 150/172] avg loss 0.000893342, throughput 2.78319K wps
Begin Testing...
[Epoch 128] train avg loss 0.00102154, dev acc 0.9046, dev avg loss 0.387923, throughput 2.77181K wps
[Epoch 129 Batch 30/172] avg loss 0.000822269, throughput 2.80654K wps
[Epoch 129 Batch 60/172] avg loss 0.00107426, throughput 2.79292K wps
[Epoch 129 Batch 90/172] avg loss 0.00103879, throughput 2.80207K wps
[Epoch 129 Batch 120/172] avg loss 0.00121077, throughput 2.69293K wps
[Epoch 129 Batch 150/172] avg loss 0.00112658, throughput 2.82374K wps
Begin Testing...
[Epoch 129] train avg loss 0.00105076, dev acc 0.9036, dev avg loss 0.385687, throughput 2.77803K wps
[Epoch 130 Batch 30/172] avg loss 0.00111756, throughput 2.72829K wps
[Epoch 130 Batch 60/172] avg loss 0.0010183, throughput 2.70203K wps
[Epoch 130 Batch 90/172] avg loss 0.00084051, throughput 2.77507K wps
[Epoch 130 Batch 120/172] avg loss 0.00112762, throughput 2.73834K wps
[Epoch 130 Batch 150/172] avg loss 0.00106382, throughput 2.73078K wps
Begin Testing...
[Epoch 130] train avg loss 0.00105024, dev acc 0.9046, dev avg loss 0.384846, throughput 2.73297K wps
[Epoch 131 Batch 30/172] avg loss 0.000849196, throughput 2.8033K wps
[Epoch 131 Batch 60/172] avg loss 0.000909452, throughput 2.73342K wps
[Epoch 131 Batch 90/172] avg loss 0.00090225, throughput 2.78071K wps
[Epoch 131 Batch 120/172] avg loss 0.00104023, throughput 2.6447K wps
[Epoch 131 Batch 150/172] avg loss 0.00112547, throughput 2.67023K wps
Begin Testing...
[Epoch 131] train avg loss 0.00101785, dev acc 0.9025, dev avg loss 0.388399, throughput 2.72079K wps
[Epoch 132 Batch 30/172] avg loss 0.0010693, throughput 2.69432K wps
[Epoch 132 Batch 60/172] avg loss 0.00100469, throughput 2.64259K wps
[Epoch 132 Batch 90/172] avg loss 0.00116712, throughput 2.74537K wps
[Epoch 132 Batch 120/172] avg loss 0.00087929, throughput 2.75211K wps
[Epoch 132 Batch 150/172] avg loss 0.00102786, throughput 2.61688K wps
Begin Testing...
[Epoch 132] train avg loss 0.00104845, dev acc 0.9046, dev avg loss 0.389507, throughput 2.70723K wps
[Epoch 133 Batch 30/172] avg loss 0.00117789, throughput 2.86481K wps
[Epoch 133 Batch 60/172] avg loss 0.00107348, throughput 2.71661K wps
[Epoch 133 Batch 90/172] avg loss 0.000902754, throughput 2.79445K wps
[Epoch 133 Batch 120/172] avg loss 0.00112116, throughput 2.77239K wps
[Epoch 133 Batch 150/172] avg loss 0.00097881, throughput 2.64525K wps
Begin Testing...
[Epoch 133] train avg loss 0.00105969, dev acc 0.9046, dev avg loss 0.388887, throughput 2.76805K wps
[Epoch 134 Batch 30/172] avg loss 0.00105728, throughput 2.83952K wps
[Epoch 134 Batch 60/172] avg loss 0.00092694, throughput 2.78164K wps
[Epoch 134 Batch 90/172] avg loss 0.000968722, throughput 2.6854K wps
[Epoch 134 Batch 120/172] avg loss 0.00104917, throughput 2.82391K wps
[Epoch 134 Batch 150/172] avg loss 0.0011611, throughput 2.80492K wps
Begin Testing...
[Epoch 134] train avg loss 0.00101876, dev acc 0.9004, dev avg loss 0.392255, throughput 2.78756K wps
[Epoch 135 Batch 30/172] avg loss 0.000969175, throughput 2.7015K wps
[Epoch 135 Batch 60/172] avg loss 0.000978334, throughput 2.75039K wps
[Epoch 135 Batch 90/172] avg loss 0.0010328, throughput 2.77385K wps
[Epoch 135 Batch 120/172] avg loss 0.00113531, throughput 2.68343K wps
[Epoch 135 Batch 150/172] avg loss 0.00116226, throughput 2.71971K wps
Begin Testing...
[Epoch 135] train avg loss 0.00106441, dev acc 0.9046, dev avg loss 0.39237, throughput 2.73308K wps
[Epoch 136 Batch 30/172] avg loss 0.000893199, throughput 2.78692K wps
[Epoch 136 Batch 60/172] avg loss 0.000887138, throughput 2.78292K wps
[Epoch 136 Batch 90/172] avg loss 0.000988721, throughput 2.76371K wps
[Epoch 136 Batch 120/172] avg loss 0.00109535, throughput 2.66238K wps
[Epoch 136 Batch 150/172] avg loss 0.000923039, throughput 2.81109K wps
Begin Testing...
[Epoch 136] train avg loss 0.00101255, dev acc 0.9015, dev avg loss 0.393766, throughput 2.76529K wps
[Epoch 137 Batch 30/172] avg loss 0.000984952, throughput 2.74143K wps
[Epoch 137 Batch 60/172] avg loss 0.00107392, throughput 2.76219K wps
[Epoch 137 Batch 90/172] avg loss 0.000954425, throughput 2.74877K wps
[Epoch 137 Batch 120/172] avg loss 0.00103871, throughput 2.7523K wps
[Epoch 137 Batch 150/172] avg loss 0.00097619, throughput 2.86537K wps
Begin Testing...
[Epoch 137] train avg loss 0.00102843, dev acc 0.9015, dev avg loss 0.393099, throughput 2.78055K wps
[Epoch 138 Batch 30/172] avg loss 0.000883082, throughput 2.75664K wps
[Epoch 138 Batch 60/172] avg loss 0.000936972, throughput 2.78689K wps
[Epoch 138 Batch 90/172] avg loss 0.000991133, throughput 2.81287K wps
[Epoch 138 Batch 120/172] avg loss 0.00112071, throughput 2.7925K wps
[Epoch 138 Batch 150/172] avg loss 0.000960506, throughput 2.80579K wps
Begin Testing...
[Epoch 138] train avg loss 0.000995474, dev acc 0.9025, dev avg loss 0.392531, throughput 2.79297K wps
[Epoch 139 Batch 30/172] avg loss 0.00080581, throughput 2.6606K wps
[Epoch 139 Batch 60/172] avg loss 0.000929134, throughput 2.82902K wps
[Epoch 139 Batch 90/172] avg loss 0.00108102, throughput 2.67555K wps
[Epoch 139 Batch 120/172] avg loss 0.00134243, throughput 2.85727K wps
[Epoch 139 Batch 150/172] avg loss 0.00098505, throughput 2.8126K wps
Begin Testing...
[Epoch 139] train avg loss 0.00103171, dev acc 0.9004, dev avg loss 0.391367, throughput 2.75295K wps
[Epoch 140 Batch 30/172] avg loss 0.000938076, throughput 2.83292K wps
[Epoch 140 Batch 60/172] avg loss 0.00107055, throughput 2.75653K wps
[Epoch 140 Batch 90/172] avg loss 0.000886319, throughput 2.61171K wps
[Epoch 140 Batch 120/172] avg loss 0.0011897, throughput 2.7284K wps
[Epoch 140 Batch 150/172] avg loss 0.000992911, throughput 2.59381K wps
Begin Testing...
[Epoch 140] train avg loss 0.00101929, dev acc 0.9036, dev avg loss 0.396637, throughput 2.69837K wps
[Epoch 141 Batch 30/172] avg loss 0.00086021, throughput 2.83136K wps
[Epoch 141 Batch 60/172] avg loss 0.00100704, throughput 2.74779K wps
[Epoch 141 Batch 90/172] avg loss 0.000991026, throughput 2.77945K wps
[Epoch 141 Batch 120/172] avg loss 0.00109152, throughput 2.79833K wps
[Epoch 141 Batch 150/172] avg loss 0.000941887, throughput 2.69303K wps
Begin Testing...
[Epoch 141] train avg loss 0.00098976, dev acc 0.9036, dev avg loss 0.39934, throughput 2.75712K wps
[Epoch 142 Batch 30/172] avg loss 0.000718949, throughput 2.7432K wps
[Epoch 142 Batch 60/172] avg loss 0.00104754, throughput 2.78495K wps
[Epoch 142 Batch 90/172] avg loss 0.000990628, throughput 2.73523K wps
[Epoch 142 Batch 120/172] avg loss 0.000911125, throughput 2.82511K wps
[Epoch 142 Batch 150/172] avg loss 0.00115625, throughput 2.55344K wps
Begin Testing...
[Epoch 142] train avg loss 0.000985973, dev acc 0.9015, dev avg loss 0.396431, throughput 2.74335K wps
[Epoch 143 Batch 30/172] avg loss 0.000970778, throughput 2.78733K wps
[Epoch 143 Batch 60/172] avg loss 0.00100042, throughput 2.77769K wps
[Epoch 143 Batch 90/172] avg loss 0.000936836, throughput 2.76297K wps
[Epoch 143 Batch 120/172] avg loss 0.000886848, throughput 2.77353K wps
[Epoch 143 Batch 150/172] avg loss 0.00108416, throughput 2.75465K wps
Begin Testing...
[Epoch 143] train avg loss 0.000990943, dev acc 0.9025, dev avg loss 0.40201, throughput 2.76843K wps
[Epoch 144 Batch 30/172] avg loss 0.00102065, throughput 2.77727K wps
[Epoch 144 Batch 60/172] avg loss 0.000885024, throughput 2.79939K wps
[Epoch 144 Batch 90/172] avg loss 0.00105027, throughput 2.70162K wps
[Epoch 144 Batch 120/172] avg loss 0.00113261, throughput 2.80073K wps
[Epoch 144 Batch 150/172] avg loss 0.000834891, throughput 2.69195K wps
Begin Testing...
[Epoch 144] train avg loss 0.000978072, dev acc 0.9015, dev avg loss 0.401693, throughput 2.75654K wps
[Epoch 145 Batch 30/172] avg loss 0.000768947, throughput 2.77055K wps
[Epoch 145 Batch 60/172] avg loss 0.000919286, throughput 2.59322K wps
[Epoch 145 Batch 90/172] avg loss 0.000852223, throughput 2.65357K wps
[Epoch 145 Batch 120/172] avg loss 0.00101703, throughput 2.67992K wps
[Epoch 145 Batch 150/172] avg loss 0.00105272, throughput 2.73975K wps
Begin Testing...
[Epoch 145] train avg loss 0.000956503, dev acc 0.9004, dev avg loss 0.400752, throughput 2.69999K wps
[Epoch 146 Batch 30/172] avg loss 0.000915141, throughput 2.86633K wps
[Epoch 146 Batch 60/172] avg loss 0.000845421, throughput 2.80068K wps
[Epoch 146 Batch 90/172] avg loss 0.000807013, throughput 2.80019K wps
[Epoch 146 Batch 120/172] avg loss 0.00104572, throughput 2.7626K wps
[Epoch 146 Batch 150/172] avg loss 0.00086792, throughput 2.80912K wps