Namespace(batch_size=50, data_name='Subj', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 120
Done! Tokenizing Time=0.24s, #Sentences=10000
SentimentNet(
  (embedding): Embedding(21326 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/162] avg loss 0.0139927, throughput 0.563051K wps
[Epoch 0 Batch 60/162] avg loss 0.0138302, throughput 9.52509K wps
[Epoch 0 Batch 90/162] avg loss 0.0137475, throughput 9.45732K wps
[Epoch 0 Batch 120/162] avg loss 0.0136195, throughput 9.51905K wps
[Epoch 0 Batch 150/162] avg loss 0.0135821, throughput 9.46978K wps
Begin Testing...
[Epoch 0] train avg loss 0.0137352, dev acc 0.7733, dev avg loss 0.662958, throughput 2.41023K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0133297, throughput 9.67788K wps
[Epoch 1 Batch 60/162] avg loss 0.0131657, throughput 9.37649K wps
[Epoch 1 Batch 90/162] avg loss 0.0131255, throughput 9.49583K wps
[Epoch 1 Batch 120/162] avg loss 0.0129709, throughput 9.35736K wps
[Epoch 1 Batch 150/162] avg loss 0.0129759, throughput 9.46346K wps
Begin Testing...
[Epoch 1] train avg loss 0.013072, dev acc 0.6622, dev avg loss 0.63981, throughput 9.48049K wps
[Epoch 2 Batch 30/162] avg loss 0.0128046, throughput 9.62158K wps
[Epoch 2 Batch 60/162] avg loss 0.0125604, throughput 9.39331K wps
[Epoch 2 Batch 90/162] avg loss 0.0124079, throughput 9.55848K wps
[Epoch 2 Batch 120/162] avg loss 0.0122335, throughput 9.54988K wps
[Epoch 2 Batch 150/162] avg loss 0.0122206, throughput 9.68006K wps
Begin Testing...
[Epoch 2] train avg loss 0.0124286, dev acc 0.8478, dev avg loss 0.600085, throughput 9.55656K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0119629, throughput 9.74152K wps
[Epoch 3 Batch 60/162] avg loss 0.0119284, throughput 9.48386K wps
[Epoch 3 Batch 90/162] avg loss 0.0117038, throughput 9.53708K wps
[Epoch 3 Batch 120/162] avg loss 0.0116698, throughput 9.333K wps
[Epoch 3 Batch 150/162] avg loss 0.0113905, throughput 9.67526K wps
Begin Testing...
[Epoch 3] train avg loss 0.011693, dev acc 0.8567, dev avg loss 0.564077, throughput 9.52981K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/162] avg loss 0.0111505, throughput 9.79962K wps
[Epoch 4 Batch 60/162] avg loss 0.0109733, throughput 9.69988K wps
[Epoch 4 Batch 90/162] avg loss 0.0109122, throughput 9.54538K wps
[Epoch 4 Batch 120/162] avg loss 0.01089, throughput 9.43827K wps
[Epoch 4 Batch 150/162] avg loss 0.0107541, throughput 9.62589K wps
Begin Testing...
[Epoch 4] train avg loss 0.0108868, dev acc 0.8678, dev avg loss 0.522418, throughput 9.6274K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.0104188, throughput 9.71097K wps
[Epoch 5 Batch 60/162] avg loss 0.0102718, throughput 9.58992K wps
[Epoch 5 Batch 90/162] avg loss 0.00994886, throughput 9.49034K wps
[Epoch 5 Batch 120/162] avg loss 0.00980264, throughput 9.46891K wps
[Epoch 5 Batch 150/162] avg loss 0.00998378, throughput 9.70823K wps
Begin Testing...
[Epoch 5] train avg loss 0.0100641, dev acc 0.8722, dev avg loss 0.482372, throughput 9.60097K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00960078, throughput 9.66465K wps
[Epoch 6 Batch 60/162] avg loss 0.00951817, throughput 9.49316K wps
[Epoch 6 Batch 90/162] avg loss 0.00915172, throughput 9.61593K wps
[Epoch 6 Batch 120/162] avg loss 0.00909617, throughput 9.456K wps
[Epoch 6 Batch 150/162] avg loss 0.00920177, throughput 9.35415K wps
Begin Testing...
[Epoch 6] train avg loss 0.00926776, dev acc 0.8689, dev avg loss 0.447398, throughput 9.5075K wps
[Epoch 7 Batch 30/162] avg loss 0.00898816, throughput 9.71807K wps
[Epoch 7 Batch 60/162] avg loss 0.00859521, throughput 9.54446K wps
[Epoch 7 Batch 90/162] avg loss 0.00854635, throughput 9.38923K wps
[Epoch 7 Batch 120/162] avg loss 0.0082965, throughput 9.61768K wps
[Epoch 7 Batch 150/162] avg loss 0.00855405, throughput 9.64452K wps
Begin Testing...
[Epoch 7] train avg loss 0.00859653, dev acc 0.8722, dev avg loss 0.417487, throughput 9.58623K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00807569, throughput 9.63364K wps
[Epoch 8 Batch 60/162] avg loss 0.0081229, throughput 9.54556K wps
[Epoch 8 Batch 90/162] avg loss 0.00795082, throughput 9.47762K wps
[Epoch 8 Batch 120/162] avg loss 0.00809858, throughput 9.34018K wps
[Epoch 8 Batch 150/162] avg loss 0.00807176, throughput 9.64029K wps
Begin Testing...
[Epoch 8] train avg loss 0.00802045, dev acc 0.8700, dev avg loss 0.394164, throughput 9.51211K wps
[Epoch 9 Batch 30/162] avg loss 0.00765048, throughput 9.68748K wps
[Epoch 9 Batch 60/162] avg loss 0.00766964, throughput 9.60219K wps
[Epoch 9 Batch 90/162] avg loss 0.00778438, throughput 9.46143K wps
[Epoch 9 Batch 120/162] avg loss 0.00752026, throughput 9.51964K wps
[Epoch 9 Batch 150/162] avg loss 0.0073391, throughput 9.57476K wps
Begin Testing...
[Epoch 9] train avg loss 0.0075777, dev acc 0.8733, dev avg loss 0.376142, throughput 9.548K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00717406, throughput 9.68558K wps
[Epoch 10 Batch 60/162] avg loss 0.00721908, throughput 9.46155K wps
[Epoch 10 Batch 90/162] avg loss 0.00730357, throughput 9.58996K wps
[Epoch 10 Batch 120/162] avg loss 0.00733003, throughput 9.44458K wps
[Epoch 10 Batch 150/162] avg loss 0.00701399, throughput 9.40509K wps
Begin Testing...
[Epoch 10] train avg loss 0.00718916, dev acc 0.8756, dev avg loss 0.363214, throughput 9.51757K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00678275, throughput 9.54591K wps
[Epoch 11 Batch 60/162] avg loss 0.00700366, throughput 9.4893K wps
[Epoch 11 Batch 90/162] avg loss 0.00689843, throughput 9.4057K wps
[Epoch 11 Batch 120/162] avg loss 0.0070885, throughput 9.4621K wps
[Epoch 11 Batch 150/162] avg loss 0.0067608, throughput 9.6145K wps
Begin Testing...
[Epoch 11] train avg loss 0.00691345, dev acc 0.8778, dev avg loss 0.352661, throughput 9.48861K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00683382, throughput 9.78665K wps
[Epoch 12 Batch 60/162] avg loss 0.00662973, throughput 9.41849K wps
[Epoch 12 Batch 90/162] avg loss 0.0062095, throughput 9.52567K wps
[Epoch 12 Batch 120/162] avg loss 0.00622745, throughput 9.63293K wps
[Epoch 12 Batch 150/162] avg loss 0.00668349, throughput 9.51729K wps
Begin Testing...
[Epoch 12] train avg loss 0.00650781, dev acc 0.8811, dev avg loss 0.341088, throughput 9.57071K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.0066457, throughput 9.55024K wps
[Epoch 13 Batch 60/162] avg loss 0.0066435, throughput 9.52019K wps
[Epoch 13 Batch 90/162] avg loss 0.00622139, throughput 9.42255K wps
[Epoch 13 Batch 120/162] avg loss 0.00649197, throughput 9.44282K wps
[Epoch 13 Batch 150/162] avg loss 0.00611179, throughput 9.57489K wps
Begin Testing...
[Epoch 13] train avg loss 0.00644013, dev acc 0.8856, dev avg loss 0.333625, throughput 9.49058K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00632254, throughput 9.62778K wps
[Epoch 14 Batch 60/162] avg loss 0.00630582, throughput 9.36483K wps
[Epoch 14 Batch 90/162] avg loss 0.00641791, throughput 9.35305K wps
[Epoch 14 Batch 120/162] avg loss 0.00604265, throughput 9.41917K wps
[Epoch 14 Batch 150/162] avg loss 0.00612689, throughput 9.57848K wps
Begin Testing...
[Epoch 14] train avg loss 0.00624127, dev acc 0.8867, dev avg loss 0.326977, throughput 9.44655K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00631406, throughput 9.60864K wps
[Epoch 15 Batch 60/162] avg loss 0.00592309, throughput 9.57579K wps
[Epoch 15 Batch 90/162] avg loss 0.00617731, throughput 9.50213K wps
[Epoch 15 Batch 120/162] avg loss 0.00596338, throughput 9.37959K wps
[Epoch 15 Batch 150/162] avg loss 0.00599038, throughput 9.4467K wps
Begin Testing...
[Epoch 15] train avg loss 0.00605979, dev acc 0.8889, dev avg loss 0.320968, throughput 9.48432K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/162] avg loss 0.00569059, throughput 9.56589K wps
[Epoch 16 Batch 60/162] avg loss 0.00582947, throughput 9.40569K wps
[Epoch 16 Batch 90/162] avg loss 0.0062145, throughput 9.26223K wps
[Epoch 16 Batch 120/162] avg loss 0.00600952, throughput 9.43629K wps
[Epoch 16 Batch 150/162] avg loss 0.00606738, throughput 9.49933K wps
Begin Testing...
[Epoch 16] train avg loss 0.00593747, dev acc 0.8867, dev avg loss 0.315545, throughput 9.42478K wps
[Epoch 17 Batch 30/162] avg loss 0.00557341, throughput 9.58483K wps
[Epoch 17 Batch 60/162] avg loss 0.00592358, throughput 9.39695K wps
[Epoch 17 Batch 90/162] avg loss 0.00622066, throughput 9.32288K wps
[Epoch 17 Batch 120/162] avg loss 0.00554141, throughput 9.35937K wps
[Epoch 17 Batch 150/162] avg loss 0.00607266, throughput 9.64441K wps
Begin Testing...
[Epoch 17] train avg loss 0.00583486, dev acc 0.8889, dev avg loss 0.311482, throughput 9.46071K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00559593, throughput 9.73618K wps
[Epoch 18 Batch 60/162] avg loss 0.00565354, throughput 9.46467K wps
[Epoch 18 Batch 90/162] avg loss 0.00555657, throughput 9.26849K wps
[Epoch 18 Batch 120/162] avg loss 0.00615792, throughput 9.34718K wps
[Epoch 18 Batch 150/162] avg loss 0.00531747, throughput 9.45421K wps
Begin Testing...
[Epoch 18] train avg loss 0.0056676, dev acc 0.8889, dev avg loss 0.306577, throughput 9.44157K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00553821, throughput 9.64454K wps
[Epoch 19 Batch 60/162] avg loss 0.00530412, throughput 9.39186K wps
[Epoch 19 Batch 90/162] avg loss 0.00555156, throughput 9.39406K wps
[Epoch 19 Batch 120/162] avg loss 0.00581841, throughput 9.43238K wps
[Epoch 19 Batch 150/162] avg loss 0.00549919, throughput 9.34169K wps
Begin Testing...
[Epoch 19] train avg loss 0.00553728, dev acc 0.8911, dev avg loss 0.302825, throughput 9.43016K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/162] avg loss 0.00521793, throughput 9.78528K wps
[Epoch 20 Batch 60/162] avg loss 0.00523199, throughput 9.59378K wps
[Epoch 20 Batch 90/162] avg loss 0.00544018, throughput 9.55586K wps
[Epoch 20 Batch 120/162] avg loss 0.00572085, throughput 9.62187K wps
[Epoch 20 Batch 150/162] avg loss 0.00584005, throughput 9.55164K wps
Begin Testing...
[Epoch 20] train avg loss 0.00545546, dev acc 0.8922, dev avg loss 0.298785, throughput 9.61922K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00563956, throughput 9.55574K wps
[Epoch 21 Batch 60/162] avg loss 0.00519013, throughput 9.38313K wps
[Epoch 21 Batch 90/162] avg loss 0.0051913, throughput 9.47522K wps
[Epoch 21 Batch 120/162] avg loss 0.00510079, throughput 9.50785K wps
[Epoch 21 Batch 150/162] avg loss 0.00502946, throughput 9.27265K wps
Begin Testing...
[Epoch 21] train avg loss 0.00524858, dev acc 0.8956, dev avg loss 0.295013, throughput 9.43116K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.0052942, throughput 9.69388K wps
[Epoch 22 Batch 60/162] avg loss 0.00522257, throughput 9.48309K wps
[Epoch 22 Batch 90/162] avg loss 0.00504285, throughput 9.44366K wps
[Epoch 22 Batch 120/162] avg loss 0.00495531, throughput 9.39026K wps
[Epoch 22 Batch 150/162] avg loss 0.00539518, throughput 9.52251K wps
Begin Testing...
[Epoch 22] train avg loss 0.00516369, dev acc 0.8967, dev avg loss 0.291609, throughput 9.51514K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/162] avg loss 0.00514018, throughput 9.64495K wps
[Epoch 23 Batch 60/162] avg loss 0.00501121, throughput 9.4804K wps
[Epoch 23 Batch 90/162] avg loss 0.00491868, throughput 9.47652K wps
[Epoch 23 Batch 120/162] avg loss 0.00524275, throughput 9.41957K wps
[Epoch 23 Batch 150/162] avg loss 0.0052641, throughput 9.45964K wps
Begin Testing...
[Epoch 23] train avg loss 0.00508836, dev acc 0.8922, dev avg loss 0.289848, throughput 9.48341K wps
[Epoch 24 Batch 30/162] avg loss 0.00510846, throughput 9.47826K wps
[Epoch 24 Batch 60/162] avg loss 0.00495213, throughput 9.58132K wps
[Epoch 24 Batch 90/162] avg loss 0.00466404, throughput 9.34193K wps
[Epoch 24 Batch 120/162] avg loss 0.00530204, throughput 9.60358K wps
[Epoch 24 Batch 150/162] avg loss 0.0047967, throughput 9.47435K wps
Begin Testing...
[Epoch 24] train avg loss 0.00494935, dev acc 0.8956, dev avg loss 0.285739, throughput 9.50178K wps
[Epoch 25 Batch 30/162] avg loss 0.00484253, throughput 9.66595K wps
[Epoch 25 Batch 60/162] avg loss 0.00514943, throughput 9.43817K wps
[Epoch 25 Batch 90/162] avg loss 0.00455663, throughput 9.34473K wps
[Epoch 25 Batch 120/162] avg loss 0.00507725, throughput 9.44696K wps
[Epoch 25 Batch 150/162] avg loss 0.00469302, throughput 9.48272K wps
Begin Testing...
[Epoch 25] train avg loss 0.00487417, dev acc 0.8956, dev avg loss 0.283306, throughput 9.47791K wps
[Epoch 26 Batch 30/162] avg loss 0.00503558, throughput 9.49072K wps
[Epoch 26 Batch 60/162] avg loss 0.00440408, throughput 9.44584K wps
[Epoch 26 Batch 90/162] avg loss 0.00464679, throughput 9.52314K wps
[Epoch 26 Batch 120/162] avg loss 0.00515276, throughput 9.33639K wps
[Epoch 26 Batch 150/162] avg loss 0.00482135, throughput 9.45824K wps
Begin Testing...
[Epoch 26] train avg loss 0.00482186, dev acc 0.8978, dev avg loss 0.280569, throughput 9.45502K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.0045356, throughput 9.64995K wps
[Epoch 27 Batch 60/162] avg loss 0.00455278, throughput 9.36593K wps
[Epoch 27 Batch 90/162] avg loss 0.00482514, throughput 9.35764K wps
[Epoch 27 Batch 120/162] avg loss 0.00492322, throughput 9.25941K wps
[Epoch 27 Batch 150/162] avg loss 0.00494774, throughput 9.46653K wps
Begin Testing...
[Epoch 27] train avg loss 0.00475806, dev acc 0.8967, dev avg loss 0.2784, throughput 9.40425K wps
[Epoch 28 Batch 30/162] avg loss 0.00434146, throughput 9.67811K wps
[Epoch 28 Batch 60/162] avg loss 0.00488754, throughput 9.60404K wps
[Epoch 28 Batch 90/162] avg loss 0.00423977, throughput 9.34693K wps
[Epoch 28 Batch 120/162] avg loss 0.00503577, throughput 9.59636K wps
[Epoch 28 Batch 150/162] avg loss 0.00460634, throughput 9.33307K wps
Begin Testing...
[Epoch 28] train avg loss 0.00464542, dev acc 0.8978, dev avg loss 0.275934, throughput 9.50684K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.00482399, throughput 9.56692K wps
[Epoch 29 Batch 60/162] avg loss 0.00454589, throughput 9.25683K wps
[Epoch 29 Batch 90/162] avg loss 0.00435749, throughput 9.57799K wps
[Epoch 29 Batch 120/162] avg loss 0.00478881, throughput 9.4965K wps
[Epoch 29 Batch 150/162] avg loss 0.00409712, throughput 9.51701K wps
Begin Testing...
[Epoch 29] train avg loss 0.00448683, dev acc 0.8978, dev avg loss 0.273418, throughput 9.4819K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.00478067, throughput 9.80287K wps
[Epoch 30 Batch 60/162] avg loss 0.00419504, throughput 9.3498K wps
[Epoch 30 Batch 90/162] avg loss 0.00431353, throughput 9.42033K wps
[Epoch 30 Batch 120/162] avg loss 0.00470617, throughput 9.29221K wps
[Epoch 30 Batch 150/162] avg loss 0.00430479, throughput 9.39394K wps
Begin Testing...
[Epoch 30] train avg loss 0.00446317, dev acc 0.9000, dev avg loss 0.272378, throughput 9.45538K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00446604, throughput 9.61072K wps
[Epoch 31 Batch 60/162] avg loss 0.00452568, throughput 9.51161K wps
[Epoch 31 Batch 90/162] avg loss 0.00415589, throughput 9.45126K wps
[Epoch 31 Batch 120/162] avg loss 0.00450758, throughput 9.564K wps
[Epoch 31 Batch 150/162] avg loss 0.00432665, throughput 9.42179K wps
Begin Testing...
[Epoch 31] train avg loss 0.00440475, dev acc 0.8967, dev avg loss 0.269901, throughput 9.51569K wps
[Epoch 32 Batch 30/162] avg loss 0.00433878, throughput 9.51472K wps
[Epoch 32 Batch 60/162] avg loss 0.00443535, throughput 9.47296K wps
[Epoch 32 Batch 90/162] avg loss 0.00449091, throughput 9.45848K wps
[Epoch 32 Batch 120/162] avg loss 0.00428892, throughput 9.28994K wps
[Epoch 32 Batch 150/162] avg loss 0.00416887, throughput 9.31816K wps
Begin Testing...
[Epoch 32] train avg loss 0.00433306, dev acc 0.8989, dev avg loss 0.268951, throughput 9.4196K wps
[Epoch 33 Batch 30/162] avg loss 0.00421211, throughput 9.58092K wps
[Epoch 33 Batch 60/162] avg loss 0.00408083, throughput 9.63121K wps
[Epoch 33 Batch 90/162] avg loss 0.00463901, throughput 9.34245K wps
[Epoch 33 Batch 120/162] avg loss 0.0044146, throughput 9.38966K wps
[Epoch 33 Batch 150/162] avg loss 0.00431222, throughput 9.31115K wps
Begin Testing...
[Epoch 33] train avg loss 0.00428337, dev acc 0.8989, dev avg loss 0.268036, throughput 9.46413K wps
[Epoch 34 Batch 30/162] avg loss 0.00408815, throughput 9.47671K wps
[Epoch 34 Batch 60/162] avg loss 0.00419916, throughput 9.39808K wps
[Epoch 34 Batch 90/162] avg loss 0.00407953, throughput 9.35001K wps
[Epoch 34 Batch 120/162] avg loss 0.0037462, throughput 9.5343K wps
[Epoch 34 Batch 150/162] avg loss 0.00441781, throughput 9.38551K wps
Begin Testing...
[Epoch 34] train avg loss 0.00414092, dev acc 0.8989, dev avg loss 0.264133, throughput 9.41777K wps
[Epoch 35 Batch 30/162] avg loss 0.00430011, throughput 9.60308K wps
[Epoch 35 Batch 60/162] avg loss 0.0040314, throughput 9.43085K wps
[Epoch 35 Batch 90/162] avg loss 0.00414802, throughput 9.59724K wps
[Epoch 35 Batch 120/162] avg loss 0.00419202, throughput 9.41857K wps
[Epoch 35 Batch 150/162] avg loss 0.00377472, throughput 9.55167K wps
Begin Testing...
[Epoch 35] train avg loss 0.00408505, dev acc 0.8978, dev avg loss 0.262154, throughput 9.52085K wps
[Epoch 36 Batch 30/162] avg loss 0.00398365, throughput 9.4616K wps
[Epoch 36 Batch 60/162] avg loss 0.00406087, throughput 9.57624K wps
[Epoch 36 Batch 90/162] avg loss 0.00419579, throughput 9.36475K wps
[Epoch 36 Batch 120/162] avg loss 0.00370225, throughput 9.47195K wps
[Epoch 36 Batch 150/162] avg loss 0.00422434, throughput 9.56823K wps
Begin Testing...
[Epoch 36] train avg loss 0.00403643, dev acc 0.9022, dev avg loss 0.26199, throughput 9.48986K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/162] avg loss 0.00369248, throughput 9.70319K wps
[Epoch 37 Batch 60/162] avg loss 0.00399778, throughput 9.23667K wps
[Epoch 37 Batch 90/162] avg loss 0.00388887, throughput 9.34555K wps
[Epoch 37 Batch 120/162] avg loss 0.0043826, throughput 9.49703K wps
[Epoch 37 Batch 150/162] avg loss 0.00427731, throughput 9.60465K wps
Begin Testing...
[Epoch 37] train avg loss 0.00402125, dev acc 0.9000, dev avg loss 0.259542, throughput 9.4719K wps
[Epoch 38 Batch 30/162] avg loss 0.00380891, throughput 9.49633K wps
[Epoch 38 Batch 60/162] avg loss 0.0040139, throughput 9.49164K wps
[Epoch 38 Batch 90/162] avg loss 0.00386191, throughput 9.35476K wps
[Epoch 38 Batch 120/162] avg loss 0.00370861, throughput 9.45103K wps
[Epoch 38 Batch 150/162] avg loss 0.0042393, throughput 9.26727K wps
Begin Testing...
[Epoch 38] train avg loss 0.0039613, dev acc 0.9011, dev avg loss 0.258285, throughput 9.40156K wps
[Epoch 39 Batch 30/162] avg loss 0.00401124, throughput 9.54127K wps
[Epoch 39 Batch 60/162] avg loss 0.00390607, throughput 9.31581K wps
[Epoch 39 Batch 90/162] avg loss 0.00373968, throughput 9.27547K wps
[Epoch 39 Batch 120/162] avg loss 0.00396615, throughput 9.41871K wps
[Epoch 39 Batch 150/162] avg loss 0.00379264, throughput 9.39576K wps
Begin Testing...
[Epoch 39] train avg loss 0.00390702, dev acc 0.8989, dev avg loss 0.256657, throughput 9.39427K wps
[Epoch 40 Batch 30/162] avg loss 0.00393162, throughput 9.67434K wps
[Epoch 40 Batch 60/162] avg loss 0.0037819, throughput 9.57699K wps
[Epoch 40 Batch 90/162] avg loss 0.00377915, throughput 9.46876K wps
[Epoch 40 Batch 120/162] avg loss 0.00373343, throughput 9.45007K wps
[Epoch 40 Batch 150/162] avg loss 0.00380252, throughput 9.26224K wps
Begin Testing...
[Epoch 40] train avg loss 0.00381492, dev acc 0.8989, dev avg loss 0.255187, throughput 9.4706K wps
[Epoch 41 Batch 30/162] avg loss 0.00399802, throughput 9.49814K wps
[Epoch 41 Batch 60/162] avg loss 0.00345699, throughput 9.29504K wps
[Epoch 41 Batch 90/162] avg loss 0.00360227, throughput 9.45748K wps
[Epoch 41 Batch 120/162] avg loss 0.00398935, throughput 9.41777K wps
[Epoch 41 Batch 150/162] avg loss 0.00378132, throughput 9.27861K wps
Begin Testing...
[Epoch 41] train avg loss 0.00374704, dev acc 0.8989, dev avg loss 0.25391, throughput 9.38096K wps
[Epoch 42 Batch 30/162] avg loss 0.00357563, throughput 9.54487K wps
[Epoch 42 Batch 60/162] avg loss 0.00361566, throughput 9.62968K wps
[Epoch 42 Batch 90/162] avg loss 0.00354063, throughput 9.49293K wps
[Epoch 42 Batch 120/162] avg loss 0.00361762, throughput 9.44148K wps
[Epoch 42 Batch 150/162] avg loss 0.00418969, throughput 9.36544K wps
Begin Testing...
[Epoch 42] train avg loss 0.00373444, dev acc 0.9000, dev avg loss 0.25275, throughput 9.50581K wps
[Epoch 43 Batch 30/162] avg loss 0.00370573, throughput 9.5242K wps
[Epoch 43 Batch 60/162] avg loss 0.00373772, throughput 9.43488K wps
[Epoch 43 Batch 90/162] avg loss 0.00357915, throughput 9.30858K wps
[Epoch 43 Batch 120/162] avg loss 0.00372048, throughput 9.3571K wps
[Epoch 43 Batch 150/162] avg loss 0.00356147, throughput 9.36751K wps
Begin Testing...
[Epoch 43] train avg loss 0.00365763, dev acc 0.9011, dev avg loss 0.251234, throughput 9.39331K wps
[Epoch 44 Batch 30/162] avg loss 0.00378978, throughput 9.61609K wps
[Epoch 44 Batch 60/162] avg loss 0.00358534, throughput 9.39665K wps
[Epoch 44 Batch 90/162] avg loss 0.00380747, throughput 9.63629K wps
[Epoch 44 Batch 120/162] avg loss 0.00341006, throughput 9.35279K wps
[Epoch 44 Batch 150/162] avg loss 0.00383862, throughput 9.451K wps
Begin Testing...
[Epoch 44] train avg loss 0.00362973, dev acc 0.9011, dev avg loss 0.250194, throughput 9.48177K wps
[Epoch 45 Batch 30/162] avg loss 0.00359368, throughput 9.65537K wps
[Epoch 45 Batch 60/162] avg loss 0.00378111, throughput 9.51237K wps
[Epoch 45 Batch 90/162] avg loss 0.00363177, throughput 9.39398K wps
[Epoch 45 Batch 120/162] avg loss 0.00371031, throughput 9.26322K wps
[Epoch 45 Batch 150/162] avg loss 0.00336906, throughput 9.49175K wps
Begin Testing...
[Epoch 45] train avg loss 0.00359808, dev acc 0.9011, dev avg loss 0.249553, throughput 9.46465K wps
[Epoch 46 Batch 30/162] avg loss 0.00344127, throughput 9.72143K wps
[Epoch 46 Batch 60/162] avg loss 0.00328195, throughput 9.34182K wps
[Epoch 46 Batch 90/162] avg loss 0.00362182, throughput 9.33831K wps
[Epoch 46 Batch 120/162] avg loss 0.00373991, throughput 9.35089K wps
[Epoch 46 Batch 150/162] avg loss 0.00354795, throughput 9.44608K wps
Begin Testing...
[Epoch 46] train avg loss 0.00350159, dev acc 0.9056, dev avg loss 0.249423, throughput 9.43889K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/162] avg loss 0.00342204, throughput 9.64541K wps
[Epoch 47 Batch 60/162] avg loss 0.00347048, throughput 9.31912K wps
[Epoch 47 Batch 90/162] avg loss 0.00336565, throughput 9.37817K wps
[Epoch 47 Batch 120/162] avg loss 0.00349036, throughput 9.57605K wps
[Epoch 47 Batch 150/162] avg loss 0.00336753, throughput 9.44081K wps
Begin Testing...
[Epoch 47] train avg loss 0.00339945, dev acc 0.9067, dev avg loss 0.248338, throughput 9.44925K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/162] avg loss 0.00350012, throughput 9.44065K wps
[Epoch 48 Batch 60/162] avg loss 0.00330553, throughput 9.36003K wps
[Epoch 48 Batch 90/162] avg loss 0.00328175, throughput 9.31135K wps
[Epoch 48 Batch 120/162] avg loss 0.00379125, throughput 9.6338K wps
[Epoch 48 Batch 150/162] avg loss 0.00353947, throughput 9.38186K wps
Begin Testing...
[Epoch 48] train avg loss 0.00342202, dev acc 0.9067, dev avg loss 0.247824, throughput 9.43725K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/162] avg loss 0.00321639, throughput 9.54198K wps
[Epoch 49 Batch 60/162] avg loss 0.00353585, throughput 9.35214K wps
[Epoch 49 Batch 90/162] avg loss 0.00326472, throughput 9.29211K wps
[Epoch 49 Batch 120/162] avg loss 0.00307165, throughput 9.37366K wps
[Epoch 49 Batch 150/162] avg loss 0.00346831, throughput 9.27105K wps
Begin Testing...
[Epoch 49] train avg loss 0.00331722, dev acc 0.9022, dev avg loss 0.245726, throughput 9.35504K wps
[Epoch 50 Batch 30/162] avg loss 0.00318394, throughput 9.53669K wps
[Epoch 50 Batch 60/162] avg loss 0.00343808, throughput 9.49831K wps
[Epoch 50 Batch 90/162] avg loss 0.00327675, throughput 9.54207K wps
[Epoch 50 Batch 120/162] avg loss 0.00381823, throughput 9.49602K wps
[Epoch 50 Batch 150/162] avg loss 0.00297331, throughput 9.33471K wps
Begin Testing...
[Epoch 50] train avg loss 0.00333092, dev acc 0.9022, dev avg loss 0.244905, throughput 9.47841K wps
[Epoch 51 Batch 30/162] avg loss 0.00321521, throughput 9.52019K wps
[Epoch 51 Batch 60/162] avg loss 0.0032855, throughput 9.41971K wps
[Epoch 51 Batch 90/162] avg loss 0.00326702, throughput 9.27154K wps
[Epoch 51 Batch 120/162] avg loss 0.00322066, throughput 9.26474K wps
[Epoch 51 Batch 150/162] avg loss 0.00306519, throughput 9.47363K wps
Begin Testing...
[Epoch 51] train avg loss 0.00322283, dev acc 0.9022, dev avg loss 0.244081, throughput 9.39416K wps
[Epoch 52 Batch 30/162] avg loss 0.00318501, throughput 9.45487K wps
[Epoch 52 Batch 60/162] avg loss 0.00311068, throughput 9.43871K wps
[Epoch 52 Batch 90/162] avg loss 0.00349898, throughput 9.52401K wps
[Epoch 52 Batch 120/162] avg loss 0.00307708, throughput 9.49713K wps
[Epoch 52 Batch 150/162] avg loss 0.00310367, throughput 9.58282K wps
Begin Testing...
[Epoch 52] train avg loss 0.00319324, dev acc 0.9078, dev avg loss 0.244634, throughput 9.4922K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/162] avg loss 0.00330412, throughput 9.5568K wps
[Epoch 53 Batch 60/162] avg loss 0.00340498, throughput 9.38188K wps
[Epoch 53 Batch 90/162] avg loss 0.002726, throughput 9.33004K wps
[Epoch 53 Batch 120/162] avg loss 0.00325923, throughput 9.30984K wps
[Epoch 53 Batch 150/162] avg loss 0.00311163, throughput 9.50999K wps
Begin Testing...
[Epoch 53] train avg loss 0.00313863, dev acc 0.9078, dev avg loss 0.24387, throughput 9.40989K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00280522, throughput 9.5136K wps
[Epoch 54 Batch 60/162] avg loss 0.00312256, throughput 9.34957K wps
[Epoch 54 Batch 90/162] avg loss 0.0032516, throughput 9.49484K wps
[Epoch 54 Batch 120/162] avg loss 0.00296608, throughput 9.6302K wps
[Epoch 54 Batch 150/162] avg loss 0.00336393, throughput 9.57051K wps
Begin Testing...
[Epoch 54] train avg loss 0.00309793, dev acc 0.9067, dev avg loss 0.24246, throughput 9.5052K wps
[Epoch 55 Batch 30/162] avg loss 0.00311743, throughput 9.41864K wps
[Epoch 55 Batch 60/162] avg loss 0.00315787, throughput 9.32091K wps
[Epoch 55 Batch 90/162] avg loss 0.00304349, throughput 9.47142K wps
[Epoch 55 Batch 120/162] avg loss 0.00312005, throughput 9.26735K wps
[Epoch 55 Batch 150/162] avg loss 0.00288105, throughput 9.34897K wps
Begin Testing...
[Epoch 55] train avg loss 0.00304497, dev acc 0.9033, dev avg loss 0.240875, throughput 9.3566K wps
[Epoch 56 Batch 30/162] avg loss 0.00289679, throughput 9.51227K wps
[Epoch 56 Batch 60/162] avg loss 0.00314012, throughput 9.31222K wps
[Epoch 56 Batch 90/162] avg loss 0.00300704, throughput 9.40977K wps
[Epoch 56 Batch 120/162] avg loss 0.00313969, throughput 9.2384K wps
[Epoch 56 Batch 150/162] avg loss 0.00277625, throughput 9.54468K wps
Begin Testing...
[Epoch 56] train avg loss 0.00299008, dev acc 0.9067, dev avg loss 0.240352, throughput 9.39314K wps
[Epoch 57 Batch 30/162] avg loss 0.00299225, throughput 9.46794K wps
[Epoch 57 Batch 60/162] avg loss 0.00281025, throughput 9.35416K wps
[Epoch 57 Batch 90/162] avg loss 0.00297896, throughput 9.40174K wps
[Epoch 57 Batch 120/162] avg loss 0.00293832, throughput 9.25101K wps
[Epoch 57 Batch 150/162] avg loss 0.00276375, throughput 9.43863K wps
Begin Testing...
[Epoch 57] train avg loss 0.0029357, dev acc 0.9078, dev avg loss 0.240183, throughput 9.39874K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/162] avg loss 0.00280653, throughput 9.64921K wps
[Epoch 58 Batch 60/162] avg loss 0.00301661, throughput 9.3229K wps
[Epoch 58 Batch 90/162] avg loss 0.00294062, throughput 9.38517K wps
[Epoch 58 Batch 120/162] avg loss 0.00316029, throughput 9.40372K wps
[Epoch 58 Batch 150/162] avg loss 0.00312705, throughput 9.37382K wps
Begin Testing...
[Epoch 58] train avg loss 0.00300378, dev acc 0.9100, dev avg loss 0.240917, throughput 9.41193K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/162] avg loss 0.00293071, throughput 9.59655K wps
[Epoch 59 Batch 60/162] avg loss 0.00278885, throughput 9.3282K wps
[Epoch 59 Batch 90/162] avg loss 0.00260123, throughput 9.34104K wps
[Epoch 59 Batch 120/162] avg loss 0.00304476, throughput 9.48451K wps
[Epoch 59 Batch 150/162] avg loss 0.00294758, throughput 9.38572K wps
Begin Testing...
[Epoch 59] train avg loss 0.00288044, dev acc 0.9044, dev avg loss 0.238944, throughput 9.42144K wps
[Epoch 60 Batch 30/162] avg loss 0.00300888, throughput 9.568K wps
[Epoch 60 Batch 60/162] avg loss 0.00291366, throughput 9.44866K wps
[Epoch 60 Batch 90/162] avg loss 0.00267981, throughput 9.24282K wps
[Epoch 60 Batch 120/162] avg loss 0.00292821, throughput 9.48347K wps
[Epoch 60 Batch 150/162] avg loss 0.00276907, throughput 9.45153K wps
Begin Testing...
[Epoch 60] train avg loss 0.00283028, dev acc 0.9056, dev avg loss 0.239464, throughput 9.42948K wps
[Epoch 61 Batch 30/162] avg loss 0.00260855, throughput 9.67308K wps
[Epoch 61 Batch 60/162] avg loss 0.00278626, throughput 9.50827K wps
[Epoch 61 Batch 90/162] avg loss 0.00290986, throughput 9.53516K wps
[Epoch 61 Batch 120/162] avg loss 0.00299438, throughput 9.45582K wps
[Epoch 61 Batch 150/162] avg loss 0.00274816, throughput 9.48674K wps
Begin Testing...
[Epoch 61] train avg loss 0.00281143, dev acc 0.9056, dev avg loss 0.237615, throughput 9.5077K wps
[Epoch 62 Batch 30/162] avg loss 0.00288795, throughput 9.4864K wps
[Epoch 62 Batch 60/162] avg loss 0.00276271, throughput 9.33604K wps
[Epoch 62 Batch 90/162] avg loss 0.00266529, throughput 9.41457K wps
[Epoch 62 Batch 120/162] avg loss 0.00267132, throughput 9.4805K wps
[Epoch 62 Batch 150/162] avg loss 0.00278233, throughput 9.42662K wps
Begin Testing...
[Epoch 62] train avg loss 0.002759, dev acc 0.9056, dev avg loss 0.23689, throughput 9.42278K wps
[Epoch 63 Batch 30/162] avg loss 0.00265385, throughput 9.74434K wps
[Epoch 63 Batch 60/162] avg loss 0.00286414, throughput 9.40379K wps
[Epoch 63 Batch 90/162] avg loss 0.0028863, throughput 9.39386K wps
[Epoch 63 Batch 120/162] avg loss 0.00272298, throughput 9.50042K wps
[Epoch 63 Batch 150/162] avg loss 0.00257107, throughput 9.36452K wps
Begin Testing...
[Epoch 63] train avg loss 0.00273231, dev acc 0.9078, dev avg loss 0.23665, throughput 9.49075K wps
[Epoch 64 Batch 30/162] avg loss 0.00287369, throughput 9.70174K wps
[Epoch 64 Batch 60/162] avg loss 0.00268595, throughput 9.32407K wps
[Epoch 64 Batch 90/162] avg loss 0.00249368, throughput 9.40545K wps
[Epoch 64 Batch 120/162] avg loss 0.00279411, throughput 9.55831K wps
[Epoch 64 Batch 150/162] avg loss 0.00262947, throughput 9.50336K wps
Begin Testing...
[Epoch 64] train avg loss 0.00268018, dev acc 0.9067, dev avg loss 0.236, throughput 9.49381K wps
[Epoch 65 Batch 30/162] avg loss 0.00239457, throughput 9.43358K wps
[Epoch 65 Batch 60/162] avg loss 0.00267318, throughput 9.31637K wps
[Epoch 65 Batch 90/162] avg loss 0.00275123, throughput 9.47293K wps
[Epoch 65 Batch 120/162] avg loss 0.00294261, throughput 9.32146K wps
[Epoch 65 Batch 150/162] avg loss 0.00245131, throughput 9.52185K wps
Begin Testing...
[Epoch 65] train avg loss 0.00264365, dev acc 0.9089, dev avg loss 0.235717, throughput 9.40323K wps
[Epoch 66 Batch 30/162] avg loss 0.00268589, throughput 9.51885K wps
[Epoch 66 Batch 60/162] avg loss 0.00272469, throughput 9.6092K wps
[Epoch 66 Batch 90/162] avg loss 0.00248978, throughput 9.60937K wps
[Epoch 66 Batch 120/162] avg loss 0.00285202, throughput 9.5301K wps
[Epoch 66 Batch 150/162] avg loss 0.00249202, throughput 9.27504K wps
Begin Testing...
[Epoch 66] train avg loss 0.00264761, dev acc 0.9078, dev avg loss 0.235078, throughput 9.4942K wps
[Epoch 67 Batch 30/162] avg loss 0.00227765, throughput 9.60786K wps
[Epoch 67 Batch 60/162] avg loss 0.00280586, throughput 9.40882K wps
[Epoch 67 Batch 90/162] avg loss 0.00261633, throughput 9.45542K wps
[Epoch 67 Batch 120/162] avg loss 0.00236493, throughput 9.3603K wps
[Epoch 67 Batch 150/162] avg loss 0.00251714, throughput 9.48641K wps
Begin Testing...
[Epoch 67] train avg loss 0.0025287, dev acc 0.9078, dev avg loss 0.235218, throughput 9.47176K wps
[Epoch 68 Batch 30/162] avg loss 0.00247929, throughput 9.50213K wps
[Epoch 68 Batch 60/162] avg loss 0.00237766, throughput 9.40113K wps
[Epoch 68 Batch 90/162] avg loss 0.00258673, throughput 9.31912K wps
[Epoch 68 Batch 120/162] avg loss 0.00243504, throughput 9.55616K wps
[Epoch 68 Batch 150/162] avg loss 0.00259164, throughput 9.27384K wps
Begin Testing...
[Epoch 68] train avg loss 0.00252898, dev acc 0.9078, dev avg loss 0.235461, throughput 9.42179K wps
[Epoch 69 Batch 30/162] avg loss 0.00265411, throughput 9.56154K wps
[Epoch 69 Batch 60/162] avg loss 0.00241417, throughput 9.462K wps
[Epoch 69 Batch 90/162] avg loss 0.00254495, throughput 9.43332K wps
[Epoch 69 Batch 120/162] avg loss 0.0021794, throughput 9.3068K wps
[Epoch 69 Batch 150/162] avg loss 0.00265025, throughput 9.39508K wps
Begin Testing...
[Epoch 69] train avg loss 0.00246926, dev acc 0.9078, dev avg loss 0.235717, throughput 9.42618K wps
[Epoch 70 Batch 30/162] avg loss 0.00257705, throughput 9.51677K wps
[Epoch 70 Batch 60/162] avg loss 0.00226928, throughput 9.36987K wps
[Epoch 70 Batch 90/162] avg loss 0.00249652, throughput 9.38456K wps
[Epoch 70 Batch 120/162] avg loss 0.00211324, throughput 9.50327K wps
[Epoch 70 Batch 150/162] avg loss 0.00271965, throughput 9.46819K wps
Begin Testing...
[Epoch 70] train avg loss 0.00242111, dev acc 0.9089, dev avg loss 0.233756, throughput 9.43801K wps
[Epoch 71 Batch 30/162] avg loss 0.00235651, throughput 9.51452K wps
[Epoch 71 Batch 60/162] avg loss 0.00242878, throughput 9.47533K wps
[Epoch 71 Batch 90/162] avg loss 0.00259031, throughput 9.4441K wps
[Epoch 71 Batch 120/162] avg loss 0.00239291, throughput 9.36571K wps
[Epoch 71 Batch 150/162] avg loss 0.00234223, throughput 9.32055K wps
Begin Testing...
[Epoch 71] train avg loss 0.00241264, dev acc 0.9089, dev avg loss 0.233406, throughput 9.4133K wps
[Epoch 72 Batch 30/162] avg loss 0.00242557, throughput 9.51617K wps
[Epoch 72 Batch 60/162] avg loss 0.00242044, throughput 9.2519K wps
[Epoch 72 Batch 90/162] avg loss 0.00240604, throughput 9.48976K wps
[Epoch 72 Batch 120/162] avg loss 0.00222254, throughput 9.34074K wps
[Epoch 72 Batch 150/162] avg loss 0.00241634, throughput 9.24993K wps
Begin Testing...
[Epoch 72] train avg loss 0.00238551, dev acc 0.9078, dev avg loss 0.232928, throughput 9.37433K wps
[Epoch 73 Batch 30/162] avg loss 0.00225104, throughput 9.69523K wps
[Epoch 73 Batch 60/162] avg loss 0.00251166, throughput 9.39546K wps
[Epoch 73 Batch 90/162] avg loss 0.00224245, throughput 9.55207K wps
[Epoch 73 Batch 120/162] avg loss 0.00242628, throughput 9.45621K wps
[Epoch 73 Batch 150/162] avg loss 0.00239749, throughput 9.45189K wps
Begin Testing...
[Epoch 73] train avg loss 0.00236795, dev acc 0.9089, dev avg loss 0.232871, throughput 9.4951K wps
[Epoch 74 Batch 30/162] avg loss 0.00213699, throughput 9.43431K wps
[Epoch 74 Batch 60/162] avg loss 0.00242813, throughput 9.41929K wps
[Epoch 74 Batch 90/162] avg loss 0.00227872, throughput 9.30512K wps
[Epoch 74 Batch 120/162] avg loss 0.00230027, throughput 9.32665K wps
[Epoch 74 Batch 150/162] avg loss 0.00216548, throughput 9.6628K wps
Begin Testing...
[Epoch 74] train avg loss 0.00230945, dev acc 0.9100, dev avg loss 0.233672, throughput 9.44452K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/162] avg loss 0.00229099, throughput 9.66157K wps
[Epoch 75 Batch 60/162] avg loss 0.00231805, throughput 9.19063K wps
[Epoch 75 Batch 90/162] avg loss 0.00238184, throughput 9.45356K wps
[Epoch 75 Batch 120/162] avg loss 0.00212968, throughput 9.50277K wps
[Epoch 75 Batch 150/162] avg loss 0.00231665, throughput 9.42112K wps
Begin Testing...
[Epoch 75] train avg loss 0.00234109, dev acc 0.9067, dev avg loss 0.232223, throughput 9.43704K wps
[Epoch 76 Batch 30/162] avg loss 0.00228757, throughput 9.60872K wps
[Epoch 76 Batch 60/162] avg loss 0.00238633, throughput 9.20488K wps
[Epoch 76 Batch 90/162] avg loss 0.00226496, throughput 9.28966K wps
[Epoch 76 Batch 120/162] avg loss 0.00231158, throughput 9.40825K wps
[Epoch 76 Batch 150/162] avg loss 0.0021501, throughput 9.45735K wps
Begin Testing...
[Epoch 76] train avg loss 0.00224102, dev acc 0.9067, dev avg loss 0.232594, throughput 9.37968K wps
[Epoch 77 Batch 30/162] avg loss 0.00203727, throughput 9.46889K wps
[Epoch 77 Batch 60/162] avg loss 0.00236498, throughput 9.41467K wps
[Epoch 77 Batch 90/162] avg loss 0.00234807, throughput 9.31376K wps
[Epoch 77 Batch 120/162] avg loss 0.00238359, throughput 9.66378K wps
[Epoch 77 Batch 150/162] avg loss 0.00216933, throughput 9.54635K wps
Begin Testing...
[Epoch 77] train avg loss 0.00222111, dev acc 0.9078, dev avg loss 0.232175, throughput 9.46046K wps
[Epoch 78 Batch 30/162] avg loss 0.00237146, throughput 9.48115K wps
[Epoch 78 Batch 60/162] avg loss 0.00226466, throughput 9.40173K wps
[Epoch 78 Batch 90/162] avg loss 0.00214546, throughput 9.49443K wps
[Epoch 78 Batch 120/162] avg loss 0.00216933, throughput 9.43839K wps
[Epoch 78 Batch 150/162] avg loss 0.00218924, throughput 9.4294K wps
Begin Testing...
[Epoch 78] train avg loss 0.0022003, dev acc 0.9078, dev avg loss 0.231781, throughput 9.44886K wps
[Epoch 79 Batch 30/162] avg loss 0.00214779, throughput 9.63114K wps
[Epoch 79 Batch 60/162] avg loss 0.00208279, throughput 9.5244K wps
[Epoch 79 Batch 90/162] avg loss 0.00217244, throughput 9.30055K wps
[Epoch 79 Batch 120/162] avg loss 0.00213707, throughput 9.37776K wps
[Epoch 79 Batch 150/162] avg loss 0.00215754, throughput 9.31297K wps
Begin Testing...
[Epoch 79] train avg loss 0.00213948, dev acc 0.9078, dev avg loss 0.231746, throughput 9.43039K wps
[Epoch 80 Batch 30/162] avg loss 0.00212842, throughput 9.63843K wps
[Epoch 80 Batch 60/162] avg loss 0.00195101, throughput 9.35867K wps
[Epoch 80 Batch 90/162] avg loss 0.00209711, throughput 9.50732K wps
[Epoch 80 Batch 120/162] avg loss 0.00241555, throughput 9.50327K wps
[Epoch 80 Batch 150/162] avg loss 0.00220332, throughput 9.46487K wps
Begin Testing...
[Epoch 80] train avg loss 0.00215957, dev acc 0.9089, dev avg loss 0.231133, throughput 9.49374K wps
[Epoch 81 Batch 30/162] avg loss 0.0018672, throughput 9.54823K wps
[Epoch 81 Batch 60/162] avg loss 0.00219145, throughput 9.32757K wps
[Epoch 81 Batch 90/162] avg loss 0.00221833, throughput 9.41622K wps
[Epoch 81 Batch 120/162] avg loss 0.00197305, throughput 9.29481K wps
[Epoch 81 Batch 150/162] avg loss 0.00201746, throughput 9.45812K wps
Begin Testing...
[Epoch 81] train avg loss 0.00207677, dev acc 0.9100, dev avg loss 0.230953, throughput 9.37993K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/162] avg loss 0.00188539, throughput 9.65986K wps
[Epoch 82 Batch 60/162] avg loss 0.00204478, throughput 9.46403K wps
[Epoch 82 Batch 90/162] avg loss 0.00201831, throughput 9.2557K wps
[Epoch 82 Batch 120/162] avg loss 0.00241526, throughput 9.48381K wps
[Epoch 82 Batch 150/162] avg loss 0.0020828, throughput 9.29595K wps
Begin Testing...
[Epoch 82] train avg loss 0.00207981, dev acc 0.9078, dev avg loss 0.230809, throughput 9.44472K wps
[Epoch 83 Batch 30/162] avg loss 0.00223152, throughput 9.45692K wps
[Epoch 83 Batch 60/162] avg loss 0.00222273, throughput 9.41858K wps
[Epoch 83 Batch 90/162] avg loss 0.00200768, throughput 9.38583K wps
[Epoch 83 Batch 120/162] avg loss 0.00199637, throughput 9.35457K wps
[Epoch 83 Batch 150/162] avg loss 0.00221681, throughput 9.31573K wps
Begin Testing...
[Epoch 83] train avg loss 0.00212103, dev acc 0.9100, dev avg loss 0.230458, throughput 9.40186K wps
Observed Improvement.
Begin Testing...
[Epoch 84 Batch 30/162] avg loss 0.00215412, throughput 9.72015K wps
[Epoch 84 Batch 60/162] avg loss 0.00193827, throughput 9.26361K wps
[Epoch 84 Batch 90/162] avg loss 0.0021938, throughput 9.34604K wps
[Epoch 84 Batch 120/162] avg loss 0.00206668, throughput 9.28541K wps
[Epoch 84 Batch 150/162] avg loss 0.00189408, throughput 9.25179K wps
Begin Testing...
[Epoch 84] train avg loss 0.00204826, dev acc 0.9089, dev avg loss 0.230426, throughput 9.36956K wps
[Epoch 85 Batch 30/162] avg loss 0.00185962, throughput 9.52811K wps
[Epoch 85 Batch 60/162] avg loss 0.0020107, throughput 9.42722K wps
[Epoch 85 Batch 90/162] avg loss 0.00203444, throughput 9.41173K wps
[Epoch 85 Batch 120/162] avg loss 0.00194089, throughput 9.52288K wps
[Epoch 85 Batch 150/162] avg loss 0.00204689, throughput 9.42042K wps
Begin Testing...
[Epoch 85] train avg loss 0.00200129, dev acc 0.9067, dev avg loss 0.230463, throughput 9.47302K wps
[Epoch 86 Batch 30/162] avg loss 0.00177675, throughput 9.49244K wps
[Epoch 86 Batch 60/162] avg loss 0.00202632, throughput 9.66008K wps
[Epoch 86 Batch 90/162] avg loss 0.00204808, throughput 9.38044K wps
[Epoch 86 Batch 120/162] avg loss 0.00215758, throughput 9.52754K wps
[Epoch 86 Batch 150/162] avg loss 0.00180453, throughput 9.42865K wps
Begin Testing...
[Epoch 86] train avg loss 0.00196139, dev acc 0.9078, dev avg loss 0.231065, throughput 9.48334K wps
[Epoch 87 Batch 30/162] avg loss 0.0019585, throughput 9.55588K wps
[Epoch 87 Batch 60/162] avg loss 0.00199308, throughput 9.46191K wps
[Epoch 87 Batch 90/162] avg loss 0.00180912, throughput 9.33256K wps
[Epoch 87 Batch 120/162] avg loss 0.00202829, throughput 9.53227K wps
[Epoch 87 Batch 150/162] avg loss 0.00181426, throughput 9.34087K wps
Begin Testing...
[Epoch 87] train avg loss 0.00192785, dev acc 0.9067, dev avg loss 0.230277, throughput 9.44801K wps
[Epoch 88 Batch 30/162] avg loss 0.00176082, throughput 9.42677K wps
[Epoch 88 Batch 60/162] avg loss 0.0020671, throughput 9.46952K wps
[Epoch 88 Batch 90/162] avg loss 0.0018393, throughput 9.4635K wps
[Epoch 88 Batch 120/162] avg loss 0.00189293, throughput 9.41574K wps
[Epoch 88 Batch 150/162] avg loss 0.00200162, throughput 9.41319K wps
Begin Testing...
[Epoch 88] train avg loss 0.00192076, dev acc 0.9067, dev avg loss 0.229971, throughput 9.44874K wps
[Epoch 89 Batch 30/162] avg loss 0.00196706, throughput 9.57919K wps
[Epoch 89 Batch 60/162] avg loss 0.00193885, throughput 9.35798K wps
[Epoch 89 Batch 90/162] avg loss 0.00202345, throughput 9.34419K wps
[Epoch 89 Batch 120/162] avg loss 0.00190312, throughput 9.41812K wps
[Epoch 89 Batch 150/162] avg loss 0.00173288, throughput 9.36746K wps
Begin Testing...
[Epoch 89] train avg loss 0.00191078, dev acc 0.9089, dev avg loss 0.231062, throughput 9.43163K wps
[Epoch 90 Batch 30/162] avg loss 0.001871, throughput 9.48289K wps
[Epoch 90 Batch 60/162] avg loss 0.00174085, throughput 9.46199K wps
[Epoch 90 Batch 90/162] avg loss 0.00205183, throughput 9.26592K wps
[Epoch 90 Batch 120/162] avg loss 0.0018345, throughput 9.47404K wps
[Epoch 90 Batch 150/162] avg loss 0.00207165, throughput 9.42786K wps
Begin Testing...
[Epoch 90] train avg loss 0.00190886, dev acc 0.9089, dev avg loss 0.229688, throughput 9.42216K wps
[Epoch 91 Batch 30/162] avg loss 0.00185121, throughput 9.64327K wps
[Epoch 91 Batch 60/162] avg loss 0.00188709, throughput 9.33688K wps
[Epoch 91 Batch 90/162] avg loss 0.00186984, throughput 9.43082K wps
[Epoch 91 Batch 120/162] avg loss 0.00178008, throughput 9.49693K wps
[Epoch 91 Batch 150/162] avg loss 0.00195783, throughput 9.25683K wps
Begin Testing...
[Epoch 91] train avg loss 0.00187586, dev acc 0.9100, dev avg loss 0.229561, throughput 9.43077K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/162] avg loss 0.00191878, throughput 9.58135K wps
[Epoch 92 Batch 60/162] avg loss 0.00185139, throughput 9.45728K wps
[Epoch 92 Batch 90/162] avg loss 0.00199974, throughput 9.49063K wps
[Epoch 92 Batch 120/162] avg loss 0.00185051, throughput 9.25664K wps
[Epoch 92 Batch 150/162] avg loss 0.00176807, throughput 9.41778K wps
Begin Testing...
[Epoch 92] train avg loss 0.00186376, dev acc 0.9089, dev avg loss 0.229451, throughput 9.45148K wps
[Epoch 93 Batch 30/162] avg loss 0.00169152, throughput 9.58804K wps
[Epoch 93 Batch 60/162] avg loss 0.00184662, throughput 9.37001K wps
[Epoch 93 Batch 90/162] avg loss 0.00181727, throughput 9.30156K wps
[Epoch 93 Batch 120/162] avg loss 0.00197526, throughput 9.57869K wps
[Epoch 93 Batch 150/162] avg loss 0.00155701, throughput 9.44153K wps
Begin Testing...
[Epoch 93] train avg loss 0.00177294, dev acc 0.9067, dev avg loss 0.229503, throughput 9.44153K wps
[Epoch 94 Batch 30/162] avg loss 0.0017501, throughput 9.54151K wps
[Epoch 94 Batch 60/162] avg loss 0.00189529, throughput 9.4493K wps
[Epoch 94 Batch 90/162] avg loss 0.00171395, throughput 9.34279K wps
[Epoch 94 Batch 120/162] avg loss 0.00168188, throughput 9.3006K wps
[Epoch 94 Batch 150/162] avg loss 0.00176516, throughput 9.26229K wps
Begin Testing...
[Epoch 94] train avg loss 0.00177128, dev acc 0.9089, dev avg loss 0.229222, throughput 9.3895K wps
[Epoch 95 Batch 30/162] avg loss 0.00192507, throughput 9.48517K wps
[Epoch 95 Batch 60/162] avg loss 0.00171763, throughput 9.40488K wps
[Epoch 95 Batch 90/162] avg loss 0.00158727, throughput 9.49213K wps
[Epoch 95 Batch 120/162] avg loss 0.00190603, throughput 9.37772K wps
[Epoch 95 Batch 150/162] avg loss 0.00168699, throughput 9.34463K wps
Begin Testing...
[Epoch 95] train avg loss 0.00175758, dev acc 0.9056, dev avg loss 0.229732, throughput 9.41329K wps
[Epoch 96 Batch 30/162] avg loss 0.00164219, throughput 9.64598K wps
[Epoch 96 Batch 60/162] avg loss 0.00180407, throughput 9.26244K wps
[Epoch 96 Batch 90/162] avg loss 0.00168376, throughput 9.47289K wps
[Epoch 96 Batch 120/162] avg loss 0.00174691, throughput 9.56701K wps
[Epoch 96 Batch 150/162] avg loss 0.00146591, throughput 9.55153K wps
Begin Testing...
[Epoch 96] train avg loss 0.00166984, dev acc 0.9056, dev avg loss 0.230337, throughput 9.4896K wps
[Epoch 97 Batch 30/162] avg loss 0.00164687, throughput 9.45882K wps
[Epoch 97 Batch 60/162] avg loss 0.00149098, throughput 9.34822K wps
[Epoch 97 Batch 90/162] avg loss 0.00176893, throughput 9.42753K wps
[Epoch 97 Batch 120/162] avg loss 0.00181979, throughput 9.41794K wps
[Epoch 97 Batch 150/162] avg loss 0.00175131, throughput 9.41042K wps
Begin Testing...
[Epoch 97] train avg loss 0.00171622, dev acc 0.9056, dev avg loss 0.22993, throughput 9.42032K wps
[Epoch 98 Batch 30/162] avg loss 0.00152995, throughput 9.69674K wps
[Epoch 98 Batch 60/162] avg loss 0.00145983, throughput 9.30966K wps
[Epoch 98 Batch 90/162] avg loss 0.0015705, throughput 9.51451K wps
[Epoch 98 Batch 120/162] avg loss 0.00182273, throughput 9.4183K wps
[Epoch 98 Batch 150/162] avg loss 0.00193128, throughput 9.33212K wps
Begin Testing...
[Epoch 98] train avg loss 0.00168408, dev acc 0.9067, dev avg loss 0.230773, throughput 9.46114K wps
[Epoch 99 Batch 30/162] avg loss 0.00166608, throughput 9.61557K wps
[Epoch 99 Batch 60/162] avg loss 0.00174995, throughput 9.18671K wps
[Epoch 99 Batch 90/162] avg loss 0.0017058, throughput 9.48183K wps
[Epoch 99 Batch 120/162] avg loss 0.00178574, throughput 9.26609K wps
[Epoch 99 Batch 150/162] avg loss 0.00156201, throughput 9.28227K wps
Begin Testing...
[Epoch 99] train avg loss 0.0016845, dev acc 0.9044, dev avg loss 0.229811, throughput 9.35877K wps
[Epoch 100 Batch 30/162] avg loss 0.00161489, throughput 9.53247K wps
[Epoch 100 Batch 60/162] avg loss 0.00151833, throughput 9.30704K wps
[Epoch 100 Batch 90/162] avg loss 0.00163977, throughput 9.63101K wps
[Epoch 100 Batch 120/162] avg loss 0.00157494, throughput 9.45964K wps
[Epoch 100 Batch 150/162] avg loss 0.00160811, throughput 9.34586K wps
Begin Testing...
[Epoch 100] train avg loss 0.00158965, dev acc 0.9067, dev avg loss 0.229617, throughput 9.4506K wps
[Epoch 101 Batch 30/162] avg loss 0.00155059, throughput 9.58724K wps
[Epoch 101 Batch 60/162] avg loss 0.00159646, throughput 9.43018K wps
[Epoch 101 Batch 90/162] avg loss 0.00153429, throughput 9.39148K wps
[Epoch 101 Batch 120/162] avg loss 0.0017968, throughput 9.41788K wps
[Epoch 101 Batch 150/162] avg loss 0.00141439, throughput 9.53901K wps
Begin Testing...
[Epoch 101] train avg loss 0.00158273, dev acc 0.9056, dev avg loss 0.229857, throughput 9.48433K wps
[Epoch 102 Batch 30/162] avg loss 0.00145153, throughput 9.51218K wps
[Epoch 102 Batch 60/162] avg loss 0.00146349, throughput 9.39322K wps
[Epoch 102 Batch 90/162] avg loss 0.00155615, throughput 9.55787K wps
[Epoch 102 Batch 120/162] avg loss 0.00171872, throughput 9.48334K wps
[Epoch 102 Batch 150/162] avg loss 0.00179406, throughput 9.36459K wps
Begin Testing...
[Epoch 102] train avg loss 0.00158761, dev acc 0.9056, dev avg loss 0.229883, throughput 9.4716K wps
[Epoch 103 Batch 30/162] avg loss 0.0016011, throughput 9.51903K wps
[Epoch 103 Batch 60/162] avg loss 0.00142531, throughput 9.58996K wps
[Epoch 103 Batch 90/162] avg loss 0.00175576, throughput 9.39976K wps
[Epoch 103 Batch 120/162] avg loss 0.00163905, throughput 9.37047K wps
[Epoch 103 Batch 150/162] avg loss 0.00150459, throughput 9.43596K wps
Begin Testing...
[Epoch 103] train avg loss 0.00157957, dev acc 0.9089, dev avg loss 0.229327, throughput 9.47277K wps
[Epoch 104 Batch 30/162] avg loss 0.00153513, throughput 9.33868K wps
[Epoch 104 Batch 60/162] avg loss 0.00149964, throughput 9.5506K wps
[Epoch 104 Batch 90/162] avg loss 0.00160712, throughput 9.2441K wps
[Epoch 104 Batch 120/162] avg loss 0.00144635, throughput 9.53K wps
[Epoch 104 Batch 150/162] avg loss 0.00153667, throughput 9.47824K wps
Begin Testing...
[Epoch 104] train avg loss 0.00155162, dev acc 0.9078, dev avg loss 0.229468, throughput 9.40776K wps
[Epoch 105 Batch 30/162] avg loss 0.00141183, throughput 9.51289K wps
[Epoch 105 Batch 60/162] avg loss 0.00151852, throughput 9.49043K wps
[Epoch 105 Batch 90/162] avg loss 0.00152356, throughput 9.39677K wps
[Epoch 105 Batch 120/162] avg loss 0.0015766, throughput 9.27471K wps
[Epoch 105 Batch 150/162] avg loss 0.00143873, throughput 9.32436K wps
Begin Testing...
[Epoch 105] train avg loss 0.00149833, dev acc 0.9067, dev avg loss 0.229239, throughput 9.40824K wps
[Epoch 106 Batch 30/162] avg loss 0.00154192, throughput 9.75021K wps
[Epoch 106 Batch 60/162] avg loss 0.0015088, throughput 9.44832K wps
[Epoch 106 Batch 90/162] avg loss 0.00148075, throughput 9.30569K wps
[Epoch 106 Batch 120/162] avg loss 0.00147681, throughput 9.3575K wps
[Epoch 106 Batch 150/162] avg loss 0.00143831, throughput 9.3803K wps
Begin Testing...
[Epoch 106] train avg loss 0.00147954, dev acc 0.9078, dev avg loss 0.229283, throughput 9.42957K wps
[Epoch 107 Batch 30/162] avg loss 0.00161458, throughput 9.68302K wps
[Epoch 107 Batch 60/162] avg loss 0.00136929, throughput 9.38921K wps
[Epoch 107 Batch 90/162] avg loss 0.00169635, throughput 9.27688K wps
[Epoch 107 Batch 120/162] avg loss 0.00150827, throughput 9.4801K wps
[Epoch 107 Batch 150/162] avg loss 0.00157872, throughput 9.39013K wps
Begin Testing...
[Epoch 107] train avg loss 0.00155469, dev acc 0.9078, dev avg loss 0.229541, throughput 9.42395K wps
[Epoch 108 Batch 30/162] avg loss 0.00155633, throughput 9.56858K wps
[Epoch 108 Batch 60/162] avg loss 0.00148016, throughput 9.56082K wps
[Epoch 108 Batch 90/162] avg loss 0.0015239, throughput 9.29913K wps
[Epoch 108 Batch 120/162] avg loss 0.00149799, throughput 9.50152K wps
[Epoch 108 Batch 150/162] avg loss 0.00138366, throughput 9.34164K wps
Begin Testing...
[Epoch 108] train avg loss 0.00148478, dev acc 0.9067, dev avg loss 0.23007, throughput 9.44638K wps
[Epoch 109 Batch 30/162] avg loss 0.00148439, throughput 9.60358K wps
[Epoch 109 Batch 60/162] avg loss 0.00142033, throughput 9.33026K wps
[Epoch 109 Batch 90/162] avg loss 0.00131167, throughput 9.40558K wps
[Epoch 109 Batch 120/162] avg loss 0.00148522, throughput 9.38532K wps
[Epoch 109 Batch 150/162] avg loss 0.0016369, throughput 9.36925K wps
Begin Testing...
[Epoch 109] train avg loss 0.00146949, dev acc 0.9044, dev avg loss 0.230278, throughput 9.41857K wps
[Epoch 110 Batch 30/162] avg loss 0.00143108, throughput 9.57213K wps
[Epoch 110 Batch 60/162] avg loss 0.00141821, throughput 9.59217K wps
[Epoch 110 Batch 90/162] avg loss 0.00134611, throughput 9.18034K wps
[Epoch 110 Batch 120/162] avg loss 0.00144927, throughput 9.48332K wps
[Epoch 110 Batch 150/162] avg loss 0.00164894, throughput 9.54565K wps
Begin Testing...
[Epoch 110] train avg loss 0.00145218, dev acc 0.9044, dev avg loss 0.229918, throughput 9.46688K wps
[Epoch 111 Batch 30/162] avg loss 0.00155226, throughput 9.71186K wps
[Epoch 111 Batch 60/162] avg loss 0.00130029, throughput 9.2966K wps
[Epoch 111 Batch 90/162] avg loss 0.00141153, throughput 9.27986K wps
[Epoch 111 Batch 120/162] avg loss 0.00144165, throughput 9.43018K wps
[Epoch 111 Batch 150/162] avg loss 0.00145312, throughput 9.34458K wps
Begin Testing...
[Epoch 111] train avg loss 0.00143556, dev acc 0.9067, dev avg loss 0.230067, throughput 9.42379K wps
[Epoch 112 Batch 30/162] avg loss 0.00139717, throughput 9.45331K wps
[Epoch 112 Batch 60/162] avg loss 0.00138277, throughput 9.41432K wps
[Epoch 112 Batch 90/162] avg loss 0.00147046, throughput 9.40112K wps
[Epoch 112 Batch 120/162] avg loss 0.0013802, throughput 9.26478K wps
[Epoch 112 Batch 150/162] avg loss 0.00127427, throughput 9.24456K wps
Begin Testing...
[Epoch 112] train avg loss 0.00136954, dev acc 0.9067, dev avg loss 0.229929, throughput 9.34168K wps
[Epoch 113 Batch 30/162] avg loss 0.00143926, throughput 9.31985K wps
[Epoch 113 Batch 60/162] avg loss 0.00141941, throughput 9.49312K wps
[Epoch 113 Batch 90/162] avg loss 0.00138592, throughput 9.25285K wps
[Epoch 113 Batch 120/162] avg loss 0.0014533, throughput 9.49174K wps
[Epoch 113 Batch 150/162] avg loss 0.00131121, throughput 9.22303K wps
Begin Testing...
[Epoch 113] train avg loss 0.00139108, dev acc 0.9056, dev avg loss 0.230654, throughput 9.36915K wps
[Epoch 114 Batch 30/162] avg loss 0.00132429, throughput 9.55196K wps
[Epoch 114 Batch 60/162] avg loss 0.00138075, throughput 9.25876K wps
[Epoch 114 Batch 90/162] avg loss 0.00130877, throughput 9.51793K wps
[Epoch 114 Batch 120/162] avg loss 0.00149331, throughput 9.36408K wps
[Epoch 114 Batch 150/162] avg loss 0.00127485, throughput 9.24873K wps
Begin Testing...
[Epoch 114] train avg loss 0.00134654, dev acc 0.9089, dev avg loss 0.229764, throughput 9.39565K wps
[Epoch 115 Batch 30/162] avg loss 0.00143874, throughput 9.50552K wps
[Epoch 115 Batch 60/162] avg loss 0.0012415, throughput 9.275K wps
[Epoch 115 Batch 90/162] avg loss 0.00140917, throughput 9.57682K wps
[Epoch 115 Batch 120/162] avg loss 0.00133977, throughput 9.35983K wps
[Epoch 115 Batch 150/162] avg loss 0.00131923, throughput 9.24978K wps
Begin Testing...
[Epoch 115] train avg loss 0.0013441, dev acc 0.9078, dev avg loss 0.229782, throughput 9.3838K wps
[Epoch 116 Batch 30/162] avg loss 0.00130383, throughput 9.59202K wps
[Epoch 116 Batch 60/162] avg loss 0.00136091, throughput 9.53288K wps
[Epoch 116 Batch 90/162] avg loss 0.00137945, throughput 9.48212K wps
[Epoch 116 Batch 120/162] avg loss 0.00136721, throughput 9.42407K wps
[Epoch 116 Batch 150/162] avg loss 0.0013071, throughput 9.43917K wps
Begin Testing...
[Epoch 116] train avg loss 0.00134737, dev acc 0.9089, dev avg loss 0.229896, throughput 9.50402K wps
[Epoch 117 Batch 30/162] avg loss 0.00134371, throughput 9.66092K wps
[Epoch 117 Batch 60/162] avg loss 0.00131858, throughput 9.41767K wps
[Epoch 117 Batch 90/162] avg loss 0.001278, throughput 9.58842K wps
[Epoch 117 Batch 120/162] avg loss 0.00138105, throughput 9.36733K wps
[Epoch 117 Batch 150/162] avg loss 0.00132221, throughput 9.27931K wps
Begin Testing...
[Epoch 117] train avg loss 0.00132176, dev acc 0.9089, dev avg loss 0.230519, throughput 9.46554K wps
[Epoch 118 Batch 30/162] avg loss 0.00149979, throughput 9.60908K wps
[Epoch 118 Batch 60/162] avg loss 0.00116583, throughput 9.47154K wps
[Epoch 118 Batch 90/162] avg loss 0.00124398, throughput 9.3079K wps
[Epoch 118 Batch 120/162] avg loss 0.00132941, throughput 9.37475K wps
[Epoch 118 Batch 150/162] avg loss 0.00113277, throughput 9.42613K wps
Begin Testing...
[Epoch 118] train avg loss 0.00127486, dev acc 0.9089, dev avg loss 0.230407, throughput 9.43966K wps
[Epoch 119 Batch 30/162] avg loss 0.00133807, throughput 9.57203K wps
[Epoch 119 Batch 60/162] avg loss 0.00126535, throughput 9.41099K wps
[Epoch 119 Batch 90/162] avg loss 0.00132665, throughput 9.52672K wps
[Epoch 119 Batch 120/162] avg loss 0.00127105, throughput 9.32167K wps
[Epoch 119 Batch 150/162] avg loss 0.00130381, throughput 9.5156K wps
Begin Testing...
[Epoch 119] train avg loss 0.00129623, dev acc 0.9078, dev avg loss 0.230656, throughput 9.45224K wps
[Epoch 120 Batch 30/162] avg loss 0.00119283, throughput 9.74089K wps
[Epoch 120 Batch 60/162] avg loss 0.00130535, throughput 9.2668K wps
[Epoch 120 Batch 90/162] avg loss 0.00129203, throughput 9.41908K wps
[Epoch 120 Batch 120/162] avg loss 0.00122921, throughput 9.2502K wps
[Epoch 120 Batch 150/162] avg loss 0.00121516, throughput 9.36331K wps
Begin Testing...
[Epoch 120] train avg loss 0.00124534, dev acc 0.9067, dev avg loss 0.232324, throughput 9.39858K wps
[Epoch 121 Batch 30/162] avg loss 0.00130413, throughput 9.51718K wps
[Epoch 121 Batch 60/162] avg loss 0.00123464, throughput 9.31816K wps
[Epoch 121 Batch 90/162] avg loss 0.00131969, throughput 9.32898K wps
[Epoch 121 Batch 120/162] avg loss 0.00112522, throughput 9.52111K wps
[Epoch 121 Batch 150/162] avg loss 0.00135551, throughput 9.27221K wps
Begin Testing...
[Epoch 121] train avg loss 0.00126869, dev acc 0.9089, dev avg loss 0.232824, throughput 9.38386K wps
[Epoch 122 Batch 30/162] avg loss 0.00121979, throughput 9.61797K wps
[Epoch 122 Batch 60/162] avg loss 0.00128912, throughput 9.48748K wps
[Epoch 122 Batch 90/162] avg loss 0.00114383, throughput 9.51794K wps
[Epoch 122 Batch 120/162] avg loss 0.00123551, throughput 9.40637K wps
[Epoch 122 Batch 150/162] avg loss 0.0013794, throughput 9.27151K wps
Begin Testing...
[Epoch 122] train avg loss 0.00125345, dev acc 0.9089, dev avg loss 0.231178, throughput 9.47191K wps
[Epoch 123 Batch 30/162] avg loss 0.00123153, throughput 9.44986K wps
[Epoch 123 Batch 60/162] avg loss 0.00117243, throughput 9.37411K wps
[Epoch 123 Batch 90/162] avg loss 0.00115469, throughput 9.23474K wps
[Epoch 123 Batch 120/162] avg loss 0.00120026, throughput 9.60462K wps
[Epoch 123 Batch 150/162] avg loss 0.00130324, throughput 9.31313K wps
Begin Testing...
[Epoch 123] train avg loss 0.00120856, dev acc 0.9100, dev avg loss 0.230575, throughput 9.38487K wps
Observed Improvement.
Begin Testing...
[Epoch 124 Batch 30/162] avg loss 0.00131587, throughput 9.45153K wps
[Epoch 124 Batch 60/162] avg loss 0.00107548, throughput 9.32783K wps
[Epoch 124 Batch 90/162] avg loss 0.00116448, throughput 9.38583K wps
[Epoch 124 Batch 120/162] avg loss 0.00123663, throughput 9.40002K wps
[Epoch 124 Batch 150/162] avg loss 0.00116129, throughput 9.54536K wps
Begin Testing...
[Epoch 124] train avg loss 0.00119748, dev acc 0.9078, dev avg loss 0.231368, throughput 9.4279K wps
[Epoch 125 Batch 30/162] avg loss 0.00122607, throughput 9.60759K wps
[Epoch 125 Batch 60/162] avg loss 0.000989195, throughput 9.34235K wps
[Epoch 125 Batch 90/162] avg loss 0.00120134, throughput 9.42609K wps
[Epoch 125 Batch 120/162] avg loss 0.00116429, throughput 9.30946K wps
[Epoch 125 Batch 150/162] avg loss 0.00120988, throughput 9.45256K wps
Begin Testing...
[Epoch 125] train avg loss 0.00117088, dev acc 0.9067, dev avg loss 0.232322, throughput 9.43699K wps
[Epoch 126 Batch 30/162] avg loss 0.00125266, throughput 9.48973K wps
[Epoch 126 Batch 60/162] avg loss 0.00116606, throughput 9.36464K wps
[Epoch 126 Batch 90/162] avg loss 0.00106504, throughput 9.47364K wps
[Epoch 126 Batch 120/162] avg loss 0.00133055, throughput 9.3321K wps
[Epoch 126 Batch 150/162] avg loss 0.00117007, throughput 9.45828K wps
Begin Testing...
[Epoch 126] train avg loss 0.00119145, dev acc 0.9089, dev avg loss 0.230981, throughput 9.41892K wps
[Epoch 127 Batch 30/162] avg loss 0.00125357, throughput 9.55533K wps
[Epoch 127 Batch 60/162] avg loss 0.00118331, throughput 9.53164K wps
[Epoch 127 Batch 90/162] avg loss 0.00122439, throughput 9.4177K wps
[Epoch 127 Batch 120/162] avg loss 0.0011428, throughput 9.35033K wps
[Epoch 127 Batch 150/162] avg loss 0.00119592, throughput 9.29073K wps
Begin Testing...
[Epoch 127] train avg loss 0.00118048, dev acc 0.9089, dev avg loss 0.233368, throughput 9.43154K wps
[Epoch 128 Batch 30/162] avg loss 0.00127954, throughput 9.52522K wps
[Epoch 128 Batch 60/162] avg loss 0.00116916, throughput 9.47814K wps
[Epoch 128 Batch 90/162] avg loss 0.00116029, throughput 9.42214K wps
[Epoch 128 Batch 120/162] avg loss 0.00101199, throughput 9.50624K wps
[Epoch 128 Batch 150/162] avg loss 0.00120605, throughput 9.48094K wps
Begin Testing...
[Epoch 128] train avg loss 0.00116465, dev acc 0.9089, dev avg loss 0.231094, throughput 9.48963K wps
[Epoch 129 Batch 30/162] avg loss 0.0010908, throughput 9.40183K wps
[Epoch 129 Batch 60/162] avg loss 0.00121261, throughput 9.32102K wps
[Epoch 129 Batch 90/162] avg loss 0.00120329, throughput 9.32543K wps
[Epoch 129 Batch 120/162] avg loss 0.00103374, throughput 9.40656K wps
[Epoch 129 Batch 150/162] avg loss 0.00116011, throughput 9.30561K wps
Begin Testing...
[Epoch 129] train avg loss 0.00112974, dev acc 0.9078, dev avg loss 0.231639, throughput 9.36664K wps
[Epoch 130 Batch 30/162] avg loss 0.00127606, throughput 9.51606K wps
[Epoch 130 Batch 60/162] avg loss 0.00115915, throughput 9.39679K wps
[Epoch 130 Batch 90/162] avg loss 0.00118054, throughput 9.40936K wps
[Epoch 130 Batch 120/162] avg loss 0.00105857, throughput 9.5402K wps
[Epoch 130 Batch 150/162] avg loss 0.00117376, throughput 9.30031K wps
Begin Testing...
[Epoch 130] train avg loss 0.00116596, dev acc 0.9089, dev avg loss 0.231688, throughput 9.43072K wps
[Epoch 131 Batch 30/162] avg loss 0.0011466, throughput 9.78201K wps
[Epoch 131 Batch 60/162] avg loss 0.00112706, throughput 9.31586K wps
[Epoch 131 Batch 90/162] avg loss 0.00104819, throughput 9.3102K wps
[Epoch 131 Batch 120/162] avg loss 0.00114394, throughput 9.37388K wps
[Epoch 131 Batch 150/162] avg loss 0.00117354, throughput 9.33197K wps
Begin Testing...
[Epoch 131] train avg loss 0.00111917, dev acc 0.9078, dev avg loss 0.231436, throughput 9.42101K wps
[Epoch 132 Batch 30/162] avg loss 0.00113781, throughput 9.52037K wps
[Epoch 132 Batch 60/162] avg loss 0.00101034, throughput 9.2867K wps
[Epoch 132 Batch 90/162] avg loss 0.00120764, throughput 9.32912K wps
[Epoch 132 Batch 120/162] avg loss 0.00107254, throughput 9.3168K wps
[Epoch 132 Batch 150/162] avg loss 0.00117417, throughput 9.35506K wps
Begin Testing...
[Epoch 132] train avg loss 0.00111857, dev acc 0.9089, dev avg loss 0.23163, throughput 9.35942K wps
[Epoch 133 Batch 30/162] avg loss 0.00116766, throughput 9.51698K wps
[Epoch 133 Batch 60/162] avg loss 0.00117161, throughput 9.35977K wps
[Epoch 133 Batch 90/162] avg loss 0.00105401, throughput 9.31941K wps
[Epoch 133 Batch 120/162] avg loss 0.00102707, throughput 9.23051K wps
[Epoch 133 Batch 150/162] avg loss 0.00107245, throughput 9.30163K wps
Begin Testing...
[Epoch 133] train avg loss 0.00108673, dev acc 0.9089, dev avg loss 0.232259, throughput 9.36327K wps
[Epoch 134 Batch 30/162] avg loss 0.00110652, throughput 9.5603K wps
[Epoch 134 Batch 60/162] avg loss 0.00104329, throughput 9.52606K wps
[Epoch 134 Batch 90/162] avg loss 0.00115807, throughput 9.19236K wps
[Epoch 134 Batch 120/162] avg loss 0.00112703, throughput 9.39201K wps
[Epoch 134 Batch 150/162] avg loss 0.000917882, throughput 9.33058K wps
Begin Testing...
[Epoch 134] train avg loss 0.00106053, dev acc 0.9089, dev avg loss 0.232083, throughput 9.40031K wps
[Epoch 135 Batch 30/162] avg loss 0.00120027, throughput 9.6566K wps
[Epoch 135 Batch 60/162] avg loss 0.000972883, throughput 9.36635K wps
[Epoch 135 Batch 90/162] avg loss 0.0010226, throughput 9.50579K wps
[Epoch 135 Batch 120/162] avg loss 0.00107877, throughput 9.57004K wps
[Epoch 135 Batch 150/162] avg loss 0.00104918, throughput 9.43732K wps
Begin Testing...
[Epoch 135] train avg loss 0.00106597, dev acc 0.9100, dev avg loss 0.233435, throughput 9.4907K wps
Observed Improvement.
Begin Testing...
[Epoch 136 Batch 30/162] avg loss 0.00104226, throughput 9.49632K wps
[Epoch 136 Batch 60/162] avg loss 0.0011248, throughput 9.42263K wps
[Epoch 136 Batch 90/162] avg loss 0.000979714, throughput 9.39932K wps
[Epoch 136 Batch 120/162] avg loss 0.00112235, throughput 9.53759K wps
[Epoch 136 Batch 150/162] avg loss 0.00106865, throughput 9.28172K wps
Begin Testing...
[Epoch 136] train avg loss 0.00107747, dev acc 0.9089, dev avg loss 0.232289, throughput 9.41755K wps
[Epoch 137 Batch 30/162] avg loss 0.00107265, throughput 9.6354K wps
[Epoch 137 Batch 60/162] avg loss 0.0010436, throughput 9.3048K wps
[Epoch 137 Batch 90/162] avg loss 0.00106676, throughput 9.47605K wps
[Epoch 137 Batch 120/162] avg loss 0.00101127, throughput 9.33882K wps
[Epoch 137 Batch 150/162] avg loss 0.0010889, throughput 9.41961K wps
Begin Testing...
[Epoch 137] train avg loss 0.00104105, dev acc 0.9089, dev avg loss 0.232055, throughput 9.42202K wps
[Epoch 138 Batch 30/162] avg loss 0.00101001, throughput 9.52996K wps
[Epoch 138 Batch 60/162] avg loss 0.00107274, throughput 9.40712K wps
[Epoch 138 Batch 90/162] avg loss 0.00112835, throughput 9.54277K wps
[Epoch 138 Batch 120/162] avg loss 0.00109074, throughput 9.36165K wps
[Epoch 138 Batch 150/162] avg loss 0.000926584, throughput 9.47375K wps
Begin Testing...
[Epoch 138] train avg loss 0.00102443, dev acc 0.9100, dev avg loss 0.234792, throughput 9.45976K wps
Observed Improvement.
Begin Testing...
[Epoch 139 Batch 30/162] avg loss 0.000917554, throughput 9.50602K wps
[Epoch 139 Batch 60/162] avg loss 0.00102483, throughput 9.56401K wps
[Epoch 139 Batch 90/162] avg loss 0.00104301, throughput 9.30412K wps
[Epoch 139 Batch 120/162] avg loss 0.000902007, throughput 9.42485K wps
[Epoch 139 Batch 150/162] avg loss 0.000978564, throughput 9.34043K wps
Begin Testing...
[Epoch 139] train avg loss 0.000973383, dev acc 0.9089, dev avg loss 0.23379, throughput 9.41532K wps
[Epoch 140 Batch 30/162] avg loss 0.00112363, throughput 9.58172K wps
[Epoch 140 Batch 60/162] avg loss 0.000927711, throughput 9.35552K wps
[Epoch 140 Batch 90/162] avg loss 0.00105667, throughput 9.27178K wps
[Epoch 140 Batch 120/162] avg loss 0.00109079, throughput 9.33046K wps
[Epoch 140 Batch 150/162] avg loss 0.000950986, throughput 9.45505K wps
Begin Testing...
[Epoch 140] train avg loss 0.00103573, dev acc 0.9089, dev avg loss 0.232376, throughput 9.38938K wps
[Epoch 141 Batch 30/162] avg loss 0.0010825, throughput 9.57874K wps
[Epoch 141 Batch 60/162] avg loss 0.000925724, throughput 9.40497K wps
[Epoch 141 Batch 90/162] avg loss 0.000907817, throughput 9.54674K wps
[Epoch 141 Batch 120/162] avg loss 0.00105981, throughput 9.22967K wps
[Epoch 141 Batch 150/162] avg loss 0.00104694, throughput 9.48152K wps
Begin Testing...
[Epoch 141] train avg loss 0.000989632, dev acc 0.9100, dev avg loss 0.232313, throughput 9.45458K wps
Observed Improvement.
Begin Testing...
[Epoch 142 Batch 30/162] avg loss 0.000791197, throughput 9.63629K wps
[Epoch 142 Batch 60/162] avg loss 0.00111298, throughput 9.30087K wps
[Epoch 142 Batch 90/162] avg loss 0.00109017, throughput 9.34436K wps
[Epoch 142 Batch 120/162] avg loss 0.000968016, throughput 9.51697K wps
[Epoch 142 Batch 150/162] avg loss 0.00100595, throughput 9.40806K wps
Begin Testing...
[Epoch 142] train avg loss 0.000993251, dev acc 0.9089, dev avg loss 0.232322, throughput 9.45164K wps
[Epoch 143 Batch 30/162] avg loss 0.000981228, throughput 9.60639K wps
[Epoch 143 Batch 60/162] avg loss 0.00095341, throughput 9.27932K wps
[Epoch 143 Batch 90/162] avg loss 0.000919176, throughput 9.42186K wps
[Epoch 143 Batch 120/162] avg loss 0.00104658, throughput 9.38615K wps
[Epoch 143 Batch 150/162] avg loss 0.000867069, throughput 9.33962K wps
Begin Testing...
[Epoch 143] train avg loss 0.000950099, dev acc 0.9078, dev avg loss 0.232937, throughput 9.42267K wps
[Epoch 144 Batch 30/162] avg loss 0.000974376, throughput 9.6411K wps
[Epoch 144 Batch 60/162] avg loss 0.00100757, throughput 9.38647K wps
[Epoch 144 Batch 90/162] avg loss 0.000908716, throughput 9.53237K wps
[Epoch 144 Batch 120/162] avg loss 0.000833444, throughput 9.28649K wps
[Epoch 144 Batch 150/162] avg loss 0.000887538, throughput 9.41759K wps
Begin Testing...
[Epoch 144] train avg loss 0.000939386, dev acc 0.9089, dev avg loss 0.23314, throughput 9.44574K wps
[Epoch 145 Batch 30/162] avg loss 0.00106849, throughput 9.56686K wps
[Epoch 145 Batch 60/162] avg loss 0.00101651, throughput 9.45993K wps
[Epoch 145 Batch 90/162] avg loss 0.000900312, throughput 9.31088K wps
[Epoch 145 Batch 120/162] avg loss 0.00108238, throughput 9.24731K wps
[Epoch 145 Batch 150/162] avg loss 0.000922512, throughput 9.37205K wps
Begin Testing...
[Epoch 145] train avg loss 0.000991222, dev acc 0.9089, dev avg loss 0.233307, throughput 9.37975K wps
[Epoch 146 Batch 30/162] avg loss 0.000977123, throughput 9.64177K wps
[Epoch 146 Batch 60/162] avg loss 0.000843502, throughput 9.33919K wps
[Epoch 146 Batch 90/162] avg loss 0.000988307, throughput 9.34273K wps
[Epoch 146 Batch 120/162] avg loss 0.000975136, throughput 9.5271K wps
[Epoch 146 Batch 150/162] avg loss 0.0010214, throughput 9.32937K wps
Begin Testing...
[Epoch 146] train avg loss 0.000946645, dev acc 0.9089, dev avg loss 0.23337, throughput 9.42273K wps
[Epoch 147 Batch 30/162] avg loss 0.000967704, throughput 9.7456K wps
[Epoch 147 Batch 60/162] avg loss 0.000919638, throughput 9.37475K wps
[Epoch 147 Batch 90/162] avg loss 0.00104986, throughput 9.57752K wps
[Epoch 147 Batch 120/162] avg loss 0.00083452, throughput 9.30869K wps
[Epoch 147 Batch 150/162] avg loss 0.000892878, throughput 9.39343K wps
Begin Testing...
[Epoch 147] train avg loss 0.000933025, dev acc 0.9089, dev avg loss 0.234006, throughput 9.45983K wps
[Epoch 148 Batch 30/162] avg loss 0.000899146, throughput 9.68824K wps
[Epoch 148 Batch 60/162] avg loss 0.000853725, throughput 9.4349K wps
[Epoch 148 Batch 90/162] avg loss 0.000912254, throughput 9.31751K wps
[Epoch 148 Batch 120/162] avg loss 0.000917275, throughput 9.34575K wps
[Epoch 148 Batch 150/162] avg loss 0.00105725, throughput 9.50296K wps
Begin Testing...
[Epoch 148] train avg loss 0.000918427, dev acc 0.9078, dev avg loss 0.233348, throughput 9.4517K wps
[Epoch 149 Batch 30/162] avg loss 0.000901917, throughput 9.57802K wps
[Epoch 149 Batch 60/162] avg loss 0.0010974, throughput 9.31524K wps
[Epoch 149 Batch 90/162] avg loss 0.000937287, throughput 9.2491K wps
[Epoch 149 Batch 120/162] avg loss 0.000894007, throughput 9.41471K wps
[Epoch 149 Batch 150/162] avg loss 0.000729819, throughput 9.5361K wps
Begin Testing...
[Epoch 149] train avg loss 0.000923802, dev acc 0.9111, dev avg loss 0.234489, throughput 9.43384K wps
Observed Improvement.
Begin Testing...
[Epoch 150 Batch 30/162] avg loss 0.000804541, throughput 9.64593K wps
[Epoch 150 Batch 60/162] avg loss 0.000940878, throughput 9.30358K wps
[Epoch 150 Batch 90/162] avg loss 0.00092122, throughput 9.45642K wps
[Epoch 150 Batch 120/162] avg loss 0.000889109, throughput 9.56398K wps
[Epoch 150 Batch 150/162] avg loss 0.000911743, throughput 9.31791K wps
Begin Testing...
[Epoch 150] train avg loss 0.000909455, dev acc 0.9133, dev avg loss 0.237016, throughput 9.45791K wps
Observed Improvement.
Begin Testing...
[Epoch 151 Batch 30/162] avg loss 0.00104687, throughput 9.62529K wps
[Epoch 151 Batch 60/162] avg loss 0.000783363, throughput 9.30955K wps
[Epoch 151 Batch 90/162] avg loss 0.000865708, throughput 9.47896K wps
[Epoch 151 Batch 120/162] avg loss 0.000898965, throughput 9.18108K wps
[Epoch 151 Batch 150/162] avg loss 0.000912511, throughput 9.32852K wps
Begin Testing...
[Epoch 151] train avg loss 0.000901924, dev acc 0.9089, dev avg loss 0.234193, throughput 9.39453K wps
[Epoch 152 Batch 30/162] avg loss 0.000942033, throughput 9.66154K wps
[Epoch 152 Batch 60/162] avg loss 0.000917, throughput 9.32729K wps
[Epoch 152 Batch 90/162] avg loss 0.000853672, throughput 9.46519K wps
[Epoch 152 Batch 120/162] avg loss 0.000865698, throughput 9.61104K wps
[Epoch 152 Batch 150/162] avg loss 0.000930457, throughput 9.46066K wps
Begin Testing...
[Epoch 152] train avg loss 0.000898754, dev acc 0.9089, dev avg loss 0.233933, throughput 9.50791K wps
[Epoch 153 Batch 30/162] avg loss 0.000848278, throughput 9.49512K wps
[Epoch 153 Batch 60/162] avg loss 0.000874891, throughput 9.34026K wps
[Epoch 153 Batch 90/162] avg loss 0.000924783, throughput 9.47398K wps
[Epoch 153 Batch 120/162] avg loss 0.0008464, throughput 9.41661K wps
[Epoch 153 Batch 150/162] avg loss 0.000908126, throughput 9.47812K wps
Begin Testing...
[Epoch 153] train avg loss 0.000874369, dev acc 0.9100, dev avg loss 0.234371, throughput 9.44979K wps
[Epoch 154 Batch 30/162] avg loss 0.000920672, throughput 9.74402K wps
[Epoch 154 Batch 60/162] avg loss 0.000814592, throughput 9.55446K wps
[Epoch 154 Batch 90/162] avg loss 0.000811327, throughput 9.28855K wps
[Epoch 154 Batch 120/162] avg loss 0.000805069, throughput 9.51916K wps
[Epoch 154 Batch 150/162] avg loss 0.000926132, throughput 9.39829K wps
Begin Testing...
[Epoch 154] train avg loss 0.000854013, dev acc 0.9089, dev avg loss 0.234696, throughput 9.50781K wps
[Epoch 155 Batch 30/162] avg loss 0.000819967, throughput 9.41411K wps
[Epoch 155 Batch 60/162] avg loss 0.000851742, throughput 9.42135K wps
[Epoch 155 Batch 90/162] avg loss 0.00085012, throughput 9.38537K wps
[Epoch 155 Batch 120/162] avg loss 0.000828359, throughput 9.33566K wps
[Epoch 155 Batch 150/162] avg loss 0.000979425, throughput 9.41623K wps
Begin Testing...
[Epoch 155] train avg loss 0.000861041, dev acc 0.9100, dev avg loss 0.234617, throughput 9.39749K wps
[Epoch 156 Batch 30/162] avg loss 0.000908386, throughput 9.62097K wps
[Epoch 156 Batch 60/162] avg loss 0.000821374, throughput 9.6243K wps
[Epoch 156 Batch 90/162] avg loss 0.000867861, throughput 9.4248K wps
[Epoch 156 Batch 120/162] avg loss 0.00078668, throughput 9.45719K wps
[Epoch 156 Batch 150/162] avg loss 0.00089233, throughput 9.41718K wps
Begin Testing...
[Epoch 156] train avg loss 0.00085236, dev acc 0.9089, dev avg loss 0.234838, throughput 9.51593K wps
[Epoch 157 Batch 30/162] avg loss 0.000736066, throughput 9.51409K wps
[Epoch 157 Batch 60/162] avg loss 0.000908002, throughput 9.40162K wps
[Epoch 157 Batch 90/162] avg loss 0.00083561, throughput 9.23029K wps
[Epoch 157 Batch 120/162] avg loss 0.000859348, throughput 9.47912K wps
[Epoch 157 Batch 150/162] avg loss 0.000805165, throughput 9.65545K wps
Begin Testing...
[Epoch 157] train avg loss 0.000831235, dev acc 0.9100, dev avg loss 0.235804, throughput 9.45897K wps
[Epoch 158 Batch 30/162] avg loss 0.000951143, throughput 9.41079K wps
[Epoch 158 Batch 60/162] avg loss 0.000778569, throughput 9.39475K wps
[Epoch 158 Batch 90/162] avg loss 0.000875491, throughput 9.5724K wps
[Epoch 158 Batch 120/162] avg loss 0.000952165, throughput 9.30331K wps
[Epoch 158 Batch 150/162] avg loss 0.000788414, throughput 9.56461K wps
Begin Testing...
[Epoch 158] train avg loss 0.000845678, dev acc 0.9078, dev avg loss 0.235733, throughput 9.4523K wps
[Epoch 159 Batch 30/162] avg loss 0.00085577, throughput 9.47013K wps
[Epoch 159 Batch 60/162] avg loss 0.000891118, throughput 9.27658K wps
[Epoch 159 Batch 90/162] avg loss 0.000861628, throughput 9.23381K wps
[Epoch 159 Batch 120/162] avg loss 0.000698481, throughput 9.2417K wps
[Epoch 159 Batch 150/162] avg loss 0.000864441, throughput 9.4254K wps
Begin Testing...
[Epoch 159] train avg loss 0.000825486, dev acc 0.9100, dev avg loss 0.234453, throughput 9.31329K wps
[Epoch 160 Batch 30/162] avg loss 0.000769882, throughput 9.48976K wps
[Epoch 160 Batch 60/162] avg loss 0.00079528, throughput 9.56516K wps
[Epoch 160 Batch 90/162] avg loss 0.000809148, throughput 9.2531K wps
[Epoch 160 Batch 120/162] avg loss 0.000830905, throughput 9.29415K wps
[Epoch 160 Batch 150/162] avg loss 0.00083244, throughput 9.44979K wps
Begin Testing...
[Epoch 160] train avg loss 0.000820002, dev acc 0.9111, dev avg loss 0.235537, throughput 9.39205K wps
[Epoch 161 Batch 30/162] avg loss 0.000844816, throughput 9.63445K wps
[Epoch 161 Batch 60/162] avg loss 0.000784664, throughput 9.44139K wps
[Epoch 161 Batch 90/162] avg loss 0.000801659, throughput 9.46971K wps
[Epoch 161 Batch 120/162] avg loss 0.000830579, throughput 9.46503K wps
[Epoch 161 Batch 150/162] avg loss 0.000821105, throughput 9.23749K wps
Begin Testing...
[Epoch 161] train avg loss 0.000821155, dev acc 0.9089, dev avg loss 0.235196, throughput 9.43005K wps
[Epoch 162 Batch 30/162] avg loss 0.000723523, throughput 9.50028K wps
[Epoch 162 Batch 60/162] avg loss 0.000831475, throughput 9.27321K wps
[Epoch 162 Batch 90/162] avg loss 0.000753669, throughput 9.24562K wps
[Epoch 162 Batch 120/162] avg loss 0.00094508, throughput 8.29039K wps
[Epoch 162 Batch 150/162] avg loss 0.000757828, throughput 9.53707K wps
Begin Testing...
[Epoch 162] train avg loss 0.000813578, dev acc 0.9100, dev avg loss 0.235683, throughput 9.15311K wps
[Epoch 163 Batch 30/162] avg loss 0.000832987, throughput 9.63042K wps
[Epoch 163 Batch 60/162] avg loss 0.000769862, throughput 9.48367K wps
[Epoch 163 Batch 90/162] avg loss 0.000810397, throughput 9.39692K wps
[Epoch 163 Batch 120/162] avg loss 0.000790574, throughput 9.33818K wps
[Epoch 163 Batch 150/162] avg loss 0.000766768, throughput 9.42976K wps
Begin Testing...
[Epoch 163] train avg loss 0.000794569, dev acc 0.9111, dev avg loss 0.235793, throughput 9.45488K wps
[Epoch 164 Batch 30/162] avg loss 0.000749019, throughput 9.55306K wps
[Epoch 164 Batch 60/162] avg loss 0.000769312, throughput 9.35973K wps
[Epoch 164 Batch 90/162] avg loss 0.000880183, throughput 9.47812K wps
[Epoch 164 Batch 120/162] avg loss 0.000738201, throughput 9.50744K wps
[Epoch 164 Batch 150/162] avg loss 0.000762934, throughput 9.31955K wps
Begin Testing...
[Epoch 164] train avg loss 0.000772451, dev acc 0.9100, dev avg loss 0.23632, throughput 9.44936K wps
[Epoch 165 Batch 30/162] avg loss 0.000771005, throughput 9.69773K wps
[Epoch 165 Batch 60/162] avg loss 0.000749823, throughput 9.37011K wps
[Epoch 165 Batch 90/162] avg loss 0.000906661, throughput 9.61047K wps
[Epoch 165 Batch 120/162] avg loss 0.000740001, throughput 9.45996K wps
[Epoch 165 Batch 150/162] avg loss 0.000830273, throughput 9.39055K wps
Begin Testing...
[Epoch 165] train avg loss 0.000794587, dev acc 0.9100, dev avg loss 0.236553, throughput 9.48944K wps
[Epoch 166 Batch 30/162] avg loss 0.000768177, throughput 9.5646K wps
[Epoch 166 Batch 60/162] avg loss 0.000824933, throughput 9.41575K wps
[Epoch 166 Batch 90/162] avg loss 0.000821158, throughput 9.27244K wps
[Epoch 166 Batch 120/162] avg loss 0.000800608, throughput 9.36184K wps
[Epoch 166 Batch 150/162] avg loss 0.000828734, throughput 9.3954K wps
Begin Testing...
[Epoch 166] train avg loss 0.000802713, dev acc 0.9100, dev avg loss 0.235764, throughput 9.39036K wps
[Epoch 167 Batch 30/162] avg loss 0.000790269, throughput 9.63225K wps
[Epoch 167 Batch 60/162] avg loss 0.00070167, throughput 9.40472K wps
[Epoch 167 Batch 90/162] avg loss 0.000787989, throughput 9.3872K wps
[Epoch 167 Batch 120/162] avg loss 0.000739007, throughput 9.42673K wps
[Epoch 167 Batch 150/162] avg loss 0.000794844, throughput 9.60576K wps
Begin Testing...
[Epoch 167] train avg loss 0.000764162, dev acc 0.9100, dev avg loss 0.236202, throughput 9.49419K wps
[Epoch 168 Batch 30/162] avg loss 0.000801519, throughput 9.55981K wps
[Epoch 168 Batch 60/162] avg loss 0.000683453, throughput 9.40529K wps
[Epoch 168 Batch 90/162] avg loss 0.000784027, throughput 9.26729K wps
[Epoch 168 Batch 120/162] avg loss 0.000743444, throughput 9.22664K wps
[Epoch 168 Batch 150/162] avg loss 0.00074639, throughput 9.33994K wps
Begin Testing...
[Epoch 168] train avg loss 0.000755654, dev acc 0.9078, dev avg loss 0.237907, throughput 9.36235K wps
[Epoch 169 Batch 30/162] avg loss 0.000706642, throughput 9.48942K wps
[Epoch 169 Batch 60/162] avg loss 0.000801273, throughput 9.38395K wps
[Epoch 169 Batch 90/162] avg loss 0.000779325, throughput 9.32791K wps
[Epoch 169 Batch 120/162] avg loss 0.000761004, throughput 9.44284K wps
[Epoch 169 Batch 150/162] avg loss 0.000729448, throughput 9.20202K wps
Begin Testing...
[Epoch 169] train avg loss 0.000756275, dev acc 0.9122, dev avg loss 0.237194, throughput 9.36017K wps
[Epoch 170 Batch 30/162] avg loss 0.00066477, throughput 9.37133K wps
[Epoch 170 Batch 60/162] avg loss 0.000778539, throughput 9.27414K wps
[Epoch 170 Batch 90/162] avg loss 0.000825341, throughput 9.49727K wps
[Epoch 170 Batch 120/162] avg loss 0.000736744, throughput 9.31899K wps
[Epoch 170 Batch 150/162] avg loss 0.000736793, throughput 9.42257K wps
Begin Testing...
[Epoch 170] train avg loss 0.000740752, dev acc 0.9100, dev avg loss 0.237422, throughput 9.3939K wps
[Epoch 171 Batch 30/162] avg loss 0.000799584, throughput 9.54575K wps
[Epoch 171 Batch 60/162] avg loss 0.000789217, throughput 9.62761K wps
[Epoch 171 Batch 90/162] avg loss 0.000725629, throughput 9.48674K wps
[Epoch 171 Batch 120/162] avg loss 0.00070809, throughput 9.38962K wps
[Epoch 171 Batch 150/162] avg loss 0.000717389, throughput 9.47381K wps
Begin Testing...
[Epoch 171] train avg loss 0.000756386, dev acc 0.9100, dev avg loss 0.237227, throughput 9.48263K wps
[Epoch 172 Batch 30/162] avg loss 0.000752299, throughput 9.6368K wps
[Epoch 172 Batch 60/162] avg loss 0.000679785, throughput 9.32707K wps
[Epoch 172 Batch 90/162] avg loss 0.000817014, throughput 9.35232K wps
[Epoch 172 Batch 120/162] avg loss 0.000623984, throughput 9.42069K wps
[Epoch 172 Batch 150/162] avg loss 0.000732433, throughput 9.23809K wps
Begin Testing...
[Epoch 172] train avg loss 0.000726017, dev acc 0.9078, dev avg loss 0.236804, throughput 9.39972K wps
[Epoch 173 Batch 30/162] avg loss 0.000755986, throughput 9.46317K wps
[Epoch 173 Batch 60/162] avg loss 0.000702394, throughput 9.3325K wps
[Epoch 173 Batch 90/162] avg loss 0.000796277, throughput 9.46001K wps
[Epoch 173 Batch 120/162] avg loss 0.000652132, throughput 9.4137K wps
[Epoch 173 Batch 150/162] avg loss 0.000687601, throughput 9.40747K wps
Begin Testing...
[Epoch 173] train avg loss 0.000724834, dev acc 0.9111, dev avg loss 0.237712, throughput 9.4084K wps
[Epoch 174 Batch 30/162] avg loss 0.000735784, throughput 9.46282K wps
[Epoch 174 Batch 60/162] avg loss 0.000731652, throughput 9.31279K wps
[Epoch 174 Batch 90/162] avg loss 0.000656412, throughput 9.47202K wps
[Epoch 174 Batch 120/162] avg loss 0.000698891, throughput 9.44206K wps
[Epoch 174 Batch 150/162] avg loss 0.000741097, throughput 9.48087K wps
Begin Testing...
[Epoch 174] train avg loss 0.000713348, dev acc 0.9078, dev avg loss 0.237118, throughput 9.44297K wps
[Epoch 175 Batch 30/162] avg loss 0.000743771, throughput 9.39445K wps
[Epoch 175 Batch 60/162] avg loss 0.000682886, throughput 9.32839K wps
[Epoch 175 Batch 90/162] avg loss 0.000649598, throughput 9.52288K wps
[Epoch 175 Batch 120/162] avg loss 0.000667763, throughput 9.41873K wps
[Epoch 175 Batch 150/162] avg loss 0.000806995, throughput 9.35282K wps
Begin Testing...
[Epoch 175] train avg loss 0.000703635, dev acc 0.9100, dev avg loss 0.237228, throughput 9.3915K wps
[Epoch 176 Batch 30/162] avg loss 0.000832819, throughput 9.40682K wps
[Epoch 176 Batch 60/162] avg loss 0.000710265, throughput 9.35888K wps
[Epoch 176 Batch 90/162] avg loss 0.000667312, throughput 9.3304K wps
[Epoch 176 Batch 120/162] avg loss 0.000637296, throughput 9.33409K wps
[Epoch 176 Batch 150/162] avg loss 0.000669669, throughput 9.34223K wps
Begin Testing...
[Epoch 176] train avg loss 0.000699142, dev acc 0.9111, dev avg loss 0.237637, throughput 9.36195K wps
[Epoch 177 Batch 30/162] avg loss 0.000693528, throughput 9.58115K wps
[Epoch 177 Batch 60/162] avg loss 0.000650959, throughput 9.5932K wps
[Epoch 177 Batch 90/162] avg loss 0.000784742, throughput 9.39822K wps
[Epoch 177 Batch 120/162] avg loss 0.000633501, throughput 9.46298K wps
[Epoch 177 Batch 150/162] avg loss 0.000686833, throughput 9.48952K wps
Begin Testing...
[Epoch 177] train avg loss 0.000681109, dev acc 0.9089, dev avg loss 0.240385, throughput 9.48392K wps
[Epoch 178 Batch 30/162] avg loss 0.000640096, throughput 9.43652K wps
[Epoch 178 Batch 60/162] avg loss 0.000612319, throughput 9.249K wps
[Epoch 178 Batch 90/162] avg loss 0.000734316, throughput 9.46672K wps
[Epoch 178 Batch 120/162] avg loss 0.000597709, throughput 9.36726K wps
[Epoch 178 Batch 150/162] avg loss 0.000672073, throughput 9.3237K wps
Begin Testing...
[Epoch 178] train avg loss 0.000665362, dev acc 0.9100, dev avg loss 0.239272, throughput 9.3727K wps
[Epoch 179 Batch 30/162] avg loss 0.000633614, throughput 9.57979K wps
[Epoch 179 Batch 60/162] avg loss 0.000687075, throughput 9.30941K wps
[Epoch 179 Batch 90/162] avg loss 0.000608268, throughput 9.52255K wps
[Epoch 179 Batch 120/162] avg loss 0.00064015, throughput 9.51619K wps
[Epoch 179 Batch 150/162] avg loss 0.000611095, throughput 9.47367K wps
Begin Testing...
[Epoch 179] train avg loss 0.000634524, dev acc 0.9111, dev avg loss 0.239065, throughput 9.45879K wps
[Epoch 180 Batch 30/162] avg loss 0.000640002, throughput 9.44108K wps
[Epoch 180 Batch 60/162] avg loss 0.00066036, throughput 9.46485K wps
[Epoch 180 Batch 90/162] avg loss 0.000695246, throughput 9.2436K wps
[Epoch 180 Batch 120/162] avg loss 0.000639377, throughput 9.31758K wps
[Epoch 180 Batch 150/162] avg loss 0.000728602, throughput 9.51451K wps
Begin Testing...
[Epoch 180] train avg loss 0.000674831, dev acc 0.9100, dev avg loss 0.238442, throughput 9.40345K wps
[Epoch 181 Batch 30/162] avg loss 0.000615604, throughput 9.36744K wps
[Epoch 181 Batch 60/162] avg loss 0.000634235, throughput 9.42493K wps
[Epoch 181 Batch 90/162] avg loss 0.000672926, throughput 9.29464K wps
[Epoch 181 Batch 120/162] avg loss 0.000726044, throughput 9.24447K wps
[Epoch 181 Batch 150/162] avg loss 0.000637706, throughput 9.29073K wps
Begin Testing...
[Epoch 181] train avg loss 0.000656826, dev acc 0.9100, dev avg loss 0.238993, throughput 9.3223K wps
[Epoch 182 Batch 30/162] avg loss 0.000695117, throughput 9.49686K wps
[Epoch 182 Batch 60/162] avg loss 0.000682117, throughput 9.32149K wps
[Epoch 182 Batch 90/162] avg loss 0.000699859, throughput 9.47571K wps
[Epoch 182 Batch 120/162] avg loss 0.000691186, throughput 9.33104K wps
[Epoch 182 Batch 150/162] avg loss 0.000641777, throughput 9.35585K wps
Begin Testing...
[Epoch 182] train avg loss 0.000676474, dev acc 0.9100, dev avg loss 0.238849, throughput 9.36984K wps
[Epoch 183 Batch 30/162] avg loss 0.000643806, throughput 9.55162K wps
[Epoch 183 Batch 60/162] avg loss 0.000675814, throughput 9.37733K wps
[Epoch 183 Batch 90/162] avg loss 0.000622648, throughput 9.45701K wps
[Epoch 183 Batch 120/162] avg loss 0.000677972, throughput 9.2751K wps
[Epoch 183 Batch 150/162] avg loss 0.00067654, throughput 9.56866K wps
Begin Testing...
[Epoch 183] train avg loss 0.000662198, dev acc 0.9100, dev avg loss 0.239296, throughput 9.43471K wps
[Epoch 184 Batch 30/162] avg loss 0.000643572, throughput 9.61286K wps
[Epoch 184 Batch 60/162] avg loss 0.000679541, throughput 9.335K wps
[Epoch 184 Batch 90/162] avg loss 0.0006128, throughput 9.36744K wps
[Epoch 184 Batch 120/162] avg loss 0.000675135, throughput 9.30609K wps
[Epoch 184 Batch 150/162] avg loss 0.0006197, throughput 9.26366K wps
Begin Testing...
[Epoch 184] train avg loss 0.000652314, dev acc 0.9089, dev avg loss 0.240087, throughput 9.37998K wps
[Epoch 185 Batch 30/162] avg loss 0.000701048, throughput 9.41467K wps
[Epoch 185 Batch 60/162] avg loss 0.000639268, throughput 9.28428K wps
[Epoch 185 Batch 90/162] avg loss 0.000594154, throughput 9.30481K wps
[Epoch 185 Batch 120/162] avg loss 0.000658006, throughput 9.23882K wps
[Epoch 185 Batch 150/162] avg loss 0.000591176, throughput 9.28895K wps
Begin Testing...
[Epoch 185] train avg loss 0.000641609, dev acc 0.9089, dev avg loss 0.24066, throughput 9.30514K wps
[Epoch 186 Batch 30/162] avg loss 0.000669224, throughput 9.59649K wps
[Epoch 186 Batch 60/162] avg loss 0.000661967, throughput 9.34397K wps
[Epoch 186 Batch 90/162] avg loss 0.000709543, throughput 9.43368K wps
[Epoch 186 Batch 120/162] avg loss 0.00056775, throughput 9.41121K wps
[Epoch 186 Batch 150/162] avg loss 0.000713935, throughput 9.23548K wps
Begin Testing...
[Epoch 186] train avg loss 0.000658315, dev acc 0.9100, dev avg loss 0.239777, throughput 9.39768K wps
[Epoch 187 Batch 30/162] avg loss 0.000600564, throughput 9.70658K wps
[Epoch 187 Batch 60/162] avg loss 0.000598476, throughput 9.38072K wps
[Epoch 187 Batch 90/162] avg loss 0.000622142, throughput 9.38747K wps
[Epoch 187 Batch 120/162] avg loss 0.000639972, throughput 9.47634K wps
[Epoch 187 Batch 150/162] avg loss 0.000609056, throughput 9.28248K wps
Begin Testing...
[Epoch 187] train avg loss 0.000613317, dev acc 0.9100, dev avg loss 0.240447, throughput 9.42922K wps
[Epoch 188 Batch 30/162] avg loss 0.000773164, throughput 9.52025K wps
[Epoch 188 Batch 60/162] avg loss 0.000617776, throughput 9.22389K wps
[Epoch 188 Batch 90/162] avg loss 0.000667963, throughput 9.36549K wps
[Epoch 188 Batch 120/162] avg loss 0.000657483, throughput 9.47177K wps
[Epoch 188 Batch 150/162] avg loss 0.000693193, throughput 9.39425K wps
Begin Testing...
[Epoch 188] train avg loss 0.000672095, dev acc 0.9100, dev avg loss 0.23955, throughput 9.38678K wps
[Epoch 189 Batch 30/162] avg loss 0.000638646, throughput 9.53631K wps
[Epoch 189 Batch 60/162] avg loss 0.000609101, throughput 9.43035K wps
[Epoch 189 Batch 90/162] avg loss 0.000576269, throughput 9.30208K wps
[Epoch 189 Batch 120/162] avg loss 0.00057591, throughput 9.44965K wps
[Epoch 189 Batch 150/162] avg loss 0.000644149, throughput 9.20806K wps
Begin Testing...
[Epoch 189] train avg loss 0.000622232, dev acc 0.9111, dev avg loss 0.239722, throughput 9.37656K wps
[Epoch 190 Batch 30/162] avg loss 0.000607203, throughput 9.60465K wps
[Epoch 190 Batch 60/162] avg loss 0.000647973, throughput 9.3575K wps
[Epoch 190 Batch 90/162] avg loss 0.000602054, throughput 9.25115K wps
[Epoch 190 Batch 120/162] avg loss 0.000639988, throughput 9.40754K wps
[Epoch 190 Batch 150/162] avg loss 0.000545217, throughput 9.29879K wps
Begin Testing...
[Epoch 190] train avg loss 0.000615136, dev acc 0.9100, dev avg loss 0.239798, throughput 9.37632K wps
[Epoch 191 Batch 30/162] avg loss 0.00058518, throughput 9.59361K wps
[Epoch 191 Batch 60/162] avg loss 0.000575453, throughput 9.39359K wps
[Epoch 191 Batch 90/162] avg loss 0.00065633, throughput 9.43916K wps
[Epoch 191 Batch 120/162] avg loss 0.000648766, throughput 9.27674K wps
[Epoch 191 Batch 150/162] avg loss 0.000553561, throughput 9.4239K wps
Begin Testing...
[Epoch 191] train avg loss 0.000602676, dev acc 0.9078, dev avg loss 0.240976, throughput 9.3981K wps
[Epoch 192 Batch 30/162] avg loss 0.000592778, throughput 9.44805K wps
[Epoch 192 Batch 60/162] avg loss 0.000596172, throughput 9.42278K wps
[Epoch 192 Batch 90/162] avg loss 0.000573371, throughput 9.26366K wps
[Epoch 192 Batch 120/162] avg loss 0.000575005, throughput 9.25155K wps
[Epoch 192 Batch 150/162] avg loss 0.000636236, throughput 9.32154K wps
Begin Testing...
[Epoch 192] train avg loss 0.000593533, dev acc 0.9100, dev avg loss 0.240407, throughput 9.35108K wps
[Epoch 193 Batch 30/162] avg loss 0.000658213, throughput 9.72959K wps
[Epoch 193 Batch 60/162] avg loss 0.000577393, throughput 9.44005K wps
[Epoch 193 Batch 90/162] avg loss 0.000586158, throughput 9.30096K wps
[Epoch 193 Batch 120/162] avg loss 0.000573065, throughput 9.52292K wps
[Epoch 193 Batch 150/162] avg loss 0.000586835, throughput 9.41289K wps
Begin Testing...
[Epoch 193] train avg loss 0.000595349, dev acc 0.9100, dev avg loss 0.240989, throughput 9.46127K wps
[Epoch 194 Batch 30/162] avg loss 0.000662096, throughput 9.62171K wps
[Epoch 194 Batch 60/162] avg loss 0.000583193, throughput 9.47202K wps
[Epoch 194 Batch 90/162] avg loss 0.000577051, throughput 9.35759K wps
[Epoch 194 Batch 120/162] avg loss 0.000570031, throughput 9.41577K wps
[Epoch 194 Batch 150/162] avg loss 0.0006127, throughput 9.41688K wps
Begin Testing...
[Epoch 194] train avg loss 0.000599261, dev acc 0.9089, dev avg loss 0.241427, throughput 9.43003K wps
[Epoch 195 Batch 30/162] avg loss 0.000605583, throughput 9.50723K wps
[Epoch 195 Batch 60/162] avg loss 0.000595955, throughput 9.30366K wps
[Epoch 195 Batch 90/162] avg loss 0.000495778, throughput 9.33759K wps
[Epoch 195 Batch 120/162] avg loss 0.000552032, throughput 9.37243K wps
[Epoch 195 Batch 150/162] avg loss 0.000550038, throughput 9.5454K wps
Begin Testing...
[Epoch 195] train avg loss 0.00057086, dev acc 0.9089, dev avg loss 0.241859, throughput 9.40634K wps
[Epoch 196 Batch 30/162] avg loss 0.000520767, throughput 9.49918K wps
[Epoch 196 Batch 60/162] avg loss 0.000716071, throughput 9.22411K wps
[Epoch 196 Batch 90/162] avg loss 0.000616169, throughput 9.24138K wps
[Epoch 196 Batch 120/162] avg loss 0.000636962, throughput 9.54738K wps
[Epoch 196 Batch 150/162] avg loss 0.000596003, throughput 9.23651K wps
Begin Testing...
[Epoch 196] train avg loss 0.000611805, dev acc 0.9089, dev avg loss 0.241178, throughput 9.33831K wps
[Epoch 197 Batch 30/162] avg loss 0.000599157, throughput 9.55958K wps
[Epoch 197 Batch 60/162] avg loss 0.000653632, throughput 9.36023K wps
[Epoch 197 Batch 90/162] avg loss 0.000706283, throughput 9.3413K wps
[Epoch 197 Batch 120/162] avg loss 0.000546314, throughput 9.47379K wps
[Epoch 197 Batch 150/162] avg loss 0.00065786, throughput 9.35161K wps
Begin Testing...
[Epoch 197] train avg loss 0.000622945, dev acc 0.9100, dev avg loss 0.242742, throughput 9.42742K wps
[Epoch 198 Batch 30/162] avg loss 0.000501483, throughput 9.53983K wps
[Epoch 198 Batch 60/162] avg loss 0.000557501, throughput 9.52597K wps
[Epoch 198 Batch 90/162] avg loss 0.00052395, throughput 9.19008K wps
[Epoch 198 Batch 120/162] avg loss 0.000651491, throughput 9.311K wps
[Epoch 198 Batch 150/162] avg loss 0.000543924, throughput 9.3464K wps
Begin Testing...
[Epoch 198] train avg loss 0.000554811, dev acc 0.9089, dev avg loss 0.242213, throughput 9.38769K wps
[Epoch 199 Batch 30/162] avg loss 0.000553013, throughput 9.62338K wps
[Epoch 199 Batch 60/162] avg loss 0.000695971, throughput 9.38402K wps
[Epoch 199 Batch 90/162] avg loss 0.000585834, throughput 9.46732K wps
[Epoch 199 Batch 120/162] avg loss 0.000557989, throughput 9.41661K wps
[Epoch 199 Batch 150/162] avg loss 0.000537312, throughput 9.31667K wps
Begin Testing...
[Epoch 199] train avg loss 0.000579422, dev acc 0.9089, dev avg loss 0.242216, throughput 9.42672K wps
Test loss 0.240849, test acc 0.9000
Total time cost 449.56s
[Epoch 0 Batch 30/162] avg loss 0.0140726, throughput 7.39141K wps
[Epoch 0 Batch 60/162] avg loss 0.0140383, throughput 9.46102K wps
[Epoch 0 Batch 90/162] avg loss 0.0137832, throughput 9.39688K wps
[Epoch 0 Batch 120/162] avg loss 0.0136756, throughput 9.50621K wps
[Epoch 0 Batch 150/162] avg loss 0.0135761, throughput 9.26843K wps
Begin Testing...
[Epoch 0] train avg loss 0.0138065, dev acc 0.5733, dev avg loss 0.670652, throughput 8.96478K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0133764, throughput 9.56917K wps
[Epoch 1 Batch 60/162] avg loss 0.0131806, throughput 9.34241K wps
[Epoch 1 Batch 90/162] avg loss 0.0131952, throughput 9.43099K wps
[Epoch 1 Batch 120/162] avg loss 0.0130626, throughput 9.4115K wps
[Epoch 1 Batch 150/162] avg loss 0.0128902, throughput 9.33768K wps
Begin Testing...
[Epoch 1] train avg loss 0.0131225, dev acc 0.8422, dev avg loss 0.635717, throughput 9.40402K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0126453, throughput 9.61325K wps
[Epoch 2 Batch 60/162] avg loss 0.0126212, throughput 9.53795K wps
[Epoch 2 Batch 90/162] avg loss 0.0126731, throughput 9.37195K wps
[Epoch 2 Batch 120/162] avg loss 0.0122695, throughput 9.29239K wps
[Epoch 2 Batch 150/162] avg loss 0.0121763, throughput 9.36317K wps
Begin Testing...
[Epoch 2] train avg loss 0.0124558, dev acc 0.8644, dev avg loss 0.600753, throughput 9.43269K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.0120106, throughput 9.59417K wps
[Epoch 3 Batch 60/162] avg loss 0.0119057, throughput 9.59146K wps
[Epoch 3 Batch 90/162] avg loss 0.0116017, throughput 9.41008K wps
[Epoch 3 Batch 120/162] avg loss 0.0116738, throughput 9.36862K wps
[Epoch 3 Batch 150/162] avg loss 0.0115794, throughput 9.28908K wps
Begin Testing...
[Epoch 3] train avg loss 0.0117007, dev acc 0.8378, dev avg loss 0.561953, throughput 9.4522K wps
[Epoch 4 Batch 30/162] avg loss 0.0112274, throughput 9.60132K wps
[Epoch 4 Batch 60/162] avg loss 0.0109353, throughput 9.43363K wps
[Epoch 4 Batch 90/162] avg loss 0.0109218, throughput 9.47301K wps
[Epoch 4 Batch 120/162] avg loss 0.0107493, throughput 9.49994K wps
[Epoch 4 Batch 150/162] avg loss 0.0106113, throughput 9.24157K wps
Begin Testing...
[Epoch 4] train avg loss 0.0108482, dev acc 0.8789, dev avg loss 0.518359, throughput 9.44586K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.0104453, throughput 9.54439K wps
[Epoch 5 Batch 60/162] avg loss 0.0101523, throughput 9.42843K wps
[Epoch 5 Batch 90/162] avg loss 0.0101504, throughput 9.34022K wps
[Epoch 5 Batch 120/162] avg loss 0.0100014, throughput 9.35983K wps
[Epoch 5 Batch 150/162] avg loss 0.00994029, throughput 9.22321K wps
Begin Testing...
[Epoch 5] train avg loss 0.0101024, dev acc 0.8756, dev avg loss 0.478165, throughput 9.37346K wps
[Epoch 6 Batch 30/162] avg loss 0.00950364, throughput 9.57595K wps
[Epoch 6 Batch 60/162] avg loss 0.00935815, throughput 9.32741K wps
[Epoch 6 Batch 90/162] avg loss 0.00937507, throughput 9.42836K wps
[Epoch 6 Batch 120/162] avg loss 0.00913393, throughput 9.3183K wps
[Epoch 6 Batch 150/162] avg loss 0.00919421, throughput 9.48266K wps
Begin Testing...
[Epoch 6] train avg loss 0.00929135, dev acc 0.8822, dev avg loss 0.440005, throughput 9.41961K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/162] avg loss 0.00869316, throughput 9.4913K wps
[Epoch 7 Batch 60/162] avg loss 0.0090485, throughput 9.26146K wps
[Epoch 7 Batch 90/162] avg loss 0.0085844, throughput 9.32471K wps
[Epoch 7 Batch 120/162] avg loss 0.00854258, throughput 9.47627K wps
[Epoch 7 Batch 150/162] avg loss 0.00856236, throughput 9.35085K wps
Begin Testing...
[Epoch 7] train avg loss 0.00863502, dev acc 0.8800, dev avg loss 0.409862, throughput 9.40692K wps
[Epoch 8 Batch 30/162] avg loss 0.00828883, throughput 9.60198K wps
[Epoch 8 Batch 60/162] avg loss 0.00796956, throughput 9.41286K wps
[Epoch 8 Batch 90/162] avg loss 0.00830067, throughput 9.3611K wps
[Epoch 8 Batch 120/162] avg loss 0.00792313, throughput 9.27207K wps
[Epoch 8 Batch 150/162] avg loss 0.00797801, throughput 9.58641K wps
Begin Testing...
[Epoch 8] train avg loss 0.00807945, dev acc 0.8833, dev avg loss 0.384831, throughput 9.43896K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.00766453, throughput 9.62713K wps
[Epoch 9 Batch 60/162] avg loss 0.00750802, throughput 9.4043K wps
[Epoch 9 Batch 90/162] avg loss 0.00775625, throughput 9.43724K wps
[Epoch 9 Batch 120/162] avg loss 0.00767138, throughput 9.52052K wps
[Epoch 9 Batch 150/162] avg loss 0.00756405, throughput 9.47361K wps
Begin Testing...
[Epoch 9] train avg loss 0.00761668, dev acc 0.8867, dev avg loss 0.365238, throughput 9.4848K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00731415, throughput 9.53568K wps
[Epoch 10 Batch 60/162] avg loss 0.00770711, throughput 9.33454K wps
[Epoch 10 Batch 90/162] avg loss 0.0070536, throughput 9.47978K wps
[Epoch 10 Batch 120/162] avg loss 0.00719121, throughput 9.49022K wps
[Epoch 10 Batch 150/162] avg loss 0.00721707, throughput 9.42229K wps
Begin Testing...
[Epoch 10] train avg loss 0.00729648, dev acc 0.8900, dev avg loss 0.349222, throughput 9.44327K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00690867, throughput 9.54089K wps
[Epoch 11 Batch 60/162] avg loss 0.00717689, throughput 9.48441K wps
[Epoch 11 Batch 90/162] avg loss 0.0068918, throughput 9.28624K wps
[Epoch 11 Batch 120/162] avg loss 0.0071228, throughput 9.44055K wps
[Epoch 11 Batch 150/162] avg loss 0.00669772, throughput 9.31503K wps
Begin Testing...
[Epoch 11] train avg loss 0.00697559, dev acc 0.8944, dev avg loss 0.336424, throughput 9.41227K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00693336, throughput 9.54545K wps
[Epoch 12 Batch 60/162] avg loss 0.00663406, throughput 9.35629K wps
[Epoch 12 Batch 90/162] avg loss 0.00648058, throughput 9.51424K wps
[Epoch 12 Batch 120/162] avg loss 0.00676808, throughput 9.42161K wps
[Epoch 12 Batch 150/162] avg loss 0.00687599, throughput 9.23493K wps
Begin Testing...
[Epoch 12] train avg loss 0.00674375, dev acc 0.8922, dev avg loss 0.326054, throughput 9.39659K wps
[Epoch 13 Batch 30/162] avg loss 0.00646833, throughput 9.53929K wps
[Epoch 13 Batch 60/162] avg loss 0.00647085, throughput 9.55425K wps
[Epoch 13 Batch 90/162] avg loss 0.00639204, throughput 9.33323K wps
[Epoch 13 Batch 120/162] avg loss 0.00652885, throughput 9.37294K wps
[Epoch 13 Batch 150/162] avg loss 0.00633697, throughput 9.29223K wps
Begin Testing...
[Epoch 13] train avg loss 0.006429, dev acc 0.8967, dev avg loss 0.315885, throughput 9.41412K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00604977, throughput 9.72143K wps
[Epoch 14 Batch 60/162] avg loss 0.00650789, throughput 9.20917K wps
[Epoch 14 Batch 90/162] avg loss 0.00629555, throughput 9.21399K wps
[Epoch 14 Batch 120/162] avg loss 0.00603097, throughput 9.34961K wps
[Epoch 14 Batch 150/162] avg loss 0.00663879, throughput 9.23972K wps
Begin Testing...
[Epoch 14] train avg loss 0.00628041, dev acc 0.9000, dev avg loss 0.308417, throughput 9.36123K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.00609191, throughput 9.70632K wps
[Epoch 15 Batch 60/162] avg loss 0.00608192, throughput 9.40351K wps
[Epoch 15 Batch 90/162] avg loss 0.00635003, throughput 9.35678K wps
[Epoch 15 Batch 120/162] avg loss 0.00639329, throughput 9.49997K wps
[Epoch 15 Batch 150/162] avg loss 0.00603197, throughput 9.32145K wps
Begin Testing...
[Epoch 15] train avg loss 0.0061382, dev acc 0.8967, dev avg loss 0.303301, throughput 9.45129K wps
[Epoch 16 Batch 30/162] avg loss 0.00598804, throughput 9.71958K wps
[Epoch 16 Batch 60/162] avg loss 0.00622243, throughput 9.46557K wps
[Epoch 16 Batch 90/162] avg loss 0.00580353, throughput 9.31757K wps
[Epoch 16 Batch 120/162] avg loss 0.00564709, throughput 9.66407K wps
[Epoch 16 Batch 150/162] avg loss 0.00628037, throughput 9.39222K wps
Begin Testing...
[Epoch 16] train avg loss 0.00594148, dev acc 0.8989, dev avg loss 0.295985, throughput 9.50332K wps
[Epoch 17 Batch 30/162] avg loss 0.00580788, throughput 9.65908K wps
[Epoch 17 Batch 60/162] avg loss 0.00586131, throughput 9.35996K wps
[Epoch 17 Batch 90/162] avg loss 0.00601423, throughput 9.41336K wps
[Epoch 17 Batch 120/162] avg loss 0.005885, throughput 9.35865K wps
[Epoch 17 Batch 150/162] avg loss 0.00577548, throughput 9.40211K wps
Begin Testing...
[Epoch 17] train avg loss 0.00585919, dev acc 0.9000, dev avg loss 0.290959, throughput 9.44395K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.00584757, throughput 9.55851K wps
[Epoch 18 Batch 60/162] avg loss 0.00571791, throughput 9.4662K wps
[Epoch 18 Batch 90/162] avg loss 0.0056391, throughput 9.49958K wps
[Epoch 18 Batch 120/162] avg loss 0.00571704, throughput 9.29116K wps
[Epoch 18 Batch 150/162] avg loss 0.00547214, throughput 9.55196K wps
Begin Testing...
[Epoch 18] train avg loss 0.00566763, dev acc 0.9078, dev avg loss 0.286611, throughput 9.45075K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/162] avg loss 0.00576338, throughput 9.73011K wps
[Epoch 19 Batch 60/162] avg loss 0.00537843, throughput 9.51917K wps
[Epoch 19 Batch 90/162] avg loss 0.00584516, throughput 9.29239K wps
[Epoch 19 Batch 120/162] avg loss 0.00548162, throughput 9.33315K wps
[Epoch 19 Batch 150/162] avg loss 0.00554966, throughput 9.51513K wps
Begin Testing...
[Epoch 19] train avg loss 0.00560618, dev acc 0.9011, dev avg loss 0.280949, throughput 9.46913K wps
[Epoch 20 Batch 30/162] avg loss 0.00561643, throughput 9.68142K wps
[Epoch 20 Batch 60/162] avg loss 0.00528191, throughput 9.5243K wps
[Epoch 20 Batch 90/162] avg loss 0.00544346, throughput 9.56445K wps
[Epoch 20 Batch 120/162] avg loss 0.00574723, throughput 9.53746K wps
[Epoch 20 Batch 150/162] avg loss 0.00529826, throughput 9.39591K wps
Begin Testing...
[Epoch 20] train avg loss 0.00544785, dev acc 0.9056, dev avg loss 0.276786, throughput 9.53617K wps
[Epoch 21 Batch 30/162] avg loss 0.00524969, throughput 9.57011K wps
[Epoch 21 Batch 60/162] avg loss 0.00548531, throughput 9.49003K wps
[Epoch 21 Batch 90/162] avg loss 0.00517617, throughput 9.33073K wps
[Epoch 21 Batch 120/162] avg loss 0.00527949, throughput 9.47007K wps
[Epoch 21 Batch 150/162] avg loss 0.00526136, throughput 9.34523K wps
Begin Testing...
[Epoch 21] train avg loss 0.00531254, dev acc 0.9056, dev avg loss 0.273125, throughput 9.45373K wps
[Epoch 22 Batch 30/162] avg loss 0.00510113, throughput 9.6955K wps
[Epoch 22 Batch 60/162] avg loss 0.00489239, throughput 9.56468K wps
[Epoch 22 Batch 90/162] avg loss 0.00510778, throughput 9.40444K wps
[Epoch 22 Batch 120/162] avg loss 0.00564604, throughput 9.3401K wps
[Epoch 22 Batch 150/162] avg loss 0.00520852, throughput 9.40036K wps
Begin Testing...
[Epoch 22] train avg loss 0.0052046, dev acc 0.9067, dev avg loss 0.270058, throughput 9.46844K wps
[Epoch 23 Batch 30/162] avg loss 0.00501132, throughput 9.45797K wps
[Epoch 23 Batch 60/162] avg loss 0.00527927, throughput 9.50068K wps
[Epoch 23 Batch 90/162] avg loss 0.00513286, throughput 9.45796K wps
[Epoch 23 Batch 120/162] avg loss 0.00516424, throughput 9.5026K wps
[Epoch 23 Batch 150/162] avg loss 0.00464099, throughput 9.45687K wps
Begin Testing...
[Epoch 23] train avg loss 0.00506607, dev acc 0.9056, dev avg loss 0.266341, throughput 9.47333K wps
[Epoch 24 Batch 30/162] avg loss 0.00508284, throughput 9.45475K wps
[Epoch 24 Batch 60/162] avg loss 0.00504651, throughput 9.29793K wps
[Epoch 24 Batch 90/162] avg loss 0.00519296, throughput 9.44859K wps
[Epoch 24 Batch 120/162] avg loss 0.00511292, throughput 9.3703K wps
[Epoch 24 Batch 150/162] avg loss 0.00481905, throughput 9.3752K wps
Begin Testing...
[Epoch 24] train avg loss 0.00500738, dev acc 0.9100, dev avg loss 0.262825, throughput 9.38826K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/162] avg loss 0.00489864, throughput 9.56647K wps
[Epoch 25 Batch 60/162] avg loss 0.00520048, throughput 9.55132K wps
[Epoch 25 Batch 90/162] avg loss 0.0046729, throughput 9.5449K wps
[Epoch 25 Batch 120/162] avg loss 0.00470683, throughput 9.48367K wps
[Epoch 25 Batch 150/162] avg loss 0.00493507, throughput 9.40231K wps
Begin Testing...
[Epoch 25] train avg loss 0.00494214, dev acc 0.9067, dev avg loss 0.260009, throughput 9.49344K wps
[Epoch 26 Batch 30/162] avg loss 0.00523675, throughput 9.73317K wps
[Epoch 26 Batch 60/162] avg loss 0.00442369, throughput 9.4471K wps
[Epoch 26 Batch 90/162] avg loss 0.00478524, throughput 9.50189K wps
[Epoch 26 Batch 120/162] avg loss 0.00457294, throughput 9.33065K wps
[Epoch 26 Batch 150/162] avg loss 0.00506119, throughput 9.34235K wps
Begin Testing...
[Epoch 26] train avg loss 0.00482379, dev acc 0.9111, dev avg loss 0.25761, throughput 9.47817K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.00465707, throughput 9.55972K wps
[Epoch 27 Batch 60/162] avg loss 0.00476009, throughput 9.29239K wps
[Epoch 27 Batch 90/162] avg loss 0.00458761, throughput 9.3975K wps
[Epoch 27 Batch 120/162] avg loss 0.00510205, throughput 9.32087K wps
[Epoch 27 Batch 150/162] avg loss 0.00485559, throughput 9.37847K wps
Begin Testing...
[Epoch 27] train avg loss 0.00477718, dev acc 0.9111, dev avg loss 0.255202, throughput 9.40621K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00459179, throughput 9.4494K wps
[Epoch 28 Batch 60/162] avg loss 0.00477292, throughput 9.29028K wps
[Epoch 28 Batch 90/162] avg loss 0.00475007, throughput 9.35829K wps
[Epoch 28 Batch 120/162] avg loss 0.00456027, throughput 9.31185K wps
[Epoch 28 Batch 150/162] avg loss 0.00476986, throughput 9.46715K wps
Begin Testing...
[Epoch 28] train avg loss 0.00464604, dev acc 0.9111, dev avg loss 0.252468, throughput 9.36739K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.00495152, throughput 9.55223K wps
[Epoch 29 Batch 60/162] avg loss 0.00472496, throughput 9.37179K wps
[Epoch 29 Batch 90/162] avg loss 0.00440736, throughput 9.48296K wps
[Epoch 29 Batch 120/162] avg loss 0.00450659, throughput 9.41146K wps
[Epoch 29 Batch 150/162] avg loss 0.00443813, throughput 9.42477K wps
Begin Testing...
[Epoch 29] train avg loss 0.00455219, dev acc 0.9122, dev avg loss 0.250449, throughput 9.45928K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.00450807, throughput 9.44462K wps
[Epoch 30 Batch 60/162] avg loss 0.00411041, throughput 9.24222K wps
[Epoch 30 Batch 90/162] avg loss 0.00421674, throughput 9.40247K wps
[Epoch 30 Batch 120/162] avg loss 0.00494776, throughput 9.40438K wps
[Epoch 30 Batch 150/162] avg loss 0.00468724, throughput 9.25262K wps
Begin Testing...
[Epoch 30] train avg loss 0.00453674, dev acc 0.9133, dev avg loss 0.249946, throughput 9.34694K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00443646, throughput 9.5003K wps
[Epoch 31 Batch 60/162] avg loss 0.00432406, throughput 9.41434K wps
[Epoch 31 Batch 90/162] avg loss 0.00490161, throughput 9.54533K wps
[Epoch 31 Batch 120/162] avg loss 0.00435793, throughput 9.5111K wps
[Epoch 31 Batch 150/162] avg loss 0.00419358, throughput 9.38685K wps
Begin Testing...
[Epoch 31] train avg loss 0.00443906, dev acc 0.9122, dev avg loss 0.247184, throughput 9.47838K wps
[Epoch 32 Batch 30/162] avg loss 0.0043791, throughput 9.59499K wps
[Epoch 32 Batch 60/162] avg loss 0.00418867, throughput 9.55898K wps
[Epoch 32 Batch 90/162] avg loss 0.00421759, throughput 9.51839K wps
[Epoch 32 Batch 120/162] avg loss 0.0043961, throughput 9.38598K wps
[Epoch 32 Batch 150/162] avg loss 0.00467612, throughput 9.29118K wps
Begin Testing...
[Epoch 32] train avg loss 0.0043886, dev acc 0.9100, dev avg loss 0.245101, throughput 9.4542K wps
[Epoch 33 Batch 30/162] avg loss 0.0041956, throughput 9.51613K wps
[Epoch 33 Batch 60/162] avg loss 0.00444644, throughput 9.36423K wps
[Epoch 33 Batch 90/162] avg loss 0.00451396, throughput 9.23207K wps
[Epoch 33 Batch 120/162] avg loss 0.00449578, throughput 9.38102K wps
[Epoch 33 Batch 150/162] avg loss 0.00411528, throughput 9.23957K wps
Begin Testing...
[Epoch 33] train avg loss 0.00432072, dev acc 0.9111, dev avg loss 0.242991, throughput 9.32943K wps
[Epoch 34 Batch 30/162] avg loss 0.00404961, throughput 9.41653K wps
[Epoch 34 Batch 60/162] avg loss 0.00407636, throughput 9.388K wps
[Epoch 34 Batch 90/162] avg loss 0.00410159, throughput 9.3557K wps
[Epoch 34 Batch 120/162] avg loss 0.00455285, throughput 9.37711K wps
[Epoch 34 Batch 150/162] avg loss 0.00447952, throughput 9.40347K wps
Begin Testing...
[Epoch 34] train avg loss 0.00422945, dev acc 0.9167, dev avg loss 0.24166, throughput 9.38692K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00405558, throughput 9.56081K wps
[Epoch 35 Batch 60/162] avg loss 0.00425252, throughput 9.53486K wps
[Epoch 35 Batch 90/162] avg loss 0.00398845, throughput 9.32025K wps
[Epoch 35 Batch 120/162] avg loss 0.00420538, throughput 9.40946K wps
[Epoch 35 Batch 150/162] avg loss 0.00426934, throughput 9.36935K wps
Begin Testing...
[Epoch 35] train avg loss 0.00418682, dev acc 0.9111, dev avg loss 0.242246, throughput 9.43124K wps
[Epoch 36 Batch 30/162] avg loss 0.00448431, throughput 9.55008K wps
[Epoch 36 Batch 60/162] avg loss 0.00398378, throughput 9.34084K wps
[Epoch 36 Batch 90/162] avg loss 0.00405471, throughput 9.26619K wps
[Epoch 36 Batch 120/162] avg loss 0.00436351, throughput 9.30262K wps
[Epoch 36 Batch 150/162] avg loss 0.00359431, throughput 9.45511K wps
Begin Testing...
[Epoch 36] train avg loss 0.00408005, dev acc 0.9156, dev avg loss 0.23805, throughput 9.37372K wps
[Epoch 37 Batch 30/162] avg loss 0.00443964, throughput 9.59793K wps
[Epoch 37 Batch 60/162] avg loss 0.00417021, throughput 9.25536K wps
[Epoch 37 Batch 90/162] avg loss 0.00369751, throughput 9.48843K wps
[Epoch 37 Batch 120/162] avg loss 0.00391587, throughput 9.25898K wps
[Epoch 37 Batch 150/162] avg loss 0.00380923, throughput 9.34119K wps
Begin Testing...
[Epoch 37] train avg loss 0.00403013, dev acc 0.9122, dev avg loss 0.237798, throughput 9.3771K wps
[Epoch 38 Batch 30/162] avg loss 0.00368733, throughput 9.55393K wps
[Epoch 38 Batch 60/162] avg loss 0.00391182, throughput 9.37262K wps
[Epoch 38 Batch 90/162] avg loss 0.00396453, throughput 9.43634K wps
[Epoch 38 Batch 120/162] avg loss 0.0041397, throughput 9.44248K wps
[Epoch 38 Batch 150/162] avg loss 0.00424483, throughput 9.51457K wps
Begin Testing...
[Epoch 38] train avg loss 0.00396421, dev acc 0.9156, dev avg loss 0.235457, throughput 9.46659K wps
[Epoch 39 Batch 30/162] avg loss 0.00384706, throughput 9.46862K wps
[Epoch 39 Batch 60/162] avg loss 0.00376509, throughput 9.39534K wps
[Epoch 39 Batch 90/162] avg loss 0.00371358, throughput 9.39119K wps
[Epoch 39 Batch 120/162] avg loss 0.0036837, throughput 9.38283K wps
[Epoch 39 Batch 150/162] avg loss 0.00421515, throughput 9.36233K wps
Begin Testing...
[Epoch 39] train avg loss 0.00384098, dev acc 0.9156, dev avg loss 0.233676, throughput 9.39655K wps
[Epoch 40 Batch 30/162] avg loss 0.0038304, throughput 9.50969K wps
[Epoch 40 Batch 60/162] avg loss 0.00371572, throughput 9.41708K wps
[Epoch 40 Batch 90/162] avg loss 0.00422123, throughput 9.42236K wps
[Epoch 40 Batch 120/162] avg loss 0.00393215, throughput 9.52516K wps
[Epoch 40 Batch 150/162] avg loss 0.00343454, throughput 9.39763K wps
Begin Testing...
[Epoch 40] train avg loss 0.00386679, dev acc 0.9156, dev avg loss 0.233001, throughput 9.45388K wps
[Epoch 41 Batch 30/162] avg loss 0.00365855, throughput 9.71311K wps
[Epoch 41 Batch 60/162] avg loss 0.00360422, throughput 9.37947K wps
[Epoch 41 Batch 90/162] avg loss 0.00393725, throughput 9.3635K wps
[Epoch 41 Batch 120/162] avg loss 0.00386439, throughput 9.22235K wps
[Epoch 41 Batch 150/162] avg loss 0.00364322, throughput 9.45094K wps
Begin Testing...
[Epoch 41] train avg loss 0.00376707, dev acc 0.9111, dev avg loss 0.232585, throughput 9.43671K wps
[Epoch 42 Batch 30/162] avg loss 0.00356105, throughput 9.66854K wps
[Epoch 42 Batch 60/162] avg loss 0.00405164, throughput 9.46931K wps
[Epoch 42 Batch 90/162] avg loss 0.00365844, throughput 9.45482K wps
[Epoch 42 Batch 120/162] avg loss 0.00366082, throughput 9.3757K wps
[Epoch 42 Batch 150/162] avg loss 0.00370244, throughput 9.62455K wps
Begin Testing...
[Epoch 42] train avg loss 0.00371542, dev acc 0.9156, dev avg loss 0.230118, throughput 9.50509K wps
[Epoch 43 Batch 30/162] avg loss 0.0037323, throughput 9.6174K wps
[Epoch 43 Batch 60/162] avg loss 0.00351933, throughput 9.3256K wps
[Epoch 43 Batch 90/162] avg loss 0.00373353, throughput 9.42253K wps
[Epoch 43 Batch 120/162] avg loss 0.00364719, throughput 9.56141K wps
[Epoch 43 Batch 150/162] avg loss 0.00365445, throughput 9.23229K wps
Begin Testing...
[Epoch 43] train avg loss 0.00364265, dev acc 0.9178, dev avg loss 0.228829, throughput 9.43623K wps
Observed Improvement.
Begin Testing...
[Epoch 44 Batch 30/162] avg loss 0.00348581, throughput 9.70121K wps
[Epoch 44 Batch 60/162] avg loss 0.00367857, throughput 9.21387K wps
[Epoch 44 Batch 90/162] avg loss 0.00366131, throughput 9.38915K wps
[Epoch 44 Batch 120/162] avg loss 0.00388536, throughput 9.33646K wps
[Epoch 44 Batch 150/162] avg loss 0.00338824, throughput 9.41887K wps
Begin Testing...
[Epoch 44] train avg loss 0.00362926, dev acc 0.9167, dev avg loss 0.22834, throughput 9.37063K wps
[Epoch 45 Batch 30/162] avg loss 0.0035622, throughput 9.46774K wps
[Epoch 45 Batch 60/162] avg loss 0.00341074, throughput 9.43816K wps
[Epoch 45 Batch 90/162] avg loss 0.00357691, throughput 9.25185K wps
[Epoch 45 Batch 120/162] avg loss 0.00365014, throughput 9.48571K wps
[Epoch 45 Batch 150/162] avg loss 0.00372701, throughput 9.30793K wps
Begin Testing...
[Epoch 45] train avg loss 0.0035557, dev acc 0.9189, dev avg loss 0.227091, throughput 9.38509K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/162] avg loss 0.00373619, throughput 9.54032K wps
[Epoch 46 Batch 60/162] avg loss 0.00390801, throughput 9.21746K wps
[Epoch 46 Batch 90/162] avg loss 0.00321295, throughput 9.44002K wps
[Epoch 46 Batch 120/162] avg loss 0.00347194, throughput 9.64346K wps
[Epoch 46 Batch 150/162] avg loss 0.0032535, throughput 9.49093K wps
Begin Testing...
[Epoch 46] train avg loss 0.00349264, dev acc 0.9178, dev avg loss 0.225902, throughput 9.44335K wps
[Epoch 47 Batch 30/162] avg loss 0.00321667, throughput 9.52042K wps
[Epoch 47 Batch 60/162] avg loss 0.0036551, throughput 9.5516K wps
[Epoch 47 Batch 90/162] avg loss 0.00330308, throughput 9.26896K wps
[Epoch 47 Batch 120/162] avg loss 0.00351787, throughput 9.53198K wps
[Epoch 47 Batch 150/162] avg loss 0.00354322, throughput 9.50896K wps
Begin Testing...
[Epoch 47] train avg loss 0.0034673, dev acc 0.9178, dev avg loss 0.22496, throughput 9.48018K wps
[Epoch 48 Batch 30/162] avg loss 0.00339867, throughput 9.58202K wps
[Epoch 48 Batch 60/162] avg loss 0.00356258, throughput 9.48541K wps
[Epoch 48 Batch 90/162] avg loss 0.00366516, throughput 9.54232K wps
[Epoch 48 Batch 120/162] avg loss 0.00348893, throughput 9.47152K wps
[Epoch 48 Batch 150/162] avg loss 0.00322505, throughput 9.32159K wps
Begin Testing...
[Epoch 48] train avg loss 0.00342565, dev acc 0.9178, dev avg loss 0.224293, throughput 9.44939K wps
[Epoch 49 Batch 30/162] avg loss 0.00342041, throughput 9.46309K wps
[Epoch 49 Batch 60/162] avg loss 0.00356455, throughput 9.30137K wps
[Epoch 49 Batch 90/162] avg loss 0.00336422, throughput 9.20449K wps
[Epoch 49 Batch 120/162] avg loss 0.0031011, throughput 9.21031K wps
[Epoch 49 Batch 150/162] avg loss 0.00367254, throughput 9.30586K wps
Begin Testing...
[Epoch 49] train avg loss 0.00340466, dev acc 0.9178, dev avg loss 0.223225, throughput 9.30418K wps
[Epoch 50 Batch 30/162] avg loss 0.00347847, throughput 9.46388K wps
[Epoch 50 Batch 60/162] avg loss 0.00312123, throughput 9.36587K wps
[Epoch 50 Batch 90/162] avg loss 0.00341098, throughput 9.37377K wps
[Epoch 50 Batch 120/162] avg loss 0.00322874, throughput 9.34715K wps
[Epoch 50 Batch 150/162] avg loss 0.00349886, throughput 9.31039K wps
Begin Testing...
[Epoch 50] train avg loss 0.00334845, dev acc 0.9189, dev avg loss 0.222546, throughput 9.38239K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/162] avg loss 0.00304245, throughput 9.57039K wps
[Epoch 51 Batch 60/162] avg loss 0.00320162, throughput 9.40134K wps
[Epoch 51 Batch 90/162] avg loss 0.0035254, throughput 9.22304K wps
[Epoch 51 Batch 120/162] avg loss 0.00332416, throughput 9.21957K wps
[Epoch 51 Batch 150/162] avg loss 0.00290001, throughput 9.46228K wps
Begin Testing...
[Epoch 51] train avg loss 0.00320044, dev acc 0.9178, dev avg loss 0.221703, throughput 9.37985K wps
[Epoch 52 Batch 30/162] avg loss 0.00330129, throughput 9.53952K wps
[Epoch 52 Batch 60/162] avg loss 0.00319879, throughput 9.42616K wps
[Epoch 52 Batch 90/162] avg loss 0.00331638, throughput 9.37914K wps
[Epoch 52 Batch 120/162] avg loss 0.0031235, throughput 9.45869K wps
[Epoch 52 Batch 150/162] avg loss 0.00322151, throughput 9.35381K wps
Begin Testing...
[Epoch 52] train avg loss 0.00324313, dev acc 0.9178, dev avg loss 0.221029, throughput 9.41941K wps
[Epoch 53 Batch 30/162] avg loss 0.00312318, throughput 9.48197K wps
[Epoch 53 Batch 60/162] avg loss 0.00293857, throughput 9.30991K wps
[Epoch 53 Batch 90/162] avg loss 0.00348293, throughput 9.29317K wps
[Epoch 53 Batch 120/162] avg loss 0.00308333, throughput 9.43754K wps
[Epoch 53 Batch 150/162] avg loss 0.00317115, throughput 9.38241K wps
Begin Testing...
[Epoch 53] train avg loss 0.00313697, dev acc 0.9189, dev avg loss 0.219898, throughput 9.37372K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00306758, throughput 9.53444K wps
[Epoch 54 Batch 60/162] avg loss 0.0030859, throughput 9.1984K wps
[Epoch 54 Batch 90/162] avg loss 0.00317954, throughput 9.1559K wps
[Epoch 54 Batch 120/162] avg loss 0.00295329, throughput 9.38802K wps
[Epoch 54 Batch 150/162] avg loss 0.00323905, throughput 9.28328K wps
Begin Testing...
[Epoch 54] train avg loss 0.00309711, dev acc 0.9156, dev avg loss 0.219606, throughput 9.30942K wps
[Epoch 55 Batch 30/162] avg loss 0.00309912, throughput 9.55629K wps
[Epoch 55 Batch 60/162] avg loss 0.00328996, throughput 9.26022K wps
[Epoch 55 Batch 90/162] avg loss 0.00283497, throughput 9.3437K wps
[Epoch 55 Batch 120/162] avg loss 0.00309373, throughput 9.33788K wps
[Epoch 55 Batch 150/162] avg loss 0.00322807, throughput 9.32481K wps
Begin Testing...
[Epoch 55] train avg loss 0.00314959, dev acc 0.9189, dev avg loss 0.218877, throughput 9.35597K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/162] avg loss 0.00299748, throughput 9.57061K wps
[Epoch 56 Batch 60/162] avg loss 0.00324153, throughput 9.3992K wps
[Epoch 56 Batch 90/162] avg loss 0.0029744, throughput 9.34628K wps
[Epoch 56 Batch 120/162] avg loss 0.00312703, throughput 9.26112K wps
[Epoch 56 Batch 150/162] avg loss 0.0027195, throughput 9.38876K wps
Begin Testing...
[Epoch 56] train avg loss 0.0030296, dev acc 0.9189, dev avg loss 0.218157, throughput 9.39806K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/162] avg loss 0.00297457, throughput 9.53425K wps
[Epoch 57 Batch 60/162] avg loss 0.00302075, throughput 9.39484K wps
[Epoch 57 Batch 90/162] avg loss 0.00308144, throughput 9.471K wps
[Epoch 57 Batch 120/162] avg loss 0.00301216, throughput 9.33223K wps
[Epoch 57 Batch 150/162] avg loss 0.00285948, throughput 9.5359K wps
Begin Testing...
[Epoch 57] train avg loss 0.00298452, dev acc 0.9211, dev avg loss 0.216941, throughput 9.44929K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/162] avg loss 0.0030519, throughput 9.42538K wps
[Epoch 58 Batch 60/162] avg loss 0.00304264, throughput 9.36764K wps
[Epoch 58 Batch 90/162] avg loss 0.00318671, throughput 9.45294K wps
[Epoch 58 Batch 120/162] avg loss 0.00294361, throughput 9.41266K wps
[Epoch 58 Batch 150/162] avg loss 0.00281069, throughput 9.29921K wps
Begin Testing...
[Epoch 58] train avg loss 0.00299149, dev acc 0.9200, dev avg loss 0.21641, throughput 9.37769K wps
[Epoch 59 Batch 30/162] avg loss 0.00309003, throughput 9.76164K wps
[Epoch 59 Batch 60/162] avg loss 0.00288478, throughput 9.43423K wps
[Epoch 59 Batch 90/162] avg loss 0.0030056, throughput 9.36587K wps
[Epoch 59 Batch 120/162] avg loss 0.00284529, throughput 9.49486K wps
[Epoch 59 Batch 150/162] avg loss 0.00296807, throughput 9.34959K wps
Begin Testing...
[Epoch 59] train avg loss 0.00294704, dev acc 0.9189, dev avg loss 0.21637, throughput 9.48271K wps
[Epoch 60 Batch 30/162] avg loss 0.00309789, throughput 9.58397K wps
[Epoch 60 Batch 60/162] avg loss 0.0028392, throughput 9.416K wps
[Epoch 60 Batch 90/162] avg loss 0.00275479, throughput 9.4663K wps
[Epoch 60 Batch 120/162] avg loss 0.00266363, throughput 9.38289K wps
[Epoch 60 Batch 150/162] avg loss 0.00285094, throughput 9.39935K wps
Begin Testing...
[Epoch 60] train avg loss 0.00286126, dev acc 0.9189, dev avg loss 0.215594, throughput 9.43455K wps
[Epoch 61 Batch 30/162] avg loss 0.00253019, throughput 9.76977K wps
[Epoch 61 Batch 60/162] avg loss 0.00296642, throughput 9.38459K wps
[Epoch 61 Batch 90/162] avg loss 0.00277417, throughput 9.35544K wps
[Epoch 61 Batch 120/162] avg loss 0.00305958, throughput 9.33514K wps
[Epoch 61 Batch 150/162] avg loss 0.00266039, throughput 9.2967K wps
Begin Testing...
[Epoch 61] train avg loss 0.00282061, dev acc 0.9211, dev avg loss 0.214901, throughput 9.42117K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/162] avg loss 0.00269149, throughput 9.6304K wps
[Epoch 62 Batch 60/162] avg loss 0.00275886, throughput 9.28424K wps
[Epoch 62 Batch 90/162] avg loss 0.00278965, throughput 9.46314K wps
[Epoch 62 Batch 120/162] avg loss 0.00278281, throughput 9.41509K wps
[Epoch 62 Batch 150/162] avg loss 0.00296652, throughput 9.46217K wps
Begin Testing...
[Epoch 62] train avg loss 0.00280655, dev acc 0.9200, dev avg loss 0.214234, throughput 9.45876K wps
[Epoch 63 Batch 30/162] avg loss 0.0026918, throughput 9.60831K wps
[Epoch 63 Batch 60/162] avg loss 0.00272578, throughput 9.43545K wps
[Epoch 63 Batch 90/162] avg loss 0.00265863, throughput 9.32025K wps
[Epoch 63 Batch 120/162] avg loss 0.00290196, throughput 9.13743K wps
[Epoch 63 Batch 150/162] avg loss 0.00266342, throughput 9.28168K wps
Begin Testing...
[Epoch 63] train avg loss 0.0027211, dev acc 0.9211, dev avg loss 0.214618, throughput 9.34955K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/162] avg loss 0.0026126, throughput 9.56088K wps
[Epoch 64 Batch 60/162] avg loss 0.00261168, throughput 9.39434K wps
[Epoch 64 Batch 90/162] avg loss 0.00284288, throughput 9.35492K wps
[Epoch 64 Batch 120/162] avg loss 0.00285913, throughput 9.30435K wps
[Epoch 64 Batch 150/162] avg loss 0.00259517, throughput 9.2835K wps
Begin Testing...
[Epoch 64] train avg loss 0.00270896, dev acc 0.9222, dev avg loss 0.212864, throughput 9.38041K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/162] avg loss 0.00266184, throughput 9.457K wps
[Epoch 65 Batch 60/162] avg loss 0.00256257, throughput 9.45132K wps
[Epoch 65 Batch 90/162] avg loss 0.00249786, throughput 9.32666K wps
[Epoch 65 Batch 120/162] avg loss 0.00299726, throughput 9.40122K wps
[Epoch 65 Batch 150/162] avg loss 0.00266928, throughput 9.27458K wps
Begin Testing...
[Epoch 65] train avg loss 0.00266813, dev acc 0.9222, dev avg loss 0.212283, throughput 9.37011K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/162] avg loss 0.00250264, throughput 9.52543K wps
[Epoch 66 Batch 60/162] avg loss 0.00241669, throughput 9.27055K wps
[Epoch 66 Batch 90/162] avg loss 0.00282027, throughput 9.42991K wps
[Epoch 66 Batch 120/162] avg loss 0.00269262, throughput 9.23692K wps
[Epoch 66 Batch 150/162] avg loss 0.00266666, throughput 9.27168K wps
Begin Testing...
[Epoch 66] train avg loss 0.00259419, dev acc 0.9222, dev avg loss 0.211483, throughput 9.33889K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/162] avg loss 0.00249504, throughput 9.47676K wps
[Epoch 67 Batch 60/162] avg loss 0.00250064, throughput 9.29991K wps
[Epoch 67 Batch 90/162] avg loss 0.00257394, throughput 9.36837K wps
[Epoch 67 Batch 120/162] avg loss 0.00274929, throughput 9.35763K wps
[Epoch 67 Batch 150/162] avg loss 0.00266347, throughput 9.44608K wps
Begin Testing...
[Epoch 67] train avg loss 0.00259588, dev acc 0.9200, dev avg loss 0.210808, throughput 9.37524K wps
[Epoch 68 Batch 30/162] avg loss 0.0026491, throughput 9.55714K wps
[Epoch 68 Batch 60/162] avg loss 0.00293124, throughput 9.24097K wps
[Epoch 68 Batch 90/162] avg loss 0.0023547, throughput 9.39898K wps
[Epoch 68 Batch 120/162] avg loss 0.00261437, throughput 9.43415K wps
[Epoch 68 Batch 150/162] avg loss 0.0023147, throughput 9.37737K wps
Begin Testing...
[Epoch 68] train avg loss 0.00255004, dev acc 0.9200, dev avg loss 0.210856, throughput 9.4112K wps
[Epoch 69 Batch 30/162] avg loss 0.00236264, throughput 9.76615K wps
[Epoch 69 Batch 60/162] avg loss 0.00254204, throughput 9.53889K wps
[Epoch 69 Batch 90/162] avg loss 0.00256414, throughput 9.38684K wps
[Epoch 69 Batch 120/162] avg loss 0.00262695, throughput 9.44273K wps
[Epoch 69 Batch 150/162] avg loss 0.00251046, throughput 9.40966K wps
Begin Testing...
[Epoch 69] train avg loss 0.00253583, dev acc 0.9200, dev avg loss 0.210559, throughput 9.50258K wps
[Epoch 70 Batch 30/162] avg loss 0.0024204, throughput 9.53716K wps
[Epoch 70 Batch 60/162] avg loss 0.00244967, throughput 9.4404K wps
[Epoch 70 Batch 90/162] avg loss 0.00240488, throughput 9.36018K wps
[Epoch 70 Batch 120/162] avg loss 0.00263349, throughput 9.45604K wps
[Epoch 70 Batch 150/162] avg loss 0.00234914, throughput 9.35992K wps
Begin Testing...
[Epoch 70] train avg loss 0.00245254, dev acc 0.9200, dev avg loss 0.209931, throughput 9.43631K wps
[Epoch 71 Batch 30/162] avg loss 0.00240789, throughput 9.68664K wps
[Epoch 71 Batch 60/162] avg loss 0.00248945, throughput 9.3224K wps
[Epoch 71 Batch 90/162] avg loss 0.00256101, throughput 9.40859K wps
[Epoch 71 Batch 120/162] avg loss 0.00226569, throughput 9.23253K wps
[Epoch 71 Batch 150/162] avg loss 0.00262153, throughput 9.38478K wps
Begin Testing...
[Epoch 71] train avg loss 0.00248414, dev acc 0.9211, dev avg loss 0.209473, throughput 9.38691K wps
[Epoch 72 Batch 30/162] avg loss 0.00231295, throughput 9.40852K wps
[Epoch 72 Batch 60/162] avg loss 0.00227905, throughput 9.18122K wps
[Epoch 72 Batch 90/162] avg loss 0.00249192, throughput 9.33384K wps
[Epoch 72 Batch 120/162] avg loss 0.00220115, throughput 9.49662K wps
[Epoch 72 Batch 150/162] avg loss 0.00257822, throughput 9.53931K wps
Begin Testing...
[Epoch 72] train avg loss 0.00237862, dev acc 0.9222, dev avg loss 0.208919, throughput 9.37349K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/162] avg loss 0.00248132, throughput 9.66719K wps
[Epoch 73 Batch 60/162] avg loss 0.00223944, throughput 9.3788K wps
[Epoch 73 Batch 90/162] avg loss 0.00237729, throughput 9.33332K wps
[Epoch 73 Batch 120/162] avg loss 0.00263944, throughput 9.51485K wps
[Epoch 73 Batch 150/162] avg loss 0.00230547, throughput 9.52591K wps
Begin Testing...
[Epoch 73] train avg loss 0.00239745, dev acc 0.9222, dev avg loss 0.208237, throughput 9.47115K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/162] avg loss 0.00233452, throughput 9.72133K wps
[Epoch 74 Batch 60/162] avg loss 0.00235288, throughput 9.42341K wps
[Epoch 74 Batch 90/162] avg loss 0.00223811, throughput 9.50422K wps
[Epoch 74 Batch 120/162] avg loss 0.00246239, throughput 9.34197K wps
[Epoch 74 Batch 150/162] avg loss 0.00239308, throughput 9.39556K wps
Begin Testing...
[Epoch 74] train avg loss 0.00235093, dev acc 0.9211, dev avg loss 0.207968, throughput 9.46835K wps
[Epoch 75 Batch 30/162] avg loss 0.00257944, throughput 9.6299K wps
[Epoch 75 Batch 60/162] avg loss 0.00218365, throughput 9.38988K wps
[Epoch 75 Batch 90/162] avg loss 0.0021832, throughput 9.56108K wps
[Epoch 75 Batch 120/162] avg loss 0.00231687, throughput 9.35633K wps
[Epoch 75 Batch 150/162] avg loss 0.00227732, throughput 9.26077K wps
Begin Testing...
[Epoch 75] train avg loss 0.00234944, dev acc 0.9211, dev avg loss 0.207756, throughput 9.44125K wps
[Epoch 76 Batch 30/162] avg loss 0.00246977, throughput 9.57019K wps
[Epoch 76 Batch 60/162] avg loss 0.00233069, throughput 9.47189K wps
[Epoch 76 Batch 90/162] avg loss 0.00205494, throughput 9.47045K wps
[Epoch 76 Batch 120/162] avg loss 0.00233756, throughput 9.3395K wps
[Epoch 76 Batch 150/162] avg loss 0.00245006, throughput 9.38939K wps
Begin Testing...
[Epoch 76] train avg loss 0.00231847, dev acc 0.9233, dev avg loss 0.208611, throughput 9.43029K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/162] avg loss 0.0024733, throughput 9.6933K wps
[Epoch 77 Batch 60/162] avg loss 0.00224544, throughput 9.36658K wps
[Epoch 77 Batch 90/162] avg loss 0.00225306, throughput 9.3526K wps
[Epoch 77 Batch 120/162] avg loss 0.00239805, throughput 9.40154K wps
[Epoch 77 Batch 150/162] avg loss 0.00195458, throughput 9.40228K wps
Begin Testing...
[Epoch 77] train avg loss 0.00228019, dev acc 0.9233, dev avg loss 0.206629, throughput 9.44618K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/162] avg loss 0.00207113, throughput 9.69232K wps
[Epoch 78 Batch 60/162] avg loss 0.00228975, throughput 9.23539K wps
[Epoch 78 Batch 90/162] avg loss 0.00230338, throughput 9.42575K wps
[Epoch 78 Batch 120/162] avg loss 0.00218398, throughput 9.37048K wps
[Epoch 78 Batch 150/162] avg loss 0.00197783, throughput 9.32321K wps
Begin Testing...
[Epoch 78] train avg loss 0.00217135, dev acc 0.9211, dev avg loss 0.206674, throughput 9.40951K wps
[Epoch 79 Batch 30/162] avg loss 0.00235005, throughput 9.47869K wps
[Epoch 79 Batch 60/162] avg loss 0.00216836, throughput 9.3978K wps
[Epoch 79 Batch 90/162] avg loss 0.00214022, throughput 9.37198K wps
[Epoch 79 Batch 120/162] avg loss 0.00223165, throughput 9.33747K wps
[Epoch 79 Batch 150/162] avg loss 0.00198613, throughput 9.41899K wps
Begin Testing...
[Epoch 79] train avg loss 0.00215863, dev acc 0.9222, dev avg loss 0.206344, throughput 9.39235K wps
[Epoch 80 Batch 30/162] avg loss 0.00222975, throughput 9.6298K wps
[Epoch 80 Batch 60/162] avg loss 0.00184238, throughput 9.28435K wps
[Epoch 80 Batch 90/162] avg loss 0.00252918, throughput 9.25054K wps
[Epoch 80 Batch 120/162] avg loss 0.00220173, throughput 9.49821K wps
[Epoch 80 Batch 150/162] avg loss 0.00183768, throughput 9.28979K wps
Begin Testing...
[Epoch 80] train avg loss 0.00212165, dev acc 0.9244, dev avg loss 0.206298, throughput 9.40949K wps
Observed Improvement.
Begin Testing...
[Epoch 81 Batch 30/162] avg loss 0.00230204, throughput 9.49307K wps
[Epoch 81 Batch 60/162] avg loss 0.00216045, throughput 9.31312K wps
[Epoch 81 Batch 90/162] avg loss 0.00231145, throughput 9.29803K wps
[Epoch 81 Batch 120/162] avg loss 0.00209262, throughput 9.33757K wps
[Epoch 81 Batch 150/162] avg loss 0.0019635, throughput 9.31314K wps
Begin Testing...
[Epoch 81] train avg loss 0.0021713, dev acc 0.9233, dev avg loss 0.205901, throughput 9.34852K wps
[Epoch 82 Batch 30/162] avg loss 0.00190619, throughput 9.55957K wps
[Epoch 82 Batch 60/162] avg loss 0.00234425, throughput 9.35162K wps
[Epoch 82 Batch 90/162] avg loss 0.00232376, throughput 9.22046K wps
[Epoch 82 Batch 120/162] avg loss 0.00202978, throughput 9.54604K wps
[Epoch 82 Batch 150/162] avg loss 0.00209262, throughput 9.52053K wps
Begin Testing...
[Epoch 82] train avg loss 0.00211971, dev acc 0.9222, dev avg loss 0.205093, throughput 9.42898K wps
[Epoch 83 Batch 30/162] avg loss 0.00217346, throughput 9.32124K wps
[Epoch 83 Batch 60/162] avg loss 0.00185793, throughput 9.43477K wps
[Epoch 83 Batch 90/162] avg loss 0.00228981, throughput 9.39923K wps
[Epoch 83 Batch 120/162] avg loss 0.00215277, throughput 9.38176K wps
[Epoch 83 Batch 150/162] avg loss 0.0022617, throughput 9.43604K wps
Begin Testing...
[Epoch 83] train avg loss 0.00215133, dev acc 0.9244, dev avg loss 0.205591, throughput 9.37267K wps
Observed Improvement.
Begin Testing...
[Epoch 84 Batch 30/162] avg loss 0.00187361, throughput 9.58512K wps
[Epoch 84 Batch 60/162] avg loss 0.00205785, throughput 9.26876K wps
[Epoch 84 Batch 90/162] avg loss 0.00206367, throughput 9.509K wps
[Epoch 84 Batch 120/162] avg loss 0.00219402, throughput 9.33053K wps
[Epoch 84 Batch 150/162] avg loss 0.00198974, throughput 9.37345K wps
Begin Testing...
[Epoch 84] train avg loss 0.00203205, dev acc 0.9222, dev avg loss 0.204304, throughput 9.39575K wps
[Epoch 85 Batch 30/162] avg loss 0.00208433, throughput 9.5062K wps
[Epoch 85 Batch 60/162] avg loss 0.00177657, throughput 9.31559K wps
[Epoch 85 Batch 90/162] avg loss 0.00198931, throughput 9.29835K wps
[Epoch 85 Batch 120/162] avg loss 0.00200297, throughput 9.39833K wps
[Epoch 85 Batch 150/162] avg loss 0.00227748, throughput 9.29818K wps
Begin Testing...
[Epoch 85] train avg loss 0.00202292, dev acc 0.9256, dev avg loss 0.203931, throughput 9.3396K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/162] avg loss 0.00196756, throughput 9.52375K wps
[Epoch 86 Batch 60/162] avg loss 0.0019974, throughput 9.23258K wps
[Epoch 86 Batch 90/162] avg loss 0.00204913, throughput 9.34431K wps
[Epoch 86 Batch 120/162] avg loss 0.00179861, throughput 9.4073K wps
[Epoch 86 Batch 150/162] avg loss 0.0018706, throughput 9.37705K wps
Begin Testing...
[Epoch 86] train avg loss 0.0019604, dev acc 0.9244, dev avg loss 0.204496, throughput 9.37745K wps
[Epoch 87 Batch 30/162] avg loss 0.00190248, throughput 9.5779K wps
[Epoch 87 Batch 60/162] avg loss 0.00211798, throughput 9.29668K wps
[Epoch 87 Batch 90/162] avg loss 0.00191901, throughput 9.24771K wps
[Epoch 87 Batch 120/162] avg loss 0.00195456, throughput 9.44121K wps
[Epoch 87 Batch 150/162] avg loss 0.00204954, throughput 9.27785K wps
Begin Testing...
[Epoch 87] train avg loss 0.00199495, dev acc 0.9256, dev avg loss 0.203738, throughput 9.37875K wps
Observed Improvement.
Begin Testing...
[Epoch 88 Batch 30/162] avg loss 0.0019946, throughput 9.41906K wps
[Epoch 88 Batch 60/162] avg loss 0.00193633, throughput 9.30126K wps
[Epoch 88 Batch 90/162] avg loss 0.00185998, throughput 9.5232K wps
[Epoch 88 Batch 120/162] avg loss 0.00195051, throughput 9.43933K wps
[Epoch 88 Batch 150/162] avg loss 0.00214931, throughput 9.31672K wps
Begin Testing...
[Epoch 88] train avg loss 0.00195381, dev acc 0.9256, dev avg loss 0.203697, throughput 9.39728K wps
Observed Improvement.
Begin Testing...
[Epoch 89 Batch 30/162] avg loss 0.00192866, throughput 9.52825K wps
[Epoch 89 Batch 60/162] avg loss 0.00197791, throughput 9.26104K wps
[Epoch 89 Batch 90/162] avg loss 0.00193645, throughput 9.40345K wps
[Epoch 89 Batch 120/162] avg loss 0.00217938, throughput 9.37801K wps
[Epoch 89 Batch 150/162] avg loss 0.00199219, throughput 9.28022K wps
Begin Testing...
[Epoch 89] train avg loss 0.00200452, dev acc 0.9244, dev avg loss 0.204087, throughput 9.38397K wps
[Epoch 90 Batch 30/162] avg loss 0.00185857, throughput 9.51826K wps
[Epoch 90 Batch 60/162] avg loss 0.00202954, throughput 9.3183K wps
[Epoch 90 Batch 90/162] avg loss 0.00203731, throughput 9.44054K wps
[Epoch 90 Batch 120/162] avg loss 0.00187401, throughput 9.31878K wps
[Epoch 90 Batch 150/162] avg loss 0.00185091, throughput 9.23475K wps
Begin Testing...
[Epoch 90] train avg loss 0.00190673, dev acc 0.9256, dev avg loss 0.202735, throughput 9.38357K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/162] avg loss 0.00167195, throughput 9.48947K wps
[Epoch 91 Batch 60/162] avg loss 0.00180569, throughput 9.41164K wps
[Epoch 91 Batch 90/162] avg loss 0.00181314, throughput 9.38445K wps
[Epoch 91 Batch 120/162] avg loss 0.00202195, throughput 9.43237K wps
[Epoch 91 Batch 150/162] avg loss 0.00214542, throughput 9.30149K wps
Begin Testing...
[Epoch 91] train avg loss 0.00186402, dev acc 0.9233, dev avg loss 0.202484, throughput 9.41859K wps
[Epoch 92 Batch 30/162] avg loss 0.00202393, throughput 9.57583K wps
[Epoch 92 Batch 60/162] avg loss 0.00166262, throughput 9.4278K wps
[Epoch 92 Batch 90/162] avg loss 0.0018026, throughput 9.32236K wps
[Epoch 92 Batch 120/162] avg loss 0.0016359, throughput 9.46342K wps
[Epoch 92 Batch 150/162] avg loss 0.00196304, throughput 9.4185K wps
Begin Testing...
[Epoch 92] train avg loss 0.00184312, dev acc 0.9256, dev avg loss 0.202851, throughput 9.4406K wps
Observed Improvement.
Begin Testing...
[Epoch 93 Batch 30/162] avg loss 0.00192257, throughput 9.51555K wps
[Epoch 93 Batch 60/162] avg loss 0.00183915, throughput 9.23505K wps
[Epoch 93 Batch 90/162] avg loss 0.00188178, throughput 9.42059K wps
[Epoch 93 Batch 120/162] avg loss 0.00172586, throughput 9.35112K wps
[Epoch 93 Batch 150/162] avg loss 0.00162763, throughput 9.27694K wps
Begin Testing...
[Epoch 93] train avg loss 0.0018171, dev acc 0.9244, dev avg loss 0.202735, throughput 9.35486K wps
[Epoch 94 Batch 30/162] avg loss 0.00163137, throughput 9.48787K wps
[Epoch 94 Batch 60/162] avg loss 0.00172054, throughput 9.52759K wps
[Epoch 94 Batch 90/162] avg loss 0.00191297, throughput 9.29313K wps
[Epoch 94 Batch 120/162] avg loss 0.0017799, throughput 9.39914K wps
[Epoch 94 Batch 150/162] avg loss 0.00186745, throughput 9.41909K wps
Begin Testing...
[Epoch 94] train avg loss 0.00177816, dev acc 0.9267, dev avg loss 0.201452, throughput 9.4095K wps
Observed Improvement.
Begin Testing...
[Epoch 95 Batch 30/162] avg loss 0.00165395, throughput 9.54676K wps
[Epoch 95 Batch 60/162] avg loss 0.00188043, throughput 9.23283K wps
[Epoch 95 Batch 90/162] avg loss 0.00188678, throughput 9.33186K wps
[Epoch 95 Batch 120/162] avg loss 0.00183553, throughput 9.30612K wps
[Epoch 95 Batch 150/162] avg loss 0.00167964, throughput 9.36738K wps
Begin Testing...
[Epoch 95] train avg loss 0.00177887, dev acc 0.9278, dev avg loss 0.201506, throughput 9.35049K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/162] avg loss 0.0018067, throughput 9.36084K wps
[Epoch 96 Batch 60/162] avg loss 0.00150185, throughput 9.45998K wps
[Epoch 96 Batch 90/162] avg loss 0.00174129, throughput 9.40955K wps
[Epoch 96 Batch 120/162] avg loss 0.00168815, throughput 9.43815K wps
[Epoch 96 Batch 150/162] avg loss 0.00183747, throughput 9.38871K wps
Begin Testing...
[Epoch 96] train avg loss 0.00172719, dev acc 0.9256, dev avg loss 0.202846, throughput 9.4169K wps
[Epoch 97 Batch 30/162] avg loss 0.0016267, throughput 9.4709K wps
[Epoch 97 Batch 60/162] avg loss 0.00164389, throughput 9.51736K wps
[Epoch 97 Batch 90/162] avg loss 0.00153075, throughput 9.30254K wps
[Epoch 97 Batch 120/162] avg loss 0.00184117, throughput 9.53896K wps
[Epoch 97 Batch 150/162] avg loss 0.00179041, throughput 9.29699K wps
Begin Testing...
[Epoch 97] train avg loss 0.00169207, dev acc 0.9256, dev avg loss 0.201731, throughput 9.40968K wps
[Epoch 98 Batch 30/162] avg loss 0.00161255, throughput 9.55141K wps
[Epoch 98 Batch 60/162] avg loss 0.0017042, throughput 9.25578K wps
[Epoch 98 Batch 90/162] avg loss 0.00162267, throughput 9.31859K wps
[Epoch 98 Batch 120/162] avg loss 0.00183334, throughput 9.42684K wps
[Epoch 98 Batch 150/162] avg loss 0.0018736, throughput 9.26711K wps
Begin Testing...
[Epoch 98] train avg loss 0.00172534, dev acc 0.9278, dev avg loss 0.20133, throughput 9.35885K wps
Observed Improvement.
Begin Testing...
[Epoch 99 Batch 30/162] avg loss 0.00178203, throughput 9.69662K wps
[Epoch 99 Batch 60/162] avg loss 0.00168355, throughput 9.25415K wps
[Epoch 99 Batch 90/162] avg loss 0.00180107, throughput 9.45727K wps
[Epoch 99 Batch 120/162] avg loss 0.00156636, throughput 9.33142K wps
[Epoch 99 Batch 150/162] avg loss 0.00161922, throughput 9.42126K wps
Begin Testing...
[Epoch 99] train avg loss 0.00169234, dev acc 0.9278, dev avg loss 0.201103, throughput 9.43427K wps
Observed Improvement.
Begin Testing...
[Epoch 100 Batch 30/162] avg loss 0.0016259, throughput 9.72245K wps
[Epoch 100 Batch 60/162] avg loss 0.00157466, throughput 9.47123K wps
[Epoch 100 Batch 90/162] avg loss 0.00155395, throughput 9.48682K wps
[Epoch 100 Batch 120/162] avg loss 0.00169453, throughput 9.49779K wps
[Epoch 100 Batch 150/162] avg loss 0.00187769, throughput 9.32434K wps
Begin Testing...
[Epoch 100] train avg loss 0.00167188, dev acc 0.9278, dev avg loss 0.201584, throughput 9.50583K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/162] avg loss 0.00150735, throughput 9.55371K wps
[Epoch 101 Batch 60/162] avg loss 0.00189267, throughput 9.30413K wps
[Epoch 101 Batch 90/162] avg loss 0.00151687, throughput 9.31869K wps
[Epoch 101 Batch 120/162] avg loss 0.00165239, throughput 9.52223K wps
[Epoch 101 Batch 150/162] avg loss 0.0016823, throughput 9.32521K wps
Begin Testing...
[Epoch 101] train avg loss 0.00164527, dev acc 0.9244, dev avg loss 0.201661, throughput 9.39745K wps
[Epoch 102 Batch 30/162] avg loss 0.00150766, throughput 9.43727K wps
[Epoch 102 Batch 60/162] avg loss 0.00164845, throughput 9.2507K wps
[Epoch 102 Batch 90/162] avg loss 0.00175769, throughput 9.35673K wps
[Epoch 102 Batch 120/162] avg loss 0.00178624, throughput 9.4864K wps
[Epoch 102 Batch 150/162] avg loss 0.00154615, throughput 9.39803K wps
Begin Testing...
[Epoch 102] train avg loss 0.00164297, dev acc 0.9256, dev avg loss 0.201279, throughput 9.4006K wps
[Epoch 103 Batch 30/162] avg loss 0.00161365, throughput 9.58287K wps
[Epoch 103 Batch 60/162] avg loss 0.00161417, throughput 9.36368K wps
[Epoch 103 Batch 90/162] avg loss 0.00167703, throughput 9.27214K wps
[Epoch 103 Batch 120/162] avg loss 0.00152906, throughput 9.445K wps
[Epoch 103 Batch 150/162] avg loss 0.00155203, throughput 9.4719K wps
Begin Testing...
[Epoch 103] train avg loss 0.00158686, dev acc 0.9267, dev avg loss 0.20078, throughput 9.41522K wps
[Epoch 104 Batch 30/162] avg loss 0.0015189, throughput 9.55253K wps
[Epoch 104 Batch 60/162] avg loss 0.00157563, throughput 9.37204K wps
[Epoch 104 Batch 90/162] avg loss 0.00151875, throughput 9.47342K wps
[Epoch 104 Batch 120/162] avg loss 0.00146111, throughput 9.28916K wps
[Epoch 104 Batch 150/162] avg loss 0.00167138, throughput 9.40638K wps
Begin Testing...
[Epoch 104] train avg loss 0.00155683, dev acc 0.9267, dev avg loss 0.200666, throughput 9.42508K wps
[Epoch 105 Batch 30/162] avg loss 0.00149955, throughput 9.63626K wps
[Epoch 105 Batch 60/162] avg loss 0.00157579, throughput 9.2207K wps
[Epoch 105 Batch 90/162] avg loss 0.00165946, throughput 9.4933K wps
[Epoch 105 Batch 120/162] avg loss 0.00146336, throughput 9.41137K wps
[Epoch 105 Batch 150/162] avg loss 0.00153732, throughput 9.30036K wps
Begin Testing...
[Epoch 105] train avg loss 0.00155901, dev acc 0.9289, dev avg loss 0.199678, throughput 9.40358K wps
Observed Improvement.
Begin Testing...
[Epoch 106 Batch 30/162] avg loss 0.00142332, throughput 9.66661K wps
[Epoch 106 Batch 60/162] avg loss 0.00158049, throughput 9.40541K wps
[Epoch 106 Batch 90/162] avg loss 0.00157714, throughput 9.41834K wps
[Epoch 106 Batch 120/162] avg loss 0.00141014, throughput 9.35045K wps
[Epoch 106 Batch 150/162] avg loss 0.00164792, throughput 9.43854K wps
Begin Testing...
[Epoch 106] train avg loss 0.00152312, dev acc 0.9278, dev avg loss 0.200344, throughput 9.4436K wps
[Epoch 107 Batch 30/162] avg loss 0.00140457, throughput 9.57794K wps
[Epoch 107 Batch 60/162] avg loss 0.00154623, throughput 9.51554K wps
[Epoch 107 Batch 90/162] avg loss 0.00144479, throughput 9.43262K wps
[Epoch 107 Batch 120/162] avg loss 0.00160065, throughput 9.45721K wps
[Epoch 107 Batch 150/162] avg loss 0.0015154, throughput 9.43334K wps
Begin Testing...
[Epoch 107] train avg loss 0.00151217, dev acc 0.9244, dev avg loss 0.201207, throughput 9.46985K wps
[Epoch 108 Batch 30/162] avg loss 0.00157157, throughput 9.5936K wps
[Epoch 108 Batch 60/162] avg loss 0.00155498, throughput 9.36322K wps
[Epoch 108 Batch 90/162] avg loss 0.00149941, throughput 9.37888K wps
[Epoch 108 Batch 120/162] avg loss 0.00141142, throughput 9.22913K wps
[Epoch 108 Batch 150/162] avg loss 0.00173523, throughput 9.35308K wps
Begin Testing...
[Epoch 108] train avg loss 0.00154691, dev acc 0.9278, dev avg loss 0.200023, throughput 9.38223K wps
[Epoch 109 Batch 30/162] avg loss 0.0013554, throughput 9.6K wps
[Epoch 109 Batch 60/162] avg loss 0.00149964, throughput 9.3052K wps
[Epoch 109 Batch 90/162] avg loss 0.00153402, throughput 9.29517K wps
[Epoch 109 Batch 120/162] avg loss 0.001613, throughput 9.28471K wps
[Epoch 109 Batch 150/162] avg loss 0.00141026, throughput 9.37518K wps
Begin Testing...
[Epoch 109] train avg loss 0.00150107, dev acc 0.9278, dev avg loss 0.199981, throughput 9.3516K wps
[Epoch 110 Batch 30/162] avg loss 0.00147237, throughput 9.49169K wps
[Epoch 110 Batch 60/162] avg loss 0.00133405, throughput 9.4005K wps
[Epoch 110 Batch 90/162] avg loss 0.00153076, throughput 9.60073K wps
[Epoch 110 Batch 120/162] avg loss 0.00142159, throughput 9.35705K wps
[Epoch 110 Batch 150/162] avg loss 0.00150565, throughput 9.50525K wps
Begin Testing...
[Epoch 110] train avg loss 0.00147086, dev acc 0.9278, dev avg loss 0.199383, throughput 9.4574K wps
[Epoch 111 Batch 30/162] avg loss 0.00139733, throughput 9.35636K wps
[Epoch 111 Batch 60/162] avg loss 0.00153088, throughput 9.3379K wps
[Epoch 111 Batch 90/162] avg loss 0.00133508, throughput 9.30945K wps
[Epoch 111 Batch 120/162] avg loss 0.00143346, throughput 9.27795K wps
[Epoch 111 Batch 150/162] avg loss 0.00156747, throughput 9.28131K wps
Begin Testing...
[Epoch 111] train avg loss 0.00145299, dev acc 0.9278, dev avg loss 0.199196, throughput 9.32067K wps
[Epoch 112 Batch 30/162] avg loss 0.00141012, throughput 9.49453K wps
[Epoch 112 Batch 60/162] avg loss 0.00164314, throughput 9.31206K wps
[Epoch 112 Batch 90/162] avg loss 0.00135187, throughput 9.31677K wps
[Epoch 112 Batch 120/162] avg loss 0.00145206, throughput 9.49807K wps
[Epoch 112 Batch 150/162] avg loss 0.00138705, throughput 9.59828K wps
Begin Testing...
[Epoch 112] train avg loss 0.0014432, dev acc 0.9311, dev avg loss 0.199142, throughput 9.44057K wps
Observed Improvement.
Begin Testing...
[Epoch 113 Batch 30/162] avg loss 0.0013675, throughput 9.80452K wps
[Epoch 113 Batch 60/162] avg loss 0.00140924, throughput 9.24686K wps
[Epoch 113 Batch 90/162] avg loss 0.00136147, throughput 9.41749K wps
[Epoch 113 Batch 120/162] avg loss 0.00138529, throughput 9.19369K wps
[Epoch 113 Batch 150/162] avg loss 0.00147477, throughput 9.44531K wps
Begin Testing...
[Epoch 113] train avg loss 0.00138702, dev acc 0.9289, dev avg loss 0.199933, throughput 9.4024K wps
[Epoch 114 Batch 30/162] avg loss 0.00127599, throughput 9.65191K wps
[Epoch 114 Batch 60/162] avg loss 0.00142577, throughput 9.34895K wps
[Epoch 114 Batch 90/162] avg loss 0.00155588, throughput 9.31974K wps
[Epoch 114 Batch 120/162] avg loss 0.00124705, throughput 9.319K wps
[Epoch 114 Batch 150/162] avg loss 0.00156108, throughput 9.2978K wps
Begin Testing...
[Epoch 114] train avg loss 0.00140714, dev acc 0.9267, dev avg loss 0.199581, throughput 9.38194K wps
[Epoch 115 Batch 30/162] avg loss 0.0012881, throughput 9.43695K wps
[Epoch 115 Batch 60/162] avg loss 0.00130016, throughput 9.57039K wps
[Epoch 115 Batch 90/162] avg loss 0.00149962, throughput 9.3521K wps
[Epoch 115 Batch 120/162] avg loss 0.00131644, throughput 9.29963K wps
[Epoch 115 Batch 150/162] avg loss 0.00136541, throughput 9.43992K wps
Begin Testing...
[Epoch 115] train avg loss 0.00136657, dev acc 0.9267, dev avg loss 0.199445, throughput 9.40994K wps
[Epoch 116 Batch 30/162] avg loss 0.00144005, throughput 9.5065K wps
[Epoch 116 Batch 60/162] avg loss 0.00121961, throughput 9.38819K wps
[Epoch 116 Batch 90/162] avg loss 0.00146966, throughput 9.44021K wps
[Epoch 116 Batch 120/162] avg loss 0.00128273, throughput 9.38758K wps
[Epoch 116 Batch 150/162] avg loss 0.00126835, throughput 9.37983K wps
Begin Testing...
[Epoch 116] train avg loss 0.00134921, dev acc 0.9278, dev avg loss 0.19927, throughput 9.41364K wps
[Epoch 117 Batch 30/162] avg loss 0.00128174, throughput 9.70739K wps
[Epoch 117 Batch 60/162] avg loss 0.00136264, throughput 9.47089K wps
[Epoch 117 Batch 90/162] avg loss 0.00132684, throughput 9.28686K wps
[Epoch 117 Batch 120/162] avg loss 0.00131676, throughput 9.39006K wps
[Epoch 117 Batch 150/162] avg loss 0.00164203, throughput 9.41568K wps
Begin Testing...
[Epoch 117] train avg loss 0.00136938, dev acc 0.9256, dev avg loss 0.200608, throughput 9.46134K wps
[Epoch 118 Batch 30/162] avg loss 0.00146328, throughput 9.43066K wps
[Epoch 118 Batch 60/162] avg loss 0.00131696, throughput 9.39652K wps
[Epoch 118 Batch 90/162] avg loss 0.00116184, throughput 9.40948K wps
[Epoch 118 Batch 120/162] avg loss 0.00123258, throughput 9.44742K wps
[Epoch 118 Batch 150/162] avg loss 0.00131881, throughput 9.45972K wps
Begin Testing...
[Epoch 118] train avg loss 0.00130584, dev acc 0.9311, dev avg loss 0.199374, throughput 9.42266K wps
Observed Improvement.
Begin Testing...
[Epoch 119 Batch 30/162] avg loss 0.00122319, throughput 9.47212K wps
[Epoch 119 Batch 60/162] avg loss 0.00136209, throughput 9.36105K wps
[Epoch 119 Batch 90/162] avg loss 0.00125783, throughput 9.22653K wps
[Epoch 119 Batch 120/162] avg loss 0.00128243, throughput 9.20817K wps
[Epoch 119 Batch 150/162] avg loss 0.00131484, throughput 9.42213K wps
Begin Testing...
[Epoch 119] train avg loss 0.00129441, dev acc 0.9300, dev avg loss 0.199031, throughput 9.32579K wps
[Epoch 120 Batch 30/162] avg loss 0.00126199, throughput 9.60037K wps
[Epoch 120 Batch 60/162] avg loss 0.00133035, throughput 9.45657K wps
[Epoch 120 Batch 90/162] avg loss 0.00128758, throughput 9.28242K wps
[Epoch 120 Batch 120/162] avg loss 0.00128132, throughput 9.41893K wps
[Epoch 120 Batch 150/162] avg loss 0.00130215, throughput 9.36201K wps
Begin Testing...
[Epoch 120] train avg loss 0.00129626, dev acc 0.9267, dev avg loss 0.199177, throughput 9.41975K wps
[Epoch 121 Batch 30/162] avg loss 0.00114233, throughput 9.58115K wps
[Epoch 121 Batch 60/162] avg loss 0.00125421, throughput 9.46377K wps
[Epoch 121 Batch 90/162] avg loss 0.00131915, throughput 9.26709K wps
[Epoch 121 Batch 120/162] avg loss 0.0011579, throughput 9.42059K wps
[Epoch 121 Batch 150/162] avg loss 0.00128974, throughput 9.44377K wps
Begin Testing...
[Epoch 121] train avg loss 0.00125055, dev acc 0.9256, dev avg loss 0.199479, throughput 9.42811K wps
[Epoch 122 Batch 30/162] avg loss 0.0012395, throughput 9.58242K wps
[Epoch 122 Batch 60/162] avg loss 0.00127934, throughput 9.32832K wps
[Epoch 122 Batch 90/162] avg loss 0.00135989, throughput 9.45706K wps
[Epoch 122 Batch 120/162] avg loss 0.00122102, throughput 9.56583K wps
[Epoch 122 Batch 150/162] avg loss 0.00137403, throughput 9.3757K wps
Begin Testing...
[Epoch 122] train avg loss 0.00127423, dev acc 0.9256, dev avg loss 0.200026, throughput 9.45075K wps
[Epoch 123 Batch 30/162] avg loss 0.00123097, throughput 9.64759K wps
[Epoch 123 Batch 60/162] avg loss 0.00124824, throughput 9.24797K wps
[Epoch 123 Batch 90/162] avg loss 0.00129555, throughput 9.52339K wps
[Epoch 123 Batch 120/162] avg loss 0.00129079, throughput 9.43908K wps
[Epoch 123 Batch 150/162] avg loss 0.00116047, throughput 9.38876K wps
Begin Testing...
[Epoch 123] train avg loss 0.00125174, dev acc 0.9322, dev avg loss 0.198537, throughput 9.44336K wps
Observed Improvement.
Begin Testing...
[Epoch 124 Batch 30/162] avg loss 0.00120542, throughput 9.45576K wps
[Epoch 124 Batch 60/162] avg loss 0.00110786, throughput 9.42433K wps
[Epoch 124 Batch 90/162] avg loss 0.00113806, throughput 9.27189K wps
[Epoch 124 Batch 120/162] avg loss 0.00112598, throughput 9.42717K wps
[Epoch 124 Batch 150/162] avg loss 0.00146502, throughput 9.23453K wps
Begin Testing...
[Epoch 124] train avg loss 0.00120405, dev acc 0.9311, dev avg loss 0.198407, throughput 9.36526K wps
[Epoch 125 Batch 30/162] avg loss 0.00114488, throughput 9.41631K wps
[Epoch 125 Batch 60/162] avg loss 0.00112474, throughput 9.16766K wps
[Epoch 125 Batch 90/162] avg loss 0.00120823, throughput 9.31222K wps
[Epoch 125 Batch 120/162] avg loss 0.00131459, throughput 9.54004K wps
[Epoch 125 Batch 150/162] avg loss 0.00123069, throughput 9.33013K wps
Begin Testing...
[Epoch 125] train avg loss 0.00119058, dev acc 0.9289, dev avg loss 0.19879, throughput 9.34311K wps
[Epoch 126 Batch 30/162] avg loss 0.00115111, throughput 9.4237K wps
[Epoch 126 Batch 60/162] avg loss 0.00115781, throughput 9.33512K wps
[Epoch 126 Batch 90/162] avg loss 0.00119558, throughput 9.18513K wps
[Epoch 126 Batch 120/162] avg loss 0.00126233, throughput 9.22263K wps
[Epoch 126 Batch 150/162] avg loss 0.00114824, throughput 9.25024K wps
Begin Testing...
[Epoch 126] train avg loss 0.00117731, dev acc 0.9267, dev avg loss 0.199156, throughput 9.28786K wps
[Epoch 127 Batch 30/162] avg loss 0.00108638, throughput 9.64139K wps
[Epoch 127 Batch 60/162] avg loss 0.00122516, throughput 9.32021K wps
[Epoch 127 Batch 90/162] avg loss 0.00125105, throughput 9.32356K wps
[Epoch 127 Batch 120/162] avg loss 0.00113019, throughput 9.35535K wps
[Epoch 127 Batch 150/162] avg loss 0.00131444, throughput 9.29766K wps
Begin Testing...
[Epoch 127] train avg loss 0.00118893, dev acc 0.9322, dev avg loss 0.19873, throughput 9.36964K wps
Observed Improvement.
Begin Testing...
[Epoch 128 Batch 30/162] avg loss 0.00130396, throughput 9.71072K wps
[Epoch 128 Batch 60/162] avg loss 0.00114678, throughput 9.37894K wps
[Epoch 128 Batch 90/162] avg loss 0.00102016, throughput 9.46432K wps
[Epoch 128 Batch 120/162] avg loss 0.00126073, throughput 9.26486K wps
[Epoch 128 Batch 150/162] avg loss 0.00125993, throughput 9.30034K wps
Begin Testing...
[Epoch 128] train avg loss 0.00119159, dev acc 0.9311, dev avg loss 0.199058, throughput 9.42134K wps
[Epoch 129 Batch 30/162] avg loss 0.00113545, throughput 9.54132K wps
[Epoch 129 Batch 60/162] avg loss 0.00119513, throughput 9.4028K wps
[Epoch 129 Batch 90/162] avg loss 0.00116344, throughput 9.2748K wps
[Epoch 129 Batch 120/162] avg loss 0.0011865, throughput 9.5926K wps
[Epoch 129 Batch 150/162] avg loss 0.00122651, throughput 9.59699K wps
Begin Testing...
[Epoch 129] train avg loss 0.00117882, dev acc 0.9311, dev avg loss 0.199058, throughput 9.48258K wps
[Epoch 130 Batch 30/162] avg loss 0.00106041, throughput 9.54652K wps
[Epoch 130 Batch 60/162] avg loss 0.00111793, throughput 9.4356K wps
[Epoch 130 Batch 90/162] avg loss 0.00113075, throughput 9.45868K wps
[Epoch 130 Batch 120/162] avg loss 0.00121747, throughput 9.33525K wps
[Epoch 130 Batch 150/162] avg loss 0.00117444, throughput 9.33262K wps
Begin Testing...
[Epoch 130] train avg loss 0.00114068, dev acc 0.9311, dev avg loss 0.199081, throughput 9.42246K wps
[Epoch 131 Batch 30/162] avg loss 0.00114748, throughput 9.52469K wps
[Epoch 131 Batch 60/162] avg loss 0.00116264, throughput 9.48833K wps
[Epoch 131 Batch 90/162] avg loss 0.00111973, throughput 9.24726K wps
[Epoch 131 Batch 120/162] avg loss 0.0010949, throughput 9.43146K wps
[Epoch 131 Batch 150/162] avg loss 0.00104831, throughput 9.30233K wps
Begin Testing...
[Epoch 131] train avg loss 0.00111375, dev acc 0.9311, dev avg loss 0.199268, throughput 9.40373K wps
[Epoch 132 Batch 30/162] avg loss 0.00117632, throughput 9.65601K wps
[Epoch 132 Batch 60/162] avg loss 0.00114367, throughput 9.38117K wps
[Epoch 132 Batch 90/162] avg loss 0.00119868, throughput 9.33662K wps
[Epoch 132 Batch 120/162] avg loss 0.00114817, throughput 9.44067K wps
[Epoch 132 Batch 150/162] avg loss 0.00101591, throughput 9.5173K wps
Begin Testing...
[Epoch 132] train avg loss 0.00113164, dev acc 0.9311, dev avg loss 0.199261, throughput 9.4615K wps
[Epoch 133 Batch 30/162] avg loss 0.0010862, throughput 9.70623K wps
[Epoch 133 Batch 60/162] avg loss 0.00109126, throughput 9.31304K wps
[Epoch 133 Batch 90/162] avg loss 0.00102857, throughput 9.47679K wps
[Epoch 133 Batch 120/162] avg loss 0.0010486, throughput 9.32354K wps
[Epoch 133 Batch 150/162] avg loss 0.00107382, throughput 9.23628K wps
Begin Testing...
[Epoch 133] train avg loss 0.00108574, dev acc 0.9322, dev avg loss 0.199188, throughput 9.41648K wps
Observed Improvement.
Begin Testing...
[Epoch 134 Batch 30/162] avg loss 0.00115467, throughput 9.47643K wps
[Epoch 134 Batch 60/162] avg loss 0.00110443, throughput 9.4626K wps
[Epoch 134 Batch 90/162] avg loss 0.00110047, throughput 9.21819K wps
[Epoch 134 Batch 120/162] avg loss 0.00104304, throughput 9.3083K wps
[Epoch 134 Batch 150/162] avg loss 0.00117293, throughput 9.41173K wps
Begin Testing...
[Epoch 134] train avg loss 0.00110549, dev acc 0.9322, dev avg loss 0.198988, throughput 9.36432K wps
Observed Improvement.
Begin Testing...
[Epoch 135 Batch 30/162] avg loss 0.000991745, throughput 9.65294K wps
[Epoch 135 Batch 60/162] avg loss 0.0010149, throughput 9.22678K wps
[Epoch 135 Batch 90/162] avg loss 0.00119886, throughput 9.41089K wps
[Epoch 135 Batch 120/162] avg loss 0.00118241, throughput 9.22216K wps
[Epoch 135 Batch 150/162] avg loss 0.00113847, throughput 9.36905K wps
Begin Testing...
[Epoch 135] train avg loss 0.0011135, dev acc 0.9300, dev avg loss 0.198867, throughput 9.36878K wps
[Epoch 136 Batch 30/162] avg loss 0.00108496, throughput 9.4102K wps
[Epoch 136 Batch 60/162] avg loss 0.00113656, throughput 9.34808K wps
[Epoch 136 Batch 90/162] avg loss 0.00104562, throughput 9.44032K wps
[Epoch 136 Batch 120/162] avg loss 0.000964258, throughput 9.44368K wps
[Epoch 136 Batch 150/162] avg loss 0.00103422, throughput 9.45424K wps
Begin Testing...
[Epoch 136] train avg loss 0.00105575, dev acc 0.9311, dev avg loss 0.198962, throughput 9.41019K wps
[Epoch 137 Batch 30/162] avg loss 0.000985504, throughput 9.58277K wps
[Epoch 137 Batch 60/162] avg loss 0.00108232, throughput 9.46336K wps
[Epoch 137 Batch 90/162] avg loss 0.00111065, throughput 9.5478K wps
[Epoch 137 Batch 120/162] avg loss 0.000938915, throughput 9.51131K wps
[Epoch 137 Batch 150/162] avg loss 0.000987049, throughput 9.42027K wps
Begin Testing...
[Epoch 137] train avg loss 0.00104075, dev acc 0.9300, dev avg loss 0.198662, throughput 9.49439K wps
[Epoch 138 Batch 30/162] avg loss 0.000983391, throughput 9.45691K wps
[Epoch 138 Batch 60/162] avg loss 0.00104918, throughput 9.3775K wps
[Epoch 138 Batch 90/162] avg loss 0.00114148, throughput 9.29709K wps
[Epoch 138 Batch 120/162] avg loss 0.00105716, throughput 9.47397K wps
[Epoch 138 Batch 150/162] avg loss 0.000948318, throughput 9.33785K wps
Begin Testing...
[Epoch 138] train avg loss 0.00103061, dev acc 0.9322, dev avg loss 0.198622, throughput 9.37361K wps
Observed Improvement.
Begin Testing...
[Epoch 139 Batch 30/162] avg loss 0.00101436, throughput 9.4718K wps
[Epoch 139 Batch 60/162] avg loss 0.00108273, throughput 9.42683K wps
[Epoch 139 Batch 90/162] avg loss 0.00106279, throughput 9.25619K wps
[Epoch 139 Batch 120/162] avg loss 0.0011005, throughput 9.53493K wps
[Epoch 139 Batch 150/162] avg loss 0.00114215, throughput 9.4667K wps
Begin Testing...
[Epoch 139] train avg loss 0.00107766, dev acc 0.9267, dev avg loss 0.199262, throughput 9.43912K wps
[Epoch 140 Batch 30/162] avg loss 0.00115548, throughput 9.67493K wps
[Epoch 140 Batch 60/162] avg loss 0.000960541, throughput 9.40587K wps
[Epoch 140 Batch 90/162] avg loss 0.0010266, throughput 9.3468K wps
[Epoch 140 Batch 120/162] avg loss 0.00101439, throughput 9.3208K wps
[Epoch 140 Batch 150/162] avg loss 0.000913721, throughput 9.59213K wps
Begin Testing...
[Epoch 140] train avg loss 0.00102539, dev acc 0.9278, dev avg loss 0.199041, throughput 9.46759K wps
[Epoch 141 Batch 30/162] avg loss 0.00104014, throughput 9.61873K wps
[Epoch 141 Batch 60/162] avg loss 0.00105698, throughput 9.27773K wps
[Epoch 141 Batch 90/162] avg loss 0.00100694, throughput 9.45673K wps
[Epoch 141 Batch 120/162] avg loss 0.00105628, throughput 9.39983K wps
[Epoch 141 Batch 150/162] avg loss 0.00104997, throughput 9.34983K wps
Begin Testing...
[Epoch 141] train avg loss 0.00103387, dev acc 0.9322, dev avg loss 0.199055, throughput 9.40671K wps
Observed Improvement.
Begin Testing...
[Epoch 142 Batch 30/162] avg loss 0.000887658, throughput 9.55299K wps
[Epoch 142 Batch 60/162] avg loss 0.00104784, throughput 9.33178K wps
[Epoch 142 Batch 90/162] avg loss 0.000964878, throughput 9.32028K wps
[Epoch 142 Batch 120/162] avg loss 0.00104766, throughput 9.34455K wps
[Epoch 142 Batch 150/162] avg loss 0.00104082, throughput 9.29959K wps
Begin Testing...
[Epoch 142] train avg loss 0.000993459, dev acc 0.9344, dev avg loss 0.198146, throughput 9.36844K wps
Observed Improvement.
Begin Testing...
[Epoch 143 Batch 30/162] avg loss 0.000976041, throughput 9.57822K wps
[Epoch 143 Batch 60/162] avg loss 0.000971394, throughput 9.26075K wps
[Epoch 143 Batch 90/162] avg loss 0.000987658, throughput 9.34284K wps
[Epoch 143 Batch 120/162] avg loss 0.00105291, throughput 9.36807K wps
[Epoch 143 Batch 150/162] avg loss 0.000950739, throughput 9.50905K wps
Begin Testing...
[Epoch 143] train avg loss 0.000973638, dev acc 0.9267, dev avg loss 0.199737, throughput 9.41619K wps
[Epoch 144 Batch 30/162] avg loss 0.000952713, throughput 9.41633K wps
[Epoch 144 Batch 60/162] avg loss 0.00102474, throughput 9.38271K wps
[Epoch 144 Batch 90/162] avg loss 0.000948891, throughput 9.46628K wps
[Epoch 144 Batch 120/162] avg loss 0.000970719, throughput 9.53042K wps
[Epoch 144 Batch 150/162] avg loss 0.0010117, throughput 9.55971K wps
Begin Testing...
[Epoch 144] train avg loss 0.000975485, dev acc 0.9333, dev avg loss 0.198338, throughput 9.47488K wps
[Epoch 145 Batch 30/162] avg loss 0.000999671, throughput 9.57715K wps
[Epoch 145 Batch 60/162] avg loss 0.0010243, throughput 9.52931K wps
[Epoch 145 Batch 90/162] avg loss 0.000994705, throughput 9.38481K wps
[Epoch 145 Batch 120/162] avg loss 0.000852265, throughput 9.36572K wps
[Epoch 145 Batch 150/162] avg loss 0.000894153, throughput 9.31787K wps
Begin Testing...
[Epoch 145] train avg loss 0.000959826, dev acc 0.9322, dev avg loss 0.197951, throughput 9.42148K wps
[Epoch 146 Batch 30/162] avg loss 0.0009213, throughput 9.51997K wps
[Epoch 146 Batch 60/162] avg loss 0.000996457, throughput 9.52512K wps
[Epoch 146 Batch 90/162] avg loss 0.000974005, throughput 9.33838K wps
[Epoch 146 Batch 120/162] avg loss 0.000958056, throughput 9.38998K wps
[Epoch 146 Batch 150/162] avg loss 0.000898112, throughput 9.30961K wps
Begin Testing...
[Epoch 146] train avg loss 0.000944362, dev acc 0.9311, dev avg loss 0.198404, throughput 9.40715K wps
[Epoch 147 Batch 30/162] avg loss 0.000833541, throughput 9.51444K wps
[Epoch 147 Batch 60/162] avg loss 0.000823794, throughput 9.5389K wps
[Epoch 147 Batch 90/162] avg loss 0.00104564, throughput 9.4814K wps
[Epoch 147 Batch 120/162] avg loss 0.00101268, throughput 9.45189K wps
[Epoch 147 Batch 150/162] avg loss 0.00108087, throughput 9.47941K wps
Begin Testing...
[Epoch 147] train avg loss 0.000962311, dev acc 0.9322, dev avg loss 0.198059, throughput 9.48631K wps
[Epoch 148 Batch 30/162] avg loss 0.000961202, throughput 9.59616K wps
[Epoch 148 Batch 60/162] avg loss 0.000909715, throughput 9.2111K wps
[Epoch 148 Batch 90/162] avg loss 0.00097298, throughput 9.42474K wps
[Epoch 148 Batch 120/162] avg loss 0.000981802, throughput 9.46109K wps
[Epoch 148 Batch 150/162] avg loss 0.000926924, throughput 9.42846K wps
Begin Testing...
[Epoch 148] train avg loss 0.000952834, dev acc 0.9300, dev avg loss 0.198461, throughput 9.42231K wps
[Epoch 149 Batch 30/162] avg loss 0.00094544, throughput 9.59141K wps
[Epoch 149 Batch 60/162] avg loss 0.000798458, throughput 9.39647K wps
[Epoch 149 Batch 90/162] avg loss 0.000941157, throughput 9.48778K wps
[Epoch 149 Batch 120/162] avg loss 0.00103026, throughput 9.37045K wps
[Epoch 149 Batch 150/162] avg loss 0.00100158, throughput 9.4224K wps
Begin Testing...
[Epoch 149] train avg loss 0.00095037, dev acc 0.9322, dev avg loss 0.199338, throughput 9.45777K wps
[Epoch 150 Batch 30/162] avg loss 0.000941625, throughput 9.73412K wps
[Epoch 150 Batch 60/162] avg loss 0.000918315, throughput 9.47646K wps
[Epoch 150 Batch 90/162] avg loss 0.000879122, throughput 9.38709K wps
[Epoch 150 Batch 120/162] avg loss 0.000972516, throughput 9.3787K wps
[Epoch 150 Batch 150/162] avg loss 0.00093079, throughput 9.26623K wps
Begin Testing...
[Epoch 150] train avg loss 0.000932415, dev acc 0.9322, dev avg loss 0.198455, throughput 9.44923K wps
[Epoch 151 Batch 30/162] avg loss 0.00101544, throughput 9.52437K wps
[Epoch 151 Batch 60/162] avg loss 0.000847919, throughput 9.28802K wps
[Epoch 151 Batch 90/162] avg loss 0.000894011, throughput 9.41631K wps
[Epoch 151 Batch 120/162] avg loss 0.000859629, throughput 9.24288K wps
[Epoch 151 Batch 150/162] avg loss 0.000998474, throughput 9.22177K wps
Begin Testing...
[Epoch 151] train avg loss 0.000921964, dev acc 0.9289, dev avg loss 0.198911, throughput 9.33929K wps
[Epoch 152 Batch 30/162] avg loss 0.000965267, throughput 9.56187K wps
[Epoch 152 Batch 60/162] avg loss 0.000948554, throughput 9.28427K wps
[Epoch 152 Batch 90/162] avg loss 0.000796564, throughput 9.40172K wps
[Epoch 152 Batch 120/162] avg loss 0.000977773, throughput 9.28411K wps
[Epoch 152 Batch 150/162] avg loss 0.00079192, throughput 9.35199K wps
Begin Testing...
[Epoch 152] train avg loss 0.000899693, dev acc 0.9344, dev avg loss 0.198833, throughput 9.37892K wps
Observed Improvement.
Begin Testing...
[Epoch 153 Batch 30/162] avg loss 0.000805971, throughput 9.44939K wps
[Epoch 153 Batch 60/162] avg loss 0.000873838, throughput 9.18985K wps
[Epoch 153 Batch 90/162] avg loss 0.000994105, throughput 9.57794K wps
[Epoch 153 Batch 120/162] avg loss 0.000861975, throughput 9.38475K wps
[Epoch 153 Batch 150/162] avg loss 0.000864721, throughput 9.4132K wps
Begin Testing...
[Epoch 153] train avg loss 0.000902006, dev acc 0.9289, dev avg loss 0.199371, throughput 9.4132K wps
[Epoch 154 Batch 30/162] avg loss 0.000854582, throughput 9.56797K wps
[Epoch 154 Batch 60/162] avg loss 0.000876126, throughput 9.38431K wps
[Epoch 154 Batch 90/162] avg loss 0.000881669, throughput 9.35013K wps
[Epoch 154 Batch 120/162] avg loss 0.000845758, throughput 9.40708K wps
[Epoch 154 Batch 150/162] avg loss 0.000901686, throughput 9.38583K wps
Begin Testing...
[Epoch 154] train avg loss 0.000883258, dev acc 0.9311, dev avg loss 0.198855, throughput 9.4104K wps
[Epoch 155 Batch 30/162] avg loss 0.000877631, throughput 9.62206K wps
[Epoch 155 Batch 60/162] avg loss 0.00094612, throughput 9.55996K wps
[Epoch 155 Batch 90/162] avg loss 0.000838889, throughput 9.31259K wps
[Epoch 155 Batch 120/162] avg loss 0.000806639, throughput 9.41421K wps
[Epoch 155 Batch 150/162] avg loss 0.000920395, throughput 9.22436K wps
Begin Testing...
[Epoch 155] train avg loss 0.000877413, dev acc 0.9344, dev avg loss 0.199072, throughput 9.4179K wps
Observed Improvement.
Begin Testing...
[Epoch 156 Batch 30/162] avg loss 0.000917375, throughput 9.66404K wps
[Epoch 156 Batch 60/162] avg loss 0.000891523, throughput 9.58484K wps
[Epoch 156 Batch 90/162] avg loss 0.000843721, throughput 9.5537K wps
[Epoch 156 Batch 120/162] avg loss 0.000926243, throughput 9.5482K wps
[Epoch 156 Batch 150/162] avg loss 0.000885359, throughput 9.49341K wps
Begin Testing...
[Epoch 156] train avg loss 0.000888057, dev acc 0.9300, dev avg loss 0.199528, throughput 9.54813K wps
[Epoch 157 Batch 30/162] avg loss 0.000883461, throughput 9.73445K wps
[Epoch 157 Batch 60/162] avg loss 0.00081473, throughput 9.37778K wps
[Epoch 157 Batch 90/162] avg loss 0.000874448, throughput 9.42499K wps
[Epoch 157 Batch 120/162] avg loss 0.000875227, throughput 9.26044K wps
[Epoch 157 Batch 150/162] avg loss 0.000830335, throughput 9.40601K wps
Begin Testing...
[Epoch 157] train avg loss 0.000852687, dev acc 0.9311, dev avg loss 0.199195, throughput 9.43721K wps
[Epoch 158 Batch 30/162] avg loss 0.00079989, throughput 9.70167K wps
[Epoch 158 Batch 60/162] avg loss 0.000988619, throughput 9.3357K wps
[Epoch 158 Batch 90/162] avg loss 0.000847354, throughput 9.32659K wps
[Epoch 158 Batch 120/162] avg loss 0.000791548, throughput 9.37937K wps
[Epoch 158 Batch 150/162] avg loss 0.000873491, throughput 9.41736K wps
Begin Testing...
[Epoch 158] train avg loss 0.000855033, dev acc 0.9311, dev avg loss 0.19911, throughput 9.42603K wps
[Epoch 159 Batch 30/162] avg loss 0.00077084, throughput 9.67629K wps
[Epoch 159 Batch 60/162] avg loss 0.000712891, throughput 9.58881K wps
[Epoch 159 Batch 90/162] avg loss 0.000753351, throughput 9.41029K wps
[Epoch 159 Batch 120/162] avg loss 0.000934713, throughput 9.5363K wps
[Epoch 159 Batch 150/162] avg loss 0.000829498, throughput 9.24886K wps
Begin Testing...
[Epoch 159] train avg loss 0.000808835, dev acc 0.9311, dev avg loss 0.19963, throughput 9.47987K wps
[Epoch 160 Batch 30/162] avg loss 0.000808971, throughput 9.51016K wps
[Epoch 160 Batch 60/162] avg loss 0.000879163, throughput 9.48295K wps
[Epoch 160 Batch 90/162] avg loss 0.000803701, throughput 9.26398K wps
[Epoch 160 Batch 120/162] avg loss 0.000860717, throughput 9.23514K wps
[Epoch 160 Batch 150/162] avg loss 0.000750216, throughput 9.42975K wps
Begin Testing...
[Epoch 160] train avg loss 0.000829286, dev acc 0.9311, dev avg loss 0.199735, throughput 9.3919K wps
[Epoch 161 Batch 30/162] avg loss 0.000974678, throughput 9.54246K wps
[Epoch 161 Batch 60/162] avg loss 0.000845298, throughput 9.41976K wps
[Epoch 161 Batch 90/162] avg loss 0.000808872, throughput 9.37979K wps
[Epoch 161 Batch 120/162] avg loss 0.000905347, throughput 9.24486K wps
[Epoch 161 Batch 150/162] avg loss 0.000705771, throughput 9.40624K wps
Begin Testing...
[Epoch 161] train avg loss 0.000844436, dev acc 0.9289, dev avg loss 0.199593, throughput 9.41192K wps
[Epoch 162 Batch 30/162] avg loss 0.000703221, throughput 9.69891K wps
[Epoch 162 Batch 60/162] avg loss 0.000955533, throughput 9.26135K wps
[Epoch 162 Batch 90/162] avg loss 0.000947767, throughput 9.27048K wps
[Epoch 162 Batch 120/162] avg loss 0.000730136, throughput 9.36203K wps
[Epoch 162 Batch 150/162] avg loss 0.000868703, throughput 9.51287K wps
Begin Testing...
[Epoch 162] train avg loss 0.000837811, dev acc 0.9289, dev avg loss 0.201457, throughput 9.43234K wps
[Epoch 163 Batch 30/162] avg loss 0.000796378, throughput 9.63017K wps
[Epoch 163 Batch 60/162] avg loss 0.000855796, throughput 9.3025K wps
[Epoch 163 Batch 90/162] avg loss 0.000751803, throughput 9.40544K wps
[Epoch 163 Batch 120/162] avg loss 0.000802561, throughput 9.47464K wps
[Epoch 163 Batch 150/162] avg loss 0.000814194, throughput 9.3446K wps
Begin Testing...
[Epoch 163] train avg loss 0.000806808, dev acc 0.9311, dev avg loss 0.199386, throughput 9.4354K wps
[Epoch 164 Batch 30/162] avg loss 0.000694118, throughput 9.72815K wps
[Epoch 164 Batch 60/162] avg loss 0.000772537, throughput 9.35489K wps
[Epoch 164 Batch 90/162] avg loss 0.000798749, throughput 9.45838K wps
[Epoch 164 Batch 120/162] avg loss 0.000732335, throughput 9.42304K wps
[Epoch 164 Batch 150/162] avg loss 0.000873458, throughput 9.4681K wps
Begin Testing...
[Epoch 164] train avg loss 0.000774546, dev acc 0.9322, dev avg loss 0.199672, throughput 9.47183K wps
[Epoch 165 Batch 30/162] avg loss 0.000755503, throughput 9.50419K wps
[Epoch 165 Batch 60/162] avg loss 0.000855095, throughput 9.48564K wps
[Epoch 165 Batch 90/162] avg loss 0.000718117, throughput 9.33272K wps
[Epoch 165 Batch 120/162] avg loss 0.000749274, throughput 9.50652K wps
[Epoch 165 Batch 150/162] avg loss 0.000889802, throughput 9.43312K wps
Begin Testing...
[Epoch 165] train avg loss 0.000797565, dev acc 0.9322, dev avg loss 0.199214, throughput 9.43677K wps
[Epoch 166 Batch 30/162] avg loss 0.000786124, throughput 9.50313K wps
[Epoch 166 Batch 60/162] avg loss 0.000757047, throughput 9.40146K wps
[Epoch 166 Batch 90/162] avg loss 0.000846746, throughput 9.37472K wps
[Epoch 166 Batch 120/162] avg loss 0.000822213, throughput 9.53916K wps
[Epoch 166 Batch 150/162] avg loss 0.000710454, throughput 9.288K wps
Begin Testing...
[Epoch 166] train avg loss 0.000792623, dev acc 0.9311, dev avg loss 0.19945, throughput 9.41592K wps
[Epoch 167 Batch 30/162] avg loss 0.00078465, throughput 9.7727K wps
[Epoch 167 Batch 60/162] avg loss 0.000749043, throughput 9.39833K wps
[Epoch 167 Batch 90/162] avg loss 0.000751108, throughput 9.28201K wps
[Epoch 167 Batch 120/162] avg loss 0.000842619, throughput 9.51506K wps
[Epoch 167 Batch 150/162] avg loss 0.000828416, throughput 9.48429K wps
Begin Testing...
[Epoch 167] train avg loss 0.000775596, dev acc 0.9344, dev avg loss 0.199765, throughput 9.4774K wps
Observed Improvement.
Begin Testing...
[Epoch 168 Batch 30/162] avg loss 0.000750158, throughput 9.48357K wps
[Epoch 168 Batch 60/162] avg loss 0.000751583, throughput 9.45087K wps
[Epoch 168 Batch 90/162] avg loss 0.0008342, throughput 9.40624K wps
[Epoch 168 Batch 120/162] avg loss 0.000724395, throughput 9.41174K wps
[Epoch 168 Batch 150/162] avg loss 0.000741043, throughput 9.29438K wps
Begin Testing...
[Epoch 168] train avg loss 0.000766779, dev acc 0.9311, dev avg loss 0.199704, throughput 9.41703K wps
[Epoch 169 Batch 30/162] avg loss 0.00074829, throughput 9.50446K wps
[Epoch 169 Batch 60/162] avg loss 0.000697833, throughput 9.35994K wps
[Epoch 169 Batch 90/162] avg loss 0.000785212, throughput 9.47869K wps
[Epoch 169 Batch 120/162] avg loss 0.000862278, throughput 9.40258K wps
[Epoch 169 Batch 150/162] avg loss 0.000765823, throughput 9.44211K wps
Begin Testing...
[Epoch 169] train avg loss 0.000781461, dev acc 0.9289, dev avg loss 0.200178, throughput 9.42419K wps
[Epoch 170 Batch 30/162] avg loss 0.000665183, throughput 9.59721K wps
[Epoch 170 Batch 60/162] avg loss 0.000721135, throughput 9.40824K wps
[Epoch 170 Batch 90/162] avg loss 0.0007045, throughput 9.32782K wps
[Epoch 170 Batch 120/162] avg loss 0.000762113, throughput 9.30466K wps
[Epoch 170 Batch 150/162] avg loss 0.000850951, throughput 9.24936K wps
Begin Testing...
[Epoch 170] train avg loss 0.000734841, dev acc 0.9333, dev avg loss 0.200847, throughput 9.36183K wps
[Epoch 171 Batch 30/162] avg loss 0.000838534, throughput 9.62985K wps
[Epoch 171 Batch 60/162] avg loss 0.000800776, throughput 9.42853K wps
[Epoch 171 Batch 90/162] avg loss 0.000743843, throughput 9.41032K wps
[Epoch 171 Batch 120/162] avg loss 0.000806597, throughput 9.29138K wps
[Epoch 171 Batch 150/162] avg loss 0.00069783, throughput 9.50147K wps
Begin Testing...
[Epoch 171] train avg loss 0.000776812, dev acc 0.9300, dev avg loss 0.200217, throughput 9.43441K wps
[Epoch 172 Batch 30/162] avg loss 0.000723996, throughput 9.50593K wps
[Epoch 172 Batch 60/162] avg loss 0.000680573, throughput 9.47756K wps
[Epoch 172 Batch 90/162] avg loss 0.000735702, throughput 9.35396K wps
[Epoch 172 Batch 120/162] avg loss 0.000859069, throughput 9.27223K wps
[Epoch 172 Batch 150/162] avg loss 0.000791226, throughput 9.30891K wps
Begin Testing...
[Epoch 172] train avg loss 0.000755193, dev acc 0.9311, dev avg loss 0.199716, throughput 9.39513K wps
[Epoch 173 Batch 30/162] avg loss 0.000786021, throughput 9.63078K wps
[Epoch 173 Batch 60/162] avg loss 0.000690998, throughput 9.32488K wps
[Epoch 173 Batch 90/162] avg loss 0.000712721, throughput 9.39249K wps
[Epoch 173 Batch 120/162] avg loss 0.000668333, throughput 9.41696K wps
[Epoch 173 Batch 150/162] avg loss 0.000770785, throughput 9.43979K wps
Begin Testing...
[Epoch 173] train avg loss 0.000714309, dev acc 0.9311, dev avg loss 0.200101, throughput 9.41934K wps
[Epoch 174 Batch 30/162] avg loss 0.000684969, throughput 9.76961K wps
[Epoch 174 Batch 60/162] avg loss 0.000735002, throughput 9.3513K wps
[Epoch 174 Batch 90/162] avg loss 0.000772401, throughput 9.34882K wps
[Epoch 174 Batch 120/162] avg loss 0.000708095, throughput 9.59746K wps
[Epoch 174 Batch 150/162] avg loss 0.000749415, throughput 9.32475K wps
Begin Testing...
[Epoch 174] train avg loss 0.000722941, dev acc 0.9333, dev avg loss 0.200322, throughput 9.4791K wps
[Epoch 175 Batch 30/162] avg loss 0.000776178, throughput 9.55876K wps
[Epoch 175 Batch 60/162] avg loss 0.000646456, throughput 9.36128K wps
[Epoch 175 Batch 90/162] avg loss 0.00077226, throughput 9.23862K wps
[Epoch 175 Batch 120/162] avg loss 0.000751662, throughput 9.40855K wps
[Epoch 175 Batch 150/162] avg loss 0.000762339, throughput 9.50439K wps
Begin Testing...
[Epoch 175] train avg loss 0.000741196, dev acc 0.9311, dev avg loss 0.200233, throughput 9.40618K wps
[Epoch 176 Batch 30/162] avg loss 0.000718871, throughput 9.54854K wps
[Epoch 176 Batch 60/162] avg loss 0.000761122, throughput 9.29165K wps
[Epoch 176 Batch 90/162] avg loss 0.000761968, throughput 9.51773K wps
[Epoch 176 Batch 120/162] avg loss 0.000767324, throughput 9.45121K wps
[Epoch 176 Batch 150/162] avg loss 0.000621879, throughput 9.37256K wps
Begin Testing...
[Epoch 176] train avg loss 0.000716771, dev acc 0.9322, dev avg loss 0.199961, throughput 9.42582K wps
[Epoch 177 Batch 30/162] avg loss 0.000749861, throughput 9.62679K wps
[Epoch 177 Batch 60/162] avg loss 0.00070125, throughput 9.32116K wps
[Epoch 177 Batch 90/162] avg loss 0.000648393, throughput 9.36197K wps
[Epoch 177 Batch 120/162] avg loss 0.000711747, throughput 9.3052K wps
[Epoch 177 Batch 150/162] avg loss 0.000782345, throughput 9.31305K wps
Begin Testing...
[Epoch 177] train avg loss 0.000716524, dev acc 0.9300, dev avg loss 0.2003, throughput 9.36782K wps
[Epoch 178 Batch 30/162] avg loss 0.000694793, throughput 9.58033K wps
[Epoch 178 Batch 60/162] avg loss 0.000639935, throughput 9.37018K wps
[Epoch 178 Batch 90/162] avg loss 0.000586784, throughput 9.32866K wps
[Epoch 178 Batch 120/162] avg loss 0.000634432, throughput 9.41552K wps
[Epoch 178 Batch 150/162] avg loss 0.000779367, throughput 9.34433K wps
Begin Testing...
[Epoch 178] train avg loss 0.000680366, dev acc 0.9322, dev avg loss 0.200603, throughput 9.39735K wps
[Epoch 179 Batch 30/162] avg loss 0.000694844, throughput 9.45319K wps
[Epoch 179 Batch 60/162] avg loss 0.000702892, throughput 9.41916K wps
[Epoch 179 Batch 90/162] avg loss 0.000746897, throughput 9.54811K wps
[Epoch 179 Batch 120/162] avg loss 0.000635815, throughput 9.36965K wps
[Epoch 179 Batch 150/162] avg loss 0.000724774, throughput 9.27574K wps
Begin Testing...
[Epoch 179] train avg loss 0.000697136, dev acc 0.9333, dev avg loss 0.200535, throughput 9.42392K wps
[Epoch 180 Batch 30/162] avg loss 0.000765003, throughput 9.43349K wps
[Epoch 180 Batch 60/162] avg loss 0.00069129, throughput 9.36319K wps
[Epoch 180 Batch 90/162] avg loss 0.000677021, throughput 9.31895K wps
[Epoch 180 Batch 120/162] avg loss 0.000646422, throughput 9.31929K wps
[Epoch 180 Batch 150/162] avg loss 0.000598789, throughput 9.3553K wps
Begin Testing...
[Epoch 180] train avg loss 0.000685517, dev acc 0.9333, dev avg loss 0.201007, throughput 9.38013K wps
[Epoch 181 Batch 30/162] avg loss 0.000629044, throughput 9.54956K wps
[Epoch 181 Batch 60/162] avg loss 0.000800723, throughput 9.44998K wps
[Epoch 181 Batch 90/162] avg loss 0.000714992, throughput 9.32744K wps
[Epoch 181 Batch 120/162] avg loss 0.000725197, throughput 9.30969K wps
[Epoch 181 Batch 150/162] avg loss 0.000709321, throughput 9.35682K wps
Begin Testing...
[Epoch 181] train avg loss 0.000712253, dev acc 0.9333, dev avg loss 0.200674, throughput 9.41087K wps
[Epoch 182 Batch 30/162] avg loss 0.000742768, throughput 9.64101K wps
[Epoch 182 Batch 60/162] avg loss 0.000596728, throughput 9.33111K wps
[Epoch 182 Batch 90/162] avg loss 0.000640098, throughput 9.42153K wps
[Epoch 182 Batch 120/162] avg loss 0.000700641, throughput 9.40956K wps
[Epoch 182 Batch 150/162] avg loss 0.000668972, throughput 9.48947K wps
Begin Testing...
[Epoch 182] train avg loss 0.000669023, dev acc 0.9333, dev avg loss 0.200928, throughput 9.46244K wps
[Epoch 183 Batch 30/162] avg loss 0.000716692, throughput 9.61282K wps
[Epoch 183 Batch 60/162] avg loss 0.000643486, throughput 9.30336K wps
[Epoch 183 Batch 90/162] avg loss 0.000788289, throughput 9.30866K wps
[Epoch 183 Batch 120/162] avg loss 0.000729306, throughput 9.5337K wps
[Epoch 183 Batch 150/162] avg loss 0.000670616, throughput 9.38849K wps
Begin Testing...
[Epoch 183] train avg loss 0.000703653, dev acc 0.9333, dev avg loss 0.201005, throughput 9.41901K wps
[Epoch 184 Batch 30/162] avg loss 0.000672865, throughput 9.57261K wps
[Epoch 184 Batch 60/162] avg loss 0.000736718, throughput 9.30365K wps
[Epoch 184 Batch 90/162] avg loss 0.000653935, throughput 9.46661K wps
[Epoch 184 Batch 120/162] avg loss 0.000689856, throughput 9.49916K wps
[Epoch 184 Batch 150/162] avg loss 0.000644164, throughput 9.50608K wps
Begin Testing...
[Epoch 184] train avg loss 0.000681758, dev acc 0.9333, dev avg loss 0.20135, throughput 9.46569K wps
[Epoch 185 Batch 30/162] avg loss 0.000623803, throughput 9.54737K wps
[Epoch 185 Batch 60/162] avg loss 0.000700642, throughput 9.57664K wps
[Epoch 185 Batch 90/162] avg loss 0.000708221, throughput 9.4525K wps
[Epoch 185 Batch 120/162] avg loss 0.000657087, throughput 9.38965K wps
[Epoch 185 Batch 150/162] avg loss 0.000690085, throughput 9.19199K wps
Begin Testing...
[Epoch 185] train avg loss 0.000683682, dev acc 0.9322, dev avg loss 0.201315, throughput 9.43165K wps
[Epoch 186 Batch 30/162] avg loss 0.000609656, throughput 9.55662K wps
[Epoch 186 Batch 60/162] avg loss 0.000669506, throughput 9.30114K wps
[Epoch 186 Batch 90/162] avg loss 0.000723473, throughput 9.32232K wps
[Epoch 186 Batch 120/162] avg loss 0.000599758, throughput 9.54169K wps
[Epoch 186 Batch 150/162] avg loss 0.000777129, throughput 9.25565K wps
Begin Testing...
[Epoch 186] train avg loss 0.00066921, dev acc 0.9333, dev avg loss 0.20093, throughput 9.39597K wps
[Epoch 187 Batch 30/162] avg loss 0.000697954, throughput 9.56033K wps
[Epoch 187 Batch 60/162] avg loss 0.000675127, throughput 9.35616K wps
[Epoch 187 Batch 90/162] avg loss 0.000613366, throughput 9.27318K wps
[Epoch 187 Batch 120/162] avg loss 0.000621279, throughput 9.39594K wps
[Epoch 187 Batch 150/162] avg loss 0.000611037, throughput 9.32422K wps
Begin Testing...
[Epoch 187] train avg loss 0.000645539, dev acc 0.9344, dev avg loss 0.201274, throughput 9.38123K wps
Observed Improvement.
Begin Testing...
[Epoch 188 Batch 30/162] avg loss 0.000683857, throughput 9.56185K wps
[Epoch 188 Batch 60/162] avg loss 0.000638752, throughput 9.33461K wps
[Epoch 188 Batch 90/162] avg loss 0.000626189, throughput 9.38546K wps
[Epoch 188 Batch 120/162] avg loss 0.000655623, throughput 9.36592K wps
[Epoch 188 Batch 150/162] avg loss 0.000704469, throughput 9.31537K wps
Begin Testing...
[Epoch 188] train avg loss 0.000660655, dev acc 0.9344, dev avg loss 0.202151, throughput 9.37124K wps
Observed Improvement.
Begin Testing...
[Epoch 189 Batch 30/162] avg loss 0.000737478, throughput 9.59331K wps
[Epoch 189 Batch 60/162] avg loss 0.000649084, throughput 9.3144K wps
[Epoch 189 Batch 90/162] avg loss 0.000635279, throughput 9.44601K wps
[Epoch 189 Batch 120/162] avg loss 0.000621244, throughput 9.50741K wps
[Epoch 189 Batch 150/162] avg loss 0.000643737, throughput 9.30601K wps
Begin Testing...
[Epoch 189] train avg loss 0.000656525, dev acc 0.9333, dev avg loss 0.20168, throughput 9.42289K wps
[Epoch 190 Batch 30/162] avg loss 0.00060197, throughput 9.54444K wps
[Epoch 190 Batch 60/162] avg loss 0.000594402, throughput 9.55803K wps
[Epoch 190 Batch 90/162] avg loss 0.000541556, throughput 9.3759K wps
[Epoch 190 Batch 120/162] avg loss 0.000611504, throughput 9.29774K wps
[Epoch 190 Batch 150/162] avg loss 0.0006616, throughput 9.23232K wps
Begin Testing...
[Epoch 190] train avg loss 0.000611076, dev acc 0.9322, dev avg loss 0.202138, throughput 9.3941K wps
[Epoch 191 Batch 30/162] avg loss 0.000583513, throughput 9.54981K wps
[Epoch 191 Batch 60/162] avg loss 0.000607934, throughput 9.38146K wps
[Epoch 191 Batch 90/162] avg loss 0.000771854, throughput 9.33102K wps
[Epoch 191 Batch 120/162] avg loss 0.000666878, throughput 9.27158K wps
[Epoch 191 Batch 150/162] avg loss 0.000567977, throughput 9.40895K wps
Begin Testing...
[Epoch 191] train avg loss 0.000644209, dev acc 0.9333, dev avg loss 0.20221, throughput 9.39202K wps
[Epoch 192 Batch 30/162] avg loss 0.000730085, throughput 9.49941K wps
[Epoch 192 Batch 60/162] avg loss 0.000583115, throughput 9.39319K wps
[Epoch 192 Batch 90/162] avg loss 0.000663236, throughput 9.45732K wps
[Epoch 192 Batch 120/162] avg loss 0.000608084, throughput 9.51189K wps
[Epoch 192 Batch 150/162] avg loss 0.000570181, throughput 9.56225K wps
Begin Testing...
[Epoch 192] train avg loss 0.000618323, dev acc 0.9322, dev avg loss 0.202221, throughput 9.45856K wps
[Epoch 193 Batch 30/162] avg loss 0.000572176, throughput 9.60178K wps
[Epoch 193 Batch 60/162] avg loss 0.000565171, throughput 9.22527K wps
[Epoch 193 Batch 90/162] avg loss 0.000631348, throughput 9.47041K wps
[Epoch 193 Batch 120/162] avg loss 0.000595776, throughput 9.33119K wps
[Epoch 193 Batch 150/162] avg loss 0.000636903, throughput 9.37086K wps
Begin Testing...
[Epoch 193] train avg loss 0.000610488, dev acc 0.9344, dev avg loss 0.202612, throughput 9.40767K wps
Observed Improvement.
Begin Testing...
[Epoch 194 Batch 30/162] avg loss 0.000685029, throughput 9.54745K wps
[Epoch 194 Batch 60/162] avg loss 0.00062756, throughput 9.2748K wps
[Epoch 194 Batch 90/162] avg loss 0.000635914, throughput 9.31694K wps
[Epoch 194 Batch 120/162] avg loss 0.000535939, throughput 9.36308K wps
[Epoch 194 Batch 150/162] avg loss 0.000592485, throughput 9.24411K wps
Begin Testing...
[Epoch 194] train avg loss 0.000609878, dev acc 0.9300, dev avg loss 0.202956, throughput 9.34897K wps
[Epoch 195 Batch 30/162] avg loss 0.000644012, throughput 9.55858K wps
[Epoch 195 Batch 60/162] avg loss 0.000567187, throughput 9.33549K wps
[Epoch 195 Batch 90/162] avg loss 0.00067197, throughput 9.34275K wps
[Epoch 195 Batch 120/162] avg loss 0.000632038, throughput 9.24271K wps
[Epoch 195 Batch 150/162] avg loss 0.000684886, throughput 9.475K wps
Begin Testing...
[Epoch 195] train avg loss 0.000636783, dev acc 0.9322, dev avg loss 0.202272, throughput 9.38602K wps
[Epoch 196 Batch 30/162] avg loss 0.000696987, throughput 9.51412K wps
[Epoch 196 Batch 60/162] avg loss 0.000623682, throughput 9.47651K wps
[Epoch 196 Batch 90/162] avg loss 0.000584644, throughput 9.39549K wps
[Epoch 196 Batch 120/162] avg loss 0.000528887, throughput 9.49222K wps
[Epoch 196 Batch 150/162] avg loss 0.0005824, throughput 9.44647K wps
Begin Testing...
[Epoch 196] train avg loss 0.000590905, dev acc 0.9333, dev avg loss 0.202797, throughput 9.45447K wps
[Epoch 197 Batch 30/162] avg loss 0.000651228, throughput 9.63591K wps
[Epoch 197 Batch 60/162] avg loss 0.000498823, throughput 9.3449K wps
[Epoch 197 Batch 90/162] avg loss 0.000651713, throughput 9.45516K wps
[Epoch 197 Batch 120/162] avg loss 0.000642572, throughput 9.35166K wps
[Epoch 197 Batch 150/162] avg loss 0.000611978, throughput 9.3093K wps
Begin Testing...
[Epoch 197] train avg loss 0.000606933, dev acc 0.9322, dev avg loss 0.202357, throughput 9.40384K wps
[Epoch 198 Batch 30/162] avg loss 0.000587599, throughput 9.59051K wps
[Epoch 198 Batch 60/162] avg loss 0.000625381, throughput 9.40191K wps
[Epoch 198 Batch 90/162] avg loss 0.000543357, throughput 9.46783K wps
[Epoch 198 Batch 120/162] avg loss 0.000748192, throughput 9.31532K wps
[Epoch 198 Batch 150/162] avg loss 0.000580954, throughput 9.30742K wps
Begin Testing...
[Epoch 198] train avg loss 0.000609185, dev acc 0.9344, dev avg loss 0.202302, throughput 9.4086K wps
Observed Improvement.
Begin Testing...
[Epoch 199 Batch 30/162] avg loss 0.00056139, throughput 9.52406K wps
[Epoch 199 Batch 60/162] avg loss 0.000584519, throughput 9.24864K wps
[Epoch 199 Batch 90/162] avg loss 0.000607698, throughput 9.32767K wps
[Epoch 199 Batch 120/162] avg loss 0.000543109, throughput 9.31358K wps
[Epoch 199 Batch 150/162] avg loss 0.00053073, throughput 9.47223K wps
Begin Testing...
[Epoch 199] train avg loss 0.000561443, dev acc 0.9311, dev avg loss 0.202647, throughput 9.37277K wps
Test loss 0.234349, test acc 0.9150
Total time cost 445.65s
[Epoch 0 Batch 30/162] avg loss 0.0140324, throughput 7.31594K wps
[Epoch 0 Batch 60/162] avg loss 0.0137697, throughput 9.53558K wps
[Epoch 0 Batch 90/162] avg loss 0.013667, throughput 9.42413K wps
[Epoch 0 Batch 120/162] avg loss 0.0135796, throughput 9.41052K wps
[Epoch 0 Batch 150/162] avg loss 0.0135132, throughput 9.30372K wps
Begin Testing...
[Epoch 0] train avg loss 0.0136906, dev acc 0.6933, dev avg loss 0.663671, throughput 8.95059K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/162] avg loss 0.0132853, throughput 9.56229K wps
[Epoch 1 Batch 60/162] avg loss 0.0132598, throughput 9.34035K wps
[Epoch 1 Batch 90/162] avg loss 0.0131167, throughput 9.30347K wps
[Epoch 1 Batch 120/162] avg loss 0.0128403, throughput 9.25713K wps
[Epoch 1 Batch 150/162] avg loss 0.0128463, throughput 9.50765K wps
Begin Testing...
[Epoch 1] train avg loss 0.0130497, dev acc 0.8122, dev avg loss 0.635224, throughput 9.38824K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/162] avg loss 0.0126278, throughput 9.46677K wps
[Epoch 2 Batch 60/162] avg loss 0.0126171, throughput 9.33006K wps
[Epoch 2 Batch 90/162] avg loss 0.0123981, throughput 9.42421K wps
[Epoch 2 Batch 120/162] avg loss 0.0123127, throughput 9.41583K wps
[Epoch 2 Batch 150/162] avg loss 0.0121766, throughput 9.29956K wps
Begin Testing...
[Epoch 2] train avg loss 0.0124066, dev acc 0.8222, dev avg loss 0.602879, throughput 9.39711K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/162] avg loss 0.012074, throughput 9.77186K wps
[Epoch 3 Batch 60/162] avg loss 0.0119799, throughput 9.41283K wps
[Epoch 3 Batch 90/162] avg loss 0.0116279, throughput 9.50668K wps
[Epoch 3 Batch 120/162] avg loss 0.0116906, throughput 9.35647K wps
[Epoch 3 Batch 150/162] avg loss 0.0114773, throughput 9.56254K wps
Begin Testing...
[Epoch 3] train avg loss 0.0117511, dev acc 0.8456, dev avg loss 0.566611, throughput 9.49692K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/162] avg loss 0.0113544, throughput 9.40342K wps
[Epoch 4 Batch 60/162] avg loss 0.0110726, throughput 9.44167K wps
[Epoch 4 Batch 90/162] avg loss 0.0109634, throughput 9.49997K wps
[Epoch 4 Batch 120/162] avg loss 0.0108205, throughput 9.45509K wps
[Epoch 4 Batch 150/162] avg loss 0.0105801, throughput 9.43651K wps
Begin Testing...
[Epoch 4] train avg loss 0.0109219, dev acc 0.8589, dev avg loss 0.526782, throughput 9.43199K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/162] avg loss 0.0104652, throughput 9.6518K wps
[Epoch 5 Batch 60/162] avg loss 0.0104402, throughput 9.27921K wps
[Epoch 5 Batch 90/162] avg loss 0.0102351, throughput 9.39766K wps
[Epoch 5 Batch 120/162] avg loss 0.00998377, throughput 9.39601K wps
[Epoch 5 Batch 150/162] avg loss 0.00978893, throughput 9.24047K wps
Begin Testing...
[Epoch 5] train avg loss 0.0101777, dev acc 0.8667, dev avg loss 0.488422, throughput 9.40287K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/162] avg loss 0.00962471, throughput 9.50385K wps
[Epoch 6 Batch 60/162] avg loss 0.00960506, throughput 9.22368K wps
[Epoch 6 Batch 90/162] avg loss 0.00940988, throughput 9.52823K wps
[Epoch 6 Batch 120/162] avg loss 0.00923151, throughput 9.32226K wps
[Epoch 6 Batch 150/162] avg loss 0.00923229, throughput 9.39732K wps
Begin Testing...
[Epoch 6] train avg loss 0.00941625, dev acc 0.8600, dev avg loss 0.453629, throughput 9.38266K wps
[Epoch 7 Batch 30/162] avg loss 0.00911647, throughput 9.38688K wps
[Epoch 7 Batch 60/162] avg loss 0.00882835, throughput 9.35864K wps
[Epoch 7 Batch 90/162] avg loss 0.00889105, throughput 9.25503K wps
[Epoch 7 Batch 120/162] avg loss 0.00855239, throughput 9.38802K wps
[Epoch 7 Batch 150/162] avg loss 0.00855025, throughput 9.38374K wps
Begin Testing...
[Epoch 7] train avg loss 0.00878858, dev acc 0.8667, dev avg loss 0.423234, throughput 9.33794K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/162] avg loss 0.00859895, throughput 9.50349K wps
[Epoch 8 Batch 60/162] avg loss 0.00849058, throughput 9.48742K wps
[Epoch 8 Batch 90/162] avg loss 0.00805606, throughput 9.36248K wps
[Epoch 8 Batch 120/162] avg loss 0.00792628, throughput 9.46156K wps
[Epoch 8 Batch 150/162] avg loss 0.00799315, throughput 9.52324K wps
Begin Testing...
[Epoch 8] train avg loss 0.00823134, dev acc 0.8722, dev avg loss 0.398932, throughput 9.46574K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/162] avg loss 0.00803663, throughput 9.69821K wps
[Epoch 9 Batch 60/162] avg loss 0.00798854, throughput 9.54757K wps
[Epoch 9 Batch 90/162] avg loss 0.00790172, throughput 9.50315K wps
[Epoch 9 Batch 120/162] avg loss 0.00761154, throughput 9.29564K wps
[Epoch 9 Batch 150/162] avg loss 0.0076133, throughput 9.59119K wps
Begin Testing...
[Epoch 9] train avg loss 0.00779802, dev acc 0.8767, dev avg loss 0.377216, throughput 9.52225K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/162] avg loss 0.00754552, throughput 9.46628K wps
[Epoch 10 Batch 60/162] avg loss 0.00787831, throughput 9.51402K wps
[Epoch 10 Batch 90/162] avg loss 0.00731, throughput 9.3669K wps
[Epoch 10 Batch 120/162] avg loss 0.00692783, throughput 9.52606K wps
[Epoch 10 Batch 150/162] avg loss 0.00738703, throughput 9.33564K wps
Begin Testing...
[Epoch 10] train avg loss 0.00739839, dev acc 0.8811, dev avg loss 0.360572, throughput 9.41177K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/162] avg loss 0.00698418, throughput 9.47101K wps
[Epoch 11 Batch 60/162] avg loss 0.00701242, throughput 9.54302K wps
[Epoch 11 Batch 90/162] avg loss 0.00728473, throughput 9.3188K wps
[Epoch 11 Batch 120/162] avg loss 0.00717908, throughput 9.35873K wps
[Epoch 11 Batch 150/162] avg loss 0.00703294, throughput 9.44976K wps
Begin Testing...
[Epoch 11] train avg loss 0.00709986, dev acc 0.8844, dev avg loss 0.346173, throughput 9.42604K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/162] avg loss 0.00696089, throughput 9.63775K wps
[Epoch 12 Batch 60/162] avg loss 0.00690618, throughput 9.33674K wps
[Epoch 12 Batch 90/162] avg loss 0.00688934, throughput 9.55114K wps
[Epoch 12 Batch 120/162] avg loss 0.006897, throughput 9.55849K wps
[Epoch 12 Batch 150/162] avg loss 0.00638437, throughput 9.29649K wps
Begin Testing...
[Epoch 12] train avg loss 0.00680112, dev acc 0.8856, dev avg loss 0.334045, throughput 9.4697K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/162] avg loss 0.00664919, throughput 9.41764K wps
[Epoch 13 Batch 60/162] avg loss 0.00679228, throughput 9.43987K wps
[Epoch 13 Batch 90/162] avg loss 0.00633877, throughput 9.26264K wps
[Epoch 13 Batch 120/162] avg loss 0.00683719, throughput 9.44104K wps
[Epoch 13 Batch 150/162] avg loss 0.00643042, throughput 9.16669K wps
Begin Testing...
[Epoch 13] train avg loss 0.00661544, dev acc 0.8867, dev avg loss 0.324257, throughput 9.34391K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/162] avg loss 0.00618894, throughput 9.48752K wps
[Epoch 14 Batch 60/162] avg loss 0.0063971, throughput 9.43725K wps
[Epoch 14 Batch 90/162] avg loss 0.00644065, throughput 9.27515K wps
[Epoch 14 Batch 120/162] avg loss 0.00612615, throughput 9.38826K wps
[Epoch 14 Batch 150/162] avg loss 0.00650961, throughput 9.22306K wps
Begin Testing...
[Epoch 14] train avg loss 0.00634747, dev acc 0.8900, dev avg loss 0.315319, throughput 9.36034K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/162] avg loss 0.0059935, throughput 9.46308K wps
[Epoch 15 Batch 60/162] avg loss 0.00645696, throughput 9.29715K wps
[Epoch 15 Batch 90/162] avg loss 0.00609282, throughput 9.34504K wps
[Epoch 15 Batch 120/162] avg loss 0.00612239, throughput 9.34152K wps
[Epoch 15 Batch 150/162] avg loss 0.00620774, throughput 9.44577K wps
Begin Testing...
[Epoch 15] train avg loss 0.00616604, dev acc 0.8878, dev avg loss 0.310088, throughput 9.3773K wps
[Epoch 16 Batch 30/162] avg loss 0.00619185, throughput 9.66085K wps
[Epoch 16 Batch 60/162] avg loss 0.00597811, throughput 9.27337K wps
[Epoch 16 Batch 90/162] avg loss 0.00608168, throughput 9.34676K wps
[Epoch 16 Batch 120/162] avg loss 0.0060234, throughput 9.29497K wps
[Epoch 16 Batch 150/162] avg loss 0.00622411, throughput 9.30938K wps
Begin Testing...
[Epoch 16] train avg loss 0.00607492, dev acc 0.8911, dev avg loss 0.301734, throughput 9.36725K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/162] avg loss 0.00532383, throughput 9.71143K wps
[Epoch 17 Batch 60/162] avg loss 0.00631796, throughput 9.25044K wps
[Epoch 17 Batch 90/162] avg loss 0.00572296, throughput 9.53512K wps
[Epoch 17 Batch 120/162] avg loss 0.0059069, throughput 9.46149K wps
[Epoch 17 Batch 150/162] avg loss 0.00613275, throughput 9.26207K wps
Begin Testing...
[Epoch 17] train avg loss 0.00588974, dev acc 0.8978, dev avg loss 0.29529, throughput 9.43012K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/162] avg loss 0.0063085, throughput 9.71707K wps
[Epoch 18 Batch 60/162] avg loss 0.00548676, throughput 9.34587K wps
[Epoch 18 Batch 90/162] avg loss 0.00558735, throughput 9.44843K wps
[Epoch 18 Batch 120/162] avg loss 0.00593278, throughput 9.43432K wps
[Epoch 18 Batch 150/162] avg loss 0.00505024, throughput 9.37806K wps
Begin Testing...
[Epoch 18] train avg loss 0.00571916, dev acc 0.8911, dev avg loss 0.289942, throughput 9.46878K wps
[Epoch 19 Batch 30/162] avg loss 0.00602063, throughput 9.39825K wps
[Epoch 19 Batch 60/162] avg loss 0.00534838, throughput 9.27807K wps
[Epoch 19 Batch 90/162] avg loss 0.00539531, throughput 9.4522K wps
[Epoch 19 Batch 120/162] avg loss 0.00563241, throughput 9.57426K wps
[Epoch 19 Batch 150/162] avg loss 0.00549891, throughput 9.39374K wps
Begin Testing...
[Epoch 19] train avg loss 0.00558118, dev acc 0.8944, dev avg loss 0.287854, throughput 9.41553K wps
[Epoch 20 Batch 30/162] avg loss 0.00539524, throughput 9.51736K wps
[Epoch 20 Batch 60/162] avg loss 0.00532709, throughput 9.32092K wps
[Epoch 20 Batch 90/162] avg loss 0.00552385, throughput 9.39393K wps
[Epoch 20 Batch 120/162] avg loss 0.00528053, throughput 9.40094K wps
[Epoch 20 Batch 150/162] avg loss 0.00551267, throughput 9.52752K wps
Begin Testing...
[Epoch 20] train avg loss 0.00543856, dev acc 0.8978, dev avg loss 0.280875, throughput 9.43418K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/162] avg loss 0.00546986, throughput 9.67087K wps
[Epoch 21 Batch 60/162] avg loss 0.00544896, throughput 9.24313K wps
[Epoch 21 Batch 90/162] avg loss 0.00513897, throughput 9.25813K wps
[Epoch 21 Batch 120/162] avg loss 0.0058843, throughput 9.52954K wps
[Epoch 21 Batch 150/162] avg loss 0.00520905, throughput 9.31865K wps
Begin Testing...
[Epoch 21] train avg loss 0.00543021, dev acc 0.9022, dev avg loss 0.277326, throughput 9.38967K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/162] avg loss 0.0052814, throughput 9.66198K wps
[Epoch 22 Batch 60/162] avg loss 0.00526309, throughput 9.30709K wps
[Epoch 22 Batch 90/162] avg loss 0.00492756, throughput 9.33057K wps
[Epoch 22 Batch 120/162] avg loss 0.00540431, throughput 9.36646K wps
[Epoch 22 Batch 150/162] avg loss 0.0049414, throughput 9.31729K wps
Begin Testing...
[Epoch 22] train avg loss 0.00519418, dev acc 0.8989, dev avg loss 0.2743, throughput 9.38903K wps
[Epoch 23 Batch 30/162] avg loss 0.00495773, throughput 9.58527K wps
[Epoch 23 Batch 60/162] avg loss 0.00509811, throughput 9.28268K wps
[Epoch 23 Batch 90/162] avg loss 0.00522338, throughput 9.28067K wps
[Epoch 23 Batch 120/162] avg loss 0.00526962, throughput 9.36207K wps
[Epoch 23 Batch 150/162] avg loss 0.00513892, throughput 9.34404K wps
Begin Testing...
[Epoch 23] train avg loss 0.00515225, dev acc 0.9000, dev avg loss 0.271179, throughput 9.38602K wps
[Epoch 24 Batch 30/162] avg loss 0.00527121, throughput 9.49717K wps
[Epoch 24 Batch 60/162] avg loss 0.00527566, throughput 9.24494K wps
[Epoch 24 Batch 90/162] avg loss 0.00490714, throughput 9.44236K wps
[Epoch 24 Batch 120/162] avg loss 0.00485533, throughput 9.46745K wps
[Epoch 24 Batch 150/162] avg loss 0.0054804, throughput 9.37747K wps
Begin Testing...
[Epoch 24] train avg loss 0.00509567, dev acc 0.9022, dev avg loss 0.266359, throughput 9.39867K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/162] avg loss 0.00514784, throughput 9.56632K wps
[Epoch 25 Batch 60/162] avg loss 0.00523556, throughput 9.42523K wps
[Epoch 25 Batch 90/162] avg loss 0.00474505, throughput 9.22163K wps
[Epoch 25 Batch 120/162] avg loss 0.0051623, throughput 9.41804K wps
[Epoch 25 Batch 150/162] avg loss 0.0046691, throughput 9.29541K wps
Begin Testing...
[Epoch 25] train avg loss 0.00495063, dev acc 0.9011, dev avg loss 0.264198, throughput 9.37651K wps
[Epoch 26 Batch 30/162] avg loss 0.00481072, throughput 9.58693K wps
[Epoch 26 Batch 60/162] avg loss 0.00515807, throughput 9.33722K wps
[Epoch 26 Batch 90/162] avg loss 0.00489274, throughput 9.53344K wps
[Epoch 26 Batch 120/162] avg loss 0.00501662, throughput 9.52266K wps
[Epoch 26 Batch 150/162] avg loss 0.00482542, throughput 9.29128K wps
Begin Testing...
[Epoch 26] train avg loss 0.00494098, dev acc 0.9033, dev avg loss 0.262455, throughput 9.43532K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/162] avg loss 0.00505959, throughput 9.53086K wps
[Epoch 27 Batch 60/162] avg loss 0.00461797, throughput 9.24031K wps
[Epoch 27 Batch 90/162] avg loss 0.00468589, throughput 9.47949K wps
[Epoch 27 Batch 120/162] avg loss 0.00475254, throughput 9.49519K wps
[Epoch 27 Batch 150/162] avg loss 0.0049145, throughput 9.49342K wps
Begin Testing...
[Epoch 27] train avg loss 0.00478827, dev acc 0.9044, dev avg loss 0.257602, throughput 9.42375K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/162] avg loss 0.00498717, throughput 9.4896K wps
[Epoch 28 Batch 60/162] avg loss 0.0044837, throughput 9.29405K wps
[Epoch 28 Batch 90/162] avg loss 0.0047602, throughput 9.22584K wps
[Epoch 28 Batch 120/162] avg loss 0.0044738, throughput 9.46349K wps
[Epoch 28 Batch 150/162] avg loss 0.00476912, throughput 9.28657K wps
Begin Testing...
[Epoch 28] train avg loss 0.00471349, dev acc 0.9044, dev avg loss 0.25515, throughput 9.33295K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/162] avg loss 0.00475557, throughput 9.5363K wps
[Epoch 29 Batch 60/162] avg loss 0.00439865, throughput 9.46315K wps
[Epoch 29 Batch 90/162] avg loss 0.00452953, throughput 9.27957K wps
[Epoch 29 Batch 120/162] avg loss 0.00509885, throughput 9.47118K wps
[Epoch 29 Batch 150/162] avg loss 0.00460024, throughput 9.34474K wps
Begin Testing...
[Epoch 29] train avg loss 0.00467039, dev acc 0.9078, dev avg loss 0.25361, throughput 9.41418K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/162] avg loss 0.00423084, throughput 9.57885K wps
[Epoch 30 Batch 60/162] avg loss 0.00505471, throughput 9.35229K wps
[Epoch 30 Batch 90/162] avg loss 0.00457207, throughput 9.30255K wps
[Epoch 30 Batch 120/162] avg loss 0.00457113, throughput 9.22787K wps
[Epoch 30 Batch 150/162] avg loss 0.00462759, throughput 9.35478K wps
Begin Testing...
[Epoch 30] train avg loss 0.00458903, dev acc 0.9089, dev avg loss 0.250859, throughput 9.34796K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/162] avg loss 0.00457845, throughput 9.7521K wps
[Epoch 31 Batch 60/162] avg loss 0.00443692, throughput 9.44723K wps
[Epoch 31 Batch 90/162] avg loss 0.0042473, throughput 9.49401K wps
[Epoch 31 Batch 120/162] avg loss 0.00449582, throughput 9.37604K wps
[Epoch 31 Batch 150/162] avg loss 0.00490598, throughput 9.27065K wps
Begin Testing...
[Epoch 31] train avg loss 0.00448793, dev acc 0.9044, dev avg loss 0.250111, throughput 9.47686K wps
[Epoch 32 Batch 30/162] avg loss 0.00438358, throughput 9.63072K wps
[Epoch 32 Batch 60/162] avg loss 0.00459197, throughput 9.30356K wps
[Epoch 32 Batch 90/162] avg loss 0.00415089, throughput 9.39326K wps
[Epoch 32 Batch 120/162] avg loss 0.00419427, throughput 9.38156K wps
[Epoch 32 Batch 150/162] avg loss 0.00442435, throughput 9.38964K wps
Begin Testing...
[Epoch 32] train avg loss 0.00434168, dev acc 0.9067, dev avg loss 0.247303, throughput 9.4318K wps
[Epoch 33 Batch 30/162] avg loss 0.00438759, throughput 9.72087K wps
[Epoch 33 Batch 60/162] avg loss 0.00426632, throughput 9.18907K wps
[Epoch 33 Batch 90/162] avg loss 0.00416034, throughput 9.44921K wps
[Epoch 33 Batch 120/162] avg loss 0.00445434, throughput 9.52463K wps
[Epoch 33 Batch 150/162] avg loss 0.00432861, throughput 9.50081K wps
Begin Testing...
[Epoch 33] train avg loss 0.00434186, dev acc 0.9089, dev avg loss 0.244891, throughput 9.46679K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/162] avg loss 0.00439183, throughput 9.5572K wps
[Epoch 34 Batch 60/162] avg loss 0.00444066, throughput 9.33924K wps
[Epoch 34 Batch 90/162] avg loss 0.00433305, throughput 9.49454K wps
[Epoch 34 Batch 120/162] avg loss 0.00399763, throughput 9.40104K wps
[Epoch 34 Batch 150/162] avg loss 0.0041683, throughput 9.36359K wps
Begin Testing...
[Epoch 34] train avg loss 0.0042618, dev acc 0.9122, dev avg loss 0.2424, throughput 9.41809K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/162] avg loss 0.00396095, throughput 9.53064K wps
[Epoch 35 Batch 60/162] avg loss 0.00422483, throughput 9.40817K wps
[Epoch 35 Batch 90/162] avg loss 0.0042559, throughput 9.3102K wps
[Epoch 35 Batch 120/162] avg loss 0.00440011, throughput 9.30157K wps
[Epoch 35 Batch 150/162] avg loss 0.00414947, throughput 9.4243K wps
Begin Testing...
[Epoch 35] train avg loss 0.00420051, dev acc 0.9078, dev avg loss 0.240934, throughput 9.3835K wps
[Epoch 36 Batch 30/162] avg loss 0.00419109, throughput 9.68718K wps
[Epoch 36 Batch 60/162] avg loss 0.00415082, throughput 9.38639K wps
[Epoch 36 Batch 90/162] avg loss 0.00407759, throughput 9.31439K wps
[Epoch 36 Batch 120/162] avg loss 0.00443489, throughput 9.37311K wps
[Epoch 36 Batch 150/162] avg loss 0.00407228, throughput 9.56743K wps
Begin Testing...
[Epoch 36] train avg loss 0.00418269, dev acc 0.9067, dev avg loss 0.240895, throughput 9.44762K wps
[Epoch 37 Batch 30/162] avg loss 0.00423626, throughput 9.45649K wps
[Epoch 37 Batch 60/162] avg loss 0.00430995, throughput 9.37398K wps
[Epoch 37 Batch 90/162] avg loss 0.00390362, throughput 9.29196K wps
[Epoch 37 Batch 120/162] avg loss 0.0039908, throughput 9.42386K wps
[Epoch 37 Batch 150/162] avg loss 0.00387184, throughput 9.41925K wps
Begin Testing...
[Epoch 37] train avg loss 0.00401795, dev acc 0.9067, dev avg loss 0.240547, throughput 9.38779K wps
[Epoch 38 Batch 30/162] avg loss 0.0036163, throughput 9.47012K wps
[Epoch 38 Batch 60/162] avg loss 0.00385092, throughput 9.31015K wps
[Epoch 38 Batch 90/162] avg loss 0.0038551, throughput 9.29771K wps
[Epoch 38 Batch 120/162] avg loss 0.00418608, throughput 9.29542K wps
[Epoch 38 Batch 150/162] avg loss 0.00415054, throughput 9.31799K wps
Begin Testing...
[Epoch 38] train avg loss 0.0039579, dev acc 0.9133, dev avg loss 0.235807, throughput 9.3362K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/162] avg loss 0.00393028, throughput 9.55193K wps
[Epoch 39 Batch 60/162] avg loss 0.00413509, throughput 9.34994K wps
[Epoch 39 Batch 90/162] avg loss 0.00382916, throughput 9.38689K wps
[Epoch 39 Batch 120/162] avg loss 0.00398161, throughput 9.28729K wps
[Epoch 39 Batch 150/162] avg loss 0.0038787, throughput 9.40004K wps
Begin Testing...
[Epoch 39] train avg loss 0.00393537, dev acc 0.9100, dev avg loss 0.238395, throughput 9.40121K wps
[Epoch 40 Batch 30/162] avg loss 0.00403528, throughput 9.53022K wps
[Epoch 40 Batch 60/162] avg loss 0.00352629, throughput 9.40001K wps
[Epoch 40 Batch 90/162] avg loss 0.00368269, throughput 9.27869K wps
[Epoch 40 Batch 120/162] avg loss 0.00396326, throughput 9.43897K wps
[Epoch 40 Batch 150/162] avg loss 0.00412601, throughput 9.28515K wps
Begin Testing...
[Epoch 40] train avg loss 0.00389714, dev acc 0.9044, dev avg loss 0.233922, throughput 9.39103K wps
[Epoch 41 Batch 30/162] avg loss 0.00374791, throughput 9.47088K wps
[Epoch 41 Batch 60/162] avg loss 0.00383387, throughput 9.35316K wps
[Epoch 41 Batch 90/162] avg loss 0.00386738, throughput 9.28109K wps
[Epoch 41 Batch 120/162] avg loss 0.00399312, throughput 9.53913K wps
[Epoch 41 Batch 150/162] avg loss 0.00365472, throughput 9.3243K wps
Begin Testing...
[Epoch 41] train avg loss 0.00382076, dev acc 0.9044, dev avg loss 0.233103, throughput 9.39679K wps
[Epoch 42 Batch 30/162] avg loss 0.00369949, throughput 9.40587K wps
[Epoch 42 Batch 60/162] avg loss 0.00348854, throughput 9.36362K wps
[Epoch 42 Batch 90/162] avg loss 0.00353717, throughput 9.33468K wps
[Epoch 42 Batch 120/162] avg loss 0.00380575, throughput 9.375K wps
[Epoch 42 Batch 150/162] avg loss 0.00376093, throughput 9.35851K wps
Begin Testing...
[Epoch 42] train avg loss 0.00367571, dev acc 0.9089, dev avg loss 0.236939, throughput 9.17275K wps
[Epoch 43 Batch 30/162] avg loss 0.00370348, throughput 9.65191K wps
[Epoch 43 Batch 60/162] avg loss 0.00404026, throughput 9.25523K wps
[Epoch 43 Batch 90/162] avg loss 0.00337673, throughput 9.46245K wps
[Epoch 43 Batch 120/162] avg loss 0.0038275, throughput 9.33324K wps
[Epoch 43 Batch 150/162] avg loss 0.00347124, throughput 9.53556K wps
Begin Testing...
[Epoch 43] train avg loss 0.00368728, dev acc 0.9100, dev avg loss 0.232106, throughput 9.4557K wps
[Epoch 44 Batch 30/162] avg loss 0.0032868, throughput 9.41737K wps
[Epoch 44 Batch 60/162] avg loss 0.00362133, throughput 9.4609K wps
[Epoch 44 Batch 90/162] avg loss 0.00386706, throughput 9.47567K wps
[Epoch 44 Batch 120/162] avg loss 0.00421878, throughput 9.26623K wps
[Epoch 44 Batch 150/162] avg loss 0.00317487, throughput 9.39186K wps
Begin Testing...
[Epoch 44] train avg loss 0.00365101, dev acc 0.9078, dev avg loss 0.228685, throughput 9.3968K wps
[Epoch 45 Batch 30/162] avg loss 0.00391822, throughput 9.68156K wps
[Epoch 45 Batch 60/162] avg loss 0.00368644, throughput 9.36446K wps
[Epoch 45 Batch 90/162] avg loss 0.00347633, throughput 9.37991K wps
[Epoch 45 Batch 120/162] avg loss 0.00338192, throughput 9.28825K wps
[Epoch 45 Batch 150/162] avg loss 0.00379859, throughput 9.52351K wps
Begin Testing...
[Epoch 45] train avg loss 0.00360312, dev acc 0.9122, dev avg loss 0.226747, throughput 9.43741K wps
[Epoch 46 Batch 30/162] avg loss 0.00337851, throughput 9.52281K wps
[Epoch 46 Batch 60/162] avg loss 0.00353297, throughput 9.42704K wps
[Epoch 46 Batch 90/162] avg loss 0.00361803, throughput 9.23996K wps
[Epoch 46 Batch 120/162] avg loss 0.00373974, throughput 9.41785K wps
[Epoch 46 Batch 150/162] avg loss 0.00347415, throughput 9.30365K wps
Begin Testing...
[Epoch 46] train avg loss 0.003556, dev acc 0.9089, dev avg loss 0.226497, throughput 9.3744K wps
[Epoch 47 Batch 30/162] avg loss 0.00317813, throughput 9.68463K wps
[Epoch 47 Batch 60/162] avg loss 0.00360974, throughput 9.35363K wps
[Epoch 47 Batch 90/162] avg loss 0.00345975, throughput 9.39537K wps
[Epoch 47 Batch 120/162] avg loss 0.00384116, throughput 9.39674K wps
[Epoch 47 Batch 150/162] avg loss 0.00356022, throughput 9.32371K wps
Begin Testing...
[Epoch 47] train avg loss 0.0035078, dev acc 0.9111, dev avg loss 0.22506, throughput 9.42128K wps
[Epoch 48 Batch 30/162] avg loss 0.00318078, throughput 9.50623K wps
[Epoch 48 Batch 60/162] avg loss 0.00334933, throughput 9.43885K wps
[Epoch 48 Batch 90/162] avg loss 0.00356736, throughput 9.2695K wps
[Epoch 48 Batch 120/162] avg loss 0.00351245, throughput 9.29588K wps
[Epoch 48 Batch 150/162] avg loss 0.00366584, throughput 9.46076K wps
Begin Testing...
[Epoch 48] train avg loss 0.00347328, dev acc 0.9100, dev avg loss 0.224737, throughput 9.39616K wps
[Epoch 49 Batch 30/162] avg loss 0.00315785, throughput 9.3593K wps
[Epoch 49 Batch 60/162] avg loss 0.00340777, throughput 9.42631K wps
[Epoch 49 Batch 90/162] avg loss 0.0033833, throughput 9.44367K wps
[Epoch 49 Batch 120/162] avg loss 0.00370163, throughput 9.3962K wps
[Epoch 49 Batch 150/162] avg loss 0.00358835, throughput 9.29303K wps
Begin Testing...
[Epoch 49] train avg loss 0.00343552, dev acc 0.9100, dev avg loss 0.22304, throughput 9.38923K wps
[Epoch 50 Batch 30/162] avg loss 0.00358856, throughput 9.53773K wps
[Epoch 50 Batch 60/162] avg loss 0.00340707, throughput 9.54984K wps
[Epoch 50 Batch 90/162] avg loss 0.0032981, throughput 9.35108K wps
[Epoch 50 Batch 120/162] avg loss 0.00342001, throughput 9.50797K wps
[Epoch 50 Batch 150/162] avg loss 0.00315331, throughput 9.31105K wps
Begin Testing...
[Epoch 50] train avg loss 0.00336502, dev acc 0.9100, dev avg loss 0.222865, throughput 9.43515K wps
[Epoch 51 Batch 30/162] avg loss 0.00344844, throughput 9.54461K wps
[Epoch 51 Batch 60/162] avg loss 0.00366597, throughput 9.49713K wps
[Epoch 51 Batch 90/162] avg loss 0.00339648, throughput 9.30367K wps
[Epoch 51 Batch 120/162] avg loss 0.00334348, throughput 9.50288K wps
[Epoch 51 Batch 150/162] avg loss 0.00281499, throughput 9.3962K wps
Begin Testing...
[Epoch 51] train avg loss 0.00332766, dev acc 0.9144, dev avg loss 0.221063, throughput 9.42948K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/162] avg loss 0.00330193, throughput 9.59202K wps
[Epoch 52 Batch 60/162] avg loss 0.00308882, throughput 9.34199K wps
[Epoch 52 Batch 90/162] avg loss 0.00306215, throughput 9.32314K wps
[Epoch 52 Batch 120/162] avg loss 0.00336281, throughput 9.34332K wps
[Epoch 52 Batch 150/162] avg loss 0.00334005, throughput 9.27212K wps
Begin Testing...
[Epoch 52] train avg loss 0.00323267, dev acc 0.9122, dev avg loss 0.220731, throughput 9.3833K wps
[Epoch 53 Batch 30/162] avg loss 0.00288676, throughput 9.59984K wps
[Epoch 53 Batch 60/162] avg loss 0.00316525, throughput 9.32294K wps
[Epoch 53 Batch 90/162] avg loss 0.00340108, throughput 9.57318K wps
[Epoch 53 Batch 120/162] avg loss 0.00304465, throughput 9.38061K wps
[Epoch 53 Batch 150/162] avg loss 0.00329722, throughput 9.33921K wps
Begin Testing...
[Epoch 53] train avg loss 0.00317741, dev acc 0.9156, dev avg loss 0.219419, throughput 9.43784K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/162] avg loss 0.00352498, throughput 9.59891K wps
[Epoch 54 Batch 60/162] avg loss 0.0031086, throughput 9.22745K wps
[Epoch 54 Batch 90/162] avg loss 0.00299515, throughput 9.28343K wps
[Epoch 54 Batch 120/162] avg loss 0.00326317, throughput 9.53516K wps
[Epoch 54 Batch 150/162] avg loss 0.00288429, throughput 9.26826K wps
Begin Testing...
[Epoch 54] train avg loss 0.00315196, dev acc 0.9133, dev avg loss 0.222463, throughput 9.38464K wps
[Epoch 55 Batch 30/162] avg loss 0.00313675, throughput 9.60211K wps
[Epoch 55 Batch 60/162] avg loss 0.00327937, throughput 9.37876K wps
[Epoch 55 Batch 90/162] avg loss 0.00304455, throughput 9.43512K wps
[Epoch 55 Batch 120/162] avg loss 0.00310149, throughput 9.37944K wps
[Epoch 55 Batch 150/162] avg loss 0.00310378, throughput 9.56915K wps
Begin Testing...
[Epoch 55] train avg loss 0.00313147, dev acc 0.9156, dev avg loss 0.217416, throughput 9.45598K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/162] avg loss 0.00308663, throughput 9.63825K wps
[Epoch 56 Batch 60/162] avg loss 0.0031728, throughput 9.46906K wps
[Epoch 56 Batch 90/162] avg loss 0.003047, throughput 9.50369K wps
[Epoch 56 Batch 120/162] avg loss 0.00320278, throughput 9.30888K wps
[Epoch 56 Batch 150/162] avg loss 0.00311601, throughput 9.34739K wps
Begin Testing...
[Epoch 56] train avg loss 0.00311877, dev acc 0.9133, dev avg loss 0.217659, throughput 9.46213K wps
[Epoch 57 Batch 30/162] avg loss 0.00314119, throughput 9.66441K wps
[Epoch 57 Batch 60/162] avg loss 0.0030945, throughput 9.48607K wps
[Epoch 57 Batch 90/162] avg loss 0.00284895, throughput 9.44406K wps
[Epoch 57 Batch 120/162] avg loss 0.00312214, throughput 9.31305K wps
[Epoch 57 Batch 150/162] avg loss 0.00320701, throughput 9.33659K wps
Begin Testing...
[Epoch 57] train avg loss 0.00304471, dev acc 0.9133, dev avg loss 0.21656, throughput 9.43125K wps
[Epoch 58 Batch 30/162] avg loss 0.00289749, throughput 9.5139K wps
[Epoch 58 Batch 60/162] avg loss 0.0030572, throughput 9.2619K wps
[Epoch 58 Batch 90/162] avg loss 0.00319532, throughput 9.17808K wps
[Epoch 58 Batch 120/162] avg loss 0.00298137, throughput 9.35268K wps
[Epoch 58 Batch 150/162] avg loss 0.00273173, throughput 9.3635K wps
Begin Testing...
[Epoch 58] train avg loss 0.0029873, dev acc 0.9133, dev avg loss 0.215808, throughput 9.34364K wps
[Epoch 59 Batch 30/162] avg loss 0.00288885, throughput 9.39518K wps
[Epoch 59 Batch 60/162] avg loss 0.00281376, throughput 9.52286K wps
[Epoch 59 Batch 90/162] avg loss 0.0028003, throughput 9.22134K wps
[Epoch 59 Batch 120/162] avg loss 0.00306146, throughput 9.36954K wps
[Epoch 59 Batch 150/162] avg loss 0.00307794, throughput 9.22466K wps
Begin Testing...
[Epoch 59] train avg loss 0.00294132, dev acc 0.9178, dev avg loss 0.214598, throughput 9.33721K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/162] avg loss 0.00281642, throughput 9.33315K wps
[Epoch 60 Batch 60/162] avg loss 0.0027846, throughput 9.3658K wps
[Epoch 60 Batch 90/162] avg loss 0.00288393, throughput 9.3219K wps
[Epoch 60 Batch 120/162] avg loss 0.00260388, throughput 9.1854K wps
[Epoch 60 Batch 150/162] avg loss 0.00311961, throughput 9.28927K wps
Begin Testing...
[Epoch 60] train avg loss 0.00284866, dev acc 0.9178, dev avg loss 0.213585, throughput 9.29819K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/162] avg loss 0.0029454, throughput 9.68048K wps
[Epoch 61 Batch 60/162] avg loss 0.0029491, throughput 9.42217K wps
[Epoch 61 Batch 90/162] avg loss 0.00293569, throughput 9.49673K wps
[Epoch 61 Batch 120/162] avg loss 0.00280779, throughput 9.34427K wps
[Epoch 61 Batch 150/162] avg loss 0.00286973, throughput 9.38538K wps
Begin Testing...
[Epoch 61] train avg loss 0.00289454, dev acc 0.9156, dev avg loss 0.214966, throughput 9.45196K wps
[Epoch 62 Batch 30/162] avg loss 0.00307331, throughput 9.67633K wps
[Epoch 62 Batch 60/162] avg loss 0.00260981, throughput 9.38458K wps
[Epoch 62 Batch 90/162] avg loss 0.00287654, throughput 9.23112K wps
[Epoch 62 Batch 120/162] avg loss 0.00253145, throughput 9.33879K wps
[Epoch 62 Batch 150/162] avg loss 0.00286575, throughput 9.36939K wps
Begin Testing...
[Epoch 62] train avg loss 0.00279087, dev acc 0.9178, dev avg loss 0.212548, throughput 9.40694K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/162] avg loss 0.00266319, throughput 9.49314K wps
[Epoch 63 Batch 60/162] avg loss 0.00299049, throughput 9.39792K wps
[Epoch 63 Batch 90/162] avg loss 0.0026538, throughput 9.2369K wps
[Epoch 63 Batch 120/162] avg loss 0.00294969, throughput 9.47272K wps
[Epoch 63 Batch 150/162] avg loss 0.00269789, throughput 9.30891K wps
Begin Testing...
[Epoch 63] train avg loss 0.00279171, dev acc 0.9144, dev avg loss 0.212288, throughput 9.37613K wps
[Epoch 64 Batch 30/162] avg loss 0.00279522, throughput 9.50945K wps
[Epoch 64 Batch 60/162] avg loss 0.00290671, throughput 9.3279K wps
[Epoch 64 Batch 90/162] avg loss 0.00285916, throughput 9.3628K wps
[Epoch 64 Batch 120/162] avg loss 0.00260125, throughput 9.44088K wps
[Epoch 64 Batch 150/162] avg loss 0.00258285, throughput 9.28697K wps
Begin Testing...
[Epoch 64] train avg loss 0.00275438, dev acc 0.9156, dev avg loss 0.211617, throughput 9.3893K wps
[Epoch 65 Batch 30/162] avg loss 0.00253041, throughput 9.54755K wps
[Epoch 65 Batch 60/162] avg loss 0.00283382, throughput 9.20343K wps
[Epoch 65 Batch 90/162] avg loss 0.00266519, throughput 9.44997K wps
[Epoch 65 Batch 120/162] avg loss 0.00278815, throughput 9.42333K wps
[Epoch 65 Batch 150/162] avg loss 0.0025763, throughput 9.27003K wps
Begin Testing...
[Epoch 65] train avg loss 0.00271699, dev acc 0.9156, dev avg loss 0.210817, throughput 9.39242K wps
[Epoch 66 Batch 30/162] avg loss 0.00255358, throughput 9.70619K wps
[Epoch 66 Batch 60/162] avg loss 0.00247572, throughput 9.47814K wps
[Epoch 66 Batch 90/162] avg loss 0.0026469, throughput 9.37641K wps
[Epoch 66 Batch 120/162] avg loss 0.002756, throughput 9.32051K wps
[Epoch 66 Batch 150/162] avg loss 0.00253668, throughput 9.39469K wps
Begin Testing...
[Epoch 66] train avg loss 0.00262527, dev acc 0.9144, dev avg loss 0.211205, throughput 9.4388K wps
[Epoch 67 Batch 30/162] avg loss 0.00264357, throughput 9.70131K wps
[Epoch 67 Batch 60/162] avg loss 0.00233298, throughput 9.39527K wps
[Epoch 67 Batch 90/162] avg loss 0.00265754, throughput 9.40917K wps
[Epoch 67 Batch 120/162] avg loss 0.00260187, throughput 9.34168K wps
[Epoch 67 Batch 150/162] avg loss 0.00255836, throughput 9.30964K wps
Begin Testing...
[Epoch 67] train avg loss 0.00256947, dev acc 0.9144, dev avg loss 0.210512, throughput 9.42315K wps
[Epoch 68 Batch 30/162] avg loss 0.00253492, throughput 9.50476K wps
[Epoch 68 Batch 60/162] avg loss 0.0028042, throughput 9.32187K wps
[Epoch 68 Batch 90/162] avg loss 0.00247748, throughput 9.53965K wps
[Epoch 68 Batch 120/162] avg loss 0.00262187, throughput 9.3745K wps
[Epoch 68 Batch 150/162] avg loss 0.00254095, throughput 9.39217K wps
Begin Testing...
[Epoch 68] train avg loss 0.00261883, dev acc 0.9200, dev avg loss 0.209441, throughput 9.40889K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/162] avg loss 0.0024845, throughput 9.75069K wps
[Epoch 69 Batch 60/162] avg loss 0.00260505, throughput 9.46023K wps
[Epoch 69 Batch 90/162] avg loss 0.00241479, throughput 9.37886K wps
[Epoch 69 Batch 120/162] avg loss 0.00273826, throughput 9.37237K wps
[Epoch 69 Batch 150/162] avg loss 0.00243985, throughput 9.35268K wps
Begin Testing...
[Epoch 69] train avg loss 0.00254853, dev acc 0.9178, dev avg loss 0.212331, throughput 9.44304K wps
[Epoch 70 Batch 30/162] avg loss 0.00260144, throughput 9.64096K wps
[Epoch 70 Batch 60/162] avg loss 0.00241532, throughput 9.24735K wps
[Epoch 70 Batch 90/162] avg loss 0.00272795, throughput 9.28185K wps
[Epoch 70 Batch 120/162] avg loss 0.00226338, throughput 9.56554K wps
[Epoch 70 Batch 150/162] avg loss 0.00242127, throughput 9.37051K wps
Begin Testing...
[Epoch 70] train avg loss 0.00247929, dev acc 0.9144, dev avg loss 0.209441, throughput 9.41115K wps
[Epoch 71 Batch 30/162] avg loss 0.00251706, throughput 9.63986K wps
[Epoch 71 Batch 60/162] avg loss 0.00224989, throughput 9.48325K wps
[Epoch 71 Batch 90/162] avg loss 0.00259469, throughput 9.32033K wps
[Epoch 71 Batch 120/162] avg loss 0.00230965, throughput 9.41586K wps
[Epoch 71 Batch 150/162] avg loss 0.00266199, throughput 9.39938K wps
Begin Testing...
[Epoch 71] train avg loss 0.00249198, dev acc 0.9167, dev avg loss 0.21003, throughput 9.44658K wps
[Epoch 72 Batch 30/162] avg loss 0.00274066, throughput 9.5333K wps
[Epoch 72 Batch 60/162] avg loss 0.00244282, throughput 9.40396K wps
[Epoch 72 Batch 90/162] avg loss 0.00218038, throughput 9.45801K wps
[Epoch 72 Batch 120/162] avg loss 0.00225924, throughput 9.35715K wps
[Epoch 72 Batch 150/162] avg loss 0.00260395, throughput 9.41767K wps
Begin Testing...
[Epoch 72] train avg loss 0.00244746, dev acc 0.9156, dev avg loss 0.207762, throughput 9.40931K wps
[Epoch 73 Batch 30/162] avg loss 0.00248187, throughput 9.66991K wps
[Epoch 73 Batch 60/162] avg loss 0.00233156, throughput 9.34824K wps
[Epoch 73 Batch 90/162] avg loss 0.00242208, throughput 9.51583K wps
[Epoch 73 Batch 120/162] avg loss 0.0025021, throughput 9.23414K wps
[Epoch 73 Batch 150/162] avg loss 0.00233627, throughput 9.3648K wps
Begin Testing...
[Epoch 73] train avg loss 0.00242023, dev acc 0.9156, dev avg loss 0.210773, throughput 9.4349K wps
[Epoch 74 Batch 30/162] avg loss 0.00240564, throughput 9.71273K wps
[Epoch 74 Batch 60/162] avg loss 0.00210959, throughput 9.31269K wps
[Epoch 74 Batch 90/162] avg loss 0.0022461, throughput 9.38421K wps
[Epoch 74 Batch 120/162] avg loss 0.00245088, throughput 9.49291K wps
[Epoch 74 Batch 150/162] avg loss 0.0023129, throughput 9.33802K wps
Begin Testing...
[Epoch 74] train avg loss 0.00232866, dev acc 0.9144, dev avg loss 0.207504, throughput 9.437K wps
[Epoch 75 Batch 30/162] avg loss 0.00219913, throughput 9.70508K wps
[Epoch 75 Batch 60/162] avg loss 0.00230746, throughput 9.35817K wps
[Epoch 75 Batch 90/162] avg loss 0.00229709, throughput 9.47615K wps
[Epoch 75 Batch 120/162] avg loss 0.00241229, throughput 9.38008K wps
[Epoch 75 Batch 150/162] avg loss 0.00231995, throughput 9.43205K wps
Begin Testing...
[Epoch 75] train avg loss 0.00233382, dev acc 0.9144, dev avg loss 0.2076, throughput 9.46329K wps
[Epoch 76 Batch 30/162] avg loss 0.00269728, throughput 9.52326K wps
[Epoch 76 Batch 60/162] avg loss 0.00233437, throughput 9.4011K wps
[Epoch 76 Batch 90/162] avg loss 0.00212156, throughput 9.50492K wps
[Epoch 76 Batch 120/162] avg loss 0.00245929, throughput 9.32548K wps
[Epoch 76 Batch 150/162] avg loss 0.00199076, throughput 9.44869K wps
Begin Testing...
[Epoch 76] train avg loss 0.00232555, dev acc 0.9156, dev avg loss 0.207058, throughput 9.45389K wps
[Epoch 77 Batch 30/162] avg loss 0.0023459, throughput 9.59286K wps
[Epoch 77 Batch 60/162] avg loss 0.00216819, throughput 9.47974K wps
[Epoch 77 Batch 90/162] avg loss 0.00225828, throughput 9.31954K wps
[Epoch 77 Batch 120/162] avg loss 0.00233275, throughput 9.44167K wps
[Epoch 77 Batch 150/162] avg loss 0.00243333, throughput 9.35746K wps
Begin Testing...
[Epoch 77] train avg loss 0.00230908, dev acc 0.9156, dev avg loss 0.206201, throughput 9.44723K wps
[Epoch 78 Batch 30/162] avg loss 0.00229113, throughput 9.68876K wps
[Epoch 78 Batch 60/162] avg loss 0.00196828, throughput 9.38466K wps
[Epoch 78 Batch 90/162] avg loss 0.00245172, throughput 9.32568K wps
[Epoch 78 Batch 120/162] avg loss 0.00239058, throughput 9.38389K wps
[Epoch 78 Batch 150/162] avg loss 0.00209169, throughput 9.46135K wps
Begin Testing...
[Epoch 78] train avg loss 0.00221657, dev acc 0.9156, dev avg loss 0.206973, throughput 9.44583K wps
[Epoch 79 Batch 30/162] avg loss 0.0022081, throughput 9.57182K wps
[Epoch 79 Batch 60/162] avg loss 0.00244722, throughput 9.41165K wps
[Epoch 79 Batch 90/162] avg loss 0.00207158, throughput 9.38219K wps
[Epoch 79 Batch 120/162] avg loss 0.00234296, throughput 9.37796K wps
[Epoch 79 Batch 150/162] avg loss 0.00220557, throughput 9.45205K wps
Begin Testing...
[Epoch 79] train avg loss 0.00223564, dev acc 0.9167, dev avg loss 0.205641, throughput 9.42664K wps
[Epoch 80 Batch 30/162] avg loss 0.00218495, throughput 9.52886K wps
[Epoch 80 Batch 60/162] avg loss 0.00238412, throughput 9.34769K wps
[Epoch 80 Batch 90/162] avg loss 0.00205708, throughput 9.27918K wps
[Epoch 80 Batch 120/162] avg loss 0.00216558, throughput 9.36922K wps
[Epoch 80 Batch 150/162] avg loss 0.00213911, throughput 9.28894K wps
Begin Testing...
[Epoch 80] train avg loss 0.00216977, dev acc 0.9167, dev avg loss 0.206409, throughput 9.37191K wps
[Epoch 81 Batch 30/162] avg loss 0.00219461, throughput 9.5966K wps
[Epoch 81 Batch 60/162] avg loss 0.00233392, throughput 9.42277K wps
[Epoch 81 Batch 90/162] avg loss 0.0020885, throughput 9.61725K wps
[Epoch 81 Batch 120/162] avg loss 0.00215105, throughput 9.48584K wps
[Epoch 81 Batch 150/162] avg loss 0.00211223, throughput 9.32248K wps
Begin Testing...
[Epoch 81] train avg loss 0.00218494, dev acc 0.9167, dev avg loss 0.204692, throughput 9.50062K wps
[Epoch 82 Batch 30/162] avg loss 0.00209817, throughput 9.67439K wps
[Epoch 82 Batch 60/162] avg loss 0.00222351, throughput 9.3846K wps
[Epoch 82 Batch 90/162] avg loss 0.00210819, throughput 9.42394K wps
[Epoch 82 Batch 120/162] avg loss 0.00205875, throughput 9.3767K wps
[Epoch 82 Batch 150/162] avg loss 0.00206784, throughput 9.54912K wps
Begin Testing...
[Epoch 82] train avg loss 0.0020979, dev acc 0.9178, dev avg loss 0.204249, throughput 9.48434K wps
[Epoch 83 Batch 30/162] avg loss 0.00216141, throughput 9.69198K wps
[Epoch 83 Batch 60/162] avg loss 0.00197263, throughput 9.5139K wps
[Epoch 83 Batch 90/162] avg loss 0.00221968, throughput 9.57626K wps
[Epoch 83 Batch 120/162] avg loss 0.00208259, throughput 9.40631K wps
[Epoch 83 Batch 150/162] avg loss 0.00198753, throughput 9.41953K wps
Begin Testing...
[Epoch 83] train avg loss 0.00209728, dev acc 0.9178, dev avg loss 0.203932, throughput 9.5107K wps
[Epoch 84 Batch 30/162] avg loss 0.00212332, throughput 9.4803K wps
[Epoch 84 Batch 60/162] avg loss 0.00211027, throughput 9.61343K wps
[Epoch 84 Batch 90/162] avg loss 0.00221945, throughput 9.50269K wps
[Epoch 84 Batch 120/162] avg loss 0.00212179, throughput 9.31858K wps
[Epoch 84 Batch 150/162] avg loss 0.00196616, throughput 9.3493K wps
Begin Testing...
[Epoch 84] train avg loss 0.00210958, dev acc 0.9178, dev avg loss 0.203869, throughput 9.46579K wps
[Epoch 85 Batch 30/162] avg loss 0.00208976, throughput 9.55269K wps
[Epoch 85 Batch 60/162] avg loss 0.00201801, throughput 9.46874K wps
[Epoch 85 Batch 90/162] avg loss 0.0019091, throughput 9.35444K wps
[Epoch 85 Batch 120/162] avg loss 0.00200232, throughput 9.48897K wps
[Epoch 85 Batch 150/162] avg loss 0.00218931, throughput 9.43171K wps
Begin Testing...
[Epoch 85] train avg loss 0.00204443, dev acc 0.9178, dev avg loss 0.203934, throughput 9.43316K wps
[Epoch 86 Batch 30/162] avg loss 0.00211727, throughput 9.6974K wps
[Epoch 86 Batch 60/162] avg loss 0.00207113, throughput 9.30964K wps
[Epoch 86 Batch 90/162] avg loss 0.00185112, throughput 9.34813K wps
[Epoch 86 Batch 120/162] avg loss 0.00212271, throughput 9.3216K wps
[Epoch 86 Batch 150/162] avg loss 0.00202072, throughput 9.61068K wps
Begin Testing...
[Epoch 86] train avg loss 0.00202856, dev acc 0.9167, dev avg loss 0.206555, throughput 9.46183K wps
[Epoch 87 Batch 30/162] avg loss 0.00189716, throughput 9.60306K wps
[Epoch 87 Batch 60/162] avg loss 0.00204118, throughput 9.29589K wps
[Epoch 87 Batch 90/162] avg loss 0.00218356, throughput 9.51961K wps
[Epoch 87 Batch 120/162] avg loss 0.00182706, throughput 9.23602K wps
[Epoch 87 Batch 150/162] avg loss 0.00196395, throughput 9.37668K wps
Begin Testing...
[Epoch 87] train avg loss 0.00196859, dev acc 0.9178, dev avg loss 0.202876, throughput 9.41142K wps
[Epoch 88 Batch 30/162] avg loss 0.00204261, throughput 9.57811K wps
[Epoch 88 Batch 60/162] avg loss 0.00207047, throughput 9.47736K wps
[Epoch 88 Batch 90/162] avg loss 0.00178826, throughput 9.45543K wps
[Epoch 88 Batch 120/162] avg loss 0.00192106, throughput 9.35268K wps
[Epoch 88 Batch 150/162] avg loss 0.00213253, throughput 9.48186K wps
Begin Testing...
[Epoch 88] train avg loss 0.00198777, dev acc 0.9178, dev avg loss 0.203035, throughput 9.45533K wps
[Epoch 89 Batch 30/162] avg loss 0.00177821, throughput 9.58372K wps
[Epoch 89 Batch 60/162] avg loss 0.00192296, throughput 9.39022K wps
[Epoch 89 Batch 90/162] avg loss 0.00220993, throughput 9.49686K wps
[Epoch 89 Batch 120/162] avg loss 0.00193389, throughput 9.24021K wps
[Epoch 89 Batch 150/162] avg loss 0.00198891, throughput 9.34758K wps
Begin Testing...
[Epoch 89] train avg loss 0.00194793, dev acc 0.9178, dev avg loss 0.202888, throughput 9.40415K wps
[Epoch 90 Batch 30/162] avg loss 0.00193358, throughput 9.83959K wps
[Epoch 90 Batch 60/162] avg loss 0.00187825, throughput 9.54346K wps
[Epoch 90 Batch 90/162] avg loss 0.00191117, throughput 9.35081K wps
[Epoch 90 Batch 120/162] avg loss 0.00184486, throughput 9.39868K wps
[Epoch 90 Batch 150/162] avg loss 0.00194007, throughput 9.47053K wps
Begin Testing...
[Epoch 90] train avg loss 0.00189864, dev acc 0.9178, dev avg loss 0.202907, throughput 9.51493K wps
[Epoch 91 Batch 30/162] avg loss 0.00207203, throughput 9.56273K wps
[Epoch 91 Batch 60/162] avg loss 0.00165915, throughput 9.22933K wps
[Epoch 91 Batch 90/162] avg loss 0.00207815, throughput 9.42723K wps
[Epoch 91 Batch 120/162] avg loss 0.00176788, throughput 9.50385K wps
[Epoch 91 Batch 150/162] avg loss 0.00194098, throughput 9.26717K wps
Begin Testing...
[Epoch 91] train avg loss 0.00190179, dev acc 0.9189, dev avg loss 0.203068, throughput 9.41169K wps
[Epoch 92 Batch 30/162] avg loss 0.00178526, throughput 9.41938K wps
[Epoch 92 Batch 60/162] avg loss 0.00196837, throughput 9.45556K wps
[Epoch 92 Batch 90/162] avg loss 0.00196454, throughput 9.45827K wps
[Epoch 92 Batch 120/162] avg loss 0.00180546, throughput 9.23448K wps
[Epoch 92 Batch 150/162] avg loss 0.0017621, throughput 9.55964K wps
Begin Testing...
[Epoch 92] train avg loss 0.00184886, dev acc 0.9200, dev avg loss 0.202755, throughput 9.43996K wps
Observed Improvement.
Begin Testing...
[Epoch 93 Batch 30/162] avg loss 0.0018225, throughput 9.5948K wps
[Epoch 93 Batch 60/162] avg loss 0.00184558, throughput 9.23939K wps
[Epoch 93 Batch 90/162] avg loss 0.0019831, throughput 9.37298K wps
[Epoch 93 Batch 120/162] avg loss 0.00195239, throughput 9.61804K wps
[Epoch 93 Batch 150/162] avg loss 0.00172166, throughput 9.40221K wps
Begin Testing...
[Epoch 93] train avg loss 0.0018492, dev acc 0.9189, dev avg loss 0.203956, throughput 9.41678K wps
[Epoch 94 Batch 30/162] avg loss 0.00190348, throughput 9.48548K wps
[Epoch 94 Batch 60/162] avg loss 0.00194797, throughput 9.40303K wps
[Epoch 94 Batch 90/162] avg loss 0.00179099, throughput 9.3183K wps
[Epoch 94 Batch 120/162] avg loss 0.00163815, throughput 9.34593K wps
[Epoch 94 Batch 150/162] avg loss 0.00198078, throughput 9.5256K wps
Begin Testing...
[Epoch 94] train avg loss 0.00182133, dev acc 0.9178, dev avg loss 0.201901, throughput 9.43058K wps
[Epoch 95 Batch 30/162] avg loss 0.0018496, throughput 9.49513K wps
[Epoch 95 Batch 60/162] avg loss 0.00176679, throughput 9.32525K wps
[Epoch 95 Batch 90/162] avg loss 0.0017089, throughput 9.32987K wps
[Epoch 95 Batch 120/162] avg loss 0.00176831, throughput 9.29309K wps
[Epoch 95 Batch 150/162] avg loss 0.00176792, throughput 9.61112K wps
Begin Testing...
[Epoch 95] train avg loss 0.00175803, dev acc 0.9211, dev avg loss 0.202255, throughput 9.40555K wps
Observed Improvement.
Begin Testing...
[Epoch 96 Batch 30/162] avg loss 0.001919, throughput 9.48883K wps
[Epoch 96 Batch 60/162] avg loss 0.00177151, throughput 9.24487K wps
[Epoch 96 Batch 90/162] avg loss 0.00166347, throughput 9.38465K wps
[Epoch 96 Batch 120/162] avg loss 0.00189636, throughput 9.31054K wps
[Epoch 96 Batch 150/162] avg loss 0.00175136, throughput 9.59829K wps
Begin Testing...
[Epoch 96] train avg loss 0.00180056, dev acc 0.9189, dev avg loss 0.202253, throughput 9.40084K wps
[Epoch 97 Batch 30/162] avg loss 0.00179326, throughput 9.46509K wps
[Epoch 97 Batch 60/162] avg loss 0.00173355, throughput 9.37721K wps
[Epoch 97 Batch 90/162] avg loss 0.00154658, throughput 9.30914K wps
[Epoch 97 Batch 120/162] avg loss 0.00171731, throughput 9.29621K wps
[Epoch 97 Batch 150/162] avg loss 0.0017593, throughput 9.56154K wps
Begin Testing...
[Epoch 97] train avg loss 0.00170653, dev acc 0.9189, dev avg loss 0.201612, throughput 9.37815K wps
[Epoch 98 Batch 30/162] avg loss 0.00173192, throughput 9.62262K wps
[Epoch 98 Batch 60/162] avg loss 0.00164598, throughput 9.25017K wps
[Epoch 98 Batch 90/162] avg loss 0.00177925, throughput 9.42925K wps
[Epoch 98 Batch 120/162] avg loss 0.00181834, throughput 9.50292K wps
[Epoch 98 Batch 150/162] avg loss 0.00165632, throughput 9.37837K wps
Begin Testing...
[Epoch 98] train avg loss 0.00174675, dev acc 0.9178, dev avg loss 0.201147, throughput 9.43347K wps
[Epoch 99 Batch 30/162] avg loss 0.00167377, throughput 9.65808K wps
[Epoch 99 Batch 60/162] avg loss 0.00180385, throughput 9.31154K wps
[Epoch 99 Batch 90/162] avg loss 0.00147296, throughput 9.34596K wps
[Epoch 99 Batch 120/162] avg loss 0.00184987, throughput 9.33137K wps
[Epoch 99 Batch 150/162] avg loss 0.00167669, throughput 9.60396K wps
Begin Testing...
[Epoch 99] train avg loss 0.00168801, dev acc 0.9189, dev avg loss 0.202509, throughput 9.44778K wps
[Epoch 100 Batch 30/162] avg loss 0.00160823, throughput 9.46893K wps
[Epoch 100 Batch 60/162] avg loss 0.00167075, throughput 9.36293K wps
[Epoch 100 Batch 90/162] avg loss 0.00162697, throughput 9.53216K wps
[Epoch 100 Batch 120/162] avg loss 0.00166427, throughput 9.23275K wps
[Epoch 100 Batch 150/162] avg loss 0.00160112, throughput 9.37408K wps
Begin Testing...
[Epoch 100] train avg loss 0.00165907, dev acc 0.9200, dev avg loss 0.200766, throughput 9.38997K wps
[Epoch 101 Batch 30/162] avg loss 0.00166235, throughput 9.72319K wps
[Epoch 101 Batch 60/162] avg loss 0.00170181, throughput 9.3252K wps
[Epoch 101 Batch 90/162] avg loss 0.00162537, throughput 9.54429K wps
[Epoch 101 Batch 120/162] avg loss 0.00162783, throughput 9.42488K wps
[Epoch 101 Batch 150/162] avg loss 0.0016265, throughput 9.27651K wps
Begin Testing...
[Epoch 101] train avg loss 0.00165368, dev acc 0.9200, dev avg loss 0.201949, throughput 9.46257K wps
[Epoch 102 Batch 30/162] avg loss 0.00157725, throughput 9.64011K wps
[Epoch 102 Batch 60/162] avg loss 0.0017191, throughput 9.21539K wps
[Epoch 102 Batch 90/162] avg loss 0.00158635, throughput 9.53335K wps
[Epoch 102 Batch 120/162] avg loss 0.00172382, throughput 9.53554K wps
[Epoch 102 Batch 150/162] avg loss 0.00152106, throughput 9.35435K wps
Begin Testing...
[Epoch 102] train avg loss 0.00163151, dev acc 0.9211, dev avg loss 0.201249, throughput 9.44903K wps
Observed Improvement.
Begin Testing...
[Epoch 103 Batch 30/162] avg loss 0.0016471, throughput 9.43055K wps
[Epoch 103 Batch 60/162] avg loss 0.00152364, throughput 9.39166K wps
[Epoch 103 Batch 90/162] avg loss 0.00157243, throughput 9.44942K wps
[Epoch 103 Batch 120/162] avg loss 0.00154024, throughput 9.30904K wps
[Epoch 103 Batch 150/162] avg loss 0.00167516, throughput 9.42171K wps
Begin Testing...
[Epoch 103] train avg loss 0.00160252, dev acc 0.9211, dev avg loss 0.200536, throughput 9.42119K wps
Observed Improvement.
Begin Testing...
[Epoch 104 Batch 30/162] avg loss 0.0014871, throughput 9.7828K wps
[Epoch 104 Batch 60/162] avg loss 0.00166755, throughput 9.30856K wps
[Epoch 104 Batch 90/162] avg loss 0.00146944, throughput 9.30463K wps
[Epoch 104 Batch 120/162] avg loss 0.00148933, throughput 9.47154K wps
[Epoch 104 Batch 150/162] avg loss 0.00166115, throughput 9.49217K wps
Begin Testing...
[Epoch 104] train avg loss 0.0015517, dev acc 0.9222, dev avg loss 0.201214, throughput 9.44257K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/162] avg loss 0.00155719, throughput 9.62544K wps
[Epoch 105 Batch 60/162] avg loss 0.00155679, throughput 9.44923K wps
[Epoch 105 Batch 90/162] avg loss 0.00154308, throughput 9.39391K wps
[Epoch 105 Batch 120/162] avg loss 0.00153408, throughput 9.26192K wps