web-data/gluonnlp/logs/sentiment/Subj_static.log
Namespace(batch_size=50, data_name='Subj', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 120
Done! Tokenizing Time=0.24s, #Sentences=10000
SentimentNet(
  (embedding): Embedding(21326 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
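
Note: the shapes implied by the model summary above can be checked with a little arithmetic. The sketch below is not taken from the training script. It assumes that inputs are padded to the logged maximum length of 120 tokens, that the `Conv1D` layers use zero padding (none is printed in the summary), and that each `HybridLambda` reduces over the time axis so that every branch contributes one value per channel. The helper name `conv1d_out_len` is hypothetical.

```python
def conv1d_out_len(seq_len, kernel_size, stride=1, padding=0):
    """Output length of a 1-D convolution along the time axis."""
    return (seq_len + 2 * padding - kernel_size) // stride + 1


MAX_LEN = 120        # "maximum length (in tokens): 120" from the log
KERNELS = (3, 4, 5)  # kernel sizes of the three Conv1D branches
CHANNELS = 100       # output channels per branch (300 -> 100)

# Time steps left after each convolution, assuming no padding.
lengths = {k: conv1d_out_len(MAX_LEN, k) for k in KERNELS}

# If each branch is pooled over time (an assumption), the concatenated
# feature vector fed to the Dense layer has one entry per channel per branch.
feature_dim = CHANNELS * len(KERNELS)

print(lengths)
print(feature_dim)
```

Under these assumptions the `Dense(None -> 2, linear)` layer would infer an input size equal to `feature_dim` on the first forward pass.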
[Epoch 0 Batch 30/162] avg loss 0.0139927, throughput 0.563051K wps | |
[Epoch 0 Batch 60/162] avg loss 0.0138302, throughput 9.52509K wps | |
[Epoch 0 Batch 90/162] avg loss 0.0137475, throughput 9.45732K wps | |
[Epoch 0 Batch 120/162] avg loss 0.0136195, throughput 9.51905K wps | |
[Epoch 0 Batch 150/162] avg loss 0.0135821, throughput 9.46978K wps | |
Begin Testing... | |
[Epoch 0] train avg loss 0.0137352, dev acc 0.7733, dev avg loss 0.662958, throughput 2.41023K wps | |
Observed Improvement. | |
Begin Testing... | |
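
Note: the per-epoch summary lines in this log follow a fixed pattern, so they can be pulled out for plotting or for finding the best epoch. The following is a minimal sketch, not part of the original tooling. The function names and dictionary keys are hypothetical, and the regular expression is written against the line format seen in this log only.

```python
import re

# Matches per-epoch summary lines such as:
# [Epoch 0] train avg loss 0.0137352, dev acc 0.7733, dev avg loss 0.662958, throughput 2.41023K wps
# Per-batch lines ("[Epoch 0 Batch 30/162] ...") do not match.
EPOCH_RE = re.compile(
    r"\[Epoch (\d+)\] train avg loss ([\d.eE+-]+), dev acc ([\d.]+), "
    r"dev avg loss ([\d.eE+-]+), throughput ([\d.]+)K wps"
)


def parse_epoch_summaries(lines):
    """Return one dict per epoch summary line; all other lines are skipped."""
    rows = []
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            rows.append({
                "epoch": int(m.group(1)),
                "train_loss": float(m.group(2)),
                "dev_acc": float(m.group(3)),
                "dev_loss": float(m.group(4)),
                "kwps": float(m.group(5)),
            })
    return rows


def best_epoch(rows):
    """Row with the highest dev accuracy (the earliest one on ties)."""
    return max(rows, key=lambda r: r["dev_acc"])


sample = [
    "[Epoch 0 Batch 150/162] avg loss 0.0135821, throughput 9.46978K wps",
    "[Epoch 0] train avg loss 0.0137352, dev acc 0.7733, dev avg loss 0.662958, throughput 2.41023K wps",
    "[Epoch 1] train avg loss 0.013072, dev acc 0.6622, dev avg loss 0.63981, throughput 9.48049K wps",
]
rows = parse_epoch_summaries(sample)
print(best_epoch(rows)["epoch"])
```

Because the pattern is applied with `search`, trailing residue after `wps` on a line does not prevent a match.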
[Epoch 1 Batch 30/162] avg loss 0.0133297, throughput 9.67788K wps | |
[Epoch 1 Batch 60/162] avg loss 0.0131657, throughput 9.37649K wps | |
[Epoch 1 Batch 90/162] avg loss 0.0131255, throughput 9.49583K wps | |
[Epoch 1 Batch 120/162] avg loss 0.0129709, throughput 9.35736K wps | |
[Epoch 1 Batch 150/162] avg loss 0.0129759, throughput 9.46346K wps | |
Begin Testing... | |
[Epoch 1] train avg loss 0.013072, dev acc 0.6622, dev avg loss 0.63981, throughput 9.48049K wps | |
[Epoch 2 Batch 30/162] avg loss 0.0128046, throughput 9.62158K wps | |
[Epoch 2 Batch 60/162] avg loss 0.0125604, throughput 9.39331K wps | |
[Epoch 2 Batch 90/162] avg loss 0.0124079, throughput 9.55848K wps | |
[Epoch 2 Batch 120/162] avg loss 0.0122335, throughput 9.54988K wps | |
[Epoch 2 Batch 150/162] avg loss 0.0122206, throughput 9.68006K wps | |
Begin Testing... | |
[Epoch 2] train avg loss 0.0124286, dev acc 0.8478, dev avg loss 0.600085, throughput 9.55656K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 3 Batch 30/162] avg loss 0.0119629, throughput 9.74152K wps | |
[Epoch 3 Batch 60/162] avg loss 0.0119284, throughput 9.48386K wps | |
[Epoch 3 Batch 90/162] avg loss 0.0117038, throughput 9.53708K wps | |
[Epoch 3 Batch 120/162] avg loss 0.0116698, throughput 9.333K wps | |
[Epoch 3 Batch 150/162] avg loss 0.0113905, throughput 9.67526K wps | |
Begin Testing... | |
[Epoch 3] train avg loss 0.011693, dev acc 0.8567, dev avg loss 0.564077, throughput 9.52981K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 4 Batch 30/162] avg loss 0.0111505, throughput 9.79962K wps | |
[Epoch 4 Batch 60/162] avg loss 0.0109733, throughput 9.69988K wps | |
[Epoch 4 Batch 90/162] avg loss 0.0109122, throughput 9.54538K wps | |
[Epoch 4 Batch 120/162] avg loss 0.01089, throughput 9.43827K wps | |
[Epoch 4 Batch 150/162] avg loss 0.0107541, throughput 9.62589K wps | |
Begin Testing... | |
[Epoch 4] train avg loss 0.0108868, dev acc 0.8678, dev avg loss 0.522418, throughput 9.6274K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 5 Batch 30/162] avg loss 0.0104188, throughput 9.71097K wps | |
[Epoch 5 Batch 60/162] avg loss 0.0102718, throughput 9.58992K wps | |
[Epoch 5 Batch 90/162] avg loss 0.00994886, throughput 9.49034K wps | |
[Epoch 5 Batch 120/162] avg loss 0.00980264, throughput 9.46891K wps | |
[Epoch 5 Batch 150/162] avg loss 0.00998378, throughput 9.70823K wps | |
Begin Testing... | |
[Epoch 5] train avg loss 0.0100641, dev acc 0.8722, dev avg loss 0.482372, throughput 9.60097K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 6 Batch 30/162] avg loss 0.00960078, throughput 9.66465K wps | |
[Epoch 6 Batch 60/162] avg loss 0.00951817, throughput 9.49316K wps | |
[Epoch 6 Batch 90/162] avg loss 0.00915172, throughput 9.61593K wps | |
[Epoch 6 Batch 120/162] avg loss 0.00909617, throughput 9.456K wps | |
[Epoch 6 Batch 150/162] avg loss 0.00920177, throughput 9.35415K wps | |
Begin Testing... | |
[Epoch 6] train avg loss 0.00926776, dev acc 0.8689, dev avg loss 0.447398, throughput 9.5075K wps | |
[Epoch 7 Batch 30/162] avg loss 0.00898816, throughput 9.71807K wps | |
[Epoch 7 Batch 60/162] avg loss 0.00859521, throughput 9.54446K wps | |
[Epoch 7 Batch 90/162] avg loss 0.00854635, throughput 9.38923K wps | |
[Epoch 7 Batch 120/162] avg loss 0.0082965, throughput 9.61768K wps | |
[Epoch 7 Batch 150/162] avg loss 0.00855405, throughput 9.64452K wps | |
Begin Testing... | |
[Epoch 7] train avg loss 0.00859653, dev acc 0.8722, dev avg loss 0.417487, throughput 9.58623K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 8 Batch 30/162] avg loss 0.00807569, throughput 9.63364K wps | |
[Epoch 8 Batch 60/162] avg loss 0.0081229, throughput 9.54556K wps | |
[Epoch 8 Batch 90/162] avg loss 0.00795082, throughput 9.47762K wps | |
[Epoch 8 Batch 120/162] avg loss 0.00809858, throughput 9.34018K wps | |
[Epoch 8 Batch 150/162] avg loss 0.00807176, throughput 9.64029K wps | |
Begin Testing... | |
[Epoch 8] train avg loss 0.00802045, dev acc 0.8700, dev avg loss 0.394164, throughput 9.51211K wps | |
[Epoch 9 Batch 30/162] avg loss 0.00765048, throughput 9.68748K wps | |
[Epoch 9 Batch 60/162] avg loss 0.00766964, throughput 9.60219K wps | |
[Epoch 9 Batch 90/162] avg loss 0.00778438, throughput 9.46143K wps | |
[Epoch 9 Batch 120/162] avg loss 0.00752026, throughput 9.51964K wps | |
[Epoch 9 Batch 150/162] avg loss 0.0073391, throughput 9.57476K wps | |
Begin Testing... | |
[Epoch 9] train avg loss 0.0075777, dev acc 0.8733, dev avg loss 0.376142, throughput 9.548K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 10 Batch 30/162] avg loss 0.00717406, throughput 9.68558K wps | |
[Epoch 10 Batch 60/162] avg loss 0.00721908, throughput 9.46155K wps | |
[Epoch 10 Batch 90/162] avg loss 0.00730357, throughput 9.58996K wps | |
[Epoch 10 Batch 120/162] avg loss 0.00733003, throughput 9.44458K wps | |
[Epoch 10 Batch 150/162] avg loss 0.00701399, throughput 9.40509K wps | |
Begin Testing... | |
[Epoch 10] train avg loss 0.00718916, dev acc 0.8756, dev avg loss 0.363214, throughput 9.51757K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 11 Batch 30/162] avg loss 0.00678275, throughput 9.54591K wps | |
[Epoch 11 Batch 60/162] avg loss 0.00700366, throughput 9.4893K wps | |
[Epoch 11 Batch 90/162] avg loss 0.00689843, throughput 9.4057K wps | |
[Epoch 11 Batch 120/162] avg loss 0.0070885, throughput 9.4621K wps | |
[Epoch 11 Batch 150/162] avg loss 0.0067608, throughput 9.6145K wps | |
Begin Testing... | |
[Epoch 11] train avg loss 0.00691345, dev acc 0.8778, dev avg loss 0.352661, throughput 9.48861K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 12 Batch 30/162] avg loss 0.00683382, throughput 9.78665K wps | |
[Epoch 12 Batch 60/162] avg loss 0.00662973, throughput 9.41849K wps | |
[Epoch 12 Batch 90/162] avg loss 0.0062095, throughput 9.52567K wps | |
[Epoch 12 Batch 120/162] avg loss 0.00622745, throughput 9.63293K wps | |
[Epoch 12 Batch 150/162] avg loss 0.00668349, throughput 9.51729K wps | |
Begin Testing... | |
[Epoch 12] train avg loss 0.00650781, dev acc 0.8811, dev avg loss 0.341088, throughput 9.57071K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 13 Batch 30/162] avg loss 0.0066457, throughput 9.55024K wps | |
[Epoch 13 Batch 60/162] avg loss 0.0066435, throughput 9.52019K wps | |
[Epoch 13 Batch 90/162] avg loss 0.00622139, throughput 9.42255K wps | |
[Epoch 13 Batch 120/162] avg loss 0.00649197, throughput 9.44282K wps | |
[Epoch 13 Batch 150/162] avg loss 0.00611179, throughput 9.57489K wps | |
Begin Testing... | |
[Epoch 13] train avg loss 0.00644013, dev acc 0.8856, dev avg loss 0.333625, throughput 9.49058K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 14 Batch 30/162] avg loss 0.00632254, throughput 9.62778K wps | |
[Epoch 14 Batch 60/162] avg loss 0.00630582, throughput 9.36483K wps | |
[Epoch 14 Batch 90/162] avg loss 0.00641791, throughput 9.35305K wps | |
[Epoch 14 Batch 120/162] avg loss 0.00604265, throughput 9.41917K wps | |
[Epoch 14 Batch 150/162] avg loss 0.00612689, throughput 9.57848K wps | |
Begin Testing... | |
[Epoch 14] train avg loss 0.00624127, dev acc 0.8867, dev avg loss 0.326977, throughput 9.44655K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 15 Batch 30/162] avg loss 0.00631406, throughput 9.60864K wps | |
[Epoch 15 Batch 60/162] avg loss 0.00592309, throughput 9.57579K wps | |
[Epoch 15 Batch 90/162] avg loss 0.00617731, throughput 9.50213K wps | |
[Epoch 15 Batch 120/162] avg loss 0.00596338, throughput 9.37959K wps | |
[Epoch 15 Batch 150/162] avg loss 0.00599038, throughput 9.4467K wps | |
Begin Testing... | |
[Epoch 15] train avg loss 0.00605979, dev acc 0.8889, dev avg loss 0.320968, throughput 9.48432K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 16 Batch 30/162] avg loss 0.00569059, throughput 9.56589K wps | |
[Epoch 16 Batch 60/162] avg loss 0.00582947, throughput 9.40569K wps | |
[Epoch 16 Batch 90/162] avg loss 0.0062145, throughput 9.26223K wps | |
[Epoch 16 Batch 120/162] avg loss 0.00600952, throughput 9.43629K wps | |
[Epoch 16 Batch 150/162] avg loss 0.00606738, throughput 9.49933K wps | |
Begin Testing... | |
[Epoch 16] train avg loss 0.00593747, dev acc 0.8867, dev avg loss 0.315545, throughput 9.42478K wps | |
[Epoch 17 Batch 30/162] avg loss 0.00557341, throughput 9.58483K wps | |
[Epoch 17 Batch 60/162] avg loss 0.00592358, throughput 9.39695K wps | |
[Epoch 17 Batch 90/162] avg loss 0.00622066, throughput 9.32288K wps | |
[Epoch 17 Batch 120/162] avg loss 0.00554141, throughput 9.35937K wps | |
[Epoch 17 Batch 150/162] avg loss 0.00607266, throughput 9.64441K wps | |
Begin Testing... | |
[Epoch 17] train avg loss 0.00583486, dev acc 0.8889, dev avg loss 0.311482, throughput 9.46071K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 18 Batch 30/162] avg loss 0.00559593, throughput 9.73618K wps | |
[Epoch 18 Batch 60/162] avg loss 0.00565354, throughput 9.46467K wps | |
[Epoch 18 Batch 90/162] avg loss 0.00555657, throughput 9.26849K wps | |
[Epoch 18 Batch 120/162] avg loss 0.00615792, throughput 9.34718K wps | |
[Epoch 18 Batch 150/162] avg loss 0.00531747, throughput 9.45421K wps | |
Begin Testing... | |
[Epoch 18] train avg loss 0.0056676, dev acc 0.8889, dev avg loss 0.306577, throughput 9.44157K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 19 Batch 30/162] avg loss 0.00553821, throughput 9.64454K wps | |
[Epoch 19 Batch 60/162] avg loss 0.00530412, throughput 9.39186K wps | |
[Epoch 19 Batch 90/162] avg loss 0.00555156, throughput 9.39406K wps | |
[Epoch 19 Batch 120/162] avg loss 0.00581841, throughput 9.43238K wps | |
[Epoch 19 Batch 150/162] avg loss 0.00549919, throughput 9.34169K wps | |
Begin Testing... | |
[Epoch 19] train avg loss 0.00553728, dev acc 0.8911, dev avg loss 0.302825, throughput 9.43016K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 20 Batch 30/162] avg loss 0.00521793, throughput 9.78528K wps | |
[Epoch 20 Batch 60/162] avg loss 0.00523199, throughput 9.59378K wps | |
[Epoch 20 Batch 90/162] avg loss 0.00544018, throughput 9.55586K wps | |
[Epoch 20 Batch 120/162] avg loss 0.00572085, throughput 9.62187K wps | |
[Epoch 20 Batch 150/162] avg loss 0.00584005, throughput 9.55164K wps | |
Begin Testing... | |
[Epoch 20] train avg loss 0.00545546, dev acc 0.8922, dev avg loss 0.298785, throughput 9.61922K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 21 Batch 30/162] avg loss 0.00563956, throughput 9.55574K wps | |
[Epoch 21 Batch 60/162] avg loss 0.00519013, throughput 9.38313K wps | |
[Epoch 21 Batch 90/162] avg loss 0.0051913, throughput 9.47522K wps | |
[Epoch 21 Batch 120/162] avg loss 0.00510079, throughput 9.50785K wps | |
[Epoch 21 Batch 150/162] avg loss 0.00502946, throughput 9.27265K wps | |
Begin Testing... | |
[Epoch 21] train avg loss 0.00524858, dev acc 0.8956, dev avg loss 0.295013, throughput 9.43116K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 22 Batch 30/162] avg loss 0.0052942, throughput 9.69388K wps | |
[Epoch 22 Batch 60/162] avg loss 0.00522257, throughput 9.48309K wps | |
[Epoch 22 Batch 90/162] avg loss 0.00504285, throughput 9.44366K wps | |
[Epoch 22 Batch 120/162] avg loss 0.00495531, throughput 9.39026K wps | |
[Epoch 22 Batch 150/162] avg loss 0.00539518, throughput 9.52251K wps | |
Begin Testing... | |
[Epoch 22] train avg loss 0.00516369, dev acc 0.8967, dev avg loss 0.291609, throughput 9.51514K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 23 Batch 30/162] avg loss 0.00514018, throughput 9.64495K wps | |
[Epoch 23 Batch 60/162] avg loss 0.00501121, throughput 9.4804K wps | |
[Epoch 23 Batch 90/162] avg loss 0.00491868, throughput 9.47652K wps | |
[Epoch 23 Batch 120/162] avg loss 0.00524275, throughput 9.41957K wps | |
[Epoch 23 Batch 150/162] avg loss 0.0052641, throughput 9.45964K wps | |
Begin Testing... | |
[Epoch 23] train avg loss 0.00508836, dev acc 0.8922, dev avg loss 0.289848, throughput 9.48341K wps | |
[Epoch 24 Batch 30/162] avg loss 0.00510846, throughput 9.47826K wps | |
[Epoch 24 Batch 60/162] avg loss 0.00495213, throughput 9.58132K wps | |
[Epoch 24 Batch 90/162] avg loss 0.00466404, throughput 9.34193K wps | |
[Epoch 24 Batch 120/162] avg loss 0.00530204, throughput 9.60358K wps | |
[Epoch 24 Batch 150/162] avg loss 0.0047967, throughput 9.47435K wps | |
Begin Testing... | |
[Epoch 24] train avg loss 0.00494935, dev acc 0.8956, dev avg loss 0.285739, throughput 9.50178K wps | |
[Epoch 25 Batch 30/162] avg loss 0.00484253, throughput 9.66595K wps | |
[Epoch 25 Batch 60/162] avg loss 0.00514943, throughput 9.43817K wps | |
[Epoch 25 Batch 90/162] avg loss 0.00455663, throughput 9.34473K wps | |
[Epoch 25 Batch 120/162] avg loss 0.00507725, throughput 9.44696K wps | |
[Epoch 25 Batch 150/162] avg loss 0.00469302, throughput 9.48272K wps | |
Begin Testing... | |
[Epoch 25] train avg loss 0.00487417, dev acc 0.8956, dev avg loss 0.283306, throughput 9.47791K wps | |
[Epoch 26 Batch 30/162] avg loss 0.00503558, throughput 9.49072K wps | |
[Epoch 26 Batch 60/162] avg loss 0.00440408, throughput 9.44584K wps | |
[Epoch 26 Batch 90/162] avg loss 0.00464679, throughput 9.52314K wps | |
[Epoch 26 Batch 120/162] avg loss 0.00515276, throughput 9.33639K wps | |
[Epoch 26 Batch 150/162] avg loss 0.00482135, throughput 9.45824K wps | |
Begin Testing... | |
[Epoch 26] train avg loss 0.00482186, dev acc 0.8978, dev avg loss 0.280569, throughput 9.45502K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 27 Batch 30/162] avg loss 0.0045356, throughput 9.64995K wps | |
[Epoch 27 Batch 60/162] avg loss 0.00455278, throughput 9.36593K wps | |
[Epoch 27 Batch 90/162] avg loss 0.00482514, throughput 9.35764K wps | |
[Epoch 27 Batch 120/162] avg loss 0.00492322, throughput 9.25941K wps | |
[Epoch 27 Batch 150/162] avg loss 0.00494774, throughput 9.46653K wps | |
Begin Testing... | |
[Epoch 27] train avg loss 0.00475806, dev acc 0.8967, dev avg loss 0.2784, throughput 9.40425K wps | |
[Epoch 28 Batch 30/162] avg loss 0.00434146, throughput 9.67811K wps | |
[Epoch 28 Batch 60/162] avg loss 0.00488754, throughput 9.60404K wps | |
[Epoch 28 Batch 90/162] avg loss 0.00423977, throughput 9.34693K wps | |
[Epoch 28 Batch 120/162] avg loss 0.00503577, throughput 9.59636K wps | |
[Epoch 28 Batch 150/162] avg loss 0.00460634, throughput 9.33307K wps | |
Begin Testing... | |
[Epoch 28] train avg loss 0.00464542, dev acc 0.8978, dev avg loss 0.275934, throughput 9.50684K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 29 Batch 30/162] avg loss 0.00482399, throughput 9.56692K wps | |
[Epoch 29 Batch 60/162] avg loss 0.00454589, throughput 9.25683K wps | |
[Epoch 29 Batch 90/162] avg loss 0.00435749, throughput 9.57799K wps | |
[Epoch 29 Batch 120/162] avg loss 0.00478881, throughput 9.4965K wps | |
[Epoch 29 Batch 150/162] avg loss 0.00409712, throughput 9.51701K wps | |
Begin Testing... | |
[Epoch 29] train avg loss 0.00448683, dev acc 0.8978, dev avg loss 0.273418, throughput 9.4819K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 30 Batch 30/162] avg loss 0.00478067, throughput 9.80287K wps | |
[Epoch 30 Batch 60/162] avg loss 0.00419504, throughput 9.3498K wps | |
[Epoch 30 Batch 90/162] avg loss 0.00431353, throughput 9.42033K wps | |
[Epoch 30 Batch 120/162] avg loss 0.00470617, throughput 9.29221K wps | |
[Epoch 30 Batch 150/162] avg loss 0.00430479, throughput 9.39394K wps | |
Begin Testing... | |
[Epoch 30] train avg loss 0.00446317, dev acc 0.9000, dev avg loss 0.272378, throughput 9.45538K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 31 Batch 30/162] avg loss 0.00446604, throughput 9.61072K wps | |
[Epoch 31 Batch 60/162] avg loss 0.00452568, throughput 9.51161K wps | |
[Epoch 31 Batch 90/162] avg loss 0.00415589, throughput 9.45126K wps | |
[Epoch 31 Batch 120/162] avg loss 0.00450758, throughput 9.564K wps | |
[Epoch 31 Batch 150/162] avg loss 0.00432665, throughput 9.42179K wps | |
Begin Testing... | |
[Epoch 31] train avg loss 0.00440475, dev acc 0.8967, dev avg loss 0.269901, throughput 9.51569K wps | |
[Epoch 32 Batch 30/162] avg loss 0.00433878, throughput 9.51472K wps | |
[Epoch 32 Batch 60/162] avg loss 0.00443535, throughput 9.47296K wps | |
[Epoch 32 Batch 90/162] avg loss 0.00449091, throughput 9.45848K wps | |
[Epoch 32 Batch 120/162] avg loss 0.00428892, throughput 9.28994K wps | |
[Epoch 32 Batch 150/162] avg loss 0.00416887, throughput 9.31816K wps | |
Begin Testing... | |
[Epoch 32] train avg loss 0.00433306, dev acc 0.8989, dev avg loss 0.268951, throughput 9.4196K wps | |
[Epoch 33 Batch 30/162] avg loss 0.00421211, throughput 9.58092K wps | |
[Epoch 33 Batch 60/162] avg loss 0.00408083, throughput 9.63121K wps | |
[Epoch 33 Batch 90/162] avg loss 0.00463901, throughput 9.34245K wps | |
[Epoch 33 Batch 120/162] avg loss 0.0044146, throughput 9.38966K wps | |
[Epoch 33 Batch 150/162] avg loss 0.00431222, throughput 9.31115K wps | |
Begin Testing... | |
[Epoch 33] train avg loss 0.00428337, dev acc 0.8989, dev avg loss 0.268036, throughput 9.46413K wps | |
[Epoch 34 Batch 30/162] avg loss 0.00408815, throughput 9.47671K wps | |
[Epoch 34 Batch 60/162] avg loss 0.00419916, throughput 9.39808K wps | |
[Epoch 34 Batch 90/162] avg loss 0.00407953, throughput 9.35001K wps | |
[Epoch 34 Batch 120/162] avg loss 0.0037462, throughput 9.5343K wps | |
[Epoch 34 Batch 150/162] avg loss 0.00441781, throughput 9.38551K wps | |
Begin Testing... | |
[Epoch 34] train avg loss 0.00414092, dev acc 0.8989, dev avg loss 0.264133, throughput 9.41777K wps | |
[Epoch 35 Batch 30/162] avg loss 0.00430011, throughput 9.60308K wps | |
[Epoch 35 Batch 60/162] avg loss 0.0040314, throughput 9.43085K wps | |
[Epoch 35 Batch 90/162] avg loss 0.00414802, throughput 9.59724K wps | |
[Epoch 35 Batch 120/162] avg loss 0.00419202, throughput 9.41857K wps | |
[Epoch 35 Batch 150/162] avg loss 0.00377472, throughput 9.55167K wps | |
Begin Testing... | |
[Epoch 35] train avg loss 0.00408505, dev acc 0.8978, dev avg loss 0.262154, throughput 9.52085K wps | |
[Epoch 36 Batch 30/162] avg loss 0.00398365, throughput 9.4616K wps | |
[Epoch 36 Batch 60/162] avg loss 0.00406087, throughput 9.57624K wps | |
[Epoch 36 Batch 90/162] avg loss 0.00419579, throughput 9.36475K wps | |
[Epoch 36 Batch 120/162] avg loss 0.00370225, throughput 9.47195K wps | |
[Epoch 36 Batch 150/162] avg loss 0.00422434, throughput 9.56823K wps | |
Begin Testing... | |
[Epoch 36] train avg loss 0.00403643, dev acc 0.9022, dev avg loss 0.26199, throughput 9.48986K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 37 Batch 30/162] avg loss 0.00369248, throughput 9.70319K wps | |
[Epoch 37 Batch 60/162] avg loss 0.00399778, throughput 9.23667K wps | |
[Epoch 37 Batch 90/162] avg loss 0.00388887, throughput 9.34555K wps | |
[Epoch 37 Batch 120/162] avg loss 0.0043826, throughput 9.49703K wps | |
[Epoch 37 Batch 150/162] avg loss 0.00427731, throughput 9.60465K wps | |
Begin Testing... | |
[Epoch 37] train avg loss 0.00402125, dev acc 0.9000, dev avg loss 0.259542, throughput 9.4719K wps | |
[Epoch 38 Batch 30/162] avg loss 0.00380891, throughput 9.49633K wps | |
[Epoch 38 Batch 60/162] avg loss 0.0040139, throughput 9.49164K wps | |
[Epoch 38 Batch 90/162] avg loss 0.00386191, throughput 9.35476K wps | |
[Epoch 38 Batch 120/162] avg loss 0.00370861, throughput 9.45103K wps | |
[Epoch 38 Batch 150/162] avg loss 0.0042393, throughput 9.26727K wps | |
Begin Testing... | |
[Epoch 38] train avg loss 0.0039613, dev acc 0.9011, dev avg loss 0.258285, throughput 9.40156K wps | |
[Epoch 39 Batch 30/162] avg loss 0.00401124, throughput 9.54127K wps | |
[Epoch 39 Batch 60/162] avg loss 0.00390607, throughput 9.31581K wps | |
[Epoch 39 Batch 90/162] avg loss 0.00373968, throughput 9.27547K wps | |
[Epoch 39 Batch 120/162] avg loss 0.00396615, throughput 9.41871K wps | |
[Epoch 39 Batch 150/162] avg loss 0.00379264, throughput 9.39576K wps | |
Begin Testing... | |
[Epoch 39] train avg loss 0.00390702, dev acc 0.8989, dev avg loss 0.256657, throughput 9.39427K wps | |
[Epoch 40 Batch 30/162] avg loss 0.00393162, throughput 9.67434K wps | |
[Epoch 40 Batch 60/162] avg loss 0.0037819, throughput 9.57699K wps | |
[Epoch 40 Batch 90/162] avg loss 0.00377915, throughput 9.46876K wps | |
[Epoch 40 Batch 120/162] avg loss 0.00373343, throughput 9.45007K wps | |
[Epoch 40 Batch 150/162] avg loss 0.00380252, throughput 9.26224K wps | |
Begin Testing... | |
[Epoch 40] train avg loss 0.00381492, dev acc 0.8989, dev avg loss 0.255187, throughput 9.4706K wps | |
[Epoch 41 Batch 30/162] avg loss 0.00399802, throughput 9.49814K wps | |
[Epoch 41 Batch 60/162] avg loss 0.00345699, throughput 9.29504K wps | |
[Epoch 41 Batch 90/162] avg loss 0.00360227, throughput 9.45748K wps | |
[Epoch 41 Batch 120/162] avg loss 0.00398935, throughput 9.41777K wps | |
[Epoch 41 Batch 150/162] avg loss 0.00378132, throughput 9.27861K wps | |
Begin Testing... | |
[Epoch 41] train avg loss 0.00374704, dev acc 0.8989, dev avg loss 0.25391, throughput 9.38096K wps | |
[Epoch 42 Batch 30/162] avg loss 0.00357563, throughput 9.54487K wps | |
[Epoch 42 Batch 60/162] avg loss 0.00361566, throughput 9.62968K wps | |
[Epoch 42 Batch 90/162] avg loss 0.00354063, throughput 9.49293K wps | |
[Epoch 42 Batch 120/162] avg loss 0.00361762, throughput 9.44148K wps | |
[Epoch 42 Batch 150/162] avg loss 0.00418969, throughput 9.36544K wps | |
Begin Testing... | |
[Epoch 42] train avg loss 0.00373444, dev acc 0.9000, dev avg loss 0.25275, throughput 9.50581K wps | |
[Epoch 43 Batch 30/162] avg loss 0.00370573, throughput 9.5242K wps | |
[Epoch 43 Batch 60/162] avg loss 0.00373772, throughput 9.43488K wps | |
[Epoch 43 Batch 90/162] avg loss 0.00357915, throughput 9.30858K wps | |
[Epoch 43 Batch 120/162] avg loss 0.00372048, throughput 9.3571K wps | |
[Epoch 43 Batch 150/162] avg loss 0.00356147, throughput 9.36751K wps | |
Begin Testing... | |
[Epoch 43] train avg loss 0.00365763, dev acc 0.9011, dev avg loss 0.251234, throughput 9.39331K wps | |
[Epoch 44 Batch 30/162] avg loss 0.00378978, throughput 9.61609K wps | |
[Epoch 44 Batch 60/162] avg loss 0.00358534, throughput 9.39665K wps | |
[Epoch 44 Batch 90/162] avg loss 0.00380747, throughput 9.63629K wps | |
[Epoch 44 Batch 120/162] avg loss 0.00341006, throughput 9.35279K wps | |
[Epoch 44 Batch 150/162] avg loss 0.00383862, throughput 9.451K wps | |
Begin Testing... | |
[Epoch 44] train avg loss 0.00362973, dev acc 0.9011, dev avg loss 0.250194, throughput 9.48177K wps | |
[Epoch 45 Batch 30/162] avg loss 0.00359368, throughput 9.65537K wps | |
[Epoch 45 Batch 60/162] avg loss 0.00378111, throughput 9.51237K wps | |
[Epoch 45 Batch 90/162] avg loss 0.00363177, throughput 9.39398K wps | |
[Epoch 45 Batch 120/162] avg loss 0.00371031, throughput 9.26322K wps | |
[Epoch 45 Batch 150/162] avg loss 0.00336906, throughput 9.49175K wps | |
Begin Testing... | |
[Epoch 45] train avg loss 0.00359808, dev acc 0.9011, dev avg loss 0.249553, throughput 9.46465K wps | |
[Epoch 46 Batch 30/162] avg loss 0.00344127, throughput 9.72143K wps | |
[Epoch 46 Batch 60/162] avg loss 0.00328195, throughput 9.34182K wps | |
[Epoch 46 Batch 90/162] avg loss 0.00362182, throughput 9.33831K wps | |
[Epoch 46 Batch 120/162] avg loss 0.00373991, throughput 9.35089K wps | |
[Epoch 46 Batch 150/162] avg loss 0.00354795, throughput 9.44608K wps | |
Begin Testing... | |
[Epoch 46] train avg loss 0.00350159, dev acc 0.9056, dev avg loss 0.249423, throughput 9.43889K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 47 Batch 30/162] avg loss 0.00342204, throughput 9.64541K wps | |
[Epoch 47 Batch 60/162] avg loss 0.00347048, throughput 9.31912K wps | |
[Epoch 47 Batch 90/162] avg loss 0.00336565, throughput 9.37817K wps | |
[Epoch 47 Batch 120/162] avg loss 0.00349036, throughput 9.57605K wps | |
[Epoch 47 Batch 150/162] avg loss 0.00336753, throughput 9.44081K wps | |
Begin Testing... | |
[Epoch 47] train avg loss 0.00339945, dev acc 0.9067, dev avg loss 0.248338, throughput 9.44925K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 48 Batch 30/162] avg loss 0.00350012, throughput 9.44065K wps | |
[Epoch 48 Batch 60/162] avg loss 0.00330553, throughput 9.36003K wps | |
[Epoch 48 Batch 90/162] avg loss 0.00328175, throughput 9.31135K wps | |
[Epoch 48 Batch 120/162] avg loss 0.00379125, throughput 9.6338K wps | |
[Epoch 48 Batch 150/162] avg loss 0.00353947, throughput 9.38186K wps | |
Begin Testing... | |
[Epoch 48] train avg loss 0.00342202, dev acc 0.9067, dev avg loss 0.247824, throughput 9.43725K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 49 Batch 30/162] avg loss 0.00321639, throughput 9.54198K wps | |
[Epoch 49 Batch 60/162] avg loss 0.00353585, throughput 9.35214K wps | |
[Epoch 49 Batch 90/162] avg loss 0.00326472, throughput 9.29211K wps | |
[Epoch 49 Batch 120/162] avg loss 0.00307165, throughput 9.37366K wps | |
[Epoch 49 Batch 150/162] avg loss 0.00346831, throughput 9.27105K wps | |
Begin Testing... | |
[Epoch 49] train avg loss 0.00331722, dev acc 0.9022, dev avg loss 0.245726, throughput 9.35504K wps | |
[Epoch 50 Batch 30/162] avg loss 0.00318394, throughput 9.53669K wps | |
[Epoch 50 Batch 60/162] avg loss 0.00343808, throughput 9.49831K wps | |
[Epoch 50 Batch 90/162] avg loss 0.00327675, throughput 9.54207K wps | |
[Epoch 50 Batch 120/162] avg loss 0.00381823, throughput 9.49602K wps | |
[Epoch 50 Batch 150/162] avg loss 0.00297331, throughput 9.33471K wps | |
Begin Testing... | |
[Epoch 50] train avg loss 0.00333092, dev acc 0.9022, dev avg loss 0.244905, throughput 9.47841K wps | |
[Epoch 51 Batch 30/162] avg loss 0.00321521, throughput 9.52019K wps | |
[Epoch 51 Batch 60/162] avg loss 0.0032855, throughput 9.41971K wps | |
[Epoch 51 Batch 90/162] avg loss 0.00326702, throughput 9.27154K wps | |
[Epoch 51 Batch 120/162] avg loss 0.00322066, throughput 9.26474K wps | |
[Epoch 51 Batch 150/162] avg loss 0.00306519, throughput 9.47363K wps | |
Begin Testing... | |
[Epoch 51] train avg loss 0.00322283, dev acc 0.9022, dev avg loss 0.244081, throughput 9.39416K wps | |
[Epoch 52 Batch 30/162] avg loss 0.00318501, throughput 9.45487K wps | |
[Epoch 52 Batch 60/162] avg loss 0.00311068, throughput 9.43871K wps | |
[Epoch 52 Batch 90/162] avg loss 0.00349898, throughput 9.52401K wps | |
[Epoch 52 Batch 120/162] avg loss 0.00307708, throughput 9.49713K wps | |
[Epoch 52 Batch 150/162] avg loss 0.00310367, throughput 9.58282K wps | |
Begin Testing... | |
[Epoch 52] train avg loss 0.00319324, dev acc 0.9078, dev avg loss 0.244634, throughput 9.4922K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 53 Batch 30/162] avg loss 0.00330412, throughput 9.5568K wps | |
[Epoch 53 Batch 60/162] avg loss 0.00340498, throughput 9.38188K wps | |
[Epoch 53 Batch 90/162] avg loss 0.002726, throughput 9.33004K wps | |
[Epoch 53 Batch 120/162] avg loss 0.00325923, throughput 9.30984K wps | |
[Epoch 53 Batch 150/162] avg loss 0.00311163, throughput 9.50999K wps | |
Begin Testing... | |
[Epoch 53] train avg loss 0.00313863, dev acc 0.9078, dev avg loss 0.24387, throughput 9.40989K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 54 Batch 30/162] avg loss 0.00280522, throughput 9.5136K wps | |
[Epoch 54 Batch 60/162] avg loss 0.00312256, throughput 9.34957K wps | |
[Epoch 54 Batch 90/162] avg loss 0.0032516, throughput 9.49484K wps | |
[Epoch 54 Batch 120/162] avg loss 0.00296608, throughput 9.6302K wps | |
[Epoch 54 Batch 150/162] avg loss 0.00336393, throughput 9.57051K wps | |
Begin Testing... | |
[Epoch 54] train avg loss 0.00309793, dev acc 0.9067, dev avg loss 0.24246, throughput 9.5052K wps | |
[Epoch 55 Batch 30/162] avg loss 0.00311743, throughput 9.41864K wps | |
[Epoch 55 Batch 60/162] avg loss 0.00315787, throughput 9.32091K wps | |
[Epoch 55 Batch 90/162] avg loss 0.00304349, throughput 9.47142K wps | |
[Epoch 55 Batch 120/162] avg loss 0.00312005, throughput 9.26735K wps | |
[Epoch 55 Batch 150/162] avg loss 0.00288105, throughput 9.34897K wps | |
Begin Testing... | |
[Epoch 55] train avg loss 0.00304497, dev acc 0.9033, dev avg loss 0.240875, throughput 9.3566K wps | |
[Epoch 56 Batch 30/162] avg loss 0.00289679, throughput 9.51227K wps | |
[Epoch 56 Batch 60/162] avg loss 0.00314012, throughput 9.31222K wps | |
[Epoch 56 Batch 90/162] avg loss 0.00300704, throughput 9.40977K wps | |
[Epoch 56 Batch 120/162] avg loss 0.00313969, throughput 9.2384K wps | |
[Epoch 56 Batch 150/162] avg loss 0.00277625, throughput 9.54468K wps | |
Begin Testing... | |
[Epoch 56] train avg loss 0.00299008, dev acc 0.9067, dev avg loss 0.240352, throughput 9.39314K wps | |
[Epoch 57 Batch 30/162] avg loss 0.00299225, throughput 9.46794K wps | |
[Epoch 57 Batch 60/162] avg loss 0.00281025, throughput 9.35416K wps | |
[Epoch 57 Batch 90/162] avg loss 0.00297896, throughput 9.40174K wps | |
[Epoch 57 Batch 120/162] avg loss 0.00293832, throughput 9.25101K wps | |
[Epoch 57 Batch 150/162] avg loss 0.00276375, throughput 9.43863K wps | |
Begin Testing... | |
[Epoch 57] train avg loss 0.0029357, dev acc 0.9078, dev avg loss 0.240183, throughput 9.39874K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 58 Batch 30/162] avg loss 0.00280653, throughput 9.64921K wps | |
[Epoch 58 Batch 60/162] avg loss 0.00301661, throughput 9.3229K wps | |
[Epoch 58 Batch 90/162] avg loss 0.00294062, throughput 9.38517K wps | |
[Epoch 58 Batch 120/162] avg loss 0.00316029, throughput 9.40372K wps | |
[Epoch 58 Batch 150/162] avg loss 0.00312705, throughput 9.37382K wps | |
Begin Testing... | |
[Epoch 58] train avg loss 0.00300378, dev acc 0.9100, dev avg loss 0.240917, throughput 9.41193K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 59 Batch 30/162] avg loss 0.00293071, throughput 9.59655K wps | |
[Epoch 59 Batch 60/162] avg loss 0.00278885, throughput 9.3282K wps | |
[Epoch 59 Batch 90/162] avg loss 0.00260123, throughput 9.34104K wps | |
[Epoch 59 Batch 120/162] avg loss 0.00304476, throughput 9.48451K wps | |
[Epoch 59 Batch 150/162] avg loss 0.00294758, throughput 9.38572K wps | |
Begin Testing... | |
[Epoch 59] train avg loss 0.00288044, dev acc 0.9044, dev avg loss 0.238944, throughput 9.42144K wps | |
[Epoch 60 Batch 30/162] avg loss 0.00300888, throughput 9.568K wps | |
[Epoch 60 Batch 60/162] avg loss 0.00291366, throughput 9.44866K wps | |
[Epoch 60 Batch 90/162] avg loss 0.00267981, throughput 9.24282K wps | |
[Epoch 60 Batch 120/162] avg loss 0.00292821, throughput 9.48347K wps | |
[Epoch 60 Batch 150/162] avg loss 0.00276907, throughput 9.45153K wps | |
Begin Testing... | |
[Epoch 60] train avg loss 0.00283028, dev acc 0.9056, dev avg loss 0.239464, throughput 9.42948K wps | |
[Epoch 61 Batch 30/162] avg loss 0.00260855, throughput 9.67308K wps | |
[Epoch 61 Batch 60/162] avg loss 0.00278626, throughput 9.50827K wps | |
[Epoch 61 Batch 90/162] avg loss 0.00290986, throughput 9.53516K wps | |
[Epoch 61 Batch 120/162] avg loss 0.00299438, throughput 9.45582K wps | |
[Epoch 61 Batch 150/162] avg loss 0.00274816, throughput 9.48674K wps | |
Begin Testing... | |
[Epoch 61] train avg loss 0.00281143, dev acc 0.9056, dev avg loss 0.237615, throughput 9.5077K wps | |
[Epoch 62 Batch 30/162] avg loss 0.00288795, throughput 9.4864K wps | |
[Epoch 62 Batch 60/162] avg loss 0.00276271, throughput 9.33604K wps | |
[Epoch 62 Batch 90/162] avg loss 0.00266529, throughput 9.41457K wps | |
[Epoch 62 Batch 120/162] avg loss 0.00267132, throughput 9.4805K wps | |
[Epoch 62 Batch 150/162] avg loss 0.00278233, throughput 9.42662K wps | |
Begin Testing... | |
[Epoch 62] train avg loss 0.002759, dev acc 0.9056, dev avg loss 0.23689, throughput 9.42278K wps | |
[Epoch 63 Batch 30/162] avg loss 0.00265385, throughput 9.74434K wps | |
[Epoch 63 Batch 60/162] avg loss 0.00286414, throughput 9.40379K wps | |
[Epoch 63 Batch 90/162] avg loss 0.0028863, throughput 9.39386K wps | |
[Epoch 63 Batch 120/162] avg loss 0.00272298, throughput 9.50042K wps | |
[Epoch 63 Batch 150/162] avg loss 0.00257107, throughput 9.36452K wps | |
Begin Testing... | |
[Epoch 63] train avg loss 0.00273231, dev acc 0.9078, dev avg loss 0.23665, throughput 9.49075K wps | |
[Epoch 64 Batch 30/162] avg loss 0.00287369, throughput 9.70174K wps | |
[Epoch 64 Batch 60/162] avg loss 0.00268595, throughput 9.32407K wps | |
[Epoch 64 Batch 90/162] avg loss 0.00249368, throughput 9.40545K wps | |
[Epoch 64 Batch 120/162] avg loss 0.00279411, throughput 9.55831K wps | |
[Epoch 64 Batch 150/162] avg loss 0.00262947, throughput 9.50336K wps | |
Begin Testing... | |
[Epoch 64] train avg loss 0.00268018, dev acc 0.9067, dev avg loss 0.236, throughput 9.49381K wps | |
[Epoch 65 Batch 30/162] avg loss 0.00239457, throughput 9.43358K wps | |
[Epoch 65 Batch 60/162] avg loss 0.00267318, throughput 9.31637K wps | |
[Epoch 65 Batch 90/162] avg loss 0.00275123, throughput 9.47293K wps | |
[Epoch 65 Batch 120/162] avg loss 0.00294261, throughput 9.32146K wps | |
[Epoch 65 Batch 150/162] avg loss 0.00245131, throughput 9.52185K wps | |
Begin Testing... | |
[Epoch 65] train avg loss 0.00264365, dev acc 0.9089, dev avg loss 0.235717, throughput 9.40323K wps | |
[Epoch 66 Batch 30/162] avg loss 0.00268589, throughput 9.51885K wps | |
[Epoch 66 Batch 60/162] avg loss 0.00272469, throughput 9.6092K wps | |
[Epoch 66 Batch 90/162] avg loss 0.00248978, throughput 9.60937K wps | |
[Epoch 66 Batch 120/162] avg loss 0.00285202, throughput 9.5301K wps | |
[Epoch 66 Batch 150/162] avg loss 0.00249202, throughput 9.27504K wps | |
Begin Testing... | |
[Epoch 66] train avg loss 0.00264761, dev acc 0.9078, dev avg loss 0.235078, throughput 9.4942K wps | |
[Epoch 67 Batch 30/162] avg loss 0.00227765, throughput 9.60786K wps | |
[Epoch 67 Batch 60/162] avg loss 0.00280586, throughput 9.40882K wps | |
[Epoch 67 Batch 90/162] avg loss 0.00261633, throughput 9.45542K wps | |
[Epoch 67 Batch 120/162] avg loss 0.00236493, throughput 9.3603K wps | |
[Epoch 67 Batch 150/162] avg loss 0.00251714, throughput 9.48641K wps | |
Begin Testing... | |
[Epoch 67] train avg loss 0.0025287, dev acc 0.9078, dev avg loss 0.235218, throughput 9.47176K wps | |
[Epoch 68 Batch 30/162] avg loss 0.00247929, throughput 9.50213K wps | |
[Epoch 68 Batch 60/162] avg loss 0.00237766, throughput 9.40113K wps | |
[Epoch 68 Batch 90/162] avg loss 0.00258673, throughput 9.31912K wps | |
[Epoch 68 Batch 120/162] avg loss 0.00243504, throughput 9.55616K wps | |
[Epoch 68 Batch 150/162] avg loss 0.00259164, throughput 9.27384K wps | |
Begin Testing... | |
[Epoch 68] train avg loss 0.00252898, dev acc 0.9078, dev avg loss 0.235461, throughput 9.42179K wps | |
[Epoch 69 Batch 30/162] avg loss 0.00265411, throughput 9.56154K wps | |
[Epoch 69 Batch 60/162] avg loss 0.00241417, throughput 9.462K wps | |
[Epoch 69 Batch 90/162] avg loss 0.00254495, throughput 9.43332K wps | |
[Epoch 69 Batch 120/162] avg loss 0.0021794, throughput 9.3068K wps | |
[Epoch 69 Batch 150/162] avg loss 0.00265025, throughput 9.39508K wps | |
Begin Testing... | |
[Epoch 69] train avg loss 0.00246926, dev acc 0.9078, dev avg loss 0.235717, throughput 9.42618K wps | |
[Epoch 70 Batch 30/162] avg loss 0.00257705, throughput 9.51677K wps | |
[Epoch 70 Batch 60/162] avg loss 0.00226928, throughput 9.36987K wps | |
[Epoch 70 Batch 90/162] avg loss 0.00249652, throughput 9.38456K wps | |
[Epoch 70 Batch 120/162] avg loss 0.00211324, throughput 9.50327K wps | |
[Epoch 70 Batch 150/162] avg loss 0.00271965, throughput 9.46819K wps | |
Begin Testing... | |
[Epoch 70] train avg loss 0.00242111, dev acc 0.9089, dev avg loss 0.233756, throughput 9.43801K wps | |
[Epoch 71 Batch 30/162] avg loss 0.00235651, throughput 9.51452K wps | |
[Epoch 71 Batch 60/162] avg loss 0.00242878, throughput 9.47533K wps | |
[Epoch 71 Batch 90/162] avg loss 0.00259031, throughput 9.4441K wps | |
[Epoch 71 Batch 120/162] avg loss 0.00239291, throughput 9.36571K wps | |
[Epoch 71 Batch 150/162] avg loss 0.00234223, throughput 9.32055K wps | |
Begin Testing... | |
[Epoch 71] train avg loss 0.00241264, dev acc 0.9089, dev avg loss 0.233406, throughput 9.4133K wps | |
[Epoch 72 Batch 30/162] avg loss 0.00242557, throughput 9.51617K wps | |
[Epoch 72 Batch 60/162] avg loss 0.00242044, throughput 9.2519K wps | |
[Epoch 72 Batch 90/162] avg loss 0.00240604, throughput 9.48976K wps | |
[Epoch 72 Batch 120/162] avg loss 0.00222254, throughput 9.34074K wps | |
[Epoch 72 Batch 150/162] avg loss 0.00241634, throughput 9.24993K wps | |
Begin Testing... | |
[Epoch 72] train avg loss 0.00238551, dev acc 0.9078, dev avg loss 0.232928, throughput 9.37433K wps | |
[Epoch 73 Batch 30/162] avg loss 0.00225104, throughput 9.69523K wps | |
[Epoch 73 Batch 60/162] avg loss 0.00251166, throughput 9.39546K wps | |
[Epoch 73 Batch 90/162] avg loss 0.00224245, throughput 9.55207K wps | |
[Epoch 73 Batch 120/162] avg loss 0.00242628, throughput 9.45621K wps | |
[Epoch 73 Batch 150/162] avg loss 0.00239749, throughput 9.45189K wps | |
Begin Testing... | |
[Epoch 73] train avg loss 0.00236795, dev acc 0.9089, dev avg loss 0.232871, throughput 9.4951K wps | |
[Epoch 74 Batch 30/162] avg loss 0.00213699, throughput 9.43431K wps | |
[Epoch 74 Batch 60/162] avg loss 0.00242813, throughput 9.41929K wps | |
[Epoch 74 Batch 90/162] avg loss 0.00227872, throughput 9.30512K wps | |
[Epoch 74 Batch 120/162] avg loss 0.00230027, throughput 9.32665K wps | |
[Epoch 74 Batch 150/162] avg loss 0.00216548, throughput 9.6628K wps | |
Begin Testing... | |
[Epoch 74] train avg loss 0.00230945, dev acc 0.9100, dev avg loss 0.233672, throughput 9.44452K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 75 Batch 30/162] avg loss 0.00229099, throughput 9.66157K wps | |
[Epoch 75 Batch 60/162] avg loss 0.00231805, throughput 9.19063K wps | |
[Epoch 75 Batch 90/162] avg loss 0.00238184, throughput 9.45356K wps | |
[Epoch 75 Batch 120/162] avg loss 0.00212968, throughput 9.50277K wps | |
[Epoch 75 Batch 150/162] avg loss 0.00231665, throughput 9.42112K wps | |
Begin Testing... | |
[Epoch 75] train avg loss 0.00234109, dev acc 0.9067, dev avg loss 0.232223, throughput 9.43704K wps | |
[Epoch 76 Batch 30/162] avg loss 0.00228757, throughput 9.60872K wps | |
[Epoch 76 Batch 60/162] avg loss 0.00238633, throughput 9.20488K wps | |
[Epoch 76 Batch 90/162] avg loss 0.00226496, throughput 9.28966K wps | |
[Epoch 76 Batch 120/162] avg loss 0.00231158, throughput 9.40825K wps | |
[Epoch 76 Batch 150/162] avg loss 0.0021501, throughput 9.45735K wps | |
Begin Testing... | |
[Epoch 76] train avg loss 0.00224102, dev acc 0.9067, dev avg loss 0.232594, throughput 9.37968K wps | |
[Epoch 77 Batch 30/162] avg loss 0.00203727, throughput 9.46889K wps | |
[Epoch 77 Batch 60/162] avg loss 0.00236498, throughput 9.41467K wps | |
[Epoch 77 Batch 90/162] avg loss 0.00234807, throughput 9.31376K wps | |
[Epoch 77 Batch 120/162] avg loss 0.00238359, throughput 9.66378K wps | |
[Epoch 77 Batch 150/162] avg loss 0.00216933, throughput 9.54635K wps | |
Begin Testing... | |
[Epoch 77] train avg loss 0.00222111, dev acc 0.9078, dev avg loss 0.232175, throughput 9.46046K wps | |
[Epoch 78 Batch 30/162] avg loss 0.00237146, throughput 9.48115K wps | |
[Epoch 78 Batch 60/162] avg loss 0.00226466, throughput 9.40173K wps | |
[Epoch 78 Batch 90/162] avg loss 0.00214546, throughput 9.49443K wps | |
[Epoch 78 Batch 120/162] avg loss 0.00216933, throughput 9.43839K wps | |
[Epoch 78 Batch 150/162] avg loss 0.00218924, throughput 9.4294K wps | |
Begin Testing... | |
[Epoch 78] train avg loss 0.0022003, dev acc 0.9078, dev avg loss 0.231781, throughput 9.44886K wps | |
[Epoch 79 Batch 30/162] avg loss 0.00214779, throughput 9.63114K wps | |
[Epoch 79 Batch 60/162] avg loss 0.00208279, throughput 9.5244K wps | |
[Epoch 79 Batch 90/162] avg loss 0.00217244, throughput 9.30055K wps | |
[Epoch 79 Batch 120/162] avg loss 0.00213707, throughput 9.37776K wps | |
[Epoch 79 Batch 150/162] avg loss 0.00215754, throughput 9.31297K wps | |
Begin Testing... | |
[Epoch 79] train avg loss 0.00213948, dev acc 0.9078, dev avg loss 0.231746, throughput 9.43039K wps | |
[Epoch 80 Batch 30/162] avg loss 0.00212842, throughput 9.63843K wps | |
[Epoch 80 Batch 60/162] avg loss 0.00195101, throughput 9.35867K wps | |
[Epoch 80 Batch 90/162] avg loss 0.00209711, throughput 9.50732K wps | |
[Epoch 80 Batch 120/162] avg loss 0.00241555, throughput 9.50327K wps | |
[Epoch 80 Batch 150/162] avg loss 0.00220332, throughput 9.46487K wps | |
Begin Testing... | |
[Epoch 80] train avg loss 0.00215957, dev acc 0.9089, dev avg loss 0.231133, throughput 9.49374K wps | |
[Epoch 81 Batch 30/162] avg loss 0.0018672, throughput 9.54823K wps | |
[Epoch 81 Batch 60/162] avg loss 0.00219145, throughput 9.32757K wps | |
[Epoch 81 Batch 90/162] avg loss 0.00221833, throughput 9.41622K wps | |
[Epoch 81 Batch 120/162] avg loss 0.00197305, throughput 9.29481K wps | |
[Epoch 81 Batch 150/162] avg loss 0.00201746, throughput 9.45812K wps | |
Begin Testing... | |
[Epoch 81] train avg loss 0.00207677, dev acc 0.9100, dev avg loss 0.230953, throughput 9.37993K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 82 Batch 30/162] avg loss 0.00188539, throughput 9.65986K wps | |
[Epoch 82 Batch 60/162] avg loss 0.00204478, throughput 9.46403K wps | |
[Epoch 82 Batch 90/162] avg loss 0.00201831, throughput 9.2557K wps | |
[Epoch 82 Batch 120/162] avg loss 0.00241526, throughput 9.48381K wps | |
[Epoch 82 Batch 150/162] avg loss 0.0020828, throughput 9.29595K wps | |
Begin Testing... | |
[Epoch 82] train avg loss 0.00207981, dev acc 0.9078, dev avg loss 0.230809, throughput 9.44472K wps | |
[Epoch 83 Batch 30/162] avg loss 0.00223152, throughput 9.45692K wps | |
[Epoch 83 Batch 60/162] avg loss 0.00222273, throughput 9.41858K wps | |
[Epoch 83 Batch 90/162] avg loss 0.00200768, throughput 9.38583K wps | |
[Epoch 83 Batch 120/162] avg loss 0.00199637, throughput 9.35457K wps | |
[Epoch 83 Batch 150/162] avg loss 0.00221681, throughput 9.31573K wps | |
Begin Testing... | |
[Epoch 83] train avg loss 0.00212103, dev acc 0.9100, dev avg loss 0.230458, throughput 9.40186K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 84 Batch 30/162] avg loss 0.00215412, throughput 9.72015K wps | |
[Epoch 84 Batch 60/162] avg loss 0.00193827, throughput 9.26361K wps | |
[Epoch 84 Batch 90/162] avg loss 0.0021938, throughput 9.34604K wps | |
[Epoch 84 Batch 120/162] avg loss 0.00206668, throughput 9.28541K wps | |
[Epoch 84 Batch 150/162] avg loss 0.00189408, throughput 9.25179K wps | |
Begin Testing... | |
[Epoch 84] train avg loss 0.00204826, dev acc 0.9089, dev avg loss 0.230426, throughput 9.36956K wps | |
[Epoch 85 Batch 30/162] avg loss 0.00185962, throughput 9.52811K wps | |
[Epoch 85 Batch 60/162] avg loss 0.0020107, throughput 9.42722K wps | |
[Epoch 85 Batch 90/162] avg loss 0.00203444, throughput 9.41173K wps | |
[Epoch 85 Batch 120/162] avg loss 0.00194089, throughput 9.52288K wps | |
[Epoch 85 Batch 150/162] avg loss 0.00204689, throughput 9.42042K wps | |
Begin Testing... | |
[Epoch 85] train avg loss 0.00200129, dev acc 0.9067, dev avg loss 0.230463, throughput 9.47302K wps | |
[Epoch 86 Batch 30/162] avg loss 0.00177675, throughput 9.49244K wps | |
[Epoch 86 Batch 60/162] avg loss 0.00202632, throughput 9.66008K wps | |
[Epoch 86 Batch 90/162] avg loss 0.00204808, throughput 9.38044K wps | |
[Epoch 86 Batch 120/162] avg loss 0.00215758, throughput 9.52754K wps | |
[Epoch 86 Batch 150/162] avg loss 0.00180453, throughput 9.42865K wps | |
Begin Testing... | |
[Epoch 86] train avg loss 0.00196139, dev acc 0.9078, dev avg loss 0.231065, throughput 9.48334K wps | |
[Epoch 87 Batch 30/162] avg loss 0.0019585, throughput 9.55588K wps | |
[Epoch 87 Batch 60/162] avg loss 0.00199308, throughput 9.46191K wps | |
[Epoch 87 Batch 90/162] avg loss 0.00180912, throughput 9.33256K wps | |
[Epoch 87 Batch 120/162] avg loss 0.00202829, throughput 9.53227K wps | |
[Epoch 87 Batch 150/162] avg loss 0.00181426, throughput 9.34087K wps | |
Begin Testing... | |
[Epoch 87] train avg loss 0.00192785, dev acc 0.9067, dev avg loss 0.230277, throughput 9.44801K wps | |
[Epoch 88 Batch 30/162] avg loss 0.00176082, throughput 9.42677K wps | |
[Epoch 88 Batch 60/162] avg loss 0.0020671, throughput 9.46952K wps | |
[Epoch 88 Batch 90/162] avg loss 0.0018393, throughput 9.4635K wps | |
[Epoch 88 Batch 120/162] avg loss 0.00189293, throughput 9.41574K wps | |
[Epoch 88 Batch 150/162] avg loss 0.00200162, throughput 9.41319K wps | |
Begin Testing... | |
[Epoch 88] train avg loss 0.00192076, dev acc 0.9067, dev avg loss 0.229971, throughput 9.44874K wps | |
[Epoch 89 Batch 30/162] avg loss 0.00196706, throughput 9.57919K wps | |
[Epoch 89 Batch 60/162] avg loss 0.00193885, throughput 9.35798K wps | |
[Epoch 89 Batch 90/162] avg loss 0.00202345, throughput 9.34419K wps | |
[Epoch 89 Batch 120/162] avg loss 0.00190312, throughput 9.41812K wps | |
[Epoch 89 Batch 150/162] avg loss 0.00173288, throughput 9.36746K wps | |
Begin Testing... | |
[Epoch 89] train avg loss 0.00191078, dev acc 0.9089, dev avg loss 0.231062, throughput 9.43163K wps | |
[Epoch 90 Batch 30/162] avg loss 0.001871, throughput 9.48289K wps | |
[Epoch 90 Batch 60/162] avg loss 0.00174085, throughput 9.46199K wps | |
[Epoch 90 Batch 90/162] avg loss 0.00205183, throughput 9.26592K wps | |
[Epoch 90 Batch 120/162] avg loss 0.0018345, throughput 9.47404K wps | |
[Epoch 90 Batch 150/162] avg loss 0.00207165, throughput 9.42786K wps | |
Begin Testing... | |
[Epoch 90] train avg loss 0.00190886, dev acc 0.9089, dev avg loss 0.229688, throughput 9.42216K wps | |
[Epoch 91 Batch 30/162] avg loss 0.00185121, throughput 9.64327K wps | |
[Epoch 91 Batch 60/162] avg loss 0.00188709, throughput 9.33688K wps | |
[Epoch 91 Batch 90/162] avg loss 0.00186984, throughput 9.43082K wps | |
[Epoch 91 Batch 120/162] avg loss 0.00178008, throughput 9.49693K wps | |
[Epoch 91 Batch 150/162] avg loss 0.00195783, throughput 9.25683K wps | |
Begin Testing... | |
[Epoch 91] train avg loss 0.00187586, dev acc 0.9100, dev avg loss 0.229561, throughput 9.43077K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 92 Batch 30/162] avg loss 0.00191878, throughput 9.58135K wps | |
[Epoch 92 Batch 60/162] avg loss 0.00185139, throughput 9.45728K wps | |
[Epoch 92 Batch 90/162] avg loss 0.00199974, throughput 9.49063K wps | |
[Epoch 92 Batch 120/162] avg loss 0.00185051, throughput 9.25664K wps | |
[Epoch 92 Batch 150/162] avg loss 0.00176807, throughput 9.41778K wps | |
Begin Testing... | |
[Epoch 92] train avg loss 0.00186376, dev acc 0.9089, dev avg loss 0.229451, throughput 9.45148K wps | |
[Epoch 93 Batch 30/162] avg loss 0.00169152, throughput 9.58804K wps | |
[Epoch 93 Batch 60/162] avg loss 0.00184662, throughput 9.37001K wps | |
[Epoch 93 Batch 90/162] avg loss 0.00181727, throughput 9.30156K wps | |
[Epoch 93 Batch 120/162] avg loss 0.00197526, throughput 9.57869K wps | |
[Epoch 93 Batch 150/162] avg loss 0.00155701, throughput 9.44153K wps | |
Begin Testing... | |
[Epoch 93] train avg loss 0.00177294, dev acc 0.9067, dev avg loss 0.229503, throughput 9.44153K wps | |
[Epoch 94 Batch 30/162] avg loss 0.0017501, throughput 9.54151K wps | |
[Epoch 94 Batch 60/162] avg loss 0.00189529, throughput 9.4493K wps | |
[Epoch 94 Batch 90/162] avg loss 0.00171395, throughput 9.34279K wps | |
[Epoch 94 Batch 120/162] avg loss 0.00168188, throughput 9.3006K wps | |
[Epoch 94 Batch 150/162] avg loss 0.00176516, throughput 9.26229K wps | |
Begin Testing... | |
[Epoch 94] train avg loss 0.00177128, dev acc 0.9089, dev avg loss 0.229222, throughput 9.3895K wps | |
[Epoch 95 Batch 30/162] avg loss 0.00192507, throughput 9.48517K wps | |
[Epoch 95 Batch 60/162] avg loss 0.00171763, throughput 9.40488K wps | |
[Epoch 95 Batch 90/162] avg loss 0.00158727, throughput 9.49213K wps | |
[Epoch 95 Batch 120/162] avg loss 0.00190603, throughput 9.37772K wps | |
[Epoch 95 Batch 150/162] avg loss 0.00168699, throughput 9.34463K wps | |
Begin Testing... | |
[Epoch 95] train avg loss 0.00175758, dev acc 0.9056, dev avg loss 0.229732, throughput 9.41329K wps | |
[Epoch 96 Batch 30/162] avg loss 0.00164219, throughput 9.64598K wps | |
[Epoch 96 Batch 60/162] avg loss 0.00180407, throughput 9.26244K wps | |
[Epoch 96 Batch 90/162] avg loss 0.00168376, throughput 9.47289K wps | |
[Epoch 96 Batch 120/162] avg loss 0.00174691, throughput 9.56701K wps | |
[Epoch 96 Batch 150/162] avg loss 0.00146591, throughput 9.55153K wps | |
Begin Testing... | |
[Epoch 96] train avg loss 0.00166984, dev acc 0.9056, dev avg loss 0.230337, throughput 9.4896K wps | |
[Epoch 97 Batch 30/162] avg loss 0.00164687, throughput 9.45882K wps | |
[Epoch 97 Batch 60/162] avg loss 0.00149098, throughput 9.34822K wps | |
[Epoch 97 Batch 90/162] avg loss 0.00176893, throughput 9.42753K wps | |
[Epoch 97 Batch 120/162] avg loss 0.00181979, throughput 9.41794K wps | |
[Epoch 97 Batch 150/162] avg loss 0.00175131, throughput 9.41042K wps | |
Begin Testing... | |
[Epoch 97] train avg loss 0.00171622, dev acc 0.9056, dev avg loss 0.22993, throughput 9.42032K wps | |
[Epoch 98 Batch 30/162] avg loss 0.00152995, throughput 9.69674K wps | |
[Epoch 98 Batch 60/162] avg loss 0.00145983, throughput 9.30966K wps | |
[Epoch 98 Batch 90/162] avg loss 0.0015705, throughput 9.51451K wps | |
[Epoch 98 Batch 120/162] avg loss 0.00182273, throughput 9.4183K wps | |
[Epoch 98 Batch 150/162] avg loss 0.00193128, throughput 9.33212K wps | |
Begin Testing... | |
[Epoch 98] train avg loss 0.00168408, dev acc 0.9067, dev avg loss 0.230773, throughput 9.46114K wps | |
[Epoch 99 Batch 30/162] avg loss 0.00166608, throughput 9.61557K wps | |
[Epoch 99 Batch 60/162] avg loss 0.00174995, throughput 9.18671K wps | |
[Epoch 99 Batch 90/162] avg loss 0.0017058, throughput 9.48183K wps | |
[Epoch 99 Batch 120/162] avg loss 0.00178574, throughput 9.26609K wps | |
[Epoch 99 Batch 150/162] avg loss 0.00156201, throughput 9.28227K wps | |
Begin Testing... | |
[Epoch 99] train avg loss 0.0016845, dev acc 0.9044, dev avg loss 0.229811, throughput 9.35877K wps | |
[Epoch 100 Batch 30/162] avg loss 0.00161489, throughput 9.53247K wps | |
[Epoch 100 Batch 60/162] avg loss 0.00151833, throughput 9.30704K wps | |
[Epoch 100 Batch 90/162] avg loss 0.00163977, throughput 9.63101K wps | |
[Epoch 100 Batch 120/162] avg loss 0.00157494, throughput 9.45964K wps | |
[Epoch 100 Batch 150/162] avg loss 0.00160811, throughput 9.34586K wps | |
Begin Testing... | |
[Epoch 100] train avg loss 0.00158965, dev acc 0.9067, dev avg loss 0.229617, throughput 9.4506K wps | |
[Epoch 101 Batch 30/162] avg loss 0.00155059, throughput 9.58724K wps | |
[Epoch 101 Batch 60/162] avg loss 0.00159646, throughput 9.43018K wps | |
[Epoch 101 Batch 90/162] avg loss 0.00153429, throughput 9.39148K wps | |
[Epoch 101 Batch 120/162] avg loss 0.0017968, throughput 9.41788K wps | |
[Epoch 101 Batch 150/162] avg loss 0.00141439, throughput 9.53901K wps | |
Begin Testing... | |
[Epoch 101] train avg loss 0.00158273, dev acc 0.9056, dev avg loss 0.229857, throughput 9.48433K wps | |
[Epoch 102 Batch 30/162] avg loss 0.00145153, throughput 9.51218K wps | |
[Epoch 102 Batch 60/162] avg loss 0.00146349, throughput 9.39322K wps | |
[Epoch 102 Batch 90/162] avg loss 0.00155615, throughput 9.55787K wps | |
[Epoch 102 Batch 120/162] avg loss 0.00171872, throughput 9.48334K wps | |
[Epoch 102 Batch 150/162] avg loss 0.00179406, throughput 9.36459K wps | |
Begin Testing... | |
[Epoch 102] train avg loss 0.00158761, dev acc 0.9056, dev avg loss 0.229883, throughput 9.4716K wps | |
[Epoch 103 Batch 30/162] avg loss 0.0016011, throughput 9.51903K wps | |
[Epoch 103 Batch 60/162] avg loss 0.00142531, throughput 9.58996K wps | |
[Epoch 103 Batch 90/162] avg loss 0.00175576, throughput 9.39976K wps | |
[Epoch 103 Batch 120/162] avg loss 0.00163905, throughput 9.37047K wps | |
[Epoch 103 Batch 150/162] avg loss 0.00150459, throughput 9.43596K wps | |
Begin Testing... | |
[Epoch 103] train avg loss 0.00157957, dev acc 0.9089, dev avg loss 0.229327, throughput 9.47277K wps | |
[Epoch 104 Batch 30/162] avg loss 0.00153513, throughput 9.33868K wps | |
[Epoch 104 Batch 60/162] avg loss 0.00149964, throughput 9.5506K wps | |
[Epoch 104 Batch 90/162] avg loss 0.00160712, throughput 9.2441K wps | |
[Epoch 104 Batch 120/162] avg loss 0.00144635, throughput 9.53K wps | |
[Epoch 104 Batch 150/162] avg loss 0.00153667, throughput 9.47824K wps | |
Begin Testing... | |
[Epoch 104] train avg loss 0.00155162, dev acc 0.9078, dev avg loss 0.229468, throughput 9.40776K wps | |
[Epoch 105 Batch 30/162] avg loss 0.00141183, throughput 9.51289K wps | |
[Epoch 105 Batch 60/162] avg loss 0.00151852, throughput 9.49043K wps | |
[Epoch 105 Batch 90/162] avg loss 0.00152356, throughput 9.39677K wps | |
[Epoch 105 Batch 120/162] avg loss 0.0015766, throughput 9.27471K wps | |
[Epoch 105 Batch 150/162] avg loss 0.00143873, throughput 9.32436K wps | |
Begin Testing... | |
[Epoch 105] train avg loss 0.00149833, dev acc 0.9067, dev avg loss 0.229239, throughput 9.40824K wps | |
[Epoch 106 Batch 30/162] avg loss 0.00154192, throughput 9.75021K wps | |
[Epoch 106 Batch 60/162] avg loss 0.0015088, throughput 9.44832K wps | |
[Epoch 106 Batch 90/162] avg loss 0.00148075, throughput 9.30569K wps | |
[Epoch 106 Batch 120/162] avg loss 0.00147681, throughput 9.3575K wps | |
[Epoch 106 Batch 150/162] avg loss 0.00143831, throughput 9.3803K wps | |
Begin Testing... | |
[Epoch 106] train avg loss 0.00147954, dev acc 0.9078, dev avg loss 0.229283, throughput 9.42957K wps | |
[Epoch 107 Batch 30/162] avg loss 0.00161458, throughput 9.68302K wps | |
[Epoch 107 Batch 60/162] avg loss 0.00136929, throughput 9.38921K wps | |
[Epoch 107 Batch 90/162] avg loss 0.00169635, throughput 9.27688K wps | |
[Epoch 107 Batch 120/162] avg loss 0.00150827, throughput 9.4801K wps | |
[Epoch 107 Batch 150/162] avg loss 0.00157872, throughput 9.39013K wps | |
Begin Testing... | |
[Epoch 107] train avg loss 0.00155469, dev acc 0.9078, dev avg loss 0.229541, throughput 9.42395K wps | |
[Epoch 108 Batch 30/162] avg loss 0.00155633, throughput 9.56858K wps | |
[Epoch 108 Batch 60/162] avg loss 0.00148016, throughput 9.56082K wps | |
[Epoch 108 Batch 90/162] avg loss 0.0015239, throughput 9.29913K wps | |
[Epoch 108 Batch 120/162] avg loss 0.00149799, throughput 9.50152K wps | |
[Epoch 108 Batch 150/162] avg loss 0.00138366, throughput 9.34164K wps | |
Begin Testing... | |
[Epoch 108] train avg loss 0.00148478, dev acc 0.9067, dev avg loss 0.23007, throughput 9.44638K wps | |
[Epoch 109 Batch 30/162] avg loss 0.00148439, throughput 9.60358K wps | |
[Epoch 109 Batch 60/162] avg loss 0.00142033, throughput 9.33026K wps | |
[Epoch 109 Batch 90/162] avg loss 0.00131167, throughput 9.40558K wps | |
[Epoch 109 Batch 120/162] avg loss 0.00148522, throughput 9.38532K wps | |
[Epoch 109 Batch 150/162] avg loss 0.0016369, throughput 9.36925K wps | |
Begin Testing... | |
[Epoch 109] train avg loss 0.00146949, dev acc 0.9044, dev avg loss 0.230278, throughput 9.41857K wps | |
[Epoch 110 Batch 30/162] avg loss 0.00143108, throughput 9.57213K wps | |
[Epoch 110 Batch 60/162] avg loss 0.00141821, throughput 9.59217K wps | |
[Epoch 110 Batch 90/162] avg loss 0.00134611, throughput 9.18034K wps | |
[Epoch 110 Batch 120/162] avg loss 0.00144927, throughput 9.48332K wps | |
[Epoch 110 Batch 150/162] avg loss 0.00164894, throughput 9.54565K wps | |
Begin Testing... | |
[Epoch 110] train avg loss 0.00145218, dev acc 0.9044, dev avg loss 0.229918, throughput 9.46688K wps | |
[Epoch 111 Batch 30/162] avg loss 0.00155226, throughput 9.71186K wps | |
[Epoch 111 Batch 60/162] avg loss 0.00130029, throughput 9.2966K wps | |
[Epoch 111 Batch 90/162] avg loss 0.00141153, throughput 9.27986K wps | |
[Epoch 111 Batch 120/162] avg loss 0.00144165, throughput 9.43018K wps | |
[Epoch 111 Batch 150/162] avg loss 0.00145312, throughput 9.34458K wps | |
Begin Testing... | |
[Epoch 111] train avg loss 0.00143556, dev acc 0.9067, dev avg loss 0.230067, throughput 9.42379K wps | |
[Epoch 112 Batch 30/162] avg loss 0.00139717, throughput 9.45331K wps | |
[Epoch 112 Batch 60/162] avg loss 0.00138277, throughput 9.41432K wps | |
[Epoch 112 Batch 90/162] avg loss 0.00147046, throughput 9.40112K wps | |
[Epoch 112 Batch 120/162] avg loss 0.0013802, throughput 9.26478K wps | |
[Epoch 112 Batch 150/162] avg loss 0.00127427, throughput 9.24456K wps | |
Begin Testing... | |
[Epoch 112] train avg loss 0.00136954, dev acc 0.9067, dev avg loss 0.229929, throughput 9.34168K wps | |
[Epoch 113 Batch 30/162] avg loss 0.00143926, throughput 9.31985K wps | |
[Epoch 113 Batch 60/162] avg loss 0.00141941, throughput 9.49312K wps | |
[Epoch 113 Batch 90/162] avg loss 0.00138592, throughput 9.25285K wps | |
[Epoch 113 Batch 120/162] avg loss 0.0014533, throughput 9.49174K wps | |
[Epoch 113 Batch 150/162] avg loss 0.00131121, throughput 9.22303K wps | |
Begin Testing... | |
[Epoch 113] train avg loss 0.00139108, dev acc 0.9056, dev avg loss 0.230654, throughput 9.36915K wps | |
[Epoch 114 Batch 30/162] avg loss 0.00132429, throughput 9.55196K wps | |
[Epoch 114 Batch 60/162] avg loss 0.00138075, throughput 9.25876K wps | |
[Epoch 114 Batch 90/162] avg loss 0.00130877, throughput 9.51793K wps | |
[Epoch 114 Batch 120/162] avg loss 0.00149331, throughput 9.36408K wps | |
[Epoch 114 Batch 150/162] avg loss 0.00127485, throughput 9.24873K wps | |
Begin Testing... | |
[Epoch 114] train avg loss 0.00134654, dev acc 0.9089, dev avg loss 0.229764, throughput 9.39565K wps | |
[Epoch 115 Batch 30/162] avg loss 0.00143874, throughput 9.50552K wps | |
[Epoch 115 Batch 60/162] avg loss 0.0012415, throughput 9.275K wps | |
[Epoch 115 Batch 90/162] avg loss 0.00140917, throughput 9.57682K wps | |
[Epoch 115 Batch 120/162] avg loss 0.00133977, throughput 9.35983K wps | |
[Epoch 115 Batch 150/162] avg loss 0.00131923, throughput 9.24978K wps | |
Begin Testing... | |
[Epoch 115] train avg loss 0.0013441, dev acc 0.9078, dev avg loss 0.229782, throughput 9.3838K wps | |
[Epoch 116 Batch 30/162] avg loss 0.00130383, throughput 9.59202K wps | |
[Epoch 116 Batch 60/162] avg loss 0.00136091, throughput 9.53288K wps | |
[Epoch 116 Batch 90/162] avg loss 0.00137945, throughput 9.48212K wps | |
[Epoch 116 Batch 120/162] avg loss 0.00136721, throughput 9.42407K wps | |
[Epoch 116 Batch 150/162] avg loss 0.0013071, throughput 9.43917K wps | |
Begin Testing... | |
[Epoch 116] train avg loss 0.00134737, dev acc 0.9089, dev avg loss 0.229896, throughput 9.50402K wps | |
[Epoch 117 Batch 30/162] avg loss 0.00134371, throughput 9.66092K wps | |
[Epoch 117 Batch 60/162] avg loss 0.00131858, throughput 9.41767K wps | |
[Epoch 117 Batch 90/162] avg loss 0.001278, throughput 9.58842K wps | |
[Epoch 117 Batch 120/162] avg loss 0.00138105, throughput 9.36733K wps | |
[Epoch 117 Batch 150/162] avg loss 0.00132221, throughput 9.27931K wps | |
Begin Testing... | |
[Epoch 117] train avg loss 0.00132176, dev acc 0.9089, dev avg loss 0.230519, throughput 9.46554K wps | |
[Epoch 118 Batch 30/162] avg loss 0.00149979, throughput 9.60908K wps | |
[Epoch 118 Batch 60/162] avg loss 0.00116583, throughput 9.47154K wps | |
[Epoch 118 Batch 90/162] avg loss 0.00124398, throughput 9.3079K wps | |
[Epoch 118 Batch 120/162] avg loss 0.00132941, throughput 9.37475K wps | |
[Epoch 118 Batch 150/162] avg loss 0.00113277, throughput 9.42613K wps | |
Begin Testing... | |
[Epoch 118] train avg loss 0.00127486, dev acc 0.9089, dev avg loss 0.230407, throughput 9.43966K wps | |
[Epoch 119 Batch 30/162] avg loss 0.00133807, throughput 9.57203K wps | |
[Epoch 119 Batch 60/162] avg loss 0.00126535, throughput 9.41099K wps | |
[Epoch 119 Batch 90/162] avg loss 0.00132665, throughput 9.52672K wps | |
[Epoch 119 Batch 120/162] avg loss 0.00127105, throughput 9.32167K wps | |
[Epoch 119 Batch 150/162] avg loss 0.00130381, throughput 9.5156K wps | |
Begin Testing... | |
[Epoch 119] train avg loss 0.00129623, dev acc 0.9078, dev avg loss 0.230656, throughput 9.45224K wps | |
[Epoch 120 Batch 30/162] avg loss 0.00119283, throughput 9.74089K wps | |
[Epoch 120 Batch 60/162] avg loss 0.00130535, throughput 9.2668K wps | |
[Epoch 120 Batch 90/162] avg loss 0.00129203, throughput 9.41908K wps | |
[Epoch 120 Batch 120/162] avg loss 0.00122921, throughput 9.2502K wps | |
[Epoch 120 Batch 150/162] avg loss 0.00121516, throughput 9.36331K wps | |
Begin Testing... | |
[Epoch 120] train avg loss 0.00124534, dev acc 0.9067, dev avg loss 0.232324, throughput 9.39858K wps | |
[Epoch 121 Batch 30/162] avg loss 0.00130413, throughput 9.51718K wps | |
[Epoch 121 Batch 60/162] avg loss 0.00123464, throughput 9.31816K wps | |
[Epoch 121 Batch 90/162] avg loss 0.00131969, throughput 9.32898K wps | |
[Epoch 121 Batch 120/162] avg loss 0.00112522, throughput 9.52111K wps | |
[Epoch 121 Batch 150/162] avg loss 0.00135551, throughput 9.27221K wps | |
Begin Testing... | |
[Epoch 121] train avg loss 0.00126869, dev acc 0.9089, dev avg loss 0.232824, throughput 9.38386K wps | |
[Epoch 122 Batch 30/162] avg loss 0.00121979, throughput 9.61797K wps | |
[Epoch 122 Batch 60/162] avg loss 0.00128912, throughput 9.48748K wps | |
[Epoch 122 Batch 90/162] avg loss 0.00114383, throughput 9.51794K wps | |
[Epoch 122 Batch 120/162] avg loss 0.00123551, throughput 9.40637K wps | |
[Epoch 122 Batch 150/162] avg loss 0.0013794, throughput 9.27151K wps | |
Begin Testing... | |
[Epoch 122] train avg loss 0.00125345, dev acc 0.9089, dev avg loss 0.231178, throughput 9.47191K wps | |
[Epoch 123 Batch 30/162] avg loss 0.00123153, throughput 9.44986K wps | |
[Epoch 123 Batch 60/162] avg loss 0.00117243, throughput 9.37411K wps | |
[Epoch 123 Batch 90/162] avg loss 0.00115469, throughput 9.23474K wps | |
[Epoch 123 Batch 120/162] avg loss 0.00120026, throughput 9.60462K wps | |
[Epoch 123 Batch 150/162] avg loss 0.00130324, throughput 9.31313K wps | |
Begin Testing... | |
[Epoch 123] train avg loss 0.00120856, dev acc 0.9100, dev avg loss 0.230575, throughput 9.38487K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 124 Batch 30/162] avg loss 0.00131587, throughput 9.45153K wps | |
[Epoch 124 Batch 60/162] avg loss 0.00107548, throughput 9.32783K wps | |
[Epoch 124 Batch 90/162] avg loss 0.00116448, throughput 9.38583K wps | |
[Epoch 124 Batch 120/162] avg loss 0.00123663, throughput 9.40002K wps | |
[Epoch 124 Batch 150/162] avg loss 0.00116129, throughput 9.54536K wps | |
Begin Testing... | |
[Epoch 124] train avg loss 0.00119748, dev acc 0.9078, dev avg loss 0.231368, throughput 9.4279K wps | |
[Epoch 125 Batch 30/162] avg loss 0.00122607, throughput 9.60759K wps | |
[Epoch 125 Batch 60/162] avg loss 0.000989195, throughput 9.34235K wps | |
[Epoch 125 Batch 90/162] avg loss 0.00120134, throughput 9.42609K wps | |
[Epoch 125 Batch 120/162] avg loss 0.00116429, throughput 9.30946K wps | |
[Epoch 125 Batch 150/162] avg loss 0.00120988, throughput 9.45256K wps | |
Begin Testing... | |
[Epoch 125] train avg loss 0.00117088, dev acc 0.9067, dev avg loss 0.232322, throughput 9.43699K wps | |
[Epoch 126 Batch 30/162] avg loss 0.00125266, throughput 9.48973K wps | |
[Epoch 126 Batch 60/162] avg loss 0.00116606, throughput 9.36464K wps | |
[Epoch 126 Batch 90/162] avg loss 0.00106504, throughput 9.47364K wps | |
[Epoch 126 Batch 120/162] avg loss 0.00133055, throughput 9.3321K wps | |
[Epoch 126 Batch 150/162] avg loss 0.00117007, throughput 9.45828K wps | |
Begin Testing... | |
[Epoch 126] train avg loss 0.00119145, dev acc 0.9089, dev avg loss 0.230981, throughput 9.41892K wps | |
[Epoch 127 Batch 30/162] avg loss 0.00125357, throughput 9.55533K wps | |
[Epoch 127 Batch 60/162] avg loss 0.00118331, throughput 9.53164K wps | |
[Epoch 127 Batch 90/162] avg loss 0.00122439, throughput 9.4177K wps | |
[Epoch 127 Batch 120/162] avg loss 0.0011428, throughput 9.35033K wps | |
[Epoch 127 Batch 150/162] avg loss 0.00119592, throughput 9.29073K wps | |
Begin Testing... | |
[Epoch 127] train avg loss 0.00118048, dev acc 0.9089, dev avg loss 0.233368, throughput 9.43154K wps | |
[Epoch 128 Batch 30/162] avg loss 0.00127954, throughput 9.52522K wps | |
[Epoch 128 Batch 60/162] avg loss 0.00116916, throughput 9.47814K wps | |
[Epoch 128 Batch 90/162] avg loss 0.00116029, throughput 9.42214K wps | |
[Epoch 128 Batch 120/162] avg loss 0.00101199, throughput 9.50624K wps | |
[Epoch 128 Batch 150/162] avg loss 0.00120605, throughput 9.48094K wps | |
Begin Testing... | |
[Epoch 128] train avg loss 0.00116465, dev acc 0.9089, dev avg loss 0.231094, throughput 9.48963K wps | |
[Epoch 129 Batch 30/162] avg loss 0.0010908, throughput 9.40183K wps | |
[Epoch 129 Batch 60/162] avg loss 0.00121261, throughput 9.32102K wps | |
[Epoch 129 Batch 90/162] avg loss 0.00120329, throughput 9.32543K wps | |
[Epoch 129 Batch 120/162] avg loss 0.00103374, throughput 9.40656K wps | |
[Epoch 129 Batch 150/162] avg loss 0.00116011, throughput 9.30561K wps | |
Begin Testing... | |
[Epoch 129] train avg loss 0.00112974, dev acc 0.9078, dev avg loss 0.231639, throughput 9.36664K wps | |
[Epoch 130 Batch 30/162] avg loss 0.00127606, throughput 9.51606K wps | |
[Epoch 130 Batch 60/162] avg loss 0.00115915, throughput 9.39679K wps | |
[Epoch 130 Batch 90/162] avg loss 0.00118054, throughput 9.40936K wps | |
[Epoch 130 Batch 120/162] avg loss 0.00105857, throughput 9.5402K wps | |
[Epoch 130 Batch 150/162] avg loss 0.00117376, throughput 9.30031K wps | |
Begin Testing... | |
[Epoch 130] train avg loss 0.00116596, dev acc 0.9089, dev avg loss 0.231688, throughput 9.43072K wps | |
[Epoch 131 Batch 30/162] avg loss 0.0011466, throughput 9.78201K wps | |
[Epoch 131 Batch 60/162] avg loss 0.00112706, throughput 9.31586K wps | |
[Epoch 131 Batch 90/162] avg loss 0.00104819, throughput 9.3102K wps | |
[Epoch 131 Batch 120/162] avg loss 0.00114394, throughput 9.37388K wps | |
[Epoch 131 Batch 150/162] avg loss 0.00117354, throughput 9.33197K wps | |
Begin Testing... | |
[Epoch 131] train avg loss 0.00111917, dev acc 0.9078, dev avg loss 0.231436, throughput 9.42101K wps | |
[Epoch 132 Batch 30/162] avg loss 0.00113781, throughput 9.52037K wps | |
[Epoch 132 Batch 60/162] avg loss 0.00101034, throughput 9.2867K wps | |
[Epoch 132 Batch 90/162] avg loss 0.00120764, throughput 9.32912K wps | |
[Epoch 132 Batch 120/162] avg loss 0.00107254, throughput 9.3168K wps | |
[Epoch 132 Batch 150/162] avg loss 0.00117417, throughput 9.35506K wps | |
Begin Testing... | |
[Epoch 132] train avg loss 0.00111857, dev acc 0.9089, dev avg loss 0.23163, throughput 9.35942K wps | |
[Epoch 133 Batch 30/162] avg loss 0.00116766, throughput 9.51698K wps | |
[Epoch 133 Batch 60/162] avg loss 0.00117161, throughput 9.35977K wps | |
[Epoch 133 Batch 90/162] avg loss 0.00105401, throughput 9.31941K wps | |
[Epoch 133 Batch 120/162] avg loss 0.00102707, throughput 9.23051K wps | |
[Epoch 133 Batch 150/162] avg loss 0.00107245, throughput 9.30163K wps | |
Begin Testing... | |
[Epoch 133] train avg loss 0.00108673, dev acc 0.9089, dev avg loss 0.232259, throughput 9.36327K wps | |
[Epoch 134 Batch 30/162] avg loss 0.00110652, throughput 9.5603K wps | |
[Epoch 134 Batch 60/162] avg loss 0.00104329, throughput 9.52606K wps | |
[Epoch 134 Batch 90/162] avg loss 0.00115807, throughput 9.19236K wps | |
[Epoch 134 Batch 120/162] avg loss 0.00112703, throughput 9.39201K wps | |
[Epoch 134 Batch 150/162] avg loss 0.000917882, throughput 9.33058K wps | |
Begin Testing... | |
[Epoch 134] train avg loss 0.00106053, dev acc 0.9089, dev avg loss 0.232083, throughput 9.40031K wps | |
[Epoch 135 Batch 30/162] avg loss 0.00120027, throughput 9.6566K wps | |
[Epoch 135 Batch 60/162] avg loss 0.000972883, throughput 9.36635K wps | |
[Epoch 135 Batch 90/162] avg loss 0.0010226, throughput 9.50579K wps | |
[Epoch 135 Batch 120/162] avg loss 0.00107877, throughput 9.57004K wps | |
[Epoch 135 Batch 150/162] avg loss 0.00104918, throughput 9.43732K wps | |
Begin Testing... | |
[Epoch 135] train avg loss 0.00106597, dev acc 0.9100, dev avg loss 0.233435, throughput 9.4907K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 136 Batch 30/162] avg loss 0.00104226, throughput 9.49632K wps | |
[Epoch 136 Batch 60/162] avg loss 0.0011248, throughput 9.42263K wps | |
[Epoch 136 Batch 90/162] avg loss 0.000979714, throughput 9.39932K wps | |
[Epoch 136 Batch 120/162] avg loss 0.00112235, throughput 9.53759K wps | |
[Epoch 136 Batch 150/162] avg loss 0.00106865, throughput 9.28172K wps | |
Begin Testing... | |
[Epoch 136] train avg loss 0.00107747, dev acc 0.9089, dev avg loss 0.232289, throughput 9.41755K wps | |
[Epoch 137 Batch 30/162] avg loss 0.00107265, throughput 9.6354K wps | |
[Epoch 137 Batch 60/162] avg loss 0.0010436, throughput 9.3048K wps | |
[Epoch 137 Batch 90/162] avg loss 0.00106676, throughput 9.47605K wps | |
[Epoch 137 Batch 120/162] avg loss 0.00101127, throughput 9.33882K wps | |
[Epoch 137 Batch 150/162] avg loss 0.0010889, throughput 9.41961K wps | |
Begin Testing... | |
[Epoch 137] train avg loss 0.00104105, dev acc 0.9089, dev avg loss 0.232055, throughput 9.42202K wps | |
[Epoch 138 Batch 30/162] avg loss 0.00101001, throughput 9.52996K wps | |
[Epoch 138 Batch 60/162] avg loss 0.00107274, throughput 9.40712K wps | |
[Epoch 138 Batch 90/162] avg loss 0.00112835, throughput 9.54277K wps | |
[Epoch 138 Batch 120/162] avg loss 0.00109074, throughput 9.36165K wps | |
[Epoch 138 Batch 150/162] avg loss 0.000926584, throughput 9.47375K wps | |
Begin Testing... | |
[Epoch 138] train avg loss 0.00102443, dev acc 0.9100, dev avg loss 0.234792, throughput 9.45976K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 139 Batch 30/162] avg loss 0.000917554, throughput 9.50602K wps | |
[Epoch 139 Batch 60/162] avg loss 0.00102483, throughput 9.56401K wps | |
[Epoch 139 Batch 90/162] avg loss 0.00104301, throughput 9.30412K wps | |
[Epoch 139 Batch 120/162] avg loss 0.000902007, throughput 9.42485K wps | |
[Epoch 139 Batch 150/162] avg loss 0.000978564, throughput 9.34043K wps | |
Begin Testing... | |
[Epoch 139] train avg loss 0.000973383, dev acc 0.9089, dev avg loss 0.23379, throughput 9.41532K wps | |
[Epoch 140 Batch 30/162] avg loss 0.00112363, throughput 9.58172K wps | |
[Epoch 140 Batch 60/162] avg loss 0.000927711, throughput 9.35552K wps | |
[Epoch 140 Batch 90/162] avg loss 0.00105667, throughput 9.27178K wps | |
[Epoch 140 Batch 120/162] avg loss 0.00109079, throughput 9.33046K wps | |
[Epoch 140 Batch 150/162] avg loss 0.000950986, throughput 9.45505K wps | |
Begin Testing... | |
[Epoch 140] train avg loss 0.00103573, dev acc 0.9089, dev avg loss 0.232376, throughput 9.38938K wps | |
[Epoch 141 Batch 30/162] avg loss 0.0010825, throughput 9.57874K wps | |
[Epoch 141 Batch 60/162] avg loss 0.000925724, throughput 9.40497K wps | |
[Epoch 141 Batch 90/162] avg loss 0.000907817, throughput 9.54674K wps | |
[Epoch 141 Batch 120/162] avg loss 0.00105981, throughput 9.22967K wps | |
[Epoch 141 Batch 150/162] avg loss 0.00104694, throughput 9.48152K wps | |
Begin Testing... | |
[Epoch 141] train avg loss 0.000989632, dev acc 0.9100, dev avg loss 0.232313, throughput 9.45458K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 142 Batch 30/162] avg loss 0.000791197, throughput 9.63629K wps | |
[Epoch 142 Batch 60/162] avg loss 0.00111298, throughput 9.30087K wps | |
[Epoch 142 Batch 90/162] avg loss 0.00109017, throughput 9.34436K wps | |
[Epoch 142 Batch 120/162] avg loss 0.000968016, throughput 9.51697K wps | |
[Epoch 142 Batch 150/162] avg loss 0.00100595, throughput 9.40806K wps | |
Begin Testing... | |
[Epoch 142] train avg loss 0.000993251, dev acc 0.9089, dev avg loss 0.232322, throughput 9.45164K wps | |
[Epoch 143 Batch 30/162] avg loss 0.000981228, throughput 9.60639K wps | |
[Epoch 143 Batch 60/162] avg loss 0.00095341, throughput 9.27932K wps | |
[Epoch 143 Batch 90/162] avg loss 0.000919176, throughput 9.42186K wps | |
[Epoch 143 Batch 120/162] avg loss 0.00104658, throughput 9.38615K wps | |
[Epoch 143 Batch 150/162] avg loss 0.000867069, throughput 9.33962K wps | |
Begin Testing... | |
[Epoch 143] train avg loss 0.000950099, dev acc 0.9078, dev avg loss 0.232937, throughput 9.42267K wps | |
[Epoch 144 Batch 30/162] avg loss 0.000974376, throughput 9.6411K wps | |
[Epoch 144 Batch 60/162] avg loss 0.00100757, throughput 9.38647K wps | |
[Epoch 144 Batch 90/162] avg loss 0.000908716, throughput 9.53237K wps | |
[Epoch 144 Batch 120/162] avg loss 0.000833444, throughput 9.28649K wps | |
[Epoch 144 Batch 150/162] avg loss 0.000887538, throughput 9.41759K wps | |
Begin Testing... | |
[Epoch 144] train avg loss 0.000939386, dev acc 0.9089, dev avg loss 0.23314, throughput 9.44574K wps | |
[Epoch 145 Batch 30/162] avg loss 0.00106849, throughput 9.56686K wps | |
[Epoch 145 Batch 60/162] avg loss 0.00101651, throughput 9.45993K wps | |
[Epoch 145 Batch 90/162] avg loss 0.000900312, throughput 9.31088K wps | |
[Epoch 145 Batch 120/162] avg loss 0.00108238, throughput 9.24731K wps | |
[Epoch 145 Batch 150/162] avg loss 0.000922512, throughput 9.37205K wps | |
Begin Testing... | |
[Epoch 145] train avg loss 0.000991222, dev acc 0.9089, dev avg loss 0.233307, throughput 9.37975K wps | |
[Epoch 146 Batch 30/162] avg loss 0.000977123, throughput 9.64177K wps | |
[Epoch 146 Batch 60/162] avg loss 0.000843502, throughput 9.33919K wps | |
[Epoch 146 Batch 90/162] avg loss 0.000988307, throughput 9.34273K wps | |
[Epoch 146 Batch 120/162] avg loss 0.000975136, throughput 9.5271K wps | |
[Epoch 146 Batch 150/162] avg loss 0.0010214, throughput 9.32937K wps | |
Begin Testing... | |
[Epoch 146] train avg loss 0.000946645, dev acc 0.9089, dev avg loss 0.23337, throughput 9.42273K wps | |
[Epoch 147 Batch 30/162] avg loss 0.000967704, throughput 9.7456K wps | |
[Epoch 147 Batch 60/162] avg loss 0.000919638, throughput 9.37475K wps | |
[Epoch 147 Batch 90/162] avg loss 0.00104986, throughput 9.57752K wps | |
[Epoch 147 Batch 120/162] avg loss 0.00083452, throughput 9.30869K wps | |
[Epoch 147 Batch 150/162] avg loss 0.000892878, throughput 9.39343K wps | |
Begin Testing... | |
[Epoch 147] train avg loss 0.000933025, dev acc 0.9089, dev avg loss 0.234006, throughput 9.45983K wps | |
[Epoch 148 Batch 30/162] avg loss 0.000899146, throughput 9.68824K wps | |
[Epoch 148 Batch 60/162] avg loss 0.000853725, throughput 9.4349K wps | |
[Epoch 148 Batch 90/162] avg loss 0.000912254, throughput 9.31751K wps | |
[Epoch 148 Batch 120/162] avg loss 0.000917275, throughput 9.34575K wps | |
[Epoch 148 Batch 150/162] avg loss 0.00105725, throughput 9.50296K wps | |
Begin Testing... | |
[Epoch 148] train avg loss 0.000918427, dev acc 0.9078, dev avg loss 0.233348, throughput 9.4517K wps | |
[Epoch 149 Batch 30/162] avg loss 0.000901917, throughput 9.57802K wps | |
[Epoch 149 Batch 60/162] avg loss 0.0010974, throughput 9.31524K wps | |
[Epoch 149 Batch 90/162] avg loss 0.000937287, throughput 9.2491K wps | |
[Epoch 149 Batch 120/162] avg loss 0.000894007, throughput 9.41471K wps | |
[Epoch 149 Batch 150/162] avg loss 0.000729819, throughput 9.5361K wps | |
Begin Testing... | |
[Epoch 149] train avg loss 0.000923802, dev acc 0.9111, dev avg loss 0.234489, throughput 9.43384K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 150 Batch 30/162] avg loss 0.000804541, throughput 9.64593K wps | |
[Epoch 150 Batch 60/162] avg loss 0.000940878, throughput 9.30358K wps | |
[Epoch 150 Batch 90/162] avg loss 0.00092122, throughput 9.45642K wps | |
[Epoch 150 Batch 120/162] avg loss 0.000889109, throughput 9.56398K wps | |
[Epoch 150 Batch 150/162] avg loss 0.000911743, throughput 9.31791K wps | |
Begin Testing... | |
[Epoch 150] train avg loss 0.000909455, dev acc 0.9133, dev avg loss 0.237016, throughput 9.45791K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 151 Batch 30/162] avg loss 0.00104687, throughput 9.62529K wps | |
[Epoch 151 Batch 60/162] avg loss 0.000783363, throughput 9.30955K wps | |
[Epoch 151 Batch 90/162] avg loss 0.000865708, throughput 9.47896K wps | |
[Epoch 151 Batch 120/162] avg loss 0.000898965, throughput 9.18108K wps | |
[Epoch 151 Batch 150/162] avg loss 0.000912511, throughput 9.32852K wps | |
Begin Testing... | |
[Epoch 151] train avg loss 0.000901924, dev acc 0.9089, dev avg loss 0.234193, throughput 9.39453K wps | |
[Epoch 152 Batch 30/162] avg loss 0.000942033, throughput 9.66154K wps | |
[Epoch 152 Batch 60/162] avg loss 0.000917, throughput 9.32729K wps | |
[Epoch 152 Batch 90/162] avg loss 0.000853672, throughput 9.46519K wps | |
[Epoch 152 Batch 120/162] avg loss 0.000865698, throughput 9.61104K wps | |
[Epoch 152 Batch 150/162] avg loss 0.000930457, throughput 9.46066K wps | |
Begin Testing... | |
[Epoch 152] train avg loss 0.000898754, dev acc 0.9089, dev avg loss 0.233933, throughput 9.50791K wps | |
[Epoch 153 Batch 30/162] avg loss 0.000848278, throughput 9.49512K wps | |
[Epoch 153 Batch 60/162] avg loss 0.000874891, throughput 9.34026K wps | |
[Epoch 153 Batch 90/162] avg loss 0.000924783, throughput 9.47398K wps | |
[Epoch 153 Batch 120/162] avg loss 0.0008464, throughput 9.41661K wps | |
[Epoch 153 Batch 150/162] avg loss 0.000908126, throughput 9.47812K wps | |
Begin Testing... | |
[Epoch 153] train avg loss 0.000874369, dev acc 0.9100, dev avg loss 0.234371, throughput 9.44979K wps | |
[Epoch 154 Batch 30/162] avg loss 0.000920672, throughput 9.74402K wps | |
[Epoch 154 Batch 60/162] avg loss 0.000814592, throughput 9.55446K wps | |
[Epoch 154 Batch 90/162] avg loss 0.000811327, throughput 9.28855K wps | |
[Epoch 154 Batch 120/162] avg loss 0.000805069, throughput 9.51916K wps | |
[Epoch 154 Batch 150/162] avg loss 0.000926132, throughput 9.39829K wps | |
Begin Testing... | |
[Epoch 154] train avg loss 0.000854013, dev acc 0.9089, dev avg loss 0.234696, throughput 9.50781K wps | |
[Epoch 155 Batch 30/162] avg loss 0.000819967, throughput 9.41411K wps | |
[Epoch 155 Batch 60/162] avg loss 0.000851742, throughput 9.42135K wps | |
[Epoch 155 Batch 90/162] avg loss 0.00085012, throughput 9.38537K wps | |
[Epoch 155 Batch 120/162] avg loss 0.000828359, throughput 9.33566K wps | |
[Epoch 155 Batch 150/162] avg loss 0.000979425, throughput 9.41623K wps | |
Begin Testing... | |
[Epoch 155] train avg loss 0.000861041, dev acc 0.9100, dev avg loss 0.234617, throughput 9.39749K wps | |
[Epoch 156 Batch 30/162] avg loss 0.000908386, throughput 9.62097K wps | |
[Epoch 156 Batch 60/162] avg loss 0.000821374, throughput 9.6243K wps | |
[Epoch 156 Batch 90/162] avg loss 0.000867861, throughput 9.4248K wps | |
[Epoch 156 Batch 120/162] avg loss 0.00078668, throughput 9.45719K wps | |
[Epoch 156 Batch 150/162] avg loss 0.00089233, throughput 9.41718K wps | |
Begin Testing... | |
[Epoch 156] train avg loss 0.00085236, dev acc 0.9089, dev avg loss 0.234838, throughput 9.51593K wps | |
[Epoch 157 Batch 30/162] avg loss 0.000736066, throughput 9.51409K wps | |
[Epoch 157 Batch 60/162] avg loss 0.000908002, throughput 9.40162K wps | |
[Epoch 157 Batch 90/162] avg loss 0.00083561, throughput 9.23029K wps | |
[Epoch 157 Batch 120/162] avg loss 0.000859348, throughput 9.47912K wps | |
[Epoch 157 Batch 150/162] avg loss 0.000805165, throughput 9.65545K wps | |
Begin Testing... | |
[Epoch 157] train avg loss 0.000831235, dev acc 0.9100, dev avg loss 0.235804, throughput 9.45897K wps | |
[Epoch 158 Batch 30/162] avg loss 0.000951143, throughput 9.41079K wps | |
[Epoch 158 Batch 60/162] avg loss 0.000778569, throughput 9.39475K wps | |
[Epoch 158 Batch 90/162] avg loss 0.000875491, throughput 9.5724K wps | |
[Epoch 158 Batch 120/162] avg loss 0.000952165, throughput 9.30331K wps | |
[Epoch 158 Batch 150/162] avg loss 0.000788414, throughput 9.56461K wps | |
Begin Testing... | |
[Epoch 158] train avg loss 0.000845678, dev acc 0.9078, dev avg loss 0.235733, throughput 9.4523K wps | |
[Epoch 159 Batch 30/162] avg loss 0.00085577, throughput 9.47013K wps | |
[Epoch 159 Batch 60/162] avg loss 0.000891118, throughput 9.27658K wps | |
[Epoch 159 Batch 90/162] avg loss 0.000861628, throughput 9.23381K wps | |
[Epoch 159 Batch 120/162] avg loss 0.000698481, throughput 9.2417K wps | |
[Epoch 159 Batch 150/162] avg loss 0.000864441, throughput 9.4254K wps | |
Begin Testing... | |
[Epoch 159] train avg loss 0.000825486, dev acc 0.9100, dev avg loss 0.234453, throughput 9.31329K wps | |
[Epoch 160 Batch 30/162] avg loss 0.000769882, throughput 9.48976K wps | |
[Epoch 160 Batch 60/162] avg loss 0.00079528, throughput 9.56516K wps | |
[Epoch 160 Batch 90/162] avg loss 0.000809148, throughput 9.2531K wps | |
[Epoch 160 Batch 120/162] avg loss 0.000830905, throughput 9.29415K wps | |
[Epoch 160 Batch 150/162] avg loss 0.00083244, throughput 9.44979K wps | |
Begin Testing... | |
[Epoch 160] train avg loss 0.000820002, dev acc 0.9111, dev avg loss 0.235537, throughput 9.39205K wps | |
[Epoch 161 Batch 30/162] avg loss 0.000844816, throughput 9.63445K wps | |
[Epoch 161 Batch 60/162] avg loss 0.000784664, throughput 9.44139K wps | |
[Epoch 161 Batch 90/162] avg loss 0.000801659, throughput 9.46971K wps | |
[Epoch 161 Batch 120/162] avg loss 0.000830579, throughput 9.46503K wps | |
[Epoch 161 Batch 150/162] avg loss 0.000821105, throughput 9.23749K wps | |
Begin Testing... | |
[Epoch 161] train avg loss 0.000821155, dev acc 0.9089, dev avg loss 0.235196, throughput 9.43005K wps | |
[Epoch 162 Batch 30/162] avg loss 0.000723523, throughput 9.50028K wps | |
[Epoch 162 Batch 60/162] avg loss 0.000831475, throughput 9.27321K wps | |
[Epoch 162 Batch 90/162] avg loss 0.000753669, throughput 9.24562K wps | |
[Epoch 162 Batch 120/162] avg loss 0.00094508, throughput 8.29039K wps | |
[Epoch 162 Batch 150/162] avg loss 0.000757828, throughput 9.53707K wps | |
Begin Testing... | |
[Epoch 162] train avg loss 0.000813578, dev acc 0.9100, dev avg loss 0.235683, throughput 9.15311K wps | |
[Epoch 163 Batch 30/162] avg loss 0.000832987, throughput 9.63042K wps | |
[Epoch 163 Batch 60/162] avg loss 0.000769862, throughput 9.48367K wps | |
[Epoch 163 Batch 90/162] avg loss 0.000810397, throughput 9.39692K wps | |
[Epoch 163 Batch 120/162] avg loss 0.000790574, throughput 9.33818K wps | |
[Epoch 163 Batch 150/162] avg loss 0.000766768, throughput 9.42976K wps | |
Begin Testing... | |
[Epoch 163] train avg loss 0.000794569, dev acc 0.9111, dev avg loss 0.235793, throughput 9.45488K wps | |
[Epoch 164 Batch 30/162] avg loss 0.000749019, throughput 9.55306K wps | |
[Epoch 164 Batch 60/162] avg loss 0.000769312, throughput 9.35973K wps | |
[Epoch 164 Batch 90/162] avg loss 0.000880183, throughput 9.47812K wps | |
[Epoch 164 Batch 120/162] avg loss 0.000738201, throughput 9.50744K wps | |
[Epoch 164 Batch 150/162] avg loss 0.000762934, throughput 9.31955K wps | |
Begin Testing... | |
[Epoch 164] train avg loss 0.000772451, dev acc 0.9100, dev avg loss 0.23632, throughput 9.44936K wps | |
[Epoch 165 Batch 30/162] avg loss 0.000771005, throughput 9.69773K wps | |
[Epoch 165 Batch 60/162] avg loss 0.000749823, throughput 9.37011K wps | |
[Epoch 165 Batch 90/162] avg loss 0.000906661, throughput 9.61047K wps | |
[Epoch 165 Batch 120/162] avg loss 0.000740001, throughput 9.45996K wps | |
[Epoch 165 Batch 150/162] avg loss 0.000830273, throughput 9.39055K wps | |
Begin Testing... | |
[Epoch 165] train avg loss 0.000794587, dev acc 0.9100, dev avg loss 0.236553, throughput 9.48944K wps | |
[Epoch 166 Batch 30/162] avg loss 0.000768177, throughput 9.5646K wps | |
[Epoch 166 Batch 60/162] avg loss 0.000824933, throughput 9.41575K wps | |
[Epoch 166 Batch 90/162] avg loss 0.000821158, throughput 9.27244K wps | |
[Epoch 166 Batch 120/162] avg loss 0.000800608, throughput 9.36184K wps | |
[Epoch 166 Batch 150/162] avg loss 0.000828734, throughput 9.3954K wps | |
Begin Testing... | |
[Epoch 166] train avg loss 0.000802713, dev acc 0.9100, dev avg loss 0.235764, throughput 9.39036K wps | |
[Epoch 167 Batch 30/162] avg loss 0.000790269, throughput 9.63225K wps | |
[Epoch 167 Batch 60/162] avg loss 0.00070167, throughput 9.40472K wps | |
[Epoch 167 Batch 90/162] avg loss 0.000787989, throughput 9.3872K wps | |
[Epoch 167 Batch 120/162] avg loss 0.000739007, throughput 9.42673K wps | |
[Epoch 167 Batch 150/162] avg loss 0.000794844, throughput 9.60576K wps | |
Begin Testing... | |
[Epoch 167] train avg loss 0.000764162, dev acc 0.9100, dev avg loss 0.236202, throughput 9.49419K wps | |
[Epoch 168 Batch 30/162] avg loss 0.000801519, throughput 9.55981K wps | |
[Epoch 168 Batch 60/162] avg loss 0.000683453, throughput 9.40529K wps | |
[Epoch 168 Batch 90/162] avg loss 0.000784027, throughput 9.26729K wps | |
[Epoch 168 Batch 120/162] avg loss 0.000743444, throughput 9.22664K wps | |
[Epoch 168 Batch 150/162] avg loss 0.00074639, throughput 9.33994K wps | |
Begin Testing... | |
[Epoch 168] train avg loss 0.000755654, dev acc 0.9078, dev avg loss 0.237907, throughput 9.36235K wps | |
[Epoch 169 Batch 30/162] avg loss 0.000706642, throughput 9.48942K wps | |
[Epoch 169 Batch 60/162] avg loss 0.000801273, throughput 9.38395K wps | |
[Epoch 169 Batch 90/162] avg loss 0.000779325, throughput 9.32791K wps | |
[Epoch 169 Batch 120/162] avg loss 0.000761004, throughput 9.44284K wps | |
[Epoch 169 Batch 150/162] avg loss 0.000729448, throughput 9.20202K wps | |
Begin Testing... | |
[Epoch 169] train avg loss 0.000756275, dev acc 0.9122, dev avg loss 0.237194, throughput 9.36017K wps | |
[Epoch 170 Batch 30/162] avg loss 0.00066477, throughput 9.37133K wps | |
[Epoch 170 Batch 60/162] avg loss 0.000778539, throughput 9.27414K wps | |
[Epoch 170 Batch 90/162] avg loss 0.000825341, throughput 9.49727K wps | |
[Epoch 170 Batch 120/162] avg loss 0.000736744, throughput 9.31899K wps | |
[Epoch 170 Batch 150/162] avg loss 0.000736793, throughput 9.42257K wps | |
Begin Testing... | |
[Epoch 170] train avg loss 0.000740752, dev acc 0.9100, dev avg loss 0.237422, throughput 9.3939K wps | |
[Epoch 171 Batch 30/162] avg loss 0.000799584, throughput 9.54575K wps | |
[Epoch 171 Batch 60/162] avg loss 0.000789217, throughput 9.62761K wps | |
[Epoch 171 Batch 90/162] avg loss 0.000725629, throughput 9.48674K wps | |
[Epoch 171 Batch 120/162] avg loss 0.00070809, throughput 9.38962K wps | |
[Epoch 171 Batch 150/162] avg loss 0.000717389, throughput 9.47381K wps | |
Begin Testing... | |
[Epoch 171] train avg loss 0.000756386, dev acc 0.9100, dev avg loss 0.237227, throughput 9.48263K wps | |
[Epoch 172 Batch 30/162] avg loss 0.000752299, throughput 9.6368K wps | |
[Epoch 172 Batch 60/162] avg loss 0.000679785, throughput 9.32707K wps | |
[Epoch 172 Batch 90/162] avg loss 0.000817014, throughput 9.35232K wps | |
[Epoch 172 Batch 120/162] avg loss 0.000623984, throughput 9.42069K wps | |
[Epoch 172 Batch 150/162] avg loss 0.000732433, throughput 9.23809K wps | |
Begin Testing... | |
[Epoch 172] train avg loss 0.000726017, dev acc 0.9078, dev avg loss 0.236804, throughput 9.39972K wps | |
[Epoch 173 Batch 30/162] avg loss 0.000755986, throughput 9.46317K wps | |
[Epoch 173 Batch 60/162] avg loss 0.000702394, throughput 9.3325K wps | |
[Epoch 173 Batch 90/162] avg loss 0.000796277, throughput 9.46001K wps | |
[Epoch 173 Batch 120/162] avg loss 0.000652132, throughput 9.4137K wps | |
[Epoch 173 Batch 150/162] avg loss 0.000687601, throughput 9.40747K wps | |
Begin Testing... | |
[Epoch 173] train avg loss 0.000724834, dev acc 0.9111, dev avg loss 0.237712, throughput 9.4084K wps | |
[Epoch 174 Batch 30/162] avg loss 0.000735784, throughput 9.46282K wps | |
[Epoch 174 Batch 60/162] avg loss 0.000731652, throughput 9.31279K wps | |
[Epoch 174 Batch 90/162] avg loss 0.000656412, throughput 9.47202K wps | |
[Epoch 174 Batch 120/162] avg loss 0.000698891, throughput 9.44206K wps | |
[Epoch 174 Batch 150/162] avg loss 0.000741097, throughput 9.48087K wps | |
Begin Testing... | |
[Epoch 174] train avg loss 0.000713348, dev acc 0.9078, dev avg loss 0.237118, throughput 9.44297K wps | |
[Epoch 175 Batch 30/162] avg loss 0.000743771, throughput 9.39445K wps | |
[Epoch 175 Batch 60/162] avg loss 0.000682886, throughput 9.32839K wps | |
[Epoch 175 Batch 90/162] avg loss 0.000649598, throughput 9.52288K wps | |
[Epoch 175 Batch 120/162] avg loss 0.000667763, throughput 9.41873K wps | |
[Epoch 175 Batch 150/162] avg loss 0.000806995, throughput 9.35282K wps | |
Begin Testing... | |
[Epoch 175] train avg loss 0.000703635, dev acc 0.9100, dev avg loss 0.237228, throughput 9.3915K wps | |
[Epoch 176 Batch 30/162] avg loss 0.000832819, throughput 9.40682K wps | |
[Epoch 176 Batch 60/162] avg loss 0.000710265, throughput 9.35888K wps | |
[Epoch 176 Batch 90/162] avg loss 0.000667312, throughput 9.3304K wps | |
[Epoch 176 Batch 120/162] avg loss 0.000637296, throughput 9.33409K wps | |
[Epoch 176 Batch 150/162] avg loss 0.000669669, throughput 9.34223K wps | |
Begin Testing... | |
[Epoch 176] train avg loss 0.000699142, dev acc 0.9111, dev avg loss 0.237637, throughput 9.36195K wps | |
[Epoch 177 Batch 30/162] avg loss 0.000693528, throughput 9.58115K wps | |
[Epoch 177 Batch 60/162] avg loss 0.000650959, throughput 9.5932K wps | |
[Epoch 177 Batch 90/162] avg loss 0.000784742, throughput 9.39822K wps | |
[Epoch 177 Batch 120/162] avg loss 0.000633501, throughput 9.46298K wps | |
[Epoch 177 Batch 150/162] avg loss 0.000686833, throughput 9.48952K wps | |
Begin Testing... | |
[Epoch 177] train avg loss 0.000681109, dev acc 0.9089, dev avg loss 0.240385, throughput 9.48392K wps | |
[Epoch 178 Batch 30/162] avg loss 0.000640096, throughput 9.43652K wps | |
[Epoch 178 Batch 60/162] avg loss 0.000612319, throughput 9.249K wps | |
[Epoch 178 Batch 90/162] avg loss 0.000734316, throughput 9.46672K wps | |
[Epoch 178 Batch 120/162] avg loss 0.000597709, throughput 9.36726K wps | |
[Epoch 178 Batch 150/162] avg loss 0.000672073, throughput 9.3237K wps | |
Begin Testing... | |
[Epoch 178] train avg loss 0.000665362, dev acc 0.9100, dev avg loss 0.239272, throughput 9.3727K wps | |
[Epoch 179 Batch 30/162] avg loss 0.000633614, throughput 9.57979K wps | |
[Epoch 179 Batch 60/162] avg loss 0.000687075, throughput 9.30941K wps | |
[Epoch 179 Batch 90/162] avg loss 0.000608268, throughput 9.52255K wps | |
[Epoch 179 Batch 120/162] avg loss 0.00064015, throughput 9.51619K wps | |
[Epoch 179 Batch 150/162] avg loss 0.000611095, throughput 9.47367K wps | |
Begin Testing... | |
[Epoch 179] train avg loss 0.000634524, dev acc 0.9111, dev avg loss 0.239065, throughput 9.45879K wps | |
[Epoch 180 Batch 30/162] avg loss 0.000640002, throughput 9.44108K wps | |
[Epoch 180 Batch 60/162] avg loss 0.00066036, throughput 9.46485K wps | |
[Epoch 180 Batch 90/162] avg loss 0.000695246, throughput 9.2436K wps | |
[Epoch 180 Batch 120/162] avg loss 0.000639377, throughput 9.31758K wps | |
[Epoch 180 Batch 150/162] avg loss 0.000728602, throughput 9.51451K wps | |
Begin Testing... | |
[Epoch 180] train avg loss 0.000674831, dev acc 0.9100, dev avg loss 0.238442, throughput 9.40345K wps | |
[Epoch 181 Batch 30/162] avg loss 0.000615604, throughput 9.36744K wps | |
[Epoch 181 Batch 60/162] avg loss 0.000634235, throughput 9.42493K wps | |
[Epoch 181 Batch 90/162] avg loss 0.000672926, throughput 9.29464K wps | |
[Epoch 181 Batch 120/162] avg loss 0.000726044, throughput 9.24447K wps | |
[Epoch 181 Batch 150/162] avg loss 0.000637706, throughput 9.29073K wps | |
Begin Testing... | |
[Epoch 181] train avg loss 0.000656826, dev acc 0.9100, dev avg loss 0.238993, throughput 9.3223K wps | |
[Epoch 182 Batch 30/162] avg loss 0.000695117, throughput 9.49686K wps | |
[Epoch 182 Batch 60/162] avg loss 0.000682117, throughput 9.32149K wps | |
[Epoch 182 Batch 90/162] avg loss 0.000699859, throughput 9.47571K wps | |
[Epoch 182 Batch 120/162] avg loss 0.000691186, throughput 9.33104K wps | |
[Epoch 182 Batch 150/162] avg loss 0.000641777, throughput 9.35585K wps | |
Begin Testing... | |
[Epoch 182] train avg loss 0.000676474, dev acc 0.9100, dev avg loss 0.238849, throughput 9.36984K wps | |
[Epoch 183 Batch 30/162] avg loss 0.000643806, throughput 9.55162K wps | |
[Epoch 183 Batch 60/162] avg loss 0.000675814, throughput 9.37733K wps | |
[Epoch 183 Batch 90/162] avg loss 0.000622648, throughput 9.45701K wps | |
[Epoch 183 Batch 120/162] avg loss 0.000677972, throughput 9.2751K wps | |
[Epoch 183 Batch 150/162] avg loss 0.00067654, throughput 9.56866K wps | |
Begin Testing... | |
[Epoch 183] train avg loss 0.000662198, dev acc 0.9100, dev avg loss 0.239296, throughput 9.43471K wps | |
[Epoch 184 Batch 30/162] avg loss 0.000643572, throughput 9.61286K wps | |
[Epoch 184 Batch 60/162] avg loss 0.000679541, throughput 9.335K wps | |
[Epoch 184 Batch 90/162] avg loss 0.0006128, throughput 9.36744K wps | |
[Epoch 184 Batch 120/162] avg loss 0.000675135, throughput 9.30609K wps | |
[Epoch 184 Batch 150/162] avg loss 0.0006197, throughput 9.26366K wps | |
Begin Testing... | |
[Epoch 184] train avg loss 0.000652314, dev acc 0.9089, dev avg loss 0.240087, throughput 9.37998K wps | |
[Epoch 185 Batch 30/162] avg loss 0.000701048, throughput 9.41467K wps | |
[Epoch 185 Batch 60/162] avg loss 0.000639268, throughput 9.28428K wps | |
[Epoch 185 Batch 90/162] avg loss 0.000594154, throughput 9.30481K wps | |
[Epoch 185 Batch 120/162] avg loss 0.000658006, throughput 9.23882K wps | |
[Epoch 185 Batch 150/162] avg loss 0.000591176, throughput 9.28895K wps | |
Begin Testing... | |
[Epoch 185] train avg loss 0.000641609, dev acc 0.9089, dev avg loss 0.24066, throughput 9.30514K wps | |
[Epoch 186 Batch 30/162] avg loss 0.000669224, throughput 9.59649K wps | |
[Epoch 186 Batch 60/162] avg loss 0.000661967, throughput 9.34397K wps | |
[Epoch 186 Batch 90/162] avg loss 0.000709543, throughput 9.43368K wps | |
[Epoch 186 Batch 120/162] avg loss 0.00056775, throughput 9.41121K wps | |
[Epoch 186 Batch 150/162] avg loss 0.000713935, throughput 9.23548K wps | |
Begin Testing... | |
[Epoch 186] train avg loss 0.000658315, dev acc 0.9100, dev avg loss 0.239777, throughput 9.39768K wps | |
[Epoch 187 Batch 30/162] avg loss 0.000600564, throughput 9.70658K wps | |
[Epoch 187 Batch 60/162] avg loss 0.000598476, throughput 9.38072K wps | |
[Epoch 187 Batch 90/162] avg loss 0.000622142, throughput 9.38747K wps | |
[Epoch 187 Batch 120/162] avg loss 0.000639972, throughput 9.47634K wps | |
[Epoch 187 Batch 150/162] avg loss 0.000609056, throughput 9.28248K wps | |
Begin Testing... | |
[Epoch 187] train avg loss 0.000613317, dev acc 0.9100, dev avg loss 0.240447, throughput 9.42922K wps | |
[Epoch 188 Batch 30/162] avg loss 0.000773164, throughput 9.52025K wps | |
[Epoch 188 Batch 60/162] avg loss 0.000617776, throughput 9.22389K wps | |
[Epoch 188 Batch 90/162] avg loss 0.000667963, throughput 9.36549K wps | |
[Epoch 188 Batch 120/162] avg loss 0.000657483, throughput 9.47177K wps | |
[Epoch 188 Batch 150/162] avg loss 0.000693193, throughput 9.39425K wps | |
Begin Testing... | |
[Epoch 188] train avg loss 0.000672095, dev acc 0.9100, dev avg loss 0.23955, throughput 9.38678K wps | |
[Epoch 189 Batch 30/162] avg loss 0.000638646, throughput 9.53631K wps | |
[Epoch 189 Batch 60/162] avg loss 0.000609101, throughput 9.43035K wps | |
[Epoch 189 Batch 90/162] avg loss 0.000576269, throughput 9.30208K wps | |
[Epoch 189 Batch 120/162] avg loss 0.00057591, throughput 9.44965K wps | |
[Epoch 189 Batch 150/162] avg loss 0.000644149, throughput 9.20806K wps | |
Begin Testing... | |
[Epoch 189] train avg loss 0.000622232, dev acc 0.9111, dev avg loss 0.239722, throughput 9.37656K wps | |
[Epoch 190 Batch 30/162] avg loss 0.000607203, throughput 9.60465K wps | |
[Epoch 190 Batch 60/162] avg loss 0.000647973, throughput 9.3575K wps | |
[Epoch 190 Batch 90/162] avg loss 0.000602054, throughput 9.25115K wps | |
[Epoch 190 Batch 120/162] avg loss 0.000639988, throughput 9.40754K wps | |
[Epoch 190 Batch 150/162] avg loss 0.000545217, throughput 9.29879K wps | |
Begin Testing... | |
[Epoch 190] train avg loss 0.000615136, dev acc 0.9100, dev avg loss 0.239798, throughput 9.37632K wps | |
[Epoch 191 Batch 30/162] avg loss 0.00058518, throughput 9.59361K wps | |
[Epoch 191 Batch 60/162] avg loss 0.000575453, throughput 9.39359K wps | |
[Epoch 191 Batch 90/162] avg loss 0.00065633, throughput 9.43916K wps | |
[Epoch 191 Batch 120/162] avg loss 0.000648766, throughput 9.27674K wps | |
[Epoch 191 Batch 150/162] avg loss 0.000553561, throughput 9.4239K wps | |
Begin Testing... | |
[Epoch 191] train avg loss 0.000602676, dev acc 0.9078, dev avg loss 0.240976, throughput 9.3981K wps | |
[Epoch 192 Batch 30/162] avg loss 0.000592778, throughput 9.44805K wps | |
[Epoch 192 Batch 60/162] avg loss 0.000596172, throughput 9.42278K wps | |
[Epoch 192 Batch 90/162] avg loss 0.000573371, throughput 9.26366K wps | |
[Epoch 192 Batch 120/162] avg loss 0.000575005, throughput 9.25155K wps | |
[Epoch 192 Batch 150/162] avg loss 0.000636236, throughput 9.32154K wps | |
Begin Testing... | |
[Epoch 192] train avg loss 0.000593533, dev acc 0.9100, dev avg loss 0.240407, throughput 9.35108K wps | |
[Epoch 193 Batch 30/162] avg loss 0.000658213, throughput 9.72959K wps | |
[Epoch 193 Batch 60/162] avg loss 0.000577393, throughput 9.44005K wps | |
[Epoch 193 Batch 90/162] avg loss 0.000586158, throughput 9.30096K wps | |
[Epoch 193 Batch 120/162] avg loss 0.000573065, throughput 9.52292K wps | |
[Epoch 193 Batch 150/162] avg loss 0.000586835, throughput 9.41289K wps | |
Begin Testing... | |
[Epoch 193] train avg loss 0.000595349, dev acc 0.9100, dev avg loss 0.240989, throughput 9.46127K wps | |
[Epoch 194 Batch 30/162] avg loss 0.000662096, throughput 9.62171K wps | |
[Epoch 194 Batch 60/162] avg loss 0.000583193, throughput 9.47202K wps | |
[Epoch 194 Batch 90/162] avg loss 0.000577051, throughput 9.35759K wps | |
[Epoch 194 Batch 120/162] avg loss 0.000570031, throughput 9.41577K wps | |
[Epoch 194 Batch 150/162] avg loss 0.0006127, throughput 9.41688K wps | |
Begin Testing... | |
[Epoch 194] train avg loss 0.000599261, dev acc 0.9089, dev avg loss 0.241427, throughput 9.43003K wps | |
[Epoch 195 Batch 30/162] avg loss 0.000605583, throughput 9.50723K wps | |
[Epoch 195 Batch 60/162] avg loss 0.000595955, throughput 9.30366K wps | |
[Epoch 195 Batch 90/162] avg loss 0.000495778, throughput 9.33759K wps | |
[Epoch 195 Batch 120/162] avg loss 0.000552032, throughput 9.37243K wps | |
[Epoch 195 Batch 150/162] avg loss 0.000550038, throughput 9.5454K wps | |
Begin Testing... | |
[Epoch 195] train avg loss 0.00057086, dev acc 0.9089, dev avg loss 0.241859, throughput 9.40634K wps | |
[Epoch 196 Batch 30/162] avg loss 0.000520767, throughput 9.49918K wps | |
[Epoch 196 Batch 60/162] avg loss 0.000716071, throughput 9.22411K wps | |
[Epoch 196 Batch 90/162] avg loss 0.000616169, throughput 9.24138K wps | |
[Epoch 196 Batch 120/162] avg loss 0.000636962, throughput 9.54738K wps | |
[Epoch 196 Batch 150/162] avg loss 0.000596003, throughput 9.23651K wps | |
Begin Testing... | |
[Epoch 196] train avg loss 0.000611805, dev acc 0.9089, dev avg loss 0.241178, throughput 9.33831K wps | |
[Epoch 197 Batch 30/162] avg loss 0.000599157, throughput 9.55958K wps | |
[Epoch 197 Batch 60/162] avg loss 0.000653632, throughput 9.36023K wps | |
[Epoch 197 Batch 90/162] avg loss 0.000706283, throughput 9.3413K wps | |
[Epoch 197 Batch 120/162] avg loss 0.000546314, throughput 9.47379K wps | |
[Epoch 197 Batch 150/162] avg loss 0.00065786, throughput 9.35161K wps | |
Begin Testing... | |
[Epoch 197] train avg loss 0.000622945, dev acc 0.9100, dev avg loss 0.242742, throughput 9.42742K wps | |
[Epoch 198 Batch 30/162] avg loss 0.000501483, throughput 9.53983K wps | |
[Epoch 198 Batch 60/162] avg loss 0.000557501, throughput 9.52597K wps | |
[Epoch 198 Batch 90/162] avg loss 0.00052395, throughput 9.19008K wps | |
[Epoch 198 Batch 120/162] avg loss 0.000651491, throughput 9.311K wps | |
[Epoch 198 Batch 150/162] avg loss 0.000543924, throughput 9.3464K wps | |
Begin Testing... | |
[Epoch 198] train avg loss 0.000554811, dev acc 0.9089, dev avg loss 0.242213, throughput 9.38769K wps | |
[Epoch 199 Batch 30/162] avg loss 0.000553013, throughput 9.62338K wps | |
[Epoch 199 Batch 60/162] avg loss 0.000695971, throughput 9.38402K wps | |
[Epoch 199 Batch 90/162] avg loss 0.000585834, throughput 9.46732K wps | |
[Epoch 199 Batch 120/162] avg loss 0.000557989, throughput 9.41661K wps | |
[Epoch 199 Batch 150/162] avg loss 0.000537312, throughput 9.31667K wps | |
Begin Testing... | |
[Epoch 199] train avg loss 0.000579422, dev acc 0.9089, dev avg loss 0.242216, throughput 9.42672K wps | |
Test loss 0.240849, test acc 0.9000 | |
Total time cost 449.56s | |
[Epoch 0 Batch 30/162] avg loss 0.0140726, throughput 7.39141K wps | |
[Epoch 0 Batch 60/162] avg loss 0.0140383, throughput 9.46102K wps | |
[Epoch 0 Batch 90/162] avg loss 0.0137832, throughput 9.39688K wps | |
[Epoch 0 Batch 120/162] avg loss 0.0136756, throughput 9.50621K wps | |
[Epoch 0 Batch 150/162] avg loss 0.0135761, throughput 9.26843K wps | |
Begin Testing... | |
[Epoch 0] train avg loss 0.0138065, dev acc 0.5733, dev avg loss 0.670652, throughput 8.96478K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 1 Batch 30/162] avg loss 0.0133764, throughput 9.56917K wps | |
[Epoch 1 Batch 60/162] avg loss 0.0131806, throughput 9.34241K wps | |
[Epoch 1 Batch 90/162] avg loss 0.0131952, throughput 9.43099K wps | |
[Epoch 1 Batch 120/162] avg loss 0.0130626, throughput 9.4115K wps | |
[Epoch 1 Batch 150/162] avg loss 0.0128902, throughput 9.33768K wps | |
Begin Testing... | |
[Epoch 1] train avg loss 0.0131225, dev acc 0.8422, dev avg loss 0.635717, throughput 9.40402K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 2 Batch 30/162] avg loss 0.0126453, throughput 9.61325K wps | |
[Epoch 2 Batch 60/162] avg loss 0.0126212, throughput 9.53795K wps | |
[Epoch 2 Batch 90/162] avg loss 0.0126731, throughput 9.37195K wps | |
[Epoch 2 Batch 120/162] avg loss 0.0122695, throughput 9.29239K wps | |
[Epoch 2 Batch 150/162] avg loss 0.0121763, throughput 9.36317K wps | |
Begin Testing... | |
[Epoch 2] train avg loss 0.0124558, dev acc 0.8644, dev avg loss 0.600753, throughput 9.43269K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 3 Batch 30/162] avg loss 0.0120106, throughput 9.59417K wps | |
[Epoch 3 Batch 60/162] avg loss 0.0119057, throughput 9.59146K wps | |
[Epoch 3 Batch 90/162] avg loss 0.0116017, throughput 9.41008K wps | |
[Epoch 3 Batch 120/162] avg loss 0.0116738, throughput 9.36862K wps | |
[Epoch 3 Batch 150/162] avg loss 0.0115794, throughput 9.28908K wps | |
Begin Testing... | |
[Epoch 3] train avg loss 0.0117007, dev acc 0.8378, dev avg loss 0.561953, throughput 9.4522K wps | |
[Epoch 4 Batch 30/162] avg loss 0.0112274, throughput 9.60132K wps | |
[Epoch 4 Batch 60/162] avg loss 0.0109353, throughput 9.43363K wps | |
[Epoch 4 Batch 90/162] avg loss 0.0109218, throughput 9.47301K wps | |
[Epoch 4 Batch 120/162] avg loss 0.0107493, throughput 9.49994K wps | |
[Epoch 4 Batch 150/162] avg loss 0.0106113, throughput 9.24157K wps | |
Begin Testing... | |
[Epoch 4] train avg loss 0.0108482, dev acc 0.8789, dev avg loss 0.518359, throughput 9.44586K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 5 Batch 30/162] avg loss 0.0104453, throughput 9.54439K wps | |
[Epoch 5 Batch 60/162] avg loss 0.0101523, throughput 9.42843K wps | |
[Epoch 5 Batch 90/162] avg loss 0.0101504, throughput 9.34022K wps | |
[Epoch 5 Batch 120/162] avg loss 0.0100014, throughput 9.35983K wps | |
[Epoch 5 Batch 150/162] avg loss 0.00994029, throughput 9.22321K wps | |
Begin Testing... | |
[Epoch 5] train avg loss 0.0101024, dev acc 0.8756, dev avg loss 0.478165, throughput 9.37346K wps | |
[Epoch 6 Batch 30/162] avg loss 0.00950364, throughput 9.57595K wps | |
[Epoch 6 Batch 60/162] avg loss 0.00935815, throughput 9.32741K wps | |
[Epoch 6 Batch 90/162] avg loss 0.00937507, throughput 9.42836K wps | |
[Epoch 6 Batch 120/162] avg loss 0.00913393, throughput 9.3183K wps | |
[Epoch 6 Batch 150/162] avg loss 0.00919421, throughput 9.48266K wps | |
Begin Testing... | |
[Epoch 6] train avg loss 0.00929135, dev acc 0.8822, dev avg loss 0.440005, throughput 9.41961K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 7 Batch 30/162] avg loss 0.00869316, throughput 9.4913K wps | |
[Epoch 7 Batch 60/162] avg loss 0.0090485, throughput 9.26146K wps | |
[Epoch 7 Batch 90/162] avg loss 0.0085844, throughput 9.32471K wps | |
[Epoch 7 Batch 120/162] avg loss 0.00854258, throughput 9.47627K wps | |
[Epoch 7 Batch 150/162] avg loss 0.00856236, throughput 9.35085K wps | |
Begin Testing... | |
[Epoch 7] train avg loss 0.00863502, dev acc 0.8800, dev avg loss 0.409862, throughput 9.40692K wps | |
[Epoch 8 Batch 30/162] avg loss 0.00828883, throughput 9.60198K wps | |
[Epoch 8 Batch 60/162] avg loss 0.00796956, throughput 9.41286K wps | |
[Epoch 8 Batch 90/162] avg loss 0.00830067, throughput 9.3611K wps | |
[Epoch 8 Batch 120/162] avg loss 0.00792313, throughput 9.27207K wps | |
[Epoch 8 Batch 150/162] avg loss 0.00797801, throughput 9.58641K wps | |
Begin Testing... | |
[Epoch 8] train avg loss 0.00807945, dev acc 0.8833, dev avg loss 0.384831, throughput 9.43896K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 9 Batch 30/162] avg loss 0.00766453, throughput 9.62713K wps | |
[Epoch 9 Batch 60/162] avg loss 0.00750802, throughput 9.4043K wps | |
[Epoch 9 Batch 90/162] avg loss 0.00775625, throughput 9.43724K wps | |
[Epoch 9 Batch 120/162] avg loss 0.00767138, throughput 9.52052K wps | |
[Epoch 9 Batch 150/162] avg loss 0.00756405, throughput 9.47361K wps | |
Begin Testing... | |
[Epoch 9] train avg loss 0.00761668, dev acc 0.8867, dev avg loss 0.365238, throughput 9.4848K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 10 Batch 30/162] avg loss 0.00731415, throughput 9.53568K wps | |
[Epoch 10 Batch 60/162] avg loss 0.00770711, throughput 9.33454K wps | |
[Epoch 10 Batch 90/162] avg loss 0.0070536, throughput 9.47978K wps | |
[Epoch 10 Batch 120/162] avg loss 0.00719121, throughput 9.49022K wps | |
[Epoch 10 Batch 150/162] avg loss 0.00721707, throughput 9.42229K wps | |
Begin Testing... | |
[Epoch 10] train avg loss 0.00729648, dev acc 0.8900, dev avg loss 0.349222, throughput 9.44327K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 11 Batch 30/162] avg loss 0.00690867, throughput 9.54089K wps | |
[Epoch 11 Batch 60/162] avg loss 0.00717689, throughput 9.48441K wps | |
[Epoch 11 Batch 90/162] avg loss 0.0068918, throughput 9.28624K wps | |
[Epoch 11 Batch 120/162] avg loss 0.0071228, throughput 9.44055K wps | |
[Epoch 11 Batch 150/162] avg loss 0.00669772, throughput 9.31503K wps | |
Begin Testing... | |
[Epoch 11] train avg loss 0.00697559, dev acc 0.8944, dev avg loss 0.336424, throughput 9.41227K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 12 Batch 30/162] avg loss 0.00693336, throughput 9.54545K wps | |
[Epoch 12 Batch 60/162] avg loss 0.00663406, throughput 9.35629K wps | |
[Epoch 12 Batch 90/162] avg loss 0.00648058, throughput 9.51424K wps | |
[Epoch 12 Batch 120/162] avg loss 0.00676808, throughput 9.42161K wps | |
[Epoch 12 Batch 150/162] avg loss 0.00687599, throughput 9.23493K wps | |
Begin Testing... | |
[Epoch 12] train avg loss 0.00674375, dev acc 0.8922, dev avg loss 0.326054, throughput 9.39659K wps | |
[Epoch 13 Batch 30/162] avg loss 0.00646833, throughput 9.53929K wps | |
[Epoch 13 Batch 60/162] avg loss 0.00647085, throughput 9.55425K wps | |
[Epoch 13 Batch 90/162] avg loss 0.00639204, throughput 9.33323K wps | |
[Epoch 13 Batch 120/162] avg loss 0.00652885, throughput 9.37294K wps | |
[Epoch 13 Batch 150/162] avg loss 0.00633697, throughput 9.29223K wps | |
Begin Testing... | |
[Epoch 13] train avg loss 0.006429, dev acc 0.8967, dev avg loss 0.315885, throughput 9.41412K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 14 Batch 30/162] avg loss 0.00604977, throughput 9.72143K wps | |
[Epoch 14 Batch 60/162] avg loss 0.00650789, throughput 9.20917K wps | |
[Epoch 14 Batch 90/162] avg loss 0.00629555, throughput 9.21399K wps | |
[Epoch 14 Batch 120/162] avg loss 0.00603097, throughput 9.34961K wps | |
[Epoch 14 Batch 150/162] avg loss 0.00663879, throughput 9.23972K wps | |
Begin Testing... | |
[Epoch 14] train avg loss 0.00628041, dev acc 0.9000, dev avg loss 0.308417, throughput 9.36123K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 15 Batch 30/162] avg loss 0.00609191, throughput 9.70632K wps | |
[Epoch 15 Batch 60/162] avg loss 0.00608192, throughput 9.40351K wps | |
[Epoch 15 Batch 90/162] avg loss 0.00635003, throughput 9.35678K wps | |
[Epoch 15 Batch 120/162] avg loss 0.00639329, throughput 9.49997K wps | |
[Epoch 15 Batch 150/162] avg loss 0.00603197, throughput 9.32145K wps | |
Begin Testing... | |
[Epoch 15] train avg loss 0.0061382, dev acc 0.8967, dev avg loss 0.303301, throughput 9.45129K wps | |
[Epoch 16 Batch 30/162] avg loss 0.00598804, throughput 9.71958K wps | |
[Epoch 16 Batch 60/162] avg loss 0.00622243, throughput 9.46557K wps | |
[Epoch 16 Batch 90/162] avg loss 0.00580353, throughput 9.31757K wps | |
[Epoch 16 Batch 120/162] avg loss 0.00564709, throughput 9.66407K wps | |
[Epoch 16 Batch 150/162] avg loss 0.00628037, throughput 9.39222K wps | |
Begin Testing... | |
[Epoch 16] train avg loss 0.00594148, dev acc 0.8989, dev avg loss 0.295985, throughput 9.50332K wps | |
[Epoch 17 Batch 30/162] avg loss 0.00580788, throughput 9.65908K wps | |
[Epoch 17 Batch 60/162] avg loss 0.00586131, throughput 9.35996K wps | |
[Epoch 17 Batch 90/162] avg loss 0.00601423, throughput 9.41336K wps | |
[Epoch 17 Batch 120/162] avg loss 0.005885, throughput 9.35865K wps | |
[Epoch 17 Batch 150/162] avg loss 0.00577548, throughput 9.40211K wps | |
Begin Testing... | |
[Epoch 17] train avg loss 0.00585919, dev acc 0.9000, dev avg loss 0.290959, throughput 9.44395K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 18 Batch 30/162] avg loss 0.00584757, throughput 9.55851K wps | |
[Epoch 18 Batch 60/162] avg loss 0.00571791, throughput 9.4662K wps | |
[Epoch 18 Batch 90/162] avg loss 0.0056391, throughput 9.49958K wps | |
[Epoch 18 Batch 120/162] avg loss 0.00571704, throughput 9.29116K wps | |
[Epoch 18 Batch 150/162] avg loss 0.00547214, throughput 9.55196K wps | |
Begin Testing... | |
[Epoch 18] train avg loss 0.00566763, dev acc 0.9078, dev avg loss 0.286611, throughput 9.45075K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 19 Batch 30/162] avg loss 0.00576338, throughput 9.73011K wps | |
[Epoch 19 Batch 60/162] avg loss 0.00537843, throughput 9.51917K wps | |
[Epoch 19 Batch 90/162] avg loss 0.00584516, throughput 9.29239K wps | |
[Epoch 19 Batch 120/162] avg loss 0.00548162, throughput 9.33315K wps | |
[Epoch 19 Batch 150/162] avg loss 0.00554966, throughput 9.51513K wps | |
Begin Testing... | |
[Epoch 19] train avg loss 0.00560618, dev acc 0.9011, dev avg loss 0.280949, throughput 9.46913K wps | |
[Epoch 20 Batch 30/162] avg loss 0.00561643, throughput 9.68142K wps | |
[Epoch 20 Batch 60/162] avg loss 0.00528191, throughput 9.5243K wps | |
[Epoch 20 Batch 90/162] avg loss 0.00544346, throughput 9.56445K wps | |
[Epoch 20 Batch 120/162] avg loss 0.00574723, throughput 9.53746K wps | |
[Epoch 20 Batch 150/162] avg loss 0.00529826, throughput 9.39591K wps | |
Begin Testing... | |
[Epoch 20] train avg loss 0.00544785, dev acc 0.9056, dev avg loss 0.276786, throughput 9.53617K wps | |
[Epoch 21 Batch 30/162] avg loss 0.00524969, throughput 9.57011K wps | |
[Epoch 21 Batch 60/162] avg loss 0.00548531, throughput 9.49003K wps | |
[Epoch 21 Batch 90/162] avg loss 0.00517617, throughput 9.33073K wps | |
[Epoch 21 Batch 120/162] avg loss 0.00527949, throughput 9.47007K wps | |
[Epoch 21 Batch 150/162] avg loss 0.00526136, throughput 9.34523K wps | |
Begin Testing... | |
[Epoch 21] train avg loss 0.00531254, dev acc 0.9056, dev avg loss 0.273125, throughput 9.45373K wps | |
[Epoch 22 Batch 30/162] avg loss 0.00510113, throughput 9.6955K wps | |
[Epoch 22 Batch 60/162] avg loss 0.00489239, throughput 9.56468K wps | |
[Epoch 22 Batch 90/162] avg loss 0.00510778, throughput 9.40444K wps | |
[Epoch 22 Batch 120/162] avg loss 0.00564604, throughput 9.3401K wps | |
[Epoch 22 Batch 150/162] avg loss 0.00520852, throughput 9.40036K wps | |
Begin Testing... | |
[Epoch 22] train avg loss 0.0052046, dev acc 0.9067, dev avg loss 0.270058, throughput 9.46844K wps | |
[Epoch 23 Batch 30/162] avg loss 0.00501132, throughput 9.45797K wps | |
[Epoch 23 Batch 60/162] avg loss 0.00527927, throughput 9.50068K wps | |
[Epoch 23 Batch 90/162] avg loss 0.00513286, throughput 9.45796K wps | |
[Epoch 23 Batch 120/162] avg loss 0.00516424, throughput 9.5026K wps | |
[Epoch 23 Batch 150/162] avg loss 0.00464099, throughput 9.45687K wps | |
Begin Testing... | |
[Epoch 23] train avg loss 0.00506607, dev acc 0.9056, dev avg loss 0.266341, throughput 9.47333K wps | |
[Epoch 24 Batch 30/162] avg loss 0.00508284, throughput 9.45475K wps | |
[Epoch 24 Batch 60/162] avg loss 0.00504651, throughput 9.29793K wps | |
[Epoch 24 Batch 90/162] avg loss 0.00519296, throughput 9.44859K wps | |
[Epoch 24 Batch 120/162] avg loss 0.00511292, throughput 9.3703K wps | |
[Epoch 24 Batch 150/162] avg loss 0.00481905, throughput 9.3752K wps | |
Begin Testing... | |
[Epoch 24] train avg loss 0.00500738, dev acc 0.9100, dev avg loss 0.262825, throughput 9.38826K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 25 Batch 30/162] avg loss 0.00489864, throughput 9.56647K wps | |
[Epoch 25 Batch 60/162] avg loss 0.00520048, throughput 9.55132K wps | |
[Epoch 25 Batch 90/162] avg loss 0.0046729, throughput 9.5449K wps | |
[Epoch 25 Batch 120/162] avg loss 0.00470683, throughput 9.48367K wps | |
[Epoch 25 Batch 150/162] avg loss 0.00493507, throughput 9.40231K wps | |
Begin Testing... | |
[Epoch 25] train avg loss 0.00494214, dev acc 0.9067, dev avg loss 0.260009, throughput 9.49344K wps | |
[Epoch 26 Batch 30/162] avg loss 0.00523675, throughput 9.73317K wps | |
[Epoch 26 Batch 60/162] avg loss 0.00442369, throughput 9.4471K wps | |
[Epoch 26 Batch 90/162] avg loss 0.00478524, throughput 9.50189K wps | |
[Epoch 26 Batch 120/162] avg loss 0.00457294, throughput 9.33065K wps | |
[Epoch 26 Batch 150/162] avg loss 0.00506119, throughput 9.34235K wps | |
Begin Testing... | |
[Epoch 26] train avg loss 0.00482379, dev acc 0.9111, dev avg loss 0.25761, throughput 9.47817K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 27 Batch 30/162] avg loss 0.00465707, throughput 9.55972K wps | |
[Epoch 27 Batch 60/162] avg loss 0.00476009, throughput 9.29239K wps | |
[Epoch 27 Batch 90/162] avg loss 0.00458761, throughput 9.3975K wps | |
[Epoch 27 Batch 120/162] avg loss 0.00510205, throughput 9.32087K wps | |
[Epoch 27 Batch 150/162] avg loss 0.00485559, throughput 9.37847K wps | |
Begin Testing... | |
[Epoch 27] train avg loss 0.00477718, dev acc 0.9111, dev avg loss 0.255202, throughput 9.40621K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 28 Batch 30/162] avg loss 0.00459179, throughput 9.4494K wps | |
[Epoch 28 Batch 60/162] avg loss 0.00477292, throughput 9.29028K wps | |
[Epoch 28 Batch 90/162] avg loss 0.00475007, throughput 9.35829K wps | |
[Epoch 28 Batch 120/162] avg loss 0.00456027, throughput 9.31185K wps | |
[Epoch 28 Batch 150/162] avg loss 0.00476986, throughput 9.46715K wps | |
Begin Testing... | |
[Epoch 28] train avg loss 0.00464604, dev acc 0.9111, dev avg loss 0.252468, throughput 9.36739K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 29 Batch 30/162] avg loss 0.00495152, throughput 9.55223K wps | |
[Epoch 29 Batch 60/162] avg loss 0.00472496, throughput 9.37179K wps | |
[Epoch 29 Batch 90/162] avg loss 0.00440736, throughput 9.48296K wps | |
[Epoch 29 Batch 120/162] avg loss 0.00450659, throughput 9.41146K wps | |
[Epoch 29 Batch 150/162] avg loss 0.00443813, throughput 9.42477K wps | |
Begin Testing... | |
[Epoch 29] train avg loss 0.00455219, dev acc 0.9122, dev avg loss 0.250449, throughput 9.45928K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 30 Batch 30/162] avg loss 0.00450807, throughput 9.44462K wps | |
[Epoch 30 Batch 60/162] avg loss 0.00411041, throughput 9.24222K wps | |
[Epoch 30 Batch 90/162] avg loss 0.00421674, throughput 9.40247K wps | |
[Epoch 30 Batch 120/162] avg loss 0.00494776, throughput 9.40438K wps | |
[Epoch 30 Batch 150/162] avg loss 0.00468724, throughput 9.25262K wps | |
Begin Testing... | |
[Epoch 30] train avg loss 0.00453674, dev acc 0.9133, dev avg loss 0.249946, throughput 9.34694K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 31 Batch 30/162] avg loss 0.00443646, throughput 9.5003K wps | |
[Epoch 31 Batch 60/162] avg loss 0.00432406, throughput 9.41434K wps | |
[Epoch 31 Batch 90/162] avg loss 0.00490161, throughput 9.54533K wps | |
[Epoch 31 Batch 120/162] avg loss 0.00435793, throughput 9.5111K wps | |
[Epoch 31 Batch 150/162] avg loss 0.00419358, throughput 9.38685K wps | |
Begin Testing... | |
[Epoch 31] train avg loss 0.00443906, dev acc 0.9122, dev avg loss 0.247184, throughput 9.47838K wps | |
[Epoch 32 Batch 30/162] avg loss 0.0043791, throughput 9.59499K wps | |
[Epoch 32 Batch 60/162] avg loss 0.00418867, throughput 9.55898K wps | |
[Epoch 32 Batch 90/162] avg loss 0.00421759, throughput 9.51839K wps | |
[Epoch 32 Batch 120/162] avg loss 0.0043961, throughput 9.38598K wps | |
[Epoch 32 Batch 150/162] avg loss 0.00467612, throughput 9.29118K wps | |
Begin Testing... | |
[Epoch 32] train avg loss 0.0043886, dev acc 0.9100, dev avg loss 0.245101, throughput 9.4542K wps | |
[Epoch 33 Batch 30/162] avg loss 0.0041956, throughput 9.51613K wps | |
[Epoch 33 Batch 60/162] avg loss 0.00444644, throughput 9.36423K wps | |
[Epoch 33 Batch 90/162] avg loss 0.00451396, throughput 9.23207K wps | |
[Epoch 33 Batch 120/162] avg loss 0.00449578, throughput 9.38102K wps | |
[Epoch 33 Batch 150/162] avg loss 0.00411528, throughput 9.23957K wps | |
Begin Testing... | |
[Epoch 33] train avg loss 0.00432072, dev acc 0.9111, dev avg loss 0.242991, throughput 9.32943K wps | |
[Epoch 34 Batch 30/162] avg loss 0.00404961, throughput 9.41653K wps | |
[Epoch 34 Batch 60/162] avg loss 0.00407636, throughput 9.388K wps | |
[Epoch 34 Batch 90/162] avg loss 0.00410159, throughput 9.3557K wps | |
[Epoch 34 Batch 120/162] avg loss 0.00455285, throughput 9.37711K wps | |
[Epoch 34 Batch 150/162] avg loss 0.00447952, throughput 9.40347K wps | |
Begin Testing... | |
[Epoch 34] train avg loss 0.00422945, dev acc 0.9167, dev avg loss 0.24166, throughput 9.38692K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 35 Batch 30/162] avg loss 0.00405558, throughput 9.56081K wps | |
[Epoch 35 Batch 60/162] avg loss 0.00425252, throughput 9.53486K wps | |
[Epoch 35 Batch 90/162] avg loss 0.00398845, throughput 9.32025K wps | |
[Epoch 35 Batch 120/162] avg loss 0.00420538, throughput 9.40946K wps | |
[Epoch 35 Batch 150/162] avg loss 0.00426934, throughput 9.36935K wps | |
Begin Testing... | |
[Epoch 35] train avg loss 0.00418682, dev acc 0.9111, dev avg loss 0.242246, throughput 9.43124K wps | |
[Epoch 36 Batch 30/162] avg loss 0.00448431, throughput 9.55008K wps | |
[Epoch 36 Batch 60/162] avg loss 0.00398378, throughput 9.34084K wps | |
[Epoch 36 Batch 90/162] avg loss 0.00405471, throughput 9.26619K wps | |
[Epoch 36 Batch 120/162] avg loss 0.00436351, throughput 9.30262K wps | |
[Epoch 36 Batch 150/162] avg loss 0.00359431, throughput 9.45511K wps | |
Begin Testing... | |
[Epoch 36] train avg loss 0.00408005, dev acc 0.9156, dev avg loss 0.23805, throughput 9.37372K wps | |
[Epoch 37 Batch 30/162] avg loss 0.00443964, throughput 9.59793K wps | |
[Epoch 37 Batch 60/162] avg loss 0.00417021, throughput 9.25536K wps | |
[Epoch 37 Batch 90/162] avg loss 0.00369751, throughput 9.48843K wps | |
[Epoch 37 Batch 120/162] avg loss 0.00391587, throughput 9.25898K wps | |
[Epoch 37 Batch 150/162] avg loss 0.00380923, throughput 9.34119K wps | |
Begin Testing... | |
[Epoch 37] train avg loss 0.00403013, dev acc 0.9122, dev avg loss 0.237798, throughput 9.3771K wps | |
[Epoch 38 Batch 30/162] avg loss 0.00368733, throughput 9.55393K wps | |
[Epoch 38 Batch 60/162] avg loss 0.00391182, throughput 9.37262K wps | |
[Epoch 38 Batch 90/162] avg loss 0.00396453, throughput 9.43634K wps | |
[Epoch 38 Batch 120/162] avg loss 0.0041397, throughput 9.44248K wps | |
[Epoch 38 Batch 150/162] avg loss 0.00424483, throughput 9.51457K wps | |
Begin Testing... | |
[Epoch 38] train avg loss 0.00396421, dev acc 0.9156, dev avg loss 0.235457, throughput 9.46659K wps | |
[Epoch 39 Batch 30/162] avg loss 0.00384706, throughput 9.46862K wps | |
[Epoch 39 Batch 60/162] avg loss 0.00376509, throughput 9.39534K wps | |
[Epoch 39 Batch 90/162] avg loss 0.00371358, throughput 9.39119K wps | |
[Epoch 39 Batch 120/162] avg loss 0.0036837, throughput 9.38283K wps | |
[Epoch 39 Batch 150/162] avg loss 0.00421515, throughput 9.36233K wps | |
Begin Testing... | |
[Epoch 39] train avg loss 0.00384098, dev acc 0.9156, dev avg loss 0.233676, throughput 9.39655K wps | |
[Epoch 40 Batch 30/162] avg loss 0.0038304, throughput 9.50969K wps | |
[Epoch 40 Batch 60/162] avg loss 0.00371572, throughput 9.41708K wps | |
[Epoch 40 Batch 90/162] avg loss 0.00422123, throughput 9.42236K wps | |
[Epoch 40 Batch 120/162] avg loss 0.00393215, throughput 9.52516K wps | |
[Epoch 40 Batch 150/162] avg loss 0.00343454, throughput 9.39763K wps | |
Begin Testing... | |
[Epoch 40] train avg loss 0.00386679, dev acc 0.9156, dev avg loss 0.233001, throughput 9.45388K wps | |
[Epoch 41 Batch 30/162] avg loss 0.00365855, throughput 9.71311K wps | |
[Epoch 41 Batch 60/162] avg loss 0.00360422, throughput 9.37947K wps | |
[Epoch 41 Batch 90/162] avg loss 0.00393725, throughput 9.3635K wps | |
[Epoch 41 Batch 120/162] avg loss 0.00386439, throughput 9.22235K wps | |
[Epoch 41 Batch 150/162] avg loss 0.00364322, throughput 9.45094K wps | |
Begin Testing... | |
[Epoch 41] train avg loss 0.00376707, dev acc 0.9111, dev avg loss 0.232585, throughput 9.43671K wps | |
[Epoch 42 Batch 30/162] avg loss 0.00356105, throughput 9.66854K wps | |
[Epoch 42 Batch 60/162] avg loss 0.00405164, throughput 9.46931K wps | |
[Epoch 42 Batch 90/162] avg loss 0.00365844, throughput 9.45482K wps | |
[Epoch 42 Batch 120/162] avg loss 0.00366082, throughput 9.3757K wps | |
[Epoch 42 Batch 150/162] avg loss 0.00370244, throughput 9.62455K wps | |
Begin Testing... | |
[Epoch 42] train avg loss 0.00371542, dev acc 0.9156, dev avg loss 0.230118, throughput 9.50509K wps | |
[Epoch 43 Batch 30/162] avg loss 0.0037323, throughput 9.6174K wps | |
[Epoch 43 Batch 60/162] avg loss 0.00351933, throughput 9.3256K wps | |
[Epoch 43 Batch 90/162] avg loss 0.00373353, throughput 9.42253K wps | |
[Epoch 43 Batch 120/162] avg loss 0.00364719, throughput 9.56141K wps | |
[Epoch 43 Batch 150/162] avg loss 0.00365445, throughput 9.23229K wps | |
Begin Testing... | |
[Epoch 43] train avg loss 0.00364265, dev acc 0.9178, dev avg loss 0.228829, throughput 9.43623K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 44 Batch 30/162] avg loss 0.00348581, throughput 9.70121K wps | |
[Epoch 44 Batch 60/162] avg loss 0.00367857, throughput 9.21387K wps | |
[Epoch 44 Batch 90/162] avg loss 0.00366131, throughput 9.38915K wps | |
[Epoch 44 Batch 120/162] avg loss 0.00388536, throughput 9.33646K wps | |
[Epoch 44 Batch 150/162] avg loss 0.00338824, throughput 9.41887K wps | |
Begin Testing... | |
[Epoch 44] train avg loss 0.00362926, dev acc 0.9167, dev avg loss 0.22834, throughput 9.37063K wps | |
[Epoch 45 Batch 30/162] avg loss 0.0035622, throughput 9.46774K wps | |
[Epoch 45 Batch 60/162] avg loss 0.00341074, throughput 9.43816K wps | |
[Epoch 45 Batch 90/162] avg loss 0.00357691, throughput 9.25185K wps | |
[Epoch 45 Batch 120/162] avg loss 0.00365014, throughput 9.48571K wps | |
[Epoch 45 Batch 150/162] avg loss 0.00372701, throughput 9.30793K wps | |
Begin Testing... | |
[Epoch 45] train avg loss 0.0035557, dev acc 0.9189, dev avg loss 0.227091, throughput 9.38509K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 46 Batch 30/162] avg loss 0.00373619, throughput 9.54032K wps | |
[Epoch 46 Batch 60/162] avg loss 0.00390801, throughput 9.21746K wps | |
[Epoch 46 Batch 90/162] avg loss 0.00321295, throughput 9.44002K wps | |
[Epoch 46 Batch 120/162] avg loss 0.00347194, throughput 9.64346K wps | |
[Epoch 46 Batch 150/162] avg loss 0.0032535, throughput 9.49093K wps | |
Begin Testing... | |
[Epoch 46] train avg loss 0.00349264, dev acc 0.9178, dev avg loss 0.225902, throughput 9.44335K wps | |
[Epoch 47 Batch 30/162] avg loss 0.00321667, throughput 9.52042K wps | |
[Epoch 47 Batch 60/162] avg loss 0.0036551, throughput 9.5516K wps | |
[Epoch 47 Batch 90/162] avg loss 0.00330308, throughput 9.26896K wps | |
[Epoch 47 Batch 120/162] avg loss 0.00351787, throughput 9.53198K wps | |
[Epoch 47 Batch 150/162] avg loss 0.00354322, throughput 9.50896K wps | |
Begin Testing... | |
[Epoch 47] train avg loss 0.0034673, dev acc 0.9178, dev avg loss 0.22496, throughput 9.48018K wps | |
[Epoch 48 Batch 30/162] avg loss 0.00339867, throughput 9.58202K wps | |
[Epoch 48 Batch 60/162] avg loss 0.00356258, throughput 9.48541K wps | |
[Epoch 48 Batch 90/162] avg loss 0.00366516, throughput 9.54232K wps | |
[Epoch 48 Batch 120/162] avg loss 0.00348893, throughput 9.47152K wps | |
[Epoch 48 Batch 150/162] avg loss 0.00322505, throughput 9.32159K wps | |
Begin Testing... | |
[Epoch 48] train avg loss 0.00342565, dev acc 0.9178, dev avg loss 0.224293, throughput 9.44939K wps | |
[Epoch 49 Batch 30/162] avg loss 0.00342041, throughput 9.46309K wps | |
[Epoch 49 Batch 60/162] avg loss 0.00356455, throughput 9.30137K wps | |
[Epoch 49 Batch 90/162] avg loss 0.00336422, throughput 9.20449K wps | |
[Epoch 49 Batch 120/162] avg loss 0.0031011, throughput 9.21031K wps | |
[Epoch 49 Batch 150/162] avg loss 0.00367254, throughput 9.30586K wps | |
Begin Testing... | |
[Epoch 49] train avg loss 0.00340466, dev acc 0.9178, dev avg loss 0.223225, throughput 9.30418K wps | |
[Epoch 50 Batch 30/162] avg loss 0.00347847, throughput 9.46388K wps | |
[Epoch 50 Batch 60/162] avg loss 0.00312123, throughput 9.36587K wps | |
[Epoch 50 Batch 90/162] avg loss 0.00341098, throughput 9.37377K wps | |
[Epoch 50 Batch 120/162] avg loss 0.00322874, throughput 9.34715K wps | |
[Epoch 50 Batch 150/162] avg loss 0.00349886, throughput 9.31039K wps | |
Begin Testing... | |
[Epoch 50] train avg loss 0.00334845, dev acc 0.9189, dev avg loss 0.222546, throughput 9.38239K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 51 Batch 30/162] avg loss 0.00304245, throughput 9.57039K wps | |
[Epoch 51 Batch 60/162] avg loss 0.00320162, throughput 9.40134K wps | |
[Epoch 51 Batch 90/162] avg loss 0.0035254, throughput 9.22304K wps | |
[Epoch 51 Batch 120/162] avg loss 0.00332416, throughput 9.21957K wps | |
[Epoch 51 Batch 150/162] avg loss 0.00290001, throughput 9.46228K wps | |
Begin Testing... | |
[Epoch 51] train avg loss 0.00320044, dev acc 0.9178, dev avg loss 0.221703, throughput 9.37985K wps | |
[Epoch 52 Batch 30/162] avg loss 0.00330129, throughput 9.53952K wps | |
[Epoch 52 Batch 60/162] avg loss 0.00319879, throughput 9.42616K wps | |
[Epoch 52 Batch 90/162] avg loss 0.00331638, throughput 9.37914K wps | |
[Epoch 52 Batch 120/162] avg loss 0.0031235, throughput 9.45869K wps | |
[Epoch 52 Batch 150/162] avg loss 0.00322151, throughput 9.35381K wps | |
Begin Testing... | |
[Epoch 52] train avg loss 0.00324313, dev acc 0.9178, dev avg loss 0.221029, throughput 9.41941K wps | |
[Epoch 53 Batch 30/162] avg loss 0.00312318, throughput 9.48197K wps | |
[Epoch 53 Batch 60/162] avg loss 0.00293857, throughput 9.30991K wps | |
[Epoch 53 Batch 90/162] avg loss 0.00348293, throughput 9.29317K wps | |
[Epoch 53 Batch 120/162] avg loss 0.00308333, throughput 9.43754K wps | |
[Epoch 53 Batch 150/162] avg loss 0.00317115, throughput 9.38241K wps | |
Begin Testing... | |
[Epoch 53] train avg loss 0.00313697, dev acc 0.9189, dev avg loss 0.219898, throughput 9.37372K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 54 Batch 30/162] avg loss 0.00306758, throughput 9.53444K wps | |
[Epoch 54 Batch 60/162] avg loss 0.0030859, throughput 9.1984K wps | |
[Epoch 54 Batch 90/162] avg loss 0.00317954, throughput 9.1559K wps | |
[Epoch 54 Batch 120/162] avg loss 0.00295329, throughput 9.38802K wps | |
[Epoch 54 Batch 150/162] avg loss 0.00323905, throughput 9.28328K wps | |
Begin Testing... | |
[Epoch 54] train avg loss 0.00309711, dev acc 0.9156, dev avg loss 0.219606, throughput 9.30942K wps | |
[Epoch 55 Batch 30/162] avg loss 0.00309912, throughput 9.55629K wps | |
[Epoch 55 Batch 60/162] avg loss 0.00328996, throughput 9.26022K wps | |
[Epoch 55 Batch 90/162] avg loss 0.00283497, throughput 9.3437K wps | |
[Epoch 55 Batch 120/162] avg loss 0.00309373, throughput 9.33788K wps | |
[Epoch 55 Batch 150/162] avg loss 0.00322807, throughput 9.32481K wps | |
Begin Testing... | |
[Epoch 55] train avg loss 0.00314959, dev acc 0.9189, dev avg loss 0.218877, throughput 9.35597K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 56 Batch 30/162] avg loss 0.00299748, throughput 9.57061K wps | |
[Epoch 56 Batch 60/162] avg loss 0.00324153, throughput 9.3992K wps | |
[Epoch 56 Batch 90/162] avg loss 0.0029744, throughput 9.34628K wps | |
[Epoch 56 Batch 120/162] avg loss 0.00312703, throughput 9.26112K wps | |
[Epoch 56 Batch 150/162] avg loss 0.0027195, throughput 9.38876K wps | |
Begin Testing... | |
[Epoch 56] train avg loss 0.0030296, dev acc 0.9189, dev avg loss 0.218157, throughput 9.39806K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 57 Batch 30/162] avg loss 0.00297457, throughput 9.53425K wps | |
[Epoch 57 Batch 60/162] avg loss 0.00302075, throughput 9.39484K wps | |
[Epoch 57 Batch 90/162] avg loss 0.00308144, throughput 9.471K wps | |
[Epoch 57 Batch 120/162] avg loss 0.00301216, throughput 9.33223K wps | |
[Epoch 57 Batch 150/162] avg loss 0.00285948, throughput 9.5359K wps | |
Begin Testing... | |
[Epoch 57] train avg loss 0.00298452, dev acc 0.9211, dev avg loss 0.216941, throughput 9.44929K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 58 Batch 30/162] avg loss 0.0030519, throughput 9.42538K wps | |
[Epoch 58 Batch 60/162] avg loss 0.00304264, throughput 9.36764K wps | |
[Epoch 58 Batch 90/162] avg loss 0.00318671, throughput 9.45294K wps | |
[Epoch 58 Batch 120/162] avg loss 0.00294361, throughput 9.41266K wps | |
[Epoch 58 Batch 150/162] avg loss 0.00281069, throughput 9.29921K wps | |
Begin Testing... | |
[Epoch 58] train avg loss 0.00299149, dev acc 0.9200, dev avg loss 0.21641, throughput 9.37769K wps | |
[Epoch 59 Batch 30/162] avg loss 0.00309003, throughput 9.76164K wps | |
[Epoch 59 Batch 60/162] avg loss 0.00288478, throughput 9.43423K wps | |
[Epoch 59 Batch 90/162] avg loss 0.0030056, throughput 9.36587K wps | |
[Epoch 59 Batch 120/162] avg loss 0.00284529, throughput 9.49486K wps | |
[Epoch 59 Batch 150/162] avg loss 0.00296807, throughput 9.34959K wps | |
Begin Testing... | |
[Epoch 59] train avg loss 0.00294704, dev acc 0.9189, dev avg loss 0.21637, throughput 9.48271K wps | |
[Epoch 60 Batch 30/162] avg loss 0.00309789, throughput 9.58397K wps | |
[Epoch 60 Batch 60/162] avg loss 0.0028392, throughput 9.416K wps | |
[Epoch 60 Batch 90/162] avg loss 0.00275479, throughput 9.4663K wps | |
[Epoch 60 Batch 120/162] avg loss 0.00266363, throughput 9.38289K wps | |
[Epoch 60 Batch 150/162] avg loss 0.00285094, throughput 9.39935K wps | |
Begin Testing... | |
[Epoch 60] train avg loss 0.00286126, dev acc 0.9189, dev avg loss 0.215594, throughput 9.43455K wps | |
[Epoch 61 Batch 30/162] avg loss 0.00253019, throughput 9.76977K wps | |
[Epoch 61 Batch 60/162] avg loss 0.00296642, throughput 9.38459K wps | |
[Epoch 61 Batch 90/162] avg loss 0.00277417, throughput 9.35544K wps | |
[Epoch 61 Batch 120/162] avg loss 0.00305958, throughput 9.33514K wps | |
[Epoch 61 Batch 150/162] avg loss 0.00266039, throughput 9.2967K wps | |
Begin Testing... | |
[Epoch 61] train avg loss 0.00282061, dev acc 0.9211, dev avg loss 0.214901, throughput 9.42117K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 62 Batch 30/162] avg loss 0.00269149, throughput 9.6304K wps | |
[Epoch 62 Batch 60/162] avg loss 0.00275886, throughput 9.28424K wps | |
[Epoch 62 Batch 90/162] avg loss 0.00278965, throughput 9.46314K wps | |
[Epoch 62 Batch 120/162] avg loss 0.00278281, throughput 9.41509K wps | |
[Epoch 62 Batch 150/162] avg loss 0.00296652, throughput 9.46217K wps | |
Begin Testing... | |
[Epoch 62] train avg loss 0.00280655, dev acc 0.9200, dev avg loss 0.214234, throughput 9.45876K wps | |
[Epoch 63 Batch 30/162] avg loss 0.0026918, throughput 9.60831K wps | |
[Epoch 63 Batch 60/162] avg loss 0.00272578, throughput 9.43545K wps | |
[Epoch 63 Batch 90/162] avg loss 0.00265863, throughput 9.32025K wps | |
[Epoch 63 Batch 120/162] avg loss 0.00290196, throughput 9.13743K wps | |
[Epoch 63 Batch 150/162] avg loss 0.00266342, throughput 9.28168K wps | |
Begin Testing... | |
[Epoch 63] train avg loss 0.0027211, dev acc 0.9211, dev avg loss 0.214618, throughput 9.34955K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 64 Batch 30/162] avg loss 0.0026126, throughput 9.56088K wps | |
[Epoch 64 Batch 60/162] avg loss 0.00261168, throughput 9.39434K wps | |
[Epoch 64 Batch 90/162] avg loss 0.00284288, throughput 9.35492K wps | |
[Epoch 64 Batch 120/162] avg loss 0.00285913, throughput 9.30435K wps | |
[Epoch 64 Batch 150/162] avg loss 0.00259517, throughput 9.2835K wps | |
Begin Testing... | |
[Epoch 64] train avg loss 0.00270896, dev acc 0.9222, dev avg loss 0.212864, throughput 9.38041K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 65 Batch 30/162] avg loss 0.00266184, throughput 9.457K wps | |
[Epoch 65 Batch 60/162] avg loss 0.00256257, throughput 9.45132K wps | |
[Epoch 65 Batch 90/162] avg loss 0.00249786, throughput 9.32666K wps | |
[Epoch 65 Batch 120/162] avg loss 0.00299726, throughput 9.40122K wps | |
[Epoch 65 Batch 150/162] avg loss 0.00266928, throughput 9.27458K wps | |
Begin Testing... | |
[Epoch 65] train avg loss 0.00266813, dev acc 0.9222, dev avg loss 0.212283, throughput 9.37011K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 66 Batch 30/162] avg loss 0.00250264, throughput 9.52543K wps | |
[Epoch 66 Batch 60/162] avg loss 0.00241669, throughput 9.27055K wps | |
[Epoch 66 Batch 90/162] avg loss 0.00282027, throughput 9.42991K wps | |
[Epoch 66 Batch 120/162] avg loss 0.00269262, throughput 9.23692K wps | |
[Epoch 66 Batch 150/162] avg loss 0.00266666, throughput 9.27168K wps | |
Begin Testing... | |
[Epoch 66] train avg loss 0.00259419, dev acc 0.9222, dev avg loss 0.211483, throughput 9.33889K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 67 Batch 30/162] avg loss 0.00249504, throughput 9.47676K wps | |
[Epoch 67 Batch 60/162] avg loss 0.00250064, throughput 9.29991K wps | |
[Epoch 67 Batch 90/162] avg loss 0.00257394, throughput 9.36837K wps | |
[Epoch 67 Batch 120/162] avg loss 0.00274929, throughput 9.35763K wps | |
[Epoch 67 Batch 150/162] avg loss 0.00266347, throughput 9.44608K wps | |
Begin Testing... | |
[Epoch 67] train avg loss 0.00259588, dev acc 0.9200, dev avg loss 0.210808, throughput 9.37524K wps | |
[Epoch 68 Batch 30/162] avg loss 0.0026491, throughput 9.55714K wps | |
[Epoch 68 Batch 60/162] avg loss 0.00293124, throughput 9.24097K wps | |
[Epoch 68 Batch 90/162] avg loss 0.0023547, throughput 9.39898K wps | |
[Epoch 68 Batch 120/162] avg loss 0.00261437, throughput 9.43415K wps | |
[Epoch 68 Batch 150/162] avg loss 0.0023147, throughput 9.37737K wps | |
Begin Testing... | |
[Epoch 68] train avg loss 0.00255004, dev acc 0.9200, dev avg loss 0.210856, throughput 9.4112K wps | |
[Epoch 69 Batch 30/162] avg loss 0.00236264, throughput 9.76615K wps | |
[Epoch 69 Batch 60/162] avg loss 0.00254204, throughput 9.53889K wps | |
[Epoch 69 Batch 90/162] avg loss 0.00256414, throughput 9.38684K wps | |
[Epoch 69 Batch 120/162] avg loss 0.00262695, throughput 9.44273K wps | |
[Epoch 69 Batch 150/162] avg loss 0.00251046, throughput 9.40966K wps | |
Begin Testing... | |
[Epoch 69] train avg loss 0.00253583, dev acc 0.9200, dev avg loss 0.210559, throughput 9.50258K wps | |
[Epoch 70 Batch 30/162] avg loss 0.0024204, throughput 9.53716K wps | |
[Epoch 70 Batch 60/162] avg loss 0.00244967, throughput 9.4404K wps | |
[Epoch 70 Batch 90/162] avg loss 0.00240488, throughput 9.36018K wps | |
[Epoch 70 Batch 120/162] avg loss 0.00263349, throughput 9.45604K wps | |
[Epoch 70 Batch 150/162] avg loss 0.00234914, throughput 9.35992K wps | |
Begin Testing... | |
[Epoch 70] train avg loss 0.00245254, dev acc 0.9200, dev avg loss 0.209931, throughput 9.43631K wps | |
[Epoch 71 Batch 30/162] avg loss 0.00240789, throughput 9.68664K wps | |
[Epoch 71 Batch 60/162] avg loss 0.00248945, throughput 9.3224K wps | |
[Epoch 71 Batch 90/162] avg loss 0.00256101, throughput 9.40859K wps | |
[Epoch 71 Batch 120/162] avg loss 0.00226569, throughput 9.23253K wps | |
[Epoch 71 Batch 150/162] avg loss 0.00262153, throughput 9.38478K wps | |
Begin Testing... | |
[Epoch 71] train avg loss 0.00248414, dev acc 0.9211, dev avg loss 0.209473, throughput 9.38691K wps | |
[Epoch 72 Batch 30/162] avg loss 0.00231295, throughput 9.40852K wps | |
[Epoch 72 Batch 60/162] avg loss 0.00227905, throughput 9.18122K wps | |
[Epoch 72 Batch 90/162] avg loss 0.00249192, throughput 9.33384K wps | |
[Epoch 72 Batch 120/162] avg loss 0.00220115, throughput 9.49662K wps | |
[Epoch 72 Batch 150/162] avg loss 0.00257822, throughput 9.53931K wps | |
Begin Testing... | |
[Epoch 72] train avg loss 0.00237862, dev acc 0.9222, dev avg loss 0.208919, throughput 9.37349K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 73 Batch 30/162] avg loss 0.00248132, throughput 9.66719K wps | |
[Epoch 73 Batch 60/162] avg loss 0.00223944, throughput 9.3788K wps | |
[Epoch 73 Batch 90/162] avg loss 0.00237729, throughput 9.33332K wps | |
[Epoch 73 Batch 120/162] avg loss 0.00263944, throughput 9.51485K wps | |
[Epoch 73 Batch 150/162] avg loss 0.00230547, throughput 9.52591K wps | |
Begin Testing... | |
[Epoch 73] train avg loss 0.00239745, dev acc 0.9222, dev avg loss 0.208237, throughput 9.47115K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 74 Batch 30/162] avg loss 0.00233452, throughput 9.72133K wps | |
[Epoch 74 Batch 60/162] avg loss 0.00235288, throughput 9.42341K wps | |
[Epoch 74 Batch 90/162] avg loss 0.00223811, throughput 9.50422K wps | |
[Epoch 74 Batch 120/162] avg loss 0.00246239, throughput 9.34197K wps | |
[Epoch 74 Batch 150/162] avg loss 0.00239308, throughput 9.39556K wps | |
Begin Testing... | |
[Epoch 74] train avg loss 0.00235093, dev acc 0.9211, dev avg loss 0.207968, throughput 9.46835K wps | |
[Epoch 75 Batch 30/162] avg loss 0.00257944, throughput 9.6299K wps | |
[Epoch 75 Batch 60/162] avg loss 0.00218365, throughput 9.38988K wps | |
[Epoch 75 Batch 90/162] avg loss 0.0021832, throughput 9.56108K wps | |
[Epoch 75 Batch 120/162] avg loss 0.00231687, throughput 9.35633K wps | |
[Epoch 75 Batch 150/162] avg loss 0.00227732, throughput 9.26077K wps | |
Begin Testing... | |
[Epoch 75] train avg loss 0.00234944, dev acc 0.9211, dev avg loss 0.207756, throughput 9.44125K wps | |
[Epoch 76 Batch 30/162] avg loss 0.00246977, throughput 9.57019K wps | |
[Epoch 76 Batch 60/162] avg loss 0.00233069, throughput 9.47189K wps | |
[Epoch 76 Batch 90/162] avg loss 0.00205494, throughput 9.47045K wps | |
[Epoch 76 Batch 120/162] avg loss 0.00233756, throughput 9.3395K wps | |
[Epoch 76 Batch 150/162] avg loss 0.00245006, throughput 9.38939K wps | |
Begin Testing... | |
[Epoch 76] train avg loss 0.00231847, dev acc 0.9233, dev avg loss 0.208611, throughput 9.43029K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 77 Batch 30/162] avg loss 0.0024733, throughput 9.6933K wps | |
[Epoch 77 Batch 60/162] avg loss 0.00224544, throughput 9.36658K wps | |
[Epoch 77 Batch 90/162] avg loss 0.00225306, throughput 9.3526K wps | |
[Epoch 77 Batch 120/162] avg loss 0.00239805, throughput 9.40154K wps | |
[Epoch 77 Batch 150/162] avg loss 0.00195458, throughput 9.40228K wps | |
Begin Testing... | |
[Epoch 77] train avg loss 0.00228019, dev acc 0.9233, dev avg loss 0.206629, throughput 9.44618K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 78 Batch 30/162] avg loss 0.00207113, throughput 9.69232K wps | |
[Epoch 78 Batch 60/162] avg loss 0.00228975, throughput 9.23539K wps | |
[Epoch 78 Batch 90/162] avg loss 0.00230338, throughput 9.42575K wps | |
[Epoch 78 Batch 120/162] avg loss 0.00218398, throughput 9.37048K wps | |
[Epoch 78 Batch 150/162] avg loss 0.00197783, throughput 9.32321K wps | |
Begin Testing... | |
[Epoch 78] train avg loss 0.00217135, dev acc 0.9211, dev avg loss 0.206674, throughput 9.40951K wps | |
[Epoch 79 Batch 30/162] avg loss 0.00235005, throughput 9.47869K wps | |
[Epoch 79 Batch 60/162] avg loss 0.00216836, throughput 9.3978K wps | |
[Epoch 79 Batch 90/162] avg loss 0.00214022, throughput 9.37198K wps | |
[Epoch 79 Batch 120/162] avg loss 0.00223165, throughput 9.33747K wps | |
[Epoch 79 Batch 150/162] avg loss 0.00198613, throughput 9.41899K wps | |
Begin Testing... | |
[Epoch 79] train avg loss 0.00215863, dev acc 0.9222, dev avg loss 0.206344, throughput 9.39235K wps | |
[Epoch 80 Batch 30/162] avg loss 0.00222975, throughput 9.6298K wps | |
[Epoch 80 Batch 60/162] avg loss 0.00184238, throughput 9.28435K wps | |
[Epoch 80 Batch 90/162] avg loss 0.00252918, throughput 9.25054K wps | |
[Epoch 80 Batch 120/162] avg loss 0.00220173, throughput 9.49821K wps | |
[Epoch 80 Batch 150/162] avg loss 0.00183768, throughput 9.28979K wps | |
Begin Testing... | |
[Epoch 80] train avg loss 0.00212165, dev acc 0.9244, dev avg loss 0.206298, throughput 9.40949K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 81 Batch 30/162] avg loss 0.00230204, throughput 9.49307K wps | |
[Epoch 81 Batch 60/162] avg loss 0.00216045, throughput 9.31312K wps | |
[Epoch 81 Batch 90/162] avg loss 0.00231145, throughput 9.29803K wps | |
[Epoch 81 Batch 120/162] avg loss 0.00209262, throughput 9.33757K wps | |
[Epoch 81 Batch 150/162] avg loss 0.0019635, throughput 9.31314K wps | |
Begin Testing... | |
[Epoch 81] train avg loss 0.0021713, dev acc 0.9233, dev avg loss 0.205901, throughput 9.34852K wps | |
[Epoch 82 Batch 30/162] avg loss 0.00190619, throughput 9.55957K wps | |
[Epoch 82 Batch 60/162] avg loss 0.00234425, throughput 9.35162K wps | |
[Epoch 82 Batch 90/162] avg loss 0.00232376, throughput 9.22046K wps | |
[Epoch 82 Batch 120/162] avg loss 0.00202978, throughput 9.54604K wps | |
[Epoch 82 Batch 150/162] avg loss 0.00209262, throughput 9.52053K wps | |
Begin Testing... | |
[Epoch 82] train avg loss 0.00211971, dev acc 0.9222, dev avg loss 0.205093, throughput 9.42898K wps | |
[Epoch 83 Batch 30/162] avg loss 0.00217346, throughput 9.32124K wps | |
[Epoch 83 Batch 60/162] avg loss 0.00185793, throughput 9.43477K wps | |
[Epoch 83 Batch 90/162] avg loss 0.00228981, throughput 9.39923K wps | |
[Epoch 83 Batch 120/162] avg loss 0.00215277, throughput 9.38176K wps | |
[Epoch 83 Batch 150/162] avg loss 0.0022617, throughput 9.43604K wps | |
Begin Testing... | |
[Epoch 83] train avg loss 0.00215133, dev acc 0.9244, dev avg loss 0.205591, throughput 9.37267K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 84 Batch 30/162] avg loss 0.00187361, throughput 9.58512K wps | |
[Epoch 84 Batch 60/162] avg loss 0.00205785, throughput 9.26876K wps | |
[Epoch 84 Batch 90/162] avg loss 0.00206367, throughput 9.509K wps | |
[Epoch 84 Batch 120/162] avg loss 0.00219402, throughput 9.33053K wps | |
[Epoch 84 Batch 150/162] avg loss 0.00198974, throughput 9.37345K wps | |
Begin Testing... | |
[Epoch 84] train avg loss 0.00203205, dev acc 0.9222, dev avg loss 0.204304, throughput 9.39575K wps | |
[Epoch 85 Batch 30/162] avg loss 0.00208433, throughput 9.5062K wps | |
[Epoch 85 Batch 60/162] avg loss 0.00177657, throughput 9.31559K wps | |
[Epoch 85 Batch 90/162] avg loss 0.00198931, throughput 9.29835K wps | |
[Epoch 85 Batch 120/162] avg loss 0.00200297, throughput 9.39833K wps | |
[Epoch 85 Batch 150/162] avg loss 0.00227748, throughput 9.29818K wps | |
Begin Testing... | |
[Epoch 85] train avg loss 0.00202292, dev acc 0.9256, dev avg loss 0.203931, throughput 9.3396K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 86 Batch 30/162] avg loss 0.00196756, throughput 9.52375K wps | |
[Epoch 86 Batch 60/162] avg loss 0.0019974, throughput 9.23258K wps | |
[Epoch 86 Batch 90/162] avg loss 0.00204913, throughput 9.34431K wps | |
[Epoch 86 Batch 120/162] avg loss 0.00179861, throughput 9.4073K wps | |
[Epoch 86 Batch 150/162] avg loss 0.0018706, throughput 9.37705K wps | |
Begin Testing... | |
[Epoch 86] train avg loss 0.0019604, dev acc 0.9244, dev avg loss 0.204496, throughput 9.37745K wps | |
[Epoch 87 Batch 30/162] avg loss 0.00190248, throughput 9.5779K wps | |
[Epoch 87 Batch 60/162] avg loss 0.00211798, throughput 9.29668K wps | |
[Epoch 87 Batch 90/162] avg loss 0.00191901, throughput 9.24771K wps | |
[Epoch 87 Batch 120/162] avg loss 0.00195456, throughput 9.44121K wps | |
[Epoch 87 Batch 150/162] avg loss 0.00204954, throughput 9.27785K wps | |
Begin Testing... | |
[Epoch 87] train avg loss 0.00199495, dev acc 0.9256, dev avg loss 0.203738, throughput 9.37875K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 88 Batch 30/162] avg loss 0.0019946, throughput 9.41906K wps | |
[Epoch 88 Batch 60/162] avg loss 0.00193633, throughput 9.30126K wps | |
[Epoch 88 Batch 90/162] avg loss 0.00185998, throughput 9.5232K wps | |
[Epoch 88 Batch 120/162] avg loss 0.00195051, throughput 9.43933K wps | |
[Epoch 88 Batch 150/162] avg loss 0.00214931, throughput 9.31672K wps | |
Begin Testing... | |
[Epoch 88] train avg loss 0.00195381, dev acc 0.9256, dev avg loss 0.203697, throughput 9.39728K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 89 Batch 30/162] avg loss 0.00192866, throughput 9.52825K wps | |
[Epoch 89 Batch 60/162] avg loss 0.00197791, throughput 9.26104K wps | |
[Epoch 89 Batch 90/162] avg loss 0.00193645, throughput 9.40345K wps | |
[Epoch 89 Batch 120/162] avg loss 0.00217938, throughput 9.37801K wps | |
[Epoch 89 Batch 150/162] avg loss 0.00199219, throughput 9.28022K wps | |
Begin Testing... | |
[Epoch 89] train avg loss 0.00200452, dev acc 0.9244, dev avg loss 0.204087, throughput 9.38397K wps | |
[Epoch 90 Batch 30/162] avg loss 0.00185857, throughput 9.51826K wps | |
[Epoch 90 Batch 60/162] avg loss 0.00202954, throughput 9.3183K wps | |
[Epoch 90 Batch 90/162] avg loss 0.00203731, throughput 9.44054K wps | |
[Epoch 90 Batch 120/162] avg loss 0.00187401, throughput 9.31878K wps | |
[Epoch 90 Batch 150/162] avg loss 0.00185091, throughput 9.23475K wps | |
Begin Testing... | |
[Epoch 90] train avg loss 0.00190673, dev acc 0.9256, dev avg loss 0.202735, throughput 9.38357K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 91 Batch 30/162] avg loss 0.00167195, throughput 9.48947K wps | |
[Epoch 91 Batch 60/162] avg loss 0.00180569, throughput 9.41164K wps | |
[Epoch 91 Batch 90/162] avg loss 0.00181314, throughput 9.38445K wps | |
[Epoch 91 Batch 120/162] avg loss 0.00202195, throughput 9.43237K wps | |
[Epoch 91 Batch 150/162] avg loss 0.00214542, throughput 9.30149K wps | |
Begin Testing... | |
[Epoch 91] train avg loss 0.00186402, dev acc 0.9233, dev avg loss 0.202484, throughput 9.41859K wps | |
[Epoch 92 Batch 30/162] avg loss 0.00202393, throughput 9.57583K wps | |
[Epoch 92 Batch 60/162] avg loss 0.00166262, throughput 9.4278K wps | |
[Epoch 92 Batch 90/162] avg loss 0.0018026, throughput 9.32236K wps | |
[Epoch 92 Batch 120/162] avg loss 0.0016359, throughput 9.46342K wps | |
[Epoch 92 Batch 150/162] avg loss 0.00196304, throughput 9.4185K wps | |
Begin Testing... | |
[Epoch 92] train avg loss 0.00184312, dev acc 0.9256, dev avg loss 0.202851, throughput 9.4406K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 93 Batch 30/162] avg loss 0.00192257, throughput 9.51555K wps | |
[Epoch 93 Batch 60/162] avg loss 0.00183915, throughput 9.23505K wps | |
[Epoch 93 Batch 90/162] avg loss 0.00188178, throughput 9.42059K wps | |
[Epoch 93 Batch 120/162] avg loss 0.00172586, throughput 9.35112K wps | |
[Epoch 93 Batch 150/162] avg loss 0.00162763, throughput 9.27694K wps | |
Begin Testing... | |
[Epoch 93] train avg loss 0.0018171, dev acc 0.9244, dev avg loss 0.202735, throughput 9.35486K wps | |
[Epoch 94 Batch 30/162] avg loss 0.00163137, throughput 9.48787K wps | |
[Epoch 94 Batch 60/162] avg loss 0.00172054, throughput 9.52759K wps | |
[Epoch 94 Batch 90/162] avg loss 0.00191297, throughput 9.29313K wps | |
[Epoch 94 Batch 120/162] avg loss 0.0017799, throughput 9.39914K wps | |
[Epoch 94 Batch 150/162] avg loss 0.00186745, throughput 9.41909K wps | |
Begin Testing... | |
[Epoch 94] train avg loss 0.00177816, dev acc 0.9267, dev avg loss 0.201452, throughput 9.4095K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 95 Batch 30/162] avg loss 0.00165395, throughput 9.54676K wps | |
[Epoch 95 Batch 60/162] avg loss 0.00188043, throughput 9.23283K wps | |
[Epoch 95 Batch 90/162] avg loss 0.00188678, throughput 9.33186K wps | |
[Epoch 95 Batch 120/162] avg loss 0.00183553, throughput 9.30612K wps | |
[Epoch 95 Batch 150/162] avg loss 0.00167964, throughput 9.36738K wps | |
Begin Testing... | |
[Epoch 95] train avg loss 0.00177887, dev acc 0.9278, dev avg loss 0.201506, throughput 9.35049K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 96 Batch 30/162] avg loss 0.0018067, throughput 9.36084K wps | |
[Epoch 96 Batch 60/162] avg loss 0.00150185, throughput 9.45998K wps | |
[Epoch 96 Batch 90/162] avg loss 0.00174129, throughput 9.40955K wps | |
[Epoch 96 Batch 120/162] avg loss 0.00168815, throughput 9.43815K wps | |
[Epoch 96 Batch 150/162] avg loss 0.00183747, throughput 9.38871K wps | |
Begin Testing... | |
[Epoch 96] train avg loss 0.00172719, dev acc 0.9256, dev avg loss 0.202846, throughput 9.4169K wps | |
[Epoch 97 Batch 30/162] avg loss 0.0016267, throughput 9.4709K wps | |
[Epoch 97 Batch 60/162] avg loss 0.00164389, throughput 9.51736K wps | |
[Epoch 97 Batch 90/162] avg loss 0.00153075, throughput 9.30254K wps | |
[Epoch 97 Batch 120/162] avg loss 0.00184117, throughput 9.53896K wps | |
[Epoch 97 Batch 150/162] avg loss 0.00179041, throughput 9.29699K wps | |
Begin Testing... | |
[Epoch 97] train avg loss 0.00169207, dev acc 0.9256, dev avg loss 0.201731, throughput 9.40968K wps | |
[Epoch 98 Batch 30/162] avg loss 0.00161255, throughput 9.55141K wps | |
[Epoch 98 Batch 60/162] avg loss 0.0017042, throughput 9.25578K wps | |
[Epoch 98 Batch 90/162] avg loss 0.00162267, throughput 9.31859K wps | |
[Epoch 98 Batch 120/162] avg loss 0.00183334, throughput 9.42684K wps | |
[Epoch 98 Batch 150/162] avg loss 0.0018736, throughput 9.26711K wps | |
Begin Testing... | |
[Epoch 98] train avg loss 0.00172534, dev acc 0.9278, dev avg loss 0.20133, throughput 9.35885K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 99 Batch 30/162] avg loss 0.00178203, throughput 9.69662K wps | |
[Epoch 99 Batch 60/162] avg loss 0.00168355, throughput 9.25415K wps | |
[Epoch 99 Batch 90/162] avg loss 0.00180107, throughput 9.45727K wps | |
[Epoch 99 Batch 120/162] avg loss 0.00156636, throughput 9.33142K wps | |
[Epoch 99 Batch 150/162] avg loss 0.00161922, throughput 9.42126K wps | |
Begin Testing... | |
[Epoch 99] train avg loss 0.00169234, dev acc 0.9278, dev avg loss 0.201103, throughput 9.43427K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 100 Batch 30/162] avg loss 0.0016259, throughput 9.72245K wps | |
[Epoch 100 Batch 60/162] avg loss 0.00157466, throughput 9.47123K wps | |
[Epoch 100 Batch 90/162] avg loss 0.00155395, throughput 9.48682K wps | |
[Epoch 100 Batch 120/162] avg loss 0.00169453, throughput 9.49779K wps | |
[Epoch 100 Batch 150/162] avg loss 0.00187769, throughput 9.32434K wps | |
Begin Testing... | |
[Epoch 100] train avg loss 0.00167188, dev acc 0.9278, dev avg loss 0.201584, throughput 9.50583K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 101 Batch 30/162] avg loss 0.00150735, throughput 9.55371K wps | |
[Epoch 101 Batch 60/162] avg loss 0.00189267, throughput 9.30413K wps | |
[Epoch 101 Batch 90/162] avg loss 0.00151687, throughput 9.31869K wps | |
[Epoch 101 Batch 120/162] avg loss 0.00165239, throughput 9.52223K wps | |
[Epoch 101 Batch 150/162] avg loss 0.0016823, throughput 9.32521K wps | |
Begin Testing... | |
[Epoch 101] train avg loss 0.00164527, dev acc 0.9244, dev avg loss 0.201661, throughput 9.39745K wps | |
[Epoch 102 Batch 30/162] avg loss 0.00150766, throughput 9.43727K wps | |
[Epoch 102 Batch 60/162] avg loss 0.00164845, throughput 9.2507K wps | |
[Epoch 102 Batch 90/162] avg loss 0.00175769, throughput 9.35673K wps | |
[Epoch 102 Batch 120/162] avg loss 0.00178624, throughput 9.4864K wps | |
[Epoch 102 Batch 150/162] avg loss 0.00154615, throughput 9.39803K wps | |
Begin Testing... | |
[Epoch 102] train avg loss 0.00164297, dev acc 0.9256, dev avg loss 0.201279, throughput 9.4006K wps | |
[Epoch 103 Batch 30/162] avg loss 0.00161365, throughput 9.58287K wps | |
[Epoch 103 Batch 60/162] avg loss 0.00161417, throughput 9.36368K wps | |
[Epoch 103 Batch 90/162] avg loss 0.00167703, throughput 9.27214K wps | |
[Epoch 103 Batch 120/162] avg loss 0.00152906, throughput 9.445K wps | |
[Epoch 103 Batch 150/162] avg loss 0.00155203, throughput 9.4719K wps | |
Begin Testing... | |
[Epoch 103] train avg loss 0.00158686, dev acc 0.9267, dev avg loss 0.20078, throughput 9.41522K wps | |
[Epoch 104 Batch 30/162] avg loss 0.0015189, throughput 9.55253K wps | |
[Epoch 104 Batch 60/162] avg loss 0.00157563, throughput 9.37204K wps | |
[Epoch 104 Batch 90/162] avg loss 0.00151875, throughput 9.47342K wps | |
[Epoch 104 Batch 120/162] avg loss 0.00146111, throughput 9.28916K wps | |
[Epoch 104 Batch 150/162] avg loss 0.00167138, throughput 9.40638K wps | |
Begin Testing... | |
[Epoch 104] train avg loss 0.00155683, dev acc 0.9267, dev avg loss 0.200666, throughput 9.42508K wps | |
[Epoch 105 Batch 30/162] avg loss 0.00149955, throughput 9.63626K wps | |
[Epoch 105 Batch 60/162] avg loss 0.00157579, throughput 9.2207K wps | |
[Epoch 105 Batch 90/162] avg loss 0.00165946, throughput 9.4933K wps | |
[Epoch 105 Batch 120/162] avg loss 0.00146336, throughput 9.41137K wps | |
[Epoch 105 Batch 150/162] avg loss 0.00153732, throughput 9.30036K wps | |
Begin Testing... | |
[Epoch 105] train avg loss 0.00155901, dev acc 0.9289, dev avg loss 0.199678, throughput 9.40358K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 106 Batch 30/162] avg loss 0.00142332, throughput 9.66661K wps | |
[Epoch 106 Batch 60/162] avg loss 0.00158049, throughput 9.40541K wps | |
[Epoch 106 Batch 90/162] avg loss 0.00157714, throughput 9.41834K wps | |
[Epoch 106 Batch 120/162] avg loss 0.00141014, throughput 9.35045K wps | |
[Epoch 106 Batch 150/162] avg loss 0.00164792, throughput 9.43854K wps | |
Begin Testing... | |
[Epoch 106] train avg loss 0.00152312, dev acc 0.9278, dev avg loss 0.200344, throughput 9.4436K wps | |
[Epoch 107 Batch 30/162] avg loss 0.00140457, throughput 9.57794K wps | |
[Epoch 107 Batch 60/162] avg loss 0.00154623, throughput 9.51554K wps | |
[Epoch 107 Batch 90/162] avg loss 0.00144479, throughput 9.43262K wps | |
[Epoch 107 Batch 120/162] avg loss 0.00160065, throughput 9.45721K wps | |
[Epoch 107 Batch 150/162] avg loss 0.0015154, throughput 9.43334K wps | |
Begin Testing... | |
[Epoch 107] train avg loss 0.00151217, dev acc 0.9244, dev avg loss 0.201207, throughput 9.46985K wps | |
[Epoch 108 Batch 30/162] avg loss 0.00157157, throughput 9.5936K wps | |
[Epoch 108 Batch 60/162] avg loss 0.00155498, throughput 9.36322K wps | |
[Epoch 108 Batch 90/162] avg loss 0.00149941, throughput 9.37888K wps | |
[Epoch 108 Batch 120/162] avg loss 0.00141142, throughput 9.22913K wps | |
[Epoch 108 Batch 150/162] avg loss 0.00173523, throughput 9.35308K wps | |
Begin Testing... | |
[Epoch 108] train avg loss 0.00154691, dev acc 0.9278, dev avg loss 0.200023, throughput 9.38223K wps | |
[Epoch 109 Batch 30/162] avg loss 0.0013554, throughput 9.6K wps | |
[Epoch 109 Batch 60/162] avg loss 0.00149964, throughput 9.3052K wps | |
[Epoch 109 Batch 90/162] avg loss 0.00153402, throughput 9.29517K wps | |
[Epoch 109 Batch 120/162] avg loss 0.001613, throughput 9.28471K wps | |
[Epoch 109 Batch 150/162] avg loss 0.00141026, throughput 9.37518K wps | |
Begin Testing... | |
[Epoch 109] train avg loss 0.00150107, dev acc 0.9278, dev avg loss 0.199981, throughput 9.3516K wps | |
[Epoch 110 Batch 30/162] avg loss 0.00147237, throughput 9.49169K wps | |
[Epoch 110 Batch 60/162] avg loss 0.00133405, throughput 9.4005K wps | |
[Epoch 110 Batch 90/162] avg loss 0.00153076, throughput 9.60073K wps | |
[Epoch 110 Batch 120/162] avg loss 0.00142159, throughput 9.35705K wps | |
[Epoch 110 Batch 150/162] avg loss 0.00150565, throughput 9.50525K wps | |
Begin Testing... | |
[Epoch 110] train avg loss 0.00147086, dev acc 0.9278, dev avg loss 0.199383, throughput 9.4574K wps | |
[Epoch 111 Batch 30/162] avg loss 0.00139733, throughput 9.35636K wps | |
[Epoch 111 Batch 60/162] avg loss 0.00153088, throughput 9.3379K wps | |
[Epoch 111 Batch 90/162] avg loss 0.00133508, throughput 9.30945K wps | |
[Epoch 111 Batch 120/162] avg loss 0.00143346, throughput 9.27795K wps | |
[Epoch 111 Batch 150/162] avg loss 0.00156747, throughput 9.28131K wps | |
Begin Testing... | |
[Epoch 111] train avg loss 0.00145299, dev acc 0.9278, dev avg loss 0.199196, throughput 9.32067K wps | |
[Epoch 112 Batch 30/162] avg loss 0.00141012, throughput 9.49453K wps | |
[Epoch 112 Batch 60/162] avg loss 0.00164314, throughput 9.31206K wps | |
[Epoch 112 Batch 90/162] avg loss 0.00135187, throughput 9.31677K wps | |
[Epoch 112 Batch 120/162] avg loss 0.00145206, throughput 9.49807K wps | |
[Epoch 112 Batch 150/162] avg loss 0.00138705, throughput 9.59828K wps | |
Begin Testing... | |
[Epoch 112] train avg loss 0.0014432, dev acc 0.9311, dev avg loss 0.199142, throughput 9.44057K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 113 Batch 30/162] avg loss 0.0013675, throughput 9.80452K wps | |
[Epoch 113 Batch 60/162] avg loss 0.00140924, throughput 9.24686K wps | |
[Epoch 113 Batch 90/162] avg loss 0.00136147, throughput 9.41749K wps | |
[Epoch 113 Batch 120/162] avg loss 0.00138529, throughput 9.19369K wps | |
[Epoch 113 Batch 150/162] avg loss 0.00147477, throughput 9.44531K wps | |
Begin Testing... | |
[Epoch 113] train avg loss 0.00138702, dev acc 0.9289, dev avg loss 0.199933, throughput 9.4024K wps | |
[Epoch 114 Batch 30/162] avg loss 0.00127599, throughput 9.65191K wps | |
[Epoch 114 Batch 60/162] avg loss 0.00142577, throughput 9.34895K wps | |
[Epoch 114 Batch 90/162] avg loss 0.00155588, throughput 9.31974K wps | |
[Epoch 114 Batch 120/162] avg loss 0.00124705, throughput 9.319K wps | |
[Epoch 114 Batch 150/162] avg loss 0.00156108, throughput 9.2978K wps | |
Begin Testing... | |
[Epoch 114] train avg loss 0.00140714, dev acc 0.9267, dev avg loss 0.199581, throughput 9.38194K wps | |
[Epoch 115 Batch 30/162] avg loss 0.0012881, throughput 9.43695K wps | |
[Epoch 115 Batch 60/162] avg loss 0.00130016, throughput 9.57039K wps | |
[Epoch 115 Batch 90/162] avg loss 0.00149962, throughput 9.3521K wps | |
[Epoch 115 Batch 120/162] avg loss 0.00131644, throughput 9.29963K wps | |
[Epoch 115 Batch 150/162] avg loss 0.00136541, throughput 9.43992K wps | |
Begin Testing... | |
[Epoch 115] train avg loss 0.00136657, dev acc 0.9267, dev avg loss 0.199445, throughput 9.40994K wps | |
[Epoch 116 Batch 30/162] avg loss 0.00144005, throughput 9.5065K wps | |
[Epoch 116 Batch 60/162] avg loss 0.00121961, throughput 9.38819K wps | |
[Epoch 116 Batch 90/162] avg loss 0.00146966, throughput 9.44021K wps | |
[Epoch 116 Batch 120/162] avg loss 0.00128273, throughput 9.38758K wps | |
[Epoch 116 Batch 150/162] avg loss 0.00126835, throughput 9.37983K wps | |
Begin Testing... | |
[Epoch 116] train avg loss 0.00134921, dev acc 0.9278, dev avg loss 0.19927, throughput 9.41364K wps | |
[Epoch 117 Batch 30/162] avg loss 0.00128174, throughput 9.70739K wps | |
[Epoch 117 Batch 60/162] avg loss 0.00136264, throughput 9.47089K wps | |
[Epoch 117 Batch 90/162] avg loss 0.00132684, throughput 9.28686K wps | |
[Epoch 117 Batch 120/162] avg loss 0.00131676, throughput 9.39006K wps | |
[Epoch 117 Batch 150/162] avg loss 0.00164203, throughput 9.41568K wps | |
Begin Testing... | |
[Epoch 117] train avg loss 0.00136938, dev acc 0.9256, dev avg loss 0.200608, throughput 9.46134K wps | |
[Epoch 118 Batch 30/162] avg loss 0.00146328, throughput 9.43066K wps | |
[Epoch 118 Batch 60/162] avg loss 0.00131696, throughput 9.39652K wps | |
[Epoch 118 Batch 90/162] avg loss 0.00116184, throughput 9.40948K wps | |
[Epoch 118 Batch 120/162] avg loss 0.00123258, throughput 9.44742K wps | |
[Epoch 118 Batch 150/162] avg loss 0.00131881, throughput 9.45972K wps | |
Begin Testing... | |
[Epoch 118] train avg loss 0.00130584, dev acc 0.9311, dev avg loss 0.199374, throughput 9.42266K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 119 Batch 30/162] avg loss 0.00122319, throughput 9.47212K wps | |
[Epoch 119 Batch 60/162] avg loss 0.00136209, throughput 9.36105K wps | |
[Epoch 119 Batch 90/162] avg loss 0.00125783, throughput 9.22653K wps | |
[Epoch 119 Batch 120/162] avg loss 0.00128243, throughput 9.20817K wps | |
[Epoch 119 Batch 150/162] avg loss 0.00131484, throughput 9.42213K wps | |
Begin Testing... | |
[Epoch 119] train avg loss 0.00129441, dev acc 0.9300, dev avg loss 0.199031, throughput 9.32579K wps | |
[Epoch 120 Batch 30/162] avg loss 0.00126199, throughput 9.60037K wps | |
[Epoch 120 Batch 60/162] avg loss 0.00133035, throughput 9.45657K wps | |
[Epoch 120 Batch 90/162] avg loss 0.00128758, throughput 9.28242K wps | |
[Epoch 120 Batch 120/162] avg loss 0.00128132, throughput 9.41893K wps | |
[Epoch 120 Batch 150/162] avg loss 0.00130215, throughput 9.36201K wps | |
Begin Testing... | |
[Epoch 120] train avg loss 0.00129626, dev acc 0.9267, dev avg loss 0.199177, throughput 9.41975K wps | |
[Epoch 121 Batch 30/162] avg loss 0.00114233, throughput 9.58115K wps | |
[Epoch 121 Batch 60/162] avg loss 0.00125421, throughput 9.46377K wps | |
[Epoch 121 Batch 90/162] avg loss 0.00131915, throughput 9.26709K wps | |
[Epoch 121 Batch 120/162] avg loss 0.0011579, throughput 9.42059K wps | |
[Epoch 121 Batch 150/162] avg loss 0.00128974, throughput 9.44377K wps | |
Begin Testing... | |
[Epoch 121] train avg loss 0.00125055, dev acc 0.9256, dev avg loss 0.199479, throughput 9.42811K wps | |
[Epoch 122 Batch 30/162] avg loss 0.0012395, throughput 9.58242K wps | |
[Epoch 122 Batch 60/162] avg loss 0.00127934, throughput 9.32832K wps | |
[Epoch 122 Batch 90/162] avg loss 0.00135989, throughput 9.45706K wps | |
[Epoch 122 Batch 120/162] avg loss 0.00122102, throughput 9.56583K wps | |
[Epoch 122 Batch 150/162] avg loss 0.00137403, throughput 9.3757K wps | |
Begin Testing... | |
[Epoch 122] train avg loss 0.00127423, dev acc 0.9256, dev avg loss 0.200026, throughput 9.45075K wps | |
[Epoch 123 Batch 30/162] avg loss 0.00123097, throughput 9.64759K wps | |
[Epoch 123 Batch 60/162] avg loss 0.00124824, throughput 9.24797K wps | |
[Epoch 123 Batch 90/162] avg loss 0.00129555, throughput 9.52339K wps | |
[Epoch 123 Batch 120/162] avg loss 0.00129079, throughput 9.43908K wps | |
[Epoch 123 Batch 150/162] avg loss 0.00116047, throughput 9.38876K wps | |
Begin Testing... | |
[Epoch 123] train avg loss 0.00125174, dev acc 0.9322, dev avg loss 0.198537, throughput 9.44336K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 124 Batch 30/162] avg loss 0.00120542, throughput 9.45576K wps | |
[Epoch 124 Batch 60/162] avg loss 0.00110786, throughput 9.42433K wps | |
[Epoch 124 Batch 90/162] avg loss 0.00113806, throughput 9.27189K wps | |
[Epoch 124 Batch 120/162] avg loss 0.00112598, throughput 9.42717K wps | |
[Epoch 124 Batch 150/162] avg loss 0.00146502, throughput 9.23453K wps | |
Begin Testing... | |
[Epoch 124] train avg loss 0.00120405, dev acc 0.9311, dev avg loss 0.198407, throughput 9.36526K wps | |
[Epoch 125 Batch 30/162] avg loss 0.00114488, throughput 9.41631K wps | |
[Epoch 125 Batch 60/162] avg loss 0.00112474, throughput 9.16766K wps | |
[Epoch 125 Batch 90/162] avg loss 0.00120823, throughput 9.31222K wps | |
[Epoch 125 Batch 120/162] avg loss 0.00131459, throughput 9.54004K wps | |
[Epoch 125 Batch 150/162] avg loss 0.00123069, throughput 9.33013K wps | |
Begin Testing... | |
[Epoch 125] train avg loss 0.00119058, dev acc 0.9289, dev avg loss 0.19879, throughput 9.34311K wps | |
[Epoch 126 Batch 30/162] avg loss 0.00115111, throughput 9.4237K wps | |
[Epoch 126 Batch 60/162] avg loss 0.00115781, throughput 9.33512K wps | |
[Epoch 126 Batch 90/162] avg loss 0.00119558, throughput 9.18513K wps | |
[Epoch 126 Batch 120/162] avg loss 0.00126233, throughput 9.22263K wps | |
[Epoch 126 Batch 150/162] avg loss 0.00114824, throughput 9.25024K wps | |
Begin Testing... | |
[Epoch 126] train avg loss 0.00117731, dev acc 0.9267, dev avg loss 0.199156, throughput 9.28786K wps | |
[Epoch 127 Batch 30/162] avg loss 0.00108638, throughput 9.64139K wps | |
[Epoch 127 Batch 60/162] avg loss 0.00122516, throughput 9.32021K wps | |
[Epoch 127 Batch 90/162] avg loss 0.00125105, throughput 9.32356K wps | |
[Epoch 127 Batch 120/162] avg loss 0.00113019, throughput 9.35535K wps | |
[Epoch 127 Batch 150/162] avg loss 0.00131444, throughput 9.29766K wps | |
Begin Testing... | |
[Epoch 127] train avg loss 0.00118893, dev acc 0.9322, dev avg loss 0.19873, throughput 9.36964K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 128 Batch 30/162] avg loss 0.00130396, throughput 9.71072K wps | |
[Epoch 128 Batch 60/162] avg loss 0.00114678, throughput 9.37894K wps | |
[Epoch 128 Batch 90/162] avg loss 0.00102016, throughput 9.46432K wps | |
[Epoch 128 Batch 120/162] avg loss 0.00126073, throughput 9.26486K wps | |
[Epoch 128 Batch 150/162] avg loss 0.00125993, throughput 9.30034K wps | |
Begin Testing... | |
[Epoch 128] train avg loss 0.00119159, dev acc 0.9311, dev avg loss 0.199058, throughput 9.42134K wps | |
[Epoch 129 Batch 30/162] avg loss 0.00113545, throughput 9.54132K wps | |
[Epoch 129 Batch 60/162] avg loss 0.00119513, throughput 9.4028K wps | |
[Epoch 129 Batch 90/162] avg loss 0.00116344, throughput 9.2748K wps | |
[Epoch 129 Batch 120/162] avg loss 0.0011865, throughput 9.5926K wps | |
[Epoch 129 Batch 150/162] avg loss 0.00122651, throughput 9.59699K wps | |
Begin Testing... | |
[Epoch 129] train avg loss 0.00117882, dev acc 0.9311, dev avg loss 0.199058, throughput 9.48258K wps | |
[Epoch 130 Batch 30/162] avg loss 0.00106041, throughput 9.54652K wps | |
[Epoch 130 Batch 60/162] avg loss 0.00111793, throughput 9.4356K wps | |
[Epoch 130 Batch 90/162] avg loss 0.00113075, throughput 9.45868K wps | |
[Epoch 130 Batch 120/162] avg loss 0.00121747, throughput 9.33525K wps | |
[Epoch 130 Batch 150/162] avg loss 0.00117444, throughput 9.33262K wps | |
Begin Testing... | |
[Epoch 130] train avg loss 0.00114068, dev acc 0.9311, dev avg loss 0.199081, throughput 9.42246K wps | |
[Epoch 131 Batch 30/162] avg loss 0.00114748, throughput 9.52469K wps | |
[Epoch 131 Batch 60/162] avg loss 0.00116264, throughput 9.48833K wps | |
[Epoch 131 Batch 90/162] avg loss 0.00111973, throughput 9.24726K wps | |
[Epoch 131 Batch 120/162] avg loss 0.0010949, throughput 9.43146K wps | |
[Epoch 131 Batch 150/162] avg loss 0.00104831, throughput 9.30233K wps | |
Begin Testing... | |
[Epoch 131] train avg loss 0.00111375, dev acc 0.9311, dev avg loss 0.199268, throughput 9.40373K wps | |
[Epoch 132 Batch 30/162] avg loss 0.00117632, throughput 9.65601K wps | |
[Epoch 132 Batch 60/162] avg loss 0.00114367, throughput 9.38117K wps | |
[Epoch 132 Batch 90/162] avg loss 0.00119868, throughput 9.33662K wps | |
[Epoch 132 Batch 120/162] avg loss 0.00114817, throughput 9.44067K wps | |
[Epoch 132 Batch 150/162] avg loss 0.00101591, throughput 9.5173K wps | |
Begin Testing... | |
[Epoch 132] train avg loss 0.00113164, dev acc 0.9311, dev avg loss 0.199261, throughput 9.4615K wps | |
[Epoch 133 Batch 30/162] avg loss 0.0010862, throughput 9.70623K wps | |
[Epoch 133 Batch 60/162] avg loss 0.00109126, throughput 9.31304K wps | |
[Epoch 133 Batch 90/162] avg loss 0.00102857, throughput 9.47679K wps | |
[Epoch 133 Batch 120/162] avg loss 0.0010486, throughput 9.32354K wps | |
[Epoch 133 Batch 150/162] avg loss 0.00107382, throughput 9.23628K wps | |
Begin Testing... | |
[Epoch 133] train avg loss 0.00108574, dev acc 0.9322, dev avg loss 0.199188, throughput 9.41648K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 134 Batch 30/162] avg loss 0.00115467, throughput 9.47643K wps | |
[Epoch 134 Batch 60/162] avg loss 0.00110443, throughput 9.4626K wps | |
[Epoch 134 Batch 90/162] avg loss 0.00110047, throughput 9.21819K wps | |
[Epoch 134 Batch 120/162] avg loss 0.00104304, throughput 9.3083K wps | |
[Epoch 134 Batch 150/162] avg loss 0.00117293, throughput 9.41173K wps | |
Begin Testing... | |
[Epoch 134] train avg loss 0.00110549, dev acc 0.9322, dev avg loss 0.198988, throughput 9.36432K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 135 Batch 30/162] avg loss 0.000991745, throughput 9.65294K wps | |
[Epoch 135 Batch 60/162] avg loss 0.0010149, throughput 9.22678K wps | |
[Epoch 135 Batch 90/162] avg loss 0.00119886, throughput 9.41089K wps | |
[Epoch 135 Batch 120/162] avg loss 0.00118241, throughput 9.22216K wps | |
[Epoch 135 Batch 150/162] avg loss 0.00113847, throughput 9.36905K wps | |
Begin Testing... | |
[Epoch 135] train avg loss 0.0011135, dev acc 0.9300, dev avg loss 0.198867, throughput 9.36878K wps | |
[Epoch 136 Batch 30/162] avg loss 0.00108496, throughput 9.4102K wps | |
[Epoch 136 Batch 60/162] avg loss 0.00113656, throughput 9.34808K wps | |
[Epoch 136 Batch 90/162] avg loss 0.00104562, throughput 9.44032K wps | |
[Epoch 136 Batch 120/162] avg loss 0.000964258, throughput 9.44368K wps | |
[Epoch 136 Batch 150/162] avg loss 0.00103422, throughput 9.45424K wps | |
Begin Testing... | |
[Epoch 136] train avg loss 0.00105575, dev acc 0.9311, dev avg loss 0.198962, throughput 9.41019K wps | |
[Epoch 137 Batch 30/162] avg loss 0.000985504, throughput 9.58277K wps | |
[Epoch 137 Batch 60/162] avg loss 0.00108232, throughput 9.46336K wps | |
[Epoch 137 Batch 90/162] avg loss 0.00111065, throughput 9.5478K wps | |
[Epoch 137 Batch 120/162] avg loss 0.000938915, throughput 9.51131K wps | |
[Epoch 137 Batch 150/162] avg loss 0.000987049, throughput 9.42027K wps | |
Begin Testing... | |
[Epoch 137] train avg loss 0.00104075, dev acc 0.9300, dev avg loss 0.198662, throughput 9.49439K wps | |
[Epoch 138 Batch 30/162] avg loss 0.000983391, throughput 9.45691K wps | |
[Epoch 138 Batch 60/162] avg loss 0.00104918, throughput 9.3775K wps | |
[Epoch 138 Batch 90/162] avg loss 0.00114148, throughput 9.29709K wps | |
[Epoch 138 Batch 120/162] avg loss 0.00105716, throughput 9.47397K wps | |
[Epoch 138 Batch 150/162] avg loss 0.000948318, throughput 9.33785K wps | |
Begin Testing... | |
[Epoch 138] train avg loss 0.00103061, dev acc 0.9322, dev avg loss 0.198622, throughput 9.37361K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 139 Batch 30/162] avg loss 0.00101436, throughput 9.4718K wps | |
[Epoch 139 Batch 60/162] avg loss 0.00108273, throughput 9.42683K wps | |
[Epoch 139 Batch 90/162] avg loss 0.00106279, throughput 9.25619K wps | |
[Epoch 139 Batch 120/162] avg loss 0.0011005, throughput 9.53493K wps | |
[Epoch 139 Batch 150/162] avg loss 0.00114215, throughput 9.4667K wps | |
Begin Testing... | |
[Epoch 139] train avg loss 0.00107766, dev acc 0.9267, dev avg loss 0.199262, throughput 9.43912K wps | |
[Epoch 140 Batch 30/162] avg loss 0.00115548, throughput 9.67493K wps | |
[Epoch 140 Batch 60/162] avg loss 0.000960541, throughput 9.40587K wps | |
[Epoch 140 Batch 90/162] avg loss 0.0010266, throughput 9.3468K wps | |
[Epoch 140 Batch 120/162] avg loss 0.00101439, throughput 9.3208K wps | |
[Epoch 140 Batch 150/162] avg loss 0.000913721, throughput 9.59213K wps | |
Begin Testing... | |
[Epoch 140] train avg loss 0.00102539, dev acc 0.9278, dev avg loss 0.199041, throughput 9.46759K wps | |
[Epoch 141 Batch 30/162] avg loss 0.00104014, throughput 9.61873K wps | |
[Epoch 141 Batch 60/162] avg loss 0.00105698, throughput 9.27773K wps | |
[Epoch 141 Batch 90/162] avg loss 0.00100694, throughput 9.45673K wps | |
[Epoch 141 Batch 120/162] avg loss 0.00105628, throughput 9.39983K wps | |
[Epoch 141 Batch 150/162] avg loss 0.00104997, throughput 9.34983K wps | |
Begin Testing... | |
[Epoch 141] train avg loss 0.00103387, dev acc 0.9322, dev avg loss 0.199055, throughput 9.40671K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 142 Batch 30/162] avg loss 0.000887658, throughput 9.55299K wps | |
[Epoch 142 Batch 60/162] avg loss 0.00104784, throughput 9.33178K wps | |
[Epoch 142 Batch 90/162] avg loss 0.000964878, throughput 9.32028K wps | |
[Epoch 142 Batch 120/162] avg loss 0.00104766, throughput 9.34455K wps | |
[Epoch 142 Batch 150/162] avg loss 0.00104082, throughput 9.29959K wps | |
Begin Testing... | |
[Epoch 142] train avg loss 0.000993459, dev acc 0.9344, dev avg loss 0.198146, throughput 9.36844K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 143 Batch 30/162] avg loss 0.000976041, throughput 9.57822K wps | |
[Epoch 143 Batch 60/162] avg loss 0.000971394, throughput 9.26075K wps | |
[Epoch 143 Batch 90/162] avg loss 0.000987658, throughput 9.34284K wps | |
[Epoch 143 Batch 120/162] avg loss 0.00105291, throughput 9.36807K wps | |
[Epoch 143 Batch 150/162] avg loss 0.000950739, throughput 9.50905K wps | |
Begin Testing... | |
[Epoch 143] train avg loss 0.000973638, dev acc 0.9267, dev avg loss 0.199737, throughput 9.41619K wps | |
[Epoch 144 Batch 30/162] avg loss 0.000952713, throughput 9.41633K wps | |
[Epoch 144 Batch 60/162] avg loss 0.00102474, throughput 9.38271K wps | |
[Epoch 144 Batch 90/162] avg loss 0.000948891, throughput 9.46628K wps | |
[Epoch 144 Batch 120/162] avg loss 0.000970719, throughput 9.53042K wps | |
[Epoch 144 Batch 150/162] avg loss 0.0010117, throughput 9.55971K wps | |
Begin Testing... | |
[Epoch 144] train avg loss 0.000975485, dev acc 0.9333, dev avg loss 0.198338, throughput 9.47488K wps | |
[Epoch 145 Batch 30/162] avg loss 0.000999671, throughput 9.57715K wps | |
[Epoch 145 Batch 60/162] avg loss 0.0010243, throughput 9.52931K wps | |
[Epoch 145 Batch 90/162] avg loss 0.000994705, throughput 9.38481K wps | |
[Epoch 145 Batch 120/162] avg loss 0.000852265, throughput 9.36572K wps | |
[Epoch 145 Batch 150/162] avg loss 0.000894153, throughput 9.31787K wps | |
Begin Testing... | |
[Epoch 145] train avg loss 0.000959826, dev acc 0.9322, dev avg loss 0.197951, throughput 9.42148K wps | |
[Epoch 146 Batch 30/162] avg loss 0.0009213, throughput 9.51997K wps | |
[Epoch 146 Batch 60/162] avg loss 0.000996457, throughput 9.52512K wps | |
[Epoch 146 Batch 90/162] avg loss 0.000974005, throughput 9.33838K wps | |
[Epoch 146 Batch 120/162] avg loss 0.000958056, throughput 9.38998K wps | |
[Epoch 146 Batch 150/162] avg loss 0.000898112, throughput 9.30961K wps | |
Begin Testing... | |
[Epoch 146] train avg loss 0.000944362, dev acc 0.9311, dev avg loss 0.198404, throughput 9.40715K wps | |
[Epoch 147 Batch 30/162] avg loss 0.000833541, throughput 9.51444K wps | |
[Epoch 147 Batch 60/162] avg loss 0.000823794, throughput 9.5389K wps | |
[Epoch 147 Batch 90/162] avg loss 0.00104564, throughput 9.4814K wps | |
[Epoch 147 Batch 120/162] avg loss 0.00101268, throughput 9.45189K wps | |
[Epoch 147 Batch 150/162] avg loss 0.00108087, throughput 9.47941K wps | |
Begin Testing... | |
[Epoch 147] train avg loss 0.000962311, dev acc 0.9322, dev avg loss 0.198059, throughput 9.48631K wps | |
[Epoch 148 Batch 30/162] avg loss 0.000961202, throughput 9.59616K wps | |
[Epoch 148 Batch 60/162] avg loss 0.000909715, throughput 9.2111K wps | |
[Epoch 148 Batch 90/162] avg loss 0.00097298, throughput 9.42474K wps | |
[Epoch 148 Batch 120/162] avg loss 0.000981802, throughput 9.46109K wps | |
[Epoch 148 Batch 150/162] avg loss 0.000926924, throughput 9.42846K wps | |
Begin Testing... | |
[Epoch 148] train avg loss 0.000952834, dev acc 0.9300, dev avg loss 0.198461, throughput 9.42231K wps | |
[Epoch 149 Batch 30/162] avg loss 0.00094544, throughput 9.59141K wps | |
[Epoch 149 Batch 60/162] avg loss 0.000798458, throughput 9.39647K wps | |
[Epoch 149 Batch 90/162] avg loss 0.000941157, throughput 9.48778K wps | |
[Epoch 149 Batch 120/162] avg loss 0.00103026, throughput 9.37045K wps | |
[Epoch 149 Batch 150/162] avg loss 0.00100158, throughput 9.4224K wps | |
Begin Testing... | |
[Epoch 149] train avg loss 0.00095037, dev acc 0.9322, dev avg loss 0.199338, throughput 9.45777K wps | |
[Epoch 150 Batch 30/162] avg loss 0.000941625, throughput 9.73412K wps | |
[Epoch 150 Batch 60/162] avg loss 0.000918315, throughput 9.47646K wps | |
[Epoch 150 Batch 90/162] avg loss 0.000879122, throughput 9.38709K wps | |
[Epoch 150 Batch 120/162] avg loss 0.000972516, throughput 9.3787K wps | |
[Epoch 150 Batch 150/162] avg loss 0.00093079, throughput 9.26623K wps | |
Begin Testing... | |
[Epoch 150] train avg loss 0.000932415, dev acc 0.9322, dev avg loss 0.198455, throughput 9.44923K wps | |
[Epoch 151 Batch 30/162] avg loss 0.00101544, throughput 9.52437K wps | |
[Epoch 151 Batch 60/162] avg loss 0.000847919, throughput 9.28802K wps | |
[Epoch 151 Batch 90/162] avg loss 0.000894011, throughput 9.41631K wps | |
[Epoch 151 Batch 120/162] avg loss 0.000859629, throughput 9.24288K wps | |
[Epoch 151 Batch 150/162] avg loss 0.000998474, throughput 9.22177K wps | |
Begin Testing... | |
[Epoch 151] train avg loss 0.000921964, dev acc 0.9289, dev avg loss 0.198911, throughput 9.33929K wps | |
[Epoch 152 Batch 30/162] avg loss 0.000965267, throughput 9.56187K wps | |
[Epoch 152 Batch 60/162] avg loss 0.000948554, throughput 9.28427K wps | |
[Epoch 152 Batch 90/162] avg loss 0.000796564, throughput 9.40172K wps | |
[Epoch 152 Batch 120/162] avg loss 0.000977773, throughput 9.28411K wps | |
[Epoch 152 Batch 150/162] avg loss 0.00079192, throughput 9.35199K wps | |
Begin Testing... | |
[Epoch 152] train avg loss 0.000899693, dev acc 0.9344, dev avg loss 0.198833, throughput 9.37892K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 153 Batch 30/162] avg loss 0.000805971, throughput 9.44939K wps | |
[Epoch 153 Batch 60/162] avg loss 0.000873838, throughput 9.18985K wps | |
[Epoch 153 Batch 90/162] avg loss 0.000994105, throughput 9.57794K wps | |
[Epoch 153 Batch 120/162] avg loss 0.000861975, throughput 9.38475K wps | |
[Epoch 153 Batch 150/162] avg loss 0.000864721, throughput 9.4132K wps | |
Begin Testing... | |
[Epoch 153] train avg loss 0.000902006, dev acc 0.9289, dev avg loss 0.199371, throughput 9.4132K wps | |
[Epoch 154 Batch 30/162] avg loss 0.000854582, throughput 9.56797K wps | |
[Epoch 154 Batch 60/162] avg loss 0.000876126, throughput 9.38431K wps | |
[Epoch 154 Batch 90/162] avg loss 0.000881669, throughput 9.35013K wps | |
[Epoch 154 Batch 120/162] avg loss 0.000845758, throughput 9.40708K wps | |
[Epoch 154 Batch 150/162] avg loss 0.000901686, throughput 9.38583K wps | |
Begin Testing... | |
[Epoch 154] train avg loss 0.000883258, dev acc 0.9311, dev avg loss 0.198855, throughput 9.4104K wps | |
[Epoch 155 Batch 30/162] avg loss 0.000877631, throughput 9.62206K wps | |
[Epoch 155 Batch 60/162] avg loss 0.00094612, throughput 9.55996K wps | |
[Epoch 155 Batch 90/162] avg loss 0.000838889, throughput 9.31259K wps | |
[Epoch 155 Batch 120/162] avg loss 0.000806639, throughput 9.41421K wps | |
[Epoch 155 Batch 150/162] avg loss 0.000920395, throughput 9.22436K wps | |
Begin Testing... | |
[Epoch 155] train avg loss 0.000877413, dev acc 0.9344, dev avg loss 0.199072, throughput 9.4179K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 156 Batch 30/162] avg loss 0.000917375, throughput 9.66404K wps | |
[Epoch 156 Batch 60/162] avg loss 0.000891523, throughput 9.58484K wps | |
[Epoch 156 Batch 90/162] avg loss 0.000843721, throughput 9.5537K wps | |
[Epoch 156 Batch 120/162] avg loss 0.000926243, throughput 9.5482K wps | |
[Epoch 156 Batch 150/162] avg loss 0.000885359, throughput 9.49341K wps | |
Begin Testing... | |
[Epoch 156] train avg loss 0.000888057, dev acc 0.9300, dev avg loss 0.199528, throughput 9.54813K wps | |
[Epoch 157 Batch 30/162] avg loss 0.000883461, throughput 9.73445K wps | |
[Epoch 157 Batch 60/162] avg loss 0.00081473, throughput 9.37778K wps | |
[Epoch 157 Batch 90/162] avg loss 0.000874448, throughput 9.42499K wps | |
[Epoch 157 Batch 120/162] avg loss 0.000875227, throughput 9.26044K wps | |
[Epoch 157 Batch 150/162] avg loss 0.000830335, throughput 9.40601K wps | |
Begin Testing... | |
[Epoch 157] train avg loss 0.000852687, dev acc 0.9311, dev avg loss 0.199195, throughput 9.43721K wps | |
[Epoch 158 Batch 30/162] avg loss 0.00079989, throughput 9.70167K wps | |
[Epoch 158 Batch 60/162] avg loss 0.000988619, throughput 9.3357K wps | |
[Epoch 158 Batch 90/162] avg loss 0.000847354, throughput 9.32659K wps | |
[Epoch 158 Batch 120/162] avg loss 0.000791548, throughput 9.37937K wps | |
[Epoch 158 Batch 150/162] avg loss 0.000873491, throughput 9.41736K wps | |
Begin Testing... | |
[Epoch 158] train avg loss 0.000855033, dev acc 0.9311, dev avg loss 0.19911, throughput 9.42603K wps | |
[Epoch 159 Batch 30/162] avg loss 0.00077084, throughput 9.67629K wps | |
[Epoch 159 Batch 60/162] avg loss 0.000712891, throughput 9.58881K wps | |
[Epoch 159 Batch 90/162] avg loss 0.000753351, throughput 9.41029K wps | |
[Epoch 159 Batch 120/162] avg loss 0.000934713, throughput 9.5363K wps | |
[Epoch 159 Batch 150/162] avg loss 0.000829498, throughput 9.24886K wps | |
Begin Testing... | |
[Epoch 159] train avg loss 0.000808835, dev acc 0.9311, dev avg loss 0.19963, throughput 9.47987K wps | |
[Epoch 160 Batch 30/162] avg loss 0.000808971, throughput 9.51016K wps | |
[Epoch 160 Batch 60/162] avg loss 0.000879163, throughput 9.48295K wps | |
[Epoch 160 Batch 90/162] avg loss 0.000803701, throughput 9.26398K wps | |
[Epoch 160 Batch 120/162] avg loss 0.000860717, throughput 9.23514K wps | |
[Epoch 160 Batch 150/162] avg loss 0.000750216, throughput 9.42975K wps | |
Begin Testing... | |
[Epoch 160] train avg loss 0.000829286, dev acc 0.9311, dev avg loss 0.199735, throughput 9.3919K wps | |
[Epoch 161 Batch 30/162] avg loss 0.000974678, throughput 9.54246K wps | |
[Epoch 161 Batch 60/162] avg loss 0.000845298, throughput 9.41976K wps | |
[Epoch 161 Batch 90/162] avg loss 0.000808872, throughput 9.37979K wps | |
[Epoch 161 Batch 120/162] avg loss 0.000905347, throughput 9.24486K wps | |
[Epoch 161 Batch 150/162] avg loss 0.000705771, throughput 9.40624K wps | |
Begin Testing... | |
[Epoch 161] train avg loss 0.000844436, dev acc 0.9289, dev avg loss 0.199593, throughput 9.41192K wps | |
[Epoch 162 Batch 30/162] avg loss 0.000703221, throughput 9.69891K wps | |
[Epoch 162 Batch 60/162] avg loss 0.000955533, throughput 9.26135K wps | |
[Epoch 162 Batch 90/162] avg loss 0.000947767, throughput 9.27048K wps | |
[Epoch 162 Batch 120/162] avg loss 0.000730136, throughput 9.36203K wps | |
[Epoch 162 Batch 150/162] avg loss 0.000868703, throughput 9.51287K wps | |
Begin Testing... | |
[Epoch 162] train avg loss 0.000837811, dev acc 0.9289, dev avg loss 0.201457, throughput 9.43234K wps | |
[Epoch 163 Batch 30/162] avg loss 0.000796378, throughput 9.63017K wps | |
[Epoch 163 Batch 60/162] avg loss 0.000855796, throughput 9.3025K wps | |
[Epoch 163 Batch 90/162] avg loss 0.000751803, throughput 9.40544K wps | |
[Epoch 163 Batch 120/162] avg loss 0.000802561, throughput 9.47464K wps | |
[Epoch 163 Batch 150/162] avg loss 0.000814194, throughput 9.3446K wps | |
Begin Testing... | |
[Epoch 163] train avg loss 0.000806808, dev acc 0.9311, dev avg loss 0.199386, throughput 9.4354K wps | |
[Epoch 164 Batch 30/162] avg loss 0.000694118, throughput 9.72815K wps | |
[Epoch 164 Batch 60/162] avg loss 0.000772537, throughput 9.35489K wps | |
[Epoch 164 Batch 90/162] avg loss 0.000798749, throughput 9.45838K wps | |
[Epoch 164 Batch 120/162] avg loss 0.000732335, throughput 9.42304K wps | |
[Epoch 164 Batch 150/162] avg loss 0.000873458, throughput 9.4681K wps | |
Begin Testing... | |
[Epoch 164] train avg loss 0.000774546, dev acc 0.9322, dev avg loss 0.199672, throughput 9.47183K wps | |
[Epoch 165 Batch 30/162] avg loss 0.000755503, throughput 9.50419K wps | |
[Epoch 165 Batch 60/162] avg loss 0.000855095, throughput 9.48564K wps | |
[Epoch 165 Batch 90/162] avg loss 0.000718117, throughput 9.33272K wps | |
[Epoch 165 Batch 120/162] avg loss 0.000749274, throughput 9.50652K wps | |
[Epoch 165 Batch 150/162] avg loss 0.000889802, throughput 9.43312K wps | |
Begin Testing... | |
[Epoch 165] train avg loss 0.000797565, dev acc 0.9322, dev avg loss 0.199214, throughput 9.43677K wps | |
[Epoch 166 Batch 30/162] avg loss 0.000786124, throughput 9.50313K wps | |
[Epoch 166 Batch 60/162] avg loss 0.000757047, throughput 9.40146K wps | |
[Epoch 166 Batch 90/162] avg loss 0.000846746, throughput 9.37472K wps | |
[Epoch 166 Batch 120/162] avg loss 0.000822213, throughput 9.53916K wps | |
[Epoch 166 Batch 150/162] avg loss 0.000710454, throughput 9.288K wps | |
Begin Testing... | |
[Epoch 166] train avg loss 0.000792623, dev acc 0.9311, dev avg loss 0.19945, throughput 9.41592K wps | |
[Epoch 167 Batch 30/162] avg loss 0.00078465, throughput 9.7727K wps | |
[Epoch 167 Batch 60/162] avg loss 0.000749043, throughput 9.39833K wps | |
[Epoch 167 Batch 90/162] avg loss 0.000751108, throughput 9.28201K wps | |
[Epoch 167 Batch 120/162] avg loss 0.000842619, throughput 9.51506K wps | |
[Epoch 167 Batch 150/162] avg loss 0.000828416, throughput 9.48429K wps | |
Begin Testing... | |
[Epoch 167] train avg loss 0.000775596, dev acc 0.9344, dev avg loss 0.199765, throughput 9.4774K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 168 Batch 30/162] avg loss 0.000750158, throughput 9.48357K wps | |
[Epoch 168 Batch 60/162] avg loss 0.000751583, throughput 9.45087K wps | |
[Epoch 168 Batch 90/162] avg loss 0.0008342, throughput 9.40624K wps | |
[Epoch 168 Batch 120/162] avg loss 0.000724395, throughput 9.41174K wps | |
[Epoch 168 Batch 150/162] avg loss 0.000741043, throughput 9.29438K wps | |
Begin Testing... | |
[Epoch 168] train avg loss 0.000766779, dev acc 0.9311, dev avg loss 0.199704, throughput 9.41703K wps | |
[Epoch 169 Batch 30/162] avg loss 0.00074829, throughput 9.50446K wps | |
[Epoch 169 Batch 60/162] avg loss 0.000697833, throughput 9.35994K wps | |
[Epoch 169 Batch 90/162] avg loss 0.000785212, throughput 9.47869K wps | |
[Epoch 169 Batch 120/162] avg loss 0.000862278, throughput 9.40258K wps | |
[Epoch 169 Batch 150/162] avg loss 0.000765823, throughput 9.44211K wps | |
Begin Testing... | |
[Epoch 169] train avg loss 0.000781461, dev acc 0.9289, dev avg loss 0.200178, throughput 9.42419K wps | |
[Epoch 170 Batch 30/162] avg loss 0.000665183, throughput 9.59721K wps | |
[Epoch 170 Batch 60/162] avg loss 0.000721135, throughput 9.40824K wps | |
[Epoch 170 Batch 90/162] avg loss 0.0007045, throughput 9.32782K wps | |
[Epoch 170 Batch 120/162] avg loss 0.000762113, throughput 9.30466K wps | |
[Epoch 170 Batch 150/162] avg loss 0.000850951, throughput 9.24936K wps | |
Begin Testing... | |
[Epoch 170] train avg loss 0.000734841, dev acc 0.9333, dev avg loss 0.200847, throughput 9.36183K wps | |
[Epoch 171 Batch 30/162] avg loss 0.000838534, throughput 9.62985K wps | |
[Epoch 171 Batch 60/162] avg loss 0.000800776, throughput 9.42853K wps | |
[Epoch 171 Batch 90/162] avg loss 0.000743843, throughput 9.41032K wps | |
[Epoch 171 Batch 120/162] avg loss 0.000806597, throughput 9.29138K wps | |
[Epoch 171 Batch 150/162] avg loss 0.00069783, throughput 9.50147K wps | |
Begin Testing... | |
[Epoch 171] train avg loss 0.000776812, dev acc 0.9300, dev avg loss 0.200217, throughput 9.43441K wps | |
[Epoch 172 Batch 30/162] avg loss 0.000723996, throughput 9.50593K wps | |
[Epoch 172 Batch 60/162] avg loss 0.000680573, throughput 9.47756K wps | |
[Epoch 172 Batch 90/162] avg loss 0.000735702, throughput 9.35396K wps | |
[Epoch 172 Batch 120/162] avg loss 0.000859069, throughput 9.27223K wps | |
[Epoch 172 Batch 150/162] avg loss 0.000791226, throughput 9.30891K wps | |
Begin Testing... | |
[Epoch 172] train avg loss 0.000755193, dev acc 0.9311, dev avg loss 0.199716, throughput 9.39513K wps | |
[Epoch 173 Batch 30/162] avg loss 0.000786021, throughput 9.63078K wps | |
[Epoch 173 Batch 60/162] avg loss 0.000690998, throughput 9.32488K wps | |
[Epoch 173 Batch 90/162] avg loss 0.000712721, throughput 9.39249K wps | |
[Epoch 173 Batch 120/162] avg loss 0.000668333, throughput 9.41696K wps | |
[Epoch 173 Batch 150/162] avg loss 0.000770785, throughput 9.43979K wps | |
Begin Testing... | |
[Epoch 173] train avg loss 0.000714309, dev acc 0.9311, dev avg loss 0.200101, throughput 9.41934K wps | |
[Epoch 174 Batch 30/162] avg loss 0.000684969, throughput 9.76961K wps | |
[Epoch 174 Batch 60/162] avg loss 0.000735002, throughput 9.3513K wps | |
[Epoch 174 Batch 90/162] avg loss 0.000772401, throughput 9.34882K wps | |
[Epoch 174 Batch 120/162] avg loss 0.000708095, throughput 9.59746K wps | |
[Epoch 174 Batch 150/162] avg loss 0.000749415, throughput 9.32475K wps | |
Begin Testing... | |
[Epoch 174] train avg loss 0.000722941, dev acc 0.9333, dev avg loss 0.200322, throughput 9.4791K wps | |
[Epoch 175 Batch 30/162] avg loss 0.000776178, throughput 9.55876K wps | |
[Epoch 175 Batch 60/162] avg loss 0.000646456, throughput 9.36128K wps | |
[Epoch 175 Batch 90/162] avg loss 0.00077226, throughput 9.23862K wps | |
[Epoch 175 Batch 120/162] avg loss 0.000751662, throughput 9.40855K wps | |
[Epoch 175 Batch 150/162] avg loss 0.000762339, throughput 9.50439K wps | |
Begin Testing... | |
[Epoch 175] train avg loss 0.000741196, dev acc 0.9311, dev avg loss 0.200233, throughput 9.40618K wps | |
[Epoch 176 Batch 30/162] avg loss 0.000718871, throughput 9.54854K wps | |
[Epoch 176 Batch 60/162] avg loss 0.000761122, throughput 9.29165K wps | |
[Epoch 176 Batch 90/162] avg loss 0.000761968, throughput 9.51773K wps | |
[Epoch 176 Batch 120/162] avg loss 0.000767324, throughput 9.45121K wps | |
[Epoch 176 Batch 150/162] avg loss 0.000621879, throughput 9.37256K wps | |
Begin Testing... | |
[Epoch 176] train avg loss 0.000716771, dev acc 0.9322, dev avg loss 0.199961, throughput 9.42582K wps | |
[Epoch 177 Batch 30/162] avg loss 0.000749861, throughput 9.62679K wps | |
[Epoch 177 Batch 60/162] avg loss 0.00070125, throughput 9.32116K wps | |
[Epoch 177 Batch 90/162] avg loss 0.000648393, throughput 9.36197K wps | |
[Epoch 177 Batch 120/162] avg loss 0.000711747, throughput 9.3052K wps | |
[Epoch 177 Batch 150/162] avg loss 0.000782345, throughput 9.31305K wps | |
Begin Testing... | |
[Epoch 177] train avg loss 0.000716524, dev acc 0.9300, dev avg loss 0.2003, throughput 9.36782K wps | |
[Epoch 178 Batch 30/162] avg loss 0.000694793, throughput 9.58033K wps | |
[Epoch 178 Batch 60/162] avg loss 0.000639935, throughput 9.37018K wps | |
[Epoch 178 Batch 90/162] avg loss 0.000586784, throughput 9.32866K wps | |
[Epoch 178 Batch 120/162] avg loss 0.000634432, throughput 9.41552K wps | |
[Epoch 178 Batch 150/162] avg loss 0.000779367, throughput 9.34433K wps | |
Begin Testing... | |
[Epoch 178] train avg loss 0.000680366, dev acc 0.9322, dev avg loss 0.200603, throughput 9.39735K wps | |
[Epoch 179 Batch 30/162] avg loss 0.000694844, throughput 9.45319K wps | |
[Epoch 179 Batch 60/162] avg loss 0.000702892, throughput 9.41916K wps | |
[Epoch 179 Batch 90/162] avg loss 0.000746897, throughput 9.54811K wps | |
[Epoch 179 Batch 120/162] avg loss 0.000635815, throughput 9.36965K wps | |
[Epoch 179 Batch 150/162] avg loss 0.000724774, throughput 9.27574K wps | |
Begin Testing... | |
[Epoch 179] train avg loss 0.000697136, dev acc 0.9333, dev avg loss 0.200535, throughput 9.42392K wps | |
[Epoch 180 Batch 30/162] avg loss 0.000765003, throughput 9.43349K wps | |
[Epoch 180 Batch 60/162] avg loss 0.00069129, throughput 9.36319K wps | |
[Epoch 180 Batch 90/162] avg loss 0.000677021, throughput 9.31895K wps | |
[Epoch 180 Batch 120/162] avg loss 0.000646422, throughput 9.31929K wps | |
[Epoch 180 Batch 150/162] avg loss 0.000598789, throughput 9.3553K wps | |
Begin Testing... | |
[Epoch 180] train avg loss 0.000685517, dev acc 0.9333, dev avg loss 0.201007, throughput 9.38013K wps | |
[Epoch 181 Batch 30/162] avg loss 0.000629044, throughput 9.54956K wps | |
[Epoch 181 Batch 60/162] avg loss 0.000800723, throughput 9.44998K wps | |
[Epoch 181 Batch 90/162] avg loss 0.000714992, throughput 9.32744K wps | |
[Epoch 181 Batch 120/162] avg loss 0.000725197, throughput 9.30969K wps | |
[Epoch 181 Batch 150/162] avg loss 0.000709321, throughput 9.35682K wps | |
Begin Testing... | |
[Epoch 181] train avg loss 0.000712253, dev acc 0.9333, dev avg loss 0.200674, throughput 9.41087K wps | |
[Epoch 182 Batch 30/162] avg loss 0.000742768, throughput 9.64101K wps | |
[Epoch 182 Batch 60/162] avg loss 0.000596728, throughput 9.33111K wps | |
[Epoch 182 Batch 90/162] avg loss 0.000640098, throughput 9.42153K wps | |
[Epoch 182 Batch 120/162] avg loss 0.000700641, throughput 9.40956K wps | |
[Epoch 182 Batch 150/162] avg loss 0.000668972, throughput 9.48947K wps | |
Begin Testing... | |
[Epoch 182] train avg loss 0.000669023, dev acc 0.9333, dev avg loss 0.200928, throughput 9.46244K wps | |
[Epoch 183 Batch 30/162] avg loss 0.000716692, throughput 9.61282K wps | |
[Epoch 183 Batch 60/162] avg loss 0.000643486, throughput 9.30336K wps | |
[Epoch 183 Batch 90/162] avg loss 0.000788289, throughput 9.30866K wps | |
[Epoch 183 Batch 120/162] avg loss 0.000729306, throughput 9.5337K wps | |
[Epoch 183 Batch 150/162] avg loss 0.000670616, throughput 9.38849K wps | |
Begin Testing... | |
[Epoch 183] train avg loss 0.000703653, dev acc 0.9333, dev avg loss 0.201005, throughput 9.41901K wps | |
[Epoch 184 Batch 30/162] avg loss 0.000672865, throughput 9.57261K wps | |
[Epoch 184 Batch 60/162] avg loss 0.000736718, throughput 9.30365K wps | |
[Epoch 184 Batch 90/162] avg loss 0.000653935, throughput 9.46661K wps | |
[Epoch 184 Batch 120/162] avg loss 0.000689856, throughput 9.49916K wps | |
[Epoch 184 Batch 150/162] avg loss 0.000644164, throughput 9.50608K wps | |
Begin Testing... | |
[Epoch 184] train avg loss 0.000681758, dev acc 0.9333, dev avg loss 0.20135, throughput 9.46569K wps | |
[Epoch 185 Batch 30/162] avg loss 0.000623803, throughput 9.54737K wps | |
[Epoch 185 Batch 60/162] avg loss 0.000700642, throughput 9.57664K wps | |
[Epoch 185 Batch 90/162] avg loss 0.000708221, throughput 9.4525K wps | |
[Epoch 185 Batch 120/162] avg loss 0.000657087, throughput 9.38965K wps | |
[Epoch 185 Batch 150/162] avg loss 0.000690085, throughput 9.19199K wps | |
Begin Testing... | |
[Epoch 185] train avg loss 0.000683682, dev acc 0.9322, dev avg loss 0.201315, throughput 9.43165K wps | |
[Epoch 186 Batch 30/162] avg loss 0.000609656, throughput 9.55662K wps | |
[Epoch 186 Batch 60/162] avg loss 0.000669506, throughput 9.30114K wps | |
[Epoch 186 Batch 90/162] avg loss 0.000723473, throughput 9.32232K wps | |
[Epoch 186 Batch 120/162] avg loss 0.000599758, throughput 9.54169K wps | |
[Epoch 186 Batch 150/162] avg loss 0.000777129, throughput 9.25565K wps | |
Begin Testing... | |
[Epoch 186] train avg loss 0.00066921, dev acc 0.9333, dev avg loss 0.20093, throughput 9.39597K wps | |
[Epoch 187 Batch 30/162] avg loss 0.000697954, throughput 9.56033K wps | |
[Epoch 187 Batch 60/162] avg loss 0.000675127, throughput 9.35616K wps | |
[Epoch 187 Batch 90/162] avg loss 0.000613366, throughput 9.27318K wps | |
[Epoch 187 Batch 120/162] avg loss 0.000621279, throughput 9.39594K wps | |
[Epoch 187 Batch 150/162] avg loss 0.000611037, throughput 9.32422K wps | |
Begin Testing... | |
[Epoch 187] train avg loss 0.000645539, dev acc 0.9344, dev avg loss 0.201274, throughput 9.38123K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 188 Batch 30/162] avg loss 0.000683857, throughput 9.56185K wps | |
[Epoch 188 Batch 60/162] avg loss 0.000638752, throughput 9.33461K wps | |
[Epoch 188 Batch 90/162] avg loss 0.000626189, throughput 9.38546K wps | |
[Epoch 188 Batch 120/162] avg loss 0.000655623, throughput 9.36592K wps | |
[Epoch 188 Batch 150/162] avg loss 0.000704469, throughput 9.31537K wps | |
Begin Testing... | |
[Epoch 188] train avg loss 0.000660655, dev acc 0.9344, dev avg loss 0.202151, throughput 9.37124K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 189 Batch 30/162] avg loss 0.000737478, throughput 9.59331K wps | |
[Epoch 189 Batch 60/162] avg loss 0.000649084, throughput 9.3144K wps | |
[Epoch 189 Batch 90/162] avg loss 0.000635279, throughput 9.44601K wps | |
[Epoch 189 Batch 120/162] avg loss 0.000621244, throughput 9.50741K wps | |
[Epoch 189 Batch 150/162] avg loss 0.000643737, throughput 9.30601K wps | |
Begin Testing... | |
[Epoch 189] train avg loss 0.000656525, dev acc 0.9333, dev avg loss 0.20168, throughput 9.42289K wps | |
[Epoch 190 Batch 30/162] avg loss 0.00060197, throughput 9.54444K wps | |
[Epoch 190 Batch 60/162] avg loss 0.000594402, throughput 9.55803K wps | |
[Epoch 190 Batch 90/162] avg loss 0.000541556, throughput 9.3759K wps | |
[Epoch 190 Batch 120/162] avg loss 0.000611504, throughput 9.29774K wps | |
[Epoch 190 Batch 150/162] avg loss 0.0006616, throughput 9.23232K wps | |
Begin Testing... | |
[Epoch 190] train avg loss 0.000611076, dev acc 0.9322, dev avg loss 0.202138, throughput 9.3941K wps | |
[Epoch 191 Batch 30/162] avg loss 0.000583513, throughput 9.54981K wps | |
[Epoch 191 Batch 60/162] avg loss 0.000607934, throughput 9.38146K wps | |
[Epoch 191 Batch 90/162] avg loss 0.000771854, throughput 9.33102K wps | |
[Epoch 191 Batch 120/162] avg loss 0.000666878, throughput 9.27158K wps | |
[Epoch 191 Batch 150/162] avg loss 0.000567977, throughput 9.40895K wps | |
Begin Testing... | |
[Epoch 191] train avg loss 0.000644209, dev acc 0.9333, dev avg loss 0.20221, throughput 9.39202K wps | |
[Epoch 192 Batch 30/162] avg loss 0.000730085, throughput 9.49941K wps | |
[Epoch 192 Batch 60/162] avg loss 0.000583115, throughput 9.39319K wps | |
[Epoch 192 Batch 90/162] avg loss 0.000663236, throughput 9.45732K wps | |
[Epoch 192 Batch 120/162] avg loss 0.000608084, throughput 9.51189K wps | |
[Epoch 192 Batch 150/162] avg loss 0.000570181, throughput 9.56225K wps | |
Begin Testing... | |
[Epoch 192] train avg loss 0.000618323, dev acc 0.9322, dev avg loss 0.202221, throughput 9.45856K wps | |
[Epoch 193 Batch 30/162] avg loss 0.000572176, throughput 9.60178K wps | |
[Epoch 193 Batch 60/162] avg loss 0.000565171, throughput 9.22527K wps | |
[Epoch 193 Batch 90/162] avg loss 0.000631348, throughput 9.47041K wps | |
[Epoch 193 Batch 120/162] avg loss 0.000595776, throughput 9.33119K wps | |
[Epoch 193 Batch 150/162] avg loss 0.000636903, throughput 9.37086K wps | |
Begin Testing... | |
[Epoch 193] train avg loss 0.000610488, dev acc 0.9344, dev avg loss 0.202612, throughput 9.40767K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 194 Batch 30/162] avg loss 0.000685029, throughput 9.54745K wps | |
[Epoch 194 Batch 60/162] avg loss 0.00062756, throughput 9.2748K wps | |
[Epoch 194 Batch 90/162] avg loss 0.000635914, throughput 9.31694K wps | |
[Epoch 194 Batch 120/162] avg loss 0.000535939, throughput 9.36308K wps | |
[Epoch 194 Batch 150/162] avg loss 0.000592485, throughput 9.24411K wps | |
Begin Testing... | |
[Epoch 194] train avg loss 0.000609878, dev acc 0.9300, dev avg loss 0.202956, throughput 9.34897K wps | |
[Epoch 195 Batch 30/162] avg loss 0.000644012, throughput 9.55858K wps | |
[Epoch 195 Batch 60/162] avg loss 0.000567187, throughput 9.33549K wps | |
[Epoch 195 Batch 90/162] avg loss 0.00067197, throughput 9.34275K wps | |
[Epoch 195 Batch 120/162] avg loss 0.000632038, throughput 9.24271K wps | |
[Epoch 195 Batch 150/162] avg loss 0.000684886, throughput 9.475K wps | |
Begin Testing... | |
[Epoch 195] train avg loss 0.000636783, dev acc 0.9322, dev avg loss 0.202272, throughput 9.38602K wps | |
[Epoch 196 Batch 30/162] avg loss 0.000696987, throughput 9.51412K wps | |
[Epoch 196 Batch 60/162] avg loss 0.000623682, throughput 9.47651K wps | |
[Epoch 196 Batch 90/162] avg loss 0.000584644, throughput 9.39549K wps | |
[Epoch 196 Batch 120/162] avg loss 0.000528887, throughput 9.49222K wps | |
[Epoch 196 Batch 150/162] avg loss 0.0005824, throughput 9.44647K wps | |
Begin Testing... | |
[Epoch 196] train avg loss 0.000590905, dev acc 0.9333, dev avg loss 0.202797, throughput 9.45447K wps | |
[Epoch 197 Batch 30/162] avg loss 0.000651228, throughput 9.63591K wps | |
[Epoch 197 Batch 60/162] avg loss 0.000498823, throughput 9.3449K wps | |
[Epoch 197 Batch 90/162] avg loss 0.000651713, throughput 9.45516K wps | |
[Epoch 197 Batch 120/162] avg loss 0.000642572, throughput 9.35166K wps | |
[Epoch 197 Batch 150/162] avg loss 0.000611978, throughput 9.3093K wps | |
Begin Testing... | |
[Epoch 197] train avg loss 0.000606933, dev acc 0.9322, dev avg loss 0.202357, throughput 9.40384K wps | |
[Epoch 198 Batch 30/162] avg loss 0.000587599, throughput 9.59051K wps | |
[Epoch 198 Batch 60/162] avg loss 0.000625381, throughput 9.40191K wps | |
[Epoch 198 Batch 90/162] avg loss 0.000543357, throughput 9.46783K wps | |
[Epoch 198 Batch 120/162] avg loss 0.000748192, throughput 9.31532K wps | |
[Epoch 198 Batch 150/162] avg loss 0.000580954, throughput 9.30742K wps | |
Begin Testing... | |
[Epoch 198] train avg loss 0.000609185, dev acc 0.9344, dev avg loss 0.202302, throughput 9.4086K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 199 Batch 30/162] avg loss 0.00056139, throughput 9.52406K wps | |
[Epoch 199 Batch 60/162] avg loss 0.000584519, throughput 9.24864K wps | |
[Epoch 199 Batch 90/162] avg loss 0.000607698, throughput 9.32767K wps | |
[Epoch 199 Batch 120/162] avg loss 0.000543109, throughput 9.31358K wps | |
[Epoch 199 Batch 150/162] avg loss 0.00053073, throughput 9.47223K wps | |
Begin Testing... | |
[Epoch 199] train avg loss 0.000561443, dev acc 0.9311, dev avg loss 0.202647, throughput 9.37277K wps | |
Test loss 0.234349, test acc 0.9150 | |
Total time cost 445.65s | |
[Epoch 0 Batch 30/162] avg loss 0.0140324, throughput 7.31594K wps | |
[Epoch 0 Batch 60/162] avg loss 0.0137697, throughput 9.53558K wps | |
[Epoch 0 Batch 90/162] avg loss 0.013667, throughput 9.42413K wps | |
[Epoch 0 Batch 120/162] avg loss 0.0135796, throughput 9.41052K wps | |
[Epoch 0 Batch 150/162] avg loss 0.0135132, throughput 9.30372K wps | |
Begin Testing... | |
[Epoch 0] train avg loss 0.0136906, dev acc 0.6933, dev avg loss 0.663671, throughput 8.95059K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 1 Batch 30/162] avg loss 0.0132853, throughput 9.56229K wps | |
[Epoch 1 Batch 60/162] avg loss 0.0132598, throughput 9.34035K wps | |
[Epoch 1 Batch 90/162] avg loss 0.0131167, throughput 9.30347K wps | |
[Epoch 1 Batch 120/162] avg loss 0.0128403, throughput 9.25713K wps | |
[Epoch 1 Batch 150/162] avg loss 0.0128463, throughput 9.50765K wps | |
Begin Testing... | |
[Epoch 1] train avg loss 0.0130497, dev acc 0.8122, dev avg loss 0.635224, throughput 9.38824K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 2 Batch 30/162] avg loss 0.0126278, throughput 9.46677K wps | |
[Epoch 2 Batch 60/162] avg loss 0.0126171, throughput 9.33006K wps | |
[Epoch 2 Batch 90/162] avg loss 0.0123981, throughput 9.42421K wps | |
[Epoch 2 Batch 120/162] avg loss 0.0123127, throughput 9.41583K wps | |
[Epoch 2 Batch 150/162] avg loss 0.0121766, throughput 9.29956K wps | |
Begin Testing... | |
[Epoch 2] train avg loss 0.0124066, dev acc 0.8222, dev avg loss 0.602879, throughput 9.39711K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 3 Batch 30/162] avg loss 0.012074, throughput 9.77186K wps | |
[Epoch 3 Batch 60/162] avg loss 0.0119799, throughput 9.41283K wps | |
[Epoch 3 Batch 90/162] avg loss 0.0116279, throughput 9.50668K wps | |
[Epoch 3 Batch 120/162] avg loss 0.0116906, throughput 9.35647K wps | |
[Epoch 3 Batch 150/162] avg loss 0.0114773, throughput 9.56254K wps | |
Begin Testing... | |
[Epoch 3] train avg loss 0.0117511, dev acc 0.8456, dev avg loss 0.566611, throughput 9.49692K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 4 Batch 30/162] avg loss 0.0113544, throughput 9.40342K wps | |
[Epoch 4 Batch 60/162] avg loss 0.0110726, throughput 9.44167K wps | |
[Epoch 4 Batch 90/162] avg loss 0.0109634, throughput 9.49997K wps | |
[Epoch 4 Batch 120/162] avg loss 0.0108205, throughput 9.45509K wps | |
[Epoch 4 Batch 150/162] avg loss 0.0105801, throughput 9.43651K wps | |
Begin Testing... | |
[Epoch 4] train avg loss 0.0109219, dev acc 0.8589, dev avg loss 0.526782, throughput 9.43199K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 5 Batch 30/162] avg loss 0.0104652, throughput 9.6518K wps | |
[Epoch 5 Batch 60/162] avg loss 0.0104402, throughput 9.27921K wps | |
[Epoch 5 Batch 90/162] avg loss 0.0102351, throughput 9.39766K wps | |
[Epoch 5 Batch 120/162] avg loss 0.00998377, throughput 9.39601K wps | |
[Epoch 5 Batch 150/162] avg loss 0.00978893, throughput 9.24047K wps | |
Begin Testing... | |
[Epoch 5] train avg loss 0.0101777, dev acc 0.8667, dev avg loss 0.488422, throughput 9.40287K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 6 Batch 30/162] avg loss 0.00962471, throughput 9.50385K wps | |
[Epoch 6 Batch 60/162] avg loss 0.00960506, throughput 9.22368K wps | |
[Epoch 6 Batch 90/162] avg loss 0.00940988, throughput 9.52823K wps | |
[Epoch 6 Batch 120/162] avg loss 0.00923151, throughput 9.32226K wps | |
[Epoch 6 Batch 150/162] avg loss 0.00923229, throughput 9.39732K wps | |
Begin Testing... | |
[Epoch 6] train avg loss 0.00941625, dev acc 0.8600, dev avg loss 0.453629, throughput 9.38266K wps | |
[Epoch 7 Batch 30/162] avg loss 0.00911647, throughput 9.38688K wps | |
[Epoch 7 Batch 60/162] avg loss 0.00882835, throughput 9.35864K wps | |
[Epoch 7 Batch 90/162] avg loss 0.00889105, throughput 9.25503K wps | |
[Epoch 7 Batch 120/162] avg loss 0.00855239, throughput 9.38802K wps | |
[Epoch 7 Batch 150/162] avg loss 0.00855025, throughput 9.38374K wps | |
Begin Testing... | |
[Epoch 7] train avg loss 0.00878858, dev acc 0.8667, dev avg loss 0.423234, throughput 9.33794K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 8 Batch 30/162] avg loss 0.00859895, throughput 9.50349K wps | |
[Epoch 8 Batch 60/162] avg loss 0.00849058, throughput 9.48742K wps | |
[Epoch 8 Batch 90/162] avg loss 0.00805606, throughput 9.36248K wps | |
[Epoch 8 Batch 120/162] avg loss 0.00792628, throughput 9.46156K wps | |
[Epoch 8 Batch 150/162] avg loss 0.00799315, throughput 9.52324K wps | |
Begin Testing... | |
[Epoch 8] train avg loss 0.00823134, dev acc 0.8722, dev avg loss 0.398932, throughput 9.46574K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 9 Batch 30/162] avg loss 0.00803663, throughput 9.69821K wps | |
[Epoch 9 Batch 60/162] avg loss 0.00798854, throughput 9.54757K wps | |
[Epoch 9 Batch 90/162] avg loss 0.00790172, throughput 9.50315K wps | |
[Epoch 9 Batch 120/162] avg loss 0.00761154, throughput 9.29564K wps | |
[Epoch 9 Batch 150/162] avg loss 0.0076133, throughput 9.59119K wps | |
Begin Testing... | |
[Epoch 9] train avg loss 0.00779802, dev acc 0.8767, dev avg loss 0.377216, throughput 9.52225K wps | |
Observed Improvement. | |
Begin Testing... | |
[Epoch 10 Batch 30/162] avg loss 0.00754552, throughput 9.46628K wps | |
[Epoch 10 Batch 60/162] avg loss 0.00787831, throughput 9.51402K wps | |