Namespace(batch_size=50, data_name='CR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='multichannel')
Use gpu0
maximum length (in tokens): 105
Done! Tokenizing Time=0.06s, #Sentences=3775
SentimentNet(
  (embedding): Embedding(5343 -> 300, float32)
  (embedding_extend): Embedding(5343 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(600 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/62] avg loss 0.0134723, throughput 0.343674K wps
[Epoch 0 Batch 60/62] avg loss 0.0129089, throughput 3.26262K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133812, dev acc 0.6372, dev avg loss 0.645867, throughput 0.339026K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130102, throughput 3.34238K wps
[Epoch 1 Batch 60/62] avg loss 0.0129448, throughput 3.23616K wps
Begin Testing...
[Epoch 1] train avg loss 0.0131052, dev acc 0.6372, dev avg loss 0.640512, throughput 3.29626K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0126977, throughput 3.33366K wps
[Epoch 2 Batch 60/62] avg loss 0.0127733, throughput 3.25764K wps
Begin Testing...
[Epoch 2] train avg loss 0.0128615, dev acc 0.6372, dev avg loss 0.633058, throughput 3.30186K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0125133, throughput 3.33257K wps
[Epoch 3 Batch 60/62] avg loss 0.0125409, throughput 3.2568K wps
Begin Testing...
[Epoch 3] train avg loss 0.012671, dev acc 0.6372, dev avg loss 0.623487, throughput 3.30107K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0122701, throughput 3.33109K wps
[Epoch 4 Batch 60/62] avg loss 0.0124346, throughput 3.2536K wps
Begin Testing...
[Epoch 4] train avg loss 0.0125311, dev acc 0.6372, dev avg loss 0.61702, throughput 3.29828K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0122674, throughput 3.33839K wps
[Epoch 5 Batch 60/62] avg loss 0.0119561, throughput 3.25442K wps
Begin Testing...
[Epoch 5] train avg loss 0.0123007, dev acc 0.6431, dev avg loss 0.609875, throughput 3.30228K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0119415, throughput 3.33358K wps
[Epoch 6 Batch 60/62] avg loss 0.0119814, throughput 3.23653K wps
Begin Testing...
[Epoch 6] train avg loss 0.0121306, dev acc 0.6401, dev avg loss 0.602441, throughput 3.29088K wps
[Epoch 7 Batch 30/62] avg loss 0.0117209, throughput 3.1797K wps
[Epoch 7 Batch 60/62] avg loss 0.0119209, throughput 3.27397K wps
Begin Testing...
[Epoch 7] train avg loss 0.0119841, dev acc 0.6490, dev avg loss 0.593291, throughput 3.23459K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0115267, throughput 3.30825K wps
[Epoch 8 Batch 60/62] avg loss 0.0114703, throughput 3.23038K wps
Begin Testing...
[Epoch 8] train avg loss 0.0116767, dev acc 0.6608, dev avg loss 0.584986, throughput 3.27405K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0115753, throughput 3.32254K wps
[Epoch 9 Batch 60/62] avg loss 0.0112319, throughput 3.22645K wps
Begin Testing...
[Epoch 9] train avg loss 0.0115095, dev acc 0.6637, dev avg loss 0.576667, throughput 3.28022K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0111671, throughput 3.30046K wps
[Epoch 10 Batch 60/62] avg loss 0.0112022, throughput 3.22593K wps
Begin Testing...
[Epoch 10] train avg loss 0.0113404, dev acc 0.6873, dev avg loss 0.568127, throughput 3.26977K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0109008, throughput 3.30439K wps
[Epoch 11 Batch 60/62] avg loss 0.0109946, throughput 3.2263K wps
Begin Testing...
[Epoch 11] train avg loss 0.0110775, dev acc 0.6637, dev avg loss 0.561266, throughput 3.27122K wps
[Epoch 12 Batch 30/62] avg loss 0.0105575, throughput 3.31961K wps
[Epoch 12 Batch 60/62] avg loss 0.0107065, throughput 3.21559K wps
Begin Testing...
[Epoch 12] train avg loss 0.0108074, dev acc 0.7109, dev avg loss 0.550624, throughput 3.27443K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0107503, throughput 3.32673K wps
[Epoch 13 Batch 60/62] avg loss 0.0103472, throughput 3.2312K wps
Begin Testing...
[Epoch 13] train avg loss 0.0106261, dev acc 0.6873, dev avg loss 0.543977, throughput 3.28388K wps
[Epoch 14 Batch 30/62] avg loss 0.0103625, throughput 3.30368K wps
[Epoch 14 Batch 60/62] avg loss 0.0102037, throughput 3.24421K wps
Begin Testing...
[Epoch 14] train avg loss 0.0104303, dev acc 0.7286, dev avg loss 0.532857, throughput 3.28033K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0102559, throughput 3.34094K wps
[Epoch 15 Batch 60/62] avg loss 0.0100903, throughput 3.22256K wps
Begin Testing...
[Epoch 15] train avg loss 0.010348, dev acc 0.7404, dev avg loss 0.525199, throughput 3.28631K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.00959528, throughput 3.31265K wps
[Epoch 16 Batch 60/62] avg loss 0.0102398, throughput 3.22636K wps
Begin Testing...
[Epoch 16] train avg loss 0.0100203, dev acc 0.7611, dev avg loss 0.517671, throughput 3.27555K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.00966294, throughput 3.31638K wps
[Epoch 17 Batch 60/62] avg loss 0.00970601, throughput 3.24517K wps
Begin Testing...
[Epoch 17] train avg loss 0.00980017, dev acc 0.7463, dev avg loss 0.509622, throughput 3.28731K wps
[Epoch 18 Batch 30/62] avg loss 0.00939605, throughput 3.31666K wps
[Epoch 18 Batch 60/62] avg loss 0.00937983, throughput 3.22201K wps
Begin Testing...
[Epoch 18] train avg loss 0.00957299, dev acc 0.7611, dev avg loss 0.506679, throughput 3.27647K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.00941913, throughput 3.31988K wps
[Epoch 19 Batch 60/62] avg loss 0.00922998, throughput 3.22734K wps
Begin Testing...
[Epoch 19] train avg loss 0.00941015, dev acc 0.7640, dev avg loss 0.495414, throughput 3.2798K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.00904799, throughput 3.31764K wps
[Epoch 20 Batch 60/62] avg loss 0.00924885, throughput 3.23532K wps
Begin Testing...
[Epoch 20] train avg loss 0.00929458, dev acc 0.7611, dev avg loss 0.490562, throughput 3.28106K wps
[Epoch 21 Batch 30/62] avg loss 0.00896077, throughput 3.29325K wps
[Epoch 21 Batch 60/62] avg loss 0.00893734, throughput 3.22498K wps
Begin Testing...
[Epoch 21] train avg loss 0.00908357, dev acc 0.7817, dev avg loss 0.483413, throughput 3.26385K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.00876101, throughput 3.29305K wps
[Epoch 22 Batch 60/62] avg loss 0.00869727, throughput 3.21492K wps
Begin Testing...
[Epoch 22] train avg loss 0.00886027, dev acc 0.7640, dev avg loss 0.48087, throughput 3.25976K wps
[Epoch 23 Batch 30/62] avg loss 0.00860429, throughput 3.31881K wps
[Epoch 23 Batch 60/62] avg loss 0.00838256, throughput 3.23879K wps
Begin Testing...
[Epoch 23] train avg loss 0.00865908, dev acc 0.7699, dev avg loss 0.474155, throughput 3.28369K wps
[Epoch 24 Batch 30/62] avg loss 0.0086947, throughput 3.2851K wps
[Epoch 24 Batch 60/62] avg loss 0.00827456, throughput 3.23034K wps
Begin Testing...
[Epoch 24] train avg loss 0.00858808, dev acc 0.7758, dev avg loss 0.469919, throughput 3.26461K wps
[Epoch 25 Batch 30/62] avg loss 0.00830087, throughput 3.30858K wps
[Epoch 25 Batch 60/62] avg loss 0.00799783, throughput 3.22975K wps
Begin Testing...
[Epoch 25] train avg loss 0.00827258, dev acc 0.7788, dev avg loss 0.46376, throughput 3.27575K wps
[Epoch 26 Batch 30/62] avg loss 0.00792655, throughput 3.29768K wps
[Epoch 26 Batch 60/62] avg loss 0.00838465, throughput 3.23694K wps
Begin Testing...
[Epoch 26] train avg loss 0.00824073, dev acc 0.7817, dev avg loss 0.459772, throughput 3.27171K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.0081229, throughput 3.32102K wps
[Epoch 27 Batch 60/62] avg loss 0.00793227, throughput 3.23567K wps
Begin Testing...
[Epoch 27] train avg loss 0.00806027, dev acc 0.7817, dev avg loss 0.456041, throughput 3.28356K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00770145, throughput 3.29657K wps
[Epoch 28 Batch 60/62] avg loss 0.00784183, throughput 3.22498K wps
Begin Testing...
[Epoch 28] train avg loss 0.00783569, dev acc 0.7876, dev avg loss 0.451941, throughput 3.26522K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00780541, throughput 3.31888K wps
[Epoch 29 Batch 60/62] avg loss 0.00755988, throughput 3.24892K wps
Begin Testing...
[Epoch 29] train avg loss 0.00775539, dev acc 0.7758, dev avg loss 0.44961, throughput 3.29014K wps
[Epoch 30 Batch 30/62] avg loss 0.00767302, throughput 3.3019K wps
[Epoch 30 Batch 60/62] avg loss 0.00729676, throughput 3.23396K wps
Begin Testing...
[Epoch 30] train avg loss 0.0075499, dev acc 0.7876, dev avg loss 0.445713, throughput 3.27319K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00729512, throughput 3.30777K wps
[Epoch 31 Batch 60/62] avg loss 0.00731529, throughput 3.23202K wps
Begin Testing...
[Epoch 31] train avg loss 0.00741992, dev acc 0.7788, dev avg loss 0.445104, throughput 3.27476K wps
[Epoch 32 Batch 30/62] avg loss 0.00715808, throughput 3.28979K wps
[Epoch 32 Batch 60/62] avg loss 0.00740053, throughput 3.22262K wps
Begin Testing...
[Epoch 32] train avg loss 0.00739935, dev acc 0.7847, dev avg loss 0.443554, throughput 3.2617K wps
[Epoch 33 Batch 30/62] avg loss 0.0070608, throughput 3.29572K wps
[Epoch 33 Batch 60/62] avg loss 0.00714915, throughput 3.21816K wps
Begin Testing...
[Epoch 33] train avg loss 0.00720835, dev acc 0.7906, dev avg loss 0.437648, throughput 3.26273K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00700767, throughput 3.27862K wps
[Epoch 34 Batch 60/62] avg loss 0.00707173, throughput 3.23245K wps
Begin Testing...
[Epoch 34] train avg loss 0.00705619, dev acc 0.7906, dev avg loss 0.436573, throughput 3.26106K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/62] avg loss 0.00676388, throughput 3.29216K wps
[Epoch 35 Batch 60/62] avg loss 0.00698487, throughput 3.23153K wps
Begin Testing...
[Epoch 35] train avg loss 0.00705509, dev acc 0.7876, dev avg loss 0.433251, throughput 3.269K wps
[Epoch 36 Batch 30/62] avg loss 0.00663338, throughput 3.30999K wps
[Epoch 36 Batch 60/62] avg loss 0.00684657, throughput 3.21403K wps
Begin Testing...
[Epoch 36] train avg loss 0.00683581, dev acc 0.7817, dev avg loss 0.438491, throughput 3.26818K wps
[Epoch 37 Batch 30/62] avg loss 0.00666328, throughput 3.33245K wps
[Epoch 37 Batch 60/62] avg loss 0.00666124, throughput 3.23217K wps
Begin Testing...
[Epoch 37] train avg loss 0.00670551, dev acc 0.7788, dev avg loss 0.43255, throughput 3.28855K wps
[Epoch 38 Batch 30/62] avg loss 0.00645705, throughput 3.29445K wps
[Epoch 38 Batch 60/62] avg loss 0.00658762, throughput 3.22781K wps
Begin Testing...
[Epoch 38] train avg loss 0.00662022, dev acc 0.8201, dev avg loss 0.426898, throughput 3.26504K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00621295, throughput 3.30825K wps
[Epoch 39 Batch 60/62] avg loss 0.00640877, throughput 3.21856K wps
Begin Testing...
[Epoch 39] train avg loss 0.00647128, dev acc 0.8024, dev avg loss 0.423956, throughput 3.26855K wps
[Epoch 40 Batch 30/62] avg loss 0.00630716, throughput 3.31153K wps
[Epoch 40 Batch 60/62] avg loss 0.00626895, throughput 3.20577K wps
Begin Testing...
[Epoch 40] train avg loss 0.00635059, dev acc 0.8142, dev avg loss 0.422436, throughput 3.26418K wps
[Epoch 41 Batch 30/62] avg loss 0.00591307, throughput 3.30095K wps
[Epoch 41 Batch 60/62] avg loss 0.00633876, throughput 3.22163K wps
Begin Testing...
[Epoch 41] train avg loss 0.00615976, dev acc 0.7817, dev avg loss 0.429495, throughput 3.26752K wps
[Epoch 42 Batch 30/62] avg loss 0.00602473, throughput 3.31965K wps
[Epoch 42 Batch 60/62] avg loss 0.00625138, throughput 3.22533K wps
Begin Testing...
[Epoch 42] train avg loss 0.00620687, dev acc 0.8171, dev avg loss 0.419217, throughput 3.27828K wps
[Epoch 43 Batch 30/62] avg loss 0.00601633, throughput 3.29877K wps
[Epoch 43 Batch 60/62] avg loss 0.00602545, throughput 3.22344K wps
Begin Testing...
[Epoch 43] train avg loss 0.00607012, dev acc 0.8053, dev avg loss 0.420083, throughput 3.26671K wps
[Epoch 44 Batch 30/62] avg loss 0.00561471, throughput 3.32077K wps
[Epoch 44 Batch 60/62] avg loss 0.00606478, throughput 3.23325K wps
Begin Testing...
[Epoch 44] train avg loss 0.00587506, dev acc 0.8142, dev avg loss 0.416824, throughput 3.28334K wps
[Epoch 45 Batch 30/62] avg loss 0.00577342, throughput 3.30063K wps
[Epoch 45 Batch 60/62] avg loss 0.00558224, throughput 3.2328K wps
Begin Testing...
[Epoch 45] train avg loss 0.00570039, dev acc 0.8083, dev avg loss 0.415431, throughput 3.27265K wps
[Epoch 46 Batch 30/62] avg loss 0.00538927, throughput 3.29858K wps
[Epoch 46 Batch 60/62] avg loss 0.00571484, throughput 3.23673K wps
Begin Testing...
[Epoch 46] train avg loss 0.00562714, dev acc 0.8171, dev avg loss 0.413723, throughput 3.27451K wps
[Epoch 47 Batch 30/62] avg loss 0.0054065, throughput 3.32832K wps
[Epoch 47 Batch 60/62] avg loss 0.00558721, throughput 3.22322K wps
Begin Testing...
[Epoch 47] train avg loss 0.00553656, dev acc 0.8112, dev avg loss 0.411693, throughput 3.2798K wps
[Epoch 48 Batch 30/62] avg loss 0.00544149, throughput 3.30003K wps
[Epoch 48 Batch 60/62] avg loss 0.00541284, throughput 3.23414K wps
Begin Testing...
[Epoch 48] train avg loss 0.0055122, dev acc 0.8053, dev avg loss 0.414174, throughput 3.27228K wps
[Epoch 49 Batch 30/62] avg loss 0.00565469, throughput 3.29724K wps
[Epoch 49 Batch 60/62] avg loss 0.00513911, throughput 3.20441K wps
Begin Testing...
[Epoch 49] train avg loss 0.00546651, dev acc 0.8201, dev avg loss 0.408184, throughput 3.25307K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00536651, throughput 3.29486K wps
[Epoch 50 Batch 60/62] avg loss 0.00519544, throughput 3.22975K wps
Begin Testing...
[Epoch 50] train avg loss 0.00529447, dev acc 0.8112, dev avg loss 0.413166, throughput 3.26715K wps
[Epoch 51 Batch 30/62] avg loss 0.00519159, throughput 3.28311K wps
[Epoch 51 Batch 60/62] avg loss 0.00512819, throughput 3.21988K wps
Begin Testing...
[Epoch 51] train avg loss 0.00521718, dev acc 0.8201, dev avg loss 0.40608, throughput 3.25741K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00527688, throughput 3.30122K wps
[Epoch 52 Batch 60/62] avg loss 0.00485971, throughput 3.22841K wps
Begin Testing...
[Epoch 52] train avg loss 0.00511752, dev acc 0.8230, dev avg loss 0.405056, throughput 3.27042K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00514033, throughput 3.3005K wps
[Epoch 53 Batch 60/62] avg loss 0.00480066, throughput 3.23974K wps
Begin Testing...
[Epoch 53] train avg loss 0.00503389, dev acc 0.8112, dev avg loss 0.40768, throughput 3.27547K wps
[Epoch 54 Batch 30/62] avg loss 0.00481823, throughput 3.29592K wps
[Epoch 54 Batch 60/62] avg loss 0.00488368, throughput 3.23277K wps
Begin Testing...
[Epoch 54] train avg loss 0.00488105, dev acc 0.8171, dev avg loss 0.406379, throughput 3.26935K wps
[Epoch 55 Batch 30/62] avg loss 0.00467364, throughput 3.31428K wps
[Epoch 55 Batch 60/62] avg loss 0.0048564, throughput 3.19604K wps
Begin Testing...
[Epoch 55] train avg loss 0.00483617, dev acc 0.8230, dev avg loss 0.402401, throughput 3.2616K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00480682, throughput 3.28673K wps
[Epoch 56 Batch 60/62] avg loss 0.00461925, throughput 3.22143K wps
Begin Testing...
[Epoch 56] train avg loss 0.00479454, dev acc 0.8201, dev avg loss 0.405635, throughput 3.25941K wps
[Epoch 57 Batch 30/62] avg loss 0.00440682, throughput 3.3132K wps
[Epoch 57 Batch 60/62] avg loss 0.00475593, throughput 3.20848K wps
Begin Testing...
[Epoch 57] train avg loss 0.0045979, dev acc 0.8260, dev avg loss 0.401906, throughput 3.26637K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/62] avg loss 0.00439177, throughput 3.29761K wps
[Epoch 58 Batch 60/62] avg loss 0.00462431, throughput 3.21642K wps
Begin Testing...
[Epoch 58] train avg loss 0.00455091, dev acc 0.8319, dev avg loss 0.400016, throughput 3.262K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00437344, throughput 3.30008K wps
[Epoch 59 Batch 60/62] avg loss 0.00448443, throughput 3.22982K wps
Begin Testing...
[Epoch 59] train avg loss 0.00447979, dev acc 0.8407, dev avg loss 0.399509, throughput 3.27032K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.0042501, throughput 3.31032K wps
[Epoch 60 Batch 60/62] avg loss 0.00455598, throughput 3.23222K wps
Begin Testing...
[Epoch 60] train avg loss 0.00443393, dev acc 0.8348, dev avg loss 0.401052, throughput 3.2708K wps
[Epoch 61 Batch 30/62] avg loss 0.00422981, throughput 3.29831K wps
[Epoch 61 Batch 60/62] avg loss 0.00422732, throughput 3.21102K wps
Begin Testing...
[Epoch 61] train avg loss 0.00425805, dev acc 0.8289, dev avg loss 0.396667, throughput 3.26158K wps
[Epoch 62 Batch 30/62] avg loss 0.00437636, throughput 3.3024K wps
[Epoch 62 Batch 60/62] avg loss 0.00405651, throughput 3.23519K wps
Begin Testing...
[Epoch 62] train avg loss 0.0043168, dev acc 0.8319, dev avg loss 0.396371, throughput 3.2754K wps
[Epoch 63 Batch 30/62] avg loss 0.00411264, throughput 3.29625K wps
[Epoch 63 Batch 60/62] avg loss 0.00404028, throughput 3.21693K wps
Begin Testing...
[Epoch 63] train avg loss 0.00416166, dev acc 0.8437, dev avg loss 0.397971, throughput 3.26325K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/62] avg loss 0.00402705, throughput 3.31298K wps
[Epoch 64 Batch 60/62] avg loss 0.00414228, throughput 3.21653K wps
Begin Testing...
[Epoch 64] train avg loss 0.00410033, dev acc 0.8348, dev avg loss 0.395691, throughput 3.27071K wps
[Epoch 65 Batch 30/62] avg loss 0.00397217, throughput 3.3061K wps
[Epoch 65 Batch 60/62] avg loss 0.00396839, throughput 3.22889K wps
Begin Testing...
[Epoch 65] train avg loss 0.00406161, dev acc 0.8260, dev avg loss 0.403318, throughput 3.27189K wps
[Epoch 66 Batch 30/62] avg loss 0.00385111, throughput 3.29607K wps
[Epoch 66 Batch 60/62] avg loss 0.00402661, throughput 3.22201K wps
Begin Testing...
[Epoch 66] train avg loss 0.00399262, dev acc 0.8407, dev avg loss 0.395094, throughput 3.26631K wps
[Epoch 67 Batch 30/62] avg loss 0.00385873, throughput 3.29535K wps
[Epoch 67 Batch 60/62] avg loss 0.00373285, throughput 3.204K wps
Begin Testing...
[Epoch 67] train avg loss 0.0038312, dev acc 0.8496, dev avg loss 0.395513, throughput 3.25459K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/62] avg loss 0.00352478, throughput 3.28579K wps
[Epoch 68 Batch 60/62] avg loss 0.00391364, throughput 3.2085K wps
Begin Testing...
[Epoch 68] train avg loss 0.00381432, dev acc 0.8230, dev avg loss 0.392567, throughput 3.25221K wps
[Epoch 69 Batch 30/62] avg loss 0.00372169, throughput 3.2866K wps
[Epoch 69 Batch 60/62] avg loss 0.0036612, throughput 3.21547K wps
Begin Testing...
[Epoch 69] train avg loss 0.00371983, dev acc 0.8319, dev avg loss 0.392922, throughput 3.2561K wps
[Epoch 70 Batch 30/62] avg loss 0.00347704, throughput 3.28018K wps
[Epoch 70 Batch 60/62] avg loss 0.00367201, throughput 3.2143K wps
Begin Testing...
[Epoch 70] train avg loss 0.00359678, dev acc 0.8525, dev avg loss 0.395358, throughput 3.25485K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00344919, throughput 3.29709K wps
[Epoch 71 Batch 60/62] avg loss 0.00355758, throughput 3.21736K wps
Begin Testing...
[Epoch 71] train avg loss 0.00353729, dev acc 0.8319, dev avg loss 0.391801, throughput 3.26274K wps
[Epoch 72 Batch 30/62] avg loss 0.00365617, throughput 3.30477K wps
[Epoch 72 Batch 60/62] avg loss 0.00334467, throughput 3.2184K wps
Begin Testing...
[Epoch 72] train avg loss 0.00355517, dev acc 0.8466, dev avg loss 0.394742, throughput 3.26841K wps
[Epoch 73 Batch 30/62] avg loss 0.00331663, throughput 3.29091K wps
[Epoch 73 Batch 60/62] avg loss 0.00331115, throughput 3.22724K wps
Begin Testing...
[Epoch 73] train avg loss 0.00337466, dev acc 0.8348, dev avg loss 0.390193, throughput 3.26476K wps
[Epoch 74 Batch 30/62] avg loss 0.00329994, throughput 3.29878K wps
[Epoch 74 Batch 60/62] avg loss 0.00337102, throughput 3.22008K wps
Begin Testing...
[Epoch 74] train avg loss 0.00334545, dev acc 0.8348, dev avg loss 0.390526, throughput 3.26467K wps
[Epoch 75 Batch 30/62] avg loss 0.00317221, throughput 3.31775K wps
[Epoch 75 Batch 60/62] avg loss 0.003445, throughput 3.23261K wps
Begin Testing...
[Epoch 75] train avg loss 0.00332648, dev acc 0.8525, dev avg loss 0.391598, throughput 3.28137K wps
Observed Improvement.
Begin Testing...
[Epoch 76 Batch 30/62] avg loss 0.0031992, throughput 3.31361K wps
[Epoch 76 Batch 60/62] avg loss 0.00347003, throughput 3.22705K wps
Begin Testing...
[Epoch 76] train avg loss 0.00335065, dev acc 0.8407, dev avg loss 0.389136, throughput 3.27604K wps
[Epoch 77 Batch 30/62] avg loss 0.00329104, throughput 3.29523K wps
[Epoch 77 Batch 60/62] avg loss 0.00304171, throughput 3.21723K wps
Begin Testing...
[Epoch 77] train avg loss 0.00324275, dev acc 0.8378, dev avg loss 0.388245, throughput 3.26066K wps
[Epoch 78 Batch 30/62] avg loss 0.00303973, throughput 3.28892K wps
[Epoch 78 Batch 60/62] avg loss 0.00313471, throughput 3.22585K wps
Begin Testing...
[Epoch 78] train avg loss 0.00310289, dev acc 0.8437, dev avg loss 0.389966, throughput 3.26276K wps
[Epoch 79 Batch 30/62] avg loss 0.00301366, throughput 3.30693K wps
[Epoch 79 Batch 60/62] avg loss 0.00304934, throughput 3.21177K wps
Begin Testing...
[Epoch 79] train avg loss 0.00305653, dev acc 0.8407, dev avg loss 0.388926, throughput 3.26426K wps
[Epoch 80 Batch 30/62] avg loss 0.00292107, throughput 3.28739K wps
[Epoch 80 Batch 60/62] avg loss 0.0029615, throughput 3.20525K wps
Begin Testing...
[Epoch 80] train avg loss 0.0030004, dev acc 0.8407, dev avg loss 0.389602, throughput 3.25222K wps
[Epoch 81 Batch 30/62] avg loss 0.00300573, throughput 3.29602K wps
[Epoch 81 Batch 60/62] avg loss 0.00289599, throughput 3.21929K wps
Begin Testing...
[Epoch 81] train avg loss 0.00294066, dev acc 0.8525, dev avg loss 0.392829, throughput 3.26372K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/62] avg loss 0.00279445, throughput 3.28863K wps
[Epoch 82 Batch 60/62] avg loss 0.00288153, throughput 3.23195K wps
Begin Testing...
[Epoch 82] train avg loss 0.00284179, dev acc 0.8437, dev avg loss 0.388285, throughput 3.26551K wps
[Epoch 83 Batch 30/62] avg loss 0.00293351, throughput 3.29329K wps
[Epoch 83 Batch 60/62] avg loss 0.00277965, throughput 3.23905K wps
Begin Testing...
[Epoch 83] train avg loss 0.00287721, dev acc 0.8437, dev avg loss 0.388527, throughput 3.27501K wps
[Epoch 84 Batch 30/62] avg loss 0.00268538, throughput 3.30941K wps
[Epoch 84 Batch 60/62] avg loss 0.0028424, throughput 3.22296K wps
Begin Testing...
[Epoch 84] train avg loss 0.0027851, dev acc 0.8407, dev avg loss 0.397724, throughput 3.27197K wps
[Epoch 85 Batch 30/62] avg loss 0.00278038, throughput 3.27828K wps
[Epoch 85 Batch 60/62] avg loss 0.0026297, throughput 3.21692K wps
Begin Testing...
[Epoch 85] train avg loss 0.00273534, dev acc 0.8289, dev avg loss 0.389478, throughput 3.25278K wps
[Epoch 86 Batch 30/62] avg loss 0.00276298, throughput 3.28136K wps
[Epoch 86 Batch 60/62] avg loss 0.00273945, throughput 3.21462K wps
Begin Testing...
[Epoch 86] train avg loss 0.00280084, dev acc 0.8348, dev avg loss 0.388193, throughput 3.25468K wps
[Epoch 87 Batch 30/62] avg loss 0.00254576, throughput 3.29895K wps
[Epoch 87 Batch 60/62] avg loss 0.00271004, throughput 3.21273K wps
Begin Testing...
[Epoch 87] train avg loss 0.00267001, dev acc 0.8525, dev avg loss 0.390259, throughput 3.26138K wps
Observed Improvement.
Begin Testing...
[Epoch 88 Batch 30/62] avg loss 0.00272699, throughput 3.28798K wps
[Epoch 88 Batch 60/62] avg loss 0.00248168, throughput 3.18734K wps
Begin Testing...
[Epoch 88] train avg loss 0.00262369, dev acc 0.8437, dev avg loss 0.38882, throughput 3.24523K wps
[Epoch 89 Batch 30/62] avg loss 0.00249105, throughput 3.2749K wps
[Epoch 89 Batch 60/62] avg loss 0.00255826, throughput 3.20457K wps
Begin Testing...
[Epoch 89] train avg loss 0.00258675, dev acc 0.8407, dev avg loss 0.38885, throughput 3.24594K wps
[Epoch 90 Batch 30/62] avg loss 0.0025422, throughput 3.30303K wps
[Epoch 90 Batch 60/62] avg loss 0.00255329, throughput 3.20295K wps
Begin Testing...
[Epoch 90] train avg loss 0.00261719, dev acc 0.8289, dev avg loss 0.390579, throughput 3.25905K wps
[Epoch 91 Batch 30/62] avg loss 0.00236048, throughput 3.31575K wps
[Epoch 91 Batch 60/62] avg loss 0.00257886, throughput 3.20389K wps
Begin Testing...
[Epoch 91] train avg loss 0.00248943, dev acc 0.8289, dev avg loss 0.390438, throughput 3.26667K wps
[Epoch 92 Batch 30/62] avg loss 0.00232691, throughput 3.30306K wps
[Epoch 92 Batch 60/62] avg loss 0.0024305, throughput 3.20762K wps
Begin Testing...
[Epoch 92] train avg loss 0.00242786, dev acc 0.8319, dev avg loss 0.388668, throughput 3.26243K wps
[Epoch 93 Batch 30/62] avg loss 0.00251963, throughput 3.31215K wps
[Epoch 93 Batch 60/62] avg loss 0.00224475, throughput 3.22292K wps
Begin Testing...
[Epoch 93] train avg loss 0.00239372, dev acc 0.8525, dev avg loss 0.390595, throughput 3.27298K wps
Observed Improvement.
Begin Testing...
[Epoch 94 Batch 30/62] avg loss 0.00221825, throughput 3.29441K wps
[Epoch 94 Batch 60/62] avg loss 0.00243126, throughput 3.2292K wps
Begin Testing...
[Epoch 94] train avg loss 0.0023344, dev acc 0.8525, dev avg loss 0.391429, throughput 3.26687K wps
Observed Improvement.
Begin Testing...
[Epoch 95 Batch 30/62] avg loss 0.00241627, throughput 3.29648K wps
[Epoch 95 Batch 60/62] avg loss 0.00222388, throughput 3.21014K wps
Begin Testing...
[Epoch 95] train avg loss 0.00233129, dev acc 0.8378, dev avg loss 0.38917, throughput 3.26002K wps
[Epoch 96 Batch 30/62] avg loss 0.00228826, throughput 3.28843K wps
[Epoch 96 Batch 60/62] avg loss 0.00246106, throughput 3.22754K wps
Begin Testing...
[Epoch 96] train avg loss 0.0023792, dev acc 0.8407, dev avg loss 0.389341, throughput 3.26457K wps
[Epoch 97 Batch 30/62] avg loss 0.00223325, throughput 3.27964K wps
[Epoch 97 Batch 60/62] avg loss 0.00223822, throughput 3.22679K wps
Begin Testing...
[Epoch 97] train avg loss 0.00229143, dev acc 0.8378, dev avg loss 0.389024, throughput 3.25916K wps
[Epoch 98 Batch 30/62] avg loss 0.00221346, throughput 3.27796K wps
[Epoch 98 Batch 60/62] avg loss 0.00227809, throughput 3.23167K wps
Begin Testing...
[Epoch 98] train avg loss 0.00231521, dev acc 0.8289, dev avg loss 0.38874, throughput 3.25917K wps
[Epoch 99 Batch 30/62] avg loss 0.00208261, throughput 3.29972K wps
[Epoch 99 Batch 60/62] avg loss 0.00204515, throughput 3.21821K wps
Begin Testing...
[Epoch 99] train avg loss 0.00210253, dev acc 0.8466, dev avg loss 0.387696, throughput 3.26405K wps
[Epoch 100 Batch 30/62] avg loss 0.0021161, throughput 3.29562K wps
[Epoch 100 Batch 60/62] avg loss 0.00209687, throughput 3.20877K wps
Begin Testing...
[Epoch 100] train avg loss 0.00214338, dev acc 0.8319, dev avg loss 0.388735, throughput 3.25817K wps
[Epoch 101 Batch 30/62] avg loss 0.00216147, throughput 3.27939K wps
[Epoch 101 Batch 60/62] avg loss 0.00205364, throughput 3.21589K wps
Begin Testing...
[Epoch 101] train avg loss 0.00213624, dev acc 0.8289, dev avg loss 0.391255, throughput 3.25327K wps
[Epoch 102 Batch 30/62] avg loss 0.00211164, throughput 3.27082K wps
[Epoch 102 Batch 60/62] avg loss 0.0019952, throughput 3.22466K wps
Begin Testing...
[Epoch 102] train avg loss 0.002083, dev acc 0.8466, dev avg loss 0.39249, throughput 3.25337K wps
[Epoch 103 Batch 30/62] avg loss 0.00200842, throughput 3.29248K wps
[Epoch 103 Batch 60/62] avg loss 0.00205018, throughput 3.22181K wps
Begin Testing...
[Epoch 103] train avg loss 0.00203013, dev acc 0.8437, dev avg loss 0.388835, throughput 3.26247K wps
[Epoch 104 Batch 30/62] avg loss 0.0020067, throughput 3.28145K wps
[Epoch 104 Batch 60/62] avg loss 0.00189851, throughput 3.22149K wps
Begin Testing...
[Epoch 104] train avg loss 0.00197464, dev acc 0.8525, dev avg loss 0.390005, throughput 3.25658K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/62] avg loss 0.00199213, throughput 3.2788K wps
[Epoch 105 Batch 60/62] avg loss 0.0020008, throughput 3.1782K wps
Begin Testing...
[Epoch 105] train avg loss 0.00205958, dev acc 0.8437, dev avg loss 0.389645, throughput 3.23448K wps
[Epoch 106 Batch 30/62] avg loss 0.00187829, throughput 3.29454K wps
[Epoch 106 Batch 60/62] avg loss 0.00190597, throughput 3.19596K wps
Begin Testing...
[Epoch 106] train avg loss 0.00190647, dev acc 0.8407, dev avg loss 0.390076, throughput 3.24988K wps
[Epoch 107 Batch 30/62] avg loss 0.00185693, throughput 3.31479K wps
[Epoch 107 Batch 60/62] avg loss 0.00189857, throughput 3.23387K wps
Begin Testing...
[Epoch 107] train avg loss 0.00191293, dev acc 0.8407, dev avg loss 0.390179, throughput 3.28082K wps
[Epoch 108 Batch 30/62] avg loss 0.00174523, throughput 3.309K wps
[Epoch 108 Batch 60/62] avg loss 0.00195723, throughput 3.22334K wps
Begin Testing...
[Epoch 108] train avg loss 0.00187826, dev acc 0.8407, dev avg loss 0.391125, throughput 3.27192K wps
[Epoch 109 Batch 30/62] avg loss 0.00175578, throughput 3.28018K wps
[Epoch 109 Batch 60/62] avg loss 0.0018422, throughput 3.21034K wps
Begin Testing...
[Epoch 109] train avg loss 0.00181805, dev acc 0.8555, dev avg loss 0.393443, throughput 3.25118K wps
Observed Improvement.
Begin Testing...
[Epoch 110 Batch 30/62] avg loss 0.00169076, throughput 3.29496K wps
[Epoch 110 Batch 60/62] avg loss 0.0018451, throughput 3.19842K wps
Begin Testing...
[Epoch 110] train avg loss 0.00181804, dev acc 0.8378, dev avg loss 0.392322, throughput 3.25407K wps
[Epoch 111 Batch 30/62] avg loss 0.00184038, throughput 3.29625K wps
[Epoch 111 Batch 60/62] avg loss 0.00178821, throughput 3.18867K wps
Begin Testing...
[Epoch 111] train avg loss 0.001864, dev acc 0.8437, dev avg loss 0.391509, throughput 3.24997K wps
[Epoch 112 Batch 30/62] avg loss 0.00172273, throughput 3.29424K wps
[Epoch 112 Batch 60/62] avg loss 0.00172912, throughput 3.18377K wps
Begin Testing...
[Epoch 112] train avg loss 0.00175813, dev acc 0.8525, dev avg loss 0.395375, throughput 3.24298K wps
[Epoch 113 Batch 30/62] avg loss 0.00162077, throughput 3.26473K wps
[Epoch 113 Batch 60/62] avg loss 0.00189236, throughput 3.21073K wps
Begin Testing...
[Epoch 113] train avg loss 0.00180109, dev acc 0.8437, dev avg loss 0.393, throughput 3.24343K wps
[Epoch 114 Batch 30/62] avg loss 0.00166752, throughput 3.2699K wps
[Epoch 114 Batch 60/62] avg loss 0.00175969, throughput 3.21173K wps
Begin Testing...
[Epoch 114] train avg loss 0.00172691, dev acc 0.8525, dev avg loss 0.395342, throughput 3.24659K wps
[Epoch 115 Batch 30/62] avg loss 0.00165015, throughput 3.29846K wps
[Epoch 115 Batch 60/62] avg loss 0.00162462, throughput 3.2352K wps
Begin Testing...
[Epoch 115] train avg loss 0.00165205, dev acc 0.8466, dev avg loss 0.393891, throughput 3.27301K wps
[Epoch 116 Batch 30/62] avg loss 0.00156695, throughput 3.29889K wps
[Epoch 116 Batch 60/62] avg loss 0.00171269, throughput 3.23348K wps
Begin Testing...
[Epoch 116] train avg loss 0.00166122, dev acc 0.8437, dev avg loss 0.394668, throughput 3.27234K wps
[Epoch 117 Batch 30/62] avg loss 0.00156918, throughput 3.31637K wps
[Epoch 117 Batch 60/62] avg loss 0.00166263, throughput 3.23099K wps
Begin Testing...
[Epoch 117] train avg loss 0.00162889, dev acc 0.8466, dev avg loss 0.396185, throughput 3.28021K wps
[Epoch 118 Batch 30/62] avg loss 0.00166891, throughput 3.26902K wps
[Epoch 118 Batch 60/62] avg loss 0.00160721, throughput 3.20859K wps
Begin Testing...
[Epoch 118] train avg loss 0.00163918, dev acc 0.8466, dev avg loss 0.398858, throughput 3.24596K wps
[Epoch 119 Batch 30/62] avg loss 0.00157578, throughput 3.26775K wps
[Epoch 119 Batch 60/62] avg loss 0.00150904, throughput 3.15171K wps
Begin Testing...
[Epoch 119] train avg loss 0.00158108, dev acc 0.8466, dev avg loss 0.399138, throughput 3.21564K wps
[Epoch 120 Batch 30/62] avg loss 0.00143529, throughput 3.25264K wps
[Epoch 120 Batch 60/62] avg loss 0.00157817, throughput 3.15017K wps
Begin Testing...
[Epoch 120] train avg loss 0.00151118, dev acc 0.8407, dev avg loss 0.39601, throughput 3.20674K wps
[Epoch 121 Batch 30/62] avg loss 0.00147313, throughput 3.25485K wps
[Epoch 121 Batch 60/62] avg loss 0.00150156, throughput 3.15262K wps
Begin Testing...
[Epoch 121] train avg loss 0.00153048, dev acc 0.8437, dev avg loss 0.400966, throughput 3.20918K wps
[Epoch 122 Batch 30/62] avg loss 0.00145238, throughput 3.24992K wps
[Epoch 122 Batch 60/62] avg loss 0.0015233, throughput 3.15663K wps
Begin Testing...
[Epoch 122] train avg loss 0.00150055, dev acc 0.8496, dev avg loss 0.398548, throughput 3.20964K wps
[Epoch 123 Batch 30/62] avg loss 0.00153649, throughput 3.26206K wps
[Epoch 123 Batch 60/62] avg loss 0.00144486, throughput 3.20947K wps
Begin Testing...
[Epoch 123] train avg loss 0.00152311, dev acc 0.8496, dev avg loss 0.397735, throughput 3.24192K wps
[Epoch 124 Batch 30/62] avg loss 0.00146153, throughput 3.28926K wps
[Epoch 124 Batch 60/62] avg loss 0.00137095, throughput 3.22121K wps
Begin Testing...
[Epoch 124] train avg loss 0.00146532, dev acc 0.8407, dev avg loss 0.396849, throughput 3.26132K wps
[Epoch 125 Batch 30/62] avg loss 0.001439, throughput 3.27489K wps
[Epoch 125 Batch 60/62] avg loss 0.00146776, throughput 3.19366K wps
Begin Testing...
[Epoch 125] train avg loss 0.00147532, dev acc 0.8437, dev avg loss 0.396767, throughput 3.24169K wps
[Epoch 126 Batch 30/62] avg loss 0.00146663, throughput 3.28177K wps
[Epoch 126 Batch 60/62] avg loss 0.00132044, throughput 3.1568K wps
Begin Testing...
[Epoch 126] train avg loss 0.0014118, dev acc 0.8496, dev avg loss 0.399621, throughput 3.22407K wps
[Epoch 127 Batch 30/62] avg loss 0.00137669, throughput 3.23611K wps
[Epoch 127 Batch 60/62] avg loss 0.00138536, throughput 3.15519K wps
Begin Testing...
[Epoch 127] train avg loss 0.00141245, dev acc 0.8496, dev avg loss 0.398625, throughput 3.20103K wps
[Epoch 128 Batch 30/62] avg loss 0.00141957, throughput 3.26182K wps
[Epoch 128 Batch 60/62] avg loss 0.00142737, throughput 3.1661K wps
Begin Testing...
[Epoch 128] train avg loss 0.00148748, dev acc 0.8319, dev avg loss 0.398191, throughput 3.22134K wps
[Epoch 129 Batch 30/62] avg loss 0.00136145, throughput 3.2844K wps
[Epoch 129 Batch 60/62] avg loss 0.00127108, throughput 3.19551K wps
Begin Testing...
[Epoch 129] train avg loss 0.00133248, dev acc 0.8437, dev avg loss 0.398243, throughput 3.24635K wps
[Epoch 130 Batch 30/62] avg loss 0.00131541, throughput 3.25239K wps
[Epoch 130 Batch 60/62] avg loss 0.00128013, throughput 3.16022K wps
Begin Testing...
[Epoch 130] train avg loss 0.0013367, dev acc 0.8407, dev avg loss 0.398031, throughput 3.21253K wps
[Epoch 131 Batch 30/62] avg loss 0.00147907, throughput 3.24936K wps
[Epoch 131 Batch 60/62] avg loss 0.00130702, throughput 3.21129K wps
Begin Testing...
[Epoch 131] train avg loss 0.00142462, dev acc 0.8378, dev avg loss 0.398569, throughput 3.2362K wps
[Epoch 132 Batch 30/62] avg loss 0.00145317, throughput 3.26555K wps
[Epoch 132 Batch 60/62] avg loss 0.00117774, throughput 3.16072K wps
Begin Testing...
[Epoch 132] train avg loss 0.00137631, dev acc 0.8378, dev avg loss 0.41748, throughput 3.21897K wps
[Epoch 133 Batch 30/62] avg loss 0.00127893, throughput 3.23708K wps
[Epoch 133 Batch 60/62] avg loss 0.00141459, throughput 3.15392K wps
Begin Testing...
[Epoch 133] train avg loss 0.0013522, dev acc 0.8437, dev avg loss 0.399488, throughput 3.20165K wps
[Epoch 134 Batch 30/62] avg loss 0.00130058, throughput 3.23931K wps
[Epoch 134 Batch 60/62] avg loss 0.00117367, throughput 3.13912K wps
Begin Testing...
[Epoch 134] train avg loss 0.00123426, dev acc 0.8496, dev avg loss 0.401637, throughput 3.19605K wps
[Epoch 135 Batch 30/62] avg loss 0.001158, throughput 3.2345K wps
[Epoch 135 Batch 60/62] avg loss 0.00123383, throughput 3.16245K wps
Begin Testing...
[Epoch 135] train avg loss 0.00120515, dev acc 0.8496, dev avg loss 0.403589, throughput 3.20434K wps
[Epoch 136 Batch 30/62] avg loss 0.00126212, throughput 3.26366K wps
[Epoch 136 Batch 60/62] avg loss 0.00118602, throughput 3.14379K wps
Begin Testing...
[Epoch 136] train avg loss 0.00131031, dev acc 0.8496, dev avg loss 0.405487, throughput 3.20872K wps
[Epoch 137 Batch 30/62] avg loss 0.00116614, throughput 3.24979K wps
[Epoch 137 Batch 60/62] avg loss 0.00130914, throughput 3.16653K wps
Begin Testing...
[Epoch 137] train avg loss 0.00123935, dev acc 0.8437, dev avg loss 0.400147, throughput 3.21577K wps
[Epoch 138 Batch 30/62] avg loss 0.00114876, throughput 3.24703K wps
[Epoch 138 Batch 60/62] avg loss 0.00122893, throughput 3.19313K wps
Begin Testing...
[Epoch 138] train avg loss 0.00120131, dev acc 0.8437, dev avg loss 0.406947, throughput 3.22604K wps
[Epoch 139 Batch 30/62] avg loss 0.0011528, throughput 3.29324K wps
[Epoch 139 Batch 60/62] avg loss 0.0011626, throughput 3.15172K wps
Begin Testing...
[Epoch 139] train avg loss 0.00117297, dev acc 0.8466, dev avg loss 0.401623, throughput 3.22666K wps
[Epoch 140 Batch 30/62] avg loss 0.00121011, throughput 3.23324K wps
[Epoch 140 Batch 60/62] avg loss 0.00115775, throughput 3.18156K wps
Begin Testing...
[Epoch 140] train avg loss 0.00120417, dev acc 0.8378, dev avg loss 0.401717, throughput 3.21581K wps
[Epoch 141 Batch 30/62] avg loss 0.0011073, throughput 3.2744K wps
[Epoch 141 Batch 60/62] avg loss 0.00117166, throughput 3.18508K wps
Begin Testing...
[Epoch 141] train avg loss 0.00114364, dev acc 0.8437, dev avg loss 0.402345, throughput 3.23459K wps
[Epoch 142 Batch 30/62] avg loss 0.00122524, throughput 3.24093K wps
[Epoch 142 Batch 60/62] avg loss 0.0010482, throughput 3.16989K wps
Begin Testing...
[Epoch 142] train avg loss 0.00114104, dev acc 0.8407, dev avg loss 0.401988, throughput 3.2114K wps
[Epoch 143 Batch 30/62] avg loss 0.00109619, throughput 3.23056K wps
[Epoch 143 Batch 60/62] avg loss 0.00106824, throughput 3.16917K wps
Begin Testing...
[Epoch 143] train avg loss 0.00110914, dev acc 0.8437, dev avg loss 0.402415, throughput 3.20723K wps
[Epoch 144 Batch 30/62] avg loss 0.0011102, throughput 3.25516K wps
[Epoch 144 Batch 60/62] avg loss 0.00107301, throughput 3.18672K wps
Begin Testing...
[Epoch 144] train avg loss 0.00111035, dev acc 0.8466, dev avg loss 0.407574, throughput 3.22758K wps
[Epoch 145 Batch 30/62] avg loss 0.00107743, throughput 3.2741K wps
[Epoch 145 Batch 60/62] avg loss 0.00110053, throughput 3.18113K wps
Begin Testing...
[Epoch 145] train avg loss 0.00110291, dev acc 0.8437, dev avg loss 0.404043, throughput 3.23428K wps
[Epoch 146 Batch 30/62] avg loss 0.0011441, throughput 3.24367K wps
[Epoch 146 Batch 60/62] avg loss 0.00103995, throughput 3.18511K wps
Begin Testing...
[Epoch 146] train avg loss 0.00110344, dev acc 0.8407, dev avg loss 0.404134, throughput 3.21965K wps
[Epoch 147 Batch 30/62] avg loss 0.00098683, throughput 3.26594K wps
[Epoch 147 Batch 60/62] avg loss 0.00108643, throughput 3.15675K wps
Begin Testing...
[Epoch 147] train avg loss 0.00104901, dev acc 0.8466, dev avg loss 0.4055, throughput 3.21648K wps
[Epoch 148 Batch 30/62] avg loss 0.00112259, throughput 3.25421K wps
[Epoch 148 Batch 60/62] avg loss 0.00105626, throughput 3.15004K wps
Begin Testing...
[Epoch 148] train avg loss 0.00110405, dev acc 0.8496, dev avg loss 0.409642, throughput 3.2073K wps
[Epoch 149 Batch 30/62] avg loss 0.000989519, throughput 3.24871K wps
[Epoch 149 Batch 60/62] avg loss 0.00107194, throughput 3.15076K wps
Begin Testing...
[Epoch 149] train avg loss 0.00103309, dev acc 0.8437, dev avg loss 0.409576, throughput 3.20555K wps
[Epoch 150 Batch 30/62] avg loss 0.000917148, throughput 3.2596K wps
[Epoch 150 Batch 60/62] avg loss 0.00104164, throughput 3.16694K wps
Begin Testing...
[Epoch 150] train avg loss 0.000985006, dev acc 0.8407, dev avg loss 0.407092, throughput 3.21986K wps
[Epoch 151 Batch 30/62] avg loss 0.00102093, throughput 3.25806K wps
[Epoch 151 Batch 60/62] avg loss 0.0010371, throughput 3.19089K wps
Begin Testing...
[Epoch 151] train avg loss 0.00105247, dev acc 0.8466, dev avg loss 0.413291, throughput 3.23024K wps
[Epoch 152 Batch 30/62] avg loss 0.00104704, throughput 3.25952K wps
[Epoch 152 Batch 60/62] avg loss 0.000967719, throughput 3.16422K wps
Begin Testing...
[Epoch 152] train avg loss 0.00102265, dev acc 0.8407, dev avg loss 0.406275, throughput 3.21705K wps
[Epoch 153 Batch 30/62] avg loss 0.000955485, throughput 3.24862K wps
[Epoch 153 Batch 60/62] avg loss 0.00103006, throughput 3.13185K wps
Begin Testing...
[Epoch 153] train avg loss 0.00100455, dev acc 0.8378, dev avg loss 0.406671, throughput 3.1951K wps
[Epoch 154 Batch 30/62] avg loss 0.000836776, throughput 3.24998K wps
[Epoch 154 Batch 60/62] avg loss 0.00100767, throughput 3.13571K wps
Begin Testing...
[Epoch 154] train avg loss 0.000935857, dev acc 0.8378, dev avg loss 0.407009, throughput 3.19422K wps
[Epoch 155 Batch 30/62] avg loss 0.000885451, throughput 3.2654K wps
[Epoch 155 Batch 60/62] avg loss 0.000964668, throughput 3.14981K wps
Begin Testing...
[Epoch 155] train avg loss 0.000924592, dev acc 0.8378, dev avg loss 0.408405, throughput 3.21263K wps
[Epoch 156 Batch 30/62] avg loss 0.00087178, throughput 3.24694K wps
[Epoch 156 Batch 60/62] avg loss 0.000953181, throughput 3.15303K wps
Begin Testing...
[Epoch 156] train avg loss 0.000921692, dev acc 0.8378, dev avg loss 0.408626, throughput 3.20456K wps
[Epoch 157 Batch 30/62] avg loss 0.000950652, throughput 3.27553K wps
[Epoch 157 Batch 60/62] avg loss 0.000978885, throughput 3.16088K wps
Begin Testing...
[Epoch 157] train avg loss 0.000988944, dev acc 0.8407, dev avg loss 0.409322, throughput 3.22324K wps
[Epoch 158 Batch 30/62] avg loss 0.000960652, throughput 3.23562K wps
[Epoch 158 Batch 60/62] avg loss 0.000923171, throughput 3.16347K wps
Begin Testing...
[Epoch 158] train avg loss 0.000952645, dev acc 0.8407, dev avg loss 0.409051, throughput 3.2059K wps
[Epoch 159 Batch 30/62] avg loss 0.000875838, throughput 3.25856K wps
[Epoch 159 Batch 60/62] avg loss 0.000920967, throughput 3.18236K wps
Begin Testing...
[Epoch 159] train avg loss 0.000908508, dev acc 0.8378, dev avg loss 0.408372, throughput 3.22629K wps
[Epoch 160 Batch 30/62] avg loss 0.000900512, throughput 3.27347K wps
[Epoch 160 Batch 60/62] avg loss 0.000901154, throughput 3.16393K wps
Begin Testing...
[Epoch 160] train avg loss 0.000914601, dev acc 0.8437, dev avg loss 0.413876, throughput 3.22348K wps
[Epoch 161 Batch 30/62] avg loss 0.00081682, throughput 3.24988K wps
[Epoch 161 Batch 60/62] avg loss 0.000921989, throughput 3.14469K wps
Begin Testing...
[Epoch 161] train avg loss 0.000894754, dev acc 0.8378, dev avg loss 0.410974, throughput 3.20317K wps
[Epoch 162 Batch 30/62] avg loss 0.000941156, throughput 3.24553K wps
[Epoch 162 Batch 60/62] avg loss 0.000802051, throughput 3.14844K wps
Begin Testing...
[Epoch 162] train avg loss 0.000886813, dev acc 0.8378, dev avg loss 0.411818, throughput 3.20274K wps
[Epoch 163 Batch 30/62] avg loss 0.000819615, throughput 3.25889K wps
[Epoch 163 Batch 60/62] avg loss 0.000866993, throughput 3.13349K wps
Begin Testing...
[Epoch 163] train avg loss 0.000886667, dev acc 0.8496, dev avg loss 0.417006, throughput 3.20103K wps
[Epoch 164 Batch 30/62] avg loss 0.000928237, throughput 3.25729K wps
[Epoch 164 Batch 60/62] avg loss 0.000897526, throughput 3.15221K wps
Begin Testing...
[Epoch 164] train avg loss 0.000909757, dev acc 0.8466, dev avg loss 0.413948, throughput 3.21002K wps
[Epoch 165 Batch 30/62] avg loss 0.000867886, throughput 3.24834K wps
[Epoch 165 Batch 60/62] avg loss 0.000871119, throughput 3.16348K wps
Begin Testing...
[Epoch 165] train avg loss 0.000872623, dev acc 0.8378, dev avg loss 0.412925, throughput 3.2114K wps
[Epoch 166 Batch 30/62] avg loss 0.000913893, throughput 3.24569K wps
[Epoch 166 Batch 60/62] avg loss 0.000869908, throughput 3.16556K wps
Begin Testing...
[Epoch 166] train avg loss 0.000896341, dev acc 0.8407, dev avg loss 0.413815, throughput 3.21167K wps
[Epoch 167 Batch 30/62] avg loss 0.000937027, throughput 3.21996K wps
[Epoch 167 Batch 60/62] avg loss 0.00086624, throughput 3.16658K wps
Begin Testing...
[Epoch 167] train avg loss 0.000903147, dev acc 0.8437, dev avg loss 0.415977, throughput 3.19907K wps
[Epoch 168 Batch 30/62] avg loss 0.00081504, throughput 3.2317K wps
[Epoch 168 Batch 60/62] avg loss 0.000825115, throughput 3.15748K wps
Begin Testing...
[Epoch 168] train avg loss 0.000816721, dev acc 0.8437, dev avg loss 0.414828, throughput 3.20054K wps
[Epoch 169 Batch 30/62] avg loss 0.000791054, throughput 3.23152K wps
[Epoch 169 Batch 60/62] avg loss 0.000879886, throughput 3.16558K wps
Begin Testing...
[Epoch 169] train avg loss 0.000838755, dev acc 0.8378, dev avg loss 0.414819, throughput 3.20455K wps
[Epoch 170 Batch 30/62] avg loss 0.000781487, throughput 3.24735K wps
[Epoch 170 Batch 60/62] avg loss 0.000873738, throughput 3.16372K wps
Begin Testing...
[Epoch 170] train avg loss 0.000830254, dev acc 0.8437, dev avg loss 0.415007, throughput 3.21206K wps
[Epoch 171 Batch 30/62] avg loss 0.000802799, throughput 3.22428K wps
[Epoch 171 Batch 60/62] avg loss 0.000815636, throughput 3.14616K wps
Begin Testing...
[Epoch 171] train avg loss 0.000824586, dev acc 0.8437, dev avg loss 0.415976, throughput 3.19095K wps
[Epoch 172 Batch 30/62] avg loss 0.000783063, throughput 3.23417K wps
[Epoch 172 Batch 60/62] avg loss 0.000767788, throughput 3.15906K wps
Begin Testing...
[Epoch 172] train avg loss 0.000786149, dev acc 0.8437, dev avg loss 0.416616, throughput 3.2029K wps
[Epoch 173 Batch 30/62] avg loss 0.000788148, throughput 3.24589K wps
[Epoch 173 Batch 60/62] avg loss 0.0008021, throughput 3.16691K wps
Begin Testing...
[Epoch 173] train avg loss 0.000803167, dev acc 0.8466, dev avg loss 0.423611, throughput 3.21237K wps
[Epoch 174 Batch 30/62] avg loss 0.00074542, throughput 3.22868K wps
[Epoch 174 Batch 60/62] avg loss 0.00078851, throughput 3.15956K wps
Begin Testing...
[Epoch 174] train avg loss 0.00076482, dev acc 0.8466, dev avg loss 0.419527, throughput 3.20162K wps
[Epoch 175 Batch 30/62] avg loss 0.00080713, throughput 3.24349K wps
[Epoch 175 Batch 60/62] avg loss 0.000775451, throughput 3.19714K wps
Begin Testing...
[Epoch 175] train avg loss 0.000794876, dev acc 0.8407, dev avg loss 0.416965, throughput 3.2257K wps
[Epoch 176 Batch 30/62] avg loss 0.000758438, throughput 3.25468K wps
[Epoch 176 Batch 60/62] avg loss 0.000798023, throughput 3.16705K wps
Begin Testing...
[Epoch 176] train avg loss 0.000785955, dev acc 0.8437, dev avg loss 0.42017, throughput 3.21597K wps
[Epoch 177 Batch 30/62] avg loss 0.000696878, throughput 3.25449K wps
[Epoch 177 Batch 60/62] avg loss 0.000789322, throughput 3.16129K wps
Begin Testing...
[Epoch 177] train avg loss 0.000776826, dev acc 0.8466, dev avg loss 0.4172, throughput 3.21371K wps
[Epoch 178 Batch 30/62] avg loss 0.000754291, throughput 3.24968K wps
[Epoch 178 Batch 60/62] avg loss 0.000699679, throughput 3.15357K wps
Begin Testing...
[Epoch 178] train avg loss 0.000727838, dev acc 0.8466, dev avg loss 0.419713, throughput 3.20714K wps
[Epoch 179 Batch 30/62] avg loss 0.000775183, throughput 3.24934K wps
[Epoch 179 Batch 60/62] avg loss 0.000739611, throughput 3.15185K wps
Begin Testing...
[Epoch 179] train avg loss 0.000770162, dev acc 0.8437, dev avg loss 0.419011, throughput 3.20564K wps
[Epoch 180 Batch 30/62] avg loss 0.000819214, throughput 3.23868K wps
[Epoch 180 Batch 60/62] avg loss 0.000665515, throughput 3.1493K wps
Begin Testing...
[Epoch 180] train avg loss 0.000753941, dev acc 0.8407, dev avg loss 0.418744, throughput 3.19894K wps
[Epoch 181 Batch 30/62] avg loss 0.000670374, throughput 3.24977K wps
[Epoch 181 Batch 60/62] avg loss 0.000801806, throughput 3.15296K wps
Begin Testing...
[Epoch 181] train avg loss 0.000738884, dev acc 0.8407, dev avg loss 0.41933, throughput 3.20529K wps
[Epoch 182 Batch 30/62] avg loss 0.000640986, throughput 3.25459K wps
[Epoch 182 Batch 60/62] avg loss 0.00075251, throughput 3.15949K wps
Begin Testing...
[Epoch 182] train avg loss 0.00069369, dev acc 0.8466, dev avg loss 0.421982, throughput 3.21269K wps
[Epoch 183 Batch 30/62] avg loss 0.000817545, throughput 3.25483K wps
[Epoch 183 Batch 60/62] avg loss 0.000731832, throughput 3.16371K wps
Begin Testing...
[Epoch 183] train avg loss 0.000777945, dev acc 0.8407, dev avg loss 0.421007, throughput 3.21695K wps
[Epoch 184 Batch 30/62] avg loss 0.000745812, throughput 3.23429K wps
[Epoch 184 Batch 60/62] avg loss 0.000758532, throughput 3.19677K wps
Begin Testing...
[Epoch 184] train avg loss 0.000754565, dev acc 0.8437, dev avg loss 0.423129, throughput 3.22255K wps
[Epoch 185 Batch 30/62] avg loss 0.000781601, throughput 3.26775K wps
[Epoch 185 Batch 60/62] avg loss 0.000662684, throughput 3.16472K wps
Begin Testing...
[Epoch 185] train avg loss 0.000730471, dev acc 0.8437, dev avg loss 0.422122, throughput 3.22158K wps
[Epoch 186 Batch 30/62] avg loss 0.000764231, throughput 3.25969K wps
[Epoch 186 Batch 60/62] avg loss 0.000725359, throughput 3.17502K wps
Begin Testing...
[Epoch 186] train avg loss 0.000745411, dev acc 0.8466, dev avg loss 0.423248, throughput 3.22482K wps
[Epoch 187 Batch 30/62] avg loss 0.000656999, throughput 3.27893K wps
[Epoch 187 Batch 60/62] avg loss 0.000786882, throughput 3.14309K wps
Begin Testing...
[Epoch 187] train avg loss 0.000723593, dev acc 0.8466, dev avg loss 0.423615, throughput 3.21681K wps
[Epoch 188 Batch 30/62] avg loss 0.000680193, throughput 3.2625K wps
[Epoch 188 Batch 60/62] avg loss 0.000650376, throughput 3.15546K wps
Begin Testing...
[Epoch 188] train avg loss 0.000680365, dev acc 0.8407, dev avg loss 0.422953, throughput 3.21327K wps
[Epoch 189 Batch 30/62] avg loss 0.000678876, throughput 3.26007K wps
[Epoch 189 Batch 60/62] avg loss 0.000693678, throughput 3.1385K wps
Begin Testing...
[Epoch 189] train avg loss 0.000728686, dev acc 0.8407, dev avg loss 0.430882, throughput 3.20432K wps
[Epoch 190 Batch 30/62] avg loss 0.000719715, throughput 3.24675K wps
[Epoch 190 Batch 60/62] avg loss 0.000650824, throughput 3.17041K wps
Begin Testing...
[Epoch 190] train avg loss 0.00068864, dev acc 0.8437, dev avg loss 0.423589, throughput 3.21395K wps
[Epoch 191 Batch 30/62] avg loss 0.000665074, throughput 3.24358K wps
[Epoch 191 Batch 60/62] avg loss 0.000667189, throughput 3.14327K wps
Begin Testing...
[Epoch 191] train avg loss 0.000666489, dev acc 0.8466, dev avg loss 0.425574, throughput 3.19886K wps
[Epoch 192 Batch 30/62] avg loss 0.000607768, throughput 3.23692K wps
[Epoch 192 Batch 60/62] avg loss 0.000663213, throughput 3.167K wps
Begin Testing...
[Epoch 192] train avg loss 0.000640783, dev acc 0.8437, dev avg loss 0.427018, throughput 3.20836K wps
[Epoch 193 Batch 30/62] avg loss 0.000641234, throughput 3.25018K wps
[Epoch 193 Batch 60/62] avg loss 0.000735396, throughput 3.19331K wps
Begin Testing...
[Epoch 193] train avg loss 0.000686551, dev acc 0.8437, dev avg loss 0.425355, throughput 3.22816K wps
[Epoch 194 Batch 30/62] avg loss 0.000602857, throughput 3.27102K wps
[Epoch 194 Batch 60/62] avg loss 0.000681736, throughput 3.15736K wps
Begin Testing...
[Epoch 194] train avg loss 0.000638873, dev acc 0.8466, dev avg loss 0.427264, throughput 3.21752K wps
[Epoch 195 Batch 30/62] avg loss 0.000690048, throughput 3.25811K wps
[Epoch 195 Batch 60/62] avg loss 0.000633498, throughput 3.13181K wps
Begin Testing...
[Epoch 195] train avg loss 0.000668724, dev acc 0.8407, dev avg loss 0.425668, throughput 3.20223K wps
[Epoch 196 Batch 30/62] avg loss 0.000633123, throughput 3.24878K wps
[Epoch 196 Batch 60/62] avg loss 0.000699721, throughput 3.15622K wps
Begin Testing...
[Epoch 196] train avg loss 0.000699741, dev acc 0.8466, dev avg loss 0.427989, throughput 3.20812K wps
[Epoch 197 Batch 30/62] avg loss 0.000622794, throughput 3.25063K wps
[Epoch 197 Batch 60/62] avg loss 0.000614688, throughput 3.15786K wps
Begin Testing...
[Epoch 197] train avg loss 0.000626001, dev acc 0.8466, dev avg loss 0.428052, throughput 3.20909K wps
[Epoch 198 Batch 30/62] avg loss 0.000616123, throughput 3.24976K wps
[Epoch 198 Batch 60/62] avg loss 0.000667874, throughput 3.15063K wps
Begin Testing...
[Epoch 198] train avg loss 0.000654691, dev acc 0.8466, dev avg loss 0.430411, throughput 3.20528K wps
[Epoch 199 Batch 30/62] avg loss 0.000640891, throughput 3.2579K wps
[Epoch 199 Batch 60/62] avg loss 0.00067586, throughput 3.20935K wps
Begin Testing...
[Epoch 199] train avg loss 0.00067198, dev acc 0.8437, dev avg loss 0.429373, throughput 3.24112K wps
Test loss 0.314019, test acc 0.8621
Total time cost 460.29s
[Epoch 0 Batch 30/62] avg loss 0.0135574, throughput 3.07866K wps
[Epoch 0 Batch 60/62] avg loss 0.012979, throughput 3.18491K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133952, dev acc 0.6519, dev avg loss 0.641543, throughput 3.13922K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130709, throughput 3.24171K wps
[Epoch 1 Batch 60/62] avg loss 0.0128744, throughput 3.16558K wps
Begin Testing...
[Epoch 1] train avg loss 0.0131227, dev acc 0.6519, dev avg loss 0.631503, throughput 3.20907K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0128439, throughput 3.27923K wps
[Epoch 2 Batch 60/62] avg loss 0.0128639, throughput 3.13792K wps
Begin Testing...
[Epoch 2] train avg loss 0.0130122, dev acc 0.6519, dev avg loss 0.625119, throughput 3.21116K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0127279, throughput 3.28489K wps
[Epoch 3 Batch 60/62] avg loss 0.0125027, throughput 3.1594K wps
Begin Testing...
[Epoch 3] train avg loss 0.0127523, dev acc 0.6519, dev avg loss 0.616692, throughput 3.22643K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0126343, throughput 3.22785K wps
[Epoch 4 Batch 60/62] avg loss 0.0121908, throughput 3.19766K wps
Begin Testing...
[Epoch 4] train avg loss 0.0126011, dev acc 0.6519, dev avg loss 0.609618, throughput 3.2194K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.012367, throughput 3.28611K wps
[Epoch 5 Batch 60/62] avg loss 0.0121978, throughput 3.17065K wps
Begin Testing...
[Epoch 5] train avg loss 0.0124231, dev acc 0.6519, dev avg loss 0.601991, throughput 3.23314K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0121676, throughput 3.27088K wps
[Epoch 6 Batch 60/62] avg loss 0.01198, throughput 3.16811K wps
Begin Testing...
[Epoch 6] train avg loss 0.0122455, dev acc 0.6519, dev avg loss 0.594572, throughput 3.22396K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0119005, throughput 3.25237K wps
[Epoch 7 Batch 60/62] avg loss 0.0119368, throughput 3.1731K wps
Begin Testing...
[Epoch 7] train avg loss 0.0120854, dev acc 0.6549, dev avg loss 0.587292, throughput 3.21922K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0116713, throughput 3.24862K wps
[Epoch 8 Batch 60/62] avg loss 0.011845, throughput 3.15592K wps
Begin Testing...
[Epoch 8] train avg loss 0.0118977, dev acc 0.6519, dev avg loss 0.578652, throughput 3.20737K wps
[Epoch 9 Batch 30/62] avg loss 0.0115793, throughput 3.23409K wps
[Epoch 9 Batch 60/62] avg loss 0.0116747, throughput 3.15045K wps
Begin Testing...
[Epoch 9] train avg loss 0.0117178, dev acc 0.6667, dev avg loss 0.570167, throughput 3.19645K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0115629, throughput 3.2725K wps
[Epoch 10 Batch 60/62] avg loss 0.0111805, throughput 3.16966K wps
Begin Testing...
[Epoch 10] train avg loss 0.0114645, dev acc 0.6519, dev avg loss 0.564529, throughput 3.22574K wps
[Epoch 11 Batch 30/62] avg loss 0.011223, throughput 3.25131K wps
[Epoch 11 Batch 60/62] avg loss 0.0111048, throughput 3.14373K wps
Begin Testing...
[Epoch 11] train avg loss 0.0112874, dev acc 0.6726, dev avg loss 0.553088, throughput 3.20335K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0113727, throughput 3.25312K wps
[Epoch 12 Batch 60/62] avg loss 0.0106706, throughput 3.17616K wps
Begin Testing...
[Epoch 12] train avg loss 0.0111691, dev acc 0.6962, dev avg loss 0.542883, throughput 3.21861K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0107939, throughput 3.2613K wps
[Epoch 13 Batch 60/62] avg loss 0.0108432, throughput 3.17437K wps
Begin Testing...
[Epoch 13] train avg loss 0.0109506, dev acc 0.7080, dev avg loss 0.533148, throughput 3.22344K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0105937, throughput 3.26849K wps
[Epoch 14 Batch 60/62] avg loss 0.0106055, throughput 3.16112K wps
Begin Testing...
[Epoch 14] train avg loss 0.010696, dev acc 0.7139, dev avg loss 0.523547, throughput 3.21994K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0104243, throughput 3.28289K wps
[Epoch 15 Batch 60/62] avg loss 0.0103302, throughput 3.15614K wps
Begin Testing...
[Epoch 15] train avg loss 0.0105615, dev acc 0.7493, dev avg loss 0.514658, throughput 3.22384K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0103354, throughput 3.25636K wps
[Epoch 16 Batch 60/62] avg loss 0.0101576, throughput 3.15248K wps
Begin Testing...
[Epoch 16] train avg loss 0.0103702, dev acc 0.7404, dev avg loss 0.504324, throughput 3.20964K wps
[Epoch 17 Batch 30/62] avg loss 0.00986336, throughput 3.25189K wps
[Epoch 17 Batch 60/62] avg loss 0.0100071, throughput 3.14713K wps
Begin Testing...
[Epoch 17] train avg loss 0.0100605, dev acc 0.7463, dev avg loss 0.494493, throughput 3.20533K wps
[Epoch 18 Batch 30/62] avg loss 0.00974508, throughput 3.25095K wps
[Epoch 18 Batch 60/62] avg loss 0.0097594, throughput 3.16575K wps
Begin Testing...
[Epoch 18] train avg loss 0.00989522, dev acc 0.7434, dev avg loss 0.488128, throughput 3.2151K wps
[Epoch 19 Batch 30/62] avg loss 0.00952682, throughput 3.25136K wps
[Epoch 19 Batch 60/62] avg loss 0.00965787, throughput 3.15065K wps
Begin Testing...
[Epoch 19] train avg loss 0.00971442, dev acc 0.7876, dev avg loss 0.476342, throughput 3.20416K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.00941179, throughput 3.24575K wps
[Epoch 20 Batch 60/62] avg loss 0.00924097, throughput 3.15792K wps
Begin Testing...
[Epoch 20] train avg loss 0.00947755, dev acc 0.7434, dev avg loss 0.472154, throughput 3.20742K wps
[Epoch 21 Batch 30/62] avg loss 0.00931306, throughput 3.24868K wps
[Epoch 21 Batch 60/62] avg loss 0.0093113, throughput 3.16769K wps
Begin Testing...
[Epoch 21] train avg loss 0.00947986, dev acc 0.8083, dev avg loss 0.459827, throughput 3.21385K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.00892197, throughput 3.25363K wps
[Epoch 22 Batch 60/62] avg loss 0.00903774, throughput 3.17139K wps
Begin Testing...
[Epoch 22] train avg loss 0.0091496, dev acc 0.7758, dev avg loss 0.45287, throughput 3.22023K wps
[Epoch 23 Batch 30/62] avg loss 0.00906015, throughput 3.23046K wps
[Epoch 23 Batch 60/62] avg loss 0.00872183, throughput 3.1978K wps
Begin Testing...
[Epoch 23] train avg loss 0.00903507, dev acc 0.8142, dev avg loss 0.445254, throughput 3.21739K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.00867489, throughput 3.28672K wps
[Epoch 24 Batch 60/62] avg loss 0.00872264, throughput 3.16349K wps
Begin Testing...
[Epoch 24] train avg loss 0.00873972, dev acc 0.7640, dev avg loss 0.442282, throughput 3.22934K wps
[Epoch 25 Batch 30/62] avg loss 0.00868582, throughput 3.27094K wps
[Epoch 25 Batch 60/62] avg loss 0.00827204, throughput 3.16175K wps
Begin Testing...
[Epoch 25] train avg loss 0.00854924, dev acc 0.7965, dev avg loss 0.432367, throughput 3.22135K wps
[Epoch 26 Batch 30/62] avg loss 0.00814497, throughput 3.24028K wps
[Epoch 26 Batch 60/62] avg loss 0.00850115, throughput 3.15265K wps
Begin Testing...
[Epoch 26] train avg loss 0.00844233, dev acc 0.8024, dev avg loss 0.426936, throughput 3.2018K wps
[Epoch 27 Batch 30/62] avg loss 0.00810176, throughput 3.23982K wps
[Epoch 27 Batch 60/62] avg loss 0.00801792, throughput 3.15289K wps
Begin Testing...
[Epoch 27] train avg loss 0.00817964, dev acc 0.8112, dev avg loss 0.421161, throughput 3.20201K wps
[Epoch 28 Batch 30/62] avg loss 0.00807233, throughput 3.24566K wps
[Epoch 28 Batch 60/62] avg loss 0.00796654, throughput 3.15716K wps
Begin Testing...
[Epoch 28] train avg loss 0.00810722, dev acc 0.8260, dev avg loss 0.415569, throughput 3.2085K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00786674, throughput 3.26909K wps
[Epoch 29 Batch 60/62] avg loss 0.00787671, throughput 3.20308K wps
Begin Testing...
[Epoch 29] train avg loss 0.00797557, dev acc 0.8083, dev avg loss 0.411946, throughput 3.24296K wps
[Epoch 30 Batch 30/62] avg loss 0.00796476, throughput 3.27158K wps
[Epoch 30 Batch 60/62] avg loss 0.00756156, throughput 3.16573K wps
Begin Testing...
[Epoch 30] train avg loss 0.00788859, dev acc 0.8142, dev avg loss 0.406946, throughput 3.22383K wps
[Epoch 31 Batch 30/62] avg loss 0.00765778, throughput 3.23778K wps
[Epoch 31 Batch 60/62] avg loss 0.00755673, throughput 3.14496K wps
Begin Testing...
[Epoch 31] train avg loss 0.00768683, dev acc 0.7847, dev avg loss 0.410015, throughput 3.19672K wps
[Epoch 32 Batch 30/62] avg loss 0.00767625, throughput 3.24525K wps
[Epoch 32 Batch 60/62] avg loss 0.00747715, throughput 3.15585K wps
Begin Testing...
[Epoch 32] train avg loss 0.0077104, dev acc 0.8289, dev avg loss 0.398518, throughput 3.20629K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/62] avg loss 0.00715346, throughput 3.26697K wps
[Epoch 33 Batch 60/62] avg loss 0.00734539, throughput 3.15733K wps
Begin Testing...
[Epoch 33] train avg loss 0.00747873, dev acc 0.8319, dev avg loss 0.394641, throughput 3.2172K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00706986, throughput 3.27984K wps
[Epoch 34 Batch 60/62] avg loss 0.00709722, throughput 3.16028K wps
Begin Testing...
[Epoch 34] train avg loss 0.00725084, dev acc 0.8260, dev avg loss 0.391031, throughput 3.22478K wps
[Epoch 35 Batch 30/62] avg loss 0.00717212, throughput 3.24901K wps
[Epoch 35 Batch 60/62] avg loss 0.00696186, throughput 3.15867K wps
Begin Testing...
[Epoch 35] train avg loss 0.00716677, dev acc 0.8201, dev avg loss 0.38984, throughput 3.21019K wps
[Epoch 36 Batch 30/62] avg loss 0.00677139, throughput 3.22791K wps
[Epoch 36 Batch 60/62] avg loss 0.00718463, throughput 3.14928K wps
Begin Testing...
[Epoch 36] train avg loss 0.00700991, dev acc 0.8142, dev avg loss 0.388916, throughput 3.19387K wps
[Epoch 37 Batch 30/62] avg loss 0.00670368, throughput 3.26028K wps
[Epoch 37 Batch 60/62] avg loss 0.00688251, throughput 3.13881K wps
Begin Testing...
[Epoch 37] train avg loss 0.00692506, dev acc 0.8171, dev avg loss 0.385076, throughput 3.20645K wps
[Epoch 38 Batch 30/62] avg loss 0.00680024, throughput 3.23948K wps
[Epoch 38 Batch 60/62] avg loss 0.00655596, throughput 3.17035K wps
Begin Testing...
[Epoch 38] train avg loss 0.00677993, dev acc 0.8289, dev avg loss 0.379224, throughput 3.21111K wps
[Epoch 39 Batch 30/62] avg loss 0.00635235, throughput 3.22705K wps
[Epoch 39 Batch 60/62] avg loss 0.00664837, throughput 3.16554K wps
Begin Testing...
[Epoch 39] train avg loss 0.00657837, dev acc 0.8348, dev avg loss 0.376032, throughput 3.2023K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00651779, throughput 3.24429K wps
[Epoch 40 Batch 60/62] avg loss 0.00643835, throughput 3.15378K wps
Begin Testing...
[Epoch 40] train avg loss 0.00658742, dev acc 0.8171, dev avg loss 0.378361, throughput 3.20428K wps
[Epoch 41 Batch 30/62] avg loss 0.00627655, throughput 3.24212K wps
[Epoch 41 Batch 60/62] avg loss 0.00611812, throughput 3.18067K wps
Begin Testing...
[Epoch 41] train avg loss 0.00629779, dev acc 0.8289, dev avg loss 0.370833, throughput 3.21779K wps
[Epoch 42 Batch 30/62] avg loss 0.00623457, throughput 3.26548K wps
[Epoch 42 Batch 60/62] avg loss 0.00636364, throughput 3.20299K wps
Begin Testing...
[Epoch 42] train avg loss 0.00635249, dev acc 0.8319, dev avg loss 0.370998, throughput 3.23948K wps
[Epoch 43 Batch 30/62] avg loss 0.00580316, throughput 3.25637K wps
[Epoch 43 Batch 60/62] avg loss 0.00616958, throughput 3.16592K wps
Begin Testing...
[Epoch 43] train avg loss 0.00603585, dev acc 0.8319, dev avg loss 0.369344, throughput 3.21577K wps
[Epoch 44 Batch 30/62] avg loss 0.00577095, throughput 3.24617K wps
[Epoch 44 Batch 60/62] avg loss 0.00611262, throughput 3.1603K wps
Begin Testing...
[Epoch 44] train avg loss 0.00597394, dev acc 0.8319, dev avg loss 0.364552, throughput 3.2091K wps
[Epoch 45 Batch 30/62] avg loss 0.00591174, throughput 3.23415K wps
[Epoch 45 Batch 60/62] avg loss 0.0058169, throughput 3.1659K wps
Begin Testing...
[Epoch 45] train avg loss 0.00590829, dev acc 0.8171, dev avg loss 0.370522, throughput 3.20709K wps
[Epoch 46 Batch 30/62] avg loss 0.00559361, throughput 3.24783K wps
[Epoch 46 Batch 60/62] avg loss 0.00567178, throughput 3.18862K wps
Begin Testing...
[Epoch 46] train avg loss 0.0057223, dev acc 0.8319, dev avg loss 0.360828, throughput 3.2236K wps
[Epoch 47 Batch 30/62] avg loss 0.0055992, throughput 3.28185K wps
[Epoch 47 Batch 60/62] avg loss 0.00558164, throughput 3.17185K wps
Begin Testing...
[Epoch 47] train avg loss 0.00566658, dev acc 0.8289, dev avg loss 0.362718, throughput 3.23141K wps
[Epoch 48 Batch 30/62] avg loss 0.00574772, throughput 3.24801K wps
[Epoch 48 Batch 60/62] avg loss 0.00538274, throughput 3.15121K wps
Begin Testing...
[Epoch 48] train avg loss 0.00565087, dev acc 0.8260, dev avg loss 0.361251, throughput 3.20646K wps
[Epoch 49 Batch 30/62] avg loss 0.00545892, throughput 3.25633K wps
[Epoch 49 Batch 60/62] avg loss 0.0054308, throughput 3.1968K wps
Begin Testing...
[Epoch 49] train avg loss 0.00553843, dev acc 0.8142, dev avg loss 0.36615, throughput 3.23192K wps
[Epoch 50 Batch 30/62] avg loss 0.00540096, throughput 3.27127K wps
[Epoch 50 Batch 60/62] avg loss 0.00508417, throughput 3.1553K wps
Begin Testing...
[Epoch 50] train avg loss 0.00534031, dev acc 0.8230, dev avg loss 0.359594, throughput 3.21664K wps
[Epoch 51 Batch 30/62] avg loss 0.00535777, throughput 3.2509K wps
[Epoch 51 Batch 60/62] avg loss 0.00538751, throughput 3.13842K wps
Begin Testing...
[Epoch 51] train avg loss 0.00542528, dev acc 0.8378, dev avg loss 0.354108, throughput 3.19984K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00521326, throughput 3.27236K wps
[Epoch 52 Batch 60/62] avg loss 0.00526752, throughput 3.16426K wps
Begin Testing...
[Epoch 52] train avg loss 0.00528527, dev acc 0.8407, dev avg loss 0.352234, throughput 3.22253K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00510656, throughput 3.26709K wps
[Epoch 53 Batch 60/62] avg loss 0.00483918, throughput 3.17188K wps
Begin Testing...
[Epoch 53] train avg loss 0.00505182, dev acc 0.8171, dev avg loss 0.358369, throughput 3.2232K wps
[Epoch 54 Batch 30/62] avg loss 0.00493172, throughput 3.26158K wps
[Epoch 54 Batch 60/62] avg loss 0.00507339, throughput 3.16053K wps
Begin Testing...
[Epoch 54] train avg loss 0.00506992, dev acc 0.8230, dev avg loss 0.352116, throughput 3.21702K wps
[Epoch 55 Batch 30/62] avg loss 0.00461902, throughput 3.25631K wps
[Epoch 55 Batch 60/62] avg loss 0.00498838, throughput 3.21318K wps
Begin Testing...
[Epoch 55] train avg loss 0.00482692, dev acc 0.8201, dev avg loss 0.351037, throughput 3.24043K wps
[Epoch 56 Batch 30/62] avg loss 0.00460913, throughput 3.2674K wps
[Epoch 56 Batch 60/62] avg loss 0.00465399, throughput 3.18831K wps
Begin Testing...
[Epoch 56] train avg loss 0.00468218, dev acc 0.8230, dev avg loss 0.354941, throughput 3.23554K wps
[Epoch 57 Batch 30/62] avg loss 0.00479465, throughput 3.29812K wps
[Epoch 57 Batch 60/62] avg loss 0.00472585, throughput 3.19605K wps
Begin Testing...
[Epoch 57] train avg loss 0.00487994, dev acc 0.8319, dev avg loss 0.349553, throughput 3.25316K wps
[Epoch 58 Batch 30/62] avg loss 0.00461715, throughput 3.27256K wps
[Epoch 58 Batch 60/62] avg loss 0.00458143, throughput 3.17298K wps
Begin Testing...
[Epoch 58] train avg loss 0.00468395, dev acc 0.8437, dev avg loss 0.344143, throughput 3.22729K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.00449204, throughput 3.27639K wps
[Epoch 59 Batch 60/62] avg loss 0.00460614, throughput 3.15945K wps
Begin Testing...
[Epoch 59] train avg loss 0.00459681, dev acc 0.8378, dev avg loss 0.345372, throughput 3.22178K wps
[Epoch 60 Batch 30/62] avg loss 0.00449852, throughput 3.25966K wps
[Epoch 60 Batch 60/62] avg loss 0.00432719, throughput 3.13504K wps
Begin Testing...
[Epoch 60] train avg loss 0.00449311, dev acc 0.8437, dev avg loss 0.34313, throughput 3.2017K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00448038, throughput 3.28085K wps
[Epoch 61 Batch 60/62] avg loss 0.00430096, throughput 3.16086K wps
Begin Testing...
[Epoch 61] train avg loss 0.00446496, dev acc 0.8407, dev avg loss 0.342636, throughput 3.2256K wps
[Epoch 62 Batch 30/62] avg loss 0.00431065, throughput 3.25035K wps
[Epoch 62 Batch 60/62] avg loss 0.00429934, throughput 3.16064K wps
Begin Testing...
[Epoch 62] train avg loss 0.00434144, dev acc 0.8201, dev avg loss 0.34909, throughput 3.21122K wps
[Epoch 63 Batch 30/62] avg loss 0.00416075, throughput 3.23566K wps
[Epoch 63 Batch 60/62] avg loss 0.00406999, throughput 3.17501K wps
Begin Testing...
[Epoch 63] train avg loss 0.00414182, dev acc 0.8437, dev avg loss 0.340642, throughput 3.21205K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/62] avg loss 0.00418729, throughput 3.2458K wps
[Epoch 64 Batch 60/62] avg loss 0.00383496, throughput 3.20453K wps
Begin Testing...
[Epoch 64] train avg loss 0.00408099, dev acc 0.8496, dev avg loss 0.339674, throughput 3.23329K wps
Observed Improvement.
Begin Testing...
[Epoch 65 Batch 30/62] avg loss 0.00413762, throughput 3.29506K wps
[Epoch 65 Batch 60/62] avg loss 0.00399047, throughput 3.17466K wps
Begin Testing...
[Epoch 65] train avg loss 0.0041188, dev acc 0.8348, dev avg loss 0.341747, throughput 3.23981K wps
[Epoch 66 Batch 30/62] avg loss 0.00369099, throughput 3.23783K wps
[Epoch 66 Batch 60/62] avg loss 0.00409561, throughput 3.14773K wps
Begin Testing...
[Epoch 66] train avg loss 0.00392628, dev acc 0.8407, dev avg loss 0.336997, throughput 3.19899K wps
[Epoch 67 Batch 30/62] avg loss 0.00365632, throughput 3.24741K wps
[Epoch 67 Batch 60/62] avg loss 0.00393873, throughput 3.16828K wps
Begin Testing...
[Epoch 67] train avg loss 0.00381508, dev acc 0.8466, dev avg loss 0.337497, throughput 3.21602K wps
[Epoch 68 Batch 30/62] avg loss 0.0036876, throughput 3.23941K wps
[Epoch 68 Batch 60/62] avg loss 0.00392383, throughput 3.20405K wps
Begin Testing...
[Epoch 68] train avg loss 0.00384256, dev acc 0.8496, dev avg loss 0.336212, throughput 3.22654K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/62] avg loss 0.00366884, throughput 3.23924K wps
[Epoch 69 Batch 60/62] avg loss 0.00373269, throughput 3.2121K wps
Begin Testing...
[Epoch 69] train avg loss 0.00379059, dev acc 0.8407, dev avg loss 0.337869, throughput 3.22391K wps
[Epoch 70 Batch 30/62] avg loss 0.00346667, throughput 3.27383K wps
[Epoch 70 Batch 60/62] avg loss 0.003807, throughput 3.18637K wps
Begin Testing...
[Epoch 70] train avg loss 0.00373967, dev acc 0.8348, dev avg loss 0.33963, throughput 3.23817K wps
[Epoch 71 Batch 30/62] avg loss 0.00347218, throughput 3.29084K wps
[Epoch 71 Batch 60/62] avg loss 0.00359777, throughput 3.17764K wps
Begin Testing...
[Epoch 71] train avg loss 0.00359787, dev acc 0.8378, dev avg loss 0.336842, throughput 3.23918K wps
[Epoch 72 Batch 30/62] avg loss 0.00329293, throughput 3.24709K wps
[Epoch 72 Batch 60/62] avg loss 0.00348338, throughput 3.17324K wps
Begin Testing...
[Epoch 72] train avg loss 0.00346843, dev acc 0.8319, dev avg loss 0.341734, throughput 3.21847K wps
[Epoch 73 Batch 30/62] avg loss 0.00333322, throughput 3.25217K wps
[Epoch 73 Batch 60/62] avg loss 0.00339116, throughput 3.21355K wps
Begin Testing...
[Epoch 73] train avg loss 0.00341493, dev acc 0.8437, dev avg loss 0.335365, throughput 3.23776K wps
[Epoch 74 Batch 30/62] avg loss 0.00336357, throughput 3.26212K wps
[Epoch 74 Batch 60/62] avg loss 0.00322253, throughput 3.20374K wps
Begin Testing...
[Epoch 74] train avg loss 0.00329801, dev acc 0.8437, dev avg loss 0.334012, throughput 3.23812K wps
[Epoch 75 Batch 30/62] avg loss 0.00335597, throughput 3.26151K wps
[Epoch 75 Batch 60/62] avg loss 0.00342462, throughput 3.19669K wps
Begin Testing...
[Epoch 75] train avg loss 0.00341396, dev acc 0.8378, dev avg loss 0.335835, throughput 3.23686K wps
[Epoch 76 Batch 30/62] avg loss 0.00312866, throughput 3.25047K wps
[Epoch 76 Batch 60/62] avg loss 0.00318736, throughput 3.19479K wps
Begin Testing...
[Epoch 76] train avg loss 0.00319547, dev acc 0.8230, dev avg loss 0.344446, throughput 3.22848K wps
[Epoch 77 Batch 30/62] avg loss 0.00332871, throughput 3.26542K wps
[Epoch 77 Batch 60/62] avg loss 0.00309796, throughput 3.19186K wps
Begin Testing...
[Epoch 77] train avg loss 0.00328664, dev acc 0.8171, dev avg loss 0.346438, throughput 3.23623K wps
[Epoch 78 Batch 30/62] avg loss 0.00307982, throughput 3.28206K wps
[Epoch 78 Batch 60/62] avg loss 0.00306263, throughput 3.17156K wps
Begin Testing...
[Epoch 78] train avg loss 0.00311299, dev acc 0.8437, dev avg loss 0.332623, throughput 3.23047K wps
[Epoch 79 Batch 30/62] avg loss 0.00310181, throughput 3.25278K wps
[Epoch 79 Batch 60/62] avg loss 0.0029528, throughput 3.15796K wps
Begin Testing...
[Epoch 79] train avg loss 0.00306254, dev acc 0.8407, dev avg loss 0.334645, throughput 3.21258K wps
[Epoch 80 Batch 30/62] avg loss 0.00294142, throughput 3.2724K wps
[Epoch 80 Batch 60/62] avg loss 0.00290534, throughput 3.18644K wps
Begin Testing...
[Epoch 80] train avg loss 0.00307193, dev acc 0.8289, dev avg loss 0.343946, throughput 3.23437K wps
[Epoch 81 Batch 30/62] avg loss 0.00300165, throughput 3.25357K wps
[Epoch 81 Batch 60/62] avg loss 0.00290829, throughput 3.17962K wps
Begin Testing...
[Epoch 81] train avg loss 0.0029807, dev acc 0.8407, dev avg loss 0.335382, throughput 3.22429K wps
[Epoch 82 Batch 30/62] avg loss 0.00292196, throughput 3.25179K wps
[Epoch 82 Batch 60/62] avg loss 0.00280803, throughput 3.19429K wps
Begin Testing...
[Epoch 82] train avg loss 0.00289172, dev acc 0.8378, dev avg loss 0.334094, throughput 3.23034K wps
[Epoch 83 Batch 30/62] avg loss 0.00294166, throughput 3.27051K wps
[Epoch 83 Batch 60/62] avg loss 0.00282727, throughput 3.17797K wps
Begin Testing...
[Epoch 83] train avg loss 0.00289269, dev acc 0.8378, dev avg loss 0.335967, throughput 3.23135K wps
[Epoch 84 Batch 30/62] avg loss 0.00266406, throughput 3.28433K wps
[Epoch 84 Batch 60/62] avg loss 0.00284846, throughput 3.16872K wps
Begin Testing...
[Epoch 84] train avg loss 0.00284142, dev acc 0.8260, dev avg loss 0.343972, throughput 3.23136K wps
[Epoch 85 Batch 30/62] avg loss 0.00258631, throughput 3.26449K wps
[Epoch 85 Batch 60/62] avg loss 0.00282381, throughput 3.19982K wps
Begin Testing...
[Epoch 85] train avg loss 0.00275216, dev acc 0.8289, dev avg loss 0.342097, throughput 3.24066K wps
[Epoch 86 Batch 30/62] avg loss 0.00270572, throughput 3.27561K wps
[Epoch 86 Batch 60/62] avg loss 0.00279859, throughput 3.16871K wps
Begin Testing...
[Epoch 86] train avg loss 0.00275154, dev acc 0.8407, dev avg loss 0.333482, throughput 3.22978K wps
[Epoch 87 Batch 30/62] avg loss 0.0026375, throughput 3.307K wps
[Epoch 87 Batch 60/62] avg loss 0.00252509, throughput 3.16593K wps
Begin Testing...
[Epoch 87] train avg loss 0.0026071, dev acc 0.8378, dev avg loss 0.334414, throughput 3.24081K wps
[Epoch 88 Batch 30/62] avg loss 0.00246377, throughput 3.24443K wps
[Epoch 88 Batch 60/62] avg loss 0.00266045, throughput 3.19616K wps
Begin Testing...
[Epoch 88] train avg loss 0.00261943, dev acc 0.8407, dev avg loss 0.331402, throughput 3.2284K wps
[Epoch 89 Batch 30/62] avg loss 0.00248729, throughput 3.27773K wps
[Epoch 89 Batch 60/62] avg loss 0.00253263, throughput 3.17293K wps
Begin Testing...
[Epoch 89] train avg loss 0.00254793, dev acc 0.8407, dev avg loss 0.331615, throughput 3.23102K wps
[Epoch 90 Batch 30/62] avg loss 0.00241558, throughput 3.23884K wps
[Epoch 90 Batch 60/62] avg loss 0.00256366, throughput 3.1792K wps
Begin Testing...
[Epoch 90] train avg loss 0.0025617, dev acc 0.8348, dev avg loss 0.340105, throughput 3.21699K wps
[Epoch 91 Batch 30/62] avg loss 0.00246904, throughput 3.28799K wps
[Epoch 91 Batch 60/62] avg loss 0.00244432, throughput 3.16104K wps
Begin Testing...
[Epoch 91] train avg loss 0.00248673, dev acc 0.8260, dev avg loss 0.341737, throughput 3.22949K wps
[Epoch 92 Batch 30/62] avg loss 0.00232527, throughput 3.26723K wps
[Epoch 92 Batch 60/62] avg loss 0.00251019, throughput 3.16868K wps
Begin Testing...
[Epoch 92] train avg loss 0.00247087, dev acc 0.8378, dev avg loss 0.338125, throughput 3.22556K wps
[Epoch 93 Batch 30/62] avg loss 0.00230413, throughput 3.26285K wps
[Epoch 93 Batch 60/62] avg loss 0.00231499, throughput 3.19256K wps
Begin Testing...
[Epoch 93] train avg loss 0.00233786, dev acc 0.8437, dev avg loss 0.335256, throughput 3.23333K wps
[Epoch 94 Batch 30/62] avg loss 0.00244875, throughput 3.27296K wps
[Epoch 94 Batch 60/62] avg loss 0.00227085, throughput 3.19848K wps
Begin Testing...
[Epoch 94] train avg loss 0.00239145, dev acc 0.8230, dev avg loss 0.348399, throughput 3.24253K wps
[Epoch 95 Batch 30/62] avg loss 0.00215796, throughput 3.27344K wps
[Epoch 95 Batch 60/62] avg loss 0.00226651, throughput 3.16392K wps
Begin Testing...
[Epoch 95] train avg loss 0.00225415, dev acc 0.8378, dev avg loss 0.339281, throughput 3.22368K wps
[Epoch 96 Batch 30/62] avg loss 0.00220786, throughput 3.27297K wps
[Epoch 96 Batch 60/62] avg loss 0.00219197, throughput 3.21745K wps
Begin Testing...
[Epoch 96] train avg loss 0.00220836, dev acc 0.8407, dev avg loss 0.335764, throughput 3.25121K wps
[Epoch 97 Batch 30/62] avg loss 0.0021009, throughput 3.25738K wps
[Epoch 97 Batch 60/62] avg loss 0.00229194, throughput 3.16831K wps
Begin Testing...
[Epoch 97] train avg loss 0.00225763, dev acc 0.8348, dev avg loss 0.340535, throughput 3.21833K wps
[Epoch 98 Batch 30/62] avg loss 0.00224455, throughput 3.21859K wps
[Epoch 98 Batch 60/62] avg loss 0.00214095, throughput 3.16433K wps
Begin Testing...
[Epoch 98] train avg loss 0.00219527, dev acc 0.8407, dev avg loss 0.337647, throughput 3.19811K wps
[Epoch 99 Batch 30/62] avg loss 0.00205773, throughput 3.25935K wps
[Epoch 99 Batch 60/62] avg loss 0.00218426, throughput 3.22355K wps
Begin Testing...
[Epoch 99] train avg loss 0.00217509, dev acc 0.8437, dev avg loss 0.334212, throughput 3.24709K wps
[Epoch 100 Batch 30/62] avg loss 0.00222236, throughput 3.2565K wps
[Epoch 100 Batch 60/62] avg loss 0.00208779, throughput 3.1779K wps
Begin Testing...
[Epoch 100] train avg loss 0.00216885, dev acc 0.8378, dev avg loss 0.340576, throughput 3.22519K wps
[Epoch 101 Batch 30/62] avg loss 0.00200013, throughput 3.24729K wps
[Epoch 101 Batch 60/62] avg loss 0.00206652, throughput 3.21932K wps
Begin Testing...
[Epoch 101] train avg loss 0.0020458, dev acc 0.8466, dev avg loss 0.335835, throughput 3.2385K wps
[Epoch 102 Batch 30/62] avg loss 0.00201626, throughput 3.25287K wps
[Epoch 102 Batch 60/62] avg loss 0.00201218, throughput 3.18214K wps
Begin Testing...
[Epoch 102] train avg loss 0.0020336, dev acc 0.8407, dev avg loss 0.329873, throughput 3.22539K wps
[Epoch 103 Batch 30/62] avg loss 0.00197718, throughput 3.29008K wps
[Epoch 103 Batch 60/62] avg loss 0.00199412, throughput 3.17178K wps
Begin Testing...
[Epoch 103] train avg loss 0.00202932, dev acc 0.8407, dev avg loss 0.334737, throughput 3.23569K wps
[Epoch 104 Batch 30/62] avg loss 0.00194971, throughput 3.23944K wps
[Epoch 104 Batch 60/62] avg loss 0.00191562, throughput 3.15903K wps
Begin Testing...
[Epoch 104] train avg loss 0.00199383, dev acc 0.8437, dev avg loss 0.330664, throughput 3.20516K wps
[Epoch 105 Batch 30/62] avg loss 0.00203628, throughput 3.24507K wps
[Epoch 105 Batch 60/62] avg loss 0.00192819, throughput 3.15403K wps
Begin Testing...
[Epoch 105] train avg loss 0.00197733, dev acc 0.8437, dev avg loss 0.337417, throughput 3.20445K wps
[Epoch 106 Batch 30/62] avg loss 0.00208789, throughput 3.24632K wps
[Epoch 106 Batch 60/62] avg loss 0.00177032, throughput 3.1795K wps
Begin Testing...
[Epoch 106] train avg loss 0.00192688, dev acc 0.8319, dev avg loss 0.343007, throughput 3.22117K wps
[Epoch 107 Batch 30/62] avg loss 0.00181306, throughput 3.29379K wps
[Epoch 107 Batch 60/62] avg loss 0.00181528, throughput 3.16394K wps
Begin Testing...
[Epoch 107] train avg loss 0.00183095, dev acc 0.8437, dev avg loss 0.330258, throughput 3.23415K wps
[Epoch 108 Batch 30/62] avg loss 0.00176861, throughput 3.27157K wps
[Epoch 108 Batch 60/62] avg loss 0.00181457, throughput 3.21654K wps
Begin Testing...
[Epoch 108] train avg loss 0.00179262, dev acc 0.8407, dev avg loss 0.337033, throughput 3.24919K wps
[Epoch 109 Batch 30/62] avg loss 0.00169636, throughput 3.26788K wps
[Epoch 109 Batch 60/62] avg loss 0.00183696, throughput 3.17074K wps
Begin Testing...
[Epoch 109] train avg loss 0.00178534, dev acc 0.8437, dev avg loss 0.338827, throughput 3.22668K wps
[Epoch 110 Batch 30/62] avg loss 0.00166332, throughput 3.2405K wps
[Epoch 110 Batch 60/62] avg loss 0.00185661, throughput 3.22079K wps
Begin Testing...
[Epoch 110] train avg loss 0.0017643, dev acc 0.8319, dev avg loss 0.343077, throughput 3.23645K wps
[Epoch 111 Batch 30/62] avg loss 0.00162356, throughput 3.2612K wps
[Epoch 111 Batch 60/62] avg loss 0.00179038, throughput 3.21577K wps
Begin Testing...
[Epoch 111] train avg loss 0.0017374, dev acc 0.8437, dev avg loss 0.332604, throughput 3.245K wps
[Epoch 112 Batch 30/62] avg loss 0.00170272, throughput 3.27494K wps
[Epoch 112 Batch 60/62] avg loss 0.00171869, throughput 3.17332K wps
Begin Testing...
[Epoch 112] train avg loss 0.00171709, dev acc 0.8348, dev avg loss 0.342721, throughput 3.22965K wps
[Epoch 113 Batch 30/62] avg loss 0.0017253, throughput 3.2705K wps
[Epoch 113 Batch 60/62] avg loss 0.00160557, throughput 3.20464K wps
Begin Testing...
[Epoch 113] train avg loss 0.00167975, dev acc 0.8407, dev avg loss 0.33623, throughput 3.24357K wps
[Epoch 114 Batch 30/62] avg loss 0.00176034, throughput 3.2666K wps
[Epoch 114 Batch 60/62] avg loss 0.00164217, throughput 3.21337K wps
Begin Testing...
[Epoch 114] train avg loss 0.00175053, dev acc 0.8407, dev avg loss 0.333286, throughput 3.24529K wps
[Epoch 115 Batch 30/62] avg loss 0.00163301, throughput 3.25605K wps
[Epoch 115 Batch 60/62] avg loss 0.00175662, throughput 3.15835K wps
Begin Testing...
[Epoch 115] train avg loss 0.00170449, dev acc 0.8437, dev avg loss 0.339154, throughput 3.21226K wps
[Epoch 116 Batch 30/62] avg loss 0.00158431, throughput 3.24991K wps
[Epoch 116 Batch 60/62] avg loss 0.00170231, throughput 3.17238K wps
Begin Testing...
[Epoch 116] train avg loss 0.0016769, dev acc 0.8407, dev avg loss 0.336739, throughput 3.21869K wps
[Epoch 117 Batch 30/62] avg loss 0.00157317, throughput 3.26526K wps
[Epoch 117 Batch 60/62] avg loss 0.00160248, throughput 3.1935K wps
Begin Testing...
[Epoch 117] train avg loss 0.00162136, dev acc 0.8437, dev avg loss 0.333797, throughput 3.23618K wps
[Epoch 118 Batch 30/62] avg loss 0.00152255, throughput 3.25443K wps
[Epoch 118 Batch 60/62] avg loss 0.00164494, throughput 3.17347K wps
Begin Testing...
[Epoch 118] train avg loss 0.00158516, dev acc 0.8378, dev avg loss 0.342297, throughput 3.21994K wps
[Epoch 119 Batch 30/62] avg loss 0.00162261, throughput 3.24269K wps
[Epoch 119 Batch 60/62] avg loss 0.00149672, throughput 3.20288K wps
Begin Testing...
[Epoch 119] train avg loss 0.00157215, dev acc 0.8407, dev avg loss 0.336706, throughput 3.22871K wps
[Epoch 120 Batch 30/62] avg loss 0.00148782, throughput 3.27847K wps
[Epoch 120 Batch 60/62] avg loss 0.00153393, throughput 3.2021K wps
Begin Testing...
[Epoch 120] train avg loss 0.00155418, dev acc 0.8437, dev avg loss 0.337068, throughput 3.24834K wps
[Epoch 121 Batch 30/62] avg loss 0.00156779, throughput 3.28386K wps
[Epoch 121 Batch 60/62] avg loss 0.00143571, throughput 3.19073K wps
Begin Testing...
[Epoch 121] train avg loss 0.00150509, dev acc 0.8348, dev avg loss 0.341457, throughput 3.24424K wps
[Epoch 122 Batch 30/62] avg loss 0.00143882, throughput 3.27764K wps
[Epoch 122 Batch 60/62] avg loss 0.00145883, throughput 3.17116K wps
Begin Testing...
[Epoch 122] train avg loss 0.00144454, dev acc 0.8466, dev avg loss 0.336226, throughput 3.22916K wps
[Epoch 123 Batch 30/62] avg loss 0.00157338, throughput 3.24978K wps
[Epoch 123 Batch 60/62] avg loss 0.00140881, throughput 3.163K wps
Begin Testing...
[Epoch 123] train avg loss 0.00151275, dev acc 0.8437, dev avg loss 0.334196, throughput 3.21264K wps
[Epoch 124 Batch 30/62] avg loss 0.00144781, throughput 3.24635K wps
[Epoch 124 Batch 60/62] avg loss 0.00145502, throughput 3.17621K wps
Begin Testing...
[Epoch 124] train avg loss 0.00145199, dev acc 0.8319, dev avg loss 0.348058, throughput 3.21918K wps
[Epoch 125 Batch 30/62] avg loss 0.0014865, throughput 3.28061K wps
[Epoch 125 Batch 60/62] avg loss 0.00137697, throughput 3.17949K wps
Begin Testing...
[Epoch 125] train avg loss 0.00142697, dev acc 0.8319, dev avg loss 0.346367, throughput 3.23469K wps
[Epoch 126 Batch 30/62] avg loss 0.00140222, throughput 3.24822K wps
[Epoch 126 Batch 60/62] avg loss 0.00155198, throughput 3.15319K wps
Begin Testing...
[Epoch 126] train avg loss 0.00147629, dev acc 0.8348, dev avg loss 0.344694, throughput 3.2061K wps
[Epoch 127 Batch 30/62] avg loss 0.00135469, throughput 3.24731K wps
[Epoch 127 Batch 60/62] avg loss 0.00134232, throughput 3.19731K wps
Begin Testing...
[Epoch 127] train avg loss 0.00136658, dev acc 0.8348, dev avg loss 0.347752, throughput 3.22982K wps
[Epoch 128 Batch 30/62] avg loss 0.00143818, throughput 3.30727K wps
[Epoch 128 Batch 60/62] avg loss 0.00141247, throughput 3.1619K wps
Begin Testing...
[Epoch 128] train avg loss 0.00145857, dev acc 0.8348, dev avg loss 0.346187, throughput 3.23885K wps
[Epoch 129 Batch 30/62] avg loss 0.00122337, throughput 3.26008K wps
[Epoch 129 Batch 60/62] avg loss 0.00135472, throughput 3.21364K wps
Begin Testing...
[Epoch 129] train avg loss 0.00130349, dev acc 0.8437, dev avg loss 0.337015, throughput 3.24253K wps
[Epoch 130 Batch 30/62] avg loss 0.00124791, throughput 3.26603K wps
[Epoch 130 Batch 60/62] avg loss 0.00135792, throughput 3.19474K wps
Begin Testing...
[Epoch 130] train avg loss 0.00131777, dev acc 0.8437, dev avg loss 0.341875, throughput 3.23844K wps
[Epoch 131 Batch 30/62] avg loss 0.00125031, throughput 3.2796K wps
[Epoch 131 Batch 60/62] avg loss 0.0014063, throughput 3.16123K wps
Begin Testing...
[Epoch 131] train avg loss 0.00136679, dev acc 0.8466, dev avg loss 0.337654, throughput 3.22521K wps
[Epoch 132 Batch 30/62] avg loss 0.00132543, throughput 3.25304K wps
[Epoch 132 Batch 60/62] avg loss 0.00121871, throughput 3.21894K wps
Begin Testing...
[Epoch 132] train avg loss 0.00133596, dev acc 0.8319, dev avg loss 0.364336, throughput 3.24186K wps
[Epoch 133 Batch 30/62] avg loss 0.00124318, throughput 3.27192K wps
[Epoch 133 Batch 60/62] avg loss 0.00124035, throughput 3.16662K wps
Begin Testing...
[Epoch 133] train avg loss 0.00124797, dev acc 0.8319, dev avg loss 0.343185, throughput 3.22507K wps
[Epoch 134 Batch 30/62] avg loss 0.00125222, throughput 3.28326K wps
[Epoch 134 Batch 60/62] avg loss 0.00113669, throughput 3.20099K wps
Begin Testing...
[Epoch 134] train avg loss 0.00120127, dev acc 0.8407, dev avg loss 0.342135, throughput 3.24824K wps
[Epoch 135 Batch 30/62] avg loss 0.00127097, throughput 3.26512K wps
[Epoch 135 Batch 60/62] avg loss 0.00113152, throughput 3.20951K wps
Begin Testing...
[Epoch 135] train avg loss 0.0012262, dev acc 0.8348, dev avg loss 0.348841, throughput 3.24264K wps
[Epoch 136 Batch 30/62] avg loss 0.0012597, throughput 3.26568K wps
[Epoch 136 Batch 60/62] avg loss 0.0011899, throughput 3.17568K wps
Begin Testing...
[Epoch 136] train avg loss 0.0012461, dev acc 0.8378, dev avg loss 0.344235, throughput 3.22735K wps
[Epoch 137 Batch 30/62] avg loss 0.00119556, throughput 3.23427K wps
[Epoch 137 Batch 60/62] avg loss 0.00121821, throughput 3.20222K wps
Begin Testing...
[Epoch 137] train avg loss 0.00122662, dev acc 0.8407, dev avg loss 0.338953, throughput 3.22414K wps
[Epoch 138 Batch 30/62] avg loss 0.00124042, throughput 3.27694K wps
[Epoch 138 Batch 60/62] avg loss 0.00117117, throughput 3.19219K wps
Begin Testing...
[Epoch 138] train avg loss 0.00120274, dev acc 0.8378, dev avg loss 0.345147, throughput 3.24146K wps
[Epoch 139 Batch 30/62] avg loss 0.00114519, throughput 3.30481K wps
[Epoch 139 Batch 60/62] avg loss 0.00113961, throughput 3.17779K wps
Begin Testing...
[Epoch 139] train avg loss 0.00118286, dev acc 0.8466, dev avg loss 0.338523, throughput 3.24786K wps
[Epoch 140 Batch 30/62] avg loss 0.00117526, throughput 3.29276K wps
[Epoch 140 Batch 60/62] avg loss 0.00104273, throughput 3.17681K wps
Begin Testing...
[Epoch 140] train avg loss 0.00112855, dev acc 0.8407, dev avg loss 0.342774, throughput 3.23936K wps
[Epoch 141 Batch 30/62] avg loss 0.00120422, throughput 3.27493K wps
[Epoch 141 Batch 60/62] avg loss 0.00105296, throughput 3.21824K wps
Begin Testing...
[Epoch 141] train avg loss 0.00114446, dev acc 0.8407, dev avg loss 0.346236, throughput 3.25169K wps
[Epoch 142 Batch 30/62] avg loss 0.00112681, throughput 3.28331K wps
[Epoch 142 Batch 60/62] avg loss 0.0011186, throughput 3.21326K wps
Begin Testing...
[Epoch 142] train avg loss 0.0011292, dev acc 0.8378, dev avg loss 0.350427, throughput 3.25344K wps
[Epoch 143 Batch 30/62] avg loss 0.00112756, throughput 3.25941K wps
[Epoch 143 Batch 60/62] avg loss 0.00112317, throughput 3.21861K wps
Begin Testing...
[Epoch 143] train avg loss 0.00113285, dev acc 0.8407, dev avg loss 0.34457, throughput 3.24409K wps
[Epoch 144 Batch 30/62] avg loss 0.00113587, throughput 3.26094K wps
[Epoch 144 Batch 60/62] avg loss 0.00111386, throughput 3.17406K wps
Begin Testing...
[Epoch 144] train avg loss 0.00113701, dev acc 0.8319, dev avg loss 0.359557, throughput 3.22365K wps
[Epoch 145 Batch 30/62] avg loss 0.00112944, throughput 3.25962K wps
[Epoch 145 Batch 60/62] avg loss 0.00115666, throughput 3.20054K wps
Begin Testing...
[Epoch 145] train avg loss 0.00114793, dev acc 0.8407, dev avg loss 0.349371, throughput 3.23612K wps
[Epoch 146 Batch 30/62] avg loss 0.00106856, throughput 3.25776K wps
[Epoch 146 Batch 60/62] avg loss 0.00104232, throughput 3.15083K wps
Begin Testing...
[Epoch 146] train avg loss 0.00110684, dev acc 0.8378, dev avg loss 0.370516, throughput 3.20924K wps
[Epoch 147 Batch 30/62] avg loss 0.0010378, throughput 3.24787K wps
[Epoch 147 Batch 60/62] avg loss 0.000995094, throughput 3.15605K wps
Begin Testing...
[Epoch 147] train avg loss 0.00105761, dev acc 0.8289, dev avg loss 0.360768, throughput 3.2074K wps
[Epoch 148 Batch 30/62] avg loss 0.00101759, throughput 3.25262K wps
[Epoch 148 Batch 60/62] avg loss 0.00101846, throughput 3.15192K wps
Begin Testing...
[Epoch 148] train avg loss 0.00104131, dev acc 0.8407, dev avg loss 0.345502, throughput 3.2081K wps
[Epoch 149 Batch 30/62] avg loss 0.000982734, throughput 3.2333K wps
[Epoch 149 Batch 60/62] avg loss 0.00106564, throughput 3.16507K wps
Begin Testing...
[Epoch 149] train avg loss 0.00104783, dev acc 0.8407, dev avg loss 0.346809, throughput 3.20545K wps
[Epoch 150 Batch 30/62] avg loss 0.00098645, throughput 3.26005K wps
[Epoch 150 Batch 60/62] avg loss 0.000982852, throughput 3.19884K wps
Begin Testing...
[Epoch 150] train avg loss 0.00100095, dev acc 0.8378, dev avg loss 0.347883, throughput 3.23408K wps
[Epoch 151 Batch 30/62] avg loss 0.00107916, throughput 3.2759K wps
[Epoch 151 Batch 60/62] avg loss 0.00100603, throughput 3.16552K wps
Begin Testing...
[Epoch 151] train avg loss 0.00106871, dev acc 0.8378, dev avg loss 0.372396, throughput 3.22515K wps
[Epoch 152 Batch 30/62] avg loss 0.00099751, throughput 3.25897K wps
[Epoch 152 Batch 60/62] avg loss 0.00104212, throughput 3.19233K wps
Begin Testing...
[Epoch 152] train avg loss 0.0010348, dev acc 0.8378, dev avg loss 0.347613, throughput 3.23405K wps
[Epoch 153 Batch 30/62] avg loss 0.000997357, throughput 3.27231K wps
[Epoch 153 Batch 60/62] avg loss 0.000919562, throughput 3.16948K wps
Begin Testing...
[Epoch 153] train avg loss 0.000967942, dev acc 0.8348, dev avg loss 0.350265, throughput 3.22553K wps
[Epoch 154 Batch 30/62] avg loss 0.00102403, throughput 3.28619K wps
[Epoch 154 Batch 60/62] avg loss 0.000975753, throughput 3.19695K wps
Begin Testing...
[Epoch 154] train avg loss 0.00100321, dev acc 0.8407, dev avg loss 0.353692, throughput 3.24622K wps
[Epoch 155 Batch 30/62] avg loss 0.0009009, throughput 3.26379K wps
[Epoch 155 Batch 60/62] avg loss 0.000954053, throughput 3.16295K wps
Begin Testing...
[Epoch 155] train avg loss 0.000931329, dev acc 0.8378, dev avg loss 0.352823, throughput 3.21882K wps
[Epoch 156 Batch 30/62] avg loss 0.000955922, throughput 3.25195K wps
[Epoch 156 Batch 60/62] avg loss 0.000906597, throughput 3.21128K wps
Begin Testing...
[Epoch 156] train avg loss 0.000937203, dev acc 0.8378, dev avg loss 0.350373, throughput 3.23785K wps
[Epoch 157 Batch 30/62] avg loss 0.000917657, throughput 3.25507K wps
[Epoch 157 Batch 60/62] avg loss 0.000939298, throughput 3.17759K wps
Begin Testing...
[Epoch 157] train avg loss 0.000934057, dev acc 0.8407, dev avg loss 0.352004, throughput 3.22427K wps
[Epoch 158 Batch 30/62] avg loss 0.000849709, throughput 3.23144K wps
[Epoch 158 Batch 60/62] avg loss 0.000989962, throughput 3.19549K wps
Begin Testing...
[Epoch 158] train avg loss 0.000928799, dev acc 0.8348, dev avg loss 0.35603, throughput 3.2218K wps
[Epoch 159 Batch 30/62] avg loss 0.000910003, throughput 3.26796K wps
[Epoch 159 Batch 60/62] avg loss 0.00096392, throughput 3.16908K wps
Begin Testing...
[Epoch 159] train avg loss 0.000957697, dev acc 0.8407, dev avg loss 0.349764, throughput 3.22362K wps
[Epoch 160 Batch 30/62] avg loss 0.000878535, throughput 3.27129K wps
[Epoch 160 Batch 60/62] avg loss 0.00091398, throughput 3.20207K wps
Begin Testing...
[Epoch 160] train avg loss 0.000899617, dev acc 0.8378, dev avg loss 0.354335, throughput 3.24241K wps
[Epoch 161 Batch 30/62] avg loss 0.000875912, throughput 3.27372K wps
[Epoch 161 Batch 60/62] avg loss 0.000974, throughput 3.17921K wps
Begin Testing...
[Epoch 161] train avg loss 0.000943255, dev acc 0.8407, dev avg loss 0.349256, throughput 3.23393K wps
[Epoch 162 Batch 30/62] avg loss 0.000886561, throughput 3.29097K wps
[Epoch 162 Batch 60/62] avg loss 0.000900683, throughput 3.16594K wps
Begin Testing...
[Epoch 162] train avg loss 0.000894115, dev acc 0.8378, dev avg loss 0.355361, throughput 3.23299K wps
[Epoch 163 Batch 30/62] avg loss 0.000813898, throughput 3.25933K wps
[Epoch 163 Batch 60/62] avg loss 0.000889921, throughput 3.15094K wps
Begin Testing...
[Epoch 163] train avg loss 0.000872005, dev acc 0.8437, dev avg loss 0.352695, throughput 3.21018K wps
[Epoch 164 Batch 30/62] avg loss 0.000780948, throughput 3.25127K wps
[Epoch 164 Batch 60/62] avg loss 0.000936775, throughput 3.2142K wps
Begin Testing...
[Epoch 164] train avg loss 0.000860532, dev acc 0.8437, dev avg loss 0.354879, throughput 3.23866K wps
[Epoch 165 Batch 30/62] avg loss 0.000852128, throughput 3.27639K wps
[Epoch 165 Batch 60/62] avg loss 0.000864867, throughput 3.21686K wps
Begin Testing...
[Epoch 165] train avg loss 0.000888268, dev acc 0.8378, dev avg loss 0.357916, throughput 3.25224K wps
[Epoch 166 Batch 30/62] avg loss 0.000919558, throughput 3.28178K wps
[Epoch 166 Batch 60/62] avg loss 0.000884139, throughput 3.21287K wps
Begin Testing...
[Epoch 166] train avg loss 0.000916988, dev acc 0.8378, dev avg loss 0.355558, throughput 3.25326K wps
[Epoch 167 Batch 30/62] avg loss 0.000887415, throughput 3.27637K wps
[Epoch 167 Batch 60/62] avg loss 0.000884068, throughput 3.16782K wps
Begin Testing...
[Epoch 167] train avg loss 0.000902859, dev acc 0.8437, dev avg loss 0.352743, throughput 3.22943K wps
[Epoch 168 Batch 30/62] avg loss 0.000808008, throughput 3.27472K wps
[Epoch 168 Batch 60/62] avg loss 0.000808053, throughput 3.19123K wps
Begin Testing...
[Epoch 168] train avg loss 0.000812848, dev acc 0.8348, dev avg loss 0.357499, throughput 3.23916K wps
[Epoch 169 Batch 30/62] avg loss 0.000818256, throughput 3.25652K wps
[Epoch 169 Batch 60/62] avg loss 0.000858767, throughput 3.20338K wps
Begin Testing...
[Epoch 169] train avg loss 0.00085135, dev acc 0.8348, dev avg loss 0.358376, throughput 3.23895K wps
[Epoch 170 Batch 30/62] avg loss 0.000803978, throughput 3.28609K wps
[Epoch 170 Batch 60/62] avg loss 0.000782684, throughput 3.19682K wps
Begin Testing...
[Epoch 170] train avg loss 0.00080556, dev acc 0.8378, dev avg loss 0.35893, throughput 3.24833K wps
[Epoch 171 Batch 30/62] avg loss 0.000808131, throughput 3.28078K wps
[Epoch 171 Batch 60/62] avg loss 0.000761504, throughput 3.16339K wps
Begin Testing...
[Epoch 171] train avg loss 0.000798454, dev acc 0.8378, dev avg loss 0.357932, throughput 3.22615K wps
[Epoch 172 Batch 30/62] avg loss 0.000719278, throughput 3.26043K wps
[Epoch 172 Batch 60/62] avg loss 0.000844641, throughput 3.20574K wps
Begin Testing...
[Epoch 172] train avg loss 0.000791651, dev acc 0.8466, dev avg loss 0.353895, throughput 3.24066K wps
[Epoch 173 Batch 30/62] avg loss 0.000864926, throughput 3.27846K wps
[Epoch 173 Batch 60/62] avg loss 0.000861003, throughput 3.22078K wps
Begin Testing...
[Epoch 173] train avg loss 0.00092173, dev acc 0.8378, dev avg loss 0.378665, throughput 3.25685K wps
[Epoch 174 Batch 30/62] avg loss 0.000792932, throughput 3.28375K wps
[Epoch 174 Batch 60/62] avg loss 0.000796158, throughput 3.21185K wps
Begin Testing...
[Epoch 174] train avg loss 0.000793682, dev acc 0.8378, dev avg loss 0.357258, throughput 3.25332K wps
[Epoch 175 Batch 30/62] avg loss 0.000734204, throughput 3.26217K wps
[Epoch 175 Batch 60/62] avg loss 0.000798817, throughput 3.16164K wps
Begin Testing...
[Epoch 175] train avg loss 0.000777189, dev acc 0.8348, dev avg loss 0.361194, throughput 3.21769K wps
[Epoch 176 Batch 30/62] avg loss 0.000758205, throughput 3.26917K wps
[Epoch 176 Batch 60/62] avg loss 0.000764476, throughput 3.21311K wps
Begin Testing...
[Epoch 176] train avg loss 0.000760957, dev acc 0.8378, dev avg loss 0.366214, throughput 3.24661K wps
[Epoch 177 Batch 30/62] avg loss 0.000798256, throughput 3.26366K wps
[Epoch 177 Batch 60/62] avg loss 0.00066789, throughput 3.21872K wps
Begin Testing...
[Epoch 177] train avg loss 0.000738743, dev acc 0.8378, dev avg loss 0.359243, throughput 3.24754K wps
[Epoch 178 Batch 30/62] avg loss 0.000755463, throughput 3.27682K wps
[Epoch 178 Batch 60/62] avg loss 0.000737056, throughput 3.20558K wps
Begin Testing...
[Epoch 178] train avg loss 0.000755072, dev acc 0.8407, dev avg loss 0.374906, throughput 3.24821K wps
[Epoch 179 Batch 30/62] avg loss 0.000707299, throughput 3.27243K wps
[Epoch 179 Batch 60/62] avg loss 0.000733773, throughput 3.16598K wps
Begin Testing...
[Epoch 179] train avg loss 0.000720844, dev acc 0.8378, dev avg loss 0.356908, throughput 3.22587K wps
[Epoch 180 Batch 30/62] avg loss 0.000704235, throughput 3.31142K wps
[Epoch 180 Batch 60/62] avg loss 0.000694095, throughput 3.17532K wps
Begin Testing...
[Epoch 180] train avg loss 0.00071054, dev acc 0.8378, dev avg loss 0.364451, throughput 3.24745K wps
[Epoch 181 Batch 30/62] avg loss 0.000744089, throughput 3.24601K wps
[Epoch 181 Batch 60/62] avg loss 0.000724887, throughput 3.22276K wps
Begin Testing...
[Epoch 181] train avg loss 0.000742504, dev acc 0.8378, dev avg loss 0.364833, throughput 3.24164K wps
[Epoch 182 Batch 30/62] avg loss 0.000688879, throughput 3.27889K wps
[Epoch 182 Batch 60/62] avg loss 0.000712941, throughput 3.15861K wps
Begin Testing...
[Epoch 182] train avg loss 0.000701157, dev acc 0.8378, dev avg loss 0.365166, throughput 3.22335K wps
[Epoch 183 Batch 30/62] avg loss 0.00071664, throughput 3.24579K wps
[Epoch 183 Batch 60/62] avg loss 0.000696217, throughput 3.21591K wps
Begin Testing...
[Epoch 183] train avg loss 0.000717806, dev acc 0.8407, dev avg loss 0.359145, throughput 3.23638K wps
[Epoch 184 Batch 30/62] avg loss 0.000704926, throughput 3.27174K wps
[Epoch 184 Batch 60/62] avg loss 0.000638717, throughput 3.18548K wps
Begin Testing...
[Epoch 184] train avg loss 0.00068311, dev acc 0.8319, dev avg loss 0.363684, throughput 3.23563K wps
[Epoch 185 Batch 30/62] avg loss 0.000631117, throughput 3.28843K wps
[Epoch 185 Batch 60/62] avg loss 0.000715321, throughput 3.16381K wps
Begin Testing...
[Epoch 185] train avg loss 0.000682957, dev acc 0.8348, dev avg loss 0.36286, throughput 3.23067K wps
[Epoch 186 Batch 30/62] avg loss 0.000633639, throughput 3.26403K wps
[Epoch 186 Batch 60/62] avg loss 0.000637774, throughput 3.22119K wps
Begin Testing...
[Epoch 186] train avg loss 0.000650401, dev acc 0.8348, dev avg loss 0.363577, throughput 3.24995K wps
[Epoch 187 Batch 30/62] avg loss 0.000698302, throughput 3.26816K wps
[Epoch 187 Batch 60/62] avg loss 0.000720624, throughput 3.19788K wps
Begin Testing...
[Epoch 187] train avg loss 0.000718306, dev acc 0.8466, dev avg loss 0.358927, throughput 3.24052K wps
[Epoch 188 Batch 30/62] avg loss 0.000689303, throughput 3.29026K wps
[Epoch 188 Batch 60/62] avg loss 0.000683363, throughput 3.22506K wps
Begin Testing...
[Epoch 188] train avg loss 0.000688879, dev acc 0.8348, dev avg loss 0.364404, throughput 3.26448K wps
[Epoch 189 Batch 30/62] avg loss 0.000637163, throughput 3.27639K wps
[Epoch 189 Batch 60/62] avg loss 0.000636512, throughput 3.22139K wps
Begin Testing...
[Epoch 189] train avg loss 0.00064139, dev acc 0.8348, dev avg loss 0.365731, throughput 3.25514K wps
[Epoch 190 Batch 30/62] avg loss 0.0006143, throughput 3.27308K wps
[Epoch 190 Batch 60/62] avg loss 0.000700271, throughput 3.20988K wps
Begin Testing...
[Epoch 190] train avg loss 0.000669626, dev acc 0.8437, dev avg loss 0.360562, throughput 3.24466K wps
[Epoch 191 Batch 30/62] avg loss 0.000641342, throughput 3.28071K wps
[Epoch 191 Batch 60/62] avg loss 0.000705955, throughput 3.16208K wps
Begin Testing...
[Epoch 191] train avg loss 0.000679573, dev acc 0.8437, dev avg loss 0.362877, throughput 3.22606K wps
[Epoch 192 Batch 30/62] avg loss 0.000663395, throughput 3.23252K wps
[Epoch 192 Batch 60/62] avg loss 0.000673964, throughput 3.16301K wps
Begin Testing...
[Epoch 192] train avg loss 0.000677547, dev acc 0.8348, dev avg loss 0.366021, throughput 3.20447K wps
[Epoch 193 Batch 30/62] avg loss 0.000651265, throughput 3.23073K wps
[Epoch 193 Batch 60/62] avg loss 0.000639048, throughput 3.1578K wps
Begin Testing...
[Epoch 193] train avg loss 0.000647174, dev acc 0.8319, dev avg loss 0.367177, throughput 3.19997K wps
[Epoch 194 Batch 30/62] avg loss 0.000661558, throughput 3.24235K wps
[Epoch 194 Batch 60/62] avg loss 0.000631788, throughput 3.16425K wps
Begin Testing...
[Epoch 194] train avg loss 0.000649456, dev acc 0.8348, dev avg loss 0.373422, throughput 3.21052K wps
[Epoch 195 Batch 30/62] avg loss 0.000619002, throughput 3.24615K wps
[Epoch 195 Batch 60/62] avg loss 0.000653402, throughput 3.2245K wps
Begin Testing...
[Epoch 195] train avg loss 0.000639922, dev acc 0.8466, dev avg loss 0.362087, throughput 3.24179K wps
[Epoch 196 Batch 30/62] avg loss 0.000608062, throughput 3.25513K wps
[Epoch 196 Batch 60/62] avg loss 0.000613679, throughput 3.1928K wps
Begin Testing...
[Epoch 196] train avg loss 0.000621822, dev acc 0.8378, dev avg loss 0.36516, throughput 3.2329K wps
[Epoch 197 Batch 30/62] avg loss 0.000632035, throughput 3.2985K wps
[Epoch 197 Batch 60/62] avg loss 0.000621602, throughput 3.17948K wps
Begin Testing...
[Epoch 197] train avg loss 0.000627045, dev acc 0.8437, dev avg loss 0.368568, throughput 3.24595K wps
[Epoch 198 Batch 30/62] avg loss 0.000621817, throughput 3.29194K wps
[Epoch 198 Batch 60/62] avg loss 0.000638074, throughput 3.18025K wps
Begin Testing...
[Epoch 198] train avg loss 0.000640524, dev acc 0.8378, dev avg loss 0.367559, throughput 3.24069K wps
[Epoch 199 Batch 30/62] avg loss 0.000587076, throughput 3.25889K wps
[Epoch 199 Batch 60/62] avg loss 0.00059152, throughput 3.22122K wps
Begin Testing...
[Epoch 199] train avg loss 0.000605151, dev acc 0.8378, dev avg loss 0.367803, throughput 3.24649K wps
Test loss 0.366191, test acc 0.8408
Total time cost 420.64s
[Epoch 0 Batch 30/62] avg loss 0.0135656, throughput 3.07746K wps
[Epoch 0 Batch 60/62] avg loss 0.0130316, throughput 3.21745K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134389, dev acc 0.6254, dev avg loss 0.661763, throughput 3.15474K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0129868, throughput 3.29257K wps
[Epoch 1 Batch 60/62] avg loss 0.0131181, throughput 3.19908K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132252, dev acc 0.6254, dev avg loss 0.651705, throughput 3.25327K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0129985, throughput 3.2817K wps
[Epoch 2 Batch 60/62] avg loss 0.0128404, throughput 3.17728K wps
Begin Testing...
[Epoch 2] train avg loss 0.0131265, dev acc 0.6254, dev avg loss 0.645815, throughput 3.23495K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0126377, throughput 3.26583K wps
[Epoch 3 Batch 60/62] avg loss 0.0127756, throughput 3.18411K wps
Begin Testing...
[Epoch 3] train avg loss 0.0128507, dev acc 0.6254, dev avg loss 0.638024, throughput 3.2324K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0128437, throughput 3.22684K wps
[Epoch 4 Batch 60/62] avg loss 0.0124598, throughput 3.16697K wps
Begin Testing...
[Epoch 4] train avg loss 0.0128422, dev acc 0.6254, dev avg loss 0.630627, throughput 3.20501K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0126235, throughput 3.22382K wps
[Epoch 5 Batch 60/62] avg loss 0.0121727, throughput 3.15261K wps
Begin Testing...
[Epoch 5] train avg loss 0.0125929, dev acc 0.6254, dev avg loss 0.623087, throughput 3.19495K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0121608, throughput 3.27671K wps
[Epoch 6 Batch 60/62] avg loss 0.0123753, throughput 3.15865K wps
Begin Testing...
[Epoch 6] train avg loss 0.0123945, dev acc 0.6283, dev avg loss 0.61507, throughput 3.22273K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0122362, throughput 3.27112K wps
[Epoch 7 Batch 60/62] avg loss 0.0119997, throughput 3.17058K wps
Begin Testing...
[Epoch 7] train avg loss 0.0122535, dev acc 0.6254, dev avg loss 0.610531, throughput 3.22638K wps
[Epoch 8 Batch 30/62] avg loss 0.0119788, throughput 3.24238K wps
[Epoch 8 Batch 60/62] avg loss 0.0118204, throughput 3.16221K wps
Begin Testing...
[Epoch 8] train avg loss 0.0120773, dev acc 0.6313, dev avg loss 0.599321, throughput 3.20852K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0115975, throughput 3.25888K wps
[Epoch 9 Batch 60/62] avg loss 0.0116716, throughput 3.20234K wps
Begin Testing...
[Epoch 9] train avg loss 0.0117775, dev acc 0.6549, dev avg loss 0.59057, throughput 3.23803K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0114118, throughput 3.27727K wps
[Epoch 10 Batch 60/62] avg loss 0.0116649, throughput 3.19174K wps
Begin Testing...
[Epoch 10] train avg loss 0.0116338, dev acc 0.6608, dev avg loss 0.581879, throughput 3.24037K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0114225, throughput 3.27457K wps
[Epoch 11 Batch 60/62] avg loss 0.0112579, throughput 3.20974K wps
Begin Testing...
[Epoch 11] train avg loss 0.0114282, dev acc 0.6431, dev avg loss 0.575075, throughput 3.24747K wps
[Epoch 12 Batch 30/62] avg loss 0.0111297, throughput 3.27644K wps
[Epoch 12 Batch 60/62] avg loss 0.0111552, throughput 3.17157K wps
Begin Testing...
[Epoch 12] train avg loss 0.0112668, dev acc 0.6726, dev avg loss 0.564623, throughput 3.23189K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0111047, throughput 3.2547K wps
[Epoch 13 Batch 60/62] avg loss 0.01073, throughput 3.20732K wps
Begin Testing...
[Epoch 13] train avg loss 0.011134, dev acc 0.7198, dev avg loss 0.553371, throughput 3.23535K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.010905, throughput 3.27837K wps
[Epoch 14 Batch 60/62] avg loss 0.0105041, throughput 3.16154K wps
Begin Testing...
[Epoch 14] train avg loss 0.0108724, dev acc 0.7021, dev avg loss 0.54483, throughput 3.22402K wps
[Epoch 15 Batch 30/62] avg loss 0.0107377, throughput 3.27404K wps
[Epoch 15 Batch 60/62] avg loss 0.0103211, throughput 3.19055K wps
Begin Testing...
[Epoch 15] train avg loss 0.0106345, dev acc 0.6785, dev avg loss 0.538942, throughput 3.23678K wps
[Epoch 16 Batch 30/62] avg loss 0.0105201, throughput 3.28129K wps
[Epoch 16 Batch 60/62] avg loss 0.0101282, throughput 3.17888K wps
Begin Testing...
[Epoch 16] train avg loss 0.0104693, dev acc 0.7286, dev avg loss 0.525366, throughput 3.23495K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0101374, throughput 3.27089K wps
[Epoch 17 Batch 60/62] avg loss 0.0100368, throughput 3.20835K wps
Begin Testing...
[Epoch 17] train avg loss 0.0102147, dev acc 0.7198, dev avg loss 0.516552, throughput 3.24548K wps
[Epoch 18 Batch 30/62] avg loss 0.0101034, throughput 3.26954K wps
[Epoch 18 Batch 60/62] avg loss 0.00978012, throughput 3.17412K wps
Begin Testing...
[Epoch 18] train avg loss 0.0101083, dev acc 0.7286, dev avg loss 0.507929, throughput 3.22729K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.00974716, throughput 3.26359K wps
[Epoch 19 Batch 60/62] avg loss 0.00965853, throughput 3.18612K wps
Begin Testing...
[Epoch 19] train avg loss 0.00981093, dev acc 0.7316, dev avg loss 0.500293, throughput 3.22961K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0095542, throughput 3.2484K wps
[Epoch 20 Batch 60/62] avg loss 0.00949891, throughput 3.21298K wps
Begin Testing...
[Epoch 20] train avg loss 0.0096604, dev acc 0.7611, dev avg loss 0.491797, throughput 3.23652K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.00951348, throughput 3.27507K wps
[Epoch 21 Batch 60/62] avg loss 0.00902979, throughput 3.17467K wps
Begin Testing...
[Epoch 21] train avg loss 0.00935215, dev acc 0.7611, dev avg loss 0.483657, throughput 3.22997K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0089336, throughput 3.26917K wps
[Epoch 22 Batch 60/62] avg loss 0.00926667, throughput 3.2086K wps
Begin Testing...
[Epoch 22] train avg loss 0.00921105, dev acc 0.7670, dev avg loss 0.476835, throughput 3.24507K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.008747, throughput 3.29013K wps
[Epoch 23 Batch 60/62] avg loss 0.00896048, throughput 3.16877K wps
Begin Testing...
[Epoch 23] train avg loss 0.00896552, dev acc 0.7611, dev avg loss 0.471401, throughput 3.23464K wps
[Epoch 24 Batch 30/62] avg loss 0.00882188, throughput 3.25628K wps
[Epoch 24 Batch 60/62] avg loss 0.00882668, throughput 3.22031K wps
Begin Testing...
[Epoch 24] train avg loss 0.00888717, dev acc 0.7729, dev avg loss 0.4665, throughput 3.2438K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00880593, throughput 3.26357K wps
[Epoch 25 Batch 60/62] avg loss 0.00866787, throughput 3.15413K wps
Begin Testing...
[Epoch 25] train avg loss 0.00881018, dev acc 0.7699, dev avg loss 0.458815, throughput 3.21451K wps
[Epoch 26 Batch 30/62] avg loss 0.00840207, throughput 3.2735K wps
[Epoch 26 Batch 60/62] avg loss 0.0084347, throughput 3.19552K wps
Begin Testing...
[Epoch 26] train avg loss 0.00849735, dev acc 0.7758, dev avg loss 0.457006, throughput 3.24223K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.00831455, throughput 3.27563K wps
[Epoch 27 Batch 60/62] avg loss 0.00812201, throughput 3.17963K wps
Begin Testing...
[Epoch 27] train avg loss 0.00832723, dev acc 0.7788, dev avg loss 0.452492, throughput 3.2323K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.0080254, throughput 3.28039K wps
[Epoch 28 Batch 60/62] avg loss 0.008275, throughput 3.16253K wps
Begin Testing...
[Epoch 28] train avg loss 0.00832055, dev acc 0.7876, dev avg loss 0.444193, throughput 3.22689K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00781462, throughput 3.25677K wps
[Epoch 29 Batch 60/62] avg loss 0.00789029, throughput 3.15738K wps
Begin Testing...
[Epoch 29] train avg loss 0.0080549, dev acc 0.7876, dev avg loss 0.440033, throughput 3.21141K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00779917, throughput 3.28889K wps
[Epoch 30 Batch 60/62] avg loss 0.00790414, throughput 3.18569K wps
Begin Testing...
[Epoch 30] train avg loss 0.00800243, dev acc 0.8083, dev avg loss 0.435899, throughput 3.24437K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.007531, throughput 3.24366K wps
[Epoch 31 Batch 60/62] avg loss 0.00788874, throughput 3.19226K wps
Begin Testing...
[Epoch 31] train avg loss 0.00779983, dev acc 0.8201, dev avg loss 0.433034, throughput 3.22694K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.00771443, throughput 3.30397K wps
[Epoch 32 Batch 60/62] avg loss 0.00749002, throughput 3.17206K wps
Begin Testing...
[Epoch 32] train avg loss 0.00774633, dev acc 0.8083, dev avg loss 0.428382, throughput 3.24336K wps
[Epoch 33 Batch 30/62] avg loss 0.00753134, throughput 3.29058K wps
[Epoch 33 Batch 60/62] avg loss 0.00731959, throughput 3.20407K wps
Begin Testing...
[Epoch 33] train avg loss 0.00746619, dev acc 0.7788, dev avg loss 0.428725, throughput 3.25207K wps
[Epoch 34 Batch 30/62] avg loss 0.00725792, throughput 3.25494K wps
[Epoch 34 Batch 60/62] avg loss 0.00720736, throughput 3.20725K wps
Begin Testing...
[Epoch 34] train avg loss 0.00729742, dev acc 0.8112, dev avg loss 0.421315, throughput 3.23533K wps
[Epoch 35 Batch 30/62] avg loss 0.00719275, throughput 3.27212K wps
[Epoch 35 Batch 60/62] avg loss 0.00706699, throughput 3.17495K wps
Begin Testing...
[Epoch 35] train avg loss 0.00726917, dev acc 0.8230, dev avg loss 0.419176, throughput 3.23126K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.00697106, throughput 3.24024K wps
[Epoch 36 Batch 60/62] avg loss 0.00721562, throughput 3.22041K wps
Begin Testing...
[Epoch 36] train avg loss 0.00720103, dev acc 0.7994, dev avg loss 0.416946, throughput 3.23619K wps
[Epoch 37 Batch 30/62] avg loss 0.00674804, throughput 3.28037K wps
[Epoch 37 Batch 60/62] avg loss 0.00705949, throughput 3.22125K wps
Begin Testing...
[Epoch 37] train avg loss 0.00694462, dev acc 0.8201, dev avg loss 0.412552, throughput 3.25664K wps
[Epoch 38 Batch 30/62] avg loss 0.00686046, throughput 3.25652K wps
[Epoch 38 Batch 60/62] avg loss 0.00667981, throughput 3.16687K wps
Begin Testing...
[Epoch 38] train avg loss 0.00689828, dev acc 0.8201, dev avg loss 0.410027, throughput 3.21878K wps
[Epoch 39 Batch 30/62] avg loss 0.00654078, throughput 3.2355K wps
[Epoch 39 Batch 60/62] avg loss 0.00684954, throughput 3.18187K wps
Begin Testing...
[Epoch 39] train avg loss 0.00680458, dev acc 0.8260, dev avg loss 0.406954, throughput 3.21642K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00646636, throughput 3.23564K wps
[Epoch 40 Batch 60/62] avg loss 0.00670423, throughput 3.21671K wps
Begin Testing...
[Epoch 40] train avg loss 0.00673017, dev acc 0.8319, dev avg loss 0.405029, throughput 3.23294K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00647261, throughput 3.28345K wps
[Epoch 41 Batch 60/62] avg loss 0.00641067, throughput 3.17368K wps
Begin Testing...
[Epoch 41] train avg loss 0.00657214, dev acc 0.8230, dev avg loss 0.403073, throughput 3.2338K wps
[Epoch 42 Batch 30/62] avg loss 0.00639933, throughput 3.26402K wps
[Epoch 42 Batch 60/62] avg loss 0.00612931, throughput 3.20294K wps
Begin Testing...
[Epoch 42] train avg loss 0.00631978, dev acc 0.8260, dev avg loss 0.399914, throughput 3.23925K wps
[Epoch 43 Batch 30/62] avg loss 0.00606641, throughput 3.28288K wps
[Epoch 43 Batch 60/62] avg loss 0.00646453, throughput 3.19082K wps
Begin Testing...
[Epoch 43] train avg loss 0.00635099, dev acc 0.8260, dev avg loss 0.397546, throughput 3.24459K wps
[Epoch 44 Batch 30/62] avg loss 0.00614951, throughput 3.29735K wps
[Epoch 44 Batch 60/62] avg loss 0.00616732, throughput 3.17322K wps
Begin Testing...
[Epoch 44] train avg loss 0.0062273, dev acc 0.8260, dev avg loss 0.394994, throughput 3.2403K wps
[Epoch 45 Batch 30/62] avg loss 0.00595472, throughput 3.28964K wps
[Epoch 45 Batch 60/62] avg loss 0.0057939, throughput 3.20041K wps
Begin Testing...
[Epoch 45] train avg loss 0.00590714, dev acc 0.8289, dev avg loss 0.393012, throughput 3.25056K wps
[Epoch 46 Batch 30/62] avg loss 0.00596047, throughput 3.27861K wps
[Epoch 46 Batch 60/62] avg loss 0.0057876, throughput 3.20294K wps
Begin Testing...
[Epoch 46] train avg loss 0.00594169, dev acc 0.8319, dev avg loss 0.391086, throughput 3.24583K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.00576042, throughput 3.30032K wps
[Epoch 47 Batch 60/62] avg loss 0.00582941, throughput 3.20859K wps
Begin Testing...
[Epoch 47] train avg loss 0.00582966, dev acc 0.8289, dev avg loss 0.388658, throughput 3.26044K wps
[Epoch 48 Batch 30/62] avg loss 0.00553726, throughput 3.26493K wps
[Epoch 48 Batch 60/62] avg loss 0.00559528, throughput 3.22007K wps
Begin Testing...
[Epoch 48] train avg loss 0.0056349, dev acc 0.8201, dev avg loss 0.386363, throughput 3.25046K wps
[Epoch 49 Batch 30/62] avg loss 0.00549634, throughput 3.28804K wps
[Epoch 49 Batch 60/62] avg loss 0.00561485, throughput 3.21485K wps
Begin Testing...
[Epoch 49] train avg loss 0.00564531, dev acc 0.8319, dev avg loss 0.385628, throughput 3.25654K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00549216, throughput 3.28724K wps
[Epoch 50 Batch 60/62] avg loss 0.00549077, throughput 3.1933K wps
Begin Testing...
[Epoch 50] train avg loss 0.00553851, dev acc 0.8289, dev avg loss 0.384086, throughput 3.24914K wps
[Epoch 51 Batch 30/62] avg loss 0.00548643, throughput 3.3069K wps
[Epoch 51 Batch 60/62] avg loss 0.00514169, throughput 3.21132K wps
Begin Testing...
[Epoch 51] train avg loss 0.00536238, dev acc 0.8319, dev avg loss 0.381725, throughput 3.26576K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00532645, throughput 3.30668K wps
[Epoch 52 Batch 60/62] avg loss 0.00521547, throughput 3.22851K wps
Begin Testing...
[Epoch 52] train avg loss 0.00534071, dev acc 0.8319, dev avg loss 0.380484, throughput 3.27372K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00502413, throughput 3.31649K wps
[Epoch 53 Batch 60/62] avg loss 0.00516819, throughput 3.23284K wps
Begin Testing...
[Epoch 53] train avg loss 0.00518399, dev acc 0.8289, dev avg loss 0.378897, throughput 3.28003K wps
[Epoch 54 Batch 30/62] avg loss 0.00519422, throughput 3.27266K wps
[Epoch 54 Batch 60/62] avg loss 0.00491432, throughput 3.2307K wps
Begin Testing...
[Epoch 54] train avg loss 0.00512094, dev acc 0.8289, dev avg loss 0.378035, throughput 3.25708K wps
[Epoch 55 Batch 30/62] avg loss 0.00486624, throughput 3.2852K wps
[Epoch 55 Batch 60/62] avg loss 0.00497387, throughput 3.21236K wps
Begin Testing...
[Epoch 55] train avg loss 0.00496922, dev acc 0.8319, dev avg loss 0.374232, throughput 3.25441K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.0050093, throughput 3.30114K wps
[Epoch 56 Batch 60/62] avg loss 0.00493591, throughput 3.23542K wps
Begin Testing...
[Epoch 56] train avg loss 0.00504529, dev acc 0.8378, dev avg loss 0.373273, throughput 3.2744K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/62] avg loss 0.00463802, throughput 3.29264K wps
[Epoch 57 Batch 60/62] avg loss 0.00486935, throughput 3.22519K wps
Begin Testing...
[Epoch 57] train avg loss 0.00481345, dev acc 0.8378, dev avg loss 0.3721, throughput 3.26544K wps
Observed Improvement.
Begin Testing...
[Epoch 58 Batch 30/62] avg loss 0.00478151, throughput 3.29509K wps
[Epoch 58 Batch 60/62] avg loss 0.0044133, throughput 3.2282K wps
Begin Testing...
[Epoch 58] train avg loss 0.00464019, dev acc 0.8378, dev avg loss 0.370315, throughput 3.26693K wps
Observed Improvement.
Begin Testing...
[Epoch 59 Batch 30/62] avg loss 0.0046448, throughput 3.31906K wps
[Epoch 59 Batch 60/62] avg loss 0.00472524, throughput 3.23538K wps
Begin Testing...
[Epoch 59] train avg loss 0.00473053, dev acc 0.8348, dev avg loss 0.369211, throughput 3.28299K wps
[Epoch 60 Batch 30/62] avg loss 0.00449333, throughput 3.32065K wps
[Epoch 60 Batch 60/62] avg loss 0.00468013, throughput 3.2409K wps
Begin Testing...
[Epoch 60] train avg loss 0.00462071, dev acc 0.8348, dev avg loss 0.368014, throughput 3.28565K wps
[Epoch 61 Batch 30/62] avg loss 0.00438203, throughput 3.29994K wps
[Epoch 61 Batch 60/62] avg loss 0.0043916, throughput 3.22781K wps
Begin Testing...
[Epoch 61] train avg loss 0.004449, dev acc 0.8407, dev avg loss 0.367537, throughput 3.26902K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00451414, throughput 3.30204K wps
[Epoch 62 Batch 60/62] avg loss 0.00446039, throughput 3.23995K wps
Begin Testing...
[Epoch 62] train avg loss 0.00462361, dev acc 0.8348, dev avg loss 0.36755, throughput 3.2766K wps
[Epoch 63 Batch 30/62] avg loss 0.00443332, throughput 3.32746K wps
[Epoch 63 Batch 60/62] avg loss 0.0043012, throughput 3.22404K wps
Begin Testing...
[Epoch 63] train avg loss 0.00443577, dev acc 0.8348, dev avg loss 0.368383, throughput 3.28262K wps
[Epoch 64 Batch 30/62] avg loss 0.0042954, throughput 3.28362K wps
[Epoch 64 Batch 60/62] avg loss 0.00417853, throughput 3.22577K wps
Begin Testing...
[Epoch 64] train avg loss 0.00426637, dev acc 0.8319, dev avg loss 0.364226, throughput 3.26234K wps
[Epoch 65 Batch 30/62] avg loss 0.00414561, throughput 3.3119K wps
[Epoch 65 Batch 60/62] avg loss 0.00404158, throughput 3.23678K wps
Begin Testing...
[Epoch 65] train avg loss 0.00414307, dev acc 0.8378, dev avg loss 0.363859, throughput 3.28123K wps
[Epoch 66 Batch 30/62] avg loss 0.00399251, throughput 3.31746K wps
[Epoch 66 Batch 60/62] avg loss 0.0040711, throughput 3.2398K wps
Begin Testing...
[Epoch 66] train avg loss 0.00406915, dev acc 0.8289, dev avg loss 0.365156, throughput 3.28342K wps
[Epoch 67 Batch 30/62] avg loss 0.00390114, throughput 3.31254K wps
[Epoch 67 Batch 60/62] avg loss 0.00389539, throughput 3.23688K wps
Begin Testing...
[Epoch 67] train avg loss 0.00393383, dev acc 0.8319, dev avg loss 0.362051, throughput 3.28158K wps
[Epoch 68 Batch 30/62] avg loss 0.00386549, throughput 3.31882K wps
[Epoch 68 Batch 60/62] avg loss 0.00379105, throughput 3.2337K wps
Begin Testing...
[Epoch 68] train avg loss 0.00389068, dev acc 0.8407, dev avg loss 0.359749, throughput 3.28129K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/62] avg loss 0.00404623, throughput 3.31601K wps
[Epoch 69 Batch 60/62] avg loss 0.00349173, throughput 3.22755K wps
Begin Testing...
[Epoch 69] train avg loss 0.00386537, dev acc 0.8407, dev avg loss 0.361168, throughput 3.27665K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/62] avg loss 0.00371662, throughput 3.30994K wps
[Epoch 70 Batch 60/62] avg loss 0.00392327, throughput 3.22416K wps
Begin Testing...
[Epoch 70] train avg loss 0.00385853, dev acc 0.8319, dev avg loss 0.359754, throughput 3.27267K wps
[Epoch 71 Batch 30/62] avg loss 0.00381831, throughput 3.32864K wps
[Epoch 71 Batch 60/62] avg loss 0.00366284, throughput 3.24541K wps
Begin Testing...
[Epoch 71] train avg loss 0.00383715, dev acc 0.8496, dev avg loss 0.357612, throughput 3.29291K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/62] avg loss 0.0037075, throughput 3.32767K wps
[Epoch 72 Batch 60/62] avg loss 0.00357686, throughput 3.229K wps
Begin Testing...
[Epoch 72] train avg loss 0.00367335, dev acc 0.8289, dev avg loss 0.359424, throughput 3.28394K wps
[Epoch 73 Batch 30/62] avg loss 0.00348834, throughput 3.30023K wps
[Epoch 73 Batch 60/62] avg loss 0.00364885, throughput 3.22597K wps
Begin Testing...
[Epoch 73] train avg loss 0.00360728, dev acc 0.8378, dev avg loss 0.359941, throughput 3.26936K wps
[Epoch 74 Batch 30/62] avg loss 0.00360973, throughput 3.31664K wps
[Epoch 74 Batch 60/62] avg loss 0.00342596, throughput 3.23375K wps
Begin Testing...
[Epoch 74] train avg loss 0.00355196, dev acc 0.8466, dev avg loss 0.360765, throughput 3.282K wps
[Epoch 75 Batch 30/62] avg loss 0.00346161, throughput 3.28546K wps
[Epoch 75 Batch 60/62] avg loss 0.00333634, throughput 3.2244K wps
Begin Testing...
[Epoch 75] train avg loss 0.0034633, dev acc 0.8348, dev avg loss 0.358019, throughput 3.26053K wps
[Epoch 76 Batch 30/62] avg loss 0.00313233, throughput 3.32765K wps
[Epoch 76 Batch 60/62] avg loss 0.00353907, throughput 3.23433K wps
Begin Testing...
[Epoch 76] train avg loss 0.00338654, dev acc 0.8466, dev avg loss 0.354043, throughput 3.28804K wps
[Epoch 77 Batch 30/62] avg loss 0.00331752, throughput 3.31023K wps
[Epoch 77 Batch 60/62] avg loss 0.00345796, throughput 3.23472K wps
Begin Testing...
[Epoch 77] train avg loss 0.00345571, dev acc 0.8466, dev avg loss 0.353791, throughput 3.27781K wps
[Epoch 78 Batch 30/62] avg loss 0.00311723, throughput 3.29934K wps
[Epoch 78 Batch 60/62] avg loss 0.00342778, throughput 3.22205K wps
Begin Testing...
[Epoch 78] train avg loss 0.00332533, dev acc 0.8437, dev avg loss 0.352944, throughput 3.26779K wps
[Epoch 79 Batch 30/62] avg loss 0.0032339, throughput 3.31292K wps
[Epoch 79 Batch 60/62] avg loss 0.00299984, throughput 3.24948K wps
Begin Testing...
[Epoch 79] train avg loss 0.00316992, dev acc 0.8466, dev avg loss 0.352628, throughput 3.28703K wps
[Epoch 80 Batch 30/62] avg loss 0.00321009, throughput 3.28874K wps
[Epoch 80 Batch 60/62] avg loss 0.00299868, throughput 3.23374K wps
Begin Testing...
[Epoch 80] train avg loss 0.0031359, dev acc 0.8496, dev avg loss 0.351656, throughput 3.26582K wps
Observed Improvement.
Begin Testing...
[Epoch 81 Batch 30/62] avg loss 0.00307074, throughput 3.29429K wps
[Epoch 81 Batch 60/62] avg loss 0.00294043, throughput 3.22162K wps
Begin Testing...
[Epoch 81] train avg loss 0.00302573, dev acc 0.8496, dev avg loss 0.351456, throughput 3.26408K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/62] avg loss 0.00294637, throughput 3.29634K wps
[Epoch 82 Batch 60/62] avg loss 0.00304194, throughput 3.2326K wps
Begin Testing...
[Epoch 82] train avg loss 0.00301765, dev acc 0.8466, dev avg loss 0.353918, throughput 3.26986K wps
[Epoch 83 Batch 30/62] avg loss 0.00303975, throughput 3.30389K wps
[Epoch 83 Batch 60/62] avg loss 0.00292317, throughput 3.217K wps
Begin Testing...
[Epoch 83] train avg loss 0.00304029, dev acc 0.8407, dev avg loss 0.351523, throughput 3.26766K wps
[Epoch 84 Batch 30/62] avg loss 0.00279657, throughput 3.30606K wps
[Epoch 84 Batch 60/62] avg loss 0.00284804, throughput 3.22529K wps
Begin Testing...
[Epoch 84] train avg loss 0.00288766, dev acc 0.8407, dev avg loss 0.351185, throughput 3.27231K wps
[Epoch 85 Batch 30/62] avg loss 0.00281372, throughput 3.29783K wps
[Epoch 85 Batch 60/62] avg loss 0.00300083, throughput 3.22939K wps
Begin Testing...
[Epoch 85] train avg loss 0.00295032, dev acc 0.8466, dev avg loss 0.348654, throughput 3.26889K wps
[Epoch 86 Batch 30/62] avg loss 0.00284831, throughput 3.27907K wps
[Epoch 86 Batch 60/62] avg loss 0.00286974, throughput 3.22702K wps
Begin Testing...
[Epoch 86] train avg loss 0.00290854, dev acc 0.8407, dev avg loss 0.348133, throughput 3.25914K wps
[Epoch 87 Batch 30/62] avg loss 0.0027093, throughput 3.27183K wps
[Epoch 87 Batch 60/62] avg loss 0.00287652, throughput 3.22873K wps
Begin Testing...
[Epoch 87] train avg loss 0.00285864, dev acc 0.8407, dev avg loss 0.348032, throughput 3.25651K wps
[Epoch 88 Batch 30/62] avg loss 0.00268569, throughput 3.31682K wps
[Epoch 88 Batch 60/62] avg loss 0.0026856, throughput 3.22617K wps
Begin Testing...
[Epoch 88] train avg loss 0.00274171, dev acc 0.8466, dev avg loss 0.347359, throughput 3.27826K wps
[Epoch 89 Batch 30/62] avg loss 0.00269036, throughput 3.31089K wps
[Epoch 89 Batch 60/62] avg loss 0.00271591, throughput 3.213K wps
Begin Testing...
[Epoch 89] train avg loss 0.00275124, dev acc 0.8466, dev avg loss 0.347448, throughput 3.26876K wps
[Epoch 90 Batch 30/62] avg loss 0.00246932, throughput 3.30382K wps
[Epoch 90 Batch 60/62] avg loss 0.00271609, throughput 3.24554K wps
Begin Testing...
[Epoch 90] train avg loss 0.00263833, dev acc 0.8496, dev avg loss 0.346758, throughput 3.28035K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/62] avg loss 0.00253715, throughput 3.32758K wps
[Epoch 91 Batch 60/62] avg loss 0.00264324, throughput 3.23071K wps
Begin Testing...
[Epoch 91] train avg loss 0.00263435, dev acc 0.8466, dev avg loss 0.346775, throughput 3.28562K wps
[Epoch 92 Batch 30/62] avg loss 0.00255345, throughput 3.32033K wps
[Epoch 92 Batch 60/62] avg loss 0.00248995, throughput 3.23024K wps
Begin Testing...
[Epoch 92] train avg loss 0.00255133, dev acc 0.8466, dev avg loss 0.346073, throughput 3.28011K wps
[Epoch 93 Batch 30/62] avg loss 0.00243459, throughput 3.29284K wps
[Epoch 93 Batch 60/62] avg loss 0.00246514, throughput 3.24639K wps
Begin Testing...
[Epoch 93] train avg loss 0.00252803, dev acc 0.8496, dev avg loss 0.347334, throughput 3.27726K wps
Observed Improvement.
Begin Testing...
[Epoch 94 Batch 30/62] avg loss 0.00248258, throughput 3.31126K wps
[Epoch 94 Batch 60/62] avg loss 0.00234355, throughput 3.21362K wps
Begin Testing...
[Epoch 94] train avg loss 0.00244215, dev acc 0.8466, dev avg loss 0.345822, throughput 3.26814K wps
[Epoch 95 Batch 30/62] avg loss 0.0023408, throughput 3.29349K wps
[Epoch 95 Batch 60/62] avg loss 0.00244367, throughput 3.18644K wps
Begin Testing...
[Epoch 95] train avg loss 0.00242094, dev acc 0.8407, dev avg loss 0.34588, throughput 3.24688K wps
[Epoch 96 Batch 30/62] avg loss 0.00237346, throughput 3.31238K wps
[Epoch 96 Batch 60/62] avg loss 0.00242856, throughput 3.23491K wps
Begin Testing...
[Epoch 96] train avg loss 0.00246857, dev acc 0.8466, dev avg loss 0.345914, throughput 3.2807K wps
[Epoch 97 Batch 30/62] avg loss 0.00232591, throughput 3.3048K wps
[Epoch 97 Batch 60/62] avg loss 0.00235547, throughput 3.22443K wps
Begin Testing...
[Epoch 97] train avg loss 0.00237344, dev acc 0.8466, dev avg loss 0.344521, throughput 3.26943K wps
[Epoch 98 Batch 30/62] avg loss 0.0023309, throughput 3.28377K wps
[Epoch 98 Batch 60/62] avg loss 0.00214544, throughput 3.21489K wps
Begin Testing...
[Epoch 98] train avg loss 0.00227291, dev acc 0.8466, dev avg loss 0.34419, throughput 3.25536K wps
[Epoch 99 Batch 30/62] avg loss 0.00223717, throughput 3.28702K wps
[Epoch 99 Batch 60/62] avg loss 0.0023192, throughput 3.2125K wps
Begin Testing...
[Epoch 99] train avg loss 0.00229702, dev acc 0.8466, dev avg loss 0.343768, throughput 3.25484K wps
[Epoch 100 Batch 30/62] avg loss 0.00219207, throughput 3.27576K wps
[Epoch 100 Batch 60/62] avg loss 0.00211737, throughput 3.21759K wps
Begin Testing...
[Epoch 100] train avg loss 0.00221589, dev acc 0.8496, dev avg loss 0.34293, throughput 3.25282K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00229696, throughput 3.2952K wps
[Epoch 101 Batch 60/62] avg loss 0.00204625, throughput 3.24204K wps
Begin Testing...
[Epoch 101] train avg loss 0.00221634, dev acc 0.8496, dev avg loss 0.342664, throughput 3.2745K wps
Observed Improvement.
Begin Testing...
[Epoch 102 Batch 30/62] avg loss 0.00200643, throughput 3.30061K wps
[Epoch 102 Batch 60/62] avg loss 0.00216137, throughput 3.22655K wps
Begin Testing...
[Epoch 102] train avg loss 0.00208087, dev acc 0.8437, dev avg loss 0.344007, throughput 3.26911K wps
[Epoch 103 Batch 30/62] avg loss 0.00207705, throughput 3.29755K wps
[Epoch 103 Batch 60/62] avg loss 0.00221723, throughput 3.24124K wps
Begin Testing...
[Epoch 103] train avg loss 0.00214335, dev acc 0.8466, dev avg loss 0.34329, throughput 3.27708K wps
[Epoch 104 Batch 30/62] avg loss 0.00202585, throughput 3.31731K wps
[Epoch 104 Batch 60/62] avg loss 0.00219003, throughput 3.23735K wps
Begin Testing...
[Epoch 104] train avg loss 0.00211548, dev acc 0.8496, dev avg loss 0.343006, throughput 3.28252K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/62] avg loss 0.00204907, throughput 3.27193K wps
[Epoch 105 Batch 60/62] avg loss 0.00196698, throughput 3.21692K wps
Begin Testing...
[Epoch 105] train avg loss 0.00203563, dev acc 0.8466, dev avg loss 0.34509, throughput 3.25127K wps
[Epoch 106 Batch 30/62] avg loss 0.00192504, throughput 3.29099K wps
[Epoch 106 Batch 60/62] avg loss 0.00193131, throughput 3.22612K wps
Begin Testing...
[Epoch 106] train avg loss 0.00198017, dev acc 0.8496, dev avg loss 0.345384, throughput 3.26368K wps
Observed Improvement.
Begin Testing...
[Epoch 107 Batch 30/62] avg loss 0.00207268, throughput 3.31994K wps
[Epoch 107 Batch 60/62] avg loss 0.00191441, throughput 3.23559K wps
Begin Testing...
[Epoch 107] train avg loss 0.00198802, dev acc 0.8496, dev avg loss 0.345244, throughput 3.28231K wps
Observed Improvement.
Begin Testing...
[Epoch 108 Batch 30/62] avg loss 0.00170517, throughput 3.29935K wps
[Epoch 108 Batch 60/62] avg loss 0.00215385, throughput 3.22096K wps
Begin Testing...
[Epoch 108] train avg loss 0.00196, dev acc 0.8496, dev avg loss 0.342056, throughput 3.26574K wps
Observed Improvement.
Begin Testing...
[Epoch 109 Batch 30/62] avg loss 0.00196512, throughput 3.29108K wps
[Epoch 109 Batch 60/62] avg loss 0.00177593, throughput 3.22062K wps
Begin Testing...
[Epoch 109] train avg loss 0.00188373, dev acc 0.8437, dev avg loss 0.342196, throughput 3.26084K wps
[Epoch 110 Batch 30/62] avg loss 0.00186868, throughput 3.27776K wps
[Epoch 110 Batch 60/62] avg loss 0.0018461, throughput 3.22216K wps
Begin Testing...
[Epoch 110] train avg loss 0.00186281, dev acc 0.8437, dev avg loss 0.343106, throughput 3.25524K wps
[Epoch 111 Batch 30/62] avg loss 0.0018439, throughput 3.29655K wps
[Epoch 111 Batch 60/62] avg loss 0.00174038, throughput 3.22732K wps
Begin Testing...
[Epoch 111] train avg loss 0.00179277, dev acc 0.8437, dev avg loss 0.341188, throughput 3.26696K wps
[Epoch 112 Batch 30/62] avg loss 0.00173293, throughput 3.29565K wps
[Epoch 112 Batch 60/62] avg loss 0.00177008, throughput 3.22674K wps
Begin Testing...
[Epoch 112] train avg loss 0.00177239, dev acc 0.8466, dev avg loss 0.342255, throughput 3.26704K wps
[Epoch 113 Batch 30/62] avg loss 0.00182559, throughput 3.31716K wps
[Epoch 113 Batch 60/62] avg loss 0.00173395, throughput 3.22361K wps
Begin Testing...
[Epoch 113] train avg loss 0.00181798, dev acc 0.8437, dev avg loss 0.341068, throughput 3.27554K wps
[Epoch 114 Batch 30/62] avg loss 0.00173138, throughput 3.28974K wps
[Epoch 114 Batch 60/62] avg loss 0.00188174, throughput 3.21779K wps
Begin Testing...
[Epoch 114] train avg loss 0.00180752, dev acc 0.8525, dev avg loss 0.340692, throughput 3.25957K wps
Observed Improvement.
Begin Testing...
[Epoch 115 Batch 30/62] avg loss 0.00170217, throughput 3.2948K wps
[Epoch 115 Batch 60/62] avg loss 0.00186061, throughput 3.20686K wps
Begin Testing...
[Epoch 115] train avg loss 0.00183166, dev acc 0.8496, dev avg loss 0.340832, throughput 3.25744K wps
[Epoch 116 Batch 30/62] avg loss 0.00175122, throughput 3.2952K wps
[Epoch 116 Batch 60/62] avg loss 0.0017778, throughput 3.22226K wps
Begin Testing...
[Epoch 116] train avg loss 0.00176267, dev acc 0.8466, dev avg loss 0.341276, throughput 3.26582K wps
[Epoch 117 Batch 30/62] avg loss 0.00172923, throughput 3.29993K wps
[Epoch 117 Batch 60/62] avg loss 0.00164908, throughput 3.2193K wps
Begin Testing...
[Epoch 117] train avg loss 0.00172604, dev acc 0.8496, dev avg loss 0.340689, throughput 3.26451K wps
[Epoch 118 Batch 30/62] avg loss 0.00167882, throughput 3.27668K wps
[Epoch 118 Batch 60/62] avg loss 0.00171154, throughput 3.22127K wps
Begin Testing...
[Epoch 118] train avg loss 0.00169803, dev acc 0.8525, dev avg loss 0.340356, throughput 3.25544K wps
Observed Improvement.
Begin Testing...
[Epoch 119 Batch 30/62] avg loss 0.00160747, throughput 3.2959K wps
[Epoch 119 Batch 60/62] avg loss 0.00159016, throughput 3.21871K wps
Begin Testing...
[Epoch 119] train avg loss 0.00162367, dev acc 0.8496, dev avg loss 0.340183, throughput 3.26288K wps
[Epoch 120 Batch 30/62] avg loss 0.00153113, throughput 3.28818K wps
[Epoch 120 Batch 60/62] avg loss 0.00161773, throughput 3.20817K wps
Begin Testing...
[Epoch 120] train avg loss 0.00159766, dev acc 0.8466, dev avg loss 0.341118, throughput 3.25454K wps
[Epoch 121 Batch 30/62] avg loss 0.00155221, throughput 3.2937K wps
[Epoch 121 Batch 60/62] avg loss 0.00151937, throughput 3.20737K wps
Begin Testing...
[Epoch 121] train avg loss 0.00154819, dev acc 0.8437, dev avg loss 0.342405, throughput 3.25511K wps
[Epoch 122 Batch 30/62] avg loss 0.00151482, throughput 3.27201K wps
[Epoch 122 Batch 60/62] avg loss 0.00166428, throughput 3.16794K wps
Begin Testing...
[Epoch 122] train avg loss 0.0016167, dev acc 0.8437, dev avg loss 0.339672, throughput 3.22591K wps
[Epoch 123 Batch 30/62] avg loss 0.0014416, throughput 3.27914K wps
[Epoch 123 Batch 60/62] avg loss 0.00157283, throughput 3.19399K wps
Begin Testing...
[Epoch 123] train avg loss 0.00151458, dev acc 0.8466, dev avg loss 0.339799, throughput 3.2419K wps
[Epoch 124 Batch 30/62] avg loss 0.00150594, throughput 3.25741K wps
[Epoch 124 Batch 60/62] avg loss 0.00155604, throughput 3.19249K wps
Begin Testing...
[Epoch 124] train avg loss 0.00154321, dev acc 0.8466, dev avg loss 0.341189, throughput 3.233K wps
[Epoch 125 Batch 30/62] avg loss 0.00154937, throughput 3.28722K wps
[Epoch 125 Batch 60/62] avg loss 0.00148311, throughput 3.15884K wps
Begin Testing...
[Epoch 125] train avg loss 0.00154953, dev acc 0.8466, dev avg loss 0.340952, throughput 3.22992K wps
[Epoch 126 Batch 30/62] avg loss 0.00151633, throughput 3.25946K wps
[Epoch 126 Batch 60/62] avg loss 0.00130884, throughput 3.20562K wps
Begin Testing...
[Epoch 126] train avg loss 0.00144371, dev acc 0.8525, dev avg loss 0.34107, throughput 3.23973K wps
Observed Improvement.
Begin Testing...
[Epoch 127 Batch 30/62] avg loss 0.0014188, throughput 3.29852K wps
[Epoch 127 Batch 60/62] avg loss 0.00143805, throughput 3.16667K wps
Begin Testing...
[Epoch 127] train avg loss 0.00146258, dev acc 0.8496, dev avg loss 0.343938, throughput 3.23677K wps
[Epoch 128 Batch 30/62] avg loss 0.0014333, throughput 3.24278K wps
[Epoch 128 Batch 60/62] avg loss 0.00139733, throughput 3.23391K wps
Begin Testing...
[Epoch 128] train avg loss 0.00142468, dev acc 0.8496, dev avg loss 0.342024, throughput 3.24451K wps
[Epoch 129 Batch 30/62] avg loss 0.00139728, throughput 3.27921K wps
[Epoch 129 Batch 60/62] avg loss 0.0014569, throughput 3.20816K wps
Begin Testing...
[Epoch 129] train avg loss 0.00144908, dev acc 0.8437, dev avg loss 0.341057, throughput 3.2498K wps
[Epoch 130 Batch 30/62] avg loss 0.00146288, throughput 3.24797K wps
[Epoch 130 Batch 60/62] avg loss 0.00142048, throughput 3.16003K wps
Begin Testing...
[Epoch 130] train avg loss 0.00144056, dev acc 0.8437, dev avg loss 0.340409, throughput 3.20995K wps
[Epoch 131 Batch 30/62] avg loss 0.0012328, throughput 3.24788K wps
[Epoch 131 Batch 60/62] avg loss 0.00133832, throughput 3.21566K wps
Begin Testing...
[Epoch 131] train avg loss 0.00134197, dev acc 0.8496, dev avg loss 0.344094, throughput 3.23874K wps
[Epoch 132 Batch 30/62] avg loss 0.0013629, throughput 3.28564K wps
[Epoch 132 Batch 60/62] avg loss 0.00135662, throughput 3.18285K wps
Begin Testing...
[Epoch 132] train avg loss 0.00139703, dev acc 0.8437, dev avg loss 0.339328, throughput 3.24123K wps
[Epoch 133 Batch 30/62] avg loss 0.00133914, throughput 3.28315K wps
[Epoch 133 Batch 60/62] avg loss 0.00131714, throughput 3.16516K wps
Begin Testing...
[Epoch 133] train avg loss 0.00132773, dev acc 0.8496, dev avg loss 0.340419, throughput 3.22922K wps
[Epoch 134 Batch 30/62] avg loss 0.00132585, throughput 3.25528K wps
[Epoch 134 Batch 60/62] avg loss 0.00141869, throughput 3.20355K wps
Begin Testing...
[Epoch 134] train avg loss 0.00137408, dev acc 0.8437, dev avg loss 0.339816, throughput 3.23713K wps
[Epoch 135 Batch 30/62] avg loss 0.0012682, throughput 3.28969K wps
[Epoch 135 Batch 60/62] avg loss 0.00131783, throughput 3.21861K wps
Begin Testing...
[Epoch 135] train avg loss 0.00132304, dev acc 0.8496, dev avg loss 0.343146, throughput 3.26088K wps
[Epoch 136 Batch 30/62] avg loss 0.00124459, throughput 3.27533K wps
[Epoch 136 Batch 60/62] avg loss 0.00125055, throughput 3.19098K wps
Begin Testing...
[Epoch 136] train avg loss 0.00126168, dev acc 0.8437, dev avg loss 0.340531, throughput 3.24103K wps
[Epoch 137 Batch 30/62] avg loss 0.00125167, throughput 3.28519K wps
[Epoch 137 Batch 60/62] avg loss 0.00118954, throughput 3.16683K wps
Begin Testing...
[Epoch 137] train avg loss 0.00126426, dev acc 0.8437, dev avg loss 0.339606, throughput 3.23128K wps
[Epoch 138 Batch 30/62] avg loss 0.00122693, throughput 3.23819K wps
[Epoch 138 Batch 60/62] avg loss 0.00125148, throughput 3.21823K wps
Begin Testing...
[Epoch 138] train avg loss 0.00125968, dev acc 0.8496, dev avg loss 0.339936, throughput 3.23584K wps
[Epoch 139 Batch 30/62] avg loss 0.0011958, throughput 3.27074K wps
[Epoch 139 Batch 60/62] avg loss 0.0011796, throughput 3.14416K wps
Begin Testing...
[Epoch 139] train avg loss 0.00118539, dev acc 0.8437, dev avg loss 0.339801, throughput 3.2115K wps
[Epoch 140 Batch 30/62] avg loss 0.00121735, throughput 3.26247K wps
[Epoch 140 Batch 60/62] avg loss 0.00115298, throughput 3.19158K wps
Begin Testing...
[Epoch 140] train avg loss 0.00121297, dev acc 0.8437, dev avg loss 0.345743, throughput 3.23542K wps
[Epoch 141 Batch 30/62] avg loss 0.00118919, throughput 3.28647K wps
[Epoch 141 Batch 60/62] avg loss 0.00118013, throughput 3.21056K wps
Begin Testing...
[Epoch 141] train avg loss 0.00118345, dev acc 0.8496, dev avg loss 0.340132, throughput 3.25635K wps
[Epoch 142 Batch 30/62] avg loss 0.00122348, throughput 3.2703K wps
[Epoch 142 Batch 60/62] avg loss 0.00116421, throughput 3.16348K wps
Begin Testing...
[Epoch 142] train avg loss 0.00122732, dev acc 0.8496, dev avg loss 0.341952, throughput 3.22222K wps
[Epoch 143 Batch 30/62] avg loss 0.00112318, throughput 3.24672K wps
[Epoch 143 Batch 60/62] avg loss 0.00123521, throughput 3.2206K wps
Begin Testing...
[Epoch 143] train avg loss 0.00118937, dev acc 0.8437, dev avg loss 0.34011, throughput 3.23928K wps
[Epoch 144 Batch 30/62] avg loss 0.00111259, throughput 3.27706K wps
[Epoch 144 Batch 60/62] avg loss 0.00119659, throughput 3.16071K wps
Begin Testing...
[Epoch 144] train avg loss 0.00119549, dev acc 0.8466, dev avg loss 0.340388, throughput 3.22422K wps
[Epoch 145 Batch 30/62] avg loss 0.00110658, throughput 3.26845K wps
[Epoch 145 Batch 60/62] avg loss 0.0011254, throughput 3.21243K wps
Begin Testing...
[Epoch 145] train avg loss 0.00111968, dev acc 0.8466, dev avg loss 0.343084, throughput 3.24613K wps
[Epoch 146 Batch 30/62] avg loss 0.00101754, throughput 3.26999K wps
[Epoch 146 Batch 60/62] avg loss 0.00109438, throughput 3.21496K wps
Begin Testing...
[Epoch 146] train avg loss 0.00106141, dev acc 0.8496, dev avg loss 0.341961, throughput 3.24786K wps
[Epoch 147 Batch 30/62] avg loss 0.00106188, throughput 3.27187K wps
[Epoch 147 Batch 60/62] avg loss 0.00114334, throughput 3.16855K wps
Begin Testing...
[Epoch 147] train avg loss 0.00110893, dev acc 0.8496, dev avg loss 0.342779, throughput 3.22603K wps
[Epoch 148 Batch 30/62] avg loss 0.00111172, throughput 3.29632K wps
[Epoch 148 Batch 60/62] avg loss 0.00111183, throughput 3.16967K wps
Begin Testing...
[Epoch 148] train avg loss 0.00111702, dev acc 0.8555, dev avg loss 0.341739, throughput 3.23686K wps
Observed Improvement.
Begin Testing...
[Epoch 149 Batch 30/62] avg loss 0.00105433, throughput 3.28371K wps
[Epoch 149 Batch 60/62] avg loss 0.00113701, throughput 3.16029K wps
Begin Testing...
[Epoch 149] train avg loss 0.00112407, dev acc 0.8496, dev avg loss 0.341483, throughput 3.2273K wps
[Epoch 150 Batch 30/62] avg loss 0.00120487, throughput 3.27019K wps
[Epoch 150 Batch 60/62] avg loss 0.000971015, throughput 3.21639K wps
Begin Testing...
[Epoch 150] train avg loss 0.0010961, dev acc 0.8496, dev avg loss 0.341314, throughput 3.2487K wps
[Epoch 151 Batch 30/62] avg loss 0.00106268, throughput 3.2823K wps
[Epoch 151 Batch 60/62] avg loss 0.00104255, throughput 3.20805K wps
Begin Testing...
[Epoch 151] train avg loss 0.00106295, dev acc 0.8466, dev avg loss 0.341685, throughput 3.25039K wps
[Epoch 152 Batch 30/62] avg loss 0.000973639, throughput 3.2948K wps
[Epoch 152 Batch 60/62] avg loss 0.000944078, throughput 3.21382K wps
Begin Testing...
[Epoch 152] train avg loss 0.000970312, dev acc 0.8437, dev avg loss 0.343701, throughput 3.25918K wps
[Epoch 153 Batch 30/62] avg loss 0.000934971, throughput 3.2813K wps
[Epoch 153 Batch 60/62] avg loss 0.00106851, throughput 3.22658K wps
Begin Testing...
[Epoch 153] train avg loss 0.0010178, dev acc 0.8496, dev avg loss 0.342568, throughput 3.25872K wps
[Epoch 154 Batch 30/62] avg loss 0.00112719, throughput 3.25682K wps
[Epoch 154 Batch 60/62] avg loss 0.00106178, throughput 3.16287K wps
Begin Testing...
[Epoch 154] train avg loss 0.00110501, dev acc 0.8496, dev avg loss 0.34309, throughput 3.21558K wps
[Epoch 155 Batch 30/62] avg loss 0.00100207, throughput 3.25135K wps
[Epoch 155 Batch 60/62] avg loss 0.00102763, throughput 3.15865K wps
Begin Testing...
[Epoch 155] train avg loss 0.00104573, dev acc 0.8496, dev avg loss 0.342002, throughput 3.2116K wps
[Epoch 156 Batch 30/62] avg loss 0.000896598, throughput 3.24557K wps
[Epoch 156 Batch 60/62] avg loss 0.00106373, throughput 3.2061K wps
Begin Testing...
[Epoch 156] train avg loss 0.000985206, dev acc 0.8496, dev avg loss 0.341746, throughput 3.23281K wps
[Epoch 157 Batch 30/62] avg loss 0.000981454, throughput 3.26289K wps
[Epoch 157 Batch 60/62] avg loss 0.00101228, throughput 3.15976K wps
Begin Testing...
[Epoch 157] train avg loss 0.000993166, dev acc 0.8496, dev avg loss 0.341692, throughput 3.21631K wps
[Epoch 158 Batch 30/62] avg loss 0.00106181, throughput 3.25693K wps
[Epoch 158 Batch 60/62] avg loss 0.000931934, throughput 3.197K wps
Begin Testing...
[Epoch 158] train avg loss 0.0010187, dev acc 0.8496, dev avg loss 0.341877, throughput 3.23534K wps
[Epoch 159 Batch 30/62] avg loss 0.000943678, throughput 3.28211K wps
[Epoch 159 Batch 60/62] avg loss 0.000961083, throughput 3.1521K wps
Begin Testing...
[Epoch 159] train avg loss 0.00097757, dev acc 0.8525, dev avg loss 0.343581, throughput 3.22192K wps
[Epoch 160 Batch 30/62] avg loss 0.00092745, throughput 3.26828K wps
[Epoch 160 Batch 60/62] avg loss 0.000976141, throughput 3.22602K wps
Begin Testing...
[Epoch 160] train avg loss 0.000954505, dev acc 0.8496, dev avg loss 0.342723, throughput 3.25248K wps
[Epoch 161 Batch 30/62] avg loss 0.000851404, throughput 3.2787K wps
[Epoch 161 Batch 60/62] avg loss 0.00104231, throughput 3.21164K wps
Begin Testing...
[Epoch 161] train avg loss 0.000947883, dev acc 0.8555, dev avg loss 0.346472, throughput 3.25017K wps
Observed Improvement.
Begin Testing...
[Epoch 162 Batch 30/62] avg loss 0.000915935, throughput 3.30783K wps
[Epoch 162 Batch 60/62] avg loss 0.000988692, throughput 3.18816K wps
Begin Testing...
[Epoch 162] train avg loss 0.000951952, dev acc 0.8437, dev avg loss 0.34403, throughput 3.25505K wps
[Epoch 163 Batch 30/62] avg loss 0.000933059, throughput 3.30498K wps
[Epoch 163 Batch 60/62] avg loss 0.000927318, throughput 3.16638K wps
Begin Testing...
[Epoch 163] train avg loss 0.000948598, dev acc 0.8525, dev avg loss 0.343077, throughput 3.24067K wps
[Epoch 164 Batch 30/62] avg loss 0.000959348, throughput 3.29521K wps
[Epoch 164 Batch 60/62] avg loss 0.000851596, throughput 3.20678K wps
Begin Testing...
[Epoch 164] train avg loss 0.000901026, dev acc 0.8525, dev avg loss 0.344851, throughput 3.25798K wps
[Epoch 165 Batch 30/62] avg loss 0.000896554, throughput 3.29669K wps
[Epoch 165 Batch 60/62] avg loss 0.00084658, throughput 3.22739K wps
Begin Testing...
[Epoch 165] train avg loss 0.000880318, dev acc 0.8525, dev avg loss 0.343657, throughput 3.27038K wps
[Epoch 166 Batch 30/62] avg loss 0.000903747, throughput 3.30496K wps
[Epoch 166 Batch 60/62] avg loss 0.000792821, throughput 3.22124K wps
Begin Testing...
[Epoch 166] train avg loss 0.00085596, dev acc 0.8525, dev avg loss 0.344388, throughput 3.2701K wps
[Epoch 167 Batch 30/62] avg loss 0.000814398, throughput 3.30349K wps
[Epoch 167 Batch 60/62] avg loss 0.000858468, throughput 3.2345K wps
Begin Testing...
[Epoch 167] train avg loss 0.000832874, dev acc 0.8496, dev avg loss 0.343812, throughput 3.27716K wps
[Epoch 168 Batch 30/62] avg loss 0.000845042, throughput 3.32938K wps
[Epoch 168 Batch 60/62] avg loss 0.000904578, throughput 3.25065K wps
Begin Testing...
[Epoch 168] train avg loss 0.000892341, dev acc 0.8466, dev avg loss 0.345093, throughput 3.29532K wps
[Epoch 169 Batch 30/62] avg loss 0.000883103, throughput 3.30309K wps
[Epoch 169 Batch 60/62] avg loss 0.000822576, throughput 3.22911K wps
Begin Testing...
[Epoch 169] train avg loss 0.000866432, dev acc 0.8466, dev avg loss 0.345515, throughput 3.2742K wps
[Epoch 170 Batch 30/62] avg loss 0.000841103, throughput 3.30294K wps
[Epoch 170 Batch 60/62] avg loss 0.000783954, throughput 3.23194K wps
Begin Testing...
[Epoch 170] train avg loss 0.000823333, dev acc 0.8525, dev avg loss 0.344872, throughput 3.27348K wps
[Epoch 171 Batch 30/62] avg loss 0.000883716, throughput 3.29394K wps
[Epoch 171 Batch 60/62] avg loss 0.000815475, throughput 3.22431K wps
Begin Testing...
[Epoch 171] train avg loss 0.00084755, dev acc 0.8525, dev avg loss 0.34506, throughput 3.26588K wps
[Epoch 172 Batch 30/62] avg loss 0.000866507, throughput 3.29359K wps
[Epoch 172 Batch 60/62] avg loss 0.000851845, throughput 3.21502K wps
Begin Testing...
[Epoch 172] train avg loss 0.000871968, dev acc 0.8525, dev avg loss 0.346183, throughput 3.26022K wps
[Epoch 173 Batch 30/62] avg loss 0.000854901, throughput 3.29194K wps
[Epoch 173 Batch 60/62] avg loss 0.000862846, throughput 3.21001K wps
Begin Testing...
[Epoch 173] train avg loss 0.000858637, dev acc 0.8496, dev avg loss 0.344943, throughput 3.26111K wps
[Epoch 174 Batch 30/62] avg loss 0.000814364, throughput 3.31785K wps
[Epoch 174 Batch 60/62] avg loss 0.000846671, throughput 3.23189K wps
Begin Testing...
[Epoch 174] train avg loss 0.000834033, dev acc 0.8496, dev avg loss 0.344732, throughput 3.28094K wps
[Epoch 175 Batch 30/62] avg loss 0.00084613, throughput 3.29852K wps
[Epoch 175 Batch 60/62] avg loss 0.000844153, throughput 3.23329K wps
Begin Testing...
[Epoch 175] train avg loss 0.000846451, dev acc 0.8584, dev avg loss 0.345891, throughput 3.27168K wps
Observed Improvement.
Begin Testing...
[Epoch 176 Batch 30/62] avg loss 0.000860172, throughput 3.31199K wps
[Epoch 176 Batch 60/62] avg loss 0.00084035, throughput 3.22724K wps
Begin Testing...
[Epoch 176] train avg loss 0.00086672, dev acc 0.8466, dev avg loss 0.345166, throughput 3.27494K wps
[Epoch 177 Batch 30/62] avg loss 0.000722948, throughput 3.28945K wps
[Epoch 177 Batch 60/62] avg loss 0.000836503, throughput 3.21748K wps
Begin Testing...
[Epoch 177] train avg loss 0.000781123, dev acc 0.8466, dev avg loss 0.344365, throughput 3.25874K wps
[Epoch 178 Batch 30/62] avg loss 0.000764542, throughput 3.31346K wps
[Epoch 178 Batch 60/62] avg loss 0.000765563, throughput 3.22329K wps
Begin Testing...
[Epoch 178] train avg loss 0.000763563, dev acc 0.8525, dev avg loss 0.346392, throughput 3.27536K wps
[Epoch 179 Batch 30/62] avg loss 0.000811286, throughput 3.31654K wps
[Epoch 179 Batch 60/62] avg loss 0.000715919, throughput 3.22919K wps
Begin Testing...
[Epoch 179] train avg loss 0.000774862, dev acc 0.8555, dev avg loss 0.348367, throughput 3.2777K wps
[Epoch 180 Batch 30/62] avg loss 0.000696823, throughput 3.3104K wps
[Epoch 180 Batch 60/62] avg loss 0.000803336, throughput 3.2471K wps
Begin Testing...
[Epoch 180] train avg loss 0.000758519, dev acc 0.8496, dev avg loss 0.345944, throughput 3.28581K wps
[Epoch 181 Batch 30/62] avg loss 0.000767837, throughput 3.31031K wps
[Epoch 181 Batch 60/62] avg loss 0.000723816, throughput 3.24106K wps
Begin Testing...
[Epoch 181] train avg loss 0.000746746, dev acc 0.8525, dev avg loss 0.345864, throughput 3.28088K wps
[Epoch 182 Batch 30/62] avg loss 0.000721549, throughput 3.28405K wps
[Epoch 182 Batch 60/62] avg loss 0.000778338, throughput 3.22947K wps
Begin Testing...
[Epoch 182] train avg loss 0.000755079, dev acc 0.8496, dev avg loss 0.345693, throughput 3.26387K wps
[Epoch 183 Batch 30/62] avg loss 0.00072447, throughput 3.30586K wps
[Epoch 183 Batch 60/62] avg loss 0.000743345, throughput 3.18327K wps
Begin Testing...
[Epoch 183] train avg loss 0.000752239, dev acc 0.8555, dev avg loss 0.346102, throughput 3.24944K wps
[Epoch 184 Batch 30/62] avg loss 0.000775247, throughput 3.31481K wps
[Epoch 184 Batch 60/62] avg loss 0.00077893, throughput 3.21825K wps
Begin Testing...
[Epoch 184] train avg loss 0.000774407, dev acc 0.8555, dev avg loss 0.345241, throughput 3.27325K wps
[Epoch 185 Batch 30/62] avg loss 0.0007105, throughput 3.31086K wps
[Epoch 185 Batch 60/62] avg loss 0.000723204, throughput 3.21943K wps
Begin Testing...
[Epoch 185] train avg loss 0.000725065, dev acc 0.8555, dev avg loss 0.346558, throughput 3.27126K wps
[Epoch 186 Batch 30/62] avg loss 0.000661307, throughput 3.30543K wps
[Epoch 186 Batch 60/62] avg loss 0.000745419, throughput 3.23815K wps
Begin Testing...
[Epoch 186] train avg loss 0.00071352, dev acc 0.8466, dev avg loss 0.346294, throughput 3.27741K wps
[Epoch 187 Batch 30/62] avg loss 0.000688969, throughput 3.33158K wps
[Epoch 187 Batch 60/62] avg loss 0.000715608, throughput 3.21057K wps
Begin Testing...
[Epoch 187] train avg loss 0.000713161, dev acc 0.8466, dev avg loss 0.347011, throughput 3.27687K wps
[Epoch 188 Batch 30/62] avg loss 0.000711306, throughput 3.31796K wps
[Epoch 188 Batch 60/62] avg loss 0.000707317, throughput 3.24578K wps
Begin Testing...
[Epoch 188] train avg loss 0.000713817, dev acc 0.8437, dev avg loss 0.347159, throughput 3.28651K wps
[Epoch 189 Batch 30/62] avg loss 0.00070971, throughput 3.33583K wps
[Epoch 189 Batch 60/62] avg loss 0.000667546, throughput 3.2365K wps
Begin Testing...
[Epoch 189] train avg loss 0.000688979, dev acc 0.8496, dev avg loss 0.346572, throughput 3.29079K wps
[Epoch 190 Batch 30/62] avg loss 0.000657231, throughput 3.19446K wps
[Epoch 190 Batch 60/62] avg loss 0.000642676, throughput 3.23662K wps
Begin Testing...
[Epoch 190] train avg loss 0.00065079, dev acc 0.8555, dev avg loss 0.348196, throughput 3.2239K wps
[Epoch 191 Batch 30/62] avg loss 0.000728803, throughput 3.31563K wps
[Epoch 191 Batch 60/62] avg loss 0.000617561, throughput 3.23705K wps
Begin Testing...
[Epoch 191] train avg loss 0.000676455, dev acc 0.8555, dev avg loss 0.347856, throughput 3.28347K wps
[Epoch 192 Batch 30/62] avg loss 0.000632256, throughput 3.31819K wps
[Epoch 192 Batch 60/62] avg loss 0.000629428, throughput 3.24146K wps
Begin Testing...
[Epoch 192] train avg loss 0.000642416, dev acc 0.8466, dev avg loss 0.349594, throughput 3.28657K wps
[Epoch 193 Batch 30/62] avg loss 0.00064813, throughput 3.33302K wps
[Epoch 193 Batch 60/62] avg loss 0.000743202, throughput 3.24246K wps
Begin Testing...
[Epoch 193] train avg loss 0.000702278, dev acc 0.8496, dev avg loss 0.348201, throughput 3.29271K wps
[Epoch 194 Batch 30/62] avg loss 0.000721388, throughput 3.32434K wps
[Epoch 194 Batch 60/62] avg loss 0.000666139, throughput 3.24138K wps
Begin Testing...
[Epoch 194] train avg loss 0.000696048, dev acc 0.8555, dev avg loss 0.34939, throughput 3.28783K wps
[Epoch 195 Batch 30/62] avg loss 0.00060079, throughput 3.27294K wps
[Epoch 195 Batch 60/62] avg loss 0.000631133, throughput 3.23472K wps
Begin Testing...
[Epoch 195] train avg loss 0.000620766, dev acc 0.8555, dev avg loss 0.350398, throughput 3.26021K wps
[Epoch 196 Batch 30/62] avg loss 0.000621068, throughput 3.31609K wps
[Epoch 196 Batch 60/62] avg loss 0.000678999, throughput 3.2309K wps
Begin Testing...
[Epoch 196] train avg loss 0.000660011, dev acc 0.8466, dev avg loss 0.348, throughput 3.28095K wps
[Epoch 197 Batch 30/62] avg loss 0.000633607, throughput 3.32272K wps
[Epoch 197 Batch 60/62] avg loss 0.000634824, throughput 3.24026K wps
Begin Testing...
[Epoch 197] train avg loss 0.00064376, dev acc 0.8466, dev avg loss 0.347967, throughput 3.28649K wps
[Epoch 198 Batch 30/62] avg loss 0.00058103, throughput 3.318K wps
[Epoch 198 Batch 60/62] avg loss 0.000628005, throughput 3.22471K wps
Begin Testing...
[Epoch 198] train avg loss 0.00060593, dev acc 0.8496, dev avg loss 0.348249, throughput 3.27788K wps
[Epoch 199 Batch 30/62] avg loss 0.000645722, throughput 3.3036K wps
[Epoch 199 Batch 60/62] avg loss 0.000632027, throughput 3.21568K wps
Begin Testing...
[Epoch 199] train avg loss 0.000635151, dev acc 0.8496, dev avg loss 0.348341, throughput 3.26589K wps
Test loss 0.348589, test acc 0.8435
Total time cost 419.15s
[Epoch 0 Batch 30/62] avg loss 0.0133872, throughput 3.08643K wps
[Epoch 0 Batch 60/62] avg loss 0.0130518, throughput 3.22455K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134241, dev acc 0.6578, dev avg loss 0.635898, throughput 3.16398K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130752, throughput 3.30218K wps
[Epoch 1 Batch 60/62] avg loss 0.0129574, throughput 3.23011K wps
Begin Testing...
[Epoch 1] train avg loss 0.0131625, dev acc 0.6578, dev avg loss 0.627297, throughput 3.27173K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130436, throughput 3.30262K wps
[Epoch 2 Batch 60/62] avg loss 0.0126104, throughput 3.2365K wps
Begin Testing...
[Epoch 2] train avg loss 0.0130084, dev acc 0.6578, dev avg loss 0.621891, throughput 3.27541K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0126603, throughput 3.3106K wps
[Epoch 3 Batch 60/62] avg loss 0.0123294, throughput 3.22307K wps
Begin Testing...
[Epoch 3] train avg loss 0.012653, dev acc 0.6578, dev avg loss 0.612855, throughput 3.27177K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0123848, throughput 3.29232K wps
[Epoch 4 Batch 60/62] avg loss 0.0124521, throughput 3.22259K wps
Begin Testing...
[Epoch 4] train avg loss 0.0125916, dev acc 0.6578, dev avg loss 0.607868, throughput 3.26344K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0122161, throughput 3.31535K wps
[Epoch 5 Batch 60/62] avg loss 0.0121326, throughput 3.23208K wps
Begin Testing...
[Epoch 5] train avg loss 0.0123823, dev acc 0.6637, dev avg loss 0.604768, throughput 3.27914K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.012042, throughput 3.26025K wps
[Epoch 6 Batch 60/62] avg loss 0.012105, throughput 3.21346K wps
Begin Testing...
[Epoch 6] train avg loss 0.0122279, dev acc 0.6578, dev avg loss 0.59189, throughput 3.24507K wps
[Epoch 7 Batch 30/62] avg loss 0.0118146, throughput 3.29663K wps
[Epoch 7 Batch 60/62] avg loss 0.0119296, throughput 3.21937K wps
Begin Testing...
[Epoch 7] train avg loss 0.0120425, dev acc 0.6608, dev avg loss 0.584805, throughput 3.26357K wps
[Epoch 8 Batch 30/62] avg loss 0.0116738, throughput 3.30362K wps
[Epoch 8 Batch 60/62] avg loss 0.0117841, throughput 3.24334K wps
Begin Testing...
[Epoch 8] train avg loss 0.0118587, dev acc 0.6578, dev avg loss 0.576268, throughput 3.2811K wps
[Epoch 9 Batch 30/62] avg loss 0.0115351, throughput 3.30467K wps
[Epoch 9 Batch 60/62] avg loss 0.011411, throughput 3.21553K wps
Begin Testing...
[Epoch 9] train avg loss 0.011634, dev acc 0.6696, dev avg loss 0.568223, throughput 3.26638K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0112657, throughput 3.3033K wps
[Epoch 10 Batch 60/62] avg loss 0.0112403, throughput 3.21329K wps
Begin Testing...
[Epoch 10] train avg loss 0.0113868, dev acc 0.6696, dev avg loss 0.559592, throughput 3.26492K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0111209, throughput 3.29065K wps
[Epoch 11 Batch 60/62] avg loss 0.0110126, throughput 3.20681K wps
Begin Testing...
[Epoch 11] train avg loss 0.0112296, dev acc 0.7021, dev avg loss 0.550843, throughput 3.25428K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0110202, throughput 3.28165K wps
[Epoch 12 Batch 60/62] avg loss 0.0106829, throughput 3.22547K wps
Begin Testing...
[Epoch 12] train avg loss 0.0109901, dev acc 0.7198, dev avg loss 0.542508, throughput 3.26035K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.010662, throughput 3.29517K wps
[Epoch 13 Batch 60/62] avg loss 0.0104791, throughput 3.22508K wps
Begin Testing...
[Epoch 13] train avg loss 0.0107074, dev acc 0.7434, dev avg loss 0.534533, throughput 3.26606K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0105088, throughput 3.29455K wps
[Epoch 14 Batch 60/62] avg loss 0.0103709, throughput 3.22473K wps
Begin Testing...
[Epoch 14] train avg loss 0.0105791, dev acc 0.7227, dev avg loss 0.524756, throughput 3.26548K wps
[Epoch 15 Batch 30/62] avg loss 0.0101965, throughput 3.29455K wps
[Epoch 15 Batch 60/62] avg loss 0.0100784, throughput 3.21174K wps
Begin Testing...
[Epoch 15] train avg loss 0.0103146, dev acc 0.7729, dev avg loss 0.517253, throughput 3.25887K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.00991877, throughput 3.29799K wps
[Epoch 16 Batch 60/62] avg loss 0.0101939, throughput 3.23132K wps
Begin Testing...
[Epoch 16] train avg loss 0.0101726, dev acc 0.7404, dev avg loss 0.508453, throughput 3.27013K wps
[Epoch 17 Batch 30/62] avg loss 0.00994279, throughput 3.29238K wps
[Epoch 17 Batch 60/62] avg loss 0.00959039, throughput 3.20876K wps
Begin Testing...
[Epoch 17] train avg loss 0.00993636, dev acc 0.7729, dev avg loss 0.500393, throughput 3.25638K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.00952514, throughput 3.28136K wps
[Epoch 18 Batch 60/62] avg loss 0.0095271, throughput 3.21936K wps
Begin Testing...
[Epoch 18] train avg loss 0.00965069, dev acc 0.7847, dev avg loss 0.497622, throughput 3.25618K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.00934792, throughput 3.30574K wps
[Epoch 19 Batch 60/62] avg loss 0.00928945, throughput 3.23171K wps
Begin Testing...
[Epoch 19] train avg loss 0.00942503, dev acc 0.7906, dev avg loss 0.48672, throughput 3.27451K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.00898537, throughput 3.31492K wps
[Epoch 20 Batch 60/62] avg loss 0.00925153, throughput 3.21275K wps
Begin Testing...
[Epoch 20] train avg loss 0.00920402, dev acc 0.7906, dev avg loss 0.480389, throughput 3.26811K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.00905873, throughput 3.27395K wps
[Epoch 21 Batch 60/62] avg loss 0.00904219, throughput 3.21918K wps
Begin Testing...
[Epoch 21] train avg loss 0.00910034, dev acc 0.7876, dev avg loss 0.473408, throughput 3.25328K wps
[Epoch 22 Batch 30/62] avg loss 0.00857867, throughput 3.27903K wps
[Epoch 22 Batch 60/62] avg loss 0.00858277, throughput 3.21189K wps
Begin Testing...
[Epoch 22] train avg loss 0.00870549, dev acc 0.7906, dev avg loss 0.467127, throughput 3.25072K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.00873745, throughput 3.27305K wps
[Epoch 23 Batch 60/62] avg loss 0.0087071, throughput 3.19669K wps
Begin Testing...
[Epoch 23] train avg loss 0.00877512, dev acc 0.7906, dev avg loss 0.464014, throughput 3.24107K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.00846508, throughput 3.27368K wps
[Epoch 24 Batch 60/62] avg loss 0.00846575, throughput 3.2161K wps
Begin Testing...
[Epoch 24] train avg loss 0.00863805, dev acc 0.7994, dev avg loss 0.457813, throughput 3.25127K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00824231, throughput 3.30133K wps
[Epoch 25 Batch 60/62] avg loss 0.00837804, throughput 3.20178K wps
Begin Testing...
[Epoch 25] train avg loss 0.00837449, dev acc 0.7935, dev avg loss 0.45188, throughput 3.25895K wps
[Epoch 26 Batch 30/62] avg loss 0.00807293, throughput 3.28868K wps
[Epoch 26 Batch 60/62] avg loss 0.00819822, throughput 3.21215K wps
Begin Testing...
[Epoch 26] train avg loss 0.00825948, dev acc 0.7965, dev avg loss 0.447406, throughput 3.25598K wps
[Epoch 27 Batch 30/62] avg loss 0.00797055, throughput 3.30059K wps
[Epoch 27 Batch 60/62] avg loss 0.00792719, throughput 3.20996K wps
Begin Testing...
[Epoch 27] train avg loss 0.00806876, dev acc 0.8024, dev avg loss 0.444065, throughput 3.262K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00781206, throughput 3.25803K wps
[Epoch 28 Batch 60/62] avg loss 0.00774386, throughput 3.20489K wps
Begin Testing...
[Epoch 28] train avg loss 0.00797733, dev acc 0.7935, dev avg loss 0.439289, throughput 3.23725K wps
[Epoch 29 Batch 30/62] avg loss 0.00742249, throughput 3.26016K wps
[Epoch 29 Batch 60/62] avg loss 0.00776744, throughput 3.23771K wps
Begin Testing...
[Epoch 29] train avg loss 0.00767545, dev acc 0.8024, dev avg loss 0.435899, throughput 3.25537K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00769107, throughput 3.30071K wps
[Epoch 30 Batch 60/62] avg loss 0.00754736, throughput 3.22762K wps
Begin Testing...
[Epoch 30] train avg loss 0.00768421, dev acc 0.7935, dev avg loss 0.431962, throughput 3.27018K wps
[Epoch 31 Batch 30/62] avg loss 0.0075657, throughput 3.31025K wps
[Epoch 31 Batch 60/62] avg loss 0.00729583, throughput 3.19259K wps
Begin Testing...
[Epoch 31] train avg loss 0.00750881, dev acc 0.8171, dev avg loss 0.430783, throughput 3.25838K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.00735187, throughput 3.32007K wps
[Epoch 32 Batch 60/62] avg loss 0.00720714, throughput 3.21171K wps
Begin Testing...
[Epoch 32] train avg loss 0.00738585, dev acc 0.7935, dev avg loss 0.429822, throughput 3.27224K wps
[Epoch 33 Batch 30/62] avg loss 0.00728453, throughput 3.29556K wps
[Epoch 33 Batch 60/62] avg loss 0.00709872, throughput 3.21276K wps
Begin Testing...
[Epoch 33] train avg loss 0.00725406, dev acc 0.8024, dev avg loss 0.422128, throughput 3.2616K wps
[Epoch 34 Batch 30/62] avg loss 0.00687495, throughput 3.29322K wps
[Epoch 34 Batch 60/62] avg loss 0.0070724, throughput 3.22137K wps
Begin Testing...
[Epoch 34] train avg loss 0.00702385, dev acc 0.8053, dev avg loss 0.420128, throughput 3.26341K wps
[Epoch 35 Batch 30/62] avg loss 0.00689106, throughput 3.26144K wps
[Epoch 35 Batch 60/62] avg loss 0.00674045, throughput 3.2151K wps
Begin Testing...
[Epoch 35] train avg loss 0.00699345, dev acc 0.8083, dev avg loss 0.416261, throughput 3.24645K wps
[Epoch 36 Batch 30/62] avg loss 0.00666183, throughput 3.28977K wps
[Epoch 36 Batch 60/62] avg loss 0.00690227, throughput 3.22588K wps
Begin Testing...
[Epoch 36] train avg loss 0.00684089, dev acc 0.8112, dev avg loss 0.413018, throughput 3.26316K wps
[Epoch 37 Batch 30/62] avg loss 0.00687858, throughput 3.27683K wps
[Epoch 37 Batch 60/62] avg loss 0.00628867, throughput 3.21284K wps
Begin Testing...
[Epoch 37] train avg loss 0.00670423, dev acc 0.8260, dev avg loss 0.411552, throughput 3.24957K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.00669405, throughput 3.29333K wps
[Epoch 38 Batch 60/62] avg loss 0.00633397, throughput 3.16488K wps
Begin Testing...
[Epoch 38] train avg loss 0.00662436, dev acc 0.8289, dev avg loss 0.408142, throughput 3.23345K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00635551, throughput 3.27943K wps
[Epoch 39 Batch 60/62] avg loss 0.0065783, throughput 3.16411K wps
Begin Testing...
[Epoch 39] train avg loss 0.00653212, dev acc 0.8201, dev avg loss 0.405189, throughput 3.22488K wps
[Epoch 40 Batch 30/62] avg loss 0.00629287, throughput 3.26219K wps
[Epoch 40 Batch 60/62] avg loss 0.00634437, throughput 3.14724K wps
Begin Testing...
[Epoch 40] train avg loss 0.0063631, dev acc 0.8112, dev avg loss 0.405724, throughput 3.2087K wps
[Epoch 41 Batch 30/62] avg loss 0.00618963, throughput 3.23643K wps
[Epoch 41 Batch 60/62] avg loss 0.00611566, throughput 3.16667K wps
Begin Testing...
[Epoch 41] train avg loss 0.00625739, dev acc 0.8260, dev avg loss 0.400499, throughput 3.20741K wps
[Epoch 42 Batch 30/62] avg loss 0.00602588, throughput 3.25854K wps
[Epoch 42 Batch 60/62] avg loss 0.0060395, throughput 3.16381K wps
Begin Testing...
[Epoch 42] train avg loss 0.00612719, dev acc 0.8230, dev avg loss 0.399078, throughput 3.21661K wps
[Epoch 43 Batch 30/62] avg loss 0.00611972, throughput 3.25751K wps
[Epoch 43 Batch 60/62] avg loss 0.00578485, throughput 3.18379K wps
Begin Testing...
[Epoch 43] train avg loss 0.0060777, dev acc 0.8260, dev avg loss 0.397134, throughput 3.22882K wps
[Epoch 44 Batch 30/62] avg loss 0.00576812, throughput 3.28043K wps
[Epoch 44 Batch 60/62] avg loss 0.00595795, throughput 3.16448K wps
Begin Testing...
[Epoch 44] train avg loss 0.0058914, dev acc 0.8083, dev avg loss 0.401912, throughput 3.22829K wps
[Epoch 45 Batch 30/62] avg loss 0.00575782, throughput 3.23329K wps
[Epoch 45 Batch 60/62] avg loss 0.00569315, throughput 3.17833K wps
Begin Testing...
[Epoch 45] train avg loss 0.00577917, dev acc 0.8230, dev avg loss 0.392463, throughput 3.21235K wps
[Epoch 46 Batch 30/62] avg loss 0.00552282, throughput 3.2617K wps
[Epoch 46 Batch 60/62] avg loss 0.00575031, throughput 3.19563K wps
Begin Testing...
[Epoch 46] train avg loss 0.00566125, dev acc 0.8260, dev avg loss 0.390385, throughput 3.23271K wps
[Epoch 47 Batch 30/62] avg loss 0.00553312, throughput 3.27308K wps
[Epoch 47 Batch 60/62] avg loss 0.0053793, throughput 3.18574K wps
Begin Testing...
[Epoch 47] train avg loss 0.00555353, dev acc 0.8289, dev avg loss 0.388854, throughput 3.23326K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00565766, throughput 3.24189K wps
[Epoch 48 Batch 60/62] avg loss 0.00532331, throughput 3.20462K wps
Begin Testing...
[Epoch 48] train avg loss 0.00551818, dev acc 0.8230, dev avg loss 0.387848, throughput 3.22958K wps
[Epoch 49 Batch 30/62] avg loss 0.00536617, throughput 3.28199K wps
[Epoch 49 Batch 60/62] avg loss 0.00522444, throughput 3.20661K wps
Begin Testing...
[Epoch 49] train avg loss 0.00530992, dev acc 0.8260, dev avg loss 0.385851, throughput 3.2514K wps
[Epoch 50 Batch 30/62] avg loss 0.00543898, throughput 3.29394K wps
[Epoch 50 Batch 60/62] avg loss 0.00516609, throughput 3.21655K wps
Begin Testing...
[Epoch 50] train avg loss 0.00537881, dev acc 0.8289, dev avg loss 0.383818, throughput 3.26054K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/62] avg loss 0.00515733, throughput 3.27472K wps
[Epoch 51 Batch 60/62] avg loss 0.00504491, throughput 3.1761K wps
Begin Testing...
[Epoch 51] train avg loss 0.00514159, dev acc 0.8230, dev avg loss 0.382382, throughput 3.23026K wps
[Epoch 52 Batch 30/62] avg loss 0.00494542, throughput 3.24376K wps
[Epoch 52 Batch 60/62] avg loss 0.00507453, throughput 3.16448K wps
Begin Testing...
[Epoch 52] train avg loss 0.00504202, dev acc 0.8230, dev avg loss 0.380914, throughput 3.21153K wps
[Epoch 53 Batch 30/62] avg loss 0.00512199, throughput 3.26659K wps
[Epoch 53 Batch 60/62] avg loss 0.00485396, throughput 3.19631K wps
Begin Testing...
[Epoch 53] train avg loss 0.00501005, dev acc 0.8230, dev avg loss 0.379121, throughput 3.23742K wps
[Epoch 54 Batch 30/62] avg loss 0.00493663, throughput 3.25574K wps
[Epoch 54 Batch 60/62] avg loss 0.00479115, throughput 3.19991K wps
Begin Testing...
[Epoch 54] train avg loss 0.0049475, dev acc 0.8260, dev avg loss 0.379057, throughput 3.23463K wps
[Epoch 55 Batch 30/62] avg loss 0.0047007, throughput 3.27444K wps
[Epoch 55 Batch 60/62] avg loss 0.00470006, throughput 3.17463K wps
Begin Testing...
[Epoch 55] train avg loss 0.00478609, dev acc 0.8260, dev avg loss 0.376825, throughput 3.23074K wps
[Epoch 56 Batch 30/62] avg loss 0.00466942, throughput 3.2257K wps
[Epoch 56 Batch 60/62] avg loss 0.00482936, throughput 3.14563K wps
Begin Testing...
[Epoch 56] train avg loss 0.00482369, dev acc 0.8201, dev avg loss 0.375369, throughput 3.19026K wps
[Epoch 57 Batch 30/62] avg loss 0.00472204, throughput 3.25081K wps
[Epoch 57 Batch 60/62] avg loss 0.00441983, throughput 3.15535K wps
Begin Testing...
[Epoch 57] train avg loss 0.00459871, dev acc 0.8230, dev avg loss 0.374835, throughput 3.20883K wps
[Epoch 58 Batch 30/62] avg loss 0.00444421, throughput 3.26324K wps
[Epoch 58 Batch 60/62] avg loss 0.0045645, throughput 3.19609K wps
Begin Testing...
[Epoch 58] train avg loss 0.00454812, dev acc 0.8201, dev avg loss 0.373517, throughput 3.2371K wps
[Epoch 59 Batch 30/62] avg loss 0.00443324, throughput 3.28029K wps
[Epoch 59 Batch 60/62] avg loss 0.00433557, throughput 3.17455K wps
Begin Testing...
[Epoch 59] train avg loss 0.00440945, dev acc 0.8201, dev avg loss 0.374655, throughput 3.23331K wps
[Epoch 60 Batch 30/62] avg loss 0.00447047, throughput 3.27687K wps
[Epoch 60 Batch 60/62] avg loss 0.00431754, throughput 3.17948K wps
Begin Testing...
[Epoch 60] train avg loss 0.00443918, dev acc 0.8289, dev avg loss 0.371798, throughput 3.23372K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00425252, throughput 3.27809K wps
[Epoch 61 Batch 60/62] avg loss 0.00412592, throughput 3.16743K wps
Begin Testing...
[Epoch 61] train avg loss 0.00422509, dev acc 0.8348, dev avg loss 0.370897, throughput 3.22787K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00409536, throughput 3.2666K wps
[Epoch 62 Batch 60/62] avg loss 0.00420131, throughput 3.19135K wps
Begin Testing...
[Epoch 62] train avg loss 0.00419516, dev acc 0.8289, dev avg loss 0.369508, throughput 3.23631K wps
[Epoch 63 Batch 30/62] avg loss 0.00407032, throughput 3.26723K wps
[Epoch 63 Batch 60/62] avg loss 0.00404456, throughput 3.18404K wps
Begin Testing...
[Epoch 63] train avg loss 0.00410071, dev acc 0.8319, dev avg loss 0.375088, throughput 3.23133K wps
[Epoch 64 Batch 30/62] avg loss 0.00400445, throughput 3.26354K wps
[Epoch 64 Batch 60/62] avg loss 0.00390673, throughput 3.18211K wps
Begin Testing...
[Epoch 64] train avg loss 0.00400526, dev acc 0.8289, dev avg loss 0.371475, throughput 3.23198K wps
[Epoch 65 Batch 30/62] avg loss 0.0039435, throughput 3.29102K wps
[Epoch 65 Batch 60/62] avg loss 0.0040716, throughput 3.1906K wps
Begin Testing...
[Epoch 65] train avg loss 0.00406084, dev acc 0.8260, dev avg loss 0.368071, throughput 3.24833K wps
[Epoch 66 Batch 30/62] avg loss 0.00394968, throughput 3.27868K wps
[Epoch 66 Batch 60/62] avg loss 0.00372068, throughput 3.17602K wps
Begin Testing...
[Epoch 66] train avg loss 0.00384899, dev acc 0.8289, dev avg loss 0.367477, throughput 3.23244K wps
[Epoch 67 Batch 30/62] avg loss 0.00405073, throughput 3.24632K wps
[Epoch 67 Batch 60/62] avg loss 0.00362065, throughput 3.198K wps
Begin Testing...
[Epoch 67] train avg loss 0.00384615, dev acc 0.8289, dev avg loss 0.366162, throughput 3.23K wps
[Epoch 68 Batch 30/62] avg loss 0.00372547, throughput 3.2898K wps
[Epoch 68 Batch 60/62] avg loss 0.00353941, throughput 3.179K wps
Begin Testing...
[Epoch 68] train avg loss 0.00375184, dev acc 0.8260, dev avg loss 0.365346, throughput 3.2407K wps
[Epoch 69 Batch 30/62] avg loss 0.00384722, throughput 3.25291K wps
[Epoch 69 Batch 60/62] avg loss 0.00344841, throughput 3.20194K wps
Begin Testing...
[Epoch 69] train avg loss 0.00369225, dev acc 0.8319, dev avg loss 0.369473, throughput 3.23379K wps
[Epoch 70 Batch 30/62] avg loss 0.00355392, throughput 3.24825K wps
[Epoch 70 Batch 60/62] avg loss 0.00380639, throughput 3.15187K wps
Begin Testing...
[Epoch 70] train avg loss 0.00371327, dev acc 0.8348, dev avg loss 0.367921, throughput 3.20519K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00348498, throughput 3.29937K wps
[Epoch 71 Batch 60/62] avg loss 0.00367283, throughput 3.20548K wps
Begin Testing...
[Epoch 71] train avg loss 0.0036373, dev acc 0.8289, dev avg loss 0.36314, throughput 3.25889K wps
[Epoch 72 Batch 30/62] avg loss 0.00319139, throughput 3.2643K wps
[Epoch 72 Batch 60/62] avg loss 0.00354687, throughput 3.17358K wps
Begin Testing...
[Epoch 72] train avg loss 0.00338537, dev acc 0.8319, dev avg loss 0.362705, throughput 3.22439K wps
[Epoch 73 Batch 30/62] avg loss 0.00345685, throughput 3.23598K wps
[Epoch 73 Batch 60/62] avg loss 0.00332934, throughput 3.1531K wps
Begin Testing...
[Epoch 73] train avg loss 0.00354902, dev acc 0.8260, dev avg loss 0.362061, throughput 3.20052K wps
[Epoch 74 Batch 30/62] avg loss 0.0034063, throughput 3.25031K wps
[Epoch 74 Batch 60/62] avg loss 0.00321087, throughput 3.18116K wps
Begin Testing...
[Epoch 74] train avg loss 0.00335961, dev acc 0.8378, dev avg loss 0.361896, throughput 3.22442K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/62] avg loss 0.00320247, throughput 3.29483K wps
[Epoch 75 Batch 60/62] avg loss 0.00336918, throughput 3.17771K wps
Begin Testing...
[Epoch 75] train avg loss 0.00332938, dev acc 0.8319, dev avg loss 0.3615, throughput 3.24162K wps
[Epoch 76 Batch 30/62] avg loss 0.00333652, throughput 3.26602K wps
[Epoch 76 Batch 60/62] avg loss 0.00299697, throughput 3.21006K wps
Begin Testing...
[Epoch 76] train avg loss 0.00320635, dev acc 0.8260, dev avg loss 0.361185, throughput 3.24187K wps
[Epoch 77 Batch 30/62] avg loss 0.00311758, throughput 3.27739K wps
[Epoch 77 Batch 60/62] avg loss 0.00338829, throughput 3.17056K wps
Begin Testing...
[Epoch 77] train avg loss 0.00334677, dev acc 0.8319, dev avg loss 0.365749, throughput 3.22866K wps
[Epoch 78 Batch 30/62] avg loss 0.00308198, throughput 3.25034K wps
[Epoch 78 Batch 60/62] avg loss 0.00303214, throughput 3.19906K wps
Begin Testing...
[Epoch 78] train avg loss 0.00310944, dev acc 0.8348, dev avg loss 0.362884, throughput 3.23014K wps
[Epoch 79 Batch 30/62] avg loss 0.00295494, throughput 3.26789K wps
[Epoch 79 Batch 60/62] avg loss 0.00304744, throughput 3.16947K wps
Begin Testing...
[Epoch 79] train avg loss 0.00306406, dev acc 0.8260, dev avg loss 0.360334, throughput 3.22454K wps
[Epoch 80 Batch 30/62] avg loss 0.00306061, throughput 3.25133K wps
[Epoch 80 Batch 60/62] avg loss 0.00289974, throughput 3.18401K wps
Begin Testing...
[Epoch 80] train avg loss 0.00304028, dev acc 0.8348, dev avg loss 0.362229, throughput 3.22302K wps
[Epoch 81 Batch 30/62] avg loss 0.00295852, throughput 3.27309K wps
[Epoch 81 Batch 60/62] avg loss 0.00282853, throughput 3.16429K wps
Begin Testing...
[Epoch 81] train avg loss 0.00300484, dev acc 0.8348, dev avg loss 0.359915, throughput 3.22378K wps
[Epoch 82 Batch 30/62] avg loss 0.00274497, throughput 3.25516K wps
[Epoch 82 Batch 60/62] avg loss 0.0030022, throughput 3.20869K wps
Begin Testing...
[Epoch 82] train avg loss 0.00292939, dev acc 0.8348, dev avg loss 0.363103, throughput 3.24077K wps
[Epoch 83 Batch 30/62] avg loss 0.00287035, throughput 3.28178K wps
[Epoch 83 Batch 60/62] avg loss 0.00269309, throughput 3.20613K wps
Begin Testing...
[Epoch 83] train avg loss 0.00281004, dev acc 0.8319, dev avg loss 0.361502, throughput 3.24535K wps
[Epoch 84 Batch 30/62] avg loss 0.00274051, throughput 3.27621K wps
[Epoch 84 Batch 60/62] avg loss 0.002895, throughput 3.17059K wps
Begin Testing...
[Epoch 84] train avg loss 0.00281974, dev acc 0.8319, dev avg loss 0.359445, throughput 3.22909K wps
[Epoch 85 Batch 30/62] avg loss 0.00286738, throughput 3.27439K wps
[Epoch 85 Batch 60/62] avg loss 0.00261588, throughput 3.20458K wps
Begin Testing...
[Epoch 85] train avg loss 0.00274917, dev acc 0.8319, dev avg loss 0.359838, throughput 3.24504K wps
[Epoch 86 Batch 30/62] avg loss 0.00269864, throughput 3.2504K wps
[Epoch 86 Batch 60/62] avg loss 0.00264878, throughput 3.17381K wps
Begin Testing...
[Epoch 86] train avg loss 0.00271098, dev acc 0.8319, dev avg loss 0.359749, throughput 3.21904K wps
[Epoch 87 Batch 30/62] avg loss 0.00252631, throughput 3.2297K wps
[Epoch 87 Batch 60/62] avg loss 0.00263471, throughput 3.20879K wps
Begin Testing...
[Epoch 87] train avg loss 0.00261728, dev acc 0.8319, dev avg loss 0.363341, throughput 3.22705K wps
[Epoch 88 Batch 30/62] avg loss 0.00244823, throughput 3.28145K wps
[Epoch 88 Batch 60/62] avg loss 0.00249956, throughput 3.1861K wps
Begin Testing...
[Epoch 88] train avg loss 0.00249439, dev acc 0.8378, dev avg loss 0.361687, throughput 3.24129K wps
Observed Improvement.
Begin Testing...
[Epoch 89 Batch 30/62] avg loss 0.00253092, throughput 3.27825K wps
[Epoch 89 Batch 60/62] avg loss 0.00238407, throughput 3.20683K wps
Begin Testing...
[Epoch 89] train avg loss 0.00250015, dev acc 0.8319, dev avg loss 0.360432, throughput 3.24871K wps
[Epoch 90 Batch 30/62] avg loss 0.0025346, throughput 3.27843K wps
[Epoch 90 Batch 60/62] avg loss 0.00247179, throughput 3.18365K wps
Begin Testing...
[Epoch 90] train avg loss 0.0025148, dev acc 0.8348, dev avg loss 0.361168, throughput 3.23938K wps
[Epoch 91 Batch 30/62] avg loss 0.00249412, throughput 3.24109K wps
[Epoch 91 Batch 60/62] avg loss 0.0023378, throughput 3.17049K wps
Begin Testing...
[Epoch 91] train avg loss 0.00244416, dev acc 0.8348, dev avg loss 0.361901, throughput 3.21295K wps
[Epoch 92 Batch 30/62] avg loss 0.00243472, throughput 3.25364K wps
[Epoch 92 Batch 60/62] avg loss 0.00254423, throughput 3.21311K wps
Begin Testing...
[Epoch 92] train avg loss 0.00250474, dev acc 0.8348, dev avg loss 0.360168, throughput 3.23835K wps
[Epoch 93 Batch 30/62] avg loss 0.00243469, throughput 3.26321K wps
[Epoch 93 Batch 60/62] avg loss 0.00231743, throughput 3.16141K wps
Begin Testing...
[Epoch 93] train avg loss 0.00240666, dev acc 0.8348, dev avg loss 0.360085, throughput 3.21837K wps
[Epoch 94 Batch 30/62] avg loss 0.00223614, throughput 3.2536K wps
[Epoch 94 Batch 60/62] avg loss 0.00236966, throughput 3.17445K wps
Begin Testing...
[Epoch 94] train avg loss 0.00231164, dev acc 0.8378, dev avg loss 0.362197, throughput 3.22193K wps
Observed Improvement.
Begin Testing...
[Epoch 95 Batch 30/62] avg loss 0.0021205, throughput 3.24117K wps
[Epoch 95 Batch 60/62] avg loss 0.00239506, throughput 3.19241K wps
Begin Testing...
[Epoch 95] train avg loss 0.00229588, dev acc 0.8348, dev avg loss 0.360884, throughput 3.22253K wps
[Epoch 96 Batch 30/62] avg loss 0.0022535, throughput 3.27421K wps
[Epoch 96 Batch 60/62] avg loss 0.0021977, throughput 3.18676K wps
Begin Testing...
[Epoch 96] train avg loss 0.00230408, dev acc 0.8407, dev avg loss 0.359243, throughput 3.23597K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/62] avg loss 0.00212519, throughput 3.26671K wps
[Epoch 97 Batch 60/62] avg loss 0.00201879, throughput 3.21661K wps
Begin Testing...
[Epoch 97] train avg loss 0.00209947, dev acc 0.8378, dev avg loss 0.361056, throughput 3.24701K wps
[Epoch 98 Batch 30/62] avg loss 0.00217483, throughput 3.249K wps
[Epoch 98 Batch 60/62] avg loss 0.00225991, throughput 3.16696K wps
Begin Testing...
[Epoch 98] train avg loss 0.002239, dev acc 0.8378, dev avg loss 0.360421, throughput 3.21381K wps
[Epoch 99 Batch 30/62] avg loss 0.00222016, throughput 3.24599K wps
[Epoch 99 Batch 60/62] avg loss 0.00201434, throughput 3.15801K wps
Begin Testing...
[Epoch 99] train avg loss 0.00213819, dev acc 0.8378, dev avg loss 0.363279, throughput 3.20886K wps
[Epoch 100 Batch 30/62] avg loss 0.00206842, throughput 3.26877K wps
[Epoch 100 Batch 60/62] avg loss 0.00211289, throughput 3.22163K wps
Begin Testing...
[Epoch 100] train avg loss 0.00210201, dev acc 0.8348, dev avg loss 0.361603, throughput 3.25048K wps
[Epoch 101 Batch 30/62] avg loss 0.00221842, throughput 3.26259K wps
[Epoch 101 Batch 60/62] avg loss 0.00194909, throughput 3.19863K wps
Begin Testing...
[Epoch 101] train avg loss 0.00209462, dev acc 0.8378, dev avg loss 0.362505, throughput 3.23602K wps
[Epoch 102 Batch 30/62] avg loss 0.00201441, throughput 3.27499K wps
[Epoch 102 Batch 60/62] avg loss 0.00201654, throughput 3.1713K wps
Begin Testing...
[Epoch 102] train avg loss 0.00202433, dev acc 0.8348, dev avg loss 0.361244, throughput 3.22802K wps
[Epoch 103 Batch 30/62] avg loss 0.00210826, throughput 3.23619K wps
[Epoch 103 Batch 60/62] avg loss 0.00197868, throughput 3.21386K wps
Begin Testing...
[Epoch 103] train avg loss 0.00207582, dev acc 0.8437, dev avg loss 0.361394, throughput 3.23086K wps
Observed Improvement.
Begin Testing...
[Epoch 104 Batch 30/62] avg loss 0.00187779, throughput 3.29684K wps
[Epoch 104 Batch 60/62] avg loss 0.0019615, throughput 3.19286K wps
Begin Testing...
[Epoch 104] train avg loss 0.00197961, dev acc 0.8407, dev avg loss 0.361917, throughput 3.25247K wps
[Epoch 105 Batch 30/62] avg loss 0.0020734, throughput 3.28586K wps
[Epoch 105 Batch 60/62] avg loss 0.00205472, throughput 3.17726K wps
Begin Testing...
[Epoch 105] train avg loss 0.00207504, dev acc 0.8378, dev avg loss 0.362045, throughput 3.2383K wps
[Epoch 106 Batch 30/62] avg loss 0.00176514, throughput 3.24645K wps
[Epoch 106 Batch 60/62] avg loss 0.00184411, throughput 3.20204K wps
Begin Testing...
[Epoch 106] train avg loss 0.00181596, dev acc 0.8378, dev avg loss 0.362542, throughput 3.23222K wps
[Epoch 107 Batch 30/62] avg loss 0.00191532, throughput 3.27407K wps
[Epoch 107 Batch 60/62] avg loss 0.00168606, throughput 3.15676K wps
Begin Testing...
[Epoch 107] train avg loss 0.00182474, dev acc 0.8466, dev avg loss 0.362299, throughput 3.21928K wps
Observed Improvement.
Begin Testing...
[Epoch 108 Batch 30/62] avg loss 0.00173952, throughput 3.28295K wps
[Epoch 108 Batch 60/62] avg loss 0.00179393, throughput 3.1625K wps
Begin Testing...
[Epoch 108] train avg loss 0.00179724, dev acc 0.8437, dev avg loss 0.362786, throughput 3.22886K wps
[Epoch 109 Batch 30/62] avg loss 0.00186533, throughput 3.25048K wps
[Epoch 109 Batch 60/62] avg loss 0.00180994, throughput 3.16668K wps
Begin Testing...
[Epoch 109] train avg loss 0.00186824, dev acc 0.8348, dev avg loss 0.362917, throughput 3.21547K wps
[Epoch 110 Batch 30/62] avg loss 0.00174997, throughput 3.26138K wps
[Epoch 110 Batch 60/62] avg loss 0.00180282, throughput 3.20279K wps
Begin Testing...
[Epoch 110] train avg loss 0.00179194, dev acc 0.8348, dev avg loss 0.367044, throughput 3.23792K wps
[Epoch 111 Batch 30/62] avg loss 0.00186936, throughput 3.24898K wps
[Epoch 111 Batch 60/62] avg loss 0.00163581, throughput 3.16354K wps
Begin Testing...
[Epoch 111] train avg loss 0.00179187, dev acc 0.8378, dev avg loss 0.363472, throughput 3.21152K wps
[Epoch 112 Batch 30/62] avg loss 0.00179993, throughput 3.25776K wps
[Epoch 112 Batch 60/62] avg loss 0.00164556, throughput 3.20923K wps
Begin Testing...
[Epoch 112] train avg loss 0.00172533, dev acc 0.8407, dev avg loss 0.364234, throughput 3.23896K wps
[Epoch 113 Batch 30/62] avg loss 0.00152962, throughput 3.25928K wps
[Epoch 113 Batch 60/62] avg loss 0.00169794, throughput 3.13963K wps
Begin Testing...
[Epoch 113] train avg loss 0.00161702, dev acc 0.8407, dev avg loss 0.36494, throughput 3.20501K wps
[Epoch 114 Batch 30/62] avg loss 0.00176695, throughput 3.25749K wps
[Epoch 114 Batch 60/62] avg loss 0.00175273, throughput 3.16602K wps
Begin Testing...
[Epoch 114] train avg loss 0.00180888, dev acc 0.8319, dev avg loss 0.365121, throughput 3.21781K wps
[Epoch 115 Batch 30/62] avg loss 0.0016925, throughput 3.24259K wps
[Epoch 115 Batch 60/62] avg loss 0.00164298, throughput 3.1762K wps
Begin Testing...
[Epoch 115] train avg loss 0.00166813, dev acc 0.8407, dev avg loss 0.364746, throughput 3.21678K wps
[Epoch 116 Batch 30/62] avg loss 0.00168057, throughput 3.28557K wps
[Epoch 116 Batch 60/62] avg loss 0.0016871, throughput 3.17783K wps
Begin Testing...
[Epoch 116] train avg loss 0.00174845, dev acc 0.8378, dev avg loss 0.378836, throughput 3.23519K wps
[Epoch 117 Batch 30/62] avg loss 0.00153429, throughput 3.24645K wps
[Epoch 117 Batch 60/62] avg loss 0.00156017, throughput 3.16731K wps
Begin Testing...
[Epoch 117] train avg loss 0.00157764, dev acc 0.8437, dev avg loss 0.365858, throughput 3.21283K wps
[Epoch 118 Batch 30/62] avg loss 0.00152958, throughput 3.23852K wps
[Epoch 118 Batch 60/62] avg loss 0.00155224, throughput 3.16543K wps
Begin Testing...
[Epoch 118] train avg loss 0.00157567, dev acc 0.8348, dev avg loss 0.366869, throughput 3.20785K wps
[Epoch 119 Batch 30/62] avg loss 0.00146014, throughput 3.24634K wps
[Epoch 119 Batch 60/62] avg loss 0.00163058, throughput 3.20316K wps
Begin Testing...
[Epoch 119] train avg loss 0.00155596, dev acc 0.8378, dev avg loss 0.366889, throughput 3.23097K wps
[Epoch 120 Batch 30/62] avg loss 0.00152028, throughput 3.27081K wps
[Epoch 120 Batch 60/62] avg loss 0.00156728, throughput 3.17909K wps
Begin Testing...
[Epoch 120] train avg loss 0.00154411, dev acc 0.8407, dev avg loss 0.367065, throughput 3.2298K wps
[Epoch 121 Batch 30/62] avg loss 0.00152559, throughput 3.23913K wps
[Epoch 121 Batch 60/62] avg loss 0.00142495, throughput 3.16407K wps
Begin Testing...
[Epoch 121] train avg loss 0.00150231, dev acc 0.8437, dev avg loss 0.366289, throughput 3.21053K wps
[Epoch 122 Batch 30/62] avg loss 0.00139608, throughput 3.29715K wps
[Epoch 122 Batch 60/62] avg loss 0.00140801, throughput 3.17065K wps
Begin Testing...
[Epoch 122] train avg loss 0.00142311, dev acc 0.8407, dev avg loss 0.36659, throughput 3.23911K wps
[Epoch 123 Batch 30/62] avg loss 0.00138803, throughput 3.24896K wps
[Epoch 123 Batch 60/62] avg loss 0.00148291, throughput 3.21209K wps
Begin Testing...
[Epoch 123] train avg loss 0.00145575, dev acc 0.8407, dev avg loss 0.366602, throughput 3.23598K wps
[Epoch 124 Batch 30/62] avg loss 0.00144044, throughput 3.28483K wps
[Epoch 124 Batch 60/62] avg loss 0.00147201, throughput 3.21472K wps
Begin Testing...
[Epoch 124] train avg loss 0.00147281, dev acc 0.8378, dev avg loss 0.371841, throughput 3.25701K wps
[Epoch 125 Batch 30/62] avg loss 0.00130165, throughput 3.2854K wps
[Epoch 125 Batch 60/62] avg loss 0.00147396, throughput 3.21586K wps
Begin Testing...
[Epoch 125] train avg loss 0.00141363, dev acc 0.8319, dev avg loss 0.37024, throughput 3.25656K wps
[Epoch 126 Batch 30/62] avg loss 0.00151178, throughput 3.28037K wps
[Epoch 126 Batch 60/62] avg loss 0.00129502, throughput 3.21971K wps
Begin Testing...
[Epoch 126] train avg loss 0.00141326, dev acc 0.8437, dev avg loss 0.366074, throughput 3.25514K wps
[Epoch 127 Batch 30/62] avg loss 0.00141413, throughput 3.28032K wps
[Epoch 127 Batch 60/62] avg loss 0.00144149, throughput 3.21514K wps
Begin Testing...
[Epoch 127] train avg loss 0.00146857, dev acc 0.8407, dev avg loss 0.366589, throughput 3.25254K wps
[Epoch 128 Batch 30/62] avg loss 0.00131687, throughput 3.27597K wps
[Epoch 128 Batch 60/62] avg loss 0.00140626, throughput 3.17213K wps
Begin Testing...
[Epoch 128] train avg loss 0.00138389, dev acc 0.8407, dev avg loss 0.367565, throughput 3.23102K wps
[Epoch 129 Batch 30/62] avg loss 0.00129274, throughput 3.31784K wps
[Epoch 129 Batch 60/62] avg loss 0.00130217, throughput 3.1849K wps
Begin Testing...
[Epoch 129] train avg loss 0.00130613, dev acc 0.8378, dev avg loss 0.369512, throughput 3.25797K wps
[Epoch 130 Batch 30/62] avg loss 0.00126095, throughput 3.32579K wps
[Epoch 130 Batch 60/62] avg loss 0.00148066, throughput 3.23621K wps
Begin Testing...
[Epoch 130] train avg loss 0.00138987, dev acc 0.8378, dev avg loss 0.369557, throughput 3.28557K wps
[Epoch 131 Batch 30/62] avg loss 0.00123771, throughput 3.29022K wps
[Epoch 131 Batch 60/62] avg loss 0.00140488, throughput 3.2247K wps
Begin Testing...
[Epoch 131] train avg loss 0.00134269, dev acc 0.8407, dev avg loss 0.370188, throughput 3.26326K wps
[Epoch 132 Batch 30/62] avg loss 0.00122028, throughput 3.28342K wps
[Epoch 132 Batch 60/62] avg loss 0.00134227, throughput 3.21916K wps
Begin Testing...
[Epoch 132] train avg loss 0.00129661, dev acc 0.8378, dev avg loss 0.369575, throughput 3.25708K wps
[Epoch 133 Batch 30/62] avg loss 0.00116055, throughput 3.27233K wps
[Epoch 133 Batch 60/62] avg loss 0.00124046, throughput 3.21601K wps
Begin Testing...
[Epoch 133] train avg loss 0.00123646, dev acc 0.8407, dev avg loss 0.371238, throughput 3.25018K wps
[Epoch 134 Batch 30/62] avg loss 0.00119029, throughput 3.27822K wps
[Epoch 134 Batch 60/62] avg loss 0.00127908, throughput 3.22721K wps
Begin Testing...
[Epoch 134] train avg loss 0.00124735, dev acc 0.8378, dev avg loss 0.370567, throughput 3.25877K wps
[Epoch 135 Batch 30/62] avg loss 0.00118546, throughput 3.28294K wps
[Epoch 135 Batch 60/62] avg loss 0.00128044, throughput 3.24054K wps
Begin Testing...
[Epoch 135] train avg loss 0.0012489, dev acc 0.8437, dev avg loss 0.370408, throughput 3.26766K wps
[Epoch 136 Batch 30/62] avg loss 0.00122916, throughput 3.33142K wps
[Epoch 136 Batch 60/62] avg loss 0.00125928, throughput 3.23789K wps
Begin Testing...
[Epoch 136] train avg loss 0.001266, dev acc 0.8407, dev avg loss 0.37068, throughput 3.29128K wps
[Epoch 137 Batch 30/62] avg loss 0.00123101, throughput 3.31034K wps
[Epoch 137 Batch 60/62] avg loss 0.00122041, throughput 3.24227K wps
Begin Testing...
[Epoch 137] train avg loss 0.00124918, dev acc 0.8437, dev avg loss 0.371353, throughput 3.28253K wps
[Epoch 138 Batch 30/62] avg loss 0.00115462, throughput 3.3222K wps
[Epoch 138 Batch 60/62] avg loss 0.00117966, throughput 3.23403K wps
Begin Testing...
[Epoch 138] train avg loss 0.00118257, dev acc 0.8378, dev avg loss 0.372547, throughput 3.28433K wps
[Epoch 139 Batch 30/62] avg loss 0.00115512, throughput 3.30454K wps
[Epoch 139 Batch 60/62] avg loss 0.00109367, throughput 3.2339K wps
Begin Testing...
[Epoch 139] train avg loss 0.0011441, dev acc 0.8378, dev avg loss 0.374545, throughput 3.27442K wps
[Epoch 140 Batch 30/62] avg loss 0.00122866, throughput 3.3196K wps
[Epoch 140 Batch 60/62] avg loss 0.00115584, throughput 3.22832K wps
Begin Testing...
[Epoch 140] train avg loss 0.0012239, dev acc 0.8319, dev avg loss 0.378323, throughput 3.2806K wps
[Epoch 141 Batch 30/62] avg loss 0.00106787, throughput 3.30811K wps
[Epoch 141 Batch 60/62] avg loss 0.00119644, throughput 3.24824K wps
Begin Testing...
[Epoch 141] train avg loss 0.00112582, dev acc 0.8407, dev avg loss 0.373728, throughput 3.28399K wps
[Epoch 142 Batch 30/62] avg loss 0.00106524, throughput 3.3182K wps
[Epoch 142 Batch 60/62] avg loss 0.00114566, throughput 3.2464K wps
Begin Testing...
[Epoch 142] train avg loss 0.00113802, dev acc 0.8378, dev avg loss 0.375464, throughput 3.28774K wps
[Epoch 143 Batch 30/62] avg loss 0.00101317, throughput 3.30644K wps
[Epoch 143 Batch 60/62] avg loss 0.00118732, throughput 3.23623K wps
Begin Testing...
[Epoch 143] train avg loss 0.00111882, dev acc 0.8407, dev avg loss 0.37444, throughput 3.27663K wps
[Epoch 144 Batch 30/62] avg loss 0.00115249, throughput 3.30389K wps
[Epoch 144 Batch 60/62] avg loss 0.001014, throughput 3.2327K wps
Begin Testing...
[Epoch 144] train avg loss 0.00108281, dev acc 0.8407, dev avg loss 0.374566, throughput 3.27632K wps
[Epoch 145 Batch 30/62] avg loss 0.00103327, throughput 3.29547K wps
[Epoch 145 Batch 60/62] avg loss 0.00112029, throughput 3.22391K wps
Begin Testing...
[Epoch 145] train avg loss 0.00111585, dev acc 0.8348, dev avg loss 0.376751, throughput 3.26556K wps
[Epoch 146 Batch 30/62] avg loss 0.00105699, throughput 3.29141K wps
[Epoch 146 Batch 60/62] avg loss 0.00106985, throughput 3.21659K wps
Begin Testing...
[Epoch 146] train avg loss 0.00108115, dev acc 0.8437, dev avg loss 0.374855, throughput 3.25953K wps
[Epoch 147 Batch 30/62] avg loss 0.00110737, throughput 3.30036K wps
[Epoch 147 Batch 60/62] avg loss 0.000963251, throughput 3.2398K wps
Begin Testing...
[Epoch 147] train avg loss 0.00104808, dev acc 0.8319, dev avg loss 0.379989, throughput 3.27492K wps
[Epoch 148 Batch 30/62] avg loss 0.00116024, throughput 3.29581K wps
[Epoch 148 Batch 60/62] avg loss 0.00110714, throughput 3.23156K wps
Begin Testing...
[Epoch 148] train avg loss 0.00113611, dev acc 0.8407, dev avg loss 0.376086, throughput 3.27185K wps
[Epoch 149 Batch 30/62] avg loss 0.00102838, throughput 3.31598K wps
[Epoch 149 Batch 60/62] avg loss 0.00105356, throughput 3.2232K wps
Begin Testing...
[Epoch 149] train avg loss 0.00103994, dev acc 0.8348, dev avg loss 0.377664, throughput 3.27534K wps
[Epoch 150 Batch 30/62] avg loss 0.000977162, throughput 3.2974K wps
[Epoch 150 Batch 60/62] avg loss 0.00103163, throughput 3.2198K wps
Begin Testing...
[Epoch 150] train avg loss 0.00102405, dev acc 0.8466, dev avg loss 0.376244, throughput 3.26447K wps
Observed Improvement.
Begin Testing...
[Epoch 151 Batch 30/62] avg loss 0.000976918, throughput 3.30534K wps
[Epoch 151 Batch 60/62] avg loss 0.00102961, throughput 3.23185K wps
Begin Testing...
[Epoch 151] train avg loss 0.00101427, dev acc 0.8378, dev avg loss 0.378075, throughput 3.27501K wps
[Epoch 152 Batch 30/62] avg loss 0.000939013, throughput 3.29989K wps
[Epoch 152 Batch 60/62] avg loss 0.00110095, throughput 3.23949K wps
Begin Testing...
[Epoch 152] train avg loss 0.00102598, dev acc 0.8466, dev avg loss 0.375973, throughput 3.27638K wps
Observed Improvement.
Begin Testing...
[Epoch 153 Batch 30/62] avg loss 0.00097048, throughput 3.32956K wps
[Epoch 153 Batch 60/62] avg loss 0.00100695, throughput 3.25438K wps
Begin Testing...
[Epoch 153] train avg loss 0.00100063, dev acc 0.8348, dev avg loss 0.377543, throughput 3.29817K wps
[Epoch 154 Batch 30/62] avg loss 0.000988961, throughput 3.3102K wps
[Epoch 154 Batch 60/62] avg loss 0.000982844, throughput 3.23481K wps
Begin Testing...
[Epoch 154] train avg loss 0.000995654, dev acc 0.8466, dev avg loss 0.377553, throughput 3.27942K wps
Observed Improvement.
Begin Testing...
[Epoch 155 Batch 30/62] avg loss 0.00091349, throughput 3.33406K wps
[Epoch 155 Batch 60/62] avg loss 0.000895146, throughput 3.2384K wps
Begin Testing...
[Epoch 155] train avg loss 0.000936959, dev acc 0.8466, dev avg loss 0.377592, throughput 3.29141K wps
Observed Improvement.
Begin Testing...
[Epoch 156 Batch 30/62] avg loss 0.000921441, throughput 3.30659K wps
[Epoch 156 Batch 60/62] avg loss 0.000971944, throughput 3.23177K wps
Begin Testing...
[Epoch 156] train avg loss 0.000986172, dev acc 0.8437, dev avg loss 0.378308, throughput 3.27494K wps
[Epoch 157 Batch 30/62] avg loss 0.000952844, throughput 3.32992K wps
[Epoch 157 Batch 60/62] avg loss 0.000985985, throughput 3.21564K wps
Begin Testing...
[Epoch 157] train avg loss 0.000975615, dev acc 0.8407, dev avg loss 0.377514, throughput 3.27919K wps
[Epoch 158 Batch 30/62] avg loss 0.000914234, throughput 3.29804K wps
[Epoch 158 Batch 60/62] avg loss 0.000961194, throughput 3.22884K wps
Begin Testing...
[Epoch 158] train avg loss 0.000928249, dev acc 0.8407, dev avg loss 0.377696, throughput 3.26854K wps
[Epoch 159 Batch 30/62] avg loss 0.000912449, throughput 3.30174K wps
[Epoch 159 Batch 60/62] avg loss 0.000869907, throughput 3.2202K wps
Begin Testing...
[Epoch 159] train avg loss 0.000925986, dev acc 0.8378, dev avg loss 0.379219, throughput 3.2665K wps
[Epoch 160 Batch 30/62] avg loss 0.000876124, throughput 3.28552K wps
[Epoch 160 Batch 60/62] avg loss 0.000925682, throughput 3.22383K wps
Begin Testing...
[Epoch 160] train avg loss 0.000907221, dev acc 0.8437, dev avg loss 0.379053, throughput 3.2606K wps
[Epoch 161 Batch 30/62] avg loss 0.000986955, throughput 3.32979K wps
[Epoch 161 Batch 60/62] avg loss 0.000806778, throughput 3.2238K wps
Begin Testing...
[Epoch 161] train avg loss 0.000909057, dev acc 0.8437, dev avg loss 0.378673, throughput 3.28268K wps
[Epoch 162 Batch 30/62] avg loss 0.000951949, throughput 3.31K wps
[Epoch 162 Batch 60/62] avg loss 0.000906412, throughput 3.22871K wps
Begin Testing...
[Epoch 162] train avg loss 0.000933104, dev acc 0.8378, dev avg loss 0.380132, throughput 3.27479K wps
[Epoch 163 Batch 30/62] avg loss 0.000836694, throughput 3.30279K wps
[Epoch 163 Batch 60/62] avg loss 0.000855998, throughput 3.22534K wps
Begin Testing...
[Epoch 163] train avg loss 0.000849215, dev acc 0.8378, dev avg loss 0.381812, throughput 3.26994K wps
[Epoch 164 Batch 30/62] avg loss 0.000950008, throughput 3.32819K wps
[Epoch 164 Batch 60/62] avg loss 0.000832332, throughput 3.22822K wps
Begin Testing...
[Epoch 164] train avg loss 0.000902662, dev acc 0.8496, dev avg loss 0.379724, throughput 3.28327K wps
Observed Improvement.
Begin Testing...
[Epoch 165 Batch 30/62] avg loss 0.000835204, throughput 3.30331K wps
[Epoch 165 Batch 60/62] avg loss 0.000792747, throughput 3.24341K wps
Begin Testing...
[Epoch 165] train avg loss 0.000824242, dev acc 0.8378, dev avg loss 0.381678, throughput 3.27914K wps
[Epoch 166 Batch 30/62] avg loss 0.000804186, throughput 3.31132K wps
[Epoch 166 Batch 60/62] avg loss 0.000847822, throughput 3.25264K wps
Begin Testing...
[Epoch 166] train avg loss 0.000850273, dev acc 0.8348, dev avg loss 0.384902, throughput 3.28971K wps
[Epoch 167 Batch 30/62] avg loss 0.000835766, throughput 3.31736K wps
[Epoch 167 Batch 60/62] avg loss 0.000832462, throughput 3.25023K wps
Begin Testing...
[Epoch 167] train avg loss 0.000851702, dev acc 0.8378, dev avg loss 0.386021, throughput 3.28985K wps
[Epoch 168 Batch 30/62] avg loss 0.000795884, throughput 3.33526K wps
[Epoch 168 Batch 60/62] avg loss 0.000825457, throughput 3.2383K wps
Begin Testing...
[Epoch 168] train avg loss 0.000822247, dev acc 0.8378, dev avg loss 0.384173, throughput 3.29179K wps
[Epoch 169 Batch 30/62] avg loss 0.000890113, throughput 3.32063K wps
[Epoch 169 Batch 60/62] avg loss 0.000785945, throughput 3.24271K wps
Begin Testing...
[Epoch 169] train avg loss 0.000844904, dev acc 0.8378, dev avg loss 0.386238, throughput 3.28863K wps
[Epoch 170 Batch 30/62] avg loss 0.000803701, throughput 3.30258K wps
[Epoch 170 Batch 60/62] avg loss 0.00074382, throughput 3.2252K wps
Begin Testing...
[Epoch 170] train avg loss 0.000781525, dev acc 0.8407, dev avg loss 0.383302, throughput 3.26872K wps
[Epoch 171 Batch 30/62] avg loss 0.00082295, throughput 3.29873K wps
[Epoch 171 Batch 60/62] avg loss 0.000728233, throughput 3.21364K wps
Begin Testing...
[Epoch 171] train avg loss 0.000804575, dev acc 0.8378, dev avg loss 0.388158, throughput 3.2613K wps
[Epoch 172 Batch 30/62] avg loss 0.000748646, throughput 3.27581K wps
[Epoch 172 Batch 60/62] avg loss 0.000843267, throughput 3.21931K wps
Begin Testing...
[Epoch 172] train avg loss 0.000807198, dev acc 0.8378, dev avg loss 0.385515, throughput 3.25341K wps
[Epoch 173 Batch 30/62] avg loss 0.000737728, throughput 3.29065K wps
[Epoch 173 Batch 60/62] avg loss 0.000739447, throughput 3.23222K wps
Begin Testing...
[Epoch 173] train avg loss 0.000763251, dev acc 0.8437, dev avg loss 0.384101, throughput 3.26769K wps
[Epoch 174 Batch 30/62] avg loss 0.000721051, throughput 3.31456K wps
[Epoch 174 Batch 60/62] avg loss 0.000812396, throughput 3.19502K wps
Begin Testing...
[Epoch 174] train avg loss 0.000800801, dev acc 0.8348, dev avg loss 0.389542, throughput 3.26021K wps
[Epoch 175 Batch 30/62] avg loss 0.000813182, throughput 3.31108K wps
[Epoch 175 Batch 60/62] avg loss 0.000868793, throughput 3.19929K wps
Begin Testing...
[Epoch 175] train avg loss 0.000869112, dev acc 0.8348, dev avg loss 0.387111, throughput 3.26194K wps
[Epoch 176 Batch 30/62] avg loss 0.000740387, throughput 3.31K wps
[Epoch 176 Batch 60/62] avg loss 0.000786874, throughput 3.21912K wps
Begin Testing...
[Epoch 176] train avg loss 0.000762728, dev acc 0.8466, dev avg loss 0.387222, throughput 3.27008K wps
[Epoch 177 Batch 30/62] avg loss 0.000731516, throughput 3.28834K wps
[Epoch 177 Batch 60/62] avg loss 0.000808796, throughput 3.21223K wps
Begin Testing...
[Epoch 177] train avg loss 0.000771487, dev acc 0.8348, dev avg loss 0.389086, throughput 3.25702K wps
[Epoch 178 Batch 30/62] avg loss 0.00077933, throughput 3.29243K wps
[Epoch 178 Batch 60/62] avg loss 0.000736426, throughput 3.21736K wps
Begin Testing...
[Epoch 178] train avg loss 0.000766252, dev acc 0.8437, dev avg loss 0.388747, throughput 3.26004K wps
[Epoch 179 Batch 30/62] avg loss 0.000766446, throughput 3.2904K wps
[Epoch 179 Batch 60/62] avg loss 0.000690782, throughput 3.23859K wps
Begin Testing...
[Epoch 179] train avg loss 0.000734169, dev acc 0.8437, dev avg loss 0.389363, throughput 3.26979K wps
[Epoch 180 Batch 30/62] avg loss 0.000755114, throughput 3.31294K wps
[Epoch 180 Batch 60/62] avg loss 0.000719577, throughput 3.22156K wps
Begin Testing...
[Epoch 180] train avg loss 0.000755694, dev acc 0.8378, dev avg loss 0.389213, throughput 3.2744K wps
[Epoch 181 Batch 30/62] avg loss 0.000688149, throughput 3.31538K wps
[Epoch 181 Batch 60/62] avg loss 0.000733276, throughput 3.23093K wps
Begin Testing...
[Epoch 181] train avg loss 0.000709407, dev acc 0.8378, dev avg loss 0.390059, throughput 3.27773K wps
[Epoch 182 Batch 30/62] avg loss 0.000649793, throughput 3.30145K wps
[Epoch 182 Batch 60/62] avg loss 0.000763974, throughput 3.22364K wps
Begin Testing...
[Epoch 182] train avg loss 0.000712637, dev acc 0.8466, dev avg loss 0.388086, throughput 3.26848K wps
[Epoch 183 Batch 30/62] avg loss 0.000711928, throughput 3.29734K wps
[Epoch 183 Batch 60/62] avg loss 0.00073935, throughput 3.20547K wps
Begin Testing...
[Epoch 183] train avg loss 0.000732014, dev acc 0.8466, dev avg loss 0.388295, throughput 3.25646K wps
[Epoch 184 Batch 30/62] avg loss 0.000720545, throughput 3.30062K wps
[Epoch 184 Batch 60/62] avg loss 0.000702931, throughput 3.20768K wps
Begin Testing...
[Epoch 184] train avg loss 0.000717212, dev acc 0.8437, dev avg loss 0.388946, throughput 3.26001K wps
[Epoch 185 Batch 30/62] avg loss 0.000733611, throughput 3.2908K wps
[Epoch 185 Batch 60/62] avg loss 0.000650179, throughput 3.21334K wps
Begin Testing...
[Epoch 185] train avg loss 0.000705498, dev acc 0.8378, dev avg loss 0.391088, throughput 3.25754K wps
[Epoch 186 Batch 30/62] avg loss 0.00065873, throughput 3.30453K wps
[Epoch 186 Batch 60/62] avg loss 0.000695433, throughput 3.19186K wps
Begin Testing...
[Epoch 186] train avg loss 0.000678781, dev acc 0.8496, dev avg loss 0.389634, throughput 3.25468K wps
Observed Improvement.
Begin Testing...
[Epoch 187 Batch 30/62] avg loss 0.000693402, throughput 3.29674K wps
[Epoch 187 Batch 60/62] avg loss 0.000640661, throughput 3.21766K wps
Begin Testing...
[Epoch 187] train avg loss 0.000668792, dev acc 0.8466, dev avg loss 0.39061, throughput 3.26305K wps
[Epoch 188 Batch 30/62] avg loss 0.0007089, throughput 3.2683K wps
[Epoch 188 Batch 60/62] avg loss 0.000600342, throughput 3.21152K wps
Begin Testing...
[Epoch 188] train avg loss 0.000658994, dev acc 0.8407, dev avg loss 0.393165, throughput 3.24653K wps
[Epoch 189 Batch 30/62] avg loss 0.000809538, throughput 3.27716K wps
[Epoch 189 Batch 60/62] avg loss 0.000663323, throughput 3.21451K wps
Begin Testing...
[Epoch 189] train avg loss 0.000734545, dev acc 0.8496, dev avg loss 0.391213, throughput 3.25198K wps
Observed Improvement.
Begin Testing...
[Epoch 190 Batch 30/62] avg loss 0.000648717, throughput 3.30823K wps
[Epoch 190 Batch 60/62] avg loss 0.000647582, throughput 3.2299K wps
Begin Testing...
[Epoch 190] train avg loss 0.00064918, dev acc 0.8437, dev avg loss 0.392946, throughput 3.27471K wps
[Epoch 191 Batch 30/62] avg loss 0.000717054, throughput 3.30235K wps
[Epoch 191 Batch 60/62] avg loss 0.000703089, throughput 3.22678K wps
Begin Testing...
[Epoch 191] train avg loss 0.00072499, dev acc 0.8378, dev avg loss 0.397242, throughput 3.27036K wps
[Epoch 192 Batch 30/62] avg loss 0.000637519, throughput 3.31082K wps
[Epoch 192 Batch 60/62] avg loss 0.000685752, throughput 3.23456K wps
Begin Testing...
[Epoch 192] train avg loss 0.000675062, dev acc 0.8466, dev avg loss 0.394834, throughput 3.27895K wps
[Epoch 193 Batch 30/62] avg loss 0.00063703, throughput 3.30555K wps
[Epoch 193 Batch 60/62] avg loss 0.000664097, throughput 3.23337K wps
Begin Testing...
[Epoch 193] train avg loss 0.000658611, dev acc 0.8525, dev avg loss 0.394555, throughput 3.27403K wps
Observed Improvement.
Begin Testing...
[Epoch 194 Batch 30/62] avg loss 0.000593853, throughput 3.26946K wps
[Epoch 194 Batch 60/62] avg loss 0.000642569, throughput 3.23306K wps
Begin Testing...
[Epoch 194] train avg loss 0.000626152, dev acc 0.8466, dev avg loss 0.39504, throughput 3.25685K wps
[Epoch 195 Batch 30/62] avg loss 0.000613785, throughput 3.28462K wps
[Epoch 195 Batch 60/62] avg loss 0.00064487, throughput 3.2176K wps
Begin Testing...
[Epoch 195] train avg loss 0.000637381, dev acc 0.8407, dev avg loss 0.396533, throughput 3.25633K wps
[Epoch 196 Batch 30/62] avg loss 0.000585871, throughput 3.279K wps
[Epoch 196 Batch 60/62] avg loss 0.000690515, throughput 3.21412K wps
Begin Testing...
[Epoch 196] train avg loss 0.000662749, dev acc 0.8378, dev avg loss 0.398391, throughput 3.25312K wps
[Epoch 197 Batch 30/62] avg loss 0.000549821, throughput 3.30334K wps
[Epoch 197 Batch 60/62] avg loss 0.000622701, throughput 3.23586K wps
Begin Testing...
[Epoch 197] train avg loss 0.000589513, dev acc 0.8496, dev avg loss 0.396172, throughput 3.27594K wps
[Epoch 198 Batch 30/62] avg loss 0.000682644, throughput 3.29522K wps
[Epoch 198 Batch 60/62] avg loss 0.000575868, throughput 3.22133K wps
Begin Testing...
[Epoch 198] train avg loss 0.000633337, dev acc 0.8466, dev avg loss 0.395932, throughput 3.264K wps
[Epoch 199 Batch 30/62] avg loss 0.000607871, throughput 3.30864K wps
[Epoch 199 Batch 60/62] avg loss 0.000621485, throughput 3.20021K wps
Begin Testing...
[Epoch 199] train avg loss 0.000617583, dev acc 0.8555, dev avg loss 0.395705, throughput 3.2615K wps
Observed Improvement.
Begin Testing...
Test loss 0.473526, test acc 0.8117
Total time cost 418.85s
[Epoch 0 Batch 30/62] avg loss 0.0135575, throughput 3.075K wps
[Epoch 0 Batch 60/62] avg loss 0.0128766, throughput 3.21863K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133604, dev acc 0.6254, dev avg loss 0.655988, throughput 3.15462K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130242, throughput 3.28253K wps
[Epoch 1 Batch 60/62] avg loss 0.0126667, throughput 3.21964K wps
Begin Testing...
[Epoch 1] train avg loss 0.0130236, dev acc 0.6254, dev avg loss 0.647701, throughput 3.25596K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0127615, throughput 3.2935K wps
[Epoch 2 Batch 60/62] avg loss 0.0127018, throughput 3.22855K wps
Begin Testing...
[Epoch 2] train avg loss 0.0128808, dev acc 0.6254, dev avg loss 0.642786, throughput 3.26716K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0126132, throughput 3.29147K wps
[Epoch 3 Batch 60/62] avg loss 0.0124946, throughput 3.22195K wps
Begin Testing...
[Epoch 3] train avg loss 0.0127268, dev acc 0.6254, dev avg loss 0.635783, throughput 3.26299K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0123599, throughput 3.28902K wps
[Epoch 4 Batch 60/62] avg loss 0.012328, throughput 3.20932K wps
Begin Testing...
[Epoch 4] train avg loss 0.0125011, dev acc 0.6254, dev avg loss 0.628178, throughput 3.25665K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0121313, throughput 3.28889K wps
[Epoch 5 Batch 60/62] avg loss 0.0120959, throughput 3.18174K wps
Begin Testing...
[Epoch 5] train avg loss 0.0122803, dev acc 0.6254, dev avg loss 0.621833, throughput 3.23985K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0121585, throughput 3.28309K wps
[Epoch 6 Batch 60/62] avg loss 0.0119709, throughput 3.21063K wps
Begin Testing...
[Epoch 6] train avg loss 0.0121968, dev acc 0.6342, dev avg loss 0.614016, throughput 3.25219K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0118871, throughput 3.28229K wps
[Epoch 7 Batch 60/62] avg loss 0.0116927, throughput 3.19445K wps
Begin Testing...
[Epoch 7] train avg loss 0.0119831, dev acc 0.6401, dev avg loss 0.606915, throughput 3.24495K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0112341, throughput 3.24882K wps
[Epoch 8 Batch 60/62] avg loss 0.0117712, throughput 3.20422K wps
Begin Testing...
[Epoch 8] train avg loss 0.011678, dev acc 0.6342, dev avg loss 0.599642, throughput 3.23311K wps
[Epoch 9 Batch 30/62] avg loss 0.0115649, throughput 3.29063K wps
[Epoch 9 Batch 60/62] avg loss 0.0112047, throughput 3.17423K wps
Begin Testing...
[Epoch 9] train avg loss 0.0115474, dev acc 0.6342, dev avg loss 0.593225, throughput 3.23854K wps
[Epoch 10 Batch 30/62] avg loss 0.0113828, throughput 3.28751K wps
[Epoch 10 Batch 60/62] avg loss 0.0109533, throughput 3.19992K wps
Begin Testing...
[Epoch 10] train avg loss 0.0113836, dev acc 0.6755, dev avg loss 0.584205, throughput 3.24965K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0109493, throughput 3.26441K wps
[Epoch 11 Batch 60/62] avg loss 0.0109784, throughput 3.22538K wps
Begin Testing...
[Epoch 11] train avg loss 0.0110995, dev acc 0.6844, dev avg loss 0.576033, throughput 3.25151K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0107426, throughput 3.27658K wps
[Epoch 12 Batch 60/62] avg loss 0.0108263, throughput 3.20664K wps
Begin Testing...
[Epoch 12] train avg loss 0.0109514, dev acc 0.7080, dev avg loss 0.568256, throughput 3.2468K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0104291, throughput 3.30476K wps
[Epoch 13 Batch 60/62] avg loss 0.0107284, throughput 3.21011K wps
Begin Testing...
[Epoch 13] train avg loss 0.0107044, dev acc 0.7080, dev avg loss 0.560344, throughput 3.26483K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0104209, throughput 3.30407K wps
[Epoch 14 Batch 60/62] avg loss 0.0104812, throughput 3.19139K wps
Begin Testing...
[Epoch 14] train avg loss 0.0105202, dev acc 0.6962, dev avg loss 0.553527, throughput 3.25476K wps
[Epoch 15 Batch 30/62] avg loss 0.0103558, throughput 3.3055K wps
[Epoch 15 Batch 60/62] avg loss 0.010099, throughput 3.22327K wps
Begin Testing...
[Epoch 15] train avg loss 0.0104365, dev acc 0.7345, dev avg loss 0.545163, throughput 3.26943K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0101204, throughput 3.28312K wps
[Epoch 16 Batch 60/62] avg loss 0.0096943, throughput 3.2011K wps
Begin Testing...
[Epoch 16] train avg loss 0.0100694, dev acc 0.7286, dev avg loss 0.537371, throughput 3.249K wps
[Epoch 17 Batch 30/62] avg loss 0.00977823, throughput 3.28448K wps
[Epoch 17 Batch 60/62] avg loss 0.00979164, throughput 3.22064K wps
Begin Testing...
[Epoch 17] train avg loss 0.00993102, dev acc 0.7434, dev avg loss 0.531023, throughput 3.25904K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.00959038, throughput 3.28624K wps
[Epoch 18 Batch 60/62] avg loss 0.00973553, throughput 3.20606K wps
Begin Testing...
[Epoch 18] train avg loss 0.00975855, dev acc 0.7109, dev avg loss 0.526749, throughput 3.25147K wps
[Epoch 19 Batch 30/62] avg loss 0.00955262, throughput 3.30027K wps
[Epoch 19 Batch 60/62] avg loss 0.00932682, throughput 3.18443K wps
Begin Testing...
[Epoch 19] train avg loss 0.0094864, dev acc 0.7316, dev avg loss 0.517628, throughput 3.24807K wps
[Epoch 20 Batch 30/62] avg loss 0.00915216, throughput 3.27642K wps
[Epoch 20 Batch 60/62] avg loss 0.00907251, throughput 3.21118K wps
Begin Testing...
[Epoch 20] train avg loss 0.00925235, dev acc 0.7522, dev avg loss 0.509469, throughput 3.24921K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.00940872, throughput 3.29323K wps
[Epoch 21 Batch 60/62] avg loss 0.00876914, throughput 3.21236K wps
Begin Testing...
[Epoch 21] train avg loss 0.00917607, dev acc 0.7345, dev avg loss 0.508112, throughput 3.26003K wps
[Epoch 22 Batch 30/62] avg loss 0.00878293, throughput 3.28724K wps
[Epoch 22 Batch 60/62] avg loss 0.00886032, throughput 3.21788K wps
Begin Testing...
[Epoch 22] train avg loss 0.0089441, dev acc 0.7670, dev avg loss 0.497555, throughput 3.25876K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.00871438, throughput 3.29336K wps
[Epoch 23 Batch 60/62] avg loss 0.00856767, throughput 3.21432K wps
Begin Testing...
[Epoch 23] train avg loss 0.008753, dev acc 0.7670, dev avg loss 0.492281, throughput 3.25935K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0082726, throughput 3.29245K wps
[Epoch 24 Batch 60/62] avg loss 0.00873236, throughput 3.19952K wps
Begin Testing...
[Epoch 24] train avg loss 0.00861481, dev acc 0.7699, dev avg loss 0.48717, throughput 3.25279K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.00837873, throughput 3.30448K wps
[Epoch 25 Batch 60/62] avg loss 0.00823681, throughput 3.20494K wps
Begin Testing...
[Epoch 25] train avg loss 0.00842973, dev acc 0.7758, dev avg loss 0.481359, throughput 3.26066K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.00826628, throughput 3.28887K wps
[Epoch 26 Batch 60/62] avg loss 0.00810906, throughput 3.22516K wps
Begin Testing...
[Epoch 26] train avg loss 0.00824914, dev acc 0.7847, dev avg loss 0.476799, throughput 3.26408K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.0079684, throughput 3.29835K wps
[Epoch 27 Batch 60/62] avg loss 0.00833762, throughput 3.20918K wps
Begin Testing...
[Epoch 27] train avg loss 0.00823102, dev acc 0.7847, dev avg loss 0.472475, throughput 3.26078K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.00793098, throughput 3.26938K wps
[Epoch 28 Batch 60/62] avg loss 0.00796097, throughput 3.20602K wps
Begin Testing...
[Epoch 28] train avg loss 0.0080611, dev acc 0.7906, dev avg loss 0.468812, throughput 3.24253K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00798197, throughput 3.28514K wps
[Epoch 29 Batch 60/62] avg loss 0.00768648, throughput 3.23146K wps
Begin Testing...
[Epoch 29] train avg loss 0.00795508, dev acc 0.7935, dev avg loss 0.464404, throughput 3.26339K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00782747, throughput 3.31024K wps
[Epoch 30 Batch 60/62] avg loss 0.00768879, throughput 3.22135K wps
Begin Testing...
[Epoch 30] train avg loss 0.00779541, dev acc 0.7965, dev avg loss 0.460262, throughput 3.27036K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00745303, throughput 3.30146K wps
[Epoch 31 Batch 60/62] avg loss 0.00750495, throughput 3.22893K wps
Begin Testing...
[Epoch 31] train avg loss 0.00757427, dev acc 0.7876, dev avg loss 0.458693, throughput 3.27008K wps
[Epoch 32 Batch 30/62] avg loss 0.00745242, throughput 3.27842K wps
[Epoch 32 Batch 60/62] avg loss 0.00723594, throughput 3.2127K wps
Begin Testing...
[Epoch 32] train avg loss 0.00747652, dev acc 0.7935, dev avg loss 0.45509, throughput 3.25116K wps
[Epoch 33 Batch 30/62] avg loss 0.00737253, throughput 3.26815K wps
[Epoch 33 Batch 60/62] avg loss 0.00697756, throughput 3.22786K wps
Begin Testing...
[Epoch 33] train avg loss 0.00726791, dev acc 0.8053, dev avg loss 0.449601, throughput 3.2548K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00675816, throughput 3.28265K wps
[Epoch 34 Batch 60/62] avg loss 0.00720324, throughput 3.20703K wps
Begin Testing...
[Epoch 34] train avg loss 0.00712038, dev acc 0.7935, dev avg loss 0.447035, throughput 3.2515K wps
[Epoch 35 Batch 30/62] avg loss 0.00691508, throughput 3.27384K wps
[Epoch 35 Batch 60/62] avg loss 0.0068313, throughput 3.19134K wps
Begin Testing...
[Epoch 35] train avg loss 0.00711134, dev acc 0.7935, dev avg loss 0.443472, throughput 3.23871K wps
[Epoch 36 Batch 30/62] avg loss 0.00681164, throughput 3.29746K wps
[Epoch 36 Batch 60/62] avg loss 0.00700146, throughput 3.22241K wps
Begin Testing...
[Epoch 36] train avg loss 0.0070838, dev acc 0.8024, dev avg loss 0.440914, throughput 3.2672K wps
[Epoch 37 Batch 30/62] avg loss 0.00654996, throughput 3.30282K wps
[Epoch 37 Batch 60/62] avg loss 0.00696677, throughput 3.22166K wps
Begin Testing...
[Epoch 37] train avg loss 0.00676786, dev acc 0.7965, dev avg loss 0.43939, throughput 3.26902K wps
[Epoch 38 Batch 30/62] avg loss 0.0065368, throughput 3.27631K wps
[Epoch 38 Batch 60/62] avg loss 0.00649923, throughput 3.22009K wps
Begin Testing...
[Epoch 38] train avg loss 0.00662153, dev acc 0.8053, dev avg loss 0.441198, throughput 3.25571K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00664446, throughput 3.28671K wps
[Epoch 39 Batch 60/62] avg loss 0.00624667, throughput 3.21509K wps
Begin Testing...
[Epoch 39] train avg loss 0.00655144, dev acc 0.8083, dev avg loss 0.434435, throughput 3.25676K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00627207, throughput 3.30986K wps
[Epoch 40 Batch 60/62] avg loss 0.00651045, throughput 3.2107K wps
Begin Testing...
[Epoch 40] train avg loss 0.00647627, dev acc 0.8083, dev avg loss 0.431566, throughput 3.2663K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00603271, throughput 3.31671K wps
[Epoch 41 Batch 60/62] avg loss 0.00631563, throughput 3.23192K wps
Begin Testing...
[Epoch 41] train avg loss 0.00631863, dev acc 0.8024, dev avg loss 0.433478, throughput 3.27843K wps
[Epoch 42 Batch 30/62] avg loss 0.0061102, throughput 3.28132K wps
[Epoch 42 Batch 60/62] avg loss 0.0061217, throughput 3.22136K wps
Begin Testing...
[Epoch 42] train avg loss 0.0061316, dev acc 0.7994, dev avg loss 0.430277, throughput 3.25645K wps
[Epoch 43 Batch 30/62] avg loss 0.00595945, throughput 3.26931K wps
[Epoch 43 Batch 60/62] avg loss 0.00609992, throughput 3.21438K wps
Begin Testing...
[Epoch 43] train avg loss 0.00611292, dev acc 0.8053, dev avg loss 0.430601, throughput 3.24832K wps
[Epoch 44 Batch 30/62] avg loss 0.0057635, throughput 3.27888K wps
[Epoch 44 Batch 60/62] avg loss 0.00616383, throughput 3.2166K wps
Begin Testing...
[Epoch 44] train avg loss 0.00606401, dev acc 0.8083, dev avg loss 0.424362, throughput 3.25462K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/62] avg loss 0.00567601, throughput 3.29566K wps
[Epoch 45 Batch 60/62] avg loss 0.00564348, throughput 3.1895K wps
Begin Testing...
[Epoch 45] train avg loss 0.00567183, dev acc 0.8024, dev avg loss 0.422137, throughput 3.24977K wps
[Epoch 46 Batch 30/62] avg loss 0.00564982, throughput 3.29459K wps
[Epoch 46 Batch 60/62] avg loss 0.00595294, throughput 3.23057K wps
Begin Testing...
[Epoch 46] train avg loss 0.00584277, dev acc 0.8053, dev avg loss 0.421021, throughput 3.26913K wps
[Epoch 47 Batch 30/62] avg loss 0.00523094, throughput 3.28526K wps
[Epoch 47 Batch 60/62] avg loss 0.00574498, throughput 3.21267K wps
Begin Testing...
[Epoch 47] train avg loss 0.00553804, dev acc 0.8083, dev avg loss 0.419215, throughput 3.25309K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00543645, throughput 3.27476K wps
[Epoch 48 Batch 60/62] avg loss 0.00547615, throughput 3.19847K wps
Begin Testing...
[Epoch 48] train avg loss 0.00550152, dev acc 0.8053, dev avg loss 0.417092, throughput 3.24269K wps
[Epoch 49 Batch 30/62] avg loss 0.00503884, throughput 3.26049K wps
[Epoch 49 Batch 60/62] avg loss 0.00589173, throughput 3.23039K wps
Begin Testing...
[Epoch 49] train avg loss 0.00552875, dev acc 0.8112, dev avg loss 0.417501, throughput 3.25118K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00525624, throughput 3.29019K wps
[Epoch 50 Batch 60/62] avg loss 0.00529922, throughput 3.21366K wps
Begin Testing...
[Epoch 50] train avg loss 0.00531082, dev acc 0.8053, dev avg loss 0.416997, throughput 3.25853K wps
[Epoch 51 Batch 30/62] avg loss 0.00540844, throughput 3.27478K wps
[Epoch 51 Batch 60/62] avg loss 0.0049593, throughput 3.21226K wps
Begin Testing...
[Epoch 51] train avg loss 0.0052508, dev acc 0.8142, dev avg loss 0.414352, throughput 3.25025K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00506037, throughput 3.29555K wps
[Epoch 52 Batch 60/62] avg loss 0.00512803, throughput 3.2087K wps
Begin Testing...
[Epoch 52] train avg loss 0.00512129, dev acc 0.8142, dev avg loss 0.415211, throughput 3.25792K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.0048475, throughput 3.32009K wps
[Epoch 53 Batch 60/62] avg loss 0.00516737, throughput 3.23602K wps
Begin Testing...
[Epoch 53] train avg loss 0.00504394, dev acc 0.7994, dev avg loss 0.415366, throughput 3.28369K wps
[Epoch 54 Batch 30/62] avg loss 0.00502547, throughput 3.28567K wps
[Epoch 54 Batch 60/62] avg loss 0.0047201, throughput 3.23345K wps
Begin Testing...
[Epoch 54] train avg loss 0.00493105, dev acc 0.8201, dev avg loss 0.413193, throughput 3.26588K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/62] avg loss 0.00492683, throughput 3.27463K wps
[Epoch 55 Batch 60/62] avg loss 0.00472868, throughput 3.21569K wps
Begin Testing...
[Epoch 55] train avg loss 0.00485229, dev acc 0.8053, dev avg loss 0.410175, throughput 3.25154K wps
[Epoch 56 Batch 30/62] avg loss 0.00459539, throughput 3.26005K wps
[Epoch 56 Batch 60/62] avg loss 0.00490387, throughput 3.23464K wps
Begin Testing...
[Epoch 56] train avg loss 0.00480302, dev acc 0.8024, dev avg loss 0.420922, throughput 3.25363K wps
[Epoch 57 Batch 30/62] avg loss 0.00455718, throughput 3.28812K wps
[Epoch 57 Batch 60/62] avg loss 0.00480917, throughput 3.20798K wps
Begin Testing...
[Epoch 57] train avg loss 0.00478756, dev acc 0.8171, dev avg loss 0.410371, throughput 3.25393K wps
[Epoch 58 Batch 30/62] avg loss 0.00469718, throughput 3.28465K wps
[Epoch 58 Batch 60/62] avg loss 0.00448331, throughput 3.21536K wps
Begin Testing...
[Epoch 58] train avg loss 0.0045939, dev acc 0.8083, dev avg loss 0.408273, throughput 3.25582K wps
[Epoch 59 Batch 30/62] avg loss 0.00435411, throughput 3.2795K wps
[Epoch 59 Batch 60/62] avg loss 0.00472507, throughput 3.22451K wps
Begin Testing...
[Epoch 59] train avg loss 0.004554, dev acc 0.7994, dev avg loss 0.407346, throughput 3.25905K wps
[Epoch 60 Batch 30/62] avg loss 0.0042932, throughput 3.27966K wps
[Epoch 60 Batch 60/62] avg loss 0.00451616, throughput 3.23335K wps
Begin Testing...
[Epoch 60] train avg loss 0.00440406, dev acc 0.8201, dev avg loss 0.408515, throughput 3.26231K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00428136, throughput 3.29057K wps
[Epoch 61 Batch 60/62] avg loss 0.00443521, throughput 3.22378K wps
Begin Testing...
[Epoch 61] train avg loss 0.0044249, dev acc 0.8053, dev avg loss 0.405408, throughput 3.26201K wps
[Epoch 62 Batch 30/62] avg loss 0.00411746, throughput 3.2801K wps
[Epoch 62 Batch 60/62] avg loss 0.00443391, throughput 3.21275K wps
Begin Testing...
[Epoch 62] train avg loss 0.00431007, dev acc 0.8083, dev avg loss 0.405685, throughput 3.25252K wps
[Epoch 63 Batch 30/62] avg loss 0.00409109, throughput 3.28137K wps
[Epoch 63 Batch 60/62] avg loss 0.00440087, throughput 3.21403K wps
Begin Testing...
[Epoch 63] train avg loss 0.00429371, dev acc 0.8024, dev avg loss 0.404694, throughput 3.2532K wps
[Epoch 64 Batch 30/62] avg loss 0.00388139, throughput 3.28877K wps
[Epoch 64 Batch 60/62] avg loss 0.00420889, throughput 3.22308K wps
Begin Testing...
[Epoch 64] train avg loss 0.00412051, dev acc 0.8053, dev avg loss 0.412468, throughput 3.26202K wps
[Epoch 65 Batch 30/62] avg loss 0.00411764, throughput 3.30365K wps
[Epoch 65 Batch 60/62] avg loss 0.00400834, throughput 3.21296K wps
Begin Testing...
[Epoch 65] train avg loss 0.00410626, dev acc 0.8083, dev avg loss 0.40372, throughput 3.26385K wps
[Epoch 66 Batch 30/62] avg loss 0.00409484, throughput 3.29211K wps
[Epoch 66 Batch 60/62] avg loss 0.00366384, throughput 3.19816K wps
Begin Testing...
[Epoch 66] train avg loss 0.00395917, dev acc 0.8083, dev avg loss 0.404317, throughput 3.25085K wps
[Epoch 67 Batch 30/62] avg loss 0.00365286, throughput 3.27728K wps
[Epoch 67 Batch 60/62] avg loss 0.00387682, throughput 3.21441K wps
Begin Testing...
[Epoch 67] train avg loss 0.00383866, dev acc 0.8053, dev avg loss 0.401277, throughput 3.25095K wps
[Epoch 68 Batch 30/62] avg loss 0.00369575, throughput 3.27657K wps
[Epoch 68 Batch 60/62] avg loss 0.00387665, throughput 3.20463K wps
Begin Testing...
[Epoch 68] train avg loss 0.00384916, dev acc 0.8083, dev avg loss 0.40309, throughput 3.24752K wps
[Epoch 69 Batch 30/62] avg loss 0.00377452, throughput 3.28294K wps
[Epoch 69 Batch 60/62] avg loss 0.00367595, throughput 3.21045K wps
Begin Testing...
[Epoch 69] train avg loss 0.00372189, dev acc 0.8053, dev avg loss 0.401563, throughput 3.25308K wps
[Epoch 70 Batch 30/62] avg loss 0.00362274, throughput 3.2703K wps
[Epoch 70 Batch 60/62] avg loss 0.00346572, throughput 3.2069K wps
Begin Testing...
[Epoch 70] train avg loss 0.00359399, dev acc 0.8053, dev avg loss 0.40027, throughput 3.2454K wps
[Epoch 71 Batch 30/62] avg loss 0.00365003, throughput 3.29064K wps
[Epoch 71 Batch 60/62] avg loss 0.00355986, throughput 3.21817K wps
Begin Testing...
[Epoch 71] train avg loss 0.00367342, dev acc 0.8142, dev avg loss 0.400953, throughput 3.26007K wps
[Epoch 72 Batch 30/62] avg loss 0.00335807, throughput 3.27862K wps
[Epoch 72 Batch 60/62] avg loss 0.00341462, throughput 3.22491K wps
Begin Testing...
[Epoch 72] train avg loss 0.00345726, dev acc 0.8142, dev avg loss 0.403076, throughput 3.25786K wps
[Epoch 73 Batch 30/62] avg loss 0.00327668, throughput 3.28603K wps
[Epoch 73 Batch 60/62] avg loss 0.00347291, throughput 3.22996K wps
Begin Testing...
[Epoch 73] train avg loss 0.00341977, dev acc 0.8053, dev avg loss 0.400538, throughput 3.26427K wps
[Epoch 74 Batch 30/62] avg loss 0.00341166, throughput 3.30399K wps
[Epoch 74 Batch 60/62] avg loss 0.00339612, throughput 3.21797K wps
Begin Testing...
[Epoch 74] train avg loss 0.00343886, dev acc 0.8201, dev avg loss 0.401925, throughput 3.26725K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/62] avg loss 0.0032743, throughput 3.30089K wps
[Epoch 75 Batch 60/62] avg loss 0.00316284, throughput 3.216K wps
Begin Testing...
[Epoch 75] train avg loss 0.00328105, dev acc 0.8083, dev avg loss 0.398861, throughput 3.26539K wps
[Epoch 76 Batch 30/62] avg loss 0.00315173, throughput 3.30583K wps
[Epoch 76 Batch 60/62] avg loss 0.00317791, throughput 3.21908K wps
Begin Testing...
[Epoch 76] train avg loss 0.00321054, dev acc 0.8053, dev avg loss 0.398775, throughput 3.26889K wps
[Epoch 77 Batch 30/62] avg loss 0.00302937, throughput 3.29825K wps
[Epoch 77 Batch 60/62] avg loss 0.00324954, throughput 3.23111K wps
Begin Testing...
[Epoch 77] train avg loss 0.00319523, dev acc 0.8083, dev avg loss 0.400734, throughput 3.27021K wps
[Epoch 78 Batch 30/62] avg loss 0.0030397, throughput 3.25014K wps
[Epoch 78 Batch 60/62] avg loss 0.00326679, throughput 3.19696K wps
Begin Testing...
[Epoch 78] train avg loss 0.00318037, dev acc 0.8024, dev avg loss 0.398541, throughput 3.23172K wps
[Epoch 79 Batch 30/62] avg loss 0.00305658, throughput 3.29989K wps
[Epoch 79 Batch 60/62] avg loss 0.00307297, throughput 3.22092K wps
Begin Testing...
[Epoch 79] train avg loss 0.00310116, dev acc 0.8083, dev avg loss 0.399441, throughput 3.2657K wps
[Epoch 80 Batch 30/62] avg loss 0.00286601, throughput 3.28135K wps
[Epoch 80 Batch 60/62] avg loss 0.00310879, throughput 3.20209K wps
Begin Testing...
[Epoch 80] train avg loss 0.00306625, dev acc 0.8083, dev avg loss 0.401721, throughput 3.24733K wps
[Epoch 81 Batch 30/62] avg loss 0.00288468, throughput 3.28493K wps
[Epoch 81 Batch 60/62] avg loss 0.0028979, throughput 3.21116K wps
Begin Testing...
[Epoch 81] train avg loss 0.00291323, dev acc 0.8112, dev avg loss 0.400399, throughput 3.25423K wps
[Epoch 82 Batch 30/62] avg loss 0.00279862, throughput 3.28424K wps
[Epoch 82 Batch 60/62] avg loss 0.00301307, throughput 3.205K wps
Begin Testing...
[Epoch 82] train avg loss 0.00291466, dev acc 0.8112, dev avg loss 0.40083, throughput 3.25248K wps
[Epoch 83 Batch 30/62] avg loss 0.00289027, throughput 3.29528K wps
[Epoch 83 Batch 60/62] avg loss 0.00277412, throughput 3.20705K wps
Begin Testing...
[Epoch 83] train avg loss 0.00285632, dev acc 0.8112, dev avg loss 0.400035, throughput 3.25677K wps
[Epoch 84 Batch 30/62] avg loss 0.00278426, throughput 3.28744K wps
[Epoch 84 Batch 60/62] avg loss 0.00276482, throughput 3.20437K wps
Begin Testing...
[Epoch 84] train avg loss 0.00277536, dev acc 0.8112, dev avg loss 0.400398, throughput 3.25217K wps
[Epoch 85 Batch 30/62] avg loss 0.00282791, throughput 3.29058K wps
[Epoch 85 Batch 60/62] avg loss 0.00265121, throughput 3.21091K wps
Begin Testing...
[Epoch 85] train avg loss 0.00278543, dev acc 0.8112, dev avg loss 0.399659, throughput 3.25683K wps
[Epoch 86 Batch 30/62] avg loss 0.00271217, throughput 3.29081K wps
[Epoch 86 Batch 60/62] avg loss 0.00264704, throughput 3.19168K wps
Begin Testing...
[Epoch 86] train avg loss 0.00272587, dev acc 0.8053, dev avg loss 0.400296, throughput 3.24779K wps
[Epoch 87 Batch 30/62] avg loss 0.00251308, throughput 3.28763K wps
[Epoch 87 Batch 60/62] avg loss 0.00259957, throughput 3.19177K wps
Begin Testing...
[Epoch 87] train avg loss 0.00260111, dev acc 0.8112, dev avg loss 0.401228, throughput 3.24587K wps
[Epoch 88 Batch 30/62] avg loss 0.0026178, throughput 3.29098K wps
[Epoch 88 Batch 60/62] avg loss 0.00248984, throughput 3.20827K wps
Begin Testing...
[Epoch 88] train avg loss 0.00260844, dev acc 0.8112, dev avg loss 0.400051, throughput 3.25684K wps
[Epoch 89 Batch 30/62] avg loss 0.00266529, throughput 3.31522K wps
[Epoch 89 Batch 60/62] avg loss 0.0025655, throughput 3.2336K wps
Begin Testing...
[Epoch 89] train avg loss 0.00265902, dev acc 0.8112, dev avg loss 0.404846, throughput 3.27995K wps
[Epoch 90 Batch 30/62] avg loss 0.00272853, throughput 3.28545K wps
[Epoch 90 Batch 60/62] avg loss 0.00240951, throughput 3.21307K wps
Begin Testing...
[Epoch 90] train avg loss 0.0025909, dev acc 0.8112, dev avg loss 0.400849, throughput 3.25544K wps
[Epoch 91 Batch 30/62] avg loss 0.00259679, throughput 3.28036K wps
[Epoch 91 Batch 60/62] avg loss 0.00240388, throughput 3.21279K wps
Begin Testing...
[Epoch 91] train avg loss 0.00255163, dev acc 0.8053, dev avg loss 0.401817, throughput 3.253K wps
[Epoch 92 Batch 30/62] avg loss 0.00244204, throughput 3.28475K wps
[Epoch 92 Batch 60/62] avg loss 0.00218981, throughput 3.22584K wps
Begin Testing...
[Epoch 92] train avg loss 0.00234124, dev acc 0.8112, dev avg loss 0.402752, throughput 3.26162K wps
[Epoch 93 Batch 30/62] avg loss 0.00240625, throughput 3.27567K wps
[Epoch 93 Batch 60/62] avg loss 0.00232897, throughput 3.2216K wps
Begin Testing...
[Epoch 93] train avg loss 0.0024025, dev acc 0.8112, dev avg loss 0.402523, throughput 3.25445K wps
[Epoch 94 Batch 30/62] avg loss 0.00232013, throughput 3.28184K wps
[Epoch 94 Batch 60/62] avg loss 0.00246157, throughput 3.21327K wps
Begin Testing...
[Epoch 94] train avg loss 0.00240375, dev acc 0.8083, dev avg loss 0.404192, throughput 3.25322K wps
[Epoch 95 Batch 30/62] avg loss 0.00237881, throughput 3.27011K wps
[Epoch 95 Batch 60/62] avg loss 0.00222527, throughput 3.22702K wps
Begin Testing...
[Epoch 95] train avg loss 0.00234363, dev acc 0.8112, dev avg loss 0.403489, throughput 3.25538K wps
[Epoch 96 Batch 30/62] avg loss 0.0022149, throughput 3.27817K wps
[Epoch 96 Batch 60/62] avg loss 0.00208477, throughput 3.21583K wps
Begin Testing...
[Epoch 96] train avg loss 0.00219463, dev acc 0.8083, dev avg loss 0.407771, throughput 3.2538K wps
[Epoch 97 Batch 30/62] avg loss 0.00219812, throughput 3.28266K wps
[Epoch 97 Batch 60/62] avg loss 0.00222729, throughput 3.20792K wps
Begin Testing...
[Epoch 97] train avg loss 0.00224207, dev acc 0.8112, dev avg loss 0.403784, throughput 3.25137K wps
[Epoch 98 Batch 30/62] avg loss 0.00213654, throughput 3.28825K wps
[Epoch 98 Batch 60/62] avg loss 0.00227316, throughput 3.20459K wps
Begin Testing...
[Epoch 98] train avg loss 0.00225564, dev acc 0.8083, dev avg loss 0.406083, throughput 3.25129K wps
[Epoch 99 Batch 30/62] avg loss 0.00204828, throughput 3.28941K wps
[Epoch 99 Batch 60/62] avg loss 0.0023114, throughput 3.21433K wps
Begin Testing...
[Epoch 99] train avg loss 0.0022113, dev acc 0.8053, dev avg loss 0.403894, throughput 3.25733K wps
[Epoch 100 Batch 30/62] avg loss 0.00208192, throughput 3.30329K wps
[Epoch 100 Batch 60/62] avg loss 0.00203693, throughput 3.23238K wps
Begin Testing...
[Epoch 100] train avg loss 0.00213629, dev acc 0.8083, dev avg loss 0.411558, throughput 3.27389K wps
[Epoch 101 Batch 30/62] avg loss 0.00212773, throughput 3.28692K wps
[Epoch 101 Batch 60/62] avg loss 0.00197113, throughput 3.21048K wps
Begin Testing...
[Epoch 101] train avg loss 0.00206857, dev acc 0.8053, dev avg loss 0.405191, throughput 3.25531K wps
[Epoch 102 Batch 30/62] avg loss 0.00198968, throughput 3.3037K wps
[Epoch 102 Batch 60/62] avg loss 0.00209209, throughput 3.21332K wps
Begin Testing...
[Epoch 102] train avg loss 0.00206124, dev acc 0.8053, dev avg loss 0.406529, throughput 3.26557K wps
[Epoch 103 Batch 30/62] avg loss 0.0019113, throughput 3.31851K wps
[Epoch 103 Batch 60/62] avg loss 0.00201679, throughput 3.21453K wps
Begin Testing...
[Epoch 103] train avg loss 0.00203503, dev acc 0.8053, dev avg loss 0.414982, throughput 3.27095K wps
[Epoch 104 Batch 30/62] avg loss 0.00206698, throughput 3.2882K wps
[Epoch 104 Batch 60/62] avg loss 0.00196031, throughput 3.22147K wps
Begin Testing...
[Epoch 104] train avg loss 0.00207722, dev acc 0.8083, dev avg loss 0.407009, throughput 3.2606K wps
[Epoch 105 Batch 30/62] avg loss 0.00199699, throughput 3.27894K wps
[Epoch 105 Batch 60/62] avg loss 0.00197135, throughput 3.21173K wps
Begin Testing...
[Epoch 105] train avg loss 0.00198402, dev acc 0.8142, dev avg loss 0.405231, throughput 3.25108K wps
[Epoch 106 Batch 30/62] avg loss 0.00180912, throughput 3.27158K wps
[Epoch 106 Batch 60/62] avg loss 0.00187169, throughput 3.21562K wps
Begin Testing...
[Epoch 106] train avg loss 0.00185717, dev acc 0.8024, dev avg loss 0.405989, throughput 3.25046K wps
[Epoch 107 Batch 30/62] avg loss 0.00179079, throughput 3.28874K wps
[Epoch 107 Batch 60/62] avg loss 0.00196448, throughput 3.21793K wps
Begin Testing...
[Epoch 107] train avg loss 0.00187182, dev acc 0.8053, dev avg loss 0.406353, throughput 3.25904K wps
[Epoch 108 Batch 30/62] avg loss 0.0018227, throughput 3.2768K wps
[Epoch 108 Batch 60/62] avg loss 0.00190352, throughput 3.20913K wps
Begin Testing...
[Epoch 108] train avg loss 0.00187463, dev acc 0.8083, dev avg loss 0.408382, throughput 3.2497K wps
[Epoch 109 Batch 30/62] avg loss 0.0017166, throughput 3.29451K wps
[Epoch 109 Batch 60/62] avg loss 0.00171986, throughput 3.16546K wps
Begin Testing...
[Epoch 109] train avg loss 0.00172381, dev acc 0.8053, dev avg loss 0.407148, throughput 3.2348K wps
[Epoch 110 Batch 30/62] avg loss 0.00179798, throughput 3.25914K wps
[Epoch 110 Batch 60/62] avg loss 0.00186306, throughput 3.20471K wps
Begin Testing...
[Epoch 110] train avg loss 0.00186181, dev acc 0.8024, dev avg loss 0.407367, throughput 3.23735K wps
[Epoch 111 Batch 30/62] avg loss 0.00175886, throughput 3.29152K wps
[Epoch 111 Batch 60/62] avg loss 0.0018172, throughput 3.20487K wps
Begin Testing...
[Epoch 111] train avg loss 0.00185204, dev acc 0.8112, dev avg loss 0.416082, throughput 3.25442K wps
[Epoch 112 Batch 30/62] avg loss 0.0016887, throughput 3.27382K wps
[Epoch 112 Batch 60/62] avg loss 0.00170453, throughput 3.22255K wps
Begin Testing...