Namespace(batch_size=50, data_name='CR', dropout=0.5, epochs=200, gpu=0, log_interval=30, model_mode='static')
Use gpu0
maximum length (in tokens): 105
Done! Tokenizing Time=0.06s, #Sentences=3775
SentimentNet(
  (embedding): Embedding(5343 -> 300, float32)
  (encoder): ConvolutionalEncoder(
    (_convs): HybridConcurrent(
      (0): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(3,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (1): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(4,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
      (2): HybridSequential(
        (0): Conv1D(300 -> 100, kernel_size=(5,), stride=(1,))
        (1): HybridLambda(<lambda>)
        (2): Activation(relu)
      )
    )
  )
  (output): HybridSequential(
    (0): Dropout(p = 0.5, axes=())
    (1): Dense(None -> 2, linear)
  )
)
[Epoch 0 Batch 30/62] avg loss 0.013453, throughput 0.49332K wps
[Epoch 0 Batch 60/62] avg loss 0.013201, throughput 9.12905K wps
Begin Testing...
[Epoch 0] train avg loss 0.0135135, dev acc 0.6372, dev avg loss 0.657292, throughput 0.589541K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.013074, throughput 9.17721K wps
[Epoch 1 Batch 60/62] avg loss 0.0131383, throughput 8.83377K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132389, dev acc 0.6372, dev avg loss 0.653948, throughput 8.97021K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130544, throughput 8.66364K wps
[Epoch 2 Batch 60/62] avg loss 0.0130481, throughput 9.07271K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132125, dev acc 0.6372, dev avg loss 0.649604, throughput 8.83062K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0129246, throughput 9.11918K wps
[Epoch 3 Batch 60/62] avg loss 0.012915, throughput 8.60789K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130879, dev acc 0.6372, dev avg loss 0.645271, throughput 8.83703K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0127787, throughput 8.99971K wps
[Epoch 4 Batch 60/62] avg loss 0.0128814, throughput 8.90885K wps
Begin Testing...
[Epoch 4] train avg loss 0.0129779, dev acc 0.6372, dev avg loss 0.641626, throughput 9.01854K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0127575, throughput 8.92341K wps
[Epoch 5 Batch 60/62] avg loss 0.0125757, throughput 8.87927K wps
Begin Testing...
[Epoch 5] train avg loss 0.0128468, dev acc 0.6372, dev avg loss 0.63822, throughput 8.986K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0126182, throughput 9.05278K wps
[Epoch 6 Batch 60/62] avg loss 0.012557, throughput 9.00891K wps
Begin Testing...
[Epoch 6] train avg loss 0.0127479, dev acc 0.6372, dev avg loss 0.634399, throughput 9.0268K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0123676, throughput 9.07123K wps
[Epoch 7 Batch 60/62] avg loss 0.012585, throughput 8.77266K wps
Begin Testing...
[Epoch 7] train avg loss 0.0126254, dev acc 0.6372, dev avg loss 0.630082, throughput 8.90005K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0124733, throughput 8.11682K wps
[Epoch 8 Batch 60/62] avg loss 0.0123281, throughput 8.9684K wps
Begin Testing...
[Epoch 8] train avg loss 0.0125728, dev acc 0.6372, dev avg loss 0.626247, throughput 8.51738K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0124185, throughput 8.91172K wps
[Epoch 9 Batch 60/62] avg loss 0.012148, throughput 8.92059K wps
Begin Testing...
[Epoch 9] train avg loss 0.0124306, dev acc 0.6372, dev avg loss 0.622376, throughput 8.95295K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0122605, throughput 9.01374K wps
[Epoch 10 Batch 60/62] avg loss 0.0123215, throughput 9.07611K wps
Begin Testing...
[Epoch 10] train avg loss 0.0124635, dev acc 0.6401, dev avg loss 0.618335, throughput 9.00136K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0121857, throughput 9.3083K wps
[Epoch 11 Batch 60/62] avg loss 0.0121149, throughput 9.10172K wps
Begin Testing...
[Epoch 11] train avg loss 0.0122647, dev acc 0.6372, dev avg loss 0.614826, throughput 9.23429K wps
[Epoch 12 Batch 30/62] avg loss 0.0120562, throughput 9.09189K wps
[Epoch 12 Batch 60/62] avg loss 0.0120224, throughput 8.9159K wps
Begin Testing...
[Epoch 12] train avg loss 0.0122215, dev acc 0.6401, dev avg loss 0.609992, throughput 8.96861K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0119838, throughput 9.02356K wps
[Epoch 13 Batch 60/62] avg loss 0.0117957, throughput 9.13158K wps
Begin Testing...
[Epoch 13] train avg loss 0.0119851, dev acc 0.6401, dev avg loss 0.605948, throughput 9.10396K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0117255, throughput 9.20371K wps
[Epoch 14 Batch 60/62] avg loss 0.0116741, throughput 9.1456K wps
Begin Testing...
[Epoch 14] train avg loss 0.0118635, dev acc 0.6490, dev avg loss 0.600683, throughput 9.20056K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0118932, throughput 9.0326K wps
[Epoch 15 Batch 60/62] avg loss 0.0115695, throughput 9.07056K wps
Begin Testing...
[Epoch 15] train avg loss 0.0119248, dev acc 0.6549, dev avg loss 0.59635, throughput 9.0842K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0114168, throughput 8.90957K wps
[Epoch 16 Batch 60/62] avg loss 0.011877, throughput 9.1071K wps
Begin Testing...
[Epoch 16] train avg loss 0.0118246, dev acc 0.6726, dev avg loss 0.592291, throughput 8.97878K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0114914, throughput 9.12496K wps
[Epoch 17 Batch 60/62] avg loss 0.0114783, throughput 8.88206K wps
Begin Testing...
[Epoch 17] train avg loss 0.0116381, dev acc 0.6608, dev avg loss 0.586603, throughput 9.03663K wps
[Epoch 18 Batch 30/62] avg loss 0.0114069, throughput 8.88985K wps
[Epoch 18 Batch 60/62] avg loss 0.0113029, throughput 9.20052K wps
Begin Testing...
[Epoch 18] train avg loss 0.0115518, dev acc 0.7139, dev avg loss 0.584607, throughput 9.0751K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0112828, throughput 9.03078K wps
[Epoch 19 Batch 60/62] avg loss 0.0111069, throughput 8.79403K wps
Begin Testing...
[Epoch 19] train avg loss 0.0113467, dev acc 0.6755, dev avg loss 0.576618, throughput 8.9452K wps
[Epoch 20 Batch 30/62] avg loss 0.0110237, throughput 9.15165K wps
[Epoch 20 Batch 60/62] avg loss 0.0111444, throughput 9.1099K wps
Begin Testing...
[Epoch 20] train avg loss 0.0112196, dev acc 0.6814, dev avg loss 0.571544, throughput 9.15182K wps
[Epoch 21 Batch 30/62] avg loss 0.0109504, throughput 8.91073K wps
[Epoch 21 Batch 60/62] avg loss 0.010932, throughput 9.06385K wps
Begin Testing...
[Epoch 21] train avg loss 0.0111048, dev acc 0.7198, dev avg loss 0.566694, throughput 9.02151K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0108412, throughput 9.03622K wps
[Epoch 22 Batch 60/62] avg loss 0.010773, throughput 9.0215K wps
Begin Testing...
[Epoch 22] train avg loss 0.0109251, dev acc 0.6932, dev avg loss 0.561649, throughput 9.0084K wps
[Epoch 23 Batch 30/62] avg loss 0.0109007, throughput 8.83497K wps
[Epoch 23 Batch 60/62] avg loss 0.0106117, throughput 9.25983K wps
Begin Testing...
[Epoch 23] train avg loss 0.0109197, dev acc 0.6991, dev avg loss 0.557275, throughput 9.07533K wps
[Epoch 24 Batch 30/62] avg loss 0.0107054, throughput 8.89435K wps
[Epoch 24 Batch 60/62] avg loss 0.0104663, throughput 8.60671K wps
Begin Testing...
[Epoch 24] train avg loss 0.0106969, dev acc 0.7168, dev avg loss 0.550998, throughput 8.72602K wps
[Epoch 25 Batch 30/62] avg loss 0.0105949, throughput 9.12345K wps
[Epoch 25 Batch 60/62] avg loss 0.0103349, throughput 9.21381K wps
Begin Testing...
[Epoch 25] train avg loss 0.0106174, dev acc 0.7139, dev avg loss 0.546022, throughput 9.19491K wps
[Epoch 26 Batch 30/62] avg loss 0.010135, throughput 9.05103K wps
[Epoch 26 Batch 60/62] avg loss 0.0105833, throughput 8.9891K wps
Begin Testing...
[Epoch 26] train avg loss 0.0104676, dev acc 0.7257, dev avg loss 0.541696, throughput 9.04878K wps
Observed Improvement.
Begin Testing...
[Epoch 27 Batch 30/62] avg loss 0.0103391, throughput 9.24173K wps
[Epoch 27 Batch 60/62] avg loss 0.0101502, throughput 9.10942K wps
Begin Testing...
[Epoch 27] train avg loss 0.0103526, dev acc 0.7168, dev avg loss 0.536232, throughput 9.13777K wps
[Epoch 28 Batch 30/62] avg loss 0.00999801, throughput 8.95888K wps
[Epoch 28 Batch 60/62] avg loss 0.0102347, throughput 8.8371K wps
Begin Testing...
[Epoch 28] train avg loss 0.0101968, dev acc 0.7316, dev avg loss 0.531448, throughput 8.87066K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00996894, throughput 8.91219K wps
[Epoch 29 Batch 60/62] avg loss 0.0100278, throughput 9.03408K wps
Begin Testing...
[Epoch 29] train avg loss 0.010146, dev acc 0.7286, dev avg loss 0.526901, throughput 8.94379K wps
[Epoch 30 Batch 30/62] avg loss 0.00992958, throughput 9.03941K wps
[Epoch 30 Batch 60/62] avg loss 0.00959382, throughput 9.07577K wps
Begin Testing...
[Epoch 30] train avg loss 0.00987068, dev acc 0.7286, dev avg loss 0.52216, throughput 9.08899K wps
[Epoch 31 Batch 30/62] avg loss 0.00956067, throughput 8.94866K wps
[Epoch 31 Batch 60/62] avg loss 0.00998356, throughput 8.94925K wps
Begin Testing...
[Epoch 31] train avg loss 0.00991217, dev acc 0.7286, dev avg loss 0.518666, throughput 8.91316K wps
[Epoch 32 Batch 30/62] avg loss 0.0096141, throughput 8.76619K wps
[Epoch 32 Batch 60/62] avg loss 0.00955195, throughput 9.05307K wps
Begin Testing...
[Epoch 32] train avg loss 0.00966685, dev acc 0.7316, dev avg loss 0.51362, throughput 8.87928K wps
Observed Improvement.
Begin Testing...
[Epoch 33 Batch 30/62] avg loss 0.00950129, throughput 8.97928K wps
[Epoch 33 Batch 60/62] avg loss 0.00953359, throughput 9.21216K wps
Begin Testing...
[Epoch 33] train avg loss 0.00965052, dev acc 0.7404, dev avg loss 0.508783, throughput 9.12611K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00933903, throughput 9.19204K wps
[Epoch 34 Batch 60/62] avg loss 0.00953598, throughput 8.94483K wps
Begin Testing...
[Epoch 34] train avg loss 0.00950503, dev acc 0.7316, dev avg loss 0.506524, throughput 9.11503K wps
[Epoch 35 Batch 30/62] avg loss 0.00922015, throughput 9.22312K wps
[Epoch 35 Batch 60/62] avg loss 0.0092717, throughput 9.02447K wps
Begin Testing...
[Epoch 35] train avg loss 0.00945783, dev acc 0.7463, dev avg loss 0.500936, throughput 9.15184K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.00907356, throughput 9.06956K wps
[Epoch 36 Batch 60/62] avg loss 0.00925711, throughput 8.76336K wps
Begin Testing...
[Epoch 36] train avg loss 0.00926037, dev acc 0.7345, dev avg loss 0.499853, throughput 8.98965K wps
[Epoch 37 Batch 30/62] avg loss 0.00910382, throughput 9.25162K wps
[Epoch 37 Batch 60/62] avg loss 0.00901089, throughput 9.12349K wps
Begin Testing...
[Epoch 37] train avg loss 0.00916189, dev acc 0.7345, dev avg loss 0.496313, throughput 9.21512K wps
[Epoch 38 Batch 30/62] avg loss 0.00890501, throughput 9.29348K wps
[Epoch 38 Batch 60/62] avg loss 0.00907297, throughput 9.12434K wps
Begin Testing...
[Epoch 38] train avg loss 0.00909539, dev acc 0.7404, dev avg loss 0.492768, throughput 9.2325K wps
[Epoch 39 Batch 30/62] avg loss 0.00874948, throughput 9.12379K wps
[Epoch 39 Batch 60/62] avg loss 0.00882199, throughput 8.82559K wps
Begin Testing...
[Epoch 39] train avg loss 0.00895667, dev acc 0.7611, dev avg loss 0.486491, throughput 8.99236K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00877526, throughput 9.31788K wps
[Epoch 40 Batch 60/62] avg loss 0.00874203, throughput 9.0106K wps
Begin Testing...
[Epoch 40] train avg loss 0.00885403, dev acc 0.7581, dev avg loss 0.483959, throughput 9.18908K wps
[Epoch 41 Batch 30/62] avg loss 0.00861547, throughput 8.79846K wps
[Epoch 41 Batch 60/62] avg loss 0.00889995, throughput 9.02198K wps
Begin Testing...
[Epoch 41] train avg loss 0.00884213, dev acc 0.7404, dev avg loss 0.48495, throughput 8.94617K wps
[Epoch 42 Batch 30/62] avg loss 0.00846483, throughput 8.97131K wps
[Epoch 42 Batch 60/62] avg loss 0.00893218, throughput 9.11869K wps
Begin Testing...
[Epoch 42] train avg loss 0.00878803, dev acc 0.7463, dev avg loss 0.478549, throughput 9.02522K wps
[Epoch 43 Batch 30/62] avg loss 0.00840737, throughput 8.72396K wps
[Epoch 43 Batch 60/62] avg loss 0.00846597, throughput 8.95584K wps
Begin Testing...
[Epoch 43] train avg loss 0.00852082, dev acc 0.7552, dev avg loss 0.476659, throughput 8.88921K wps
[Epoch 44 Batch 30/62] avg loss 0.00825164, throughput 9.1735K wps
[Epoch 44 Batch 60/62] avg loss 0.0086298, throughput 8.95585K wps
Begin Testing...
[Epoch 44] train avg loss 0.00855496, dev acc 0.7463, dev avg loss 0.475103, throughput 9.0327K wps
[Epoch 45 Batch 30/62] avg loss 0.00841171, throughput 9.07456K wps
[Epoch 45 Batch 60/62] avg loss 0.00830181, throughput 8.92304K wps
Begin Testing...
[Epoch 45] train avg loss 0.00841617, dev acc 0.7581, dev avg loss 0.471742, throughput 9.0567K wps
[Epoch 46 Batch 30/62] avg loss 0.00797359, throughput 8.95727K wps
[Epoch 46 Batch 60/62] avg loss 0.00830619, throughput 9.11829K wps
Begin Testing...
[Epoch 46] train avg loss 0.00823579, dev acc 0.7522, dev avg loss 0.469274, throughput 9.06373K wps
[Epoch 47 Batch 30/62] avg loss 0.008164, throughput 9.15505K wps
[Epoch 47 Batch 60/62] avg loss 0.0080675, throughput 8.90252K wps
Begin Testing...
[Epoch 47] train avg loss 0.00817848, dev acc 0.7670, dev avg loss 0.465227, throughput 9.00252K wps
Observed Improvement.
Begin Testing...
[Epoch 48 Batch 30/62] avg loss 0.00804088, throughput 9.14932K wps
[Epoch 48 Batch 60/62] avg loss 0.007914, throughput 9.05112K wps
Begin Testing...
[Epoch 48] train avg loss 0.00806024, dev acc 0.7699, dev avg loss 0.464078, throughput 9.13131K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/62] avg loss 0.00817458, throughput 9.21813K wps
[Epoch 49 Batch 60/62] avg loss 0.00773359, throughput 8.93449K wps
Begin Testing...
[Epoch 49] train avg loss 0.00806214, dev acc 0.7552, dev avg loss 0.459824, throughput 9.05782K wps
[Epoch 50 Batch 30/62] avg loss 0.00823322, throughput 8.93999K wps
[Epoch 50 Batch 60/62] avg loss 0.00775749, throughput 8.95163K wps
Begin Testing...
[Epoch 50] train avg loss 0.00804411, dev acc 0.7729, dev avg loss 0.460917, throughput 8.92001K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/62] avg loss 0.00775539, throughput 9.11122K wps
[Epoch 51 Batch 60/62] avg loss 0.00785642, throughput 9.14545K wps
Begin Testing...
[Epoch 51] train avg loss 0.00790299, dev acc 0.7522, dev avg loss 0.456187, throughput 9.14812K wps
[Epoch 52 Batch 30/62] avg loss 0.00780409, throughput 9.30268K wps
[Epoch 52 Batch 60/62] avg loss 0.0077841, throughput 8.82176K wps
Begin Testing...
[Epoch 52] train avg loss 0.00784795, dev acc 0.7552, dev avg loss 0.45388, throughput 9.14103K wps
[Epoch 53 Batch 30/62] avg loss 0.00784294, throughput 9.09104K wps
[Epoch 53 Batch 60/62] avg loss 0.00753989, throughput 8.90841K wps
Begin Testing...
[Epoch 53] train avg loss 0.00780705, dev acc 0.7670, dev avg loss 0.452195, throughput 9.03255K wps
[Epoch 54 Batch 30/62] avg loss 0.00753026, throughput 9.25266K wps
[Epoch 54 Batch 60/62] avg loss 0.00758497, throughput 8.8184K wps
Begin Testing...
[Epoch 54] train avg loss 0.00762992, dev acc 0.7699, dev avg loss 0.45129, throughput 9.09115K wps
[Epoch 55 Batch 30/62] avg loss 0.00741803, throughput 9.14228K wps
[Epoch 55 Batch 60/62] avg loss 0.00750747, throughput 8.99334K wps
Begin Testing...
[Epoch 55] train avg loss 0.00764535, dev acc 0.7788, dev avg loss 0.45077, throughput 9.05066K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00742768, throughput 9.16388K wps
[Epoch 56 Batch 60/62] avg loss 0.00740477, throughput 8.78126K wps
Begin Testing...
[Epoch 56] train avg loss 0.00751038, dev acc 0.7788, dev avg loss 0.448754, throughput 8.93841K wps
Observed Improvement.
Begin Testing...
[Epoch 57 Batch 30/62] avg loss 0.00729724, throughput 8.94249K wps
[Epoch 57 Batch 60/62] avg loss 0.00764655, throughput 9.02877K wps
Begin Testing...
[Epoch 57] train avg loss 0.00755091, dev acc 0.7729, dev avg loss 0.445422, throughput 9.02003K wps
[Epoch 58 Batch 30/62] avg loss 0.00717416, throughput 9.10042K wps
[Epoch 58 Batch 60/62] avg loss 0.00735724, throughput 8.96535K wps
Begin Testing...
[Epoch 58] train avg loss 0.00732077, dev acc 0.7640, dev avg loss 0.442795, throughput 9.01293K wps
[Epoch 59 Batch 30/62] avg loss 0.00720923, throughput 8.94719K wps
[Epoch 59 Batch 60/62] avg loss 0.00732043, throughput 8.9918K wps
Begin Testing...
[Epoch 59] train avg loss 0.0073602, dev acc 0.7699, dev avg loss 0.441188, throughput 8.99949K wps
[Epoch 60 Batch 30/62] avg loss 0.00704739, throughput 8.97699K wps
[Epoch 60 Batch 60/62] avg loss 0.0071898, throughput 8.55141K wps
Begin Testing...
[Epoch 60] train avg loss 0.00720779, dev acc 0.7670, dev avg loss 0.440637, throughput 8.85061K wps
[Epoch 61 Batch 30/62] avg loss 0.0072143, throughput 9.08087K wps
[Epoch 61 Batch 60/62] avg loss 0.00690353, throughput 9.13586K wps
Begin Testing...
[Epoch 61] train avg loss 0.00711217, dev acc 0.7729, dev avg loss 0.438115, throughput 9.14008K wps
[Epoch 62 Batch 30/62] avg loss 0.0069226, throughput 9.12344K wps
[Epoch 62 Batch 60/62] avg loss 0.00702295, throughput 8.91584K wps
Begin Testing...
[Epoch 62] train avg loss 0.0071209, dev acc 0.7699, dev avg loss 0.436896, throughput 9.04998K wps
[Epoch 63 Batch 30/62] avg loss 0.00675408, throughput 9.12852K wps
[Epoch 63 Batch 60/62] avg loss 0.0071274, throughput 9.01125K wps
Begin Testing...
[Epoch 63] train avg loss 0.00706786, dev acc 0.7729, dev avg loss 0.435631, throughput 9.10021K wps
[Epoch 64 Batch 30/62] avg loss 0.006775, throughput 9.04138K wps
[Epoch 64 Batch 60/62] avg loss 0.00699634, throughput 8.95144K wps
Begin Testing...
[Epoch 64] train avg loss 0.00689939, dev acc 0.7729, dev avg loss 0.434392, throughput 8.93991K wps
[Epoch 65 Batch 30/62] avg loss 0.00673094, throughput 9.2376K wps
[Epoch 65 Batch 60/62] avg loss 0.0067916, throughput 9.13617K wps
Begin Testing...
[Epoch 65] train avg loss 0.00689955, dev acc 0.7935, dev avg loss 0.442285, throughput 9.16779K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.0066437, throughput 8.95322K wps
[Epoch 66 Batch 60/62] avg loss 0.00660363, throughput 8.67632K wps
Begin Testing...
[Epoch 66] train avg loss 0.00668272, dev acc 0.7876, dev avg loss 0.433203, throughput 8.79881K wps
[Epoch 67 Batch 30/62] avg loss 0.00670346, throughput 9.14089K wps
[Epoch 67 Batch 60/62] avg loss 0.00660382, throughput 8.81072K wps
Begin Testing...
[Epoch 67] train avg loss 0.00670306, dev acc 0.7788, dev avg loss 0.431023, throughput 8.93328K wps
[Epoch 68 Batch 30/62] avg loss 0.00643375, throughput 9.23371K wps
[Epoch 68 Batch 60/62] avg loss 0.00655995, throughput 8.97087K wps
Begin Testing...
[Epoch 68] train avg loss 0.00656565, dev acc 0.7817, dev avg loss 0.428555, throughput 9.12207K wps
[Epoch 69 Batch 30/62] avg loss 0.00665443, throughput 9.0682K wps
[Epoch 69 Batch 60/62] avg loss 0.00633594, throughput 9.13622K wps
Begin Testing...
[Epoch 69] train avg loss 0.00654743, dev acc 0.7788, dev avg loss 0.427685, throughput 9.12301K wps
[Epoch 70 Batch 30/62] avg loss 0.00631656, throughput 9.19992K wps
[Epoch 70 Batch 60/62] avg loss 0.00651838, throughput 9.16821K wps
Begin Testing...
[Epoch 70] train avg loss 0.00646272, dev acc 0.7817, dev avg loss 0.42719, throughput 9.21321K wps
[Epoch 71 Batch 30/62] avg loss 0.00621552, throughput 8.96154K wps
[Epoch 71 Batch 60/62] avg loss 0.00639193, throughput 9.02224K wps
Begin Testing...
[Epoch 71] train avg loss 0.00639575, dev acc 0.7847, dev avg loss 0.424682, throughput 9.0197K wps
[Epoch 72 Batch 30/62] avg loss 0.00625032, throughput 9.11981K wps
[Epoch 72 Batch 60/62] avg loss 0.00625705, throughput 9.04386K wps
Begin Testing...
[Epoch 72] train avg loss 0.00633731, dev acc 0.7965, dev avg loss 0.430137, throughput 9.11046K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/62] avg loss 0.00625451, throughput 8.94292K wps
[Epoch 73 Batch 60/62] avg loss 0.00623312, throughput 9.04673K wps
Begin Testing...
[Epoch 73] train avg loss 0.00632865, dev acc 0.7817, dev avg loss 0.422402, throughput 9.02874K wps
[Epoch 74 Batch 30/62] avg loss 0.00616347, throughput 9.05903K wps
[Epoch 74 Batch 60/62] avg loss 0.00612093, throughput 8.83K wps
Begin Testing...
[Epoch 74] train avg loss 0.00622161, dev acc 0.7906, dev avg loss 0.422476, throughput 8.92923K wps
[Epoch 75 Batch 30/62] avg loss 0.00600764, throughput 9.05784K wps
[Epoch 75 Batch 60/62] avg loss 0.0063881, throughput 8.84396K wps
Begin Testing...
[Epoch 75] train avg loss 0.00623328, dev acc 0.7817, dev avg loss 0.42063, throughput 8.91742K wps
[Epoch 76 Batch 30/62] avg loss 0.00596074, throughput 9.15342K wps
[Epoch 76 Batch 60/62] avg loss 0.00605071, throughput 9.04625K wps
Begin Testing...
[Epoch 76] train avg loss 0.00606279, dev acc 0.7876, dev avg loss 0.419597, throughput 9.12582K wps
[Epoch 77 Batch 30/62] avg loss 0.00615905, throughput 8.77393K wps
[Epoch 77 Batch 60/62] avg loss 0.00575907, throughput 9.02636K wps
Begin Testing...
[Epoch 77] train avg loss 0.00606603, dev acc 0.7994, dev avg loss 0.419812, throughput 8.96211K wps
Observed Improvement.
Begin Testing...
[Epoch 78 Batch 30/62] avg loss 0.00583668, throughput 8.9682K wps
[Epoch 78 Batch 60/62] avg loss 0.0060122, throughput 8.86471K wps
Begin Testing...
[Epoch 78] train avg loss 0.00594904, dev acc 0.7935, dev avg loss 0.4176, throughput 8.98385K wps
[Epoch 79 Batch 30/62] avg loss 0.00563676, throughput 9.28441K wps
[Epoch 79 Batch 60/62] avg loss 0.00590633, throughput 8.932K wps
Begin Testing...
[Epoch 79] train avg loss 0.00586357, dev acc 0.7935, dev avg loss 0.416439, throughput 9.13438K wps
[Epoch 80 Batch 30/62] avg loss 0.00569952, throughput 9.02763K wps
[Epoch 80 Batch 60/62] avg loss 0.00580108, throughput 8.97127K wps
Begin Testing...
[Epoch 80] train avg loss 0.00580211, dev acc 0.7994, dev avg loss 0.415802, throughput 9.03062K wps
Observed Improvement.
Begin Testing...
[Epoch 81 Batch 30/62] avg loss 0.00573221, throughput 8.74117K wps
[Epoch 81 Batch 60/62] avg loss 0.00589445, throughput 8.95065K wps
Begin Testing...
[Epoch 81] train avg loss 0.00585067, dev acc 0.7994, dev avg loss 0.419817, throughput 8.82432K wps
Observed Improvement.
Begin Testing...
[Epoch 82 Batch 30/62] avg loss 0.00546648, throughput 9.23197K wps
[Epoch 82 Batch 60/62] avg loss 0.00568445, throughput 8.88261K wps
Begin Testing...
[Epoch 82] train avg loss 0.00562947, dev acc 0.7965, dev avg loss 0.414566, throughput 9.08614K wps
[Epoch 83 Batch 30/62] avg loss 0.0057972, throughput 9.07968K wps
[Epoch 83 Batch 60/62] avg loss 0.00553064, throughput 9.14001K wps
Begin Testing...
[Epoch 83] train avg loss 0.00571868, dev acc 0.7935, dev avg loss 0.413446, throughput 9.07807K wps
[Epoch 84 Batch 30/62] avg loss 0.00552332, throughput 9.15345K wps
[Epoch 84 Batch 60/62] avg loss 0.00561941, throughput 8.66251K wps
Begin Testing...
[Epoch 84] train avg loss 0.00557977, dev acc 0.7994, dev avg loss 0.416879, throughput 8.8866K wps
Observed Improvement.
Begin Testing...
[Epoch 85 Batch 30/62] avg loss 0.00585028, throughput 9.14508K wps
[Epoch 85 Batch 60/62] avg loss 0.00520061, throughput 8.80065K wps
Begin Testing...
[Epoch 85] train avg loss 0.00563304, dev acc 0.8112, dev avg loss 0.413212, throughput 8.9457K wps
Observed Improvement.
Begin Testing...
[Epoch 86 Batch 30/62] avg loss 0.00535972, throughput 9.10336K wps
[Epoch 86 Batch 60/62] avg loss 0.00561183, throughput 9.02806K wps
Begin Testing...
[Epoch 86] train avg loss 0.00556744, dev acc 0.8083, dev avg loss 0.412041, throughput 9.0962K wps
[Epoch 87 Batch 30/62] avg loss 0.00535416, throughput 9.09546K wps
[Epoch 87 Batch 60/62] avg loss 0.00561084, throughput 9.09988K wps
Begin Testing...
[Epoch 87] train avg loss 0.00559626, dev acc 0.8024, dev avg loss 0.409336, throughput 9.08204K wps
[Epoch 88 Batch 30/62] avg loss 0.0054894, throughput 9.11604K wps
[Epoch 88 Batch 60/62] avg loss 0.00527885, throughput 8.78904K wps
Begin Testing...
[Epoch 88] train avg loss 0.00540798, dev acc 0.8083, dev avg loss 0.409479, throughput 9.01068K wps
[Epoch 89 Batch 30/62] avg loss 0.00518409, throughput 8.83318K wps
[Epoch 89 Batch 60/62] avg loss 0.00526731, throughput 9.02334K wps
Begin Testing...
[Epoch 89] train avg loss 0.00531131, dev acc 0.8053, dev avg loss 0.408089, throughput 8.96164K wps
[Epoch 90 Batch 30/62] avg loss 0.00533525, throughput 8.96605K wps
[Epoch 90 Batch 60/62] avg loss 0.00512738, throughput 9.0132K wps
Begin Testing...
[Epoch 90] train avg loss 0.00526927, dev acc 0.8053, dev avg loss 0.408302, throughput 8.95139K wps
[Epoch 91 Batch 30/62] avg loss 0.00493801, throughput 9.22844K wps
[Epoch 91 Batch 60/62] avg loss 0.00528775, throughput 8.98636K wps
Begin Testing...
[Epoch 91] train avg loss 0.00521646, dev acc 0.8142, dev avg loss 0.410783, throughput 9.0778K wps
Observed Improvement.
Begin Testing...
[Epoch 92 Batch 30/62] avg loss 0.00506086, throughput 8.6871K wps
[Epoch 92 Batch 60/62] avg loss 0.00515272, throughput 8.98824K wps
Begin Testing...
[Epoch 92] train avg loss 0.00517724, dev acc 0.8053, dev avg loss 0.405349, throughput 8.82757K wps
[Epoch 93 Batch 30/62] avg loss 0.00518236, throughput 8.85026K wps
[Epoch 93 Batch 60/62] avg loss 0.00498202, throughput 8.99683K wps
Begin Testing...
[Epoch 93] train avg loss 0.00512521, dev acc 0.8083, dev avg loss 0.40516, throughput 8.94722K wps
[Epoch 94 Batch 30/62] avg loss 0.00486179, throughput 8.92582K wps
[Epoch 94 Batch 60/62] avg loss 0.00523425, throughput 8.89954K wps
Begin Testing...
[Epoch 94] train avg loss 0.00510718, dev acc 0.8083, dev avg loss 0.407278, throughput 8.93643K wps
[Epoch 95 Batch 30/62] avg loss 0.00495379, throughput 9.13593K wps
[Epoch 95 Batch 60/62] avg loss 0.00493725, throughput 8.78141K wps
Begin Testing...
[Epoch 95] train avg loss 0.00498022, dev acc 0.8083, dev avg loss 0.403255, throughput 8.92784K wps
[Epoch 96 Batch 30/62] avg loss 0.00474293, throughput 9.14296K wps
[Epoch 96 Batch 60/62] avg loss 0.00493662, throughput 8.79299K wps
Begin Testing...
[Epoch 96] train avg loss 0.00488028, dev acc 0.8083, dev avg loss 0.402639, throughput 8.99337K wps
[Epoch 97 Batch 30/62] avg loss 0.00480091, throughput 9.15901K wps
[Epoch 97 Batch 60/62] avg loss 0.00484629, throughput 8.89604K wps
Begin Testing...
[Epoch 97] train avg loss 0.00486135, dev acc 0.8053, dev avg loss 0.40221, throughput 9.0548K wps
[Epoch 98 Batch 30/62] avg loss 0.00473254, throughput 8.9285K wps
[Epoch 98 Batch 60/62] avg loss 0.00476596, throughput 9.02384K wps
Begin Testing...
[Epoch 98] train avg loss 0.00483632, dev acc 0.8142, dev avg loss 0.402328, throughput 9.00753K wps
Observed Improvement.
Begin Testing...
[Epoch 99 Batch 30/62] avg loss 0.00475428, throughput 9.09418K wps
[Epoch 99 Batch 60/62] avg loss 0.00463468, throughput 8.70993K wps
Begin Testing...
[Epoch 99] train avg loss 0.00478179, dev acc 0.8142, dev avg loss 0.403726, throughput 8.86199K wps
Observed Improvement.
Begin Testing...
[Epoch 100 Batch 30/62] avg loss 0.00463485, throughput 9.0661K wps
[Epoch 100 Batch 60/62] avg loss 0.00456497, throughput 9.11226K wps
Begin Testing...
[Epoch 100] train avg loss 0.00466355, dev acc 0.8142, dev avg loss 0.403493, throughput 9.11849K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00481887, throughput 9.09692K wps
[Epoch 101 Batch 60/62] avg loss 0.0045583, throughput 9.01726K wps
Begin Testing...
[Epoch 101] train avg loss 0.00475245, dev acc 0.8142, dev avg loss 0.402836, throughput 9.08722K wps
Observed Improvement.
Begin Testing...
[Epoch 102 Batch 30/62] avg loss 0.00448564, throughput 9.00009K wps
[Epoch 102 Batch 60/62] avg loss 0.00464981, throughput 8.91552K wps
Begin Testing...
[Epoch 102] train avg loss 0.00465742, dev acc 0.8024, dev avg loss 0.40219, throughput 8.98908K wps
[Epoch 103 Batch 30/62] avg loss 0.00452231, throughput 8.97358K wps
[Epoch 103 Batch 60/62] avg loss 0.00465211, throughput 9.15013K wps
Begin Testing...
[Epoch 103] train avg loss 0.00458741, dev acc 0.8024, dev avg loss 0.398434, throughput 9.09403K wps
[Epoch 104 Batch 30/62] avg loss 0.00454715, throughput 9.22589K wps
[Epoch 104 Batch 60/62] avg loss 0.00445088, throughput 9.06325K wps
Begin Testing...
[Epoch 104] train avg loss 0.00455996, dev acc 0.8171, dev avg loss 0.397579, throughput 9.17163K wps
Observed Improvement.
Begin Testing...
[Epoch 105 Batch 30/62] avg loss 0.00448134, throughput 9.22846K wps
[Epoch 105 Batch 60/62] avg loss 0.00430189, throughput 8.82099K wps
Begin Testing...
[Epoch 105] train avg loss 0.00447611, dev acc 0.8171, dev avg loss 0.397017, throughput 8.98474K wps
Observed Improvement.
Begin Testing...
[Epoch 106 Batch 30/62] avg loss 0.00447351, throughput 9.09657K wps
[Epoch 106 Batch 60/62] avg loss 0.00449086, throughput 9.11867K wps
Begin Testing...
[Epoch 106] train avg loss 0.00451789, dev acc 0.8201, dev avg loss 0.396602, throughput 9.13505K wps
Observed Improvement.
Begin Testing...
[Epoch 107 Batch 30/62] avg loss 0.00435187, throughput 8.98636K wps
[Epoch 107 Batch 60/62] avg loss 0.0043306, throughput 8.8987K wps
Begin Testing...
[Epoch 107] train avg loss 0.00444305, dev acc 0.8201, dev avg loss 0.396113, throughput 8.92181K wps
Observed Improvement.
Begin Testing...
[Epoch 108 Batch 30/62] avg loss 0.00427968, throughput 9.09344K wps
[Epoch 108 Batch 60/62] avg loss 0.00447551, throughput 8.72448K wps
Begin Testing...
[Epoch 108] train avg loss 0.00442618, dev acc 0.8230, dev avg loss 0.397202, throughput 8.89388K wps
Observed Improvement.
Begin Testing...
[Epoch 109 Batch 30/62] avg loss 0.00408007, throughput 9.03996K wps
[Epoch 109 Batch 60/62] avg loss 0.00440535, throughput 9.09747K wps
Begin Testing...
[Epoch 109] train avg loss 0.00427865, dev acc 0.8053, dev avg loss 0.397681, throughput 9.09712K wps
[Epoch 110 Batch 30/62] avg loss 0.00411403, throughput 9.23832K wps
[Epoch 110 Batch 60/62] avg loss 0.00427118, throughput 8.88109K wps
Begin Testing...
[Epoch 110] train avg loss 0.00426841, dev acc 0.8024, dev avg loss 0.396658, throughput 9.02305K wps
[Epoch 111 Batch 30/62] avg loss 0.00414061, throughput 9.2762K wps
[Epoch 111 Batch 60/62] avg loss 0.00406577, throughput 9.11527K wps
Begin Testing...
[Epoch 111] train avg loss 0.00423209, dev acc 0.8230, dev avg loss 0.394994, throughput 9.22254K wps
Observed Improvement.
Begin Testing...
[Epoch 112 Batch 30/62] avg loss 0.00412225, throughput 8.99377K wps
[Epoch 112 Batch 60/62] avg loss 0.00429047, throughput 9.09358K wps
Begin Testing...
[Epoch 112] train avg loss 0.0042493, dev acc 0.7994, dev avg loss 0.396303, throughput 9.01825K wps
[Epoch 113 Batch 30/62] avg loss 0.00397626, throughput 9.03309K wps
[Epoch 113 Batch 60/62] avg loss 0.00417353, throughput 8.97218K wps
Begin Testing...
[Epoch 113] train avg loss 0.00416474, dev acc 0.8083, dev avg loss 0.39578, throughput 9.03429K wps
[Epoch 114 Batch 30/62] avg loss 0.00403221, throughput 8.96323K wps
[Epoch 114 Batch 60/62] avg loss 0.00395257, throughput 8.86574K wps
Begin Testing...
[Epoch 114] train avg loss 0.00407784, dev acc 0.8053, dev avg loss 0.398034, throughput 8.94591K wps
[Epoch 115 Batch 30/62] avg loss 0.00417138, throughput 8.91629K wps
[Epoch 115 Batch 60/62] avg loss 0.00388626, throughput 8.83747K wps
Begin Testing...
[Epoch 115] train avg loss 0.00409309, dev acc 0.8201, dev avg loss 0.393904, throughput 8.854K wps
[Epoch 116 Batch 30/62] avg loss 0.00402621, throughput 8.7462K wps
[Epoch 116 Batch 60/62] avg loss 0.00398376, throughput 9.20851K wps
Begin Testing...
[Epoch 116] train avg loss 0.00405486, dev acc 0.8289, dev avg loss 0.394098, throughput 9.00486K wps
Observed Improvement.
Begin Testing...
[Epoch 117 Batch 30/62] avg loss 0.0039059, throughput 8.84499K wps
[Epoch 117 Batch 60/62] avg loss 0.00393177, throughput 9.14318K wps
Begin Testing...
[Epoch 117] train avg loss 0.003971, dev acc 0.8289, dev avg loss 0.394488, throughput 9.02212K wps
Observed Improvement.
Begin Testing...
[Epoch 118 Batch 30/62] avg loss 0.00392672, throughput 8.85206K wps
[Epoch 118 Batch 60/62] avg loss 0.00390342, throughput 8.94748K wps
Begin Testing...
[Epoch 118] train avg loss 0.00393953, dev acc 0.8083, dev avg loss 0.396862, throughput 8.94935K wps
[Epoch 119 Batch 30/62] avg loss 0.00400084, throughput 9.0748K wps
[Epoch 119 Batch 60/62] avg loss 0.00374766, throughput 8.80201K wps
Begin Testing...
[Epoch 119] train avg loss 0.00394947, dev acc 0.8053, dev avg loss 0.394647, throughput 8.92954K wps
[Epoch 120 Batch 30/62] avg loss 0.00387081, throughput 9.15477K wps
[Epoch 120 Batch 60/62] avg loss 0.0037549, throughput 8.92637K wps
Begin Testing...
[Epoch 120] train avg loss 0.00385584, dev acc 0.8289, dev avg loss 0.394982, throughput 9.0107K wps
Observed Improvement.
Begin Testing...
[Epoch 121 Batch 30/62] avg loss 0.00366665, throughput 8.98426K wps
[Epoch 121 Batch 60/62] avg loss 0.0037568, throughput 8.75759K wps
Begin Testing...
[Epoch 121] train avg loss 0.00375633, dev acc 0.8053, dev avg loss 0.395441, throughput 8.85356K wps
[Epoch 122 Batch 30/62] avg loss 0.00378129, throughput 9.1613K wps
[Epoch 122 Batch 60/62] avg loss 0.00370164, throughput 9.17264K wps
Begin Testing...
[Epoch 122] train avg loss 0.00376073, dev acc 0.8083, dev avg loss 0.393203, throughput 9.19701K wps
[Epoch 123 Batch 30/62] avg loss 0.00362482, throughput 8.76981K wps
[Epoch 123 Batch 60/62] avg loss 0.00367629, throughput 8.99487K wps
Begin Testing...
[Epoch 123] train avg loss 0.00366797, dev acc 0.8230, dev avg loss 0.390909, throughput 8.88998K wps
[Epoch 124 Batch 30/62] avg loss 0.00364845, throughput 9.27235K wps
[Epoch 124 Batch 60/62] avg loss 0.00380339, throughput 9.08394K wps
Begin Testing...
[Epoch 124] train avg loss 0.00382997, dev acc 0.8319, dev avg loss 0.39368, throughput 9.20464K wps
Observed Improvement.
Begin Testing...
[Epoch 125 Batch 30/62] avg loss 0.00366432, throughput 8.96753K wps
[Epoch 125 Batch 60/62] avg loss 0.00354528, throughput 8.82482K wps
Begin Testing...
[Epoch 125] train avg loss 0.00365373, dev acc 0.8201, dev avg loss 0.389834, throughput 8.9451K wps
[Epoch 126 Batch 30/62] avg loss 0.00378416, throughput 9.23985K wps
[Epoch 126 Batch 60/62] avg loss 0.00342159, throughput 8.98856K wps
Begin Testing...
[Epoch 126] train avg loss 0.00365428, dev acc 0.8142, dev avg loss 0.390618, throughput 9.07669K wps
[Epoch 127 Batch 30/62] avg loss 0.00363103, throughput 9.06534K wps
[Epoch 127 Batch 60/62] avg loss 0.00348902, throughput 8.76511K wps
Begin Testing...
[Epoch 127] train avg loss 0.00362081, dev acc 0.8142, dev avg loss 0.388904, throughput 8.89297K wps
[Epoch 128 Batch 30/62] avg loss 0.00347182, throughput 9.2482K wps
[Epoch 128 Batch 60/62] avg loss 0.00346593, throughput 8.89032K wps
Begin Testing...
[Epoch 128] train avg loss 0.00357137, dev acc 0.8289, dev avg loss 0.38827, throughput 9.0295K wps
[Epoch 129 Batch 30/62] avg loss 0.0033932, throughput 8.68615K wps
[Epoch 129 Batch 60/62] avg loss 0.00351125, throughput 9.10345K wps
Begin Testing...
[Epoch 129] train avg loss 0.0034876, dev acc 0.8201, dev avg loss 0.388266, throughput 8.92583K wps
[Epoch 130 Batch 30/62] avg loss 0.00352981, throughput 8.93529K wps
[Epoch 130 Batch 60/62] avg loss 0.00358471, throughput 9.00646K wps
Begin Testing...
[Epoch 130] train avg loss 0.00362349, dev acc 0.8201, dev avg loss 0.388051, throughput 8.94918K wps
[Epoch 131 Batch 30/62] avg loss 0.00345088, throughput 8.89187K wps
[Epoch 131 Batch 60/62] avg loss 0.0033036, throughput 9.06718K wps
Begin Testing...
[Epoch 131] train avg loss 0.0034338, dev acc 0.8112, dev avg loss 0.389243, throughput 8.96605K wps
[Epoch 132 Batch 30/62] avg loss 0.00343453, throughput 8.89566K wps
[Epoch 132 Batch 60/62] avg loss 0.00334551, throughput 8.7095K wps
Begin Testing...
[Epoch 132] train avg loss 0.0034884, dev acc 0.8230, dev avg loss 0.39956, throughput 8.78389K wps
[Epoch 133 Batch 30/62] avg loss 0.00316877, throughput 9.02328K wps
[Epoch 133 Batch 60/62] avg loss 0.00355992, throughput 9.05177K wps
Begin Testing...
[Epoch 133] train avg loss 0.0034157, dev acc 0.8260, dev avg loss 0.387647, throughput 9.0643K wps
[Epoch 134 Batch 30/62] avg loss 0.00342859, throughput 9.20945K wps
[Epoch 134 Batch 60/62] avg loss 0.00315565, throughput 8.79101K wps
Begin Testing...
[Epoch 134] train avg loss 0.00331059, dev acc 0.8142, dev avg loss 0.392398, throughput 9.02375K wps
[Epoch 135 Batch 30/62] avg loss 0.00326054, throughput 9.12689K wps
[Epoch 135 Batch 60/62] avg loss 0.00344253, throughput 8.88384K wps
Begin Testing...
[Epoch 135] train avg loss 0.00336511, dev acc 0.8083, dev avg loss 0.389527, throughput 8.97539K wps
[Epoch 136 Batch 30/62] avg loss 0.00337316, throughput 9.05139K wps
[Epoch 136 Batch 60/62] avg loss 0.00315617, throughput 8.97102K wps
Begin Testing...
[Epoch 136] train avg loss 0.00334597, dev acc 0.8112, dev avg loss 0.387593, throughput 9.03671K wps
[Epoch 137 Batch 30/62] avg loss 0.00320144, throughput 9.08666K wps
[Epoch 137 Batch 60/62] avg loss 0.00334787, throughput 9.03936K wps
Begin Testing...
[Epoch 137] train avg loss 0.00330091, dev acc 0.8260, dev avg loss 0.386417, throughput 9.08751K wps
[Epoch 138 Batch 30/62] avg loss 0.00307458, throughput 9.0602K wps
[Epoch 138 Batch 60/62] avg loss 0.00328355, throughput 9.03109K wps
Begin Testing...
[Epoch 138] train avg loss 0.0032182, dev acc 0.8112, dev avg loss 0.387444, throughput 9.08537K wps
[Epoch 139 Batch 30/62] avg loss 0.00313547, throughput 9.10462K wps
[Epoch 139 Batch 60/62] avg loss 0.00327855, throughput 9.09824K wps
Begin Testing...
[Epoch 139] train avg loss 0.0032224, dev acc 0.8171, dev avg loss 0.386554, throughput 9.08344K wps
[Epoch 140 Batch 30/62] avg loss 0.00301249, throughput 8.97486K wps
[Epoch 140 Batch 60/62] avg loss 0.00304567, throughput 9.07389K wps
Begin Testing...
[Epoch 140] train avg loss 0.00304191, dev acc 0.8260, dev avg loss 0.386758, throughput 9.04941K wps
[Epoch 141 Batch 30/62] avg loss 0.00303158, throughput 8.9675K wps
[Epoch 141 Batch 60/62] avg loss 0.00322976, throughput 8.82688K wps
Begin Testing...
[Epoch 141] train avg loss 0.00315499, dev acc 0.8319, dev avg loss 0.387379, throughput 8.86207K wps
Observed Improvement.
Begin Testing...
[Epoch 142 Batch 30/62] avg loss 0.00308603, throughput 9.1894K wps
[Epoch 142 Batch 60/62] avg loss 0.00310452, throughput 9.10372K wps
Begin Testing...
[Epoch 142] train avg loss 0.00312417, dev acc 0.8112, dev avg loss 0.386232, throughput 9.17468K wps
[Epoch 143 Batch 30/62] avg loss 0.00299578, throughput 9.24283K wps
[Epoch 143 Batch 60/62] avg loss 0.00308426, throughput 9.11159K wps
Begin Testing...
[Epoch 143] train avg loss 0.00311183, dev acc 0.8378, dev avg loss 0.389357, throughput 9.20676K wps
Observed Improvement.
Begin Testing...
[Epoch 144 Batch 30/62] avg loss 0.0030766, throughput 8.89996K wps
[Epoch 144 Batch 60/62] avg loss 0.00292835, throughput 8.83265K wps
Begin Testing...
[Epoch 144] train avg loss 0.00300049, dev acc 0.8260, dev avg loss 0.38562, throughput 8.84402K wps
[Epoch 145 Batch 30/62] avg loss 0.00287975, throughput 8.90416K wps
[Epoch 145 Batch 60/62] avg loss 0.00301667, throughput 8.86395K wps
Begin Testing...
[Epoch 145] train avg loss 0.00300767, dev acc 0.8378, dev avg loss 0.386467, throughput 8.91733K wps
Observed Improvement.
Begin Testing...
[Epoch 146 Batch 30/62] avg loss 0.00311585, throughput 8.9808K wps
[Epoch 146 Batch 60/62] avg loss 0.00309011, throughput 8.99024K wps
Begin Testing...
[Epoch 146] train avg loss 0.00313143, dev acc 0.8112, dev avg loss 0.38638, throughput 9.01623K wps
[Epoch 147 Batch 30/62] avg loss 0.00287536, throughput 8.96505K wps
[Epoch 147 Batch 60/62] avg loss 0.00298805, throughput 8.74396K wps
Begin Testing...
[Epoch 147] train avg loss 0.00298071, dev acc 0.8319, dev avg loss 0.385675, throughput 8.83364K wps
[Epoch 148 Batch 30/62] avg loss 0.00294141, throughput 9.29174K wps
[Epoch 148 Batch 60/62] avg loss 0.0027945, throughput 8.94962K wps
Begin Testing...
[Epoch 148] train avg loss 0.00287072, dev acc 0.8112, dev avg loss 0.388295, throughput 9.08718K wps
[Epoch 149 Batch 30/62] avg loss 0.00294438, throughput 8.76378K wps
[Epoch 149 Batch 60/62] avg loss 0.00286377, throughput 8.71107K wps
Begin Testing...
[Epoch 149] train avg loss 0.00292101, dev acc 0.8112, dev avg loss 0.386733, throughput 8.76501K wps
[Epoch 150 Batch 30/62] avg loss 0.00277799, throughput 9.26342K wps
[Epoch 150 Batch 60/62] avg loss 0.00291291, throughput 9.04197K wps
Begin Testing...
[Epoch 150] train avg loss 0.0028551, dev acc 0.8142, dev avg loss 0.38604, throughput 9.12527K wps
[Epoch 151 Batch 30/62] avg loss 0.00273367, throughput 9.24585K wps
[Epoch 151 Batch 60/62] avg loss 0.00286403, throughput 9.10739K wps
Begin Testing...
[Epoch 151] train avg loss 0.00282954, dev acc 0.8171, dev avg loss 0.393409, throughput 9.15496K wps
[Epoch 152 Batch 30/62] avg loss 0.00297258, throughput 9.07506K wps
[Epoch 152 Batch 60/62] avg loss 0.00261307, throughput 8.95487K wps
Begin Testing...
[Epoch 152] train avg loss 0.0028161, dev acc 0.8378, dev avg loss 0.385459, throughput 9.07102K wps
Observed Improvement.
Begin Testing...
[Epoch 153 Batch 30/62] avg loss 0.0028257, throughput 9.1263K wps
[Epoch 153 Batch 60/62] avg loss 0.00290172, throughput 9.02014K wps
Begin Testing...
[Epoch 153] train avg loss 0.00292476, dev acc 0.8083, dev avg loss 0.387764, throughput 9.0591K wps
[Epoch 154 Batch 30/62] avg loss 0.002469, throughput 8.98226K wps
[Epoch 154 Batch 60/62] avg loss 0.00280193, throughput 8.87798K wps
Begin Testing...
[Epoch 154] train avg loss 0.00266624, dev acc 0.8142, dev avg loss 0.38539, throughput 8.90888K wps
[Epoch 155 Batch 30/62] avg loss 0.00265952, throughput 8.95113K wps
[Epoch 155 Batch 60/62] avg loss 0.00266436, throughput 8.99345K wps
Begin Testing...
[Epoch 155] train avg loss 0.00267206, dev acc 0.8230, dev avg loss 0.384845, throughput 9.0048K wps
[Epoch 156 Batch 30/62] avg loss 0.00273382, throughput 8.84691K wps
[Epoch 156 Batch 60/62] avg loss 0.00266535, throughput 8.97226K wps
Begin Testing...
[Epoch 156] train avg loss 0.00272663, dev acc 0.8201, dev avg loss 0.384864, throughput 8.86111K wps
[Epoch 157 Batch 30/62] avg loss 0.00257144, throughput 8.85213K wps
[Epoch 157 Batch 60/62] avg loss 0.00272462, throughput 8.97378K wps
Begin Testing...
[Epoch 157] train avg loss 0.00272304, dev acc 0.8348, dev avg loss 0.38795, throughput 8.94584K wps
[Epoch 158 Batch 30/62] avg loss 0.00250772, throughput 9.0975K wps
[Epoch 158 Batch 60/62] avg loss 0.0027239, throughput 8.77626K wps
Begin Testing...
[Epoch 158] train avg loss 0.00264137, dev acc 0.8260, dev avg loss 0.384448, throughput 8.90463K wps
[Epoch 159 Batch 30/62] avg loss 0.00251818, throughput 9.09351K wps
[Epoch 159 Batch 60/62] avg loss 0.00268575, throughput 8.89715K wps
Begin Testing...
[Epoch 159] train avg loss 0.00265583, dev acc 0.8319, dev avg loss 0.384963, throughput 8.97078K wps
[Epoch 160 Batch 30/62] avg loss 0.00261706, throughput 8.70114K wps
[Epoch 160 Batch 60/62] avg loss 0.00264166, throughput 9.011K wps
Begin Testing...
[Epoch 160] train avg loss 0.00268361, dev acc 0.8230, dev avg loss 0.384172, throughput 8.91224K wps
[Epoch 161 Batch 30/62] avg loss 0.00254255, throughput 9.16162K wps
[Epoch 161 Batch 60/62] avg loss 0.002601, throughput 9.11739K wps
Begin Testing...
[Epoch 161] train avg loss 0.00264287, dev acc 0.8289, dev avg loss 0.385191, throughput 9.16888K wps
[Epoch 162 Batch 30/62] avg loss 0.0026255, throughput 9.07957K wps
[Epoch 162 Batch 60/62] avg loss 0.00246967, throughput 8.74736K wps
Begin Testing...
[Epoch 162] train avg loss 0.0025984, dev acc 0.8319, dev avg loss 0.385741, throughput 8.8972K wps
[Epoch 163 Batch 30/62] avg loss 0.00263516, throughput 9.01888K wps
[Epoch 163 Batch 60/62] avg loss 0.00255957, throughput 8.89012K wps
Begin Testing...
[Epoch 163] train avg loss 0.00265182, dev acc 0.8142, dev avg loss 0.389457, throughput 9.00003K wps
[Epoch 164 Batch 30/62] avg loss 0.00249055, throughput 9.14965K wps
[Epoch 164 Batch 60/62] avg loss 0.00248926, throughput 8.89118K wps
Begin Testing...
[Epoch 164] train avg loss 0.00251833, dev acc 0.8142, dev avg loss 0.387804, throughput 9.04365K wps
[Epoch 165 Batch 30/62] avg loss 0.0024376, throughput 9.01138K wps
[Epoch 165 Batch 60/62] avg loss 0.00239646, throughput 8.90779K wps
Begin Testing...
[Epoch 165] train avg loss 0.00252103, dev acc 0.8260, dev avg loss 0.394326, throughput 8.99655K wps
[Epoch 166 Batch 30/62] avg loss 0.00244358, throughput 8.93638K wps
[Epoch 166 Batch 60/62] avg loss 0.00243643, throughput 8.87414K wps
Begin Testing...
[Epoch 166] train avg loss 0.0024644, dev acc 0.8319, dev avg loss 0.385904, throughput 8.93929K wps
[Epoch 167 Batch 30/62] avg loss 0.00236888, throughput 9.28108K wps
[Epoch 167 Batch 60/62] avg loss 0.00243531, throughput 9.05999K wps
Begin Testing...
[Epoch 167] train avg loss 0.00242883, dev acc 0.8171, dev avg loss 0.390087, throughput 9.14214K wps
[Epoch 168 Batch 30/62] avg loss 0.00234956, throughput 9.09049K wps
[Epoch 168 Batch 60/62] avg loss 0.00235729, throughput 8.97349K wps
Begin Testing...
[Epoch 168] train avg loss 0.0024068, dev acc 0.8289, dev avg loss 0.385831, throughput 9.0644K wps
[Epoch 169 Batch 30/62] avg loss 0.00238348, throughput 9.11612K wps
[Epoch 169 Batch 60/62] avg loss 0.00231587, throughput 8.80005K wps
Begin Testing...
[Epoch 169] train avg loss 0.00236677, dev acc 0.8319, dev avg loss 0.385737, throughput 8.98726K wps
[Epoch 170 Batch 30/62] avg loss 0.0023455, throughput 8.88381K wps
[Epoch 170 Batch 60/62] avg loss 0.0023205, throughput 9.17899K wps
Begin Testing...
[Epoch 170] train avg loss 0.00236528, dev acc 0.8230, dev avg loss 0.385957, throughput 9.061K wps
[Epoch 171 Batch 30/62] avg loss 0.00239553, throughput 8.99071K wps
[Epoch 171 Batch 60/62] avg loss 0.00224999, throughput 8.92584K wps
Begin Testing...
[Epoch 171] train avg loss 0.00236973, dev acc 0.8289, dev avg loss 0.387862, throughput 8.98594K wps
[Epoch 172 Batch 30/62] avg loss 0.00221082, throughput 9.00781K wps
[Epoch 172 Batch 60/62] avg loss 0.00232742, throughput 9.16725K wps
Begin Testing...
[Epoch 172] train avg loss 0.00229024, dev acc 0.8171, dev avg loss 0.386486, throughput 9.11507K wps
[Epoch 173 Batch 30/62] avg loss 0.00241137, throughput 9.06164K wps
[Epoch 173 Batch 60/62] avg loss 0.00232549, throughput 9.01006K wps
Begin Testing...
[Epoch 173] train avg loss 0.00240301, dev acc 0.8142, dev avg loss 0.389315, throughput 9.0594K wps
[Epoch 174 Batch 30/62] avg loss 0.00232203, throughput 9.0645K wps
[Epoch 174 Batch 60/62] avg loss 0.00237456, throughput 8.80789K wps
Begin Testing...
[Epoch 174] train avg loss 0.00236854, dev acc 0.8142, dev avg loss 0.388228, throughput 8.96643K wps
[Epoch 175 Batch 30/62] avg loss 0.00230332, throughput 8.71042K wps
[Epoch 175 Batch 60/62] avg loss 0.00221422, throughput 8.7476K wps
Begin Testing...
[Epoch 175] train avg loss 0.00230661, dev acc 0.8142, dev avg loss 0.387034, throughput 8.73014K wps
[Epoch 176 Batch 30/62] avg loss 0.0022363, throughput 8.8928K wps
[Epoch 176 Batch 60/62] avg loss 0.00231771, throughput 9.05549K wps
Begin Testing...
[Epoch 176] train avg loss 0.00230853, dev acc 0.8201, dev avg loss 0.386038, throughput 8.99944K wps
[Epoch 177 Batch 30/62] avg loss 0.00218268, throughput 8.93965K wps
[Epoch 177 Batch 60/62] avg loss 0.00237543, throughput 9.10016K wps
Begin Testing...
[Epoch 177] train avg loss 0.00232505, dev acc 0.8319, dev avg loss 0.387037, throughput 9.05193K wps
[Epoch 178 Batch 30/62] avg loss 0.00222912, throughput 9.14117K wps
[Epoch 178 Batch 60/62] avg loss 0.00211809, throughput 8.83592K wps
Begin Testing...
[Epoch 178] train avg loss 0.0021887, dev acc 0.8171, dev avg loss 0.386669, throughput 9.00415K wps
[Epoch 179 Batch 30/62] avg loss 0.00214475, throughput 9.06011K wps
[Epoch 179 Batch 60/62] avg loss 0.00226045, throughput 9.12735K wps
Begin Testing...
[Epoch 179] train avg loss 0.00224149, dev acc 0.8319, dev avg loss 0.392723, throughput 9.12209K wps
[Epoch 180 Batch 30/62] avg loss 0.00228467, throughput 9.24062K wps
[Epoch 180 Batch 60/62] avg loss 0.00216203, throughput 8.9991K wps
Begin Testing...
[Epoch 180] train avg loss 0.00228581, dev acc 0.8319, dev avg loss 0.387428, throughput 9.14491K wps
[Epoch 181 Batch 30/62] avg loss 0.00208109, throughput 9.14997K wps
[Epoch 181 Batch 60/62] avg loss 0.00224599, throughput 8.82359K wps
Begin Testing...
[Epoch 181] train avg loss 0.00219262, dev acc 0.8378, dev avg loss 0.385287, throughput 8.95233K wps
Observed Improvement.
Begin Testing...
[Epoch 182 Batch 30/62] avg loss 0.00209194, throughput 9.17077K wps
[Epoch 182 Batch 60/62] avg loss 0.00206901, throughput 9.01963K wps
Begin Testing...
[Epoch 182] train avg loss 0.00210151, dev acc 0.8142, dev avg loss 0.387174, throughput 9.12534K wps
[Epoch 183 Batch 30/62] avg loss 0.00205285, throughput 9.21345K wps
[Epoch 183 Batch 60/62] avg loss 0.00202192, throughput 8.76028K wps
Begin Testing...
[Epoch 183] train avg loss 0.0020464, dev acc 0.8171, dev avg loss 0.386229, throughput 9.01315K wps
[Epoch 184 Batch 30/62] avg loss 0.0020957, throughput 9.07757K wps
[Epoch 184 Batch 60/62] avg loss 0.00211101, throughput 8.64738K wps
Begin Testing...
[Epoch 184] train avg loss 0.0021002, dev acc 0.8171, dev avg loss 0.387148, throughput 8.8503K wps
[Epoch 185 Batch 30/62] avg loss 0.00204122, throughput 8.86312K wps
[Epoch 185 Batch 60/62] avg loss 0.00200872, throughput 8.90304K wps
Begin Testing...
[Epoch 185] train avg loss 0.00203621, dev acc 0.8289, dev avg loss 0.385657, throughput 8.9186K wps
[Epoch 186 Batch 30/62] avg loss 0.00210962, throughput 9.0377K wps
[Epoch 186 Batch 60/62] avg loss 0.00199248, throughput 8.84605K wps
Begin Testing...
[Epoch 186] train avg loss 0.00207188, dev acc 0.8142, dev avg loss 0.387747, throughput 8.91134K wps
[Epoch 187 Batch 30/62] avg loss 0.00202853, throughput 9.24K wps
[Epoch 187 Batch 60/62] avg loss 0.00206433, throughput 8.88095K wps
Begin Testing...
[Epoch 187] train avg loss 0.002055, dev acc 0.8201, dev avg loss 0.386261, throughput 9.10208K wps
[Epoch 188 Batch 30/62] avg loss 0.002058, throughput 9.17438K wps
[Epoch 188 Batch 60/62] avg loss 0.00202129, throughput 8.97982K wps
Begin Testing...
[Epoch 188] train avg loss 0.00210409, dev acc 0.8142, dev avg loss 0.3931, throughput 9.0897K wps
[Epoch 189 Batch 30/62] avg loss 0.00187543, throughput 8.77577K wps
[Epoch 189 Batch 60/62] avg loss 0.00210152, throughput 8.79507K wps
Begin Testing...
[Epoch 189] train avg loss 0.00202438, dev acc 0.8142, dev avg loss 0.386707, throughput 8.81338K wps
[Epoch 190 Batch 30/62] avg loss 0.00195647, throughput 8.87785K wps
[Epoch 190 Batch 60/62] avg loss 0.0019984, throughput 8.92529K wps
Begin Testing...
[Epoch 190] train avg loss 0.00202082, dev acc 0.8348, dev avg loss 0.391263, throughput 8.89233K wps
[Epoch 191 Batch 30/62] avg loss 0.00199301, throughput 9.15071K wps
[Epoch 191 Batch 60/62] avg loss 0.00186164, throughput 8.70503K wps
Begin Testing...
[Epoch 191] train avg loss 0.001932, dev acc 0.8171, dev avg loss 0.385885, throughput 8.90595K wps
[Epoch 192 Batch 30/62] avg loss 0.00184429, throughput 9.11183K wps
[Epoch 192 Batch 60/62] avg loss 0.00184813, throughput 8.79999K wps
Begin Testing...
[Epoch 192] train avg loss 0.00186583, dev acc 0.8348, dev avg loss 0.386148, throughput 8.98915K wps
[Epoch 193 Batch 30/62] avg loss 0.0018866, throughput 8.81587K wps
[Epoch 193 Batch 60/62] avg loss 0.00201053, throughput 8.9879K wps
Begin Testing...
[Epoch 193] train avg loss 0.00198296, dev acc 0.8319, dev avg loss 0.386444, throughput 8.87949K wps
[Epoch 194 Batch 30/62] avg loss 0.00184493, throughput 9.00696K wps
[Epoch 194 Batch 60/62] avg loss 0.00197692, throughput 8.98098K wps
Begin Testing...
[Epoch 194] train avg loss 0.00191368, dev acc 0.8348, dev avg loss 0.386273, throughput 9.02563K wps
[Epoch 195 Batch 30/62] avg loss 0.00192174, throughput 8.91866K wps
[Epoch 195 Batch 60/62] avg loss 0.0019407, throughput 9.0953K wps
Begin Testing...
[Epoch 195] train avg loss 0.00195867, dev acc 0.8201, dev avg loss 0.386683, throughput 9.03139K wps
[Epoch 196 Batch 30/62] avg loss 0.00175984, throughput 9.00376K wps
[Epoch 196 Batch 60/62] avg loss 0.00194808, throughput 8.89158K wps
Begin Testing...
[Epoch 196] train avg loss 0.00188761, dev acc 0.8289, dev avg loss 0.386301, throughput 8.97849K wps
[Epoch 197 Batch 30/62] avg loss 0.00195063, throughput 9.12131K wps
[Epoch 197 Batch 60/62] avg loss 0.00188661, throughput 8.92956K wps
Begin Testing...
[Epoch 197] train avg loss 0.0019557, dev acc 0.8171, dev avg loss 0.388071, throughput 8.99829K wps
[Epoch 198 Batch 30/62] avg loss 0.00183054, throughput 8.89619K wps
[Epoch 198 Batch 60/62] avg loss 0.0020049, throughput 8.83201K wps
Begin Testing...
[Epoch 198] train avg loss 0.0019412, dev acc 0.8171, dev avg loss 0.38811, throughput 8.88405K wps
[Epoch 199 Batch 30/62] avg loss 0.00180548, throughput 8.90035K wps
[Epoch 199 Batch 60/62] avg loss 0.00190776, throughput 8.95555K wps
Begin Testing...
[Epoch 199] train avg loss 0.00186552, dev acc 0.8171, dev avg loss 0.387115, throughput 8.90257K wps
Test loss 0.336341, test acc 0.8382
Total time cost 178.38s
[Epoch 0 Batch 30/62] avg loss 0.0134163, throughput 8.41177K wps
[Epoch 0 Batch 60/62] avg loss 0.0130664, throughput 9.04938K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133823, dev acc 0.6519, dev avg loss 0.647843, throughput 8.71654K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0133254, throughput 9.01347K wps
[Epoch 1 Batch 60/62] avg loss 0.0130449, throughput 8.97037K wps
Begin Testing...
[Epoch 1] train avg loss 0.0133406, dev acc 0.6519, dev avg loss 0.642652, throughput 9.00634K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130125, throughput 9.29954K wps
[Epoch 2 Batch 60/62] avg loss 0.0129887, throughput 9.08217K wps
Begin Testing...
[Epoch 2] train avg loss 0.0131327, dev acc 0.6519, dev avg loss 0.637962, throughput 9.2052K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0129141, throughput 9.10869K wps
[Epoch 3 Batch 60/62] avg loss 0.0128605, throughput 9.13167K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130658, dev acc 0.6519, dev avg loss 0.633172, throughput 9.14861K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0129844, throughput 9.16504K wps
[Epoch 4 Batch 60/62] avg loss 0.0126135, throughput 9.06129K wps
Begin Testing...
[Epoch 4] train avg loss 0.0130036, dev acc 0.6519, dev avg loss 0.628659, throughput 9.14272K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0126863, throughput 9.05309K wps
[Epoch 5 Batch 60/62] avg loss 0.012588, throughput 8.77224K wps
Begin Testing...
[Epoch 5] train avg loss 0.0127966, dev acc 0.6519, dev avg loss 0.624141, throughput 8.8697K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0125096, throughput 9.09121K wps
[Epoch 6 Batch 60/62] avg loss 0.0124855, throughput 9.01071K wps
Begin Testing...
[Epoch 6] train avg loss 0.0126885, dev acc 0.6519, dev avg loss 0.61958, throughput 9.07662K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.012584, throughput 8.73849K wps
[Epoch 7 Batch 60/62] avg loss 0.0123118, throughput 9.02273K wps
Begin Testing...
[Epoch 7] train avg loss 0.0126047, dev acc 0.6519, dev avg loss 0.615267, throughput 8.90892K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0122131, throughput 8.83897K wps
[Epoch 8 Batch 60/62] avg loss 0.0124635, throughput 8.96118K wps
Begin Testing...
[Epoch 8] train avg loss 0.0124681, dev acc 0.6519, dev avg loss 0.610518, throughput 8.88394K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0122393, throughput 9.05765K wps
[Epoch 9 Batch 60/62] avg loss 0.0121978, throughput 8.98309K wps
Begin Testing...
[Epoch 9] train avg loss 0.0123418, dev acc 0.6519, dev avg loss 0.606135, throughput 9.00852K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0123481, throughput 8.9806K wps
[Epoch 10 Batch 60/62] avg loss 0.0120113, throughput 9.05236K wps
Begin Testing...
[Epoch 10] train avg loss 0.0122765, dev acc 0.6519, dev avg loss 0.60228, throughput 9.04733K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0120532, throughput 8.98666K wps
[Epoch 11 Batch 60/62] avg loss 0.0119619, throughput 8.97246K wps
Begin Testing...
[Epoch 11] train avg loss 0.0121272, dev acc 0.6519, dev avg loss 0.596909, throughput 9.01227K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0122185, throughput 8.91267K wps
[Epoch 12 Batch 60/62] avg loss 0.0118088, throughput 9.14744K wps
Begin Testing...
[Epoch 12] train avg loss 0.012189, dev acc 0.6549, dev avg loss 0.591824, throughput 9.05815K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0118687, throughput 9.20155K wps
[Epoch 13 Batch 60/62] avg loss 0.0119505, throughput 8.8175K wps
Begin Testing...
[Epoch 13] train avg loss 0.0120314, dev acc 0.6549, dev avg loss 0.586868, throughput 9.03935K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0116337, throughput 9.15052K wps
[Epoch 14 Batch 60/62] avg loss 0.0118354, throughput 8.63401K wps
Begin Testing...
[Epoch 14] train avg loss 0.0118519, dev acc 0.6578, dev avg loss 0.581732, throughput 8.87332K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0116925, throughput 9.17961K wps
[Epoch 15 Batch 60/62] avg loss 0.011594, throughput 9.07405K wps
Begin Testing...
[Epoch 15] train avg loss 0.011831, dev acc 0.6696, dev avg loss 0.577261, throughput 9.1544K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0116404, throughput 8.93808K wps
[Epoch 16 Batch 60/62] avg loss 0.0114595, throughput 9.01197K wps
Begin Testing...
[Epoch 16] train avg loss 0.0117125, dev acc 0.6696, dev avg loss 0.571356, throughput 9.00325K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0114011, throughput 8.59192K wps
[Epoch 17 Batch 60/62] avg loss 0.011354, throughput 9.10059K wps
Begin Testing...
[Epoch 17] train avg loss 0.011518, dev acc 0.6696, dev avg loss 0.565913, throughput 8.87447K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0112214, throughput 8.99251K wps
[Epoch 18 Batch 60/62] avg loss 0.0112658, throughput 9.02798K wps
Begin Testing...
[Epoch 18] train avg loss 0.0113783, dev acc 0.6696, dev avg loss 0.561139, throughput 9.04311K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0109826, throughput 9.04437K wps
[Epoch 19 Batch 60/62] avg loss 0.0112645, throughput 9.0876K wps
Begin Testing...
[Epoch 19] train avg loss 0.0112658, dev acc 0.7021, dev avg loss 0.555375, throughput 9.0983K wps
Observed Improvement.
Begin Testing...
[Epoch 20 Batch 30/62] avg loss 0.0111633, throughput 8.68538K wps
[Epoch 20 Batch 60/62] avg loss 0.0109988, throughput 8.89235K wps
Begin Testing...
[Epoch 20] train avg loss 0.0112315, dev acc 0.6726, dev avg loss 0.550143, throughput 8.82478K wps
[Epoch 21 Batch 30/62] avg loss 0.0110201, throughput 8.97672K wps
[Epoch 21 Batch 60/62] avg loss 0.0108763, throughput 8.93886K wps
Begin Testing...
[Epoch 21] train avg loss 0.0111301, dev acc 0.7198, dev avg loss 0.543702, throughput 8.91923K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.010852, throughput 9.12011K wps
[Epoch 22 Batch 60/62] avg loss 0.0108121, throughput 9.07252K wps
Begin Testing...
[Epoch 22] train avg loss 0.0109664, dev acc 0.7021, dev avg loss 0.53791, throughput 9.06509K wps
[Epoch 23 Batch 30/62] avg loss 0.0108, throughput 8.99663K wps
[Epoch 23 Batch 60/62] avg loss 0.0107044, throughput 8.80377K wps
Begin Testing...
[Epoch 23] train avg loss 0.0109071, dev acc 0.7463, dev avg loss 0.53247, throughput 8.93589K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0104833, throughput 9.15011K wps
[Epoch 24 Batch 60/62] avg loss 0.0107127, throughput 9.06647K wps
Begin Testing...
[Epoch 24] train avg loss 0.0106819, dev acc 0.6991, dev avg loss 0.527581, throughput 9.07663K wps
[Epoch 25 Batch 30/62] avg loss 0.0106312, throughput 8.94512K wps
[Epoch 25 Batch 60/62] avg loss 0.01032, throughput 9.02934K wps
Begin Testing...
[Epoch 25] train avg loss 0.0105881, dev acc 0.7168, dev avg loss 0.520845, throughput 9.01492K wps
[Epoch 26 Batch 30/62] avg loss 0.0102467, throughput 8.87643K wps
[Epoch 26 Batch 60/62] avg loss 0.0103563, throughput 8.99136K wps
Begin Testing...
[Epoch 26] train avg loss 0.0104636, dev acc 0.7257, dev avg loss 0.515151, throughput 8.96949K wps
[Epoch 27 Batch 30/62] avg loss 0.0101609, throughput 8.82005K wps
[Epoch 27 Batch 60/62] avg loss 0.0102078, throughput 9.0171K wps
Begin Testing...
[Epoch 27] train avg loss 0.0103428, dev acc 0.7463, dev avg loss 0.508999, throughput 8.88448K wps
Observed Improvement.
Begin Testing...
[Epoch 28 Batch 30/62] avg loss 0.0101045, throughput 8.87173K wps
[Epoch 28 Batch 60/62] avg loss 0.00993968, throughput 9.13808K wps
Begin Testing...
[Epoch 28] train avg loss 0.0101674, dev acc 0.7434, dev avg loss 0.502989, throughput 8.99303K wps
[Epoch 29 Batch 30/62] avg loss 0.00993582, throughput 9.25566K wps
[Epoch 29 Batch 60/62] avg loss 0.00997976, throughput 8.97259K wps
Begin Testing...
[Epoch 29] train avg loss 0.010061, dev acc 0.7493, dev avg loss 0.497903, throughput 9.07758K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.010077, throughput 9.09445K wps
[Epoch 30 Batch 60/62] avg loss 0.00956863, throughput 9.00411K wps
Begin Testing...
[Epoch 30] train avg loss 0.00996063, dev acc 0.7729, dev avg loss 0.492147, throughput 9.07681K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00971796, throughput 9.29306K wps
[Epoch 31 Batch 60/62] avg loss 0.00993029, throughput 9.07998K wps
Begin Testing...
[Epoch 31] train avg loss 0.00991146, dev acc 0.7463, dev avg loss 0.489244, throughput 9.16158K wps
[Epoch 32 Batch 30/62] avg loss 0.00966116, throughput 9.21145K wps
[Epoch 32 Batch 60/62] avg loss 0.00944399, throughput 9.12649K wps
Begin Testing...
[Epoch 32] train avg loss 0.00967895, dev acc 0.7699, dev avg loss 0.482192, throughput 9.14271K wps
[Epoch 33 Batch 30/62] avg loss 0.00945414, throughput 9.0355K wps
[Epoch 33 Batch 60/62] avg loss 0.00954656, throughput 8.96029K wps
Begin Testing...
[Epoch 33] train avg loss 0.00968482, dev acc 0.7876, dev avg loss 0.476442, throughput 9.0288K wps
Observed Improvement.
Begin Testing...
[Epoch 34 Batch 30/62] avg loss 0.00944773, throughput 8.94473K wps
[Epoch 34 Batch 60/62] avg loss 0.00936897, throughput 8.96914K wps
Begin Testing...
[Epoch 34] train avg loss 0.00956565, dev acc 0.7906, dev avg loss 0.471764, throughput 8.96964K wps
Observed Improvement.
Begin Testing...
[Epoch 35 Batch 30/62] avg loss 0.00952675, throughput 8.9743K wps
[Epoch 35 Batch 60/62] avg loss 0.00914793, throughput 8.90625K wps
Begin Testing...
[Epoch 35] train avg loss 0.00944909, dev acc 0.7729, dev avg loss 0.467768, throughput 8.92203K wps
[Epoch 36 Batch 30/62] avg loss 0.00901722, throughput 8.71914K wps
[Epoch 36 Batch 60/62] avg loss 0.00928084, throughput 9.03759K wps
Begin Testing...
[Epoch 36] train avg loss 0.00921298, dev acc 0.7788, dev avg loss 0.464877, throughput 8.90088K wps
[Epoch 37 Batch 30/62] avg loss 0.00895534, throughput 9.25576K wps
[Epoch 37 Batch 60/62] avg loss 0.00903449, throughput 8.69149K wps
Begin Testing...
[Epoch 37] train avg loss 0.00914387, dev acc 0.7817, dev avg loss 0.459607, throughput 8.99406K wps
[Epoch 38 Batch 30/62] avg loss 0.00899531, throughput 8.97771K wps
[Epoch 38 Batch 60/62] avg loss 0.00878652, throughput 9.02869K wps
Begin Testing...
[Epoch 38] train avg loss 0.00903834, dev acc 0.7994, dev avg loss 0.453862, throughput 8.97954K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00870428, throughput 9.01911K wps
[Epoch 39 Batch 60/62] avg loss 0.00899402, throughput 8.90791K wps
Begin Testing...
[Epoch 39] train avg loss 0.00894105, dev acc 0.7965, dev avg loss 0.449946, throughput 8.99487K wps
[Epoch 40 Batch 30/62] avg loss 0.00887742, throughput 8.98954K wps
[Epoch 40 Batch 60/62] avg loss 0.00855796, throughput 9.03235K wps
Begin Testing...
[Epoch 40] train avg loss 0.00880078, dev acc 0.7876, dev avg loss 0.447836, throughput 9.00003K wps
[Epoch 41 Batch 30/62] avg loss 0.00856983, throughput 8.80745K wps
[Epoch 41 Batch 60/62] avg loss 0.0085779, throughput 9.08132K wps
Begin Testing...
[Epoch 41] train avg loss 0.00874094, dev acc 0.7994, dev avg loss 0.442305, throughput 8.96992K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/62] avg loss 0.00845646, throughput 8.98937K wps
[Epoch 42 Batch 60/62] avg loss 0.00864121, throughput 9.08251K wps
Begin Testing...
[Epoch 42] train avg loss 0.00865531, dev acc 0.8024, dev avg loss 0.439528, throughput 9.05282K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.00841153, throughput 9.00745K wps
[Epoch 43 Batch 60/62] avg loss 0.00847663, throughput 9.09698K wps
Begin Testing...
[Epoch 43] train avg loss 0.00848796, dev acc 0.7965, dev avg loss 0.437981, throughput 9.08022K wps
[Epoch 44 Batch 30/62] avg loss 0.00833372, throughput 9.16791K wps
[Epoch 44 Batch 60/62] avg loss 0.00859906, throughput 8.83501K wps
Begin Testing...
[Epoch 44] train avg loss 0.00848481, dev acc 0.8053, dev avg loss 0.432211, throughput 9.05251K wps
Observed Improvement.
Begin Testing...
[Epoch 45 Batch 30/62] avg loss 0.00822014, throughput 9.24532K wps
[Epoch 45 Batch 60/62] avg loss 0.0082423, throughput 8.68788K wps
Begin Testing...
[Epoch 45] train avg loss 0.00829651, dev acc 0.8024, dev avg loss 0.430191, throughput 8.93663K wps
[Epoch 46 Batch 30/62] avg loss 0.00807789, throughput 9.01176K wps
[Epoch 46 Batch 60/62] avg loss 0.00807911, throughput 8.92453K wps
Begin Testing...
[Epoch 46] train avg loss 0.0081846, dev acc 0.8053, dev avg loss 0.426188, throughput 9.00008K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.00818511, throughput 9.01852K wps
[Epoch 47 Batch 60/62] avg loss 0.00802211, throughput 9.08139K wps
Begin Testing...
[Epoch 47] train avg loss 0.00819751, dev acc 0.8024, dev avg loss 0.423835, throughput 9.0718K wps
[Epoch 48 Batch 30/62] avg loss 0.008188, throughput 9.08944K wps
[Epoch 48 Batch 60/62] avg loss 0.00778592, throughput 9.10468K wps
Begin Testing...
[Epoch 48] train avg loss 0.00811545, dev acc 0.8053, dev avg loss 0.423647, throughput 9.12809K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/62] avg loss 0.00797287, throughput 9.23656K wps
[Epoch 49 Batch 60/62] avg loss 0.00788231, throughput 9.06219K wps
Begin Testing...
[Epoch 49] train avg loss 0.00808728, dev acc 0.8024, dev avg loss 0.422096, throughput 9.1722K wps
[Epoch 50 Batch 30/62] avg loss 0.00786101, throughput 9.18896K wps
[Epoch 50 Batch 60/62] avg loss 0.00762386, throughput 8.75964K wps
Begin Testing...
[Epoch 50] train avg loss 0.00781649, dev acc 0.8053, dev avg loss 0.419017, throughput 8.9472K wps
Observed Improvement.
Begin Testing...
[Epoch 51 Batch 30/62] avg loss 0.00767255, throughput 9.14088K wps
[Epoch 51 Batch 60/62] avg loss 0.00756198, throughput 9.04494K wps
Begin Testing...
[Epoch 51] train avg loss 0.00772644, dev acc 0.8112, dev avg loss 0.412746, throughput 9.12145K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00785498, throughput 9.21652K wps
[Epoch 52 Batch 60/62] avg loss 0.00765483, throughput 8.9953K wps
Begin Testing...
[Epoch 52] train avg loss 0.00783187, dev acc 0.8024, dev avg loss 0.411448, throughput 9.13409K wps
[Epoch 53 Batch 30/62] avg loss 0.0077819, throughput 9.02745K wps
[Epoch 53 Batch 60/62] avg loss 0.0074625, throughput 9.06372K wps
Begin Testing...
[Epoch 53] train avg loss 0.00770184, dev acc 0.8024, dev avg loss 0.413735, throughput 9.09365K wps
[Epoch 54 Batch 30/62] avg loss 0.0075563, throughput 8.86768K wps
[Epoch 54 Batch 60/62] avg loss 0.00735815, throughput 8.9069K wps
Begin Testing...
[Epoch 54] train avg loss 0.0075278, dev acc 0.8083, dev avg loss 0.406866, throughput 8.8529K wps
[Epoch 55 Batch 30/62] avg loss 0.00725839, throughput 9.15426K wps
[Epoch 55 Batch 60/62] avg loss 0.00748685, throughput 8.82298K wps
Begin Testing...
[Epoch 55] train avg loss 0.00750037, dev acc 0.8083, dev avg loss 0.406572, throughput 9.01905K wps
[Epoch 56 Batch 30/62] avg loss 0.0073352, throughput 9.02594K wps
[Epoch 56 Batch 60/62] avg loss 0.0072567, throughput 9.08055K wps
Begin Testing...
[Epoch 56] train avg loss 0.00735549, dev acc 0.8083, dev avg loss 0.408733, throughput 9.08626K wps
[Epoch 57 Batch 30/62] avg loss 0.00723593, throughput 8.99377K wps
[Epoch 57 Batch 60/62] avg loss 0.00720197, throughput 8.91172K wps
Begin Testing...
[Epoch 57] train avg loss 0.00727636, dev acc 0.8083, dev avg loss 0.403082, throughput 8.93372K wps
[Epoch 58 Batch 30/62] avg loss 0.00728764, throughput 9.0886K wps
[Epoch 58 Batch 60/62] avg loss 0.00719621, throughput 8.76863K wps
Begin Testing...
[Epoch 58] train avg loss 0.00733817, dev acc 0.7994, dev avg loss 0.397838, throughput 8.95074K wps
[Epoch 59 Batch 30/62] avg loss 0.00703767, throughput 9.08461K wps
[Epoch 59 Batch 60/62] avg loss 0.0070765, throughput 8.95886K wps
Begin Testing...
[Epoch 59] train avg loss 0.00719825, dev acc 0.8112, dev avg loss 0.400305, throughput 8.98795K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.00701743, throughput 8.89458K wps
[Epoch 60 Batch 60/62] avg loss 0.00681901, throughput 9.07418K wps
Begin Testing...
[Epoch 60] train avg loss 0.00705747, dev acc 0.8112, dev avg loss 0.394026, throughput 9.01783K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00689397, throughput 9.19873K wps
[Epoch 61 Batch 60/62] avg loss 0.00696119, throughput 8.85398K wps
Begin Testing...
[Epoch 61] train avg loss 0.00704112, dev acc 0.8083, dev avg loss 0.392384, throughput 9.00335K wps
[Epoch 62 Batch 30/62] avg loss 0.00690189, throughput 9.16015K wps
[Epoch 62 Batch 60/62] avg loss 0.00659019, throughput 9.05018K wps
Begin Testing...
[Epoch 62] train avg loss 0.00682479, dev acc 0.8142, dev avg loss 0.397407, throughput 9.13402K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/62] avg loss 0.00682964, throughput 9.12532K wps
[Epoch 63 Batch 60/62] avg loss 0.00676559, throughput 8.67606K wps
Begin Testing...
[Epoch 63] train avg loss 0.0068755, dev acc 0.8024, dev avg loss 0.388961, throughput 8.87169K wps
[Epoch 64 Batch 30/62] avg loss 0.00692553, throughput 9.07185K wps
[Epoch 64 Batch 60/62] avg loss 0.00645377, throughput 9.07028K wps
Begin Testing...
[Epoch 64] train avg loss 0.00683656, dev acc 0.8083, dev avg loss 0.388082, throughput 9.10052K wps
[Epoch 65 Batch 30/62] avg loss 0.00668126, throughput 8.97823K wps
[Epoch 65 Batch 60/62] avg loss 0.00664449, throughput 8.89643K wps
Begin Testing...
[Epoch 65] train avg loss 0.00672433, dev acc 0.8053, dev avg loss 0.386399, throughput 8.97208K wps
[Epoch 66 Batch 30/62] avg loss 0.00627849, throughput 9.06787K wps
[Epoch 66 Batch 60/62] avg loss 0.00665795, throughput 9.07668K wps
Begin Testing...
[Epoch 66] train avg loss 0.00655057, dev acc 0.8112, dev avg loss 0.384362, throughput 9.10268K wps
[Epoch 67 Batch 30/62] avg loss 0.0061896, throughput 9.09654K wps
[Epoch 67 Batch 60/62] avg loss 0.00680125, throughput 9.05444K wps
Begin Testing...
[Epoch 67] train avg loss 0.00655285, dev acc 0.8112, dev avg loss 0.383204, throughput 9.04844K wps
[Epoch 68 Batch 30/62] avg loss 0.00630503, throughput 9.13653K wps
[Epoch 68 Batch 60/62] avg loss 0.00671292, throughput 9.0428K wps
Begin Testing...
[Epoch 68] train avg loss 0.00653378, dev acc 0.8142, dev avg loss 0.383141, throughput 9.11291K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/62] avg loss 0.00636182, throughput 9.09575K wps
[Epoch 69 Batch 60/62] avg loss 0.00625776, throughput 9.07361K wps
Begin Testing...
[Epoch 69] train avg loss 0.00643959, dev acc 0.8142, dev avg loss 0.380524, throughput 9.11609K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/62] avg loss 0.00603824, throughput 9.12803K wps
[Epoch 70 Batch 60/62] avg loss 0.00651607, throughput 8.93904K wps
Begin Testing...
[Epoch 70] train avg loss 0.00641401, dev acc 0.8230, dev avg loss 0.383578, throughput 9.05421K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00596851, throughput 9.26224K wps
[Epoch 71 Batch 60/62] avg loss 0.00645572, throughput 8.60132K wps
Begin Testing...
[Epoch 71] train avg loss 0.006314, dev acc 0.8053, dev avg loss 0.378459, throughput 8.90693K wps
[Epoch 72 Batch 30/62] avg loss 0.00609914, throughput 9.13638K wps
[Epoch 72 Batch 60/62] avg loss 0.00614282, throughput 9.12997K wps
Begin Testing...
[Epoch 72] train avg loss 0.00621953, dev acc 0.8201, dev avg loss 0.380503, throughput 9.1594K wps
[Epoch 73 Batch 30/62] avg loss 0.00603364, throughput 9.02165K wps
[Epoch 73 Batch 60/62] avg loss 0.00612426, throughput 9.14971K wps
Begin Testing...
[Epoch 73] train avg loss 0.0061752, dev acc 0.8142, dev avg loss 0.375629, throughput 9.11223K wps
[Epoch 74 Batch 30/62] avg loss 0.00617838, throughput 9.04436K wps
[Epoch 74 Batch 60/62] avg loss 0.0059106, throughput 8.90318K wps
Begin Testing...
[Epoch 74] train avg loss 0.00608102, dev acc 0.8142, dev avg loss 0.374164, throughput 8.93748K wps
[Epoch 75 Batch 30/62] avg loss 0.00592578, throughput 8.94213K wps
[Epoch 75 Batch 60/62] avg loss 0.0060131, throughput 8.63535K wps
Begin Testing...
[Epoch 75] train avg loss 0.0060537, dev acc 0.8083, dev avg loss 0.373531, throughput 8.77286K wps
[Epoch 76 Batch 30/62] avg loss 0.00585085, throughput 9.34087K wps
[Epoch 76 Batch 60/62] avg loss 0.00599092, throughput 9.01419K wps
Begin Testing...
[Epoch 76] train avg loss 0.00596755, dev acc 0.8260, dev avg loss 0.379407, throughput 9.14773K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/62] avg loss 0.00595988, throughput 9.07965K wps
[Epoch 77 Batch 60/62] avg loss 0.00568806, throughput 8.9464K wps
Begin Testing...
[Epoch 77] train avg loss 0.00589603, dev acc 0.8201, dev avg loss 0.375166, throughput 9.03984K wps
[Epoch 78 Batch 30/62] avg loss 0.0057061, throughput 8.91549K wps
[Epoch 78 Batch 60/62] avg loss 0.00590856, throughput 9.16774K wps
Begin Testing...
[Epoch 78] train avg loss 0.00588192, dev acc 0.8083, dev avg loss 0.370727, throughput 9.06572K wps
[Epoch 79 Batch 30/62] avg loss 0.00580555, throughput 8.72577K wps
[Epoch 79 Batch 60/62] avg loss 0.00556105, throughput 8.7564K wps
Begin Testing...
[Epoch 79] train avg loss 0.00576154, dev acc 0.8142, dev avg loss 0.370275, throughput 8.71721K wps
[Epoch 80 Batch 30/62] avg loss 0.00583489, throughput 8.94837K wps
[Epoch 80 Batch 60/62] avg loss 0.00558265, throughput 8.95298K wps
Begin Testing...
[Epoch 80] train avg loss 0.00585773, dev acc 0.8260, dev avg loss 0.383407, throughput 8.98038K wps
Observed Improvement.
Begin Testing...
[Epoch 81 Batch 30/62] avg loss 0.00570561, throughput 9.20274K wps
[Epoch 81 Batch 60/62] avg loss 0.00542967, throughput 8.99603K wps
Begin Testing...
[Epoch 81] train avg loss 0.00562395, dev acc 0.8201, dev avg loss 0.371632, throughput 9.11803K wps
[Epoch 82 Batch 30/62] avg loss 0.00560528, throughput 9.16746K wps
[Epoch 82 Batch 60/62] avg loss 0.00560915, throughput 9.01883K wps
Begin Testing...
[Epoch 82] train avg loss 0.00566593, dev acc 0.8112, dev avg loss 0.366963, throughput 9.12262K wps
[Epoch 83 Batch 30/62] avg loss 0.00537636, throughput 9.1752K wps
[Epoch 83 Batch 60/62] avg loss 0.00538792, throughput 9.08858K wps
Begin Testing...
[Epoch 83] train avg loss 0.00542026, dev acc 0.8201, dev avg loss 0.367479, throughput 9.15414K wps
[Epoch 84 Batch 30/62] avg loss 0.00533968, throughput 9.16508K wps
[Epoch 84 Batch 60/62] avg loss 0.00541029, throughput 8.97262K wps
Begin Testing...
[Epoch 84] train avg loss 0.00547109, dev acc 0.8201, dev avg loss 0.36859, throughput 9.09937K wps
[Epoch 85 Batch 30/62] avg loss 0.00528129, throughput 9.20338K wps
[Epoch 85 Batch 60/62] avg loss 0.00531506, throughput 9.04012K wps
Begin Testing...
[Epoch 85] train avg loss 0.00538868, dev acc 0.8201, dev avg loss 0.364961, throughput 9.15068K wps
[Epoch 86 Batch 30/62] avg loss 0.00518974, throughput 8.89197K wps
[Epoch 86 Batch 60/62] avg loss 0.00543602, throughput 9.11995K wps
Begin Testing...
[Epoch 86] train avg loss 0.00536122, dev acc 0.8201, dev avg loss 0.364312, throughput 9.03212K wps
[Epoch 87 Batch 30/62] avg loss 0.005184, throughput 8.84539K wps
[Epoch 87 Batch 60/62] avg loss 0.00529462, throughput 9.21504K wps
Begin Testing...
[Epoch 87] train avg loss 0.00529019, dev acc 0.8112, dev avg loss 0.362919, throughput 9.05648K wps
[Epoch 88 Batch 30/62] avg loss 0.0051191, throughput 9.01766K wps
[Epoch 88 Batch 60/62] avg loss 0.00501837, throughput 9.13252K wps
Begin Testing...
[Epoch 88] train avg loss 0.00515257, dev acc 0.8201, dev avg loss 0.363611, throughput 9.10542K wps
[Epoch 89 Batch 30/62] avg loss 0.0050698, throughput 9.10242K wps
[Epoch 89 Batch 60/62] avg loss 0.00530003, throughput 8.74892K wps
Begin Testing...
[Epoch 89] train avg loss 0.00527428, dev acc 0.8142, dev avg loss 0.361185, throughput 8.90598K wps
[Epoch 90 Batch 30/62] avg loss 0.00519332, throughput 9.09763K wps
[Epoch 90 Batch 60/62] avg loss 0.00508867, throughput 9.04898K wps
Begin Testing...
[Epoch 90] train avg loss 0.00523357, dev acc 0.8260, dev avg loss 0.364749, throughput 9.10123K wps
Observed Improvement.
Begin Testing...
[Epoch 91 Batch 30/62] avg loss 0.00512608, throughput 9.11103K wps
[Epoch 91 Batch 60/62] avg loss 0.00508929, throughput 9.0815K wps
Begin Testing...
[Epoch 91] train avg loss 0.00519981, dev acc 0.8112, dev avg loss 0.360887, throughput 9.12455K wps
[Epoch 92 Batch 30/62] avg loss 0.00491395, throughput 9.16141K wps
[Epoch 92 Batch 60/62] avg loss 0.00506883, throughput 9.08825K wps
Begin Testing...
[Epoch 92] train avg loss 0.00510282, dev acc 0.8201, dev avg loss 0.364184, throughput 9.14459K wps
[Epoch 93 Batch 30/62] avg loss 0.00477104, throughput 8.79731K wps
[Epoch 93 Batch 60/62] avg loss 0.00503535, throughput 9.05215K wps
Begin Testing...
[Epoch 93] train avg loss 0.00500431, dev acc 0.8112, dev avg loss 0.358856, throughput 8.96281K wps
[Epoch 94 Batch 30/62] avg loss 0.00479556, throughput 8.95445K wps
[Epoch 94 Batch 60/62] avg loss 0.00500209, throughput 8.98902K wps
Begin Testing...
[Epoch 94] train avg loss 0.00492282, dev acc 0.8230, dev avg loss 0.365324, throughput 9.00021K wps
[Epoch 95 Batch 30/62] avg loss 0.00480992, throughput 9.07513K wps
[Epoch 95 Batch 60/62] avg loss 0.00493361, throughput 9.1283K wps
Begin Testing...
[Epoch 95] train avg loss 0.0048891, dev acc 0.8171, dev avg loss 0.357666, throughput 9.13013K wps
[Epoch 96 Batch 30/62] avg loss 0.00481235, throughput 9.14299K wps
[Epoch 96 Batch 60/62] avg loss 0.00480206, throughput 8.88133K wps
Begin Testing...
[Epoch 96] train avg loss 0.00480657, dev acc 0.8260, dev avg loss 0.361747, throughput 9.04196K wps
Observed Improvement.
Begin Testing...
[Epoch 97 Batch 30/62] avg loss 0.00476665, throughput 9.13087K wps
[Epoch 97 Batch 60/62] avg loss 0.00475382, throughput 9.05035K wps
Begin Testing...
[Epoch 97] train avg loss 0.00482452, dev acc 0.8230, dev avg loss 0.36059, throughput 9.12228K wps
[Epoch 98 Batch 30/62] avg loss 0.00484202, throughput 9.15961K wps
[Epoch 98 Batch 60/62] avg loss 0.00452403, throughput 9.09566K wps
Begin Testing...
[Epoch 98] train avg loss 0.00473173, dev acc 0.8260, dev avg loss 0.362724, throughput 9.15853K wps
Observed Improvement.
Begin Testing...
[Epoch 99 Batch 30/62] avg loss 0.00441856, throughput 9.12447K wps
[Epoch 99 Batch 60/62] avg loss 0.0047949, throughput 9.0213K wps
Begin Testing...
[Epoch 99] train avg loss 0.00472019, dev acc 0.8171, dev avg loss 0.354204, throughput 9.102K wps
[Epoch 100 Batch 30/62] avg loss 0.00449051, throughput 9.13734K wps
[Epoch 100 Batch 60/62] avg loss 0.00460631, throughput 9.11413K wps
Begin Testing...
[Epoch 100] train avg loss 0.0045723, dev acc 0.8260, dev avg loss 0.361223, throughput 9.15641K wps
Observed Improvement.
Begin Testing...
[Epoch 101 Batch 30/62] avg loss 0.00427918, throughput 9.30091K wps
[Epoch 101 Batch 60/62] avg loss 0.00471386, throughput 8.94768K wps
Begin Testing...
[Epoch 101] train avg loss 0.00454421, dev acc 0.8289, dev avg loss 0.361838, throughput 9.08393K wps
Observed Improvement.
Begin Testing...
[Epoch 102 Batch 30/62] avg loss 0.00433906, throughput 8.95633K wps
[Epoch 102 Batch 60/62] avg loss 0.00434687, throughput 9.15762K wps
Begin Testing...
[Epoch 102] train avg loss 0.00439024, dev acc 0.8230, dev avg loss 0.354795, throughput 9.08127K wps
[Epoch 103 Batch 30/62] avg loss 0.00436572, throughput 9.18457K wps
[Epoch 103 Batch 60/62] avg loss 0.00437999, throughput 8.84928K wps
Begin Testing...
[Epoch 103] train avg loss 0.00442592, dev acc 0.8260, dev avg loss 0.353831, throughput 9.04658K wps
[Epoch 104 Batch 30/62] avg loss 0.00434808, throughput 9.05841K wps
[Epoch 104 Batch 60/62] avg loss 0.00449106, throughput 9.08735K wps
Begin Testing...
[Epoch 104] train avg loss 0.00448132, dev acc 0.8260, dev avg loss 0.351136, throughput 9.10391K wps
[Epoch 105 Batch 30/62] avg loss 0.00443038, throughput 8.5942K wps
[Epoch 105 Batch 60/62] avg loss 0.0042701, throughput 8.92345K wps
Begin Testing...
[Epoch 105] train avg loss 0.00435213, dev acc 0.8260, dev avg loss 0.35453, throughput 8.7443K wps
[Epoch 106 Batch 30/62] avg loss 0.00464425, throughput 9.06796K wps
[Epoch 106 Batch 60/62] avg loss 0.00418065, throughput 9.08798K wps
Begin Testing...
[Epoch 106] train avg loss 0.00442111, dev acc 0.8319, dev avg loss 0.356188, throughput 9.10963K wps
Observed Improvement.
Begin Testing...
[Epoch 107 Batch 30/62] avg loss 0.00413953, throughput 9.11085K wps
[Epoch 107 Batch 60/62] avg loss 0.00427057, throughput 8.85292K wps
Begin Testing...
[Epoch 107] train avg loss 0.00426541, dev acc 0.8171, dev avg loss 0.35015, throughput 8.97322K wps
[Epoch 108 Batch 30/62] avg loss 0.00408973, throughput 9.13922K wps
[Epoch 108 Batch 60/62] avg loss 0.00435018, throughput 9.10521K wps
Begin Testing...
[Epoch 108] train avg loss 0.0042402, dev acc 0.8289, dev avg loss 0.348839, throughput 9.15128K wps
[Epoch 109 Batch 30/62] avg loss 0.00412349, throughput 9.15896K wps
[Epoch 109 Batch 60/62] avg loss 0.00429289, throughput 8.97568K wps
Begin Testing...
[Epoch 109] train avg loss 0.00426445, dev acc 0.8260, dev avg loss 0.353961, throughput 9.09995K wps
[Epoch 110 Batch 30/62] avg loss 0.00399098, throughput 9.04523K wps
[Epoch 110 Batch 60/62] avg loss 0.00418279, throughput 8.80639K wps
Begin Testing...
[Epoch 110] train avg loss 0.00411331, dev acc 0.8319, dev avg loss 0.35678, throughput 8.90121K wps
Observed Improvement.
Begin Testing...
[Epoch 111 Batch 30/62] avg loss 0.00395332, throughput 9.17519K wps
[Epoch 111 Batch 60/62] avg loss 0.0042326, throughput 8.89696K wps
Begin Testing...
[Epoch 111] train avg loss 0.00410945, dev acc 0.8230, dev avg loss 0.350285, throughput 9.06397K wps
[Epoch 112 Batch 30/62] avg loss 0.00391282, throughput 9.18868K wps
[Epoch 112 Batch 60/62] avg loss 0.00409897, throughput 8.81533K wps
Begin Testing...
[Epoch 112] train avg loss 0.00400574, dev acc 0.8230, dev avg loss 0.348418, throughput 9.03082K wps
[Epoch 113 Batch 30/62] avg loss 0.00388474, throughput 9.19523K wps
[Epoch 113 Batch 60/62] avg loss 0.00405041, throughput 8.87044K wps
Begin Testing...
[Epoch 113] train avg loss 0.00401587, dev acc 0.8289, dev avg loss 0.346817, throughput 9.00768K wps
[Epoch 114 Batch 30/62] avg loss 0.00392287, throughput 8.4934K wps
[Epoch 114 Batch 60/62] avg loss 0.00373966, throughput 9.11805K wps
Begin Testing...
[Epoch 114] train avg loss 0.00395314, dev acc 0.8319, dev avg loss 0.346706, throughput 8.76703K wps
Observed Improvement.
Begin Testing...
[Epoch 115 Batch 30/62] avg loss 0.00390886, throughput 8.74669K wps
[Epoch 115 Batch 60/62] avg loss 0.00378617, throughput 9.19667K wps
Begin Testing...
[Epoch 115] train avg loss 0.00393336, dev acc 0.8378, dev avg loss 0.355198, throughput 8.98147K wps
Observed Improvement.
Begin Testing...
[Epoch 116 Batch 30/62] avg loss 0.0034159, throughput 8.5422K wps
[Epoch 116 Batch 60/62] avg loss 0.00406601, throughput 9.01098K wps
Begin Testing...
[Epoch 116] train avg loss 0.00381363, dev acc 0.8230, dev avg loss 0.34718, throughput 8.80499K wps
[Epoch 117 Batch 30/62] avg loss 0.00372139, throughput 8.88972K wps
[Epoch 117 Batch 60/62] avg loss 0.00386322, throughput 8.7045K wps
Begin Testing...
[Epoch 117] train avg loss 0.00391805, dev acc 0.8230, dev avg loss 0.345991, throughput 8.79467K wps
[Epoch 118 Batch 30/62] avg loss 0.00383582, throughput 8.88252K wps
[Epoch 118 Batch 60/62] avg loss 0.00389307, throughput 8.99424K wps
Begin Testing...
[Epoch 118] train avg loss 0.00391716, dev acc 0.8348, dev avg loss 0.359605, throughput 8.90547K wps
[Epoch 119 Batch 30/62] avg loss 0.00368738, throughput 8.95643K wps
[Epoch 119 Batch 60/62] avg loss 0.00360271, throughput 8.50977K wps
Begin Testing...
[Epoch 119] train avg loss 0.0036995, dev acc 0.8201, dev avg loss 0.346206, throughput 8.76163K wps
[Epoch 120 Batch 30/62] avg loss 0.00373368, throughput 8.98702K wps
[Epoch 120 Batch 60/62] avg loss 0.00371204, throughput 9.1485K wps
Begin Testing...
[Epoch 120] train avg loss 0.00379593, dev acc 0.8319, dev avg loss 0.344916, throughput 9.09398K wps
[Epoch 121 Batch 30/62] avg loss 0.00374859, throughput 9.16659K wps
[Epoch 121 Batch 60/62] avg loss 0.00344835, throughput 9.01373K wps
Begin Testing...
[Epoch 121] train avg loss 0.00360611, dev acc 0.8378, dev avg loss 0.357848, throughput 9.14358K wps
Observed Improvement.
Begin Testing...
[Epoch 122 Batch 30/62] avg loss 0.00361275, throughput 9.19442K wps
[Epoch 122 Batch 60/62] avg loss 0.00352306, throughput 8.83596K wps
Begin Testing...
[Epoch 122] train avg loss 0.00358901, dev acc 0.8260, dev avg loss 0.347716, throughput 9.02908K wps
[Epoch 123 Batch 30/62] avg loss 0.00363163, throughput 9.26193K wps
[Epoch 123 Batch 60/62] avg loss 0.00334991, throughput 9.10118K wps
Begin Testing...
[Epoch 123] train avg loss 0.00359714, dev acc 0.8230, dev avg loss 0.345606, throughput 9.14815K wps
[Epoch 124 Batch 30/62] avg loss 0.00366675, throughput 8.95453K wps
[Epoch 124 Batch 60/62] avg loss 0.00338193, throughput 9.13961K wps
Begin Testing...
[Epoch 124] train avg loss 0.00351939, dev acc 0.8437, dev avg loss 0.351159, throughput 9.07763K wps
Observed Improvement.
Begin Testing...
[Epoch 125 Batch 30/62] avg loss 0.00359938, throughput 9.08296K wps
[Epoch 125 Batch 60/62] avg loss 0.00342067, throughput 8.9646K wps
Begin Testing...
[Epoch 125] train avg loss 0.00351432, dev acc 0.8348, dev avg loss 0.356092, throughput 9.05603K wps
[Epoch 126 Batch 30/62] avg loss 0.00357089, throughput 8.8757K wps
[Epoch 126 Batch 60/62] avg loss 0.00347266, throughput 9.18482K wps
Begin Testing...
[Epoch 126] train avg loss 0.00353264, dev acc 0.8378, dev avg loss 0.358649, throughput 9.00364K wps
[Epoch 127 Batch 30/62] avg loss 0.00341477, throughput 9.1589K wps
[Epoch 127 Batch 60/62] avg loss 0.00343484, throughput 9.04239K wps
Begin Testing...
[Epoch 127] train avg loss 0.00345026, dev acc 0.8407, dev avg loss 0.353212, throughput 9.13098K wps
[Epoch 128 Batch 30/62] avg loss 0.00356066, throughput 9.03796K wps
[Epoch 128 Batch 60/62] avg loss 0.00328954, throughput 8.90654K wps
Begin Testing...
[Epoch 128] train avg loss 0.00347406, dev acc 0.8407, dev avg loss 0.348037, throughput 8.94457K wps
[Epoch 129 Batch 30/62] avg loss 0.00336136, throughput 9.10818K wps
[Epoch 129 Batch 60/62] avg loss 0.00336784, throughput 9.01876K wps
Begin Testing...
[Epoch 129] train avg loss 0.00344065, dev acc 0.8319, dev avg loss 0.342278, throughput 9.09446K wps
[Epoch 130 Batch 30/62] avg loss 0.0032558, throughput 9.19998K wps
[Epoch 130 Batch 60/62] avg loss 0.00326578, throughput 8.92685K wps
Begin Testing...
[Epoch 130] train avg loss 0.00333272, dev acc 0.8466, dev avg loss 0.350495, throughput 9.05831K wps
Observed Improvement.
Begin Testing...
[Epoch 131 Batch 30/62] avg loss 0.00310663, throughput 9.07548K wps
[Epoch 131 Batch 60/62] avg loss 0.00337404, throughput 9.16809K wps
Begin Testing...
[Epoch 131] train avg loss 0.00331316, dev acc 0.8319, dev avg loss 0.343344, throughput 9.13946K wps
[Epoch 132 Batch 30/62] avg loss 0.00336659, throughput 8.84554K wps
[Epoch 132 Batch 60/62] avg loss 0.00318008, throughput 9.03546K wps
Begin Testing...
[Epoch 132] train avg loss 0.00332969, dev acc 0.8319, dev avg loss 0.346539, throughput 8.97052K wps
[Epoch 133 Batch 30/62] avg loss 0.00331429, throughput 8.8761K wps
[Epoch 133 Batch 60/62] avg loss 0.00322098, throughput 9.10758K wps
Begin Testing...
[Epoch 133] train avg loss 0.00329677, dev acc 0.8378, dev avg loss 0.346538, throughput 9.02399K wps
[Epoch 134 Batch 30/62] avg loss 0.00313503, throughput 9.273K wps
[Epoch 134 Batch 60/62] avg loss 0.00304183, throughput 8.64744K wps
Begin Testing...
[Epoch 134] train avg loss 0.00310754, dev acc 0.8407, dev avg loss 0.346247, throughput 8.93262K wps
[Epoch 135 Batch 30/62] avg loss 0.00309997, throughput 8.74974K wps
[Epoch 135 Batch 60/62] avg loss 0.00302063, throughput 9.2065K wps
Begin Testing...
[Epoch 135] train avg loss 0.00320884, dev acc 0.8289, dev avg loss 0.342076, throughput 9.00642K wps
[Epoch 136 Batch 30/62] avg loss 0.00322585, throughput 9.07281K wps
[Epoch 136 Batch 60/62] avg loss 0.0030933, throughput 8.93089K wps
Begin Testing...
[Epoch 136] train avg loss 0.00319042, dev acc 0.8319, dev avg loss 0.340537, throughput 9.03492K wps
[Epoch 137 Batch 30/62] avg loss 0.00294234, throughput 9.0434K wps
[Epoch 137 Batch 60/62] avg loss 0.00303833, throughput 9.11376K wps
Begin Testing...
[Epoch 137] train avg loss 0.00304907, dev acc 0.8348, dev avg loss 0.340451, throughput 9.10689K wps
[Epoch 138 Batch 30/62] avg loss 0.00310732, throughput 8.96368K wps
[Epoch 138 Batch 60/62] avg loss 0.00307556, throughput 9.08825K wps
Begin Testing...
[Epoch 138] train avg loss 0.00308915, dev acc 0.8466, dev avg loss 0.350978, throughput 9.05759K wps
Observed Improvement.
Begin Testing...
[Epoch 139 Batch 30/62] avg loss 0.00305106, throughput 8.91017K wps
[Epoch 139 Batch 60/62] avg loss 0.00282471, throughput 9.09687K wps
Begin Testing...
[Epoch 139] train avg loss 0.00300681, dev acc 0.8289, dev avg loss 0.339391, throughput 9.02324K wps
[Epoch 140 Batch 30/62] avg loss 0.00317132, throughput 9.12939K wps
[Epoch 140 Batch 60/62] avg loss 0.00289278, throughput 8.50627K wps
Begin Testing...
[Epoch 140] train avg loss 0.0030511, dev acc 0.8437, dev avg loss 0.346451, throughput 8.80198K wps
[Epoch 141 Batch 30/62] avg loss 0.0030537, throughput 8.86392K wps
[Epoch 141 Batch 60/62] avg loss 0.00285895, throughput 9.1227K wps
Begin Testing...
[Epoch 141] train avg loss 0.00299521, dev acc 0.8348, dev avg loss 0.340203, throughput 8.99329K wps
[Epoch 142 Batch 30/62] avg loss 0.00290233, throughput 8.6602K wps
[Epoch 142 Batch 60/62] avg loss 0.0030162, throughput 9.09372K wps
Begin Testing...
[Epoch 142] train avg loss 0.00300214, dev acc 0.8378, dev avg loss 0.339925, throughput 8.90149K wps
[Epoch 143 Batch 30/62] avg loss 0.00299631, throughput 9.21145K wps
[Epoch 143 Batch 60/62] avg loss 0.00284045, throughput 9.01624K wps
Begin Testing...
[Epoch 143] train avg loss 0.00294818, dev acc 0.8466, dev avg loss 0.349732, throughput 9.14572K wps
Observed Improvement.
Begin Testing...
[Epoch 144 Batch 30/62] avg loss 0.00301802, throughput 8.72115K wps
[Epoch 144 Batch 60/62] avg loss 0.0026788, throughput 9.22617K wps
Begin Testing...
[Epoch 144] train avg loss 0.00286967, dev acc 0.8348, dev avg loss 0.341848, throughput 9.00198K wps
[Epoch 145 Batch 30/62] avg loss 0.00278344, throughput 9.02254K wps
[Epoch 145 Batch 60/62] avg loss 0.00284328, throughput 9.08539K wps
Begin Testing...
[Epoch 145] train avg loss 0.00284484, dev acc 0.8348, dev avg loss 0.341473, throughput 9.08335K wps
[Epoch 146 Batch 30/62] avg loss 0.0026617, throughput 9.09351K wps
[Epoch 146 Batch 60/62] avg loss 0.0029137, throughput 9.05816K wps
Begin Testing...
[Epoch 146] train avg loss 0.00283686, dev acc 0.8437, dev avg loss 0.346722, throughput 9.11229K wps
[Epoch 147 Batch 30/62] avg loss 0.00285514, throughput 9.3088K wps
[Epoch 147 Batch 60/62] avg loss 0.00270652, throughput 9.04726K wps
Begin Testing...
[Epoch 147] train avg loss 0.0028157, dev acc 0.8348, dev avg loss 0.342678, throughput 9.20295K wps
[Epoch 148 Batch 30/62] avg loss 0.00267712, throughput 9.12775K wps
[Epoch 148 Batch 60/62] avg loss 0.00269474, throughput 8.98285K wps
Begin Testing...
[Epoch 148] train avg loss 0.00275164, dev acc 0.8289, dev avg loss 0.338363, throughput 9.01149K wps
[Epoch 149 Batch 30/62] avg loss 0.00263934, throughput 9.0813K wps
[Epoch 149 Batch 60/62] avg loss 0.0027936, throughput 9.15006K wps
Begin Testing...
[Epoch 149] train avg loss 0.00280169, dev acc 0.8466, dev avg loss 0.339999, throughput 9.14121K wps
Observed Improvement.
Begin Testing...
[Epoch 150 Batch 30/62] avg loss 0.00291871, throughput 9.20415K wps
[Epoch 150 Batch 60/62] avg loss 0.00263413, throughput 9.13017K wps
Begin Testing...
[Epoch 150] train avg loss 0.00288392, dev acc 0.8378, dev avg loss 0.341258, throughput 9.19654K wps
[Epoch 151 Batch 30/62] avg loss 0.00257782, throughput 9.02718K wps
[Epoch 151 Batch 60/62] avg loss 0.00265198, throughput 8.89843K wps
Begin Testing...
[Epoch 151] train avg loss 0.00264953, dev acc 0.8407, dev avg loss 0.354789, throughput 8.94141K wps
[Epoch 152 Batch 30/62] avg loss 0.00259669, throughput 9.28844K wps
[Epoch 152 Batch 60/62] avg loss 0.00275454, throughput 9.13675K wps
Begin Testing...
[Epoch 152] train avg loss 0.00271768, dev acc 0.8407, dev avg loss 0.338718, throughput 9.23945K wps
[Epoch 153 Batch 30/62] avg loss 0.00264561, throughput 9.17775K wps
[Epoch 153 Batch 60/62] avg loss 0.00253495, throughput 8.96939K wps
Begin Testing...
[Epoch 153] train avg loss 0.00260525, dev acc 0.8319, dev avg loss 0.337026, throughput 9.10108K wps
[Epoch 154 Batch 30/62] avg loss 0.00257743, throughput 9.1652K wps
[Epoch 154 Batch 60/62] avg loss 0.00263132, throughput 9.07688K wps
Begin Testing...
[Epoch 154] train avg loss 0.00262224, dev acc 0.8466, dev avg loss 0.347727, throughput 9.14963K wps
Observed Improvement.
Begin Testing...
[Epoch 155 Batch 30/62] avg loss 0.00261711, throughput 8.9986K wps
[Epoch 155 Batch 60/62] avg loss 0.00257985, throughput 9.07937K wps
Begin Testing...
[Epoch 155] train avg loss 0.00263023, dev acc 0.8437, dev avg loss 0.347061, throughput 9.02049K wps
[Epoch 156 Batch 30/62] avg loss 0.00255334, throughput 8.83697K wps
[Epoch 156 Batch 60/62] avg loss 0.00242786, throughput 9.13101K wps
Begin Testing...
[Epoch 156] train avg loss 0.00252496, dev acc 0.8466, dev avg loss 0.337937, throughput 8.96212K wps
Observed Improvement.
Begin Testing...
[Epoch 157 Batch 30/62] avg loss 0.00263021, throughput 9.21664K wps
[Epoch 157 Batch 60/62] avg loss 0.00266399, throughput 8.97456K wps
Begin Testing...
[Epoch 157] train avg loss 0.00266916, dev acc 0.8319, dev avg loss 0.336356, throughput 9.12475K wps
[Epoch 158 Batch 30/62] avg loss 0.00232372, throughput 9.09187K wps
[Epoch 158 Batch 60/62] avg loss 0.00253377, throughput 9.06851K wps
Begin Testing...
[Epoch 158] train avg loss 0.00244985, dev acc 0.8466, dev avg loss 0.34479, throughput 9.11001K wps
Observed Improvement.
Begin Testing...
[Epoch 159 Batch 30/62] avg loss 0.00249882, throughput 8.81969K wps
[Epoch 159 Batch 60/62] avg loss 0.00246273, throughput 9.07185K wps
Begin Testing...
[Epoch 159] train avg loss 0.00252744, dev acc 0.8378, dev avg loss 0.336469, throughput 8.98011K wps
[Epoch 160 Batch 30/62] avg loss 0.00238445, throughput 8.96893K wps
[Epoch 160 Batch 60/62] avg loss 0.00250059, throughput 9.02993K wps
Begin Testing...
[Epoch 160] train avg loss 0.00245069, dev acc 0.8407, dev avg loss 0.340844, throughput 9.03382K wps
[Epoch 161 Batch 30/62] avg loss 0.00226284, throughput 9.08619K wps
[Epoch 161 Batch 60/62] avg loss 0.00259374, throughput 9.01593K wps
Begin Testing...
[Epoch 161] train avg loss 0.00246049, dev acc 0.8407, dev avg loss 0.340507, throughput 9.08171K wps
[Epoch 162 Batch 30/62] avg loss 0.00227335, throughput 8.92947K wps
[Epoch 162 Batch 60/62] avg loss 0.0024303, throughput 9.09729K wps
Begin Testing...
[Epoch 162] train avg loss 0.00240827, dev acc 0.8437, dev avg loss 0.337274, throughput 9.04991K wps
[Epoch 163 Batch 30/62] avg loss 0.00241276, throughput 9.01998K wps
[Epoch 163 Batch 60/62] avg loss 0.00249546, throughput 8.74781K wps
Begin Testing...
[Epoch 163] train avg loss 0.00251, dev acc 0.8437, dev avg loss 0.347872, throughput 8.84971K wps
[Epoch 164 Batch 30/62] avg loss 0.00224551, throughput 9.07855K wps
[Epoch 164 Batch 60/62] avg loss 0.00244826, throughput 8.86302K wps
Begin Testing...
[Epoch 164] train avg loss 0.00237606, dev acc 0.8378, dev avg loss 0.341782, throughput 8.97974K wps
[Epoch 165 Batch 30/62] avg loss 0.00238723, throughput 9.01599K wps
[Epoch 165 Batch 60/62] avg loss 0.00233817, throughput 9.09881K wps
Begin Testing...
[Epoch 165] train avg loss 0.00240469, dev acc 0.8437, dev avg loss 0.347132, throughput 9.0885K wps
[Epoch 166 Batch 30/62] avg loss 0.00225595, throughput 8.80129K wps
[Epoch 166 Batch 60/62] avg loss 0.00216735, throughput 9.05337K wps
Begin Testing...
[Epoch 166] train avg loss 0.00225836, dev acc 0.8407, dev avg loss 0.340175, throughput 8.9595K wps
[Epoch 167 Batch 30/62] avg loss 0.00223152, throughput 8.88285K wps
[Epoch 167 Batch 60/62] avg loss 0.00230865, throughput 8.8221K wps
Begin Testing...
[Epoch 167] train avg loss 0.00229585, dev acc 0.8407, dev avg loss 0.337349, throughput 8.83849K wps
[Epoch 168 Batch 30/62] avg loss 0.00225276, throughput 8.95173K wps
[Epoch 168 Batch 60/62] avg loss 0.00233294, throughput 9.18197K wps
Begin Testing...
[Epoch 168] train avg loss 0.00231475, dev acc 0.8407, dev avg loss 0.342716, throughput 9.07176K wps
[Epoch 169 Batch 30/62] avg loss 0.00226728, throughput 9.12546K wps
[Epoch 169 Batch 60/62] avg loss 0.00223574, throughput 9.05783K wps
Begin Testing...
[Epoch 169] train avg loss 0.00228085, dev acc 0.8407, dev avg loss 0.344311, throughput 9.1193K wps
[Epoch 170 Batch 30/62] avg loss 0.0021914, throughput 9.20211K wps
[Epoch 170 Batch 60/62] avg loss 0.0022677, throughput 9.0415K wps
Begin Testing...
[Epoch 170] train avg loss 0.0022607, dev acc 0.8407, dev avg loss 0.354327, throughput 9.1488K wps
[Epoch 171 Batch 30/62] avg loss 0.00225871, throughput 9.15158K wps
[Epoch 171 Batch 60/62] avg loss 0.00216477, throughput 8.96611K wps
Begin Testing...
[Epoch 171] train avg loss 0.00227025, dev acc 0.8407, dev avg loss 0.33667, throughput 9.04102K wps
[Epoch 172 Batch 30/62] avg loss 0.00222716, throughput 9.2332K wps
[Epoch 172 Batch 60/62] avg loss 0.00218259, throughput 9.10924K wps
Begin Testing...
[Epoch 172] train avg loss 0.00223988, dev acc 0.8319, dev avg loss 0.335795, throughput 9.19749K wps
[Epoch 173 Batch 30/62] avg loss 0.00219348, throughput 8.79468K wps
[Epoch 173 Batch 60/62] avg loss 0.00218644, throughput 8.85998K wps
Begin Testing...
[Epoch 173] train avg loss 0.00221128, dev acc 0.8407, dev avg loss 0.340362, throughput 8.81129K wps
[Epoch 174 Batch 30/62] avg loss 0.0021985, throughput 9.13799K wps
[Epoch 174 Batch 60/62] avg loss 0.00216496, throughput 8.85344K wps
Begin Testing...
[Epoch 174] train avg loss 0.00222368, dev acc 0.8437, dev avg loss 0.345065, throughput 8.97083K wps
[Epoch 175 Batch 30/62] avg loss 0.00214489, throughput 9.17486K wps
[Epoch 175 Batch 60/62] avg loss 0.00207272, throughput 9.01513K wps
Begin Testing...
[Epoch 175] train avg loss 0.00214317, dev acc 0.8466, dev avg loss 0.336333, throughput 9.11667K wps
Observed Improvement.
Begin Testing...
[Epoch 176 Batch 30/62] avg loss 0.00220455, throughput 8.86707K wps
[Epoch 176 Batch 60/62] avg loss 0.0020756, throughput 9.00128K wps
Begin Testing...
[Epoch 176] train avg loss 0.00213574, dev acc 0.8407, dev avg loss 0.340675, throughput 8.89682K wps
[Epoch 177 Batch 30/62] avg loss 0.00203908, throughput 9.03115K wps
[Epoch 177 Batch 60/62] avg loss 0.00196996, throughput 8.81323K wps
Begin Testing...
[Epoch 177] train avg loss 0.00201673, dev acc 0.8407, dev avg loss 0.335901, throughput 8.98208K wps
[Epoch 178 Batch 30/62] avg loss 0.00202572, throughput 9.26795K wps
[Epoch 178 Batch 60/62] avg loss 0.0021618, throughput 9.09562K wps
Begin Testing...
[Epoch 178] train avg loss 0.00212264, dev acc 0.8348, dev avg loss 0.362489, throughput 9.20257K wps
[Epoch 179 Batch 30/62] avg loss 0.0021771, throughput 9.24237K wps
[Epoch 179 Batch 60/62] avg loss 0.00215742, throughput 8.92611K wps
Begin Testing...
[Epoch 179] train avg loss 0.00218282, dev acc 0.8466, dev avg loss 0.338839, throughput 9.07367K wps
Observed Improvement.
Begin Testing...
[Epoch 180 Batch 30/62] avg loss 0.00201552, throughput 8.82306K wps
[Epoch 180 Batch 60/62] avg loss 0.00216094, throughput 8.55856K wps
Begin Testing...
[Epoch 180] train avg loss 0.00212008, dev acc 0.8466, dev avg loss 0.353116, throughput 8.6772K wps
Observed Improvement.
Begin Testing...
[Epoch 181 Batch 30/62] avg loss 0.00206961, throughput 8.98848K wps
[Epoch 181 Batch 60/62] avg loss 0.00197383, throughput 9.00882K wps
Begin Testing...
[Epoch 181] train avg loss 0.00205406, dev acc 0.8496, dev avg loss 0.347702, throughput 9.02911K wps
Observed Improvement.
Begin Testing...
[Epoch 182 Batch 30/62] avg loss 0.00205716, throughput 9.07901K wps
[Epoch 182 Batch 60/62] avg loss 0.00200492, throughput 9.03199K wps
Begin Testing...
[Epoch 182] train avg loss 0.00203158, dev acc 0.8466, dev avg loss 0.339381, throughput 9.08573K wps
[Epoch 183 Batch 30/62] avg loss 0.00199312, throughput 9.16686K wps
[Epoch 183 Batch 60/62] avg loss 0.00197854, throughput 9.03358K wps
Begin Testing...
[Epoch 183] train avg loss 0.00202475, dev acc 0.8437, dev avg loss 0.343715, throughput 9.13086K wps
[Epoch 184 Batch 30/62] avg loss 0.00196027, throughput 9.11016K wps
[Epoch 184 Batch 60/62] avg loss 0.0020647, throughput 9.12749K wps
Begin Testing...
[Epoch 184] train avg loss 0.0020365, dev acc 0.8525, dev avg loss 0.348846, throughput 9.1498K wps
Observed Improvement.
Begin Testing...
[Epoch 185 Batch 30/62] avg loss 0.00196135, throughput 9.05966K wps
[Epoch 185 Batch 60/62] avg loss 0.00198564, throughput 9.03872K wps
Begin Testing...
[Epoch 185] train avg loss 0.00198908, dev acc 0.8466, dev avg loss 0.340833, throughput 9.09952K wps
[Epoch 186 Batch 30/62] avg loss 0.00198789, throughput 9.25769K wps
[Epoch 186 Batch 60/62] avg loss 0.00192559, throughput 8.94345K wps
Begin Testing...
[Epoch 186] train avg loss 0.00200395, dev acc 0.8348, dev avg loss 0.357299, throughput 9.06504K wps
[Epoch 187 Batch 30/62] avg loss 0.00178949, throughput 9.15658K wps
[Epoch 187 Batch 60/62] avg loss 0.00204953, throughput 8.97132K wps
Begin Testing...
[Epoch 187] train avg loss 0.00192395, dev acc 0.8437, dev avg loss 0.344433, throughput 9.09354K wps
[Epoch 188 Batch 30/62] avg loss 0.00194225, throughput 8.90155K wps
[Epoch 188 Batch 60/62] avg loss 0.00200584, throughput 8.56665K wps
Begin Testing...
[Epoch 188] train avg loss 0.00199033, dev acc 0.8437, dev avg loss 0.34257, throughput 8.7633K wps
[Epoch 189 Batch 30/62] avg loss 0.00181668, throughput 9.12783K wps
[Epoch 189 Batch 60/62] avg loss 0.00189868, throughput 8.90609K wps
Begin Testing...
[Epoch 189] train avg loss 0.00188368, dev acc 0.8466, dev avg loss 0.338706, throughput 9.04735K wps
[Epoch 190 Batch 30/62] avg loss 0.00179382, throughput 9.08399K wps
[Epoch 190 Batch 60/62] avg loss 0.0019288, throughput 8.85648K wps
Begin Testing...
[Epoch 190] train avg loss 0.00189245, dev acc 0.8496, dev avg loss 0.336283, throughput 8.94565K wps
[Epoch 191 Batch 30/62] avg loss 0.0018485, throughput 9.1996K wps
[Epoch 191 Batch 60/62] avg loss 0.00191301, throughput 8.93307K wps
Begin Testing...
[Epoch 191] train avg loss 0.00188831, dev acc 0.8496, dev avg loss 0.339001, throughput 9.1143K wps
[Epoch 192 Batch 30/62] avg loss 0.00182122, throughput 9.08226K wps
[Epoch 192 Batch 60/62] avg loss 0.00180172, throughput 8.80605K wps
Begin Testing...
[Epoch 192] train avg loss 0.00184312, dev acc 0.8437, dev avg loss 0.344838, throughput 8.97658K wps
[Epoch 193 Batch 30/62] avg loss 0.00186689, throughput 9.18007K wps
[Epoch 193 Batch 60/62] avg loss 0.00176919, throughput 8.79902K wps
Begin Testing...
[Epoch 193] train avg loss 0.00184223, dev acc 0.8466, dev avg loss 0.347023, throughput 8.96205K wps
[Epoch 194 Batch 30/62] avg loss 0.00179645, throughput 8.76005K wps
[Epoch 194 Batch 60/62] avg loss 0.00187494, throughput 9.19254K wps
Begin Testing...
[Epoch 194] train avg loss 0.00192386, dev acc 0.8496, dev avg loss 0.337444, throughput 9.00312K wps
[Epoch 195 Batch 30/62] avg loss 0.00174098, throughput 9.09248K wps
[Epoch 195 Batch 60/62] avg loss 0.0017462, throughput 9.01329K wps
Begin Testing...
[Epoch 195] train avg loss 0.00175838, dev acc 0.8466, dev avg loss 0.347724, throughput 9.08337K wps
[Epoch 196 Batch 30/62] avg loss 0.0018407, throughput 9.03431K wps
[Epoch 196 Batch 60/62] avg loss 0.00174596, throughput 8.88008K wps
Begin Testing...
[Epoch 196] train avg loss 0.00180454, dev acc 0.8496, dev avg loss 0.339631, throughput 8.94706K wps
[Epoch 197 Batch 30/62] avg loss 0.00171026, throughput 9.19819K wps
[Epoch 197 Batch 60/62] avg loss 0.00180913, throughput 8.8485K wps
Begin Testing...
[Epoch 197] train avg loss 0.00181191, dev acc 0.8319, dev avg loss 0.368469, throughput 9.00287K wps
[Epoch 198 Batch 30/62] avg loss 0.00176073, throughput 9.11862K wps
[Epoch 198 Batch 60/62] avg loss 0.00171634, throughput 8.88402K wps
Begin Testing...
[Epoch 198] train avg loss 0.00174871, dev acc 0.8466, dev avg loss 0.352894, throughput 9.03126K wps
[Epoch 199 Batch 30/62] avg loss 0.0017788, throughput 9.00611K wps
[Epoch 199 Batch 60/62] avg loss 0.00168174, throughput 8.71902K wps
Begin Testing...
[Epoch 199] train avg loss 0.00178074, dev acc 0.8466, dev avg loss 0.337593, throughput 8.89407K wps
Test loss 0.368299, test acc 0.8143
Total time cost 155.84s
[Epoch 0 Batch 30/62] avg loss 0.0136328, throughput 8.22858K wps
[Epoch 0 Batch 60/62] avg loss 0.0131115, throughput 9.0235K wps
Begin Testing...
[Epoch 0] train avg loss 0.0135302, dev acc 0.6254, dev avg loss 0.663171, throughput 8.65639K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130239, throughput 9.30222K wps
[Epoch 1 Batch 60/62] avg loss 0.0133158, throughput 9.13792K wps
Begin Testing...
[Epoch 1] train avg loss 0.0133352, dev acc 0.6254, dev avg loss 0.657406, throughput 9.24849K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0130798, throughput 9.11708K wps
[Epoch 2 Batch 60/62] avg loss 0.0130078, throughput 8.83944K wps
Begin Testing...
[Epoch 2] train avg loss 0.013258, dev acc 0.6254, dev avg loss 0.652956, throughput 8.99837K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0127757, throughput 9.0374K wps
[Epoch 3 Batch 60/62] avg loss 0.0129751, throughput 9.00788K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130485, dev acc 0.6254, dev avg loss 0.648033, throughput 9.05679K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0128826, throughput 8.81179K wps
[Epoch 4 Batch 60/62] avg loss 0.0126724, throughput 9.09578K wps
Begin Testing...
[Epoch 4] train avg loss 0.0129361, dev acc 0.6254, dev avg loss 0.643695, throughput 8.93446K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0128013, throughput 9.12908K wps
[Epoch 5 Batch 60/62] avg loss 0.0127141, throughput 8.73358K wps
Begin Testing...
[Epoch 5] train avg loss 0.0129517, dev acc 0.6254, dev avg loss 0.639279, throughput 8.91103K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0124382, throughput 9.15215K wps
[Epoch 6 Batch 60/62] avg loss 0.0126861, throughput 9.04294K wps
Begin Testing...
[Epoch 6] train avg loss 0.0127116, dev acc 0.6254, dev avg loss 0.634175, throughput 9.12752K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0126126, throughput 9.19448K wps
[Epoch 7 Batch 60/62] avg loss 0.012463, throughput 9.08318K wps
Begin Testing...
[Epoch 7] train avg loss 0.0126884, dev acc 0.6254, dev avg loss 0.631601, throughput 9.13098K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0124368, throughput 8.99204K wps
[Epoch 8 Batch 60/62] avg loss 0.0125757, throughput 9.17987K wps
Begin Testing...
[Epoch 8] train avg loss 0.0126765, dev acc 0.6254, dev avg loss 0.625107, throughput 9.0653K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0122653, throughput 8.97825K wps
[Epoch 9 Batch 60/62] avg loss 0.0123805, throughput 8.56323K wps
Begin Testing...
[Epoch 9] train avg loss 0.0124944, dev acc 0.6254, dev avg loss 0.619844, throughput 8.7444K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.0121578, throughput 8.71941K wps
[Epoch 10 Batch 60/62] avg loss 0.0124273, throughput 9.08362K wps
Begin Testing...
[Epoch 10] train avg loss 0.0124256, dev acc 0.6254, dev avg loss 0.615619, throughput 8.93242K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0122255, throughput 9.22503K wps
[Epoch 11 Batch 60/62] avg loss 0.0119896, throughput 9.08635K wps
Begin Testing...
[Epoch 11] train avg loss 0.0122159, dev acc 0.6254, dev avg loss 0.612145, throughput 9.18546K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0119478, throughput 8.9703K wps
[Epoch 12 Batch 60/62] avg loss 0.0120524, throughput 8.74097K wps
Begin Testing...
[Epoch 12] train avg loss 0.0121495, dev acc 0.6254, dev avg loss 0.60488, throughput 8.88999K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0121033, throughput 9.22975K wps
[Epoch 13 Batch 60/62] avg loss 0.0117445, throughput 8.7853K wps
Begin Testing...
[Epoch 13] train avg loss 0.0121273, dev acc 0.6283, dev avg loss 0.598943, throughput 8.96965K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.011978, throughput 8.99757K wps
[Epoch 14 Batch 60/62] avg loss 0.0116692, throughput 9.09527K wps
Begin Testing...
[Epoch 14] train avg loss 0.0119719, dev acc 0.6283, dev avg loss 0.593686, throughput 9.05377K wps
Observed Improvement.
Begin Testing...
[Epoch 15 Batch 30/62] avg loss 0.0118371, throughput 9.10798K wps
[Epoch 15 Batch 60/62] avg loss 0.011456, throughput 8.67842K wps
Begin Testing...
[Epoch 15] train avg loss 0.0117392, dev acc 0.6254, dev avg loss 0.591204, throughput 8.96044K wps
[Epoch 16 Batch 30/62] avg loss 0.0116537, throughput 9.16814K wps
[Epoch 16 Batch 60/62] avg loss 0.0113116, throughput 9.08261K wps
Begin Testing...
[Epoch 16] train avg loss 0.0116547, dev acc 0.6667, dev avg loss 0.58053, throughput 9.1547K wps
Observed Improvement.
Begin Testing...
[Epoch 17 Batch 30/62] avg loss 0.0113985, throughput 8.91335K wps
[Epoch 17 Batch 60/62] avg loss 0.0114107, throughput 9.12613K wps
Begin Testing...
[Epoch 17] train avg loss 0.0115374, dev acc 0.6578, dev avg loss 0.574691, throughput 9.05143K wps
[Epoch 18 Batch 30/62] avg loss 0.0114437, throughput 9.16163K wps
[Epoch 18 Batch 60/62] avg loss 0.0112356, throughput 8.72919K wps
Begin Testing...
[Epoch 18] train avg loss 0.0115052, dev acc 0.6696, dev avg loss 0.568687, throughput 8.95155K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0113048, throughput 8.80779K wps
[Epoch 19 Batch 60/62] avg loss 0.0111572, throughput 8.95453K wps
Begin Testing...
[Epoch 19] train avg loss 0.0113765, dev acc 0.6667, dev avg loss 0.563629, throughput 8.85606K wps
[Epoch 20 Batch 30/62] avg loss 0.0109777, throughput 9.22499K wps
[Epoch 20 Batch 60/62] avg loss 0.011039, throughput 8.75828K wps
Begin Testing...
[Epoch 20] train avg loss 0.0111431, dev acc 0.6755, dev avg loss 0.555386, throughput 8.95634K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0110732, throughput 9.26524K wps
[Epoch 21 Batch 60/62] avg loss 0.0108945, throughput 9.02815K wps
Begin Testing...
[Epoch 21] train avg loss 0.0111097, dev acc 0.6755, dev avg loss 0.549011, throughput 9.15465K wps
Observed Improvement.
Begin Testing...
[Epoch 22 Batch 30/62] avg loss 0.0106881, throughput 8.85171K wps
[Epoch 22 Batch 60/62] avg loss 0.0109782, throughput 9.02974K wps
Begin Testing...
[Epoch 22] train avg loss 0.0109716, dev acc 0.7021, dev avg loss 0.542735, throughput 8.97923K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.0105794, throughput 9.13783K wps
[Epoch 23 Batch 60/62] avg loss 0.0107781, throughput 9.17146K wps
Begin Testing...
[Epoch 23] train avg loss 0.0108151, dev acc 0.6814, dev avg loss 0.537173, throughput 9.18546K wps
[Epoch 24 Batch 30/62] avg loss 0.0107513, throughput 9.04415K wps
[Epoch 24 Batch 60/62] avg loss 0.0104695, throughput 8.72639K wps
Begin Testing...
[Epoch 24] train avg loss 0.010676, dev acc 0.6814, dev avg loss 0.533988, throughput 8.86453K wps
[Epoch 25 Batch 30/62] avg loss 0.0104023, throughput 8.8475K wps
[Epoch 25 Batch 60/62] avg loss 0.0102806, throughput 9.03308K wps
Begin Testing...
[Epoch 25] train avg loss 0.0104495, dev acc 0.7021, dev avg loss 0.524488, throughput 8.97472K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.0102921, throughput 8.94864K wps
[Epoch 26 Batch 60/62] avg loss 0.0102537, throughput 9.088K wps
Begin Testing...
[Epoch 26] train avg loss 0.0103918, dev acc 0.6844, dev avg loss 0.521078, throughput 8.97506K wps
[Epoch 27 Batch 30/62] avg loss 0.0102633, throughput 9.03782K wps
[Epoch 27 Batch 60/62] avg loss 0.0100035, throughput 9.16244K wps
Begin Testing...
[Epoch 27] train avg loss 0.0102772, dev acc 0.6844, dev avg loss 0.515522, throughput 9.12865K wps
[Epoch 28 Batch 30/62] avg loss 0.0101077, throughput 8.83166K wps
[Epoch 28 Batch 60/62] avg loss 0.0099157, throughput 9.09947K wps
Begin Testing...
[Epoch 28] train avg loss 0.010199, dev acc 0.7316, dev avg loss 0.50744, throughput 8.994K wps
Observed Improvement.
Begin Testing...
[Epoch 29 Batch 30/62] avg loss 0.00993701, throughput 8.92699K wps
[Epoch 29 Batch 60/62] avg loss 0.00988887, throughput 9.20553K wps
Begin Testing...
[Epoch 29] train avg loss 0.0101038, dev acc 0.7345, dev avg loss 0.503028, throughput 9.09084K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00976211, throughput 9.02075K wps
[Epoch 30 Batch 60/62] avg loss 0.00977113, throughput 8.82242K wps
Begin Testing...
[Epoch 30] train avg loss 0.00998921, dev acc 0.7640, dev avg loss 0.496352, throughput 8.90295K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00967761, throughput 9.17339K wps
[Epoch 31 Batch 60/62] avg loss 0.00970636, throughput 8.97094K wps
Begin Testing...
[Epoch 31] train avg loss 0.00982396, dev acc 0.7729, dev avg loss 0.491757, throughput 9.1045K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.00990996, throughput 8.97455K wps
[Epoch 32 Batch 60/62] avg loss 0.00937076, throughput 8.82142K wps
Begin Testing...
[Epoch 32] train avg loss 0.00983011, dev acc 0.7611, dev avg loss 0.487132, throughput 8.88972K wps
[Epoch 33 Batch 30/62] avg loss 0.00962463, throughput 9.21588K wps
[Epoch 33 Batch 60/62] avg loss 0.00920424, throughput 9.13207K wps
Begin Testing...
[Epoch 33] train avg loss 0.00950208, dev acc 0.7345, dev avg loss 0.48652, throughput 9.20348K wps
[Epoch 34 Batch 30/62] avg loss 0.00942144, throughput 9.06209K wps
[Epoch 34 Batch 60/62] avg loss 0.00948356, throughput 8.95003K wps
Begin Testing...
[Epoch 34] train avg loss 0.00955586, dev acc 0.7611, dev avg loss 0.479194, throughput 8.9855K wps
[Epoch 35 Batch 30/62] avg loss 0.00917677, throughput 8.88251K wps
[Epoch 35 Batch 60/62] avg loss 0.00919138, throughput 9.05088K wps
Begin Testing...
[Epoch 35] train avg loss 0.00937117, dev acc 0.7670, dev avg loss 0.474371, throughput 8.94755K wps
[Epoch 36 Batch 30/62] avg loss 0.00908051, throughput 9.22353K wps
[Epoch 36 Batch 60/62] avg loss 0.00916192, throughput 8.7984K wps
Begin Testing...
[Epoch 36] train avg loss 0.00927219, dev acc 0.7699, dev avg loss 0.471392, throughput 9.02917K wps
[Epoch 37 Batch 30/62] avg loss 0.00890883, throughput 9.26796K wps
[Epoch 37 Batch 60/62] avg loss 0.00914316, throughput 9.12816K wps
Begin Testing...
[Epoch 37] train avg loss 0.00908192, dev acc 0.7729, dev avg loss 0.467833, throughput 9.20855K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.00896568, throughput 8.96085K wps
[Epoch 38 Batch 60/62] avg loss 0.00873255, throughput 8.96713K wps
Begin Testing...
[Epoch 38] train avg loss 0.00897665, dev acc 0.7699, dev avg loss 0.462886, throughput 8.9986K wps
[Epoch 39 Batch 30/62] avg loss 0.00879706, throughput 9.19293K wps
[Epoch 39 Batch 60/62] avg loss 0.00897729, throughput 9.08923K wps
Begin Testing...
[Epoch 39] train avg loss 0.00902332, dev acc 0.7758, dev avg loss 0.459219, throughput 9.17024K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00877282, throughput 8.8797K wps
[Epoch 40 Batch 60/62] avg loss 0.00882634, throughput 9.0701K wps
Begin Testing...
[Epoch 40] train avg loss 0.0090086, dev acc 0.7847, dev avg loss 0.456336, throughput 9.00398K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00885857, throughput 9.10814K wps
[Epoch 41 Batch 60/62] avg loss 0.00833636, throughput 8.84995K wps
Begin Testing...
[Epoch 41] train avg loss 0.00877863, dev acc 0.7788, dev avg loss 0.453478, throughput 9.01219K wps
[Epoch 42 Batch 30/62] avg loss 0.00848883, throughput 8.86941K wps
[Epoch 42 Batch 60/62] avg loss 0.00853936, throughput 8.88615K wps
Begin Testing...
[Epoch 42] train avg loss 0.00864274, dev acc 0.7906, dev avg loss 0.450586, throughput 8.91197K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.00825694, throughput 9.17121K wps
[Epoch 43 Batch 60/62] avg loss 0.00878631, throughput 9.02721K wps
Begin Testing...
[Epoch 43] train avg loss 0.00861186, dev acc 0.7847, dev avg loss 0.44755, throughput 9.12656K wps
[Epoch 44 Batch 30/62] avg loss 0.0081197, throughput 9.11208K wps
[Epoch 44 Batch 60/62] avg loss 0.00843945, throughput 9.09486K wps
Begin Testing...
[Epoch 44] train avg loss 0.0083816, dev acc 0.7847, dev avg loss 0.445139, throughput 9.13118K wps
[Epoch 45 Batch 30/62] avg loss 0.0083521, throughput 9.10259K wps
[Epoch 45 Batch 60/62] avg loss 0.00825839, throughput 8.9692K wps
Begin Testing...
[Epoch 45] train avg loss 0.00836073, dev acc 0.7906, dev avg loss 0.44233, throughput 9.01636K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.00827285, throughput 9.15405K wps
[Epoch 46 Batch 60/62] avg loss 0.00802702, throughput 8.71901K wps
Begin Testing...
[Epoch 46] train avg loss 0.0082255, dev acc 0.7847, dev avg loss 0.440746, throughput 8.89665K wps
[Epoch 47 Batch 30/62] avg loss 0.00812331, throughput 8.92979K wps
[Epoch 47 Batch 60/62] avg loss 0.00798371, throughput 9.24517K wps
Begin Testing...
[Epoch 47] train avg loss 0.00816436, dev acc 0.7876, dev avg loss 0.438487, throughput 9.07286K wps
[Epoch 48 Batch 30/62] avg loss 0.0080216, throughput 9.29954K wps
[Epoch 48 Batch 60/62] avg loss 0.00801904, throughput 9.04538K wps
Begin Testing...
[Epoch 48] train avg loss 0.00814031, dev acc 0.7965, dev avg loss 0.435142, throughput 9.13186K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/62] avg loss 0.00792067, throughput 9.23861K wps
[Epoch 49 Batch 60/62] avg loss 0.00792486, throughput 8.86146K wps
Begin Testing...
[Epoch 49] train avg loss 0.00808935, dev acc 0.7965, dev avg loss 0.433712, throughput 9.06872K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00784451, throughput 9.07222K wps
[Epoch 50 Batch 60/62] avg loss 0.00800082, throughput 8.7984K wps
Begin Testing...
[Epoch 50] train avg loss 0.00804001, dev acc 0.7817, dev avg loss 0.433848, throughput 8.96886K wps
[Epoch 51 Batch 30/62] avg loss 0.00803324, throughput 8.6848K wps
[Epoch 51 Batch 60/62] avg loss 0.00752392, throughput 9.15741K wps
Begin Testing...
[Epoch 51] train avg loss 0.00782483, dev acc 0.7994, dev avg loss 0.43026, throughput 8.90644K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00776276, throughput 9.11457K wps
[Epoch 52 Batch 60/62] avg loss 0.00762901, throughput 9.0212K wps
Begin Testing...
[Epoch 52] train avg loss 0.00778919, dev acc 0.7965, dev avg loss 0.42862, throughput 9.09146K wps
[Epoch 53 Batch 30/62] avg loss 0.00756719, throughput 8.90842K wps
[Epoch 53 Batch 60/62] avg loss 0.00779115, throughput 8.87519K wps
Begin Testing...
[Epoch 53] train avg loss 0.00777381, dev acc 0.8083, dev avg loss 0.426567, throughput 8.92771K wps
Observed Improvement.
Begin Testing...
[Epoch 54 Batch 30/62] avg loss 0.00773794, throughput 9.25315K wps
[Epoch 54 Batch 60/62] avg loss 0.00741894, throughput 9.05761K wps
Begin Testing...
[Epoch 54] train avg loss 0.00767355, dev acc 0.7965, dev avg loss 0.424613, throughput 9.1774K wps
[Epoch 55 Batch 30/62] avg loss 0.00727366, throughput 9.13237K wps
[Epoch 55 Batch 60/62] avg loss 0.00768577, throughput 9.01812K wps
Begin Testing...
[Epoch 55] train avg loss 0.00755537, dev acc 0.7965, dev avg loss 0.422074, throughput 9.09168K wps
[Epoch 56 Batch 30/62] avg loss 0.00751136, throughput 9.06938K wps
[Epoch 56 Batch 60/62] avg loss 0.0072694, throughput 9.03543K wps
Begin Testing...
[Epoch 56] train avg loss 0.00744633, dev acc 0.7935, dev avg loss 0.420436, throughput 9.07988K wps
[Epoch 57 Batch 30/62] avg loss 0.00734448, throughput 9.25255K wps
[Epoch 57 Batch 60/62] avg loss 0.0072255, throughput 8.81133K wps
Begin Testing...
[Epoch 57] train avg loss 0.0073243, dev acc 0.7994, dev avg loss 0.42015, throughput 9.06061K wps
[Epoch 58 Batch 30/62] avg loss 0.00724954, throughput 9.23597K wps
[Epoch 58 Batch 60/62] avg loss 0.00718177, throughput 9.03739K wps
Begin Testing...
[Epoch 58] train avg loss 0.00731448, dev acc 0.7965, dev avg loss 0.418265, throughput 9.16683K wps
[Epoch 59 Batch 30/62] avg loss 0.00695976, throughput 9.2822K wps
[Epoch 59 Batch 60/62] avg loss 0.0072812, throughput 9.08259K wps
Begin Testing...
[Epoch 59] train avg loss 0.00718014, dev acc 0.8083, dev avg loss 0.415689, throughput 9.202K wps
Observed Improvement.
Begin Testing...
[Epoch 60 Batch 30/62] avg loss 0.00700171, throughput 9.06316K wps
[Epoch 60 Batch 60/62] avg loss 0.00730277, throughput 9.06445K wps
Begin Testing...
[Epoch 60] train avg loss 0.00729117, dev acc 0.7965, dev avg loss 0.418558, throughput 9.08782K wps
[Epoch 61 Batch 30/62] avg loss 0.00710705, throughput 9.23598K wps
[Epoch 61 Batch 60/62] avg loss 0.00706125, throughput 9.15527K wps
Begin Testing...
[Epoch 61] train avg loss 0.00716318, dev acc 0.7994, dev avg loss 0.413646, throughput 9.16719K wps
[Epoch 62 Batch 30/62] avg loss 0.00692872, throughput 8.95193K wps
[Epoch 62 Batch 60/62] avg loss 0.00680014, throughput 8.88681K wps
Begin Testing...
[Epoch 62] train avg loss 0.00705917, dev acc 0.8142, dev avg loss 0.411658, throughput 8.95246K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/62] avg loss 0.00691219, throughput 8.76216K wps
[Epoch 63 Batch 60/62] avg loss 0.00674897, throughput 8.99857K wps
Begin Testing...
[Epoch 63] train avg loss 0.00696739, dev acc 0.8201, dev avg loss 0.413156, throughput 8.91547K wps
Observed Improvement.
Begin Testing...
[Epoch 64 Batch 30/62] avg loss 0.00671085, throughput 9.08047K wps
[Epoch 64 Batch 60/62] avg loss 0.00687222, throughput 8.71424K wps
Begin Testing...
[Epoch 64] train avg loss 0.00686046, dev acc 0.8171, dev avg loss 0.409372, throughput 8.89879K wps
[Epoch 65 Batch 30/62] avg loss 0.00666678, throughput 8.89837K wps
[Epoch 65 Batch 60/62] avg loss 0.00660849, throughput 8.95841K wps
Begin Testing...
[Epoch 65] train avg loss 0.00675619, dev acc 0.8201, dev avg loss 0.409194, throughput 8.9663K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.0067596, throughput 8.97637K wps
[Epoch 66 Batch 60/62] avg loss 0.0067423, throughput 9.07003K wps
Begin Testing...
[Epoch 66] train avg loss 0.00693733, dev acc 0.7965, dev avg loss 0.409123, throughput 9.04351K wps
[Epoch 67 Batch 30/62] avg loss 0.00674431, throughput 9.04777K wps
[Epoch 67 Batch 60/62] avg loss 0.00643335, throughput 9.06842K wps
Begin Testing...
[Epoch 67] train avg loss 0.00667999, dev acc 0.7965, dev avg loss 0.408978, throughput 9.08168K wps
[Epoch 68 Batch 30/62] avg loss 0.0065159, throughput 9.27939K wps
[Epoch 68 Batch 60/62] avg loss 0.00660666, throughput 8.98664K wps
Begin Testing...
[Epoch 68] train avg loss 0.00664749, dev acc 0.8230, dev avg loss 0.404896, throughput 9.0879K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/62] avg loss 0.00671887, throughput 9.1297K wps
[Epoch 69 Batch 60/62] avg loss 0.00614632, throughput 9.09558K wps
Begin Testing...
[Epoch 69] train avg loss 0.00656097, dev acc 0.8201, dev avg loss 0.403897, throughput 9.075K wps
[Epoch 70 Batch 30/62] avg loss 0.00616338, throughput 8.93017K wps
[Epoch 70 Batch 60/62] avg loss 0.00664068, throughput 8.92976K wps
Begin Testing...
[Epoch 70] train avg loss 0.00645626, dev acc 0.8201, dev avg loss 0.402601, throughput 8.90718K wps
[Epoch 71 Batch 30/62] avg loss 0.00625665, throughput 9.11956K wps
[Epoch 71 Batch 60/62] avg loss 0.00639454, throughput 8.63248K wps
Begin Testing...
[Epoch 71] train avg loss 0.00645094, dev acc 0.8201, dev avg loss 0.401291, throughput 8.9031K wps
[Epoch 72 Batch 30/62] avg loss 0.0063296, throughput 8.93764K wps
[Epoch 72 Batch 60/62] avg loss 0.00629473, throughput 9.10511K wps
Begin Testing...
[Epoch 72] train avg loss 0.00637585, dev acc 0.8083, dev avg loss 0.403027, throughput 9.0518K wps
[Epoch 73 Batch 30/62] avg loss 0.00598838, throughput 9.17423K wps
[Epoch 73 Batch 60/62] avg loss 0.00625499, throughput 9.0908K wps
Begin Testing...
[Epoch 73] train avg loss 0.00619441, dev acc 0.8112, dev avg loss 0.403296, throughput 9.15513K wps
[Epoch 74 Batch 30/62] avg loss 0.0062505, throughput 9.33471K wps
[Epoch 74 Batch 60/62] avg loss 0.00606887, throughput 9.10722K wps
Begin Testing...
[Epoch 74] train avg loss 0.00619534, dev acc 0.8171, dev avg loss 0.40074, throughput 9.18371K wps
[Epoch 75 Batch 30/62] avg loss 0.00586628, throughput 9.24571K wps
[Epoch 75 Batch 60/62] avg loss 0.00594895, throughput 9.08136K wps
Begin Testing...
[Epoch 75] train avg loss 0.00599046, dev acc 0.8171, dev avg loss 0.398517, throughput 9.121K wps
[Epoch 76 Batch 30/62] avg loss 0.00583936, throughput 9.00802K wps
[Epoch 76 Batch 60/62] avg loss 0.00631986, throughput 9.12437K wps
Begin Testing...
[Epoch 76] train avg loss 0.00619577, dev acc 0.8230, dev avg loss 0.396623, throughput 9.09552K wps
Observed Improvement.
Begin Testing...
[Epoch 77 Batch 30/62] avg loss 0.00601536, throughput 9.00384K wps
[Epoch 77 Batch 60/62] avg loss 0.00592967, throughput 8.93323K wps
Begin Testing...
[Epoch 77] train avg loss 0.00606426, dev acc 0.8201, dev avg loss 0.395686, throughput 9.02115K wps
[Epoch 78 Batch 30/62] avg loss 0.00589239, throughput 9.11815K wps
[Epoch 78 Batch 60/62] avg loss 0.00613282, throughput 9.1292K wps
Begin Testing...
[Epoch 78] train avg loss 0.00608194, dev acc 0.8201, dev avg loss 0.394859, throughput 9.14196K wps
[Epoch 79 Batch 30/62] avg loss 0.0057249, throughput 8.94467K wps
[Epoch 79 Batch 60/62] avg loss 0.00563503, throughput 8.82182K wps
Begin Testing...
[Epoch 79] train avg loss 0.00581734, dev acc 0.8319, dev avg loss 0.393336, throughput 8.92053K wps
Observed Improvement.
Begin Testing...
[Epoch 80 Batch 30/62] avg loss 0.00597295, throughput 9.0179K wps
[Epoch 80 Batch 60/62] avg loss 0.00557641, throughput 8.99394K wps
Begin Testing...
[Epoch 80] train avg loss 0.00591391, dev acc 0.8201, dev avg loss 0.3941, throughput 9.03191K wps
[Epoch 81 Batch 30/62] avg loss 0.00563552, throughput 8.87695K wps
[Epoch 81 Batch 60/62] avg loss 0.00585717, throughput 9.11402K wps
Begin Testing...
[Epoch 81] train avg loss 0.00582411, dev acc 0.8260, dev avg loss 0.391969, throughput 8.98067K wps
[Epoch 82 Batch 30/62] avg loss 0.00562613, throughput 8.98019K wps
[Epoch 82 Batch 60/62] avg loss 0.00579428, throughput 8.78606K wps
Begin Testing...
[Epoch 82] train avg loss 0.00576237, dev acc 0.8230, dev avg loss 0.391805, throughput 8.91808K wps
[Epoch 83 Batch 30/62] avg loss 0.00551908, throughput 8.78741K wps
[Epoch 83 Batch 60/62] avg loss 0.00561052, throughput 8.8537K wps
Begin Testing...
[Epoch 83] train avg loss 0.00568327, dev acc 0.8171, dev avg loss 0.392295, throughput 8.85783K wps
[Epoch 84 Batch 30/62] avg loss 0.00556881, throughput 8.96956K wps
[Epoch 84 Batch 60/62] avg loss 0.0054024, throughput 9.0575K wps
Begin Testing...
[Epoch 84] train avg loss 0.00555235, dev acc 0.8230, dev avg loss 0.38981, throughput 9.04745K wps
[Epoch 85 Batch 30/62] avg loss 0.00547552, throughput 9.08553K wps
[Epoch 85 Batch 60/62] avg loss 0.00549104, throughput 9.09357K wps
Begin Testing...
[Epoch 85] train avg loss 0.00552734, dev acc 0.8230, dev avg loss 0.389128, throughput 9.1216K wps
[Epoch 86 Batch 30/62] avg loss 0.00533845, throughput 9.24465K wps
[Epoch 86 Batch 60/62] avg loss 0.00547939, throughput 9.16765K wps
Begin Testing...
[Epoch 86] train avg loss 0.00547671, dev acc 0.8230, dev avg loss 0.387709, throughput 9.23687K wps
[Epoch 87 Batch 30/62] avg loss 0.00532206, throughput 8.75209K wps
[Epoch 87 Batch 60/62] avg loss 0.00541401, throughput 8.89703K wps
Begin Testing...
[Epoch 87] train avg loss 0.00548327, dev acc 0.8201, dev avg loss 0.388918, throughput 8.79461K wps
[Epoch 88 Batch 30/62] avg loss 0.00517288, throughput 9.0819K wps
[Epoch 88 Batch 60/62] avg loss 0.00536673, throughput 8.71188K wps
Begin Testing...
[Epoch 88] train avg loss 0.00536632, dev acc 0.8230, dev avg loss 0.387611, throughput 8.87417K wps
[Epoch 89 Batch 30/62] avg loss 0.00532368, throughput 9.22106K wps
[Epoch 89 Batch 60/62] avg loss 0.00516472, throughput 9.02207K wps
Begin Testing...
[Epoch 89] train avg loss 0.00530925, dev acc 0.8171, dev avg loss 0.387537, throughput 9.15053K wps
[Epoch 90 Batch 30/62] avg loss 0.00498961, throughput 8.9302K wps
[Epoch 90 Batch 60/62] avg loss 0.00542963, throughput 8.71902K wps
Begin Testing...
[Epoch 90] train avg loss 0.00530959, dev acc 0.8260, dev avg loss 0.386342, throughput 8.797K wps
[Epoch 91 Batch 30/62] avg loss 0.00517276, throughput 9.06719K wps
[Epoch 91 Batch 60/62] avg loss 0.00507184, throughput 9.16078K wps
Begin Testing...
[Epoch 91] train avg loss 0.00517915, dev acc 0.8201, dev avg loss 0.384786, throughput 9.13058K wps
[Epoch 92 Batch 30/62] avg loss 0.00509004, throughput 9.10845K wps
[Epoch 92 Batch 60/62] avg loss 0.00507086, throughput 9.07746K wps
Begin Testing...
[Epoch 92] train avg loss 0.00518251, dev acc 0.8230, dev avg loss 0.383222, throughput 9.05329K wps
[Epoch 93 Batch 30/62] avg loss 0.00506843, throughput 9.01593K wps
[Epoch 93 Batch 60/62] avg loss 0.00513482, throughput 8.95568K wps
Begin Testing...
[Epoch 93] train avg loss 0.00530464, dev acc 0.8201, dev avg loss 0.382952, throughput 8.95696K wps
[Epoch 94 Batch 30/62] avg loss 0.00506553, throughput 9.04291K wps
[Epoch 94 Batch 60/62] avg loss 0.00493201, throughput 8.83169K wps
Begin Testing...
[Epoch 94] train avg loss 0.00504165, dev acc 0.8171, dev avg loss 0.384336, throughput 8.92407K wps
[Epoch 95 Batch 30/62] avg loss 0.00468898, throughput 9.05367K wps
[Epoch 95 Batch 60/62] avg loss 0.00509721, throughput 8.96889K wps
Begin Testing...
[Epoch 95] train avg loss 0.0049329, dev acc 0.8201, dev avg loss 0.38189, throughput 9.03738K wps
[Epoch 96 Batch 30/62] avg loss 0.00470265, throughput 9.14248K wps
[Epoch 96 Batch 60/62] avg loss 0.00512412, throughput 8.99487K wps
Begin Testing...
[Epoch 96] train avg loss 0.00505007, dev acc 0.8260, dev avg loss 0.382467, throughput 9.0986K wps
[Epoch 97 Batch 30/62] avg loss 0.00481767, throughput 8.97001K wps
[Epoch 97 Batch 60/62] avg loss 0.00461965, throughput 9.13754K wps
Begin Testing...
[Epoch 97] train avg loss 0.00475697, dev acc 0.8230, dev avg loss 0.3807, throughput 9.09167K wps
[Epoch 98 Batch 30/62] avg loss 0.00491282, throughput 9.14905K wps
[Epoch 98 Batch 60/62] avg loss 0.00461038, throughput 9.17474K wps
Begin Testing...
[Epoch 98] train avg loss 0.00484927, dev acc 0.8201, dev avg loss 0.378969, throughput 9.18784K wps
[Epoch 99 Batch 30/62] avg loss 0.00470328, throughput 8.95393K wps
[Epoch 99 Batch 60/62] avg loss 0.00483051, throughput 8.99398K wps
Begin Testing...
[Epoch 99] train avg loss 0.00482643, dev acc 0.8201, dev avg loss 0.379629, throughput 8.96451K wps
[Epoch 100 Batch 30/62] avg loss 0.00460221, throughput 9.11453K wps
[Epoch 100 Batch 60/62] avg loss 0.0046727, throughput 8.97814K wps
Begin Testing...
[Epoch 100] train avg loss 0.00475968, dev acc 0.8289, dev avg loss 0.37781, throughput 9.07653K wps
[Epoch 101 Batch 30/62] avg loss 0.00478284, throughput 9.05094K wps
[Epoch 101 Batch 60/62] avg loss 0.0043209, throughput 9.03321K wps
Begin Testing...
[Epoch 101] train avg loss 0.00464926, dev acc 0.8201, dev avg loss 0.379437, throughput 9.07279K wps
[Epoch 102 Batch 30/62] avg loss 0.00457343, throughput 9.07957K wps
[Epoch 102 Batch 60/62] avg loss 0.0045718, throughput 9.13351K wps
Begin Testing...
[Epoch 102] train avg loss 0.00460747, dev acc 0.8230, dev avg loss 0.3785, throughput 9.09179K wps
[Epoch 103 Batch 30/62] avg loss 0.00449635, throughput 9.12598K wps
[Epoch 103 Batch 60/62] avg loss 0.00472681, throughput 8.7996K wps
Begin Testing...
[Epoch 103] train avg loss 0.00464824, dev acc 0.8201, dev avg loss 0.376182, throughput 8.93857K wps
[Epoch 104 Batch 30/62] avg loss 0.00444067, throughput 9.10298K wps
[Epoch 104 Batch 60/62] avg loss 0.00458001, throughput 8.96926K wps
Begin Testing...
[Epoch 104] train avg loss 0.00460254, dev acc 0.8171, dev avg loss 0.384633, throughput 9.06448K wps
[Epoch 105 Batch 30/62] avg loss 0.00428184, throughput 9.08721K wps
[Epoch 105 Batch 60/62] avg loss 0.00442268, throughput 9.14814K wps
Begin Testing...
[Epoch 105] train avg loss 0.00438872, dev acc 0.8289, dev avg loss 0.374975, throughput 9.14851K wps
[Epoch 106 Batch 30/62] avg loss 0.00429298, throughput 9.2756K wps
[Epoch 106 Batch 60/62] avg loss 0.00448247, throughput 9.02945K wps
Begin Testing...
[Epoch 106] train avg loss 0.0044607, dev acc 0.8230, dev avg loss 0.379271, throughput 9.17751K wps
[Epoch 107 Batch 30/62] avg loss 0.00430285, throughput 9.15423K wps
[Epoch 107 Batch 60/62] avg loss 0.00434393, throughput 8.56859K wps
Begin Testing...
[Epoch 107] train avg loss 0.00432432, dev acc 0.8171, dev avg loss 0.377328, throughput 8.84705K wps
[Epoch 108 Batch 30/62] avg loss 0.00398321, throughput 9.20425K wps
[Epoch 108 Batch 60/62] avg loss 0.00454627, throughput 8.79523K wps
Begin Testing...
[Epoch 108] train avg loss 0.00430064, dev acc 0.8230, dev avg loss 0.373334, throughput 9.04099K wps
[Epoch 109 Batch 30/62] avg loss 0.00427702, throughput 9.03832K wps
[Epoch 109 Batch 60/62] avg loss 0.00417848, throughput 8.90166K wps
Begin Testing...
[Epoch 109] train avg loss 0.00426025, dev acc 0.8201, dev avg loss 0.373841, throughput 8.98055K wps
[Epoch 110 Batch 30/62] avg loss 0.00410663, throughput 8.92366K wps
[Epoch 110 Batch 60/62] avg loss 0.00437486, throughput 9.05592K wps
Begin Testing...
[Epoch 110] train avg loss 0.00424982, dev acc 0.8171, dev avg loss 0.375004, throughput 9.02282K wps
[Epoch 111 Batch 30/62] avg loss 0.00449542, throughput 9.08572K wps
[Epoch 111 Batch 60/62] avg loss 0.0040215, throughput 9.07367K wps
Begin Testing...
[Epoch 111] train avg loss 0.00428823, dev acc 0.8230, dev avg loss 0.377012, throughput 9.10728K wps
[Epoch 112 Batch 30/62] avg loss 0.00417316, throughput 8.91655K wps
[Epoch 112 Batch 60/62] avg loss 0.00407741, throughput 9.17195K wps
Begin Testing...
[Epoch 112] train avg loss 0.00417346, dev acc 0.8201, dev avg loss 0.371806, throughput 9.00305K wps
[Epoch 113 Batch 30/62] avg loss 0.00409862, throughput 9.09368K wps
[Epoch 113 Batch 60/62] avg loss 0.0040392, throughput 9.08045K wps
Begin Testing...
[Epoch 113] train avg loss 0.00407612, dev acc 0.8260, dev avg loss 0.370382, throughput 9.11806K wps
[Epoch 114 Batch 30/62] avg loss 0.0039647, throughput 8.86691K wps
[Epoch 114 Batch 60/62] avg loss 0.0042092, throughput 9.12029K wps
Begin Testing...
[Epoch 114] train avg loss 0.00418522, dev acc 0.8289, dev avg loss 0.370402, throughput 9.0244K wps
[Epoch 115 Batch 30/62] avg loss 0.00397454, throughput 9.16894K wps
[Epoch 115 Batch 60/62] avg loss 0.00398239, throughput 9.12855K wps
Begin Testing...
[Epoch 115] train avg loss 0.00406002, dev acc 0.8289, dev avg loss 0.372177, throughput 9.17714K wps
[Epoch 116 Batch 30/62] avg loss 0.00398754, throughput 8.53301K wps
[Epoch 116 Batch 60/62] avg loss 0.00383631, throughput 9.15627K wps
Begin Testing...
[Epoch 116] train avg loss 0.00392089, dev acc 0.8230, dev avg loss 0.370541, throughput 8.86797K wps
[Epoch 117 Batch 30/62] avg loss 0.00393737, throughput 9.16855K wps
[Epoch 117 Batch 60/62] avg loss 0.00389065, throughput 9.0467K wps
Begin Testing...
[Epoch 117] train avg loss 0.00399561, dev acc 0.8289, dev avg loss 0.368993, throughput 9.06251K wps
[Epoch 118 Batch 30/62] avg loss 0.00403545, throughput 9.00789K wps
[Epoch 118 Batch 60/62] avg loss 0.00389555, throughput 8.8132K wps
Begin Testing...
[Epoch 118] train avg loss 0.00400363, dev acc 0.8230, dev avg loss 0.367797, throughput 8.94025K wps
[Epoch 119 Batch 30/62] avg loss 0.00385862, throughput 9.17827K wps
[Epoch 119 Batch 60/62] avg loss 0.00385715, throughput 9.03647K wps
Begin Testing...
[Epoch 119] train avg loss 0.00390417, dev acc 0.8230, dev avg loss 0.367426, throughput 9.13785K wps
[Epoch 120 Batch 30/62] avg loss 0.00366774, throughput 8.91296K wps
[Epoch 120 Batch 60/62] avg loss 0.00392941, throughput 9.11894K wps
Begin Testing...
[Epoch 120] train avg loss 0.00385084, dev acc 0.8230, dev avg loss 0.367109, throughput 9.04681K wps
[Epoch 121 Batch 30/62] avg loss 0.00377636, throughput 9.27129K wps
[Epoch 121 Batch 60/62] avg loss 0.00379251, throughput 9.04349K wps
Begin Testing...
[Epoch 121] train avg loss 0.00381351, dev acc 0.8201, dev avg loss 0.367185, throughput 9.18159K wps
[Epoch 122 Batch 30/62] avg loss 0.00374042, throughput 9.01088K wps
[Epoch 122 Batch 60/62] avg loss 0.00367734, throughput 8.65531K wps
Begin Testing...
[Epoch 122] train avg loss 0.00374488, dev acc 0.8230, dev avg loss 0.366523, throughput 8.8166K wps
[Epoch 123 Batch 30/62] avg loss 0.00355468, throughput 8.7889K wps
[Epoch 123 Batch 60/62] avg loss 0.00381863, throughput 8.78967K wps
Begin Testing...
[Epoch 123] train avg loss 0.00372565, dev acc 0.8201, dev avg loss 0.367711, throughput 8.84039K wps
[Epoch 124 Batch 30/62] avg loss 0.00363341, throughput 9.12958K wps
[Epoch 124 Batch 60/62] avg loss 0.00352372, throughput 8.85641K wps
Begin Testing...
[Epoch 124] train avg loss 0.00365663, dev acc 0.8260, dev avg loss 0.37065, throughput 9.03467K wps
[Epoch 125 Batch 30/62] avg loss 0.00352759, throughput 8.86468K wps
[Epoch 125 Batch 60/62] avg loss 0.00359748, throughput 9.1475K wps
Begin Testing...
[Epoch 125] train avg loss 0.00364594, dev acc 0.8171, dev avg loss 0.366817, throughput 9.03504K wps
[Epoch 126 Batch 30/62] avg loss 0.00379777, throughput 8.98769K wps
[Epoch 126 Batch 60/62] avg loss 0.00329806, throughput 8.77976K wps
Begin Testing...
[Epoch 126] train avg loss 0.00359877, dev acc 0.8348, dev avg loss 0.365941, throughput 8.85164K wps
Observed Improvement.
Begin Testing...
[Epoch 127 Batch 30/62] avg loss 0.00351589, throughput 9.16902K wps
[Epoch 127 Batch 60/62] avg loss 0.00347891, throughput 9.13293K wps
Begin Testing...
[Epoch 127] train avg loss 0.00358433, dev acc 0.8289, dev avg loss 0.370825, throughput 9.17632K wps
[Epoch 128 Batch 30/62] avg loss 0.0035362, throughput 9.04676K wps
[Epoch 128 Batch 60/62] avg loss 0.00350508, throughput 8.99039K wps
Begin Testing...
[Epoch 128] train avg loss 0.00357042, dev acc 0.8230, dev avg loss 0.36478, throughput 8.98363K wps
[Epoch 129 Batch 30/62] avg loss 0.00340152, throughput 9.16646K wps
[Epoch 129 Batch 60/62] avg loss 0.00347487, throughput 9.12575K wps
Begin Testing...
[Epoch 129] train avg loss 0.00346748, dev acc 0.8348, dev avg loss 0.364757, throughput 9.17113K wps
Observed Improvement.
Begin Testing...
[Epoch 130 Batch 30/62] avg loss 0.00344355, throughput 8.9402K wps
[Epoch 130 Batch 60/62] avg loss 0.00327116, throughput 9.0791K wps
Begin Testing...
[Epoch 130] train avg loss 0.00339303, dev acc 0.8230, dev avg loss 0.365281, throughput 9.03911K wps
[Epoch 131 Batch 30/62] avg loss 0.00333259, throughput 8.79089K wps
[Epoch 131 Batch 60/62] avg loss 0.00348801, throughput 8.93335K wps
Begin Testing...
[Epoch 131] train avg loss 0.00346078, dev acc 0.8319, dev avg loss 0.364459, throughput 8.8472K wps
[Epoch 132 Batch 30/62] avg loss 0.00324689, throughput 9.16261K wps
[Epoch 132 Batch 60/62] avg loss 0.00351931, throughput 9.01985K wps
Begin Testing...
[Epoch 132] train avg loss 0.0034249, dev acc 0.8230, dev avg loss 0.363363, throughput 9.11638K wps
[Epoch 133 Batch 30/62] avg loss 0.00322694, throughput 8.99502K wps
[Epoch 133 Batch 60/62] avg loss 0.00343051, throughput 8.8553K wps
Begin Testing...
[Epoch 133] train avg loss 0.00336583, dev acc 0.8260, dev avg loss 0.367183, throughput 8.8884K wps
[Epoch 134 Batch 30/62] avg loss 0.00310717, throughput 9.27762K wps
[Epoch 134 Batch 60/62] avg loss 0.00331607, throughput 8.92044K wps
Begin Testing...
[Epoch 134] train avg loss 0.00324477, dev acc 0.8201, dev avg loss 0.363443, throughput 9.12463K wps
[Epoch 135 Batch 30/62] avg loss 0.00319094, throughput 9.21296K wps
[Epoch 135 Batch 60/62] avg loss 0.00337262, throughput 8.74584K wps
Begin Testing...
[Epoch 135] train avg loss 0.00333559, dev acc 0.8378, dev avg loss 0.363203, throughput 8.98282K wps
Observed Improvement.
Begin Testing...
[Epoch 136 Batch 30/62] avg loss 0.00319443, throughput 8.8662K wps
[Epoch 136 Batch 60/62] avg loss 0.00327442, throughput 9.01009K wps
Begin Testing...
[Epoch 136] train avg loss 0.00324142, dev acc 0.8319, dev avg loss 0.362026, throughput 8.97203K wps
[Epoch 137 Batch 30/62] avg loss 0.00325275, throughput 9.00815K wps
[Epoch 137 Batch 60/62] avg loss 0.00322704, throughput 9.09768K wps
Begin Testing...
[Epoch 137] train avg loss 0.00334124, dev acc 0.8319, dev avg loss 0.36244, throughput 9.03737K wps
[Epoch 138 Batch 30/62] avg loss 0.00307216, throughput 8.92376K wps
[Epoch 138 Batch 60/62] avg loss 0.00329621, throughput 9.16217K wps
Begin Testing...
[Epoch 138] train avg loss 0.00322126, dev acc 0.8378, dev avg loss 0.362814, throughput 9.07533K wps
Observed Improvement.
Begin Testing...
[Epoch 139 Batch 30/62] avg loss 0.00321586, throughput 9.11683K wps
[Epoch 139 Batch 60/62] avg loss 0.00314451, throughput 8.73077K wps
Begin Testing...
[Epoch 139] train avg loss 0.00317896, dev acc 0.8289, dev avg loss 0.363059, throughput 8.94606K wps
[Epoch 140 Batch 30/62] avg loss 0.0031351, throughput 9.11043K wps
[Epoch 140 Batch 60/62] avg loss 0.00310501, throughput 8.89199K wps
Begin Testing...
[Epoch 140] train avg loss 0.0031663, dev acc 0.8289, dev avg loss 0.366828, throughput 9.03334K wps
[Epoch 141 Batch 30/62] avg loss 0.00315103, throughput 9.04432K wps
[Epoch 141 Batch 60/62] avg loss 0.00300467, throughput 9.16574K wps
Begin Testing...
[Epoch 141] train avg loss 0.00312301, dev acc 0.8319, dev avg loss 0.3599, throughput 9.13468K wps
[Epoch 142 Batch 30/62] avg loss 0.00311135, throughput 9.02377K wps
[Epoch 142 Batch 60/62] avg loss 0.00305566, throughput 9.15793K wps
Begin Testing...
[Epoch 142] train avg loss 0.00313032, dev acc 0.8289, dev avg loss 0.360936, throughput 9.08914K wps
[Epoch 143 Batch 30/62] avg loss 0.00284853, throughput 9.13675K wps
[Epoch 143 Batch 60/62] avg loss 0.00305147, throughput 8.8507K wps
Begin Testing...
[Epoch 143] train avg loss 0.00297683, dev acc 0.8437, dev avg loss 0.360771, throughput 9.04962K wps
Observed Improvement.
Begin Testing...
[Epoch 144 Batch 30/62] avg loss 0.00284303, throughput 8.67647K wps
[Epoch 144 Batch 60/62] avg loss 0.00305421, throughput 8.9412K wps
Begin Testing...
[Epoch 144] train avg loss 0.00303482, dev acc 0.8260, dev avg loss 0.366663, throughput 8.79093K wps
[Epoch 145 Batch 30/62] avg loss 0.00293251, throughput 9.06356K wps
[Epoch 145 Batch 60/62] avg loss 0.00311631, throughput 8.94095K wps
Begin Testing...
[Epoch 145] train avg loss 0.00304164, dev acc 0.8260, dev avg loss 0.366133, throughput 9.03504K wps
[Epoch 146 Batch 30/62] avg loss 0.00297141, throughput 8.97227K wps
[Epoch 146 Batch 60/62] avg loss 0.0028445, throughput 9.1271K wps
Begin Testing...
[Epoch 146] train avg loss 0.00290406, dev acc 0.8289, dev avg loss 0.365379, throughput 9.08135K wps
[Epoch 147 Batch 30/62] avg loss 0.00287221, throughput 8.87925K wps
[Epoch 147 Batch 60/62] avg loss 0.00309087, throughput 9.11887K wps
Begin Testing...
[Epoch 147] train avg loss 0.00300871, dev acc 0.8319, dev avg loss 0.360001, throughput 9.02911K wps
[Epoch 148 Batch 30/62] avg loss 0.0028632, throughput 9.13204K wps
[Epoch 148 Batch 60/62] avg loss 0.00299272, throughput 8.97394K wps
Begin Testing...
[Epoch 148] train avg loss 0.00296359, dev acc 0.8319, dev avg loss 0.359735, throughput 9.08148K wps
[Epoch 149 Batch 30/62] avg loss 0.00274742, throughput 9.16654K wps
[Epoch 149 Batch 60/62] avg loss 0.0028491, throughput 9.13808K wps
Begin Testing...
[Epoch 149] train avg loss 0.002916, dev acc 0.8378, dev avg loss 0.358066, throughput 9.18362K wps
[Epoch 150 Batch 30/62] avg loss 0.00306134, throughput 9.20024K wps
[Epoch 150 Batch 60/62] avg loss 0.00279081, throughput 9.08189K wps
Begin Testing...
[Epoch 150] train avg loss 0.00299079, dev acc 0.8407, dev avg loss 0.35794, throughput 9.11821K wps
[Epoch 151 Batch 30/62] avg loss 0.00279653, throughput 9.19976K wps
[Epoch 151 Batch 60/62] avg loss 0.00276794, throughput 9.13671K wps
Begin Testing...
[Epoch 151] train avg loss 0.00282228, dev acc 0.8407, dev avg loss 0.35796, throughput 9.19681K wps
[Epoch 152 Batch 30/62] avg loss 0.00288302, throughput 9.14428K wps
[Epoch 152 Batch 60/62] avg loss 0.00258913, throughput 8.98123K wps
Begin Testing...
[Epoch 152] train avg loss 0.00274784, dev acc 0.8407, dev avg loss 0.35788, throughput 9.03668K wps
[Epoch 153 Batch 30/62] avg loss 0.00268989, throughput 9.11547K wps
[Epoch 153 Batch 60/62] avg loss 0.0026636, throughput 8.98452K wps
Begin Testing...
[Epoch 153] train avg loss 0.00271897, dev acc 0.8407, dev avg loss 0.357505, throughput 9.02034K wps
[Epoch 154 Batch 30/62] avg loss 0.00267162, throughput 9.12429K wps
[Epoch 154 Batch 60/62] avg loss 0.00279647, throughput 9.08677K wps
Begin Testing...
[Epoch 154] train avg loss 0.0028081, dev acc 0.8378, dev avg loss 0.35757, throughput 9.15372K wps
[Epoch 155 Batch 30/62] avg loss 0.00274563, throughput 9.25844K wps
[Epoch 155 Batch 60/62] avg loss 0.00274589, throughput 8.66836K wps
Begin Testing...
[Epoch 155] train avg loss 0.00277122, dev acc 0.8378, dev avg loss 0.35737, throughput 9.01908K wps
[Epoch 156 Batch 30/62] avg loss 0.00254051, throughput 9.02819K wps
[Epoch 156 Batch 60/62] avg loss 0.00263584, throughput 8.81263K wps
Begin Testing...
[Epoch 156] train avg loss 0.00259257, dev acc 0.8437, dev avg loss 0.357368, throughput 8.89399K wps
Observed Improvement.
Begin Testing...
[Epoch 157 Batch 30/62] avg loss 0.0025297, throughput 9.1795K wps
[Epoch 157 Batch 60/62] avg loss 0.00279867, throughput 8.93924K wps
Begin Testing...
[Epoch 157] train avg loss 0.00269315, dev acc 0.8407, dev avg loss 0.35775, throughput 9.01648K wps
[Epoch 158 Batch 30/62] avg loss 0.00270127, throughput 9.03303K wps
[Epoch 158 Batch 60/62] avg loss 0.00253934, throughput 8.98957K wps
Begin Testing...
[Epoch 158] train avg loss 0.00265364, dev acc 0.8437, dev avg loss 0.357345, throughput 9.04111K wps
Observed Improvement.
Begin Testing...
[Epoch 159 Batch 30/62] avg loss 0.00255667, throughput 9.14686K wps
[Epoch 159 Batch 60/62] avg loss 0.00255022, throughput 8.93639K wps
Begin Testing...
[Epoch 159] train avg loss 0.00262587, dev acc 0.8230, dev avg loss 0.36799, throughput 9.0146K wps
[Epoch 160 Batch 30/62] avg loss 0.00256119, throughput 8.96836K wps
[Epoch 160 Batch 60/62] avg loss 0.00267676, throughput 8.92568K wps
Begin Testing...
[Epoch 160] train avg loss 0.00265214, dev acc 0.8437, dev avg loss 0.356938, throughput 8.99529K wps
Observed Improvement.
Begin Testing...
[Epoch 161 Batch 30/62] avg loss 0.00231812, throughput 8.88415K wps
[Epoch 161 Batch 60/62] avg loss 0.00289911, throughput 9.03356K wps
Begin Testing...
[Epoch 161] train avg loss 0.00261042, dev acc 0.8437, dev avg loss 0.357527, throughput 8.93829K wps
Observed Improvement.
Begin Testing...
[Epoch 162 Batch 30/62] avg loss 0.00248351, throughput 9.15061K wps
[Epoch 162 Batch 60/62] avg loss 0.0024396, throughput 9.04779K wps
Begin Testing...
[Epoch 162] train avg loss 0.00247845, dev acc 0.8437, dev avg loss 0.356587, throughput 9.08475K wps
Observed Improvement.
Begin Testing...
[Epoch 163 Batch 30/62] avg loss 0.00248578, throughput 8.56756K wps
[Epoch 163 Batch 60/62] avg loss 0.00252425, throughput 9.12181K wps
Begin Testing...
[Epoch 163] train avg loss 0.00256848, dev acc 0.8437, dev avg loss 0.356426, throughput 8.87187K wps
Observed Improvement.
Begin Testing...
[Epoch 164 Batch 30/62] avg loss 0.00255899, throughput 8.88317K wps
[Epoch 164 Batch 60/62] avg loss 0.00246095, throughput 8.95308K wps
Begin Testing...
[Epoch 164] train avg loss 0.00251593, dev acc 0.8437, dev avg loss 0.356146, throughput 8.88725K wps
Observed Improvement.
Begin Testing...
[Epoch 165 Batch 30/62] avg loss 0.00240283, throughput 8.88366K wps
[Epoch 165 Batch 60/62] avg loss 0.00253796, throughput 8.67274K wps
Begin Testing...
[Epoch 165] train avg loss 0.00248062, dev acc 0.8407, dev avg loss 0.356416, throughput 8.85274K wps
[Epoch 166 Batch 30/62] avg loss 0.00259297, throughput 9.04916K wps
[Epoch 166 Batch 60/62] avg loss 0.00243627, throughput 9.09705K wps
Begin Testing...
[Epoch 166] train avg loss 0.00253663, dev acc 0.8466, dev avg loss 0.356172, throughput 9.04094K wps
Observed Improvement.
Begin Testing...
[Epoch 167 Batch 30/62] avg loss 0.0024051, throughput 9.26902K wps
[Epoch 167 Batch 60/62] avg loss 0.00236763, throughput 8.94273K wps
Begin Testing...
[Epoch 167] train avg loss 0.00240236, dev acc 0.8407, dev avg loss 0.357214, throughput 9.07319K wps
[Epoch 168 Batch 30/62] avg loss 0.00228046, throughput 8.95384K wps
[Epoch 168 Batch 60/62] avg loss 0.00245275, throughput 8.97232K wps
Begin Testing...
[Epoch 168] train avg loss 0.00239, dev acc 0.8260, dev avg loss 0.358801, throughput 8.93015K wps
[Epoch 169 Batch 30/62] avg loss 0.00245573, throughput 8.98082K wps
[Epoch 169 Batch 60/62] avg loss 0.00225297, throughput 9.08699K wps
Begin Testing...
[Epoch 169] train avg loss 0.00240356, dev acc 0.8407, dev avg loss 0.357245, throughput 9.0642K wps
[Epoch 170 Batch 30/62] avg loss 0.00230978, throughput 8.75351K wps
[Epoch 170 Batch 60/62] avg loss 0.00244468, throughput 9.00685K wps
Begin Testing...
[Epoch 170] train avg loss 0.00238696, dev acc 0.8289, dev avg loss 0.360083, throughput 8.89855K wps
[Epoch 171 Batch 30/62] avg loss 0.00236092, throughput 8.90395K wps
[Epoch 171 Batch 60/62] avg loss 0.00230441, throughput 9.09018K wps
Begin Testing...
[Epoch 171] train avg loss 0.00235465, dev acc 0.8437, dev avg loss 0.355711, throughput 9.03291K wps
[Epoch 172 Batch 30/62] avg loss 0.00222653, throughput 9.0758K wps
[Epoch 172 Batch 60/62] avg loss 0.00235444, throughput 9.16647K wps
Begin Testing...
[Epoch 172] train avg loss 0.00231354, dev acc 0.8437, dev avg loss 0.356051, throughput 9.14901K wps
[Epoch 173 Batch 30/62] avg loss 0.00235332, throughput 8.87722K wps
[Epoch 173 Batch 60/62] avg loss 0.00239247, throughput 9.14653K wps
Begin Testing...
[Epoch 173] train avg loss 0.00239383, dev acc 0.8555, dev avg loss 0.354686, throughput 9.04457K wps
Observed Improvement.
Begin Testing...
[Epoch 174 Batch 30/62] avg loss 0.00229196, throughput 9.07497K wps
[Epoch 174 Batch 60/62] avg loss 0.00226135, throughput 8.98507K wps
Begin Testing...
[Epoch 174] train avg loss 0.00232145, dev acc 0.8466, dev avg loss 0.354252, throughput 9.0605K wps
[Epoch 175 Batch 30/62] avg loss 0.00219291, throughput 8.89103K wps
[Epoch 175 Batch 60/62] avg loss 0.00214529, throughput 9.13176K wps
Begin Testing...
[Epoch 175] train avg loss 0.00222419, dev acc 0.8466, dev avg loss 0.353754, throughput 9.03731K wps
[Epoch 176 Batch 30/62] avg loss 0.00211508, throughput 9.14958K wps
[Epoch 176 Batch 60/62] avg loss 0.00223637, throughput 9.03151K wps
Begin Testing...
[Epoch 176] train avg loss 0.00223663, dev acc 0.8496, dev avg loss 0.353721, throughput 9.06198K wps
[Epoch 177 Batch 30/62] avg loss 0.00210169, throughput 9.09633K wps
[Epoch 177 Batch 60/62] avg loss 0.00232394, throughput 9.07461K wps
Begin Testing...
[Epoch 177] train avg loss 0.00222194, dev acc 0.8555, dev avg loss 0.353441, throughput 9.03757K wps
Observed Improvement.
Begin Testing...
[Epoch 178 Batch 30/62] avg loss 0.00217089, throughput 8.92486K wps
[Epoch 178 Batch 60/62] avg loss 0.00218444, throughput 8.88706K wps
Begin Testing...
[Epoch 178] train avg loss 0.00220762, dev acc 0.8496, dev avg loss 0.353275, throughput 8.94339K wps
[Epoch 179 Batch 30/62] avg loss 0.00222091, throughput 8.95781K wps
[Epoch 179 Batch 60/62] avg loss 0.00210182, throughput 8.85167K wps
Begin Testing...
[Epoch 179] train avg loss 0.00218877, dev acc 0.8555, dev avg loss 0.353533, throughput 8.91971K wps
Observed Improvement.
Begin Testing...
[Epoch 180 Batch 30/62] avg loss 0.00221697, throughput 9.10302K wps
[Epoch 180 Batch 60/62] avg loss 0.00221442, throughput 8.99014K wps
Begin Testing...
[Epoch 180] train avg loss 0.00222765, dev acc 0.8496, dev avg loss 0.35335, throughput 9.00995K wps
[Epoch 181 Batch 30/62] avg loss 0.00221004, throughput 9.06592K wps
[Epoch 181 Batch 60/62] avg loss 0.00203859, throughput 9.06583K wps
Begin Testing...
[Epoch 181] train avg loss 0.00212588, dev acc 0.8437, dev avg loss 0.353795, throughput 9.04519K wps
[Epoch 182 Batch 30/62] avg loss 0.0021667, throughput 9.2268K wps
[Epoch 182 Batch 60/62] avg loss 0.00206106, throughput 8.97278K wps
Begin Testing...
[Epoch 182] train avg loss 0.00212721, dev acc 0.8496, dev avg loss 0.352944, throughput 9.13197K wps
[Epoch 183 Batch 30/62] avg loss 0.00211447, throughput 9.25049K wps
[Epoch 183 Batch 60/62] avg loss 0.00206051, throughput 9.15181K wps
Begin Testing...
[Epoch 183] train avg loss 0.00209405, dev acc 0.8525, dev avg loss 0.353201, throughput 9.22715K wps
[Epoch 184 Batch 30/62] avg loss 0.00217937, throughput 9.26676K wps
[Epoch 184 Batch 60/62] avg loss 0.00197079, throughput 9.13151K wps
Begin Testing...
[Epoch 184] train avg loss 0.00207954, dev acc 0.8525, dev avg loss 0.353839, throughput 9.22776K wps
[Epoch 185 Batch 30/62] avg loss 0.00207845, throughput 9.18411K wps
[Epoch 185 Batch 60/62] avg loss 0.00201802, throughput 9.0971K wps
Begin Testing...
[Epoch 185] train avg loss 0.00207692, dev acc 0.8525, dev avg loss 0.353252, throughput 9.16693K wps
[Epoch 186 Batch 30/62] avg loss 0.00189036, throughput 9.12336K wps
[Epoch 186 Batch 60/62] avg loss 0.00209977, throughput 9.04119K wps
Begin Testing...
[Epoch 186] train avg loss 0.00205708, dev acc 0.8378, dev avg loss 0.355221, throughput 9.11222K wps
[Epoch 187 Batch 30/62] avg loss 0.00210784, throughput 9.26342K wps
[Epoch 187 Batch 60/62] avg loss 0.00211261, throughput 8.97274K wps
Begin Testing...
[Epoch 187] train avg loss 0.00214063, dev acc 0.8525, dev avg loss 0.352962, throughput 9.10222K wps
[Epoch 188 Batch 30/62] avg loss 0.00199734, throughput 9.0836K wps
[Epoch 188 Batch 60/62] avg loss 0.00196379, throughput 8.69155K wps
Begin Testing...
[Epoch 188] train avg loss 0.00200876, dev acc 0.8407, dev avg loss 0.354908, throughput 8.85806K wps
[Epoch 189 Batch 30/62] avg loss 0.00208887, throughput 9.08049K wps
[Epoch 189 Batch 60/62] avg loss 0.00200561, throughput 8.97802K wps
Begin Testing...
[Epoch 189] train avg loss 0.00205964, dev acc 0.8407, dev avg loss 0.353798, throughput 9.05986K wps
[Epoch 190 Batch 30/62] avg loss 0.00214441, throughput 8.8863K wps
[Epoch 190 Batch 60/62] avg loss 0.00191454, throughput 8.97085K wps
Begin Testing...
[Epoch 190] train avg loss 0.00203598, dev acc 0.8319, dev avg loss 0.35531, throughput 8.91217K wps
[Epoch 191 Batch 30/62] avg loss 0.00207363, throughput 8.89491K wps
[Epoch 191 Batch 60/62] avg loss 0.00195585, throughput 8.97826K wps
Begin Testing...
[Epoch 191] train avg loss 0.00204406, dev acc 0.8289, dev avg loss 0.355224, throughput 8.92193K wps
[Epoch 192 Batch 30/62] avg loss 0.00193029, throughput 9.12726K wps
[Epoch 192 Batch 60/62] avg loss 0.00197867, throughput 8.94557K wps
Begin Testing...
[Epoch 192] train avg loss 0.00198079, dev acc 0.8555, dev avg loss 0.35209, throughput 9.02096K wps
Observed Improvement.
Begin Testing...
[Epoch 193 Batch 30/62] avg loss 0.00178731, throughput 9.07498K wps
[Epoch 193 Batch 60/62] avg loss 0.00198315, throughput 8.81014K wps
Begin Testing...
[Epoch 193] train avg loss 0.00191493, dev acc 0.8466, dev avg loss 0.353871, throughput 8.92697K wps
[Epoch 194 Batch 30/62] avg loss 0.00193875, throughput 8.76267K wps
[Epoch 194 Batch 60/62] avg loss 0.00195955, throughput 9.14153K wps
Begin Testing...
[Epoch 194] train avg loss 0.00196019, dev acc 0.8555, dev avg loss 0.352909, throughput 8.98037K wps
Observed Improvement.
Begin Testing...
[Epoch 195 Batch 30/62] avg loss 0.00187083, throughput 9.20857K wps
[Epoch 195 Batch 60/62] avg loss 0.00190843, throughput 8.86313K wps
Begin Testing...
[Epoch 195] train avg loss 0.00193962, dev acc 0.8142, dev avg loss 0.372349, throughput 9.08554K wps
[Epoch 196 Batch 30/62] avg loss 0.00189224, throughput 9.09757K wps
[Epoch 196 Batch 60/62] avg loss 0.00196454, throughput 9.02992K wps
Begin Testing...
[Epoch 196] train avg loss 0.00194079, dev acc 0.8555, dev avg loss 0.352464, throughput 9.09549K wps
Observed Improvement.
Begin Testing...
[Epoch 197 Batch 30/62] avg loss 0.00193088, throughput 8.93652K wps
[Epoch 197 Batch 60/62] avg loss 0.00188076, throughput 9.08312K wps
Begin Testing...
[Epoch 197] train avg loss 0.0019297, dev acc 0.8466, dev avg loss 0.352339, throughput 9.03773K wps
[Epoch 198 Batch 30/62] avg loss 0.00183728, throughput 9.03781K wps
[Epoch 198 Batch 60/62] avg loss 0.00186751, throughput 8.63251K wps
Begin Testing...
[Epoch 198] train avg loss 0.00189116, dev acc 0.8555, dev avg loss 0.352064, throughput 8.80755K wps
Observed Improvement.
Begin Testing...
[Epoch 199 Batch 30/62] avg loss 0.00169654, throughput 9.18936K wps
[Epoch 199 Batch 60/62] avg loss 0.0018293, throughput 9.12589K wps
Begin Testing...
[Epoch 199] train avg loss 0.00178646, dev acc 0.8525, dev avg loss 0.352391, throughput 9.1854K wps
Test loss 0.352538, test acc 0.8302
Total time cost 155.53s
[Epoch 0 Batch 30/62] avg loss 0.0134267, throughput 8.49625K wps
[Epoch 0 Batch 60/62] avg loss 0.0130099, throughput 9.10886K wps
Begin Testing...
[Epoch 0] train avg loss 0.0134135, dev acc 0.6578, dev avg loss 0.637954, throughput 8.83152K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0130752, throughput 9.07468K wps
[Epoch 1 Batch 60/62] avg loss 0.0129899, throughput 9.05112K wps
Begin Testing...
[Epoch 1] train avg loss 0.0132076, dev acc 0.6578, dev avg loss 0.634356, throughput 9.09389K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.0131668, throughput 9.22894K wps
[Epoch 2 Batch 60/62] avg loss 0.0127779, throughput 9.00602K wps
Begin Testing...
[Epoch 2] train avg loss 0.01315, dev acc 0.6578, dev avg loss 0.630443, throughput 9.14355K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0130087, throughput 8.94731K wps
[Epoch 3 Batch 60/62] avg loss 0.0127009, throughput 8.88535K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130346, dev acc 0.6578, dev avg loss 0.625548, throughput 8.88375K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0126184, throughput 9.10941K wps
[Epoch 4 Batch 60/62] avg loss 0.0128797, throughput 8.88483K wps
Begin Testing...
[Epoch 4] train avg loss 0.0129001, dev acc 0.6578, dev avg loss 0.622195, throughput 8.96221K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0127439, throughput 8.88012K wps
[Epoch 5 Batch 60/62] avg loss 0.0124663, throughput 9.12163K wps
Begin Testing...
[Epoch 5] train avg loss 0.0128541, dev acc 0.6578, dev avg loss 0.619754, throughput 9.03076K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0125428, throughput 8.94355K wps
[Epoch 6 Batch 60/62] avg loss 0.0126097, throughput 8.83533K wps
Begin Testing...
[Epoch 6] train avg loss 0.0127305, dev acc 0.6578, dev avg loss 0.614604, throughput 8.90417K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0123205, throughput 9.0289K wps
[Epoch 7 Batch 60/62] avg loss 0.0125078, throughput 8.94396K wps
Begin Testing...
[Epoch 7] train avg loss 0.0125996, dev acc 0.6578, dev avg loss 0.61054, throughput 9.01891K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0122988, throughput 8.95663K wps
[Epoch 8 Batch 60/62] avg loss 0.012595, throughput 9.17493K wps
Begin Testing...
[Epoch 8] train avg loss 0.0125788, dev acc 0.6578, dev avg loss 0.60563, throughput 9.09356K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0122483, throughput 9.16241K wps
[Epoch 9 Batch 60/62] avg loss 0.0122308, throughput 8.76912K wps
Begin Testing...
[Epoch 9] train avg loss 0.0123937, dev acc 0.6578, dev avg loss 0.601684, throughput 8.93192K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.012111, throughput 8.80007K wps
[Epoch 10 Batch 60/62] avg loss 0.0121278, throughput 8.94568K wps
Begin Testing...
[Epoch 10] train avg loss 0.0122642, dev acc 0.6578, dev avg loss 0.596837, throughput 8.84752K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0120547, throughput 9.0285K wps
[Epoch 11 Batch 60/62] avg loss 0.0120285, throughput 9.07456K wps
Begin Testing...
[Epoch 11] train avg loss 0.0122202, dev acc 0.6578, dev avg loss 0.592672, throughput 9.06253K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0119965, throughput 9.027K wps
[Epoch 12 Batch 60/62] avg loss 0.0119875, throughput 8.93955K wps
Begin Testing...
[Epoch 12] train avg loss 0.0121458, dev acc 0.6608, dev avg loss 0.588809, throughput 9.01378K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0118589, throughput 8.96596K wps
[Epoch 13 Batch 60/62] avg loss 0.0117892, throughput 8.96235K wps
Begin Testing...
[Epoch 13] train avg loss 0.0119959, dev acc 0.6608, dev avg loss 0.584312, throughput 8.99483K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.011705, throughput 8.99183K wps
[Epoch 14 Batch 60/62] avg loss 0.0117677, throughput 8.92761K wps
Begin Testing...
[Epoch 14] train avg loss 0.0118684, dev acc 0.6578, dev avg loss 0.578991, throughput 8.99041K wps
[Epoch 15 Batch 30/62] avg loss 0.0116297, throughput 8.95437K wps
[Epoch 15 Batch 60/62] avg loss 0.0116021, throughput 9.10175K wps
Begin Testing...
[Epoch 15] train avg loss 0.0118092, dev acc 0.6726, dev avg loss 0.575795, throughput 9.02245K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0114291, throughput 9.09345K wps
[Epoch 16 Batch 60/62] avg loss 0.0116589, throughput 8.95427K wps
Begin Testing...
[Epoch 16] train avg loss 0.0116715, dev acc 0.6696, dev avg loss 0.569712, throughput 9.05138K wps
[Epoch 17 Batch 30/62] avg loss 0.0114889, throughput 9.07872K wps
[Epoch 17 Batch 60/62] avg loss 0.0113589, throughput 9.06701K wps
Begin Testing...
[Epoch 17] train avg loss 0.011581, dev acc 0.6667, dev avg loss 0.564298, throughput 9.10504K wps
[Epoch 18 Batch 30/62] avg loss 0.0113096, throughput 8.85613K wps
[Epoch 18 Batch 60/62] avg loss 0.0110824, throughput 8.9174K wps
Begin Testing...
[Epoch 18] train avg loss 0.0113504, dev acc 0.7345, dev avg loss 0.563814, throughput 8.86724K wps
Observed Improvement.
Begin Testing...
[Epoch 19 Batch 30/62] avg loss 0.0113291, throughput 9.00755K wps
[Epoch 19 Batch 60/62] avg loss 0.0110364, throughput 9.08378K wps
Begin Testing...
[Epoch 19] train avg loss 0.0113139, dev acc 0.6932, dev avg loss 0.55433, throughput 9.07574K wps
[Epoch 20 Batch 30/62] avg loss 0.0110013, throughput 9.0335K wps
[Epoch 20 Batch 60/62] avg loss 0.0110562, throughput 8.91312K wps
Begin Testing...
[Epoch 20] train avg loss 0.0111422, dev acc 0.6903, dev avg loss 0.54896, throughput 8.93529K wps
[Epoch 21 Batch 30/62] avg loss 0.0109371, throughput 8.9747K wps
[Epoch 21 Batch 60/62] avg loss 0.0108161, throughput 8.99352K wps
Begin Testing...
[Epoch 21] train avg loss 0.0109751, dev acc 0.7109, dev avg loss 0.543767, throughput 8.96891K wps
[Epoch 22 Batch 30/62] avg loss 0.0107258, throughput 9.14814K wps
[Epoch 22 Batch 60/62] avg loss 0.0107302, throughput 8.98485K wps
Begin Testing...
[Epoch 22] train avg loss 0.010866, dev acc 0.7227, dev avg loss 0.538718, throughput 9.09199K wps
[Epoch 23 Batch 30/62] avg loss 0.010795, throughput 9.06266K wps
[Epoch 23 Batch 60/62] avg loss 0.010654, throughput 9.10771K wps
Begin Testing...
[Epoch 23] train avg loss 0.0107945, dev acc 0.7050, dev avg loss 0.533386, throughput 9.09826K wps
[Epoch 24 Batch 30/62] avg loss 0.0104751, throughput 8.98614K wps
[Epoch 24 Batch 60/62] avg loss 0.0104443, throughput 9.133K wps
Begin Testing...
[Epoch 24] train avg loss 0.0106427, dev acc 0.7522, dev avg loss 0.52965, throughput 9.04267K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0104825, throughput 8.78555K wps
[Epoch 25 Batch 60/62] avg loss 0.0102521, throughput 8.70367K wps
Begin Testing...
[Epoch 25] train avg loss 0.0105087, dev acc 0.7670, dev avg loss 0.524805, throughput 8.79363K wps
Observed Improvement.
Begin Testing...
[Epoch 26 Batch 30/62] avg loss 0.0102572, throughput 9.28679K wps
[Epoch 26 Batch 60/62] avg loss 0.0101626, throughput 9.1151K wps
Begin Testing...
[Epoch 26] train avg loss 0.0103683, dev acc 0.7522, dev avg loss 0.518795, throughput 9.22683K wps
[Epoch 27 Batch 30/62] avg loss 0.0102428, throughput 9.13579K wps
[Epoch 27 Batch 60/62] avg loss 0.0100377, throughput 9.10576K wps
Begin Testing...
[Epoch 27] train avg loss 0.0102525, dev acc 0.7434, dev avg loss 0.513442, throughput 9.14599K wps
[Epoch 28 Batch 30/62] avg loss 0.0100343, throughput 9.19017K wps
[Epoch 28 Batch 60/62] avg loss 0.0100292, throughput 9.05877K wps
Begin Testing...
[Epoch 28] train avg loss 0.0101936, dev acc 0.7640, dev avg loss 0.509104, throughput 9.15344K wps
[Epoch 29 Batch 30/62] avg loss 0.0097866, throughput 8.97744K wps
[Epoch 29 Batch 60/62] avg loss 0.00998984, throughput 9.00713K wps
Begin Testing...
[Epoch 29] train avg loss 0.00998685, dev acc 0.7670, dev avg loss 0.504415, throughput 8.97404K wps
Observed Improvement.
Begin Testing...
[Epoch 30 Batch 30/62] avg loss 0.00972612, throughput 9.11004K wps
[Epoch 30 Batch 60/62] avg loss 0.00977816, throughput 9.11115K wps
Begin Testing...
[Epoch 30] train avg loss 0.0098562, dev acc 0.7876, dev avg loss 0.500229, throughput 9.09727K wps
Observed Improvement.
Begin Testing...
[Epoch 31 Batch 30/62] avg loss 0.00975579, throughput 9.29138K wps
[Epoch 31 Batch 60/62] avg loss 0.00943392, throughput 8.8511K wps
Begin Testing...
[Epoch 31] train avg loss 0.00967066, dev acc 0.7994, dev avg loss 0.497216, throughput 9.0466K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.00950955, throughput 8.88238K wps
[Epoch 32 Batch 60/62] avg loss 0.0094535, throughput 9.16808K wps
Begin Testing...
[Epoch 32] train avg loss 0.00959946, dev acc 0.7463, dev avg loss 0.492522, throughput 9.05056K wps
[Epoch 33 Batch 30/62] avg loss 0.00933424, throughput 9.02189K wps
[Epoch 33 Batch 60/62] avg loss 0.00922912, throughput 9.03636K wps
Begin Testing...
[Epoch 33] train avg loss 0.00937996, dev acc 0.7758, dev avg loss 0.487127, throughput 9.0589K wps
[Epoch 34 Batch 30/62] avg loss 0.00928157, throughput 8.88507K wps
[Epoch 34 Batch 60/62] avg loss 0.00935326, throughput 8.66335K wps
Begin Testing...
[Epoch 34] train avg loss 0.00940299, dev acc 0.7876, dev avg loss 0.483182, throughput 8.8382K wps
[Epoch 35 Batch 30/62] avg loss 0.00925579, throughput 9.24109K wps
[Epoch 35 Batch 60/62] avg loss 0.00926954, throughput 9.00748K wps
Begin Testing...
[Epoch 35] train avg loss 0.00939184, dev acc 0.7935, dev avg loss 0.47963, throughput 9.15914K wps
[Epoch 36 Batch 30/62] avg loss 0.00905709, throughput 9.0767K wps
[Epoch 36 Batch 60/62] avg loss 0.00899933, throughput 9.10519K wps
Begin Testing...
[Epoch 36] train avg loss 0.00913058, dev acc 0.8083, dev avg loss 0.476482, throughput 9.12083K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/62] avg loss 0.00892778, throughput 9.19292K wps
[Epoch 37 Batch 60/62] avg loss 0.00898688, throughput 8.65686K wps
Begin Testing...
[Epoch 37] train avg loss 0.00910665, dev acc 0.8083, dev avg loss 0.479109, throughput 8.91761K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.00904295, throughput 8.79786K wps
[Epoch 38 Batch 60/62] avg loss 0.00876799, throughput 8.59508K wps
Begin Testing...
[Epoch 38] train avg loss 0.00903849, dev acc 0.8112, dev avg loss 0.47017, throughput 8.69263K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00859666, throughput 9.1932K wps
[Epoch 39 Batch 60/62] avg loss 0.00884438, throughput 9.01778K wps
Begin Testing...
[Epoch 39] train avg loss 0.00880798, dev acc 0.8024, dev avg loss 0.466165, throughput 9.13498K wps
[Epoch 40 Batch 30/62] avg loss 0.00848399, throughput 9.10193K wps
[Epoch 40 Batch 60/62] avg loss 0.00881425, throughput 8.99221K wps
Begin Testing...
[Epoch 40] train avg loss 0.00873888, dev acc 0.8024, dev avg loss 0.462919, throughput 9.08224K wps
[Epoch 41 Batch 30/62] avg loss 0.00878866, throughput 9.13406K wps
[Epoch 41 Batch 60/62] avg loss 0.00830699, throughput 9.08628K wps
Begin Testing...
[Epoch 41] train avg loss 0.00867226, dev acc 0.8142, dev avg loss 0.45978, throughput 9.13777K wps
Observed Improvement.
Begin Testing...
[Epoch 42 Batch 30/62] avg loss 0.00855816, throughput 8.87938K wps
[Epoch 42 Batch 60/62] avg loss 0.00839088, throughput 9.12636K wps
Begin Testing...
[Epoch 42] train avg loss 0.00860505, dev acc 0.8142, dev avg loss 0.456791, throughput 9.03133K wps
Observed Improvement.
Begin Testing...
[Epoch 43 Batch 30/62] avg loss 0.00852409, throughput 8.96887K wps
[Epoch 43 Batch 60/62] avg loss 0.00830311, throughput 9.09017K wps
Begin Testing...
[Epoch 43] train avg loss 0.00850778, dev acc 0.8053, dev avg loss 0.454069, throughput 9.05907K wps
[Epoch 44 Batch 30/62] avg loss 0.00804692, throughput 9.21588K wps
[Epoch 44 Batch 60/62] avg loss 0.00856706, throughput 8.73957K wps
Begin Testing...
[Epoch 44] train avg loss 0.00838284, dev acc 0.7817, dev avg loss 0.454157, throughput 8.95212K wps
[Epoch 45 Batch 30/62] avg loss 0.00823652, throughput 8.88438K wps
[Epoch 45 Batch 60/62] avg loss 0.00834598, throughput 8.76144K wps
Begin Testing...
[Epoch 45] train avg loss 0.00839875, dev acc 0.8083, dev avg loss 0.448724, throughput 8.81162K wps
[Epoch 46 Batch 30/62] avg loss 0.00804635, throughput 9.14251K wps
[Epoch 46 Batch 60/62] avg loss 0.00815168, throughput 8.98353K wps
Begin Testing...
[Epoch 46] train avg loss 0.00813858, dev acc 0.8083, dev avg loss 0.446344, throughput 9.038K wps
[Epoch 47 Batch 30/62] avg loss 0.00812912, throughput 9.19399K wps
[Epoch 47 Batch 60/62] avg loss 0.00790683, throughput 9.0708K wps
Begin Testing...
[Epoch 47] train avg loss 0.00811673, dev acc 0.8112, dev avg loss 0.443817, throughput 9.15703K wps
[Epoch 48 Batch 30/62] avg loss 0.00797974, throughput 9.20413K wps
[Epoch 48 Batch 60/62] avg loss 0.00764916, throughput 8.7954K wps
Begin Testing...
[Epoch 48] train avg loss 0.00787506, dev acc 0.8142, dev avg loss 0.442526, throughput 9.01665K wps
Observed Improvement.
Begin Testing...
[Epoch 49 Batch 30/62] avg loss 0.00800404, throughput 9.18047K wps
[Epoch 49 Batch 60/62] avg loss 0.00764901, throughput 8.94365K wps
Begin Testing...
[Epoch 49] train avg loss 0.00789885, dev acc 0.8230, dev avg loss 0.440001, throughput 9.09267K wps
Observed Improvement.
Begin Testing...
[Epoch 50 Batch 30/62] avg loss 0.00803121, throughput 8.91596K wps
[Epoch 50 Batch 60/62] avg loss 0.00760389, throughput 8.66713K wps
Begin Testing...
[Epoch 50] train avg loss 0.00791688, dev acc 0.8201, dev avg loss 0.43765, throughput 8.80733K wps
[Epoch 51 Batch 30/62] avg loss 0.00784869, throughput 9.08192K wps
[Epoch 51 Batch 60/62] avg loss 0.00753502, throughput 8.99968K wps
Begin Testing...
[Epoch 51] train avg loss 0.0077728, dev acc 0.8142, dev avg loss 0.435193, throughput 9.00989K wps
[Epoch 52 Batch 30/62] avg loss 0.00750413, throughput 9.18937K wps
[Epoch 52 Batch 60/62] avg loss 0.00760569, throughput 8.88545K wps
Begin Testing...
[Epoch 52] train avg loss 0.00760475, dev acc 0.8319, dev avg loss 0.434386, throughput 9.06515K wps
Observed Improvement.
Begin Testing...
[Epoch 53 Batch 30/62] avg loss 0.00776582, throughput 9.04588K wps
[Epoch 53 Batch 60/62] avg loss 0.00733594, throughput 8.96956K wps
Begin Testing...
[Epoch 53] train avg loss 0.0076354, dev acc 0.8142, dev avg loss 0.432018, throughput 9.04012K wps
[Epoch 54 Batch 30/62] avg loss 0.00742495, throughput 9.00624K wps
[Epoch 54 Batch 60/62] avg loss 0.00746771, throughput 8.67469K wps
Begin Testing...
[Epoch 54] train avg loss 0.00753942, dev acc 0.8171, dev avg loss 0.429991, throughput 8.83619K wps
[Epoch 55 Batch 30/62] avg loss 0.00715585, throughput 9.05542K wps
[Epoch 55 Batch 60/62] avg loss 0.00734241, throughput 9.12742K wps
Begin Testing...
[Epoch 55] train avg loss 0.00734785, dev acc 0.8378, dev avg loss 0.429579, throughput 9.1227K wps
Observed Improvement.
Begin Testing...
[Epoch 56 Batch 30/62] avg loss 0.00730797, throughput 9.10886K wps
[Epoch 56 Batch 60/62] avg loss 0.00739085, throughput 9.18195K wps
Begin Testing...
[Epoch 56] train avg loss 0.00742754, dev acc 0.8201, dev avg loss 0.425445, throughput 9.17365K wps
[Epoch 57 Batch 30/62] avg loss 0.0071079, throughput 9.12448K wps
[Epoch 57 Batch 60/62] avg loss 0.00720712, throughput 9.10896K wps
Begin Testing...
[Epoch 57] train avg loss 0.00721379, dev acc 0.8201, dev avg loss 0.423636, throughput 9.14608K wps
[Epoch 58 Batch 30/62] avg loss 0.00697587, throughput 8.33148K wps
[Epoch 58 Batch 60/62] avg loss 0.00718073, throughput 9.04501K wps
Begin Testing...
[Epoch 58] train avg loss 0.00715177, dev acc 0.8201, dev avg loss 0.422124, throughput 8.70621K wps
[Epoch 59 Batch 30/62] avg loss 0.00709635, throughput 8.92019K wps
[Epoch 59 Batch 60/62] avg loss 0.00690122, throughput 9.15146K wps
Begin Testing...
[Epoch 59] train avg loss 0.00705902, dev acc 0.8201, dev avg loss 0.420443, throughput 9.06118K wps
[Epoch 60 Batch 30/62] avg loss 0.0070565, throughput 9.18302K wps
[Epoch 60 Batch 60/62] avg loss 0.00677278, throughput 8.94429K wps
Begin Testing...
[Epoch 60] train avg loss 0.00701495, dev acc 0.8348, dev avg loss 0.419573, throughput 9.03494K wps
[Epoch 61 Batch 30/62] avg loss 0.00708427, throughput 9.12676K wps
[Epoch 61 Batch 60/62] avg loss 0.00670657, throughput 8.83407K wps
Begin Testing...
[Epoch 61] train avg loss 0.00694841, dev acc 0.8319, dev avg loss 0.417148, throughput 9.00648K wps
[Epoch 62 Batch 30/62] avg loss 0.00669118, throughput 8.96124K wps
[Epoch 62 Batch 60/62] avg loss 0.00691144, throughput 8.99116K wps
Begin Testing...
[Epoch 62] train avg loss 0.00689682, dev acc 0.8201, dev avg loss 0.416014, throughput 9.0044K wps
[Epoch 63 Batch 30/62] avg loss 0.00658337, throughput 8.98856K wps
[Epoch 63 Batch 60/62] avg loss 0.00669687, throughput 8.9816K wps
Begin Testing...
[Epoch 63] train avg loss 0.0066935, dev acc 0.8112, dev avg loss 0.416917, throughput 9.01701K wps
[Epoch 64 Batch 30/62] avg loss 0.0066532, throughput 8.69673K wps
[Epoch 64 Batch 60/62] avg loss 0.00678552, throughput 8.65485K wps
Begin Testing...
[Epoch 64] train avg loss 0.00678549, dev acc 0.8201, dev avg loss 0.413299, throughput 8.71035K wps
[Epoch 65 Batch 30/62] avg loss 0.00672629, throughput 8.92165K wps
[Epoch 65 Batch 60/62] avg loss 0.00665369, throughput 8.88208K wps
Begin Testing...
[Epoch 65] train avg loss 0.00676424, dev acc 0.8378, dev avg loss 0.411794, throughput 8.93621K wps
Observed Improvement.
Begin Testing...
[Epoch 66 Batch 30/62] avg loss 0.00670642, throughput 8.86064K wps
[Epoch 66 Batch 60/62] avg loss 0.00635686, throughput 9.19508K wps
Begin Testing...
[Epoch 66] train avg loss 0.00668218, dev acc 0.8378, dev avg loss 0.409856, throughput 9.00646K wps
Observed Improvement.
Begin Testing...
[Epoch 67 Batch 30/62] avg loss 0.00651789, throughput 9.09254K wps
[Epoch 67 Batch 60/62] avg loss 0.00649637, throughput 8.92714K wps
Begin Testing...
[Epoch 67] train avg loss 0.00652717, dev acc 0.8378, dev avg loss 0.408902, throughput 9.03299K wps
Observed Improvement.
Begin Testing...
[Epoch 68 Batch 30/62] avg loss 0.00654271, throughput 8.97892K wps
[Epoch 68 Batch 60/62] avg loss 0.00602223, throughput 9.01232K wps
Begin Testing...
[Epoch 68] train avg loss 0.00642917, dev acc 0.8319, dev avg loss 0.407588, throughput 9.02441K wps
[Epoch 69 Batch 30/62] avg loss 0.00638735, throughput 8.99801K wps
[Epoch 69 Batch 60/62] avg loss 0.00629163, throughput 8.99651K wps
Begin Testing...
[Epoch 69] train avg loss 0.00644667, dev acc 0.8348, dev avg loss 0.406653, throughput 8.98453K wps
[Epoch 70 Batch 30/62] avg loss 0.0060983, throughput 8.9573K wps
[Epoch 70 Batch 60/62] avg loss 0.0062969, throughput 9.05969K wps
Begin Testing...
[Epoch 70] train avg loss 0.00626264, dev acc 0.8348, dev avg loss 0.406761, throughput 8.99028K wps
[Epoch 71 Batch 30/62] avg loss 0.00622193, throughput 9K wps
[Epoch 71 Batch 60/62] avg loss 0.00608515, throughput 9.02261K wps
Begin Testing...
[Epoch 71] train avg loss 0.00627204, dev acc 0.8407, dev avg loss 0.404265, throughput 9.03974K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/62] avg loss 0.00600217, throughput 9.02136K wps
[Epoch 72 Batch 60/62] avg loss 0.00639613, throughput 9.14585K wps
Begin Testing...
[Epoch 72] train avg loss 0.00625483, dev acc 0.8407, dev avg loss 0.404237, throughput 9.11365K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/62] avg loss 0.00623059, throughput 9.2379K wps
[Epoch 73 Batch 60/62] avg loss 0.00588788, throughput 8.98407K wps
Begin Testing...
[Epoch 73] train avg loss 0.00618653, dev acc 0.8437, dev avg loss 0.402209, throughput 9.13628K wps
Observed Improvement.
Begin Testing...
[Epoch 74 Batch 30/62] avg loss 0.00610223, throughput 8.97832K wps
[Epoch 74 Batch 60/62] avg loss 0.00613583, throughput 8.92161K wps
Begin Testing...
[Epoch 74] train avg loss 0.00618768, dev acc 0.8437, dev avg loss 0.400798, throughput 9.00988K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/62] avg loss 0.00581569, throughput 9.25425K wps
[Epoch 75 Batch 60/62] avg loss 0.00604495, throughput 9.14214K wps
Begin Testing...
[Epoch 75] train avg loss 0.00599966, dev acc 0.8437, dev avg loss 0.400104, throughput 9.18246K wps
Observed Improvement.
Begin Testing...
[Epoch 76 Batch 30/62] avg loss 0.00611145, throughput 9.01615K wps
[Epoch 76 Batch 60/62] avg loss 0.00565477, throughput 8.97872K wps
Begin Testing...
[Epoch 76] train avg loss 0.00598036, dev acc 0.8378, dev avg loss 0.400397, throughput 9.0222K wps
[Epoch 77 Batch 30/62] avg loss 0.00561949, throughput 8.86764K wps
[Epoch 77 Batch 60/62] avg loss 0.00593224, throughput 9.11227K wps
Begin Testing...
[Epoch 77] train avg loss 0.00584392, dev acc 0.8319, dev avg loss 0.403267, throughput 9.02263K wps
[Epoch 78 Batch 30/62] avg loss 0.00574868, throughput 9.22181K wps
[Epoch 78 Batch 60/62] avg loss 0.00569667, throughput 9.03872K wps
Begin Testing...
[Epoch 78] train avg loss 0.00575393, dev acc 0.8378, dev avg loss 0.396811, throughput 9.11332K wps
[Epoch 79 Batch 30/62] avg loss 0.00569534, throughput 9.15108K wps
[Epoch 79 Batch 60/62] avg loss 0.00579388, throughput 9.10948K wps
Begin Testing...
[Epoch 79] train avg loss 0.00586434, dev acc 0.8407, dev avg loss 0.397037, throughput 9.15997K wps
[Epoch 80 Batch 30/62] avg loss 0.00567025, throughput 9.19279K wps
[Epoch 80 Batch 60/62] avg loss 0.00545283, throughput 9.00735K wps
Begin Testing...
[Epoch 80] train avg loss 0.00567452, dev acc 0.8378, dev avg loss 0.395356, throughput 9.09093K wps
[Epoch 81 Batch 30/62] avg loss 0.00554545, throughput 9.25234K wps
[Epoch 81 Batch 60/62] avg loss 0.00545095, throughput 8.94573K wps
Begin Testing...
[Epoch 81] train avg loss 0.00562884, dev acc 0.8378, dev avg loss 0.394637, throughput 9.06232K wps
[Epoch 82 Batch 30/62] avg loss 0.00561043, throughput 9.06802K wps
[Epoch 82 Batch 60/62] avg loss 0.00552334, throughput 9.00456K wps
Begin Testing...
[Epoch 82] train avg loss 0.00563189, dev acc 0.8348, dev avg loss 0.394183, throughput 9.06598K wps
[Epoch 83 Batch 30/62] avg loss 0.00545287, throughput 9.14961K wps
[Epoch 83 Batch 60/62] avg loss 0.00548863, throughput 8.97008K wps
Begin Testing...
[Epoch 83] train avg loss 0.00549271, dev acc 0.8348, dev avg loss 0.392852, throughput 9.08634K wps
[Epoch 84 Batch 30/62] avg loss 0.00527846, throughput 9.14908K wps
[Epoch 84 Batch 60/62] avg loss 0.00541444, throughput 9.06378K wps
Begin Testing...
[Epoch 84] train avg loss 0.00533961, dev acc 0.8348, dev avg loss 0.391974, throughput 9.0875K wps
[Epoch 85 Batch 30/62] avg loss 0.0055192, throughput 9.25857K wps
[Epoch 85 Batch 60/62] avg loss 0.00536998, throughput 8.9537K wps
Begin Testing...
[Epoch 85] train avg loss 0.00547667, dev acc 0.8407, dev avg loss 0.390994, throughput 9.12442K wps
[Epoch 86 Batch 30/62] avg loss 0.00513126, throughput 9.22981K wps
[Epoch 86 Batch 60/62] avg loss 0.00531293, throughput 8.99972K wps
Begin Testing...
[Epoch 86] train avg loss 0.0053762, dev acc 0.8319, dev avg loss 0.394051, throughput 9.0933K wps
[Epoch 87 Batch 30/62] avg loss 0.00516998, throughput 9.02681K wps
[Epoch 87 Batch 60/62] avg loss 0.00514359, throughput 9.17851K wps
Begin Testing...
[Epoch 87] train avg loss 0.00525187, dev acc 0.8407, dev avg loss 0.389891, throughput 9.13273K wps
[Epoch 88 Batch 30/62] avg loss 0.00518949, throughput 9.262K wps
[Epoch 88 Batch 60/62] avg loss 0.00509161, throughput 9.14526K wps
Begin Testing...
[Epoch 88] train avg loss 0.00519989, dev acc 0.8378, dev avg loss 0.389496, throughput 9.23164K wps
[Epoch 89 Batch 30/62] avg loss 0.00496751, throughput 9.2331K wps
[Epoch 89 Batch 60/62] avg loss 0.00496813, throughput 9.05229K wps
Begin Testing...
[Epoch 89] train avg loss 0.00500528, dev acc 0.8378, dev avg loss 0.388561, throughput 9.14445K wps
[Epoch 90 Batch 30/62] avg loss 0.00511253, throughput 9.1374K wps
[Epoch 90 Batch 60/62] avg loss 0.00501465, throughput 9.0156K wps
Begin Testing...
[Epoch 90] train avg loss 0.00510294, dev acc 0.8378, dev avg loss 0.388571, throughput 9.10538K wps
[Epoch 91 Batch 30/62] avg loss 0.00487831, throughput 8.99293K wps
[Epoch 91 Batch 60/62] avg loss 0.00483441, throughput 9.14071K wps
Begin Testing...
[Epoch 91] train avg loss 0.00493297, dev acc 0.8348, dev avg loss 0.388727, throughput 9.04847K wps
[Epoch 92 Batch 30/62] avg loss 0.00483132, throughput 9.29956K wps
[Epoch 92 Batch 60/62] avg loss 0.00497941, throughput 9.11874K wps
Begin Testing...
[Epoch 92] train avg loss 0.00493149, dev acc 0.8378, dev avg loss 0.386777, throughput 9.23495K wps
[Epoch 93 Batch 30/62] avg loss 0.00496427, throughput 9.23617K wps
[Epoch 93 Batch 60/62] avg loss 0.00489505, throughput 8.90139K wps
Begin Testing...
[Epoch 93] train avg loss 0.00499861, dev acc 0.8378, dev avg loss 0.386113, throughput 9.05642K wps
[Epoch 94 Batch 30/62] avg loss 0.00479797, throughput 9.03499K wps
[Epoch 94 Batch 60/62] avg loss 0.00486431, throughput 9.089K wps
Begin Testing...
[Epoch 94] train avg loss 0.0048537, dev acc 0.8407, dev avg loss 0.387982, throughput 9.09327K wps
[Epoch 95 Batch 30/62] avg loss 0.00465742, throughput 9.15391K wps
[Epoch 95 Batch 60/62] avg loss 0.00498866, throughput 8.90213K wps
Begin Testing...
[Epoch 95] train avg loss 0.00483443, dev acc 0.8378, dev avg loss 0.385616, throughput 9.05486K wps
[Epoch 96 Batch 30/62] avg loss 0.00489737, throughput 8.96476K wps
[Epoch 96 Batch 60/62] avg loss 0.0045652, throughput 8.99202K wps
Begin Testing...
[Epoch 96] train avg loss 0.00481035, dev acc 0.8260, dev avg loss 0.385364, throughput 8.94531K wps
[Epoch 97 Batch 30/62] avg loss 0.0046827, throughput 8.97842K wps
[Epoch 97 Batch 60/62] avg loss 0.00475507, throughput 9.01796K wps
Begin Testing...
[Epoch 97] train avg loss 0.00479268, dev acc 0.8378, dev avg loss 0.384673, throughput 9.02838K wps
[Epoch 98 Batch 30/62] avg loss 0.00458107, throughput 9.10178K wps
[Epoch 98 Batch 60/62] avg loss 0.00458989, throughput 9.07644K wps
Begin Testing...
[Epoch 98] train avg loss 0.00462581, dev acc 0.8378, dev avg loss 0.384072, throughput 9.1296K wps
[Epoch 99 Batch 30/62] avg loss 0.00463554, throughput 9.20246K wps
[Epoch 99 Batch 60/62] avg loss 0.00437481, throughput 8.73722K wps
Begin Testing...
[Epoch 99] train avg loss 0.00453105, dev acc 0.8348, dev avg loss 0.383909, throughput 8.93914K wps
[Epoch 100 Batch 30/62] avg loss 0.00471669, throughput 9.23999K wps
[Epoch 100 Batch 60/62] avg loss 0.00452876, throughput 8.95163K wps
Begin Testing...
[Epoch 100] train avg loss 0.00468337, dev acc 0.8378, dev avg loss 0.38253, throughput 9.13871K wps
[Epoch 101 Batch 30/62] avg loss 0.00451688, throughput 8.93944K wps
[Epoch 101 Batch 60/62] avg loss 0.00435805, throughput 9.10198K wps
Begin Testing...
[Epoch 101] train avg loss 0.00446479, dev acc 0.8289, dev avg loss 0.381648, throughput 9.05143K wps
[Epoch 102 Batch 30/62] avg loss 0.0042618, throughput 9.0198K wps
[Epoch 102 Batch 60/62] avg loss 0.00452787, throughput 9.15829K wps
Begin Testing...
[Epoch 102] train avg loss 0.00442103, dev acc 0.8289, dev avg loss 0.381176, throughput 9.1173K wps
[Epoch 103 Batch 30/62] avg loss 0.00455635, throughput 9.26273K wps
[Epoch 103 Batch 60/62] avg loss 0.00414123, throughput 8.99321K wps
Begin Testing...
[Epoch 103] train avg loss 0.00440424, dev acc 0.8289, dev avg loss 0.381379, throughput 9.14686K wps
[Epoch 104 Batch 30/62] avg loss 0.00437864, throughput 9.22995K wps
[Epoch 104 Batch 60/62] avg loss 0.00460786, throughput 8.82895K wps
Begin Testing...
[Epoch 104] train avg loss 0.00453587, dev acc 0.8407, dev avg loss 0.381564, throughput 8.99817K wps
[Epoch 105 Batch 30/62] avg loss 0.00417084, throughput 9.10416K wps
[Epoch 105 Batch 60/62] avg loss 0.0041272, throughput 9.15039K wps
Begin Testing...
[Epoch 105] train avg loss 0.00419927, dev acc 0.8437, dev avg loss 0.38273, throughput 9.15797K wps
Observed Improvement.
Begin Testing...
[Epoch 106 Batch 30/62] avg loss 0.0043038, throughput 9.21394K wps
[Epoch 106 Batch 60/62] avg loss 0.0042486, throughput 9.03482K wps
Begin Testing...
[Epoch 106] train avg loss 0.00429746, dev acc 0.8289, dev avg loss 0.380704, throughput 9.15458K wps
[Epoch 107 Batch 30/62] avg loss 0.00428, throughput 8.71143K wps
[Epoch 107 Batch 60/62] avg loss 0.00419186, throughput 9.09539K wps
Begin Testing...
[Epoch 107] train avg loss 0.00427841, dev acc 0.8289, dev avg loss 0.380284, throughput 8.92999K wps
[Epoch 108 Batch 30/62] avg loss 0.00408646, throughput 8.9011K wps
[Epoch 108 Batch 60/62] avg loss 0.00415774, throughput 9.00855K wps
Begin Testing...
[Epoch 108] train avg loss 0.00421787, dev acc 0.8319, dev avg loss 0.380618, throughput 9.03158K wps
[Epoch 109 Batch 30/62] avg loss 0.00412978, throughput 9.18761K wps
[Epoch 109 Batch 60/62] avg loss 0.00387939, throughput 9.07166K wps
Begin Testing...
[Epoch 109] train avg loss 0.00408052, dev acc 0.8289, dev avg loss 0.381641, throughput 9.09779K wps
[Epoch 110 Batch 30/62] avg loss 0.00400921, throughput 9.10408K wps
[Epoch 110 Batch 60/62] avg loss 0.00416135, throughput 8.98322K wps
Begin Testing...
[Epoch 110] train avg loss 0.00413777, dev acc 0.8437, dev avg loss 0.379945, throughput 9.07543K wps
Observed Improvement.
Begin Testing...
[Epoch 111 Batch 30/62] avg loss 0.00404156, throughput 9.00652K wps
[Epoch 111 Batch 60/62] avg loss 0.003907, throughput 9.09493K wps
Begin Testing...
[Epoch 111] train avg loss 0.00404363, dev acc 0.8466, dev avg loss 0.380743, throughput 9.0269K wps
Observed Improvement.
Begin Testing...
[Epoch 112 Batch 30/62] avg loss 0.00422329, throughput 9.19936K wps
[Epoch 112 Batch 60/62] avg loss 0.0038524, throughput 9.0032K wps
Begin Testing...
[Epoch 112] train avg loss 0.00408294, dev acc 0.8260, dev avg loss 0.377205, throughput 9.06961K wps
[Epoch 113 Batch 30/62] avg loss 0.00375483, throughput 9.17284K wps
[Epoch 113 Batch 60/62] avg loss 0.00427239, throughput 8.87595K wps
Begin Testing...
[Epoch 113] train avg loss 0.00404884, dev acc 0.8378, dev avg loss 0.377816, throughput 9.05489K wps
[Epoch 114 Batch 30/62] avg loss 0.0040373, throughput 8.94969K wps
[Epoch 114 Batch 60/62] avg loss 0.00393702, throughput 9.0977K wps
Begin Testing...
[Epoch 114] train avg loss 0.00399295, dev acc 0.8378, dev avg loss 0.378138, throughput 9.05438K wps
[Epoch 115 Batch 30/62] avg loss 0.00379725, throughput 9.19229K wps
[Epoch 115 Batch 60/62] avg loss 0.00388401, throughput 9.00345K wps
Begin Testing...
[Epoch 115] train avg loss 0.00387916, dev acc 0.8378, dev avg loss 0.377376, throughput 9.07021K wps
[Epoch 116 Batch 30/62] avg loss 0.00400235, throughput 9.02913K wps
[Epoch 116 Batch 60/62] avg loss 0.00381748, throughput 9.1372K wps
Begin Testing...
[Epoch 116] train avg loss 0.00396714, dev acc 0.8319, dev avg loss 0.376851, throughput 9.11412K wps
[Epoch 117 Batch 30/62] avg loss 0.00379301, throughput 9.07346K wps
[Epoch 117 Batch 60/62] avg loss 0.00371492, throughput 9.11824K wps
Begin Testing...
[Epoch 117] train avg loss 0.00384585, dev acc 0.8378, dev avg loss 0.376383, throughput 9.12744K wps
[Epoch 118 Batch 30/62] avg loss 0.00360313, throughput 8.9343K wps
[Epoch 118 Batch 60/62] avg loss 0.00376599, throughput 9.1411K wps
Begin Testing...
[Epoch 118] train avg loss 0.00372793, dev acc 0.8289, dev avg loss 0.375701, throughput 9.06746K wps
[Epoch 119 Batch 30/62] avg loss 0.00371489, throughput 8.69975K wps
[Epoch 119 Batch 60/62] avg loss 0.00382874, throughput 8.66034K wps
Begin Testing...
[Epoch 119] train avg loss 0.00381818, dev acc 0.8319, dev avg loss 0.375731, throughput 8.65527K wps
[Epoch 120 Batch 30/62] avg loss 0.00362999, throughput 9.03592K wps
[Epoch 120 Batch 60/62] avg loss 0.00375513, throughput 9.06084K wps
Begin Testing...
[Epoch 120] train avg loss 0.00369906, dev acc 0.8260, dev avg loss 0.375443, throughput 9.07869K wps
[Epoch 121 Batch 30/62] avg loss 0.00370938, throughput 9.21554K wps
[Epoch 121 Batch 60/62] avg loss 0.00378554, throughput 8.67873K wps
Begin Testing...
[Epoch 121] train avg loss 0.00382688, dev acc 0.8260, dev avg loss 0.379706, throughput 8.92028K wps
[Epoch 122 Batch 30/62] avg loss 0.00358645, throughput 9.16311K wps
[Epoch 122 Batch 60/62] avg loss 0.00350121, throughput 8.91555K wps
Begin Testing...
[Epoch 122] train avg loss 0.00357281, dev acc 0.8260, dev avg loss 0.375471, throughput 9.09588K wps
[Epoch 123 Batch 30/62] avg loss 0.00321452, throughput 9.17199K wps
[Epoch 123 Batch 60/62] avg loss 0.00387733, throughput 9.08984K wps
Begin Testing...
[Epoch 123] train avg loss 0.00363761, dev acc 0.8260, dev avg loss 0.377989, throughput 9.15881K wps
[Epoch 124 Batch 30/62] avg loss 0.00339925, throughput 9.20045K wps
[Epoch 124 Batch 60/62] avg loss 0.00347954, throughput 9.01311K wps
Begin Testing...
[Epoch 124] train avg loss 0.00348131, dev acc 0.8437, dev avg loss 0.382727, throughput 9.05919K wps
[Epoch 125 Batch 30/62] avg loss 0.00355013, throughput 9.13993K wps
[Epoch 125 Batch 60/62] avg loss 0.0035461, throughput 8.80035K wps
Begin Testing...
[Epoch 125] train avg loss 0.00358062, dev acc 0.8407, dev avg loss 0.376074, throughput 8.94588K wps
[Epoch 126 Batch 30/62] avg loss 0.00353826, throughput 8.85421K wps
[Epoch 126 Batch 60/62] avg loss 0.00352411, throughput 8.94858K wps
Begin Testing...
[Epoch 126] train avg loss 0.00357649, dev acc 0.8171, dev avg loss 0.375896, throughput 8.88729K wps
[Epoch 127 Batch 30/62] avg loss 0.00359439, throughput 9.00853K wps
[Epoch 127 Batch 60/62] avg loss 0.0032896, throughput 8.84139K wps
Begin Testing...
[Epoch 127] train avg loss 0.00355002, dev acc 0.8260, dev avg loss 0.374193, throughput 8.95994K wps
[Epoch 128 Batch 30/62] avg loss 0.00345893, throughput 8.86505K wps
[Epoch 128 Batch 60/62] avg loss 0.00330587, throughput 8.85159K wps
Begin Testing...
[Epoch 128] train avg loss 0.00345551, dev acc 0.8260, dev avg loss 0.374904, throughput 8.89545K wps
[Epoch 129 Batch 30/62] avg loss 0.00338963, throughput 9.21162K wps
[Epoch 129 Batch 60/62] avg loss 0.00329151, throughput 8.87315K wps
Begin Testing...
[Epoch 129] train avg loss 0.00335934, dev acc 0.8319, dev avg loss 0.374256, throughput 9.06892K wps
[Epoch 130 Batch 30/62] avg loss 0.00324213, throughput 9.15133K wps
[Epoch 130 Batch 60/62] avg loss 0.00346984, throughput 8.9283K wps
Begin Testing...
[Epoch 130] train avg loss 0.00339979, dev acc 0.8201, dev avg loss 0.375456, throughput 9.06934K wps
[Epoch 131 Batch 30/62] avg loss 0.00322897, throughput 9.04099K wps
[Epoch 131 Batch 60/62] avg loss 0.00331711, throughput 9.131K wps
Begin Testing...
[Epoch 131] train avg loss 0.00330576, dev acc 0.8230, dev avg loss 0.376584, throughput 9.11887K wps
[Epoch 132 Batch 30/62] avg loss 0.00321295, throughput 8.98737K wps
[Epoch 132 Batch 60/62] avg loss 0.00326956, throughput 8.91704K wps
Begin Testing...
[Epoch 132] train avg loss 0.00326497, dev acc 0.8230, dev avg loss 0.373629, throughput 8.98395K wps
[Epoch 133 Batch 30/62] avg loss 0.00313112, throughput 9.27338K wps
[Epoch 133 Batch 60/62] avg loss 0.00324048, throughput 8.78816K wps
Begin Testing...
[Epoch 133] train avg loss 0.00321869, dev acc 0.8201, dev avg loss 0.374701, throughput 9.05847K wps
[Epoch 134 Batch 30/62] avg loss 0.00310067, throughput 9.05013K wps
[Epoch 134 Batch 60/62] avg loss 0.00319228, throughput 9.10921K wps
Begin Testing...
[Epoch 134] train avg loss 0.00316801, dev acc 0.8289, dev avg loss 0.373971, throughput 9.11013K wps
[Epoch 135 Batch 30/62] avg loss 0.00318059, throughput 9.02183K wps
[Epoch 135 Batch 60/62] avg loss 0.00317584, throughput 9.07659K wps
Begin Testing...
[Epoch 135] train avg loss 0.00322068, dev acc 0.8319, dev avg loss 0.375001, throughput 9.07555K wps
[Epoch 136 Batch 30/62] avg loss 0.00299043, throughput 8.97707K wps
[Epoch 136 Batch 60/62] avg loss 0.00310567, throughput 8.8035K wps
Begin Testing...
[Epoch 136] train avg loss 0.00315327, dev acc 0.8260, dev avg loss 0.373057, throughput 8.88005K wps
[Epoch 137 Batch 30/62] avg loss 0.0031695, throughput 9.21026K wps
[Epoch 137 Batch 60/62] avg loss 0.00299401, throughput 9.09576K wps
Begin Testing...
[Epoch 137] train avg loss 0.0031044, dev acc 0.8319, dev avg loss 0.373869, throughput 9.17711K wps
[Epoch 138 Batch 30/62] avg loss 0.00305802, throughput 9.14682K wps
[Epoch 138 Batch 60/62] avg loss 0.00302942, throughput 8.95827K wps
Begin Testing...
[Epoch 138] train avg loss 0.00307596, dev acc 0.8319, dev avg loss 0.373607, throughput 9.08375K wps
[Epoch 139 Batch 30/62] avg loss 0.00299411, throughput 9.01018K wps
[Epoch 139 Batch 60/62] avg loss 0.00307156, throughput 8.97433K wps
Begin Testing...
[Epoch 139] train avg loss 0.00305472, dev acc 0.8378, dev avg loss 0.374574, throughput 9.0172K wps
[Epoch 140 Batch 30/62] avg loss 0.00310231, throughput 9.27205K wps
[Epoch 140 Batch 60/62] avg loss 0.00284643, throughput 8.66286K wps
Begin Testing...
[Epoch 140] train avg loss 0.00299965, dev acc 0.8407, dev avg loss 0.375143, throughput 9.02078K wps
[Epoch 141 Batch 30/62] avg loss 0.00288743, throughput 8.97388K wps
[Epoch 141 Batch 60/62] avg loss 0.00289559, throughput 9.01555K wps
Begin Testing...
[Epoch 141] train avg loss 0.00292147, dev acc 0.8289, dev avg loss 0.372713, throughput 8.96857K wps
[Epoch 142 Batch 30/62] avg loss 0.00300697, throughput 8.87816K wps
[Epoch 142 Batch 60/62] avg loss 0.00298866, throughput 9.04844K wps
Begin Testing...
[Epoch 142] train avg loss 0.0030602, dev acc 0.8319, dev avg loss 0.37254, throughput 8.99555K wps
[Epoch 143 Batch 30/62] avg loss 0.00280717, throughput 8.73564K wps
[Epoch 143 Batch 60/62] avg loss 0.0029411, throughput 9.29642K wps
Begin Testing...
[Epoch 143] train avg loss 0.00290526, dev acc 0.8319, dev avg loss 0.373013, throughput 9.04301K wps
[Epoch 144 Batch 30/62] avg loss 0.00295755, throughput 8.98388K wps
[Epoch 144 Batch 60/62] avg loss 0.00273658, throughput 9.07647K wps
Begin Testing...
[Epoch 144] train avg loss 0.00287015, dev acc 0.8289, dev avg loss 0.373455, throughput 9.00401K wps
[Epoch 145 Batch 30/62] avg loss 0.00289245, throughput 8.82782K wps
[Epoch 145 Batch 60/62] avg loss 0.00289986, throughput 9.15335K wps
Begin Testing...
[Epoch 145] train avg loss 0.00295326, dev acc 0.8319, dev avg loss 0.373869, throughput 9.02043K wps
[Epoch 146 Batch 30/62] avg loss 0.00288768, throughput 9.15902K wps
[Epoch 146 Batch 60/62] avg loss 0.00286233, throughput 8.68402K wps
Begin Testing...
[Epoch 146] train avg loss 0.002926, dev acc 0.8230, dev avg loss 0.373284, throughput 8.92835K wps
[Epoch 147 Batch 30/62] avg loss 0.00282928, throughput 9.10453K wps
[Epoch 147 Batch 60/62] avg loss 0.00261362, throughput 9.06693K wps
Begin Testing...
[Epoch 147] train avg loss 0.00274862, dev acc 0.8466, dev avg loss 0.380156, throughput 9.11445K wps
Observed Improvement.
Begin Testing...
[Epoch 148 Batch 30/62] avg loss 0.0026734, throughput 8.89946K wps
[Epoch 148 Batch 60/62] avg loss 0.00282661, throughput 9.10278K wps
Begin Testing...
[Epoch 148] train avg loss 0.00278004, dev acc 0.8466, dev avg loss 0.377043, throughput 9.03144K wps
Observed Improvement.
Begin Testing...
[Epoch 149 Batch 30/62] avg loss 0.00272666, throughput 9.14434K wps
[Epoch 149 Batch 60/62] avg loss 0.00275616, throughput 8.60037K wps
Begin Testing...
[Epoch 149] train avg loss 0.0027423, dev acc 0.8289, dev avg loss 0.372798, throughput 8.85889K wps
[Epoch 150 Batch 30/62] avg loss 0.00268804, throughput 9.11788K wps
[Epoch 150 Batch 60/62] avg loss 0.00272255, throughput 8.77406K wps
Begin Testing...
[Epoch 150] train avg loss 0.00271965, dev acc 0.8289, dev avg loss 0.3725, throughput 8.91224K wps
[Epoch 151 Batch 30/62] avg loss 0.00278031, throughput 9.0385K wps
[Epoch 151 Batch 60/62] avg loss 0.00262249, throughput 9.03545K wps
Begin Testing...
[Epoch 151] train avg loss 0.00273048, dev acc 0.8319, dev avg loss 0.372234, throughput 9.06745K wps
[Epoch 152 Batch 30/62] avg loss 0.00267499, throughput 8.84602K wps
[Epoch 152 Batch 60/62] avg loss 0.00263859, throughput 9.14981K wps
Begin Testing...
[Epoch 152] train avg loss 0.0026764, dev acc 0.8230, dev avg loss 0.372318, throughput 9.02735K wps
[Epoch 153 Batch 30/62] avg loss 0.00263855, throughput 9.23941K wps
[Epoch 153 Batch 60/62] avg loss 0.00277887, throughput 9.1268K wps
Begin Testing...
[Epoch 153] train avg loss 0.00271571, dev acc 0.8319, dev avg loss 0.372109, throughput 9.20678K wps
[Epoch 154 Batch 30/62] avg loss 0.00271721, throughput 9.25907K wps
[Epoch 154 Batch 60/62] avg loss 0.00262348, throughput 8.86074K wps
Begin Testing...
[Epoch 154] train avg loss 0.00274082, dev acc 0.8319, dev avg loss 0.372409, throughput 9.11317K wps
[Epoch 155 Batch 30/62] avg loss 0.00257621, throughput 9.17215K wps
[Epoch 155 Batch 60/62] avg loss 0.00247521, throughput 8.8618K wps
Begin Testing...
[Epoch 155] train avg loss 0.00260111, dev acc 0.8260, dev avg loss 0.372411, throughput 8.97329K wps
[Epoch 156 Batch 30/62] avg loss 0.00265246, throughput 8.91042K wps
[Epoch 156 Batch 60/62] avg loss 0.00244221, throughput 8.92262K wps
Begin Testing...
[Epoch 156] train avg loss 0.00258924, dev acc 0.8378, dev avg loss 0.374058, throughput 8.95288K wps
[Epoch 157 Batch 30/62] avg loss 0.00247101, throughput 9.12051K wps
[Epoch 157 Batch 60/62] avg loss 0.0026127, throughput 8.72127K wps
Begin Testing...
[Epoch 157] train avg loss 0.00259695, dev acc 0.8289, dev avg loss 0.372591, throughput 8.89643K wps
[Epoch 158 Batch 30/62] avg loss 0.00243646, throughput 9.09158K wps
[Epoch 158 Batch 60/62] avg loss 0.00256753, throughput 8.90985K wps
Begin Testing...
[Epoch 158] train avg loss 0.00251034, dev acc 0.8230, dev avg loss 0.37506, throughput 9.03292K wps
[Epoch 159 Batch 30/62] avg loss 0.00272063, throughput 9.13936K wps
[Epoch 159 Batch 60/62] avg loss 0.00239385, throughput 8.80279K wps
Begin Testing...
[Epoch 159] train avg loss 0.00261053, dev acc 0.8378, dev avg loss 0.373416, throughput 8.93655K wps
[Epoch 160 Batch 30/62] avg loss 0.0024012, throughput 8.93122K wps
[Epoch 160 Batch 60/62] avg loss 0.00249191, throughput 8.98193K wps
Begin Testing...
[Epoch 160] train avg loss 0.00246923, dev acc 0.8230, dev avg loss 0.374178, throughput 8.94119K wps
[Epoch 161 Batch 30/62] avg loss 0.00271683, throughput 8.94041K wps
[Epoch 161 Batch 60/62] avg loss 0.00236901, throughput 9.03242K wps
Begin Testing...
[Epoch 161] train avg loss 0.00258933, dev acc 0.8260, dev avg loss 0.372644, throughput 8.9833K wps
[Epoch 162 Batch 30/62] avg loss 0.00257401, throughput 9.21056K wps
[Epoch 162 Batch 60/62] avg loss 0.00247411, throughput 8.88825K wps
Begin Testing...
[Epoch 162] train avg loss 0.0025322, dev acc 0.8319, dev avg loss 0.372806, throughput 9.07645K wps
[Epoch 163 Batch 30/62] avg loss 0.00232623, throughput 9.23327K wps
[Epoch 163 Batch 60/62] avg loss 0.00248275, throughput 8.90642K wps
Begin Testing...
[Epoch 163] train avg loss 0.0024189, dev acc 0.8289, dev avg loss 0.372657, throughput 9.05761K wps
[Epoch 164 Batch 30/62] avg loss 0.00247121, throughput 9.10891K wps
[Epoch 164 Batch 60/62] avg loss 0.00230371, throughput 8.83073K wps
Begin Testing...
[Epoch 164] train avg loss 0.002415, dev acc 0.8319, dev avg loss 0.373415, throughput 8.95482K wps
[Epoch 165 Batch 30/62] avg loss 0.00243042, throughput 8.97965K wps
[Epoch 165 Batch 60/62] avg loss 0.00228423, throughput 8.98496K wps
Begin Testing...
[Epoch 165] train avg loss 0.00238502, dev acc 0.8260, dev avg loss 0.373708, throughput 9.01627K wps
[Epoch 166 Batch 30/62] avg loss 0.00232747, throughput 8.96123K wps
[Epoch 166 Batch 60/62] avg loss 0.00238772, throughput 9.04184K wps
Begin Testing...
[Epoch 166] train avg loss 0.00236967, dev acc 0.8289, dev avg loss 0.373614, throughput 9.03441K wps
[Epoch 167 Batch 30/62] avg loss 0.00226201, throughput 9.14343K wps
[Epoch 167 Batch 60/62] avg loss 0.00231076, throughput 9.16117K wps
Begin Testing...
[Epoch 167] train avg loss 0.00231351, dev acc 0.8466, dev avg loss 0.37916, throughput 9.18582K wps
Observed Improvement.
Begin Testing...
[Epoch 168 Batch 30/62] avg loss 0.00230444, throughput 9.21029K wps
[Epoch 168 Batch 60/62] avg loss 0.00220818, throughput 8.95528K wps
Begin Testing...
[Epoch 168] train avg loss 0.0022919, dev acc 0.8319, dev avg loss 0.374451, throughput 9.11325K wps
[Epoch 169 Batch 30/62] avg loss 0.00221649, throughput 9.15184K wps
[Epoch 169 Batch 60/62] avg loss 0.00223191, throughput 9.02041K wps
Begin Testing...
[Epoch 169] train avg loss 0.00225503, dev acc 0.8348, dev avg loss 0.376219, throughput 9.10645K wps
[Epoch 170 Batch 30/62] avg loss 0.00227903, throughput 8.95128K wps
[Epoch 170 Batch 60/62] avg loss 0.00229603, throughput 9.08609K wps
Begin Testing...
[Epoch 170] train avg loss 0.00231142, dev acc 0.8319, dev avg loss 0.375086, throughput 9.00415K wps
[Epoch 171 Batch 30/62] avg loss 0.00240348, throughput 8.89304K wps
[Epoch 171 Batch 60/62] avg loss 0.00212707, throughput 8.8696K wps
Begin Testing...
[Epoch 171] train avg loss 0.00235144, dev acc 0.8437, dev avg loss 0.377537, throughput 8.86946K wps
[Epoch 172 Batch 30/62] avg loss 0.00213563, throughput 9.16272K wps
[Epoch 172 Batch 60/62] avg loss 0.00221953, throughput 8.99545K wps
Begin Testing...
[Epoch 172] train avg loss 0.00219106, dev acc 0.8319, dev avg loss 0.374873, throughput 9.10643K wps
[Epoch 173 Batch 30/62] avg loss 0.00226815, throughput 9.21412K wps
[Epoch 173 Batch 60/62] avg loss 0.00217231, throughput 8.9519K wps
Begin Testing...
[Epoch 173] train avg loss 0.00224211, dev acc 0.8289, dev avg loss 0.374533, throughput 9.05149K wps
[Epoch 174 Batch 30/62] avg loss 0.00218239, throughput 9.15073K wps
[Epoch 174 Batch 60/62] avg loss 0.0020917, throughput 8.98651K wps
Begin Testing...
[Epoch 174] train avg loss 0.00217197, dev acc 0.8407, dev avg loss 0.378096, throughput 9.03386K wps
[Epoch 175 Batch 30/62] avg loss 0.00217779, throughput 9.13876K wps
[Epoch 175 Batch 60/62] avg loss 0.00211801, throughput 8.99901K wps
Begin Testing...
[Epoch 175] train avg loss 0.00220735, dev acc 0.8348, dev avg loss 0.375843, throughput 9.04605K wps
[Epoch 176 Batch 30/62] avg loss 0.00199675, throughput 9.00852K wps
[Epoch 176 Batch 60/62] avg loss 0.002035, throughput 8.61269K wps
Begin Testing...
[Epoch 176] train avg loss 0.00205585, dev acc 0.8289, dev avg loss 0.374773, throughput 8.77906K wps
[Epoch 177 Batch 30/62] avg loss 0.0020762, throughput 9.08237K wps
[Epoch 177 Batch 60/62] avg loss 0.00226874, throughput 9.13494K wps
Begin Testing...
[Epoch 177] train avg loss 0.00226027, dev acc 0.8437, dev avg loss 0.379893, throughput 9.1388K wps
[Epoch 178 Batch 30/62] avg loss 0.00203599, throughput 9.0816K wps
[Epoch 178 Batch 60/62] avg loss 0.0021572, throughput 9.02231K wps
Begin Testing...
[Epoch 178] train avg loss 0.00209851, dev acc 0.8289, dev avg loss 0.374879, throughput 9.08241K wps
[Epoch 179 Batch 30/62] avg loss 0.00204898, throughput 8.99813K wps
[Epoch 179 Batch 60/62] avg loss 0.00212777, throughput 8.91247K wps
Begin Testing...
[Epoch 179] train avg loss 0.00211195, dev acc 0.8407, dev avg loss 0.378772, throughput 8.98629K wps
[Epoch 180 Batch 30/62] avg loss 0.00210993, throughput 8.98188K wps
[Epoch 180 Batch 60/62] avg loss 0.0019131, throughput 8.92953K wps
Begin Testing...
[Epoch 180] train avg loss 0.00204993, dev acc 0.8289, dev avg loss 0.375074, throughput 8.9865K wps
[Epoch 181 Batch 30/62] avg loss 0.0021545, throughput 9.0821K wps
[Epoch 181 Batch 60/62] avg loss 0.00184887, throughput 9.12615K wps
Begin Testing...
[Epoch 181] train avg loss 0.002001, dev acc 0.8319, dev avg loss 0.376478, throughput 9.11829K wps
[Epoch 182 Batch 30/62] avg loss 0.00202476, throughput 9.09728K wps
[Epoch 182 Batch 60/62] avg loss 0.00206338, throughput 8.81983K wps
Begin Testing...
[Epoch 182] train avg loss 0.00205801, dev acc 0.8319, dev avg loss 0.376704, throughput 8.92561K wps
[Epoch 183 Batch 30/62] avg loss 0.00203776, throughput 9.12029K wps
[Epoch 183 Batch 60/62] avg loss 0.00201785, throughput 8.91225K wps
Begin Testing...
[Epoch 183] train avg loss 0.00203472, dev acc 0.8348, dev avg loss 0.376919, throughput 9.04336K wps
[Epoch 184 Batch 30/62] avg loss 0.00211209, throughput 8.9358K wps
[Epoch 184 Batch 60/62] avg loss 0.00189389, throughput 9.25839K wps
Begin Testing...
[Epoch 184] train avg loss 0.00201398, dev acc 0.8319, dev avg loss 0.375939, throughput 9.12441K wps
[Epoch 185 Batch 30/62] avg loss 0.00203061, throughput 9.0426K wps
[Epoch 185 Batch 60/62] avg loss 0.00202317, throughput 9.12924K wps
Begin Testing...
[Epoch 185] train avg loss 0.00207484, dev acc 0.8289, dev avg loss 0.375772, throughput 9.10529K wps
[Epoch 186 Batch 30/62] avg loss 0.00191763, throughput 9.06115K wps
[Epoch 186 Batch 60/62] avg loss 0.00203146, throughput 8.89956K wps
Begin Testing...
[Epoch 186] train avg loss 0.00202943, dev acc 0.8319, dev avg loss 0.376597, throughput 9.00866K wps
[Epoch 187 Batch 30/62] avg loss 0.00187403, throughput 9.0684K wps
[Epoch 187 Batch 60/62] avg loss 0.00195957, throughput 8.78817K wps
Begin Testing...
[Epoch 187] train avg loss 0.00191965, dev acc 0.8319, dev avg loss 0.374981, throughput 8.90246K wps
[Epoch 188 Batch 30/62] avg loss 0.00190022, throughput 9.10199K wps
[Epoch 188 Batch 60/62] avg loss 0.00196388, throughput 9.15168K wps
Begin Testing...
[Epoch 188] train avg loss 0.00200181, dev acc 0.8260, dev avg loss 0.377061, throughput 9.11186K wps
[Epoch 189 Batch 30/62] avg loss 0.00194394, throughput 9.24824K wps
[Epoch 189 Batch 60/62] avg loss 0.0018125, throughput 8.63656K wps
Begin Testing...
[Epoch 189] train avg loss 0.00188858, dev acc 0.8319, dev avg loss 0.374598, throughput 8.96854K wps
[Epoch 190 Batch 30/62] avg loss 0.00184175, throughput 9.28012K wps
[Epoch 190 Batch 60/62] avg loss 0.00198609, throughput 8.84659K wps
Begin Testing...
[Epoch 190] train avg loss 0.00193628, dev acc 0.8319, dev avg loss 0.374866, throughput 9.03603K wps
[Epoch 191 Batch 30/62] avg loss 0.00186817, throughput 8.92898K wps
[Epoch 191 Batch 60/62] avg loss 0.00183344, throughput 8.8661K wps
Begin Testing...
[Epoch 191] train avg loss 0.00186376, dev acc 0.8496, dev avg loss 0.378735, throughput 8.86768K wps
Observed Improvement.
Begin Testing...
[Epoch 192 Batch 30/62] avg loss 0.00172807, throughput 8.84631K wps
[Epoch 192 Batch 60/62] avg loss 0.00187708, throughput 8.82455K wps
Begin Testing...
[Epoch 192] train avg loss 0.00182105, dev acc 0.8260, dev avg loss 0.376643, throughput 8.83137K wps
[Epoch 193 Batch 30/62] avg loss 0.00190927, throughput 9.02993K wps
[Epoch 193 Batch 60/62] avg loss 0.00177188, throughput 9.10398K wps
Begin Testing...
[Epoch 193] train avg loss 0.00187772, dev acc 0.8319, dev avg loss 0.376186, throughput 9.0999K wps
[Epoch 194 Batch 30/62] avg loss 0.00173703, throughput 9.12983K wps
[Epoch 194 Batch 60/62] avg loss 0.00174175, throughput 8.93768K wps
Begin Testing...
[Epoch 194] train avg loss 0.00178512, dev acc 0.8260, dev avg loss 0.376366, throughput 8.9981K wps
[Epoch 195 Batch 30/62] avg loss 0.00179013, throughput 8.94014K wps
[Epoch 195 Batch 60/62] avg loss 0.001796, throughput 9.03188K wps
Begin Testing...
[Epoch 195] train avg loss 0.00180062, dev acc 0.8319, dev avg loss 0.376579, throughput 9.02281K wps
[Epoch 196 Batch 30/62] avg loss 0.0016807, throughput 9.0976K wps
[Epoch 196 Batch 60/62] avg loss 0.00185266, throughput 9.07829K wps
Begin Testing...
[Epoch 196] train avg loss 0.00182688, dev acc 0.8319, dev avg loss 0.378057, throughput 9.07599K wps
[Epoch 197 Batch 30/62] avg loss 0.00178273, throughput 9.10652K wps
[Epoch 197 Batch 60/62] avg loss 0.00190337, throughput 8.7123K wps
Begin Testing...
[Epoch 197] train avg loss 0.00186036, dev acc 0.8348, dev avg loss 0.37671, throughput 8.89826K wps
[Epoch 198 Batch 30/62] avg loss 0.00178064, throughput 8.85035K wps
[Epoch 198 Batch 60/62] avg loss 0.0016538, throughput 9.19484K wps
Begin Testing...
[Epoch 198] train avg loss 0.00171961, dev acc 0.8319, dev avg loss 0.376174, throughput 9.05037K wps
[Epoch 199 Batch 30/62] avg loss 0.00173113, throughput 9.12669K wps
[Epoch 199 Batch 60/62] avg loss 0.00171196, throughput 9.07329K wps
Begin Testing...
[Epoch 199] train avg loss 0.00172305, dev acc 0.8289, dev avg loss 0.37723, throughput 9.0834K wps
Test loss 0.403587, test acc 0.8249
Total time cost 154.80s
[Epoch 0 Batch 30/62] avg loss 0.0134793, throughput 8.34145K wps
[Epoch 0 Batch 60/62] avg loss 0.0129771, throughput 9.08422K wps
Begin Testing...
[Epoch 0] train avg loss 0.0133989, dev acc 0.6254, dev avg loss 0.659936, throughput 8.68925K wps
Observed Improvement.
Begin Testing...
[Epoch 1 Batch 30/62] avg loss 0.0133288, throughput 9.0578K wps
[Epoch 1 Batch 60/62] avg loss 0.0129215, throughput 8.71691K wps
Begin Testing...
[Epoch 1] train avg loss 0.0133164, dev acc 0.6254, dev avg loss 0.655957, throughput 8.96859K wps
Observed Improvement.
Begin Testing...
[Epoch 2 Batch 30/62] avg loss 0.013151, throughput 9.20527K wps
[Epoch 2 Batch 60/62] avg loss 0.0130181, throughput 8.99063K wps
Begin Testing...
[Epoch 2] train avg loss 0.0132655, dev acc 0.6254, dev avg loss 0.652006, throughput 9.0828K wps
Observed Improvement.
Begin Testing...
[Epoch 3 Batch 30/62] avg loss 0.0129778, throughput 9.19584K wps
[Epoch 3 Batch 60/62] avg loss 0.0128044, throughput 9.05549K wps
Begin Testing...
[Epoch 3] train avg loss 0.0130463, dev acc 0.6254, dev avg loss 0.648464, throughput 9.15526K wps
Observed Improvement.
Begin Testing...
[Epoch 4 Batch 30/62] avg loss 0.0128823, throughput 9.33382K wps
[Epoch 4 Batch 60/62] avg loss 0.0128236, throughput 8.99451K wps
Begin Testing...
[Epoch 4] train avg loss 0.0130005, dev acc 0.6254, dev avg loss 0.643953, throughput 9.18895K wps
Observed Improvement.
Begin Testing...
[Epoch 5 Batch 30/62] avg loss 0.0125789, throughput 9.27476K wps
[Epoch 5 Batch 60/62] avg loss 0.0127455, throughput 8.97218K wps
Begin Testing...
[Epoch 5] train avg loss 0.0128276, dev acc 0.6254, dev avg loss 0.640498, throughput 9.15146K wps
Observed Improvement.
Begin Testing...
[Epoch 6 Batch 30/62] avg loss 0.0126208, throughput 9.15601K wps
[Epoch 6 Batch 60/62] avg loss 0.0127068, throughput 9.10077K wps
Begin Testing...
[Epoch 6] train avg loss 0.0128128, dev acc 0.6254, dev avg loss 0.6369, throughput 9.14707K wps
Observed Improvement.
Begin Testing...
[Epoch 7 Batch 30/62] avg loss 0.0126499, throughput 9.23361K wps
[Epoch 7 Batch 60/62] avg loss 0.012529, throughput 9.01213K wps
Begin Testing...
[Epoch 7] train avg loss 0.012812, dev acc 0.6254, dev avg loss 0.633011, throughput 9.15023K wps
Observed Improvement.
Begin Testing...
[Epoch 8 Batch 30/62] avg loss 0.0123017, throughput 9.09612K wps
[Epoch 8 Batch 60/62] avg loss 0.0126195, throughput 9.12622K wps
Begin Testing...
[Epoch 8] train avg loss 0.0126231, dev acc 0.6254, dev avg loss 0.62941, throughput 9.09323K wps
Observed Improvement.
Begin Testing...
[Epoch 9 Batch 30/62] avg loss 0.0124922, throughput 8.92907K wps
[Epoch 9 Batch 60/62] avg loss 0.0122144, throughput 9.1049K wps
Begin Testing...
[Epoch 9] train avg loss 0.0125424, dev acc 0.6254, dev avg loss 0.62688, throughput 9.03898K wps
Observed Improvement.
Begin Testing...
[Epoch 10 Batch 30/62] avg loss 0.012362, throughput 9.07968K wps
[Epoch 10 Batch 60/62] avg loss 0.0120914, throughput 9.03263K wps
Begin Testing...
[Epoch 10] train avg loss 0.0124415, dev acc 0.6254, dev avg loss 0.621744, throughput 9.08353K wps
Observed Improvement.
Begin Testing...
[Epoch 11 Batch 30/62] avg loss 0.0119275, throughput 9.00539K wps
[Epoch 11 Batch 60/62] avg loss 0.0122377, throughput 9.05355K wps
Begin Testing...
[Epoch 11] train avg loss 0.01224, dev acc 0.6254, dev avg loss 0.617833, throughput 9.06259K wps
Observed Improvement.
Begin Testing...
[Epoch 12 Batch 30/62] avg loss 0.0119658, throughput 9.05178K wps
[Epoch 12 Batch 60/62] avg loss 0.0121967, throughput 8.9771K wps
Begin Testing...
[Epoch 12] train avg loss 0.0122766, dev acc 0.6254, dev avg loss 0.614091, throughput 9.04656K wps
Observed Improvement.
Begin Testing...
[Epoch 13 Batch 30/62] avg loss 0.0117985, throughput 9.15749K wps
[Epoch 13 Batch 60/62] avg loss 0.0120485, throughput 8.7927K wps
Begin Testing...
[Epoch 13] train avg loss 0.0120855, dev acc 0.6313, dev avg loss 0.609885, throughput 8.94554K wps
Observed Improvement.
Begin Testing...
[Epoch 14 Batch 30/62] avg loss 0.0117836, throughput 9.28528K wps
[Epoch 14 Batch 60/62] avg loss 0.0119846, throughput 8.78963K wps
Begin Testing...
[Epoch 14] train avg loss 0.0120073, dev acc 0.6254, dev avg loss 0.607258, throughput 9.06372K wps
[Epoch 15 Batch 30/62] avg loss 0.0116826, throughput 9.0637K wps
[Epoch 15 Batch 60/62] avg loss 0.0117161, throughput 9.04468K wps
Begin Testing...
[Epoch 15] train avg loss 0.0118611, dev acc 0.6401, dev avg loss 0.601358, throughput 9.0862K wps
Observed Improvement.
Begin Testing...
[Epoch 16 Batch 30/62] avg loss 0.0118929, throughput 9.08853K wps
[Epoch 16 Batch 60/62] avg loss 0.0113212, throughput 8.82749K wps
Begin Testing...
[Epoch 16] train avg loss 0.0118149, dev acc 0.6372, dev avg loss 0.597238, throughput 8.93446K wps
[Epoch 17 Batch 30/62] avg loss 0.0113572, throughput 8.94887K wps
[Epoch 17 Batch 60/62] avg loss 0.0115605, throughput 9.05705K wps
Begin Testing...
[Epoch 17] train avg loss 0.0116186, dev acc 0.6608, dev avg loss 0.592466, throughput 9.03255K wps
Observed Improvement.
Begin Testing...
[Epoch 18 Batch 30/62] avg loss 0.0112804, throughput 9.08532K wps
[Epoch 18 Batch 60/62] avg loss 0.0113483, throughput 9.30461K wps
Begin Testing...
[Epoch 18] train avg loss 0.0114372, dev acc 0.6342, dev avg loss 0.590287, throughput 9.22149K wps
[Epoch 19 Batch 30/62] avg loss 0.0113228, throughput 8.77094K wps
[Epoch 19 Batch 60/62] avg loss 0.0111329, throughput 9.14371K wps
Begin Testing...
[Epoch 19] train avg loss 0.0113573, dev acc 0.6401, dev avg loss 0.584746, throughput 8.9881K wps
[Epoch 20 Batch 30/62] avg loss 0.0113911, throughput 9.06228K wps
[Epoch 20 Batch 60/62] avg loss 0.0110874, throughput 8.85676K wps
Begin Testing...
[Epoch 20] train avg loss 0.0113673, dev acc 0.6608, dev avg loss 0.578666, throughput 9.03391K wps
Observed Improvement.
Begin Testing...
[Epoch 21 Batch 30/62] avg loss 0.0112109, throughput 9.1769K wps
[Epoch 21 Batch 60/62] avg loss 0.010771, throughput 8.99925K wps
Begin Testing...
[Epoch 21] train avg loss 0.0111013, dev acc 0.6519, dev avg loss 0.576007, throughput 9.11777K wps
[Epoch 22 Batch 30/62] avg loss 0.0108562, throughput 8.93011K wps
[Epoch 22 Batch 60/62] avg loss 0.0109369, throughput 8.976K wps
Begin Testing...
[Epoch 22] train avg loss 0.0110457, dev acc 0.7050, dev avg loss 0.568186, throughput 8.97677K wps
Observed Improvement.
Begin Testing...
[Epoch 23 Batch 30/62] avg loss 0.0107312, throughput 9.14972K wps
[Epoch 23 Batch 60/62] avg loss 0.0106419, throughput 9.10929K wps
Begin Testing...
[Epoch 23] train avg loss 0.0108352, dev acc 0.7080, dev avg loss 0.563375, throughput 9.15927K wps
Observed Improvement.
Begin Testing...
[Epoch 24 Batch 30/62] avg loss 0.0103931, throughput 8.03123K wps
[Epoch 24 Batch 60/62] avg loss 0.0108253, throughput 8.0022K wps
Begin Testing...
[Epoch 24] train avg loss 0.0107872, dev acc 0.7493, dev avg loss 0.559531, throughput 8.02794K wps
Observed Improvement.
Begin Testing...
[Epoch 25 Batch 30/62] avg loss 0.0105648, throughput 7.60337K wps
[Epoch 25 Batch 60/62] avg loss 0.0103571, throughput 7.66845K wps
Begin Testing...
[Epoch 25] train avg loss 0.0106264, dev acc 0.7434, dev avg loss 0.554361, throughput 7.63301K wps
[Epoch 26 Batch 30/62] avg loss 0.0104827, throughput 8.34071K wps
[Epoch 26 Batch 60/62] avg loss 0.0104407, throughput 8.21964K wps
Begin Testing...
[Epoch 26] train avg loss 0.0105521, dev acc 0.7198, dev avg loss 0.549436, throughput 8.2786K wps
[Epoch 27 Batch 30/62] avg loss 0.010174, throughput 8.2833K wps
[Epoch 27 Batch 60/62] avg loss 0.0105697, throughput 8.1194K wps
Begin Testing...
[Epoch 27] train avg loss 0.0104724, dev acc 0.7257, dev avg loss 0.54508, throughput 8.21565K wps
[Epoch 28 Batch 30/62] avg loss 0.010106, throughput 8.31014K wps
[Epoch 28 Batch 60/62] avg loss 0.0101286, throughput 8.16741K wps
Begin Testing...
[Epoch 28] train avg loss 0.0102563, dev acc 0.7257, dev avg loss 0.540618, throughput 8.2346K wps
[Epoch 29 Batch 30/62] avg loss 0.010151, throughput 8.58876K wps
[Epoch 29 Batch 60/62] avg loss 0.00972431, throughput 9.17663K wps
Begin Testing...
[Epoch 29] train avg loss 0.0100821, dev acc 0.7463, dev avg loss 0.535646, throughput 8.91002K wps
[Epoch 30 Batch 30/62] avg loss 0.00996574, throughput 9.31217K wps
[Epoch 30 Batch 60/62] avg loss 0.00992725, throughput 8.86647K wps
Begin Testing...
[Epoch 30] train avg loss 0.0100173, dev acc 0.7404, dev avg loss 0.5316, throughput 9.11446K wps
[Epoch 31 Batch 30/62] avg loss 0.00978945, throughput 9.18251K wps
[Epoch 31 Batch 60/62] avg loss 0.00975327, throughput 9.0596K wps
Begin Testing...
[Epoch 31] train avg loss 0.00989171, dev acc 0.7640, dev avg loss 0.527306, throughput 9.14875K wps
Observed Improvement.
Begin Testing...
[Epoch 32 Batch 30/62] avg loss 0.00958851, throughput 9.26382K wps
[Epoch 32 Batch 60/62] avg loss 0.00960943, throughput 8.85507K wps
Begin Testing...
[Epoch 32] train avg loss 0.00981422, dev acc 0.7581, dev avg loss 0.524192, throughput 9.11001K wps
[Epoch 33 Batch 30/62] avg loss 0.00973214, throughput 9.11879K wps
[Epoch 33 Batch 60/62] avg loss 0.00939648, throughput 9.01568K wps
Begin Testing...
[Epoch 33] train avg loss 0.00968733, dev acc 0.7552, dev avg loss 0.520088, throughput 9.09518K wps
[Epoch 34 Batch 30/62] avg loss 0.00930243, throughput 9.13888K wps
[Epoch 34 Batch 60/62] avg loss 0.00973241, throughput 8.7571K wps
Begin Testing...
[Epoch 34] train avg loss 0.00966664, dev acc 0.7552, dev avg loss 0.516554, throughput 8.91888K wps
[Epoch 35 Batch 30/62] avg loss 0.00928331, throughput 8.9999K wps
[Epoch 35 Batch 60/62] avg loss 0.00938289, throughput 8.92538K wps
Begin Testing...
[Epoch 35] train avg loss 0.00953452, dev acc 0.7729, dev avg loss 0.51168, throughput 8.99607K wps
Observed Improvement.
Begin Testing...
[Epoch 36 Batch 30/62] avg loss 0.0091558, throughput 9.14531K wps
[Epoch 36 Batch 60/62] avg loss 0.00936666, throughput 8.81222K wps
Begin Testing...
[Epoch 36] train avg loss 0.00947296, dev acc 0.7758, dev avg loss 0.508327, throughput 8.95687K wps
Observed Improvement.
Begin Testing...
[Epoch 37 Batch 30/62] avg loss 0.00900431, throughput 9.00447K wps
[Epoch 37 Batch 60/62] avg loss 0.00923196, throughput 8.99615K wps
Begin Testing...
[Epoch 37] train avg loss 0.0091482, dev acc 0.7788, dev avg loss 0.505044, throughput 9.03271K wps
Observed Improvement.
Begin Testing...
[Epoch 38 Batch 30/62] avg loss 0.00912185, throughput 9.07326K wps
[Epoch 38 Batch 60/62] avg loss 0.00904435, throughput 9.06261K wps
Begin Testing...
[Epoch 38] train avg loss 0.00923227, dev acc 0.7788, dev avg loss 0.503015, throughput 9.10452K wps
Observed Improvement.
Begin Testing...
[Epoch 39 Batch 30/62] avg loss 0.00921867, throughput 9.1734K wps
[Epoch 39 Batch 60/62] avg loss 0.00864065, throughput 9.0881K wps
Begin Testing...
[Epoch 39] train avg loss 0.00904911, dev acc 0.7817, dev avg loss 0.497667, throughput 9.1576K wps
Observed Improvement.
Begin Testing...
[Epoch 40 Batch 30/62] avg loss 0.00865225, throughput 9.00687K wps
[Epoch 40 Batch 60/62] avg loss 0.00900748, throughput 9.06788K wps
Begin Testing...
[Epoch 40] train avg loss 0.00892283, dev acc 0.7847, dev avg loss 0.494617, throughput 9.02888K wps
Observed Improvement.
Begin Testing...
[Epoch 41 Batch 30/62] avg loss 0.00860627, throughput 9.07965K wps
[Epoch 41 Batch 60/62] avg loss 0.00881029, throughput 8.91483K wps
Begin Testing...
[Epoch 41] train avg loss 0.00892238, dev acc 0.7552, dev avg loss 0.49407, throughput 9.01675K wps
[Epoch 42 Batch 30/62] avg loss 0.00862958, throughput 9.17771K wps
[Epoch 42 Batch 60/62] avg loss 0.00846994, throughput 8.98878K wps
Begin Testing...
[Epoch 42] train avg loss 0.00864036, dev acc 0.7699, dev avg loss 0.48984, throughput 9.0505K wps
[Epoch 43 Batch 30/62] avg loss 0.00826338, throughput 9.04999K wps
[Epoch 43 Batch 60/62] avg loss 0.00864736, throughput 8.93958K wps
Begin Testing...
[Epoch 43] train avg loss 0.00861966, dev acc 0.7817, dev avg loss 0.489863, throughput 8.96807K wps
[Epoch 44 Batch 30/62] avg loss 0.00832313, throughput 8.79115K wps
[Epoch 44 Batch 60/62] avg loss 0.00858802, throughput 9.16683K wps
Begin Testing...
[Epoch 44] train avg loss 0.00864292, dev acc 0.7729, dev avg loss 0.48278, throughput 8.96534K wps
[Epoch 45 Batch 30/62] avg loss 0.00846337, throughput 9.13661K wps
[Epoch 45 Batch 60/62] avg loss 0.00848745, throughput 8.67003K wps
Begin Testing...
[Epoch 45] train avg loss 0.00852214, dev acc 0.7876, dev avg loss 0.479817, throughput 8.86171K wps
Observed Improvement.
Begin Testing...
[Epoch 46 Batch 30/62] avg loss 0.00811179, throughput 9.17503K wps
[Epoch 46 Batch 60/62] avg loss 0.00852034, throughput 9.09733K wps
Begin Testing...
[Epoch 46] train avg loss 0.00840069, dev acc 0.7906, dev avg loss 0.477298, throughput 9.1644K wps
Observed Improvement.
Begin Testing...
[Epoch 47 Batch 30/62] avg loss 0.00801606, throughput 8.82597K wps
[Epoch 47 Batch 60/62] avg loss 0.00835611, throughput 8.71879K wps
Begin Testing...
[Epoch 47] train avg loss 0.00825316, dev acc 0.7876, dev avg loss 0.474875, throughput 8.80921K wps
[Epoch 48 Batch 30/62] avg loss 0.00802509, throughput 8.67718K wps
[Epoch 48 Batch 60/62] avg loss 0.00824079, throughput 8.95291K wps
Begin Testing...
[Epoch 48] train avg loss 0.00826799, dev acc 0.7817, dev avg loss 0.472296, throughput 8.81241K wps
[Epoch 49 Batch 30/62] avg loss 0.00779762, throughput 9.02552K wps
[Epoch 49 Batch 60/62] avg loss 0.00829036, throughput 9.02425K wps
Begin Testing...
[Epoch 49] train avg loss 0.0081199, dev acc 0.7847, dev avg loss 0.469934, throughput 9.04277K wps
[Epoch 50 Batch 30/62] avg loss 0.00785918, throughput 9.10995K wps
[Epoch 50 Batch 60/62] avg loss 0.00801579, throughput 8.91798K wps
Begin Testing...
[Epoch 50] train avg loss 0.00799061, dev acc 0.7758, dev avg loss 0.468456, throughput 9.04125K wps
[Epoch 51 Batch 30/62] avg loss 0.00817298, throughput 9.26911K wps
[Epoch 51 Batch 60/62] avg loss 0.00777812, throughput 8.92029K wps
Begin Testing...
[Epoch 51] train avg loss 0.00807504, dev acc 0.7965, dev avg loss 0.465873, throughput 9.04418K wps
Observed Improvement.
Begin Testing...
[Epoch 52 Batch 30/62] avg loss 0.00771876, throughput 9.13043K wps
[Epoch 52 Batch 60/62] avg loss 0.00781287, throughput 9.03767K wps
Begin Testing...
[Epoch 52] train avg loss 0.00784314, dev acc 0.7935, dev avg loss 0.46483, throughput 9.11257K wps
[Epoch 53 Batch 30/62] avg loss 0.00772974, throughput 9.13148K wps
[Epoch 53 Batch 60/62] avg loss 0.00782944, throughput 9.14412K wps
Begin Testing...
[Epoch 53] train avg loss 0.00786915, dev acc 0.7758, dev avg loss 0.464797, throughput 9.16405K wps
[Epoch 54 Batch 30/62] avg loss 0.00766668, throughput 8.97706K wps
[Epoch 54 Batch 60/62] avg loss 0.00753636, throughput 9.01549K wps
Begin Testing...
[Epoch 54] train avg loss 0.00773656, dev acc 0.7994, dev avg loss 0.462374, throughput 9.03227K wps
Observed Improvement.
Begin Testing...
[Epoch 55 Batch 30/62] avg loss 0.00765818, throughput 8.91917K wps
[Epoch 55 Batch 60/62] avg loss 0.00747734, throughput 8.88197K wps
Begin Testing...
[Epoch 55] train avg loss 0.00766812, dev acc 0.7729, dev avg loss 0.458508, throughput 8.92398K wps
[Epoch 56 Batch 30/62] avg loss 0.00738471, throughput 9.23932K wps
[Epoch 56 Batch 60/62] avg loss 0.00770878, throughput 8.76085K wps
Begin Testing...
[Epoch 56] train avg loss 0.00766474, dev acc 0.7788, dev avg loss 0.461777, throughput 8.96677K wps
[Epoch 57 Batch 30/62] avg loss 0.00743561, throughput 9.37869K wps
[Epoch 57 Batch 60/62] avg loss 0.00747675, throughput 9.13484K wps
Begin Testing...
[Epoch 57] train avg loss 0.00755549, dev acc 0.7817, dev avg loss 0.453693, throughput 9.2771K wps
[Epoch 58 Batch 30/62] avg loss 0.00741186, throughput 9.09131K wps
[Epoch 58 Batch 60/62] avg loss 0.00738301, throughput 9.08443K wps
Begin Testing...
[Epoch 58] train avg loss 0.00745896, dev acc 0.7847, dev avg loss 0.454828, throughput 9.11991K wps
[Epoch 59 Batch 30/62] avg loss 0.0070411, throughput 9.27248K wps
[Epoch 59 Batch 60/62] avg loss 0.00750352, throughput 8.9431K wps
Begin Testing...
[Epoch 59] train avg loss 0.00731006, dev acc 0.7817, dev avg loss 0.450357, throughput 9.13194K wps
[Epoch 60 Batch 30/62] avg loss 0.00700579, throughput 9.16568K wps
[Epoch 60 Batch 60/62] avg loss 0.00716109, throughput 8.84923K wps
Begin Testing...
[Epoch 60] train avg loss 0.00713645, dev acc 0.7994, dev avg loss 0.454949, throughput 8.97458K wps
Observed Improvement.
Begin Testing...
[Epoch 61 Batch 30/62] avg loss 0.00711775, throughput 9.1964K wps
[Epoch 61 Batch 60/62] avg loss 0.00705909, throughput 8.99001K wps
Begin Testing...
[Epoch 61] train avg loss 0.00722552, dev acc 0.8053, dev avg loss 0.446081, throughput 9.11861K wps
Observed Improvement.
Begin Testing...
[Epoch 62 Batch 30/62] avg loss 0.00702746, throughput 9.25263K wps
[Epoch 62 Batch 60/62] avg loss 0.00678207, throughput 8.96308K wps
Begin Testing...
[Epoch 62] train avg loss 0.00700191, dev acc 0.8053, dev avg loss 0.444342, throughput 9.08187K wps
Observed Improvement.
Begin Testing...
[Epoch 63 Batch 30/62] avg loss 0.00710548, throughput 8.68739K wps
[Epoch 63 Batch 60/62] avg loss 0.00687362, throughput 9.2131K wps
Begin Testing...
[Epoch 63] train avg loss 0.00705767, dev acc 0.7965, dev avg loss 0.442348, throughput 8.91638K wps
[Epoch 64 Batch 30/62] avg loss 0.00672885, throughput 8.83243K wps
[Epoch 64 Batch 60/62] avg loss 0.00710105, throughput 8.79926K wps
Begin Testing...
[Epoch 64] train avg loss 0.0070233, dev acc 0.7847, dev avg loss 0.445153, throughput 8.85398K wps
[Epoch 65 Batch 30/62] avg loss 0.00685505, throughput 8.99851K wps
[Epoch 65 Batch 60/62] avg loss 0.00671715, throughput 8.64999K wps
Begin Testing...
[Epoch 65] train avg loss 0.00692548, dev acc 0.7876, dev avg loss 0.445547, throughput 8.84585K wps
[Epoch 66 Batch 30/62] avg loss 0.00695195, throughput 9.06557K wps
[Epoch 66 Batch 60/62] avg loss 0.00670954, throughput 8.7968K wps
Begin Testing...
[Epoch 66] train avg loss 0.00698839, dev acc 0.7847, dev avg loss 0.441517, throughput 8.91166K wps
[Epoch 67 Batch 30/62] avg loss 0.00658006, throughput 9.05079K wps
[Epoch 67 Batch 60/62] avg loss 0.00682787, throughput 8.89344K wps
Begin Testing...
[Epoch 67] train avg loss 0.00686816, dev acc 0.7935, dev avg loss 0.437922, throughput 9.00371K wps
[Epoch 68 Batch 30/62] avg loss 0.00637328, throughput 9.12468K wps
[Epoch 68 Batch 60/62] avg loss 0.00666944, throughput 8.94473K wps
Begin Testing...
[Epoch 68] train avg loss 0.00664761, dev acc 0.8083, dev avg loss 0.434911, throughput 9.06492K wps
Observed Improvement.
Begin Testing...
[Epoch 69 Batch 30/62] avg loss 0.00673907, throughput 9.23498K wps
[Epoch 69 Batch 60/62] avg loss 0.0064474, throughput 9.03811K wps
Begin Testing...
[Epoch 69] train avg loss 0.00664685, dev acc 0.8083, dev avg loss 0.432757, throughput 9.15948K wps
Observed Improvement.
Begin Testing...
[Epoch 70 Batch 30/62] avg loss 0.00667133, throughput 9.01211K wps
[Epoch 70 Batch 60/62] avg loss 0.00633324, throughput 9.04135K wps
Begin Testing...
[Epoch 70] train avg loss 0.00656033, dev acc 0.8083, dev avg loss 0.431588, throughput 9.00673K wps
Observed Improvement.
Begin Testing...
[Epoch 71 Batch 30/62] avg loss 0.00624926, throughput 8.84854K wps
[Epoch 71 Batch 60/62] avg loss 0.00632539, throughput 8.80567K wps
Begin Testing...
[Epoch 71] train avg loss 0.00635891, dev acc 0.8083, dev avg loss 0.43051, throughput 8.89969K wps
Observed Improvement.
Begin Testing...
[Epoch 72 Batch 30/62] avg loss 0.00624617, throughput 9.08217K wps
[Epoch 72 Batch 60/62] avg loss 0.00626138, throughput 8.97282K wps
Begin Testing...
[Epoch 72] train avg loss 0.00637846, dev acc 0.8083, dev avg loss 0.430136, throughput 8.99456K wps
Observed Improvement.
Begin Testing...
[Epoch 73 Batch 30/62] avg loss 0.00618355, throughput 9.28187K wps
[Epoch 73 Batch 60/62] avg loss 0.00642252, throughput 9.03016K wps
Begin Testing...
[Epoch 73] train avg loss 0.00638564, dev acc 0.8024, dev avg loss 0.427706, throughput 9.18621K wps
[Epoch 74 Batch 30/62] avg loss 0.00620055, throughput 9.05658K wps
[Epoch 74 Batch 60/62] avg loss 0.00614475, throughput 8.98531K wps
Begin Testing...
[Epoch 74] train avg loss 0.00625293, dev acc 0.8083, dev avg loss 0.427617, throughput 9.00358K wps
Observed Improvement.
Begin Testing...
[Epoch 75 Batch 30/62] avg loss 0.00610704, throughput 9.10584K wps
[Epoch 75 Batch 60/62] avg loss 0.00620173, throughput 8.83783K wps
Begin Testing...
[Epoch 75] train avg loss 0.00627795, dev acc 0.8024, dev avg loss 0.425156, throughput 8.95578K wps
[Epoch 76 Batch 30/62] avg loss 0.00620074, throughput 9.13137K wps
[Epoch 76 Batch 60/62] avg loss 0.00591787, throughput 8.88512K wps
Begin Testing...
[Epoch 76] train avg loss 0.00611365, dev acc 0.8053, dev avg loss 0.42341, throughput 8.99179K wps
[Epoch 77 Batch 30/62] avg loss 0.005711, throughput 9.01361K wps
[Epoch 77 Batch 60/62] avg loss 0.00609378, throughput 9.05867K wps
Begin Testing...
[Epoch 77] train avg loss 0.00602968, dev acc 0.7935, dev avg loss 0.423225, throughput 8.99979K wps
[Epoch 78 Batch 30/62] avg loss 0.00579307, throughput 9.01856K wps
[Epoch 78 Batch 60/62] avg loss 0.00620817, throughput 8.99188K wps
Begin Testing...
[Epoch 78] train avg loss 0.00612213, dev acc 0.8112, dev avg loss 0.421082, throughput 9.03674K wps
Observed Improvement.
Begin Testing...
[Epoch 79 Batch 30/62] avg loss 0.00560924, throughput 9.1055K wps
[Epoch 79 Batch 60/62] avg loss 0.00610083, throughput 8.93744K wps
Begin Testing...
[Epoch 79] train avg loss 0.00594513, dev acc 0.8083, dev avg loss 0.42256, throughput 9.03616K wps
[Epoch 80 Batch 30/62] avg loss 0.00561489, throughput 9.26826K wps
[Epoch 80 Batch 60/62] avg loss 0.00603666, throughput 8.94483K wps
Begin Testing...
[Epoch 80] train avg loss 0.00598307, dev acc 0.8024, dev avg loss 0.424483, throughput 9.08001K wps
[Epoch 81 Batch 30/62] avg loss 0.00579381, throughput 9.2447K wps
[Epoch 81 Batch 60/62] avg loss 0.00574586, throughput 8.63266K wps
Begin Testing...
[Epoch 81] train avg loss 0.00582919, dev acc 0.8083, dev avg loss 0.42195, throughput 8.9081K wps
[Epoch 82 Batch 30/62] avg loss 0.00541197, throughput 9.18906K wps
[Epoch 82 Batch 60/62] avg loss 0.00593201, throughput 9.07629K wps
Begin Testing...
[Epoch 82] train avg loss 0.00571012, dev acc 0.8171, dev avg loss 0.417754, throughput 9.15438K wps
Observed Improvement.
Begin Testing...
[Epoch 83 Batch 30/62] avg loss 0.00576491, throughput 8.92045K wps
[Epoch 83 Batch 60/62] avg loss 0.00553608, throughput 8.84419K wps
Begin Testing...
[Epoch 83] train avg loss 0.0057243, dev acc 0.8053, dev avg loss 0.417158, throughput 8.85669K wps
[Epoch 84 Batch 30/62] avg loss 0.00555618, throughput 9.24711K wps
[Epoch 84 Batch 60/62] avg loss 0.00585866, throughput 8.98721K wps
Begin Testing...
[Epoch 84] train avg loss 0.00578235, dev acc 0.8112, dev avg loss 0.417515, throughput 9.10076K wps
[Epoch 85 Batch 30/62] avg loss 0.00564539, throughput 9.08095K wps
[Epoch 85 Batch 60/62] avg loss 0.00534449, throughput 9.03864K wps
Begin Testing...
[Epoch 85] train avg loss 0.00555788, dev acc 0.8083, dev avg loss 0.414393, throughput 9.09151K wps
[Epoch 86 Batch 30/62] avg loss 0.00571617, throughput 8.87762K wps
[Epoch 86 Batch 60/62] avg loss 0.00525522, throughput 8.77261K wps
Begin Testing...
[Epoch 86] train avg loss 0.00556323, dev acc 0.8142, dev avg loss 0.414164, throughput 8.86071K wps
[Epoch 87 Batch 30/62] avg loss 0.00544976, throughput 8.6385K wps